Quang Huy
NothingLQH
AI & ML interests
None yet
Recent Activity
Updated a collection 2 months ago: SpeechToText
Updated a collection 3 months ago: Image
Updated a collection 3 months ago: Image
Organizations
None yet
Automation
TextToVideo
VLM
FocusedAD: Character-centric Movie Audio Description
Paper • 2504.12157 • Published • 8
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
Paper • 2504.10465 • Published • 27
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper • 2504.13180 • Published • 20
OS-Copilot/OS-Atlas-Base-7B
Image-Text-to-Text • 8B • Updated • 369 • 42
Code
Prompt
Story
SpeechToText
Vietnamese Streaming RNN-T
💻 Sleeping • 1 • RNN-T with Whisper Encoder
erax-ai/EraX-WoW-Turbo-V1.0
Automatic Speech Recognition • 0.8B • Updated • 25 • 54
openai/whisper-large-v3-turbo
Automatic Speech Recognition • 0.8B • Updated • 3.21M • 2.81k
nvidia/canary-1b
Automatic Speech Recognition • Updated • 1.92k • 457
Anime
Video
IdeaMusic
Vistral-7B-Chat
TextToSpeech
MJ6
Translation
ControlVPS
ORC
Speech
facebook/wav2vec2-lv-60-espeak-cv-ft
Automatic Speech Recognition • Updated • 79.1k • 65
Resemble Enhance
🚀 Running on T4 • 446 • Enhance and denoise your audio files
pyannote/speaker-diarization-3.1
Automatic Speech Recognition • Updated • 12.9M • 1.53k
Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1
Updated • 5 • 14
ImageToVideo
Pushing the Boundaries of State Space Models for Image and Video Generation
Paper • 2502.00972 • Published
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
Paper • 2501.13920 • Published • 19
tencent/HunyuanVideo-I2V
Image-to-Video • Updated • 181 • 348
IndexTeam/Index-anisora
Updated • 12 • 221
TextToText
NLP
3D
LiveImage
DatasetLanguage
Image
LLM
stepfun-ai/GOT-OCR2_0
Image-Text-to-Text • 0.7B • Updated • 129k • 1.53k
Midi Music Generator
🎼 Running on Zero • Featured • 565 • Generate MIDI music with custom instruments and settings
OpenGVLab/InternVL2_5-78B-MPO
Image-Text-to-Text • 78B • Updated • 273 • 54
OpenGVLab/InternVL2_5-38B-MPO-AWQ
Image-Text-to-Text • Updated • 26 • 6
ConvertHTMLtoJSON