Stateful Conformer with Cache-based Inference for Streaming Automatic Speech Recognition Paper • 2312.17279 • Published Dec 27, 2023 • 4
Article How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent nvidia • 1 day ago • 35
CoreML Speech Models Collection Speech AI models for Apple Neural Engine via CoreML. iOS/macOS ready. ASR, TTS, VAD, diarization. • 24 items • Updated about 23 hours ago • 4
MLX Speech Models Collection Speech AI models for Apple Silicon via MLX. ASR, TTS, VAD, diarization, speaker embedding. • 56 items • Updated about 18 hours ago • 5
Unified Panoramic Geometry Estimation via Multi-View Foundation Models Paper • 2605.26368 • Published 12 days ago • 4
CubePart: An Open-Vocabulary Part-Controllable 3D Generator Paper • 2605.28763 • Published 10 days ago • 14
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 10 days ago • 72
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 10 days ago • 419
EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM Paper • 2312.06660 • Published Dec 11, 2023 • 2
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation Paper • 2509.24663 • Published Sep 29, 2025 • 18
Talker-T2AV: Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling Paper • 2604.23586 • Published Apr 26 • 6
Lance MLX Collection Feature-complete MLX port of ByteDance Lance: t2i, image_edit, x2t_image, t2v, video_edit, x2t_video. • 4 items • Updated 3 days ago • 4
Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild Paper • 2605.22064 • Published 16 days ago • 5
Lance: Unified Multimodal Modeling by Multi-Task Synergy Paper • 2605.18678 • Published 19 days ago • 78
DeepVQE: Real Time Deep Voice Quality Enhancement for Joint Acoustic Echo Cancellation, Noise Suppression and Dereverberation Paper • 2306.03177 • Published Jun 5, 2023 • 1