Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Joya Chen
chenjoya
AI & ML interests
Video LLM
Recent Activity
upvoted a paper 1 day ago
Beyond Language Modeling: An Exploration of Multimodal Pretraining
upvoted a paper 28 days ago
Olaf-World: Orienting Latent Actions for Video World Modeling