38

Zhensong Zhang

JasonCU

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 9 days ago

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

Paper • 2603.03269 • Published 17 days ago • 61

upvoted 2 papers 10 days ago

Physical Simulator In-the-Loop Video Generation

Paper • 2603.06408 • Published 14 days ago • 11

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Paper • 2603.07660 • Published 12 days ago • 83

upvoted a paper 16 days ago

LongVideo-R1: Smart Navigation for Low-cost Long Video Understanding

Paper • 2602.20913 • Published 24 days ago • 11

upvoted a paper 29 days ago

jina-embeddings-v5-text: Task-Targeted Embedding Distillation

Paper • 2602.15547 • Published Feb 17 • 26

upvoted a paper 30 days ago

TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions

Paper • 2602.08711 • Published Feb 9 • 28

upvoted 6 papers about 1 month ago

BrowseComp-V^3: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Paper • 2602.12876 • Published Feb 13 • 10

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Paper • 2602.13191 • Published Feb 13 • 30

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

Reinforced Attention Learning

Paper • 2602.04884 • Published Feb 4 • 28

Kimi K2.5: Visual Agentic Intelligence

Paper • 2602.02276 • Published Feb 2 • 259

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published Jan 29 • 155

upvoted a paper about 2 months ago

Agentic Very Long Video Understanding

Paper • 2601.18157 • Published Jan 26 • 19

upvoted 7 papers 2 months ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published Jan 15 • 30

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 195

BabyVision: Visual Reasoning Beyond Language

Paper • 2601.06521 • Published Jan 10 • 200

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published Jan 11 • 214

NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Paper • 2601.00393 • Published Jan 1 • 133

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 68

Latent Implicit Visual Reasoning

Paper • 2512.21218 • Published Dec 24, 2025 • 69