Haiwen Diao

Paranioar

https://Paranioar.github.io/

AI & ML interests

Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model

Recent Activity

upvoted 2 papers 6 days ago

MultiShotMaster: A Controllable Multi-Shot Video Generation Framework

Paper • 2512.03041 • Published 7 days ago • 59

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

Paper • 2512.02556 • Published 7 days ago • 184

upvoted a paper 7 days ago

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published 14 days ago • 149

upvoted 2 papers 14 days ago

SAM 3: Segment Anything with Concepts

Paper • 2511.16719 • Published 19 days ago • 109

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published 19 days ago • 91

upvoted a paper 18 days ago

Scaling Spatial Intelligence with Multimodal Foundation Models

Paper • 2511.13719 • Published 22 days ago • 44

upvoted a paper 21 days ago

PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Paper • 2511.13648 • Published 22 days ago • 52

upvoted a paper 22 days ago

Simulating the Visual World with Artificial Intelligence: A Roadmap

Paper • 2511.08585 • Published 28 days ago • 29

upvoted a paper about 1 month ago

Uniform Discrete Diffusion with Metric Path for Video Generation

Paper • 2510.24717 • Published Oct 28 • 39

upvoted 2 papers about 2 months ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9 • 266

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16 • 65

upvoted a collection about 2 months ago

NEO1_0

Collection

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • 7 items • Updated Oct 17 • 4

upvoted a paper about 2 months ago

Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation

Paper • 2510.08673 • Published Oct 9 • 125

upvoted a paper 2 months ago

Visual Jigsaw Post-Training Improves MLLMs

Paper • 2509.25190 • Published Sep 29 • 36

upvoted 3 papers 4 months ago

upvoted a paper 5 months ago

FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2 • 19

upvoted 2 papers 7 months ago

Shifting AI Efficiency From Model-Centric to Data-Centric Compression

Paper • 2505.19147 • Published May 25 • 144

End-to-End Vision Tokenizer Tuning

Paper • 2505.10562 • Published May 15 • 22
