🚀 ReVisual-R1 is a 7B open-source multimodal language model that follows a three-stage curriculum: cold-start pre-training, multimodal reinforcement.
Shawn
csfufu
Recent Activity
upvoted a paper about 14 hours ago
Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models
upvoted a paper about 15 hours ago
Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models
upvoted a paper about 2 months ago
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent