Xuehui Wang

huiserwang

https://huiserwang.site

AI & ML interests

Segmentation

Recent Activity

upvoted a paper 19 days ago

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

liked a model 21 days ago

miromind-ai/MiroThinker-v1.0-72B

updated a dataset about 2 months ago

huiserwang/Layout_HW

upvoted a paper 19 days ago

MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

Paper • 2511.11793 • Published 23 days ago • 158

upvoted 2 papers about 2 months ago

MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

Paper • 2510.08540 • Published Oct 9 • 109

NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints

Paper • 2510.08565 • Published Oct 9 • 19

upvoted a paper 4 months ago

MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents

Paper • 2507.19478 • Published Jul 25 • 31

upvoted a paper 6 months ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 200

upvoted 3 articles 6 months ago

A Dive into Vision-Language Models

Article • Feb 3, 2023

Vision Language Models Explained

Article • Apr 11, 2024 • 496

Vision Language Models (Better, faster, stronger)

Article • May 12 • 568

upvoted a paper 6 months ago

ZeroGUI: Automating Online GUI Learning at Zero Human Cost

Paper • 2505.23762 • Published May 29 • 45

upvoted 2 papers 8 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 303

Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing

Paper • 2504.02826 • Published Apr 3 • 68

upvoted a paper 9 months ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25 • 51

upvoted 2 papers 12 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 159

upvoted a paper over 1 year ago

Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 54
