Huang Qidong

shikiw

https://shikiw.github.io/

AI & ML interests

multi-modal LLMs

Organizations

None yet

authored a paper 6 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 14 days ago • 118

upvoted a paper 6 days ago

Qwen3-VL Technical Report

Paper • 2511.21631 • Published 14 days ago • 118

authored 5 papers 13 days ago

Diversity-Aware Meta Visual Prompting

Paper • 2303.08138 • Published Mar 14, 2023

Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting

Paper • 2308.10315 • Published Aug 20, 2023

Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

Paper • 2502.08590 • Published Feb 12 • 42

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Paper • 2502.11903 • Published Feb 17

CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning

Paper • 2509.22647 • Published Sep 26 • 32

liked a model 13 days ago

Qwen/Qwen3-VL-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 15 • 2.26M • 534

liked 2 models about 2 months ago

Qwen/Qwen3-VL-4B-Instruct

Image-Text-to-Text • 4B • Updated Oct 15 • 819k • 258

Qwen/Qwen3-VL-235B-A22B-Thinking-FP8

Image-Text-to-Text • 236B • Updated 14 days ago • 25.5k • 24

liked 2 models 3 months ago

Qwen/Qwen3-VL-235B-A22B-Instruct

Image-Text-to-Text • 236B • Updated 14 days ago • 75.4k • 328

Qwen/Qwen3-VL-235B-A22B-Thinking

Image-Text-to-Text • 236B • Updated 14 days ago • 6.6k • 341

authored a paper 6 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24 • 26

liked a dataset 6 months ago

long-xing1/ScaleCap-450k

Viewer • Updated Jun 25 • 455k • 35 • 5

upvoted a paper 6 months ago

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Paper • 2506.19848 • Published Jun 24 • 26

upvoted 2 papers 8 months ago

VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning

Paper • 2504.07956 • Published Apr 10 • 47

MM-IFEngine: Towards Multimodal Instruction Following

Paper • 2504.07957 • Published Apr 10 • 35

upvoted a paper 9 months ago

Visual-RFT: Visual Reinforcement Fine-Tuning

Paper • 2503.01785 • Published Mar 3 • 85

upvoted 2 papers 10 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 74

SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Paper • 2502.13128 • Published Feb 18 • 41
