MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model Paper • 2406.11193 • Published Jun 17, 2024
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality Paper • 2410.04780 • Published Oct 7, 2024
UrbanCLIP: Learning Text-enhanced Urban Region Profiling with Contrastive Language-Image Pretraining from the Web Paper • 2310.18340 • Published Oct 22, 2023
RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning Paper • 2502.00848 • Published Feb 2, 2025 • 1
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges Paper • 2412.11936 • Published Dec 16, 2024
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video Paper • 2505.02064 • Published May 4, 2025 • 4
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models Paper • 2410.03577 • Published Oct 4, 2024 • 1
PhysicsArena: The First Multimodal Physics Reasoning Benchmark Exploring Variable, Process, and Solution Dimensions Paper • 2505.15472 • Published May 21, 2025 • 3
SafeEraser: Enhancing Safety in Multimodal Large Language Models through Multimodal Machine Unlearning Paper • 2502.12520 • Published Feb 18, 2025
EssayJudge: A Multi-Granular Benchmark for Assessing Automated Essay Scoring Capabilities of Multimodal Large Language Models Paper • 2502.11916 • Published Feb 17, 2025 • 1
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios Paper • 2411.02708 • Published Nov 5, 2024 • 1
GM-PRM: A Generative Multimodal Process Reward Model for Multimodal Mathematical Reasoning Paper • 2508.04088 • Published Aug 6, 2025
Don't Just Chase "Highlighted Tokens" in MLLMs: Revisiting Visual Holistic Context Retention Paper • 2510.02912 • Published Oct 3, 2025 • 1
CausalEmbed: Auto-Regressive Multi-Vector Generation in Latent Space for Visual Document Embedding Paper • 2601.21262 • Published Jan 29 • 1
Beyond the Grid: Layout-Informed Multi-Vector Retrieval with Parsed Visual Document Representations Paper • 2603.01666 • Published 8 days ago • 1