HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation Paper • 2506.21546 • Published Jun 26 • 2
Uncertainty in Action: Confidence Elicitation in Embodied Agents Paper • 2503.10628 • Published Mar 13
Part$^{2}$GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting Paper • 2506.17212 • Published Jun 20
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts? Paper • 2507.19598 • Published Jul 25
PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation Paper • 2412.15209 • Published Dec 19, 2024
Live-SWE-agent: Can Software Engineering Agents Self-Evolve on the Fly? Paper • 2511.13646 • Published 22 days ago • 7
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions Paper • 2406.15877 • Published Jun 22, 2024 • 48
Agentless: Demystifying LLM-based Software Engineering Agents Paper • 2407.01489 • Published Jul 1, 2024 • 64
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts Paper • 2404.15247 • Published Apr 23, 2024 • 3
NeuRI: Diversifying DNN Generation via Inductive Rule Inference Paper • 2302.02261 • Published Feb 4, 2023 • 3
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation Paper • 2305.01210 • Published May 2, 2023 • 3
NNSmith: Generating Diverse and Valid Test Cases for Deep Learning Compilers Paper • 2207.13066 • Published Jul 26, 2022
Coverage-Guided Tensor Compiler Fuzzing with Joint IR-Pass Mutation Paper • 2202.09947 • Published Feb 21, 2022