
JillJia

AI & ML interests

None yet

Recent Activity



upvoted a paper 3 months ago

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9, 2025 • 36

updated a dataset 3 months ago

JillJia/dataset

Preview • Updated Oct 9, 2025 • 220

published a dataset 3 months ago

JillJia/dataset

Preview • Updated Oct 9, 2025 • 220

liked a Space 3 months ago

BigCodeArena

🚀

Compare two AI models by sending them code and seeing their responses

upvoted an article 3 months ago

Article

BigCodeArena: Judging code generations end to end with code executions

Oct 7, 2025


upvoted 2 papers 5 months ago

R-Zero: Self-Evolving Reasoning LLM from Zero Data

Paper • 2508.05004 • Published Aug 7, 2025 • 129

VeriGUI: Verifiable Long-Chain GUI Dataset

Paper • 2508.04026 • Published Aug 6, 2025 • 161

upvoted a paper 10 months ago

Optimizing Decomposition for Optimal Claim Verification

Paper • 2503.15354 • Published Mar 19, 2025 • 18

updated a Space 10 months ago

Arena Annotation Progress

😻

Display battle counts per annotator

upvoted a paper 11 months ago

IHEval: Evaluating Language Models on Following the Instruction Hierarchy

Paper • 2502.08745 • Published Feb 12, 2025 • 20

liked a dataset about 1 year ago

wyu1/Leopard-Instruct

Viewer • Updated Nov 8, 2024 • 1.03M • 34.4k • 65

commented a paper about 1 year ago

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2, 2024 • 26

authored a paper over 1 year ago

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2, 2024 • 26

upvoted 3 papers over 1 year ago

LEOPARD : A Vision Language Model For Text-Rich Multi-Image Tasks

Paper • 2410.01744 • Published Oct 2, 2024 • 26

BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

Paper • 2406.15877 • Published Jun 22, 2024 • 48

Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning

Paper • 2406.12050 • Published Jun 17, 2024 • 19

liked a model over 2 years ago

chitanda/llama-panda-zh-coig-7b-delta

Text Generation • Updated May 2, 2023 • 7 • 7
