In a Training Loop 🔄

Zihan Ma

MichaelErchi

https://mazihan880.github.io/

AI & ML interests

None yet

Recent Activity

upvoted a paper about 2 months ago

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

new activity 4 months ago

opencompass/CodeForce_SAGA:Update README.md

authored a paper 5 months ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Organizations

upvoted a paper about 2 months ago

OdysseyArena: Benchmarking Large Language Models For Long-Horizon, Active and Inductive Interactions

Paper • 2602.05843 • Published Feb 5 • 60

New activity in opencompass/CodeForce_SAGA 4 months ago

Update README.md

#4 opened 4 months ago by MichaelErchi

authored 2 papers 5 months ago

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published Nov 11, 2025 • 3

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published Nov 18, 2025 • 17

upvoted 2 papers 5 months ago

ATLAS: A High-Difficulty, Multidisciplinary Benchmark for Frontier Scientific Reasoning

Paper • 2511.14366 • Published Nov 18, 2025 • 17

How Brittle is Agent Safety? Rethinking Agent Risk under Intent Concealment and Task Complexity

Paper • 2511.08487 • Published Nov 11, 2025 • 3

authored a paper 8 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 273

upvoted 2 papers 8 months ago

Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21, 2025 • 273

CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward

Paper • 2508.03686 • Published Aug 5, 2025 • 39

liked 2 datasets 8 months ago

opencompass/CodeCompass

Updated Aug 1, 2025 • 513 • 1

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1, 2025 • 5.57k • 232 • 1

New activity in opencompass/CodeForce_SAGA 8 months ago

Update metadata: task category and add library name

#2 opened 9 months ago by nielsr

New activity in opencompass/CodeCompass 8 months ago

Improve dataset card: Update task category, add library_name, link paper

#1 opened 9 months ago by nielsr

updated 2 datasets 9 months ago

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1, 2025 • 5.57k • 232 • 1

opencompass/CodeCompass

Updated Aug 1, 2025 • 513 • 1

published a dataset 9 months ago

opencompass/CodeForce_SAGA

Viewer • Updated Aug 1, 2025 • 5.57k • 232 • 1

authored a paper 9 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9, 2025 • 29

upvoted a paper 9 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9, 2025 • 29

commented on a paper 9 months ago

Rethinking Verification for LLM Code Generation: From Generation to Testing

Paper • 2507.06920 • Published Jul 9, 2025 • 29

published a dataset 9 months ago

opencompass/CodeCompass

Updated Aug 1, 2025 • 513 • 1
