Seonil Son

sonsus

AI & ML interests

LLM alignment and evals.

Organizations

None yet

upvoted 3 papers 2 days ago

How Much Heavy Lifting Can an Agent Harness Do?: Measuring the LLM's Residual Role in a Planning Agent

Paper • 2604.07236 • Published 3 days ago • 1

Becoming Experienced Judges: Selective Test-Time Learning for Evaluators

Paper • 2512.06751 • Published Dec 7, 2025 • 1

V-Agent: An Interactive Video Search System Using Vision-Language Models

Paper • 2512.16925 • Published Nov 4, 2025 • 2

upvoted a paper 17 days ago

ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces

Paper • 2604.05172 • Published 25 days ago • 24

upvoted 2 papers about 1 month ago

SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning

Paper • 2603.22057 • Published Mar 23 • 46

RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models

Paper • 2603.21341 • Published Mar 22 • 23

updated a model 4 months ago

sonsus/gemma-3-0.7b-vlm-custom

0.7B • Updated Dec 24, 2025 • 1

published a model 4 months ago

sonsus/gemma-3-0.7b-vlm-custom

0.7B • Updated Dec 24, 2025 • 1

liked a model 4 months ago

GuminiResearch/Gumini-1.5B-Base

Text Generation • 2B • Updated Dec 18, 2025 • 86 • 8

liked a model 5 months ago

GuminiResearch/Gumini-1B-Base

Text Generation • 1B • Updated Dec 17, 2025 • 10 • 7

New activity in google/gemma-3-4b-pt 5 months ago

Wrong configs

#5 opened about 1 year ago by

New activity in NCSOFT/harim_plus 6 months ago

Create test.txt

#1 opened 6 months ago by

upvoted a paper 7 months ago

VARCO-VISION-2.0 Technical Report

Paper • 2509.10105 • Published Sep 12, 2025 • 7

New activity in lerobot/act_aloha_sim_transfer_cube_human 8 months ago

Fix: hyperlink to `eval.py` script corrected.

#6 opened 8 months ago by

liked a dataset 8 months ago

fgenie777/Arena-Lite-Experiments-Result-Data

Updated Aug 4, 2025 • 13 • 1

New activity in fgenie777/Arena-Lite-Experiments-Result-Data 9 months ago

Update README.md

#1 opened 9 months ago by

updated a Space 9 months ago

Arena-Lite

Tournaments for Efficient & Reliable LLM Benchmarking

New activity in NCSOFT/ArenaLite 9 months ago

Adding your own judge prompt: Working example

#2 opened 9 months ago by

Submitting OPENAI API KEY is quite worrisome: Local-hosting of Arena-Lite

#1 opened 9 months ago by

published a Space 9 months ago

Arena-Lite

Tournaments for Efficient & Reliable LLM Benchmarking