AI & ML interests

Sentient enables Loyal AI: community-built, community-owned, community-aligned, and community-controlled. Our mission is to ensure that when AGI is created, it is Loyal—not to corporations, but to humanity.

namanvats 
posted an update about 16 hours ago
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (minimax/minimax-m2.5) with two agent harnesses: Goose and OpenHands-SDK.

Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions:

* Goose: 0.450 → 0.525
* OpenHands-SDK: 0.575 → 0.500

A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.
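
To put these rates in terms of tasks, here is a minimal sketch that converts them to counts on the 40-task slice. It assumes each task contributes a single pass/fail outcome, so that pass rate = passed / 40; the post does not state how many attempts per task were run, and the setup labels and the `passed_tasks` helper are illustrative names, not taken from the repo.

```python
TASKS = 40  # size of the frozen task slice, as stated in the post

# Pass rates as reported in the post, keyed by (harness, setup).
# The setup labels are illustrative, not names from the repo.
pass_rates = {
    ("Goose", "base, 100 turns"): 0.450,
    ("Goose", "base, 60 turns"): 0.525,
    ("OpenHands-SDK", "base, 100 turns"): 0.575,
    ("OpenHands-SDK", "base, 60 turns"): 0.500,
    ("OpenHands-SDK", "tweaked, 60 turns"): 0.575,
}


def passed_tasks(rate: float, n_tasks: int = TASKS) -> int:
    """Convert a pass rate to a task count (assumes one outcome per task)."""
    return round(rate * n_tasks)


counts = {key: passed_tasks(rate) for key, rate in pass_rates.items()}

for (harness, setup), n in counts.items():
    print(f"{harness:14s} {setup:18s} {n}/{TASKS}")

# Change in passed tasks when the base turn budget drops from 100 to 60.
goose_delta = counts[("Goose", "base, 60 turns")] - counts[("Goose", "base, 100 turns")]
openhands_delta = (
    counts[("OpenHands-SDK", "base, 60 turns")]
    - counts[("OpenHands-SDK", "base, 100 turns")]
)
print(f"Goose: {goose_delta:+d} tasks, OpenHands-SDK: {openhands_delta:+d} tasks")
```

Under that assumption, the opposite swings under the base setup each amount to three tasks out of 40, which is worth keeping in mind when reading the deltas.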

What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than for Goose, even though both converged to the same best score.

Same model, same task slice, different harness behavior under a tighter interaction budget.

Dataset:
namanvats/harbor-goose-openhands-benchmark

Code/configs:
https://github.com/namanvats/harbor-agent-ablation