Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).
An OpenEnv RL environment for enterprise insurance claims adjudication: the monthly “tool-heavy” workflow that real adjusters follow. The agent pulls policy and claim history, runs fraud checks, verifies purchases/transactions, then decides to approve / deny / escalate under partial observability with long-horizon credit assignment.
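To make the loop concrete, here is a minimal toy sketch of that kind of episode. It is not the ClaimSense code and does not use the OpenEnv API; the class name `ToyClaimsEnv`, the tool names, and the reward values are all made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical tool and decision names, chosen only for this sketch.
TOOLS = {"get_policy", "get_claim_history", "run_fraud_check", "verify_transaction"}
DECISIONS = {"approve", "deny", "escalate"}


@dataclass
class ToyClaimsEnv:
    is_fraud: bool                      # hidden ground truth, never shown directly
    max_steps: int = 6                  # episode budget (assumed value)
    revealed: dict = field(default_factory=dict)
    steps: int = 0
    done: bool = False

    def _tool_result(self, tool: str) -> str:
        # Only the fraud check depends on the hidden state in this toy version.
        if tool == "run_fraud_check":
            return "flagged" if self.is_fraud else "clear"
        return "ok"

    def step(self, action: str):
        """Apply one action and return (observation, reward, done)."""
        if self.done:
            raise RuntimeError("episode already finished")
        self.steps += 1
        if action in TOOLS:
            # Tool calls reveal information but earn no immediate reward,
            # so credit for a good decision arrives only at the end.
            self.revealed[action] = self._tool_result(action)
            self.done = self.steps >= self.max_steps
            return dict(self.revealed), 0.0, self.done
        if action in DECISIONS:
            self.done = True
            correct = "deny" if self.is_fraud else "approve"
            if action == correct:
                reward = 1.0
            elif action == "escalate":
                reward = 0.3            # assumed partial credit for a safe hand-off
            else:
                reward = -1.0
            return dict(self.revealed), reward, True
        raise ValueError(f"unknown action: {action}")
```

The agent only ever sees what its tool calls have revealed, which is the partial-observability part; the delayed terminal reward is the credit-assignment part.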
Trained Qwen/Qwen2.5-1.5B-Instruct with:
Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)
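For readers who have not used TRL: custom GRPO rewards are plain Python callables that take the batch of completions (plus dataset columns as keyword arguments) and return one float per completion. A rough sketch of what a format reward and a replay-style reward could look like follows. The `DECISION: ...` output format, the `is_fraud` dataset column, and the reward values are assumptions for illustration, not the project's actual reward code.

```python
import re

# Hypothetical output format: a single line such as "DECISION: deny".
DECISION_RE = re.compile(r"^DECISION:\s*(approve|deny|escalate)\s*$", re.IGNORECASE)


def format_reward(completions, **kwargs):
    """1.0 if the completion follows the assumed format, else 0.0."""
    return [1.0 if DECISION_RE.match(c.strip()) else 0.0 for c in completions]


def env_replay_reward(completions, is_fraud, **kwargs):
    """Score each parsed decision against the hidden label.

    `is_fraud` is an assumed dataset column with one boolean per completion.
    A real replay reward would step the environment with the parsed actions;
    this stand-in only scores the final decision.
    """
    rewards = []
    for text, fraud in zip(completions, is_fraud):
        match = DECISION_RE.match(text.strip())
        if match is None:
            rewards.append(0.0)          # unparseable output earns nothing
            continue
        decision = match.group(1).lower()
        correct = "deny" if fraud else "approve"
        if decision == correct:
            rewards.append(1.0)
        elif decision == "escalate":
            rewards.append(0.3)          # assumed partial credit
        else:
            rewards.append(-1.0)
    return rewards
```

Functions like these can be passed together as `reward_funcs=[format_reward, env_replay_reward]` when constructing a `GRPOTrainer`, which keeps "did it follow the format" separate from "did it make the right call".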
Headline training evidence:
GRPO run: 80 steps, 640 rollouts. KL divergence rises from ~0 to ~0.06 (evidence of real weight updates), and completion length shrinks from ~25 to ~10.
Plots + logs are committed in the Space under runs/.
Live demo + repo + writeup linked below.
🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space