T AKHIL KUMAR REDDY PRO

akhiilll

AI & ML interests

None yet

Recent Activity

reacted to their post with 🔥 about 4 hours ago

Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication: the monthly, tool-heavy workflow that real adjusters perform. The agent pulls the policy and claim history, runs fraud checks, and verifies purchases/transactions, then decides to approve, deny, or escalate under partial observability with long-horizon credit assignment.

Trained Qwen/Qwen2.5-1.5B-Instruct with:

Rollout evaluation on HF Jobs (A10G), compared against a random baseline
Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)

Headline training evidence:

GRPO run: 80 steps, 640 rollouts; KL rises from ~0 to ~0.06 (evidence of real weight updates) and completion length shrinks (~25 → ~10).
Plots + logs are committed in the Space under runs/.
Live demo + repo + writeup linked below.

🔗 Env (Space URL): https://huggingface.co/spaces/akhiilll/claims-env
🧪 Notebook: https://huggingface.co/spaces/akhiilll/claims-env/blob/main/training/InsureClaim_Training_Colab.ipynb
📝 Blog: docs/HF_MINI_BLOG.md in the Space
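A minimal sketch of how a format reward, an env-replay reward, and GRPO's group-relative advantage could fit together, assuming each completion is a JSON list of action strings. `ToyClaimEnv`, the tool names, the reward values, and the `hidden` / `correct_decision` dataset columns are all hypothetical and are not the ClaimSense API. The two reward functions follow the calling convention TRL's `GRPOTrainer` uses for custom rewards (completions plus dataset columns as keyword arguments, returning one float per completion). The terminal-only outcome reward is what makes credit assignment long-horizon: tool calls earn nothing until the final decision.

```python
import json
import statistics
from dataclasses import dataclass, field

# Hypothetical action space; the real ClaimSense tools and decisions may differ.
TOOLS = {"get_policy", "get_claim_history", "run_fraud_check", "verify_purchase"}
DECISIONS = {"approve", "deny", "escalate"}


@dataclass
class ToyClaimEnv:
    """Toy partially observable claim: hidden facts are revealed only by tool calls."""
    hidden: dict            # hypothetical per-claim facts, keyed by tool name
    correct_decision: str   # hypothetical ground-truth label for the terminal reward
    observed: dict = field(default_factory=dict)
    done: bool = False

    def step(self, action: str) -> float:
        if self.done:
            return 0.0
        if action in TOOLS:
            # Tool call: reveal one hidden fact at a small cost, no outcome reward yet.
            self.observed[action] = self.hidden.get(action)
            return -0.05
        if action in DECISIONS:
            # Terminal decision: the only step that carries the outcome reward.
            self.done = True
            return 1.0 if action == self.correct_decision else -1.0
        return -0.2  # unknown action


def parse_actions(completion: str):
    """Return a list of action strings, or None if the completion is malformed."""
    try:
        actions = json.loads(completion)
    except json.JSONDecodeError:
        return None
    if not isinstance(actions, list) or not all(isinstance(a, str) for a in actions):
        return None
    return actions


def format_reward(completions, **kwargs):
    # 1.0 if the completion parses and ends with a terminal decision, else 0.0.
    scores = []
    for c in completions:
        actions = parse_actions(c)
        ok = bool(actions) and actions[-1] in DECISIONS
        scores.append(1.0 if ok else 0.0)
    return scores


def env_replay_reward(completions, hidden, correct_decision, **kwargs):
    # Replay each completion's actions in a fresh env built from its dataset row.
    scores = []
    for c, h, d in zip(completions, hidden, correct_decision):
        actions = parse_actions(c)
        if actions is None:
            scores.append(-1.0)
            continue
        env = ToyClaimEnv(hidden=h, correct_decision=d)
        total = 0.0
        for a in actions:
            total += env.step(a)
            if env.done:
                break
        scores.append(total)
    return scores


def group_advantages(rewards, eps=1e-4):
    # GRPO-style: score each completion relative to its own group's mean and std.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


completions = [
    '["run_fraud_check", "deny"]',
    '["approve"]',
    "not json",
    '["get_policy", "verify_purchase", "deny"]',
]
fmt = format_reward(completions)
env = env_replay_reward(
    completions,
    hidden=[{"run_fraud_check": "high_risk"}] * 4,
    correct_decision=["deny"] * 4,
)
advantages = group_advantages([f + e for f, e in zip(fmt, env)])
```

In TRL, functions shaped like these would be passed as `reward_funcs=[format_reward, env_replay_reward]`; by default the trainer sums their per-completion outputs before computing advantages, which is why the example adds `fmt` and `env`. For scale, 640 rollouts over 80 steps works out to 8 completions per step.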

posted an update about 15 hours ago

updated a model about 15 hours ago

akhiilll/claims-env-pro-grpo


Organizations

None yet
