All HF Hub posts

SeaWolf-AI posted an update 2 days ago
🧬 Introducing Darwin-9B-NEG: the first model with Native Entropy Gating (NEG)

🔗 Try it now: FINAL-Bench/Darwin-9B-NEG
🔗 4-bit (Q4) quant: FINAL-Bench/Darwin-9B-MFP4

We're thrilled to release Darwin-9B-NEG, a 9B-parameter reasoning model
that embeds an architecturally internalised sense of self-confidence directly
into the transformer via our proprietary Native Entropy Gating (NEG) technology.

📊 GPQA Diamond (198 PhD-level questions):

▸ Baseline Darwin-9B (no NEG) → 51.01 %
▸ Pure NEG (greedy · 1× cost) → 63.64 % 🔥 +12.63 %p
▸ + Permutation (4× cost) → 76.26 %
▸ + Ensemble Refinement (~20× cost) → 84.34 % 🏆

With only 9 billion parameters and 1× inference cost, Pure NEG jumps
+12.63 %p over the same model without NEG. Going all-in with ensemble
refinement pushes it to 84.34 %, surpassing the published Qwen3.5-9B
leaderboard score (81.7 %) by +2.64 %p.

🔬 What makes NEG different from Multi-Turn Iteration (MTI)?

Classical MTI needs 3-8× extra inference passes. NEG instead lives
INSIDE the single decoding loop. Two tiny modules ride with the
transformer: NEG-Head predicts per-token entropy from the last hidden
state, and NEG-Gate conditionally restricts the top-k choice when
confidence is low. The gate activates on only 4.36 % of tokens, so it is
essentially free at inference time.
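NEG itself is proprietary and unpublished, but the mechanism as described (an entropy signal gating the top-k choice inside the decoding loop) can be sketched in plain Python. The function names, the threshold `tau`, and the `k_low`/`k_full` values below are illustrative assumptions, not the model's actual parameters:

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def gated_top_k(probs, tau=1.0, k_low=2, k_full=50):
    """Toy entropy gate: when entropy is high (low confidence), clamp
    sampling to a tight top-k; otherwise allow the full top-k.
    Returns the candidate token ids the gate permits."""
    k = k_low if entropy(probs) > tau else k_full
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    return ranked[:k]

# A peaked (confident) distribution passes through ungated,
# while a flat (uncertain) one is clamped to the tight top-k.
confident = [0.97, 0.01, 0.01, 0.005, 0.005]
uncertain = [0.2, 0.2, 0.2, 0.2, 0.2]
```

In the real model the entropy would come from the learned NEG-Head rather than being computed from the full distribution, which is presumably how the gate stays cheap.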

✨ Key differentiators
• Architecturally internalised: the model file *is* the feature
• 1× inference cost (vs. 3-8× for MTI)
• Drop-in with vLLM / SGLang / TGI / transformers, no extra engine
• +12.63 %p reasoning gain at zero latency overhead
• Single-file deployment, Apache 2.0 licensed

🧬 Lineage
Qwen/Qwen3.5-9B → Darwin-9B-Opus (V7 evolutionary merge) → Darwin-9B-NEG (V8 + NEG training)

#Darwin #NEG #NativeEntropyGating #GPQA #Reasoning #LLM #OpenSource #Apache2
AkimfromParis posted an update 3 days ago
๐ŸŒธ ๐™Š๐™ฅ๐™š๐™ฃ ๐™…๐™–๐™ฅ๐™–๐™ฃ๐™š๐™จ๐™š ๐™‡๐™‡๐™ˆ ๐™‡๐™š๐™–๐™™๐™š๐™ง๐™—๐™ค๐™–๐™ง๐™™ ๐™‘2 ๐™ค๐™ฃ ๐™ƒ๐™ช๐™œ๐™œ๐™ž๐™ฃ๐™œ ๐™๐™–๐™˜๐™š ๐Ÿ‡ฏ๐Ÿ‡ต // ๐ŸŒธ ใƒใ‚ฎใƒณใ‚ฐใƒ•ใ‚งใ‚คใ‚น็‰ˆใ€Œ ๐—ข๐—ฝ๐—ฒ๐—ป ๐—๐—ฎ๐—ฝ๐—ฎ๐—ป๐—ฒ๐˜€๐—ฒ ๐—Ÿ๐—Ÿ๐—  ๐—Ÿ๐—ฒ๐—ฎ๐—ฑ๐—ฒ๐—ฟ๐—ฏ๐—ผ๐—ฎ๐—ฟ๐—ฑ ๐—ฉ๐Ÿฎ ใ€ๅ…ฌ้–‹ ๐Ÿ‡ฏ๐Ÿ‡ต

I am thrilled to announce the launch of version 2 of the Open Japanese LLM Leaderboard. This initiative is driven by the "Fine-tuning and Evaluation" team, led by Professor Miyao at the University of Tokyo, under the Research and Development Center for Large Language Models (LLMC) at Japan's National Institute of Informatics (NII).

๐™Ž๐™ฉ๐™ง๐™–๐™ฉ๐™š๐™œ๐™ž๐™˜ ๐™–๐™ฃ๐™™ ๐™ฉ๐™š๐™˜๐™๐™ฃ๐™ž๐™˜๐™–๐™ก ๐™ช๐™ฅ๐™œ๐™ง๐™–๐™™๐™š๐™จ:
- Our new backend features eight A100 GPUs, enabling the evaluation of open-source models of more than 100B parameters.
- Submissions now require a Hugging Face Hub login to ensure accountability.
- We have added metrics for evaluation time and CO₂ emissions (thanks to Code Carbon 🌱), alongside reasoning capabilities.

๐˜ฟ๐™–๐™ฉ๐™–๐™จ๐™š๐™ฉ๐™จ ๐™–๐™ฃ๐™™ ๐™š๐™ซ๐™–๐™ก๐™ช๐™–๐™ฉ๐™ž๐™ค๐™ฃ ๐™จ๐™ฉ๐™–๐™ฃ๐™™๐™–๐™ง๐™™๐™จ:
- New datasets cover reasoning, mathematics, exams, and instruction following.
- Math evaluations now span from grade-school levels to expert-tier challenges (GSM8K, PolyMath, AIME).
- While integrating English-heavy and multilingual benchmarks (including Humanity's Last Exam, GPQA, and BBH in both English and Japanese), we continue to prioritize unique Japanese cultural datasets.

llm-jp/open-japanese-llm-leaderboard-v2

ใฉใ†ใžใŠ้ก˜ใ„่‡ดใ—ใพใ™๏ผ๐Ÿ˜Š
SeanLee97 posted an update 5 days ago
Our lab recently released a paper where we introduce ShadowPEFT, a new Parameter-Efficient Fine-Tuning (PEFT) paradigm tailored for edge computing scenarios.

Unlike traditional approaches such as LoRA and its variants, which inject trainable parameters directly into the Transformer weights and therefore require tight coupling with the backbone, ShadowPEFT enhances the frozen large base model by adding a lightweight, centralized, pretrainable, and detachable Shadow network.

This shadow network operates in parallel with the base model, delivering learned corrections to each decoder layer. Because the shadow module is architecturally decoupled from the backbone, it can be independently trained, stored, and deployed, which benefits edge computing scenarios and edge-cloud collaborative computing.
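The paper's actual architecture is of course far richer, but the decoupling idea (a frozen backbone plus a separately trained parallel corrector feeding each layer) can be illustrated with a toy scalar sketch. Every function and value below is made up for illustration:

```python
def make_base_layers():
    """Frozen backbone: each 'decoder layer' here is just a fixed affine
    map on a scalar hidden state (a toy stand-in for a Transformer layer)."""
    return [lambda h: 2 * h + 1, lambda h: h - 3]

def run(base_layers, shadow=None):
    """Run the backbone; if a shadow network is attached, add its learned
    per-layer correction after each layer. The shadow is a plain list of
    correction functions, so it can be trained, stored, and shipped
    independently of the backbone weights."""
    h = 1.0
    for i, layer in enumerate(base_layers):
        h = layer(h)
        if shadow is not None:
            h = h + shadow[i](h)  # decoupled correction; backbone untouched
    return h

base = make_base_layers()
frozen_out = run(base)                          # backbone alone
shadow = [lambda h: 0.1 * h, lambda h: -0.5]    # hypothetical trained corrections
adapted_out = run(base, shadow)                 # backbone + detachable shadow
```

Detaching the shadow (calling `run(base)` again) recovers the original backbone behavior exactly, which is the property that makes independent storage and edge-cloud deployment attractive.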

- HF Paper: ShadowPEFT: Shadow Network for Parameter-Efficient Fine-Tuning (2604.19254)
- GitHub: https://github.com/ShadowLLM/shadow-peft
- HF Collection: https://huggingface.co/collections/shadow-llm/shadow-peft-models
Tonic posted an update 3 days ago
🙋🏻‍♂️ Hey there folks,

I'm sharing Hugging Face's largest dataset of annotated satellite images today.

Check it out here: NuTonic/sat-image-boundingbox-sft-full

I hope you like it! The idea is to be able to use this with small vision models 🚀
anakin87 posted an update 3 days ago
A small model that struggled against a random opponent now beats GPT-5-mini at tic-tac-toe

I took LiquidAI/LFM2-2.6B and trained it through play.

🧑‍🍳 Here's how:

1️⃣ Build a solid RL env with Verifiers (Prime Intellect)
2️⃣ Generate synthetic data: <200 games sampled from GPT-5-mini playing in the env
3️⃣ SFT warm-up to teach the format
4️⃣ Group-based RL (CISPO) against opponents making 20-70% random moves
5️⃣ RL again with stronger opponents (0-25% random moves) + 1.25 temperature to push exploration and shake off suboptimal strategies

Done! Beats GPT-5-mini 🏆
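The verifiable reward behind steps 1 and 4-5 boils down to checking game outcomes. A minimal sketch of such a verifier (the actual Verifiers env surely differs in API and reward shaping):

```python
# The eight winning lines of a 3x3 board, indexed 0-8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """board: 9 chars, each 'X', 'O', or ' '. Returns 'X', 'O', or None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def reward(board, agent="X"):
    """Terminal reward an RL env can verify mechanically:
    +1 win, -1 loss, 0 for a draw or an unfinished game."""
    w = winner(board)
    return 0.0 if w is None else (1.0 if w == agent else -1.0)
```

Rewards like this are attractive for group-based RL because they are exact: every rollout in a group can be scored without a judge model.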

---

🎮 Play against the model: anakin87/LFM2-2.6B-mr-tictactoe

🤗 Model: anakin87/LFM2-2.6B-mr-tictactoe

📚 Walkthrough/course: https://github.com/anakin87/llm-rl-environments-lil-course

🤗 Dataset and checkpoints: https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe
Benedictat posted an update 3 days ago
Built a WeChat Mini Program in 20 minutes flat with Hy3 Preview + WorkBuddy…

and I didn't type a single line of code. Not even a semicolon.

This coding agent is on steroids. Its comprehension in long back-and-forths is night-and-day better, and that 256K context window swallows the entire project structure whole.

Tell it what you want, and it actually gets the full picture: no confused blank stares from the AI.

And we're not messing around with dinky little code snippets here. It spits out a fully functional project:

app.json, every page's wxml/wxss/js/json, even mock data pre-packed. Import it into WeChat Dev Tools and it runs on the first try.

Only one tiny visual nitpick, zero logic bugs. Point out the flaw, and it fixes it instantly: no new bugs, no passive-aggressive code breaks, no headaches.

The entire vibe: tell it your idea → get a complete working project → mention a tiny flaw → the AI polishes it.

No coding, no endless edits, no soul-crushing debugging that makes you want to throw your laptop. Absolute game-changer.
imnotkitty posted an update 3 days ago
tencent/Hy3-preview is out: an open-weights MoE reasoning model.

✅ 295B total / 21B active / 256K context
✅ Fused fast-and-slow thinking in a single model
✅ First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb → Apr)

Benchmarks:
👉 SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch: competitive results, particularly strong on agentic tool use
👉 Top score on Tsinghua's 2026 Spring math PhD qualifying exam
👉 Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life

More details can be found in my article: https://huggingface.co/blog/imnotkitty/hy3-preview
prithivMLmods posted an update 3 days ago
A collection of various compression schemes for Qwen3.6, plus abliterated version 1 of the dense models, is now available on the Hub. Check it out via the links below. 👇

🔗 Qwen3.6-MoE: https://huggingface.co/collections/prithivMLmods/qwen36-35b-a3b-compressions
🔗 Qwen3.6-27B Compressions: https://huggingface.co/collections/prithivMLmods/qwen36-27b-compressions

🤗 To learn more, visit the app page or the respective model pages.
qgallouedec posted an update about 4 hours ago

TRL v1.3 ships day-one training support for Qwen 3.6 🚀

The new Qwen 3.6 family (Qwen/Qwen3.6-27B, Qwen/Qwen3.6-35B-A3B) reuses the Qwen3.5-MoE architecture but ships a slightly different chat template, so we updated the stack end-to-end: new training template with {% generation %} markers, tool-call response schema routing, tiny test models for the VLM matrix.

SFT with assistant-only loss works out of the box:

from trl import SFTConfig, SFTTrainer

trainer = SFTTrainer(
    model="Qwen/Qwen3.6-27B",
    args=SFTConfig(assistant_only_loss=True),
    train_dataset=dataset,
)
trainer.train()


So does GRPO tool-calling: just hand tools=[...] to GRPOTrainer.

v1.3 also brings a new experimental TPO trainer (Triple Preference Optimization), speculative decoding in trl vllm-serve (Qwen3 MTP / Eagle3 drafts), 12 more KTO ↔ DPO alignment PRs (KTO promotion to stable is now in reach), three more {% generation %} chat templates (Gemma/Gemma 2, Phi-3, GLM-4-MoE), and a chunky SFT entropy bug fix.

Full release notes: https://github.com/huggingface/trl/releases/tag/v1.3.0
akhiilll posted an update about 12 hours ago
Just shipped ClaimSense Adjudication Gym at OpenEnv Hackathon 2026 (Scaler India).

An OpenEnv RL environment for enterprise insurance claims adjudication, the monthly "tool-heavy" workflow real adjusters do: pull policy + claim history, run fraud checks, verify purchases/transactions, then approve / deny / escalate under partial observability with long-horizon credit assignment.

Trained Qwen/Qwen2.5-1.5B-Instruct with:

- Rollout evaluation on HF Jobs (A10G) and a random baseline for comparison
- Real GRPO weight updates (TRL GRPOTrainer) with LoRA adapters and two independent reward functions (format + env replay)

Headline training evidence:

- GRPO run: 80 steps, 640 rollouts, KL rises ~0 → ~0.06 (real weight updates), completion length shrinks (~25 → ~10).
- Plots + logs are committed in the Space under runs/.

Live demo + repo + writeup linked below.

🔗 Env (Space URL): akhiilll/claims-env
🧪 Notebook: akhiilll/claims-env
📝 Blog: docs/HF_MINI_BLOG.md in the Space