• 295B total / 21B active / 256K context
• Fused fast-and-slow thinking in a single model
• First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb to Apr)
Benchmarks:
• SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch: competitive results, particularly strong on agentic tool use
• Top score on Tsinghua's 2026 Spring math PhD qualifying exam
• Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life
Over the past year, we've seen a shift in LLM Post-Training. Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.
Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.
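To make the trial-and-error idea concrete, here is a minimal sketch of the group-relative scoring step used in GRPO-style training. It is an illustration, not a full trainer: `verifiable_reward` and `group_relative_advantages` are hypothetical helper names, the exact-match check is an assumed reward rule, and the sampled completions are made up for the example.

```python
from statistics import mean, pstdev


def verifiable_reward(completion: str, expected: str) -> float:
    """Assumed reward rule: 1.0 if the last line of the completion equals the expected answer."""
    lines = completion.strip().splitlines()
    final = lines[-1].strip() if lines else ""
    return 1.0 if final == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Score each sample against its own group: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four made-up completions sampled for the same prompt ("What is 2 + 2?").
completions = ["2 + 2 is\n4", "I think\n5", "4", "The answer:\n4"]
rewards = [verifiable_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)

for completion, advantage in zip(completions, advantages):
    print(repr(completion), round(advantage, 3))
```

Correct samples end up with a positive advantage and the wrong one with a negative advantage, so no learned reward model or hand-labeled preference data is needed for this step.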
But what actually are these environments in practice? And how do you build them effectively?
Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models. I've packaged everything I learned into this short course.
What you'll learn
🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: how to build single-turn, multi-turn, and tool-use environments
🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
  🔸 Build the game Environment
  🔸 Use it to generate synthetic data for SFT warm-up
  🔸 Group-based Reinforcement Learning
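For a rough idea of what a multi-turn game environment can look like, here is a minimal sketch in plain Python. It does not use the Verifiers API; the `TicTacToeEnv` class, its reward values, and the random opponent are all assumptions made for illustration.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]


class TicTacToeEnv:
    """Hypothetical environment: the model plays X, a random opponent plays O."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.board = [" "] * 9

    def render(self):
        """Text observation that would be shown to the model."""
        return "\n".join("|".join(self.board[i:i + 3]) for i in (0, 3, 6))

    def legal_moves(self):
        return [i for i, cell in enumerate(self.board) if cell == " "]

    def winner(self):
        for a, b, c in WIN_LINES:
            if self.board[a] != " " and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

    def step(self, model_output):
        """Apply the model's move (a single digit 0-8), then the opponent's reply.

        Returns (observation, reward, done). Assumed rewards: +1 win, 0 draw,
        -1 loss or malformed/illegal move.
        """
        text = model_output.strip()
        if len(text) != 1 or text not in "012345678" or int(text) not in self.legal_moves():
            return self.render(), -1.0, True
        self.board[int(text)] = "X"
        if self.winner() == "X":
            return self.render(), 1.0, True
        if not self.legal_moves():
            return self.render(), 0.0, True  # board full, draw
        self.board[self.rng.choice(self.legal_moves())] = "O"
        if self.winner() == "O":
            return self.render(), -1.0, True
        return self.render(), 0.0, False


env = TicTacToeEnv(seed=0)
observation, reward, done = env.step("4")  # the model takes the center square
print(observation)
```

The same object can serve two purposes: rolling out a scripted or stronger player to collect games for an SFT warm-up, and scoring the model's own games during RL.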
If you're interested in building "little worlds" where LLMs can learn, this course is for you.
We are thrilled to announce the launch of SKT-OMNI-CORPUS-2T, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch. Developed at SKT AI LABS, this corpus is not just a collection of data; it's a mission to decentralize high-grade AI training for regional languages and global knowledge.
Key Highlights:
• Massive Scale: Targeting a multi-terabyte architecture for 2T-level tokenization.
• Pure Quality: Curated from 500+ Elite Sources.
• Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-π» series) for seamless distributed training.
Open for Collaboration!
We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design, let's build the future together.
Introducing GRM2, a powerful 3 billion parameter model designed for long-term reasoning and high performance in complex tasks.
Even with only 3 billion parameters, it outperforms Qwen3-32B on several benchmarks and complex reasoning tasks.
It can also generate long, complex programs of over 1,000 lines and use tools at a level comparable to larger models, which makes it well suited to agentic tasks.
GRM2 is licensed under Apache 2.0, making it a good base for fine-tuning on other tasks. You can see more here: OrionLLM/GRM2-3b