Dokyoon

leeloolee

Eruly

Recent Activity

liked a model 1 day ago

Hcompany/Holo-3.1-9B

Image-Text-to-Text • 9B • Updated 3 days ago • 595 • 17

upvoted a paper 1 day ago

AUDITFLOW: Executable Symbolic Environments for Structured Financial Reporting Verification

Paper • 2606.03031 • Published 4 days ago • 6

liked a model 3 days ago

Kwai-Keye/Keye-VL-2.0-30B-A3B

Image-Text-to-Text • 31B • Updated 8 days ago • 2.24k • 108

liked a dataset 3 days ago

InternScience/SGI-DeepResearch

Viewer • Updated 3 days ago • 318 • 1.07k • 7

liked a Space 7 days ago

Open AI Co-Scientist

📊

Open-source implementation of Google's AI Co-Scientist

liked 2 datasets 8 days ago

FrontierCS/Frontier-CS

Viewer • Updated about 2 hours ago • 263 • 3.77k • 6

google/FACTS-grounding-public

Viewer • Updated Dec 19, 2024 • 868 • 2.16k • 46

reacted to imnotkitty's post with 🔥 about 1 month ago

Post

4001

tencent/Hy3-preview is out: an open-weights MoE reasoning model.

✅ 295B total / 21B active / 256K context
✅ Fused fast-and-slow thinking in a single model
✅ First model trained on Hunyuan's rebuilt pretraining + RL infra (Feb → Apr)

Benchmarks:
👉 SWE-Bench Verified, Terminal-Bench 2.0, BrowseComp, WideSearch — competitive results, particularly strong on agentic tool use
👉 Top score on Tsinghua's 2026 Spring math PhD qualifying exam
👉 Strong context-learning and instruction-following on Tencent's CL-bench / CL-bench-Life

More details can be found in my article: https://huggingface.co/blog/imnotkitty/hy3-preview

2 replies

upvoted a paper about 1 month ago

Reinforcement-aware Knowledge Distillation for LLM Reasoning

Paper • 2602.22495 • Published Feb 26 • 6

liked a model about 1 month ago

microsoft/maira-2-sae

Feature Extraction • Updated Jul 23, 2025 • 9

reacted to anakin87's post with ❤️ about 2 months ago

Post

3311

📣 I just published a free course on Reinforcement Learning Environments for Language Models!

📌 COURSE: https://github.com/anakin87/llm-rl-environments-lil-course

Over the past year, we've seen a shift in LLM Post-Training.
Previously, Supervised Fine-Tuning was the most important part: making models imitate curated Question-Answer pairs.

Now we also have Reinforcement Learning with Verifiable Rewards. With techniques like GRPO, models can learn through trial and error in dynamic environments. They can climb to new heights without relying on expensively prepared data.

But what actually are these environments in practice❓ And how do you build them effectively❓

Fascinated by these concepts, I spent time exploring this space through experiments, post-training Small Language Models.
I've packaged everything I learned into this short course.

What you'll learn

🔹 Agents, Environments, and LLMs: how to map Reinforcement Learning concepts to the LLM domain
🔹 How to use Verifiers (open-source library by Prime Intellect) to build RL environments as software artifacts
🔹 Common patterns: How to build single-turn, multi-turn, and tool-use environments

🔹 Hands-on: turn a small language model (LFM2-2.6B by LiquidAI) into a Tic Tac Toe master
🔸 Build the game Environment
🔸 Use it to generate synthetic data for SFT warm-up
🔸 Group-based Reinforcement Learning

If you're interested in building "little worlds" where LLMs can learn, this course is for you.

---

🤗🕹️ Play against the trained model: anakin87/LFM2-2.6B-mr-tictactoe

📚 HF collection (datasets + models): https://huggingface.co/collections/anakin87/lfm2-26b-mr-tic-tac-toe

1 reply

liked a dataset 2 months ago

InternScience/ResearchClawBench

Benchmark • Updated about 2 hours ago • 4.54k • 5

liked a model 2 months ago

rl-research/DR-Tulu-8B-results

Updated Mar 26 • 1

upvoted a paper 2 months ago

Grounding Everything in Tokens for Multimodal Large Language Models

Paper • 2512.10554 • Published Dec 11, 2025 • 2

reacted to Shrijanagain's post with 🔥 2 months ago

Post

5651

We are thrilled to announce the launch of SKT-OMNI-CORPUS-2T, a massive-scale, high-quality dataset designed to power the next generation of Foundation Models (LLMs) from scratch.
Developed at SKT AI LABS, this corpus is not just a collection of data; it’s a mission to decentralize high-grade AI training for regional languages and global knowledge.

💎 Key Highlights:

• Massive Scale: Targeting a multi-terabyte architecture for 2T-level tokenization.

• Pure Quality: Curated from 500+ Elite Sources.

• Structured for MoE: Perfectly sharded into 3.5GB standardized units (SKT-𝕻 series) for seamless distributed training.

🤝 Open for Collaboration!

We are looking for AI researchers, CUDA engineers, and data scientists to join us in this journey of building Project Surya and the ST-X Series models. Whether it's optimization, custom tokenization, or architecture design—let’s build the future together.

Explore the Dataset on Hugging Face:

🔗 https://huggingface.co/datasets/Shrijanagain/SKT-OMNI-CORPUS-146T-V1

DSR -- 🔗 https://huggingface.co/datasets/Shrijanagain/SKT-DSRx10000

#AI #MachineLearning #OpenSource #IndicAI #SKTAILABS #LLM #BigData #HuggingFace #InnovationIndia

upvoted a paper 2 months ago

On the Direction of RLVR Updates for LLM Reasoning: Identification and Exploitation

Paper • 2603.22117 • Published Mar 23 • 29

upvoted an article 2 months ago

Scaling OpenEnv: From Free Usage to Thousands of Concurrent Environments

Article • burtenshaw • Jan 20 • 12

reacted to DedeProGames's post with 🚀 2 months ago

Post

5278

Introducing GRM2, a powerful 3 billion parameter model designed for long-term reasoning and high performance in complex tasks.

Even with only 3 billion parameters, it outperforms qwen3-32b in several benchmarks and complex reasoning tasks.

With just 3 billion parameters, it can also generate long, complex code of over 1,000 lines and use tools at a level comparable to larger models, which makes it well suited to agentic tasks.

GRM2 is licensed under Apache 2.0, making it ideal as a base for fine-tuning on other tasks.
You can see more here: OrionLLM/GRM2-3b

upvoted an article 2 months ago

Build a Domain-Specific Embedding Model in Under a Day

Article • nvidia • Mar 20 • 73

upvoted a paper 2 months ago

HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning

Paper • 2603.17024 • Published Mar 17 • 110
