Hugging Science

Team

community

Activity Feed Request to join this org

AI & ML interests

None defined yet.

Recent Activity

VibecoderMcSwaggins updated a dataset 2 days ago

hugging-science/isles24-stroke

Kameshr authored a paper 21 days ago

Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

jtlicardo authored a paper about 1 month ago

Performance Trade-offs of Optimizing Small Language Models for E-Commerce

View all activity

Articles

SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution

9 days ago

•

The Pharmome Map: a comprehensive public dataset for drug-target interaction modeling

22 days ago

•

Advancing Predictive ADMET Modeling Through Community-Driven Science: The ExpansionRx-OpenADMET Blind Challenge

Oct 27

•

Promoter-GPT: Writing DNA Instructions with Language Models

Oct 22

•

AI for Food Allergies

Oct 16

•

VibecoderMcSwaggins

updated a dataset 2 days ago

hugging-science/isles24-stroke

Updated 2 days ago • 95 • 1

invi-bhagyesh

authored a paper 19 days ago

TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models

Paper • 2511.15807 • Published 21 days ago

vumichien

authored 2 papers about 2 months ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29 • 7

BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution

Paper • 2510.08697 • Published Oct 9 • 36

rchan26

authored a paper 4 months ago

Retrieval-augmented reasoning with lean language models

Paper • 2508.11386 • Published Aug 15 • 5

merterbak

posted an update 4 months ago

Post

3928

OpenAI is now open again! Check out OpenAI’s brand new gpt‑oss‑20b model hosted on ZeroGPU 🤗

merterbak/gpt-oss-20b-demo

abir-hr196

authored 2 papers 4 months ago

Activation Space Interventions Can Be Transferred Between Large Language Models

Paper • 2503.04429 • Published Mar 6 • 2

TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research

Paper • 2503.12730 • Published Mar 17 • 4

rchan26

authored a paper 7 months ago

Behind Maya: Building a Multilingual Vision Language Model

Paper • 2505.08910 • Published May 13 • 2

merterbak

posted an update 7 months ago

Post

4215

Qwen 3 technical report released🚀
Report: https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf

merterbak

posted an update 7 months ago

Post

2641

Seed-Coder released and it's designed for coding tasks, featuring base, instruct, and reasoning variants at an 8B parameter scale developed by ByteDance Seed team. Unlike traditional open source LLMs that rely on human crafted rules or annotated data for curating code pretraining datasets Seed-Coder introduces a model-centric data pipeline. The pipeline processes raw data from GitHub and web archives into four categories: file-level codes, repository-level codes, GitHub commits, and code-related web data.A quality filter LLM, evaluates code (for readability, modularity, clarity, and reusability) by removing the lowest 10% to create a 6 trillion token dataset supporting 89 programming languages.
Models: ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
Github: https://github.com/ByteDance-Seed/Seed-Coder/tree/master
Paper: https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf

merterbak

posted an update 7 months ago

Post

1719

Microsoft released their new fine-tuned phi-4 models with reasoning data yesterday. They outperform/rival much larger models . Check out them if you haven't yet. 🚀

Phi4 mini reasoning(SFT): microsoft/Phi-4-mini-reasoning
Phi-4 reasoning(SFT): microsoft/Phi-4-reasoning
Phi-4 reasoning plus (SFT + RL): microsoft/Phi-4-reasoning-plus
Demo: https://github.com/marketplace/models/azureml/Phi-4-reasoning/playground
Articles: https://arxiv.org/pdf/2504.21318
https://arxiv.org/pdf/2504.21233
Blog: https://azure.microsoft.com/en-us/blog/one-year-of-phi-small-language-models-making-big-leaps-in-ai/

1 reply

merterbak

posted an update 8 months ago

Post

5020

Qwen 3 models released🔥
It offers 2 MoE and 6 dense models with following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B(MoE), 32B, and 235B(MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

✅ Pre-trained 119 languages(36 trillion tokens) and dialects with strong translation and instruction following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
✅Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
✅ Three stage done while pretraining:
• Stage 1: General language learning and knowledge building.
• Stage 2: Reasoning boost with STEM, coding, and logic skills.
• Stage 3: Long context training
✅ It supports MCP in the model
✅ Strong agent skills
✅ Supports seamless between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside chat template.
✅ Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.

merterbak

posted an update 8 months ago

Post

3637

FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search based methods it uses reasoning driven optimization with external execution feedback.

✅ First, it distills reasoning data using DeepSeek R1-671B to build multi agent systems. 🤖
✅ Then, reasoning data used for DeepSeek-R1-Distill-Qwen-7B via supervised fine tuning for basic reasoning skills. 💡
✅ Finally, RL with GRPO (optimizes by comparing response groups from queries/tasks) to improve reasoning.

FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner

merterbak

posted an update 8 months ago

Post

2338

Here’s a cool paper I found: “Massive Image Embedding Benchmark (MIEB).” It is a new tool to test how good image embedding models are. It has 130 different tasks grouped into 8 categories, like image search, classification, clustering similar images, answering questions based on images, and understanding documents. It even covers 38 different languages.

The authors tested 50 models and found that no single model was best at everything. Some models were great at recognizing text inside images but struggled to handle complicated tasks like matching images and text that appear together.

Paper: https://arxiv.org/pdf/2504.10471v1
Code: https://github.com/embeddings-benchmark/mteb

2 replies

merterbak

posted an update 8 months ago

Post

3066

OpenAI published 2 benchmark datasets on Hugging Face 🔥
openai/mrcr
openai/graphwalks
MRCR tests how well a model can find the right answer when many similar questions are spread out in a long context. Graphwalks checks if a model can follow steps in a big graph and find the correct nodes by thinking through the structure

merterbak

posted an update 8 months ago

Post

3711

OpenAI has released BrowseComp an open source benchmark designed to evaluate the web browsing capabilities of AI agents. This dataset comprising 1,266 questions challenges AI models to navigate the web and uncover complex and obscure information. Crafted by human trainers, the questions are intentionally difficult. (unsolvable by another person in under ten minutes and beyond the reach of existing models like ChatGPT with and without browsing and an early version of OpenAI's Deep Research tool.)

Blog Post: https://openai.com/index/browsecomp/
Paper: https://cdn.openai.com/pdf/5e10f4ab-d6f7-442e-9508-59515c65e35d/browsecomp.pdf
Code in simple eval repo: https://github.com/openai/simple-evals

merterbak

posted an update 8 months ago

Post

4771

Qwen 3 can launch very soon. 👀

https://github.com/ggml-org/llama.cpp/pull/12828

3 replies

merterbak

posted an update 8 months ago

Post

3054

Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and mixture-of-experts architecture. Two model families are available now:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- It can fit perfectly on DGX H100(8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- ELO score of 1417 on LMArena currently second best model on arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks

merterbak

posted an update 10 months ago

Post

3827

🔥 Meet Muse: that can generate a game environment based on visuals or players’ controller actions. It was developed by Microsoft Research in collaboration with Ninja Theory (Hellblade developer). It’s built on something called the World and Human Action Model (WHAM-1.6B model). They trained on 7 years of Bleeding Edge gameplay and it can generate 2 minute long 3D game sequences with consistent physics and character behaviors all from just a second of input. They’ve gone and open-sourced it too. Open weights, the WHAM Demonstrator, and sample data on Azure AI Foundry for anyone to play with. Hope so soon on Hugging Face 🤗.

📄 Paper: https://www.nature.com/articles/s41586-025-08600-3
Blog Post: https://www.microsoft.com/en-us/research/blog/introducing-muse-our-first-generative-ai-model-designed-for-gameplay-ideation/

1 reply

AI & ML interests

Recent Activity

Articles

SARLO-80: Worldwide Slant SAR Language Optic Dataset at 80 cm Resolution

The Pharmome Map: a comprehensive public dataset for drug-target interaction modeling

Advancing Predictive ADMET Modeling Through Community-Driven Science: The ExpansionRx-OpenADMET Blind Challenge

Promoter-GPT: Writing DNA Instructions with Language Models

AI for Food Allergies

Team members 787

hugging-science's activity