Bram Vanroy PRO

BramVanroy

https://bramvanroy.github.io/

AI & ML interests

Artificial intelligence, natural language processing, computational linguistics

Recent Activity

upvoted an article 3 days ago

We Got Claude to Fine-Tune an Open Source LLM

updated a dataset 3 days ago

BramVanroy/finemath-4plus-seqlen36k

published a dataset 3 days ago

BramVanroy/finemath-4plus-seqlen36k

upvoted an article 3 days ago

Article

We Got Claude to Fine-Tune an Open Source LLM

5 days ago • 331

upvoted a paper 13 days ago

The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models

Paper • 2510.13996 • Published Oct 15 • 8

upvoted a paper about 1 month ago

Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training

Paper • 2506.01732 • Published Jun 2 • 6

upvoted a collection about 2 months ago

Leesplank Wim

18 items • Updated 26 days ago • 1

upvoted a paper about 2 months ago

Phi-4 Technical Report

Paper • 2412.08905 • Published Dec 12, 2024 • 122

upvoted a collection 2 months ago

Qwen3

84 items • Updated Aug 6 • 1.47k

upvoted an article 3 months ago

Article

mmBERT: ModernBERT goes Multilingual

Sep 9 • 129

upvoted 2 collections 3 months ago

robots-txt

6 items • Updated Aug 2 • 8

open-sci-ref-0.01

Research baseline models trained on various open reference datasets • 12 items • Updated Jul 23 • 4

upvoted an article 3 months ago

Article

Releasing Common Corpus: the largest public domain dataset for training LLMs

Mar 20, 2024 • 29

upvoted an article 5 months ago

Article

SmolLM3: smol, multilingual, long-context reasoner

Jul 8 • 735

upvoted a paper 5 months ago

FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language

Paper • 2506.20920 • Published Jun 26 • 75

upvoted an article 7 months ago

Article

Assisted Generation: a new direction toward low-latency text generation

May 11, 2023 • 74

upvoted a paper 12 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 376

upvoted a collection about 1 year ago

Common Models

The first generation of models pretrained on Common Corpus. • 5 items • Updated Dec 5, 2024 • 41

upvoted an article about 1 year ago

Article

EuroLLM-9B

Dec 2, 2024 • 137

upvoted a collection about 1 year ago

Qwen2.5

Qwen2.5 language models: pretrained and instruction-tuned models in 7 sizes (0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B). • 46 items • Updated Jul 21 • 666

upvoted a collection almost 2 years ago

GEITje 7B: A Large Open Dutch Language Model

All models and datasets relating to GEITje • 8 items • Updated Jan 25 • 5

upvoted a collection about 2 years ago

Recent models: last 100 repos, sorted by creation date

The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31, 2024 • 564