MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding • Paper • 2505.20298 • Published May 26, 2025
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models • Paper • 2506.04180 • Published Jun 4, 2025
MangaliCa Train / Eval Dataset 🐗 • Collection • Collection of MangaliCa's pre-training datasets • 8 items • Updated May 22
nanoVLM: The simplest repository to train your VLM in pure PyTorch • Article • Published May 21
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models • Paper • 2412.02980 • Published Dec 4, 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models • Paper • 2404.07503 • Published Apr 11, 2024
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks • Paper • 2504.15521 • Published Apr 22, 2025
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations • Paper • 2504.10481 • Published Apr 14, 2025
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters • Paper • 2504.08791 • Published Apr 7, 2025
Training and Finetuning Reranker Models with Sentence Transformers v4 • Article • Published Mar 26