- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 14
- ProTIP: Progressive Tool Retrieval Improves Planning
  Paper • 2312.10332 • Published • 8
- Paloma: A Benchmark for Evaluating Language Model Fit
  Paper • 2312.10523 • Published • 13
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
  Paper • 2406.17557 • Published • 98
daje kang
daje
Recent Activity
liked a dataset 7 days ago
nvidia/ToolScale
updated a model 15 days ago
daje/whisper-v3-turbo-address
published a model 15 days ago
daje/whisper-v3-turbo-address