Kian Kyars
kyars
AI & ML interests
None yet
Recent Activity
updated a Space about 3 hours ago: kyars/sandbox-448fd80a
published a model about 23 hours ago: kyars/CogDrift-R1-14B
updated a Space about 24 hours ago: kyars/sandbox-e6e41285
Organizations
Kudos
❤️ 1 · #3 opened 27 days ago by kyars
commented on KV Caching Explained: Optimizing Transformer Inference Efficiency 4 months ago
Yes, it's done for each transformer block in an LM because each block has its own attention heads and projection weights. If you cached the keys and values from only one block and reused them across all blocks, you wouldn't get the same representations.
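A minimal sketch of that point, with made-up dimensions and weights (none of this is from the blog post): each block applies its own K/V projections, so each block needs its own cache entry, and one block's cached keys can't stand in for another's.

```python
import torch

# Hypothetical toy setup: 2 transformer blocks, each with its own K/V projections.
d_model, n_layers = 8, 2
torch.manual_seed(0)
w_k = [torch.randn(d_model, d_model) for _ in range(n_layers)]
w_v = [torch.randn(d_model, d_model) for _ in range(n_layers)]

# One cache entry per layer: keys/values differ from block to block because
# the projection weights (and, in a real model, the layer inputs) differ.
kv_cache = [{"k": [], "v": []} for _ in range(n_layers)]

def cache_token(hidden):
    """Append this token's K/V to every layer's cache."""
    for layer in range(n_layers):
        kv_cache[layer]["k"].append(hidden @ w_k[layer])
        kv_cache[layer]["v"].append(hidden @ w_v[layer])
        # (a real block would also run attention + MLP and update `hidden`)

cache_token(torch.randn(d_model))
# Layer 0's cached key is not interchangeable with layer 1's:
print(torch.allclose(kv_cache[0]["k"][0], kv_cache[1]["k"][0]))  # False
```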
commented on KV Caching Explained: Optimizing Transformer Inference Efficiency 4 months ago
I think I got lost around the standard inference versus KV caching section because I couldn't follow the matmuls happening in each flashing repetition of those yellow blocks. But perhaps I just need to go through the blog post again to understand it better.
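For anyone else stuck on the same spot, here is a toy shape comparison (dimensions and tensors invented for illustration, not taken from the animation): standard decoding redoes the full (seq, seq) score matmul every step, while KV caching only computes the new token's query against the stored keys.

```python
import torch

# Made-up sizes: 5 tokens generated so far, head dimension 4.
d = 4
seq = torch.randn(5, d)
w_q, w_k = torch.randn(d, d), torch.randn(d, d)

# Standard inference: recompute Q and K for the whole sequence each step.
q_full = seq @ w_q               # (5, d)
k_full = seq @ w_k               # (5, d)
scores_full = q_full @ k_full.T  # (5, 5) matmul every step

# KV caching: past keys are reused; only the newest token's query is computed.
k_cache = seq[:-1] @ w_k                      # computed on earlier steps, reused
q_new = seq[-1:] @ w_q                        # (1, d)
k_now = torch.cat([k_cache, seq[-1:] @ w_k])  # append just the new key
scores_new = q_new @ k_now.T                  # (1, 5) matmul instead of (5, 5)

# The new token's attention scores match the last row of the full computation.
print(torch.allclose(scores_new, scores_full[-1:]))  # True
```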
commented on KV Caching Explained: Optimizing Transformer Inference Efficiency 5 months ago
I didn't understand the explanation
commented on Efficient LLM Pretraining: Packed Sequences and Masked Attention 7 months ago
This technique is not bitter-lesson-pilled at all. It's a waste of time when the model will just learn to do this anyway.
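For context on what's being debated, a rough sketch of the masking the post describes, with hypothetical sequence lengths I picked for illustration: when two sequences are packed into one row, a block-diagonal (plus causal) mask keeps tokens from attending across the packing boundary, which is exactly the cross-contamination the "the model will learn to ignore it anyway" argument says you can skip.

```python
import torch

# Hypothetical packing of two sequences (lengths 3 and 2) into one row of 5 tokens.
seq_lengths = [3, 2]
total = sum(seq_lengths)

# Token i may attend to token j only if both belong to the same packed
# sequence and j <= i (causal LM pretraining).
doc_id = torch.cat([torch.full((n,), i) for i, n in enumerate(seq_lengths)])
causal = torch.tril(torch.ones(total, total, dtype=torch.bool))
same_doc = doc_id.unsqueeze(0) == doc_id.unsqueeze(1)
mask = causal & same_doc

print(mask.int())
# Dropping the `same_doc` term would let the second sequence's tokens
# attend to the first sequence as well.
```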
awesome resource
#1 opened 9 months ago by kyars