Miro Doporto (spanofzero) · PRO

3 followers · 8 following

AI & ML interests: None yet
Recent Activity

Replied to Zoberzzz's post, about 5 hours ago:
Hackernews post · TXT

Show HN: I compressed a 160GB KV cache to 640MB at 0.9994 fidelity on a $300 GPU

Title: Show HN: DenseMem — 256x KV cache compression, 0.9994 fidelity, runs on consumer hardware

---

A 72B model at 32K context needs 160GB of KV cache. That's an H100 and $32,000 in HBM3e memory. I built a protocol that stores the same KV cache in 640MB of DDR5 RAM — on a consumer RTX 4090 and Core i9. 256x compression. 0.9994 cosine similarity. 1.95ms average fetch latency. Verified.

**How:** Transformer KV cache activations are highly structured and correlated, and SVD at rank=64 exploits that structure. Random noise compresses to only 0.12 fidelity; real KV cache activations compress to 0.9994. The math works because the data isn't random — it has geometry.

The system manages a two-tier hierarchy: VRAM is the hot tier, DDR5 is the warm tier. An attention-weighted evictor (0.5 attn + 0.3 recency + 0.2 freq) decides what stays hot. A prefetcher using layer lookahead and token prediction pre-positions pages before they're needed. Average fetch latency: 1.95ms. Max under load: 3.96ms.

Current hit rate is 25%, bottlenecked by my i9's 2-channel DDR5 bandwidth (~38 GB/s). On an 8-channel Threadripper PRO (~224 GB/s) I'm projecting 65-75%.

**Running live:**
- Qwen2.5-7B on RTX 4090 at 32K context (was 4K)
- Every inference tick compressed INT8 via PCA → DDR5
- 2.4s cold start

**The cost math:**
- Uncompressed 72B KV cache: $32,000 in HBM3e
- DenseMem: $1.88 in DDR5
- 99.99% cost reduction. Verified on consumer hardware.

GitHub: https://github.com/thorshammerztp-arch/densemem-protocol
Patent Pending: US 64/045,595

Solo developer. Navy veteran. No funding. Consumer hardware.
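The fidelity claim in the post rests on KV activations being effectively low-rank. A minimal NumPy sketch (synthetic data; the shapes, ranks, and noise level are illustrative assumptions, not the DenseMem code) shows why a rank-64 truncated SVD reconstructs structured data far more faithfully than i.i.d. noise:

```python
import numpy as np

def rank_r_fidelity(X, r=64):
    """Compress X with a truncated SVD at rank r, then measure
    reconstruction fidelity as cosine similarity of the flattened arrays."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_hat = (U[:, :r] * s[:r]) @ Vt[:r]  # rank-r reconstruction
    a, b = X.ravel(), X_hat.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)

# Structured "activation-like" data: a rank-32 signal plus small noise.
# (Synthetic stand-in; real KV tensors come from a transformer forward pass.)
structured = rng.normal(size=(2048, 32)) @ rng.normal(size=(32, 512))
structured += 0.05 * rng.normal(size=structured.shape)

# Unstructured baseline: pure i.i.d. noise of the same shape.
noise = rng.normal(size=(2048, 512))

print(rank_r_fidelity(structured))  # close to 1.0: the structure survives
print(rank_r_fidelity(noise))       # much lower: nothing to exploit
```

The gap between the two numbers is the whole argument: compression ratio for a fixed rank is determined by shape, but fidelity is determined by how much of the energy lives in the top singular directions.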
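The evictor's blend (0.5 attn + 0.3 recency + 0.2 freq) can be sketched as a single hotness score used to rank pages for eviction. Everything here besides the three weights (the `Page` fields, the recency decay, the normalization, the helper names) is an illustrative assumption, not DenseMem's actual implementation:

```python
from dataclasses import dataclass

# Weights from the post's evictor: 0.5 attention, 0.3 recency, 0.2 frequency.
W_ATTN, W_RECENCY, W_FREQ = 0.5, 0.3, 0.2

@dataclass
class Page:
    page_id: int
    attn_weight: float   # normalized attention mass on this page, 0..1
    last_access: int     # tick of the most recent access
    access_count: int    # total accesses so far

def hotness(page: Page, now: int, max_count: int) -> float:
    """Blend the three signals into one score; the highest scores stay in VRAM."""
    recency = 1.0 / (1.0 + (now - page.last_access))  # decays with age
    freq = page.access_count / max(1, max_count)      # normalized to 0..1
    return W_ATTN * page.attn_weight + W_RECENCY * recency + W_FREQ * freq

def pick_victims(pages, now, n_evict):
    """Demote the n_evict coldest pages from the hot (VRAM) tier."""
    max_count = max(p.access_count for p in pages)
    ranked = sorted(pages, key=lambda p: hotness(p, now, max_count))
    return [p.page_id for p in ranked[:n_evict]]

pages = [
    Page(0, attn_weight=0.9, last_access=100, access_count=50),  # hot
    Page(1, attn_weight=0.1, last_access=10,  access_count=2),   # cold
    Page(2, attn_weight=0.4, last_access=95,  access_count=20),
]
print(pick_victims(pages, now=101, n_evict=1))  # → [1] (the cold page)
```

Weighting attention mass highest matches the post's intuition: a page the model still attends to is worth keeping hot even if it was loaded long ago.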
Upvoted a changelog, about 18 hours ago: Spaces agents.md for your coding agents
spanofzero's activity
New activity in spanofzero/SpaceTravelersUniversalPlaylist, about 1 month ago:

[bot] Conversion to Parquet (#1, opened about 1 month ago by parquet-converter)