# SparseFlow-Chat v5
An efficient conversational AI with sparse attention, cutting attention compute by 87.5%.
## Performance
| Metric | Value |
|---|---|
| Parameters | 39,840,002 |
| Perplexity | 1.00 |
| Token Sparsity | 87.5% |
| Attention Saved | 87.5% |
## Architecture
- Sparse Token Router: O(nΓk) instead of O(nΒ²) attention
- Persistent Memory Banks: Store and retrieve knowledge
- Channel Sparsity: Activates only top-k channels
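The sparse token routing above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual SparseFlow implementation: for clarity it still scores all query-key pairs before keeping the top-k, whereas a real O(n×k) router would avoid materializing the full score matrix.

```python
import torch
import torch.nn.functional as F

def sparse_topk_attention(q, k, v, top_k):
    """Attend each query only to its top_k highest-scoring keys.

    q, k, v: (n, d) tensors. The softmax and value aggregation run
    over top_k keys per query instead of all n (O(n*k) work there).
    """
    scores = q @ k.T / (q.shape[-1] ** 0.5)        # (n, n) raw scores
    topv, topi = scores.topk(top_k, dim=-1)        # keep top_k per query
    weights = F.softmax(topv, dim=-1)              # softmax over kept keys only
    return (weights.unsqueeze(-1) * v[topi]).sum(dim=1)  # (n, d)

torch.manual_seed(0)
q, k, v = (torch.randn(16, 8) for _ in range(3))
out = sparse_topk_attention(q, k, v, top_k=2)  # 2 of 16 keys -> 87.5% sparsity
```

With `top_k = 2` out of 16 tokens, each query ignores 87.5% of the keys, matching the token-sparsity figure in the table above.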
### Complexity Comparison
| Operation | Transformer | SparseFlow | Speedup |
|---|---|---|---|
| Attention | O(nΒ²) | O(nΓk) | 8x |
| FFN | O(nΓdΒ²) | O(nΓkΓd) | ~4x |
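The 8× attention speedup in the table is consistent with, for example, a sequence length of n = 1024 and k = 128 routed keys per token (assumed values for illustration; the card does not state them):

```python
n, k = 1024, 128          # assumed: sequence length, routed keys per token
dense = n * n             # score pairs computed by full attention
sparse = n * k            # score pairs computed by the sparse router
print(dense // sparse)    # -> 8, the 8x speedup in the table
print(1 - sparse / dense) # -> 0.875, the 87.5% "attention saved" figure
```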
## Usage
```python
import torch

# Load model weights from the checkpoint
checkpoint = torch.load("model.pt")
# ... initialize model with config.json
model.load_state_dict(checkpoint['model'])

# Chat
response = chat("What is the capital of France?")
# -> "The capital of France is Paris."
```
## Created By

Mike Amega · Ame Web Studio
February 2025