Embedl

Embedl develops advanced tools and algorithms for Edge AI. Our mission is to make AI models run faster, more energy-efficiently, and more reliably across diverse hardware platforms, while significantly reducing development time.

We help teams deploy high-performance AI on real-world, resource-constrained devices.

Embedl Models (Community)

Pre-optimized models that can be used off-the-shelf or customized for the specific hardware targets supported by the embedl-models package.

First release highlights:

  • The fastest Small Language Models (SLMs) using FlashHead, a novel architectural improvement to the language-model head
  • Works with popular models like Llama, Gemma, and Qwen
  • Provides speedups on top of:
    • Quantization
    • Flash Attention
    • Other standard optimizations

Device: Nvidia Jetson Thor

Model                                           Generation speed (tokens/s)
embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16    100
Llama-3.2-3B-Instruct-W4A16*                    80
RedHatAI/Llama-3.2-3B-Instruct-FP8              64
meta-llama/Llama-3.2-3B-Instruct                37

*A model quantized by Embedl for benchmarking. It is similar to the FlashHead-W4A16 model but lacks the faster FlashHead and the custom generation loop.
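
To put the benchmark figures in relative terms, the short Python sketch below divides the FlashHead model's generation speed by that of each other entry. The speeds are copied from the table above; the `relative_speedups` helper and the variable names are illustrative only and are not part of the embedl-models package.

```python
# Generation speeds (tokens/s) on Nvidia Jetson Thor, copied from the table above.
speeds = {
    "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16": 100,
    "Llama-3.2-3B-Instruct-W4A16": 80,
    "RedHatAI/Llama-3.2-3B-Instruct-FP8": 64,
    "meta-llama/Llama-3.2-3B-Instruct": 37,
}


def relative_speedups(speeds, reference):
    """Hypothetical helper: reference speed divided by each other model's speed."""
    ref = speeds[reference]
    return {name: ref / s for name, s in speeds.items() if name != reference}


speedups = relative_speedups(
    speeds, "embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16"
)
for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
```

By these numbers the FlashHead model generates tokens about 1.25x faster than the W4A16 baseline without FlashHead, about 1.56x faster than the FP8 model, and about 2.70x faster than the unoptimized model.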


Contact

Headquarters (Sweden)
Gamla Almedalsvägen 39
412 63 Gothenburg, Sweden

Email: [email protected]
