🏅 Quantized Embeddings are here! Unlike model quantization, embedding quantization is a post-processing step applied to the embeddings themselves: it converts e.g. float32 embeddings to binary or int8 embeddings. This cuts memory & disk usage by 32x (binary) or 4x (int8), and the quantized embeddings are much faster to compare!
Our results show 25-45x speedups in retrieval compared to full-size embeddings, while keeping 96% of the retrieval performance!
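
To make the idea concrete, here is a minimal pure-Python sketch of what such a post-processing step could look like. It is an illustration under stated assumptions, not the actual implementation: the function names (`quantize_binary`, `hamming_distance`, `quantize_int8`) are hypothetical, binary quantization is assumed to threshold each dimension at 0, and int8 quantization is assumed to scale each dimension linearly using per-dimension min/max values taken from some calibration data.

```python
import struct

def quantize_binary(embedding):
    """Threshold each dimension at 0 (assumption) and pack the bits into bytes."""
    bits = [1 if x > 0 else 0 for x in embedding]
    bits += [0] * (-len(bits) % 8)  # pad so the bits fill whole bytes
    packed = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)

def hamming_distance(a, b):
    """Count differing bits between two packed binary embeddings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def quantize_int8(embedding, mins, maxs):
    """Map each dimension linearly from [min, max] to the int8 range [-128, 127].

    mins/maxs are per-dimension ranges from calibration data (assumption).
    """
    out = []
    for x, lo, hi in zip(embedding, mins, maxs):
        scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 if the range is empty
        q = round((x - lo) / scale) - 128
        out.append(max(-128, min(127, q)))  # clip values outside the range
    return out

# Toy 8-dimensional embeddings (made-up values, for illustration only).
doc = [0.12, -0.40, 0.88, -0.05, 0.31, 0.07, -0.66, 0.29]
query = [0.10, -0.35, -0.20, -0.01, 0.45, -0.02, -0.70, 0.33]

doc_bin, query_bin = quantize_binary(doc), quantize_binary(query)
print(len(struct.pack("8f", *doc)), "bytes as float32")  # 32 bytes
print(len(doc_bin), "byte as binary")                    # 1 byte, i.e. 32x smaller
print(hamming_distance(doc_bin, query_bin))              # 2
```

Comparing binary embeddings reduces to an XOR plus a bit count, which is why they are so cheap to compare; the int8 variant keeps one byte per dimension instead of four. In practice you would likely do this with vectorized array operations rather than Python loops.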