Whooooooa! Whaattttt!? Da Farrrq?!?!
Best Quants in the history of AI! - Even the Q3 produces insane results!
It's a fkn REVOLUTION!!!
How did they achieve this?? ABSOLUTE FKN GENIUS!
China is on another level in the AI game!
My mind has never been so blown!
Thank you China - MASSSSSIVE LOVE FROM THE UK!
Where and how do I run the quantized model? Seriously, I have no idea; I'm a beginner.
@Klasta
You have ComfyUI, right? If not, download and install the latest release from https://github.com/comfyanonymous/ComfyUI/releases/tag/v0.3.75
Then download example_workflow.json from this repo and open it in ComfyUI; it includes instructions for which model files to download.
Now, I have to join @RKAAI in praising this model. It just keeps blowing my mind.
The latest thing: I decided to try training a LoRA for it in ComfyUI. I only have 4 GB of VRAM and can train SDXL-size models with a Q5 quant at best, so I fully expected it to fail, but right now it's running training iterations on a Q3_K_S with my ancient GPU, and it's hardly any slower than SDXL at it! It's only using about half the available VRAM, which is weird because it OOMed earlier with a Q4. I did switch the text encoder to a smaller quant, but I didn't think that would matter since it should be unloaded between steps... I have no idea. Strange and wonderful.
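If anyone wants a feel for why the quant level matters so much on a 4 GB card, here's a rough back-of-envelope for the weights alone. The parameter count is purely my assumption for illustration (I'm using ~6B; I don't know the exact figure for this model), and the bits-per-weight values are nominal, so treat the numbers as ballpark only:

```python
# Rough VRAM footprint of just the quantized model weights.
# ASSUMPTION: ~6B parameters (an illustrative guess, not a confirmed figure)
# and nominal bits-per-weight for each quant type.
params = 6e9
for name, bpw in [("Q3_K_S", 3.5), ("Q4_K_S", 4.5), ("Q5_K_S", 5.5)]:
    gib = params * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.2f} GiB")
# -> Q3_K_S: ~2.44 GiB, Q4_K_S: ~3.14 GiB, Q5_K_S: ~3.84 GiB
```

On 4 GB, the Q4 would leave well under 1 GiB for activations, latents, and optimizer state, which would be consistent with the OOM I saw.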
@rththr
Simply load the GGUF and plug it into the Train LoRA node.
You can use a normal VAE Encode instead of a TAE, and replace the CLIP loader with a regular one if you have a fast CPU. I need MultiGPU to run the text encoder on the GPU, because encoding captions on my ancient CPU takes a very long time.
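In case that's too terse, this is roughly how my graph is wired. I'm writing the node names from memory (the GGUF loaders from ComfyUI-GGUF plus the core Train LoRA node), so take it as a sketch of my setup rather than the one true layout:

```
[Unet Loader (GGUF)] ──MODEL──▶ [Train LoRA] ──▶ LoRA weights
[CLIP Loader (GGUF or regular)] ──CLIP──▶ caption encoding for the dataset
[Load VAE / TAE] ─▶ [VAE Encode] ──LATENT──▶ [Train LoRA]
```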
However, it is very fragile. It only works in ComfyUI 0.3.75, and even then it sometimes breaks with "RuntimeError: Inference tensors do not track version counter." if I change something in the dataset. I haven't been able to get the new training nodes in 0.3.76 to work with Z-Image GGUFs at all.
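For what it's worth, that version-counter message is general PyTorch inference-mode behavior rather than anything specific to the training nodes. Here's a minimal repro of the same error; this is my own sketch, not the actual ComfyUI code path:

```python
import torch

with torch.inference_mode():
    w = torch.ones(3)  # an "inference tensor": carries no autograd metadata

# Anything that later asks for the tensor's version counter (autograd
# bookkeeping does this) hits the same error:
try:
    _ = w._version
except RuntimeError as e:
    print(e)  # Inference tensors do not track version counter.
```

So my guess is the GGUF weights get created under inference mode, and changing the dataset triggers some autograd bookkeeping that touches them.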
Maybe it's not even supposed to work, because PyTorch gives an error about "only floating point tensors can require gradients" when I try to force a GGUF to load in full non-inference mode...?
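That part, at least, is documented PyTorch behavior: only floating-point (and complex) tensors can require gradients, and raw GGUF quantized blocks are integer data. A quick illustration (my own minimal example, not the ComfyUI code):

```python
import torch

q = torch.zeros(16, dtype=torch.uint8)  # stand-in for raw quantized GGUF blocks
try:
    q.requires_grad_(True)
except RuntimeError as e:
    print(e)  # only Tensors of floating point and complex dtype can require gradients

# A dequantized float copy can take gradients, which is presumably why
# training a float LoRA on top of a frozen quantized base works at all.
deq = q.float().requires_grad_(True)
```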