UPDATE
I am still getting OOM on an L4 (24 GB VRAM) system with 64 GB of RAM.
The aggressive quant is most likely a failure and will not work as-is.
The 2D weights-only quant does load and run, but it still OOMs because of an improper loading method.
I'm not sure of the best way to run even the base model yet. It should work if you offload the dense layers into RAM and leave only the active layers in VRAM.
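The offloading idea above can be sketched as a manual device map of the kind `accelerate` accepts. Everything here is an assumption: the module names and the `experts` marker are placeholders, not the model's real layer names, so inspect the actual checkpoint before using anything like this.

```python
# Hedged sketch (untested assumption): keep the MoE expert layers on the GPU
# and push the dense layers to CPU RAM, per the note above. The "experts"
# name pattern is a guess; check the real module names in the checkpoint.

def build_device_map(layer_names, expert_marker="experts", gpu_index=0):
    """Assign expert layers to the GPU and all other (dense) layers to CPU."""
    return {
        name: (gpu_index if expert_marker in name else "cpu")
        for name in layer_names
    }

# Hypothetical module names for illustration only:
layers = [
    "transformer.blocks.0.experts.0.w1",  # MoE expert -> VRAM
    "transformer.blocks.0.dense_ffn.w1",  # dense layer -> system RAM
]
device_map = build_device_map(layers)
print(device_map)
```

A map like this could then be handed to `accelerate.dispatch_model` (or a `from_pretrained(..., device_map=...)` call) once a loading method for this model actually works.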
UNTESTED
I can't test this model myself; it's too big.
Nucleus-Image_noreshape2Dweightonly.safetensors — only the 2D non-MoE-expert layers are quantized to NVFP4, which should speed up the active layers in VRAM without degrading the dense layers.
I will try testing this model once more support for its loading methods becomes available.
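As a rough illustration of the selection rule behind the 2D-weight-only file (the tensor names and shapes below are hypothetical, and plain shape tuples stand in for real tensors — with `safetensors` you would read shapes via `safe_open(...)` and `get_slice(name).get_shape()`):

```python
# Hedged sketch: pick only 2D weight matrices as quantization candidates,
# leaving 1D tensors (norms, biases) in their original precision.

def quant_candidates(shapes):
    """Given {tensor_name: shape}, return names of 2D weight matrices."""
    return sorted(name for name, shape in shapes.items() if len(shape) == 2)

# Hypothetical tensor names/shapes for illustration only:
shapes = {
    "blocks.0.attn.qkv.weight": (4096, 4096),  # 2D -> quantize
    "blocks.0.norm.weight": (4096,),           # 1D -> keep as-is
}
print(quant_candidates(shapes))  # -> ['blocks.0.attn.qkv.weight']
```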
Nucleus-Image_transformer_aggressive_nvfp4.safetensors — every layer is quantized, included just for context on what is possible. The dense layers may be sensitive to the quant.
Model tree for ApacheOne/Nucleus-Image-NVFP4_mixed
Base model: NucleusAI/Nucleus-Image