Commit History

Add GPU estimator, DDG search, and cancel support
4ce42e8

Alikestocode committed on

Load vLLM from local snapshot to support default subfolders
0e2f6c4

Alikestocode committed on

Default to Gemma router and limit prefetch
6fb2aa6

Alikestocode committed on

Fix prefetch init order
2790442

Alikestocode committed on

Parallelize AWQ model prefetching
e829b15

Alikestocode committed on

Disable vLLM by default on MIG devices
63c8de5

Alikestocode committed on

Remove unsupported vLLM device kwarg
f886036

Alikestocode committed on

Adjust CUDA handling and set explicit device for vLLM
8fd14bc

Alikestocode committed on

Fix UnboundLocalError: remove duplicate torch import
5ee455a

Alikestocode committed on

Improve vLLM device detection: force torch CUDA reinit
75aac04

Alikestocode committed on

Fix remaining pipeline calls to use transformers_repo
34ee4d1

Alikestocode committed on

Fix vLLM device detection and AWQ model loading
41f50c5

Alikestocode committed on

Fix AWQ model loading: point to default/ subfolder and fix tokenizer loading
a76dbfd

Alikestocode committed on

Add test scripts for AWQ models on ZeroGPU Space
dd11bd9

Alikestocode committed on

Update Qwen model repo to AWQ quantized version
27234fe

Alikestocode committed on

Update Qwen model to use AWQ quantized version
d02a9d8

Alikestocode committed on

Update Gemma model to use AWQ quantized version
b36a0b0

Alikestocode committed on

Lower Gemma AWQ group size to 16
f8c20fd

Alikestocode committed on

Use model-specific AWQ configs (Gemma group_size=64)
98c0d4d

Alikestocode committed on

Remove processor arg from oneshot to avoid tokenizer conflict
4b47dea

Alikestocode committed on

Pass processor to oneshot for text-only models
56105b6

Alikestocode committed on

Fix processor error: pass tokenizer explicitly for text-only models
5bf2e9f

Alikestocode committed on

Add stage='default' parameter to oneshot() call
5bceece

Alikestocode committed on

Fix oneshot() API: use Recipe and Dataset objects
671d7f9

Alikestocode committed on

Fix oneshot() API: use correct parameter names from documentation
35d8225

Alikestocode committed on

Try alternative oneshot() API parameter names
e9f4b24

Alikestocode committed on

Fix oneshot() API: use correct parameter names
7e31310

Alikestocode committed on

Fix cell 7: convert markdown note to Python comment
4001f22

Alikestocode committed on

Remove duplicate build_awq_modifier_config - keep existing correct version
5bf02e9

Alikestocode committed on

Add build_awq_modifier_config helper using QuantizationScheme objects
cf9ed91

Alikestocode committed on

Fix quantization_config structure: use correct AWQ format
3f08592

Alikestocode committed on

Fix modifiers initialization: ensure it's always defined
f3114ba

Alikestocode committed on

Fix BaseQuantizationConfig import: add fallback approaches
a49281c

Alikestocode committed on

Add local test script for quantization notebook validation
011c926

Alikestocode committed on

Fix QuantizationConfig: use config_groups with BaseQuantizationConfig
ecf6a69

Alikestocode committed on

Fix AWQModifier: use quantization_config with num_bits
022b2da

Alikestocode committed on

Add note about restarting kernel if AWQModifier errors occur
33a1d2e

Alikestocode committed on

Simplify AWQModifier usage - remove try/except wrapper
e08f8c4

Alikestocode committed on

Fix AWQModifier parameters - use default configuration
cef8ecd

Alikestocode committed on

Fix delete_revisions import with fallback cache cleanup
7a2a590

Alikestocode committed on

Fix delete_revisions import - use fallback cache cleanup method
4be72e0

Alikestocode committed on

Fix AWQModifier import path: use modifiers.awq instead of modifiers.quantization
f0033ab

Alikestocode committed on

Fix LLM Compressor package name: llmcompressor (no hyphen)
2326498

Alikestocode committed on

Remove duplicate LLM Compressor section - now primary method
d4bc333

Alikestocode committed on

Replace AutoAWQ with LLM Compressor (vLLM native) in Colab notebook
ae07f77

Alikestocode committed on

Add advanced vLLM and LLM Compressor optimizations
808203f

Alikestocode committed on

Add disk space cleanup after quantization in Colab notebook
24107f3

Alikestocode committed on

Fix linter error: use %pip instead of !pip in Colab notebook
2dff966

Alikestocode committed on

Add Colab notebook for AWQ quantization of router models
a79bc8f

Alikestocode committed on

Clarify LLM Compressor optional status - vLLM has native AWQ support
b2bf767

Alikestocode committed on