stelterlab/DeepSeek-R1-Distill-Qwen-14B-AWQ

How to reduce "Think" responses when using vLLM for inference?

#1 opened 8 months ago by