VCoder
VCoder is a Python-focused coding assistant fine-tuned from Qwen2.5-Coder-3B-Instruct using LoRA and Unsloth.
The model was trained on 15,000 Python instruction-response examples from the Python Code Instructions 15K dataset and optimized for Python code generation, problem solving, debugging, and algorithm implementation.
Model Details
| Attribute | Value |
|---|---|
| Base Model | Qwen2.5-Coder-3B-Instruct |
| Fine-Tuning Method | LoRA |
| Framework | Unsloth |
| Dataset | Python Code Instructions 15K |
| Training Samples | 15,000 |
| GPU | NVIDIA Tesla T4 |
| Quantized Format | GGUF Q8_0 |
| Primary Language | Python |
Training Pipeline
Training was performed incrementally:
| Stage | Samples |
|---|---|
| Stage 1 | 0 - 5,000 |
| Stage 2 | 5,000 - 10,000 |
| Stage 3 | 10,000 - 15,000 |
The model was trained using parameter-efficient fine-tuning (LoRA), allowing adaptation of the base model while keeping computational requirements low.
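The three stages above can be sketched as contiguous slices of the dataset. This is only an illustration: the helper name `stage_ranges` is hypothetical, and it assumes the boundaries in the table are half-open index ranges (the card does not say whether they are inclusive).

```python
def stage_ranges(total_samples: int, num_stages: int) -> list[tuple[int, int]]:
    """Split [0, total_samples) into contiguous half-open ranges, one per stage."""
    size, remainder = divmod(total_samples, num_stages)
    ranges = []
    start = 0
    for stage in range(num_stages):
        # Spread any remainder over the first few stages.
        end = start + size + (1 if stage < remainder else 0)
        ranges.append((start, end))
        start = end
    return ranges


stages = stage_ranges(15_000, 3)
for number, (start, end) in enumerate(stages, start=1):
    print(f"Stage {number}: samples[{start}:{end}] ({end - start} examples)")
```

Each stage would then continue training from the adapter produced by the previous one; that step depends on the training framework and is not shown here.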
Benchmark Results
HumanEval Comparison
The model was evaluated against the original Qwen2.5-Coder-3B-Instruct on HumanEval coding tasks.
| Model | Pass@1 |
|---|---|
| Qwen2.5-Coder-3B-Instruct (base) | 61.0% |
| VCoder | 68.0% |
Improvement
+7.0 percentage points in Pass@1 (61.0% to 68.0%)
This suggests that the fine-tuned model performs better on Python coding tasks than the original base model, although the comparison was run on only a subset of HumanEval tasks.
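The gain reported above can be read either as an absolute difference in percentage points or as a change relative to the base score. The short sketch below works through both from the two Pass@1 values in the table; the function name `improvement` is only illustrative.

```python
def improvement(base_pct: float, tuned_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_points = tuned_pct - base_pct
    relative_pct = absolute_points / base_pct * 100
    return absolute_points, relative_pct


absolute, relative = improvement(61.0, 68.0)
print(f"Absolute gain: {absolute:.1f} percentage points")
print(f"Relative gain: {relative:.1f}%")
```

The figure quoted in this section is the absolute one.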
Example Usage
Python
prompt = """
### Instruction:
Write a Python function to reverse a string.
### Input:
### Response:
"""
Example Output
def reverse_string(text):
    return text[::-1]
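To reuse the prompt layout from the example with other instructions, a small helper can assemble it. This is a minimal sketch: `build_prompt` is a hypothetical name, and it assumes the optional input text goes on its own line under `### Input:`, which the card does not show explicitly.

```python
def build_prompt(instruction: str, input_text: str = "") -> str:
    """Assemble the Instruction / Input / Response prompt used in the example."""
    lines = ["", "### Instruction:", instruction.strip(), "### Input:"]
    if input_text.strip():
        # Only emit an input line when there is something to pass in.
        lines.append(input_text.strip())
    lines.extend(["### Response:", ""])
    return "\n".join(lines)


prompt = build_prompt("Write a Python function to reverse a string.")
print(prompt)
```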
Supported Tasks
- Python Code Generation
- Algorithm Design
- Data Structures
- Debugging
- Code Refactoring
- Coding Interview Questions
- Competitive Programming
- Function Completion
GGUF Usage
Compatible with:
- Ollama
- LM Studio
- llama.cpp
Ollama
Create a file named Modelfile containing:
FROM ./VCoder.Q8_0.gguf
Build:
ollama create vcoder -f Modelfile
Run:
ollama run vcoder
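The GGUF file can also be loaded from Python through llama-cpp-python. The sketch below assumes the `llama-cpp-python` package is installed and that the quantized file in the `varuneshv/VCoder-GGUF` repository is named `qwen2.5-coder-3b-instruct.Q8_0.gguf`; adjust the filename if the repository lists a different one. The helper name `ask_vcoder` is illustrative.

```python
REPO_ID = "varuneshv/VCoder-GGUF"
# Assumed filename of the Q8_0 quantization inside the repository.
GGUF_FILENAME = "qwen2.5-coder-3b-instruct.Q8_0.gguf"


def ask_vcoder(question: str) -> str:
    """Send one user message to the model and return the reply text."""
    # Imported lazily so this module loads even without llama-cpp-python.
    from llama_cpp import Llama

    # Downloads the GGUF file from the Hugging Face Hub on first use.
    llm = Llama.from_pretrained(repo_id=REPO_ID, filename=GGUF_FILENAME)
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}]
    )
    return response["choices"][0]["message"]["content"]
```

Calling `ask_vcoder("Write a Python function to reverse a string.")` loads the model and returns the generated answer; for repeated calls it would be better to create the `Llama` object once and reuse it.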
Training Dataset
Dataset used:
Python Code Instructions 15K
The dataset contains instruction-response pairs focused on Python programming tasks including:
- Function generation
- Data manipulation
- Algorithms
- Debugging
- Problem solving
Limitations
- Primarily optimized for Python.
- Benchmark performed on a subset of HumanEval tasks.
- May generate incorrect code for highly specialized domains.
- Should not be used as the sole source of production-critical code.
Acknowledgements
- Qwen Team for Qwen2.5-Coder
- Unsloth for efficient fine-tuning
- Hugging Face
- OpenAI HumanEval Benchmark
Citation
@misc{vcoder2026,
  title={VCoder: Python Code Generation Model},
  author={Varunesh V and Prawin R K and Sarguru N},
  year={2026},
  note={Fine-tuned from Qwen2.5-Coder-3B-Instruct}
}
GitHub: https://github.com/varunesh-v
Mail: [email protected]