aeb56 committed on
Commit a951334 · 1 Parent(s): 7a80ad4

Initial commit: LoRA model merger

Files changed (5)
  1. .gitignore +69 -0
  2. Dockerfile +46 -0
  3. README.md +59 -6
  4. app.py +410 -0
  5. requirements.txt +12 -0
.gitignore ADDED
@@ -0,0 +1,69 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+
+ # Virtual environments
+ venv/
+ ENV/
+ env/
+ .venv
+
+ # Models and cache
+ models/
+ merged_model/
+ cache/
+ *.bin
+ *.safetensors
+ *.gguf
+ *.pth
+ *.pt
+
+ # Hugging Face
+ .cache/
+ huggingface/
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Logs
+ *.log
+ logs/
+
+ # Environment variables
+ .env
+ .env.local
+
+ # Jupyter
+ .ipynb_checkpoints/
+
+ # Temporary files
+ tmp/
+ temp/
+ *.tmp
+
Dockerfile ADDED
@@ -0,0 +1,46 @@
+ FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
+
+ # Set environment variables
+ ENV DEBIAN_FRONTEND=noninteractive
+ ENV PYTHONUNBUFFERED=1
+ ENV CUDA_HOME=/usr/local/cuda
+ ENV PATH="${CUDA_HOME}/bin:${PATH}"
+ ENV LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     python3.10 \
+     python3-pip \
+     git \
+     git-lfs \
+     wget \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Upgrade pip
+ RUN pip3 install --upgrade pip
+
+ # Set working directory
+ WORKDIR /app
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip3 install --no-cache-dir -r requirements.txt
+
+ # Copy application files
+ COPY . .
+
+ # Create directories for models
+ RUN mkdir -p /app/models /app/merged_model
+
+ # Expose port for Gradio
+ EXPOSE 7860
+
+ # Set HuggingFace cache directory
+ ENV HF_HOME=/app/cache
+ ENV TRANSFORMERS_CACHE=/app/cache
+
+ # Run the application
+ CMD ["python3", "app.py"]
+
README.md CHANGED
@@ -1,11 +1,64 @@
  ---
- title: Fnmodel
- emoji: 📉
- colorFrom: red
- colorTo: pink
  sdk: docker
  pinned: false
- license: unknown
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: LoRA Model Merger
+ emoji: 🔗
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
  pinned: false
+ license: apache-2.0
+ app_port: 7860
  ---

+ # 🔗 LoRA Model Merger
+
+ A Hugging Face Space for merging fine-tuned LoRA adapters with their base model.
+
+ ## Overview
+
+ This Space provides an easy-to-use interface for merging LoRA (Low-Rank Adaptation) adapters into the base model they were fine-tuned from. It is configured specifically for:
+
+ - **Base Model:** `moonshotai/Kimi-Linear-48B-A3B-Instruct`
+ - **LoRA Adapters:** `Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned`
+
+ ## Features
+
+ ✅ **Easy Model Merging** - Simple UI to merge LoRA adapters with the base model
+ ✅ **Built-in Testing** - Test your merged model with custom prompts
+ ✅ **Hub Integration** - Upload merged models directly to the Hugging Face Hub
+ ✅ **GPU Optimized** - Designed for a 4x L40S GPU setup
+
+ ## Usage
+
+ 1. **Merge Models**: Provide your Hugging Face token and click "Start Merge Process"
+ 2. **Test Inference**: Test the merged model with sample prompts
+ 3. **Upload to Hub**: Optionally upload the merged model to your Hugging Face account
+
+ ## Requirements
+
+ - **Hardware:** 4x NVIDIA L40S GPUs (or equivalent with ~192GB total VRAM)
+ - **Software:** Docker, CUDA 12.1+
+ - **Access:** A valid Hugging Face token with access to both the base model and the LoRA adapters
+
+ ## Technical Details
+
+ The merge process (see the sketch below):
+ 1. Downloads the base model (~48B parameters)
+ 2. Loads the LoRA adapter weights
+ 3. Merges the adapters into the base model using PEFT
+ 4. Saves the unified model for inference
+
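+ A minimal sketch of this flow, condensed from the Space's `app.py` (the local output path here is illustrative):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ BASE = "moonshotai/Kimi-Linear-48B-A3B-Instruct"
+ LORA = "Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned"
+
+ tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     BASE, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
+ )
+ model = PeftModel.from_pretrained(model, LORA)  # attach the LoRA adapters
+ model = model.merge_and_unload()                # fold adapter weights into the base weights
+ model.save_pretrained("merged_model", safe_serialization=True, max_shard_size="5GB")
+ tokenizer.save_pretrained("merged_model")
+ ```
+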
+ ## Notes
+
+ - The merge can take 10-30 minutes depending on download speed and hardware
+ - The merged model is approximately the same size as the base model and can then be loaded like any standalone checkpoint (see the sketch below)
+ - Ensure you have access rights to both the base and LoRA models
+
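+ A minimal loading sketch for the saved checkpoint (the path matches the Space's `OUTPUT_DIR`; the prompt and generation settings are illustrative):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ merged_dir = "/app/merged_model"  # OUTPUT_DIR used by this Space
+ tokenizer = AutoTokenizer.from_pretrained(merged_dir, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     merged_dir, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
+ )
+
+ inputs = tokenizer("Hello, how are you today?", return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     out = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
+ print(tokenizer.decode(out[0], skip_special_tokens=True))
+ ```
+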
+ ## Support
+
+ For issues or questions:
+ - [PEFT Documentation](https://huggingface.co/docs/peft)
+ - [Transformers Documentation](https://huggingface.co/docs/transformers)
+
+ ---
+
+ Built with ❤️ using Transformers, PEFT, and Gradio
app.py ADDED
@@ -0,0 +1,410 @@
+ import os
+ import torch
+ import gradio as gr
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel, PeftConfig
+ import gc
+ from huggingface_hub import login, snapshot_download
+ import logging
+ from datetime import datetime
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+ logger = logging.getLogger(__name__)
+
+ # Constants
+ BASE_MODEL_NAME = "moonshotai/Kimi-Linear-48B-A3B-Instruct"
+ LORA_MODEL_NAME = "Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned"
+ OUTPUT_DIR = "/app/merged_model"
+
+ class ModelMerger:
+     def __init__(self):
+         self.base_model = None
+         self.tokenizer = None
+         self.merged_model = None
+
+     def clear_memory(self):
+         """Clear GPU memory"""
+         if self.base_model is not None:
+             del self.base_model
+         if self.merged_model is not None:
+             del self.merged_model
+         # Reset the attributes so later `is None` checks don't raise AttributeError
+         self.base_model = None
+         self.merged_model = None
+         gc.collect()
+         torch.cuda.empty_cache()
+
+     def login_huggingface(self, token):
+         """Login to Hugging Face"""
+         try:
+             login(token=token)
+             logger.info("Successfully logged in to Hugging Face")
+             return "✅ Successfully logged in to Hugging Face"
+         except Exception as e:
+             logger.error(f"Login failed: {str(e)}")
+             return f"❌ Login failed: {str(e)}"
+
+     def merge_models(self, hf_token, progress=gr.Progress()):
+         """Merge LoRA adapters with base model"""
+         try:
+             # Login to HF
+             if hf_token:
+                 progress(0.05, desc="Logging in to Hugging Face...")
+                 login(token=hf_token)
+                 logger.info("Logged in to Hugging Face")
+
+             # Clear any existing models from memory
+             progress(0.1, desc="Clearing GPU memory...")
+             self.clear_memory()
+
+             # Load tokenizer
+             progress(0.15, desc="Loading tokenizer...")
+             logger.info("Loading tokenizer...")
+             self.tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_NAME, trust_remote_code=True)
+
+             # Load base model
+             progress(0.25, desc="Loading base model (this may take several minutes)...")
+             logger.info(f"Loading base model: {BASE_MODEL_NAME}")
+             self.base_model = AutoModelForCausalLM.from_pretrained(
+                 BASE_MODEL_NAME,
+                 torch_dtype=torch.bfloat16,
+                 device_map="auto",
+                 trust_remote_code=True,
+                 low_cpu_mem_usage=True,
+             )
+             logger.info("Base model loaded successfully")
+
+             # Load LoRA configuration
+             progress(0.50, desc="Loading LoRA adapters...")
+             logger.info(f"Loading LoRA adapters from: {LORA_MODEL_NAME}")
+
+             # Merge LoRA weights
+             self.merged_model = PeftModel.from_pretrained(
+                 self.base_model,
+                 LORA_MODEL_NAME,
+                 torch_dtype=torch.bfloat16,
+             )
+             logger.info("LoRA adapters loaded successfully")
+
+             progress(0.70, desc="Merging LoRA weights with base model...")
+             logger.info("Merging LoRA weights...")
+             self.merged_model = self.merged_model.merge_and_unload()
+             logger.info("Models merged successfully")
+
+             # Save merged model
+             progress(0.85, desc="Saving merged model...")
+             logger.info(f"Saving merged model to: {OUTPUT_DIR}")
+             os.makedirs(OUTPUT_DIR, exist_ok=True)
+
+             self.merged_model.save_pretrained(
+                 OUTPUT_DIR,
+                 safe_serialization=True,
+                 max_shard_size="5GB"
+             )
+             self.tokenizer.save_pretrained(OUTPUT_DIR)
+
+             progress(1.0, desc="Complete!")
+             logger.info("Merge completed successfully")
+
+             # Get model info
+             total_params = sum(p.numel() for p in self.merged_model.parameters())
+             trainable_params = sum(p.numel() for p in self.merged_model.parameters() if p.requires_grad)
+
+             result_message = f"""
+ ✅ **Merge Completed Successfully!**
+
+ **Model Information:**
+ - Base Model: `{BASE_MODEL_NAME}`
+ - LoRA Adapters: `{LORA_MODEL_NAME}`
+ - Output Directory: `{OUTPUT_DIR}`
+ - Total Parameters: {total_params:,}
+ - Trainable Parameters: {trainable_params:,}
+ - Timestamp: {datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
+
+ **Next Steps:**
+ 1. The merged model is saved in the container at `/app/merged_model`
+ 2. You can now test the model using the inference tab
+ 3. To upload to Hugging Face, use the upload section
+ """
+
+             return result_message
+
+         except Exception as e:
+             logger.error(f"Error during merge: {str(e)}", exc_info=True)
+             self.clear_memory()
+             return f"❌ **Error during merge:**\n\n{str(e)}\n\nPlease check the logs for more details."
+
+     def test_inference(self, prompt, max_length, temperature, top_p, progress=gr.Progress()):
+         """Test the merged model with a prompt"""
+         try:
+             if self.merged_model is None:
+                 return "❌ Please merge the models first before testing inference."
+
+             progress(0.3, desc="Tokenizing input...")
+             inputs = self.tokenizer(prompt, return_tensors="pt").to(self.merged_model.device)
+
+             progress(0.5, desc="Generating response...")
+             with torch.no_grad():
+                 outputs = self.merged_model.generate(
+                     **inputs,
+                     max_length=max_length,
+                     temperature=temperature,
+                     top_p=top_p,
+                     do_sample=True,
+                     pad_token_id=self.tokenizer.eos_token_id,
+                 )
+
+             progress(0.9, desc="Decoding output...")
+             response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+             progress(1.0, desc="Complete!")
+             return response
+
+         except Exception as e:
+             logger.error(f"Error during inference: {str(e)}", exc_info=True)
+             return f"❌ **Error during inference:**\n\n{str(e)}"
+
+     def upload_to_hub(self, repo_name, hf_token, private, progress=gr.Progress()):
+         """Upload merged model to Hugging Face Hub"""
+         try:
+             if self.merged_model is None:
+                 return "❌ Please merge the models first before uploading."
+
+             if not repo_name:
+                 return "❌ Please provide a repository name."
+
+             if not hf_token:
+                 return "❌ Please provide a Hugging Face token."
+
+             progress(0.1, desc="Logging in...")
+             login(token=hf_token)
+
+             progress(0.3, desc="Uploading model to Hugging Face Hub...")
+             logger.info(f"Uploading to: {repo_name}")
+
+             self.merged_model.push_to_hub(
+                 repo_name,
+                 private=private,
+                 safe_serialization=True,
+                 max_shard_size="5GB"
+             )
+
+             progress(0.8, desc="Uploading tokenizer...")
+             self.tokenizer.push_to_hub(repo_name, private=private)
+
+             progress(1.0, desc="Complete!")
+             logger.info("Upload completed successfully")
+
+             repo_url = f"https://huggingface.co/{repo_name}"
+             return f"✅ **Successfully uploaded to Hugging Face Hub!**\n\nRepository: [{repo_name}]({repo_url})"
+
+         except Exception as e:
+             logger.error(f"Error during upload: {str(e)}", exc_info=True)
+             return f"❌ **Error during upload:**\n\n{str(e)}"
+
+ # Initialize merger
+ merger = ModelMerger()
+
+ # Create Gradio interface
+ with gr.Blocks(theme=gr.themes.Soft(), title="LoRA Model Merger") as demo:
+     gr.Markdown("""
+ # 🔗 LoRA Model Merger
+
+ Merge your fine-tuned LoRA adapters with the base model for the **Kimi-Linear-48B-A3B-Instruct** model.
+
+ **Models:**
+ - **Base Model:** `moonshotai/Kimi-Linear-48B-A3B-Instruct`
+ - **LoRA Adapters:** `Optivise/kimi-linear-48b-a3b-instruct-qlora-fine-tuned`
+
+ **Hardware:** Running on 4xL40S GPUs
+     """)
+
+     with gr.Tabs():
+         # Tab 1: Merge Models
+         with gr.Tab("🔄 Merge Models"):
+             gr.Markdown("""
+ ### Step 1: Merge LoRA Adapters with Base Model
+
+ This process will:
+ 1. Download the base model and LoRA adapters
+ 2. Merge the LoRA weights into the base model
+ 3. Save the merged model for inference
+
+ ⚠️ **Note:** This process may take 10-30 minutes depending on model size and network speed.
+             """)
+
+             with gr.Row():
+                 hf_token_merge = gr.Textbox(
+                     label="Hugging Face Token",
+                     placeholder="hf_...",
+                     type="password",
+                     info="Required for accessing private models or avoiding rate limits"
+                 )
+
+             merge_button = gr.Button("🚀 Start Merge Process", variant="primary", size="lg")
+             merge_output = gr.Markdown(label="Merge Status")
+
+             merge_button.click(
+                 fn=merger.merge_models,
+                 inputs=[hf_token_merge],
+                 outputs=merge_output
+             )
+
+         # Tab 2: Test Inference
+         with gr.Tab("🧪 Test Inference"):
+             gr.Markdown("""
+ ### Step 2: Test the Merged Model
+
+ Test the merged model with custom prompts to verify it's working correctly.
+             """)
+
+             with gr.Row():
+                 with gr.Column():
+                     test_prompt = gr.Textbox(
+                         label="Test Prompt",
+                         placeholder="Enter your test prompt here...",
+                         lines=5,
+                         value="Hello, how are you today?"
+                     )
+
+                     with gr.Row():
+                         max_length = gr.Slider(
+                             minimum=50,
+                             maximum=2048,
+                             value=512,
+                             step=1,
+                             label="Max Length"
+                         )
+                         temperature = gr.Slider(
+                             minimum=0.1,
+                             maximum=2.0,
+                             value=0.7,
+                             step=0.1,
+                             label="Temperature"
+                         )
+                         top_p = gr.Slider(
+                             minimum=0.1,
+                             maximum=1.0,
+                             value=0.9,
+                             step=0.05,
+                             label="Top P"
+                         )
+
+                     test_button = gr.Button("🎯 Generate", variant="primary")
+
+                 with gr.Column():
+                     test_output = gr.Textbox(
+                         label="Model Output",
+                         lines=15,
+                         interactive=False
+                     )
+
+             test_button.click(
+                 fn=merger.test_inference,
+                 inputs=[test_prompt, max_length, temperature, top_p],
+                 outputs=test_output
+             )
+
+         # Tab 3: Upload to Hub
+         with gr.Tab("☁️ Upload to Hub"):
+             gr.Markdown("""
+ ### Step 3: Upload Merged Model to Hugging Face Hub
+
+ Upload your merged model to Hugging Face Hub for easy sharing and deployment.
+             """)
+
+             with gr.Row():
+                 with gr.Column():
+                     repo_name = gr.Textbox(
+                         label="Repository Name",
+                         placeholder="username/model-name",
+                         info="Format: username/model-name"
+                     )
+                     hf_token_upload = gr.Textbox(
+                         label="Hugging Face Token (with write access)",
+                         placeholder="hf_...",
+                         type="password",
+                         info="Token must have write permissions"
+                     )
+                     private_repo = gr.Checkbox(
+                         label="Private Repository",
+                         value=True,
+                         info="Keep the model private"
+                     )
+                     upload_button = gr.Button("📤 Upload to Hub", variant="primary", size="lg")
+
+                 with gr.Column():
+                     upload_output = gr.Markdown(label="Upload Status")
+
+             upload_button.click(
+                 fn=merger.upload_to_hub,
+                 inputs=[repo_name, hf_token_upload, private_repo],
+                 outputs=upload_output
+             )
+
+         # Tab 4: Info & Help
+         with gr.Tab("ℹ️ Info & Help"):
+             gr.Markdown("""
+ ## About This Space
+
+ This Space allows you to merge LoRA (Low-Rank Adaptation) fine-tuned models with their base models.
+
+ ### What is LoRA Merging?
+
+ LoRA is a parameter-efficient fine-tuning technique that adds small adapter layers to a pretrained model.
+ To use the fine-tuned model without the PEFT library overhead, you can merge these adapters back into
+ the base model, creating a single unified model.
+
+ ### Process Overview
+
+ 1. **Merge:** Combines the LoRA adapters with the base model
+ 2. **Test:** Verify the merged model works correctly with inference
+ 3. **Upload:** Share your merged model on Hugging Face Hub
+
+ ### Hardware Requirements
+
+ - **Current Setup:** 4x NVIDIA L40S GPUs (48GB VRAM each)
+ - **Model Size:** ~48B parameters
+ - **Memory Usage:** ~96-120GB VRAM during merge
+
+ ### Tips
+
+ - The merge process can take 10-30 minutes
+ - Make sure you have a valid Hugging Face token with appropriate permissions
+ - Test the model thoroughly before uploading to Hub
+ - Consider keeping the uploaded model private initially
+
+ ### Troubleshooting
+
+ **Out of Memory Errors:**
+ - The model is very large (48B parameters)
+ - Try restarting the Space to clear memory
+
+ **Authentication Errors:**
+ - Ensure your HF token has read access to the base model
+ - For private models, token must have appropriate permissions
+
+ **Slow Download/Upload:**
+ - Large models take time to transfer
+ - Network speed affects download/upload times
+
+ ### Support
+
+ For issues or questions, please check:
+ - [PEFT Documentation](https://huggingface.co/docs/peft)
+ - [Transformers Documentation](https://huggingface.co/docs/transformers)
+             """)
+
+     gr.Markdown("""
+ ---
+ **Note:** This Space requires significant computational resources. Ensure you have appropriate GPU allocation.
+     """)
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.queue(max_size=5)
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,
+         show_error=True
+     )
+
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ torch>=2.1.0
+ transformers>=4.40.0
+ peft>=0.10.0
+ accelerate>=0.27.0
+ bitsandbytes>=0.42.0
+ gradio>=4.19.0
+ huggingface-hub>=0.20.0
+ sentencepiece>=0.1.99
+ protobuf>=3.20.0
+ safetensors>=0.4.0
+ scipy>=1.10.0
+