aeb56 committed
Commit · 69cd0c5
1 Parent(s): 3e60f36
Disable chat/inference, focus on evaluation only
README.md
CHANGED
Removed with this commit (the previous README described a "High-performance inference Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct..."): the old title, the vLLM-based feature list and chat quick start, the "Why ..." section ("vLLM is a high-throughput and memory-efficient inference engine"), the [vLLM Documentation](https://docs.vllm.ai/) support link, the old "Powered by ..." footer, and this inference API example:

```bash
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```
The updated README:

---
title: Kimi 48B Fine-tuned - Evaluation
emoji: 📊
colorFrom: purple
colorTo: blue
sdk: docker
[...]
app_port: 7860
suggested_hardware: l4x4
---

# 📊 Kimi Linear 48B A3B Instruct - Evaluation

Model evaluation Space for the fine-tuned Kimi-Linear-48B-A3B-Instruct model. **Chat/inference functionality is currently disabled** - this Space focuses on running benchmarks and evaluations only.

## Model Information

- **Base Model:** [moonshotai/Kimi-Linear-48B-A3B-Instruct](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- **Parameters:** 48 Billion
- **Fine-tuning:** QLoRA on attention layers
- **Evaluation Framework:** LM Evaluation Harness

## Features

📊 **Model Evaluation**
- LM Evaluation Harness integration
- Multiple benchmark support (ARC-Challenge, TruthfulQA, Winogrande)
- Automated testing and reporting
- Results saved for analysis

⚡ **High-Performance**
- Multi-GPU model loading (see the loading sketch after this list)
- Optimized memory distribution
- bfloat16 precision
- Supports 48B parameter models

⚙️ **Easy to Use**
- Simple Gradio interface
- One-click model loading
- Select benchmarks via checkboxes
- Real-time progress updates
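The multi-GPU, bfloat16 loading listed above is most likely the standard `transformers`/`accelerate` pattern. A minimal sketch, assuming the fine-tuned checkpoint loads directly via `AutoModelForCausalLM` and needs `trust_remote_code` (both are assumptions; only the repo id comes from this README):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# device_map="auto" lets Accelerate shard the 48B model across all visible GPUs;
# bfloat16 halves memory versus float32 while keeping a wide exponent range.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # assumption: the base model ships custom modeling code
)
print(model.hf_device_map)  # shows how the layers were placed across the GPUs
```

With `device_map="auto"`, Accelerate spreads the layers across however many GPUs the Space exposes (the frontmatter suggests `l4x4`).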
## Usage

### Quick Start

1. **Load Model**
   - Click the "🚀 Load Model" button in the Controls tab
   - Wait 5-10 minutes for model initialization
   - The model will be distributed across available GPUs
   - Look for "✅ Model loaded successfully"

2. **Run Evaluation**
   - Go to the "📊 Evaluation" tab
   - Select benchmarks to run (ARC-Challenge, TruthfulQA, Winogrande)
   - Click "🚀 Start Evaluation"
   - Wait 30-60 minutes for results
   - Results will be displayed and saved to `/tmp/eval_results_[timestamp]/`

3. **View Results**
   - Evaluation results include metrics for each benchmark
   - Results are automatically formatted and displayed
   - Full results JSON files are saved for detailed analysis (see the inspection sketch after this list)
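The exact layout inside `/tmp/eval_results_[timestamp]/` is not shown in this diff, so the globbing below is an assumption; the nested `results` dict keyed by task name is the usual LM Evaluation Harness output format. A quick way to inspect a finished run from a terminal in the Space:

```python
import glob
import json

# Assumption: each run writes one or more JSON files under /tmp/eval_results_<timestamp>/
latest = sorted(glob.glob("/tmp/eval_results_*"))[-1]

for path in sorted(glob.glob(f"{latest}/*.json")):
    with open(path) as f:
        report = json.load(f)
    # Harness output is keyed by task, then "<metric>,<filter>" (e.g. "acc,none")
    for task, metrics in report.get("results", {}).items():
        scores = {k: round(v, 4) for k, v in metrics.items() if isinstance(v, (int, float))}
        print(task, scores)
```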
## Why LM Evaluation Harness?

The LM Evaluation Harness is a standard framework for evaluating language models:
- **Standardized:** Consistent benchmarks across models
- **Comprehensive:** Wide variety of tasks and metrics
- **Reproducible:** Deterministic evaluation results
- **Trusted:** Used by major research organizations

## Hardware Requirements

[...]

- **Target Modules:** q_proj, k_proj, v_proj, o_proj
- **Training:** Attention layers only (a QLoRA configuration sketch follows)
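The QLoRA setup those two lines describe (4-bit base weights, LoRA adapters on the attention projections only) typically looks like the sketch below; the rank, alpha, and dropout values are placeholders, since the actual training hyperparameters are not part of this diff:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-Linear-48B-A3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,  # assumption: the base model ships custom modeling code
)

# LoRA adapters only on the attention projections listed above
lora_config = LoraConfig(
    r=16,                 # placeholder rank
    lora_alpha=32,        # placeholder scaling
    lora_dropout=0.05,    # placeholder dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```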
### Benchmark Details

**ARC-Challenge**
- AI2 Reasoning Challenge (challenge set)
- 1,172 multiple-choice science questions
- Tests complex reasoning and knowledge
- Metrics: accuracy, accuracy_norm

**TruthfulQA**
- Tests the model's truthfulness
- Multiple-choice format (mc2)
- Evaluates factual correctness
- Metrics: accuracy, bleu, rouge

**Winogrande**
- Common-sense reasoning
- Pronoun resolution tasks
- 1,267 test questions
- Metrics: accuracy

## Support & Resources

- [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness)
- [Model Page](https://huggingface.co/optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune)
- [Base Model Page](https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct)
- [Transformers Documentation](https://huggingface.co/docs/transformers)

---

**Powered by LM Evaluation Harness** 📊 | Built with ❤️
app.py
CHANGED
Changed regions in app.py: the model-loaded status message in `ChatBot`, the `gr.Blocks` title (previously `"Kimi 48B Fine-tuned"`) and the Markdown header, the Instructions text in the Controls tab, the previously active `# Tab 2: Chat` block with its `send`/`clear` controls, and the live chat event handlers (`respond`, `msg.submit`, `send.click`, `clear.click`). The updated code, hunk by hunk:
@@ -65,7 +65,7 @@ class ChatBot:
            else:
                device_info = ""

            yield f"✅ **Model loaded successfully!**{device_info}\n\nYou can now use the Evaluation tab."

        except Exception as e:
            self.loaded = False

@@ -220,11 +220,13 @@
bot = ChatBot()

# UI with Tabs
with gr.Blocks(theme=gr.themes.Soft(), title="Kimi 48B Fine-tuned - Evaluation") as demo:
    gr.Markdown("""
    # 📊 Kimi Linear 48B A3B - Evaluation

    **Model:** `optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune`

    **This Space is configured for model evaluation only. Chat/inference is disabled.**
    """)

    # Show GPU info

@@ -244,11 +246,14 @@
        gr.Markdown("""
        ### ℹ️ Instructions
        1. **Click "Load Model"** - Takes 5-10 minutes
        2. **Use Evaluation tab** - To run benchmarks

        **Note:** Chat/inference functionality is currently disabled. This Space focuses on model evaluation only.
        """)

    # Tab 2: Chat - DISABLED
    # Uncomment this section to re-enable chat functionality
    """
    with gr.Tab("💬 Chat"):
        with gr.Row():
            with gr.Column(scale=1):

@@ -272,6 +277,7 @@
                send = gr.Button("Send", variant="primary", scale=1)

            clear = gr.Button("Clear Chat")
    """

    # Tab 3: Evaluation
    with gr.Tab("📊 Evaluation"):
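The body of the Evaluation tab (where `tasks`, `eval_btn`, and `eval_results` are defined) falls outside the hunks shown here. A plausible minimal sketch, with the variable names taken from the event wiring below and everything else assumed:

```python
import gradio as gr

# Sketch only: in the Space this sits inside the existing gr.Blocks() context.
with gr.Blocks() as demo:
    with gr.Tab("📊 Evaluation"):
        # Benchmark selection; the chosen names are passed to bot.run_evaluation
        tasks = gr.CheckboxGroup(
            choices=["ARC-Challenge", "TruthfulQA", "Winogrande"],
            value=["ARC-Challenge"],
            label="Benchmarks",
        )
        eval_btn = gr.Button("🚀 Start Evaluation", variant="primary")
        eval_results = gr.Markdown("Results will appear here.")
```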
@@ -319,6 +325,9 @@
    # Events
    load_btn.click(bot.load_model, outputs=status)

    # Chat event handlers - DISABLED
    # Uncomment these lines to re-enable chat functionality
    """
    def respond(message, history, system, max_tok, temp, top):
        bot_message = bot.chat(message, history, system, max_tok, temp, top)
        history.append((message, bot_message))

@@ -327,7 +336,9 @@
    msg.submit(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
    send.click(respond, [msg, chatbot, system_prompt, max_tokens, temperature, top_p], [chatbot, msg])
    clear.click(lambda: None, None, chatbot)
    """

    # Evaluation event handler
    eval_btn.click(bot.run_evaluation, inputs=tasks, outputs=eval_results)
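`bot.run_evaluation` itself is also not part of this diff. A hypothetical sketch of the shape the wiring implies (checkbox selections in, a Markdown string out), assuming the harness's `simple_evaluate` API and a guessed mapping from UI labels to task ids:

```python
import json
import os
import time

import lm_eval

# Assumed mapping from the UI checkbox labels to LM Evaluation Harness task ids
TASK_MAP = {
    "ARC-Challenge": "arc_challenge",
    "TruthfulQA": "truthfulqa_mc2",
    "Winogrande": "winogrande",
}

def run_evaluation(selected):
    """Illustrative stand-in for bot.run_evaluation; not the Space's actual code."""
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=optiviseapp/kimi-linear-48b-a3b-instruct-fine-tune,dtype=bfloat16",
        tasks=[TASK_MAP[name] for name in selected],
        batch_size="auto",
    )

    # Persist the raw per-task metrics, mirroring the /tmp/eval_results_[timestamp]/ path in the README
    out_dir = f"/tmp/eval_results_{int(time.time())}"
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "results.json"), "w") as f:
        json.dump(results["results"], f, indent=2, default=str)

    # Format a Markdown summary for the eval_results component
    lines = [f"**{task}**: {metrics}" for task, metrics in results["results"].items()]
    return "\n\n".join(lines)
```

In the real Space this is a method on `ChatBot` and may stream progress updates rather than returning once.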

if __name__ == "__main__":