Introduction
MiniCPM-MoE-8x2B is a decoder-only, transformer-based generative language model.
It adopts a Mixture-of-Experts (MoE) architecture with 8 experts per layer, of which 2 are activated for each token.
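To illustrate what "activates 2 of 8 experts per token" means, below is a minimal, self-contained sketch of top-2 expert routing. The layer sizes and expert definitions are hypothetical and do not correspond to the actual MiniCPM-MoE-8x2B implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Toy MoE layer: a gate picks the top-2 of 8 experts for every token."""
    def __init__(self, hidden_size=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(hidden_size, 4 * hidden_size),
                          nn.SiLU(),
                          nn.Linear(4 * hidden_size, hidden_size))
            for _ in range(num_experts)
        ])

    def forward(self, x):                          # x: (tokens, hidden_size)
        scores = self.gate(x)                      # (tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):                # each token runs only its 2 selected experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 10 tokens through the toy layer.
# y = Top2MoE()(torch.randn(10, 64))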
Usage
This version of the model has been instruction-tuned but has not gone through RLHF or other alignment methods. The chat template is applied automatically.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

torch.manual_seed(0)

path = 'openbmb/MiniCPM-MoE-8x2B'
tokenizer = AutoTokenizer.from_pretrained(path)
# trust_remote_code is required because chat() is provided by the repo's custom modeling code.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

response, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?", temperature=0.8, top_p=0.8)
print(response)
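Because the chat template is applied automatically, the same call can be made through the generic transformers API instead of the custom chat() helper. This is a minimal sketch, assuming the tokenizer ships a chat template; it reuses the tokenizer and model loaded above.

messages = [{"role": "user", "content": "山东省最高的山是哪座山, 它比黄山高还是矮?差距多少?"}]
# Format the conversation with the model's chat template and tokenize it.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True,
                            temperature=0.8, top_p=0.8)
# Decode only the newly generated tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))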
Note
- You can also run inference with vLLM (>= 0.4.1), which is compatible with this repo and offers much higher inference throughput; see the sketch after this list.
- The model weights in this repo are stored in bfloat16. Manual conversion is needed for other dtypes.
- For more details, please refer to our GitHub repo.
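A minimal vLLM sketch, untested here; it assumes vLLM >= 0.4.1 and formats the prompt with the HF tokenizer's chat template by hand, since vLLM's generate() takes plain prompt strings.

from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

path = 'openbmb/MiniCPM-MoE-8x2B'
tokenizer = AutoTokenizer.from_pretrained(path)

# Format a single-turn chat prompt with the model's chat template.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "山东省最高的山是哪座山?"}],
    tokenize=False,
    add_generation_prompt=True,
)

llm = LLM(model=path, trust_remote_code=True, dtype='bfloat16')
sampling_params = SamplingParams(temperature=0.8, top_p=0.8, max_tokens=512)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)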
Statement
- As a language model, MiniCPM-MoE-8x2B generates content by learning from a vast amount of text; however, it does not possess the ability to comprehend or express personal opinions or value judgments.
- Any content generated by MiniCPM-MoE-8x2B does not represent the viewpoints or positions of the model developers.
- Therefore, when using content generated by MiniCPM-MoE-8x2B, users should take full responsibility for evaluating and verifying it on their own.