Paper: [Llama 2: Open Foundation and Fine-Tuned Chat Models](https://arxiv.org/abs/2307.09288)
StableMed is a 3-billion-parameter decoder-only language model fine-tuned for one epoch on 18k rows of medical questions.
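The training code itself is not published here; as a rough illustration, the sketch below shows what a one-epoch causal-LM fine-tune of the stabilityai/stablelm-3b-4e1t base checkpoint on a question–answer set could look like. The data file (`medical_qa.jsonl`), prompt template, and hyperparameters are illustrative assumptions, not the actual recipe.

```python
# Illustrative sketch only: the dataset path, prompt format, and
# hyperparameters are assumptions, not StableMed's actual recipe.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "stabilityai/stablelm-3b-4e1t"  # base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # collator needs a pad token
model = AutoModelForCausalLM.from_pretrained(base, trust_remote_code=True)

# Hypothetical JSONL file with "question" and "answer" fields.
dataset = load_dataset("json", data_files="medical_qa.jsonl")["train"]

def tokenize(batch):
    texts = [f"Question: {q}\nAnswer: {a}" for q, a in zip(batch["question"], batch["answer"])]
    return tokenizer(texts, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="stablemed-ft",
        num_train_epochs=1,             # single pass over the data, as described above
        per_device_train_batch_size=2,  # assumed value
        learning_rate=2e-5,             # assumed value
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```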
Get started generating text with StableMed by using the following code snippet:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("cxllin/StableMed-3b")
model = AutoModelForCausalLM.from_pretrained(
    "cxllin/StableMed-3b",
    trust_remote_code=True,
    torch_dtype="auto",
)
model.cuda()

# Sample up to 64 new tokens with nucleus sampling
inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to("cuda")
tokens = model.generate(
    **inputs,
    max_new_tokens=64,
    temperature=0.75,
    top_p=0.95,
    do_sample=True,
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
```
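For quick experiments, the same generation can also go through the `pipeline` helper. This is a minimal sketch; the medical prompt is only illustrative:

```python
from transformers import pipeline

# Convenience wrapper that loads the model and tokenizer together
pipe = pipeline(
    "text-generation",
    model="cxllin/StableMed-3b",
    trust_remote_code=True,
    device_map="auto",
    torch_dtype="auto",
)
out = pipe(
    "What are common symptoms of iron-deficiency anemia?",  # illustrative prompt
    max_new_tokens=64,
    do_sample=True,
    temperature=0.75,
    top_p=0.95,
)
print(out[0]["generated_text"])
```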
The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture, with the following configuration:
| Parameters | Hidden Size | Layers | Heads | Sequence Length |
|---|---|---|---|---|
| 2,795,443,200 | 2560 | 32 | 32 | 4096 |
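As a sanity check, the parameter count in the table can be reproduced from these hyperparameters. The vocabulary size (50,304) and feed-forward width (6,912) below are assumed from the base StableLM-3B-4E1T configuration, along with an untied output projection, a gated (SwiGLU-style) MLP, and biased LayerNorms; only the table columns above come from this card:

```python
# Back-of-the-envelope parameter count from the table above.
# vocab and ffn are assumed from the base StableLM-3B-4E1T config.
hidden, layers, vocab, ffn = 2560, 32, 50_304, 6_912

embed = vocab * hidden              # input embedding
lm_head = vocab * hidden            # output projection (assumed untied)
attn = 4 * hidden * hidden          # q, k, v, o projections per layer
mlp = 3 * hidden * ffn              # gated (SwiGLU-style) MLP per layer
norms = 2 * 2 * hidden              # two LayerNorms (weight + bias) per layer
final_norm = 2 * hidden             # final LayerNorm (weight + bias)

total = embed + lm_head + layers * (attn + mlp + norms) + final_norm
print(f"{total:,}")  # 2,795,443,200 -- matches the table
```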