Korean Language Model

A ~100M-parameter Korean language model trained from scratch on Korean Wikipedia.

Quick Start

from transformers import AutoModelForCausalLM
import sentencepiece as spm
import torch

# Load model (trust_remote_code required for custom architecture)
model = AutoModelForCausalLM.from_pretrained(
    "shopkeeper/korean-wiki-120125",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model = model.to("cuda").eval()  # eval mode disables dropout during generation

# Load tokenizer
from huggingface_hub import hf_hub_download
tokenizer_path = hf_hub_download(repo_id="shopkeeper/korean-wiki-120125", filename="ko_wiki_tokenizer.model")
tokenizer = spm.SentencePieceProcessor()
tokenizer.load(tokenizer_path)

# Generate text
prompt = "ํ•œ๊ตญ์˜ ์—ญ์‚ฌ๋Š”"
input_ids = torch.tensor([[tokenizer.bos_id()] + tokenizer.encode(prompt)]).to("cuda")

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=100,
        temperature=0.8,
        top_p=0.9,
        do_sample=True,
        pad_token_id=tokenizer.eos_id(),
    )

print(tokenizer.decode(output[0].tolist()))

Example Output

ํ•œ๊ตญ์˜ ์—ญ์‚ฌ๋Š” ์‚ผ๊ตญ์‹œ๋Œ€๋กœ๋ถ€ํ„ฐ ์‹œ์ž‘ํ•œ๋‹ค. ๊ณ ๊ตฌ๋ ค์™€ ๋ฐฑ์ œ, ์‹ ๋ผ์˜ ์„ธ ์™•๊ตญ, ์‹ ๋ผ์˜ ์‹ ๋ผ๋Š” ์‚ผ๊ตญํ†ต์ผ์„ ์ด๋ฃฉํ•˜์˜€๋‹ค. ์ดํ›„ ์‹ ๋ผ๋Š” ๊ณ ๊ตฌ๋ ค์˜ ๊ณ„์Šน๊ตญ๊ฐ€์ธ ๊ณ ๊ตฌ๋ ค๋ฅผ ๊ณ„์Šนํ•˜์—ฌ ์‚ผ๊ตญํ†ต์ผ์„ ๋‹ฌ์„ฑํ•˜์˜€๋‹ค. ๋‚จ๋ถ๊ตญ ์‹œ๋Œ€๊ฐ€ ๋˜์ž ๋‚จ๋ถ๊ตญ ์‹œ๋Œ€๋ฅผ ๊ฑฐ์น˜๋ฉฐ, ์‚ผ๊ตญ ์‹œ๋Œ€๊ฐ€ ๋งˆ๋ฌด๋ฆฌ ๋˜์—ˆ๋‹ค. ์‹ ๋ผ์˜ ์‚ผ๊ตญํ†ต์ผ์€ ์‹ ๋ผ๋กœ๋ถ€ํ„ฐ ์‹ ๋ผ๋กœ ๋„˜์–ด๊ฐ€๋Š” ๊ณผ์ •์„ ํ†ตํ•ด ์ด๋ฃจ์–ด์ ธ์•ผ ํ•  ์ •์น˜์ , ๊ฒฝ์ œ์ , ์‚ฌํšŒ์ , ๋ฌธํ™”์ , ์ •์น˜์ , ๋ฌธํ™”์ , ๋ฌธํ™”์  ๋ณ€ํ™”๋ฅผ ๊ฒช๊ฒŒ ๋˜์—ˆ๋‹ค. ์‹ ๋ผ์˜ ์‚ผ๊ตญํ†ต์ผ์€ ์‚ผ๊ตญ ํ†ต์ผ์˜ ์ฒซ ๋ฒˆ์งธ ๋‹จ๊ณ„์ด๋‹ค. ์‹ ๋ผ. ์‹ 

Model Details

Property              Value
Parameters            ~100M
Hidden dimension      768
Layers                12
Attention heads       12
Max sequence length   1024
Vocabulary size       32000
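
These figures are roughly consistent with the quoted parameter count. The back-of-the-envelope estimate below is a sketch, not code from the repository; in particular, the SwiGLU feed-forward inner dimension is not listed in this card, so the value used here (2048) is an assumption.

# Rough parameter estimate from the table above.
# ASSUMPTION: SwiGLU inner dimension d_ff = 2048 (not stated in this card).
d_model, n_layers, vocab, d_ff = 768, 12, 32000, 2048

embedding = vocab * d_model             # tied with the output head, so counted once
attn = 4 * d_model * d_model            # Q, K, V, O projections, no bias
mlp = 3 * d_model * d_ff                # SwiGLU uses three weight matrices
norms = (2 * n_layers + 1) * d_model    # two RMSNorms per layer plus a final norm

total = embedding + n_layers * (attn + mlp) + norms
print(f"~{total / 1e6:.0f}M parameters")  # ~110M here; a smaller d_ff lands closer to 100M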

Architecture Features

  • RMSNorm (pre-normalization; see the sketch after this list)
  • Rotary Position Embeddings (RoPE)
  • SwiGLU activation
  • Multi-Head Attention with KV Cache
  • No bias terms
  • Weight tying (embedding & output)
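
As a reference for two of the components above, here is an illustrative PyTorch sketch of generic RMSNorm and SwiGLU modules. It is not code from this repository; the actual implementation may differ in details such as the epsilon value or layer naming.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # Root-mean-square normalization: rescale by the RMS of the features,
    # with a learned gain and no bias (matching the "no bias terms" choice above).
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    # Gated feed-forward block: silu(W1 x) * (W3 x), projected back down by W2.
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # output projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))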

Training

  • Data: Korean Wikipedia (120125 dump)
  • Epochs: 3
  • Precision: bfloat16
  • Optimizer: AdamW with cosine LR schedule
  • Tokenizer: SentencePiece (32k vocab; see the training sketch below)
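
As an illustration of the tokenizer step, the snippet below shows how a 32k-vocabulary SentencePiece model is typically trained; it is not taken from this repository, and the input filename, character_coverage, and model_type are assumptions rather than values from this card.

import sentencepiece as spm

# Illustrative tokenizer training call (not the repository's actual script).
# "ko_wiki.txt" is a placeholder for the raw Korean Wikipedia text.
spm.SentencePieceTrainer.train(
    input="ko_wiki.txt",
    model_prefix="ko_wiki_tokenizer",
    vocab_size=32000,
    character_coverage=0.9995,  # assumption: a typical setting for CJK-heavy corpora
    model_type="unigram",       # assumption: the card does not state the model type
)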

License

MIT
