script_1
A merged super-vocabulary built from 11 tokenizers.
Vocab size: 165022
Participating tokenizers:

- flexitok/bpe_script_Arab_16000
- flexitok/bpe_script_CmJp_16000
- flexitok/bpe_ltr_ell_Grek_8000_v2
- flexitok/bpe_ltr_fw_edu_32000_v2
- flexitok/bpe_ltr_hun_Latn_8000_v2
- flexitok/bpe_ltr_rus_Cyrl_16000_v2
- flexitok/bpe_ltr_tur_Latn_8000_v2
- flexitok/bpe_script_Germ_32000
- flexitok/bpe_script_Roma_32000
- flexitok/bpe_script_SEAS_16000
- flexitok/bpe_script_Slav_16000

Files:

- `super_vocab.json`: merged vocabulary mapping token string → super index
- `config.yaml`: model config with `vocab_size`
- `participating_tokenizers.json`: list of tokenizer names included
- `<tokenizer>_super_mapping.json`: per-tokenizer index → super index mapping
- `<tokenizer>_vocab.json`: per-tokenizer vocabulary
- `<tokenizer>_info.json` / `.yaml`: tokenizer metadata
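The relationship between the per-tokenizer vocabularies, the merged vocabulary, and the per-tokenizer mappings can be sketched as follows. This is a minimal illustration, not the code that produced this repository. The function names (`build_super_vocab`, `build_super_mapping`, `load_super_mapping`, `to_super_ids`), the toy tokenizers, and the merge order are all hypothetical. The sketch also assumes each mapping file is a flat JSON object of the form `{"<local index>": <super index>}`; check the actual files before relying on that layout.

```python
import json
from pathlib import Path


def build_super_vocab(vocabs):
    """Merge {tokenizer name: {token: local index}} into {token: super index}.

    A token shared by several tokenizers gets a single super index.
    The merge order used here (sorted names, then local index) is an assumption.
    """
    super_vocab = {}
    for name in sorted(vocabs):
        # Walk tokens in local-index order so the result is deterministic.
        for token, _ in sorted(vocabs[name].items(), key=lambda kv: kv[1]):
            if token not in super_vocab:
                super_vocab[token] = len(super_vocab)
    return super_vocab


def build_super_mapping(vocab, super_vocab):
    """Return {local index: super index} for one tokenizer."""
    return {local: super_vocab[token] for token, local in vocab.items()}


def load_super_mapping(path):
    """Read a mapping file, assuming a flat {"local": super} JSON object.

    JSON object keys are always strings, so they are converted back to int.
    """
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {int(k): int(v) for k, v in raw.items()}


def to_super_ids(local_ids, mapping):
    """Translate token ids from one tokenizer into super-vocabulary ids."""
    return [mapping[i] for i in local_ids]


# Toy data standing in for two <tokenizer>_vocab.json files (hypothetical).
toy_vocabs = {
    "tok_a": {"the": 0, "cat": 1, "##s": 2},
    "tok_b": {"the": 0, "кот": 1},
}
toy_super = build_super_vocab(toy_vocabs)
toy_map_b = build_super_mapping(toy_vocabs["tok_b"], toy_super)

# "the" is shared, so the merged vocabulary has 4 entries rather than 5.
print(len(toy_super))
print(to_super_ids([1, 0], toy_map_b))
```

If the numeric suffixes in the tokenizer names denote per-tokenizer vocabulary sizes (an assumption), they sum to 200,000, so the reported vocab size of 165022 would be consistent with tokens shared between tokenizers being stored only once.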