Scripts to pre-train a multi-tokenizer and evaluate its performance.

aya-multitokenizer/multi-tokernizer-llm

multi-tokernizer-llm

Install dependencies

pip install -r requirements.txt

Preprocess data

python utils/preprocess.py --load_tokenizer

Pass --load_tokenizer to load a previously saved tokenizer from its files. Omit the flag if you have not saved a tokenizer yet.
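
The README does not show how utils/preprocess.py handles --load_tokenizer internally. The following is a minimal, hypothetical Python sketch of the common pattern such a flag usually follows: an argparse store_true option that chooses between reading a saved tokenizer and building (then saving) a new one. The file name tokenizer.json, the helper names build_tokenizer, get_tokenizer and parse_args, and the whitespace word-level "tokenizer" are all placeholders, not the repository's actual code.

```python
import argparse
import json
from pathlib import Path

# Hypothetical save location; the real script's tokenizer files are not documented.
TOKENIZER_PATH = Path("tokenizer.json")


def build_tokenizer(texts):
    # Placeholder "training": assign an integer id to each new whitespace-separated word.
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def get_tokenizer(load_tokenizer, texts, path=TOKENIZER_PATH):
    if load_tokenizer:
        # Reuse the vocabulary saved by an earlier run.
        return json.loads(path.read_text(encoding="utf-8"))
    # Otherwise build a fresh vocabulary and save it for later runs.
    vocab = build_tokenizer(texts)
    path.write_text(json.dumps(vocab), encoding="utf-8")
    return vocab


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Preprocess data (hypothetical sketch).")
    # store_true: the flag defaults to False and becomes True when passed.
    parser.add_argument(
        "--load_tokenizer",
        action="store_true",
        help="load a previously saved tokenizer instead of building one",
    )
    return parser.parse_args(argv)
```

For example, a first run with get_tokenizer(False, texts) writes the vocabulary to disk, and a later run with get_tokenizer(True, texts) reads the same vocabulary back without rebuilding it.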