Remove chunk_word_count and chunk approximations #429

bbrowning · 2024-12-05T16:14:31Z

Now that we have the teacher models' Tokenizer, we can stop approximating chunk sizes using chunk_word_count, _num_tokens_from_words, _num_chars_from_tokens, etc. We can always express chunk sizes in tokens and never need to convert to and from "words".
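A rough sketch of what token-based chunking could look like, so the idea is concrete. Everything here is hypothetical and not taken from the codebase: the `Tokenizer` protocol, the `WhitespaceTokenizer` stand-in (used only so the example runs without downloading a model), and the `chunk_by_tokens` helper. The assumption is that the real tokenizer exposes `encode` and `decode` in roughly this shape.

```python
from typing import Dict, List, Protocol


class Tokenizer(Protocol):
    """Hypothetical minimal interface assumed for the teacher model's tokenizer."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, ids: List[int]) -> str: ...


class WhitespaceTokenizer:
    """Toy stand-in tokenizer: one token per whitespace-separated word.

    Only here to keep the sketch self-contained; a real tokenizer would
    split text into subword tokens instead.
    """

    def __init__(self) -> None:
        self._vocab: Dict[str, int] = {}
        self._words: List[str] = []

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self._vocab:
                self._vocab[word] = len(self._words)
                self._words.append(word)
            ids.append(self._vocab[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self._words[i] for i in ids)


def chunk_by_tokens(text: str, tokenizer: Tokenizer, max_tokens: int) -> List[str]:
    """Split text into chunks of at most max_tokens tokens each.

    The size limit is applied directly to token ids, so no word or
    character estimate is involved.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    ids = tokenizer.encode(text)
    return [
        tokenizer.decode(ids[start:start + max_tokens])
        for start in range(0, len(ids), max_tokens)
    ]


chunks = chunk_by_tokens("one two three four five", WhitespaceTokenizer(), 2)
print(chunks)
```

With a real subword tokenizer, decoding a slice of token ids may not reproduce the original text exactly at chunk boundaries, so an actual implementation would probably want to split on document structure first and use the token count only as the size check.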
