Currently a couple of the APIs talk in tokens, which is inconvenient. It would be nice if you could translate text into tokens and vice versa easily.
The rust_tokenizers crate has a function called from_file that instantiates the GPT-2 tokenizer from a couple of pretrained tokenizer files (a vocab file and a merges file). These files are available from Hugging Face's website.
There is also an example in rust_bert of constructing a gpt2 tokenizer. Ideally the tokenizer would be built lazily so users of the library don't need to pay for it unless they need the features.
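The lazy-construction idea above can be sketched with std's OnceLock, so the (relatively expensive) file load only happens if a caller actually asks for the tokenizer. The struct here is a stand-in for rust_tokenizers' Gpt2Tokenizer, whose from_file constructor and file paths are noted in comments as assumptions rather than used directly, to keep the sketch self-contained:

```rust
use std::sync::OnceLock;

// Stand-in for the real tokenizer. In the library this would be
// rust_tokenizers' Gpt2Tokenizer, constructed with something like
// Gpt2Tokenizer::from_file(vocab_path, merges_path, false) -- check
// the crate docs for the exact signature.
struct LazyGpt2Tokenizer {
    vocab_size: usize,
}

// Built at most once, and only if some caller actually needs it, so
// users who never touch token features pay nothing at startup.
fn tokenizer() -> &'static LazyGpt2Tokenizer {
    static TOKENIZER: OnceLock<LazyGpt2Tokenizer> = OnceLock::new();
    TOKENIZER.get_or_init(|| {
        // The expensive load (reading vocab + merges files) would go
        // here; GPT-2's vocab size is 50257.
        LazyGpt2Tokenizer { vocab_size: 50257 }
    })
}

fn main() {
    // Both calls return the same cached instance.
    let first = tokenizer() as *const LazyGpt2Tokenizer;
    let second = tokenizer() as *const LazyGpt2Tokenizer;
    assert_eq!(first, second);
    assert_eq!(tokenizer().vocab_size, 50257);
    println!("tokenizer initialised once");
}
```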
Where to use it
It looks like this will be most useful with the logit_bias feature, since the API requires you to send token numbers rather than actual strings. Since the example code is in Python, this is a bit of a barrier for Rust users.
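A small sketch of what the logit_bias workflow could look like once text-to-token translation exists. The API takes a map of token id (as a string) to a bias value in [-100, 100]; the two ids below are the well-known GPT-2 encoding of "Hello world" ([15496, 995]), hard-coded here only so the example runs without the tokenizer files. In real use they would come from the tokenizer's encode step:

```rust
use std::collections::HashMap;

// Build a logit_bias map from token ids. In real use the ids come
// from the GPT-2 tokenizer (e.g. tokenizer.encode("Hello world")).
fn logit_bias_for(token_ids: &[i64], bias: i32) -> HashMap<String, i32> {
    token_ids
        .iter()
        // The API expects the token id as a string key, with the
        // bias clamped to the allowed [-100, 100] range.
        .map(|id| (id.to_string(), bias.clamp(-100, 100)))
        .collect()
}

fn main() {
    // GPT-2 ids for "Hello world"; hard-coded for the sketch.
    let ids = [15496_i64, 995];
    // -100 effectively bans these tokens from being sampled.
    let bias = logit_bias_for(&ids, -100);
    assert_eq!(bias.get("15496"), Some(&-100));
    assert_eq!(bias.get("995"), Some(&-100));
    println!("{bias:?}");
}
```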