-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Pass model and tokenizer arguments when instantiating SentenceTransformer
#1103
Pass model and tokenizer arguments when instantiating SentenceTransformer
#1103
Conversation
Thanks for the PR. I will have a look after my vacation (after Aug. 23rd) |
@nreimers, I'm wondering if you have time to look at this PR now. |
@aphedges Sadly not yet, sorry for that. I'm currently work on a major release of new models. Once that is done, I will have a look the PRs and integrate new functions into SBERT |
@nreimers, understood! Thanks for the update! |
412c1c2
to
121fec1
Compare
@nreimers, sorry to keep bothering you about this, but I was hoping you could take a look at this PR. If there is anything I can do to make it easier to review (e.g. splitting it up into multiple PRs), please let me know. |
Sorry that I did not yet look into it. Could you write a bit more about the use-case? The use case is when you run SentenceTransformers in a multi-process setup to disable the fast tokenizer? The linked issue in tokenizer appears that this is still an issue, or? |
It's okay! I know you have other priorities when it comes to this library. Your recounting of the use case is correct. My specific use case is running SentenceTransformers in a multi-threaded setup (e.g. Flask). As far as I'm aware, the underlying problem will always be present because of differences between the CPython and Rust runtimes. This allows me to disable the fast tokenizer by just passing in a parameter. I specifically made this fix more general than needed to allow any model or tokenizer arguments to be passed when the model and tokenizer are loaded. There are probably other use cases for this PR that I am unaware of. |
@nreimers, just pinging this again. |
121fec1
to
94247f2
Compare
I have rebased this branch against |
Fixes #794.
The
Transformer
class already hasmodel_args
andtokenizer_args
arguments in its constructor to allow overriding when instantiating models and tokenizers from thetransformers
library. However, there is no way to set these from theSentenceTransformers
class, which is the one that many users would be instantiating instead. I have added arguments to theSentenceTransformers
constructor to allow these to be more easily used.My specific motivation in this case is to make disabling the fast tokenizer easier, but it should work for any arguments that users might want to override.