TypeError: expected np.ndarray (got Tensor) #1431
Comments
Ultimately, the problem here is that we modified the models for the upcoming version 1.10, and you're downloading the new models with the old code. You could use the dev branch, or download the version 1.9 models directly from HF (Hugging Face) if you're sure you need to do it manually.
"Crashing" how? Like with a bad connection? It doesn't crash when I run it. You also don't need to do any of that; just run the pipeline directly, and it should automatically download just the models you need for the right version.
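The mismatch described above (1.10 models loaded by 1.9 code) can be caught before anything is loaded. What follows is a minimal sketch. It assumes you know the installed stanza version (for example from `stanza.__version__`) and the version the model files were released for. The helper names `parse_version` and `versions_compatible` are hypothetical. The rule that only major.minor must agree is also an illustrative assumption, not stanza's own check.

```python
def parse_version(version: str) -> tuple:
    """Turn '1.9.2' into (1, 9, 2); stops at the first piece with no leading digits."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def versions_compatible(code_version: str, model_version: str) -> bool:
    """Hypothetical rule: code and models must come from the same major.minor release."""
    return parse_version(code_version)[:2] == parse_version(model_version)[:2]


if __name__ == "__main__":
    print(versions_compatible("1.9.2", "1.9.0"))   # True: same 1.9 series
    print(versions_compatible("1.9.2", "1.10.0"))  # False: 1.10 models with 1.9 code
```

The versions are parsed into integer tuples because comparing them as plain strings goes wrong: `"1.10" < "1.9"` is True lexically.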
If I'm using the direct download: stanza.download('lt')
This is the log file:
I was testing your suggestion: nlp = stanza.Pipeline("lt", processors="tokenize,pos", tokenize_pretokenized=True)
That's pretty weird. If I use the GitHub repo's main branch (which is 1.9.2), the download successfully fetches a file with the following md5sum, which is the expected value:
I can switch back and forth between the main and dev branches, and it overwrites the old models when downloading again. At no point does it download a model with md5sum. Is it possible the download was interrupted and it got a corrupted file? At any rate, I suggest deleting those incorrect files and trying again.
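To tell a corrupted or interrupted download apart from a wrong model version, you can compare the file's MD5 digest with the expected md5sum. This is a minimal, standard-library-only sketch. The model path and expected digest in the `__main__` part are placeholders, not the values from this thread, and `md5sum`/`is_intact` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, expected_md5: str) -> bool:
    """True if the file's digest matches the expected one (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Placeholder values: substitute the real model path and its expected md5sum.
    model_path = "./stanza_resources/lt/pos/alksnis_nocharlm.pt"
    expected = "0123456789abcdef0123456789abcdef"
    if Path(model_path).exists():
        print("intact" if is_intact(model_path, expected) else "mismatch: delete and re-download")
```

A mismatch on a complete download points to a different model version. A mismatch on a truncated file points to an interrupted transfer.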
Describe the bug
I was trying to use the pretrained model https://huggingface.co/stanfordnlp/stanza-lt
Because of a lot of issues, such as stanza.download("lt") constantly crashing, I was forced to do it manually. So I installed and downloaded everything and used the following code, which triggers the bug:
```python
import stanza

config = {
    'processors': 'tokenize,pos',
    'lang': 'lt',
    'tokenize_model_path': './stanza_resources/lt/tokenize/alksnis.pt',
    'pos_model_path': './stanza_resources/lt/pos/alksnis_nocharlm.pt',
    'pos_pretrain_path': './stanza_resources/lt/pretrain/fasttextwiki.pt',
    'tokenize_pretokenized': True,
    'download_method': None
}
nlp = stanza.Pipeline(**config)  # initialize neural pipeline
doc = nlp("Kur einam mes su Knysliuku, didžiulė paslaptis")  # run annotation over a sentence
print(doc)
```
Expected behavior
The result should be obvious:
Environment (please complete the following information):
Additional context
At least it works after patching the code in stanza/models/pos/model.py, changing this line (around line 90):

```python
self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(torch.from_numpy(emb_matrix), freeze=True))
```

to:

```python
if type(emb_matrix) == torch.Tensor:
    self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(emb_matrix, freeze=True))
else:
    self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(torch.from_numpy(emb_matrix), freeze=True))
```
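One caveat with an exact-type test such as `type(emb_matrix) == torch.Tensor` is that it is False for Tensor subclasses such as `torch.nn.Parameter`. Such a value would fall through to `torch.from_numpy` and raise the same TypeError. The usual idiom is `isinstance(emb_matrix, torch.Tensor)`. The sketch below shows the difference in pure Python so it runs without PyTorch. Its `FakeTensor`, `FakeParameter` and `to_embedding_weights` are hypothetical stand-ins, not torch's API. In the real patch, the conversion branch would be `torch.from_numpy(emb_matrix)`.

```python
class FakeTensor:
    """Stand-in for torch.Tensor."""

    def __init__(self, data):
        self.data = list(data)


class FakeParameter(FakeTensor):
    """Stand-in for torch.nn.Parameter, which subclasses torch.Tensor."""


def to_embedding_weights(emb_matrix):
    """Accept a tensor-like object or a plain sequence (standing in for np.ndarray)."""
    if isinstance(emb_matrix, FakeTensor):
        # Already a tensor (or a subclass of one): pass it through unchanged.
        return emb_matrix
    # Otherwise convert it, as torch.from_numpy would for an np.ndarray.
    return FakeTensor(emb_matrix)


p = FakeParameter([0.1, 0.2])
print(type(p) == FakeTensor)         # False: the exact-type check misses the subclass
print(isinstance(p, FakeTensor))     # True
print(to_embedding_weights(p) is p)  # True: passed through without conversion
```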
I'm not sure which is the culprit: the library or the model.