Description
Describe the bug
I was trying to use the pretrained model https://huggingface.co/stanfordnlp/stanza-lt
stanza.download("lt") kept crashing, among other issues, so I had to download the model files manually. With everything installed and downloaded, the following code triggers the bug:
import stanza
config = {
    'processors': 'tokenize,pos',
    'lang': 'lt',
    'tokenize_model_path': './stanza_resources/lt/tokenize/alksnis.pt',
    'pos_model_path': './stanza_resources/lt/pos/alksnis_nocharlm.pt',
    'pos_pretrain_path': './stanza_resources/lt/pretrain/fasttextwiki.pt',
    'tokenize_pretokenized': True,
    'download_method': None
}
nlp = stanza.Pipeline(**config) # initialize neural pipeline
doc = nlp("Kur einam mes su Knysliuku, didžiulė paslaptis") # run annotation over a sentence
print(doc)
Expected behavior
The output should look like this:
[
[
{
"id": 1,
"text": "Kur",
"upos": "ADV",
"xpos": "prm.l.lrgin.",
"feats": "Degree=Pos|PronType=Int,Rel",
"misc": "",
"start_char": 0,
"end_char": 3
},
...
]
Environment (please complete the following information):
- OS: Windows 10
- Python 3.10.5
- stanza 1.9.2
- numpy 2.1.2
Additional context
It does work after patching stanza/models/pos/model.py. Around line 90, I changed
self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(torch.from_numpy(emb_matrix), freeze=True))
to
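if isinstance(emb_matrix, torch.Tensor):
    # emb_matrix is already a tensor, so use it directly
    self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(emb_matrix, freeze=True))
else:
    # emb_matrix is a NumPy array, so convert it first (original behavior)
    self.add_unsaved_module('pretrained_emb', nn.Embedding.from_pretrained(torch.from_numpy(emb_matrix), freeze=True))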
I'm not sure which is the culprit: the library or the model.
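For reference, the same idea as the patch above can probably be written without the branch, since torch.as_tensor accepts both NumPy arrays and existing tensors. This is only a minimal standalone sketch: build_pretrained_embedding is a hypothetical helper name and is not part of stanza, and the zero matrices stand in for the real pretrained embeddings.

```python
import numpy as np
import torch
from torch import nn


def build_pretrained_embedding(emb_matrix):
    """Build a frozen nn.Embedding from a NumPy array or a torch.Tensor.

    Hypothetical helper for illustration only; not part of stanza.
    """
    # torch.as_tensor converts a NumPy array to a tensor and passes an
    # existing tensor through unchanged, so no type check is needed.
    weights = torch.as_tensor(emb_matrix)
    return nn.Embedding.from_pretrained(weights, freeze=True)


# Placeholder 4x3 matrices standing in for the real embedding matrix.
from_numpy = build_pretrained_embedding(np.zeros((4, 3), dtype=np.float32))
from_tensor = build_pretrained_embedding(torch.zeros(4, 3))
```

Both calls return an embedding layer with frozen weights, regardless of which type the pretrain file happens to contain.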