Conversation

rapatel0


The fix appears to be to replace `self.config.tokenizer.model_dump()` with `self.config.tokenizer.dict()` in `wordllama/inference.py`. I expect this might also be relevant for `train.py`, but that should be verified upstream.

```python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from wordllama import WordLlama
   ...:
   ...: # Load pre-trained embeddings (truncate dimension to 64)
   ...: wl = WordLlama.load(config='l3_supercat',trunc_dim=64)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 4
      1 from wordllama import WordLlama
      3 # Load pre-trained embeddings (truncate dimension to 64)
----> 4 wl = WordLlama.load(config='l3_supercat',trunc_dim=64)

File ~/.pyenv/versions/anaconda3-2024.02-1/envs/threat/lib/python3.11/site-packages/wordllama/wordllama.py:303, in WordLlama.load(cls, config, weights_dir, cache_dir, binary, dim, trunc_dim,
 disable_download)
    300         embedding = embedding[:, 0:trunc_dim]
    302 logger.debug(f"Loading weights from: {weights_file_path}")
--> 303 return WordLlamaInference(embedding, config_obj, tokenizer, binary=binary)

File ~/.pyenv/versions/anaconda3-2024.02-1/envs/threat/lib/python3.11/site-packages/wordllama/inference.py:41, in WordLlamaInference.__init__(self, embedding, config, tokenizer, binary)
     39 self.config = config
     40 self.tokenizer = tokenizer
---> 41 self.tokenizer_kwargs = self.config.tokenizer.model_dump()
     43 # Default settings for all inference
     44 self.tokenizer.enable_padding()

AttributeError: 'TokenizerConfig' object has no attribute 'model_dump'
```
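
Concretely, the suggested change would look roughly like this (a sketch of the failing line at `inference.py:41`, assuming a pydantic v1 environment where `dict()` is the available serialization method):

```python
# Sketch of the suggested one-line change at wordllama/inference.py:41.
# pydantic v1 models expose .dict() but not .model_dump().
self.tokenizer_kwargs = self.config.tokenizer.dict()
# instead of:
# self.tokenizer_kwargs = self.config.tokenizer.model_dump()
```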
@dleemiller (Owner)

I think your package environment failed to resolve. Can you check if your pydantic version is 1.x?

`dict()` is a deprecated method in pydantic v2, and `model_dump()` is the current API. The deprecation warning shows that pydantic v3 will remove the `dict()` method, so switching to it would eventually become a breaking change and would create warnings for everyone using pydantic v2.

I'll think about it a little more this weekend and see if I can write a patch to support pydantic v1.
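
A minimal sketch of what such a patch could look like (illustrative only; the `dump_model` helper name is hypothetical and not part of wordllama): prefer `model_dump()` when it exists and fall back to the v1 `dict()` method.

```python
from typing import Any, Dict


def dump_model(model: Any) -> Dict[str, Any]:
    """Serialize a pydantic model to a dict under both pydantic v1 and v2."""
    # pydantic v2 exposes model_dump(); v1 only provides dict()
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()


# WordLlamaInference.__init__ could then use:
# self.tokenizer_kwargs = dump_model(self.config.tokenizer)
```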
