Conversation

rapatel0


The fix appears to be to replace `self.config.tokenizer.model_dump()` with `self.config.tokenizer.dict()` in `wordllama/inference.py`. I expect this might also be relevant for `train.py`, but that should be verified upstream.

```python
Python 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from wordllama import WordLlama
   ...:
   ...: # Load pre-trained embeddings (truncate dimension to 64)
   ...: wl = WordLlama.load(config='l3_supercat',trunc_dim=64)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 4
      1 from wordllama import WordLlama
      3 # Load pre-trained embeddings (truncate dimension to 64)
----> 4 wl = WordLlama.load(config='l3_supercat',trunc_dim=64)

File ~/.pyenv/versions/anaconda3-2024.02-1/envs/threat/lib/python3.11/site-packages/wordllama/wordllama.py:303, in WordLlama.load(cls, config, weights_dir, cache_dir, binary, dim, trunc_dim,
 disable_download)
    300         embedding = embedding[:, 0:trunc_dim]
    302 logger.debug(f"Loading weights from: {weights_file_path}")
--> 303 return WordLlamaInference(embedding, config_obj, tokenizer, binary=binary)

File ~/.pyenv/versions/anaconda3-2024.02-1/envs/threat/lib/python3.11/site-packages/wordllama/inference.py:41, in WordLlamaInference.__init__(self, embedding, config, tokenizer, binary)
     39 self.config = config
     40 self.tokenizer = tokenizer
---> 41 self.tokenizer_kwargs = self.config.tokenizer.model_dump()
     43 # Default settings for all inference
     44 self.tokenizer.enable_padding()

AttributeError: 'TokenizerConfig' object has no attribute 'model_dump'
```
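
Concretely, the suggested change would look roughly like this (a sketch of the failing line at `inference.py:41`, assuming a pydantic v1 environment where `dict()` is the available serialization method):

```python
# Sketch of the suggested one-line change at wordllama/inference.py:41.
# pydantic v1 models expose .dict() but not .model_dump().
self.tokenizer_kwargs = self.config.tokenizer.dict()
# instead of:
# self.tokenizer_kwargs = self.config.tokenizer.model_dump()
```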
@dleemiller (Owner)

I think your package environment failed to resolve. Can you check if your pydantic version is 1.x?

`dict()` is a deprecated method in pydantic v2, and `model_dump()` is the current API. The deprecation warning shows that pydantic v3 will remove the `dict()` method, so switching to it would eventually become a breaking change and would create warnings for everyone using pydantic v2.

I'll think about it a little more this weekend and see if I can write a patch to support pydantic v1.
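
A minimal sketch of what such a patch could look like (illustrative only; the `dump_model` helper name is hypothetical and not part of wordllama): prefer `model_dump()` when it exists and fall back to the v1 `dict()` method.

```python
from typing import Any, Dict


def dump_model(model: Any) -> Dict[str, Any]:
    """Serialize a pydantic model to a dict under both pydantic v1 and v2."""
    # pydantic v2 exposes model_dump(); v1 only provides dict()
    if hasattr(model, "model_dump"):
        return model.model_dump()
    return model.dict()


# WordLlamaInference.__init__ could then use:
# self.tokenizer_kwargs = dump_model(self.config.tokenizer)
```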
