AssertionError while running text_emojize.py #36

Closed
vidyap-xgboost opened this issue Jun 28, 2020 · 4 comments
Comments


vidyap-xgboost commented Jun 28, 2020

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-6c7dc2606552> in <module>()
      3 
      4 for i in flatten_list:
----> 5   deepmojify(i, top_n = 5)

1 frames
/content/torchMoji/torchmoji/sentence_tokenizer.py in tokenize_sentences(self, sentences, reset_stats, max_sentences)
    117         # may filter the sentences etc.
    118         if not self.uses_custom_wordgen and not self.ignore_sentences_with_only_custom:
--> 119             assert len(sentences) == next_insert
    120         else:
    121             # adjust based on actual tokens received

AssertionError: 


Hi,

The error above occurs every time I run text_emojize.py.

I am passing a list of more than 4,700 sentences and want the model to return the top 5 emojis for each one.

I modified this line of the code: st = SentenceTokenizer(vocabulary, 100)

What am I doing wrong? Is it because I gave too many sentences?
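
One way to check whether the list size or a specific row is responsible is to run the rows one at a time and record which ones raise. The sketch below is only an illustration: `find_failing_rows` is a hypothetical helper, not part of torchMoji, and `fn` stands in for whatever per-sentence function is being called (for example, a wrapper around the `deepmojify(i, top_n = 5)` call in the traceback).

```python
def find_failing_rows(rows, fn):
    """Call fn on each row and collect the rows that raise AssertionError.

    Hypothetical debugging helper; fn is any callable taking one row.
    Returns a list of (index, repr(row)) pairs so that hidden characters
    such as newlines or tabs are visible in the output.
    """
    failures = []
    for index, row in enumerate(rows):
        try:
            fn(row)
        except AssertionError:
            failures.append((index, repr(row)))
    return failures
```

If only a handful of indices come back, the input size is probably not the cause and the listed rows are worth inspecting directly.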

vidyap-xgboost changed the title from "AssertionError while running sentence_tokenizer.py" to "AssertionError while running text_emojize.py" on Jun 28, 2020
@vidyap-xgboost
Author

@thomwolf @hiepph Please help me understand this error!

@vidyap-xgboost
Author

@setu4993 any idea about this?

@setu4993

@vidyap-xgboost : Hmm, can't reproduce on a single test sentence... I tried the setup in this Colab from #32. I tried locally, though.

@vidyap-xgboost
Author

I checked the row at which text_emojize.py stops.
If a row contains \n, \t, or similar characters, the run fails for the whole dataset.

Please add handling that skips such rows or values in the list.

Thank you.
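
Until something like that exists in the library, a possible workaround is to clean the list before passing it on. This is a minimal sketch under the assumption that the problem rows are the ones containing newlines, tabs, or nothing but whitespace; `clean_sentences` is a hypothetical helper name, not a torchMoji function.

```python
import re


def clean_sentences(sentences):
    """Collapse whitespace runs to single spaces and drop unusable rows.

    Hypothetical pre-processing helper. Non-string values and rows that
    are empty after stripping are skipped rather than passed downstream.
    """
    cleaned = []
    for sentence in sentences:
        if not isinstance(sentence, str):
            continue  # skip None, NaN floats, and other non-text values
        normalized = re.sub(r"\s+", " ", sentence).strip()
        if normalized:
            cleaned.append(normalized)
    return cleaned
```

Note that dropping rows changes the list length, so if the results need to line up with the original data, keep the original indices alongside the cleaned sentences.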
