How to resolve "UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 566: invalid start byte" #14

Yahsaswi · 2017-05-09T18:48:36Z

I have some large text files that contain characters that are not valid UTF-8. I would like to ignore those characters and proceed with the Sent2Vec conversion. I get the error below; please help me fix it.
  File "kfold1.py", line 34, in <module>
    model = Sent2Vec(LineSentence(sent_file), model_file=input_file + '.model')
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 800, in __init__
    self.reset_sent_vec(sentences)
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 809, in reset_sent_vec
    for sent in sentences:
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/word2vec.py", line 1113, in __iter__
    yield utils.to_unicode(line).split()
  File "/Users/ypochampally/Documents/RESEARCH/workspace_latest/Wiki_Actors/src/om_TextClassification/utils.py", line 190, in any2unicode
    return unicode(text, encoding, errors=errors)
  File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 566: invalid start byte
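
The traceback shows the line being decoded as UTF-8 with strict error handling, so a single invalid byte (here 0xff) aborts the whole run. One possible workaround, sketched below, is to write a cleaned copy of the input first and pass that copy as `sent_file`. The helper name `write_clean_copy` and its parameters are hypothetical and not part of this repository; the sketch also assumes the files are meant to be UTF-8 and that silently dropping the offending bytes is acceptable.

```python
import io


def write_clean_copy(src_path, dst_path, encoding="utf-8", errors="ignore"):
    """Copy src_path to dst_path, dropping bytes that are invalid in `encoding`.

    Hypothetical helper: with errors="ignore" undecodable bytes are removed;
    errors="replace" would substitute U+FFFD for them instead.
    """
    # Read and write in binary mode so no implicit decoding happens.
    with io.open(src_path, "rb") as src, io.open(dst_path, "wb") as dst:
        for raw_line in src:
            # Decode leniently, then re-encode so the output is valid UTF-8.
            text = raw_line.decode(encoding, errors)
            dst.write(text.encode(encoding))
```

If the files are actually in another encoding (for example Latin-1), decoding with that encoding instead of dropping bytes would preserve the characters; which case applies cannot be determined from the traceback alone.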

sumehta · 2017-05-24T14:35:53Z

I get the same error when I try to load the model using:
model = Word2Vec.load_word2vec_format('test.txt.model', binary=True)
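
One possible cause of the error on load, offered as an assumption because the thread does not say how `test.txt.model` was written: the file may not be in the word2vec format at all. Files in that format, text or binary, start with an ASCII header line of the form `<vocab_size> <vector_size>`. The sketch below checks only that header; `looks_like_word2vec_format` is a hypothetical helper, not a function from this repository or from gensim.

```python
import io


def looks_like_word2vec_format(path):
    """Return True if the file starts with a '<vocab_size> <vector_size>' header.

    Hypothetical helper: it inspects only the first line and does not
    validate the vectors that follow.
    """
    with io.open(path, "rb") as handle:
        # Real headers are short, so cap the read at 100 bytes.
        header = handle.readline(100)
    parts = header.split()
    if len(parts) != 2:
        return False
    try:
        vocab_size = int(parts[0])
        vector_size = int(parts[1])
    except ValueError:
        return False
    return vocab_size > 0 and vector_size > 0
```

If the check returns False, the file was probably saved by some other mechanism (for instance a library-specific save method) and would need the matching load call rather than `load_word2vec_format`.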
