Hey, I'm having trouble getting the data to train the RNN. Specifically on this line: sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
If I open the file as 'rb' I get the error:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
and if I open it with 'r' I get:
sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
AttributeError: 'str' object has no attribute 'decode'
I'm not sure which is the basic idea here: training the NN with strings or with binary codes (I guess binary codes).
Thanks for your time!
Maybe your Python version is 3.x. The code below runs without error under Python 2.7:
with open('data/reddit-comments-2015-08.csv', 'rb') as f:
    reader = csv.reader(f, skipinitialspace=True)
    reader.next()
    # Split full comments into sentences
    sentences = itertools.chain(*[nltk.sent_tokenize(x[0].decode('utf-8').lower()) for x in reader])
    # Append SENTENCE_START and SENTENCE_END
    sentences = ["%s %s %s" % (sentence_start_token, x, sentence_end_token) for x in sentences]
Yes, you must remove this, but a couple of other changes are also required, so that the entire line becomes: with open('data/reddit-comments-2015-08.csv', 'rt', encoding="utf8") as f:
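For anyone on Python 3, here is a rough sketch of how the whole snippet might look with that open() call. It is not the original author's code. The function name load_sentences and the two token values are placeholders I made up, because the thread does not show how sentence_start_token and sentence_end_token are defined. The tokenizer is passed in as an argument so the sketch runs without NLTK data installed. Skipping empty rows is an extra safety check that is not in the original.

```python
import csv
import itertools

# Placeholder values (assumption): the thread does not show these definitions.
sentence_start_token = "SENTENCE_START"
sentence_end_token = "SENTENCE_END"


def load_sentences(path, sent_tokenize):
    """Read column 0 of a CSV file and return lower-cased, token-wrapped sentences.

    load_sentences is a hypothetical helper name; sent_tokenize is any callable
    that maps a string to a list of sentences (for example nltk.sent_tokenize).
    """
    # Text mode with an explicit encoding: in Python 3, csv.reader needs str, not bytes.
    with open(path, "rt", encoding="utf8", newline="") as f:
        reader = csv.reader(f, skipinitialspace=True)
        next(reader)  # skip the header row; replaces reader.next() from Python 2
        # Rows are already str here, so there is no .decode('utf-8') step.
        # Empty rows are skipped (an addition, not part of the original snippet).
        sentences = itertools.chain.from_iterable(
            sent_tokenize(row[0].lower()) for row in reader if row
        )
        # Wrap each sentence in the start and end tokens while the file is still open.
        return [
            "%s %s %s" % (sentence_start_token, s, sentence_end_token)
            for s in sentences
        ]
```

To match the original line, it could be called as load_sentences('data/reddit-comments-2015-08.csv', nltk.sent_tokenize), assuming nltk is imported and its sentence tokenizer data is installed. On the strings vs. bytes question: in Python 3 the csv module hands back str values, which is why .decode() fails there.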