
next word predictor #23

Open
arpit-dwivedi opened this issue Oct 8, 2020 · 3 comments

Comments

@arpit-dwivedi
Member

Add in the comments the links to the resources you found, and also add these things:

  1. Algorithms used for each given link
  2. Libraries used
  3. Approx lines of code.

At the end also conclude which one is better.

@madalatrivedh20
Collaborator

madalatrivedh20 commented Oct 8, 2020

link 1: https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
i) Algorithms used: RNN-LSTM
ii) Libraries used: nltk, numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, pad_sequences, to_categorical)
iii) Approx lines of code: 53
link 2 (same article, N-grams approach): https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
i) Algorithms used: N-grams
ii) Libraries used: nltk, collections
iii) Approx lines of code: 135

link 3: https://www.youtube.com/watch?v=35tu6XnRkH0
i) Algorithms used: RNN-LSTM
ii) Libraries used: numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, to_categorical)
iii) Approx lines of code: 50

Between the N-grams and RNN-LSTM approaches, RNN-LSTM is the better choice because it is a more advanced approach that uses a neural language model. Standard RNNs and other language models become less accurate as the gap between the context and the word to be predicted grows, but LSTM can tackle this long-term dependency problem because its memory cells remember the earlier context.
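
For reference, here is a minimal sketch of this RNN-LSTM approach using the Keras pieces listed above (Tokenizer, Embedding, LSTM, Dense, pad_sequences, to_categorical). The toy corpus and layer sizes are made up for illustration; the linked article's code differs in detail:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# Toy corpus; a real predictor trains on a much larger text.
corpus = ["the cat sat on the mat", "the dog sat on the log"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)
vocab_size = len(tokenizer.word_index) + 1  # +1 for the padding index 0

# Turn every sentence prefix into a (context -> next word) training pair.
sequences = []
for line in corpus:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")
X = sequences[:, :-1]                                          # context words
y = to_categorical(sequences[:, -1], num_classes=vocab_size)   # next word

model = Sequential([
    Embedding(vocab_size, 10),
    LSTM(50),
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(X, y, epochs=200, verbose=0)

# Predict the most probable next word for a seed phrase.
seed = tokenizer.texts_to_sequences(["the cat sat"])[0]
seed = pad_sequences([seed], maxlen=max_len - 1, padding="pre")
print(tokenizer.index_word[int(np.argmax(model.predict(seed, verbose=0)))])
```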

@jaydhamale7

link 1: https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
A)
i) Approach: RNN-LSTM
ii) Libraries used: nltk
iii) Approx lines of code: 80

B)
i) Algorithms used: RNN-LSTM
ii) Libraries used: numpy, keras (Tokenizer, LSTM, Dense, Embedding, Sequential, load_model, to_categorical, pad_sequences)
iii) Approx lines of code: 65

@john-2424

A next word predictor is, or can be, an application of Natural Language Processing (NLP), where we can use different NLP algorithms or techniques and Recurrent Neural Networks (RNN) to predict the next word in a sentence. There are many such algorithms; some of them are n-grams, Kneser-Ney smoothing, k-Nearest Neighbours, RNN-LSTM, and RNN-GRU.

  1. https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
    i) n-gram -> An n-gram model is a statistical (probabilistic) language model that predicts the next item in a sequence from the previous n-1 items (see the first sketch after this list).
    ii) re, nltk.tokenize, word_tokenize, collections
    iii) 132

  2. https://rpubs.com/teez/196761
    i) Kneser-Ney smoothing -> A smoothing technique for probabilistic n-gram language models that discounts observed counts and interpolates with lower-order distributions (see the second sketch after this list)
    ii) re, nltk, nltk.corpus, nltk.data, nltk.stem.wordnet, collections, numpy, math
    iii) 564

  3. https://pudding.cool/2019/04/text-prediction/
    i) k Nearest Neighbours -> A training-based model that predicts from the closest stored examples
    ii) sklearn.neighbors, sklearn.model_selection, sklearn.datasets, numpy, matplotlib.pyplot
    iii) 100

  4. https://towardsdatascience.com/exploring-the-next-word-predictor-5e22aeb85d8f
    i) RNN-LSTM and RNN-GRU -> Training- and memory-based RNN models that use gating to retain context
    ii) Keras, Tensorflow - Dense, Activation, Dropout, Input, LSTM, Reshape, Lambda, RepeatVector, Adam, to_categorical
    iii) 100
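
First sketch (for item 1): a minimal bigram-counting predictor using only collections, as listed above. The toy text is made up here, and the article's actual code is longer and more complete:

```python
from collections import Counter, defaultdict

# Toy corpus; a real model would be trained on a large text.
text = "the cat sat on the mat the cat lay on the rug".split()

# Count bigram successors: P(next | prev) ~ count(prev, next) / count(prev)
model = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    model[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent word observed after `word`."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen twice after "the")
```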
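
Second sketch (for item 2): NLTK's lm module ships an interpolated Kneser-Ney model, so a minimal Python illustration might look like the following. This is only an assumed toy setup, not the code behind the linked tutorial:

```python
from nltk.lm import KneserNeyInterpolated
from nltk.lm.preprocessing import padded_everygram_pipeline

# Toy tokenised corpus; a real model needs far more text.
corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "the", "log"]]

# Build padded n-grams up to trigrams and fit the smoothed model.
train, vocab = padded_everygram_pipeline(3, corpus)
lm = KneserNeyInterpolated(order=3)
lm.fit(train, vocab)

# Predict by scoring every known word against a two-word context.
context = ["sat", "on"]
best = max(lm.vocab, key=lambda w: lm.score(w, context))
print(best)  # expected: "the"
```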

After studying these models, RNN-LSTM and RNN-GRU appear to be the best models to implement, due to less code and higher accuracy. Between the two, RNN-LSTM is the better choice here, for the following reason:
RNN-GRU uses fewer training parameters, and therefore uses less memory, executes faster, and trains faster than RNN-LSTM, whereas RNN-LSTM is more accurate on datasets with longer sequences. In short, if the sequence is long or accuracy is critical, go for RNN-LSTM; if memory consumption and speed matter more, go for RNN-GRU. The sketch below illustrates the parameter difference.
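
A quick way to see the parameter difference is to build two otherwise identical Keras models. The vocabulary, sequence length, and layer sizes here are arbitrary, chosen only for illustration:

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, GRU, Dense

def build(rnn_layer):
    """Build a model; callers vary only the recurrent layer."""
    return Sequential([
        Input(shape=(10,)),                 # sequence length 10 (arbitrary)
        Embedding(5000, 64),                # vocab 5000, embedding dim 64 (arbitrary)
        rnn_layer(128),                     # the only difference: LSTM vs GRU
        Dense(5000, activation="softmax"),
    ])

# LSTM carries four sets of gate weights, GRU only three,
# so the GRU model reports noticeably fewer trainable parameters.
print("LSTM params:", build(LSTM).count_params())
print("GRU params: ", build(GRU).count_params())
```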
