N-gram language modelling for word sequences using Laplace or Kneser-Ney smoothing.

agromanou/next-word-predictor

Next word predictor in Python

This project implements an n-gram language model for word sequences, using Laplace or Kneser-Ney smoothing.

Getting started

These instructions will get a copy of the project up and running on your local machine for development and testing purposes.

  1. Create a directory in Assignment/ and add all the data files to it.
  2. Install the dependencies listed in requirements.txt.
  3. Run app.py (for modelling, cross-validation and testing) or predictive_keyboard.py (for the predictive keyboard functionality).

Installing

To run the code in your local environment, make sure you have Python 3 or later and the required Python libraries installed. To install the libraries, run the following in your console:

pip install -r requirements.txt

Train the model

In order to train the language model you will need to run the following command:

python Assignment1/app.py

Run the predictive keyboard

To run the predictive keyboard, use the following command:

python Assignment1/predictive_keyboard.py

Structure

The project consists of the following main classes:

Data Fetcher

This class is responsible for fetching and handling the dataset(s). It loads the data, splits it into subsets according to the user's needs and creates the folds for the cross-validation process.
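
The fold creation for cross-validation might look roughly like the sketch below. This is an illustration only: the function name `k_fold_splits` and its parameters are assumptions and are not taken from the repository.

```python
import random


def k_fold_splits(sentences, k=5, seed=0):
    """Yield (train, held_out) pairs for k-fold cross-validation.

    Hypothetical helper: the name and signature are assumptions.
    """
    items = list(sentences)
    # Shuffle with a fixed seed so the folds are reproducible.
    random.Random(seed).shuffle(items)
    # Fold i takes every k-th item starting at offset i,
    # so fold sizes differ by at most one.
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        yield train, held_out
```

Each sentence ends up in the held-out part exactly once across the k iterations, which is what lets every sentence contribute to both training and evaluation.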

Preprocessor

This class is responsible for pre-processing the text data. It contains methods for sentence splitting, tokenization and n-gram creation.
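
As a rough illustration of these three steps, a minimal sketch could look as follows. The function names, the regular expressions and the `<s>` / `</s>` padding symbols are assumptions made for this example; the repository's own pre-processing may differ.

```python
import re


def split_sentences(text):
    # Naive rule (an assumption): a sentence ends at '.', '!' or '?'
    # followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def make_ngrams(tokens, n, start="<s>", end="</s>"):
    # Pad with n-1 start symbols and one end symbol so that the first
    # word and the end of the sentence are also modelled.
    padded = [start] * (n - 1) + list(tokens) + [end]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]
```

For example, `make_ngrams(tokenize("I like tea."), 2)` produces the bigrams `("<s>", "i")`, `("i", "like")`, `("like", "tea")` and `("tea", "</s>")`.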

Model

This class is responsible for fitting the language model to the given data. It calculates the probabilities of the language model, applies smoothing (available implementations: Laplace or Kneser-Ney), runs linear interpolation on the n-gram probabilities and predicts the next word for a given sequence.
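
To make the Laplace and interpolation steps concrete, here is a minimal bigram sketch. It is not the repository's implementation: the class name `LaplaceBigramModel`, its methods, and the parameters `alpha` and `lam` are all assumptions, and Kneser-Ney smoothing is deliberately left out to keep the example short.

```python
from collections import Counter


class LaplaceBigramModel:
    """Hypothetical bigram model with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0, lam=0.7):
        self.alpha = alpha                # smoothing constant (1.0 = classic Laplace)
        self.lam = lam                    # interpolation weight on the bigram estimate
        self.context_counts = Counter()   # c(prev)
        self.bigram_counts = Counter()    # c(prev, word)
        self.word_counts = Counter()      # c(word), for the unigram estimate
        self.vocab = set()                # words that can be predicted

    def fit(self, sentences):
        for tokens in sentences:
            padded = ["<s>"] + list(tokens) + ["</s>"]
            for prev, word in zip(padded, padded[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, word)] += 1
                self.word_counts[word] += 1
                self.vocab.add(word)
        return self

    def bigram_prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + alpha) / (c(prev) + alpha * |V|)
        num = self.bigram_counts[(prev, word)] + self.alpha
        den = self.context_counts[prev] + self.alpha * len(self.vocab)
        return num / den

    def unigram_prob(self, word):
        # P(word) = (c(word) + alpha) / (N + alpha * |V|)
        total = sum(self.word_counts.values())
        return (self.word_counts[word] + self.alpha) / (total + self.alpha * len(self.vocab))

    def prob(self, prev, word):
        # Linear interpolation of the bigram and unigram estimates.
        return self.lam * self.bigram_prob(prev, word) + (1 - self.lam) * self.unigram_prob(word)

    def predict_next(self, prev):
        # Most probable next word; ties are broken alphabetically.
        return max(sorted(self.vocab), key=lambda w: self.prob(prev, w))
```

Because every count gets `alpha` added, unseen bigrams receive a small non-zero probability, and for any context the probabilities over the vocabulary still sum to one.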

Evaluation

This class is responsible for the evaluation of a given model. It calculates the cross-entropy and perplexity of the model.
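
For reference, cross-entropy is the average negative log2 probability the model assigns to each token, and perplexity is 2 raised to that value. A minimal sketch, assuming a bigram-style probability function `prob(prev, word)` (a hypothetical interface, not the repository's API):

```python
import math


def cross_entropy(prob, sentences):
    """Average negative log2 probability per predicted token (bits per token)."""
    total_log_prob = 0.0
    n_tokens = 0
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            # prob must be strictly positive, which smoothing guarantees.
            total_log_prob += math.log2(prob(prev, word))
            n_tokens += 1
    return -total_log_prob / n_tokens


def perplexity(prob, sentences):
    # Perplexity is 2 to the power of the cross-entropy.
    return 2 ** cross_entropy(prob, sentences)
```

A model that assigns every token a probability of 0.25 has a cross-entropy of 2 bits and a perplexity of 4; lower values on held-out data indicate a better model.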
