GitHub - umberto-sonnino/cooccurence

Homework II: n-gram and co-occurrence estimation

Given a text corpus, there are two objectives:

Calculate a list of n-grams and their frequencies in the corpus: n-gram list
Create, for a given n-gram, a sorted list of similar words: co-occurrence list

You can work with any of the following corpora:
1. ukWaC (used for this project, but not uploaded to the repository)
2. Wikipedia corpus

Lemmatize the text corpus, if needed. Go over the processed text and create a list of all n-grams (unigrams and bigrams), then calculate the frequency of each of these n-grams in the corpus.
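The counting step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `tokenize` and `ngram_counts` helpers are hypothetical names, the tokenizer is a plain regex split, and lemmatization is skipped entirely.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters (assumption: no lemmatization)."""
    return re.findall(r"[a-z]+", text.lower())


def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens; keys are n-tuples of words."""
    # zip over n shifted views of the token list yields each n-gram once.
    return Counter(zip(*(tokens[i:] for i in range(n))))


tokens = tokenize("The cat sat on the mat. The cat slept.")
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

# Most frequent n-grams first.
for ngram, freq in bigrams.most_common(3):
    print(" ".join(ngram), freq)
```

Note that this sketch counts bigrams across sentence boundaries; for a real corpus such as ukWaC one would probably count per sentence and stream the input rather than hold all tokens in memory.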

Co-occurrence estimation has been done with the Jaccard similarity coefficient.
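The Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The README does not say which sets are compared, so the sketch below assumes each word is represented by the set of words seen within a small window around it; `context_sets`, `similar_words`, and the `window` parameter are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def context_sets(tokens, window=2):
    """Map each word to the set of words seen within `window` positions of it."""
    contexts = defaultdict(set)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word].add(tokens[j])
    return dict(contexts)


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_words(target, contexts):
    """Return (word, score) pairs sorted by descending similarity to `target`."""
    target_ctx = contexts.get(target, set())
    scores = [
        (other, jaccard(target_ctx, ctx))
        for other, ctx in contexts.items()
        if other != target
    ]
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


contexts = context_sets("a b c a b d".split(), window=1)
ranking = similar_words("c", contexts)
```

The same `jaccard` function could be applied to other set representations, for example the set of sentences or documents in which each n-gram occurs.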

Repository contents (3 commits):
AndroidStudioProjects/WearTimer
Documents/Workspace
Library/Preferences/AndroidStudioPreview1.3/options
git/AndroidWear/.idea
README.md
