# Assignment 2: Create an inverted index from a collection of documents

Until now you have worked with three documents. Now work with 50k.
Create an inverted index for the given collection. Decide on all the trade-offs yourself, such as which data structures to use for the term index and the posting lists, and whether the posting lists are kept sorted. Store the index in a file for later use. Extra points for index compression.
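
To make these trade-offs concrete, here is a minimal sketch of one possible design, not a required one. It assumes a dict as the term index, sorted lists of integer document IDs as posting lists, whitespace tokenization, gap encoding plus variable-byte coding for compression, and `pickle` for storage. All function names are hypothetical.

```python
import pickle
from collections import defaultdict


def build_index(docs):
    """Map each term to a sorted list of the doc IDs that contain it.

    `docs` is assumed to be a dict of {int doc_id: text}.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # whitespace tokenizer: an assumption
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, low-order bits first.

    A set high bit means "more bytes follow for this number".
    """
    out = bytearray()
    for n in numbers:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)


def vbyte_decode(data):
    """Inverse of vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            numbers.append(n)
            n, shift = 0, 0
    return numbers


def compress_postings(doc_ids):
    """Store the first ID, then the gaps between consecutive sorted IDs."""
    if not doc_ids:
        return b""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vbyte_encode(gaps)


def decompress_postings(data):
    """Rebuild the sorted doc IDs by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vbyte_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


def save_index(index, path):
    """Write the index to `path` with compressed posting lists."""
    compressed = {term: compress_postings(ids) for term, ids in index.items()}
    with open(path, "wb") as f:
        pickle.dump(compressed, f)


def load_index(path):
    """Read an index written by save_index and decompress its postings."""
    with open(path, "rb") as f:
        compressed = pickle.load(f)
    return {term: decompress_postings(data) for term, data in compressed.items()}
```

Keeping the posting lists sorted is what makes gap encoding work: the gaps are small positive numbers, so most of them fit in a single byte. A sorted list also allows linear-time intersection of two posting lists at query time. An unsorted list is cheaper to append to, but it loses both properties.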
The collection to use is 50k Hindi documents, handled in the same way. After that, you will work on an English collection. Download the collection from http://users.dsic.upv.es/grupos/nle/clinss.html (section "Corpus"). The file is "hindi.12.tar.gz.gpg" and the passphrase is "clinss2012fire".
## Note: if you have difficulty handling the gpg file, search for how to decrypt a gpg file on your OS.
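
As one possible starting point, the sketch below calls the `gpg` command-line tool from Python. It assumes GnuPG 2.1 or later is installed and on your `PATH` (older versions do not accept `--pinentry-mode`). The function names and the output file name are hypothetical.

```python
import subprocess


def build_gpg_command(encrypted_path, output_path, passphrase):
    """Build the argument list for a non-interactive gpg decryption."""
    return [
        "gpg",
        "--batch",                      # do not prompt interactively
        "--yes",                        # overwrite the output file if it exists
        "--pinentry-mode", "loopback",  # take the passphrase from the command line
        "--passphrase", passphrase,
        "--output", output_path,
        "--decrypt", encrypted_path,
    ]


def decrypt(encrypted_path, output_path, passphrase):
    """Run gpg; raises CalledProcessError if decryption fails."""
    subprocess.run(
        build_gpg_command(encrypted_path, output_path, passphrase),
        check=True,
    )
```

For this assignment the call would be `decrypt("hindi.12.tar.gz.gpg", "hindi.12.tar.gz", "clinss2012fire")`, after which the resulting `.tar.gz` archive can be extracted as usual.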
A Hindi stemmer in Java is available here: http://members.unine.ch/jacques.savoy/clef/HindiStemmerLight.java.txt