jSimilarity is a library that implements various similarity measures.
String Character-based Similarities:
Jaro
Jaro-Winkler
String Token-based Similarities:
Jaccard
Cosine similarity
Document-based Similarities:
TF-IDF
SoftTFIDF
Useful implemented Utilities:
TextDocument
Corpus
BasicTokenizer
jSimilarity focuses mainly on TF-IDF and implements several of its variations (smooth IDF, max IDF, normalized TF, double normalization 0.5, etc.).
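
To make the named variations concrete, here is a minimal, self-contained sketch of the weighting formulas as they are commonly defined in the literature. It does not use jSimilarity's API: the class name `TfIdfVariantsSketch` and every method name are hypothetical, Java is assumed as the language, and the library's own definitions may differ in detail (for example in the logarithm base or the smoothing constants).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of jSimilarity.
public class TfIdfVariantsSketch {

    // Raw term frequency: how often the term occurs in the document.
    static int rawTf(String term, List<String> doc) {
        int count = 0;
        for (String t : doc) {
            if (t.equals(term)) count++;
        }
        return count;
    }

    // Normalized TF: raw count divided by the document length.
    static double normalizedTf(String term, List<String> doc) {
        return doc.isEmpty() ? 0.0 : (double) rawTf(term, doc) / doc.size();
    }

    // Double normalization 0.5: 0.5 + 0.5 * f(t,d) / max f(t',d).
    // Returns 0.0 for an empty document (a choice made for this sketch).
    static double doubleNormalizedTf(String term, List<String> doc) {
        int max = 0;
        for (String t : doc) max = Math.max(max, rawTf(t, doc));
        return max == 0 ? 0.0 : 0.5 + 0.5 * rawTf(term, doc) / max;
    }

    // Document frequency: number of documents that contain the term.
    static int docFrequency(String term, List<List<String>> corpus) {
        int n = 0;
        for (List<String> doc : corpus) {
            if (doc.contains(term)) n++;
        }
        return n;
    }

    // Plain IDF: ln(N / n_t); 0.0 if the term occurs in no document.
    static double idf(String term, List<List<String>> corpus) {
        int n = docFrequency(term, corpus);
        return n == 0 ? 0.0 : Math.log((double) corpus.size() / n);
    }

    // Smooth IDF: ln(N / (1 + n_t)) + 1.
    static double smoothIdf(String term, List<List<String>> corpus) {
        return Math.log((double) corpus.size() / (1 + docFrequency(term, corpus))) + 1.0;
    }

    // Max IDF: ln(max n_t' / (1 + n_t)), with the max taken over the terms of doc.
    static double maxIdf(String term, List<String> doc, List<List<String>> corpus) {
        int max = 0;
        for (String t : doc) max = Math.max(max, docFrequency(t, corpus));
        return max == 0 ? 0.0 : Math.log((double) max / (1 + docFrequency(term, corpus)));
    }

    public static void main(String[] args) {
        List<String> d1 = Arrays.asList("a", "a", "b");
        List<List<String>> corpus = Arrays.asList(
                d1, Arrays.asList("a", "c"), Arrays.asList("c", "d"));
        // One possible combination: normalized TF times smooth IDF.
        System.out.println(normalizedTf("a", d1) * smoothIdf("a", corpus));
    }
}
```

Any TF variant can be multiplied by any IDF variant to obtain a weight; which pairings the library exposes is not stated here, so the combination in `main` is only one example.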