NLP Project for clustering hashtags by concept

Included in the code:

  • Hashtag segmentation
  • Manually identified patterns for filtering query-type tags
  • Hand-labeled dictionary of 6000 tags and their concepts
  • Test script for classifying new tags into respective clusters
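
As an illustration of what hashtag segmentation involves, here is a minimal sketch that splits a tag into known words with dynamic programming, preferring the split with the fewest words. It is not the project's actual implementation: the function name `segment_hashtag` and the toy vocabulary `VOCAB` are assumptions made for this example only.

```python
def segment_hashtag(tag, vocabulary):
    """Split a hashtag into vocabulary words, preferring the fewest words.

    Returns a list of words, or None if the tag cannot be fully covered.
    (Hypothetical helper, not taken from the project's code.)
    """
    text = tag.lstrip("#").lower()
    n = len(text)
    # best[i] holds the shortest word list covering text[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if best[start] is not None and word in vocabulary:
                candidate = best[start] + [word]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Toy vocabulary, assumed for illustration only.
VOCAB = {"how", "to", "cook", "pasta", "pas", "ta"}

print(segment_hashtag("#howtocookpasta", VOCAB))  # ['how', 'to', 'cook', 'pasta']
```

Preferring fewer words is one simple tie-breaker; a frequency-based score over a larger word list would be a natural alternative.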

Requirements

The following must be installed for the code to run correctly:

  • Python 3.0 or higher
  • NLTK, along with the WordNet corpus
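
Assuming NLTK itself is already installed (for example through pip), the WordNet corpus can be fetched from Python. The helper name `ensure_wordnet` below is an assumption for this sketch; `nltk.data.find` and `nltk.download` are NLTK's own functions.

```python
def ensure_wordnet():
    """Download the WordNet corpus if NLTK cannot find it locally."""
    import nltk  # imported here so the helper can be defined without NLTK present

    try:
        nltk.data.find("corpora/wordnet")
    except LookupError:
        nltk.download("wordnet")


if __name__ == "__main__":
    ensure_wordnet()
```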

Usage instructions

  • ./pattern_filter/filter_query_type_tags.py <hashtagfile> Writes a file containing the segmented query-type tags found in the given hashtag file.

  • ./benchmark_similarity.py A baseline that identifies a tag's cluster by exact word matching against the hand-labeled dictionary.

  • ./wordnet_similarity.py Identifies the cluster a tag belongs to, using the hand-labeled dictionary and word similarity measures from WordNet.
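
To make the difference between the two classification scripts concrete, the sketch below contrasts exact word matching with similarity-based matching. It is a hedged illustration, not the scripts' actual code: the dictionary format (a list of segmented-words and concept pairs), the function names, and the choice of WordNet path similarity are all assumptions. The similarity function is passed in as a parameter, so the WordNet-backed version is only needed when NLTK and its corpus are available.

```python
def exact_match_cluster(words, labeled):
    """Pick the concept whose labeled tags share the most words with `words`."""
    scores = {}
    for labeled_words, concept in labeled:
        overlap = len(set(words) & set(labeled_words))
        if overlap:
            scores[concept] = scores.get(concept, 0) + overlap
    return max(scores, key=scores.get) if scores else None


def similarity_cluster(words, labeled, similarity):
    """Pick the concept of the labeled tag containing the most similar word."""
    best_concept, best_score = None, 0.0
    for labeled_words, concept in labeled:
        score = max(
            (similarity(w, lw) for w in words for lw in labeled_words),
            default=0.0,
        )
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept


def wordnet_path_similarity(a, b):
    """Highest WordNet path similarity over all synset pairs (needs NLTK)."""
    from nltk.corpus import wordnet as wn

    scores = [
        s1.path_similarity(s2) or 0.0
        for s1 in wn.synsets(a)
        for s2 in wn.synsets(b)
    ]
    return max(scores, default=0.0)


# Toy stand-in for the hand-labeled dictionary (assumed format).
LABELED = [
    (["cook", "pasta"], "food"),
    (["football", "match"], "sports"),
]

print(exact_match_cluster(["how", "to", "cook", "rice"], LABELED))  # food
```

With NLTK and WordNet installed, `similarity_cluster(["soccer", "game"], LABELED, wordnet_path_similarity)` would score words that never appear verbatim in the dictionary, which exact matching cannot do.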

