kritikapartha/ML-Final-Project

ML-Final-Project

This research paper addresses the classification challenges faced by authors publishing in the Association for Computing Machinery (ACM) Digital Library. The current process for keyword indexing of scientific papers requires authors to categorize their own work according to the ACM Computing Classification System (CCS), which is time-consuming and inefficient. This study proposes using keyword extraction and topic modeling algorithms to automate the classification process. A dataset of 350 ACM articles and their index terms was used as the training and testing set for this natural language processing task. The paper explores the performance of two keyword extraction algorithms, TopicRank and Rapid Automatic Keyword Extraction (RAKE), individually and combined. The study also explores Latent Dirichlet Allocation (LDA) for topic modeling, specifically Labeled-LDA (LLDA), a supervised variant of LDA that uses the CCS topics as labels. Performance metrics, including accuracy and the Jaccard Index, were used to compare the methods. Topic modeling approaches outperformed the keyword extraction algorithms. Although the Jaccard Index scores were lower for topic modeling, this result suggests that these models have the potential to predict more accurate index terms than those chosen by authors. For future study and development, further comparisons with TF-IDF, Top2Vec, YAKE, and Latent Semantic Analysis are proposed to give more comprehensive insight into automated classification approaches.
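As an illustration of the Jaccard Index mentioned above, the following is a minimal sketch of how predicted index terms could be compared with author-assigned ones. The function name `jaccard_index`, the case-insensitive matching, and the example terms are assumptions made for illustration; they are not taken from the project's code or data.

```python
def jaccard_index(predicted, reference):
    """Return |A ∩ B| / |A ∪ B| for two collections of index terms."""
    # Assumption: terms are matched case-insensitively after trimming whitespace.
    a = {term.strip().lower() for term in predicted}
    b = {term.strip().lower() for term in reference}
    if not a and not b:
        # Assumed convention: two empty term sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical terms, for illustration only.
predicted_terms = ["Machine learning", "Information retrieval", "Topic modeling"]
author_terms = ["Information retrieval", "Topic modeling", "Document representation"]

score = jaccard_index(predicted_terms, author_terms)
print(f"Jaccard Index: {score:.2f}")
```

Here two of the four distinct terms are shared, so the score is 0.50. A low score only means the predicted set differs from the author's set; it does not by itself say which set describes the paper better.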

Contributors 4