ML-Final-Project

This research paper addresses the classification challenges faced by authors publishing in the Association for Computing Machinery (ACM) Digital Library. The current process for keyword indexing of scientific papers requires authors to categorize their own work according to the ACM Computing Classification System (CCS), which is time-consuming and inefficient. This study proposes using keyword extraction and topic modeling algorithms to automate the classification process. A dataset of 350 ACM articles and their index terms was used as the training and testing set for this natural language processing task. The paper evaluates two keyword extraction algorithms, TopicRank and Rapid Automatic Keyword Extraction (RAKE), both individually and combined. It also explores Latent Dirichlet Allocation (LDA) for topic modeling, specifically Labeled-LDA (LLDA), a supervised variant of LDA that uses the CCS topics as labels. Performance metrics, including accuracy and the Jaccard Index, were used to compare the methods. Topic modeling approaches outperformed the keyword extraction algorithms. Although the Jaccard Index scores were lower for topic modeling, meaning less overlap with the author-assigned terms, this result suggests that these models may be able to predict more accurate index terms than the authors themselves. For future study and development, further comparisons with TF-IDF, Top2Vec, YAKE, and Latent Semantic Analysis are proposed to give more comprehensive insight into automated classification approaches.
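As a rough illustration of the Jaccard Index mentioned above, the sketch below compares a set of predicted index terms against a set of author-assigned terms. It is a minimal example, not the project's evaluation code: the function name `jaccard_index`, the lowercase normalization, the handling of two empty sets, and the sample terms are all assumptions made for illustration.

```python
def jaccard_index(predicted, reference):
    """Return |A ∩ B| / |A ∪ B| for two collections of index terms."""
    # Assumption: terms are compared case-insensitively after trimming spaces.
    a = {term.strip().lower() for term in predicted}
    b = {term.strip().lower() for term in reference}
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Illustrative terms only; these are not taken from the project dataset.
author_terms = [
    "Machine learning",
    "Information retrieval",
    "Natural language processing",
]
predicted_terms = [
    "machine learning",
    "natural language processing",
    "Topic modeling",
]

score = jaccard_index(predicted_terms, author_terms)
print(f"Jaccard index: {score:.2f}")  # prints "Jaccard index: 0.50"
```

Here two of the four distinct terms are shared, so the score is 0.5. A low score only means the predicted terms differ from the author's choices; it does not by itself say which set describes the paper better.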

Repository contents:

Data Collection
Exploratory (not used)
saved_model
.DS_Store
Compare_Accuracy_Plot.png
Compare_IOU_Plot.png
Keyword_Accuracy_Plot.png
Keyword_IOU_Plot.png
LLDA.py
LLDATesting.ipynb
README.md
TopicRank.ipynb

kritikapartha/ML-Final-Project