Multi-Class-Text-Classification-Analysis

Project: Analyze the performance of algorithms that classify news headlines into 4 classes.

Data: UCI Machine Learning Repository News Headlines

• Input: TITLE
  o Example: "Bitcoin exchange seeks U.S. bankruptcy protection"

• Output of classification algorithm: CATEGORY
  o Example: Business

Loading and cleaning the dataset

import string
import pandas as pd

df = pd.read_csv('headlines.csv')
# Keep only the label and headline columns, and drop rows with no headline
df = df[['CATEGORY', 'TITLE']]
df = df[pd.notnull(df['TITLE'])]
# Lowercase each headline, then strip punctuation and digits
df['TITLE'] = df['TITLE'].str.lower()
df['TITLE'] = df['TITLE'].apply(lambda x: x.translate(str.maketrans('', '', string.punctuation + string.digits)))
# Encode each category name as an integer id
df['category_id'] = df['CATEGORY'].factorize()[0]
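
To see what the cleaning steps do to a single headline, here is a minimal standard-library sketch. The helper name `clean_title` is hypothetical and is not taken from clean_data_helper.py; it only mirrors the lowercase, strip-punctuation, strip-digits sequence shown above.

```python
import string

# Characters to delete: all ASCII punctuation plus the digits 0-9.
_STRIP_TABLE = str.maketrans('', '', string.punctuation + string.digits)


def clean_title(title: str) -> str:
    """Lowercase a headline and remove punctuation and digits."""
    return title.lower().translate(_STRIP_TABLE)


print(clean_title("Bitcoin exchange seeks U.S. bankruptcy protection"))
# → bitcoin exchange seeks us bankruptcy protection
```

Removing digits can leave two spaces in a row (for example where a number stood between two words). A tokenizer that splits on whitespace is unaffected by this.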

Top 5 Features by Category
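
As a rough illustration of what a per-category feature ranking looks like, the sketch below ranks words by raw frequency within each category. This is a simplified stand-in: the scoring actually used in text_analysis.py is not shown here and may well differ (for example TF-IDF based scores). The function name `top_terms_by_category` and the toy headlines are assumptions made for this example.

```python
from collections import Counter, defaultdict


def top_terms_by_category(rows, n=5):
    """Return the n most frequent words for each category.

    rows is an iterable of (category, cleaned_title) pairs.
    """
    counts = defaultdict(Counter)
    for category, title in rows:
        counts[category].update(title.split())
    return {cat: [word for word, _ in counter.most_common(n)]
            for cat, counter in counts.items()}


# Made-up toy headlines, for illustration only (not taken from headlines.csv).
toy_rows = [
    ("Business", "stocks rise as bank stocks rally"),
    ("Business", "bank profits lift stocks"),
    ("Health", "flu vaccine study finds vaccine safe"),
]
top = top_terms_by_category(toy_rows, n=2)
print(top["Business"])
```

On real headlines, raw counts tend to be dominated by common words such as "the" or "to", which is one reason weighted scores are usually preferred for this kind of analysis.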

Mean Accuracy and Standard Deviation of the Algorithms

Logistic Regression: Mean Accuracy: 0.847214 Standard Deviation: 0.046154

Random Forest: Mean Accuracy: 0.361110 Standard Deviation: 0.098128

Naive Bayes: Mean Accuracy: 0.855489 Standard Deviation: 0.038743

Linear SVC: Mean Accuracy: 0.849241 Standard Deviation: 0.045773
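
Figures like these are typically obtained by scoring each model on several train/test splits and then summarizing the per-split accuracies. The sketch below assumes k-fold cross-validation was used, which the text above does not state, and uses made-up fold scores rather than the project's results. It also assumes the population standard deviation; the sample standard deviation (`statistics.stdev`) would give slightly larger values.

```python
from statistics import mean, pstdev


def summarize(fold_accuracies):
    """Return (mean, population standard deviation) of per-fold accuracies."""
    return mean(fold_accuracies), pstdev(fold_accuracies)


# Hypothetical per-fold accuracies, NOT the project's actual results.
example_folds = {
    "Model A": [0.80, 0.85, 0.90],
    "Model B": [0.30, 0.40, 0.50],
}
for name, scores in example_folds.items():
    avg, std = summarize(scores)
    print(f"{name}: Mean Accuracy: {avg:.6f} Standard Deviation: {std:.6f}")
```

A low standard deviation means the model scored about the same on every split; a high one means its accuracy depended heavily on which headlines ended up in the test fold.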

Confusion Matrices of the Algorithms
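
A confusion matrix counts, for each true category, how often the classifier predicted each possible category. The following is a minimal pure-Python sketch of that bookkeeping, not the code from confusion_matrices.py. The label "Health" and the toy predictions are assumptions for illustration; only "Business" appears as a category in the text above.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions per (true label, predicted label) pair.

    Rows follow the true label and columns the predicted label,
    both in the order given by `labels`.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


# Made-up labels and predictions, for illustration only.
labels = ["Business", "Health"]
y_true = ["Business", "Business", "Health", "Health", "Health"]
y_pred = ["Business", "Health", "Health", "Health", "Business"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)

# Accuracy is the share of predictions that fall on the diagonal.
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
print(accuracy)
```

Off-diagonal cells show which pairs of categories a model confuses, which the single accuracy number hides.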

References:

https://towardsdatascience.com/a-production-ready-multi-class-text-classifier-96490408757

https://buhrmann.github.io/tfidf-analysis.html

