Team-IIRS

This is the repository for Team-IIRS's machine learning project. It contains the files and results produced at different stages of the project:

1. InitialData_TweetID is the merged version of the CSV files downloaded from IEEE DataPort. It contains two fields: tweet IDs, which were used for hydration, and sentiment scores, which were ignored in this project. This file has 505193 entries.

2. tweepy_hydration.ipynb contains the code used to hydrate the tweet IDs in InitialData_TweetID.

3. Hydrated_India_Tweets is the Level 2 CSV file containing the hydrated data for India. Hydration was done with the tweepy Python library, using the code in tweepy_hydration.ipynb. It contains the fields Id, date, time, text, long (longitude), lat (latitude), country, city and lang (language). This file has 11858 entries.

4. filtering is the SQL file used to select the tweets for Maharashtra constituencies.

5. Maharashtra_Sentiment_Label is the final CSV file, obtained after scaling the hydrated tweets down to the state of Maharashtra. The tweet text in this file was cleaned (removal of symbols, unwanted white space and emoticons) using the code in cleantext.ipynb. The cleaned text was then labelled in three ways: manual labelling, TextBlob (a Python library; code is provided in ML_model.ipynb) and a lexicon-based approach. It contains the fields Id, date, time, text, long (longitude), lat (latitude), country, city, lang (language), geom (geometry), cleaned text, Textblob_score, Textblob label, Manual Label and Lexicon Label. This file has 2628 entries.

6. cleantext.ipynb contains the code for cleaning the text.

7. ML_model.ipynb contains the code for computing sentiment polarity with TextBlob. It also contains code that implements a supervised machine learning algorithm using the sklearn Python library.

8. Positive-words.txt and Negative-words.txt contain collections of positive and negative words respectively, which were used in the lexicon-based labelling approach.

9. Lexicon.ipynb contains the code for the lexicon-based labelling approach.

10. Dashboard.pbix is the dashboard file for visualising the findings of this project.

11. Final report: Twitter data handling and sentiment analysis using big data tools and frameworks - Group 1 Final.docx
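The hydration step in tweepy_hydration.ipynb works from a long list of tweet IDs. The notebook's actual code is not reproduced here; the following is only a minimal sketch of one common way to prepare such a list, assuming the IDs are sent in batches of at most 100 (the per-request limit of the Twitter v1.1 status lookup endpoint). The function name `batch_ids` is hypothetical.

```python
from typing import Iterator, List


def batch_ids(tweet_ids: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `batch_size` tweet IDs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(tweet_ids), batch_size):
        yield tweet_ids[start:start + batch_size]


# Assumption: with an authenticated tweepy.API object (called `api` here for
# illustration), each batch could be passed to the status lookup call, for
# example api.lookup_statuses(batch) in Tweepy 4.x. That call is left out
# because it needs credentials and network access.
```

Tweets that have been deleted or made private are typically not returned by the lookup, which may be one reason a hydrated file ends up with fewer entries than the ID list.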
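The text cleaning described for cleantext.ipynb (removal of symbols, unwanted white space and emoticons) can be illustrated with a few regular expressions. This is a sketch under assumptions, not the notebook's code: the function name `clean_tweet` is hypothetical, and stripping URLs first is an added assumption so that link fragments do not survive as stray words.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")   # assumption: links are dropped
NON_TEXT_RE = re.compile(r"[^A-Za-z0-9\s]")     # symbols, emoticons, emoji
SPACE_RE = re.compile(r"\s+")                   # runs of white space


def clean_tweet(text: str) -> str:
    """Return `text` with URLs, symbols and extra white space removed."""
    text = URL_RE.sub(" ", text)
    # Note: this also removes non-ASCII letters, so it only suits
    # English-language tweets.
    text = NON_TEXT_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()
```

For example, `clean_tweet("Stay  safe!! :) #COVID19")` keeps only the words and the hashtag text, separated by single spaces.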
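The Textblob_score and Textblob label fields suggest that a numeric polarity is turned into a class label. TextBlob reports polarity as a number between -1.0 and 1.0; how ML_model.ipynb maps it to labels is not stated here, so the sketch below assumes a simple sign-based rule with three hypothetical label names.

```python
def label_from_polarity(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores within `neutral_band` of zero are treated as neutral.
    The label names are assumptions made for illustration.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie in [-1.0, 1.0]")
    if polarity > neutral_band:
        return "positive"
    if polarity < -neutral_band:
        return "negative"
    return "neutral"


# With TextBlob installed, the score for a cleaned tweet could be obtained as
# TextBlob(cleaned_text).sentiment.polarity and passed to this function.
```

A non-zero `neutral_band` (for example 0.05) is a design choice that keeps weakly scored tweets out of the positive and negative classes.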
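The lexicon-based labelling that uses the positive and negative word lists can be sketched as a word count: a tweet is labelled by whether it contains more positive or more negative words. The code in Lexicon.ipynb may differ; the function names, the label names and the treatment of lines starting with `;` as comments are assumptions made for illustration.

```python
import re
from typing import Iterable, Set


def load_lexicon(lines: Iterable[str]) -> Set[str]:
    """Build a word set from lines, skipping blanks and ';' comment lines."""
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def lexicon_label(text: str, positive: Set[str], negative: Set[str]) -> str:
    """Label text by the difference between positive and negative word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

In practice the two sets would be built once, for example with `load_lexicon(open(path, encoding="utf-8"))` for each word file, and reused for every tweet.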

