Sentiment Analyse

Sentiment Analyse Neural Network Model

Can classificate given text as bad or good review.

Target dataset is Sentiment140 dataset with 1.6 million tweets

Text preprocessing

Will use multiple pre trained embeding vectors for different tasks.

For tokenization and base lingustic analyse use spacy en_core_web_lg model. Large english corpus. Will be downloaded automatically by space in Docker.

For misspelling and contractions replacement will use GoogleNews-vectors-negative300.bin as biggest open accessable embeding corpus. It must be enought for all tasks, but if for you task other corpus will be enough use it.

List of most accessable embeding corpuses and tool for download them there: https://github.com/RaRe-Technologies/gensim-data

GoogleNews-vectors-negative300

Pre-trained vectors trained on a part of the Google News dataset (about 100 billion words). The model contains 300-dimensional vectors for 3 million words and phrases.

https://code.google.com/archive/p/word2vec/

Requirements

Can be run in docker enviroment, or if you on linux:

Python - version 3.*
Tensorflow - version 2.2 or bigger
Make - optional, all commands can be run manually
DVC - Git for machine learning, need for load datasets and trained models.

All other Python libraries and models described in Dockerfile and requirements.txt

Hardware

NVidia videocard with DLSS version >= 10 - actually GPU optional, and learning can be run on CPU, but models and enviroment configurated to run on GPU, in base case tenserflow can fallback to CPU, so not need change anything for start development

Research Results

You can browse research as notebook

if github not load notebook you can open html version or even python version

Development

If you on Windows, build and run in-Docker development enviroment

# Build image
make docker-build
# Start docker container, map volume on current folder and attach current console
make docker-console

If you not in docker you need load Spacy model

# Will load meduim size model, good for developement, but for production better load lardger
make spacy-load-md

For rebuild model or test it after data or code changes, just run

make
# or
dvc repro

it will calculate all affected files and changed stages and start rebuild process

Training

For train model just run

make train

It will load dataset and start training

Testing

For test model just run

make test

Jupiter Notebook

In docker image insatlled Jupiter Notebook

For run notebook

make notebook

Five main steps for build Neural Network

Load dataset - split into train and test, and validation - which need for check training progress
Normalise data - need normalise text to vector with constant size, for put it into first layer of network
Build network model - stack layers for build model of network
Train - put train data with correct hyperparameters
Test - for insure correctness

Name		Name	Last commit message	Last commit date
Latest commit History 66 Commits
.dvc		.dvc
docs		docs
metrics		metrics
results		results
src		src
.dockerignore		.dockerignore
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
analyse.ipynb		analyse.ipynb
data.dvc		data.dvc
dvc.lock		dvc.lock
dvc.yaml		dvc.yaml
params.yaml		params.yaml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analyse

Text preprocessing

GoogleNews-vectors-negative300

Requirements

Hardware

Research Results

Development

Training

Testing

Jupiter Notebook

Five main steps for build Neural Network

About

Releases

Packages

Contributors 2

Languages

License

LeoVS09/sentiment

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analyse

Text preprocessing

GoogleNews-vectors-negative300

Requirements

Hardware

Research Results

Development

Training

Testing

Jupiter Notebook

Five main steps for build Neural Network

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages