Stocks and news analysis

Datasets

Links to the datasets are available in this folder in OneDrive. Note: You must access the folder with a UiS account.

The files must be put in HDFS so that the folder structure matches this:

├── data
│   ├── news
│   ├── stocks
│   ├── sectors.csv
│   ├── companynames.csv

Run this command to put the files in HDFS:

hadoop fs -put news stocks sectors.csv companynames.csv /data/
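
After the upload, it can be useful to confirm that HDFS contains the layout shown above. The following is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes the `hadoop` CLI is on the PATH. It relies only on `hadoop fs -test -e`, which exits with status 0 when a path exists. If `/data` does not exist yet, `hadoop fs -mkdir -p /data` creates it before the `-put` command is run.

```python
import subprocess

# Entries the pipeline expects under /data, taken from the folder structure above.
EXPECTED_ENTRIES = ["news", "stocks", "sectors.csv", "companynames.csv"]


def expected_hdfs_paths(root="/data"):
    """Return the HDFS paths that should exist after the upload."""
    return [f"{root.rstrip('/')}/{name}" for name in EXPECTED_ENTRIES]


def hdfs_path_exists(path):
    """Return True if `hadoop fs -test -e` reports that the path exists."""
    result = subprocess.run(["hadoop", "fs", "-test", "-e", path])
    return result.returncode == 0


def missing_paths(root="/data", exists=hdfs_path_exists):
    """List the expected paths for which the `exists` check fails.

    `exists` is injectable so the logic can be exercised without a cluster.
    """
    return [path for path in expected_hdfs_paths(root) if not exists(path)]
```

On a machine with access to the cluster, `print(missing_paths())` should print an empty list once all four entries are in place.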

Directory structure

Run all commands from the project directory (the one that contains the Makefile). The project has the following folder structure:

├── project
│   ├── src
│   │   ├── spark
│   │   │   ├── installation
│   │   │   ├── main.py
│   │   │   ├── sentiment_analysis.py
│   │   │   ├── show.py
│   │   │   ├── storage.py
│   │   │   ├── transformations.py
│   │   │   ├── udfs.py
│   │   ├── hadoop
│   │   │   ├── models
│   │   │   │   ├── news_article.py
│   │   │   │   ├── stock_result.py
│   │   │   │   ├── stock.py
│   │   │   ├── tools
│   │   │   │   ├── console_printer.py
│   │   │   │   ├── number_or_defaults.py
│   │   │   ├── news_article_mp.py
│   │   │   ├── stocks_mr_job.py
│   ├── conf
│   │   ├── log4j2.properties
│   │   ├── spark-defaults.conf
│   │   ├── spark-env.sh
│   │   ├── spark
│   ├── Makefile
│   ├── README.md
│   └── requirements.txt

Requirements

To install the packages for sentiment analysis, mrjob, and Delta, run:

make requirements

Replace the configuration files in $SPARK_HOME/conf/ with the files in project/conf/.
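
The replacement can be done by hand or scripted. The sketch below is one possible way to do it, under stated assumptions: the function name `install_spark_conf` and the `.bak` backup suffix are hypothetical and not part of the project, and only regular files are copied, so any subdirectory of `conf` is left alone.

```python
import shutil
from pathlib import Path


def install_spark_conf(project_conf, spark_conf, backup_suffix=".bak"):
    """Copy every regular file from project_conf into spark_conf.

    If a file with the same name already exists in spark_conf, it is first
    copied to <name><backup_suffix> so the previous configuration can be
    restored. Returns the sorted names of the files that were copied.
    """
    project_conf, spark_conf = Path(project_conf), Path(spark_conf)
    copied = []
    for src in sorted(project_conf.iterdir()):
        if not src.is_file():
            continue  # skip subdirectories; only plain config files are copied
        dst = spark_conf / src.name
        if dst.exists():
            # Keep a backup of the file that is about to be overwritten.
            shutil.copy2(dst, dst.with_name(dst.name + backup_suffix))
        shutil.copy2(src, dst)
        copied.append(src.name)
    return copied
```

Run from the project directory, a call such as `install_spark_conf("conf", Path(os.environ["SPARK_HOME"]) / "conf")` (after `import os`) would copy the three configuration files and keep backups of the ones it overwrites.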

Run pipeline

Run the full pipeline with both Hadoop and Spark:

make pipeline

If you encounter any errors, make sure the file paths to the Python files are correct:
- Check line 14 in project/src/spark/main.py and update the variable so that it points to the folder project/src/spark/. Use an absolute path, not a relative one.
- Check line 14 in project/src/spark/sentiment_analysis.py and update the variable so that it points to the folder project/src/spark/. Use an absolute path, not a relative one.

make pipeline prints its results to the terminal, so the connection to the cluster must stay open for the whole run.
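
For the absolute-path requirement on line 14 of main.py and sentiment_analysis.py, one option is to derive the path from the location of the script instead of typing it by hand. This is only a sketch: the helper `absolute_source_dir` and the variable name `SPARK_SRC_DIR` are hypothetical, and it assumes the variable only needs to hold the folder that contains the script. Depending on how the job is submitted to the cluster, the script may run from a different location, in which case a hard-coded absolute path remains the safer choice.

```python
import os


def absolute_source_dir(script_path):
    """Return the absolute path of the folder containing script_path.

    The result always ends with a path separator, e.g. ".../src/spark/".
    """
    folder = os.path.dirname(os.path.abspath(script_path))
    # Joining with an empty string appends the trailing separator.
    return os.path.join(folder, "")


# Hypothetical usage inside main.py; the real variable on line 14 may have
# a different name.
# SPARK_SRC_DIR = absolute_source_dir(__file__)
```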

aleksander-vedvik/stocks_and_news_analysis
