GitHub - mrubin01/ML-for-Passive-Investing

This project investigates whether machine learning can identify the stocks that perform best over the long term. The underlying idea is to predict which stocks will increase in value by at least a given threshold after a given number of years. Stocks listed on the US exchanges (NYSE, NASDAQ) are used, and their value at the end of a training period (ten years) is compared with their value one, three and five years later. The model is trained on fundamental data, that is, data from the Balance Sheet, Cash Flow Statement and Income Statement, combined with technical data (stock price and volume). This contrasts with active investing, where the stock price is used together with the metrics derived from it. From a technical point of view, this is a supervised multilabel binary classification problem, with one binary label for each of the three horizons.
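The label definition described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the helper `make_labels`, the column names `y_1y`, `y_3y`, `y_5y`, the toy tickers and the 10% threshold are all assumptions made for the example, since the README does not state the threshold used.

```python
import pandas as pd


def make_labels(price_end_train, future_prices, threshold=0.10):
    """Return one binary column per horizon: 1 if growth >= threshold, else 0.

    `threshold` is a hypothetical value; the README does not specify it.
    """
    # Relative growth of each future price versus the end-of-training price.
    growth = future_prices.div(price_end_train, axis=0) - 1.0
    return (growth >= threshold).astype(int)


# Toy prices for three hypothetical tickers (not real market data).
price_end_train = pd.Series({"AAA": 10.0, "BBB": 20.0, "CCC": 50.0})
future_prices = pd.DataFrame(
    {
        "y_1y": [10.5, 25.0, 40.0],  # price one year after training ends
        "y_3y": [12.0, 21.0, 45.0],  # price three years after
        "y_5y": [15.0, 19.0, 60.0],  # price five years after
    },
    index=price_end_train.index,
)

labels = make_labels(price_end_train, future_prices, threshold=0.10)
print(labels)
```

Each row of `labels` is the multilabel target for one stock: three independent binary outcomes rather than a single multiclass value.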

The project is divided into three stages and one demo.

DATA GATHERING AND PREPROCESSING

Select the NYSE and NASDAQ stocks that were active over the whole period between the first quarter of 2000 and the last quarter of 2014; all other stocks are excluded
Transpose the data so that the tickers form the index, using a MultiIndex format
Technical data (adjusted price and volume) and fundamental data are merged
Gaps are filled, empty or duplicated features dropped
Scale and standardize data
Store dataframe into dataset_full.csv (1555 stocks, 21 features, 40 quarters, 3 classes)
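The preprocessing steps listed above might look roughly like this in pandas and scikit-learn. It is a sketch on synthetic data: the tickers, the feature names (`adj_price`, `revenue`, and so on), the forward/backward gap-filling strategy and the use of `StandardScaler` are assumptions, as the README does not say which methods the notebook applies.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
tickers = ["AAA", "BBB", "CCC"]                      # hypothetical tickers
quarters = ["2000Q1", "2000Q2", "2000Q3", "2000Q4"]
index = pd.MultiIndex.from_product([tickers, quarters], names=["ticker", "quarter"])
n = len(index)

# Synthetic technical data.
technical = pd.DataFrame(
    {"adj_price": rng.uniform(5, 50, n), "volume": rng.uniform(1e5, 1e6, n)},
    index=index,
)

# Synthetic fundamental data with a gap, a duplicated and an empty feature.
revenue = rng.uniform(1e6, 1e7, n)
revenue[1] = np.nan                                  # a gap to be filled
fundamental = pd.DataFrame(
    {"revenue": revenue, "net_income": rng.uniform(1e5, 1e6, n)}, index=index
)
fundamental["revenue_copy"] = fundamental["revenue"]  # duplicated feature
fundamental["empty"] = np.nan                         # empty feature

# Merge technical and fundamental data on the (ticker, quarter) index.
df = technical.join(fundamental, how="inner")

# Drop features that are entirely empty.
df = df.dropna(axis=1, how="all")

# Fill gaps within each ticker: forward first, then backward for leading gaps.
df = df.groupby(level="ticker").ffill()
df = df.groupby(level="ticker").bfill()

# Drop duplicated features, keeping the first occurrence.
df = df.loc[:, ~df.T.duplicated().to_numpy()]

# Standardize every feature to zero mean and unit variance.
scaled = pd.DataFrame(
    StandardScaler().fit_transform(df), index=df.index, columns=df.columns
)
print(scaled.round(2))
```

Filling gaps per ticker, rather than over the whole frame, avoids carrying one company's values into another company's rows.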

DIMENSIONALITY REDUCTION

Use PCA to create a compressed dataframe (1555 stocks, 18 features, 3 classes), df_full_pca
Use LDA to create a compressed dataframe (1555 stocks, 3 features, 3 classes), df_full_lda
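A minimal sketch of the two reductions with scikit-learn follows. The data are synthetic and flattened to 21 features per stock. How the notebook obtains exactly three LDA features is not stated; since LDA yields at most `n_classes - 1` components, the sketch assumes one single-component LDA per binary label, stacked side by side.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_stocks, n_features = 200, 21            # synthetic stand-in for the dataset
X = rng.normal(size=(n_stocks, n_features))

# Three synthetic binary labels, loosely tied to the first three features.
Y = (X[:, :3] + 0.5 * rng.normal(size=(n_stocks, 3)) > 0).astype(int)

# PCA is unsupervised: keep the 18 directions of largest variance.
pca = PCA(n_components=18)
X_pca = pca.fit_transform(X)
print("PCA shape:", X_pca.shape)
print("Variance retained:", pca.explained_variance_ratio_.sum().round(3))

# LDA is supervised. Assumption: one discriminant axis per binary label,
# stacked into a three-column matrix.
X_lda = np.column_stack(
    [
        LinearDiscriminantAnalysis(n_components=1).fit_transform(X, Y[:, k])
        for k in range(Y.shape[1])
    ]
)
print("LDA shape:", X_lda.shape)
```

Because LDA uses the labels, it should be fitted on training folds only when scores are reported; fitting it on the full dataset first lets label information leak into the evaluation.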

ALGORITHM EVALUATION

Use df_full_pca and df_full_lda
First test cycle with cross-validation: PCA + LR, RF, MLP (without and with Label Powerset)
First test cycle with cross-validation: LDA + LR, RF, MLP (without and with Label Powerset)
Second test cycle with cross-validation: LDA + SVM, SGD, KNN, NB, DT
Third test cycle with cross-validation: LDA + VotingClassifier(3) with (LR, RF, MLP, SVM, SGD, KNN, NB, DT)
Third test cycle with cross-validation: LDA + GradientBoostingClassifier
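One of the test cycles above can be sketched like this, on synthetic data. The Label Powerset transformation is written by hand here (each combination of the three binary labels becomes one class id); the notebook may well use a library implementation instead. The choice of LR, RF and KNN for the voting ensemble, the five folds and the accuracy metric are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))             # stand-in for the 3 LDA features
Y = (X + 0.3 * rng.normal(size=X.shape) > 0).astype(int)  # 3 binary labels

# Label Powerset by hand: read the 3 labels as a binary number in 0..7.
y_lp = Y @ np.array([4, 2, 1])

models = {
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
# Hard-voting ensemble over the three base models defined above.
models["Voting"] = VotingClassifier(estimators=list(models.items()), voting="hard")

# 5-fold cross-validated accuracy for every candidate.
scores = {
    name: cross_val_score(model, X, y_lp, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

With Label Powerset, a prediction counts as correct only if all three horizons are right, so the accuracy is stricter than a per-label average.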

DEMO

This demo uses only class 3 (i.e. the stock price after 5 years)
Load the data from dataset_with_classes.csv
Create a multiindex dataframe with a balanced number of stocks (735 + 735)
Scale and standardize the data
Run LDA to reduce dimensionality and separate X from the three Ys
Run the three best algorithms (RF, SVM, KNN) and print the metrics
Run the best ensemble method (Voting Classifier with RF, SVM, MLP) and print the metrics
Using the ensemble method, print the forecast and the actual output for 100 stocks, as well as the percentage increase
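The demo flow above can be approximated as follows, again on synthetic data. The feature names, the label name `y_5y`, the undersampling strategy used to balance the classes, the hold-out split and the hyperparameters are assumptions; the sketch omits the LDA step and the percentage increase, since no price data are generated here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 600
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=[f"f{i}" for i in range(4)])
# Synthetic, imbalanced 5-year label (hypothetical name).
y = pd.Series((X["f0"] + 0.5 * rng.normal(size=n) > 0.8).astype(int), name="y_5y")

# Balance the classes by undersampling the majority class.
n_min = y.value_counts().min()
keep = pd.concat([y[y == c].sample(n_min, random_state=0) for c in (0, 1)]).index
Xb, yb = X.loc[keep], y.loc[keep]

X_tr, X_te, y_tr, y_te = train_test_split(
    Xb, yb, test_size=0.25, stratify=yb, random_state=0
)

# Hard-voting ensemble of RF, SVM and MLP, as named in the demo steps.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svm", SVC(kernel="rbf", random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)),
    ],
    voting="hard",
)
ensemble.fit(X_tr, y_tr)
pred = ensemble.predict(X_te)

accuracy = accuracy_score(y_te, pred)
f1 = f1_score(y_te, pred)
print(f"accuracy={accuracy:.3f}  f1={f1:.3f}")

# Forecast versus actual outcome for the first few held-out stocks.
report = pd.DataFrame({"forecast": pred, "actual": y_te.to_numpy()}, index=y_te.index)
print(report.head(10))
```

Balancing before the split keeps accuracy meaningful: on the original imbalanced labels, a model that always predicts the majority class would already score well.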

Algorithm_Evaluation.ipynb
Data_Collection_and_Preprocessing.ipynb
Demo.ipynb
Dimensionality_Reduction.ipynb
README.md
dataset_with_classes.csv
