S&P 500 Index

Examining S&P 500 Index Data from 1/1/2015 to 12/31/2019

The data was gathered with the yfinance library. It covers 1,258 trading days, each with an open price, high price, low price, close price, trading volume, dividends, and stock splits. The dividends and stock splits columns contained only zeros, so both were dropped.
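The column clean-up described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the helper name `drop_all_zero_columns` is hypothetical and the frame holds made-up values. The real frame would presumably come from a call such as `yf.Ticker("^GSPC").history(start="2015-01-01", end="2020-01-01")`, where the `^GSPC` ticker symbol is an assumption because the source does not name it.

```python
import pandas as pd


def drop_all_zero_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the columns whose values are all 0."""
    all_zero = [col for col in df.columns if (df[col] == 0).all()]
    return df.drop(columns=all_zero)


# Made-up values; only the column names mirror what yfinance returns.
frame = pd.DataFrame({
    "Open": [100.0, 101.0],
    "High": [102.0, 103.0],
    "Low": [99.0, 100.0],
    "Close": [101.0, 102.0],
    "Volume": [3_500_000_000, 3_600_000_000],
    "Dividends": [0.0, 0.0],
    "Stock Splits": [0.0, 0.0],
})

cleaned = drop_all_zero_columns(frame)
print(list(cleaned.columns))
```

Checking for all-zero columns, rather than dropping them by name, keeps the step valid if a different ticker does report dividends or splits.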

Candlestick_s&p_500

Plotting all of the data in a candlestick chart shows a general upward trend in the open, high, low, and close prices. The lowest value was a low of $1,812.29 on January 20, 2016, and the highest was a high of $3,247.93 on December 27, 2019.

boxplot_s&p_500 hist_open_price_by_year

The boxplot comparison of open, high, low, and close prices across the five years shows that the years are largely distinct across the four prices, although 2015 and 2016 are relatively similar. The trend of increasing prices is stronger from 2017 to 2019. The spread of prices for 2019 is quite wide compared to the other years and more centered.

histograme_volume_sp500 scatterplot_volume_sp500

The distribution of trading volume is slightly skewed to the right, with most days seeing around 3.5 billion shares traded. Volume at the end of the year, in November and December, covers a wider range. The LOWESS (Locally Weighted Scatterplot Smoothing) line shows a weak general trend across years in which the most shares are traded at the beginning and the end of the year.

open_close_high

From 2015 to 2019, there appears to be a linear relationship among the three prices.

Preprocessing Data

All input data (open, high, and low prices) and output data (close prices) were scaled to the range 0 to 1 using MinMaxScaler. The function from https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/ was used to transform the data into the shape an RNN expects: (batch size, time steps, input size). The train shape was (956, 90, 3) and the test shape was (202, 90, 3). The training data was taken from the first 80% of the data, and the remaining 20% was used for the test dataset. A validation dataset was designated as 20% of the training data.
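The preprocessing steps above can be sketched roughly as follows. This is an illustration under several assumptions, not the project's code: `min_max_scale` is a NumPy stand-in for scikit-learn's MinMaxScaler fitted on the training rows only, `make_windows` is a hypothetical helper (not the tutorial's function) that pairs each window with the close on the following row, and the data, row count, and window length are synthetic.

```python
import numpy as np


def min_max_scale(train, test):
    """Scale each column to [0, 1] using the training minimum and maximum."""
    lo = train.min(axis=0)
    hi = train.max(axis=0)
    scale = hi - lo
    return (train - lo) / scale, (test - lo) / scale


def make_windows(features, target, n_steps):
    """Build (samples, time steps, features) windows from a 2-D series.

    Assumption: each window of n_steps consecutive rows is paired with the
    target value on the row right after the window.
    """
    X, y = [], []
    for start in range(len(features) - n_steps):
        end = start + n_steps
        X.append(features[start:end])
        y.append(target[end])
    return np.array(X), np.array(y)


# Synthetic stand-in for the open, high, low (inputs) and close (output) columns.
rng = np.random.default_rng(0)
prices = 2000 + np.cumsum(rng.normal(size=(100, 4)), axis=0)

split = int(len(prices) * 0.8)  # first 80% of rows for training
train, test = min_max_scale(prices[:split], prices[split:])

n_steps = 10  # hypothetical window length
X_train, y_train = make_windows(train[:, :3], train[:, 3], n_steps)
X_test, y_test = make_windows(test[:, :3], test[:, 3], n_steps)
print(X_train.shape, X_test.shape)  # (70, 10, 3) (10, 10, 3)
```

Splitting before windowing keeps every test window strictly after the training period, so no training sample overlaps the test rows.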

Model Comparisons

All three RNN models were compiled with the adam optimizer and mse loss, and trained for up to 50 epochs with EarlyStopping(monitor='loss'). The rnn1 model has only 3 SimpleRNN layers with 40 hidden neurons. The rnn2 model has 4 LSTM layers with 3 hidden neurons each and one SimpleRNN layer. The rnn3 model has two LSTM layers with relu activation and 100 hidden neurons, plus one dense layer.
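As a rough sketch of what the rnn3 description could look like in Keras, the block below stacks two LSTM layers and one dense layer and applies the training setup named above. Several details are assumptions the source does not confirm: 100 units in each LSTM layer, a single output unit in the dense layer, and the helper names `build_rnn3` and `fit_model`.

```python
import numpy as np
from tensorflow import keras


def build_rnn3(n_steps: int, n_features: int) -> keras.Model:
    """Two stacked LSTM layers (relu) followed by one Dense output layer."""
    model = keras.Sequential([
        keras.Input(shape=(n_steps, n_features)),
        # return_sequences=True so the second LSTM receives the full sequence.
        keras.layers.LSTM(100, activation="relu", return_sequences=True),
        keras.layers.LSTM(100, activation="relu"),
        # Assumption: one output unit, i.e. one predicted close price per sample.
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model


def fit_model(model, X_train, y_train, epochs=50):
    """Train with early stopping on the training loss and a 20% validation split."""
    early_stop = keras.callbacks.EarlyStopping(monitor="loss")
    return model.fit(
        X_train,
        y_train,
        epochs=epochs,
        validation_split=0.2,
        callbacks=[early_stop],
        verbose=0,
    )
```

A call such as `fit_model(build_rnn3(n_steps, 3), X_train, y_train)` would train on arrays shaped (samples, time steps, 3). With its default `patience=0`, `EarlyStopping(monitor="loss")` stops as soon as the training loss fails to improve for one epoch, so raising `patience` is one option to consider if runs end too early.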

mse mae runtime

Management Research Question

Predicting future close prices, or any other stock price, helps gauge whether to hold or sell stocks. Determining the appropriate time step, or number of days to include in each sample, is a key part of creating a better RNN model. The ideal timestep depends on the breadth of training data and

Conclusion

None of the models had very promising MSE scores. I chose to evaluate the models with a 50-day time step, which may have negatively impacted the models' ability to predict day-to-day prices. I would also try increasing the number of epochs in order to bring down the MSE loss.

| model | train/test | MSE |
|-------|------------|---------|
| rnn1 | train | 1120.32 |
| rnn1 | test | 2541.25 |
| rnn2 | train | 1547.56 |
| rnn2 | test | 3992.39 |
| rnn3 | train | 859.57 |
| rnn3 | test | 2111.09 |
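To make the MSE values listed above easier to interpret, one option is to take their square roots, which gives a root mean squared error (RMSE) in the same units as the prices. The sketch below assumes the MSE values were computed on prices in their original units rather than on the 0 to 1 scaled values, which the magnitudes suggest but the source does not state.

```python
import math

# MSE values copied from the results listed above.
mse = {
    ("rnn1", "train"): 1120.32,
    ("rnn1", "test"): 2541.25,
    ("rnn2", "train"): 1547.56,
    ("rnn2", "test"): 3992.39,
    ("rnn3", "train"): 859.57,
    ("rnn3", "test"): 2111.09,
}

# RMSE is the square root of MSE, so it is expressed in price units.
rmse = {key: math.sqrt(value) for key, value in mse.items()}

for (model, split), value in rmse.items():
    print(f"{model} {split}: RMSE = {value:.2f}")

# Model with the lowest test MSE (and therefore the lowest test RMSE).
best = min(("rnn1", "rnn2", "rnn3"), key=lambda name: mse[(name, "test")])
print("lowest test error:", best)
```

Under that assumption, a test RMSE of roughly 46 to 63 is small relative to index levels near 3,000, but large compared with typical day-to-day moves, which fits the conclusion that the models are not yet promising.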