This project is divided into two main parts: Exploratory Data Analysis (EDA) and Prediction Modeling. The goal is to understand the historical behavior of the S&P 500 index through EDA and then apply machine learning techniques to predict its future movements.
- `README.md`: Project documentation.
- `requirements.txt`: List of dependencies needed to run the project.
- `StockMarketEDA.ipynb`: Jupyter notebook containing the Exploratory Data Analysis of the S&P 500 data.
- `StockMarketPredictor.ipynb`: Jupyter notebook for the Prediction Modeling.
- `stock_predictor_env/`: Virtual environment directory.
The EDA phase helps us uncover key patterns, trends, and insights from the S&P 500 data, which are essential for building a robust predictive model.
- Time Series Analysis of Closing Prices:
  - Long-term upward trends, with notable growth acceleration from the 1980s onwards.
  - Identification of major downturns like the 2000 dot-com bubble and the 2008 financial crisis.
- Distribution Analysis:
  - The distribution shows a left-skew, reflecting the historical growth of the market with higher price levels becoming more frequent in recent decades.
- Volume Trend Analysis:
  - Significant increase in trading volume starting in the late 1980s, often correlating with major market events.
- Rolling Mean and Standard Deviation:
  - Rolling mean reveals long-term trends, while rolling standard deviation indicates periods of market volatility.
- Daily Returns and Risk Analysis:
  - Analysis of daily returns shows the market’s tendency to gain more often than it loses, with occasional extreme movements indicating higher risk.
- Cumulative Returns:
  - Illustrates the exponential growth of long-term investments in the S&P 500.
- Bollinger Bands for Volatility Analysis:
  - Bollinger Bands help in identifying overbought and oversold market conditions, providing potential trading signals.
- Autocorrelation Analysis:
  - Highlights patterns and cycles in the market, valuable for predictive modeling.
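
The rolling statistics, returns, Bollinger Bands, and autocorrelation items above can be sketched with pandas roughly as follows. This is an illustration only, not the notebook's actual code: the synthetic price series, the `Close` column name, the 20-day window, and the band width of 2 standard deviations are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for S&P 500 closing prices (assumption: the notebook
# works on real data; the column name "Close" is hypothetical here).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=500)
log_returns = rng.normal(loc=0.0004, scale=0.01, size=len(dates))
df = pd.DataFrame({"Close": 3000 * np.exp(np.cumsum(log_returns))}, index=dates)

window = 20  # assumed look-back window, in trading days

# Rolling mean (trend) and rolling standard deviation (volatility).
df["RollingMean"] = df["Close"].rolling(window).mean()
df["RollingStd"] = df["Close"].rolling(window).std()

# Daily simple returns and the cumulative return of a buy-and-hold position.
df["DailyReturn"] = df["Close"].pct_change()
df["CumulativeReturn"] = (1 + df["DailyReturn"]).cumprod() - 1

# Bollinger Bands: rolling mean plus/minus k rolling standard deviations.
k = 2  # assumed band width
df["UpperBand"] = df["RollingMean"] + k * df["RollingStd"]
df["LowerBand"] = df["RollingMean"] - k * df["RollingStd"]

# Share of days with a positive return, and lag-1 autocorrelation of returns.
share_up_days = df["DailyReturn"].dropna().gt(0).mean()
lag1_autocorr = df["DailyReturn"].autocorr(lag=1)

print(df[["Close", "RollingMean", "UpperBand", "LowerBand"]].tail())
print(f"Share of up days: {share_up_days:.2%}, lag-1 autocorrelation: {lag1_autocorr:.3f}")
```

Closes above `UpperBand` or below `LowerBand` are the candidates for the overbought and oversold readings mentioned above; the first `window - 1` rows of the rolling columns are `NaN` because the window is not yet full.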
- Clone the Repository:

      git clone https://github.com/Nazaboy/StockMarketPredictor.git
      cd StockMarketPredictor

- Set Up a Virtual Environment:

      python -m venv stock_predictor_env
      source stock_predictor_env/bin/activate  # On Windows: stock_predictor_env\Scripts\activate

- Install Dependencies:

      pip install -r requirements.txt

- Run the EDA Notebook: Start the Jupyter notebook server:

      jupyter notebook

  Open the `StockMarketEDA.ipynb` file to explore the data and insights.
The Prediction Modeling phase builds on the insights gained from the EDA phase to develop machine learning models capable of predicting whether the S&P 500 will move up or down on the next day.
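
The next-day up/down target described above could be constructed roughly as follows. This is a minimal sketch under assumptions: the `Close`, `Tomorrow`, and `Target` column names, the synthetic prices, and the 80/20 chronological split are illustrative and not taken from the notebook.

```python
import numpy as np
import pandas as pd

# Synthetic closing prices standing in for the real S&P 500 data (assumption).
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-01", periods=300)
df = pd.DataFrame(
    {"Close": 4000 + np.cumsum(rng.normal(0, 20, len(dates)))}, index=dates
)

# Target is 1 when the next day's close is above today's close, otherwise 0.
df["Tomorrow"] = df["Close"].shift(-1)
df["Target"] = (df["Tomorrow"] > df["Close"]).astype(int)

# The last row has no known next day, so its label is undefined; drop it.
df = df.dropna(subset=["Tomorrow"])

# Chronological split: the test period lies strictly after the training
# period, so no future information leaks into training.
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

print(train["Target"].value_counts(normalize=True))
```

Splitting by position rather than shuffling keeps the evaluation closer to how the model would be used, since predictions are always made for dates after the training data.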
- Feature Engineering:
  - We create features such as RSI, MACD, moving averages, and other technical indicators to help the model learn the patterns in stock price movements.
- Machine Learning Models:
  - We train several models, including:
    - HistGradientBoostingClassifier
    - RandomForestClassifier
    - LogisticRegression
    - Support Vector Classifier (SVC)
- Model Tuning and Comparison:
  - The models are evaluated based on accuracy, precision, recall, and F1-score.
  - The best-performing model is selected after hyperparameter tuning using GridSearchCV.
- Evaluation:
  - The best model is evaluated on unseen test data, with detailed metrics including a confusion matrix, precision, recall, and F1-score.
- Open the Prediction Notebook: Open the `StockMarketPredictor.ipynb` file to view the prediction model steps and outputs.
- Run the Notebook: Run each cell step-by-step to train and evaluate the machine learning models on the S&P 500 data.
- Fork the repo.
- Create a new branch (`git checkout -b feature-branch-name`).
- Commit your changes (`git commit -m 'Add some feature'`).
- Push to the branch (`git push origin feature-branch-name`).
- Create a Pull Request.