Stock Signal Prediction and Payoff Analysis

Overview

This Python script predicts stock buy signals and analyzes the resulting payoff using machine learning classifiers. It processes financial data to generate buy signals for securities, evaluates model performance, and visualizes the payoff over a specified test period. It uses NumPy and pandas for data manipulation, scikit-learn for scaling and cross-validation, CatBoost and XGBoost for classification, and Plotly for visualization.

Features

Data Preprocessing: Reads and preprocesses sector and financial data.
Feature Engineering: Calculates the Sharpe ratio as an additional feature.
Scaling: Normalizes features using Min-Max scaling.
Model Training: Trains a CatBoost classifier (with an optional XGBoost alternative) using stratified cross-validation.
Signal Generation: Predicts buy signals based on model probabilities.
Performance Evaluation: Calculates accuracy of predictions and tracks payoff.
Visualization: Generates interactive plots for signals and payoff using Plotly.
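
As a rough illustration of the feature engineering and scaling steps listed above, the sketch below computes a rolling Sharpe ratio per security and rescales a column to the range [0, 1]. The column names (date, security, return1), the 30-row window, the zero risk-free rate, and the helper names add_sharpe_ratio and min_max_scale are assumptions made for illustration; the script's own definitions may differ.

```python
import numpy as np
import pandas as pd


def add_sharpe_ratio(df, return_col="return1", window=30):
    """Add a rolling Sharpe ratio (mean / std of returns) per security.

    Assumes a zero risk-free rate and applies no annualization.
    """
    out = df.sort_values(["security", "date"]).copy()
    grouped = out.groupby("security")[return_col]
    rolling_mean = grouped.transform(lambda s: s.rolling(window).mean())
    rolling_std = grouped.transform(lambda s: s.rolling(window).std())
    # A zero standard deviation would divide by zero, so treat it as missing.
    out["sharpe"] = rolling_mean / rolling_std.replace(0.0, np.nan)
    return out


def min_max_scale(df, columns):
    """Rescale each listed column to [0, 1] using its own min and max."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column has no range; map it to 0.0 instead of dividing by zero.
        out[col] = (out[col] - lo) / span if span != 0 else 0.0
    return out
```

In a train/test setup, the minimum and maximum would normally be taken from the training period only and then applied to the test period, so that no information from the test dates leaks into the features.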

Installation

Prerequisites

Python 3.7 or higher

Clone the Repository

git clone https://github.com/yourusername/stock-signal-prediction.git
cd stock-signal-prediction

Install Dependencies

It's recommended to use a virtual environment.

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows
venv\Scripts\activate

# On Unix or macOS
source venv/bin/activate

# Install required packages
pip install -r requirements.txt

Alternatively, install packages manually:

pip install numpy pandas plotly scikit-learn catboost xgboost

Data Requirements

The script requires the following CSV data files placed in a data/ directory:

Sector Data: data/data0.csv
- Contains sector information for securities.
Financial Data: data/data1.csv
- Should include columns such as date, security, price, return30, and any other relevant features.
Returns Data: data/returns.csv
- Contains historical return data for securities with columns like date, security, and return1.

Ensure that dates are in a consistent format and that all required columns are present.
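
One way to check these requirements before running the model is sketched below. The helper name load_table and the REQUIRED_COLUMNS mapping are hypothetical; the column lists come from the descriptions above, and the sector file is left out because its columns are not specified here. The example reads from an in-memory string so it runs without the data/ directory.

```python
import io

import pandas as pd

# Columns named in the data requirements above.
REQUIRED_COLUMNS = {
    "financial": ["date", "security", "price", "return30"],
    "returns": ["date", "security", "return1"],
}


def load_table(source, kind):
    """Read a CSV, verify the required columns, and parse the date column."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED_COLUMNS[kind] if c not in df.columns]
    if missing:
        raise ValueError(f"{kind} data is missing columns: {missing}")
    # pd.to_datetime raises on malformed dates by default, which surfaces
    # inconsistent formats early.
    df["date"] = pd.to_datetime(df["date"])
    return df.sort_values(["date", "security"]).reset_index(drop=True)


# In-memory stand-in for data/returns.csv.
sample = io.StringIO(
    "date,security,return1\n"
    "2024-01-02,AAPL,0.01\n"
    "2024-01-01,AAPL,-0.02\n"
)
returns = load_table(sample, "returns")
print(returns)
```

With real files, the call would look like load_table("data/returns.csv", "returns").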

Parameters

The script includes several configurable parameters:

Training and Testing Periods
- start_train: Start date for training data (e.g., 2017-01-01)
- end_train: End date for training data (e.g., 2023-11-30)
- start_test: Start date for testing data (e.g., 2024-01-01)
- end_test: End date for testing data (e.g., 2024-06-30)
Modeling Parameters
- n_buys: Number of top buy signals to select (default: 10)
- verbose: Toggle for detailed logging (default: True)
- retrain_interval: Frequency of model retraining in steps (default: 300)

These parameters can be adjusted within the script to suit different analysis periods and modeling preferences.
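
To make n_buys and retrain_interval more concrete, here is a minimal sketch of how they might be applied. It assumes a table of predicted probabilities with columns date, security, and prob, and it interprets a "step" as one iteration of the loop over test dates; both are assumptions, and select_buys and should_retrain are hypothetical helper names rather than functions from the script.

```python
import pandas as pd


def select_buys(probabilities, n_buys=10):
    """Flag the n_buys securities with the highest probability on each date."""
    out = probabilities.copy()
    # method="first" breaks ties by row order so exactly n_buys rows are flagged.
    rank = out.groupby("date")["prob"].rank(method="first", ascending=False)
    out["signal"] = (rank <= n_buys).astype(int)
    return out


def should_retrain(step, retrain_interval=300):
    """Return True on step 0 and on every retrain_interval-th step after it."""
    return step % retrain_interval == 0


# Hypothetical tickers and probabilities, for illustration only.
probs = pd.DataFrame({
    "date": ["2024-01-02"] * 3 + ["2024-01-03"] * 3,
    "security": ["AAA", "BBB", "CCC"] * 2,
    "prob": [0.9, 0.2, 0.6, 0.1, 0.8, 0.7],
})
buys = select_buys(probs, n_buys=2)
print(buys)
```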

Usage

Prepare Data: Ensure all required CSV files are in the data/ directory.
Configure Parameters: Modify the script parameters as needed for your analysis.
Run the Script:
```
python stock_signal_prediction.py
```
Replace stock_signal_prediction.py with the actual script filename (t6.py in this repository's file listing).
View Outputs: Interactive plots will display automatically. Logs will be printed to the console.

Output

The script generates the following outputs:

Accuracy Metrics: Displays the accuracy of predictions for each test date.
Buy Signals Plots:
- Line plot for individual securities (e.g., AAPL).
- Heatmap showing buy signals across securities and dates.
Payoff Plot: Cumulative payoff over the testing period.
Payoff Summary: Prints the total payoff percentage for the selected buy signals.

All plots are interactive and rendered with Plotly.
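
The accuracy and payoff figures could be computed along the following lines. This is a sketch under stated assumptions, not the script's actual calculation: it assumes return1 is a simple per-date return, that the selected securities are equally weighted, that payoff compounds across dates, and that a label column holds the true outcome. The function names and sample values are hypothetical.

```python
import pandas as pd


def accuracy_by_date(df):
    """Share of rows per date where the predicted signal equals the label."""
    return (df["signal"] == df["label"]).groupby(df["date"]).mean()


def cumulative_payoff(signals, returns):
    """Equal-weight the flagged securities on each date and compound the result."""
    merged = signals.merge(returns, on=["date", "security"], how="inner")
    picked = merged[merged["signal"] == 1]
    daily = picked.groupby("date")["return1"].mean().sort_index()
    return (1.0 + daily).cumprod() - 1.0


# Hypothetical sample data: two dates, two securities.
dates = ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"]
names = ["AAA", "BBB", "AAA", "BBB"]
signals = pd.DataFrame({
    "date": dates,
    "security": names,
    "signal": [1, 0, 0, 1],
    "label": [1, 0, 1, 1],
})
returns = pd.DataFrame({
    "date": dates,
    "security": names,
    "return1": [0.10, -0.20, 0.50, 0.10],
})

accuracy = accuracy_by_date(signals)
payoff = cumulative_payoff(signals, returns)
print(accuracy)
print(f"Total payoff: {payoff.iloc[-1]:.2%}")
```

The payoff series is indexed by date, so it can be passed directly to a Plotly line chart for the cumulative payoff plot.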

Models

The script employs the following machine learning classifiers:

CatBoostClassifier
- Handles categorical features effectively.
- Configured with iterations=100 and random_seed=23.
XGBClassifier from XGBoost (Commented Out)
- Can be enabled by uncommenting the relevant lines.
- Configured with n_estimators=100, learning_rate=0.1, random_state=42, and verbosity=0.

Models are trained using Stratified K-Fold cross-validation with n_splits=5, which preserves the class proportions in each fold and gives a steadier estimate of out-of-sample accuracy.
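
The cross-validation procedure can be sketched as follows. To keep the example dependent only on scikit-learn, it uses GradientBoostingClassifier as a stand-in for the CatBoost and XGBoost models, and synthetic data in place of the real feature matrix. The helper name cross_validate, the shuffling, and the fold seed are assumptions, not settings taken from the script.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold


def cross_validate(model_factory, X, y, n_splits=5, seed=23):
    """Return per-fold accuracy using stratified folds."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, valid_idx in folds.split(X, y):
        # Build a fresh model per fold so no state carries over between folds.
        model = model_factory()
        model.fit(X[train_idx], y[train_idx])
        predictions = model.predict(X[valid_idx])
        scores.append(accuracy_score(y[valid_idx], predictions))
    return np.array(scores)


# Synthetic data stands in for the real feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
scores = cross_validate(
    lambda: GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, random_state=42
    ),
    X,
    y,
)
print(scores.mean())
```

With CatBoost installed, the factory could instead be lambda: CatBoostClassifier(iterations=100, random_seed=23, verbose=0), matching the configuration described above.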

Logging

The script includes a custom logging function to timestamp messages:

import datetime

def log(message):
    print(f'{datetime.datetime.now()} - {message}')

Key milestones and actions are logged to the console, including script start and end times, data setup, processing steps, and performance metrics.

License

This project is licensed under the MIT License.

Acknowledgments

Libraries:
- NumPy
- Pandas
- Plotly
- Scikit-learn
- CatBoost
- XGBoost
Community: Thanks to the open-source community for providing powerful tools and resources that make projects like this possible.

For any questions or contributions, please open an issue or submit a pull request on the GitHub repository.
