This project is a manual implementation of the K-Means clustering algorithm, applied to stock data. I built it as personal research, to gain a deeper understanding of clustering techniques and their applications in financial data analysis.
K-Means is a popular unsupervised learning method that partitions data into distinct groups based on their features. In this project, I implemented the algorithm from scratch, without relying on a pre-built clustering library, to better understand its inner workings and nuances.
- Interactive Stock Selection:
  - Users can select multiple stocks from the NIFTY 50 index and add custom stocks for analysis
  - Flexible timeframe selection to analyze different time periods
  - Ability to specify the number of clusters (k) for the analysis
- Real-time Clustering:
  - Dynamic clustering of selected stocks based on user parameters
  - Interactive visualization showing cluster assignments and centroids
- Data Analysis Tools:
  - Detailed view of financial metrics for each stock
  - Cluster-wise analysis showing common characteristics
  - Visual tracking of clustering iterations and convergence
- Data Collection:
  - Historical stock price data was collected for NIFTY 50 stocks using the `yfinance` library.
  - The data includes daily closing prices, which were used to calculate various financial metrics.
- Data Preprocessing:
  - The collected data was cleaned and preprocessed to handle missing values and normalize the features.
  - The following features were derived from the data for each stock: Mean Returns, Volatility, and Sharpe Ratio.
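
The feature derivation above can be sketched in plain Python. This is a minimal illustration, not the project's actual code. The README does not specify several choices, so the following are assumptions: simple (not log) daily returns, annualization with 252 trading days, a risk-free rate of zero, dropping missing prices before computing returns, and z-score normalization. The function names are hypothetical.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor, not stated in the README


def daily_returns(closes):
    """Simple daily returns from closing prices, dropping missing values first."""
    clean = [p for p in closes if p is not None and not math.isnan(p)]
    return [curr / prev - 1.0 for prev, curr in zip(clean, clean[1:])]


def stock_features(closes, risk_free_rate=0.0):
    """Return (mean return, volatility, Sharpe ratio), all annualized."""
    rets = daily_returns(closes)
    mean_ret = statistics.fmean(rets) * TRADING_DAYS
    vol = statistics.stdev(rets) * math.sqrt(TRADING_DAYS)
    sharpe = (mean_ret - risk_free_rate) / vol if vol > 0 else 0.0
    return mean_ret, vol, sharpe


def zscore_columns(rows):
    """Standardize each feature column to mean 0 and population std 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up prices for illustration only; real data would come from yfinance.
prices = {
    "STOCK_A": [100.0, 101.0, 99.5, 102.0, 103.5],
    "STOCK_B": [50.0, 50.5, None, 49.0, 51.0],
    "STOCK_C": [200.0, 198.0, 201.0, 205.0, 204.0],
}
features = [list(stock_features(p)) for p in prices.values()]
scaled = zscore_columns(features)
```

Normalizing matters here because K-Means relies on Euclidean distance, so a feature with a larger numeric range would otherwise dominate the cluster assignments.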
- Manual K-Means Implementation:
  - The centroids were initialized by randomly selecting k of the data points.
  - The algorithm then iteratively assigned each data point to the nearest centroid and updated each centroid to the mean of its assigned points.
  - The process was repeated until the centroids stabilized or a maximum number of iterations was reached.
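
The three steps above (random initialization, assignment, update) can be sketched with only the standard library. This is an illustrative version under stated assumptions, not the code in `KMeans.ipynb`: the function name, the `tol` threshold used to decide that centroids have stabilized, the `seed` parameter, and the handling of empty clusters are all choices made for this example.

```python
import math
import random


def kmeans(points, k, max_iters=100, tol=1e-6, seed=None):
    """Cluster `points` (a list of equal-length numeric lists) into k groups.

    Returns (labels, centroids, history), where history holds the centroids
    after every iteration so convergence can be inspected or plotted.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialization: pick k distinct data points as the starting centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    history = []
    labels = [0] * len(points)

    for _ in range(max_iters):
        # Assignment step: label each point with its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Empty cluster: keep the old centroid (one of several options).
                new_centroids.append(centroids[j])
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        history.append(centroids)
        if shift < tol:  # centroids have stabilized
            break
    return labels, centroids, history


# Two well-separated toy groups; in the project each row would be one stock's
# normalized feature vector.
data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centroids, history = kmeans(data, k=2, seed=0)
```

Because the starting centroids are random, different seeds can produce different clusterings on less clearly separated data. Running the algorithm several times and keeping the result with the lowest within-cluster distance is a common remedy.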
- Clone the repository.
- Install the required packages using `pip install -r requirements.txt`.
- Run the Jupyter notebook `KMeans.ipynb` to see the implementation and results.
- Launch the Streamlit application using the command `streamlit run KMeans.py` to interact with the clustering results.