Sliding Window Inter-extrapolative Regression model for time-series forecasting.
You may ask if that's too long a name, or if it even means much. Same.
This algorithm takes multiple regressive looks at the same set of data, each over a different timeframe, to generate the most accurate linear outlook for a given period. In other words, it is a regressive analog to LSTM (long short-term memory) models.
The algorithm will be implemented first in Python and later in C++, though the C++ version may not contain files for data manipulation.
Often in the real world, it benefits data analysts to consider long-term trends as much as short-term trends. In the case of stocks, for example, finding a balance in how much to weigh the long-term growth potential of a given stock versus its more recent performance can be challenging for humans to do quickly and with high accuracy.
Moreover, the balance reached for one forecasting period may differ from the balance that is optimal for another. In other words, an analyst trying to predict a stock's performance over the next three months (or at least as well as a linear model could manage) may choose to weigh the stock's recent performance more heavily. In contrast, an analyst looking to invest in a stock long-term may choose to weigh long-term trends more heavily.
With this context in mind, the need becomes clear: conduct multiple regressive looks at the same data, each successive model examining a smaller and smaller chunk.
For example, consider a company like Apple (AAPL). With data on Apple's stock performance over decades, we could train a linear model on its long-term performance from its introduction into the stock market until today. A second "regressive look" (as I'm calling it) might examine a smaller chunk of data, say the performance over the last 10 years. Then another model for the last 5 years, 2 years, 1 year, 6 months, 3 months, 1 month, etc.
Timeframes will vary, of course, and users of the algorithm will have to specify their timeframes manually.
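These nested regressive looks can be sketched in a few lines. The function name `regressive_looks`, the trading-day window sizes, and the synthetic price series below are all illustrative assumptions, not part of the repository:

```python
import numpy as np

def regressive_looks(prices, window_sizes):
    """Fit one linear regression per trailing window and return the slopes.

    `prices` is a 1-D array of closing prices, oldest first; `window_sizes`
    is a list of trailing-window lengths in samples (e.g. trading days).
    Both names are illustrative, not a published API.
    """
    slopes = {}
    for w in window_sizes:
        window = prices[-w:]                 # trailing chunk of the data
        t = np.arange(len(window))           # time axis for the fit
        m, b = np.polyfit(t, window, deg=1)  # least-squares line y = m*t + b
        slopes[w] = m
    return slopes

# Example: a synthetic upward-drifting series, roughly 10 trading years long
rng = np.random.default_rng(0)
prices = np.cumsum(rng.normal(0.1, 1.0, size=2520)) + 100
print(regressive_looks(prices, [2520, 1260, 504, 252, 63, 21]))
```

Each call returns one slope per timeframe; those slopes are the raw material the weighting step below operates on.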
The end product should be a line starting at the current price of the stock at today's date and projecting outward in a direction that weighs all the slopes of previous lines.
The equation of a line is y = mx + b; each regressive look produces a slope m that summarizes the trend over its timeframe.
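As a toy illustration, the end product amounts to extending a line from the current price with a blended slope. The helper `project` and its 20/30/50 split are hypothetical; the actual weighting scheme is what the rest of this section works out:

```python
def project(current_price, slopes, weights, horizon):
    """Project `horizon` steps ahead along a line whose slope is the
    weighted average of the per-timeframe slopes. All inputs here are
    illustrative placeholders."""
    m = sum(w * s for w, s in zip(weights, slopes))  # blended slope
    return current_price + m * horizon

# e.g. long-, mid-, and short-term slopes blended 20/30/50, 63 days out
print(project(100.0, [0.05, 0.10, 0.20], [0.2, 0.3, 0.5], horizon=63))
```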
Once these slopes are calculated for all the timeframes being considered, we can start the unique weighting process. Because each timeframe contributes only a single slope, we lack the data to learn this weighting regressively. Instead, the algorithm opts for a more statistical approach that depends on the timeframe to be predicted.
What we can assume is that behavior in the hyper-short term is far less useful than data in the hyper-long term, while short-term data will still strongly influence future behavior. This means we can distribute the weight with a right skew toward longer timeframes while keeping shorter terms relevant (to an extent). Where to center the distribution, though? Here the algorithm is not entirely based on mathematical truths: for the center, we (somewhat arbitrarily, but on a reasonable assumption) pick the timeframe that most closely resembles the timeframe we are trying to predict.
For example, if we want to predict the price of a stock 3 months from now, the algorithm would give the highest weight to the performance calculated by the regressive look at the past 3 months. The second highest would be the past 6 months, then the past month, the past year, and so on.
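One way to realize such a horizon-centered, right-skewed weighting is sketched below. The log-distance decay and the `skew` factor are assumptions of this sketch, not a scheme the algorithm prescribes:

```python
import numpy as np

def timeframe_weights(window_sizes, horizon, skew=0.2):
    """Illustrative weighting: peak at the window closest to the prediction
    horizon, decaying with log-distance, with a mild right skew that favors
    longer windows. The decay and `skew` are assumptions, not the
    algorithm's specified scheme."""
    windows = np.asarray(window_sizes, dtype=float)
    # log-distance, so 1 vs 3 months is treated like 1 vs 3 years
    dist = np.abs(np.log(windows) - np.log(horizon))
    weights = np.exp(-dist)
    weights[windows > horizon] *= 1.0 + skew  # right skew toward longer windows
    return weights / weights.sum()            # normalize to sum to 1

def combined_slope(slopes_by_window, horizon):
    """Blend per-timeframe slopes into a single projection slope."""
    windows = sorted(slopes_by_window)
    w = timeframe_weights(windows, horizon)
    return float(np.dot(w, [slopes_by_window[k] for k in windows]))

# Predicting ~3 months (63 trading days) ahead: the 63-day window gets the
# largest weight, followed by 6 months, 1 month, 1 year, and so on.
w = timeframe_weights([21, 63, 126, 252, 504, 2520], horizon=63)
```

With these particular constants, the resulting ordering matches the 3-month example above (3 months, then 6 months, then 1 month, then 1 year).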
With prices varying wildly across years, it's important not only to normalize stock data for performance but also to adjust prices for inflation so that values from different eras are comparable.
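A minimal sketch of the inflation step, assuming the user can supply a consumer-price-index series aligned with the prices (the helper `to_real_prices` is illustrative, not a tool the repository ships):

```python
import numpy as np

def to_real_prices(prices, cpi, base_index=-1):
    """Express nominal prices in the purchasing power of one reference date.

    `cpi` is a consumer-price-index series aligned 1:1 with `prices`; where
    the CPI data comes from is left to the user. Name and signature are
    illustrative assumptions."""
    prices = np.asarray(prices, dtype=float)
    cpi = np.asarray(cpi, dtype=float)
    # scale every price by (reference-date CPI / CPI at that date)
    return prices * (cpi[base_index] / cpi)

# Example: $100 when CPI was 200 has the buying power of $150 at CPI 300
print(to_real_prices([100.0, 150.0], [200.0, 300.0]))  # → [150. 150.]
```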
This repository will include the necessary tools for normalizing and working with the data. In the future, support may be added for automatically scraping stock (or other) data or potentially incorporating an API to avoid the task of scraping.