The data consists of daily steel prices in INR (Indian rupees) from 2015-01-01 to 2023-12-26, obtained from Source of data. It can be downloaded from the source or from this repository in the file "Steel Futures Historical Data".
The data has 7 variables:
- Date
- Price
- Open (Opening price)
- High (Highest price during that day)
- Low (Lowest price during that day)
- Vol. (Volume sold)
- Change % (Price change)
For this project the focus will be on the Price variable given Date.
Example of data:
Date | Price | Open | High | Low | Vol. | Change % |
---|---|---|---|---|---|---|
2023-01-03 | 39000 | 41000 | 41000 | 38000 | 0.43K | -0.23% |
2023-01-02 | 41000 | 40000 | 42500 | 39700 | 0.09K | +0.25% |
2023-01-01 | 40000 | 39000 | 41000 | 38500 | 0.23K | +0.13% |
The data is in fairly good shape from the start, but a few steps are needed before it can be used for forecasting:
- Remove unnecessary variables
- Set Date as index
- Change values of Price, Open, ... from String to Float
- Handle missing values
STEP 1:
Vol. was the only variable that contained NaN values. It was also not needed for the project, so it was removed to simplify the data.
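As a minimal sketch of this step, assuming the raw file has been loaded into a pandas DataFrame, the missing values can be counted per column before dropping Vol. The small frame below is hypothetical sample data, not rows from the actual file.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the raw columns; one Vol. entry is missing.
df = pd.DataFrame({
    "Date": ["03/01/2023", "02/01/2023", "01/01/2023"],
    "Price": ["39,000.00", "41,000.00", "40,000.00"],
    "Vol.": ["0.43K", np.nan, "0.23K"],
})

# Count missing values per column to confirm which columns are affected.
missing_per_column = df.isna().sum()

# Drop the column that is not needed for forecasting.
df = df.drop(columns=["Vol."])
```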
STEP 2:
Date was set as the index to simplify the code. The date strings were first parsed with the pandas.to_datetime() function, which turns "26/12/2023" into "2023-12-26".
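A short sketch of the date conversion, assuming the raw dates are day-first strings as in the example above. The explicit `format` argument and the final `sort_index()` call are assumptions added for illustration, and the prices are hypothetical.

```python
import pandas as pd

# Hypothetical rows with day-first date strings, newest first.
df = pd.DataFrame({
    "Date": ["26/12/2023", "25/12/2023"],
    "Price": [41000.0, 40500.0],
})

# An explicit format avoids day/month ambiguity when parsing.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Use the parsed dates as the index and order them chronologically.
df = df.set_index("Date").sort_index()
```

Sorting the index keeps later date-based slicing and chronological splits straightforward.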
STEP 3:
The numerical values of the variables [Price, Open, High, Low] were stored as strings such as "41,000.00", which needed to be converted to 41000.00. This was done by removing the thousands separator "," and then converting the value from String to Float.
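The conversion could look roughly like the sketch below, assuming the four columns hold strings with a comma as thousands separator. The values are hypothetical.

```python
import pandas as pd

# Hypothetical string-valued price columns.
df = pd.DataFrame({
    "Price": ["41,000.00", "39,500.50"],
    "Open": ["40,000.00", "41,000.00"],
    "High": ["42,500.00", "41,000.00"],
    "Low": ["39,700.00", "38,000.00"],
})

price_columns = ["Price", "Open", "High", "Low"]
for column in price_columns:
    # Remove the literal "," and convert the cleaned strings to floats.
    df[column] = df[column].str.replace(",", "", regex=False).astype(float)
```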
STEP 4:
The data has missing values between 2017 and 2021, as seen in the Graph of Price to Date, so the data was truncated to start from the first value in 2021. The new graph can be seen below.
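With a sorted DatetimeIndex, the truncation can be sketched with a partial-string slice. The series below is hypothetical and only illustrates the slicing.

```python
import pandas as pd

# Hypothetical price series with a gap before 2021.
index = pd.to_datetime(["2016-12-30", "2021-01-04", "2021-01-05", "2023-12-26"])
prices = pd.Series([30000.0, 38000.0, 38500.0, 41000.0], index=index, name="Price")

# Keep everything from the first available observation in 2021 onwards.
recent = prices.loc["2021":]
```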
I began by fitting different polynomials to the whole data, using the index as the input, to see which polynomial would fit best and what parameter values it would give. Then I moved on to using ARMA to find the best parameters for this situation.
Five lines were fitted to the data using polynomials and linear regression. The polynomials follow equations of the form ax+b, ax^2+bx+c and so on.
From the graph it can be seen that the polynomials of degree 3 and 4 fit best. Since they are very close to each other, it is best to choose the degree 3 polynomial because it is simpler. This polynomial follows the equation ax^3+bx^2+cx+d, where
a = 0.000101
b = -0.212
c = 96.258
d = 38903.300
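The sketch below evaluates the cubic with the coefficients listed above and shows one way such a comparison of degrees could be run with NumPy. The use of `np.polyfit`, the degrees 1 to 5, and the synthetic data are assumptions for illustration; the project's actual fitting code may differ.

```python
import numpy as np

# Coefficients reported above, highest degree first: a, b, c, d.
coefficients = [0.000101, -0.212, 96.258, 38903.300]

def cubic_trend(x):
    """Evaluate a*x^3 + b*x^2 + c*x + d at integer index position(s) x."""
    return np.polyval(coefficients, x)

# Synthetic stand-in for the price series: the cubic plus random noise.
rng = np.random.default_rng(0)
x = np.arange(200)
y = cubic_trend(x) + rng.normal(0, 50, size=x.size)

# Fit polynomials of degree 1..5 and record the in-sample RMSE of each.
rmse_by_degree = {}
for degree in range(1, 6):
    fitted = np.polyval(np.polyfit(x, y, degree), x)
    rmse_by_degree[degree] = float(np.sqrt(np.mean((y - fitted) ** 2)))
```

In-sample error can only shrink as the degree grows, which is why a simpler polynomial is preferred when two degrees fit about equally well.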
The data was split into training and test sets as shown in the picture below. This way the model can be trained on one part and tested on the other, and the best values for ARMA can be found by calculating the RMSE.
Data split into training and test sets
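A minimal sketch of a chronological split and the RMSE calculation. The 80/20 ratio, the helper names, and the sample values are assumptions; the split ratio used in the project is not stated.

```python
import numpy as np

def chronological_split(values, train_fraction=0.8):
    """Split a time-ordered sequence without shuffling: oldest part is training."""
    split_at = int(len(values) * train_fraction)
    return values[:split_at], values[split_at:]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Hypothetical prices in chronological order.
prices = np.array([40000.0, 41000.0, 39000.0, 39500.0, 40500.0])
train, test = chronological_split(prices, train_fraction=0.8)

# Naive baseline: repeat the last training value for every test step.
naive_forecast = np.full(len(test), train[-1])
baseline_rmse = rmse(test, naive_forecast)
```

A naive baseline like this gives a reference RMSE that a fitted model should beat.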
The first ARMA model was fitted with parameters (1,0,1), which gave an RMSE of 1532.58. How this fits the data is shown in the graph below.
Default parameters (1,0,1) as prediction
Then each parameter was tested over the range [0,5], and the lowest RMSE of 795.06 was found with parameters (1,3,3). How this fits the data is shown below.
Optimized parameters (1,3,3) as prediction
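The parameter search can be sketched as an exhaustive loop over all (p, d, q) combinations. This sketch assumes the statsmodels `ARIMA` class, which reduces to ARMA when the middle parameter is 0; the helper names `arima_forecast` and `grid_search` are hypothetical and not taken from the project code.

```python
import itertools
import math

def arima_forecast(train, steps, order):
    """Fit an ARIMA model of the given (p, d, q) order and forecast `steps` ahead."""
    # Imported lazily so the search helper also works with other forecasters.
    from statsmodels.tsa.arima.model import ARIMA
    fitted = ARIMA(train, order=order).fit()
    return fitted.forecast(steps=steps)

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def grid_search(train, test, forecast_fn=arima_forecast, max_value=5):
    """Try every (p, d, q) with each value in 0..max_value; return the best order and RMSE."""
    best_order, best_rmse = None, math.inf
    for order in itertools.product(range(max_value + 1), repeat=3):
        try:
            predicted = forecast_fn(train, len(test), order)
        except Exception:
            # Some orders may fail to fit; skip them and continue the search.
            continue
        score = rmse(test, predicted)
        if score < best_rmse:
            best_order, best_rmse = order, score
    return best_order, best_rmse
```

Calling `grid_search(train, test)` with the default `max_value=5` fits 216 models, so it can take a while on longer series. Because the best order is chosen on the test set, the resulting RMSE is likely an optimistic estimate of performance on new data.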
The project was a good introduction to ARMA as a forecasting method, but in future projects I will take a closer look at other methods such as SARIMA, which should take seasonal changes in the data into account.