According to our submission records, the performance of the model is as follows:
The code runs on Python 3.7 and requires the following packages:
pandas
numpy
matplotlib
sklearn
talib
xgboost
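For convenience, the dependency list above can be captured in a requirements file. This is an illustrative sketch, not shipped with the submission: versions are unpinned, and note that `sklearn` is installed under the name `scikit-learn`, while the `talib` Python wrapper (pip package `TA-Lib`) additionally needs the TA-Lib C library installed on the system first.

```
# requirements.txt (illustrative; versions not pinned in the original)
pandas
numpy
matplotlib
scikit-learn   # imported as sklearn
TA-Lib         # Python wrapper for talib; requires the TA-Lib C library
xgboost
```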
The code consists of 4 Python scripts; running main.py directly produces the test results, which are saved to the file "result_xgb_rf.csv".
In our codes, we train three prediction models for three time-horizons: 1-day, 20-day, and 60-day.
The 1-day prediction models are XGBoost models with different hyper-parameters for each metal.
The 20-day prediction models are either XGBoost or Random Forest, depending on the metal.
The 60-day prediction models are Random Forest models with different features and hyper-parameters.
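The horizon-to-model mapping above can be summarized in a small configuration table. The dictionary below is only a sketch of that mapping; the actual per-metal hyper-parameters live inside the scripts and are not reproduced here, so the field names are placeholders.

```python
# Illustrative mapping of prediction horizon (in days) to model family,
# mirroring the description above. Hyper-parameter details are placeholders.
HORIZON_MODELS = {
    1:  {"model": "XGBoost", "note": "different hyper-parameters per metal"},
    20: {"model": "XGBoost or RandomForest", "note": "chosen per metal"},
    60: {"model": "RandomForest", "note": "different features and hyper-parameters"},
}

def model_family(horizon_days):
    """Return the model family used for a given prediction horizon."""
    return HORIZON_MODELS[horizon_days]["model"]
```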
The major novelty of our method is making use of the predicted labels. For each time-horizon, we:
- First, extract useful but not leaky features from the competition dataset;
- Second, train a Random Forest on the training data, denoted $\sum_{t=\mathrm{window\_start}}^{\mathrm{window\_end}} \langle \mathrm{features}_t, \mathrm{label}_t \rangle$;
- Third, predict the label for validation day $T$, then the label for day $T+1$, and so on up to day $T+V$;
- Fourth, regard the predicted labels as real labels and retrain a Random Forest on the shifted training data $\sum_{t=\mathrm{window\_start}+V}^{\mathrm{window\_end}+V} \langle \mathrm{features}_t, \mathrm{label}_t \rangle$;
- Fifth, repeat the third and fourth steps until the last day of the validation data.
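The rolling pseudo-label loop above can be sketched in a few lines of Python. This is a minimal stand-alone illustration, not the submission code: a trivial mean-of-labels "model" stands in for the Random Forest so the sketch runs without scikit-learn, and the `window` and `V` values are arbitrary.

```python
# Sketch of the rolling pseudo-label retraining loop (steps 2-5).
# A mean-of-window-labels model stands in for the Random Forest.

def train(pairs):
    """'Fit' a model on <feature, label> pairs: here, just the mean label."""
    labels = [y for _, y in pairs]
    return sum(labels) / len(labels)

def rolling_pseudo_label(features, labels, window, V):
    """features covers training *and* validation days; labels covers only
    the training days. Returns predictions for every validation day."""
    data = list(zip(features[:len(labels)], labels))
    preds = []
    t = len(labels)                       # first validation day
    while t < len(features):
        model = train(data[-window:])     # steps 2/4: fit on current window
        block = features[t:t + V]
        block_preds = [model for _ in block]   # step 3: predict next V days
        data.extend(zip(block, block_preds))   # step 4: pseudo-labels join the window
        preds.extend(block_preds)
        t += V                            # step 5: slide forward by V days
    return preds
```

In the actual scripts, `train` fits a Random Forest on the window and `block_preds` comes from its `predict` call; the control flow is the same.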
On our computer (Intel(R) Core(TM) i7-7600U [email protected]), the full run takes about 2 minutes.
Please feel free to contact us if you have any trouble reproducing the results.
Contact Email: [email protected]