noncritical bugfix for tssr before new tournament starts
BorisVSchmid committed Dec 28, 2023
1 parent 8f888e0 commit cff7c9c
Showing 2 changed files with 34 additions and 27 deletions.
12 changes: 12 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,15 @@
### 2023-12-28: V2.0.4 [BUGFIX].

I was taking the Acf1 of the mean of the list of models you have, when it should have been the Acf1 of the time series of the individual models. Fixed now. Also replaced R's Acf1 with a custom one that ignores any pair of sequential values where either one is NA. The standard Acf in R does weird things with missing data.
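The custom NA-aware Acf1 described above can be sketched as follows. This is an illustrative Python translation of the idea (Vlad itself is written in R, and the function name here is made up):

```python
import numpy as np

def acf1_na_aware(x):
    """Lag-1 autocorrelation that drops every consecutive pair
    (x[t], x[t+1]) in which either value is NaN, rather than
    imputing values or erroring out."""
    x = np.asarray(x, dtype=float)
    a, b = x[:-1], x[1:]
    keep = ~(np.isnan(a) | np.isnan(b))
    a, b = a[keep], b[keep]
    if len(a) < 2:
        return float("nan")
    # Pearson correlation over the surviving pairs. Note that R's
    # standard acf() normalizes by the overall mean and variance of
    # the full series, so the two can differ even without NAs.
    return float(np.corrcoef(a, b)[0, 1])
```

On `[1.0, 2.0, nan, 4.0, 5.0, 6.0]`, for example, the pairs `(2.0, nan)` and `(nan, 4.0)` are simply ignored.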

Also replaced the benchmark-model-based thresholds for what counts as a good model with a simpler rule: consider any model with a 0.5xCORR+2xMMC score > 0.005. Selecting models on a benchmark-based threshold was discarding quite a lot of models a priori, and I would rather have the portfolio optimizer decide on that.

### 2023-12-21: V2.0.3 [WEIGHTS UPDATE].

Numer.ai changed its opinion twice, moving to 0.5xCORR+3xMMC weights and then back to 0.5xCORR+2xMMC weights. They also fixed a scoring bug, so I had to re-run the optimizer.

Noticed a bug in the acf1 calculations. I will fix that in the next version, as it is not critical.

### 2023-11-27: V2.0.2 [WEIGHTS UPDATE].

Numer.ai published the multipliers they are going to use - it is 0.5xCORR + 2xMMC. Vlad was ready for the change.
49 changes: 22 additions & 27 deletions README.md
@@ -7,62 +7,57 @@ For a python alternative, see [numerai-portfolio-opt](https://github.com/eses-wk

Vlad should only consider portfolios with an average positive return. This is because I noticed that
the tangency portfolio selection can be a bit wonky otherwise, and sometimes suggests portfolios
with a negative return based on a single model.

By default, Vlad filters out models with a 0.5xCORR+2xMMC score below 0.005, and models with fewer than 60 datapoints.
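For concreteness, the default filter amounts to something like this. A hedged Python sketch only, assuming the 0.005 cutoff applies to the mean score; the actual R implementation may differ in the details:

```python
def passes_default_filter(scores, min_mean=0.005, min_points=60):
    """Keep a model only if it has at least `min_points` scored rounds
    and its mean 0.5xCORR+2xMMC score exceeds `min_mean`.
    Illustrative only: the function name and the use of the mean
    are assumptions, not Vlad's actual code."""
    scores = [s for s in scores if s is not None]  # drop unscored rounds
    if len(scores) < min_points:
        return False
    return sum(scores) / len(scores) > min_mean
```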

Particularly when you are just starting to use Vlad, use the model-performances plot to see if
the models selected by Vlad make sense to you. Also check how drastically the recommendations change
from one Vlad run to the next. In my experience, certain filtering settings (like the 0.005 score
cutoff) help to stabilize the recommendations made by Vlad.

## Overview

Vlad helps you decide your stake weights in Numer.ai. Use at your own risk. I have been using this script myself for a while now, but no guarantees are provided.

## How to use:

1. Change the models in the Optimize-Me-21dec2023.xlsx file to your models or models you bought on NumerBay.ai.
2. Run the R script optimize_stake.R (`Rscript.exe optimize_stake.R`, or run the script in RStudio).
3. Set the amount of NMR you want to stake on line 35 in the script.
4. Inspect the image and the two tables that the script spits out, and consider if these stake weights make sense to you.

## Under the hood:

Vlad downloads the end-of-round performances of your models and combines them to create a better portfolio (high return, low volatility). It will

* Generate a mean portfolio from 80 portfolios built from resampled model scores, to reduce the impact of the tournament's famously noisy nature. Half of these portfolios optimize for tangency and half for minvariance.
* Zero out stakes with less than 10% contribution (the threshold set on line 95) to avoid spreading tiny stakes over a long list of models. You can set this threshold to other values.
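The zero-out step can be sketched like this (an illustrative Python version of the logic; the real implementation lives in the R script):

```python
def prune_small_stakes(weights, threshold=0.10):
    """Zero out weights below `threshold` and renormalize the rest,
    so tiny stakes are not spread over a long list of models.
    Sketch only: function and argument names are hypothetical."""
    kept = {name: w for name, w in weights.items() if w >= threshold}
    if not kept:
        return weights  # nothing survives the cut; leave untouched
    total = sum(kept.values())
    return {name: kept.get(name, 0.0) / total for name in weights}
```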

## How does the output look?

First, you will get an image that shows you how your models perform in terms of the mean and standard deviation of their _0.5xCORR+2xMMC_ score, as well as the associated max drawdown (redder dots have larger drawdowns) and autocorrelation- and samplesize-corrected sharpe (bigger dots have a higher tSSR).

![MMC mean and std of the models](model-performances.png "Model performances on correlation")

Second, your models will get grouped based on how complete their time series is. Did you add a bunch of models two months ago? Then two months ago is another 'starting point' for considering which models to stake on. Splitting the time series of all your models by starting point ensures that Vlad isn't disregarding the information of your older models just because your new models don't go back that far in time.
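The starting-point idea can be sketched minimally in Python (the data layout of model name to `{round: score}`, and both function names, are hypothetical):

```python
def starting_points(history):
    """Each distinct first-scored round across your models becomes a
    'starting point': a round from which a portfolio can be optimized
    using every model that has data from that round onward."""
    return sorted({min(rounds) for rounds in history.values()})

def models_available_from(history, start):
    """Models whose series reaches back to `start`, i.e. the candidate
    set for the portfolio optimized at that starting point."""
    return sorted(m for m, rounds in history.items() if min(rounds) <= start)
```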

You will get an intermediate table showing per starting point what your optimal portfolio is.

```
|name |weight |stake |mean |Cov |CVaR |VaR |samplesize |starting_round |
|:------------------|:------|:-----|:------|:------|:-----|:------|:----------|:--------------|
|V42_LGBM_CLAUDIA20 |0.272 |59 |0.0195 |0.0264 |0.026 |0.0221 |291 |339 |
|V42_LGBM_CT_BLEND |0.116 |25 | | | | | | |
|V42_LGBM_ROWAN20 |0.09 |19 | | | | | | |
|V42_LGBM_TEAGER20 |0.315 |68 | | | | | | |
|V42_LGBM_TEAGER60 |0.095 |20 | | | | | | |
|V4_LGBM_VICTOR20 |0.112 |24 | | | | | | |
| | | | | | | | | |
```

Finally, the last table merges the suggested stake distributions of your different starting points into a single stake distribution. In the case of the Numer.ai benchmark models, all 'good enough' models start at round 339, so there is only one starting point to consider, and the table is identical to the one above.
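One simple way to merge per-starting-point distributions is a plain average of the weights. This is an illustrative Python sketch only; Vlad's actual merge rule (in R) may weight the starting points differently:

```python
def merge_stake_distributions(distributions):
    """Average several {model: weight} distributions into one.
    Models missing from a distribution contribute weight 0 there.
    Plain averaging is an assumption, not Vlad's documented rule."""
    models = {m for dist in distributions for m in dist}
    n = len(distributions)
    return {m: sum(d.get(m, 0.0) for d in distributions) / n for m in models}
```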

```
|name | weight| stake|mean |Cov |CVaR |VaR |samplesize |
|:----------------|------:|-----:|:------|:------|:------|:------|:----------|
|INTEGRATION_TEST | 0.414| 59|0.0128 |0.0218 |0.0216 |0.0183 |273 |
|V4_LGBM_VICTOR20 | 0.459| 66| | | | | |
|V4_LGBM_VICTOR60 | 0.127| 18| | | | | |
```
The columns _mean_, _covariance_ and _samplesize_ should be fairly self-evident, but for _CVaR_ and _VaR_ I'll gladly let ChatGPT explain those below.

_Value-at-Risk (VaR)_: VaR is a statistical measure that estimates the maximum potential loss of an investment portfolio within a specified time horizon and at a given confidence level. For example, a 1-day 95% VaR of $1 million would imply that there is a 95% probability that the portfolio will not lose more than $1 million in value over a single day. VaR is widely used in risk management as it provides a simple way to quantify and communicate potential losses. However, it has its limitations, such as not providing information about the severity of losses beyond the given confidence level.
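For intuition, historical VaR and CVaR can be computed from a return series like this. A non-parametric Python sketch; the confidence level Vlad uses is not stated here, so 95% is an assumption:

```python
import numpy as np

def var_cvar(returns, confidence=0.95):
    """Historical VaR and CVaR of a return series, expressed as losses.
    VaR: the loss not exceeded with probability `confidence`.
    CVaR: the mean loss inside the worst (1 - confidence) tail, which
    is why CVaR is always at least as severe as VaR."""
    losses = -np.asarray(returns, dtype=float)
    var = float(np.quantile(losses, confidence))
    cvar = float(losses[losses >= var].mean())
    return var, cvar
```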
