-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect return values for factor_cumulative_returns? #374
Comments
The recent upgrade to Alphalens version 0.4.0 broke the pyfolio integration API. @jmccorriston, @dmichalowicz : I believe the old |
@luca-s I think that what if we change the call to Additionally, we should probably add a test for |
This would be the obvious solution, but if
if you are willing to sacrifice the non-daily returns and factor feature then I don't see any problem in rewriting |
@luca-s sorry for the radio silence. I'm circling back to this now.
Can you point me to where this is being handled in Regarding the period argument, my understanding is that pyfolio wants daily returns anyway (see here). Does it make sense to just require 1D returns as we did in other parts of Alphalens in the last update and then those can get passed along to pyfolio later? One thing that I'm not yet understanding is why the returns are being weighted by the factor values. My understanding is that Pyfolio does that later on anyway, so are the returns being weighted by factor values twice? Sorry for all the questions - I should have been looking at these methods in my last change, but I don't think they're covered by the unit tests so I missed them. |
The previous version of
My solution was to compute the cumulative returns with the former implementation of
Do you mean inside On a side note, please forgive me if I give the feeling of being pedantic but there are many subtleties in alphalens that are hard to grasp without knowing the development history. Please consider that I am not against any change, but I like to stress what are the features that are going to be lost for a specific change. |
@luca-s You have spent much more time than I have developing Alphalens. You have no need to apologize. I really appreciate you taking the time to walk me through these details. I recognize that I don't have the same knowledge of the development history, which is why I'm asking questions before starting on the work to change anything. :) I spent some time looking through pyfolio and learned a few things that I didn't know about previously. Most importantly, I didn't realize that pyfolio only required a portfolio returns series as required input. I also noticed that if daily positions are provided, the portfolio returns still need to be provided independently. Given the development history of pyfolio and zipline, that makes sense to me. Knowing this now, I understand why I also spent some time thinking about why there are so many questions for which you and I are proposing different solutions. I obviously can't speak for you, so I wanted to give you a bit more detail on how I'm thinking about Alphalens. In my mind, Alphalens is a factor analysis tool, not a portfolio simulation engine like Zipline. Due to the nature/frequency of the input data to Alphalens, we need to bake some assumptions into the analysis tools in order to produce useful results. For instance, I need to provide factor data and returns data when using Alphalens. The factor and returns data are both dated, but it's not entirely clear (or enforced) what those dates represent about the data. When I use Alphalens, I provide factor data from Pipeline and backshifted returns data from Pipeline. The dates on my factor data represent the pipeline simulation date used to get the factor values. The dates on my returns data actually line up such that the 1D return for simulation date N actually represents the relative change in price from the close of day N to the close of day N+1. While these are the interpretations that I have chosen for my data, I think there are other reasonable choices of input. In the old definition of
I stand by both of these concerns still. I completely understand why it's interesting to look at different holding periods in a factor analysis. However, in my mental model of the boundaries between Alphalens/Pyfolio/Zipline, that's a job for Zipline. This is where I think we disagree and I'm not entirely sure how we should come to consensus. One question that I've been wondering about is whether a similar workflow (i.e. simulating a non-1D holding period) can be achieved by transforming the Regarding the forward returns dates not matching the sell date, I wonder if it makes sense to add trading_calendars as a dependency of Alphalens. In particular, we could leverage the next_close method which would allow us to shift returns data forward by one trading day. I actually think that we could add a calendar argument to |
Hello @jmccorriston!
At the risk of inflaming disagreement (😇), I'd like to push back on this view. I think you appreciate that the holding period is an important consideration, but maybe I can convince you that it's an important research consideration, and shouldn't be completely delegated to Zipline. I can think of at least two reasons:
I'm sure there are other reasons too (I'm not a seasoned alpha researcher), but hopefully this is convincing enough. I think the holding period is a valid object of alpha research, and while I could run a backtest to test my hypotheses, Alphalens ought to be able to answer my research questions. So, with respect to your two concerns:
I completely agree with both of them (and have had these concerns for at least a year), but I think my take would be:
I know I'm not addressing everything in this conversation (i.e. what functions to change, what dependencies to add), but I'm pretty sure consensus is easier to reach there. |
Hi @eigenfoo, I'm so glad you're still active in this repo! Thanks for weighing in.
I totally agree. I think this is where I may have a different opinion on the lines between pyfolio/alphalens/zipline. In my mind, all 3 tools are research tools. Each one offers different types of insights but ultimately, I'd consider them all to be tools that support a research workflow. Even zipline, which simulates portfolio holdings, is a research tool in my mind. Semantics aside, I want to drill into the example you gave:
In this situation, I imagine you'd want to provide '5D' returns to alphalens (please correct me if I'm wrong!). However, it's not clear to me what your factor data would look like. Would you expect to see values changing on a daily basis for each asset in your universe, or would you expect to them to reflect the holdings of a portfolio that follows your description above (hold for N days)? As soon as I start trying to answer this question, I find myself now thinking about holdings instead of factor data, which makes me think that zipline should be answering this question. How do you think about the difference between factor data and holdings (portfolio) data? Regarding some of your later comments:
Good idea.
This one is a little harder for me to agree with. I was seeing tearsheets take > 20 minutes, which kind of defeats the spirit of "throw out ideas quickly". I don't think it's worth trying to settle this one until we align ourselves on some of the more functional questions above though! Again - thanks for weighing in. I hope I'm not coming off as closed minded. I'm reading yours and Luca's questions and concerns and I'm trying to make sure I understand! |
I believe this is one topic where we do not understand each other. For me there is no reason of making any assumption on when the factor is computed (or the period to be in days), the daily factor is just a special case handled in the same way as other factors frequency or random events factors. If I am correct you believe the daily factor (with daily periods) is the only way Alphalens can work and we need to enforce/make assumptions on that inside the code. When I started working on Alphalens I looked at every single plot and computation and I realised there was no need for such assumptions as the results were totally meaningful even without daily factor assumption. I thought it was a pity to include that assumption in the code since the tool could have handled any type of data frequency. So release after release I removed the assumptions in the code and until release v0.3.6 Alphales could work with any factor frequency (or not frequency at all, like a random event in the case of event study).
Maybe you are not aware that Alphalens can work with datetime index too? If you add the time information to the index there is no need for assumptions. The fact that Alphalens works with dates is just a special case to make it easier to use. Please look again at the output of Alphalens and see that there is no need to know anything on when the factor is computed: the factor index contains a list of date/time, no specific frequency, no requirements. The only restriction is on the pricing data, the price dataframe defines the periods at which the factor is traded and must follow some rules: Please Imagine a factor computed one day every two hours, another day every 3 hours, then maybe some other days it is computed only once. That factor doesn't have a specific frequency, but as long as we pass price data exactly 30 minutes after each factor datetime we can compute a factor at 30minute period and the results will be consistent. Alphalens would make the same computations as per the daily factor case (the Quantopian Pipeline case). The only problematic topic is the cumulative returns computation.
Again we have to converge on the way we look at Alphalens. Let's say you have a factor computed every 15 minutes with a period of 3 minutes. The computation of comulative returns for this factor is the same as the one for a daily factor: the compound interest formula. Again, there is no need to make assumptions on the factor frequency or period length. There is only one problem: when the period length span beyond the next factor value (e.g. 2 days for a daily factor, or 16 minutes period for a 15 minute factor, etc. ) we cannot use the compound interest formula and we have to blend multiple portfolios. This is why the former implementation was complex and slow. Bottom line: the speed is an issue, drop the multiple portfolios blending! Which is what was done in the previous commit and I agree with that. Question: why making assumption on the factor frequency though? Why assuming daily factor, with daily periods? The code doesn't have those assumptions as long as we don't re-introduce them. The compound interest formula works for any factor frequency/period as long as they do not overlap. |
I believe we would move the complexity in the building of portfolio weights. Maybe we can discuss it further when you have more details in this regard.
In Alphalens there is no assumption on the trading calendar: it can be any market, any stock exchange, any country. The information is derived from the input data with no additional effort for the user. With trading_calendars as a dependency of Alphalens it would only work for NYSE. Maybe that assumption would work better inside pyfolio |
I only think this in the case of cumulative returns. In order to compute cumulative returns on other holding periods, I think we need to start thinking about the data as holdings/portfolio data. And as soon as we start talking about holdings/portfolio data, I think we should be talking about Pyfolio instead of Alphalens.
You're right. I think my previous statement about assumptions was too strong. One of the nice properties of Alphalens is that I don't need to do very much work to my input data. Alphalens "just works" on "just a dataframe" of data, which is really awesome.
In this example, was Alphalens assuming the same 3 minute return for each of the next 15 minutes (5*3min periods)?
You're right, I don't think we need to care about the factor frequency for the majority of Alphalens computations. As you said, cumulative returns is one place where I think we do need to care about the data frequency. Another place where I think we need to care is the
Fair enough. What about in the event-based cases, though, where a user might have sparse data whose datetimeindex doesn't reflect the trading calendar? One thing I'd like to point out is that |
@jmccorriston I'll reply to the last few things but I believe we are pretty much on the same page now :) Regarding moving
Yes. E.g. first trade 10:00 holding for 3 min, than second trade at 10:15 holding for 3 min, 10:30 holding 3 min, 10:45 holding 3 min...and so on (when I say first/second trade I mean first/second factor index that become trades when thinking in terms of cumulative return). I see only 2 issues. One is that the cumulative returns would have the wrong index (in the example 10:00, 10:15, 10:30, 10:45 instead of 10:03, 10:18, 10:33, 10:48). The second issue is that it is not straightforward to determine when a specific forward return (holding period) can overlap and we might simply warn users that the results might not make sense with certain inputs
oh! I didn't know. Still I don't think that's a good idea to add a dependency that Alphalens doesn't really need. The current trading calendar extrapolation is a workarourd that I hope it will disappear once the API are reworked to carry around trading dates (currently the information is lost lost when pricing information is transformed in forward returns and inserted in the factor dataframe). The trading calendar carries information on when the trades are actually performed, is not an actual "trading calendar". The difference becomes obvious in the event study case, where the trading calendar has valid dates only for those days for which there are events (the other days are market as holidays even though they might be valid trading days). Again, this is a workaround to carry around price information that are lost. |
@jmccorriston, @dmichalowicz, @luca-s , is there any quick fix or workaround on this issue? |
Alphalens v0.4.0 broke several APIs. I would suggest to stick to version v0.3.6, the latest well tested release. |
Suggesting |
Problem Description
I decided to clone this notebook from this thread. Running it gave the following error:
See this screenshot:
This was using the alphalens in the quantopian notebooks which appear to be 0.4.0 with Python 3.5. Examining the source code seems to suggest this bad return value is also in the head version on github.
Thanks for the good product!
The text was updated successfully, but these errors were encountered: