Investigate replacing Ziggurat sampling with ETF #257
I should CC @sbarral and link the C++ implementation.
Also see the following 2018 arXiv paper by Morteza Jalalvand and Mohammad A. Charsooghi:
Authors' code (GPLv3): https://github.com/DiscreteLogarithm/Zest
I'm looking into this. I don't speak C++, but I think my prompting skills are good enough to get the reference implementation translated into Rust for us. @dhardy, I believe the Ziggurat method is currently only used to sample the normal and exponential distributions, no? I see that the ETF algorithm can sample arbitrarily shaped distributions; does that mean it could be used in any/all of rand's distributions? I'm trying to get a better understanding of the scope of applicability of this solution vs. the existing one.
Oh that's funny, I had already forgotten about my Rust implementation! I see that my last work on it dates back 4 years. It's missing a README and license, but otherwise looks fairly complete (actually more so than the C++ version), with a decent battery of tests. I think my ambition was to implement more distributions before publishing it, but there was always something more important to do... Anyway, rather than let it rot forever, I have just committed everything that wasn't committed and made the repo public: https://github.com/sbarral/etf-rs

I am very busy this week but will try to put some finishing touches on it in the coming month. In the meantime, the license is MIT/Apache-2.0, so feel free to peruse.

As for its strengths, I think ETF is great if you have untypical/custom distributions for which no literature exists. For well-known distributions, compared to Zest, I expect that ETF will always be measurably faster: due to the way it works, Zest is likely to have a very high branch misprediction rate, whereas ETF is in this respect similar to regular Ziggurat, with a very good prediction rate.
I remember now that the problem is not so much that ETF might be slower than specialized sampling methods; in fact it is probably often significantly faster. The issue is that for most distributions the tables cannot be pre-computed, so constructing a distribution is a costly operation. Bottom line: ETF is, I think, a very valuable algorithm, but the construction cost may not make it a good fit for Rand (and Zest will have the exact same issue).
I just updated the Rand-related dependencies and ran the benchmark.
Cauchy seems to be the obvious candidate for an ETF implementation, since it is about 5-6 times faster than today's implementation; Chi-squared is much faster than the current implementation as well.
Wow, that's fantastic!
I guess it depends on the use-case, right? If you only sample a distribution once before destroying it then ETF might not be worth it, but the construction cost gets amortized in the long-term. I think it would be very cool if we could let users choose which sampling method to use via a generic parameter.
Yes, this is absolutely true. The current algorithm for table construction is decently optimized, so the break-even is probably low enough to make it worthwhile in most applications (I'd hazard that break-even is of the order of ~10'000 samples). I was mainly concerned that it may not be a good default as users may not expect a constructor to be expensive.
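To make the amortization argument concrete (with made-up numbers, not taken from the benchmarks above): if construction costs T ns and ETF saves Δ ns per sample over the current method, break-even sits at N = T / Δ samples. For instance, a hypothetical construction cost of 40 µs against a per-sample saving of 4 ns gives 40,000 / 4 = 10,000 samples, the order of magnitude estimated above.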
This could be a great solution. I guess the `new` constructor would have to be changed to some other name to get the ETF version then? Another solution that could be considered is a submodule containing the alternative implementations.
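A minimal sketch of what the generic-parameter idea might look like in Rust (every name here — `SamplingMethod`, `Ziggurat`, `Etf`, `try_new`, `TableError` — is hypothetical, not part of `rand_distr` or `etf-rs`):

```rust
use core::marker::PhantomData;

/// Hypothetical marker trait selecting the sampling backend.
pub trait SamplingMethod {}
pub struct Ziggurat;
pub struct Etf;
impl SamplingMethod for Ziggurat {}
impl SamplingMethod for Etf {}

/// Error for failed runtime table construction.
#[derive(Debug)]
pub struct TableError;

/// A distribution parameterized by its sampling method; the default
/// keeps today's cheap-constructor semantics.
pub struct Normal<M: SamplingMethod = Ziggurat> {
    mean: f64,
    std_dev: f64,
    _method: PhantomData<M>,
}

impl Normal<Ziggurat> {
    /// Cheap: relies on pre-compiled ziggurat tables.
    pub fn new(mean: f64, std_dev: f64) -> Self {
        Normal { mean, std_dev, _method: PhantomData }
    }
}

impl Normal<Etf> {
    /// Expensive and fallible: would build the ETF tables at runtime,
    /// hence the distinct name, as discussed above.
    pub fn try_new(mean: f64, std_dev: f64) -> Result<Self, TableError> {
        // Stub: the real construction runs an iterative solver that
        // can fail to converge for extreme parameters.
        Ok(Normal { mean, std_dev, _method: PhantomData })
    }
}
```

Call sites would then read `Normal::new(0.0, 1.0)` for the status quo and `Normal::<Etf>::try_new(0.0, 1.0)?` to opt into ETF.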
What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere. What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large. It sounds like sample performance, set-up performance, object size and possibly even code size could all be reasons for picking one implementation over another. Possibly also variance in the number of RNG samples required (if more predictable for one algorithm over another).
It does sound like there is good reason for publishing multiple variants of these distributions. We previously discussed similar questions (#494, #601), but so far nothing has come of it. So...
I agree that we should switch to ETF for the distributions where the table can be precomputed. For the others, it's a bit more tricky: so far, we don't offer the same distribution with different performance tradeoffs. There is not just the construction time vs. sample time tradeoff; a precision vs. time tradeoff came up as well in the past (#531, #1346).
For Cauchy and Normal, the tables of the above benchmark stored 256x3 = 768 elements of 32 bits. The size vs speed trade-off can be adjusted, of course: I just ran a benchmark with tables of size 128x3 = 384 for Cauchy, and the numbers were ~4.9ns, so about 20-25% slower than the previous ~4ns. The size of distribution objects is basically the tables, so if they are pre-computed the objects are very small, and otherwise very large...
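In concrete terms (simple arithmetic on the figures above): 768 entries × 4 bytes is 3 KiB per table, and the 128x3 = 384-entry variant is 1.5 KiB; a 64-bit variant of the tables, if used, would double those figures.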
ETF is similar to ziggurat in that it needs only 1 random number per sample ~95-99% of the time, and very rarely more than 2 (though in theory, as with ziggurat, the maximum number of draws is unbounded).
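Spelling out the arithmetic behind those figures: if a single attempt succeeds with probability p, the number of attempts is geometrically distributed with mean 1/p, so for p = 0.95-0.99 the expected cost is only about 1.01-1.05 random numbers per sample, and the chance of needing more than two attempts is (1 − p)², i.e. at most 0.25%.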
One thing to consider indeed is that when the table is computed at construction, the constructor is fallible, because the solver may fail to converge for some extreme choices of parameters.
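In code, that fallibility might surface roughly like this (a sketch only; `EtfCauchy`, `EtfError` and `build_table` are illustrative names, not the `etf-rs` API):

```rust
/// Illustrative error type for failed table construction.
#[derive(Debug)]
pub enum EtfError {
    /// The iterative table solver did not converge.
    NoConvergence,
}

/// Illustrative ETF-backed distribution owning its runtime-built table.
pub struct EtfCauchy {
    table: Vec<f64>,
}

/// Stand-in for the iterative solver that builds the quantile table.
fn build_table(location: f64, scale: f64) -> Option<Vec<f64>> {
    // Reject parameters for which the solver would not converge.
    if !location.is_finite() || !scale.is_finite() || scale <= 0.0 {
        return None;
    }
    Some(vec![0.0; 768]) // placeholder for the real iterative construction
}

impl EtfCauchy {
    /// Fallible constructor: surfaces solver failure to the caller
    /// instead of panicking.
    pub fn try_new(location: f64, scale: f64) -> Result<Self, EtfError> {
        let table = build_table(location, scale).ok_or(EtfError::NoConvergence)?;
        Ok(EtfCauchy { table })
    }
}
```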
The linked issues are about uniform sampling, which is a bit specific. But for other distributions, I would argue that inverse transform sampling is the gold standard, in the sense that it "makes the most" of a random number and, crucially, avoids clustering samples on the fine scale. From this perspective, ETF is almost as good as it gets. It is, in fact, an inverse transform sampling method, except that it samples an upper bound of the PDF made of rectangles rather than the actual PDF. Its only compromise is that it uses "only" 95-99% of the entropy of the random number for sampling, the remaining part being used for the rejection step. Here is a plot comparing the quality of ETF and ziggurat with the same source of entropy.
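For concreteness, here is a rough sketch of that "rectangular upper bound + rejection" idea. The table layout is made up, and it draws a second uniform for the accept test for clarity, whereas the real ETF reuses the leftover bits of the first random number (which is where the 95-99% entropy figure comes from):

```rust
/// One equal-area rectangle of a piecewise-constant upper bound of the PDF.
struct Rect {
    left: f64,   // left edge of the rectangle
    width: f64,  // width of the rectangle
    height: f64, // envelope height over this interval
}

/// Tabulated rectangular envelope of an (unnormalized) density `pdf`.
struct EtfSketch {
    rects: Vec<Rect>,    // equal-area, so index selection is uniform
    pdf: fn(f64) -> f64, // target density to test against
}

impl EtfSketch {
    /// Sample by inverse transform over the envelope, then rejection.
    /// `uniform` yields i.i.d. values in [0, 1).
    fn sample(&self, mut uniform: impl FnMut() -> f64) -> f64 {
        loop {
            // The high part of one uniform picks the rectangle; the
            // fractional remainder positions x inside it.
            let scaled = uniform() * self.rects.len() as f64;
            let i = (scaled as usize).min(self.rects.len() - 1);
            let r = &self.rects[i];
            let x = r.left + (scaled - i as f64) * r.width;
            // Accept with probability pdf(x) / height: with a tight
            // envelope this succeeds ~95-99% of the time.
            if uniform() * r.height <= (self.pdf)(x) {
                return x;
            }
        }
    }
}
```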
Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.
It looks like Gumbel could be another candidate beside Cauchy. I can try to implement it in my repo and will post the results here (may take a few days as I have other urgent things). |
I have added Gumbel to the repo. Here are the results of the benchmark:
Bear in mind that here as well the tables were generated dynamically for the specific distribution parameters, and their size was 256x3. Using a common table (location=0 and scale=1) necessitates an extra addition and multiplication per sample, but even with a common table and a small size (128x3), the numbers are still very decent, at 5.4829 ns.
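The common-table trick mentioned above is just an affine transform on the standard variate; a sketch (with a plain inverse-CDF stand-in for the table-based standard sampler):

```rust
/// Stand-in for the shared-table ETF sampler of the standard Gumbel
/// (location = 0, scale = 1); here via the inverse CDF for simplicity.
/// `uniform` yields i.i.d. values in (0, 1).
fn sample_standard_gumbel(mut uniform: impl FnMut() -> f64) -> f64 {
    let u = uniform();
    -(-u.ln()).ln()
}

/// The extra cost of a common table: one multiplication and one
/// addition per sample to recover the general distribution.
fn sample_gumbel(location: f64, scale: f64, uniform: impl FnMut() -> f64) -> f64 {
    location + scale * sample_standard_gumbel(uniform)
}
```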
See: The Exclusive Top Floor algorithm