
Investigate replacing Ziggurat sampling with ETF #257

Open
Tracked by #1432
dhardy opened this issue Feb 12, 2018 · 15 comments
Labels
A-new Propose usage of a new algorithm B-value Breakage: changes output values P-low Priority: Low

Comments

@dhardy
Member

dhardy commented Feb 12, 2018

See: The Exclusive Top Floor algorithm

@dhardy
Member Author

dhardy commented Feb 12, 2018

I should CC @sbarral and link the C++ implementation

@dhardy dhardy added X-enhancement P-low Priority: Low B-value Breakage: changes output values labels Mar 14, 2018
@dhardy
Member Author

dhardy commented Apr 27, 2019

Also see the following 2018 arXiv paper by Morteza Jalalvand and Mohammad A. Charsooghi:

Generalized Ziggurat Algorithm for Unimodal and Unbounded Probability Density Functions with Zest

Authors' code (GPLv3): https://github.com/DiscreteLogarithm/Zest

@MichaelOwenDyer
Member

I'm looking into this. I don't speak C++, but I think my prompting skills are good enough to get the reference implementation translated into Rust for us. @dhardy, I believe the Ziggurat method is currently only used to sample the normal and exponential distributions, no? I see that the ETF algorithm can sample arbitrarily shaped distributions; does that mean it could be used in any or all of rand's distributions? I'm trying to get a better understanding of the scope of applicability of this solution versus the existing one.

@sbarral

sbarral commented Apr 9, 2024

Oh that's funny, I had already forgotten about my Rust implementation!

I see that my last work on it dates back four years. It's missing a README and a license, but otherwise it looks fairly complete (more so than the C++ version, actually), with a decent battery of tests. I think my ambition was to implement more distributions before publishing, but there was always something more important to do... Anyway, rather than let it rot forever, I have just committed everything that wasn't committed yet and made the repo public:

https://github.com/sbarral/etf-rs

I am very busy this week but will try to put some finishing touches in the coming month. In the meantime, the license is MIT/APACHE2, so feel free to peruse.

As for its strengths, I think ETF is great if you have atypical/custom distributions for which no literature exists. For well-known distributions, chances are that some specialized sampling methods will be a bit faster (though ETF will probably be competitive) [probably not true, see next comment]. I seem to recall, however, that Cauchy and chi-squared were at that time faster than in Rand (TBC).

Compared to Zest, I expect that ETF will always be measurably faster. Due to the way it works, Zest is likely to have a very high branch misprediction rate. ETF is in this respect similar to regular Ziggurat, with a very good prediction rate.

@sbarral

sbarral commented Apr 9, 2024

I remember now that the problem is not so much that ETF might be slower than specialized sampling methods; in fact, it is probably often significantly faster. The issue is that for most distributions the tables cannot be pre-computed, so constructing a distribution is a costly operation.
Cauchy is good in this respect because, like the normal or exponential distributions, it can be generated for any parameter using a single table.

Bottom line: ETF is I think a very valuable algorithm, but the construction cost may not make it a good fit for Rand (and Zest will have the exact same issue).
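To make the single-table point concrete, here is a minimal sketch (assuming the rand 0.8 API). The `standard_cauchy` helper uses the plain inversion formula as a stand-in for an ETF sampler backed by one precomputed table; the names are illustrative, not etf-rs's actual API.

```rust
use std::f64::consts::PI;
use rand::Rng;

// Stand-in for an ETF sampler built from one precomputed table for the
// standard Cauchy (location = 0, scale = 1); the inversion formula
// tan(pi * (u - 1/2)) is used here just so the sketch runs.
fn standard_cauchy<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let u: f64 = rng.gen(); // uniform in [0, 1)
    (PI * (u - 0.5)).tan()
}

// Cauchy(location, scale) as an affine transform of a standard draw;
// this is why a single shared table covers every parameter choice.
fn cauchy<R: Rng + ?Sized>(rng: &mut R, location: f64, scale: f64) -> f64 {
    location + scale * standard_cauchy(rng)
}

fn main() {
    let mut rng = rand::thread_rng();
    println!("{}", cauchy(&mut rng, 2.0, 0.5));
}
```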

@sbarral

sbarral commented Apr 9, 2024

I just updated the Rand-related dependencies and ran the benchmark:

| Test | Type | Time (rand_distr) | Time (etf-rs) |
|------|------|-------------------|---------------|
| central_normal | f32 | 5.0159 ns | 3.8457 ns |
| central_normal | f64 | 3.9653 ns | 3.9148 ns |
| normal | f64 | 6.2197 ns | 4.2361 ns |
| cauchy | f32 | 26.118 ns | 4.0772 ns |
| cauchy | f64 | 21.983 ns | 3.8984 ns |
| chi_squared (k=0.5) | f32 | 29.742 ns | 7.8440 ns |
| chi_squared (k=0.5) | f64 | 41.105 ns | 10.746 ns |
| chi_squared (k=2) | f32 | 7.7833 ns | 4.2518 ns |
| chi_squared (k=2) | f64 | 6.3638 ns | 3.7355 ns |
| chi_squared (k=5) | f32 | 16.573 ns | 4.2282 ns |
| chi_squared (k=5) | f64 | 15.904 ns | 3.9329 ns |
| chi_squared (k=1000) | f32 | 16.355 ns | 4.0622 ns |
| chi_squared (k=1000) | f64 | 15.495 ns | 3.9862 ns |

Cauchy seems the obvious candidate for an ETF implementation: it is about 5-6 times faster than today's rand_distr::Cauchy, and it can be generated from a single pre-computed table, just like rand_distr::Normal is today (in that case it would be slightly slower than in the benchmark above, but the relative difference should be the same as rand_distr::Normal vs rand_distr::StandardNormal).

Chi-squared is much faster than rand_distr::ChiSquared, but AFAIR it cannot be generated from a single pre-computed table.

@MichaelOwenDyer
Member

Wow, that's fantastic!

> Bottom line: ETF is I think a very valuable algorithm, but the construction cost may not make it a good fit for Rand (and Zest will have the exact same issue).

I guess it depends on the use case, right? If you only sample a distribution once before destroying it, then ETF might not be worth it, but the construction cost gets amortized in the long term. I think it would be very cool if we could let users choose the sampling method via a generic parameter; a sketch follows below.
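A hypothetical API sketch of that idea (not rand_distr's actual design, and assuming the rand 0.8 API): the sampling algorithm is a type parameter, so callers opt into ETF explicitly and pay its table-construction cost knowingly. Both method stubs delegate to StandardNormal just so the sketch compiles.

```rust
use rand::Rng;
use rand_distr::{Distribution, StandardNormal};

/// Hypothetical trait: each sampling algorithm is a type.
trait NormalMethod: Sized {
    fn new(mean: f64, std_dev: f64) -> Self;
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64;
}

/// Stand-in for today's ziggurat-based sampler (cheap to construct).
struct Ziggurat { mean: f64, std_dev: f64 }
impl NormalMethod for Ziggurat {
    fn new(mean: f64, std_dev: f64) -> Self { Self { mean, std_dev } }
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        let z: f64 = StandardNormal.sample(rng);
        self.mean + self.std_dev * z
    }
}

/// Stand-in for an ETF sampler; a real `new` would build the tables here
/// (the expensive, amortized step) and `sample` would do the table
/// lookup plus rejection test.
struct Etf { mean: f64, std_dev: f64 }
impl NormalMethod for Etf {
    fn new(mean: f64, std_dev: f64) -> Self { Self { mean, std_dev } }
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        let z: f64 = StandardNormal.sample(rng);
        self.mean + self.std_dev * z
    }
}

/// The user-facing distribution, generic over the sampling method.
struct Normal<M: NormalMethod>(M);
impl<M: NormalMethod> Normal<M> {
    fn new(mean: f64, std_dev: f64) -> Self { Normal(M::new(mean, std_dev)) }
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 { self.0.sample(rng) }
}

fn main() {
    let mut rng = rand::thread_rng();
    let fast: Normal<Etf> = Normal::new(0.0, 1.0); // pays the table cost once
    println!("{}", fast.sample(&mut rng));
}
```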

@sbarral

sbarral commented Apr 9, 2024

> the construction cost gets amortized in the long term

Yes, this is absolutely true. The current table-construction algorithm is decently optimized, so the break-even point is probably low enough to make ETF worthwhile in most applications (I'd hazard it is on the order of ~10,000 samples). I was mainly concerned that it may not be a good default, as users may not expect a constructor to be expensive.

> I think it would be very cool if we could let users choose the sampling method via a generic parameter

This could be a great solution. I guess the constructor would then need a different name for the ETF version? Another option would be a submodule containing the alternative implementations.

@dhardy
Member Author

dhardy commented Apr 10, 2024

What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.

What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.

It sounds like sample performance, set-up performance, object size and possibly even code size could all be reasons for picking one implementation over another. Possibly also variance in the number of RNG samples required (if one algorithm is more predictable than another).

> This could be a great solution. I guess the constructor would then need a different name for the ETF version? Another option would be a submodule containing the alternative implementations.

It does sound like there is good reason for publishing multiple variants of these distributions. We previously discussed similar questions (#494, #601), but so far WeightedIndex is the only distribution with multiple implementations and we just went with different names.

So...

  1. In some cases (e.g. the already-mentioned Cauchy), it might make sense simply to replace the existing implementation. This is a value-breaking change, but we can do that occasionally.
  2. We could aim to publish under a rand_distr::etf sub-module. This may be appropriate if several distributions have similar performance/cost trade-offs (sample time, set-up time, size).
  3. We could publish under rand_etf or etf instead. This may be preferable if code size is significant, and it avoids needing to synchronise with rand_distr releases. (It might also prompt the question of whether the rest of rand_distr should be re-organised, but we'll leave that for now.)
  4. We could publish under something like rand_distr::fast, maybe moving WeightedAliasIndex there too. But if I've learnt anything from the latter, it's that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.

@vks
Collaborator

vks commented Apr 10, 2024 via email

@sbarral

sbarral commented Apr 10, 2024

> What's the compiled code size like for these? We have some pre-compiled tables for Ziggurat; adding more isn't a blocker but might be a reason to publish elsewhere.
>
> What's the size of distribution objects? Presumably, anything not using a pre-compiled table is going to be large.

For Cauchy and Normal, the tables in the above benchmark stored 256x3 = 768 elements, each of 32 bits (for f32) or 64 bits (for f64).
For chi-squared, the tables had a size of 512x3 = 1536 elements.
From memory, Rand uses tables of size 256x2 = 512 for the normal distribution.

The size vs speed trade-off can be adjusted of course. I just ran a benchmark with tables of size 128x3=384 for Cauchy and the numbers were ~4.9ns, so about 20-25% slower than the previous ~4ns.

The size of a distribution object is essentially the size of its tables, so if the tables are pre-computed the object is very small, and otherwise it is rather large...
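As a quick back-of-envelope on those figures (assuming one value per table entry and no padding), the table sizes work out as follows:

```rust
// Back-of-envelope memory cost of the tables quoted above.
fn main() {
    let entries = 256 * 3; // 768 entries for the Cauchy/Normal tables
    println!("f32 table: {} bytes (3 KiB)", entries * 4); // 3072
    println!("f64 table: {} bytes (6 KiB)", entries * 8); // 6144
    let chi2_entries = 512 * 3; // 1536 entries for chi-squared
    println!("chi-squared f64 table: {} bytes (12 KiB)", chi2_entries * 8); // 12288
}
```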

> Possibly also variance in the number of RNG samples required (if one algorithm is more predictable than another).

ETF is similar to ziggurat here: it needs only one random number per sample roughly 95-99% of the time, and very rarely more than two (though in theory, as with ziggurat, the maximum number of draws is unbounded); see the quick expected-draws calculation below.
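For those acceptance rates, the draw count follows directly from the geometry of rejection sampling: the number of loop iterations is geometrically distributed with the first-try acceptance probability p, so the mean number of draws is 1/p. A tiny illustration:

```rust
// Mean RNG draws per sample for an acceptance-rejection loop: the
// iteration count is geometric with success probability p, mean 1/p.
fn expected_draws(p_accept: f64) -> f64 {
    1.0 / p_accept
}

fn main() {
    for p in [0.95, 0.99] {
        // prints ~1.053 and ~1.010 draws per sample respectively
        println!("p = {p}: {:.3} draws on average", expected_draws(p));
    }
}
```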

> We could publish under something like rand_distr::fast, maybe moving WeightedAliasIndex there too. But if I've learnt anything from the latter, it's that the choice of algorithm has consequences beyond just sample/set-up performance trade-offs: it can result in API changes (bounds on generics, additional methods) and does affect reproducibility. Thus, I have a preference for reflecting the algorithm in the name.

One thing to consider, indeed, is that when the table is computed at construction, the constructor is fallible, because the solver may fail to converge for some extreme choices of parameters.
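A sketch of what that fallibility implies for the API surface (all names here are illustrative, not etf-rs's actual types):

```rust
/// Illustrative error type: table construction runs a numerical solver
/// that may not converge for extreme parameter choices.
#[allow(dead_code)]
#[derive(Debug)]
enum EtfError {
    InvalidParameter,
    TabulationFailed,
}

struct ChiSquaredEtf {
    // per-parameter tables computed by the solver would live here
}

impl ChiSquaredEtf {
    /// Fallible constructor: callers must handle solver failure.
    fn new(k: f64) -> Result<Self, EtfError> {
        if !(k > 0.0) {
            return Err(EtfError::InvalidParameter);
        }
        // ... run the table-construction solver here, returning
        // Err(EtfError::TabulationFailed) if it does not converge ...
        Ok(ChiSquaredEtf {})
    }
}

fn main() {
    match ChiSquaredEtf::new(0.5) {
        Ok(_dist) => println!("tables built"),
        Err(e) => println!("construction failed: {e:?}"),
    }
}
```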

@sbarral

sbarral commented Apr 11, 2024

There is not just the construction time vs. sample time tradeoff; a precision vs. time tradeoff came up as well in the past (#531, #1346).

The linked issues are about uniform sampling, which is a special case. But for other distributions, I would argue that inverse transform sampling is the gold standard, in the sense that it "makes the most" of a random number and, crucially, avoids clustering samples on the fine scale.

From this perspective, ETF is almost as good as it gets. It is, in fact, an inverse transform sampling method, except that it samples from a piecewise-rectangular upper bound of the PDF rather than from the PDF itself. Its only compromise is that it uses "only" 95-99% of the entropy of the random number for sampling, the remaining part being used for the rejection step.
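For reference, inverse transform sampling in its pure form, shown for the exponential distribution whose inverse CDF is available in closed form (a sketch assuming the rand 0.8 API, not rand_distr code):

```rust
use rand::Rng;

// Pure inverse transform sampling: map a uniform u in [0, 1) through the
// inverse CDF. Each u yields exactly one sample, and nearby u values give
// nearby samples, which is what avoids clustering on the fine scale.
// For Exp(lambda): F(x) = 1 - exp(-lambda x), so F^-1(u) = -ln(1 - u) / lambda.
fn exponential<R: Rng + ?Sized>(rng: &mut R, lambda: f64) -> f64 {
    let u: f64 = rng.gen();
    -(1.0 - u).ln() / lambda
}

fn main() {
    let mut rng = rand::thread_rng();
    println!("{}", exponential(&mut rng, 1.5));
}
```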

Here is a plot comparing the quality of ETF and ziggurat with the same source of entropy.

@dhardy
Member Author

dhardy commented Apr 11, 2024

Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.

@sbarral

sbarral commented Apr 11, 2024

> Then I agree with @vks: where pre-computed tables are possible, we should likely switch to ETF.

It looks like Gumbel could be another candidate besides Cauchy. I can try to implement it in my repo and will post the results here (it may take a few days as I have other urgent things).

@sbarral

sbarral commented Apr 13, 2024

I have added Gumbel to the repo. Here are the results of cargo bench on my machine using the same conditions as the benchmark reported earlier for other distributions.

| Test | Type | Time (rand_distr) | Time (etf-rs) |
|------|------|-------------------|---------------|
| gumbel | f32 | 13.872 ns | 4.6761 ns |
| gumbel | f64 | 19.719 ns | 4.5018 ns |

Bear in mind that here as well the tables were generated dynamically for the specific distribution parameters, and their size was 256x3. Using a common table (location = 0, scale = 1) requires an extra addition and multiplication per sample, but even with a common table of a smaller size (128x3) the numbers are still very decent: 5.4829 ns for f32 and 5.8653 ns for f64.

@dhardy dhardy added A-new Propose usage of a new algorithm and removed X-enhancement labels Jul 20, 2024