
JAX-based sampler #517

Open
JohnGoertz opened this issue Nov 22, 2021 · 3 comments

@JohnGoertz

Feature description
It would be great to have a sampler that's both compatible with JAX-jitted functions and leverages JAX's parallelization tools.

Motivation/Application
I have a slow objective function that is made significantly (orders of magnitude) faster using JAX tools, in particular jit and vmap. However, JAX's multithreading clashes with pyABC's multithreaded samplers, and pickling the jitted function doesn't behave well either. Oddly, this isn't an issue with relatively simple versions of my objective function; those can use pyABC's default samplers. More complex versions, however, only work with pyABC's SingleCoreSampler.

I'd like to write an extension of the SingleCoreSampler that relies solely on JAX for vectorization/parallelization/multithreading. I have some ideas on how to get started, but I'd like some pointers. This would work best as a batch-sampling system, where an array of samples is submitted and the model function is mapped across the array using vmap or pmap. This evaluation could itself be jitted as well. My questions are:

  • How could I get the sampler to create a batch of samples?
  • Would mapping submit_one across the batch work?
  • Do you know if there's anything that happens to the model function when it's assigned to submit_one that JAX might not like (e.g., NumPy operations)?
  • How do I return the samples after evaluation?
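The batch-evaluation idea described above can be sketched as follows. This is a minimal illustration, not pyABC code: the toy model and all names here are hypothetical, and the point is only how jit and vmap compose over a batch of proposals.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy simulator: maps one parameter vector to two summary statistics.
def model(theta):
    x = jnp.sin(theta) * theta
    return jnp.stack([x.sum(), (x ** 2).sum()])

# Vectorize over the leading batch axis, then JIT the whole batched evaluation.
batched_model = jax.jit(jax.vmap(model))

thetas = jnp.ones((100, 3))   # a batch of 100 proposals, 3 parameters each
stats = batched_model(thetas) # shape (100, 2), evaluated in one fused call
```

On accelerators, jax.pmap could replace the outer vmap to spread the batch across devices; the composition pattern stays the same.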
@yannikschaelte
Member

yannikschaelte commented Nov 22, 2021

Hi, this sure sounds useful. We had plans to implement batch submission, essentially generalizing simulate_one to simulate_n, which would require some internal rewiring. What is your timeline on this? We would probably have time in the next couple of weeks, though not immediately.

In principle, simulate_many would be stand-alone and return a List of model simulations (Dict[str, Union[np.ndarray, pd.DataFrame]]) as well as distance values etc. These would be in List format, as the data storage handles single simulations as separate entities, without vectorization. Would that be an issue? Conceptually, rewriting the storage to provide a single simulation matrix per generation would also be possible, but it would require further changes, as pyABC was designed not for batching but for simulation-heavy single-CPU simulators. The batch size n would require some tuning, e.g. based on a prediction of the acceptance rate. See also the discussion in #351.
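The List-of-Dicts return format mentioned above is easy to produce from a batched array. A minimal sketch, assuming a vmapped simulator has already returned one (n_batch, n_stats) array (the key "y" is illustrative, not pyABC's actual schema):

```python
import numpy as np

# Hypothetical batched output: 3 simulations with 2 summary statistics each,
# as would come back from a single vmapped/jitted evaluation.
stats = np.arange(6.0).reshape(3, 2)

# The storage handles each simulation as its own entity, so a simulate_many
# wrapper unpacks the batch row by row into a List of Dicts.
results = [{"y": np.asarray(row)} for row in stats]
```

Going the other way (stacking a list of per-simulation dicts back into one array for the next batched call) is the symmetric one-liner with np.stack.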

@JohnGoertz
Author

Hi Yannik, that's great to hear. It's not urgent; something in a few weeks would be fine. Going to/from List/Dict and JAX-numpy arrays shouldn't be too much of an issue; the jitted evaluator would have to sit inside that conversion, but that shouldn't introduce much overhead. I can definitely see how batch size would have to be tuned to each application, but I have at least some intuition for what should work. For instance, if I am requesting a population size of 1000, a batch size of 100 should significantly speed things up without leading to too much wasted computation. Initially at least, this would be sequential sampling from pyABC's perspective, making 100 proposals at a time, so you don't have to worry about the distributed case where 999 samples have been accepted and then ten workers each submit 100 proposals just to get that last one.
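The sequential batched acceptance loop described above can be sketched in a few lines. Everything here is a toy stand-in (a normal prior and a distance-threshold acceptance rule), just to show the batch-until-n-accepted control flow:

```python
import numpy as np

rng = np.random.default_rng(0)

def propose(batch_size):
    # Hypothetical prior: standard normal proposals with 2 parameters.
    return rng.normal(size=(batch_size, 2))

def accept(thetas, eps=1.5):
    # Hypothetical acceptance rule: keep proposals within distance eps of zero.
    d = np.linalg.norm(thetas, axis=1)
    return thetas[d < eps]

def sample_population(n_pop=1000, batch_size=100):
    # Sequentially submit batches until the requested population is filled,
    # then truncate the (slight) overshoot from the final batch.
    accepted, n_total = [], 0
    while n_total < n_pop:
        batch = accept(propose(batch_size))
        accepted.append(batch)
        n_total += len(batch)
    return np.concatenate(accepted)[:n_pop]

pop = sample_population()
```

Because batches are submitted one at a time, at most one batch of excess simulations is wasted, which matches the 1000-population / 100-batch intuition above.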

It's interesting that they had better luck with pymc3's ABC-SMC; I tried that first, and I think yours was faster. Also, re-implementing simulations to be JAX-friendly isn't trivial, but it's a lot more approachable than Theano...

@yannikschaelte
Member

Sounds good! I will let you know when I get to work on this, hopefully in the next few weeks. If you need an urgent solution, a simple implementation sidestepping the simulate_one calls in

sample = self.sampler.sample_until_n_accepted(...)

with a dedicated sampler (which implements batch size and result merging) should be straightforward; a sustainable solution, however, will take a little longer. Agreed, it should not be too much work. Only simultaneously speeding up the storage format, which at the moment can be the bottleneck for fast simulators, may add a bit of complexity.
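A dedicated sampler of the kind suggested above might look roughly like this. This is a self-contained sketch, not pyABC's actual Sampler interface: the class name, constructor, and the (particle, accepted) tuples returned by simulate_batch are all assumptions chosen to show the batch-size and result-merging responsibilities.

```python
import numpy as np

class BatchSampler:
    """Illustrative batch sampler: evaluate whole batches at once and
    merge the accepted results until n particles are collected."""

    def __init__(self, simulate_batch, batch_size=100):
        self.simulate_batch = simulate_batch  # evaluates one full batch per call
        self.batch_size = batch_size

    def sample_until_n_accepted(self, n):
        accepted = []
        while len(accepted) < n:
            # simulate_batch yields (particle, accepted_flag) pairs for a batch.
            for particle, ok in self.simulate_batch(self.batch_size):
                if ok:
                    accepted.append(particle)
        return accepted[:n]

# Toy batch simulator: scalar parameters, accepted when |theta| < 1.
rng = np.random.default_rng(1)

def simulate_batch(size):
    thetas = rng.normal(size=size)
    return [(t, abs(t) < 1.0) for t in thetas]

sampler = BatchSampler(simulate_batch, batch_size=50)
pop = sampler.sample_until_n_accepted(200)
```

In a real integration, simulate_batch would wrap the jitted/vmapped model evaluation, and the returned particles would carry the per-simulation dicts and distance values the storage expects.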

Yes, what is faster depends on the problem at hand (as well as the implemented algorithms). For most of our problems, Theano/Aesara/JAX are not options, as the simulators are dedicated C++/R routines; however, it will be good if pyABC also handles those efficiently, as there have already been a few such applications.
