Hi, I'm running into an issue with george and emcee. My likelihood function involves a call to gp.predict. This is fast in serial, but when I try to run emcee with a multiprocessing Pool, each gp.predict call takes much longer. (I'm not sure whether I should have opened this issue here or at emcee!)
Below is a minimum failing example. When I run this on my 24-core machine, this is the output:
```
Serial took 5.1 seconds
Parallel multiprocessing took 39.5 seconds
```
The parallel one is much slower in total. Individual gp.predict calls are ~10x slower in parallel (serial: ~0.002s, parallel: ~0.02s).
I tried adding:
```python
import os
os.environ["OMP_NUM_THREADS"] = "1"
```
Then I get this output:
```
Serial took 5.1 seconds
Parallel multiprocessing took 1.6 seconds
```
This makes the parallel job about 3x faster than the serial job overall. However, the individual predict calls still take 2-3x longer (serial: ~0.002s, parallel: ~0.004-0.006s), so I'm not getting the speedup I expect.
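(For reference, I set that environment variable at the very top of the script, before importing numpy; my understanding, which may be wrong, is that the OpenMP/BLAS runtime picks it up when numpy loads, so the placement matters:)

```python
import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before numpy is imported, so BLAS starts single-threaded

import numpy as np  # everything below now runs with one BLAS thread per process
```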
This behavior holds for varying N, nwalkers, and ndim. I'm hitting the same issue in my full-scale code, which has a large training set and high-dimensional statistics; there the predict calls take up to 6x longer in parallel (even with OMP_NUM_THREADS="1").
Could this have to do with the same gp object being used by multiple workers at the same time? Is there a way around this, or am I missing something about the parallelization? Thank you!
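For what it's worth, the one workaround I can think of is to give each worker its own GP via a Pool initializer, so no object is shared between processes. Here is a rough sketch of that idea (the init_worker name and the global-variable pattern are just for illustration, and I haven't verified that it helps):

```python
from multiprocessing import Pool

import numpy as np
import george
from george import kernels

def init_worker(x, y, yerr):
    # Runs once in each worker process: build and factorize a private GP
    # so no compiled state is ever shared across processes.
    global worker_gp, worker_y
    kernel = np.var(y) * kernels.ExpSquaredKernel(0.5)
    worker_gp = george.GP(kernel)
    worker_gp.compute(x, yerr)
    worker_y = y

def log_prob(theta):
    xval = np.random.random() * 10  # same toy likelihood as in the example below
    pred, pred_var = worker_gp.predict(worker_y, xval, return_var=True)
    return pred

# Usage with emcee:
# with Pool(initializer=init_worker, initargs=(x, y, yerr)) as pool:
#     sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
```

The idea is that each worker pays the gp.compute cost once at startup and then only ever calls predict on its own copy.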
Minimum failing example:
```python
import time

import numpy as np
import emcee
import george
from george import kernels

def log_prob(theta):
    xval = np.random.random() * 10  # to be in the same x range
    s = time.time()
    pred, pred_var = gp.predict(y, xval, return_var=True)
    e = time.time()
    print("GP predict time:", e - s)  # This will print many lines, may want to comment out
    return pred

## Set up MCMC parameters
nwalkers = 24
ndim = 2
np.random.seed(42)
initial = np.random.randn(nwalkers, ndim)
nsteps = 100

## Build GP (from george tutorial)
N = 1500
np.random.seed(1234)
x = 10 * np.sort(np.random.rand(N))
yerr = 0.2 * np.ones_like(x)
y = np.sin(x) + yerr * np.random.randn(len(x))
kernel = np.var(y) * kernels.ExpSquaredKernel(0.5)
gp = george.GP(kernel)
gp.compute(x, yerr)

## Serial
from multiprocessing import cpu_count

ncpu = cpu_count()
print("{0} CPUs".format(ncpu))
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
start = time.time()
sampler.run_mcmc(initial, nsteps)
end = time.time()
serial_time = end - start
print("Serial took {0:.1f} seconds".format(serial_time))

## Parallel
from multiprocessing import Pool

with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, pool=pool)
    start = time.time()
    sampler.run_mcmc(initial, nsteps)
    end = time.time()
    multi_time = end - start
    print("Parallel multiprocessing took {0:.1f} seconds".format(multi_time))
```