
How to speed up Kernel Density Estimation based sampling #414

Open
lizhengyuhang opened this issue Nov 7, 2023 · 1 comment
Hi Jonathan:

When I sample from a distribution built with chaospy's kernel density estimation, it takes a long time, whereas sampling from a distribution fitted with sklearn's KDE is very fast.
Generating orthogonal polynomials from the kernel density estimate is also time-consuming.
Is there any way to accelerate KDE-based sampling and orthogonal polynomial generation?

Here is my code:

My environment:
Python 3.8.8
chaospy 4.3.13
numpoly 1.2.11

# sampling based on kernel density estimation
import numpy as np
import chaospy as cp
from sklearn.neighbors import KernelDensity
import time
samples_x_mc = np.random.randn(4, 1000) + 2
print('MC:', np.mean(samples_x_mc, axis=1))

time_b = time.time()
dist_kde = KernelDensity(bandwidth='silverman', kernel='gaussian').fit(samples_x_mc.T)
samples_kde = dist_kde.sample(1000).T
time_e = time.time()
print('SKL:', np.mean(samples_kde, axis=1))
print('SKL time:', time_e - time_b)

time_b = time.time()
dist_cp = cp.GaussianKDE(samples_x_mc, estimator_rule='silverman')
samples_cp = dist_cp.sample(1000)
time_e = time.time()
print('CP:', np.mean(samples_cp, axis=1))
print('CP time:', time_e - time_b)

The output is:
MC: [1.99415654 1.97854686 1.99451553 1.99307159]
SKL: [2.01294929 2.01646145 2.02962535 1.91007853]
SKL time: 0.001994609832763672
CP: [1.92208752 1.95432996 1.98291103 1.94525712]
CP time: 81.79704451560974
With more samples, the CP time grows further and the computation eventually becomes impractical.
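For reference, sampling from a Gaussian KDE can in principle be done very cheaply: pick training points at random and perturb them with Gaussian noise scaled by the bandwidth, which is why sklearn's sampler is so fast. The sketch below is a minimal illustration assuming independent per-dimension Silverman bandwidths (chaospy and sklearn may use a full covariance); `kde_sample` is a hypothetical helper, not part of either library.

```python
import numpy as np

def kde_sample(data, n_samples, rng=None):
    """Draw samples from a Gaussian KDE over `data` (shape: dims x points).

    Sketch only: uses Silverman's rule-of-thumb factor and independent
    per-dimension bandwidths, not a full covariance matrix.
    """
    rng = np.random.default_rng(rng)
    dims, n = data.shape
    # Silverman's rule-of-thumb factor for d-dimensional data
    factor = (n * (dims + 2) / 4.0) ** (-1.0 / (dims + 4))
    # Resample training points uniformly at random...
    idx = rng.integers(0, n, size=n_samples)
    # ...and add Gaussian noise scaled by the per-dimension bandwidth
    noise = rng.standard_normal((dims, n_samples)) \
        * data.std(axis=1, keepdims=True) * factor
    return data[:, idx] + noise

samples_x_mc = np.random.randn(4, 1000) + 2
samples_fast = kde_sample(samples_x_mc, 1000)
```

This runs in milliseconds regardless of the number of training points, since each draw only touches one data point plus one noise vector.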

# the orthogonal polynomial generation based on kernel density estimation
import numpy as np
import chaospy as cp
import time

samples_x_mc = np.random.randn(5, 1000) + 2
time_b = time.time()
distributions = cp.GaussianKDE(samples_x_mc, estimator_rule='silverman')
expansion, norms = cp.generate_expansion(2, distributions, rule="cholesky", retall=True)
time_e = time.time()
print('time:', time_e - time_b)

The output is:
time: 71.0097062587738

lizhengyuhang changed the title from "Kernel Density Estimation based sampling is very slow" to "How to speed up Kernel Density Estimation based sampling" on Nov 15, 2023
jonathf commented May 18, 2024

Unfortunately, the KDE module is really slow. There is no roadmap for getting that fixed, but MRs are welcome.
