
How to speed up Kernel Density Estimation based sampling #414

Open
lizhengyuhang opened this issue Nov 7, 2023 · 1 comment
Hi Jonathan:

When I sample from a distribution built with chaospy's kernel density estimation, it takes a long time, whereas sampling from a distribution fitted with sklearn's KDE is very fast.
Generating orthogonal polynomials from the kernel density estimate is also time-consuming.
Is there any way to accelerate KDE-based sampling and orthogonal polynomial generation?

Here is my code:

My environment:
Python 3.8.8
chaospy 4.3.13
numpoly 1.2.11

# sampling based on kernel density estimation
import numpy as np
import chaospy as cp
from sklearn.neighbors import KernelDensity
import time
samples_x_mc = np.random.randn(4, 1000) + 2
print('MC:', np.mean(samples_x_mc, axis=1))

time_b = time.time()
dist_kde = KernelDensity(bandwidth='silverman', kernel='gaussian').fit(samples_x_mc.T)
samples_kde = dist_kde.sample(1000).T
time_e = time.time()
print('SKL:', np.mean(samples_kde, axis=1))
print('SKL time:', time_e - time_b)

time_b = time.time()
dist_cp = cp.GaussianKDE(samples_x_mc, estimator_rule='silverman')
samples_cp = dist_cp.sample(1000)
time_e = time.time()
print('CP:', np.mean(samples_cp, axis=1))
print('CP time:', time_e - time_b)

The output is:
MC: [1.99415654 1.97854686 1.99451553 1.99307159]
SKL: [2.01294929 2.01646145 2.02962535 1.91007853]
SKL time: 0.001994609832763672
CP: [1.92208752 1.95432996 1.98291103 1.94525712]
CP time: 81.79704451560974
With more samples, the CP time grows further and the computation eventually becomes impractical.
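For reference, sampling from a Gaussian KDE can in principle be done very cheaply: pick training points at random and perturb them with Gaussian noise scaled by the bandwidth, which is why sklearn's sampler is so fast. The sketch below is a minimal illustration assuming independent per-dimension Silverman bandwidths (chaospy and sklearn may use a full covariance); `kde_sample` is a hypothetical helper, not part of either library.

```python
import numpy as np

def kde_sample(data, n_samples, rng=None):
    """Draw samples from a Gaussian KDE over `data` (shape: dims x points).

    Sketch only: uses Silverman's rule-of-thumb factor and independent
    per-dimension bandwidths, not a full covariance matrix.
    """
    rng = np.random.default_rng(rng)
    dims, n = data.shape
    # Silverman's rule-of-thumb factor for d-dimensional data
    factor = (n * (dims + 2) / 4.0) ** (-1.0 / (dims + 4))
    # Resample training points uniformly at random...
    idx = rng.integers(0, n, size=n_samples)
    # ...and add Gaussian noise scaled by the per-dimension bandwidth
    noise = rng.standard_normal((dims, n_samples)) \
        * data.std(axis=1, keepdims=True) * factor
    return data[:, idx] + noise

samples_x_mc = np.random.randn(4, 1000) + 2
samples_fast = kde_sample(samples_x_mc, 1000)
```

This runs in milliseconds regardless of the number of training points, since each draw only touches one data point plus one noise vector.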

# the orthogonal polynomial generation based on kernel density estimation
import numpy as np
import chaospy as cp
import time

samples_x_mc = np.random.randn(5, 1000) + 2
time_b = time.time()
distributions = cp.GaussianKDE(samples_x_mc, estimator_rule='silverman')
expansion, norms = cp.generate_expansion(2, distributions, rule="cholesky", retall=True)
time_e = time.time()
print('time:', time_e - time_b)

The output is:
time: 71.0097062587738

lizhengyuhang changed the title from "Kernel Density Estimation based sampling is very slow" to "How to speed up Kernel Density Estimation based sampling" on Nov 15, 2023
jonathf commented May 18, 2024

Unfortunately, the KDE module is really slow. There is no roadmap for getting that fixed, but MRs are welcome.
