-
Hello, I have been looking into kdeplot as a way to analyse some bivariate data resulting from a Markov-chain-like sampling procedure. As far as I understand, the data fed into kdeplot is used to construct a KDE of the underlying probability density from which the data is sampled. Then the estimator is evaluated on the data to get the density values for each point, which are fed into _quantile_to_level().
As I would expect, the points are sorted in decreasing density order, since we want to determine the isocontour of density within which some fixed proportion of the data lies. However, I don't understand why the cumulative sum of the densities is used to determine the "integral" of the density. If the data were drawn uniformly, then I could see how that corresponds to a Monte Carlo estimate of the integral. However, as far as I understand, the data are also drawn from the same distribution that we are trying to integrate, so one actually has a perfect importance-sampled dataset. In this case, a simple unweighted count should give an estimate of the proportion of the data above a given density value. In other words, if I have 100 datapoints sorted by decreasing density, then the density value of the 95th one should give me the isodensity contour within which 95% of the datapoints lie. Am I missing something? I am definitely new to seaborn and haven't quite figured out how the input data is treated in kdeplot, so I apologise if my question is borne out of ignorance of basic things about seaborn. Thanks!
Replies: 1 comment 5 replies
-
I think what you’re missing is that the KDE is not evaluated on the original data observations, but on a regular grid of points.
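To make the difference concrete, here is a rough sketch comparing the two ways of picking a density level. It is not seaborn's code: it uses scipy's gaussian_kde as a stand-in estimator, and the helper names level_from_grid and level_from_samples, the grid extent, and the sample size are all made up for illustration. On a regular grid every cell has the same area, so the normalized cumulative sum of the sorted grid densities is a Riemann-sum approximation of the probability mass inside each isodensity contour. Counting points only works when the density is evaluated at the samples themselves.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative data: 1000 draws from a standard bivariate normal.
rng = np.random.default_rng(0)
samples = rng.standard_normal((2, 1000))  # gaussian_kde expects shape (dims, n)
kde = gaussian_kde(samples)

# Evaluate the KDE on a regular grid (equal-area cells).
xs = np.linspace(-5, 5, 100)
ys = np.linspace(-5, 5, 100)
xx, yy = np.meshgrid(xs, ys)
grid_density = kde(np.vstack([xx.ravel(), yy.ravel()]))


def level_from_grid(density, mass):
    """Density threshold whose superlevel set holds `mass` of the grid total.

    Hypothetical helper: because all cells have equal area, summing densities
    is proportional to integrating the density over those cells.
    """
    sorted_d = np.sort(density.ravel())[::-1]      # highest density first
    cum = np.cumsum(sorted_d) / sorted_d.sum()     # enclosed mass fraction
    idx = min(np.searchsorted(cum, mass), sorted_d.size - 1)
    return sorted_d[idx]


def level_from_samples(estimator, points, mass):
    """Density threshold exceeded by a fraction `mass` of the sample points.

    Hypothetical helper: the unweighted count proposed in the question,
    valid when the density is evaluated at the samples themselves.
    """
    point_density = estimator(points)
    return np.quantile(point_density, 1 - mass)


grid_level = level_from_grid(grid_density, 0.95)
sample_level = level_from_samples(kde, samples, 0.95)
print(grid_level, sample_level)
```

The two numbers should be of similar size but need not match exactly: the grid version integrates the smoothed KDE itself, while the sample version counts the original points, and the KDE is somewhat wider than the data it was fitted to.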
The estimator here is this KDE transformer: seaborn/seaborn/_statistics.py, line 41 in 865618d