Question about isoproportions to isodensities in plot_bivariate_density() #3594

kenmimasu · 2023-12-15T18:16:00Z

kenmimasu
Dec 15, 2023

Hello,

I have been looking into kdeplot as a way to analyse some bivariate data resulting from a Markov-chain-like sampling procedure.
However, I am confused about how the plot_bivariate_density() function that is called under the hood determines the isodensity value to use for a given quantile specified by the user via the 'levels' option (e.g. 0.05 for 95%).

As far as I understand, the data fed into kdeplot is used to construct a KDE of the underlying probability density from which the data is sampled. Then the estimator is evaluated on the data to get the density values for each point, which are fed into _quantile_to_level().

    def _quantile_to_level(self, data, quantile):
        """Return data levels corresponding to quantile cuts of mass."""
        isoprop = np.asarray(quantile)
        values = np.ravel(data)
        sorted_values = np.sort(values)[::-1]
        normalized_values = np.cumsum(sorted_values) / values.sum()
        idx = np.searchsorted(normalized_values, 1 - isoprop)
        levels = np.take(sorted_values, idx, mode="clip")
        return levels

As I would expect, the points are sorted in decreasing density order, since we want to determine the isocontour of density within which some fixed proportion of the data lies. However, I don't understand why the cumulative sum of the densities is used to determine the "integral" of the density.

If the data were drawn uniformly then I could see how that corresponds to a Monte Carlo estimate of the integral. However, as far as I understand, the data are also drawn from the same distribution that we are trying to integrate, so one actually has a perfect importance-sampled dataset. In this case, a simple unweighted count should give an estimate of the proportion of the data below a given density value. In other words, if I have 100 datapoints sorted by decreasing density, then the density value of the 95th one should give me the isodensity contour within which 95% of the datapoints lie.
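To make the proposal concrete, here is a minimal sketch of the sample-based estimate described above. It is not seaborn code: it assumes scipy's gaussian_kde as the estimator, and the sample, the seed, and the helper name level_from_samples are made up for illustration.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical sample: 2000 draws from a correlated 2D normal.
samples = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=2000
)

# gaussian_kde expects data with shape (n_dims, n_points).
kde = gaussian_kde(samples.T)
# Density evaluated at the observations themselves, not on a grid.
dens_at_samples = kde(samples.T)


def level_from_samples(densities, isoprop):
    """Density below which a fraction `isoprop` of the sampled points fall."""
    return np.quantile(densities, isoprop)


# isoprop = 0.05 corresponds to the contour enclosing about 95% of the points.
level = level_from_samples(dens_at_samples, 0.05)
frac_inside = np.mean(dens_at_samples >= level)
print(level, frac_inside)
```

Because the points are themselves draws from the density, an unweighted count is enough here; no cumulative sum of density values is needed.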

Am I missing something? I am definitely new to seaborn and haven't quite figured out how the input data is treated in kdeplot, so I apologise if my question is born out of ignorance of basic things about seaborn.

Thanks!

mwaskom · 2023-12-15T20:18:01Z

mwaskom
Dec 15, 2023
Maintainer

I think what you’re missing is that the kde is not evaluated on the original data observations, but on a regular grid of points.
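A rough sketch of why the cumulative sum is appropriate on a grid, under assumptions: scipy's gaussian_kde stands in for seaborn's estimator, the grid limits and sample are invented, and quantile_to_level is a free-function copy of the method quoted in the question. On a regular grid every cell has the same area, so a sum of density values is proportional to the integral of the density.

```python
import numpy as np
from scipy.stats import gaussian_kde


def quantile_to_level(density, isoprop):
    """Free-function copy of the logic quoted in the question."""
    values = np.ravel(density)
    sorted_values = np.sort(values)[::-1]           # highest density first
    cum_mass = np.cumsum(sorted_values) / values.sum()
    idx = np.searchsorted(cum_mass, 1 - np.asarray(isoprop))
    return np.take(sorted_values, idx, mode="clip")


rng = np.random.default_rng(0)
# Hypothetical sample: 1000 draws from a correlated 2D normal.
samples = rng.multivariate_normal(
    [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=1000
)
kde = gaussian_kde(samples.T)

# Evaluate the KDE on a regular grid (limits chosen by hand for this sketch).
xs = np.linspace(-5.0, 5.0, 100)
ys = np.linspace(-5.0, 5.0, 100)
xx, yy = np.meshgrid(xs, ys)
density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)

level = quantile_to_level(density, 0.05)

# Equal-area cells: the share of summed density above the level
# approximates the probability mass enclosed by the contour.
mass_fraction = density[density >= level].sum() / density.sum()
# For comparison, the share of original observations inside the same contour.
sample_fraction = np.mean(kde(samples.T) >= level)
print(level, mass_fraction, sample_fraction)
```

The grid points are not draws from the density, so each one has to be weighted by its density value; that weighting is what the cumulative sum does.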

5 replies

kenmimasu Dec 15, 2023
Author

Ah yes, thank you! I was wondering if that might be the case. Is that the difference between comp_data and plot_data?

mwaskom Dec 15, 2023
Maintainer

comp_data doesn't change the shape of the data, but it does ensure that we're working with data you can do math on (i.e., everything has been converted to floats). The key step is when the data pass through the KDE estimator:

seaborn/seaborn/distributions.py

Line 1073 in 865618d

density, support = estimator(*observations, weights=weights)

kenmimasu Dec 16, 2023
Author

Thanks, so just to be sure: the estimator is generated at some earlier stage using the original data, and later on a rectangular grid is generated in order to get the contour levels, right? Thanks for your patience so far!

mwaskom Dec 16, 2023
Maintainer

The estimator here is this KDE transformer:

seaborn/seaborn/_statistics.py

Line 41 in 865618d

class KDE:
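As an illustration of the two stages discussed in this thread (fit on the observations, then evaluate on a rectangular grid), here is a minimal sketch. GridKDE is a hypothetical stand-in and not the seaborn KDE class; the parameter names gridsize and cut, and the rule of extending the grid by cut bandwidths past the data range, are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import gaussian_kde


class GridKDE:
    """Hypothetical transformer: fit a KDE on data, evaluate it on a grid."""

    def __init__(self, gridsize=100, cut=3.0):
        self.gridsize = gridsize
        self.cut = cut

    def _support(self, values, bandwidth):
        # Extend the grid `cut` bandwidths beyond the data range.
        lo = values.min() - self.cut * bandwidth
        hi = values.max() + self.cut * bandwidth
        return np.linspace(lo, hi, self.gridsize)

    def __call__(self, x, y):
        # Stage 1: fit the estimator on the original observations.
        kde = gaussian_kde(np.vstack([x, y]))
        bw = np.sqrt(np.diag(kde.covariance))
        # Stage 2: build a rectangular grid and evaluate the fit on it.
        grid_x = self._support(x, bw[0])
        grid_y = self._support(y, bw[1])
        xx, yy = np.meshgrid(grid_x, grid_y)
        # density[i, j] is the estimate at (grid_x[j], grid_y[i]).
        density = kde(np.vstack([xx.ravel(), yy.ravel()])).reshape(xx.shape)
        return density, (grid_x, grid_y)


rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(scale=0.8, size=500)

density, support = GridKDE(gridsize=50)(x, y)
cell_area = (support[0][1] - support[0][0]) * (support[1][1] - support[1][0])
total_mass = density.sum() * cell_area
print(density.shape, total_mass)
```

The returned density has the shape of the grid rather than the shape of the input data, which is why the contour levels are computed from grid values and not from per-observation values.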

Answer selected by kenmimasu

kenmimasu Dec 18, 2023
Author

I see! Thank you very much for the clarification.
