Skip to content

tdaverse/phutil

Folders and files

NameName
Last commit message
Last commit date

Latest commit

2088778 · Apr 27, 2025
Apr 26, 2025
Apr 26, 2025
Apr 25, 2025
Apr 25, 2025
Apr 26, 2025
Apr 25, 2025
Apr 27, 2025
Apr 24, 2025
Mar 31, 2025
Apr 27, 2025
Apr 27, 2025
Apr 26, 2025
Apr 24, 2025
Apr 26, 2025
Apr 24, 2025
Apr 24, 2025
Apr 24, 2025
Apr 25, 2025
Apr 25, 2025
Apr 1, 2025
Apr 1, 2025
Mar 31, 2025

Repository files navigation

{phutil}

R-CMD-check Codecov test coverage

The goal of the {phutil} package is to provide utility functions for persistence data analysis. In particular, the package defines a data structure for hosting persistence data. The package is part of the tdaverse suite of packages, which aims to provide a consistent interface for topological data analysis in R that nicely integrates with the tidymodels ecosystem.

Installation

You can install the development version of {phutil} from GitHub with:

# install.packages("pak")
pak::pak("tdaverse/phutil")

The ‘persistence’ class

The package currently provides a new data structure for hosting persistence data. The data set noisy_circle_points is a simulated data set consisting of 100 points sampled from a circle with additive Gaussian noise using a standard deviation of 0.05 :

library(phutil)
head(noisy_circle_points)
#>               x          y
#> [1,]  0.6651721  0.6363172
#> [2,] -0.7481066 -0.6901262
#> [3,] -0.8288544 -0.5519689
#> [4,] -0.7650181 -0.7436499
#> [5,]  0.6337296 -0.7607464
#> [6,] -0.6077663 -0.7036492

The data set noisy_circle_ripserr is a persistence diagram computed using the ripserr package with the ripserr::vietoris_rips() and stored as an object of class ‘PHom’, which is a light wrapper around a data frame with 3 variables:

head(noisy_circle_ripserr)
#>   dimension birth       death
#> 1         0     0 0.008162723
#> 2         0     0 0.008548402
#> 3         0     0 0.009238432
#> 4         0     0 0.014004707
#> 5         0     0 0.020555701
#> 6         0     0 0.024621462

The data set noisy_circle_tda_rips is a persistence diagram computed using the TDA package with the TDA::ripsDiag() and stored as a list of length 1 containing the object diagram of class ‘diagram’, which is a matrix with 3 columns:

head(noisy_circle_tda_rips$diagram)
#>      dimension Birth     Death
#> [1,]         0     0 1.6322000
#> [2,]         0     0 0.3677392
#> [3,]         0     0 0.3325817
#> [4,]         0     0 0.3226673
#> [5,]         0     0 0.2096916
#> [6,]         0     0 0.1953531

The data structure adopted by the tdaverse suite of packages and provided by this package is designed to be a common interface for persistence data. This allows for a seamless integration of persistence data across different packages and ensures that the data is always in a consistent format.

Specifically, the ‘persistence’ class is a list with the following components:

  • pairs: A list of 2-column matrices containing birth-death pairs. The -th element of the list corresponds to the -th homology dimension. If there is no pairs for a given dimension but there are pairs in higher dimensions, the corresponding element(s) is/are filled with a numeric matrix.

  • metadata: A list of length 6 containing information about how the data was computed:

    • orderered_pairs: A boolean indicating whether the pairs in the pairs list are ordered (i.e. the first column is strictly less than the second column).
    • data: The name of the object containing the original data on which the persistence data was computed.
    • engine: The name of the package and the function of this package that computed the persistence data in the form "package_name::package_function".
    • filtration: The filtration used in the computation in a human-readable format (i.e. full names, capitals where need, etc.).
    • parameters: A list of parameters used in the computation.
    • call: The exact call that generated the persistence data.

The as_persistence() function is used to convert persistence data from different packages into the ‘persistence’ class. For example, the persistence data computed using the ripserr package can be converted into the ‘persistence’ class as follows:

as_persistence(noisy_circle_ripserr)
#> 
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 99 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips/Cubical filtration using `ripserr::<vietoris_rips/cubical>()`.
#> ! With unknown parameters.

Similarly, the persistence data computed using the TDA package can be converted into the ‘persistence’ class as follows:

as_persistence(noisy_circle_tda_rips)
#> 
#> ── Persistence Data ────────────────────────────────────────────────────────────
#> ℹ There are 100 and 2 pairs in dimensions 0 and 1 respectively.
#> ℹ Computed from a Vietoris-Rips filtration using `TDA::ripsDiag()`.
#> ℹ With the following parameters: maxdimension = 1 and maxscale = 1.6322.

Distances between persistence diagrams

The package also provides a function to compute the bottleneck distance between two persistence diagrams. The function bottleneck_distance() takes two persistence diagrams as input and computes the bottleneck distance between them:

library(phutil)
diag1 <- persistence_sample[[1]]
diag2 <- persistence_sample[[2]]
bottleneck_distance(diag1, diag2)
#> [1] 0.06197453

One can also compute the Wasserstein distance between two persistence diagrams using the wasserstein_distance() function:

wasserstein_distance(diag1, diag2)
#> [1] 1.53403

It is also possible to compute these distances between all pairs of persistence diagrams in a list using e.g. wasserstein_pairwise_distances():

wasserstein_pairwise_distances(persistence_sample[1:5])
#>          1        2        3        4
#> 2 1.534030                           
#> 3 2.301812 1.852241                  
#> 4 2.354967 2.362616 1.922799         
#> 5 2.374818 2.707366 1.985518 3.154120

It returns an object of class ‘dist’. It is parallelized using OpenMP and relies under the hood on the Hera C++ library which is BSD-licensed. The code interfacing R and C++ is generated by the header-only {cpp11} package which is MIT-licensed.

Contributions

Code of Conduct

Please note that the {phutil} package is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

Acknowledgments

This project was funded by an ISC grant from the R Consortium and done in coordination with Jason Cory Brunson and with guidance from Bertrand Michel and Paul Rosen. It builds upon conversations with Mathieu Carrière and Vincent Rouvreau who are among the authors of the GUDHI library. Package development also benefited from the support of colleagues at the Department of Mathematics Jean Leray and the use of equipment at Nantes University.