This repository was archived by the owner on Jun 9, 2023. It is now read-only.

Add additional covers: LandmarkBallCover & NeighborhoodCover #16

Open
wants to merge 25 commits into
base: master
from

Conversation

@yaraskaf yaraskaf commented May 5, 2020

New Covers

LandmarkBallCover

As discussed in #7, the existing BallCover.R constructs an epsilon-ball around each lensed point, then unions all intersecting balls together. It would be useful to have a version of this cover that instead constructs an epsilon-landmark set in the lensed space and uses the balls centered at these landmarks as the cover sets. This is the purpose of LandmarkBallCover.R.

The landmark set can be chosen by specifying either

  • Radius: epsilon parameter sets radius of cover set

  • Number: num_sets parameter sets desired number of balls in the cover

The seed to use for the landmark selection algorithm can also be specified as one of

  • Specific point: set the index of the seed using the seed_index parameter

  • Random point: parameter seed_method="RAND"

  • Max eccentricity: parameter seed_method="ECC" selects lensed point with highest eccentricity
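The radius and number options can be read as two stopping rules for the same selection loop. The sketch below illustrates that reading in base R; it is not the code in LandmarkBallCover.R. It assumes a farthest-first (maxmin) selection on one-dimensional lens values, and the names `maxmin_landmarks` and `ball_sets` are hypothetical.

```r
# Hypothetical helper (not part of the Mapper package): farthest-first landmark
# selection on 1-D lens values, stopped either by radius (eps) or by count.
maxmin_landmarks <- function(z, eps = NULL, num_sets = NULL, seed_index = 1L) {
  stopifnot(xor(is.null(eps), is.null(num_sets)))
  landmarks <- as.integer(seed_index)
  # Distance from every point to its nearest landmark chosen so far.
  d_min <- abs(z - z[seed_index])
  repeat {
    done <- if (!is.null(eps)) max(d_min) <= eps else length(landmarks) >= num_sets
    # Also stop once every point coincides with a landmark.
    if (done || max(d_min) == 0) break
    nxt <- which.max(d_min)            # point farthest from the current landmarks
    landmarks <- c(landmarks, nxt)
    d_min <- pmin(d_min, abs(z - z[nxt]))
  }
  landmarks
}

# Cover sets: indices of the points within eps of each landmark.
ball_sets <- function(z, landmarks, eps) {
  lapply(landmarks, function(l) which(abs(z - z[l]) <= eps))
}

z <- c(0, 1, 2, 3, 4, 8)               # small illustrative set of lens values
by_radius <- maxmin_landmarks(z, eps = 1.1)
by_number <- maxmin_landmarks(z, num_sets = 4L)
sets <- ball_sets(z, by_radius, eps = 1.1)
```

On these values the radius rule stops after four landmarks, so asking for four sets directly selects the same landmarks, and the resulting balls cover every point.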

NeighborhoodCover

This is an additional cover type where open sets are formed by k-neighborhoods about a landmark set. Instead of specifying a radius or a number of cover sets, the number of points/neighbors per set is specified via the k parameter. It also has the same options to select the seed as LandmarkBallCover.

This might be useful when the lensed data has areas of high and low density: there will be more cover sets in high-density areas, since sets are determined by a number of neighbors rather than a distance-based radius.
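A rough base-R sketch of the idea follows. It is not the NeighborhoodCover.R implementation, and several details are assumptions made for illustration: the center counts as one of its own k neighbors, ties at the k-th distance are kept, and new landmarks are taken farthest-first from the points not yet covered. The function names are hypothetical.

```r
# Hypothetical sketch: the k nearest points to a center (the center included),
# keeping ties at the k-th distance so equal lens values stay together.
k_neighborhood <- function(z, center, k) {
  d <- abs(z - z[center])
  which(d <= sort(d)[min(k, length(d))])
}

# Assumed landmark rule: keep adding the uncovered point farthest from the
# current landmarks until every point lies in some neighborhood.
neighborhood_cover <- function(z, k, seed_index = 1L) {
  landmarks <- as.integer(seed_index)
  sets <- list(k_neighborhood(z, seed_index, k))
  d_min <- abs(z - z[seed_index])
  repeat {
    uncovered <- setdiff(seq_along(z), unlist(sets))
    if (length(uncovered) == 0L) break
    nxt <- uncovered[which.max(d_min[uncovered])]
    landmarks <- c(landmarks, nxt)
    sets[[length(sets) + 1L]] <- k_neighborhood(z, nxt, k)
    d_min <- pmin(d_min, abs(z - z[nxt]))
  }
  list(landmarks = landmarks, sets = sets)
}

# Dense cluster near 0, sparse points further out.
z <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 3, 6, 9)
cover <- neighborhood_cover(z, k = 3L)

# Width of lens values spanned by each set; sets adapt to the local density.
widths <- sapply(cover$sets, function(s) diff(range(z[s])))
```

Because each set is defined by a neighbor count, its width in the lens space is not fixed in advance, which is the contrast with a radius-based ball.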

Demonstration

The following code was run after installing the current version of this landmark-ballcover branch.

Simple example of functionality

First consider a simple test case to illustrate the difference between the covers:

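X <- cbind(c(0,1,2,3,4,8), c(0,1,2,1,3,1))
# Base point: first coordinate of the first point
basePt <- X[1, 1]
# Filter: offset of each point's first coordinate from the base point
f_X <- matrix(apply(X[,1,drop=FALSE], 1, function(pt) (pt - basePt)))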

m <- MapperRef$new(X)$use_filter(f_X)

image

The existing ball cover unions intersecting cover sets, resulting in two disconnected 0-simplices:

m$use_cover("ball", epsilon=1.1)$
  use_distance_measure(measure="euclidean")$
  construct_k_skeleton(k=1L)
plot(m$simplicial_complex)

image
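The two sets can be reproduced by hand with a small base-R sketch of the union step. The merge rule used here (two balls are joined when one center lies within epsilon of the other) is an assumption; BallCover.R may use a different intersection test, although on this data a 2 * epsilon threshold gives the same grouping. The function name is hypothetical.

```r
# Hypothetical sketch of "union all intersecting balls": group points into the
# connected components of the graph linking centers at distance <= eps.
union_ball_sets <- function(z, eps) {
  n <- length(z)
  adj <- abs(outer(z, z, "-")) <= eps
  label <- rep(NA_integer_, n)
  current <- 0L
  for (i in seq_len(n)) {
    if (!is.na(label[i])) next
    current <- current + 1L
    frontier <- i
    while (length(frontier) > 0L) {          # flood fill one component
      label[frontier] <- current
      reached <- which(colSums(adj[frontier, , drop = FALSE]) > 0)
      frontier <- reached[is.na(label[reached])]
    }
  }
  split(seq_len(n), label)
}

z <- c(0, 1, 2, 3, 4, 8)     # first column of X, i.e. the lens values f_X
sets <- union_ball_sets(z, eps = 1.1)
```

The first five points chain together through gaps of 1, while the point at 8 is isolated, so the cover has two disjoint sets and no overlap that could produce an edge.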

The landmark ball cover generates the following using the same epsilon:

m$use_cover("landmark_ball", epsilon=1.1)$
  use_distance_measure(measure="euclidean")$
  construct_k_skeleton(k=1L)
plot(m$simplicial_complex)

image

The same output can be obtained by specifying a number of sets instead, i.e. use_cover("landmark_ball", num_sets=4L).

A different complex is generated using the neighborhood cover:

m$use_cover("neighborhood", k=2L)$
  use_distance_measure(measure="euclidean")$
  construct_k_skeleton(k=1L)
plot(m$simplicial_complex)

image

Seed options

For both new cover methods, there are three options for specifying the seed point:

(1) User-specified: use_cover("landmark_ball", epsilon=1.1, seed_index=2L) uses the second point in f_X as the seed:

> ## Landmark_ball Cover: (epsilon = 1.1, seed index = 2)

(2) Eccentricity: use_cover("landmark_ball", epsilon=1.1, seed_method="ECC") uses the point in f_X with the highest eccentricity as the seed:

> ## Landmark_ball Cover: (epsilon = 1.1, seed index = 6)

(3) Random: use_cover("landmark_ball", epsilon=1.1, seed_method="RAND") uses a random point in f_X as the seed:

> ## Landmark_ball Cover: (epsilon = 1.1, seed index = 4)
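The three options amount to a small dispatch, sketched below in base R. The helper `choose_seed` is hypothetical, its default method is arbitrary, and eccentricity is taken as the mean distance to all points, which is one common definition; the PR does not state which definition it uses.

```r
# Hypothetical helper mirroring the three seed options.
choose_seed <- function(z, seed_index = NULL, seed_method = c("RAND", "ECC")) {
  # (1) A user-specified index takes precedence.
  if (!is.null(seed_index)) return(as.integer(seed_index))
  seed_method <- match.arg(seed_method)
  # (3) Random: any point of the lensed data.
  if (seed_method == "RAND") return(sample.int(length(z), 1L))
  # (2) Eccentricity, assumed here to be the mean distance to all points.
  ecc <- rowMeans(abs(outer(z, z, "-")))
  which.max(ecc)
}

z <- c(0, 1, 2, 3, 4, 8)     # lens values f_X of the toy example
seed_user <- choose_seed(z, seed_index = 2L)
seed_ecc  <- choose_seed(z, seed_method = "ECC")
seed_rand <- choose_seed(z, seed_method = "RAND")
```

Under this definition the most eccentric point of the toy example is the outlier at 8, that is, index 6, which agrees with the seed index reported for option (2).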

Noisy circle

Both new covers can also recover the homology of a larger data set (over specific parameter ranges):

data("noisy_circle", package = "Mapper")
left_pt <- noisy_circle[which.min(noisy_circle[, 1]),]
f_X <- matrix(apply(noisy_circle, 1, function(pt) (pt - left_pt)[1]))

image
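The filter is the horizontal offset of each point from the leftmost point, so the apply() call can also be written as a vectorized subtraction. The sketch below checks the equivalence on synthetic data; the point cloud `pts` is a stand-in generated here, not the package's noisy_circle data set.

```r
# Synthetic stand-in for a noisy circle (assumption: not the package data).
set.seed(1)
theta <- runif(200, 0, 2 * pi)
pts <- cbind(cos(theta), sin(theta)) + matrix(rnorm(400, sd = 0.05), ncol = 2)

left_pt <- pts[which.min(pts[, 1]), ]

# Row-wise version, as in the PR description.
f_apply <- matrix(apply(pts, 1, function(pt) (pt - left_pt)[1]))

# Vectorized equivalent: subtract the leftmost x-coordinate from every x.
f_vec <- matrix(pts[, 1] - left_pt[1])
```

Both versions give a one-column matrix whose minimum is zero at the leftmost point.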

m <- MapperRef$new(noisy_circle)$
  use_filter(filter = matrix(f_X))$
  use_cover(cover="landmark_ball", epsilon=0.5)$
  use_distance_measure(measure="euclidean")$
  construct_k_skeleton(k=1L)
plot(m$simplicial_complex)

image

m <- MapperRef$new(noisy_circle)$
  use_filter(filter = matrix(f_X))$
  use_cover(cover="neighborhood", k=70)$
  use_distance_measure(measure="euclidean")$
  construct_k_skeleton(k=1L)
plot(m$simplicial_complex)

image

Other Modifications

The file landmarks.R was also updated to contain the epsilon-landmark selection algorithm from Dłotko's paper "Ball Mapper: A Shape Summary for Topological Data Analysis" (2019).

Calling the function as landmarks(f_X, n=N) selects N landmarks using the originally implemented landmark method. Using landmarks(f_X, eps=epsilon) selects however many landmarks are needed to create an epsilon-net using the Dłotko algorithm.
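For orientation, a greedy epsilon-net in the spirit of the Ball Mapper paper can be sketched as follows: scan the points in order, promote each point that is not yet covered to a landmark, and mark everything within epsilon of it as covered. This is an illustration only; the function name is hypothetical, Euclidean distance is assumed, and the code in landmarks.R may differ, for example in how the seed is used.

```r
# Hypothetical sketch of a greedy epsilon-net (not the landmarks.R code).
greedy_eps_net <- function(x, eps) {
  x <- as.matrix(x)
  covered <- rep(FALSE, nrow(x))
  landmarks <- integer(0)
  for (i in seq_len(nrow(x))) {
    if (covered[i]) next
    landmarks <- c(landmarks, i)              # first uncovered point
    # Euclidean distance from point i to every point.
    d <- sqrt(rowSums(sweep(x, 2, x[i, ])^2))
    covered <- covered | (d <= eps)
  }
  landmarks
}

f_X <- matrix(c(0, 1, 2, 3, 4, 8))            # lens values of the toy example
net <- greedy_eps_net(f_X, eps = 1.1)
```

The number of landmarks is an output of the procedure rather than an input: every point ends up within epsilon of some landmark, and no landmark lies within epsilon of an earlier one.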

Naming Issue

@corybrunson and I were thinking that it may be better to use the name "ball cover" for the landmark ball cover instead. Like we talked about in #7, the landmark ball cover might be the expected functionality for a method referencing a ball cover.

The existing BallCover.R could either be removed or renamed to something like DisjointBallCover; LandmarkBallCover could then take the name BallCover. I wanted to get your opinion before doing any refactoring, though, since the rename could make the commit history of "BallCover.R" confusing to follow.

Implementation of ball cover of lensed data space using landmark point set.

Fixed bug in landmarks.cpp
Addition of ability to specify a radius rather than a number of balls to construct a LandmarkBallCover.

Modified LandmarkBallCover.R to support use of epsilon parameter, added new landmark function to landmarks.R to find landmarks by radius rather than number.
…eccentricity as seed for landmark selection

Modified construct_cover function in LandmarkBallCover.R to include selection of seed based on maximum eccentricity.
…umber of points

When every ball has the same number of points, apply() returns a matrix rather than a list of lists, causing a crash. Replaced this function with splitting the matrix by column in this case.
…ompute eps-landmark set

Replaced 1D euclidean distance with proxy::dist(), allowing arbitrary dist_method from proxy::pr_DB
Updated validate() method in LandmarkBallCover.R to ensure appropriate values of parameters for cover construction.
Implementation of cover by k-nearest neighbor sets.
…a cover set

Fixed bug where some points are excluded from cover sets.
Allow neighborhoods of size greater than k when >k points have the same lens value. Now all points with the same lensed value end up in the same pullback set.
…ver.R

Update headers, add reference to Dlotko paper.
Remove commented out code that computed landmarks in only one dimensional data.
In this case, just take the unique balls as the cover sets. Duplicate lensed values should not form distinct centers.
Moved calculation of k-neighborhoods to NeighborhoodCover.R, eliminated unnecessary code, removed k-nhd functionality from landmarks.R
…ation

Refactor code for epsilon-landmarks in landmarks.R, update documentation
Change list() to c() to avoid bug with pasting lists of parameters.
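The apply() issue mentioned in the commits above is a general property of base R and can be reproduced in isolation. The membership matrix below is made up for illustration and is not taken from the package code.

```r
# Logical membership matrix: rows are points, columns are balls.
member <- cbind(c(TRUE, TRUE, FALSE, FALSE),
                c(FALSE, FALSE, TRUE, TRUE))

# Both balls hold two points, so apply() simplifies the result to a matrix
# instead of returning one index vector per ball.
simplified <- apply(member, 2, which)

# Splitting the simplified matrix by column recovers a list of index vectors.
by_column <- unname(split(simplified, col(simplified)))

# Alternatively, lapply() over the columns returns a list in every case.
as_list <- lapply(seq_len(ncol(member)), function(j) which(member[, j]))
```

Code that expects a list of cover sets therefore needs one of these list-preserving forms whenever the sets can all have the same size.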
@corybrunson

Thanks @yaraskaf! One thing to note, since it came up in #7, is that Dłotko calculated landmark sets from the point cloud X in the original data space, whereas the ball cover performs this calculation on the lensed (filtered) points f(X) in Z. That is to say, this is a type of cover for use with a mapper construction, rather than the alternative construction discussed in the Ball Mapper paper. (This caused confusion on our end, so I thought it worth stating explicitly.)
