README

This is an implementation of the mapper algorithm by Singh, Mémoli, and Carlsson (2007).

Setup

To install the latest version of this package from Github, run the following commands:

Mathy Overview

Cover \(\mathbb{R}\) in a set of intervals \(`\{I_i\}_{i=1}^n`\), so that every point in \(P\) is contained in some level set \(L_i = f^{-1}(I_i)\). We may then construct a graph

This is the basic idea of the mapper algorithm, with the addition that each level set is first refined into clusters based on the intrinsic pairwise distances of the data according to some clustering algorithm. That is, we partition each level set \(L_i\) into \(k_i\) disjoint clusters

Example 1: 1D Mapper

num_points = 5000

P.data = data.frame(
  x = sapply(1:num_points, function(x)
    sin(x) * 10) + rnorm(num_points, 0, 0.1),
  y = sapply(1:num_points, function(x)
    cos(x) ^ 2 * sin(x) * 10) + rnorm(num_points, 0, 0.1),
  z = sapply(1:num_points, function(x)
    10 * sin(x) ^ 2 * cos(x)) + rnorm(num_points, 0, 0.1)
)

P.dist = dist(P.data)

Here is a point cloud \(P\) formed by adding a bit of uniform noise to 5000 points regularly sampled from the parametric curve

This seems to form a kind of figure-8 curve just based on this projection. But as we can see from the 2D projections, the “shape” of the data set we’re seeing really does depend on how we’re looking at it:

We will build graphs using the outline of the mapper algorithm described, with real-valued lens functions.

# lens functions
projx = P.data$x
projy = P.data$y
projz = P.data$z
eccentricity = apply(as.matrix(P.dist), 1, sum) / num_points

# cover parameters to generate a width-balanced cover
num_bins = 10
percent_overlap = 25

# generate the cover
xcover = create_width_balanced_cover(min(projx), max(projx), num_bins, percent_overlap)
ycover = create_width_balanced_cover(min(projy), max(projy), num_bins, percent_overlap)
zcover = create_width_balanced_cover(min(projz), max(projz), num_bins, percent_overlap)
eccentriccover = create_width_balanced_cover(min(eccentricity),
                                             max(eccentricity),
                                             num_bins,
                                             percent_overlap)

# bin tester machine machine
check_in_interval <- function(endpoints) {
  return(function(x) (endpoints[1] - x <= 0) & (endpoints[2] - x >= 0))
}

# each of the "cover" elements will really be a function that checks if a data point lives in it
xcovercheck = apply(xcover, 1, check_in_interval)

# build the mapper objects
xmapper = create_mapper_object(
  data = P.data,
  dists = P.dist,
  filtered_data = projx,
  cover_element_tests = xcovercheck,
  method = "single"
)

# mappeR also has a built-in 1D mapper function that will do the above for you
ymapper = create_1D_mapper_object(P.data, P.dist, projy, ycover, "single")
zmapper = create_1D_mapper_object(P.data, P.dist, projz, zcover, "single")
eccentricmapper = create_1D_mapper_object(P.data, P.dist, eccentricity, eccentriccover, "single")

# mappeR also has functions which will convert the mapper outputs into igraph format
ixmapper = mapper_object_to_igraph(xmapper)
iymapper = mapper_object_to_igraph(ymapper)
izmapper = mapper_object_to_igraph(zmapper)
ieccentricmapper = mapper_object_to_igraph(eccentricmapper)

The vertices in each output graph below are colored according to the level set the cluster belongs to, and scaled by (the square root of) the number of data points in the cluster.

Example 2: ball mapper

By toying with the general mapper parameters, we can obtain different flavors of the algorithm. In the ball mapper flavor, we simply use the inclusion into the ambient space of the data as our lens function, and let the cover do the work. Specifically, we cover the ambient space with \(\varepsilon\)-balls by creating a \(\varepsilon\)-net, which can be done with a greedy algorithm.

There’s a secret parameter here, which is \(\varepsilon\). Below are output graphs for varying values of \(\varepsilon\); the sizing is as with the 1D mapper, but no coloring is done as each vertex would have to receive its own color in this flavor, which is redundant.

# creates a cover using a greedy algorithm
balls1 = create_balls(data = P.data, dists = P.dist, eps = .25)

# ball tester machine machine
is_in_ball <- function(ball) {
  return(function(x) x %in% ball)
}

# filtering is just giving back the data (row names because my balls are lists of data point names, so the filter should match)
ballmapper1 = create_mapper_object(P.data, P.dist, rownames(P.data), lapply(balls1, is_in_ball))

# mappeR has a built-in ball mapper function to do this for you
ballmapper2 = create_ball_mapper_object(P.data, P.dist, .5)
ballmapper3 = create_ball_mapper_object(P.data, P.dist, 1)
ballmapper4 = create_ball_mapper_object(P.data, P.dist, 2)