-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathtest_analytic_geodesic_distance.Rmd
81 lines (61 loc) · 3.77 KB
/
test_analytic_geodesic_distance.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
---
title: Test if analytic derivation of geodesic distance for curves from sim_functional_data.R
is correct
output:
html_document:
df_print: paged
pdf_document: default
---
```{r}
library(reticulate)
# use_python('/usr/local/bin/python3.7',required=TRUE)
# use_python('/Users/UQAM/anaconda3/bin/python',required=TRUE)
# use_python('/Users/suswei/anaconda3/bin/python',required=TRUE)
```
```{r}
py_config()
```
```{r}
source('sim_functional_data.R')
```
The function sim_functional_data.R allows specification of several simulatin scenarios.
# The samplesize should be set very high so that Floyd's algorithm works well on the noiseless data. The number of time grid points K also should be high so that the vector data really represents the functional curve.
TODO: The reg_sampling parameter in sim_functional_data is somewhat of a misnomer. In either case when it's 0 or 1, it does NOT mean that the manifold can be sampled evenly. Recall swiss roll euclidean example. A simple sampling of the one-dimensional manifold parameter does not mean that entire swiss roll is visited, some corners are seldom sampled. This will affect downstream Floyd's algorithm. Indeed, Diaconis et. al in "Sampling from a Manifold" realize the drawbacks of naive sampling of Euclidean manifolds.
```{r}
# Generate data
sim <- sim_functional_data(sce=4,samplesize=100,K=30,plot_true=1,reg_sampling=1)
```
The output contains the in-sample pairwise geodesic distance matrix (analytic_geo) via analytic derivation. The purpose of this notebook is to see if our analytic derivations match numerical results.
```{r}
names(sim)
```
Specifically, we would like to compare sim$analytic_geo to geodesic distance estimation via Floyd's algorithm for the noiseless (discretised) curve. This algorithm is guaranteed to do well if we sample the manifold densely enough? Setting the samplesize very high is not enough for this, a concentrated sampling of the manifold is actually needed. More on this in sim_functiona_data.R documentation. The number of grid points should not be too low since then the L2 distance between the discretized curves won't be well estimated.
```{r}
get_min_num_neighbors = import_from_path("get_min_num_neighbors",path='.')
getIsomapGdist = import_from_path("getIsomapGdist",path='.')
attach(sim)
a = reg_grid[1]
b = tail(reg_grid,1)
K = length(reg_grid)
samplesize = dim(noiseless_data)[1]
noiseless_data_tmp = (sqrt((b-a)/K))*noiseless_data
# Find a grid of possible values for the number of neigbors
num_neigh_min=get_min_num_neighbors$get_min_num_neighbors(noiseless_data_tmp)
num_neigh_true=seq(num_neigh_min,samplesize/2,by=2)
# Calculate the geo matrix for each number of neighbors and keep the one that gives the minimal error
norm_analytic_geo = sqrt(sum(analytic_geo^2))
Error_true_mani_K= rep(0,length(num_neigh_true))
for(j in 1:length(num_neigh_true)){
IsomapGdist = getIsomapGdist$getIsomapGdist(noiseless_data_tmp,num_neigh_true[j])
Error_true_mani_K[j]=sqrt(sum((IsomapGdist - analytic_geo )^2))/norm_analytic_geo
}
ind_op_true=min(which(Error_true_mani_K==min(Error_true_mani_K)))
estim_geo_noiseless_data = getIsomapGdist$getIsomapGdist(noiseless_data_tmp,num_neigh_true[ind_op_true])
image.plot(estim_geo_noiseless_data,main='geo estimation')
```
If the analytic derviation is correct, then estim_geo_noiseless_data should be extremely similar to analytic_geo. We can assess their closeness using assess_goodness_estimation.R. Three assessment metrics are returned. If the calculation is correct, rmse should be close to zero and epsilon_isometry_auc and pearson_corr should both be close to 1. Note that running this notebook for sce=1 gives rmse on the order of e-16.
```{r}
library(MESS)
source('assess_goodness_estimation.R')
assess_goodness_estimation(estim_geo_noiseless_data,sim$analytic_geo)
```