Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Islands() #121

Merged
merged 6 commits into from
Jun 28, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Package: TreeDist
Type: Package
Title: Calculate and Map Distances Between Phylogenetic Trees
Version: 2.7.1.9002
Version: 2.7.1.9003
Authors@R: c(person("Martin R.", "Smith",
email = "[email protected]",
role = c("aut", "cre", "cph", "prg"),
Expand Down Expand Up @@ -29,6 +29,8 @@ Description: Implements measures of tree similarity, including
(1996) <doi:10.1007/3-540-61332-3_168>.
Includes tools for visualizing mappings of tree space (Smith 2022)
<doi:10.1093/sysbio/syab100>,
for identifying islands of trees (Silva and Wilkinson 2021)
<doi:10.1093/sysbio/syab015>,
for calculating the median of sets of trees,
and for computing the information content of trees and splits.
Copyright: Jonker-Volgenant Linear Assignment Problem implementation
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,7 @@ export(GeneralizedRF)
export(GetParallel)
export(InfoRobinsonFoulds)
export(InfoRobinsonFouldsSplits)
export(Islands)
export(JaccardRobinsonFoulds)
export(JaccardSplitSimilarity)
export(KCDiameter)
Expand Down
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,6 @@
# branch no-phangorn (9002)
# TreeDist 2.7.1.9003 (2024-06-28)

- `Islands()` allows the identification of islands of trees.

- Internal implementation of path and SPR distances, removing dependency
on phangorn (and thus R 4.4).
Expand Down
79 changes: 79 additions & 0 deletions R/Islands.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
#' Find islands from distance matrix
#'
#' `Islands()` assigns a set of objects to islands, such that all elements
#' within an island can form a connected graph in which each edge is no longer
#' than `threshold` distance units \insertRef{Silva2021}{TreeDist}.
#'
#' @inheritParams SpectralEigens
#' @param threshold Elements greater than `threshold` distance units will not be
#' assigned to the same island.
#' @param dense Logical; if `FALSE`, each island will be named according to the
#' index of its lowest-indexed member; if `TRUE`, each island will be numbered
#' sequentially from `1`, ordered by the index of the lowest-indexed member.
#' @param smallest Integer; Islands comprising no more than `smallest` elements
#' will be assigned to island `NA`.
#' @return `Islands()` returns a vector listing the island to which
#' each element is assigned.
#' @references \insertAllCited{}
#' @examples
#' library("TreeTools", quietly = TRUE)
#' # Generate a set of trees
#' trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + c(-(40:20), 70:105), 16)
#'
#' # Calculate distances between trees
#' distances <- ClusteringInfoDist(trees)
#' summary(distances)
#'
#' # Assign trees to islands
#' isle <- Islands(distances, quantile(distances, 0.1))
#' table(isle)
#'
#' # Indicate island membership on 2D mapping of tree distances
#' mapping <- cmdscale(distances, 2)
#' plot(mapping, col = isle + 1,
#' asp = 1, # Preserve aspect ratio - do not distort distances
#' ann = FALSE, axes = FALSE, # Don't label axes: dimensions are meaningless)
#' pch = 16 # Plotting character: Filled circle
#' )
#'
#' # Compare strict consensus with island consensus trees
#' oPar <- par(mfrow = c(2, 2), mai = rep(0.1, 4))
#' plot(Consensus(trees), main = "Strict")
#' plot(Consensus(trees[isle == 1]), edge.col = 2, main = "Island 1")
#' plot(Consensus(trees[isle == 2]), edge.col = 3, main = "Island 2")
#' plot(Consensus(trees[isle == 3]), edge.col = 4, main = "Island 3")
#'
#' # Restore graphical parameters
#' par(oPar)
#' @template MRS
#' @family tree space functions
#' @export
Islands <- function(D, threshold, dense = TRUE, smallest = 0) {
linked <- as.matrix(D) <= threshold
n <- dim(linked)[[1]]
ret <- integer(n)
i <- 1
repeat {
links <- seq_len(n) == i
repeat {
nowLinked <- colSums(linked[links, , drop = FALSE]) > 0
if (any(nowLinked[!links])) {
# Added to island
links <- nowLinked
} else {
break
}
}
ret[links] <- i
i <- which.min(ret)
if (ret[[i]]) break
}
tab <- table(ret, dnn = NULL)
ret[ret %in% as.integer(names(tab)[tab < smallest])] <- NA

if (dense) {
as.integer(factor(rank(ret, ties.method = "min", na.last = "keep")))
} else {
ret
}
}
11 changes: 11 additions & 0 deletions inst/REFERENCES.bib
Original file line number Diff line number Diff line change
Expand Up @@ -491,6 +491,17 @@ @article{Shannon1948
journal = {Bell System Technical Journal}
}

@article{Silva2021,
title = {On Defining and Finding Islands of Trees and Mitigating Large Island Bias},
author = {Silva, Ana Serra and Wilkinson, Mark},
year = {2021},
journal = {Systematic Biology},
volume = {70},
number = {6},
pages = {1282--1294},
doi = {10.1093/sysbio/syab015}
}

@article{Smith2014,
author = {Smith, Martin R. and Ortega-Hern{\'{a}}ndez, Javier},
doi = {10.1038/nature13576},
Expand Down
79 changes: 79 additions & 0 deletions man/Islands.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MSTSegments.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MapTrees.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/MappingQuality.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/SpectralEigens.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/cluster-statistics.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/median.multiPhylo.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions src/spr.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -174,8 +174,8 @@ List keep_and_reroot(const List tree1,
const intx n_node = ret_edge1.nrow() / 2;
if (!n_node) {
List nullTree = List::create(Named("edge") = ret_edge1,
_["tip.label"] = CharacterVector(0),
_["Nnode"] = n_node);
_["Nnode"] = n_node,
_["tip.label"] = CharacterVector(0));

nullTree.attr("class") = "phylo";
nullTree.attr("order") = "preorder";
Expand Down Expand Up @@ -228,8 +228,8 @@ List keep_and_reduce(const List tree1,

if (edge1.nrow() < 1) {
List nullTree = List::create(Named("edge") = NumericMatrix(0, 2),
_["tip.label"] = CharacterVector(0),
_["Nnode"] = 0);
_["Nnode"] = 0,
_["tip.label"] = CharacterVector(0));

nullTree.attr("class") = "phylo";
nullTree.attr("order") = "preorder";
Expand Down
19 changes: 19 additions & 0 deletions tests/testthat/test-Islands.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
test_that("Islands() works", {
library("TreeTools", quietly = TRUE)
trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + c(70:78, -(39:30), 80:99), 16)
distances <- ClusteringInfoDist(trees)
expect_equal(Islands(distances, 2.5), rep(c(1, 2, 1), c(9, 10, 20)))
expect_equal(Islands(distances, 2.5, FALSE), rep(c(1, 10, 1), c(9, 10, 20)))


trees <- as.phylo(c(0:10, 1000:1005, 2000:2019), 16)
distances <- ClusteringInfoDist(trees)
expect_equal(Islands(distances, 2.5, TRUE, smallest = 6),
rep(c(1, 2, 3), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, TRUE, smallest = 7),
rep(c(1, NA, 2), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, FALSE, smallest = 6),
rep(c(1, 12, 18), c(11, 6, 20)))
expect_equal(Islands(distances, 2.5, FALSE, smallest = 7),
rep(c(1, NA, 18), c(11, 6, 20)))
})
2 changes: 1 addition & 1 deletion tests/testthat/test-tree_distance_spr.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@ test_that("SPR: keep_and_reroot()", {
expect_equal(Preorder(reduced[[1]]), Preorder(DropTip(result[[1]], "t9")))
expect_equal(Preorder(reduced[[2]]), Preorder(DropTip(result[[2]], "t9")))

skip_if_not_installed("TreeTools", "1.11.1.9002")
skip_if_not_installed("TreeTools", "1.11.1.9003")
twoZeroes <- list(Preorder(ZeroTaxonTree()), Preorder(ZeroTaxonTree()))
expect_equal(keep_and_reroot(SingleTaxonTree(), SingleTaxonTree(), FALSE),
twoZeroes)
Expand Down
30 changes: 30 additions & 0 deletions vignettes/tree-islands.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
---
title: "Finding islands of phylogenetic trees"
author: "[Martin R. Smith](https://smithlabdurham.github.io/)"
output:
rmarkdown::html_vignette:
fig_width: 6
fig_height: 4
bibliography: ../inst/REFERENCES.bib
csl: https://raw.githubusercontent.com/citation-style-language/styles/master/apa-old-doi-prefix.csl
vignette: >
%\VignetteIndexEntry{Finding islands of phylogenetic trees}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

Collections of phylogenetic trees, such as those output by Bayesian or parsimony
analysis, may occupy discrete regions of tree space such that individual
clusters of trees are separated from all other trees by a certain distance.
Finding such islands, and taking their consensus, can be an effective way
of summarising conflicting signal in a phylogenetic dataset [@Silva2021].

```{r col-trees-by-score}
# Load required libraries
library("TreeTools", quietly = TRUE)
library("TreeDist")

# Generate a set of trees
trees <- as.phylo(as.TreeNumber(BalancedTree(16)) + 0:100 - 15, 16)

```
41 changes: 40 additions & 1 deletion vignettes/treespace.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ points(seq_along(trees), rep(1, length(trees)), pch = 16,

Another thing we may wish to do is to take the consensus of each cluster:

```{r consensus, fig.align="center"}
```{r cluster-consensus, fig.align="center"}
par(mfrow = c(1, 2), mar = rep(0.2, 4))
col1 <- spectrum[mean(treeNumbers[cluster == 1])]
col2 <- spectrum[mean(treeNumbers[cluster == 2])]
Expand All @@ -219,6 +219,45 @@ plot(consensus(trees[cluster == 2], p = 0.5),

Here, we learn that the two clusters are distinguished by the position of `t7`.


### Identifying islands

Besides clustering, we can also define 'islands' in tree space that are
separated by a 'moat', such that all trees on one island are separated from
all trees on another by at least a certain distance [@Silva2021].

```{r island-id, fig.asp = 1, fig.width = 3, fig.align="center"}
par(mar = rep(0, 4))
# set a threshold corresponding to the width of the "moat" between islands
threshold <- 1.8
island <- Islands(distances, threshold)

# See how many trees are on each island
table(island)

# Let's ignore the small islands for now
largeIsle <- Islands(distances, threshold, smallest = 5)

# Colour trees according to their island
plot(mapping,
asp = 1, # Preserve aspect ratio - do not distort distances
ann = FALSE, axes = FALSE, # Don't label axes: dimensions are meaningless
col = ifelse(is.na(largeIsle), "grey", largeIsle + 1),
pch = 16
)
```

Let's view the consensus of each large island cluster:

```{r island-consensus, fig.align="center"}
par(mfrow = c(1, 2), mar = rep(0.2, 4))
plot(consensus(trees[!is.na(largeIsle) & largeIsle == 1], p = 0.5),
edge.color = 2, edge.width = 2, tip.color = 2)
plot(consensus(trees[!is.na(largeIsle) & largeIsle == 2], p = 0.5),
edge.color = 3, edge.width = 2, tip.color = 3)
```


### Validating a mapping

Now let's evaluate whether our map of tree space is representative.
Expand Down
Loading