Hanrui Zhang ([email protected]) and Yuanfang Guan ([email protected])
This is team Guan&Zhang's submission to the Allen Institute Cell Lineage Reconstruction DREAM Challenge.
Our method in subchallenge 1 was based on:
- distance transformation,
- rule-based hierarchical clustering based on minimal distance, which we describe in the following figure.
Figure 1. Workflow of Guanlab’s method in Subchallenge 1.
First, we summarized the frequency of the different editing states of the 10 barcodes across the training set (right panel in Figure 1). More frequent editing states should be given less importance, so we assign them larger distances. The editing states of barcodes 1-10 are listed in the Barcode Distance table (Figure 1).
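The frequency summary can be sketched as follows. This is a minimal illustration, not the code in train.py: the function name `build_distance_table`, the toy recordings, and the choice to use the relative frequency itself as the distance value are all assumptions, since the exact mapping from frequency to distance is only shown in Figure 1.

```python
from collections import Counter

def build_distance_table(recordings, n_sites=10):
    """Return one dict per barcode position mapping state -> distance value.

    Assumption: the distance value is simply the relative frequency of the
    state at that position, so more frequent states get larger values.
    """
    table = []
    for site in range(n_sites):
        counts = Counter(rec[site] for rec in recordings)
        total = sum(counts.values())
        table.append({state: count / total for state, count in counts.items()})
    return table

# Toy recordings with 4 positions each (hypothetical data, not from the challenge).
recordings = ["2112", "2110", "1112", "2212"]
table = build_distance_table(recordings, n_sites=4)
```

With these toy recordings, state 2 appears in three of the four cells at the first position, so it receives the largest value there.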
Next, we reconstruct the lineage tree for every training dataset. For each group of cells whose lineage tree is to be reconstructed, we first transform the editing states according to the Barcode Distance table (middle panel in Figure 1).
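The transformation step might look like the sketch below. The table literal, the `transform` helper, and the default value for unseen states are hypothetical and only illustrate the lookup; the real values come from the Barcode Distance table in Figure 1.

```python
def transform(cell, table, default=0.0):
    """Replace each state in a cell's recording by its value in the table.

    Assumption: states not present in the table fall back to `default`.
    """
    return [table[site].get(state, default) for site, state in enumerate(cell)]

# Hypothetical two-position table: one dict per barcode position.
example_table = [{"1": 0.25, "2": 0.75}, {"1": 0.5, "0": 0.5}]
vector = transform("21", example_table)
```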
The transformed editing states are then passed to hierarchical clustering: the two cells with the minimal distance are clustered together, and their parent cell is deduced from the irreversible editing rules (“Constructed new nodes from leaves” in Figure 1). Clustering continues until only one node remains for the whole cell set, so that no cells are left to cluster.
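A minimal sketch of this clustering loop is given below. It rests on several assumptions that are not confirmed by the text: the symbol "1" marks an unedited position, the distance between two cells is the Manhattan distance between their transformed states, and a parent position is unedited whenever its two children disagree (because an edit cannot be reverted). All function names are hypothetical.

```python
from itertools import combinations

UNEDITED = "1"  # assumption: symbol for an unedited barcode position

def parent_state(a, b):
    """Keep states the children share; any disagreement implies an unedited parent."""
    return "".join(x if x == y else UNEDITED for x, y in zip(a, b))

def distance(a, b, table):
    """Assumed metric: Manhattan distance between table-transformed states."""
    return sum(abs(table[i].get(x, 0.0) - table[i].get(y, 0.0))
               for i, (x, y) in enumerate(zip(a, b)))

def reconstruct(cells, table):
    """cells: dict of cell name -> state string. Returns a Newick string."""
    nodes = list(cells.items())
    while len(nodes) > 1:
        # Pick the pair of current nodes with the smallest distance.
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda p: distance(nodes[p[0]][1], nodes[p[1]][1], table))
        (name_a, state_a), (name_b, state_b) = nodes[i], nodes[j]
        merged = (f"({name_a},{name_b})", parent_state(state_a, state_b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0][0] + ";"

# Hypothetical table and cells with 3 positions each.
toy_table = [{"1": 0.5, "2": 0.3, "0": 0.2}] * 3
toy_cells = {"a": "221", "b": "221", "c": "101"}
tree = reconstruct(toy_cells, toy_table)
```

Note that under this assumed metric two different states with equal table values have distance zero, so a mismatch-based distance may be a better fit depending on how the table is built.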
Before you start, make sure the following dependencies have been installed:
To reproduce Guan&Zhang's submission for SubChallenge1:
python train.py [INPUTPATH]
[INPUTPATH] is the path of the directory containing all the input recordings.
This program will generate a file prediction.txt and a new folder ./output/.
- prediction.txt: a data table containing two columns. The first column, dreamID, is the tree ID. The second column, nw, is the reconstructed tree in Newick format.
- ./output/: the reconstructed trees in Newick format.
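To inspect the result, the table could be loaded as in the sketch below. It assumes that prediction.txt is tab-separated and starts with a header row naming the dreamID and nw columns; the delimiter is not stated above, so adjust it if the file differs. The `read_predictions` helper is hypothetical, and an in-memory sample stands in for the real file.

```python
import csv
import io

def read_predictions(handle):
    """Map each dreamID to its Newick string (assumes a tab-separated table)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["dreamID"]: row["nw"] for row in reader}

# In-memory stand-in for prediction.txt (hypothetical content).
sample = io.StringIO("dreamID\tnw\n1\t(c,(a,b));\n2\t((a,b),c);\n")
trees = read_predictions(sample)
```

For the real file, pass `open("prediction.txt", newline="")` instead of the in-memory sample.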