Single cell A/B compartment calling
The Higashi-analysis pipeline runs on the imputed contact maps produced by Higashi-main. Please first execute Higashi-main as described here.
We designed an approach for A/B compartment score annotation based on the widely used method proposed in Lieberman-Aiden et al., Science, 2009. In the original compartment calling method, the Hi-C contact map is normalized and transformed into a Pearson correlation matrix. The sign of the first principal component (PC1) of this correlation matrix is then used to define A/B compartments.
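The original bulk procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not Higashi's code: the helper names `observed_over_expected` and `pc1_scores`, the choice of a per-diagonal mean as the normalization, and the toy contact map are all assumptions made for the example.

```python
import numpy as np

def observed_over_expected(contacts):
    """Divide each diagonal of a symmetric contact map by its mean."""
    contacts = np.asarray(contacts, dtype=float)
    n = contacts.shape[0]
    oe = np.zeros((n, n))
    for d in range(n):
        diag = np.diagonal(contacts, offset=d)
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - d)
            # Fill both triangles so the result stays symmetric.
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected
    return oe

def pc1_scores(corr):
    """Project the rows of a correlation matrix onto its first principal axis."""
    centered = corr - corr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[0]

# Toy map (assumed data): distance decay times a two-compartment checkerboard.
labels = np.array([1, 1, 1, -1, -1, -1, 1, 1, -1, -1], dtype=float)
distance = np.abs(np.subtract.outer(np.arange(10), np.arange(10)))
contacts = (1.0 + 0.5 * np.outer(labels, labels)) / (1.0 + distance)

corr = np.corrcoef(observed_over_expected(contacts))
compartment_sign = np.sign(pc1_scores(corr))
print(compartment_sign)
```

Note that the overall sign of PC1 is arbitrary, which is why a calibration track is needed to decide which sign corresponds to the A compartment.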
We developed a new method that calculates continuous single-cell A/B compartment scores that are directly comparable across cells and sensitive enough to reflect subtle compartment shifts. The first two steps, i.e., normalization and transformation into Pearson correlation matrices, remain the same for each single cell. However, instead of performing PCA on each individual Pearson correlation matrix, we apply PCA once on the Pearson correlation matrix from the pooled scHi-C and save the PCA projection matrix. We then use this bulk projection matrix to transform the single-cell Pearson correlation matrices into continuous one-dimensional vectors.
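The idea of fitting PCA once on the pooled matrix and reusing its projection for every cell can be sketched as follows. This is an illustration under stated assumptions, not the code in scCompartment.py: the function names are hypothetical, the inputs are assumed to be Pearson correlation matrices that have already been computed, and centering each cell with the pooled column means is one possible choice.

```python
import numpy as np

def fit_bulk_projection(bulk_corr, n_components=1):
    """Fit PCA on the pooled correlation matrix; return (column means, projection)."""
    col_mean = bulk_corr.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(bulk_corr - col_mean, full_matrices=False)
    return col_mean, vt[:n_components].T

def project_cell(cell_corr, col_mean, projection):
    """Apply the pooled projection to one cell's correlation matrix."""
    return (cell_corr - col_mean) @ projection

# Toy data (assumed): idealized correlation matrices for 8 bins.
labels = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
shifted = labels.copy()
shifted[3] = -1.0  # a hypothetical cell in which bin 3 changed compartment

bulk_corr = np.outer(labels, labels)
cells = {
    "cell_like_bulk": np.outer(labels, labels),
    "cell_shifted": np.outer(shifted, shifted),
}

col_mean, projection = fit_bulk_projection(bulk_corr)
scores = {name: project_cell(corr, col_mean, projection)[:, 0]
          for name, corr in cells.items()}
for name, vec in scores.items():
    print(name, np.round(vec, 2))
```

Because both cells go through the same projection, any sign or scale ambiguity of the PCA is shared, so the two score vectors can be compared bin by bin. In this toy case only bin 3 differs in sign between the two cells.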
Please execute the following code:
cd higashi/
python scCompartment.py [-c CONFIG] [--calib_file FILE] [--calib] [--neighbor] [-o OUTPUT]
'
optional arguments:
--calib_file FILE The path to the calibration file (CG ratio, CpG density or bulk A/B compartment
annotations etc.)
--calib Calibrate the sign of the called A/B compartments. When using this option,
`calib_file` is required.
-n, --neighbor Call compartments on the imputed maps with neighboring cell information utilized.
-o, --output Output file name (stored in the `temp_dir`). (default: scTAD.hdf5)
required arguments:
-c CONFIG The path to the configuration JSON file that you created for Higashi-main.
'
The `FILE` should have the following format (a tab-separated text file):
chr1 0 0.0323533
chr1 1000000 0.033473
chr1 2000000 0.0275663
chr1 3000000 0.0193224
chr1 4000000 0.012299
chr1 5000000 0.020483
chr1 6000000 0.017645
chr1 7000000 0.01735
chr1 8000000 0.016412
Note: Higashi assumes that larger values correspond to a higher likelihood of a bin being in the A compartment. If your calibration file does not follow this convention, please adapt the signals accordingly.
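To make the convention concrete, the sketch below parses a file in the format shown above and flips a score vector so that it correlates positively with the calibration track. It only illustrates the idea of sign calibration and is not necessarily how Higashi implements it; `read_calibration`, `calibrate_sign`, and the `raw_scores` values are hypothetical.

```python
import io
import numpy as np

def read_calibration(handle):
    """Parse 'chrom<TAB>start<TAB>value' lines into {chrom: (starts, values)}."""
    table = {}
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, value = line.rstrip("\n").split("\t")
        starts, values = table.setdefault(chrom, ([], []))
        starts.append(int(start))
        values.append(float(value))
    return {c: (np.array(s), np.array(v)) for c, (s, v) in table.items()}

def calibrate_sign(scores, calib_values):
    """Flip the scores if they are negatively correlated with the calibration track."""
    r = np.corrcoef(scores, calib_values)[0, 1]
    return -scores if r < 0 else scores

# The first five rows of the example above, written with explicit tabs.
text = (
    "chr1\t0\t0.0323533\n"
    "chr1\t1000000\t0.033473\n"
    "chr1\t2000000\t0.0275663\n"
    "chr1\t3000000\t0.0193224\n"
    "chr1\t4000000\t0.012299\n"
)
calibration = read_calibration(io.StringIO(text))
starts, values = calibration["chr1"]

# Hypothetical raw scores whose sign happens to be inverted.
raw_scores = np.array([-0.9, -1.0, -0.6, 0.2, 0.8])
calibrated = calibrate_sign(raw_scores, values)
print(calibrated)
```

If your track follows the opposite convention (larger values for B), negate its values before using it as the calibration file.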
We also provide a script called `CpG_density.py` to calculate the CpG density, which requires the genome reference FASTA file (for instance, hg19.fa from the UCSC Genome Browser). To execute the script:
cd higashi
python CpG_density.py [-g FASTA] [-w WINDOW] [-o OUTPUT]
'
required arguments:
-g FASTA The path to the genome reference fasta file.
-w WINDOW The window size; it should be the same as the imputation resolution.
-o OUTPUT The name of the output CpG density file, which will be used in the scAB calling process.
'
The script generates a file `cpg_density.txt` in the `higashi/` folder, which can be used directly as the calibration file (`--calib_file`) in the single cell A/B compartment calling step above.
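For readers who want to see what such a track contains, here is a minimal sketch of a windowed CpG density calculation that emits rows in the calibration file format. The definition used here (number of CG dinucleotides divided by window length) is an assumption for illustration; the bundled `CpG_density.py` may define or normalize the density differently, and `cpg_density` and `format_rows` are hypothetical names.

```python
def cpg_density(sequence, window):
    """Return (start, density) per window, with density = CG count / window length."""
    seq = sequence.upper()
    rows = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        # "CG" cannot overlap itself, so str.count gives the dinucleotide count.
        # A CG pair that straddles a window boundary is not counted in this sketch.
        rows.append((start, chunk.count("CG") / len(chunk)))
    return rows

def format_rows(chrom, rows):
    """Format rows as tab-separated 'chrom, start, value' lines."""
    return [f"{chrom}\t{start}\t{density:.6g}" for start, density in rows]

# Toy sequence (assumed data); a real run would read each chromosome from the FASTA file.
toy_rows = cpg_density("acgcgTTTTTCGAAA", window=5)
for line in format_rows("chr1", toy_rows):
    print(line)
```

The window size should match the imputation resolution so that each row lines up with one bin of the contact maps.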
Note: Higashi also supports using a projection matrix calculated from real bulk Hi-C instead of the pooled scHi-C contact maps. To do that, include the `bulk_path` parameter in the configuration file, set to the path of the bulk Hi-C file in .mcool format.
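As a small convenience, the `bulk_path` entry can be added to an existing configuration JSON programmatically. The sketch below assumes only that the configuration is a flat JSON object; the helper name `add_bulk_path`, the placeholder key, and both paths are hypothetical.

```python
import json
import os
import tempfile

def add_bulk_path(config_path, mcool_path):
    """Add (or overwrite) the bulk_path entry in a JSON config and save it."""
    with open(config_path) as handle:
        config = json.load(handle)
    config["bulk_path"] = mcool_path
    with open(config_path, "w") as handle:
        json.dump(config, handle, indent=2)
    return config

# Demonstration on a throwaway config; the existing key and both paths are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    config_path = os.path.join(tmp, "config.json")
    with open(config_path, "w") as handle:
        json.dump({"example_key": "example_value"}, handle)
    updated = add_bulk_path(config_path, "/path/to/bulk.mcool")
    with open(config_path) as handle:
        reloaded = json.load(handle)

print(json.dumps(reloaded, indent=2))
```

All other entries in the configuration are left untouched, so the same file can still be used for the earlier steps.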