Sub-compartment Identifier (SCI)

Authors: Haitham Ashoor, Sheng Li
Contact: [email protected], [email protected]

Description

SCI is a program that identifies sub-compartments from HiC data. It applies graph embedding to the HiC interaction data and then clusters the result with K-means to predict sub-compartments.
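The clustering step can be illustrated with scikit-learn, which is listed among the dependencies. This is a minimal sketch and not SCI's own code: the `embeddings` array is synthetic placeholder data standing in for the output of the graph-embedding step, and the dimensions and cluster count are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)

# Placeholder "embeddings": two well-separated groups of 50 bins in 8 dimensions.
# Real embeddings would come from the graph-embedding step, which is not shown here.
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 8)),
])

# Cluster the embedded bins; each cluster index plays the role of a sub-compartment label.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

print(labels[:5], labels[-5:])
```

The cluster indices returned by K-means are arbitrary, so they carry no biological meaning until they are compared against other annotations.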

Dependencies

Python 2.7

Python Libraries

scikit-learn >= 0.19.0
NumPy >= 1.15
tqdm >= 4.24

C++ libraries

GSL

Installation

$ python setup.py

Input format

SCI accepts a BEDPE-like format with the following fields:

chr1: chromosome name of the first interacting HiC bin
start1: start coordinate of the first interacting HiC bin
end1: end coordinate of the first interacting HiC bin
chr2: chromosome name of the second interacting HiC bin
start2: start coordinate of the second interacting HiC bin
end2: end coordinate of the second interacting HiC bin
HiC count: number of HiC reads for the interacting HiC bins. SCI does not perform HiC normalization; to use normalized HiC data, provide the normalized read count in this field.
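The field layout can be checked with a short parser. This is a minimal sketch and not part of SCI: it assumes the seven fields are whitespace-separated and appear in the order listed, and the names `Interaction` and `parse_interaction` are hypothetical.

```python
from collections import namedtuple

# Hypothetical record type mirroring the seven input fields.
Interaction = namedtuple(
    "Interaction", ["chr1", "start1", "end1", "chr2", "start2", "end2", "count"]
)


def parse_interaction(line):
    """Parse one line of SCI-style input (assumed whitespace-separated)."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    chr1, start1, end1, chr2, start2, end2, count = fields
    # The count is parsed as a float so that normalized values are accepted too.
    return Interaction(
        chr1, int(start1), int(end1), chr2, int(start2), int(end2), float(count)
    )


record = parse_interaction("chr1\t0\t100000\tchr1\t200000\t300000\t12")
print("%s:%d-%d count=%s" % (record.chr1, record.start1, record.end1, record.count))
```

Running such a check on the first few lines of an input file is a cheap way to catch a wrong column order before a long SCI run.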

SCI provides a script, scripts/hic2sci.sh, to convert .hic files into the SCI input format. To convert a .hic file, follow these steps:

Export the path of the installed juicer-tools to the JUICERTOOLS environment variable:

$ export JUICERTOOLS=/path/to/juicer-tools

Then run the hic2sci.sh script to produce SCI-formatted input data:

$ scripts/hic2sci.sh <input .hic file> <output file> <resolution>

Parameters description:

Parameter	Mandatory	Description
-n, --name	Yes	Name of the experiment; used as a prefix for all output files
-r, --resolution	Yes	Resolution at which to predict sub-compartments; the resolution of the provided bins must be greater than or equal to this value
-g, --genome_size	Yes	File containing the chromosome sizes of the target genome
-o, --order	No. Default: 1	Graph order to consider when performing graph embedding. Available options are 1, 2, or both
-s, --samples	No. Default: 25	Number of edges to sample from the graph, in millions
-k, --clusters	No. Default: 2	Number of sub-compartments to predict
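To show how the options combine, the sketch below assembles an SCI command line from these parameters. The helper `build_sci_command` is hypothetical and not part of SCI. The script name `sci.py` and the `-f` input-file flag are taken from the test-run command in this README, since the table does not list them.

```python
def build_sci_command(name, input_file, resolution, genome_size,
                      order="1", samples=25, clusters=2):
    """Return the argument list for an SCI run (hypothetical helper)."""
    if str(order) not in ("1", "2", "both"):
        raise ValueError("order must be 1, 2 or both")
    return [
        "python", "sci.py",
        "-n", name,
        "-f", input_file,      # flag as used in the README test run
        "-r", str(resolution),
        "-g", genome_size,
        "-o", str(order),
        "-s", str(samples),    # number of sampled edges, in millions
        "-k", str(clusters),
    ]


cmd = build_sci_command(
    "test", "Input_sample/SCI_input.txt", 100000,
    "chromosome_sizes/hg19.chrom.sizes",
    order="both", samples=1, clusters=5,
)
print(" ".join(cmd))
```

The returned list can be passed directly to `subprocess.call`, which avoids shell quoting problems with file paths.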

Output

SCI writes sub-compartment annotations in BED format with the following fields:

chr: chromosome of the sub-compartment annotation
start: genomic location where the sub-compartment bin starts
end: genomic location where the sub-compartment bin ends
label: unique sub-compartment label. Bins without a sub-compartment label due to low mappability are labeled NA.
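A quick summary of an annotation file can be obtained by counting bins per label. This is a minimal sketch under assumptions: the four fields are taken to be whitespace-separated, the helper `count_labels` is hypothetical, and the example lines and label values are made up for illustration rather than real SCI output.

```python
from collections import Counter


def count_labels(bed_lines):
    """Count bins per sub-compartment label, including NA (unlabeled) bins."""
    counts = Counter()
    for line in bed_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        chrom, start, end, label = line.split()[:4]
        counts[label] += 1
    return counts


# Made-up example lines; the label values are placeholders.
example = [
    "chr1\t0\t100000\tNA",
    "chr1\t100000\t200000\t1",
    "chr1\t200000\t300000\t2",
    "chr1\t300000\t400000\t1",
]
counts = count_labels(example)
for label in sorted(counts):
    print("%s\t%d" % (label, counts[label]))
```

A large share of NA bins in such a summary points to many low-mappability regions at the chosen resolution.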

Test run

To perform a test run of SCI, follow these steps:

Go to the Input_sample directory:

$ cd Input_sample

Uncompress the sample file:

$ gunzip SCI_input.txt.gz

Go back to the SCI main directory:

$ cd ..

Run SCI using the following command:

$ python sci.py -n test -f Input_sample/SCI_input.txt -r 100000 -g chromosome_sizes/hg19.chrom.sizes -o both -s 1 -k 5
