data conversion from different Hi-C contact matrices bin size #32

Open
LucoLab opened this issue Mar 2, 2018 · 1 comment
@LucoLab

LucoLab commented Mar 2, 2018

bash ./hic2give ./ test.hic giveInteraction.bed 40000

bin size that user wants to extract the data from (please make sure the bin size you entered is contained in the hic file).
I'm trying to convert a .hic file to an interaction matrix. I don't understand the bin size or how to choose one.

The interaction file I finally generated with 40000 seems almost empty, and it contains very few interactions.
It would also be useful to be able to browse from one interaction to the next nearest one, for when you are exploring blind and don't yet know what or where you want to look at.

The project I am trying to visualise says on GEO:

content: Hi-C: tar ball archive of all normalized/corrected Hi-C data matrices binned at 40kb/250kb/1Mb, TAD boundaries at 40kb and genomic compartments at 250kb resolution

@caoxiaoyi03 caoxiaoyi03 added the Documentation Regarding to documentation and usage label Mar 2, 2018
@frankyan
Member

The bin size is a required parameter for Hi-C data processing. It determines the resolution of the genomic interactions derived from the Hi-C data. When you use hic2give to convert a Hi-C interaction file to the GIVE interaction format, you must set the bin size to one that was actually used when the Hi-C data were processed, that is, a bin size that is contained in the .hic file.
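
If you are not sure which bin size a binned matrix was built with, one rough check is to look at the bin start coordinates. The sketch below is only an illustration and is not part of hic2give: `infer_bin_size` is a hypothetical helper, and it assumes you have already read the bin start positions (in bp) from your matrix file into a list. It takes the greatest common divisor of the gaps between consecutive starts. On very sparse data the result can be a multiple of the true bin size, so treat it as a hint.

```python
from functools import reduce
from math import gcd


def infer_bin_size(bin_starts):
    """Guess the bin size (bp) from bin start coordinates.

    Hypothetical helper: returns the GCD of the gaps between sorted,
    distinct bin starts. With sparse data this may be a multiple of
    the real bin size.
    """
    starts = sorted(set(bin_starts))
    if len(starts) < 2:
        raise ValueError("need at least two distinct bin starts")
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return reduce(gcd, gaps)


# Made-up bin starts, as they might appear in a 40 kb matrix with one empty bin.
example_starts = [0, 40000, 80000, 160000, 200000]
print(infer_bin_size(example_starts))
```

If the value you get is, for example, 250000 while you passed 40000 on the command line, the mismatch would be one possible reason for a nearly empty output.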

You can read some papers about Hi-C bin size, such as "Hi-C: A comprehensive technique to capture the conformation of genomes". Section 3.3 says:

it is difficult to generate a Hi-C library with enough complexity or sequence depth to cover all possible restriction fragment interactions. In order to gain statistical power, it is useful to pool numbers of reads within larger genomic regions before further analyzing the data. Larger bins will contain more reads and thus have more discriminatory power, but at the cost of lowering the resolution of the data. The optimal bin size, and therefore the resolution at which the interaction data can be analyzed, depends on the sequencing depth and the linear separation of the genomic regions under consideration.
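
To make the trade-off in that passage concrete, here is a minimal sketch of pooling read pairs into bins. It is a toy illustration, not how hic2give or any Hi-C pipeline is implemented: `bin_contacts` is a hypothetical function, the read-pair positions are made up, and everything is assumed to lie on a single chromosome.

```python
from collections import Counter


def bin_contacts(pairs, bin_size):
    """Count read pairs per (bin, bin) cell for a given bin size in bp.

    Hypothetical helper: each position is mapped to a bin index by
    integer division, and the two indices are ordered so that (a, b)
    and (b, a) fall in the same cell.
    """
    counts = Counter()
    for pos1, pos2 in pairs:
        b1 = pos1 // bin_size
        b2 = pos2 // bin_size
        counts[(min(b1, b2), max(b1, b2))] += 1
    return counts


# Made-up read-pair positions (bp) on one chromosome.
pairs = [
    (10_000, 950_000),
    (35_000, 990_000),
    (120_000, 610_000),
    (45_000, 905_000),
    (130_000, 640_000),
]

for bin_size in (40_000, 1_000_000):
    counts = bin_contacts(pairs, bin_size)
    print(bin_size, "cells:", len(counts), "max reads per cell:", max(counts.values()))
```

With these made-up pairs, every read lands in its own cell at 40 kb, while at 1 Mb all of them pool into a single cell. Small bins give finer resolution but few reads per cell, which is why a low-depth dataset can look almost empty at 40 kb.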
