Skip to content

Latest commit

 

History

History
executable file
·
59 lines (49 loc) · 1.56 KB

HAP1_100k.md

File metadata and controls

executable file
·
59 lines (49 loc) · 1.56 KB

chromflock at 100kb on HAP1 using GPSeq

This can be done on a system with 32 GB but will kill a system with 16 GB of ram (or use a lot of swapping).

Get Hi-C data

# 5,032,452,195 bytes
wget https://data.4dnucleome.org/files-processed/4DNFI1E6NJQJ/@@download/4DNFI1E6NJQJ.hic

Convert Hi-C matrix to raw matrices

This step requires that straw is installed and can be found in the PATH.

# A 30877x30877 matrix
~/code/chromflock_dev/util/python/chromflock_hic2H.py 4DNFI1E6NJQJ.hic 100000

Translocate Hi-C data

~/code/chromflock_dev/util/python/chromflock_translocate.py  4DNFI1E6NJQJ.hic.H.double 4DNFI1E6NJQJ.hic.L0.uint8 100000

Convert Hi-C (cc) data to contact probability matrix (cpm)

ChrY is also removed at this stage.

mkdir 1000
cd 1000
cc2cpm --hfile ../4DNFI1E6NJQJ.hic.H_trans.double --lfile ../4DNFI1E6NJQJ.hic.H_trans.L.uint8 --nCont 8 --nStruct 1000

Get GPSeq data (matlab)

L = read_raw('8.000000_1000_MAX.L.uint8', 'uint8');
L = L(L<24); % Excluded Y if not already excluded

G = loadGPSeq(L, ...
    '/mnt/bicroserver2/projects/GPSeq/centrality_by_seq/SeqNoGroup/B170_transCorrected/all/B170_transCorrected.asB165.rescaled.bins.size100000.step100000.csm3.rmOutliers_chi2.rmAllOutliers.tsv', ...
    100000);

GN = log2(G);
GN(GN>1) = 1;
GN(GN<0) = 0;
% GG: That's GPSeq score, ergo "centrality". I.e., greater at the center.
GN = 1-GN;

figure, plot(GN)
write_raw(GN, 'G.double', 'double');

Run chromflock

chromflock
vim chromflock_gen
# Add gpseq force
vim chromflock.lua
./chromflock_gen
screen
./chromflock_run