Formatting data files for JBrowse
Kim Rutherford edited this page Aug 19, 2022 · 26 revisions
All features in data files need to use these chromosome IDs:
- I
- II
- III
- chr_II_telomeric_gap
- mitochondrial
- mating_type_region
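A quick way to catch wrong chromosome IDs before processing is to scan column 1 of a tab-separated feature file (BED, bedGraph and similar) for any value outside the list above. The function below is a minimal sketch; the name `check_chrom_ids` is hypothetical, and it assumes column 1 holds the chromosome ID and that header lines start with `#`, `track` or `browser`.

```shell
# check_chrom_ids (hypothetical helper): print each chromosome ID in column 1
# of a tab-separated file that is not in the allowed list, once per ID.
# Exits 0 if every ID is allowed, 1 otherwise.
# Lines starting with "#", "track" or "browser", and empty lines, are skipped.
check_chrom_ids() {
  awk -F'\t' '
    BEGIN {
      n = split("I II III chr_II_telomeric_gap mitochondrial mating_type_region", ids, " ")
      for (i = 1; i <= n; i++) allowed[ids[i]] = 1
    }
    /^#/ || /^track/ || /^browser/ || NF == 0 { next }
    !($1 in allowed) && !($1 in seen) { seen[$1] = 1; print $1; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Example:
# check_chrom_ids BranchPoint_v2.bed || echo "fix chromosome IDs before processing"
```

Each unexpected ID is printed only once, so a file using `chr1` throughout produces a single line of output rather than one per feature.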
Commands like samtools and bgzip are installed on oliver1. If something is missing, please let me (kmr) know.
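To see which of the tools used on this page are on your PATH before starting, a small check like the following may help. It is a sketch only; the function name `missing_tools` is hypothetical. Note that the coverage example calls bedGraphToBigWig by its full path under /usr/local/bin/ucsc-utils/, so that tool may not be on the PATH even when it is installed.

```shell
# missing_tools (hypothetical helper): print "missing: NAME" for each command
# given as an argument that cannot be found on the PATH.
# Returns 0 if all commands were found, 1 otherwise.
missing_tools() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Example, using commands from the examples on this page:
# missing_tools samtools bgzip tabix bedtools wigToBigWig
```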
On oliver1:
- main directory: /data/pombase/external_datasets/
- original files, one sub-directory per dataset: /data/pombase/external_datasets/originals
- processed files, with correct chromosome IDs, sorted and indexed: /data/pombase/external_datasets/processed/
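Original files often name chromosomes differently from PomBase (for example `chr1` or `chrM`). One way to fix the IDs while copying a file from originals to processed is to rewrite column 1 from a small mapping file. This is a minimal sketch: the function name `rename_chroms`, the mapping file `chrom_map.tsv`, its left-hand IDs and the `DATASET` directory are all hypothetical and need adjusting for each dataset.

```shell
# rename_chroms (hypothetical helper): rewrite column 1 of a tab-separated
# data file using a two-column mapping file (old_id<TAB>new_id) and write
# the result to standard output. The mapping file must not be empty.
# Lines starting with "#", "track" or "browser" are copied unchanged, and
# IDs that are not in the mapping file are left as they are.
rename_chroms() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { map[$1] = $2; next }    # first file: load the mapping
    /^#/ || /^track/ || /^browser/ { print; next }
    ($1 in map) { $1 = map[$1] }
    { print }
  ' "$1" "$2"
}

# Example, run from /data/pombase/external_datasets/ (names are hypothetical):
# printf 'chr1\tI\nchr2\tII\nchr3\tIII\nchrM\tmitochondrial\n' > chrom_map.tsv
# rename_chroms chrom_map.tsv originals/DATASET/data.bed > processed/DATASET/data.bed
```

Because unmapped IDs pass through unchanged, it is worth checking the output against the list of allowed chromosome IDs before sorting and indexing.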
The processed directory is copied to the Babraham server as part of the nightly load:
- using rsync
- the files are in /home/ftp/pombase/external_datasets/ on the Babraham server
BED format files need to be sorted, compressed with bgzip and then indexed with tabix for use by JBrowse. Example:
sort -k1,1 -k2,2n -k3,3n BranchPoint_v2.bed > BranchPoint_v2.sorted.bed
bgzip BranchPoint_v2.sorted.bed
tabix -p bed BranchPoint_v2.sorted.bed.gz
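tabix relies on the records for each chromosome being contiguous and ordered by start position. If a file has been edited or concatenated by hand, a check like the one below can be run on the sorted file before bgzip. It is a sketch under the assumption of a tab-separated BED file; the function name `is_sorted_bed` is hypothetical.

```shell
# is_sorted_bed (hypothetical helper): exit 0 if, in a tab-separated BED
# file, each chromosome's records form one contiguous block and the start
# positions (column 2) never decrease within a block. Otherwise print the
# first problem found and exit 1.
is_sorted_bed() {
  awk -F'\t' '
    /^#/ || /^track/ || /^browser/ || NF == 0 { next }
    $1 != prev_chrom {
      if ($1 in done) { print "line " NR ": " $1 " records are not contiguous"; exit 1 }
      if (prev_chrom != "") done[prev_chrom] = 1
      prev_chrom = $1
      prev_start = -1
    }
    $2 + 0 < prev_start { print "line " NR ": start position decreases"; exit 1 }
    { prev_start = $2 + 0 }
  ' "$1"
}

# Example:
# is_sorted_bed BranchPoint_v2.sorted.bed && bgzip BranchPoint_v2.sorted.bed
```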
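Wiggle (.wig) files are converted to bigWig format with wigToBigWig. For example, to convert every .wig file under the current directory:
find . -name '*.wig' | while IFS= read -r i
do
  BW="${i%.wig}.bw"
  echo "$i"
  wigToBigWig "$i" ~/pombe/chromosome.sizes "$BW"
done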
Here ~/pombe/chromosome.sizes lists each chromosome ID with its length in base pairs:
chr_II_telomeric_gap 20000
mitochondrial 19431
mating_type_region 20128
III 2452883
II 4539804
I 5579133
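If a FASTA file of the genome is available, the sizes file can be generated rather than typed. With samtools, `samtools faidx` writes a `.fai` index whose first two columns are the sequence name and its length, so `cut -f1,2` on that index gives the same two columns. The sketch below does the same with awk alone. The function name `fasta_sizes` and the file name `pombe.fa` are hypothetical, and the FASTA headers must already use the chromosome IDs listed at the top of this page.

```shell
# fasta_sizes (hypothetical helper): print "<sequence ID><TAB><length>" for
# each record in a FASTA file, taking the first word of each ">" header
# line as the ID and ignoring whitespace and carriage returns in sequence lines.
fasta_sizes() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len
      id = substr($1, 2)
      len = 0
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END { if (id != "") print id "\t" len }
  ' "$1"
}

# Example (pombe.fa is a hypothetical file name):
# fasta_sizes pombe.fa > ~/pombe/chromosome.sizes
```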
JBrowse needs sorted, indexed BAM files with correct chromosome IDs. Both the .bam file and its .bam.bai index need to be on the server.
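Since JBrowse needs the .bam.bai next to each .bam, it can help to check a directory before it is copied. The sketch below reports BAM files with no index or with an index older than the BAM. The function name `check_bam_indexes` is hypothetical; the samtools commands in the comments are the usual way to produce a sorted BAM and its index.

```shell
# check_bam_indexes (hypothetical helper): for every *.bam file in a directory
# (default: the current directory), report BAM files that have no .bam.bai
# index next to them, or whose index is older than the BAM file.
# Returns 0 if every BAM file has an up-to-date index, 1 otherwise.
# A sorted BAM and its index can be made with, for example:
#   samtools sort -o sample.sorted.bam sample.bam
#   samtools index sample.sorted.bam    # writes sample.sorted.bam.bai
check_bam_indexes() {
  local dir="${1:-.}" bam rc=0
  for bam in "$dir"/*.bam; do
    [ -e "$bam" ] || continue    # no *.bam files: the glob stays unexpanded
    if [ ! -e "$bam.bai" ]; then
      echo "no index: $bam"
      rc=1
    elif [ "$bam" -nt "$bam.bai" ]; then
      echo "stale index: $bam"
      rc=1
    fi
  done
  return "$rc"
}
```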
To generate bigWig coverage graphs from the BAM files, for display at lower zoom levels:
for i in *.bam
do
  base=$(basename "$i" .bam)
  nice -n 19 bedtools genomecov -ibam "$base.bam" -bg -scale 1.0 > "$base.bedgraph"
  LC_ALL=C sort -k1,1 -k2,2n "$base.bedgraph" > "$base.sorted_bedgraph"
  /usr/local/bin/ucsc-utils/bedGraphToBigWig "$base.sorted_bedgraph" /var/pomcur/sources/pombe-embl/supporting_files/chromosome_sizes.txt "$base.bw"
  rm "$base.sorted_bedgraph" "$base.bedgraph"
done
PomBase is funded by the Wellcome Trust