
Formatting data files for JBrowse

Kim Rutherford edited this page Nov 24, 2021 · 26 revisions

Chromosome IDs

All features in data files need to use these chromosome IDs:

  • I
  • II
  • III
  • chr_II_telomeric_gap
  • mitochondrial
  • mating_type_region
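
As a quick check before processing, the first column of a BED, bedGraph or similar tab-separated file can be compared against the list above. This is only a sketch: the function name `unexpected_chromosome_ids` is made up for illustration, and it assumes the chromosome ID is in the first column and that header lines start with `#`, `track` or `browser`.

```shell
# Allowed chromosome IDs, copied from the list above.
allowed_ids="I II III chr_II_telomeric_gap mitochondrial mating_type_region"

# Print each distinct first-column value that is not an allowed ID.
# (Hypothetical helper, not part of JBrowse or any other tool.)
unexpected_chromosome_ids() {
  awk -v ids="$allowed_ids" '
    BEGIN { n = split(ids, a, " "); for (k = 1; k <= n; k++) ok[a[k]] = 1 }
    /^(#|track|browser)/ { next }                      # skip header lines
    NF > 0 && !($1 in ok) && !seen[$1]++ { print $1 }  # report each bad ID once
  ' "$1"
}
```

Usage: `unexpected_chromosome_ids BranchPoint_v2.bed` prints nothing when every feature uses one of the IDs above.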

Data directories

On oliver1:

  • main directory: /data/pombase/external_datasets/
  • original files, one sub-directory per dataset: /data/pombase/external_datasets/originals
  • processed files, with correct chromosome IDs, sorted and indexed: /data/pombase/external_datasets/processed/

The processed directory is rsynced to:

  • /home/ftp/pombase/external_datasets/ on the Babraham server

BED / TabixBED

sort -k1,1 -k2,2n -k3,3n BranchPoint_v2.bed > BranchPoint_v2.sorted.bed
bgzip BranchPoint_v2.sorted.bed
tabix -p bed BranchPoint_v2.sorted.bed.gz
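
If it is unclear whether an existing BED file is already sorted, `sort -c` with the same keys can check it without rewriting the file. A minimal sketch, assuming the same locale is used for the check as for the original sort; `is_bed_sorted` is a hypothetical helper name.

```shell
# Succeed (exit status 0) only if the file is already ordered by
# chromosome, then start, then end. On failure, sort reports the first
# out-of-order line on stderr.
# (Hypothetical helper name, used here only for illustration.)
is_bed_sorted() {
  sort -c -k1,1 -k2,2n -k3,3n "$1"
}
```

Usage: `is_bed_sorted BranchPoint_v2.bed && echo "already sorted"`.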

WIG

# Convert every .wig file below the current directory to bigWig
find . -name '*.wig' | while read -r i
do
  BW="${i%.wig}.bw"
  echo "$i"
  wigToBigWig "$i" ~/pombe/chromosome.sizes "$BW"
done

Where chromosome.sizes is a two-column file (chromosome ID, length in bases):

chr_II_telomeric_gap    20000
mitochondrial   19431
mating_type_region      20128
III     2452883
II      4539804
I       5579133
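
This page does not say how chromosome.sizes was produced. One possible way to derive such a file from a genome FASTA is sketched below. It assumes the FASTA headers already use the chromosome IDs listed at the top of this page. The function name `fasta_to_chrom_sizes` and the file name `genome.fa` are placeholders.

```shell
# Print "<sequence ID><TAB><length>" for each record in a FASTA file.
# (Hypothetical helper, shown only as a sketch.)
fasta_to_chrom_sizes() {
  awk '
    /^>/ {
      if (id != "") print id "\t" len   # emit the previous record
      id = substr($1, 2)                # ID is the header text up to the first whitespace
      len = 0
      next
    }
    { len += length($0) }               # add up the sequence line lengths
    END { if (id != "") print id "\t" len }
  ' "$1"
}
```

Usage: `fasta_to_chrom_sizes genome.fa > chromosome.sizes`. The output should then be compared with the sizes listed above.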

SAM/BAM

JBrowse needs sorted, indexed BAM files with correct chromosome IDs. Both the .bam and the .bam.bai files need to be on the server.
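
Before copying files to the server, it may help to list any BAM file that has no index next to it. A minimal sketch, assuming the index is named by appending `.bai` to the BAM file name as described above; `bams_missing_index` is a hypothetical helper name.

```shell
# List every .bam file in a directory (default: current directory)
# that has no matching .bam.bai file alongside it.
# (Hypothetical helper, not part of samtools or JBrowse.)
bams_missing_index() {
  dir="${1:-.}"
  for bam in "$dir"/*.bam
  do
    [ -e "$bam" ] || continue          # the glob matched nothing
    [ -e "$bam.bai" ] || echo "$bam"
  done
}
```

A missing index can usually be created with `samtools index file.bam`, which writes `file.bam.bai` and requires the BAM file to be coordinate-sorted first (for example with `samtools sort`).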

For generating bigWig coverage graphs for use at lower zoom levels:

for i in *.bam
do
   base=$(basename "$i" .bam)
   # Coverage as bedGraph, run at low priority
   nice -n 19 bedtools genomecov -ibam "$base.bam" -bg -scale 1.0 > "$base.bedgraph"
   # bedGraphToBigWig needs input sorted by chromosome, then start
   LC_COLLATE=C sort -k1,1 -k2,2n "$base.bedgraph" > "$base.sorted_bedgraph"
   /usr/local/bin/ucsc-utils/bedGraphToBigWig "$base.sorted_bedgraph" /tmp/chromosome_sizes.txt "$base.bw"
done