Formatting data files for JBrowse

Kim Rutherford edited this page Nov 24, 2021 · 26 revisions

Chromosome IDs

All features in data files need to use these chromosome IDs:

  • I
  • II
  • III
  • chr_II_telomeric_gap
  • mitochondrial
  • mating_type_region
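
A quick way to spot files that still use other chromosome names is to tally the first column against the list above. The following is only a sketch: the function name check_chromosome_ids is hypothetical, and it assumes a tab-separated file such as BED with the chromosome ID in column 1.

```shell
# Hypothetical helper: report chromosome IDs in column 1 that are not in the
# accepted list, with a count of the lines using each one.
# Prints nothing when every ID is valid.
check_chromosome_ids() {
  awk -F'\t' '
    BEGIN {
      n = split("I II III chr_II_telomeric_gap mitochondrial mating_type_region", ids, " ")
      for (k = 1; k <= n; k++) ok[ids[k]] = 1
    }
    # Skip blank lines, comments and track/browser header lines
    NF == 0 || /^#/ || /^track/ || /^browser/ { next }
    !($1 in ok) { bad[$1]++ }
    END { for (id in bad) print id "\t" bad[id] }
  ' "$1"
}

# Example (not run here): check_chromosome_ids BranchPoint_v2.bed
```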

Data directories

On oliver1:

  • main directory: /data/pombase/external_datasets/
  • original files, one sub-directory per dataset: /data/pombase/external_datasets/originals
  • processed files, with correct chromosome IDs, sorted and indexed: /data/pombase/external_datasets/processed/

The processed directory is rsynced to:

  • /home/ftp/pombase/external_datasets/ on the Babraham server

BED / TabixBED

sort -k1,1 -k2,2n -k3,3n BranchPoint_v2.bed > BranchPoint_v2.sorted.bed
bgzip BranchPoint_v2.sorted.bed
tabix -p bed BranchPoint_v2.sorted.bed.gz
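
If it is unclear whether a BED file has already been sorted, sort -c can check this with the same keys before compressing. This is a minimal sketch: the function name bed_is_sorted is hypothetical, and the check should run under the same locale as the original sort command, because the ordering of chromosome names depends on the locale.

```shell
# Hypothetical helper: succeed (exit status 0) if the uncompressed BED file is
# already ordered by chromosome, start and end, using the same keys as the
# sort command above. sort -c only checks the order; it writes no output file.
bed_is_sorted() {
  sort -c -k1,1 -k2,2n -k3,3n "$1" 2>/dev/null
}

# Example (not run here):
#   bed_is_sorted BranchPoint_v2.sorted.bed && echo "sorted"
```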

WIG

# Convert every .wig file under the current directory to BigWig
find . -name '*.wig' | while IFS= read -r i
do
  BW="${i%.wig}.bw"
  echo "$i"
  wigToBigWig "$i" ~/pombe/chromosome.sizes "$BW"
done

Where chromosome.sizes is a two-column file giving each chromosome ID and its length in bases:

chr_II_telomeric_gap    20000
mitochondrial   19431
mating_type_region      20128
III     2452883
II      4539804
I       5579133
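
This page does not say how chromosome.sizes was produced. One possible way to regenerate or cross-check it, assuming a genome FASTA file whose header lines use the chromosome IDs listed above, is to sum the sequence line lengths per record. The function name fasta_to_sizes and the file name genome.fa are hypothetical.

```shell
# Hypothetical helper: print "ID<TAB>length" for each record in a FASTA file.
# The ID is the first word of the header line, without the leading ">".
fasta_to_sizes() {
  awk '
    /^>/ {
      if (name != "") print name "\t" len   # emit the previous record
      name = substr($1, 2)
      len = 0
      next
    }
    { len += length($0) }                   # add up sequence line lengths
    END { if (name != "") print name "\t" len }
  ' "$1"
}

# Example (not run here): fasta_to_sizes genome.fa > chromosome.sizes
```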

SAM/BAM

JBrowse needs sorted, indexed BAM files. Both the .bam and the .bam.bai file need to be on the server.
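
This page does not name a tool for sorting and indexing. A common choice is samtools, so the sketch below assumes a recent samtools (1.x) is installed. The function name sort_and_index_bam and the .sorted.bam naming are hypothetical.

```shell
# Hypothetical helper: coordinate-sort a BAM file and build its index.
# For reads.bam this writes reads.sorted.bam and reads.sorted.bam.bai.
sort_and_index_bam() {
  in_bam=$1
  out_bam="${in_bam%.bam}.sorted.bam"
  # Only index if sorting succeeded
  samtools sort -o "$out_bam" "$in_bam" && samtools index "$out_bam"
}

# Example (not run here): sort_and_index_bam reads.bam
```

As noted above, both output files then need to be copied to the server together.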