Properly sort chromosomes in output file #50

wdecoster · 2021-10-14T14:22:57Z

The output file should be sorted by chromosome and then by position. However, sorting by chromosome is hard: chr10 should not come before chr7, and chromosomes may or may not have a 'chr' prefix. Removing the 'chr' and converting to int fails for chrX, chrY, chrMT, and all decoy contigs/alternative haplotypes.

I naively tried key=lambda col: col.astype(str).str.replace('chr', '').astype(int) but it has to be more advanced.
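One possible direction for a more advanced key, as a rough sketch only: strip an optional 'chr' prefix, sort purely numeric names as integers, then X, Y and M/MT, and put everything else (decoys, alternative haplotypes) last in plain string order. The names chrom_sort_key, sort_records and SPECIAL_ORDER are made up for illustration, and the ordering of the non-numeric chromosomes is an assumption, not something settled in this issue.

```python
import re

# Assumed ordering for the non-numeric primary chromosomes (not from the issue).
SPECIAL_ORDER = {"X": 0, "Y": 1, "M": 2, "MT": 2}


def chrom_sort_key(name):
    """Build a tuple key: numeric chromosomes, then X/Y/M/MT, then the rest."""
    # Drop a leading 'chr' only; the rest of the name is left untouched.
    stripped = re.sub(r"^chr", "", str(name))
    if stripped.isdecimal():
        # Numeric chromosomes compare as integers, so 7 sorts before 10.
        return (0, int(stripped), "")
    if stripped in SPECIAL_ORDER:
        return (1, SPECIAL_ORDER[stripped], "")
    # Decoys, alt haplotypes, unplaced contigs: last, in plain string order.
    return (2, 0, stripped)


def sort_records(records):
    """Sort (chromosome, position) pairs by chromosome, then by position."""
    return sorted(records, key=lambda r: (chrom_sort_key(r[0]), r[1]))
```

With pandas, something similar could probably be done by mapping each chromosome to its rank in sorted(unique_chroms, key=chrom_sort_key), storing that in a helper column, and sorting on that column and then on position. This avoids the int conversion that breaks on chrX and friends.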

Alternatively, we could use bcftools... but adding more dependencies is something I would like to avoid. Then again, I already added pandas for combining the files.


this is an unsolved problem :) see also #50

wdecoster added a commit that referenced this issue Oct 14, 2021

remove key for sorting

233f216

