Properly sort chromosomes in output file #50

Open
wdecoster opened this issue Oct 14, 2021 · 0 comments

Comments

@wdecoster
Collaborator

The output file should be sorted by chromosome and then by position. However, sorting by chromosome is hard: chr10 should not come before chr7, and chromosomes may or may not have a 'chr' prefix. Removing the 'chr' and converting to int fails for chrX, chrY, chrMT, and all decoy contigs/alternative haplotypes.

I naively tried key=lambda col: col.astype(str).str.replace('chr', '').astype(int), but it has to be more advanced.
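
One possible direction, as a minimal sketch rather than a settled solution: build a tuple sort key that puts numbered chromosomes first (compared as integers), then X, Y and M/MT, then everything else by name. The function name chrom_sort_key, the chosen ordering of the special chromosomes, and the example contig names are all assumptions for illustration, not something already in the codebase.

```python
import re

# Assumed order for the non-numeric primary chromosomes: X, Y, then M/MT.
_SPECIAL = {"X": 0, "Y": 1, "M": 2, "MT": 2}


def chrom_sort_key(name):
    """Return a tuple that sorts chromosome names in a natural order.

    Group 0: numbered chromosomes, compared as integers (7 before 10).
    Group 1: X, Y, M/MT in the order given by _SPECIAL.
    Group 2: anything else (decoys, alt haplotypes), compared as strings.
    """
    # Strip only a leading 'chr' prefix, not every occurrence of 'chr'.
    core = re.sub(r"^chr", "", str(name), flags=re.IGNORECASE)
    if re.fullmatch(r"[0-9]+", core):
        return (0, int(core), "")
    if core.upper() in _SPECIAL:
        return (1, _SPECIAL[core.upper()], "")
    return (2, 0, core)


# Hypothetical mix of names, with and without the 'chr' prefix.
names = ["chr10", "chrX", "chr7", "chrUn_KI270742v1", "chrMT", "1", "chrY"]
ordered = sorted(names, key=chrom_sort_key)

# Sorting (chromosome, position) records: chromosome key first, then position.
records = [("chr10", 5), ("chr7", 900), ("chrX", 1), ("chr7", 20)]
sorted_records = sorted(records, key=lambda r: (chrom_sort_key(r[0]), r[1]))
```

Because every key is a tuple of the same shape, integers are never compared with strings. With pandas, the same key could presumably be mapped over the chromosome column into a temporary rank column, which is then sorted together with the position column.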

Alternatively, we could use bcftools... but adding more dependencies is something I would like to avoid. Then again, I already added pandas for combining the files.

wdecoster added a commit that referenced this issue Oct 14, 2021
this is an unsolved problem :) see also #50