Benchmark biopython gff, pysam fastx and fasta #35

Open
ambrosejcarr opened this issue Apr 14, 2018 · 2 comments
Comments

@ambrosejcarr
Member

Biopython and pysam have iterators for some of the objects implemented in SC Tools.

See if those tools are more efficient than the ones implemented here, and if so, determine how difficult it would be to extend their tools with single-cell functionality.
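A comparison like this could start from a small timing harness. The sketch below is only an illustration under stated assumptions: `simple_fastq_reader` is a hypothetical stand-in baseline, not the SC Tools reader, and `benchmark` and the synthetic `FASTQ_TEXT` input are made up for this example. It times how long a reader takes to iterate over every record and reports the best of several runs.

```python
import io
import time
from typing import Callable, Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (name, sequence, quality)


def simple_fastq_reader(handle: Iterable[str]) -> Iterator[FastqRecord]:
    """Hypothetical baseline: yield records from well-formed 4-line FASTQ text."""
    lines = iter(handle)
    for header in lines:
        sequence = next(lines).rstrip("\n")
        next(lines)  # skip the '+' separator line
        quality = next(lines).rstrip("\n")
        # Drop the leading '@' from the header to get the read name.
        yield header.rstrip("\n")[1:], sequence, quality


def benchmark(
    reader: Callable[[Iterable[str]], Iterator[FastqRecord]],
    make_handle: Callable[[], Iterable[str]],
    repeats: int = 3,
) -> Tuple[int, float]:
    """Return (record count, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    n_records = 0
    for _ in range(repeats):
        start = time.perf_counter()
        # Exhaust the iterator so the full parsing cost is measured.
        n_records = sum(1 for _ in reader(make_handle()))
        best = min(best, time.perf_counter() - start)
    return n_records, best


# Synthetic in-memory input (an assumption; real benchmarks should use real files).
FASTQ_TEXT = "".join(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n" for i in range(1000))

n_records, seconds = benchmark(simple_fastq_reader, lambda: io.StringIO(FASTQ_TEXT))
print(f"simple_fastq_reader: {n_records} records in {seconds:.6f} s")
```

To compare libraries, the same `benchmark` function could be given a callable that wraps `pysam.FastxFile(path)` or `Bio.SeqIO.parse(path, "fastq")` on an on-disk file, assuming those packages are installed. Real measurements would also need realistic file sizes and compressed inputs.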

@heuermh

heuermh commented May 22, 2018

Please also consider the Python APIs to ADAM, https://pypi.org/project/bdgenomics.adam/.

Since the meeting in April I've been thinking mostly about representations for the matrix service, but if I could help with benchmarking feature and sequence readers and perhaps also implementing some of the tools here or in https://github.com/HumanCellAtlas/fastq_utils in a scalable fashion on Spark+ADAM, please let me know.

@ambrosejcarr
Member Author

ambrosejcarr commented Jul 19, 2018

Great ideas! We'll definitely include these in any benchmarking. I think @mbabadi is considering doing some of this work either this quarter or next, and one of our team's engineers, @dshiga, may also work on getting these tools running a bit faster.

When we start doing this, we'll loop you in and see how best to include you, @heuermh :-)
