Change input/output tables to be handled by pandas #4

Open
MikeDacre opened this issue Mar 22, 2016 · 2 comments

@MikeDacre
Member

Handling the input/output tables with pandas would make the code easier to maintain, and would also make it easier to calculate and plot statistics.

@petercombs petercombs self-assigned this Mar 22, 2016
@petercombs
Contributor

I'm unsure how much maintainability there is to be gained by switching the code to pandas. Part of the problem here is the A|C|G|T format for each SNP, since it violates the implicit pandas assumption that a single column holds a single entity (one number or one string).

See Pull request #9 for an example of how much the code can be cleaned up. Some quick testing suggests that there's no major speed benefit or penalty to switching the output code to pandas.

@MikeDacre
Member Author

Two questions:

  1. Could we change the A|C|G|T format from a single column to 4 columns with 0/1 for present/absent,
    or is that too complex?
  2. Do you think it is worth it? What I like about pandas is that it allows future extensibility: we could
    add code later that could use the data-frame, or we could even output the data-frame with pickle
    to be used programmatically later with a single import.
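
Both ideas can be sketched roughly as follows. This is only an illustration under stated assumptions: the column names (`snp`, `bases`), the sample values, and the file name are hypothetical, and the field is assumed to hold four pipe-delimited integers in A, C, G, T order, which may not match the tool's real output.

```python
import os
import tempfile

import pandas as pd

# Hypothetical table: one row per SNP with a pipe-delimited A|C|G|T field.
# Column names and values are made up for illustration.
df = pd.DataFrame({
    "snp": ["rs1", "rs2"],
    "bases": ["1|0|0|1", "0|1|1|0"],
})

# Split the single field into four integer columns, one per base.
parts = df["bases"].str.split("|", expand=True).astype(int)
parts.columns = ["A", "C", "G", "T"]

# Replace the original field with the four new columns.
wide = pd.concat([df.drop(columns="bases"), parts], axis=1)

# Round-trip the data frame through pickle so it can be reloaded later.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snps.pkl")  # hypothetical file name
    wide.to_pickle(path)
    restored = pd.read_pickle(path)
```

With one column per base, each column holds a single number, so ordinary pandas operations such as `wide[["A", "C", "G", "T"]].sum()` work without custom parsing.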
