Change input/output tables to be handled by pandas #4

MikeDacre · 2016-03-22T22:24:35Z

This would make the code easier to maintain, and would make it simpler to calculate and plot statistics.

petercombs · 2016-03-24T20:04:12Z

I'm unsure how much maintainability there is to be gained by switching the code to pandas. Part of the problem here is the A|C|G|T format for each SNP, since it violates the implicit pandas assumption that each cell in a column holds a single entity (one number or one string).

See Pull request #9 for an example of how much the code can be cleaned up. Some quick testing suggests that there's no major speed benefit or penalty to switching the output code to pandas.
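To illustrate the single-column issue, here is a minimal sketch. It assumes a tab-delimited table in which a hypothetical `counts` column packs four pipe-separated values per SNP, in A|C|G|T order; the column names and values are made up for illustration and are not taken from the project's files.

```python
import io

import pandas as pd

# Hypothetical input: column names and values are illustrative only.
raw = "snp\tcounts\nsnp1\t0|12|0|3\nsnp2\t7|0|0|0\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# The packed field is parsed as text, so pandas does not treat it as numeric.
packed_is_numeric = pd.api.types.is_numeric_dtype(df["counts"])

# Any per-base statistic has to unpack each string by hand first.
per_base_totals = [
    sum(int(cell.split("|")[i]) for cell in df["counts"])
    for i in range(4)  # positions 0..3 are assumed to be A, C, G, T
]
```

With the packed field left as is, pandas mostly acts as a container for strings, which is why the gain in maintainability is not obvious.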

MikeDacre · 2016-03-24T20:57:32Z

Two questions:

1. Could we change the A|C|G|T format from a single column to 4 columns with 0/1 for present/absent, or is that too complex?
2. Do you think it is worth it? What I like about pandas is that it allows future extensibility: we could add code later that could use the data-frame, or we could even output the data-frame with pickle to be used programmatically later with a single import.
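Both ideas can be sketched roughly as follows. This is only an illustration under assumptions: the `acgt` column name, the sample values, and the file name `snps.pkl` are hypothetical, and the packed field is assumed to hold four pipe-separated integers in A|C|G|T order.

```python
import os
import tempfile

import pandas as pd

# Hypothetical table: one row per SNP, values packed as "A|C|G|T".
df = pd.DataFrame({
    "snp": ["snp1", "snp2"],
    "acgt": ["0|12|0|3", "7|0|0|0"],
})

# Split the packed field into four integer columns, one per base.
bases = df["acgt"].str.split("|", expand=True).astype(int)
bases.columns = ["A", "C", "G", "T"]
wide = pd.concat([df[["snp"]], bases], axis=1)

# 0/1 present/absent flags, derived from the split columns.
present = (bases > 0).astype(int)

# Write the data frame with pickle and load it back with a single call.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snps.pkl")  # hypothetical file name
    wide.to_pickle(path)
    restored = pd.read_pickle(path)
```

Once the values live in separate columns, ordinary pandas operations such as `wide[["A", "C", "G", "T"]].sum()` work directly. Note that pickle files are tied to Python and should only be loaded from trusted sources.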

MikeDacre added the enhancement label Mar 22, 2016

petercombs self-assigned this Mar 22, 2016
