I'm unsure how much maintainability there is to be gained by switching the code to pandas. Part of the problem here is the A|C|G|T format for each SNP, since it violates the implicit pandas assumption that a single column holds a single entity (one number or one string).
See Pull request #9 for an example of how much the code can be cleaned up. Some quick testing suggests that there's no major speed benefit or penalty to switching the output code to pandas.
Could we change the A|C|G|T format from a single column to 4 columns with 0/1 for present/absent,
or is that too complex?
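For what it's worth, here is a minimal sketch of what that split might look like. It assumes the packed column lists the bases that are present, separated by `|` (for example `A|G`); the actual A|C|G|T layout in this project may differ, and the `position` and `alleles` column names are made up for illustration.

```python
import pandas as pd

# Hypothetical input: one row per SNP, with the observed bases packed
# into a single "|"-separated string column (assumed layout).
snps = pd.DataFrame(
    {
        "position": [101, 202, 303],
        "alleles": ["A|G", "C", "A|C|T"],
    }
)

# Split the packed column into 0/1 indicator columns, one per base.
# reindex guarantees all four columns exist even if a base never occurs.
indicators = (
    snps["alleles"]
    .str.get_dummies(sep="|")
    .reindex(columns=list("ACGT"), fill_value=0)
)

# Put the indicator columns next to the remaining per-SNP columns.
wide = pd.concat([snps[["position"]], indicators], axis=1)
print(wide)
```

With the four columns in place, each column is a single number again, so per-base operations such as `wide[list("ACGT")].sum()` work without any string parsing.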
Do you think it is worth it? What I like about pandas is that it allows future extensibility: we could
add code later that could use the data-frame, or we could even output the data-frame with pickle
to be used programmatically later with a single import.
This would allow for easier maintainability, as well as calculating and/or plotting statistics.
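As a rough illustration of the pickle idea, the sketch below writes a data-frame to disk and reads it back. The table contents and the `snps.pkl` file name are placeholders, not the project's real output.

```python
import os
import tempfile

import pandas as pd

# Placeholder table standing in for the real per-SNP output.
table = pd.DataFrame(
    {
        "position": [101, 202],
        "A": [1, 0],
        "C": [0, 1],
        "G": [1, 0],
        "T": [0, 0],
    }
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "snps.pkl")  # hypothetical file name
    table.to_pickle(path)                 # what the tool would write
    restored = pd.read_pickle(path)       # what a later script would do

# The restored frame is ready for statistics without re-parsing text.
print(restored[["A", "C", "G", "T"]].sum())
```

One caveat: pickle files are tied to Python and should only be loaded from trusted sources, so a plain-text output would probably still be worth keeping alongside it.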