added per-group processing for inStrain compare #39
+109
−14
I quickly tried changing the code to allow for single-group processing by `inStrain compare`. When group processing times are uneven, such as in this run:...parallel processing of each group separately can save some time (many hours in this example).
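To illustrate the idea only (this is not the code from this PR), here is a minimal Python sketch of dispatching each group as its own job so that one slow group does not hold up the rest. `process_group`, `process_groups_in_parallel`, and the shape of `groups` are hypothetical names and assumptions; the placeholder body stands in for whatever actually runs the per-group comparison.

```python
from concurrent.futures import ThreadPoolExecutor


def process_group(group_name, samples):
    """Hypothetical stand-in for the real per-group work.

    Here it only counts the samples in the group; in practice this is
    where a single group's comparison would be launched.
    """
    return group_name, len(samples)


def process_groups_in_parallel(groups, max_workers=4):
    """Run process_group once per group, concurrently.

    `groups` is assumed to map a group name to its list of samples.
    Returns a dict mapping group name to that group's result.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every group up front; slow groups run alongside fast ones.
        futures = {
            name: pool.submit(process_group, name, samples)
            for name, samples in groups.items()
        }
        # Collect results keyed by group name.
        return {name: fut.result()[1] for name, fut in futures.items()}


if __name__ == "__main__":
    example = {"group1": ["s1", "s2", "s3"], "group2": ["s4"]}
    print(process_groups_in_parallel(example))
```

With uneven groups, total wall time approaches the duration of the slowest group rather than the sum of all groups, provided enough workers are available.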
I couldn't really find any good testing datasets in `./tests/`, so I used my own, but here is the general workflow for parallel processing of groups:...while the basic functionality remains intact:
The output for both approaches is the same, although it appears that your code allows for variable ordering of the output table columns, probably due to using a dict. Using an ordered dict will stabilize the column order.
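One way to pin the column order, sketched under the assumption that each output row is built as a dict: write the table against an explicit column list instead of relying on dict iteration order. The column names and the `write_table` helper below are hypothetical, for illustration only.

```python
import csv
import io

# Hypothetical column names; the real table's columns will differ.
COLUMNS = ["group", "genome", "value"]


def write_table(rows, columns=COLUMNS):
    """Return the rows (an iterable of dicts) as TSV text.

    The header and every row follow `columns`, regardless of the order
    in which keys were inserted into each row dict.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buf.getvalue()


if __name__ == "__main__":
    # Keys deliberately given out of order.
    print(write_table([{"value": 1, "genome": "g1", "group": "a"}]))
```

An explicit column list also keeps the output stable across Python versions, since plain dicts only guarantee insertion order from Python 3.7 onward.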
Sorry for not keeping the code style consistent with yours. Please consider this just an example/guide for how this could be done.