added per-group processing for inStrain compare #39
Hi Nick,
Thanks a ton for doing this. I appreciate the need to run compare over distributed systems, and as soon as I have time I'll dive into the implementation details you've provided here and see what's what. A way that I've done this in my own work is by using the
Thanks again for doing this and following up with the code you've written, much appreciated.
-Matt
I am interested in the --list-groups feature. I also noticed that this step takes a long time. If I need to compare 300 IS files, can I run them separately and then combine the results? Can the --list-groups step be run with inStrain compare?
I quickly tried changing the code to allow for single-group processing by inStrain compare. Although group processing times are uneven, as in this run: ... parallel processing of each group separately can save some time (many hours in this example).
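As a rough illustration of the idea only (not the code in this PR), here is a minimal Python sketch of running groups independently and merging the results. `process_group` is a hypothetical stand-in for a single-group compare run, the contents of a group are treated as an opaque list of items, and the group IDs are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_group(group_id, items):
    """Hypothetical stand-in for one single-group compare run."""
    # A real run would do the expensive comparison here; this only
    # tags each item with the group it came from.
    return [{"group": group_id, "item": item} for item in items]


def compare_groups_in_parallel(groups, max_workers=4):
    """Run every group independently, then merge rows in sorted group order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            group_id: pool.submit(process_group, group_id, items)
            for group_id, items in groups.items()
        }
    # Leaving the `with` block waits for all submitted work to finish.
    combined = []
    for group_id in sorted(futures):
        combined.extend(futures[group_id].result())
    return combined


# Made-up groups of uneven size
groups = {"g2": ["a", "b", "c"], "g1": ["d", "e"]}
rows = compare_groups_in_parallel(groups)

# Same work done one group at a time, for comparison
serial_rows = []
for group_id in sorted(groups):
    serial_rows.extend(process_group(group_id, groups[group_id]))
```

Merging in sorted group order keeps the combined table identical to a serial run no matter which group finishes first. With uneven group times, total wall time is bounded below by the slowest group.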
I couldn't really find any good testing datasets in ./tests/, so I used my own, but here is the general workflow for parallel processing of groups: ... while the basic functionality remains intact:
The output for both approaches is the same, although it appears that your code allows for variable ordering of the output table columns, probably due to using a dict. Using an ordered dict will stabilize the column order.
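A minimal sketch of what I mean by stabilizing the column order, using only the standard library. The column names and values are illustrative assumptions, not taken from the inStrain code. Plain dicts preserve insertion order from Python 3.7 on, but passing an explicit column list to the writer keeps the output stable even when rows are built in different orders.

```python
import csv
import io
from collections import OrderedDict

# Illustrative column names (assumed for this example)
COLUMNS = ["scaffold", "name1", "name2", "popANI"]


def make_row(scaffold, name1, name2, pop_ani):
    """Build one table row with keys inserted in a fixed order."""
    row = OrderedDict()
    row["scaffold"] = scaffold
    row["name1"] = name1
    row["name2"] = name2
    row["popANI"] = pop_ani
    return row


def write_table(rows, columns=COLUMNS):
    """Write rows as TSV; the column order comes from `columns`, not the rows."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# The second row is a plain dict with keys in a different order
table = write_table([
    make_row("s1", "A", "B", 0.98),
    {"popANI": 0.99, "name2": "B", "name1": "A", "scaffold": "s2"},
])
```

Both rows end up under the same header, so outputs from the two approaches can be compared directly.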
Sorry for not keeping the code style consistent with yours. Please just consider this an example/guide on how this could be done.