Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Group mode #34

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

Group mode #34

wants to merge 3 commits into from

Conversation

koenhelwegen
Copy link
Member

Adds a group mode to the structural and functional pipeline. To use the group mode, the user passes a file instead of a subject directory; the file should contain one subject directory per line.

This feature is useful when doing short computations in which starting up the MCR becomes a substantial overhead.

@SiemondeLange
Copy link
Member

Great feature! This option would not only be helpful for reducing MATLAB Runtime (MCR) overhead, but simplify running multiple subjects by eliminating the need for a separate script.

In the current implementation, this PR would introduce a breaking change, as the --subjectDir option of the executable would be deprecated, requiring a new major CATO version (v4) that is not anticipated(?) in the near future. Maybe we can use both --subjectDir and --subjectListFile as input options for the executable? This approach would be similar to the pipeline functions in MATLAB that use structural_pipeline(SUBJECTDIR) and structural_pipeline(SUBJECTLISTFILE). This way, we can integrate these improvements much faster into CATO v3.X.

I would also recommend to try to implement test coverage for this newly introduced feature to ensure its proper functioning.

I will add more detailed comments to the code as well!

fprintf('\n--------strarting group run----------\n')

% read file
fi = fopen(subjectListFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using fopen and fgetl functions, we can maybe use the readcell function to read the subject list file. readcell handles trailing white spaces at the end of lines (which occurs often accidentally) and provides better error handling, which makes it a more reliable option.

Additionally, using readcell allows us to load all subject directories at once and let the pipeline check that they are valid before running the pipeline. This way, the user can get information about the number of valid subjects and catch errors before running the long pipeline.

Maybe, readcell has some drawbacks that I missed, in that case these issues could also be addressed with e.g. deblank.

Small note: The file handle is commonly referred to as fid instead of fi in CATO.


assert(ischar(subjectDir), ...
assert(ischar(subjectSelection), ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update subjectDirNotText to subjectSelectionNotText (here and at the other locations), such that the internal error ID matches the variable.

'CATO:functional_pipeline:subjectDirNotText', ...
'subjectDir must be a row vector of characters or string scalar.');
assert(isdir(subjectDir), ...
assert(isdir(subjectSelection) | isfile(subjectSelection), ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function isfolder is recommended instead of isdir:
https://www.mathworks.com/help/matlab/ref/isdir.html

end

fprintf('\n---------group run finished----------\n')
fprintf('Processed %d subjects, %d failed, %d unrecognized\n\n', ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice that the code keeps track of the unsuccessful subjects! It might also be a good idea to provide a list of the subject names that were unsuccessful as this might help users to identify issues quickly. Maybe this list can be displayed alongside the statistics?

To keep this CATO usable with this feature, it might be helpful to truncate the list to a manageable size (e.g. 100 or 1,000) when the number of subjects is very large.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants