Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rezipping in fmri_data may be unnecessary #60

Open
jcf2 opened this issue Sep 19, 2024 · 2 comments · Fixed by jcf2/CanlabCore#1 · May be fixed by #61
Open

Rezipping in fmri_data may be unnecessary #60

jcf2 opened this issue Sep 19, 2024 · 2 comments · Fixed by jcf2/CanlabCore#1 · May be fixed by #61

Comments

@jcf2
Copy link

jcf2 commented Sep 19, 2024

Currently fmri_data will uncompress .nii.gz inputs automatically using gunzip_image_names_if_gz:

[image_names, was_gzipped] = gunzip_image_names_if_gz(image_names);

This allows it to read .nii files. Depending on MATLAB gunzip and whether it deletes the original, this would result in either two files (.nii and .nii.gz) or just the .nii version. As housekeeping, fmri_data apparently assumes the latter as the worst case and rezips:

    % re-zip images if they were originally zipped
            % add .gz back to file names.
            if any(was_gzipped)
                
                image_names = re_zip_images(image_names, was_gzipped);
                
            end

where re_zip_images is a subfunction of fmri_data that calls MATLAB's gzip. This gzip call overwrites a .nii.gz if it exists, and re_zip_images then removes the .nii:

            gzip(deblank(image_names(i, :)));
            delete(deblank(image_names(i, :)))

This approach is unfortunately extremely costly, as gzip may take around 10x as much time as gunzip, and would seem to be totally unnecessary if the .nii.gz was in fact not deleted by gunzip. Also, it would seem that a replacement of .nii.gz files may invalidate cryptographic hashes calculated on the original.

So, one simple improvement would be to make the "housekeeping" step conditional, and if a .nii.gz exists remove the .nii instead.

In fact, the documentation for MATLAB's gzip (R2024a) says

gunzip does not delete the original GNU zip files.

and this seems to be true as well for past versions, so while OK to have that conditional "just in case" it seems rezip should be 100% avoidable here?

@jcf2
Copy link
Author

jcf2 commented Sep 19, 2024

Note: the original approach was apparently to use the host system's gunzip

% [status, result] = system(['gunzip ' my_image]);

and there deletion of the original .gz would be the default (overridable by --keep). So that explains where the "worst case assumption" above comes from...

@jcf2 jcf2 changed the title Rezipping in fmri_prep may be unnecessary Rezipping in fmri_data may be unnecessary Sep 19, 2024
@jcf2 jcf2 linked a pull request Sep 19, 2024 that will close this issue
@jcf2
Copy link
Author

jcf2 commented Sep 20, 2024

(oops, didn't mean to close this prematurely above... doh)

The above PR is designed to address efficiency related to zip with a minimal change -- just avoid an expensive and unnecessary call to gzip in fmri_data if the target already exists.

A separate issue that could be fixed by a deeper change is that fmri_data will fail on a readable .nii.gz if the directory is not writable (with a an obscure "internal error has occurred" originating from MATLAB gunzip). A fix that covers that as well could be to change the unzip structure:

  1. create a temporary directory: gunzip_temp = tempname;
  2. unzip there instead of in .gzip directory: gunzip(<target>, gunzip_temp);
  3. when the time comes for housecleaning, remove the temporary directory

Though in a way conceptually simpler, this approach would affect gunzip_image_names_if_gz as well as fmri_data (thus possibly affecting code beyond fmri_data), and would require passing the gunzip_temp location. Presumably in fmri_data it would also call for some renaming/restructuring, as re_zip_images would no longer be an accurate name.

@jcf2 jcf2 reopened this Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant