ENH: Add method to overwrite / redo some analysis steps #311

Open
larsoner opened this issue Oct 22, 2020 · 5 comments

Comments

@larsoner
Member

larsoner commented Oct 22, 2020

At least estimation of:

  • _maxbad.txt
  • .pos
  • -annot.h5
  • -counts.h5

These steps are slow so currently they are recomputed only when they are missing. This means the way to say "recompute these" is to delete files from disk (not great). We could add an overwrite/recompute parameter to control which of these to recompute. Or maybe make it so the inputs to the computation function are cached properly (joblib?) so that recomputation is automatic when the relevant parameters change.
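The joblib idea could look roughly like the following minimal sketch. Note that `estimate_head_pos` and its parameters are hypothetical stand-ins for the real (slow) estimation steps, not actual mnefun API:

```python
# Minimal sketch (assumed, not mnefun API): cache a slow estimation step
# with joblib so it reruns automatically when the relevant inputs change.
from joblib import Memory

memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def estimate_head_pos(raw_fname, t_window, dist_limit):
    # The real (slow) cHPI fitting would happen here; we just return
    # a placeholder describing the inputs used.
    return {"raw": raw_fname, "t_window": t_window, "dist_limit": dist_limit}

# First call computes and writes to the on-disk cache; an identical call
# loads the cached result; changing any argument triggers recomputation.
result = estimate_head_pos("sub01_raw.fif", t_window=0.2, dist_limit=0.005)
```

With this setup, deleting files by hand becomes unnecessary: stale results are invalidated automatically when the arguments (or the decorated function's source) change.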

@NeuroLaunch
Collaborator

NeuroLaunch commented Feb 18, 2021

I like the idea of setting up a toggle parameter to greenlight potential overwrites, as it can be very annoying to launch a run only to realize that a single change (for me, usually related to HPI processing) will cause an error.

I actually do something similar in my own code:

clear_chpi, clear_annot = True, True
if any((clear_chpi, clear_annot)):
    delete_sssfiles(params, clear_chpi, clear_annot)

where delete_sssfiles() is a small helper function stashed in my scoring script. But this just deletes the files outright, forcing the recomputation.
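For reference, a hedged sketch of what such a deletion helper might look like. The real delete_sssfiles() takes the commenter's params object, so the directory argument and glob patterns here are only illustrative (based on the file suffixes listed at the top of the issue):

```python
# Illustrative sketch, not the commenter's actual helper: delete cached
# cHPI/annotation outputs so the pipeline is forced to regenerate them.
from pathlib import Path

def delete_sssfiles(subj_dir, clear_chpi=False, clear_annot=False):
    """Delete cached outputs matching the toggled categories; return names removed."""
    patterns = []
    if clear_chpi:
        patterns += ["*.pos"]
    if clear_annot:
        patterns += ["*-annot.h5", "*-counts.h5"]
    removed = []
    for pat in patterns:
        for path in Path(subj_dir).glob(pat):
            path.unlink()
            removed.append(path.name)
    return removed
```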

@larsoner
Member Author

I think the annotation is completely dependent on the cHPI fitting, so maybe we can get away with just adding a do_chpi parameter that deletes all of the files listed above except _maxbad.txt (which hopefully we don't change very often) and then re-estimates? And for maxbad I guess we could have do_maxbad=True | False as well.

In both cases, do_whatever=False means "don't do it if it's already there" (and might automatically be run if do_sss=True and the files are not there) and do_whatever=True means "delete whatever is there and run it".
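The proposed do_whatever semantics can be sketched with a small wrapper; the helper and its arguments are hypothetical, not mnefun API:

```python
# Sketch of the proposed semantics: redo=False means "reuse the file if it
# exists, compute it otherwise"; redo=True means "delete and recompute".
import os

def maybe_compute(fname, compute_fn, *, redo=False):
    if redo and os.path.exists(fname):
        os.remove(fname)       # force recomputation of this output
    if not os.path.exists(fname):
        compute_fn(fname)      # the slow step runs only when needed
    return fname
```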

@NeuroLaunch
Collaborator

It may be worth implementing your above idea of recomputing (if the toggle is True) only when one or more of the relevant parameters has changed OR a computed file isn't present. I'm not familiar with joblib, but I imagine a simple file generated during the SSS/cHPI step that holds metadata (or hash codes) describing how the files were created, something the params dictionary could be compared against on later runs.
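One way to sketch that metadata comparison using only the standard library; the parameter names hashed here are illustrative, not actual mnefun params:

```python
# Sketch of the metadata idea: store a hash of the relevant parameters next
# to the outputs and recompute whenever the stored hash no longer matches.
import hashlib
import json
from pathlib import Path

def params_hash(params, keys):
    """Hash only the parameters that affect this step's outputs."""
    relevant = {k: params[k] for k in sorted(keys)}
    payload = json.dumps(relevant, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def needs_recompute(meta_file, params, keys):
    """Return True (and update the metadata file) if outputs are stale."""
    new_hash = params_hash(params, keys)
    path = Path(meta_file)
    if not path.exists() or path.read_text() != new_hash:
        path.write_text(new_hash)  # record how the outputs were produced
        return True
    return False
```

Changing a parameter outside the tracked key set would not trigger recomputation, which is exactly the selectivity described above.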

@NeuroLaunch
Collaborator

It would be more work, but perhaps such a metadata file could be comprehensive across all of the important MNEFun parameters, as well as archivable and human-readable, effectively providing a diary of MNEFun processing for that experiment directory. (I picture a command-line tool that could give a nicely formatted readout of the last run or a prior run.) Just a pipeline dream?

@larsoner
Member Author

It may be worth implementing your above idea of recomputing (if the toggle is True) only when one or more of the relevant parameters has changed OR if a computed file isn't present... Just a pipeline dream?

Yes, in principle this would work with something like joblib caching set up properly, but it's difficult and a lot of work to get right. For now I would just always do it if the toggle is True; it's an intermediate solution, but it's easy to implement and solves a real problem people have now, even if it doesn't do it optimally (e.g., by automatically tracking what needs to be done).
