Where to store the labels? #5
The best practice is of course to never modify input data. This is easy with pipelines that take data in some format and output processed data in some other format (e.g. CZI->Zarr). In that case I would follow the nf-core standard and have a parameter called
However, when your workflow takes a Zarr as input, it seems to me like it should be okay to write back into it (as long as you don't modify existing data sets). This seems like a common usage pattern with NGFF formats: you do some processing and augment the input data set with another data set. And all the tools expect this (Napari, BDV, etc.), so if you write the labels to another container it may not be possible to easily visualize them.
Personally, I would write the blurred image to the same Zarr as another group, since it is just a processed version. This keeps things a little more organized on the file system, and the provenance is more clear:
There could be a pipeline option like
Barring the fact that there's no provision for "discovering" the blurred directory via the Zarr metadata, @krokicki's take matches mine. In the future, I think it would be good if we can support "writing to new subgroups and mildly updating the metadata". A summary for the current state might look like this:
Okay, so it sounds like the blurred version does belong outside as we have currently, and if it's being deleted anyway then keeping these tied together doesn't matter too much. But I also share the feeling that @tischi has about this being a little strange from a data management perspective. Maybe one way to mitigate it would be to make it explicit. Write the labels to a separate zarr by default, and have an option that allows writing it to the original zarr: So by default:
But with the option:
Just brainstorming.
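To make the default/option split concrete, here is a minimal Python sketch. Everything named in it is hypothetical and not taken from the pipeline: the function `labels_destination`, the flag `write_into_input`, the label name `segmentation`, and the `_labels.zarr` suffix. The only convention it relies on is that OME-NGFF keeps label images in a `labels` group inside the image container.

```python
from pathlib import Path


def labels_destination(input_zarr, output_dir, write_into_input=False,
                       label_name="segmentation"):
    """Return the path where a label image would be written.

    All names here are illustrative assumptions, not pipeline parameters.
    """
    input_zarr = Path(input_zarr)
    if write_into_input:
        # Opt-in: augment the input container. OME-NGFF places label
        # images under a "labels" group inside the image group.
        return input_zarr / "labels" / label_name
    # Default: leave the input untouched and write a separate container
    # into the output directory (the suffix is a made-up convention).
    return Path(output_dir) / f"{input_zarr.stem}_labels.zarr"


# By default the input Zarr is not modified:
print(labels_destination("data/image.zarr", "results"))
# With the option, the labels end up next to the image data:
print(labels_destination("data/image.zarr", "results", write_into_input=True))
```

Keeping the decision in one small function would make the in-place write an explicit choice rather than a side effect of the segmentation step.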
@tischi, yes. That's the current setting. And yes, the
but from a FAIR perspective, this makes more sense.
Since
Very interesting discussion. I can see an argument for both approaches. As a user I would find it more convenient to have the individual outputs as individual zarr files:
This would make it easy to test different parameters (e.g. blur-sigmas), since I can just delete the whole directory and I am done with the clean-up. From a FAIR perspective it would make sense to put every processing output as a sub-group. However, that would require some extra work on the spec. A discussion has started in this issue. It seems to me that we would almost need two flags: one on the workflow level indicating whether the results should be separate zarr files, and one on the task level indicating whether the result should be kept after workflow completion. The second flag could be used by a clean-up task to remove intermediate results that are no longer required.
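A rough sketch of how the task-level flag and the clean-up task could fit together, assuming each task writes its result to its own directory. The names `TaskOutput`, `keep`, and `cleanup_intermediates` are made up for illustration and do not exist in the workflow.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TaskOutput:
    """One task result on disk plus its (hypothetical) task-level flag."""
    path: Path
    keep: bool  # True: keep the result after the workflow completes


def cleanup_intermediates(outputs):
    """Delete every output directory not marked to keep; return what was removed."""
    removed = []
    for out in outputs:
        if not out.keep and out.path.exists():
            shutil.rmtree(out.path)  # removes the whole zarr directory
            removed.append(out.path)
    return removed
```

A clean-up task at the end of the workflow could then call, for example, `cleanup_intermediates([TaskOutput(Path("results/blurred.zarr"), keep=False), TaskOutput(Path("results/labels.zarr"), keep=True)])`. This only works this simply when the results are separate zarr directories; removing a sub-group from inside the input container would need more care.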
Right now we are having this situation:
Before the analysis:
After the analysis:
@BioinfoTongLI is this correct? Maybe not entirely, because the blurred_image may go to a different output_dir?!
Since both the labels and the blurred image are a result of the analysis, from a data management point of view, it does feel weird to me that the labels are an in-place modification of the input image while the blurred image is outside.
I admit however that it is very convenient to have the labels with the image for later visual inspection. One may argue that the blurred image is just some non-interesting intermediate. In fact, probably one would not even save it in real life (and we just did that here for practicing)?!
@joshmoore What is your take on this? I think you said only storing labels in a separate container is possible?
@krokicki What would be your preferred best practice here?
@tibuch Your opinion?