Move processing to Denali? #24

aappling-usgs · 2020-07-31T13:45:58Z

From #23:

> UV data munge was causing memory failures in R. My solution was to reduce down to daily mean values in the combine step, so that the raw data is not preserved in the shared cache. I think this approach is ok, since we have a reproducible pipeline + are not using the raw data.

I agree that this is OK, but I also want to document David's suggestion from standup that this pull could be done on one of the USGS clusters. This could potentially solve two problems: (1) with more memory available, the UV data munge would probably (not yet confirmed) stop hitting memory failures, and (2) in theory, though we haven't tried this yet, running the data pull on a cluster would let multiple people access the raw pulled data without going through the shared cache. Given the unknowns around each of these objectives, I'm not pushing hard for this switch, but let's at least keep it on the table.

If we did this, I think we'd do the pulling on a data transfer node (to be good cluster citizens) and then move to a login node and submit a SLURM-allocated job to get the larger memory needed to process the data.
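For reference, the daily-mean reduction quoted from #23 keeps memory bounded because only one running total per day needs to be held, not every raw unit value. A minimal Python sketch of that idea follows (the actual pipeline is in R; this is not its code). The function name `daily_means` and the input format, an iterable of (ISO-8601 timestamp, value) pairs for a single site, are hypothetical assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple


def daily_means(records: Iterable[Tuple[str, Optional[float]]]) -> Dict[str, float]:
    """Reduce sub-daily (unit-value) records to daily means.

    Hypothetical input: (ISO-8601 timestamp, value) pairs for one site.
    Only a running (sum, count) per calendar day is kept, so memory grows
    with the number of days rather than the number of raw observations.
    Days are taken in whatever time zone the timestamps are written in.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for timestamp, value in records:
        # Skip missing values so they don't affect the mean.
        if value is None:
            continue
        day = datetime.fromisoformat(timestamp).date().isoformat()
        sums[day] += value
        counts[day] += 1
    # Every day in `sums` has count >= 1, so the division is safe.
    return {day: sums[day] / counts[day] for day in sorted(sums)}


uv = [
    ("2020-07-01T00:00:00", 10.0),
    ("2020-07-01T12:00:00", 20.0),
    ("2020-07-02T06:00:00", 5.0),
]
print(daily_means(uv))  # {'2020-07-01': 15.0, '2020-07-02': 5.0}
```

The saving only materializes if `records` is streamed (for example, a generator reading the raw download in chunks). If the full raw table is loaded first, peak memory is unchanged and the larger cluster memory discussed above would still be needed.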
