Bnb/masked fwp #231

bnb32 · 2024-09-25T15:04:03Z

This is just a few additions to the refactor branch, and I would like to merge before it diverges too much. This adds the option to skip chunks in the forward pass routine if all coordinates covered by that chunk are masked. The mask is provided as an additional variable through file_paths, where the mask is 1 / True if the coordinate should not be included in the forward pass. An example which masks CONUS except for areas around observation locations is shown below.

Mask:

Collected forward pass output:

grantbuster

Minor changes and clarification: does the mask variable only need to have one pixel False in an otherwise large chunk to be included in the fwp? Or does the whole chunk need to be false? Also, does this work for temporal masking or is this spatial only? Is the mask dataset assumed to be 2D?

grantbuster · 2024-11-12T19:52:49Z

sup3r/bias/bias_transforms.py

@@ -396,6 +397,9 @@ def monthly_local_linear_bc(
        effect of extreme values within aggregations over large number of
        pixels.  This value is the standard deviation for the gaussian_filter
        kernel.
+    range_kwargs : dict | None
+        Dictionary of ranges for scalar and adder values. e.g. {'scalar': (0,
+        3), 'adder': (-2, 2)}


generally i don't think we should use kwargs in a public-facing method unless you can point to a subsequent docstring that describes the options. I would think scalar_range and adder_range would be more clear and explicit? what are your thoughts bnb?

Yeah I agree with that. Changed.
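
For illustration, the explicit-argument form suggested above might look roughly like the sketch below. This is not the signature in `sup3r/bias/bias_transforms.py`: the function name `clip_linear_bc` and its body are hypothetical, and only the `scalar_range` / `adder_range` argument names come from this thread.

```python
import numpy as np


def clip_linear_bc(data, scalar, adder, scalar_range=None, adder_range=None):
    """Hypothetical helper: apply ``data * scalar + adder`` after clipping
    the correction factors to optional (min, max) ranges."""
    scalar = np.asarray(scalar, dtype=float)
    adder = np.asarray(adder, dtype=float)
    if scalar_range is not None:
        scalar = np.clip(scalar, *scalar_range)
    if adder_range is not None:
        adder = np.clip(adder, *adder_range)
    return np.asarray(data, dtype=float) * scalar + adder


corrected = clip_linear_bc(
    data=[1.0, 2.0, 3.0],
    scalar=[0.5, 5.0, 1.0],  # 5.0 is clipped to the upper bound 3.0
    adder=[-4.0, 0.0, 1.0],  # -4.0 is clipped to the lower bound -2.0
    scalar_range=(0, 3),
    adder_range=(-2, 2),
)
```

Two separately named ranges can each be documented in the docstring, which is harder to do for the keys of a free-form dictionary.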

grantbuster · 2024-11-12T19:54:10Z

sup3r/bias/bias_calc.py

+            f'bias_{bias_feature}_mean': np.nanmean(bias_data),
+            f'bias_{bias_feature}_std': bias_std,
+            f'base_{base_dset}_mean': np.nanmean(base_data),
+            f'base_{base_dset}_std': np.nanstd(base_data),


This is kind of confusing if the base data is vortex data that is only 0D. Does std just come out to be nan/inf?

Yeah good point. This isn't even used for scalar correction calculations so maybe it shouldn't be written at all?

yeah i would argue it should not be calculated then and the adder should be fixed as zero if you're using the linear function from the bias transforms
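
As a quick check on the question above: with NumPy's default `ddof=0`, the standard deviation of a single finite value comes out as `0.0` rather than nan or inf (nan would arise for all-NaN input or with `ddof=1`). The sketch below also shows a scalar-only correction with the adder fixed at zero, as suggested. The ratio-of-means formula and the variable names are assumptions for illustration, not the code in `bias_calc.py`.

```python
import numpy as np

# A single reference value, standing in for base data with no spread.
base_data = np.array([7.5])
bias_data = np.array([4.0, 5.0, 6.0, np.nan])

# Population standard deviation (ddof=0 by default) of one value.
base_std = np.nanstd(base_data)

# Assumed scalar-only correction: match the means, keep the adder at zero.
scalar = np.nanmean(base_data) / np.nanmean(bias_data)
adder = 0.0
corrected = bias_data * scalar + adder
```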

grantbuster · 2024-11-12T19:56:01Z

sup3r/bias/utilities.py

@@ -18,7 +18,7 @@
 logger = logging.getLogger(__name__)


-def lin_bc(handler, bc_files, threshold=0.1):
+def lin_bc(handler, bc_files, reference_feature=None, threshold=0.1):


I think we've been using "bias_feature" elsewhere. you might consider changing for uniformity but i agree reference feature is maybe a better name. just won't match the bias calculation work and could cause confusion.

I did this to match reference_feature used in qdm_bc but you're right these both should match what is used in bias_calc.py

grantbuster · 2024-11-12T19:59:48Z

sup3r/pipeline/strategy.py

-        masked."""
+    def unmasked_chunks(self):
+        """List of chunk indices that are not masked from the input spatial
+        region."""


Can you be more explicit in what is masked vs. unmasked? You could mask included or excluded points. I feel like the word "mask" does not intuitively convey included or excluded. I actually would have defined things the other way around but that's okay.

Yeah I was thinking of masked as synonymous with filtered but I suppose it could go either way. Added some more info here and elsewhere.

grantbuster · 2024-11-12T20:47:08Z

Minor changes and clarification: does the mask variable only need to have one pixel False in an otherwise large chunk to be included in the fwp? Or does the whole chunk need to be false? Also, does this work for temporal masking or is this spatial only? Is the mask dataset assumed to be 2D?

@bnb32 can you clarify these general comments too?

bnb32 · 2024-11-12T21:00:22Z

Oh yeah, no prob. Just added some info to ForwardPassStrategy doc string. Mask is spatial only and a single unmasked point will send the chunk through the generator.
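
The rule described here (2D spatial mask, True means excluded, and a chunk is skipped only when every point in it is masked) can be sketched as below. This is a minimal illustration and not the `ForwardPassStrategy` implementation; the function name `unmasked_chunk_indices` and the `chunk_slices` layout are hypothetical.

```python
import numpy as np


def unmasked_chunk_indices(mask, chunk_slices):
    """Return indices of chunks with at least one unmasked (False) point.

    mask : 2D boolean array, True where a coordinate is excluded.
    chunk_slices : list of (row_slice, col_slice) tuples, one per chunk.
    """
    mask = np.asarray(mask, dtype=bool)
    return [
        i
        for i, (rows, cols) in enumerate(chunk_slices)
        if not mask[rows, cols].all()
    ]


# 4 x 4 domain, fully masked except for one point in the lower-right chunk.
mask = np.ones((4, 4), dtype=bool)
mask[3, 3] = False

chunk_slices = [
    (slice(0, 2), slice(0, 2)),
    (slice(0, 2), slice(2, 4)),
    (slice(2, 4), slice(0, 2)),
    (slice(2, 4), slice(2, 4)),
]
kept = unmasked_chunk_indices(mask, chunk_slices)
```

Only the last chunk is kept in this example, because a single unmasked point is enough to send a chunk through the generator.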

bnb32 force-pushed the bnb/masked_fwp branch 2 times, most recently from c50f16e to 606eeb7 October 25, 2024 10:48

bnb32 changed the base branch from main to bnb/dh_refactor October 25, 2024 15:24

bnb32 force-pushed the bnb/masked_fwp branch from 606eeb7 to d031cab October 25, 2024 16:20

bnb32 marked this pull request as ready for review October 25, 2024 16:23

bnb32 requested a review from grantbuster October 25, 2024 16:23

Base automatically changed from bnb/dh_refactor to main November 5, 2024 20:40

bnb32 added 23 commits November 5, 2024 15:14

added try and except for era downloader. occasional falures from cdsa…

f38d01c

…pi should not interrupt a multi download run

added spatial fwp mask used to skip forward_pass runs on chunks with …

d2db0d8

…all points masked. mask is given as variable in same file paths used for fwp input. adjusted collection to make sure all unique spatiotemporal regions are included in time index and meta.

time shift added to solar module, so that daily gcm data can be shift…

3b6218d

…ed to start at the zeroth hour, if it has not been shifted already.

era downloader changes - making single variable yearly files directly…

a52a9c5

… instead of multi variable monthly files

bug in era_downloader, needed to combine files differently when diffe…

3c2d8d0

…rent time indices, like for monthly files. also added catch in cacher for weird dimensions that we shouldnt be writing to h5

using Cacher classes in era downloader intead of to_netcdf

0ae1f02

era downloader test fixes

edf606f

better node distribution for solar module

80b431d

dont want time slice in exo kwargs

e407ebf

tmp download file before moving to non tmp file upon successful downl…

3294cc3

…oad, for era downloader.

trim exo_handler_kwargs for features in model.lr_features or model.hr…

6daf57a

…_exo_features

added additional check on downloaded era files. these sometimes get c…

900192f

…orrupted and need to be redownloaded

surface models dont have topo as part of lr or hr exo features so the…

9e2102d

…se must be determined from the config

compute node_chunks without masked chunks

b4823b9

need slices from preflight to compute masked nodes for head node

000ec2b

removed auto chunk since when the time index is object dtype chunking…

39091bb

… fails

a little nc writer clean up

cd0c8cb

fixed pipeline with mask test

527bc7e

added height interpolation for the case of just single level data. e.…

901c100

…g. u_30m from u_10m and u_100m, with u pressure level array

flatten method added to Sup3rX and monthly averaged product types add…

8d8cc6b

…ed to era downloader

add coordinate meta dataframe as dataset to h5 cacher

25f5dfa

mistaken commit with booster import

5e82479

duplicate function in era_downloader

45e52d3

bnb32 added 9 commits November 5, 2024 15:24

round in log output

d72b1a0

removed duplicate coords

513a0a3

tests for caching during fwp. modification to fwp_mask to use Loader

6a6f235

Need to use Rasterizer instead of Loader for fwp mask check.

4ea7b83

verbose logging for netcdf caching in chunks. safe casting for featur…

a6d1110

…e height. Annua scalar correction bias correction method. removed random string from exo cache naming.

fixes: MonthlyScalarCorrection, height interp for height = 0

b8075a7

cdsapi url update. cachers additions with dataset specific attributes

15588b4

removed duplicate enum method

2997a4e

Addition to h5 loader for 2d spatial only data.

9d6363c

bnb32 force-pushed the bnb/masked_fwp branch 2 times, most recently from e4b5c21 to 07877a3 November 8, 2024 17:22

exogenous data caching done on head node after checking for cache fil…

298e551

…es. each node uses this data so it doesn't make sense to have each node try to cache this data.

bnb32 force-pushed the bnb/masked_fwp branch from 07877a3 to 298e551 November 8, 2024 18:27

bnb32 added 8 commits November 8, 2024 17:06

Using preprocessing function in call to xr.open_mfdataset

ece79e7

mfdataset preprocessing: cftime to datetimeindex before casting to int

f3b6b04

Merge branch 'main' into bnb/masked_fwp

e03720f

temporal_coarsening: redundant code removed

de6c088

tests: making sure upper case warnings are raised, not just random wa…

a181ae8

…rnings.

era_downloader test fix

fd13f49

enabled recursive feature derivation with height interpolation. e.g. …

8ae86d5

…derive ws_20m from u, v, zg, and topography. also some edits to allow interp on time independent variables.

check for rename method before lower_names

7eb2b8e

bnb32 force-pushed the bnb/masked_fwp branch from 978f3a5 to 7eb2b8e November 12, 2024 17:26

grantbuster requested changes Nov 12, 2024

pr changes

f2820e6

removed std calc from scalar correction. added info on fwp masking (m…

3326039

…ask is 2D and chunks with _any_ unmasked points will still be sent through the generator)

grantbuster approved these changes Nov 12, 2024

bnb32 merged commit 6e708cb into main Nov 13, 2024
12 checks passed

bnb32 deleted the bnb/masked_fwp branch November 13, 2024 02:33
