Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bnb/masked fwp #231

Merged
merged 43 commits into from
Nov 13, 2024
Merged

Bnb/masked fwp #231

merged 43 commits into from
Nov 13, 2024

Conversation

bnb32
Copy link
Collaborator

@bnb32 bnb32 commented Sep 25, 2024

This is just a few additions to the refactor branch, and would like to merge before it diverges too much. This adds the option to skip chunks in the forward pass routine if all coordinates covered by that chunk are masked. The mask is provided as an additional variable through file_paths, where mask is 1 / True if the coordinate should not be included in the forward pass. An example which masks conus except for areas around observation locations is shown below.

Mask:
conus_val_mask

Collected forward pass output:
conus_fwp_masked

@bnb32 bnb32 force-pushed the bnb/masked_fwp branch 2 times, most recently from c50f16e to 606eeb7 Compare October 25, 2024 10:48
@bnb32 bnb32 changed the base branch from main to bnb/dh_refactor October 25, 2024 15:24
@bnb32 bnb32 marked this pull request as ready for review October 25, 2024 16:23
Base automatically changed from bnb/dh_refactor to main November 5, 2024 20:40
…pi should not interrupt a multi download run
…all points masked. mask is given as variable in same file paths used for fwp input. adjusted collection to make sure all unique spatiotemporal regions are included in time index and meta.
…ed to start at the zeroth hour, if it has not been shifted already.
…rent time indices, like for monthly files. also added catch in cacher for weird dimensions that we shouldnt be writing to h5
…g. u_30m from u_10m and u_100m, with u pressure level array
@bnb32 bnb32 force-pushed the bnb/masked_fwp branch 2 times, most recently from e4b5c21 to 07877a3 Compare November 8, 2024 17:22
…es. each node uses this data so it doesn't make sense to have each node try to cache this data.
Copy link
Member

@grantbuster grantbuster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes and clarification: does the mask variable only need to have one pixel False in an otherwise large chunk to be included in the fwp? Or does the whole chunk need to be false? Also, does this work for temporal masking or is this spatial only? Is the mask dataset assumed to be 2D?

@@ -396,6 +397,9 @@ def monthly_local_linear_bc(
effect of extreme values within aggregations over large number of
pixels. This value is the standard deviation for the gaussian_filter
kernel.
range_kwargs : dict | None
Dictionary of ranges for scalar and adder values. e.g. {'scalar': (0,
3), 'adder': (-2, 2)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally i dont think we should use kwargs in a public-facing method unless you can point to a subsequent docstring that describes the options. I would think scalar_range and adder_range would be more clear and explicit? what are your thoughts bnb?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree with that. Changed.

f'bias_{bias_feature}_mean': np.nanmean(bias_data),
f'bias_{bias_feature}_std': bias_std,
f'base_{base_dset}_mean': np.nanmean(base_data),
f'base_{base_dset}_std': np.nanstd(base_data),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of confusing if the base data is vortex data that is only 0D. Does std just come out to be nan/inf?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah good point. This isn't even used for scalar correction calculations so maybe it shouldn't be written at all?

Copy link
Member

@grantbuster grantbuster Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i would argue it should not be calculated then and the adder should be fixed as zero if you're using the linear function from the bias transforms

@@ -18,7 +18,7 @@
logger = logging.getLogger(__name__)


def lin_bc(handler, bc_files, threshold=0.1):
def lin_bc(handler, bc_files, reference_feature=None, threshold=0.1):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've been using "bias_feature" elsewhere. you might consider changing for uniformity but i agree reference feature is maybe a better name. just won't match the bias calculation work and could cause confusion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to match reference_feature used in qdm_bc but you're right these both should match what is used in bias_calc.py

masked."""
def unmasked_chunks(self):
"""List of chunk indices that are not masked from the input spatial
region."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more explicit in what is masked vs. unmasked? You could mask included or excluded points. I feel like the word "mask" does not intuitively convey included or excluded. I actually would have defined things the other way around but that's okay.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I was thinking of masked as synonymous with filtered but I suppose it could go either way. Added somre more info here and elsewhere.

@grantbuster
Copy link
Member

Minor changes and clarification: does the mask variable only need to have one pixel False in an otherwise large chunk to be included in the fwp? Or does the whole chunk need to be false? Also, does this work for temporal masking or is this spatial only? Is the mask dataset assumed to be 2D?

@bnb32 can you clarify these general comments too?

…ask is 2D and chunks with _any_ unmasked points will still be sent through the generator)
@bnb32
Copy link
Collaborator Author

bnb32 commented Nov 12, 2024

Minor changes and clarification: does the mask variable only need to have one pixel False in an otherwise large chunk to be included in the fwp? Or does the whole chunk need to be false? Also, does this work for temporal masking or is this spatial only? Is the mask dataset assumed to be 2D?

@bnb32 can you clarify these general comments too?

Oh yeah, no prob. Just added some info to ForwardPassStrategy doc string. Mask is spatial only and a single unmasked point will send the chunk through the generator.

@bnb32 bnb32 merged commit 6e708cb into main Nov 13, 2024
12 checks passed
@bnb32 bnb32 deleted the bnb/masked_fwp branch November 13, 2024 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants