Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support subsetting datasets by constraining coordinate values. #494

Open
wants to merge 11 commits into
base: develop
Choose a base branch
from

Conversation

MCazaly
Copy link
Contributor

@MCazaly MCazaly commented Jun 29, 2022

@jpolton I'm hoping to get you to try this out and tell me...
A: Is this way of defining coordinates and shapes something that feels good for you to use?
B: Have I used the correct terminology in my docstrings/method names?

Example code based on what I have been playing with locally:

import coast

fn_config_t_grid = "./config/example_cmems_grid_t.json"
database = coast.Copernicus(username, password, "nrt")
forecast = database.get_product("global-analysis-forecast-phy-001-024")
nemo_t = coast.Gridded(fn_data=forecast, config=fn_config_t_grid)

start = coast.Coordinates2D(50, -10)
end = coast.Coordinates2D(60, 5)
nemo_t.set_constraint(start, end)

ssh = nemo_t.dataset.ssh.isel(t_dim=1).values  # Download subsetted data into a numpy Array

@MCazaly MCazaly requested a review from jpolton June 29, 2022 11:59
@MCazaly MCazaly self-assigned this Jun 29, 2022
@MCazaly MCazaly added enhancement New feature or request help wanted Extra attention is needed labels Jun 29, 2022
@jpolton
Copy link
Collaborator

jpolton commented Jun 29, 2022

import coast
import unit_testing.unit_test_files as files
nemo_t = coast.Gridded(files.fn_nemo_dat, files.fn_nemo_dom, config=files.fn_config_t_grid)
start = coast.Coordinates2D(50, -10)
end = coast.Coordinates2D(60, 5)
nemo_t.set_constraint(start, end)

Produces a problem - which I think is to do with the 2D nature of the latitude and longitudes.

coast/_utils/coordinates.py Show resolved Hide resolved
@dataclass
class Coordinates2D:
"""Represent a point in one-to-two-dimensional space with optional X and Y coordinates."""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coordinates2D is an example of an object where the user needs to know the ordering convention: Coordinates2D(45, 50) is obviously a different place to Coordinates2D(50,45), but which is which.

Convention would have x preceding y, but also that latitude proceeds longitude...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the parameters are named "x" and "y", what would be you preferred way of indicating that?

coast/data/coast.py Outdated Show resolved Hide resolved
coast/data/coast.py Outdated Show resolved Hide resolved
Returns:
The corresponding coordinate array from the provided dataset.
"""
return get_coord(dataset, "x")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, a coordinate longitude can be 2 dimensional. E.g. if you want a box that is 1000 km x 1000 km, then near the north pole it would vary with latitude and longitude.

Or is the terminology for "coordinates" referring to indices in an array, which are 1-dimensional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is the terminology for "coordinates" referring to indices in an array, which are 1-dimensional?

That's right, I was basing the name off xr.Dataset.coords, what's the most correct term to use here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. If I load your copernicus example and inspect the coords I get the following:

import coast

fn_config_t_grid = "./config/example_cmems_grid_t.json"
database = coast.Copernicus(username, password, "nrt")
forecast = database.get_product("global-analysis-forecast-phy-001-024")
nemo_t = coast.Gridded(fn_data=forecast, config=fn_config_t_grid)

start = coast.Coordinates2D(50, -10)
end = coast.Coordinates2D(60, 5)
nemo_t.set_constraint(start, end)

nemo_t.dataset.coords
Out[10]: 
Coordinates:
    longitude  (x_dim) float32 50.0 50.08 50.17 50.25 ... 59.75 59.83 59.92 60.0
    latitude   (y_dim) float32 -10.0 -9.917 -9.833 -9.75 ... 4.833 4.917 5.0
  * z_dim      (z_dim) float32 0.494 1.541 2.646 ... 5.275e+03 5.728e+03
    time       (t_dim) datetime64[ns] 2020-01-01T12:00:00 ... 2022-07-10T12:0...

So the horizontal coords are have names latitude and longitude and their content are floats not index integers.

Indices are generally referred to as j and i, which are a nightmare for grepping. So we are shifting towards j_ind and i_ind notation. j_ind is the index for the nominally northward, or y-direction. i_ind is the index for the nomillay eastward, or x-direction.
So, from the above example:
longitude[i_ind=0] = 50.0

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, from data in the example files:

import coast
import unit_testing.unit_test_files as files
nemo_t = coast.Gridded(files.fn_nemo_dat, files.fn_nemo_dom, config=files.fn_config_t_grid)
#start = coast.Coordinates2D(50, -10)
#end = coast.Coordinates2D(60, 5)
#nemo_t.set_constraint(start, end)

# Set some indices values
j_ind = 0
i_ind = 0

nemo_t.dataset.latitude[j_ind, i_ind].values  # notice the pain in the but `j` before `i` expected ordering.

Gives a floating point latitude, which will vary with both indices.

Out[26]: array(40.066406, dtype=float32)

In summary indices are more basic than coordinates because coordinates are a function of indices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MCazaly MCazaly marked this pull request as ready for review September 14, 2022 12:57
@MCazaly
Copy link
Contributor Author

MCazaly commented Sep 14, 2022

@jpolton There are a couple of knownn limitations to this version:
Firstly, it only supports datasets on a uniform grid.
Secondly, it does not handle updating valid min/max values of coordinates.

Are these features we want to handle now or do we push them back to another ticket?

@jpolton
Copy link
Collaborator

jpolton commented Sep 14, 2022

@jpolton There are a couple of knownn limitations to this version: Firstly, it only supports datasets on a uniform grid. Secondly, it does not handle updating valid min/max values of coordinates.

Are these features we want to handle now or do we push them back to another ticket?

Matt Incase you spot this before the end of the day.
I'm on the train so have limit data access. So tried your methods on example_files data and just on subsetting (without all the wrapping nasties). It fell over. This is an example of "non-uniform grid"? By uniform grid do you mean that the latitude and longitude align with matrix rows and columns?

import coast
import unit_testing.unit_test_files as files
nemo_t = coast.Gridded(files.fn_nemo_dat, files.fn_nemo_dom, config=files.fn_config_t_grid)
start = coast.Coordinates2D(x=-5, y=50)
end = coast.Coordinates2D(x=5, y=55)
tt = nemo_t.constrain(start,end)
Traceback (most recent call last):
  File "/Users/jeff/opt/anaconda3/envs/coast_dev/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3552, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-40-f4b8e25618bf>", line 1, in <cell line: 1>
    tt = nemo_t.constrain(start,end)
  File "/Users/jeff/GitHub/COAsT/coast/data/coast.py", line 537, in constrain
    return constrain(self.dataset, start, end, drop=drop)
  File "/Users/jeff/GitHub/COAsT/coast/data/coast.py", line 685, in constrain
    constrained = constrained.where(create_constraint(start.x, end.x, x_dim(constrained)), drop=drop)
  File "/Users/jeff/GitHub/COAsT/coast/data/coast.py", line 626, in x_dim
    return get_coord(dataset, "x")
  File "/Users/jeff/GitHub/COAsT/coast/data/coast.py", line 614, in get_coord
    return dataset[list(dataset[f"{dim.lower()}_dim"].coords)[0]]
IndexError: list index out of range

@jpolton
Copy link
Collaborator

jpolton commented Sep 14, 2022

Hmm maybe ignore the previous comment. I need to have a proper look at what you've been able to do.

@MCazaly
Copy link
Contributor Author

MCazaly commented Sep 14, 2022

@jpolton Apologies if I've used the term incorrectly (I was basing this in a conversation with @anwiseNOCL) - you're right. The Copernicus dataset I was using in the original example works as I expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants