Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support creating global graphs by introducing haversine distance metric #37

Draft
wants to merge 27 commits into
base: main
Choose a base branch
from

Conversation

leifdenby
Copy link
Member

@leifdenby leifdenby commented Nov 25, 2024

Describe your changes

A student I am working with would like to look at emulating 1D fluid simulations on a periodic 1D domain with neural-lam. I thought an easy way to create a 1D periodic domain was to just create a line along the equator and use haversine distance to do the mesh-generation and grid-to-mesh connections. This PR is a proof-of-concept that builds on #32 to make that possible.

The PR introduces the distance_metric argument (which defaults to euclidean) to all graph generation functions, so that this metric is used when computing distances. By using the haversine distance metric and also not providing a projection argument then we can create a 1D periodic domain that wraps the equator:

billede

The PR also adapts the flat-mesh creation routines to ensure that both the "x" and "y" directions have at least one grid point (this is necessary when the point-to-point span of on direction, here the poleward direction, is zero).

The haversine distance calculation is used within the Ball-tree implementation in scikit-learn which has been added as a dependency.

NB: the hierarchical graph generation doesn't work yet, I think because the graph refinement is applied in both "x" and "y" directions for now. I will try and fix this.

NB no 2: this PR does not add more clever mesh-geometries than the "rectangular" mesh that we already have. I think how this should fit into the existing codebase needs a bit more thought. But I just wanted to show you all what I have done :)

Issue Link

This builds on #32 and the suggestion I gave in there that the generalisation in #32 allows for.

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the documentation to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging)

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug

Checklist for assignee

  • PR is up to date with the base branch
  • the tests pass
  • author has added an entry to the changelog (and designated the change as added, changed or fixed)
  • Once the PR is ready to be merged, squash commits and merge the PR.

@joeloskarsson
Copy link
Contributor

joeloskarsson commented Nov 26, 2024

This is neat :) I did not know that sklearn.neighbors.BallTree had that many different metrics implemented, so that's quite useful!

Something I am thinking about around this is to what extent the "rectangular" layout of mesh nodes makes sense when we use a different metric as distance. In some sense the concept of equally spaced rows and columns of mesh nodes would be different for a different metric. Now this implementation is only concerned with the connections between mesh and grid.
We could very much treat these as two separate issues I think. Just a consideration in how this is presented. Maybe it would make more sense to see the distance metric here as a property of the g2m_connectivity and m2g_connectivity rather than the full graph?

And yes, I understand this is just an example demonstration. Just wanted to put my thoughts here for future reference 😄

@leifdenby
Copy link
Member Author

leifdenby commented Nov 28, 2024

This is neat :) I did not know that sklearn.neighbors.BallTree had that many different metrics implemented, so that's quite useful!

Yes! It's quite a heavy dependency though, so we might want to make it optional... We'll see

Something I am thinking about around this is to what extent the "rectangular" layout of mesh nodes makes sense when we use a different metric as distance. In some sense the concept of equally spaced rows and columns of mesh nodes would be different for a different metric. Now this implementation is only concerned with the connections between mesh and grid.

Yes, that is true. The current "rectangular" layout (I wish I could think of a better name...) implemented would spacing lat/lon values equally apart, but that would then not be equidistantly spaced with for example the haversine metric. Hmm...

I was also thinking that if want to introduce other types of mesh layout than this "rectangular" layout then we might need to change/adapt the arguments a bit. m2m_connectivity="hierarchical" or m2m_connectivity="flat" could still apply with a triangular grid I guess though, so maybe we just need a another argument called mesh_layout with mesh_layout="rectangular" by default, and we could pass kwargsthrough withmesh_layout_kwargs. Or should mesh_connectivitymaybe become part of them2m_connectivity_kwargs`?

Maybe if we do that then we could add a warning/raise an exception if someone tries to use a "rectangular" layout while trying to use the haversine distance for the g2m and m2g connectivity. Does that make sense? I think this would benefit from some examples in a notebook... 😆

We could very much treat these as two separate issues I think. Just a consideration in how this is presented. Maybe it would make more sense to see the distance metric here as a property of the g2m_connectivity and m2g_connectivity rather than the full graph?

Yes unless it (the distance metric) starts playing a role in mesh layout too. If you contrast making a triangular mesh layout on a plane vs on a sphere, you might want to set the mesh-spacing by applying a different metric in the two cases.

@leifdenby
Copy link
Member Author

Maybe rectilinear would be a better name for the "layout" of the mesh?

@joeloskarsson
Copy link
Contributor

Rectilinear is probably the correct term yes. We wrote that in a paper and then I have stopped using it. I realize that there's hardly anyone that read rectilinear and immediately understands what you mean, whereas just writing rectangular is wrong, but people understand you 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants