Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spike: Investigate client aggregation methods #225

Closed
3 tasks
Spartee opened this issue Feb 27, 2022 · 4 comments
Closed
3 tasks

Spike: Investigate client aggregation methods #225

Spartee opened this issue Feb 27, 2022 · 4 comments
Assignees
Labels
type: design Issues related to architecture and code design

Comments

@Spartee
Copy link
Contributor

Spartee commented Feb 27, 2022

Description

In some cases, a global domain of a simulation or workload is needed for analysis or training. When using the SmartRedis clients in an MPI application, each rank sends it's own data to the Orchestrator (redis/keydb) database. a method, Client.aggregate could be very useful in combining data back into a global field.

Justification

Online analysis is a primary use case here. For example, let's say we are running a weather simulation and would like to view a global snapshot of the domain every 50 timesteps. currently, one must create a helper function to retrieve data from all the ranks that sent data to the database and recombine it into a global field before it can be analyized.

Also, future methods like to_xarray become much more useful if a domain can be aggregated first.

Implementation Strategy

This ticket is meant to provide a place for feedback on this issue. The result of this ticket is a design document that will be shared with the community to collect feedback.

I've been ideating on this however, and I'm thinking something like the following method

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*")

This example would collect all the tensors in the database the start with hello_world_rank_. Essentially this is like glob syntax for specifying which tensors to retrieve.

Open questions:

  1. Does a simple concat of all tensors work for most use cases?
  2. How does the user specify how the domain should be recombined in more complex cases? (unstructured mesh?)
  3. Multiple fields in one dataset/aggregate call?
  4. How can we use the metadata fields of the DataSet object to make this easier for users?

It's important that when we design this method, we keep in mind that the idea is to be able to quickly convert to something like an xarray dataset.

for example:

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*").to_xarray()
dataset.plot() # use xarray methods now.

Acceptance Criteria

  • Create a design document with how this aggregation method will work (discussion should take place here)
  • Post link to the design document here
  • Incorporate feedback and open issue to implement Client.aggregate

Tagging some people who I think would provide good feedback on this issue.
@ashao @rabernat @mellis13 @nbren12

@Spartee Spartee added the type: design Issues related to architecture and code design label Feb 27, 2022
@ashao
Copy link
Collaborator

ashao commented Feb 28, 2022

I've thought about this quite a a bit and it's a problem that I think is best handled on the model simulation side. Most model grids (especially the ones within the climate domain) do not necessarily lend themselves to a general solution and require knowledge of the specific grid topology, e.g. the tri-polar, cubed-sphere, and finite element-like grids. Even if you knew that a grid fell under one of these general types, you still likely need information that is unique to a given implementation. Lastly, I think in most cases it will likely be more performant to do it on the model side since most models which some kind of parallel decomposition already have optimized routines to do so.

One case that might be useful to handle is that of a rectangular topology, which are often used in idealized models and LES. That maps directly onto a normal n-dimensional tensor and we could setup a specific dataset structure to ensure that all the needed metadata exists for a reconstruction.

Perhaps more useful than a client aggregation of the raw model data is a way to generally treat the data as a scattered data and interpolate to a given grid (e.g. scipy.interpolate.griddata). In that case, all we need is the coordinate of every grid point associated with the data. It would dovetail quite nicely with xarray and xesmf and any future support of xarray Datasets in SmartRedis.

@ashao
Copy link
Collaborator

ashao commented Feb 28, 2022

Just to parachute an item of relevance from #228

I agree with @Spartee that it's a good idea to have a base method that implements a good parallel strategy for retrieving the same field(s) that have been distributed.

How does the user specify how the domain should be recombined in more complex cases? (unstructured mesh?)

That question might then be left to the SmartRedis model-specific extension

@nbren12
Copy link

nbren12 commented Feb 28, 2022

Thanks. I 100% agree with the motivation of this issue, but I agree it might be tricky to find a naming scheme "rank_*" that works for all use-cases. Maybe exposing some lower-level redis data structures would leave the door open for folks to choose their own strategy. Is there any fortran redis client supporting the basic commands like GET/SET/LPUSH/LPOP etc?

@mellis13
Copy link
Collaborator

That's a great point and definitely something we are considering. Utilizing something like a redis list to store a collection of keys to be aggregated might be easier than relying on a naming convention. We currently don't have direct support for other redis data types (e.g. LPUSH/LPOP/etc), but that's something we can easily build into existing client functionality or expose to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: design Issues related to architecture and code design
Projects
None yet
Development

No branches or pull requests

4 participants