Spike: Investigate client aggregation methods #225

Spartee · 2022-02-27T23:29:50Z

Description

In some cases, a global domain of a simulation or workload is needed for analysis or training. When using the SmartRedis clients in an MPI application, each rank sends it's own data to the Orchestrator (redis/keydb) database. a method, Client.aggregate could be very useful in combining data back into a global field.

Justification

Online analysis is a primary use case here. For example, let's say we are running a weather simulation and would like to view a global snapshot of the domain every 50 timesteps. currently, one must create a helper function to retrieve data from all the ranks that sent data to the database and recombine it into a global field before it can be analyized.

Also, future methods like to_xarray become much more useful if a domain can be aggregated first.

Implementation Strategy

This ticket is meant to provide a place for feedback on this issue. The result of this ticket is a design document that will be shared with the community to collect feedback.

I've been ideating on this however, and I'm thinking something like the following method

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*")

This example would collect all the tensors in the database the start with hello_world_rank_. Essentially this is like glob syntax for specifying which tensors to retrieve.

Open questions:

Does a simple concat of all tensors work for most use cases?
How does the user specify how the domain should be recombined in more complex cases? (unstructured mesh?)
Multiple fields in one dataset/aggregate call?
How can we use the metadata fields of the DataSet object to make this easier for users?

It's important that when we design this method, we keep in mind that the idea is to be able to quickly convert to something like an xarray dataset.

for example:

from smartredis import Client

client = Client(cluster=True)
dataset = client.aggregate("hello_world_rank_*").to_xarray()
dataset.plot() # use xarray methods now.

Acceptance Criteria

Create a design document with how this aggregation method will work (discussion should take place here)
Post link to the design document here
Incorporate feedback and open issue to implement Client.aggregate

Tagging some people who I think would provide good feedback on this issue.
@ashao @rabernat @mellis13 @nbren12

The text was updated successfully, but these errors were encountered:

ashao · 2022-02-28T01:36:20Z

I've thought about this quite a a bit and it's a problem that I think is best handled on the model simulation side. Most model grids (especially the ones within the climate domain) do not necessarily lend themselves to a general solution and require knowledge of the specific grid topology, e.g. the tri-polar, cubed-sphere, and finite element-like grids. Even if you knew that a grid fell under one of these general types, you still likely need information that is unique to a given implementation. Lastly, I think in most cases it will likely be more performant to do it on the model side since most models which some kind of parallel decomposition already have optimized routines to do so.

One case that might be useful to handle is that of a rectangular topology, which are often used in idealized models and LES. That maps directly onto a normal n-dimensional tensor and we could setup a specific dataset structure to ensure that all the needed metadata exists for a reconstruction.

Perhaps more useful than a client aggregation of the raw model data is a way to generally treat the data as a scattered data and interpolate to a given grid (e.g. scipy.interpolate.griddata). In that case, all we need is the coordinate of every grid point associated with the data. It would dovetail quite nicely with xarray and xesmf and any future support of xarray Datasets in SmartRedis.

ashao · 2022-02-28T16:54:52Z

Just to parachute an item of relevance from #228

I agree with @Spartee that it's a good idea to have a base method that implements a good parallel strategy for retrieving the same field(s) that have been distributed.

How does the user specify how the domain should be recombined in more complex cases? (unstructured mesh?)

That question might then be left to the SmartRedis model-specific extension

nbren12 · 2022-02-28T20:48:50Z

Thanks. I 100% agree with the motivation of this issue, but I agree it might be tricky to find a naming scheme "rank_*" that works for all use-cases. Maybe exposing some lower-level redis data structures would leave the door open for folks to choose their own strategy. Is there any fortran redis client supporting the basic commands like GET/SET/LPUSH/LPOP etc?

mellis13 · 2022-02-28T21:34:08Z

That's a great point and definitely something we are considering. Utilizing something like a redis list to store a collection of keys to be aggregated might be easier than relying on a naming convention. We currently don't have direct support for other redis data types (e.g. LPUSH/LPOP/etc), but that's something we can easily build into existing client functionality or expose to the user.

Spartee added the type: design Issues related to architecture and code design label Feb 27, 2022

Spartee mentioned this issue Feb 27, 2022

Client.DataSet.to_xarray #226

Closed

ashao mentioned this issue Feb 28, 2022

Simulation Model-specific extensions #228

Open

4 tasks

Spartee assigned mellis13 Feb 28, 2022

mellis13 closed this as completed Mar 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spike: Investigate client aggregation methods #225

Spike: Investigate client aggregation methods #225

Spartee commented Feb 27, 2022

ashao commented Feb 28, 2022

ashao commented Feb 28, 2022

nbren12 commented Feb 28, 2022

mellis13 commented Feb 28, 2022

Spike: Investigate client aggregation methods #225

Spike: Investigate client aggregation methods #225

Comments

Spartee commented Feb 27, 2022

Description

Justification

Implementation Strategy

Acceptance Criteria

ashao commented Feb 28, 2022

ashao commented Feb 28, 2022

nbren12 commented Feb 28, 2022

mellis13 commented Feb 28, 2022