
Simulation Model-specific extensions #228

Open · ashao opened this issue Feb 28, 2022 · 2 comments
Labels
type: design Issues related to architecture and code design

Comments

@ashao (Collaborator) commented Feb 28, 2022

Description

For many simulation models with a large user base, we could think about organizing a set of SmartRedis extensions that aggregates the operations that are both necessary and common for moving data into and out of the model. This is a slightly more model-by-model take on the more general solution proposed by @Spartee in #225; i.e., we could leave aggregate as an abstract method intended to be populated by an extension of Client.

This might be a more realistic approach than trying to derive a fully general solution: model developers could implement these methods from their better-informed point of view.
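A rough sketch of the pattern on the Python side (the class names and the aggregate signature here are hypothetical, not part of the current SmartRedis API):

```python
from smartredis import Client


class ModelClient(Client):
    """Base for model-specific SmartRedis extensions (hypothetical)."""

    def aggregate(self, name, prefixes):
        """Collect the objects posted under ``name`` by each producer prefix.

        Intended to be overridden by a model-specific subclass that knows
        how to reassemble the pieces (e.g. onto the model's native grid).
        """
        raise NotImplementedError


class MOM6Client(ModelClient):
    """Hypothetical MOM6 extension."""

    def aggregate(self, name, prefixes):
        # A MOM6 developer would implement the reassembly here, e.g.
        # mapping each rank's tile of ``name`` back into the global grid.
        ...
```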

Justification

Users of a simulation model will need to perform common tasks that may require model-specific solutions.

Implementation Strategy

This is more of a high-level design question whose solution should answer the following:

  • How do we provide clear direction for model developers/power users to naturally extend the functionality of existing SmartRedis objects?
  • How do we functionally organize these extensions? Do they live in their own repo, within this repo, or should they 'live' within the numerical model code/analysis packages?
  • How do we delineate between needs that are common across a wide swath of models (and so belong in the SmartRedis base classes) and needs that belong in model extensions?
  • How do we coordinate testing and development for the widely-used extensions?
@ashao ashao added the type: feature Issues that include feature request or feature idea label Feb 28, 2022
@Spartee Spartee added type: design Issues related to architecture and code design and removed type: feature Issues that include feature request or feature idea labels Feb 28, 2022
@Spartee (Contributor) commented Feb 28, 2022

I like the idea of having model-by-model extensions, but I have a few worries:

  1. How would they be packaged and tested? (similar to two of your points)

How do we functionally organize these extensions? Do they live in their own repo, within this repo, or should they 'live' within the numerical model code/analysis packages?
How do we coordinate testing and development for the widely-used extensions?

Say, for instance, we made an extension for SmartRedis in MOM6 or NEMO: would we have to find some way to add those models to a CI? Would they live in the MOM6 source code? I would like to keep what we can in a place where we have nightly integration and unit tests that don't require building a simulation code.

We have a similar situation with the smartsim-lammps repo, where we implemented a "dump" (I/O) style with SmartRedis. We've been wanting to contribute that back upstream for a while now, but we wanted to make sure we had the ability to test it regularly. One thought was to build it into a container for ease of testing.

I'm already thinking that we could use the extras_require field for some of the less model-specific extensions (i.e. pip install smartredis[xarray] for #224).
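As a sketch, that packaging approach would look something like this in setup.py (the mom6 extra and its dependency list are purely illustrative):

```python
from setuptools import setup

setup(
    name="smartredis",
    # ... core metadata and install_requires elided ...
    extras_require={
        # optional, less model-specific extensions:
        "xarray": ["xarray"],  # enables `pip install smartredis[xarray]` (#224)
        # hypothetical model-specific extra, if it lived in this repo:
        "mom6": ["xarray", "netCDF4"],
    },
)
```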

  2. What languages?

Would there be a need to implement such features in languages other than Python? I was imagining that this feature would live on the Python side, as that's where we expect users to do analysis. Would there need to be helpers in the compiled languages as well?

I tend to think that the work in #225 should be general enough to serve as a base for some of these more "tightly integrated" solutions. The biggest reason is that we can write performant code to pull the data with multiprocessing or possibly even MPI. I don't want to expect users to write performant multiprocessing extensions that use shared memory or locking primitives on shared data structures.

It's my thought that the aggregate method could be a useful starting point where we abstract away some of the performance details.

How do we provide clear direction for model developers/power users to naturally extend the functionality of existing SmartRedis objects

This is another reason I think a base method to build on is important: we can provide documentation and examples for this method, using it to create something simulation/workload-specific.

@ashao (Collaborator, Author) commented Feb 28, 2022

Like say for instance we made an extension for SmartRedis in MOM6 or NEMO, would we have to find some way to add those models to a CI? Would they live in the MOM6 source code? I would like to keep what we can in a place where we have nightly integration and unit tests that don't require building a simulation code.

100% agree that this is a tricky case, since the CI strategy might be different for projects that are distributed across a number of centers. If we held the client extensions within CrayLabs rather than in the model code, we could probably get away with a simple unit test: store the grid for a given model and make sure we can always reconstruct it. Since the definition of the grid is such a fundamental part of most models, it doesn't change THAT often, so we'd just have to keep track of when they do a major refactor.
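A rough pytest-style sketch of that kind of check (the smartredis_mom6 package, the reconstruct_grid helper, the reference file, and the database address are all hypothetical):

```python
import numpy as np
from smartredis import Client

# Hypothetical extension under test: reconstruct_grid would reassemble the
# decomposed grid tensors that a MOM6 run posted to the database.
from smartredis_mom6 import reconstruct_grid  # hypothetical package

REFERENCE_GRID = np.load("tests/data/mom6_grid.npy")  # stored reference grid


def test_grid_reconstruction():
    # assumes an orchestrator/Redis instance is reachable at this address
    client = Client(address="127.0.0.1:6379", cluster=False)
    rebuilt = reconstruct_grid(client, prefixes=[f"rank_{i}" for i in range(4)])
    np.testing.assert_allclose(rebuilt, REFERENCE_GRID)
```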

Would there be a need to implement such features in languages other than Python? I was imagining that this feature would live on the Python side as that's where we expect users to do analysis. Would there need to be helpers in the compiled languages as well?

I definitely think there will be a need for that, but in that case I think the onus should be on the model side, since that will be a more rapidly iterating part of the code. Similar to your argument about performance on the Python/analysis side, model developers are better positioned to make such extensions performant on their end.

It's my thought that the aggregate method could be a useful starting point where we abstract away some of the performance details.

I agree. I think that base method could just return a list/dictionary of all the requested objects that might have been posted for a given subset of clients (be it a decomposition within a single simulation or across an entire ensemble). The model-side dev would be responsible for constructing something useful from that, but I agree that we're best situated to make that gather performant on our side. We already know that the naive approach of looping over all the possible key prefixes is VERY slow, and I think we can exercise some basic functionality to help parallelize the gather.
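For example, a minimal sketch of that base method, assuming we parallelize over producer prefixes with threads (set_data_source and get_tensor are existing SmartRedis Python calls; the function shape, the one-client-per-worker choice, and the address handling are assumptions, and the usual key-prefixing environment is assumed to be configured):

```python
from concurrent.futures import ThreadPoolExecutor

from smartredis import Client


def aggregate(name, prefixes, address="127.0.0.1:6379", max_workers=8):
    """Gather the tensor ``name`` from every producer prefix in parallel.

    Returns {prefix: tensor}; a model-side extension decides how to turn
    that into something useful. One Client per worker avoids sharing
    data-source state across threads.
    """
    def fetch(prefix):
        client = Client(address=address, cluster=False)
        client.set_data_source(prefix)  # read keys posted under this prefix
        return prefix, client.get_tensor(name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(fetch, prefixes))
```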
