-
Hi everyone! I have been working with Charm++ for several weeks, and I'm particularly interested in the shrink/expand mechanism. I work with Ubuntu virtual machines and Charm++ 6.9.0, which I build with the following command: ./build charm++ netlrts-linux-x86_64 --enable-shrinkexpand
My findings so far are that charmrun is always restarted for shrink/expand requests. My question is: are there other options besides restarting? For example, could a request be handled at runtime by migrating objects from one processor to another? Are there plans for further possibilities in the future?
Replies: 1 comment
-
Welcome to the Charm community!
Several other mechanisms have been explored in the past, but the variety of complications they ran into prevented any of them from becoming the currently supported scheme. Simple migration works well for chare array elements, but complications arise for other constructs, such as groups, nodegroups, and singleton chares, and for the interactions between them. Restarting the job from a checkpoint reduces a useful subset of these issues to two tasks: robustly creating a checkpoint, and launching from it with the appropriate environmental changes. That isn't to say there is no interest in alternative solutions, simply that this is the most robust and maintainable solution available to us.
Could you expand on the sorts of use cases that interest you and how alternative approaches to shrink/expand would serve them better?
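For readers following the thread, here is a toy Python model of why restarting from a checkpoint sidesteps the placement problem. It is only a sketch under stated assumptions: none of these names are Charm++ APIs, and the round-robin placement and the `pickle` serialization are stand-ins chosen for illustration, not a description of what the runtime actually does.

```python
import pickle

def place(indices, num_pes):
    """Map element indices to PEs round-robin (an assumed, illustrative policy)."""
    return {i: i % num_pes for i in indices}

def checkpoint(elements):
    """Serialize element state only; the PE placement is deliberately not saved."""
    return pickle.dumps(elements)

def restart(blob, num_pes):
    """Rebuild the elements from the checkpoint and place them on a new PE count."""
    elements = pickle.loads(blob)
    return elements, place(elements.keys(), num_pes)

# Eight hypothetical array elements running on 4 PEs.
elements = {i: {"value": i * i} for i in range(8)}
before = place(elements.keys(), 4)

# "Shrink" to 2 PEs: checkpoint, then relaunch with the new PE count.
blob = checkpoint(elements)
restored, after = restart(blob, 2)
```

Because the placement is recomputed at launch, the same checkpoint can be restarted on fewer or more PEs. The per-PE constructs mentioned above (groups, nodegroups) have no counterpart in this toy model, and they are where a purely migration-based scheme gets harder.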