Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable Fluxion resource graph elasticity #989

Merged
merged 25 commits into from
Aug 22, 2023

Conversation

milroy
Copy link
Member

@milroy milroy commented Dec 1, 2022

This PR enables elasticity in the Fluxion resource graph by creating requisite ctors, dtors, and assignment overloads for planner, and planner_multi.

As discussed in PR #775, attaching a new subgraph to the boost resource graph without preallocating sufficient vertices in the vecS associative Boost container triggers a copy. Since planner did not have a copy constructor to deep copy the red-black trees and related data strctures, the copy cleared the RBtrees and allocations and reservations.

By defining copy constructors for these classes and data structures, subgraph attach and future subgraph shrink will preserve allocations and reservations.

This PR refactors planner and planner_multi from structs to classes. Operations on both classes (e.g., setters and getters) have been pulled into the class as member functions, and access controlled via const member functions where appropriate.

The PR is WIP, as there are still several tasks to complete which may require reviewer feedback:

  • Review the current header (.h and .hpp) structure for compatibility with external C clients.
  • Consider the current header (.h and .hpp) structure for compatibility with external C++ clients.
  • Review the usage of wrapper context structs planner_t and planner_multi_t to avoid errors related to forward declarations.
  • Add more robust error checking where noted in code comments
  • Add testsuite tests for the planner_multi and pool_infra_t ctors and assignment overload for elastic resources.

I've changed the planner interface as well as the build system to accommodate C and C++ clients. I added unit tests for ctors and assignment overload operator for planner_t and planner_multi_t, and created new unit tests for ctors and assignment overload operator for schema (both schedule_t and pool_infra_t).

@milroy milroy added Status: In Progress elasticity support for elastic scheduling labels Dec 1, 2022
@milroy milroy force-pushed the copy-constuctor branch 4 times, most recently from 3bbb7d8 to 8426d27 Compare December 15, 2022 03:36
@milroy milroy changed the title [WIP] Enable Fluxion elasticity [WIP] Enable Fluxion resource graph elasticity Dec 15, 2022
@milroy milroy force-pushed the copy-constuctor branch 8 times, most recently from 08d5577 to fe2efb6 Compare December 20, 2022 02:29
@milroy milroy force-pushed the copy-constuctor branch 4 times, most recently from acd46c1 to 3349b69 Compare January 6, 2023 10:30
@milroy milroy changed the title [WIP] Enable Fluxion resource graph elasticity Enable Fluxion resource graph elasticity Jan 6, 2023
@milroy
Copy link
Member Author

milroy commented Jan 6, 2023

@JaeseungYeom and @tpatki the PR is ready for review.

@milroy milroy force-pushed the copy-constuctor branch 3 times, most recently from 665ae76 to 52ac67f Compare April 1, 2023 01:22
@milroy
Copy link
Member Author

milroy commented Aug 20, 2023

Adding the conditional pragma with the suppression directive works for the Clang and GCC 12 tests but breaks the earlier GCC tests. That's because #pragma GCC diagnostic ignored "-Wmaybe-uninitialized" and -Wmissing-template-keyword are unrecognized directives for earlier versions of GCC (or conflict with -Wno-error=maybe-uninitialized from generate-matrix.py), but seem to be required for later versions (i.e., -Wno-error=maybe-uninitialized" doesn't work for GCC 12).

I added additional checks for (__GNUC__ > 11) or (__GNUC__ <= 11) which allows the CI tests to pass. I'm not entirely sure if the behavior changed with GCC 11, though, and haven't figured out exactly which version caused the change.

@trws
Copy link
Member

trws commented Aug 20, 2023 via email

Problem: testsuite tests will need to compare
pool_infra_t via the `==` operator to check if copying,
destruction, and assignment works as needed.

Add the required member checks.
Problem: the testsuite needs to test
if copying, destroying, and assigning
planners works as desired. Namely,
calling the copy constructor should
deep copy planners, and assignment
should ensure the two planner's states
are independent but the same initially.
Mutation of a copy should not affect the
source object. Destruction should not affect
the source object.

Add low-level checks for planner behavior.
Problem: the testsuite needs to test
if copying, destroying, and assigning
planner_multis works as desired. Namely,
calling the copy constructor should
deep copy planner_multis, and assignment
should ensure the two planner_multis' states
are independent but the same initially.
Mutation of a copy should not affect the
source object. Destruction should not affect
the source object.

Add low-level checks for planner_multis behavior.
Add unit tests to ensure the copy constructor,
assignment overload, and equality overload
function as needed.
Add unit tests to ensure the copy constructor,
assignment overload, and equality overload
for pool_infra_t function as needed.
Add the schema tests to gitignore to ensure undesired
files in the directory are not tracked by git.
Add tests for attaching a new subgraph to an
instantiated resource graph and ensure the
allocations and reservations are not affected.
Test that the match policy selects appropriate
resources from the newly added subgraph.
Problem: schema_test01 and schema_test02 have
misspellings of "constructors" and "assignment."
Fix the spelling.
@milroy
Copy link
Member Author

milroy commented Aug 21, 2023

Thanks for your detailed and helpful feedback @trws. I think I've addressed and implemented your comments and suggestions.

The PR is ready for another review.

@milroy milroy requested a review from trws August 21, 2023 06:05
Problem: with GCC, Yggdrasil and Boost libraries generate
compilation errors related to missing template keywords
and maybe uninitialized values, respectively. Suppression
flags for GCC are unrecognized by Clang and produce
compilation errors.

Add conditional diagnostic pragmas to suppress the errors
for GCC but not Clang.
Copy link
Member

@trws trws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this, thanks @milroy! There's always refactoring I'd love to do, but I don't think it makes sense to do it before we bump the C++ version. Delegated constructors and some other things could help us with code like this in the future.

@milroy
Copy link
Member Author

milroy commented Aug 22, 2023

@trws should I set MWP, or do you think this needs a review from someone else as well?

Maybe @tpatki @JaeseungYeom or @jameshcorbett could spot check a few of the commits.

@trws
Copy link
Member

trws commented Aug 22, 2023 via email

@tpatki
Copy link
Member

tpatki commented Aug 22, 2023

+1 with merging @trws @milroy -- we've reviewed it together, and looked at it a few times (I looked at it in Feb as well). The changes make sense to me and seem correct overall. I haven't had a chance to test it locally at my end, and I will do that soon -- but we don't need to hold on merging this for my testing. We can always open an issue if something comes up when I try it out.

@milroy
Copy link
Member Author

milroy commented Aug 22, 2023

Thanks everyone! Setting MWP.

@milroy milroy added the merge-when-passing mergify.io - merge PR automatically once CI passes label Aug 22, 2023
I missed some mergify requirements when transferring this, or maybe
updated them after.
@trws
Copy link
Member

trws commented Aug 22, 2023

@milroy, I just pushed up a CI config fix. This should let everything merge.

@codecov
Copy link

codecov bot commented Aug 22, 2023

Codecov Report

Merging #989 (e8d3b22) into master (a34dbe6) will increase coverage by 0.8%.
The diff coverage is 76.9%.

❗ Current head e8d3b22 differs from pull request most recent head 3903ac3. Consider uploading reports for the commit 3903ac3 to get more accurate results

@@           Coverage Diff            @@
##           master    #989     +/-   ##
========================================
+ Coverage    72.6%   73.4%   +0.8%     
========================================
  Files          80      98     +18     
  Lines       10243   11888   +1645     
========================================
+ Hits         7437    8732   +1295     
- Misses       2806    3156    +350     
Files Changed Coverage Δ
resource/modules/resource_match.cpp 67.6% <0.0%> (ø)
resource/planner/c++/mintime_resource_tree.hpp 100.0% <ø> (ø)
resource/planner/c++/planner_internal_tree.hpp 100.0% <ø> (ø)
resource/policies/base/matcher.hpp 100.0% <ø> (ø)
resource/readers/resource_reader_grug.cpp 84.4% <ø> (ø)
resource/readers/resource_reader_jgf.cpp 66.1% <ø> (ø)
resource/readers/resource_reader_rv1exec.cpp 73.6% <ø> (ø)
resource/readers/resource_spec_grug.cpp 46.0% <ø> (ø)
resource/schema/ephemeral.hpp 100.0% <ø> (ø)
resource/traversers/dfu_impl.hpp 94.5% <ø> (ø)
... and 11 more

... and 9 files with indirect coverage changes

@trws trws merged commit 041db81 into flux-framework:master Aug 22, 2023
16 of 17 checks passed
trws added a commit to trws/flux-sched that referenced this pull request Aug 22, 2023
Enable Fluxion resource graph elasticity
@milroy milroy deleted the copy-constuctor branch August 22, 2023 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
elasticity support for elastic scheduling merge-when-passing mergify.io - merge PR automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants