Add Ray data virtual cluster test #453

Merged
NKcqx merged 27 commits into main from ray_data_virt_test
Jan 9, 2025

NKcqx
Collaborator

@NKcqx NKcqx commented Jan 9, 2025

Why are these changes needed?

Related issue number

#409

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

NKcqx added 23 commits January 2, 2025 19:55
Signed-off-by: NKcqx <[email protected]>
@NKcqx NKcqx requested review from wumuzi520, Chong-Li and xsuler January 9, 2025 10:04
@NKcqx NKcqx self-assigned this Jan 9, 2025
print(
    f"Driver detected parallelism: {res}, expect[{i}]: {expected_parallelism[i]}"
)
wait_for_condition(
Collaborator

assert ray.get(signal_actor.data.remote()) == expected_parallelism[i]
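
For readers following the thread: the comment above appears to propose a direct assertion where the diff polls with wait_for_condition(. A minimal sketch of the trade-off, assuming the value may be published asynchronously. The wait_for_condition defined here is a local stand-in written for this example (not Ray's helper), and FakeSignalActor is a hypothetical in-process substitute for the PR's signal_actor so the snippet runs without a cluster.

```python
import time


def wait_for_condition(predicate, timeout=2.0, retry_interval_s=0.05):
    """Local stand-in: retry `predicate` until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return
        time.sleep(retry_interval_s)
    raise RuntimeError("condition was not met before the timeout")


class FakeSignalActor:
    """Hypothetical substitute for the PR's signal actor (no Ray involved)."""

    def __init__(self, ready_after_polls, value):
        self._polls_left = ready_after_polls
        self._value = value

    def data(self):
        # Return None until the value has "arrived", then the value itself.
        if self._polls_left > 0:
            self._polls_left -= 1
            return None
        return self._value


expected_parallelism = [4, 8]  # illustrative values, not taken from the PR

# Polling tolerates a value that only becomes visible after a few reads.
late_actor = FakeSignalActor(ready_after_polls=3, value=4)
wait_for_condition(lambda: late_actor.data() == expected_parallelism[0])

# A direct assertion is simpler, but only safe once the value is
# guaranteed to be set (for example, after an explicit barrier).
ready_actor = FakeSignalActor(ready_after_polls=0, value=8)
assert ready_actor.data() == expected_parallelism[1]
```

If the job is already synchronized before the check, the direct assert gives a clearer failure message; otherwise polling avoids a race between the driver and the job.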

Collaborator

@wumuzi520 wumuzi520 left a comment

LGTM

@NKcqx NKcqx merged commit bbaf5ea into main Jan 9, 2025
@NKcqx NKcqx deleted the ray_data_virt_test branch January 9, 2025 14:40
wumuzi520 pushed a commit that referenced this pull request Jan 9, 2025
* ray.nodes fetch only the virtual cluster nodes

Signed-off-by: NKcqx <[email protected]>

* pass UT

Signed-off-by: NKcqx <[email protected]>

* support cluster_resources split virtual_cluster

Signed-off-by: NKcqx <[email protected]>

* available_resources support virtual_cluster

Signed-off-by: NKcqx <[email protected]>

* job only get virtual cluster data

Signed-off-by: NKcqx <[email protected]>

* fix ut

Signed-off-by: NKcqx <[email protected]>

* rm outdated warning

Signed-off-by: NKcqx <[email protected]>

* assert job in certain virtual cluster

Signed-off-by: NKcqx <[email protected]>

* pass UT

Signed-off-by: NKcqx <[email protected]>

* add UT

Signed-off-by: NKcqx <[email protected]>

* lint codes

Signed-off-by: NKcqx <[email protected]>

* add barrier for sync job progress

Signed-off-by: NKcqx <[email protected]>

* more comments

Signed-off-by: NKcqx <[email protected]>

---------

Signed-off-by: NKcqx <[email protected]>

2 participants