Add more observability to Ray Data operator metrics #14

Merged
merged 15 commits into from
Aug 19, 2024

@votrou votrou commented Jul 25, 2024

Adds a few new metrics:

  • In-Task Backpressure: cumulative time that tasks spend idle, waiting to "yield" a block
  • CPU Time: cumulative CPU time consumed by a task
  • Also fixes the memory-related Ray Data metrics
[Screenshot 2024-07-25 at 1 40 39 PM]
[Screenshot 2024-07-25 at 1 40 59 PM]
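
To make the two metric definitions above concrete, here is a minimal sketch in plain Python. It is not the code from this PR and does not use Ray: `TaskTimings` and `timed_blocks` are hypothetical names, and a regular generator stands in for a streaming task. The assumption is that "in-task backpressure" can be approximated as the wall-clock time a generator stays suspended at `yield`, and that "CPU time" can be approximated with `time.process_time()`, which is process-wide rather than per-task.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass
class TaskTimings:
    """Hypothetical container for cumulative per-task timings."""

    yield_wait_s: float = 0.0  # wall-clock time spent suspended at `yield`
    cpu_s: float = 0.0  # CPU time spent producing blocks (process-wide clock)


def timed_blocks(blocks: Iterable[T], timings: TaskTimings) -> Iterator[T]:
    """Yield each block, accumulating production CPU time and yield wait time."""
    it = iter(blocks)
    while True:
        # Measure CPU time used to produce the next block.
        cpu_start = time.process_time()
        try:
            block = next(it)
        except StopIteration:
            timings.cpu_s += time.process_time() - cpu_start
            return
        timings.cpu_s += time.process_time() - cpu_start

        # The generator is suspended here until the consumer asks for the
        # next block, so the elapsed wall time approximates backpressure.
        wait_start = time.perf_counter()
        yield block
        timings.yield_wait_s += time.perf_counter() - wait_start


def run_slow_consumer(delay_s: float = 0.02) -> TaskTimings:
    """Drive the generator with a consumer that is slower than the producer."""
    timings = TaskTimings()
    for _ in timed_blocks(range(3), timings):
        time.sleep(delay_s)  # simulate a downstream stage that cannot keep up
    return timings


if __name__ == "__main__":
    print(run_slow_consumer())
```

With a slow consumer, `yield_wait_s` grows while `cpu_s` stays small, which is the situation the In-Task Backpressure metric is meant to surface. Note that the wait after a block is only recorded once the consumer requests the next one, so a consumer that stops early leaves the final wait uncounted.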

python/ray/data/_internal/execution/resource_manager.py (outdated, resolved)
python/ray/data/context.py (outdated, resolved)
python/ray/data/block.py (outdated, resolved)
@votrou votrou changed the title Add more observability Add more observability to Ray Data operator metrics Aug 16, 2024
@@ -365,10 +380,11 @@ def obj_store_mem_max_pending_output_per_task(self) -> Optional[float]:
        context = ray.data.DataContext.get_current()
        if context._max_num_blocks_in_streaming_gen_buffer is None:
            return None

        estimation_ratio = context.op_resource_memory_estimation_ratio

Is this newly introduced?

If so, do we need an entry in the DataContext as well?

@lee1258561 lee1258561 left a comment

Please land after removing exp code. Thanks!

@votrou votrou merged commit 3387e67 into pinterest/main-2.10.0 Aug 19, 2024
1 check passed
lee1258561 added a commit that referenced this pull request Sep 7, 2024
lee1258561 added a commit that referenced this pull request Sep 7, 2024
This reverts commit 3387e67.

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
votrou added a commit that referenced this pull request Oct 8, 2024
votrou added a commit that referenced this pull request Oct 8, 2024
sjoshi6 pushed a commit that referenced this pull request Oct 31, 2024
Signed-off-by: Saurabh Vishwas Joshi <[email protected]>
sjoshi6 pushed a commit that referenced this pull request Oct 31, 2024
This reverts commit 3387e67.

Signed-off-by: Saurabh Vishwas Joshi <[email protected]>
sjoshi6 pushed a commit that referenced this pull request Oct 31, 2024
3 participants