Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/working dir #49312

Closed

Conversation

ujjawal-khare-27
Copy link
Contributor

@ujjawal-khare-27 ujjawal-khare-27 commented Dec 17, 2024

Related issue number

Closes #49036

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

ujjawal-khare and others added 30 commits December 18, 2024 01:28
Signed-off-by: ujjawal-khare <[email protected]>
It breaks my pre-push linter.

Signed-off-by: hjiang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed?

Add async shutdown for handles.

---------

Signed-off-by: Cindy Zhang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…behavior (ray-project#48958)

### Issue

In the Ray codebase,
[logging.config.dictConfig](https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig)
may be called multiple times. However, we found that if a logger’s child
loggers are set before the logger is set via `dictConfig`, it may cause
issues.

*
[Example1](https://gist.github.com/kevin85421/24849e06c61f221fd95063a4ce81ca8f)
(incremental: False): The logger `Ray.data` loses its original handler
and uses the `Ray` logger’s handler after the Ray logger is set via
`dictConfig`.
  ```
2024-11-27 04:32:06,213 - Ray.data - INFO - This is an INFO log from
Ray.data.
2024-11-27 04:32:06,213 - Ray.data - WARNING - This is a WARNING log
from Ray.data.
  2024-11-27 04:32:06,213 - Ray.data - INFO - Ray data propagate False
  abc Ray - DEBUG - This is a DEBUG log from Ray.
  abc Ray - ERROR - This is an ERROR log from Ray.
  abc Ray.data - INFO - Another INFO log from Ray.data.
  abc Ray.data - INFO - Ray data propagate True
  ```
*
[Example2](https://gist.github.com/kevin85421/9cf6ee70ceec42be3de888174d0c8e6a)
(incremental: True): It looks like `Ray.data`’s handlers are removed
after the `Ray` logger is set via `dictConfig`.
  ```
2024-11-27 04:35:25,379 - Ray.data - INFO - This is an INFO log from
Ray.data.
2024-11-27 04:35:25,379 - Ray.data - WARNING - This is a WARNING log
from Ray.data.
  2024-11-27 04:35:25,379 - Ray.data - INFO - Ray data propagate False
  This is an ERROR log from Ray.
2024-11-27 04:35:25,379 - Ray.data - INFO - Another INFO log from
Ray.data.
  2024-11-27 04:35:25,379 - Ray.data - INFO - Ray data propagate False
  ```

* CPython implementation
  * Case 1: `incremental` is `False`
* If an existing logger is also a child logger of a new logger, the
child logger’s handlers will be reset, and its `propagate` attribute
will be set to true.
* In
[Example1](https://gist.github.com/kevin85421/24849e06c61f221fd95063a4ce81ca8f),
`Ray.data` is not only an existing logger but also a child logger of
Ray. Therefore, its handlers will be reset, and propagate will be set to
true.
* See the function for more details:
https://github.com/python/cpython/blob/71ede1142ddad2d31cc966b8fe4a5aff664f4d53/Lib/logging/config.py#L193-L196
 * Case 2: `incremental` is `True`
    * No handlers & filters will be added to the new logger.
* See the function for more details:
https://github.com/python/cpython/blob/71ede1142ddad2d31cc966b8fe4a5aff664f4d53/Lib/logging/config.py#L906-L915

### Solution

Instead of using `dictConfig` to set the root logger and the Ray logger,
call other functions to set the loggers explicitly.

## Related issue number

Closes ray-project#48732

<!-- For example: "Closes ray-project#1234" -->

## Checks

* Test 1
  ```python
  import ray
  import logging
  import ray.data

ray.init(logging_config=ray.LoggingConfig(encoding="TEXT",
log_level="INFO"))

  root_logger = logging.getLogger()
  root_logger.info("root logger")

  ray_logger = logging.getLogger("ray")
  ray_logger.info("ray logger")

  ray_data_logger = logging.getLogger("ray.data")
  ray_data_logger.info("ray data logger")

  @ray.remote
  def f():
      root_logger = logging.getLogger()
      root_logger.info("root logger")
      ray_data_logger = logging.getLogger("ray.data")
      ray_data_logger.info("ray data logger")

  ray.get(f.remote())
  ```
<img width="1440" alt="image"
src="https://github.com/user-attachments/assets/e522a257-28c5-4b3c-ad62-c41e4cd61664">

* Test 2
  ```python
  import ray
  import logging

  def report_logger(logger):
      # Collect this logger and its parents
      loggers = []
      current_logger = logger
      while current_logger:
          loggers.append(current_logger)
if not current_logger.parent or current_logger.parent == current_logger:
              break
          current_logger = current_logger.parent

      # Report the configuration of each logger in the hierarchy
print(f"Logging configuration for '{logger.name}' and its hierarchy:")
for log in reversed(loggers): # Start from the root and go down to the
given logger
print(f"\nLogger: {log.name or 'root'} (Level:
{logging.getLevelName(log.level)})")
          if log.handlers:
              print("  Handlers:")
              for handler in log.handlers:
print(f" - {handler.__class__.__name__} (Level:
{logging.getLevelName(handler.level)})")
          else:
              print("  No handlers configured")

  print("BEFORE")
  report_logger(logging.getLogger("ray.data"))
  print()

  import ray.data
ray.init(logging_config=ray.LoggingConfig(encoding="TEXT",
log_level="INFO"))

  print("AFTER:")
  report_logger(logging.getLogger("ray.data"))
  ```
<img width="1189" alt="image"
src="https://github.com/user-attachments/assets/9129b22a-f436-40ca-9f42-f1ecacf6c515">

Signed-off-by: kaihsun <[email protected]>
Signed-off-by: Kai-Hsun Chen <[email protected]>
Co-authored-by: Jiajun Yao <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…#48225)

Some users are asking how to connect Ray GCS to Redis/Valkey with
username and password.
[ref](https://ray.slack.com/archives/C02GFQ82JPM/p1729643749769319)

Unfortunately, Ray currently can't carry a username to Redis/Valkey.
This PR makes it accept a new `--redis-username` parameter and pass it
to Redis/Valkey during authentication.

---------

Signed-off-by: Rueian <[email protected]>
Signed-off-by: Ruiyang Wang <[email protected]>
Co-authored-by: Ruiyang Wang <[email protected]>
Co-authored-by: Lonnie Liu <[email protected]>
Co-authored-by: angelinalg <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
… using modern absl (ray-project#48722)

In
[src/ray/object_manager/common.cc](https://github.com/ray-project/ray/blob/e393a716d8742a987a36df555defb2ca90bb94d4/src/ray/object_manager/common.cc)
we extensively use `absl::StrCat` without explicitely importing the
associated header `absl/strings/str_cat.h`. It worked fine previously,
as `absl/strings/str_format.h` transitively included that via absl
internal headers. However, somewhere in 2023-2024, absl has been cleaned
up from unintentional transitive includes, breaking `ray` builds against
modern absl. This PR fixes that.

## Effect on backward/forward compatability
None. Builds against older absl versions will continue to work as
before, as `absl::StrCat` was always declared in `str_cat.h` and
including that was always the intended way to import this symbol.

## Related issues/PRs:
+ ray-project#48667

---------

Signed-off-by: gorloffslava <[email protected]>
Signed-off-by: Slava Gorlov <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed?

Optimizes filtering in Ray Data, and introduces a new expression-based
syntax for filtering.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Shared pointer is wide-spread our codebase, which makes ownership
justification and tracing hard.
To make things worse, new code has to confirm the same interface, which
means more shared pointer.

Would like to spend some time to cleanup, start with something simple.

---------

Signed-off-by: hjiang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Each `read_sql` read tasks attempts to read a different chunk of data
using the offset and limit filters. For example, if you have a database
with 200 rows, `read_sql` might launch two tasks that reads `offset 0
limit 100` and `offset 100 limit 100` to read rows 0-100 and 100-200,
respectively.

However, if the underlying database doesn’t have a deterministic
ordering, read tasks might read duplicate data.

To fix this correctness issue, this PR makes `read_sql` always launch
one task. Since `offset` typically requires scanning and discarding
rows, this PR's code should perform similarly to the original
implementation.

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Refactors WorkerPool and adds an extra PrestartWorkers API.

Changes:

1. PopWorkerRequest now has a new field to let the started idle worker
to keep-alive for a duration. After the duration, or after it's assigned
a task, the keep-alive is lifted and it can be idle-killed.
2. Make the WorkerPool process more clear: now we have distinct
`PopWorker` -> `StartNewWorker` -> `StartWorkerProcess` with their
differences documented.
3. Adds a `NodeManagerService.PrestartWorkers` gRPC method. Callers can
ask to prestart `num_workers` workers, capped by
`RAY_restart_workers_api_max_num_workers`, with runtime_env and job_id.
4. Changes idle-killing logic in worker_pool. Previously we kept a
worker "idle since" timestamp and compare it with `now +
idle_worker_killing_time_threshold_ms`. In this PR we change to keep a
worker "keep alive until" timestamp, set to `idle time +
idle_worker_killing_time_threshold_ms` or `create time +
keep_alive_duration`, and compare with `now`.

---------

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Other than minor cleanups, this PR:
* Makes batch inference release tests use autoscaling, to ensure Ray Data works well with autoscaling
* Removes the "raw images" variants, because there isn't much value to them in addition to the parquet tests, and people use parquet more often)
* Updates the node types, for consistency with the other release test compute

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
As titled, allow heterogeneous lookup and insertion; also add customized
hash and eq.

---------

Signed-off-by: hjiang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…8969)

Closes: ray-project#48926
ray-project#48927
<!-- Please give a short summary of the change and the problem this
solves. -->

Signed-off-by: Superskyyy <[email protected]>
Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: Hao Chen <[email protected]>
Co-authored-by: Richard Liaw <[email protected]>
Co-authored-by: Hao Chen <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…t with Fluent Bit` to `Check the CloudWatch dashboard`. (ray-project#48936)

Signed-off-by: win5923 <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: hjiang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed?

use a github org group instead of listing all the github ids

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Richard Liaw <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…ay-project#48964)

After the effort to add Async GcsClient binding to Python, we allow both
old and new code paths to live for a while and now it seems to be pretty
stable. We never received any issues from customers about the
NewGcsClient. Hence we remove the old one.

---------

Signed-off-by: Ruiyang Wang <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…ject#48643)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->
When the dataset of webdataset consists of many tar files, if there is a
failure in reading one of the files, it is necessary to print the name
of the failed file.

before this pr:
```
ValueError: fx001.jpg: duplicate file name in tar file jpg dict_keys(['__key__', 'jpg', 'json'])
```

after this pr:

```
ValueError: fx001.jpg: duplicate file name in tar file jpg dict_keys(['__key__', 'jpg', 'json']), in emr-autotest/datasets/jkj_10/1b63122e734064926c47a2ccdcbaafd3.tar
```

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this
solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: jukejian <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
We don't have a release test that tracks the performance of writing
data. This PR adds one.

---------

Signed-off-by: Balaji Veeramani <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
Upgrade CI to use pyarrow 18, the latest released version from Pyarrow.

Signed-off-by: Scott Lee <[email protected]>
Co-authored-by: Cuong Nguyen <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
@aslonnie aslonnie removed request for a team, woshiyyya and aslonnie December 17, 2024 20:00
@ujjawal-khare
Copy link

Closing this for cleaning commit history

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Ray Job] : Working Dir files does not get changed even if remote URI contents get changed