-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/working dir #49312
Closed
ujjawal-khare-27
wants to merge
178
commits into
ray-project:master
from
ujjawal-khare-27:fix/working-dir
Closed
Fix/working dir #49312
ujjawal-khare-27
wants to merge
178
commits into
ray-project:master
from
ujjawal-khare-27:fix/working-dir
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: ujjawal-khare <[email protected]>
It breaks my pre-push linter. Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed? Add async shutdown for handles. --------- Signed-off-by: Cindy Zhang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
…nner`. (ray-project#48309) Signed-off-by: ujjawal-khare <[email protected]>
…behavior (ray-project#48958) ### Issue In the Ray codebase, [logging.config.dictConfig](https://docs.python.org/3/library/logging.config.html#logging.config.dictConfig) may be called multiple times. However, we found that if a logger’s child loggers are set before the logger is set via `dictConfig`, it may cause issues. * [Example1](https://gist.github.com/kevin85421/24849e06c61f221fd95063a4ce81ca8f) (incremental: False): The logger `Ray.data` loses its original handler and uses the `Ray` logger’s handler after the Ray logger is set via `dictConfig`. ``` 2024-11-27 04:32:06,213 - Ray.data - INFO - This is an INFO log from Ray.data. 2024-11-27 04:32:06,213 - Ray.data - WARNING - This is a WARNING log from Ray.data. 2024-11-27 04:32:06,213 - Ray.data - INFO - Ray data propagate False abc Ray - DEBUG - This is a DEBUG log from Ray. abc Ray - ERROR - This is an ERROR log from Ray. abc Ray.data - INFO - Another INFO log from Ray.data. abc Ray.data - INFO - Ray data propagate True ``` * [Example2](https://gist.github.com/kevin85421/9cf6ee70ceec42be3de888174d0c8e6a) (incremental: True): It looks like `Ray.data`’s handlers are removed after the `Ray` logger is set via `dictConfig`. ``` 2024-11-27 04:35:25,379 - Ray.data - INFO - This is an INFO log from Ray.data. 2024-11-27 04:35:25,379 - Ray.data - WARNING - This is a WARNING log from Ray.data. 2024-11-27 04:35:25,379 - Ray.data - INFO - Ray data propagate False This is an ERROR log from Ray. 2024-11-27 04:35:25,379 - Ray.data - INFO - Another INFO log from Ray.data. 2024-11-27 04:35:25,379 - Ray.data - INFO - Ray data propagate False ``` * CPython implementation * Case 1: `incremental` is `False` * If an existing logger is also a child logger of a new logger, the child logger’s handlers will be reset, and its `propagate` attribute will be set to true. * In [Example1](https://gist.github.com/kevin85421/24849e06c61f221fd95063a4ce81ca8f), `Ray.data` is not only an existing logger but also a child logger of Ray. Therefore, its handlers will be reset, and propagate will be set to true. * See the function for more details: https://github.com/python/cpython/blob/71ede1142ddad2d31cc966b8fe4a5aff664f4d53/Lib/logging/config.py#L193-L196 * Case 2: `incremental` is `True` * No handlers & filters will be added to the new logger. * See the function for more details: https://github.com/python/cpython/blob/71ede1142ddad2d31cc966b8fe4a5aff664f4d53/Lib/logging/config.py#L906-L915 ### Solution Instead of using `dictConfig` to set the root logger and the Ray logger, call other functions to set the loggers explicitly. ## Related issue number Closes ray-project#48732 <!-- For example: "Closes ray-project#1234" --> ## Checks * Test 1 ```python import ray import logging import ray.data ray.init(logging_config=ray.LoggingConfig(encoding="TEXT", log_level="INFO")) root_logger = logging.getLogger() root_logger.info("root logger") ray_logger = logging.getLogger("ray") ray_logger.info("ray logger") ray_data_logger = logging.getLogger("ray.data") ray_data_logger.info("ray data logger") @ray.remote def f(): root_logger = logging.getLogger() root_logger.info("root logger") ray_data_logger = logging.getLogger("ray.data") ray_data_logger.info("ray data logger") ray.get(f.remote()) ``` <img width="1440" alt="image" src="https://github.com/user-attachments/assets/e522a257-28c5-4b3c-ad62-c41e4cd61664"> * Test 2 ```python import ray import logging def report_logger(logger): # Collect this logger and its parents loggers = [] current_logger = logger while current_logger: loggers.append(current_logger) if not current_logger.parent or current_logger.parent == current_logger: break current_logger = current_logger.parent # Report the configuration of each logger in the hierarchy print(f"Logging configuration for '{logger.name}' and its hierarchy:") for log in reversed(loggers): # Start from the root and go down to the given logger print(f"\nLogger: {log.name or 'root'} (Level: {logging.getLevelName(log.level)})") if log.handlers: print(" Handlers:") for handler in log.handlers: print(f" - {handler.__class__.__name__} (Level: {logging.getLevelName(handler.level)})") else: print(" No handlers configured") print("BEFORE") report_logger(logging.getLogger("ray.data")) print() import ray.data ray.init(logging_config=ray.LoggingConfig(encoding="TEXT", log_level="INFO")) print("AFTER:") report_logger(logging.getLogger("ray.data")) ``` <img width="1189" alt="image" src="https://github.com/user-attachments/assets/9129b22a-f436-40ca-9f42-f1ecacf6c515"> Signed-off-by: kaihsun <[email protected]> Signed-off-by: Kai-Hsun Chen <[email protected]> Co-authored-by: Jiajun Yao <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…#48225) Some users are asking how to connect Ray GCS to Redis/Valkey with username and password. [ref](https://ray.slack.com/archives/C02GFQ82JPM/p1729643749769319) Unfortunately, Ray currently can't carry a username to Redis/Valkey. This PR makes it accept a new `--redis-username` parameter and pass it to Redis/Valkey during authentication. --------- Signed-off-by: Rueian <[email protected]> Signed-off-by: Ruiyang Wang <[email protected]> Co-authored-by: Ruiyang Wang <[email protected]> Co-authored-by: Lonnie Liu <[email protected]> Co-authored-by: angelinalg <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
… using modern absl (ray-project#48722) In [src/ray/object_manager/common.cc](https://github.com/ray-project/ray/blob/e393a716d8742a987a36df555defb2ca90bb94d4/src/ray/object_manager/common.cc) we extensively use `absl::StrCat` without explicitely importing the associated header `absl/strings/str_cat.h`. It worked fine previously, as `absl/strings/str_format.h` transitively included that via absl internal headers. However, somewhere in 2023-2024, absl has been cleaned up from unintentional transitive includes, breaking `ray` builds against modern absl. This PR fixes that. ## Effect on backward/forward compatability None. Builds against older absl versions will continue to work as before, as `absl::StrCat` was always declared in `str_cat.h` and including that was always the intended way to import this symbol. ## Related issues/PRs: + ray-project#48667 --------- Signed-off-by: gorloffslava <[email protected]> Signed-off-by: Slava Gorlov <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed? Optimizes filtering in Ray Data, and introduces a new expression-based syntax for filtering. ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Richard Liaw <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Shared pointer is wide-spread our codebase, which makes ownership justification and tracing hard. To make things worse, new code has to confirm the same interface, which means more shared pointer. Would like to spend some time to cleanup, start with something simple. --------- Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Each `read_sql` read tasks attempts to read a different chunk of data using the offset and limit filters. For example, if you have a database with 200 rows, `read_sql` might launch two tasks that reads `offset 0 limit 100` and `offset 100 limit 100` to read rows 0-100 and 100-200, respectively. However, if the underlying database doesn’t have a deterministic ordering, read tasks might read duplicate data. To fix this correctness issue, this PR makes `read_sql` always launch one task. Since `offset` typically requires scanning and discarding rows, this PR's code should perform similarly to the original implementation. --------- Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Refactors WorkerPool and adds an extra PrestartWorkers API. Changes: 1. PopWorkerRequest now has a new field to let the started idle worker to keep-alive for a duration. After the duration, or after it's assigned a task, the keep-alive is lifted and it can be idle-killed. 2. Make the WorkerPool process more clear: now we have distinct `PopWorker` -> `StartNewWorker` -> `StartWorkerProcess` with their differences documented. 3. Adds a `NodeManagerService.PrestartWorkers` gRPC method. Callers can ask to prestart `num_workers` workers, capped by `RAY_restart_workers_api_max_num_workers`, with runtime_env and job_id. 4. Changes idle-killing logic in worker_pool. Previously we kept a worker "idle since" timestamp and compare it with `now + idle_worker_killing_time_threshold_ms`. In this PR we change to keep a worker "keep alive until" timestamp, set to `idle time + idle_worker_killing_time_threshold_ms` or `create time + keep_alive_duration`, and compare with `now`. --------- Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Other than minor cleanups, this PR: * Makes batch inference release tests use autoscaling, to ensure Ray Data works well with autoscaling * Removes the "raw images" variants, because there isn't much value to them in addition to the parquet tests, and people use parquet more often) * Updates the node types, for consistency with the other release test compute --------- Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
As titled, allow heterogeneous lookup and insertion; also add customized hash and eq. --------- Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…8969) Closes: ray-project#48926 ray-project#48927 <!-- Please give a short summary of the change and the problem this solves. --> Signed-off-by: Superskyyy <[email protected]> Signed-off-by: Richard Liaw <[email protected]> Signed-off-by: Hao Chen <[email protected]> Co-authored-by: Richard Liaw <[email protected]> Co-authored-by: Hao Chen <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…ay-project#47988) Signed-off-by: bthananjeyan <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…t with Fluent Bit` to `Check the CloudWatch dashboard`. (ray-project#48936) Signed-off-by: win5923 <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: Ryan O'Leary <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
## Why are these changes needed? use a github org group instead of listing all the github ids <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Richard Liaw <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: hjiang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…ay-project#48964) After the effort to add Async GcsClient binding to Python, we allow both old and new code paths to live for a while and now it seems to be pretty stable. We never received any issues from customers about the NewGcsClient. Hence we remove the old one. --------- Signed-off-by: Ruiyang Wang <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: ujjawal-khare <[email protected]>
… thread-safety for MetricsLogger. (ray-project#48868) Signed-off-by: ujjawal-khare <[email protected]>
…ject#48643) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> When the dataset of webdataset consists of many tar files, if there is a failure in reading one of the files, it is necessary to print the name of the failed file. before this pr: ``` ValueError: fx001.jpg: duplicate file name in tar file jpg dict_keys(['__key__', 'jpg', 'json']) ``` after this pr: ``` ValueError: fx001.jpg: duplicate file name in tar file jpg dict_keys(['__key__', 'jpg', 'json']), in emr-autotest/datasets/jkj_10/1b63122e734064926c47a2ccdcbaafd3.tar ``` <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: jukejian <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
Signed-off-by: Jiajun Yao <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
We don't have a release test that tracks the performance of writing data. This PR adds one. --------- Signed-off-by: Balaji Veeramani <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
…d custom tcp-capable EnvRunner. (ray-project#49033) Signed-off-by: ujjawal-khare <[email protected]>
Upgrade CI to use pyarrow 18, the latest released version from Pyarrow. Signed-off-by: Scott Lee <[email protected]> Co-authored-by: Cuong Nguyen <[email protected]> Signed-off-by: ujjawal-khare <[email protected]>
ujjawal-khare
requested review from
hongpeng-guo,
justinvyu,
matthewdeng,
raulchen,
woshiyyya,
a team,
maxpumperla,
pcmoritz,
kevin85421,
sven1977,
simonsays1980,
SongGuyang,
edoakes,
kfstorm,
richardliaw,
aslonnie,
thomasdesr and
hongchaodeng
as code owners
December 17, 2024 19:58
Closing this for cleaning commit history |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Related issue number
Closes #49036
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.