-
Notifications
You must be signed in to change notification settings - Fork 549
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Sync down logs #3063
base: master
Are you sure you want to change the base?
[Serve] Sync down logs #3063
Conversation
…s method (avoid circular import)
Requesting review @cblmemo @MaoZiming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @dtran24 for this exciting feature!! Left some preliminary comments 🫡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the prompt fix!! 🚀 It looks mostly great to me. Left some final comments
launch_log_file) | ||
replica_info = serve_state.get_replica_info_from_id( | ||
service_name, replica_id) | ||
if replica_info is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For replica that has been normally terminated (scale down), we don't keep the record in database but we will keep its log file under the service directory. So it is likely the replica_info is None
IIUC. Should we just continue
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside this for
loop, we are only dealing with launch log files, which shouldn't be relevant to replicas that have been normally terminated. For replicas that have been normally terminated, we manage them here
skypilot/sky/serve/serve_utils.py
Lines 657 to 662 in 1eb3a53
terminated_replica_log_files = [ | |
file for file in log_files if not file.endswith('_launch.log') and | |
has_valid_replica_id(file, target_replica_id) | |
] | |
for file_path in terminated_replica_log_files: | |
shutil.copy(file_path, dir_for_download) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, could you address the merge conflict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as mentioned, we might want to move those code under sky/serve
.
A bump for this PR as a user is requesting this feature. |
Hey! I was trying to explore on the log capturing part. Is there a way to continuously sync down the logs. |
Hi @aj-ya ! Thanks for your interest in this PR. For now, one workaround would be having a screen session and redirecting the output of sky serve logs service-name --controller > ~/serve-logs/controller.log 2>&1 This PR is on our roadmap and we'll try to prioritize it based on your request. Thanks! |
Fixes #2914
Tested (run the relevant ones):
bash format.sh
PROVISIONING
stateREADY
replicasREADY
, and another replica terminated (e.g. throughsky cancel
on controller)pytest tests/test_smoke.py
pytest tests/test_smoke.py::test_fill_in_the_name
bash tests/backward_comaptibility_tests.sh