Move logic from TorchX CLI -> API, so MVAI can call it #955

Merged
merged 1 commit into from
Sep 11, 2024

Sanjay-Ganeshan
Contributor

Summary:
MVAI's "light" is synchronous: you can immediately see the logs for the jobs you start. Only "fire" is asynchronous.

TorchX's API, since it is generic, always creates asynchronous jobs. As a result, it has no built-in interface for tailing the stderr of every started process, only for tailing individual replicas of a given role.

The TorchX CLI's `torchx run` command already implements this, but the implementation is coupled to the CLI implementations of `torchx run` and `torchx log`.

This diff extracts the useful logic into a helper function in the TorchX API, so that MVAI can call it.
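
To illustrate the gap described above, here is a minimal sketch of fanning out per-replica tailing into a single merged stream. It is not the code from this diff: `tail_all_replicas`, `ReplicaLogFn`, and the `roles` mapping are hypothetical names, and the per-replica log source is passed in as a plain callable rather than tied to any TorchX call.

```python
import threading
from queue import Queue
from typing import Callable, Dict, Iterable, Iterator, Optional

# Hypothetical type: given (role_name, replica_id), return that replica's log lines.
ReplicaLogFn = Callable[[str, int], Iterable[str]]


def tail_all_replicas(
    roles: Dict[str, int], replica_log_lines: ReplicaLogFn
) -> Iterator[str]:
    """Merge the logs of every replica of every role into one stream.

    `roles` maps a role name to its replica count (an assumption of this
    sketch). Each yielded line is prefixed with "role/replica_id".
    """
    lines: "Queue[Optional[str]]" = Queue()

    def pump(role: str, replica_id: int) -> None:
        # One thread per replica, so a slow replica does not block the others.
        try:
            for line in replica_log_lines(role, replica_id):
                lines.put(f"{role}/{replica_id} {line.rstrip()}")
        finally:
            lines.put(None)  # sentinel: this replica's stream has ended

    workers = [
        threading.Thread(target=pump, args=(role, i), daemon=True)
        for role, count in roles.items()
        for i in range(count)
    ]
    for worker in workers:
        worker.start()

    remaining = len(workers)
    while remaining:
        item = lines.get()
        if item is None:
            remaining -= 1
        else:
            yield item
```

A caller would supply a `replica_log_lines` callable that wraps whatever per-replica log iterator the scheduler exposes. Lines from different replicas interleave in arrival order, while the order within a single replica is preserved.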

Reviewed By: andywag

Differential Revision: D62463211

@facebook-github-bot added the CLA Signed label Sep 11, 2024
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D62463211

@andywag andywag left a comment

Review automatically exported from Phabricator review in Meta.


@facebook-github-bot facebook-github-bot merged commit b7fd00b into pytorch:main Sep 11, 2024
24 checks passed
Labels
CLA Signed, fb-exported
3 participants