FEAT: Slurm Deployment For Xorbits #719

Merged
merged 75 commits into from
Oct 31, 2023

fengsxy
Contributor

@fengsxy fengsxy commented Sep 25, 2023

What do these changes do?

Previously, users who wanted to deploy Xorbits on Slurm had to write the bash script themselves. Slurm jobs are also hard to debug, because their output is only written to a file on the shared filesystem.
With help from the Xorbits team, I have added a class named Slurm that generates slurm.sh and submits it with sbatch automatically. In addition, a Docker-based Slurm cluster test environment has been completed.
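
To illustrate the idea of generating slurm.sh from Python, here is a minimal sketch. It is not the Slurm class added in this PR: `SlurmJobSpec`, `render_slurm_script`, and `write_slurm_script` are hypothetical names, and the commands placed after the directives are left to the caller. Only the standard `#SBATCH` directives (`--job-name`, `--nodes`, `--time`, `--output`) are used.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass
class SlurmJobSpec:
    """Hypothetical job description; not the class added in this PR."""

    job_name: str = "xorbits"
    nodes: int = 2
    time_limit: str = "00:30:00"
    # Slurm expands %j to the job id. The path should be on a filesystem
    # shared by the nodes so the log can be read from the login node.
    output_path: str = "slurm-%j.out"


def render_slurm_script(spec: SlurmJobSpec, commands: list[str]) -> str:
    """Return batch script text: #SBATCH directives followed by the commands."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={spec.job_name}",
        f"#SBATCH --nodes={spec.nodes}",
        f"#SBATCH --time={spec.time_limit}",
        f"#SBATCH --output={spec.output_path}",
        "",
        *commands,
    ]
    return "\n".join(lines) + "\n"


def write_slurm_script(
    spec: SlurmJobSpec, commands: list[str], path: str = "slurm.sh"
) -> Path:
    """Write the rendered script to disk and return its path."""
    script_path = Path(path)
    script_path.write_text(render_slurm_script(spec, commands))
    return script_path
```

Keeping rendering separate from writing means the generated text can be checked in a unit test without a Slurm installation.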

Related issue number

Fixes #315

Check code requirements

  • add a class that generates the Slurm batch script
  • add a function that submits the script with sbatch
  • add a function that cancels the Xorbits job on Slurm when the program ends
  • add an environment that provides a Slurm cluster with Xorbits
  • add a test for generating the Slurm batch script
  • add a test that submits the script with sbatch and obtains the real address
  • complete the related documentation
  • ensure all linting tests pass
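
The submit and cancel-on-exit items in the checklist above can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: the function names are hypothetical, `sbatch` and `scancel` are assumed to be on the PATH, and `sbatch` is assumed to print its default `Submitted batch job <id>` message.

```python
import atexit
import re
import subprocess


def parse_job_id(sbatch_stdout: str) -> str:
    """Extract the job id from sbatch's default 'Submitted batch job <id>' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_stdout)
    if match is None:
        raise ValueError(f"could not find a job id in: {sbatch_stdout!r}")
    return match.group(1)


def submit(script_path: str) -> str:
    """Submit the script with sbatch and return the job id as a string."""
    result = subprocess.run(
        ["sbatch", script_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if sbatch rejects the script
    )
    return parse_job_id(result.stdout)


def cancel(job_id: str) -> None:
    """Cancel the job; errors are ignored because the job may already be gone."""
    subprocess.run(["scancel", job_id], check=False)


def submit_with_cleanup(script_path: str) -> str:
    """Submit the script and register scancel to run when the interpreter exits."""
    job_id = submit(script_path)
    atexit.register(cancel, job_id)
    return job_id
```

Handlers registered with `atexit` run on normal interpreter shutdown only, so a job may outlive a process that is killed by a signal it does not handle.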

@XprobeBot XprobeBot added this to the v0.6.3 milestone Sep 25, 2023
@ChengjieLi28 ChengjieLi28 changed the title Slurm Deployment For Xorbits FEAT: Slurm Deployment For Xorbits Sep 25, 2023
@ChengjieLi28
Contributor

Please fix the Python lint errors first. See https://doc.xorbits.io/en/latest/development/contributing_codebase.html#pre-commit for how to use pre-commit to format your code before running git commit.

.github/workflows/cluster.yaml (outdated, resolved)
python/xorbits/cluster/Slurm.py (4 threads; outdated, resolved)
@XprobeBot XprobeBot modified the milestones: v0.6.3, v0.7.0 Sep 25, 2023
@codecov

codecov bot commented Sep 26, 2023

Codecov Report

Merging #719 (a0da962) into main (b320ca3) will increase coverage by 11.03%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##             main     #719       +/-   ##
===========================================
+ Coverage   82.55%   93.58%   +11.03%     
===========================================
  Files        1058     1059        +1     
  Lines       79780    79781        +1     
  Branches    16504    16504               
===========================================
+ Hits        65861    74663     +8802     
+ Misses      11645     3443     -8202     
+ Partials     2274     1675      -599     
Flag Coverage Δ
unittests 93.47% <100.00%> (+11.05%) ⬆️

Files Coverage Δ
python/xorbits/deploy/slurm/__init__.py 100.00% <100.00%> (ø)

... and 171 files with indirect coverage changes
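
The percentages in the coverage diff above can be reproduced from the raw counts. The sketch below assumes coverage is hits divided by total lines (hits + misses + partials), which agrees with the reported figures; the function name is illustrative.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, rounded to two decimals as in the report."""
    return round(100 * hits / lines, 2)


# Counts copied from the coverage diff: main, then this PR.
before = coverage_percent(65861, 79780)
after = coverage_percent(74663, 79781)
delta = round(after - before, 2)

# Hits, misses, and partials should add up to the line totals.
main_total = 65861 + 11645 + 2274
pr_total = 74663 + 3443 + 1675

print(before, after, delta)  # 82.55 93.58 11.03
```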

.gitignore (outdated, resolved)
.pre-commit-config.yaml (outdated, resolved)
python/xorbits/deploy/cluster/Slurm.py (3 threads; outdated, resolved)
python/xorbits/deploy/cluster/test.py (outdated, resolved)
fengsxy and others added 6 commits October 30, 2023 00:30
Signed-off-by: liddle rain <[email protected]>
@ChengjieLi28 ChengjieLi28 merged commit e77db37 into xorbitsai:main Oct 31, 2023
29 of 32 checks passed

Successfully merging this pull request may close these issues.

FEAT: Support running on schedulers like Slurm
4 participants