[WIP] Slurm agent #3005

Draft: wants to merge 2 commits into base: master

JiangJiaWei1103
Contributor

Tracking issue

flyteorg/flyte#5634

Why are the changes needed?

What changes were proposed in this pull request?

Add naive synchronous create and get methods to the Slurm agent, built on the Slurm REST API (slurmrestd):

  1. create: Submit a job to the slurm cluster
  2. get: Check the job state
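
As a rough illustration of what these two calls might look like against slurmrestd, here is a minimal standard-library sketch. It is not the code in this PR. The API version string, the payload layout (script at the top level, the remaining options under "job"), and the helper names `http_transport`, `create`, and `get` are assumptions for illustration; the response fields can differ between slurmrestd versions, so check them against the OpenAPI spec your daemon serves.

```python
import json
from typing import Any, Callable, Dict, Optional
from urllib import request

BASE_URL = "http://localhost:6820"  # base URL used in the local test setup
API_VERSION = "v0.0.40"             # assumption: depends on the installed Slurm release

# A transport takes (method, url, headers, body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Dict[str, str], Optional[bytes]], Dict[str, Any]]


def http_transport(method: str, url: str, headers: Dict[str, str],
                   body: Optional[bytes]) -> Dict[str, Any]:
    """Send one HTTP request and decode the JSON response body."""
    req = request.Request(url, data=body, headers=headers, method=method)
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create(slurm_config: Dict[str, Any], headers: Dict[str, str],
           send: Transport = http_transport) -> int:
    """Submit a job and return its Slurm job id."""
    config = dict(slurm_config)     # copy so the caller's dict is not mutated
    script = config.pop("script")
    # Assumed payload layout: script at the top level, job options under "job".
    payload = {"script": script, "job": config}
    url = f"{BASE_URL}/slurm/{API_VERSION}/job/submit"
    reply = send("POST", url, {**headers, "Content-Type": "application/json"},
                 json.dumps(payload).encode("utf-8"))
    return reply["job_id"]


def get(job_id: int, headers: Dict[str, str],
        send: Transport = http_transport) -> str:
    """Return the current state of a job, e.g. PENDING, RUNNING, COMPLETED."""
    url = f"{BASE_URL}/slurm/{API_VERSION}/job/{job_id}"
    reply = send("GET", url, headers, None)
    state = reply["jobs"][0]["job_state"]
    # Some API versions report a list of state flags, others a single string.
    return state[0] if isinstance(state, list) else state
```

The `headers` argument would carry whatever authentication slurmrestd is configured for (for example the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` headers with JWT auth). Passing the transport in as a parameter keeps both functions testable without a running cluster.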

How was this patch tested?

At this early stage, we tested these two methods only in a local development environment:

  1. OS: Ubuntu 20.04
  2. slurm: Set up slurmctld, slurmd, slurmdbd, and slurmrestd on the same machine
    • The slurm agent interacts with slurmrestd through the base URL http://localhost:6820/.
  3. flytekit: Test slurm agent locally following this guide
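
Before running the agent against a setup like the one above, it can help to confirm that slurmrestd answers at the base URL. The following is a small sketch under assumptions: the `v0.0.40` version string depends on the installed Slurm release, and the helper names `fetch_status` and `slurmrestd_reachable` are made up for this example. It only checks the HTTP status of the ping endpoint and does not parse the response body.

```python
from typing import Callable, Dict, Optional
from urllib import error, request


def fetch_status(url: str, headers: Dict[str, str]) -> int:
    """Issue a GET request and return the HTTP status code."""
    req = request.Request(url, headers=headers, method="GET")
    try:
        with request.urlopen(req, timeout=3.0) as resp:
            return resp.status
    except error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 401 without a token).
        return exc.code


def slurmrestd_reachable(
    base_url: str = "http://localhost:6820/",
    api_version: str = "v0.0.40",  # assumption: adjust to your Slurm release
    headers: Optional[Dict[str, str]] = None,
    fetch: Callable[[str, Dict[str, str]], int] = fetch_status,
) -> bool:
    """Return True if the slurmrestd ping endpoint responds with HTTP 200."""
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/ping"
    try:
        return fetch(url, headers or {}) == 200
    except OSError:
        # Connection refused, timeout, DNS failure, and similar network errors.
        return False
```

If slurmrestd requires authentication, pass the relevant headers; otherwise a running daemon may reply with a non-200 status and the check will report False.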

We plan to write a single-host setup tutorial and organize useful resources here.

The following example demonstrates a simple Slurm task:

# tiny_slurm.py
import os
from typing import Any, Dict

from flytekit import workflow
from flytekitplugins.slurm import SlurmTask


slurm_tiny_job = SlurmTask(
    name="demo-slurm",
    slurm_config={
        "script": "#!/bin/bash\necho Hello Slurm Agent!",
        "account": "flyte",
        "partition": "debug",
        "name": "hello-slurm-agent",
        "environment": ["PATH=/bin/:/sbin/:/usr/bin/:/usr/sbin/"],
        "current_working_directory": "/tmp"
    },
)


@workflow
def hi_slurm(dummy: str) -> Dict[str, Any]:
    """Return slurm job information."""
    res = slurm_tiny_job(dummy=dummy)

    return res


if __name__ == "__main__":
    from flytekit.clis.sdk_in_container import pyflyte
    from click.testing import CliRunner

    runner = CliRunner()
    path = os.path.realpath(__file__)

    # Local run
    print(">>> LOCAL EXEC <<<")
    result = runner.invoke(pyflyte.main, ["run", path, "hi_slurm", "--dummy", "dummy_input"])
    print(result.output)

The test result is shown as follows:
[Screenshot 2024-12-16 at 11 22 22 PM]

Setup process

As stated above

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link
