[BUG] Maximum Recursion depth exceeded when running with --remote-flag #6147

Closed
2 tasks done
HansBambel opened this issue Jan 8, 2025 · 3 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue

Comments

@HansBambel
Contributor

HansBambel commented Jan 8, 2025

Describe the bug

I am currently trying out Flyte and trying to run a local Kubernetes cluster.

To include all the dependencies our workflows need and to reuse our package manager (uv), I am building my own Docker image (Dockerfile below) for the workflow. I created an example workflow that runs fine without the --remote flag but fails with a "maximum recursion depth exceeded" error when run with the flag.

workflows/pipeline.py

import datetime
from pathlib import Path
from time import sleep

import flytekit
from flytekit import task, workflow, FlyteDirectory
import polars as pl

@task()
def create_initial_dataset_task() -> pl.DataFrame:
    df = pl.DataFrame(
        {
            "learner_int_id": [1, 1, 1, 1],
            "content_int_id": [10, 20, 30, 40],
            "score": [0.5, 0.6, 0.7, 0.8],
            "outcome_int": [2, 0, 1, 1],
            "play_rank": [1, 2, 3, 4],
        }
    )
    return df

@task()
def create_second_dataset_task() -> pl.DataFrame:
    df = pl.DataFrame(
        {
            "learner_int_id": [2, 2, 2, 2],
            "content_int_id": [10, 20, 30, 40],
            "score": [0.5, 0.6, 0.7, 0.8],
            "outcome_int": [2, 0, 1, 1],
            "play_rank": [1, 2, 3, 4],
        }
    )
    return df

@task()
def combine_datasets_task(dataset_1: pl.DataFrame, dataset_2: pl.DataFrame) -> pl.DataFrame:
    """Concatenate the two datasets into a single DataFrame."""
    combined = pl.concat([dataset_1, dataset_2])
    return combined

@task()
def train_model_task(dataset: pl.DataFrame) -> dict:
    for i in range(10):
        print(f"Epoch {i}...")
        sleep(3)

    print("Model trained!")
    return {"status": "trained", "Other params": {"lr": 0.001, "drop-out": 0.5}}

@task()
def evaluate_model_task(model: dict) -> dict:
    print("Testing model...")
    lr = model["Other params"]["lr"]
    for i in range(10):
        print((i + 1) * lr)
    return {"datetime": datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S"), "status": "evaluated",
            "my-metric": {"lr": lr, "accuracy": 0.9}}


@task()
def create_folder(name: str) -> FlyteDirectory:
    folder = Path(flytekit.current_context().working_directory) / name
    folder.mkdir(parents=True, exist_ok=True)
    return FlyteDirectory(path=str(folder))

@workflow
def pipeline() -> None:
    ds1 = create_initial_dataset_task()
    ds2 = create_second_dataset_task()
    dataset = combine_datasets_task(ds1, ds2)
    model = train_model_task(dataset)
    metrics = evaluate_model_task(model)


if __name__ == '__main__':
    # folder = Path("data") / datetime.datetime.now().strftime("%Y-%m-%d--%H-%M-%S")
    pipeline()

Dockerfile

FROM python:3.12-slim-bookworm
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

# Install the project into `/app`
#WORKDIR /app

# Then, add the rest of the project source code and install it
# Installing separately from its dependencies allows optimal layer caching
COPY pyproject.toml pyproject.toml
COPY uv.lock uv.lock
RUN uv sync --frozen

# Place executables in the environment at the front of the path
# add /app/ in front for it to activate the environment
ENV PATH=".venv/bin:$PATH"

COPY src .
#COPY config.py .
COPY workflows .

# This tag is supplied by the build script and will be used to determine the version
# when registering tasks, workflows, and launch plans
ARG tag
ENV FLYTE_INTERNAL_IMAGE=$tag

# Reset the entrypoint, don't invoke `uv`
# This seems to work though
ENTRYPOINT ["uv", "run"]

Traceback:

Trace:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/pathlib.py", line 441, in __str__
    return self._str
           ^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute '_str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/pathlib.py", line 555, in drive
    return self._drv
           ^^^^^^^^^
AttributeError: 'PosixPath' object has no attribute '_drv'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/.venv/lib/python3.12/site-packages/flytekit/bin/entrypoint.py", line 164, in _dispatch_execute
    task_def = load_task()
               ^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/bin/entrypoint.py", line 583, in load_task
    return resolver_obj.load_task(loader_args=resolver_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/utils.py", line 312, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/python_auto_container.py", line 271, in load_task
    task_module = importlib.import_module(name=task_module)  # type: ignore
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1387, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1360, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1331, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 935, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 999, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/workflows/pipeline.py", line 9, in <module>
    @task()
     ^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/task.py", line 359, in wrapper
    task_instance = TaskPlugins.find_pythontask_plugin(type(task_config))(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 82, in __call__
    o = super(InstanceTrackingMeta, cls).__call__(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/python_function_task.py", line 139, in __init__
    name, _, _, _ = extract_task_module(task_function)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 382, in extract_task_module
    mod_name = get_full_module_path(mod, mod_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 391, in get_full_module_path
    new_mod_name = _mod_sanitizer.get_absolute_module_name(inspect.getabsfile(mod), package_root)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 328, in get_absolute_module_name
    return self._resolve_abs_module_name(path, package_root)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 318, in _resolve_abs_module_name
    mod_name = self._resolve_abs_module_name(dirname, package_root)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 318, in _resolve_abs_module_name
    mod_name = self._resolve_abs_module_name(dirname, package_root)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 318, in _resolve_abs_module_name
    mod_name = self._resolve_abs_module_name(dirname, package_root)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 964 more times]
  File "/.venv/lib/python3.12/site-packages/flytekit/core/tracker.py", line 294, in _resolve_abs_module_name
    if not Path(dirname).is_dir():
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 875, in is_dir
    return S_ISDIR(self.stat().st_mode)
                   ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 840, in stat
    return os.stat(self, follow_symlinks=follow_symlinks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 448, in __fspath__
    return str(self)
           ^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 443, in __str__
    self._str = self._format_parsed_parts(self.drive, self.root,
                                          ^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 557, in drive
    self._load_parts()
  File "/usr/local/lib/python3.12/pathlib.py", line 415, in _load_parts
    drv, root, tail = self._parse_path(path)
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/pathlib.py", line 395, in _parse_path
    drv, root, rel = cls._flavour.splitroot(path)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RecursionError: maximum recursion depth exceeded

Message:

RecursionError: maximum recursion depth exceeded

Expected behavior

The workflow should execute with the --remote flag just as it does without it.

Running Execution on local.
Epoch 0...
Epoch 1...
Epoch 2...
Epoch 3...
Epoch 4...
Epoch 5...
Epoch 6...
Epoch 7...
Epoch 8...
Epoch 9...
Model trained!
Testing model...
0.001
0.002
0.003
0.004
0.005
0.006
0.007
0.008
0.009000000000000001
0.01

Additional context to reproduce

  1. docker build --tag localhost:30000/toy-pipeline:latest .
  2. docker push localhost:30000/toy-pipeline:latest
  3. Run the workflow without --remote: pyflyte run --image localhost:30000/toy-pipeline:latest -p toy-pipeline -d development workflows/pipeline.py pipeline
  4. Start a cluster locally: flytectl demo start
  5. Run the workflow with --remote: pyflyte run --image localhost:30000/toy-pipeline:latest --remote -p toy-pipeline -d development workflows/pipeline.py pipeline

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@HansBambel HansBambel added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Jan 8, 2025

welcome bot commented Jan 8, 2025

Thank you for opening your first issue here! 🛠

@HansBambel HansBambel changed the title [BUG] Maximum Recoursion depth exceeded when running with --remote-flag [BUG] Maximum Recursion depth exceeded when running with --remote-flag Jan 9, 2025
@wild-endeavor wild-endeavor self-assigned this Jan 9, 2025
@wild-endeavor wild-endeavor added flytekit FlyteKit Python related issue and removed untriaged This issues has not yet been looked at by the Maintainers labels Jan 9, 2025
@wild-endeavor
Contributor

Actually I can't repro this...

I tried following the steps but they worked for me. I don't have your pyproject.toml or uv.lock file, so I put in one of our own. I've pushed the image to ghcr.io/wild-endeavor/yt_public:hans_v1 if you want to pull it and inspect it.

I was thinking that it might be related to running in /, so I also updated the Dockerfile to COPY workflows workflows instead of COPY workflows . (this I pushed to ghcr.io/wild-endeavor/yt_public:hans_v2), but that also works.

Where I think this must be going wrong is here: https://github.com/flyteorg/flytekit/blob/master/flytekit/core/tracker.py#L287

This dirname call always returns something different until you get to /. At that point it just keeps returning /. I suspect that's what's happening, but I'm not sure why I'm not seeing it.
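
A minimal sketch of the behaviour described above, not flytekit's actual code: repeatedly applying dirname to a POSIX path stops changing once it reaches /, so a recursive walk needs an explicit stopping condition there. The helper names walk_up and parents_with_guard are hypothetical, and posixpath is used so the result is the same on any platform.

```python
import posixpath


def walk_up(path: str, max_steps: int = 10) -> list[str]:
    """Apply dirname repeatedly and record every result (hypothetical helper)."""
    seen = [path]
    for _ in range(max_steps):
        parent = posixpath.dirname(seen[-1])
        seen.append(parent)
        if parent == seen[-2]:
            # Fixed point: dirname returned its own input, so no progress is made.
            break
    return seen


def parents_with_guard(path: str) -> list[str]:
    """Recursive walk that stops as soon as dirname stops changing."""
    parent = posixpath.dirname(path)
    if parent == path:
        # Base case: without this check the recursion would never terminate at "/".
        return [path]
    return [path] + parents_with_guard(parent)


steps = walk_up("/workflows/pipeline.py")
print(steps)  # ['/workflows/pipeline.py', '/workflows', '/', '/']

guarded = parents_with_guard("/workflows/pipeline.py")
print(guarded)  # ['/workflows/pipeline.py', '/workflows', '/']
```

The repeated "/" at the end of the first list is the fixed point. A recursive function whose only exit depends on some other condition (for example, a directory check) would keep calling itself with "/" until Python raises RecursionError.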

Maybe you can take a look and see what's different about the image?

Also, why copy the code into the image (vs. using fast register)?

@HansBambel
Contributor Author

HansBambel commented Jan 10, 2025

Oh wow, changing

COPY src .
#COPY config.py .
COPY workflows .

to

COPY src src
#COPY config.py .
COPY workflows workflows

fixed it. Thanks!
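
For illustration, the difference between the two COPY forms can be simulated in plain Python. Docker's COPY copies the contents of a source directory into the destination, so COPY workflows . places the files directly in the working directory (here /, since no WORKDIR is set), while COPY workflows workflows keeps them under workflows/. This is a rough sketch using temporary directories. The function simulate_copy and the single-file layout are assumptions made for the demonstration and are not part of Docker or flytekit.

```python
import shutil
import tempfile
from pathlib import Path


def simulate_copy(flatten: bool) -> list[str]:
    """Mimic `COPY workflows .` (flatten=True) or `COPY workflows workflows`."""
    with tempfile.TemporaryDirectory() as tmp:
        context = Path(tmp) / "context"   # stands in for the Docker build context
        image_root = Path(tmp) / "image"  # stands in for `/` inside the image
        (context / "workflows").mkdir(parents=True)
        (context / "workflows" / "pipeline.py").write_text("# workflow code\n")
        image_root.mkdir()

        dest = image_root if flatten else image_root / "workflows"
        # Like Docker's COPY, copy the directory's contents, not the directory itself.
        shutil.copytree(context / "workflows", dest, dirs_exist_ok=True)

        return sorted(
            p.relative_to(image_root).as_posix()
            for p in image_root.rglob("*")
            if p.is_file()
        )


flat_layout = simulate_copy(flatten=True)
nested_layout = simulate_copy(flatten=False)
print(flat_layout)    # ['pipeline.py']
print(nested_layout)  # ['workflows/pipeline.py']
```

The thread does not establish exactly why the flattened layout triggers the recursion. The sketch only shows that the two Dockerfile variants produce different file layouts at the image root.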

also, why copy the code into the image (vs using fast register?)

What do you mean by this?

EDIT: Found something about fast registration in the docs. I didn't know this existed. Thanks for the info. I might try it out.

Projects
Status: Done

2 participants