[BUG] FlyteDirectory fails for Azure #5541

Closed
2 tasks done
Tom-Newton opened this issue Jul 5, 2024 · 5 comments
Labels
bug Something isn't working flytekit FlyteKit Python related issue untriaged This issues has not yet been looked at by the Maintainers

Comments

@Tom-Newton
Contributor

Tom-Newton commented Jul 5, 2024

Describe the bug

Imagine:

from flytekit import workflow
from flytekit.types.directory import FlyteDirectory

@workflow
def test_workflow(directory: FlyteDirectory):
    ...

When using Azure, launching this workflow with a directory containing more than one file fails with:

Invalid value for '--directory': Failed to convert param: <Option directory>, value: /path/on/local/ to type: <class 'flytekit.types.directory.types.FlyteDirectory'>. Reason USER:AssertionError: error=Failed to put data from /path/on/local/ to flyte://data/10ba872c9015435996c14abeeb15f846 (recursive=True).

Original exception: unable to connect to account for Must provide either a connection_string or account_name with credentials!!, cause=unable to connect to account for Must provide either a connection_string or account_name with credentials!!

Expected behavior

I would expect this to run without error.

Root cause

The files are actually uploaded fine using FlyteFS. The problem occurs afterwards, when FlyteFS.extract_common() is called following the successful upload. That method contains this snippet:

        fs = fsspec.filesystem(get_protocol(native_urls[0]))
        sep = fs.sep

The problem is that fsspec.filesystem() instantiates adlfs.AzureBlobFileSystem(), which fails because the account_name is unknown. I think it would be best to avoid initialising an fsspec filesystem just to determine the separator used on the filesystem in question.
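
The failure mode can be modelled without Azure or adlfs at all. The sketch below uses a hypothetical stand-in class, FakeBlobFileSystem (not part of adlfs or fsspec), whose constructor refuses to run without an account name, much as the error above suggests adlfs.AzureBlobFileSystem does. It only illustrates why building a filesystem object just to read sep is fragile; it is not the flytekit code.

```python
class FakeBlobFileSystem:
    """Hypothetical stand-in for a credentialed filesystem; not a real adlfs class."""

    # Defined on the class, so it is readable without creating an instance.
    sep = "/"

    def __init__(self, account_name=None):
        # Mimics a constructor that needs credentials before it can be built.
        if account_name is None:
            raise ValueError("Must provide an account_name")
        self.account_name = account_name


def sep_via_instance():
    # Mirrors the problematic pattern: build the filesystem, then read sep.
    return FakeBlobFileSystem().sep


def sep_via_class():
    # Reads the class attribute directly; no credentials involved.
    return FakeBlobFileSystem.sep


try:
    sep_via_instance()
    instance_failed = False
except ValueError:
    instance_failed = True

print(instance_failed)  # True
print(sep_via_class())  # /
```

The instance-based lookup fails before sep is ever read, while the class-based lookup never touches the constructor.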

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes
@Tom-Newton Tom-Newton added bug Something isn't working untriaged This issues has not yet been looked at by the Maintainers labels Jul 5, 2024
@Tom-Newton
Contributor Author

I think we can straightforwardly switch from fsspec.filesystem to fsspec.get_filesystem_class. This avoids initialising the filesystem object, which is unnecessary because sep is a class attribute. I will make a PR.
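
A minimal sketch of that idea, assuming only that fsspec is installed. The helper name separator_for is hypothetical and is not flytekit's code; it uses fsspec.core.split_protocol in place of flytekit's get_protocol, and the built-in memory protocol so that it runs without any cloud credentials. The actual change in flytekit may differ.

```python
import fsspec
from fsspec.core import split_protocol


def separator_for(url: str) -> str:
    """Return the path separator for the filesystem class that handles `url`."""
    # split_protocol returns (protocol, path); protocol is None for bare paths.
    protocol, _ = split_protocol(url)
    # get_filesystem_class looks up the class without instantiating it,
    # so no credentials or storage options are required.
    fs_class = fsspec.get_filesystem_class(protocol or "file")
    return fs_class.sep


print(separator_for("memory://bucket/dir/file.txt"))
```

For an abfs:// URL the same lookup should still need adlfs installed so the class can be imported, but not an account name or connection string.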

@Tom-Newton Tom-Newton added the flytekit FlyteKit Python related issue label Jul 5, 2024
@Tom-Newton
Contributor Author

Made a PR that fixes this: flyteorg/flytekit#2564


runllm bot commented Jul 5, 2024

The issue you're encountering is related to Flyte's handling of FlyteDirectory with Azure Blob Storage. The error arises when fsspec attempts to initialize an AzureBlobFileSystem without the necessary credentials, specifically during the call to FlyteFS.extract_common().

To work around this issue, you can ensure that the necessary Azure credentials are provided. One way to do this is by setting the AZURE_STORAGE_CONNECTION_STRING environment variable in your pod template.

Here is an example of how you can set this environment variable in your pod template:

env:
  - name: AZURE_STORAGE_CONNECTION_STRING
    valueFrom:
      secretKeyRef:
        name: flyteruntimesecret

Would you like more details on setting up the environment or addressing this issue in another way?


@Tom-Newton
Contributor Author

The bot hasn't fully understood. The problem, at least for me, occurs only when launching an execution, so setting the environment variable in the pod template doesn't help. It also doesn't understand that it's unnecessary to initialise the filesystem object.

@thomasjpfan
Member

Closing as this was resolved in flyteorg/flytekit#2564
