providing validation for specialized FlyteFormat types #3834

Closed
2 tasks done
Tracked by #4064
zeryx opened this issue Jul 5, 2023 · 4 comments · Fixed by flyteorg/flytekit#1893
Labels
enhancement New feature or request flytekit FlyteKit Python related issue good first issue Good for newcomers hacktoberfest

Comments

@zeryx

zeryx commented Jul 5, 2023

Motivation: Why do you think this is important?

Today, specialized file types in Flyte are not validated against the requested type; only the file type metadata is passed to S3. This means I could store a PNG image as a JPEGImageFile object, which could confuse end users who expect JPEGImageFile values to be validated.

Goal: What should the final outcome look like, ideally?

Ideally, there should be a runtime mechanism that validates that a file actually matches the requested format. If I had a PDFFile type, I would expect to be able to ensure that the serialized file really is a PDF.

Describe alternatives you've considered

https://en.wikipedia.org/wiki/File_(command)

I have previously used this in other projects to verify image format and file type.

Checking file extensions is also a potential mechanism for validation (albeit a much worse one).
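
To illustrate the difference between the two alternatives, here is a minimal sketch of content-based detection using only the Python standard library. It compares a file's leading bytes against a few well-known signatures (PNG, JPEG, PDF), which is a heavily simplified stand-in for what file does. The function names and the small signature table are hypothetical and are not part of Flyte.

```python
import os
import tempfile

# A tiny, illustrative signature table; file/libmagic recognize far more formats.
SIGNATURES = {
    "png": b"\x89PNG\r\n\x1a\n",
    "jpeg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
}


def sniff_format(path):
    """Return the format whose signature matches the file's first bytes, or None."""
    with open(path, "rb") as f:
        header = f.read(16)
    for name, signature in SIGNATURES.items():
        if header.startswith(signature):
            return name
    return None


def extension_format(path):
    """Guess the format from the extension alone (the weaker check)."""
    ext = os.path.splitext(path)[1].lower().lstrip(".")
    # Treat .jpg as an alias for jpeg; an empty extension yields None.
    return {"jpg": "jpeg"}.get(ext, ext) or None


# Demo: PNG bytes saved under a .jpeg name.
with tempfile.TemporaryDirectory() as tmp:
    mislabeled = os.path.join(tmp, "image.jpeg")
    with open(mislabeled, "wb") as f:
        f.write(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
    by_extension = extension_format(mislabeled)
    by_content = sniff_format(mislabeled)
```

In the demo, the extension check accepts the mislabeled file as a JPEG, while the content check identifies it as a PNG, which is the mismatch this issue asks Flyte to catch.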

Propose: Link/Inline OR Additional context

Example of a workflow that succeeds (when it shouldn't):
https://gist.github.com/zeryx/31266bbe21d4dcfeca9f1b0e7dc3a883

Are you sure this issue hasn't been raised already?

  • Yes

Have you read the Code of Conduct?

  • Yes

@zeryx added the enhancement label on Jul 5, 2023
@eapolinario
Contributor

The functionality provided by the file utility is available in a library called libmagic, which has Python bindings.

We could hook python-libmagic up to the aliases exported by the FlyteFile type transformers, more specifically in to_literal.
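
As a rough sketch of what such a check might look like, the helper below compares the MIME type implied by a format alias (for example "jpeg") with the MIME type detected from the file's content. This is an assumption-laden illustration, not the flytekit implementation: it assumes the python-magic binding, whose magic.from_file(path, mime=True) call returns a MIME string (other libmagic bindings expose a different API), and the names validate_file_format and detect_mime_with_libmagic are hypothetical. How a transformer's to_literal would call it is deliberately left out.

```python
import mimetypes


def detect_mime_with_libmagic(path):
    """Detect the MIME type from file content.

    Assumes the third-party python-magic package (import name `magic`)
    and the libmagic shared library are installed.
    """
    import magic  # imported lazily so the rest of the sketch runs without it

    return magic.from_file(path, mime=True)


def validate_file_format(path, extension, detect=detect_mime_with_libmagic):
    """Raise ValueError if the content does not match the format implied by `extension`."""
    expected, _ = mimetypes.guess_type("file." + extension)
    if expected is None:
        return  # unknown alias: nothing to compare against, so skip validation
    actual = detect(path)
    if actual != expected:
        raise ValueError(
            f"Expected {expected} content, but {path} looks like {actual}"
        )
```

The detector is passed in as a parameter so the comparison logic can be exercised without libmagic, for example with detect=lambda p: "image/png" in a unit test.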

@eapolinario added the good first issue and flytekit labels on Jul 7, 2023

@hhcs9527

Hi @eapolinario,
I am interested in this topic but still new to Flyte.
Can I try this issue?

@eapolinario
Contributor

@hhcs9527, for sure. I'm going to assign it to you. Feel free to ping me if you need to discuss it further or want a PR reviewed. Thanks!

@jasonlai1218
Contributor

jasonlai1218 commented Oct 10, 2023

@hhcs9527 @pingsutw @eapolinario
I would like to work on this. Could you please assign it to me and give me a chance to participate in open source?


5 participants