[BUG] Using HashMethod annotation for numpy arrays fails #5630
Comments
Thank you for opening your first issue here! 🛠
The issue you're experiencing with using To address this, you might need to implement a custom transformer for numpy arrays that can handle the Would you like more details on how to implement a custom transformer for numpy arrays in Flytekit? Sources: This is an AI-generated response and your feedback is appreciated! Please leave a 👍 if this is helpful and 👎 if it is not. Mention @RunLLM to ask follow-up questions. See runllm.com for more! |
@RunLLM Sadly, most of your comment repeats my own arguments. Regarding a custom transformer: because Flyte already provides a special transformer for numpy arrays as well as the logic for a hash method, I think this error should be resolved on Flyte's side instead of asking users to fix it themselves with yet another custom transformer. But could you perhaps elaborate on why Flyte has a custom transformer for numpy arrays in the first place?
Flyte has a custom transformer for numpy arrays to efficiently handle their serialization and deserialization, ensuring compatibility with Flyte's type system and storage mechanisms. This transformer, The error you're encountering arises because the Would you like more details on how to potentially modify the existing
@RunLLM How does Flyte handle data structures with no specialized transformer, like pandas dataframes, compared to the customized handling of numpy arrays?
Flyte handles data structures without specialized transformers, like pandas dataframes, by using a more generic approach. For instance, pandas dataframes are typically serialized to a file format (e.g., Parquet) and stored in a remote location, with only the file path being stored as metadata. This is managed by the In contrast, numpy arrays have a specialized transformer, Would you like more details on how to potentially modify the existing
Describe the bug
Following your guide for caching tasks based on non-literal inputs, but simply replacing the dataframe in the guide with a numpy array, leads to an error:
I used the following hash function:
and used it as
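For readers following along, a content hash for an array-like value could be computed roughly as in the sketch below. This is an illustration under assumptions, not the function from this report: `hash_array_bytes` is a hypothetical name, and it uses only the standard library so that it runs without numpy or flytekit installed.

```python
import hashlib
from typing import Sequence


def hash_array_bytes(raw: bytes, shape: Sequence[int], dtype: str) -> str:
    """Return a hex digest covering the raw buffer plus shape and dtype.

    Hypothetical helper: mixing in shape and dtype keeps two arrays with
    identical bytes but a different layout from sharing a cache key.
    """
    digest = hashlib.sha256()
    # Hash the layout description first, then the data itself.
    digest.update(repr((tuple(shape), dtype)).encode("utf-8"))
    digest.update(raw)
    return digest.hexdigest()
```

With numpy, one would presumably call it as `hash_array_bytes(arr.tobytes(), arr.shape, str(arr.dtype))` and wrap that call in a one-argument function to hand to `HashMethod`.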
Expected behavior
The output of the bar task is cached successfully, and subsequent runs produce a cache hit.
Additional context to reproduce
I already stepped through the flytekit code with a debugger and found that it comes down to numpy arrays having special transformer logic in flytekit.
The hash value is produced successfully internally, but a few lines further down the code hits the to_literal function of NumpyArrayTransformer, which seems unable to handle annotated types. It seems pandas dataframes don't get this special treatment.
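One way a transformer could cope with an annotated type is to strip the `Annotated[...]` wrapper before doing its own type checks. The sketch below only illustrates that idea with the standard library. `HashMethodStub` and `split_annotated` are hypothetical names introduced here, and none of this is flytekit's actual code.

```python
from typing import Annotated, Any, Callable, Optional, Tuple, get_args, get_origin


class HashMethodStub:
    """Hypothetical stand-in for flytekit's HashMethod; not the real class."""

    def __init__(self, function: Callable[[Any], str]) -> None:
        self._function = function

    def calculate(self, obj: Any) -> str:
        return self._function(obj)


def split_annotated(python_type: Any) -> Tuple[Any, Optional[HashMethodStub]]:
    """Return (base_type, hash_method) for a plain or Annotated type."""
    if get_origin(python_type) is Annotated:
        # get_args yields the wrapped type first, then the metadata items.
        base_type, *metadata = get_args(python_type)
        for item in metadata:
            if isinstance(item, HashMethodStub):
                return base_type, item
        return base_type, None
    # Plain types pass through unchanged, with no hash method attached.
    return python_type, None
```

A transformer written this way would run its existing conversion on `base_type` and use the returned hash method, if any, only to compute the cache key.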
Maybe you could also clarify in this issue why we need or want special treatment for numpy arrays when the data is still moved to disk and only a file path is stored as metadata of the output. Isn't that how all compound data structures are handled by default, i.e. offloading them to disk and storing only a file pointer?
Screenshots
No response