Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow a hash method to be present for numpy arrays #2649

Merged

Conversation

demmerichs
Copy link
Contributor

Tracking issue

Closes flyteorg/flyte#5630

Why are the changes needed?

See above issue.

What changes were proposed in this pull request?

Code diff is minimal, changes were made in a consistent way, following code from structured_dataset.py, specifically the extract_cols_and_format function.

How was this patch tested?

I only tested this with my custom setup, modifying the files in the site-packages in place. I'll leave proper testing up to the devs.

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Copy link

welcome bot commented Aug 3, 2024

Thank you for opening this pull request! 🙌

These tips will help get your PR across the finish line:

  • Most of the repos have a PR template; if not, fill it out to the best of your knowledge.
  • Sign off your commits (Reference: DCO Guide).

elif isinstance(aa, HashMethod):
continue
else:
raise TypeTransformerFailedError(f"{t}'s metadata needs to be of type kwtypes.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise TypeTransformerFailedError(f"{t}'s metadata needs to be of type kwtypes.")
raise TypeTransformerFailedError(f"The metadata for {t} must be of type kwtypes or HashMethod.")

shouldn't you also import HashMethod from flytekit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot the import in the PR, probably the reason for some (but definitely not all) test fails.

@samhita-alla Quick question here, you added HashMethod in error message, but how does kwtypes relate to the code, where the check looks for an OrderedDict, not kwtypes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right, flytekit's kwtypes is an ordered dict. let's replace kwtypes with ordereddict.

@samhita-alla
Copy link
Contributor

samhita-alla commented Aug 5, 2024

could you add some test cases to the https://github.com/flyteorg/flytekit/blob/master/tests/flytekit/unit/types/numpy/test_ndarray.py file? looks like the existing test cases are failing. would you mind taking a look at the failures as well?

@demmerichs
Copy link
Contributor Author

demmerichs commented Aug 5, 2024

@samhita-alla I added some tests, and I guess numpy UTs were failing because my check for an existing metadata was flawed.
Locally I ran the tests and had no fails in the numpy types test file, however 14 other tests failed, probably because I have not the exact setup (remote tests etc.).
Please let me know if you think this PR can be further improved.

FYI: Running make lint introduced some white space changes in ndarray.py, I committed them?!

@wild-endeavor
Copy link
Contributor

i think this error is just because of error formatting between the different python versions. i can repro locally using 3.12. a possibly heavyhanded patch would be something like

diff --git a/tests/flytekit/unit/types/numpy/test_ndarray.py b/tests/flytekit/unit/types/numpy/test_ndarray.py
index 3a75a65cf..92e57eb37 100644
--- a/tests/flytekit/unit/types/numpy/test_ndarray.py
+++ b/tests/flytekit/unit/types/numpy/test_ndarray.py
@@ -3,6 +3,7 @@ from typing_extensions import Annotated
 
 from flytekit import HashMethod, kwtypes, task, workflow
 from flytekit.core.type_engine import TypeTransformerFailedError
+import re
 
 
 @task
@@ -106,17 +107,19 @@ def wf():
     try:
         t6_annotate_kwtypes_twice(array=array_1d)
     except TypeTransformerFailedError as err:
-        assert (
-            str(err).split("\n")[-1].strip()
-            == "Metadata OrderedDict([('allow_pickle', True)]) is already specified, cannot use OrderedDict([('allow_pickle', False)])."
-        )
+        err_txt = str(err).split("\n")[-1].strip()
+        err_txt = re.sub("\W", "", err_txt)
+        expected = "Metadata OrderedDict([('allow_pickle', True)]) is already specified, cannot use OrderedDict([('allow_pickle', False)])."
+        expected = re.sub("\W", "", expected)
+        assert err_txt == expected
     try:
         t7_annotate_with_sth_strange(array=array_1d)
     except TypeTransformerFailedError as err:
-        assert (
-            str(err).split("\n")[-1].strip()
-            == "The metadata for typing.Annotated[numpy.ndarray, (1, 2, 3)] must be of type kwtypes or HashMethod."
-        )
+        err_txt = str(err).split("\n")[-1].strip()
+        err_txt = re.sub("\W", "", err_txt)
+        expected = "The metadata for typing.Annotated[numpy.ndarray, (1, 2, 3)] must be of type kwtypes or HashMethod."
+        expected = re.sub("\W", "", expected)
+        assert err_txt == expected
     try:
         generate_numpy_fails()
     except Exception as e:

@wild-endeavor
Copy link
Contributor

also the error matching doesn't need to be that precise. it's okay just to get a few keywords in there.

@demmerichs
Copy link
Contributor Author

only the formating of OrderedDict changed a bit in python 3.12

I repleaced the exact formatting with a regex expression. This should now fix it.

Copy link

codecov bot commented Aug 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.62%. Comparing base (72da0d0) to head (db4a59e).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2649      +/-   ##
==========================================
- Coverage   78.77%   74.62%   -4.15%     
==========================================
  Files         187      187              
  Lines       19149    19160      +11     
  Branches     3993     3997       +4     
==========================================
- Hits        15085    14299     -786     
- Misses       3374     4156     +782     
- Partials      690      705      +15     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wild-endeavor wild-endeavor merged commit 9321bc2 into flyteorg:master Aug 6, 2024
97 of 99 checks passed
Copy link

welcome bot commented Aug 6, 2024

Congrats on merging your first pull request! 🎉

mao3267 pushed a commit to mao3267/flytekit that referenced this pull request Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Using HashMethod annotation for numpy arrays fails
3 participants