Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5201] feat(client-python): Implement expressions in python client #5646

Merged
merged 25 commits into from
Dec 5, 2024

Conversation

SophieTech88
Copy link
Contributor

@SophieTech88 SophieTech88 commented Nov 22, 2024

What changes were proposed in this pull request?

Implement expression from java, including:

  • Expression.java
  • FunctionExpression.java
  • NamedReference.java
  • UnparsedExpression.java
  • literals/

convert to python client, and add unit test for each class.

Why are the changes needed?

We need to support the expressions in python client

Fix: #5201

Does this PR introduce any user-facing change?

No

How was this patch tested?

Need to pass all unit tests.

@SophieTech88 SophieTech88 marked this pull request as draft November 22, 2024 06:24
@jerryshao
Copy link
Contributor

@mchades @xunliu would you please also help to review this PR?

@SophieTech88 SophieTech88 marked this pull request as ready for review November 25, 2024 05:21
@xunliu
Copy link
Member

xunliu commented Nov 25, 2024

hi @SophieTech88 I will help you improve this PR.
Please review my commit.

Copy link
Contributor

@tengqm tengqm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we split this into a few smaller PRs.

pass

@abstractmethod
def data_type(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like that the above two methods are be implemented by subclasses anyway.
If that is true, I don't think we we do a pass here.
We may want to raise an exception instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Just updated the code to raise NotImplementedError() for those 2 functions.

@xunliu xunliu changed the title [5201]Implement expressions in python client [5201] feat(client-python): Implement expressions in python client Nov 26, 2024
Copy link
Contributor

@mchades mchades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest we split this into a few smaller PRs.

Agree with @tengqm, the current PR is too large to review.

As this PR is focused on expressions, I suggest moving distributions, sorts, and transforms to separate PRs. This will make it easier for us to review this PR. WDYT? @SophieTech88

@SophieTech88
Copy link
Contributor Author

I'd suggest we split this into a few smaller PRs.

Agree with @tengqm, the current PR is too large to review.

As this PR is focused on expressions, I suggest moving distributions, sorts, and transforms to separate PRs. This will make it easier for us to review this PR. WDYT? @SophieTech88

Of course. Just update the PR, removed distributions, sorts, and transforms .


_value: Union[int, float, str, datetime, time, date, bool, Decimal, None]
_data_type: (
str # TODO: Need implement `api/src/main/java/org/apache/gravitino/rel/types`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you already implemented types clients/client-python/gravitino/api/types.py and type clients/client-python/gravitino/api/type.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Yeah. Do I need to import and list all types in this _data_type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just update the _data_type in the new commit

"""Creates a literal with the given type value."""

_value: Union[int, float, Decimal, str, date, time, datetime, bool, None]
_data_type: Union[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems should be _data_type: Type? @xunliu cc

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I improve this part codes
@SophieTech88 @noidname01 Please help me review, thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes for the types things, LGTM. Many thanks!

class Expression(ABC):
"""Base class of the public logical expression API."""

EMPTY_EXPRESSION: List[Expression] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause unexpected behaviour if it will be modified at any moment (like append, delete, e.g.), because this list would share between classes, any modification would affect where it exists once and for all at the same time. Unless we're sure that the scenario above would not happen, I think it's better to use immutable object here or some other methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @noidname01
The EMPTY_EXPRESSION value does not fill in any data.
So, I think we didn't need to modify this code, what's do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, then overall LGTM!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unnecessary since #5354 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mchades I removed Decimal type. Please help me review again, Thanks.

mchades
mchades previously approved these changes Dec 3, 2024
Copy link
Contributor

@mchades mchades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. Do you want to have another look? @tengqm

@jerryshao jerryshao changed the title [5201] feat(client-python): Implement expressions in python client [#5201] feat(client-python): Implement expressions in python client Dec 3, 2024
@jerryshao
Copy link
Contributor

LGTM.

@xunliu
Copy link
Member

xunliu commented Dec 4, 2024

hi @tengqm
Do you want to have another look?

A concrete implementation of the FunctionExpression interface.
"""

_function_name: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name as the property name rather than _function_name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, given an instance impl of class FuncExpressionImpl,
the following calls are still pretty clear without the complex names:

   def __init__(self, name: str, arguments: List[Expression]):
      ....

   def name(self) -> str:
      ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name as the property name rather than _function_name.

and

similarly, given an instance impl of class FuncExpressionImpl,
the following calls are still pretty clear without the complex names:

hi @tengqm
Will want to keep Python code consistent with Java code, https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/rel/expressions/FunctionExpression.java#L66

@abstractmethod
def function_name(self) -> str:
"""Returns the function name."""
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted previously, please reconsider all these pass cases ...
do we really want to silently succeed here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, You are right, I will fix this problem.

Comment on lines 34 to 38
def __init__(
self,
value: T,
data_type: Type,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __init__(
self,
value: T,
data_type: Type,
):
def __init__(self, value: T, data_type: Type):

@@ -123,7 +123,7 @@ def get(cls) -> "ShortType":
return cls(True)

@classmethod
def unsigned(cls):
def unsigned(cls) -> "ShortType":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help share with me what this syntax is?
I mean ... why the return type is quoted?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    class ShortType(IntegralType):
        _instance: "ShortType" = None
        _unsigned_instance: "ShortType" = None

The type annotation ShortType is quoted because _instance is an attribute of the class that references ShortType, and the class itself isn't fully defined at the time this attribute is declared.

The return type annotations (-> "ShortType") are also quoted for the same reason: ShortType isn't fully defined when these methods are declared.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove the quote, there is a Pylance errors, and it won't pass the pylint check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Em... I am not a big fan of passing lint checking simply for passing's purpose.
At the end of the day, we are the maintainers to maintain these code.
If lint rules are not appropriate in our context, we may want to disable that
particular rule. It is not an uncommon practice.

At the same time, if we are forced to "hack" the code to make the linter happy,
we may need to revise our design. Probably there are more straightforward ways
to achieve the same end.


from __future__ import annotations
from abc import ABC, abstractmethod
from typing import List, Set, TYPE_CHECKING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to this typing thing.
But according to Python docs, these alias are all deprecated in Python 3.9.
cf. https://docs.python.org/3/library/typing.html#deprecated-aliases

It might not be a good idea to use them, IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. List, Set are deprecated from typing in Python 3.9. We can use list and set directly.

@SophieTech88
Copy link
Contributor Author

The Integration Test failure is related to the List typing things in literal.py file.

@xunliu
Copy link
Member

xunliu commented Dec 4, 2024

hi @tengqm We are improving PR base on your comments,
Please help me review it again, thanks!

# under the License.

from abc import abstractmethod
from typing import List, TypeVar, Generic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to kill 'List' here.

return f"LiteralImpl(value={self._value}, data_type={self._data_type})"


class Literals:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have a 'Literal' class in another file,
defining a new class called 'Literals' may confuse other developers/users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further more, it doesn't make a lot senses to me to define a class as a wrapper
for so many class methods.
Python is not a pure object-oriented language, after all.

Copy link
Member

@xunliu xunliu Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @tengqm I know what you mean.
We also refer to Iceberg's Python client code style.
https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/name_mapping.py
At present, there are not enough experienced Python engineers in the community, which is the result of the best efforts of the community contributors.
We keep the python client code consistent with the java code to ensure that the functionality is consistent.
So I think we didn't modify these codes.

@xunliu
Copy link
Member

xunliu commented Dec 4, 2024

When I use list to replace List.
The TestLiterals throw these errors.

The error TypeError: 'type' object is not subscriptable occurs because the list type is not subscriptable in Python versions earlier than 3.9. To fix this, you should use List from the typing module instead of list.

@tengqm @noidname01 Do you have any suggestions to fix this problem? I rollback this change??

@xunliu
Copy link
Member

xunliu commented Dec 4, 2024

hi @tengqm
Gravitino Python client went to support from 3.8 to 3.11, So maybe we must use typing.List type to support low Python version.

@tengqm
Copy link
Contributor

tengqm commented Dec 4, 2024

hi @tengqm Gravitino Python client went to support from 3.7 to 3.10, So maybe we must use typing.List type to support low Python version.

In that case, we have to stick to the old way.
3.7-3.10 is a large range anyway.

@tengqm
Copy link
Contributor

tengqm commented Dec 4, 2024

Just check the github workflow for this project:

https://github.com/apache/gravitino/blob/main/.github/workflows/python-integration-test.yml#L76

Looks like we are testing against 3.8, 3.9, 3.10, 3.11.

This reverts commit d3dc38a.
@xunliu
Copy link
Member

xunliu commented Dec 4, 2024

hi @tengqm :-)
I revert remove List commit, Please help me review it again, Thanks!

@tengqm
Copy link
Contributor

tengqm commented Dec 4, 2024

lgtm

Copy link
Contributor

@mchades mchades left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am merging this into the main branch. Thank you all for your efforts on this PR!
@SophieTech88 @tengqm @noidname01 @xunliu

@mchades mchades merged commit 10fc5cd into apache:main Dec 5, 2024
23 checks passed
jerryshao pushed a commit that referenced this pull request Dec 6, 2024
…5646)

### What changes were proposed in this pull request?
Implement expression from java, including:
- Expression.java
- FunctionExpression.java
- NamedReference.java
- UnparsedExpression.java
- literals/

convert to python client, and add unit test for each class.

### Why are the changes needed?

We need to support the expressions in python client

Fix: #5201

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Need to pass all unit tests.

---------

Co-authored-by: Xun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Support Expression system
6 participants