-
Notifications
You must be signed in to change notification settings - Fork 397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[#5201] feat(client-python): Implement expressions in python client #5646
Conversation
clients/client-python/gravitino/api/expressions/named_reference.py
Outdated
Show resolved
Hide resolved
clients/client-python/gravitino/api/expressions/named_reference.py
Outdated
Show resolved
Hide resolved
clients/client-python/gravitino/api/expressions/function_expression.py
Outdated
Show resolved
Hide resolved
hi @SophieTech88 I will help you improve this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest we split this into a few smaller PRs.
pass | ||
|
||
@abstractmethod | ||
def data_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like that the above two methods are be implemented by subclasses anyway.
If that is true, I don't think we we do a pass
here.
We may want to raise an exception instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree. Just updated the code to raise NotImplementedError() for those 2 functions.
clients/client-python/gravitino/api/expressions/Literals/literals.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest we split this into a few smaller PRs.
Agree with @tengqm, the current PR is too large to review.
As this PR is focused on expressions, I suggest moving distributions
, sorts
, and transforms
to separate PRs. This will make it easier for us to review this PR. WDYT? @SophieTech88
Of course. Just update the PR, removed |
|
||
_value: Union[int, float, str, datetime, time, date, bool, Decimal, None] | ||
_data_type: ( | ||
str # TODO: Need implement `api/src/main/java/org/apache/gravitino/rel/types` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you already implemented types clients/client-python/gravitino/api/types.py
and type clients/client-python/gravitino/api/type.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Yeah. Do I need to import and list all types in this _data_type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just update the _data_type in the new commit
"""Creates a literal with the given type value.""" | ||
|
||
_value: Union[int, float, Decimal, str, date, time, datetime, bool, None] | ||
_data_type: Union[ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems should be _data_type: Type
? @xunliu cc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I improve this part codes
@SophieTech88 @noidname01 Please help me review, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes for the types things, LGTM. Many thanks!
class Expression(ABC): | ||
"""Base class of the public logical expression API.""" | ||
|
||
EMPTY_EXPRESSION: List[Expression] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might cause unexpected behaviour if it will be modified at any moment (like append, delete, e.g.), because this list would share between classes, any modification would affect where it exists once and for all at the same time. Unless we're sure that the scenario above would not happen, I think it's better to use immutable object here or some other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @noidname01
The EMPTY_EXPRESSION
value does not fill in any data.
So, I think we didn't need to modify this code, what's do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, then overall LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unnecessary since #5354 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mchades I removed Decimal
type. Please help me review again, Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Do you want to have another look? @tengqm
LGTM. |
hi @tengqm |
A concrete implementation of the FunctionExpression interface. | ||
""" | ||
|
||
_function_name: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name
as the property name rather than _function_name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, given an instance impl
of class FuncExpressionImpl
,
the following calls are still pretty clear without the complex names:
def __init__(self, name: str, arguments: List[Expression]):
....
def name(self) -> str:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name as the property name rather than _function_name.
and
similarly, given an instance impl of class FuncExpressionImpl,
the following calls are still pretty clear without the complex names:
hi @tengqm
Will want to keep Python code consistent with Java code, https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/rel/expressions/FunctionExpression.java#L66
@abstractmethod | ||
def function_name(self) -> str: | ||
"""Returns the function name.""" | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted previously, please reconsider all these pass
cases ...
do we really want to silently succeed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, You are right, I will fix this problem.
def __init__( | ||
self, | ||
value: T, | ||
data_type: Type, | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def __init__( | |
self, | |
value: T, | |
data_type: Type, | |
): | |
def __init__(self, value: T, data_type: Type): |
clients/client-python/gravitino/api/expressions/named_reference.py
Outdated
Show resolved
Hide resolved
@@ -123,7 +123,7 @@ def get(cls) -> "ShortType": | |||
return cls(True) | |||
|
|||
@classmethod | |||
def unsigned(cls): | |||
def unsigned(cls) -> "ShortType": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help share with me what this syntax is?
I mean ... why the return type is quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class ShortType(IntegralType):
_instance: "ShortType" = None
_unsigned_instance: "ShortType" = None
The type annotation ShortType is quoted because _instance
is an attribute of the class that references ShortType, and the class itself isn't fully defined at the time this attribute is declared.
The return type annotations (-> "ShortType") are also quoted for the same reason: ShortType
isn't fully defined when these methods are declared.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we remove the quote, there is a Pylance errors, and it won't pass the pylint check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Em... I am not a big fan of passing lint checking simply for passing's purpose.
At the end of the day, we are the maintainers to maintain these code.
If lint rules are not appropriate in our context, we may want to disable that
particular rule. It is not an uncommon practice.
At the same time, if we are forced to "hack" the code to make the linter happy,
we may need to revise our design. Probably there are more straightforward ways
to achieve the same end.
|
||
from __future__ import annotations | ||
from abc import ABC, abstractmethod | ||
from typing import List, Set, TYPE_CHECKING |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm new to this typing
thing.
But according to Python docs, these alias are all deprecated in Python 3.9.
cf. https://docs.python.org/3/library/typing.html#deprecated-aliases
It might not be a good idea to use them, IIUC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. List, Set are deprecated from typing in Python 3.9. We can use list and set directly.
The Integration Test failure is related to the List typing things in literal.py file. |
hi @tengqm We are improving PR base on your comments, |
# under the License. | ||
|
||
from abc import abstractmethod | ||
from typing import List, TypeVar, Generic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to kill 'List' here.
return f"LiteralImpl(value={self._value}, data_type={self._data_type})" | ||
|
||
|
||
class Literals: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already have a 'Literal' class in another file,
defining a new class called 'Literals' may confuse other developers/users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further more, it doesn't make a lot senses to me to define a class as a wrapper
for so many class methods.
Python is not a pure object-oriented language, after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hi @tengqm I know what you mean.
We also refer to Iceberg's Python client code style.
https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/name_mapping.py
At present, there are not enough experienced Python engineers in the community, which is the result of the best efforts of the community contributors.
We keep the python client code consistent with the java code to ensure that the functionality is consistent.
So I think we didn't modify these codes.
When I use
@tengqm @noidname01 Do you have any suggestions to fix this problem? I rollback this change?? |
hi @tengqm |
In that case, we have to stick to the old way. |
Just check the github workflow for this project: https://github.com/apache/gravitino/blob/main/.github/workflows/python-integration-test.yml#L76 Looks like we are testing against 3.8, 3.9, 3.10, 3.11. |
This reverts commit d3dc38a.
hi @tengqm :-) |
lgtm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am merging this into the main branch. Thank you all for your efforts on this PR!
@SophieTech88 @tengqm @noidname01 @xunliu
…5646) ### What changes were proposed in this pull request? Implement expression from java, including: - Expression.java - FunctionExpression.java - NamedReference.java - UnparsedExpression.java - literals/ convert to python client, and add unit test for each class. ### Why are the changes needed? We need to support the expressions in python client Fix: #5201 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Need to pass all unit tests. --------- Co-authored-by: Xun <[email protected]>
What changes were proposed in this pull request?
Implement expression from java, including:
convert to python client, and add unit test for each class.
Why are the changes needed?
We need to support the expressions in python client
Fix: #5201
Does this PR introduce any user-facing change?
No
How was this patch tested?
Need to pass all unit tests.