Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#5201] feat(client-python): Implement expressions in python client #5646

Merged
merged 25 commits into from
Dec 5, 2024
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
bade12b
[5201]Implement expressions in python client
SophieTech88 Nov 22, 2024
3fc7400
Update the script for named_reference and function_expression
SophieTech88 Nov 23, 2024
00d31e8
Update the distribution.py
SophieTech88 Nov 24, 2024
c21bba5
Update literals.py
SophieTech88 Nov 24, 2024
361acbc
Update sorts.py
SophieTech88 Nov 24, 2024
07a8a87
Update the transforms.py
SophieTech88 Nov 25, 2024
2bc8983
Fix a f string bug for distributions.py
SophieTech88 Nov 25, 2024
4bac456
Update the comment for Partition class and so on
SophieTech88 Nov 25, 2024
40a14ee
improve model structure
xunliu Nov 25, 2024
2a25eb4
Update the script and fix a name bug
SophieTech88 Nov 26, 2024
3bc79ca
Update the Literal class to raise NotImplementedError
SophieTech88 Nov 26, 2024
9fe8d6c
remove Literals folder
xunliu Nov 26, 2024
cbb2a55
Add literals folder
xunliu Nov 26, 2024
624929e
Update licenses headers and format
SophieTech88 Nov 26, 2024
c3d8271
Remove distributions, sorts, and transforms to separate PRs
SophieTech88 Nov 26, 2024
6042765
Update the data type in literals.py
SophieTech88 Nov 27, 2024
45ae0c3
imporve type in the literal.py
xunliu Dec 2, 2024
88acd36
fix CI
xunliu Dec 2, 2024
2c0e316
delete Decimal
xunliu Dec 3, 2024
8a28e7d
fix CI
xunliu Dec 3, 2024
3750bbf
Update the typing for list and set
SophieTech88 Dec 4, 2024
82333a7
Fix CI: import List in literal.py
SophieTech88 Dec 4, 2024
89b56bf
remove pass
xunliu Dec 4, 2024
d3dc38a
remove List
xunliu Dec 4, 2024
c68c530
Revert "remove List"
xunliu Dec 4, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions clients/client-python/gravitino/api/expressions/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
51 changes: 51 additions & 0 deletions clients/client-python/gravitino/api/expressions/expression.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from __future__ import annotations
from abc import ABC, abstractmethod
from typing import List, Set, TYPE_CHECKING
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm new to this typing thing.
But according to Python docs, these alias are all deprecated in Python 3.9.
cf. https://docs.python.org/3/library/typing.html#deprecated-aliases

It might not be a good idea to use them, IIUC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. List, Set are deprecated from typing in Python 3.9. We can use list and set directly.


if TYPE_CHECKING:
from gravitino.api.expressions.named_reference import NamedReference


class Expression(ABC):
"""Base class of the public logical expression API."""

EMPTY_EXPRESSION: List[Expression] = []
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might cause unexpected behaviour if it will be modified at any moment (like append, delete, e.g.), because this list would share between classes, any modification would affect where it exists once and for all at the same time. Unless we're sure that the scenario above would not happen, I think it's better to use immutable object here or some other methods.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @noidname01
The EMPTY_EXPRESSION value does not fill in any data.
So, I think we didn't need to modify this code, what's do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, then overall LGTM!

"""
`EMPTY_EXPRESSION` is only used as an input when the default `children` method builds the result.
"""

EMPTY_NAMED_REFERENCE: List[NamedReference] = []
"""
`EMPTY_NAMED_REFERENCE` is only used as an input when the default `references` method builds
the result array to avoid repeatedly allocating an empty array.
"""

@abstractmethod
def children(self) -> List[Expression]:
"""Returns a list of the children of this node. Children should not change."""
pass

def references(self) -> List[NamedReference]:
"""Returns a list of fields or columns that are referenced by this expression."""

ref_set: Set[NamedReference] = set()
for child in self.children():
ref_set.update(child.references())
return list(ref_set)
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from __future__ import annotations
from abc import abstractmethod
from typing import List

from gravitino.api.expressions.expression import Expression


class FunctionExpression(Expression):
"""
The interface of a function expression. A function expression is an expression that takes a
function name and a list of arguments.
"""

@staticmethod
def of(function_name: str, *arguments: Expression) -> FuncExpressionImpl:
"""
Creates a new FunctionExpression with the given function name.
If no arguments are provided, it uses an empty expression.

:param function_name: The name of the function.
:param arguments: The arguments to the function (optional).
:return: The created FunctionExpression.
"""
arguments = list(arguments) if arguments else Expression.EMPTY_EXPRESSION
return FuncExpressionImpl(function_name, arguments)

@abstractmethod
def function_name(self) -> str:
"""Returns the function name."""
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As noted previously, please reconsider all these pass cases ...
do we really want to silently succeed here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, You are right, I will fix this problem.


@abstractmethod
def arguments(self) -> List[Expression]:
"""Returns the arguments passed to the function."""
pass

def children(self) -> List[Expression]:
"""Returns the arguments as children."""
return self.arguments()


class FuncExpressionImpl(FunctionExpression):
"""
A concrete implementation of the FunctionExpression interface.
"""

_function_name: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name as the property name rather than _function_name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly, given an instance impl of class FuncExpressionImpl,
the following calls are still pretty clear without the complex names:

   def __init__(self, name: str, arguments: List[Expression]):
      ....

   def name(self) -> str:
      ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for a member that is well scoped, I don't think we need a sophisticated name.
I'd suggest we use _name as the property name rather than _function_name.

and

similarly, given an instance impl of class FuncExpressionImpl,
the following calls are still pretty clear without the complex names:

hi @tengqm
Will want to keep Python code consistent with Java code, https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/rel/expressions/FunctionExpression.java#L66

_arguments: List[Expression]

def __init__(self, function_name: str, arguments: List[Expression]):
super().__init__()
self._function_name = function_name
self._arguments = arguments

def function_name(self) -> str:
return self._function_name

def arguments(self) -> List[Expression]:
return self._arguments

def __str__(self) -> str:
if not self._arguments:
return f"{self._function_name}()"
arguments_str = ", ".join(map(str, self._arguments))
return f"{self._function_name}({arguments_str})"

def __eq__(self, other: FuncExpressionImpl) -> bool:
if self is other:
return True
if other is None or self.__class__ is not other.__class__:
return False
return (
self._function_name == other.function_name()
and self._arguments == other.arguments()
)

def __hash__(self) -> int:
return hash((self._function_name, tuple(self._arguments)))
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from abc import abstractmethod
from typing import List, TypeVar, Generic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to kill 'List' here.


from gravitino.api.expressions.expression import Expression
from gravitino.api.types.type import Type

T = TypeVar("T")


class Literal(Generic[T], Expression):
"""
Represents a constant literal value in the public expression API.
"""

@abstractmethod
def value(self) -> T:
"""The literal value."""
raise NotImplementedError("Subclasses must implement the `value` method.")

@abstractmethod
def data_type(self) -> Type:
"""The data type of the literal."""
raise NotImplementedError("Subclasses must implement the `data_type` method.")

def children(self) -> List[Expression]:
return Expression.EMPTY_EXPRESSION
142 changes: 142 additions & 0 deletions clients/client-python/gravitino/api/expressions/literals/literals.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,142 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

from typing import TypeVar
from datetime import date, time, datetime

from gravitino.api.expressions.literals.literal import Literal
from gravitino.api.types.decimal import Decimal
from gravitino.api.types.type import Type
from gravitino.api.types.types import Types

T = TypeVar("T")


class LiteralImpl(Literal[T]):
"""Creates a literal with the given type value."""

_value: T
_data_type: Type

def __init__(
self,
value: T,
data_type: Type,
):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def __init__(
self,
value: T,
data_type: Type,
):
def __init__(self, value: T, data_type: Type):

self._value = value
self._data_type = data_type

def value(self) -> T:
return self._value

def data_type(self) -> Type:
return self._data_type

def __eq__(self, other: object) -> bool:
if not isinstance(other, LiteralImpl):
return False
return (self._value == other._value) and (self._data_type == other._data_type)

def __hash__(self):
return hash((self._value, self._data_type))

def __str__(self):
return f"LiteralImpl(value={self._value}, data_type={self._data_type})"


class Literals:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we already have a 'Literal' class in another file,
defining a new class called 'Literals' may confuse other developers/users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further more, it doesn't make a lot senses to me to define a class as a wrapper
for so many class methods.
Python is not a pure object-oriented language, after all.

Copy link
Member

@xunliu xunliu Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @tengqm I know what you mean.
We also refer to Iceberg's Python client code style.
https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/name_mapping.py
At present, there are not enough experienced Python engineers in the community, which is the result of the best efforts of the community contributors.
We keep the python client code consistent with the java code to ensure that the functionality is consistent.
So I think we didn't modify these codes.

"""The helper class to create literals to pass into Apache Gravitino."""

NULL = LiteralImpl(None, Types.NullType.get())

@staticmethod
def of(value: T, data_type: Type) -> Literal[T]:
return LiteralImpl(value, data_type)

@staticmethod
def boolean_literal(value: bool) -> LiteralImpl[bool]:
return LiteralImpl(value, Types.BooleanType.get())

@staticmethod
def byte_literal(value: str) -> LiteralImpl[str]:
return LiteralImpl(value, Types.ByteType.get())

@staticmethod
def unsigned_byte_literal(value: str) -> LiteralImpl[str]:
return LiteralImpl(value, Types.ByteType.unsigned())

@staticmethod
def short_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.ShortType.get())

@staticmethod
def unsigned_short_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.ShortType.unsigned())

@staticmethod
def integer_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.IntegerType.get())

@staticmethod
def unsigned_integer_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.IntegerType.unsigned())

@staticmethod
def long_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.LongType.get())

@staticmethod
def unsigned_long_literal(value: int) -> LiteralImpl[int]:
return LiteralImpl(value, Types.LongType.unsigned())

@staticmethod
def float_literal(value: float) -> LiteralImpl[float]:
return LiteralImpl(value, Types.FloatType.get())

@staticmethod
def double_literal(value: float) -> LiteralImpl[float]:
return LiteralImpl(value, Types.DoubleType.get())

@staticmethod
def decimal_literal(value: Decimal) -> LiteralImpl[Decimal]:
return LiteralImpl(
value, Types.DecimalType.of(value.precision(), value.scale())
)

@staticmethod
def date_literal(value: date) -> Literal[date]:
return LiteralImpl(value, Types.DateType.get())

@staticmethod
def time_literal(value: time) -> Literal[time]:
return Literals.of(value, Types.TimeType.get())

@staticmethod
def timestamp_literal(value: datetime) -> Literal[datetime]:
return Literals.of(value, Types.TimestampType.without_time_zone())

@staticmethod
def timestamp_literal_from_string(value: str) -> Literal[datetime]:
return Literals.timestamp_literal(datetime.fromisoformat(value))

@staticmethod
def string_literal(value: str) -> Literal[str]:
return LiteralImpl(value, Types.StringType.get())

@staticmethod
def varchar_literal(length: int, value: str) -> Literal[str]:
return LiteralImpl(value, Types.VarCharType.of(length))
Loading