Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented remaining Python builtins #268

Merged
merged 5 commits into from
Feb 21, 2025
Merged

Implemented remaining Python builtins #268

merged 5 commits into from
Feb 21, 2025

Conversation

vineetg3
Copy link
Contributor

Resolves #259

Please check issue for more details.

Few differences from the issues details:

__iter__ : The __iter__ method is not implemented for UnqualifiedNode because doing so would make the node iterable. This behavior could conflict with or disrupt the logic in other components, such as Top_K, which rely on specific handling of non-iterable structures. Implementing __iter__ would introduce unintended behaviors and break the integrity of the system.

The __format__: The __format__ method is not implemented which intentionally does not raise an error because its use interferes with the logic for printing and logging. Additionally, the fallback to __str__ and __repr__ provides sufficiently clear and concise representations, making a custom __format__ unnecessary.

@vineetg3 vineetg3 self-assigned this Feb 15, 2025
Copy link
Contributor

@knassre-bodo knassre-bodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great Vineet! Just left a bunch of clarity/nitpick comments to address, then I'll approve.

Also, can you add a new comment to the issue #259 explaining your deviations described in the top of this PR? These are good details to document in the original issue.

Comment on lines 609 to 611
def bad_floor_1():
# Using `math.floor` (calls __floor__)
return Customer(age=math.floor(order.total_price))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor all of these examples to match the format of other examples in the file (just a single line of code, not a function with a return statement)

@@ -54,6 +54,18 @@ Below is the list of every function/operator currently supported in PyDough as a
- [Window Functions](#window-functions)
* [RANKING](#ranking)
* [PERCENTILE](#percentile)
- [Unsupported Magic Methods](#unsupported-magic-methods)
* [FLOOR](#floor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change all of these to the same format as the magic methods: FLOOR -> __floor__

* [INDEX](#index)
* [LEN](#len)
* [CONTAINS](#contains)
* [SETITEM](#setitem)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include any other unsupported ones (__call__ is only for function calls, __bool__ is banned for x and y)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we be supporting both .CALCULATE() and conventional syntax of just ()? In that case I'll mention that.

@@ -580,3 +596,164 @@ Customers.WHERE(PERCENTILE(by=acctbal.ASC(), n_buckets=1000) == 1000)
# For every region, find the top 5% of customers with the highest account balances.
Regions.nations.customers.WHERE(PERCENTILE(by=acctbal.ASC(), levels=2) > 95)
```
## Unsupported Magic Methods

Below is a list of magic methods that are not supported in PyDough. Calling these methods will result in an Exception.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic methods is not a terminology that our users will likely understand, since we are not targeting advanced Python developers. We should simplify this to just be about banned Python logic.

<!-- TOC --><a name="floor"></a>
### FLOOR

The `math.floor` function calls the `__floor__` magic method, which is not supported in PyDough.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not CURRENTLY supported

<!-- TOC --><a name="contains"></a>
### CONTAINS

Using the `in` operator calls the `__contains__` magic method, which is not supported in PyDough. This operation is not allowed because the implementation has to return a boolean instead of a PyDough object.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest using ISIN instead

@@ -124,6 +124,10 @@ def __getitem__(self, key):
args: MutableSequence[UnqualifiedNode] = [self]
for arg in (key.start, key.stop, key.step):
coerced_elem = UnqualifiedNode.coerce_to_unqualified(arg)
if not isinstance(coerced_elem, UnqualifiedLiteral):
raise PyDoughUnqualifiedException(
"PyDough objects cannot be used as indices in Python slices."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: "are not currently supported"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we going to support something like this: Customers[:orders] ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, no. Eventually, 🤷?

Comment on lines 271 to 274
if n is None:
raise PyDoughUnqualifiedException(
"PyDough requires a specific number of decimal places for rounding."
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just change this so the default is 0, like the normal Python behavior.

@@ -73,3 +75,71 @@ def bad_slice_3():
def bad_slice_4():
# Unsupported slicing: reversed
return Customers(name[::-1])


def bad_floor_1():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: don't use _1 if it is the only one of its kind.

Comment on lines 394 to 395
def abs_round_magic_method():
return DailyPrices(abs_low=abs(low), round_low=round(low, 2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test round(low)

Copy link
Contributor

@knassre-bodo knassre-bodo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more revision suggestions including some nitpicks, but feel free to merge once you feel they are addressed.

Using an object in a boolean context calls the `__nonzero__` magic method, which is not supported in PyDough. This operation is not allowed because the implementation has to return an integer instead of a PyDough object.

```py
Orders(discount=bool(order.total_price))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: lets change this to something where bool makes sense:

Lineitems(is_taxed=bool(tax))

Comment on lines 676 to 683
# Use & instead of and
Customers.WHERE((acctbal > 0) & (nation.name == "GERMANY"))

# Use | instead of or
Orders.WHERE((discount > 0.05) | (tax > 0.08))

# Use ~ instead of not
Parts.WHERE(~(retail_price > 1000))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have all of these do the bad thing (and, or, and not, then suggest alternative in the comments)

The `__bool__` magic method is not supported in PyDough. PyDough code cannot be treated as booleans. Instead of using `or`, `and`, and `not`, use `|`, `&`, and `~` for logical operations. Check out the [Logical](#logical) section for more information.

```py
# Not allowed - will raise PyDoughUnqualifiedException
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just say this isn't allowed

Comment on lines 690 to 696
The `__call__` magic method is not supported in PyDough, but calling an object will not throw an error. Instead, it internally evaluates to the `CALC` property.

```py
# These are equivalent:
Customers(name)
Customers.__call__(name)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make the examples more consistent & stick with one pattern:

  • Show a bad behavior with regular Python code (not the magi method)
  • Include a comment saying it is banned, and why

E.g. for this one it would be:

# Not allowed - calls PyDough code as if it were a function
(1 - discount)(extended_price * 0.5)

The `reversed` function calls the `__reversed__` magic method, which is currently not supported in PyDough.

```py
Orders(reversed(order_key))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets change this to a string so it makes sense, e.g. Regions(backwards_name=reversed(name))

### ROUND

The `ROUND` function rounds its first argument to the precision of its second argument. The rounding rules used depend on the database's round function.
The `ROUND` function rounds its first argument to the precision of its second argument. The rounding rules used depend on the database's round function. The `round()` magic method is also evaluated to the same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to "The Python builtin round() function can also be used to accomplish the same thing"

### ABS

The `ABS` function returns the absolute value of its input.
The `ABS` function returns the absolute value of its input. The `abs()` magic method is also evaluated to the same.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to "The Python builtin abs() function can also be used to accomplish the same thing."

Comment on lines 473 to 479
Note: The default precision for `round` magic method is 0, to be in alignment with the Python implementation. The pydough `ROUND` function requires the precision to be specified.

```py
# This is legal.
Parts(rounded_price = round(retail_price))
# This is illegal as precision is not specified.
Parts(rounded_price = ROUND(retail_price))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to open a Github issue to fix this behavior.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, #276 tackles this.

Parts(rounded_price = round(retail_price, 1))
```

Note: The default precision for `round` magic method is 0, to be in alignment with the Python implementation. The pydough `ROUND` function requires the precision to be specified.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: the PyDough ROUND function..

Also, refer to round as a function instead of a magic method.


### \_\_contains\_\_

Using the `in` operator calls the `__contains__` magic method, which is not supported in PyDough. This operation is not allowed because the implementation has to return a boolean instead of a PyDough object. Instead, usage of [ISIN](#isin) function is recommended.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT on the last part, change to several recommendations phrased as follows:
"If you need to check if a string is inside another substring, use CONTAINS. If you need to check if an expression is a member of a list of literals, use ISIN."

@vineetg3 vineetg3 merged commit b2d548f into main Feb 21, 2025
5 checks passed
@vineetg3 vineetg3 deleted the vineet/builtins branch February 21, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure Python remaining Python builtins with magic methods behave as-expected in PyDough
2 participants