-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Slicing to support negative indices #277
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
I have minor comments.
- Convert `start` to positive. If < 1, set to 1. | ||
- Compute `length = stop - start` (clamp to 0 if negative). | ||
- 3. `start >= 0`, `stop < 0`: | ||
- Convert `stop` & `start` to positive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on 3. case start
is already positive.
- Convert `stop` & `start` to positive. | |
- Convert `stop` to positive. |
@@ -59,7 +59,7 @@ These are created with a prefix operator syntax instead of called as a function. | |||
|
|||
These are other PyDough operators that are not necessarily used as functions: | |||
|
|||
- `SLICE`: operator used for string slicing, with the same semantics as Python string slicing. If `s[a:b:c]` is done, that is translated to `SLICE(s,a,b,c)` in PyDough, and any of `a`/`b`/`c` could be absent. Currently, PyDough does not support negative indices or providing step values other than 1. | |||
- `SLICE`: operator used for string slicing, with the same semantics as Python string slicing. If `s[a:b:c]` is done, that is translated to `SLICE(s,a,b,c)` in PyDough, and any of `a`/`b`/`c` could be absent. Negative slicing is supported, however currently, PyDough does not support providing step values other than 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is this PyDough does not support providing step values other than 1.
applies for both positive and negative values.
Current edit makes it that only negative doesn't support step values other than 1.
A suggestion.
- `SLICE`: operator used for string slicing, with the same semantics as Python string slicing. If `s[a:b:c]` is done, that is translated to `SLICE(s,a,b,c)` in PyDough, and any of `a`/`b`/`c` could be absent. Negative slicing is supported, however currently, PyDough does not support providing step values other than 1. | |
- `SLICE`: operator used for string slicing, with the same semantics as Python string slicing. If `s[a:b:c]` is done, that is translated to `SLICE(s,a,b,c)` in PyDough, and any of `a`/`b`/`c` could be absent. Negative slicing is supported. Currently PyDough does not support providing step values other than 1. |
if isinstance(step, sqlglot_expressions.Literal): | ||
if int(step.this) != 1: | ||
try: | ||
step_int = int(step.this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be like others to handle None case?
start_idx = int(start.this) if not isinstance(start, sqlglot_expressions.Null) else None
try: | ||
step_int = int(step.this) | ||
except ValueError: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to also add which argument was the issue.
e.g. SLICE function step argument must be integer literal or absent
) | ||
except ValueError: | ||
raise ValueError( | ||
"SLICE function currently only supports the slicing arguments being integer literals or absent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. Specify argument
) | ||
except ValueError: | ||
raise ValueError( | ||
"SLICE function currently only supports the slicing arguments being integer literals or absent" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same. Specify argument
start_idx: int | None = None | ||
stop_idx: int | None = None | ||
try: | ||
start_idx = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have value in ()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fantastic @vineetg3!!! Just a few comments before merging.
name_without_first_char = name[1:], | ||
last_digit = phone[-1:], | ||
name_without_start_and_end_char = name[1:-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets also add:
phone_without_last_5_chars = phone[:-5]
name_second_to_last_char = name[-2:-1]
@@ -504,12 +504,68 @@ def convert_concat( | |||
return Concat(expressions=inputs) | |||
|
|||
|
|||
def positive_index( | |||
column: SQLGlotExpression, neg_index: int, is_0_based: bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: is_zero_indexed
instead if is_0_based
based on the length of the column. | ||
|
||
Args: | ||
`column`: The column expression to reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: this isn't necessarily a column (could be a function call, or a literal). Let's just call this string_expr
and say it refers to the string that the negative index is referencing a point within.
""" | ||
assert len(sql_glot_args) == 4 | ||
_, start, stop, step = sql_glot_args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: can unpack string_expr
as the first arg & use that instead of sql_glot_args[0]
in the rest of the function.
for arg in sql_glot_args[1:]: | ||
if not isinstance(arg, (sqlglot_expressions.Literal, sqlglot_expressions.Null)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: can replace this the loop guard with for arg in (start, stop, step)
try: | ||
step_int = int(step.this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: can push this check into the the earlier for loop by checking if isinstance(arg.this, int)
when it is a literal.
try: | ||
start_idx = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here: no need to check with try/excepts, just psuh the check into the earlier loop & just do start_idx = int(start.this)
if start
isn't null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The integer literal is presented as a string by sqlglot, for example 2 is presented as '2'. So, we would need to check if that could be converted to an int by using try catch. This isn't the cleanest way probably and I'm open to other suggestions.
if start_idx < 0 and stop_idx < 0: | ||
# Early return if start index is greater than stop index | ||
# e.g., "hello"[-2:-4] should return empty string | ||
if start_idx > stop_idx: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be >=
since "hello"[-2:-2]
returns an empty string (the upper bound is exclusive.
tests/bad_pydough_functions.py
Outdated
def bad_slice_4(): | ||
# Unsupported slicing: reversed | ||
return Customers(name[::-1]) | ||
# Unsupported slicing: non-integer start | ||
return Customers(name[datetime.datetime.now() : -1 :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about testing with non-literals?
tests/simple_pydough_functions.py
Outdated
|
||
|
||
def step_slicing(): | ||
return Customers.WHERE(name == "Jane Smith")( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the test generates the answers, why not just test on the entire table (its only 20 rows)?
Resolves #260.
Please read the issue for more details.
The
SUBSTR
conversion logic inconvert_slice
is being expanded to handle negative indices.Here is the outline of the logic to support Python based slicing semantics:
(None, None)
(start, None)
start
: Convert to 1-based indexing and slice fromstart
.start
: ComputeLENGTH(string) + start + 1
; clamp to1
if less than1
.(None, stop)
stop
: Slice from position1
tostop
.stop
: ComputeLENGTH(string) + stop
; clamp to0
if less than0
(empty slice).(start, stop)
start
&stop
>= 0:start
to 1-based.length = stop - start
.start < 0
,stop >= 0
:start
to positive. If < 1, set to 1.length = stop - start
(clamp to 0 if negative).start >= 0
,stop < 0
:stop
&start
to positive.stop
< 1, slice is empty (length = 0
).length = stop - start
.start < 0
,stop < 0
:start
&stop
to positive. Ifstart
< 1, set to 1.stop
< 1, slice is empty (length = 0
).length = stop - start
.