Cannot merge when predicate contains Decimal comparison #3033

Closed
ponychicken opened this issue Nov 26, 2024 · 2 comments · Fixed by #3090
Labels
bug Something isn't working good first issue Good for newcomers help wanted Extra attention is needed

Comments

ponychicken commented Nov 26, 2024

Environment

Delta-rs version: 0.22

Bug

Merging into a DeltaTable fails when the predicate compares a Decimal column:

deltalake/table.py", line 1800, in execute
    metrics = self._table.merge_execute(self._builder)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_internal.DeltaError: Generic DeltaTable error: Unable to convert expression to string

Repro case:

from deltalake import DeltaTable, write_deltalake
import pandas as pd
import pyarrow as pa
from datetime import datetime
from decimal import Decimal
import random

DATABASE_NAME = ''.join(random.choices('0123456789', k=10))

data = {
    "timestamp": [datetime(2024, 3, 20, 12, 30, 0)],
    "altitude": [Decimal("150.5")],
}

# Create DataFrame
df = pd.DataFrame(data)


# Define schema using pyarrow
schema = pa.schema(
    [
        ("timestamp", pa.timestamp("us")),
        ("altitude", pa.decimal128(6, 1)),
    ]
)

# Create new Delta table
dt = DeltaTable.create(DATABASE_NAME, schema=schema)

# Initial write
write_deltalake(dt, df, mode="append")


# Read Delta table and display schema and content
dt_read = DeltaTable(DATABASE_NAME)
print("Schema:")
print(dt_read.schema())

# Convert to pandas DataFrame and display content
df_read = dt_read.to_pandas()
print("\nContent:")
print(df_read)


# SUCCEEDS: predicate compares only the timestamp column
dt.merge(
    source=df,
    predicate="target.timestamp = source.timestamp",
    source_alias="source",
    target_alias="target",
).when_matched_update_all().when_not_matched_insert_all().execute()

# FAILS: predicate also compares the Decimal column
dt.merge(
    source=df,
    predicate="target.timestamp = source.timestamp AND target.altitude = source.altitude",
    source_alias="source",
    target_alias="target",
).when_matched_update_all().when_not_matched_insert_all().execute()

@ponychicken
Author

Casting both sides to a string before the comparison works:

dt.merge(
    source=df,
    predicate="target.timestamp = source.timestamp AND CAST(target.altitude AS STRING) = CAST(source.altitude AS STRING)",
    source_alias="source",
    target_alias="target",
).when_matched_update_all().when_not_matched_insert_all().execute()
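
One caveat with the string-cast workaround, sketched here with Python's standard `decimal` module only: string equality is stricter than numeric equality when the two sides carry a different number of fractional digits. How the merge engine renders a decimal as a string is an assumption here and is not taken from this thread; in the repro both sides have one fractional digit, so the workaround matches as expected. The helper `same_scale_str` is hypothetical and only illustrates normalizing to a common scale before comparing.

```python
from decimal import Decimal

a = Decimal("150.5")
b = Decimal("150.50")

# Numerically the two values are equal...
numeric_equal = a == b

# ...but their string forms differ because the scales differ.
string_equal = str(a) == str(b)


def same_scale_str(value: Decimal, scale: int) -> str:
    """Hypothetical helper: render `value` with exactly `scale` fractional digits."""
    quantum = Decimal((0, (1,), -scale))  # e.g. scale=1 -> Decimal("0.1")
    return str(value.quantize(quantum))


print(numeric_equal, string_equal)
print(same_scale_str(a, 1), same_scale_str(b, 1))
```

If the source and target columns might use different scales, comparing the values numerically (once the underlying bug is fixed) is safer than comparing their string forms.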

@ion-elgreco
Collaborator

@ponychicken I think this happens because the formatting (fmt) of the decimal scalar value is missing.

Could you add a small fix for this? It is probably the same change as in PR #3019.

Unfortunately, I don't have time to do this myself.
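
To illustrate what "formatting a decimal scalar" involves, here is a minimal Python sketch. It is not the delta-rs code (which is written in Rust) and not the change from PR #3019. It assumes the common fixed-point representation in which a decimal is stored as an unscaled integer plus a scale, and the function name `format_decimal_scalar` is hypothetical.

```python
from decimal import Decimal


def format_decimal_scalar(unscaled: int, scale: int) -> str:
    """Hypothetical helper: render an (unscaled integer, scale) pair as plain text.

    For example, unscaled=1505 with scale=1 represents 150.5.
    """
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    # Building the Decimal from a (sign, digits, exponent) tuple is exact,
    # so wide values are not rounded by the current decimal context.
    value = Decimal((sign, digits, -scale))
    # The "f" format avoids scientific notation for very small values.
    return format(value, "f")


print(format_decimal_scalar(1505, 1))
print(format_decimal_scalar(-5, 3))
```

Without a formatter of this kind for the decimal case, an expression containing a decimal literal cannot be turned back into predicate text, which is consistent with the "Unable to convert expression to string" error above.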

@ion-elgreco ion-elgreco added help wanted Extra attention is needed good first issue Good for newcomers labels Dec 7, 2024