fix(bigquery): correctly format the scientific notation decimal #1068

Merged
merged 5 commits into from
Feb 25, 2025
10 changes: 7 additions & 3 deletions ibis-server/app/util.py
@@ -29,9 +29,13 @@ def _to_datetime_and_format(series: pd.Series) -> pd.Series:


 def _to_json_obj(df: pd.DataFrame) -> dict:
-    data = df.map(lambda x: f"{x:.9g}" if isinstance(x, float) else x).to_dict(
-        orient="split", index=False
-    )
+    data = df.map(
+        lambda x: f"{x:.9g}"
+        if isinstance(x, float)
+        else f"{x:.3f}"
+        if isinstance(x, decimal.Decimal)
+        else x
+    ).to_dict(orient="split")
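To see why the new Decimal branch removes the scientific notation, the per-cell rule can be run on its own. This is a minimal sketch that copies the merged lambda into a standalone function (format_cell is a hypothetical name, not part of ibis-server) and uses only the standard library, so no DataFrame is needed. It assumes, in line with the PR title, that BigQuery NUMERIC values reach this code as decimal.Decimal objects that keep their exponent, so a zero at NUMERIC's scale of 9 is Decimal("0E-9"), whose str() form is scientific notation.

```python
import decimal
from typing import Any


def format_cell(x: Any) -> Any:
    """Standalone copy of the lambda passed to df.map in _to_json_obj."""
    if isinstance(x, float):
        # Floats: at most 9 significant digits; "g" can still switch to exponent form.
        return f"{x:.9g}"
    if isinstance(x, decimal.Decimal):
        # Decimals: fixed-point with exactly 3 digits after the decimal point.
        return f"{x:.3f}"
    # Everything else (ints, strings, None, ...) passes through unchanged.
    return x


# A zero quantized to 9 decimal places (NUMERIC's scale) prints in scientific notation.
zero = decimal.Decimal(0).quantize(decimal.Decimal("1E-9"))
print(str(zero))                                  # 0E-9
print(format_cell(zero))                          # 0.000
print(format_cell(decimal.Decimal("12.34567")))   # 12.346
print(format_cell(0.1 + 0.2))                     # 0.3
print(format_cell(1e20))                          # 1e+20
```

Two consequences are visible in the sketch: floats can still come out in exponent form, because "g" switches to it for large or small magnitudes, and the Decimal branch rounds every value to three decimal places, so a NUMERIC such as 0.0001 is rendered as "0.000".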

🛠️ Refactor suggestion

❓ Verification inconclusive

Add error handling for extreme values.

The current implementation might fail on very large or very small numbers if the string formatting overflows.

Add a try-except block to handle a potential overflow:

     data = df.map(
-        lambda x: f"{x:.9g}"
-        if isinstance(x, float)
-        else f"{x:.3f}"
-        if isinstance(x, decimal.Decimal)
-        else x
+        lambda x: (
+            try_format(x, "{:.9g}") if isinstance(x, float)
+            else try_format(x, "{:.3f}") if isinstance(x, decimal.Decimal)
+            else x
+        )
     ).to_dict(orient="split")

+def try_format(value, format_str: str) -> str:
+    try:
+        return format_str.format(value)
+    except (OverflowError, ValueError):
+        return str(value)
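The suggested helper can be exercised in isolation. In this minimal sketch, try_format is the reviewer's proposed function with type hints added, and format_cell_safe is a hypothetical name for the suggested lambda. The checks only show that the fallback path works when str.format raises (provoked here with a deliberately mismatched format spec); they do not establish that real BigQuery values can reach the except branch. For example, float infinity and NaN format as "inf" and "nan" without raising.

```python
import decimal
from typing import Any


def try_format(value: Any, format_str: str) -> str:
    """Format value with format_str, falling back to str(value) on failure."""
    try:
        return format_str.format(value)
    except (OverflowError, ValueError):
        return str(value)


def format_cell_safe(x: Any) -> Any:
    # Hypothetical standalone version of the lambda in the suggested diff.
    if isinstance(x, float):
        return try_format(x, "{:.9g}")
    if isinstance(x, decimal.Decimal):
        return try_format(x, "{:.3f}")
    return x


print(format_cell_safe(float("inf")))               # inf (no exception raised)
print(format_cell_safe(decimal.Decimal("0E-9")))    # 0.000
# A str cannot take the "g" spec, so str.format raises ValueError and the fallback is used.
print(try_format("abc", "{:.9g}"))                  # abc
```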

🏁 Script executed:

#!/bin/bash
# Description: Check if there are any existing test cases for extreme values
rg -A 5 "1e\+?[0-9]{3,}|1e-[0-9]{3,}" tests/

Length of output: 93


Action: Verify handling of extreme numerical values with proper error catching

The current implementation does not handle potential overflow errors when formatting very large or very small float and decimal values. I recommend wrapping the formatting logic in a try-except block (using a helper such as try_format) that catches OverflowError and ValueError. The exploratory search did not find any extreme-value test cases because it found no tests folder, most likely because it searched tests/ rather than ibis-server/tests/, where the connector tests live. Please verify manually that these scenarios are covered, or add tests where needed.

Locations to address:

  • File: ibis-server/app/util.py (Lines 32-38)


 def default(obj):
     if pd.isna(obj):
14 changes: 14 additions & 0 deletions ibis-server/tests/routers/v2/connector/test_bigquery.py
@@ -198,6 +198,20 @@ async def test_query_values(client, manifest_str):
     assert response.status_code == 204
 
 
+async def test_scientific_notation(client, manifest_str):
+    response = await client.post(
+        url=f"{base_url}/query",
+        json={
+            "connectionInfo": connection_info,
+            "manifestStr": manifest_str,
+            "sql": "SELECT cast(0 as numeric) as col",
+        },
+    )
+    assert response.status_code == 200
+    result = response.json()
+    assert result["data"][0] == ["0.000"]
+
+
 async def test_query_empty_json(client, manifest_str):
     """Test the empty result with json column."""
     response = await client.post(
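The new test needs a live BigQuery connection. The offline sketch below mimics its assertion without one. It assumes the NUMERIC zero reaches _to_json_obj as Decimal("0E-9") (NUMERIC has a scale of 9), applies the same per-cell rule as the merged code, and builds by hand a dict shaped like pandas' to_dict(orient="split") output for a DataFrame with a default index. to_split_dict is a hypothetical helper, not pandas or ibis-server code.

```python
import decimal
from typing import Any


def format_cell(x: Any) -> Any:
    # Same per-cell rule as the merged _to_json_obj lambda.
    if isinstance(x, float):
        return f"{x:.9g}"
    if isinstance(x, decimal.Decimal):
        return f"{x:.3f}"
    return x


def to_split_dict(columns: list[str], rows: list[list[Any]]) -> dict:
    """Hypothetical stand-in for df.map(...).to_dict(orient="split")
    on a DataFrame with a default RangeIndex."""
    return {
        "index": list(range(len(rows))),
        "columns": columns,
        "data": [[format_cell(v) for v in row] for row in rows],
    }


# Assumed driver value for: SELECT cast(0 as numeric) as col
numeric_zero = decimal.Decimal(0).quantize(decimal.Decimal("1E-9"))
result = to_split_dict(["col"], [[numeric_zero]])
assert result["data"][0] == ["0.000"]  # the assertion made by test_scientific_notation
```

Without the Decimal branch, the cell would pass through unchanged and its str() form would be "0E-9".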