Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(ibis): format float with scientific notation #1023

Merged
merged 7 commits into from
Jan 6, 2025

Conversation

goldmedal
Copy link
Contributor

@goldmedal goldmedal commented Jan 3, 2025

Description

  • format floating value if it has scientific notation.
  • See the tests for detail results.

Summary by CodeRabbit

  • New Features

    • Added a new test to validate floating-point number formatting in PostgreSQL queries.
  • Bug Fixes

    • Improved float value representation in JSON conversion, ensuring consistent 9-decimal place formatting.
    • Updated error message assertion for PostgreSQL connection tests.
  • Tests

    • Enhanced test coverage for floating-point number handling in database queries.
    • Modified expected response data type for totalprice in multiple test cases, changing from float to string representation.

Copy link

coderabbitai bot commented Jan 3, 2025

Walkthrough

The pull request introduces modifications to the _to_json_obj function in the utility module, enhancing the representation of float values during JSON conversion. Additionally, it adds a new asynchronous test case for PostgreSQL connector handling of floating-point numbers and modifies existing tests across various connectors to reflect changes in expected data types for specific fields.

Changes

File Change Summary
ibis-server/app/util.py Modified _to_json_obj to format float values to 9 decimal places using a mapping function.
ibis-server/tests/routers/v2/connector/test_postgres.py Added test_format_floating async test function to validate floating-point number handling. Updated error message assertion in an existing test.
ibis-server/tests/routers/v2/connector/test_bigquery.py Changed expected totalprice field from float to string in test_query.
ibis-server/tests/routers/v2/connector/test_canner.py Changed expected totalprice field from float to string in test_query.
ibis-server/tests/routers/v2/connector/test_trino.py Changed expected totalprice field from float to string in test_query.
ibis-server/tests/routers/v3/connector/postgres/test_query.py Changed expected value of the fourth element in response from float to string in test_query.

Sequence Diagram

sequenceDiagram
    participant Util as Utility Module
    participant DataFrame as Pandas DataFrame
    participant JSON as JSON Converter

    Util->>DataFrame: Apply float formatting
    DataFrame-->>Util: Mapped DataFrame
    Util->>JSON: Convert to dictionary
    JSON-->>Util: Formatted JSON object
Loading

Poem

🐰 Floating numbers dance with grace,
Nine decimals find their perfect place.
From DataFrame to JSON's embrace,
Precision leaps with rabbit's pace,
A test of numbers, small and bright! 🔢✨


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added ibis python Pull requests that update Python code labels Jan 3, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbf56c8 and 22162ce.

📒 Files selected for processing (2)
  • ibis-server/app/util.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
🔇 Additional comments (1)
ibis-server/app/util.py (1)

32-34: Confirm the float format consistency with test requirements.

Here, using f"{x:.9f}" enforces a non-scientific decimal notation with exactly nine decimal digits. However, the test_format_floating in test_postgres.py checks for scientific notation (e.g., "1.23E-7"). Verify whether the desired format is strictly decimal or scientific notation. If scientific notation is desired for large or small floats, replace "{x:.9f}" with a more suitable specifier like "{x:.9g}" or a custom formatting function that selectively uses scientific notation.

ibis-server/tests/routers/v2/connector/test_postgres.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
ibis-server/tests/routers/v2/connector/test_postgres.py (3)

218-253: Improve SQL query readability with consistent formatting.

The SQL query could be more readable with consistent indentation and grouping. Consider adding comments to separate the test case groups.

             "sql": """
                 SELECT
+                    -- Scientific notation cases
                     0.0123e-5 AS case_scientific_original,
                     1.23e+4 AS case_scientific_positive,
                     -4.56e-3 AS case_scientific_negative,
                     7.89e0 AS case_scientific_zero_exponent,
                     0e0 AS case_scientific_zero,
 
+                    -- Decimal cases
                     123.456 AS case_decimal_positive,
                     -123.456 AS case_decimal_negative,
                     0.0000123 AS case_decimal_small,
                     123.0000 AS case_decimal_trailing_zeros,
                     0.0 AS case_decimal_zero,
 
+                    -- Integer cases
                     0 AS case_integer_zero,
                     0e-9 AS case_integer_zero_scientific,
                     -1 AS case_integer_negative,
                     9999999999 AS case_integer_large,
 
+                    -- Float boundary cases
                     1.7976931348623157E+308 AS case_float_max,
                     2.2250738585072014E-308 AS case_float_min,
                     -1.7976931348623157E+308 AS case_float_min_negative,
 
+                    -- Mixed arithmetic cases
                     1.23e4 + 4.56 AS case_mixed_addition,
                     -1.23e-4 - 123.45 AS case_mixed_subtraction,
                     0.0123e-5 * 1000 AS case_mixed_multiplication,
                     123.45 / 1.23e2 AS case_mixed_division,
 
+                    -- Special cases
                     CAST('NaN' AS FLOAT) AS case_special_nan,
                     CAST('Infinity' AS FLOAT) AS case_special_infinity,
                     CAST('-Infinity' AS FLOAT) AS case_special_negative_infinity,
                     NULL AS case_special_null,
 
+                    -- Type casting cases
                     CAST(123.456 AS FLOAT) AS case_cast_float,
                     CAST(1.23e4 AS DECIMAL(10,5)) AS case_cast_decimal
             """,
🧰 Tools
🪛 GitHub Actions: ibis CI

[warning] File needs to be reformatted using Ruff formatter. Code style does not match the required format.


256-289: Enhance test assertions for better error reporting.

While the test covers various floating-point scenarios, the assertions could be more specific and easier to debug:

  1. Add validation for response column types (dtypes)
  2. Validate column names
  3. Split assertions for each test case
     assert response.status_code == 200
     result = response.json()
+    
+    # Validate response structure
+    assert "columns" in result, "Response should contain 'columns'"
+    assert "dtypes" in result, "Response should contain 'dtypes'"
+    assert "data" in result, "Response should contain 'data'"
+    
+    # Validate column names match test cases
+    expected_columns = [
+        "case_scientific_original",
+        "case_scientific_positive",
+        # ... add all expected column names
+    ]
+    assert result["columns"] == expected_columns
+    
+    # Validate data types
+    for column in expected_columns:
+        assert column in result["dtypes"], f"Missing dtype for column {column}"
+        assert result["dtypes"][column] in ["object", "float64"], f"Unexpected dtype for column {column}"
 
-    assert result["data"] == [
-        [
-            "1.23E-7",
-            "1.23E+4",
+    # Validate individual test cases for easier debugging
+    data = result["data"][0]
+    assert data[0] == "1.23E-7", "Scientific notation (original) failed"
+    assert data[1] == "1.23E+4", "Scientific notation (positive) failed"
🧰 Tools
🪛 GitHub Actions: ibis CI

[warning] File needs to be reformatted using Ruff formatter. Code style does not match the required format.


211-211: Add docstring to describe test purpose and cases.

Add a docstring to document the test's purpose and the different categories of test cases it covers.

 async def test_format_floating(client, manifest_str, postgres):
+    """
+    Test PostgreSQL connector's handling of various floating-point number formats.
+    
+    Test cases include:
+    - Scientific notation (positive, negative, zero exponent)
+    - Decimal numbers (positive, negative, small values, trailing zeros)
+    - Integer representations
+    - Float boundaries (max, min, negative)
+    - Mixed arithmetic operations
+    - Special values (NaN, Infinity)
+    - Type casting
+    """
🧰 Tools
🪛 GitHub Actions: ibis CI

[warning] File needs to be reformatted using Ruff formatter. Code style does not match the required format.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22162ce and b33d203.

📒 Files selected for processing (2)
  • ibis-server/app/util.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • ibis-server/app/util.py
🧰 Additional context used
🪛 GitHub Actions: ibis CI
ibis-server/tests/routers/v2/connector/test_postgres.py

[warning] File needs to be reformatted using Ruff formatter. Code style does not match the required format.

ibis-server/tests/routers/v2/connector/test_postgres.py Outdated Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)

219-253: Consider adding edge cases for comprehensive testing.

The test cases cover a wide range of scenarios. Consider adding these edge cases to strengthen the test suite:

  • Numbers close to underflow (e.g., 1.18e-308)
  • Numbers with many significant digits (e.g., 1.234567890123456789)
  • Numbers that might trigger rounding behavior (e.g., 9.9999999999999999)
     SELECT
         0.0123e-5 AS case_scientific_original,
         1.23e+4 AS case_scientific_positive,
         -4.56e-3 AS case_scientific_negative,
         7.89e0 AS case_scientific_zero_exponent,
         0e0 AS case_scientific_zero,
+        1.18e-308 AS case_edge_near_underflow,
+        1.234567890123456789 AS case_edge_many_digits,
+        9.9999999999999999 AS case_edge_rounding,

218-253: Improve SQL query readability with comments.

Consider adding SQL comments to explain the expected behavior and any specific formatting rules for each test case. This will make it easier for other developers to understand the test's intentions.

     "sql": """
     SELECT
+        -- Test scientific notation formatting
+        -- Expected: Maintains E notation for very small numbers
         0.0123e-5 AS case_scientific_original,
+        -- Expected: Maintains E notation for large numbers
         1.23e+4 AS case_scientific_positive,
+        -- Expected: Converts to decimal for small negative numbers
         -4.56e-3 AS case_scientific_negative,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b33d203 and 2f3f044.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
🔇 Additional comments (2)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)

211-211: Well-structured test organization!

The test function is well-organized with clear sections for different floating-point scenarios, making it easy to understand and maintain.


211-290: Address formatting issues.

The code needs to be reformatted using the Ruff formatter as mentioned in previous reviews.

ibis-server/tests/routers/v2/connector/test_postgres.py Outdated Show resolved Hide resolved
@goldmedal goldmedal removed the bigquery label Jan 3, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

Line range hint 1-1: Code formatting required

The pipeline indicates a formatting issue. Please reformat the code to adhere to your project's style guidelines (e.g., using Black, isort, or the configured linting tool).

#!/bin/bash
# Example: using Black to reformat this file.
black ibis-server/tests/routers/v3/connector/postgres/test_query.py
ibis-server/tests/routers/v2/connector/test_postgres.py (2)

211-286: Enhance test organization with grouped assertions.

The test cases are well-structured in the SQL query, but the assertions could be better organized to match this structure. Consider grouping related assertions and adding comments to match the SQL sections.

     assert response.status_code == 200
     result = response.json()
 
+    # Scientific notation cases
     assert result["data"][0] == "1.23E-7"
     assert result["data"][1] == "1.23E+4"
     assert result["data"][2] == "-0.00456"
     assert result["data"][3] == "7.89"
     assert result["data"][4] == "0"
+
+    # Decimal cases
     assert result["data"][5] == "123.456"
     assert result["data"][6] == "-123.456"
     assert result["data"][7] == "0.0000123"
     assert result["data"][8] == "123"
     assert result["data"][9] == "0"
+
+    # Integer cases
     assert result["data"][10] == 0
     assert result["data"][11] == "0"
     assert result["data"][12] == -1
     assert result["data"][13] == 9999999999
+
+    # Float boundary cases
     assert result["data"][14] == "1.7976931348623157E+308"
     assert result["data"][15] == "2.2250738585072014E-308"
     assert result["data"][16] == "-1.7976931348623157E+308"
+
+    # Mixed arithmetic cases
     assert result["data"][17] == "12304.56"
     assert result["data"][18] == "-123.450123"
     assert result["data"][19] == "0.000123"
     assert result["data"][20] == "1.0036585365853659"
+
+    # Special cases
     assert result["data"][21] == "nan"
     assert result["data"][22] == "inf"
     assert result["data"][23] == "-inf"
     assert result["data"][24] is None
+
+    # Type casting cases
     assert result["data"][25] == "123.456001"
     assert result["data"][26] == "12300.00000"

219-253: Consider adding more edge cases and type verification.

While the test cases are comprehensive, consider adding:

  1. Edge cases for very small numbers (e.g., 1e-323) near the denormal range
  2. Cases with repeated decimals (e.g., 1/3)
  3. Type verification using isinstance() for numeric values
     SELECT
         0.0123e-5 AS case_scientific_original,
         1.23e+4 AS case_scientific_positive,
         -4.56e-3 AS case_scientific_negative,
         7.89e0 AS case_scientific_zero_exponent,
         0e0 AS case_scientific_zero,
+        1e-323 AS case_scientific_denormal,
 
         123.456 AS case_decimal_positive,
         -123.456 AS case_decimal_negative,
         0.0000123 AS case_decimal_small,
         123.0000 AS case_decimal_trailing_zeros,
         0.0 AS case_decimal_zero,
+        CAST(1.0/3.0 AS FLOAT) AS case_decimal_repeating,
 
         0 AS case_integer_zero,
         0e-9 AS case_integer_zero_scientific,
         -1 AS case_integer_negative,
         9999999999 AS case_integer_large,

And add type verification:

     assert result["data"][13] == 9999999999
+    assert isinstance(result["data"][13], int)
     assert result["data"][14] == "1.7976931348623157E+308"
+    assert isinstance(result["data"][14], str)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f3f044 and 53e0230.

📒 Files selected for processing (5)
  • ibis-server/tests/routers/v2/connector/test_bigquery.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_canner.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
  • ibis-server/tests/routers/v2/connector/test_trino.py (1 hunks)
  • ibis-server/tests/routers/v3/connector/postgres/test_query.py (1 hunks)
🧰 Additional context used
🪛 GitHub Actions: ibis CI
ibis-server/tests/routers/v2/connector/test_canner.py

[warning] File needs reformatting. Code formatting does not meet the required standards.

🔇 Additional comments (6)
ibis-server/tests/routers/v3/connector/postgres/test_query.py (1)

118-118: Changed expected total price type from float to string

This assertion now treats o_totalprice as a string ("172799.49") instead of a float. Ensure that the underlying _to_json_obj conversion and any consuming code expects and handles the string type consistently across all connectors.

ibis-server/tests/routers/v2/connector/test_canner.py (1)

97-97: Changed expected total price type from float to string

The test now expects "172799.49" instead of the float value 172799.49. Verify that all references and validations for totalprice are updated accordingly throughout the code.

🧰 Tools
🪛 GitHub Actions: ibis CI

[warning] File needs reformatting. Code formatting does not meet the required standards.

ibis-server/tests/routers/v2/connector/test_bigquery.py (1)

93-93: Changed expected total price type from float to string

This test expects "172799.49" instead of a float value. Confirm that your BigQuery connector and downstream logic handle string representations of float values correctly.

ibis-server/tests/routers/v2/connector/test_trino.py (1)

114-114: Changed expected total price type from float to string

Similar to other connectors, Trino’s test now anticipates "172799.49". Ensure that the type change is consistent with the new float-to-string logic and that calling code or consumers are aware of this change.

ibis-server/tests/routers/v2/connector/test_postgres.py (2)

Line range hint 287-309: LGTM! Error message assertion is more precise.

The modified error message assertion improves test reliability by matching the exact PostgreSQL error message format.


211-286: Effectively addresses float formatting concerns.

The comprehensive test cases for floating-point formatting effectively address the concerns raised in previous reviews about float formatting consistency.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)

259-285: Consider grouping assertions for better maintainability.

The individual assertions, while thorough, could be organized better. Consider grouping related assertions and using helper methods to improve readability and maintenance.

-    assert result["data"][0][0] == "1.23E-7"
-    assert result["data"][0][1] == "1.23E+4"
-    # ... more assertions ...
+    data = result["data"][0]
+    
+    # Scientific notation cases
+    scientific_cases = {
+        0: "1.23E-7",    # case_scientific_original
+        1: "1.23E+4",    # case_scientific_positive
+        2: "-0.00456",   # case_scientific_negative
+        3: "7.89",       # case_scientific_zero_exponent
+        4: "0"           # case_scientific_zero
+    }
+    for idx, expected in scientific_cases.items():
+        assert data[idx] == expected, f"Scientific notation case {idx} failed"
+    
+    # Similar grouping for other categories...

218-253: Add descriptive comments for test case categories.

The SQL query is well-structured but could benefit from category headers to improve readability.

     "sql": """
     SELECT
+        -- Scientific notation cases
         0.0123e-5 AS case_scientific_original,
         1.23e+4 AS case_scientific_positive,
         -4.56e-3 AS case_scientific_negative,
         7.89e0 AS case_scientific_zero_exponent,
         0e0 AS case_scientific_zero,
 
+        -- Decimal number cases
         123.456 AS case_decimal_positive,
         -123.456 AS case_decimal_negative,
         # ... rest of the cases with similar category headers
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d85ef4e and 422fac8.

📒 Files selected for processing (1)
  • ibis-server/tests/routers/v2/connector/test_postgres.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci
🔇 Additional comments (2)
ibis-server/tests/routers/v2/connector/test_postgres.py (2)

211-286: Comprehensive test coverage for floating-point formatting!

The test cases thoroughly cover various scenarios including scientific notation, decimal numbers, integers, float boundaries, mixed arithmetic, and special values. This ensures robust handling of floating-point representations.


246-249: Verify PostgreSQL version compatibility for special float values.

The test includes special float values (NaN, Infinity). Ensure these are consistently handled across different PostgreSQL versions.

✅ Verification successful

Special float values are safely handled across PostgreSQL versions

These CAST operations for special float values (NaN, Infinity) use standard SQL syntax and have been reliably supported in PostgreSQL since before 2013, with continuous maintenance and improvements. The test implementation is version-agnostic and uses well-established features.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check PostgreSQL version support for special float values

# Test: Check PostgreSQL version and float handling documentation
gh api graphql -f query='
{
  repository(owner: "postgres", name: "postgres") {
    object(expression: "master") {
      ... on Commit {
        history(path: "src/backend/utils/adt/float.c") {
          nodes {
            messageHeadline
            committedDate
          }
        }
      }
    }
  }
}
'

Length of output: 11491


Script:

#!/bin/bash
# Check test configuration and related test cases
rg -A 5 "postgresql" ibis-server/tests/routers/v2/connector/test_postgres.py
rg -A 5 "FLOAT" ibis-server/tests/routers/v2/connector/test_postgres.py

Length of output: 523

@goldmedal goldmedal requested a review from wwwy3y3 January 6, 2025 02:14
@goldmedal goldmedal merged commit 8c93ec6 into Canner:main Jan 6, 2025
5 checks passed
@goldmedal goldmedal deleted the chore/format-floating branch January 6, 2025 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bigquery ibis python Pull requests that update Python code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants