Pre-aggregations containing nullable decimal measures fail in Athena Driver #9010

Open
Larissa-Rocha opened this issue Dec 2, 2024 · 5 comments
Assignees: igorlukanin
Labels: bug (Something isn't working), driver:athena (Issues related to the AWS Athena driver), pre-aggregations (Issues related to pre-aggregations)

Comments

Larissa-Rocha commented Dec 2, 2024

Describe the bug
I'm trying to pre-aggregate a measure that contains a decimal value while other measures can be NULL, and I'm getting the following error during pre-aggregation:

Error: Error during create table: CREATE TABLE prod_pre_aggregations.fct_my_table_name_main20241001_asdfghjkl ('list of fields and data types') WITH (input_format = 'csv_no_header', delimiter = '^A', build_range_end = '2024-10-31T23:59:59.999') LOCATION ?, ?: Create table failed: Can't parse '\N
4010' column value for 'fct_user_progress_contents__value_workload_total_completed' column: Internal: invalid float literal
    at WebSocket.<anonymous> (/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:121:30)
    at WebSocket.emit (node:events:519:28)
    at Receiver.receiverOnMessage (/node_modules/ws/lib/websocket.js:1008:20)
    at Receiver.emit (node:events:519:28)
    at Receiver.dataMessage (/node_modules/ws/lib/receiver.js:502:14)
    at Receiver.getData (/node_modules/ws/lib/receiver.js:435:17)
    at Receiver.startLoop (/node_modules/ws/lib/receiver.js:143:22)
    at Receiver._write (/node_modules/ws/lib/receiver.js:78:10)
    at writeOrBuffer (node:internal/streams/writable:570:12)
    at _write (node:internal/streams/writable:499:10)
    at Receiver.Writable.write (node:internal/streams/writable:508:10)
    at TLSSocket.socketOnData (/node_modules/ws/lib/websocket.js:1102:35)
    at TLSSocket.emit (node:events:519:28)
    at addChunk (node:internal/streams/readable:559:12)
    at readableAddChunkPushByteMode (node:internal/streams/readable:510:3)
    at TLSSocket.Readable.push (node:internal/streams/readable:390:5)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:191:23)

To Reproduce
Steps to reproduce the behavior:

  1. Create a cube YAML file with the following measures:
- name: value_workload_total
  sql: 1000 * ROUND(value_workload_minutes/cast(value_workload_minutes_count as double), 3)
  type: sum

- name: value_workload_total_completed
  sql: 1000 * ROUND(value_workload_minutes/cast(value_workload_minutes_count as double), 3)
  type: sum
  filters:
    - sql: flag_enrollment_completed_pathway = TRUE

  2. Add both measures to a pre-aggregation (a sketch follows below)
  3. Run the pre-aggregation
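
For reference, a minimal sketch of the pre-aggregation used in step 2; the time dimension name and the monthly partitioning are assumptions, inferred from the partition suffix in the table name in the error above:

pre_aggregations:
  - name: main
    measures:
      - value_workload_total
      - value_workload_total_completed
    # assumed time dimension; the real model uses its own time column
    time_dimension: date_reference
    granularity: day
    partition_granularity: month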

Expected behavior
I expect the measures to pre-aggregate successfully.

Version:
1.0.5 (Cube Cloud)

Additional context
I found a similar issue for the Snowflake driver (#6693); I wonder how to solve it for Athena.

igorlukanin added the question, driver:athena, and pre-aggregations labels on Dec 4, 2024
igorlukanin self-assigned this on Dec 4, 2024
igorlukanin (Member) commented Dec 4, 2024

Hi @Larissa-Rocha 👋

I've tried to reproduce this and while I wasn't able to get exactly the same error, I acknowledge that this case could have been handled better.

The data model that I used:

cubes:
  - name: test
    sql: >
      SELECT
        NULL AS value, 
        NULL AS value_workload_minutes,
        NULL AS value_workload_minutes_count,
        TRUE AS flag_enrollment_completed_pathway

    dimensions:
      - name: value
        sql: value
        type: string

    measures:
      - name: value_workload_total
        sql: 1000 * ROUND(CAST(value_workload_minutes AS DOUBLE)/cast(value_workload_minutes_count as double), 3)
        type: sum

      - name: value_workload_total_completed
        sql: 1000 * ROUND(CAST(value_workload_minutes AS DOUBLE)/cast(value_workload_minutes_count as double), 3)
        type: sum
        filters:
          - sql: flag_enrollment_completed_pathway = TRUE

    pre_aggregations:
      - name: main
        dimensions:
          - value
        measures:
          - value_workload_total
          - value_workload_total_completed

Pre-aggregation does not build—here's the error:

Error: Error during create table: CREATE TABLE prod_pre_aggregations.test_main_srp5fcpr_isgxjz4g_1jl16fr (`test__value` unknown) LOCATION ?: Custom type 'unknown' is not supported
    at WebSocket.<anonymous> (/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:121:30)
    at WebSocket.emit (node:events:519:28)
    at Receiver.receiverOnMessage (/node_modules/ws/lib/websocket.js:1008:20)
    at Receiver.emit (node:events:519:28)
    at Receiver.dataMessage (/node_modules/ws/lib/receiver.js:502:14)
    at Receiver.getData (/node_modules/ws/lib/receiver.js:435:17)
    at Receiver.startLoop (/node_modules/ws/lib/receiver.js:143:22)
    at Receiver._write (/node_modules/ws/lib/receiver.js:78:10)
    at writeOrBuffer (node:internal/streams/writable:570:12)
    at _write (node:internal/streams/writable:499:10)
    at Receiver.Writable.write (node:internal/streams/writable:508:10)
    at TLSSocket.socketOnData (/node_modules/ws/lib/websocket.js:1102:35)
    at TLSSocket.emit (node:events:519:28)
    at addChunk (node:internal/streams/readable:559:12)
    at readableAddChunkPushByteMode (node:internal/streams/readable:510:3)
    at TLSSocket.Readable.push (node:internal/streams/readable:390:5)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:191:23)

My Cube version was v1.1.7.

That being said, I guess a possible workaround would be to safeguard against NULL values as much as possible: for example, wrap all potentially NULL values in COALESCE(..., 0) or something similar so that they evaluate to 0 instead of NULL. What to substitute NULLs with obviously depends on your domain; a sketch follows below.
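
For example, a sketch along these lines (untested; NULLIF guards against division by zero, the outer COALESCE turns a NULL result into 0, and the fallback value is only an assumption to adjust to your data):

      - name: value_workload_total
        sql: >
          COALESCE(
            1000 * ROUND(
              CAST(value_workload_minutes AS DOUBLE)
              / NULLIF(CAST(value_workload_minutes_count AS DOUBLE), 0),
              3),
            0)
        type: sum

The same wrapping would apply to value_workload_total_completed.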

igorlukanin added the bug label and removed the question label on Dec 4, 2024
Larissa-Rocha (Author) commented:
Hi @igorlukanin, after further investigation, I found that pre-aggregations on columns containing special characters like ", á, and â also fail with a "Can't parse column value" error. Do you have any tips on how to escape those values, or will this issue be addressed any time soon? I'm using Cube Cloud 1.0.5.

igorlukanin (Member) commented:
Hi @Larissa-Rocha 👋

I've quickly checked it with the following data model:

cubes:
  - name: test_2
    sql: >
      SELECT 'á' AS value UNION ALL
      SELECT 'â' AS value UNION ALL
      SELECT '"' AS value 

    dimensions:
      - name: value
        sql: value
        type: string

    pre_aggregations:
      - name: main
        dimensions:
          - value

And it worked as expected:
[Screenshot attachment: 2024-12-16 at 13:28:37]

So I guess something else must be causing that error. Could you please help narrow this down?

Larissa-Rocha (Author) commented:
Hi @igorlukanin!
We've identified an issue when using the schema below in Cube.js. A pre-aggregation configured to be partitioned by month runs into errors when Cube attempts to create the CSV file that stores the pre-aggregation data.
The problem occurs specifically when one of the columns contains more than two double quotes ("). This seems to cause an error during CSV creation with the csv-write-stream library, resulting in a malformed file. The generated CSV cannot be read correctly, which breaks queries that depend on the pre-aggregation.
Here’s the schema used for testing:

cubes:
  - name: fct_test
    public: false

    sql: |
      SELECT 
        title,
        value_sum,
        value_count,
        date_reference
      FROM (
        SELECT 
          '"Title" x "Subtitle"' AS title,
          97.8 AS value_sum,
          10 AS value_count,
          CAST('2024-10-10' AS TIMESTAMP(3)) AS date_reference
        UNION ALL
        SELECT 
          '"Title" \ "Subtitle"' AS title,
          27.3 AS value_sum,
          20 AS value_count,
          CAST('2024-10-10' AS TIMESTAMP(3)) AS date_reference
        UNION ALL
        SELECT 
          '"Title 3"' AS title,
          30.9 AS value_sum,
          30 AS value_count,
          CAST('2024-11-10' AS TIMESTAMP(3)) AS date_reference
        UNION ALL
        SELECT 
          '"Title\\4"' AS title,
          40.7 AS value_sum,
          40 AS value_count,
          CAST('2024-11-10' AS TIMESTAMP(3)) AS date_reference
        UNION ALL
        SELECT 
          '"Title"5"' AS title,
          58.1 AS value_sum,
          50 AS value_count,
          CAST('2024-12-10' AS TIMESTAMP(3)) AS date_reference
        UNION ALL
        SELECT 
          '"Title" 6' AS title,
          69.8 AS value_sum,
          60 AS value_count,
          CAST('2024-12-10' AS TIMESTAMP(3)) AS date_reference
      )
    
    dimensions:
      - name: title
        sql: title
        type: string
        
      - name: dateReference
        sql: date_reference
        type: time
        
    measures:
      - name: valueSum
        sql: value_sum
        type: sum
        
      - name: valueCount
        sql: value_count
        type: sum
        
      - name: valueAvg
        sql: ({valueSum} / {valueCount})
        type: number
        
    preAggregations:
      - name: main
        type: rollup
        
        dimensions:
          - title
        
        measures:
          - valueSum
          - valueCount          
          
        time_dimension: dateReference
        granularity: day
        partition_granularity: month

In the example above, titles like '"Title" x "Subtitle"' and '"Title" \ "Subtitle"' contain multiple double quotes, which appears to be causing the issue. When titles include only one or two double quotes, the CSV appears to be generated correctly.
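
A possible interim mitigation (just a sketch, untested, and it does change the stored values) could be to normalize the quotes in the dimension SQL until the underlying CSV escaping is fixed:

    dimensions:
      - name: title
        # Athena/Presto replace(): swap double quotes for single quotes before pre-aggregating
        sql: REPLACE(title, '"', '''')
        type: string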
Could you test this schema and verify if the problem occurs in your environment? It would be great to confirm whether the issue is directly related to csv-write-stream or if there’s something we can adjust in Cube.js to prevent these errors.
Thank you in advance for your attention and support!

Larissa-Rocha (Author) commented:

[Screenshot attachment]
