Scatter struct nulls when deserializing Presto wire format #7813

Closed
wants to merge 1 commit into from

Conversation

oerling
@oerling oerling commented Nov 30, 2023

No description provided.

netlify bot commented Nov 30, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit 3869ae0
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/6580850092045e0008b5ba24

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 30, 2023
@oerling oerling requested a review from Yuhta November 30, 2023 22:33
@oerling oerling force-pushed the exc-scatter-pr branch 2 times, most recently from 3d06ccf to ec70715 Compare December 4, 2023 13:50
@@ -304,21 +321,45 @@ void readDecimalValues(
}
}

vector_size_t sizeWithIncomingNulls(

This function can be just incomingNulls ? numIncomingNulls : size; inlining it would be shorter.
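
A minimal sketch of the simplification suggested in this comment. The diff excerpt shows only the function name, so the parameter list below is hypothetical, and `vector_size_t` is declared locally as a stand-in for the Velox type so the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for Velox's vector_size_t so the sketch is self-contained.
using vector_size_t = int32_t;

// Hypothetical signature: only the name appears in the diff excerpt.
// Returns numIncomingNulls when a nulls bitmap is supplied, else size.
inline vector_size_t sizeWithIncomingNulls(
    vector_size_t size,
    const uint64_t* incomingNulls,
    vector_size_t numIncomingNulls) {
  return incomingNulls ? numIncomingNulls : size;
}

// Usage sketch with made-up values.
inline void sizeWithIncomingNullsExample() {
  uint64_t bits = 0xB; // arbitrary bitmap word, contents are not inspected
  assert(sizeWithIncomingNulls(3, &bits, 4) == 4);
  assert(sizeWithIncomingNulls(3, nullptr, 4) == 3);
}
```

Since the body reduces to a single conditional expression, the same expression could be written directly at each call site, which is the inlining the comment proposes.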

@facebook-github-bot
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@Yuhta
Yuhta commented Dec 5, 2023

@oerling There is a failure in the aggregate fuzzer that is rooted in deserialization: https://app.circleci.com/pipelines/github/facebookincubator/velox/39898/workflows/2f5fc9f3-3588-416c-be6a-7e47c50b40db/jobs/271651

I20231204 14:45:18.585304 30452 AggregationFuzzer.cpp:870] ==============================> Started iteration 215 (seed: 2903333334)
I20231204 14:45:18.747350 30452 AggregationFuzzer.cpp:1031] Executing query plan: 
-- Aggregation[SINGLE [g0, g1, g2, g3, g4] a0 := approx_set(ROW["c0"])] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:HYPERLOGLOG
  -- Values[1000 rows in 10 vectors] -> c0:INTEGER, g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN
I20231204 14:45:18.811694 49367 Task.cpp:1062] All drivers (1) finished for task test_cursor 7907 after running for 61 ms.
I20231204 14:45:18.811745 49367 Task.cpp:1746] Terminating task test_cursor 7907 with state Finished after running for 61 ms.
I20231204 14:45:18.816495 30452 AggregationFuzzer.cpp:303] Testing plan #0
I20231204 14:45:18.816550 30452 AggregationFuzzer.cpp:1031] Executing query plan: 
-- Aggregation[SINGLE [g0, g1, g2, g3, g4] a0 := approx_set(ROW["c0"])] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:HYPERLOGLOG
  -- Values[1000 rows in 10 vectors] -> c0:INTEGER, g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN
I20231204 14:45:18.889261 49369 Task.cpp:1062] All drivers (1) finished for task test_cursor 7908 after running for 70 ms.
I20231204 14:45:18.889328 49369 Task.cpp:1746] Terminating task test_cursor 7908 with state Finished after running for 70 ms.
I20231204 14:45:18.894990 30452 AggregationFuzzer.cpp:313] Testing plan #0 with spilling
I20231204 14:45:18.895045 30452 AggregationFuzzer.cpp:1031] Executing query plan: 
-- Aggregation[SINGLE [g0, g1, g2, g3, g4] a0 := approx_set(ROW["c0"])] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:HYPERLOGLOG
  -- Values[1000 rows in 10 vectors] -> c0:INTEGER, g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN
I20231204 14:45:18.897756 30452 Cursor.cpp:201] Task spill directory[/tmp/velox_test_oxBVkV/test_cursor 7909] created
I20231204 14:45:19.042686 49371 Task.cpp:1062] All drivers (1) finished for task test_cursor 7909 after running for 145 ms.
I20231204 14:45:19.042739 49371 Task.cpp:1746] Terminating task test_cursor 7909 with state Finished after running for 145 ms.
I20231204 14:45:19.049182 30452 TempDirectoryPath.cpp:31] TempDirectoryPath:: removing all files from/tmp/velox_test_oxBVkV
I20231204 14:45:19.049362 30452 AggregationFuzzer.cpp:303] Testing plan #1
I20231204 14:45:19.049387 30452 AggregationFuzzer.cpp:1031] Executing query plan: 
-- Aggregation[FINAL [g0, g1, g2, g3, g4] a0 := approx_set("a0")] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:HYPERLOGLOG
  -- Aggregation[PARTIAL [g0, g1, g2, g3, g4] a0 := approx_set(ROW["c0"])] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:VARBINARY
    -- Values[1000 rows in 10 vectors] -> c0:INTEGER, g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN
I20231204 14:45:19.156227 49373 Task.cpp:1062] All drivers (1) finished for task test_cursor 7910 after running for 103 ms.
I20231204 14:45:19.156276 49373 Task.cpp:1746] Terminating task test_cursor 7910 with state Finished after running for 103 ms.
I20231204 14:45:19.161682 30452 AggregationFuzzer.cpp:313] Testing plan #1 with spilling
I20231204 14:45:19.161725 30452 AggregationFuzzer.cpp:1031] Executing query plan: 
-- Aggregation[FINAL [g0, g1, g2, g3, g4] a0 := approx_set("a0")] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:HYPERLOGLOG
  -- Aggregation[PARTIAL [g0, g1, g2, g3, g4] a0 := approx_set(ROW["c0"])] -> g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN, a0:VARBINARY
    -- Values[1000 rows in 10 vectors] -> c0:INTEGER, g0:MAP<VARCHAR,SMALLINT>, g1:ROW<f0:TIMESTAMP,f1:TIMESTAMP,f2:TINYINT,f3:TIMESTAMP,f4:VARBINARY,f5:TINYINT,f6:BOOLEAN>, g2:MAP<VARCHAR,VARCHAR>, g3:MAP<TIMESTAMP,TIMESTAMP>, g4:BOOLEAN
I20231204 14:45:19.165709 30452 Cursor.cpp:201] Task spill directory[/tmp/velox_test_je5c16/test_cursor 7911] created
E20231204 14:45:19.290892 49374 Exceptions.h:69] Line: ../../velox/common/memory/ByteStream.cpp:100, Function:seekp, Expression:  Seeking past end of ByteInputStream: 1002667, Source: RUNTIME, ErrorCode: INVALID_STATE
I20231204 14:45:19.291966 49374 Task.cpp:1746] Terminating task test_cursor 7911 with state Failed after running for 127 ms.
I20231204 14:45:19.294098 49374 Task.cpp:1062] All drivers (1) finished for task test_cursor 7911 after running for 129 ms.
I20231204 14:45:19.295768 30452 TempDirectoryPath.cpp:31] TempDirectoryPath:: removing all files from/tmp/velox_test_je5c16
I20231204 14:45:19.946060 30452 AggregationFuzzer.cpp:646] Persisted aggregation plans to /tmp/aggregate_fuzzer_repro/velox_aggregationVerifier_lKt8sQ/plan_nodes
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Seeking past end of ByteInputStream: 1002667
Retriable: False
Function: seekp
File: ../../velox/common/memory/ByteStream.cpp
Line: 100
Stack trace:
# 0  _ZN8facebook5velox7process10StackTraceC1Ei
# 1  _ZN8facebook5velox14VeloxException5State4makeIZNS1_C4EPKcmS5_St17basic_string_viewIcSt11char_traitsIcEES9_S9_S9_bNS1_4TypeES9_EUlRT_E_EESt10shared_ptrIKS2_ESA_SB_
# 2  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
# 3  _ZN8facebook5velox17VeloxRuntimeErrorC2EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bS7_
# 4  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_
# 5  _ZN8facebook5velox15ByteInputStream5seekpESt4fposI11__mbstate_tE
# 6  _ZN8facebook5velox10serializer6presto12_GLOBAL__N_114readTopColumnsERNS0_15ByteInputStreamERKSt10shared_ptrIKNS0_7RowTypeEEPNS0_6memory10MemoryPoolERKS6_INS0_9RowVectorEEib
# 7  _ZN8facebook5velox10serializer6presto17PrestoVectorSerde11deserializeEPNS0_15ByteInputStreamEPNS0_6memory10MemoryPoolESt10shared_ptrIKNS0_7RowTypeEEPS9_INS0_9RowVectorEEiPKNS0_11VectorSerde7OptionsE
# 8  _ZN8facebook5velox10serializer6presto17PrestoVectorSerde11deserializeEPNS0_15ByteInputStreamEPNS0_6memory10MemoryPoolESt10shared_ptrIKNS0_7RowTypeEEPS9_INS0_9RowVectorEEPKNS0_11VectorSerde7OptionsE
# 9  _ZN8facebook5velox17VectorStreamGroup4readEPNS0_15ByteInputStreamEPNS0_6memory10MemoryPoolESt10shared_ptrIKNS0_7RowTypeEEPS7_INS0_9RowVectorEEPKNS0_11VectorSerde7OptionsE
# 10 _ZN8facebook5velox4exec13SpillReadFile9nextBatchERSt10shared_ptrINS0_9RowVectorEE
# 11 _ZN8facebook5velox4exec20FileSpillMergeStream9nextBatchEv
# 12 _ZN8facebook5velox4exec16SpillMergeStream12setNextBatchEv
# 13 _ZN8facebook5velox4exec16SpillMergeStream3popEv
# 14 _ZN8facebook5velox4exec11GroupingSet23mergeNextWithAggregatesEiiRKSt10shared_ptrINS0_9RowVectorEE
# 15 _ZN8facebook5velox4exec11GroupingSet9mergeNextEiiRKSt10shared_ptrINS0_9RowVectorEE
# 16 _ZN8facebook5velox4exec11GroupingSet18getOutputWithSpillEiiRKSt10shared_ptrINS0_9RowVectorEE
# 17 _ZN8facebook5velox4exec11GroupingSet9getOutputEiiRNS1_20RowContainerIteratorERSt10shared_ptrINS0_9RowVectorEE
# 18 _ZN8facebook5velox4exec15HashAggregation9getOutputEv
# 19 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
# 20 _ZN8facebook5velox4exec6Driver3runESt10shared_ptrIS2_E
# 21 _ZZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS2_EENKUlvE_clEv
# 22 _ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallIZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS9_EEUlvE_EEvRNS1_4DataE
# 23 _ZN5folly6detail8function14FunctionTraitsIFvvEEclEv
# 24 _ZN5folly18ThreadPoolExecutor7runTaskERKSt10shared_ptrINS0_6ThreadEEONS0_4TaskE
# 25 _ZN5folly21CPUThreadPoolExecutor9threadRunESt10shared_ptrINS_18ThreadPoolExecutor6ThreadEE
# 26 _ZSt13__invoke_implIvRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEERPS1_JRS4_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_
# 27 _ZSt8__invokeIRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEJRPS1_RS4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSC_DpOSD_
# 28 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EE6__callIvJEJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
# 29 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EEclIJEvEET0_DpOT_
# 30 _ZN5folly6detail8function14FunctionTraitsIFvvEE9callSmallISt5_BindIFMNS_18ThreadPoolExecutorEFvSt10shared_ptrINS7_6ThreadEEEPS7_SA_EEEEvRNS1_4DataE
# 31 _ZN5folly6detail8function14FunctionTraitsIFvvEEclEv
# 32 _ZZN5folly18NamedThreadFactory9newThreadEONS_8FunctionIFvvEEEENUlvE_clEv
# 33 _ZSt13__invoke_implIvZN5folly18NamedThreadFactory9newThreadEONS0_8FunctionIFvvEEEEUlvE_JEET_St14__invoke_otherOT0_DpOT1_
# 34 _ZSt8__invokeIZN5folly18NamedThreadFactory9newThreadEONS0_8FunctionIFvvEEEEUlvE_JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS8_DpOS9_
# 35 _ZNSt6thread8_InvokerISt5tupleIJZN5folly18NamedThreadFactory9newThreadEONS2_8FunctionIFvvEEEEUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE
# 36 _ZNSt6thread8_InvokerISt5tupleIJZN5folly18NamedThreadFactory9newThreadEONS2_8FunctionIFvvEEEEUlvE_EEEclEv
# 37 _ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5folly18NamedThreadFactory9newThreadEONS3_8FunctionIFvvEEEEUlvE_EEEEE6_M_runEv
# 38 0x0000000000000000
# 39 start_thread
# 40 clone

*** Aborted at 1701701120 (Unix time, try 'date -d @1701701120') ***
*** Signal 6 (SIGABRT) (0x76f4) received by PID 30452 (pthread TID 0x7fde7f304640) (linux TID 30452) (maybe from PID 30452, UID 0) (code: -6), stack trace: ***
(error retrieving stack trace)
/bin/bash: line 9: 30452 Aborted                 (core dumped) _build/debug/velox/exec/tests/velox_aggregation_fuzzer_test --seed ${RANDOM} --duration_sec 1800 --logtostderr=1 --minloglevel=0 --repro_persist_path=/tmp/aggregate_fuzzer_repro

@oerling oerling force-pushed the exc-scatter-pr branch 2 times, most recently from 5a1ee86 to 8a3382a Compare December 10, 2023 14:48
@facebook-github-bot
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@oerling oerling force-pushed the exc-scatter-pr branch 2 times, most recently from fd74a8d to 98ce76a Compare December 13, 2023 15:55
using StructNullsMap =
folly::F14FastMap<int64_t, std::pair<std::vector<uint64_t>, int32_t>>;

auto& structNullsMap() {

nit:

std::unique_ptr<StructNullsMap> structNullsMap() {
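
A sketch of what an explicitly typed accessor could look like. The body of `structNullsMap()` is not visible in this excerpt, so everything inside the function is an assumption: here it hands out a reference to a thread-local owner, and the `&` in the return type is carried over from the original `auto&`. `std::unordered_map` stands in for `folly::F14FastMap` to keep the snippet free of external dependencies.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for the folly::F14FastMap alias shown in the diff.
using StructNullsMap =
    std::unordered_map<int64_t, std::pair<std::vector<uint64_t>, int32_t>>;

// Assumed body: a thread-local owner returned by reference. Spelling out
// the return type tells the reader what the accessor yields without
// having to look at the function body.
inline std::unique_ptr<StructNullsMap>& structNullsMap() {
  thread_local std::unique_ptr<StructNullsMap> map;
  return map;
}

// Usage sketch with made-up keys and values.
inline void structNullsMapExample() {
  auto& holder = structNullsMap();
  holder = std::make_unique<StructNullsMap>();
  holder->emplace(7, std::make_pair(std::vector<uint64_t>{0x5}, int32_t{3}));
  assert(structNullsMap()->at(7).second == 3);
  holder.reset();
}
```

Returning the `std::unique_ptr` by value instead would transfer ownership to the caller on every call, so whether the reference is kept depends on how the real function manages the map's lifetime.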

@facebook-github-bot
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@oerling oerling force-pushed the exc-scatter-pr branch 2 times, most recently from 690cd06 to 33d55e9 Compare December 18, 2023 13:31
@facebook-github-bot
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
@oerling has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
@oerling merged this pull request in aaaa079.

@facebook-github-bot
This pull request has been reverted by ef47305.

Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged Reverted

4 participants