Add empty columns to chunked node group if needed during COPY #4882

Merged
royi-luo merged 5 commits into master from royi/chunked-node-group-subcolumn
Feb 14, 2025

Conversation

royi-luo
Collaborator

@royi-luo royi-luo commented Feb 10, 2025

Description

Follow-up to #4786

After a column is dropped (and before the next checkpoint), COPY may append newly flushed chunked groups directly to a node group. In that case the newly inserted chunked node group may have fewer columns than the parent node group, which causes the next checkpoint to fail.

This PR adds empty columns to the newly inserted chunked node group so that its column count matches the expected one.
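
The padding step can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from this PR: `ColumnChunk` and `ChunkedGroup` here are simplified stand-ins, `padMissingColumns` and `paddingCapacity` are hypothetical names, and column types are plain strings. It assumes that missing columns show up as absent or null chunk slots, and it defaults the padding capacity to 0, in line with the review suggestion further down in this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a column chunk: a logical type name plus a value buffer.
struct ColumnChunk {
    std::string type;
    std::vector<int64_t> values;

    ColumnChunk(std::string type, std::size_t capacity) : type(std::move(type)) {
        values.reserve(capacity); // a capacity of 0 requests no storage up front
    }
};

// Hypothetical stand-in for a chunked node group: one chunk slot per column.
struct ChunkedGroup {
    std::vector<std::unique_ptr<ColumnChunk>> chunks;
};

// Pads `group` so that it holds exactly one chunk per entry in `columnTypes`.
// Slots that are missing or null receive an empty chunk of the matching type.
// Returns the number of padding chunks that were added.
std::size_t padMissingColumns(ChunkedGroup& group, const std::vector<std::string>& columnTypes,
    std::size_t paddingCapacity = 0) {
    // Assumption: the group never has more columns than the parent expects.
    assert(group.chunks.size() <= columnTypes.size());
    group.chunks.resize(columnTypes.size()); // new slots are null
    std::size_t added = 0;
    for (std::size_t i = 0; i < columnTypes.size(); ++i) {
        if (group.chunks[i] == nullptr) {
            group.chunks[i] = std::make_unique<ColumnChunk>(columnTypes[i], paddingCapacity);
            ++added;
        }
    }
    return added;
}
```

Calling `padMissingColumns` a second time on the same group adds nothing, so in this sketch the step is safe to repeat before a group is appended.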

It also fixes another bug where warning data was being flushed to disk during rel batch insert.

Contributor agreement

@royi-luo royi-luo self-assigned this Feb 10, 2025
@royi-luo royi-luo force-pushed the royi/chunked-node-group-subcolumn branch from c2c35b9 to 043ef90 February 10, 2025 19:33

Benchmark Result

Master commit hash: 3b37b0845e07fa51e47f9dfecf4f2f87ee399e42
Branch commit hash: e61caaae8d3af08f2c631f1a9c4066502dc44b21

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 729.03 | 731.20 | -2.17 (-0.30%) |
| aggregation | q28 | 6364.08 | 6339.96 | 24.12 (0.38%) |
| filter | q14 | 125.69 | 118.57 | 7.12 (6.00%) |
| filter | q15 | 123.59 | 115.52 | 8.07 (6.99%) |
| filter | q16 | 306.91 | 309.00 | -2.09 (-0.68%) |
| filter | q17 | 447.01 | 444.76 | 2.25 (0.51%) |
| filter | q18 | 1904.16 | 1911.24 | -7.08 (-0.37%) |
| filter | zonemap-node | 88.39 | 83.03 | 5.37 (6.47%) |
| filter | zonemap-node-lhs-cast | 88.45 | 81.52 | 6.93 (8.51%) |
| filter | zonemap-node-null | 88.36 | 80.69 | 7.67 (9.50%) |
| filter | zonemap-rel | 5602.41 | 5471.28 | 131.13 (2.40%) |
| fixed_size_expr_evaluator | q07 | 570.89 | 563.42 | 7.47 (1.33%) |
| fixed_size_expr_evaluator | q08 | 800.26 | 795.32 | 4.94 (0.62%) |
| fixed_size_expr_evaluator | q09 | 801.80 | 796.53 | 5.27 (0.66%) |
| fixed_size_expr_evaluator | q10 | 237.39 | 228.96 | 8.43 (3.68%) |
| fixed_size_expr_evaluator | q11 | 230.78 | 222.04 | 8.74 (3.94%) |
| fixed_size_expr_evaluator | q12 | 226.87 | 217.14 | 9.73 (4.48%) |
| fixed_size_expr_evaluator | q13 | 1448.88 | 1447.51 | 1.37 (0.09%) |
| fixed_size_seq_scan | q23 | 110.44 | 105.86 | 4.58 (4.33%) |
| join | q29 | 785.68 | 715.70 | 69.98 (9.78%) |
| join | q30 | 10412.24 | 9894.64 | 517.60 (5.23%) |
| join | q31 | 6.19 | 7.14 | -0.95 (-13.31%) |
| join | SelectiveTwoHopJoin | 57.29 | 55.23 | 2.06 (3.73%) |
| ldbc_snb_ic | q35 | 2582.53 | 2689.46 | -106.93 (-3.98%) |
| ldbc_snb_ic | q36 | 472.66 | 474.01 | -1.35 (-0.28%) |
| ldbc_snb_is | q32 | 5.64 | 5.73 | -0.09 (-1.62%) |
| ldbc_snb_is | q33 | 15.30 | 15.37 | -0.07 (-0.43%) |
| ldbc_snb_is | q34 | 1.23 | 1.18 | 0.05 (4.29%) |
| multi-rel | multi-rel-large-scan | 1356.41 | 1397.44 | -41.03 (-2.94%) |
| multi-rel | multi-rel-lookup | 32.20 | 31.59 | 0.61 (1.92%) |
| multi-rel | multi-rel-small-scan | 95.10 | 55.71 | 39.39 (70.70%) |
| order_by | q25 | 129.35 | 123.06 | 6.29 (5.11%) |
| order_by | q26 | 451.52 | 440.62 | 10.90 (2.47%) |
| order_by | q27 | 1416.37 | 1437.25 | -20.89 (-1.45%) |
| recursive_join | recursive-join-bidirection | 298.41 | 315.22 | -16.81 (-5.33%) |
| recursive_join | recursive-join-dense | 6409.32 | 7359.64 | -950.32 (-12.91%) |
| recursive_join | recursive-join-path | 24452.37 | 24295.00 | 157.37 (0.65%) |
| recursive_join | recursive-join-sparse | 1057.97 | 1047.62 | 10.36 (0.99%) |
| recursive_join | recursive-join-trail | 6915.53 | 7333.16 | -417.63 (-5.70%) |
| scan_after_filter | q01 | 173.53 | 168.37 | 5.16 (3.06%) |
| scan_after_filter | q02 | 159.18 | 151.14 | 8.04 (5.32%) |
| shortest_path_ldbc100 | q37 | 80.58 | 100.48 | -19.90 (-19.80%) |
| shortest_path_ldbc100 | q38 | 285.72 | 387.24 | -101.52 (-26.22%) |
| shortest_path_ldbc100 | q39 | 63.53 | 64.98 | -1.45 (-2.23%) |
| shortest_path_ldbc100 | q40 | 393.78 | 462.65 | -68.87 (-14.89%) |
| var_size_expr_evaluator | q03 | 2074.29 | 2093.01 | -18.72 (-0.89%) |
| var_size_expr_evaluator | q04 | 2329.20 | 2232.23 | 96.97 (4.34%) |
| var_size_expr_evaluator | q05 | 2626.55 | 2663.57 | -37.02 (-1.39%) |
| var_size_expr_evaluator | q06 | 1337.36 | 1321.14 | 16.22 (1.23%) |
| var_size_seq_scan | q19 | 1449.74 | 1443.32 | 6.43 (0.45%) |
| var_size_seq_scan | q20 | 2307.91 | 2341.89 | -33.99 (-1.45%) |
| var_size_seq_scan | q21 | 2268.95 | 2302.94 | -33.98 (-1.48%) |
| var_size_seq_scan | q22 | 127.19 | 126.92 | 0.27 (0.22%) |
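
The Diff column in the table above is not defined on this page. Judging from the rows themselves, it appears to be the commit time minus the master time, with the percentage taken relative to master. The sketch below encodes that reading; `BenchDiff` and `computeDiff` are hypothetical names, not part of the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Absolute and relative difference between two mean timings.
struct BenchDiff {
    double absoluteMs; // commit - master, in milliseconds
    double percent;    // absolute difference as a percentage of master
};

// Assumed formula: diff = commit - master; percent = 100 * diff / master.
BenchDiff computeDiff(double commitMs, double masterMs) {
    assert(masterMs > 0.0); // a zero baseline would make the percentage undefined
    const double diff = commitMs - masterMs;
    return BenchDiff{diff, 100.0 * diff / masterMs};
}
```

For aggregation q24, `computeDiff(729.03, 731.20)` gives about -2.17 ms and -0.30%, which matches the row. Some rows differ in the last digit when recomputed this way, presumably because the displayed means are already rounded.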

codecov bot commented Feb 10, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 86.51%. Comparing base (a88d57e) to head (eba79fa).
Report is 3 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #4882   +/-   ##
=======================================
  Coverage   86.50%   86.51%           
=======================================
  Files        1403     1403           
  Lines       60629    60661   +32     
  Branches     7460     7461    +1     
=======================================
+ Hits        52447    52480   +33     
+ Misses       8013     8012    -1     
  Partials      169      169           

@royi-luo royi-luo marked this pull request as ready for review February 10, 2025 23:10
@royi-luo royi-luo requested a review from ray6080 February 10, 2025 23:10

Benchmark Result

Master commit hash: 3b37b0845e07fa51e47f9dfecf4f2f87ee399e42
Branch commit hash: ee0efb71faf4a263ba34d85386942135bfdb8dc5

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 718.32 | 731.20 | -12.88 (-1.76%) |
| aggregation | q28 | 6358.53 | 6339.96 | 18.58 (0.29%) |
| filter | q14 | 118.87 | 118.57 | 0.30 (0.25%) |
| filter | q15 | 115.64 | 115.52 | 0.12 (0.10%) |
| filter | q16 | 295.01 | 309.00 | -13.99 (-4.53%) |
| filter | q17 | 438.68 | 444.76 | -6.08 (-1.37%) |
| filter | q18 | 1954.29 | 1911.24 | 43.05 (2.25%) |
| filter | zonemap-node | 80.75 | 83.03 | -2.27 (-2.74%) |
| filter | zonemap-node-lhs-cast | 80.64 | 81.52 | -0.88 (-1.08%) |
| filter | zonemap-node-null | 80.27 | 80.69 | -0.43 (-0.53%) |
| filter | zonemap-rel | 5395.61 | 5471.28 | -75.67 (-1.38%) |
| fixed_size_expr_evaluator | q07 | 562.76 | 563.42 | -0.67 (-0.12%) |
| fixed_size_expr_evaluator | q08 | 798.71 | 795.32 | 3.39 (0.43%) |
| fixed_size_expr_evaluator | q09 | 792.99 | 796.53 | -3.54 (-0.44%) |
| fixed_size_expr_evaluator | q10 | 228.07 | 228.96 | -0.89 (-0.39%) |
| fixed_size_expr_evaluator | q11 | 220.70 | 222.04 | -1.34 (-0.60%) |
| fixed_size_expr_evaluator | q12 | 218.04 | 217.14 | 0.90 (0.41%) |
| fixed_size_expr_evaluator | q13 | 1458.42 | 1447.51 | 10.91 (0.75%) |
| fixed_size_seq_scan | q23 | 103.99 | 105.86 | -1.87 (-1.77%) |
| join | q29 | 764.09 | 715.70 | 48.40 (6.76%) |
| join | q30 | 9931.40 | 9894.64 | 36.76 (0.37%) |
| join | q31 | 10.38 | 7.14 | 3.23 (45.27%) |
| join | SelectiveTwoHopJoin | 58.20 | 55.23 | 2.96 (5.37%) |
| ldbc_snb_ic | q35 | 2697.17 | 2689.46 | 7.71 (0.29%) |
| ldbc_snb_ic | q36 | 496.92 | 474.01 | 22.91 (4.83%) |
| ldbc_snb_is | q32 | 3.43 | 5.73 | -2.30 (-40.12%) |
| ldbc_snb_is | q33 | 13.92 | 15.37 | -1.45 (-9.43%) |
| ldbc_snb_is | q34 | 1.17 | 1.18 | -0.01 (-0.45%) |
| multi-rel | multi-rel-large-scan | 1305.52 | 1397.44 | -91.91 (-6.58%) |
| multi-rel | multi-rel-lookup | 9.56 | 31.59 | -22.04 (-69.75%) |
| multi-rel | multi-rel-small-scan | 87.76 | 55.71 | 32.05 (57.53%) |
| order_by | q25 | 125.93 | 123.06 | 2.87 (2.34%) |
| order_by | q26 | 448.32 | 440.62 | 7.70 (1.75%) |
| order_by | q27 | 1384.45 | 1437.25 | -52.81 (-3.67%) |
| recursive_join | recursive-join-bidirection | 274.53 | 315.22 | -40.69 (-12.91%) |
| recursive_join | recursive-join-dense | 7413.52 | 7359.64 | 53.87 (0.73%) |
| recursive_join | recursive-join-path | 24377.28 | 24295.00 | 82.28 (0.34%) |
| recursive_join | recursive-join-sparse | 1066.83 | 1047.62 | 19.22 (1.83%) |
| recursive_join | recursive-join-trail | 7395.95 | 7333.16 | 62.79 (0.86%) |
| scan_after_filter | q01 | 169.51 | 168.37 | 1.14 (0.68%) |
| scan_after_filter | q02 | 151.95 | 151.14 | 0.82 (0.54%) |
| shortest_path_ldbc100 | q37 | 89.47 | 100.48 | -11.01 (-10.96%) |
| shortest_path_ldbc100 | q38 | 397.45 | 387.24 | 10.21 (2.64%) |
| shortest_path_ldbc100 | q39 | 67.12 | 64.98 | 2.14 (3.29%) |
| shortest_path_ldbc100 | q40 | 461.37 | 462.65 | -1.28 (-0.28%) |
| var_size_expr_evaluator | q03 | 2046.11 | 2093.01 | -46.90 (-2.24%) |
| var_size_expr_evaluator | q04 | 2207.69 | 2232.23 | -24.54 (-1.10%) |
| var_size_expr_evaluator | q05 | 2609.40 | 2663.57 | -54.17 (-2.03%) |
| var_size_expr_evaluator | q06 | 1316.18 | 1321.14 | -4.96 (-0.38%) |
| var_size_seq_scan | q19 | 1432.64 | 1443.32 | -10.68 (-0.74%) |
| var_size_seq_scan | q20 | 2301.54 | 2341.89 | -40.35 (-1.72%) |
| var_size_seq_scan | q21 | 2254.39 | 2302.94 | -48.55 (-2.11%) |
| var_size_seq_scan | q22 | 123.20 | 126.92 | -3.72 (-2.93%) |

@royi-luo royi-luo force-pushed the royi/chunked-node-group-subcolumn branch 2 times, most recently from 6ab993c to eb895fb February 12, 2025 20:35

Benchmark Result

Master commit hash: dfabf90eab17ec0dc0f87d18464152412e1fd8ee
Branch commit hash: 3dd4284a9dcb7e3eb02b60bd7792216af9f56bb6

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 726.24 | 736.85 | -10.61 (-1.44%) |
| aggregation | q28 | 6423.61 | 6358.04 | 65.56 (1.03%) |
| filter | q14 | 127.93 | 128.29 | -0.36 (-0.28%) |
| filter | q15 | 123.50 | 126.50 | -3.00 (-2.37%) |
| filter | q16 | 305.39 | 306.18 | -0.79 (-0.26%) |
| filter | q17 | 446.23 | 446.79 | -0.56 (-0.13%) |
| filter | q18 | 1973.55 | 1922.93 | 50.62 (2.63%) |
| filter | zonemap-node | 88.52 | 88.87 | -0.35 (-0.40%) |
| filter | zonemap-node-lhs-cast | 90.59 | 90.75 | -0.16 (-0.18%) |
| filter | zonemap-node-null | 90.78 | 90.66 | 0.12 (0.13%) |
| filter | zonemap-rel | 5583.24 | 5394.15 | 189.09 (3.51%) |
| fixed_size_expr_evaluator | q07 | 572.77 | 581.95 | -9.18 (-1.58%) |
| fixed_size_expr_evaluator | q08 | 824.97 | 801.57 | 23.40 (2.92%) |
| fixed_size_expr_evaluator | q09 | 804.55 | 803.44 | 1.11 (0.14%) |
| fixed_size_expr_evaluator | q10 | 238.61 | 236.67 | 1.94 (0.82%) |
| fixed_size_expr_evaluator | q11 | 230.70 | 229.64 | 1.06 (0.46%) |
| fixed_size_expr_evaluator | q12 | 226.50 | 231.70 | -5.20 (-2.24%) |
| fixed_size_expr_evaluator | q13 | 1475.22 | 1465.25 | 9.97 (0.68%) |
| fixed_size_seq_scan | q23 | 110.41 | 111.76 | -1.35 (-1.21%) |
| join | q29 | 722.11 | 703.37 | 18.74 (2.66%) |
| join | q30 | 10287.52 | 11083.57 | -796.05 (-7.18%) |
| join | q31 | 9.75 | 9.98 | -0.22 (-2.25%) |
| join | SelectiveTwoHopJoin | 53.23 | 59.99 | -6.76 (-11.27%) |
| ldbc_snb_ic | q35 | 2604.73 | 2607.02 | -2.28 (-0.09%) |
| ldbc_snb_ic | q36 | 471.67 | 485.56 | -13.89 (-2.86%) |
| ldbc_snb_is | q32 | 6.06 | 4.47 | 1.59 (35.48%) |
| ldbc_snb_is | q33 | 15.48 | 14.83 | 0.65 (4.38%) |
| ldbc_snb_is | q34 | 1.26 | 1.25 | 0.01 (0.83%) |
| multi-rel | multi-rel-large-scan | 1319.88 | 1392.59 | -72.71 (-5.22%) |
| multi-rel | multi-rel-lookup | 31.64 | 32.54 | -0.90 (-2.76%) |
| multi-rel | multi-rel-small-scan | 71.23 | 102.16 | -30.93 (-30.28%) |
| order_by | q25 | 134.23 | 131.92 | 2.31 (1.75%) |
| order_by | q26 | 456.65 | 452.45 | 4.20 (0.93%) |
| order_by | q27 | 1399.75 | 1420.37 | -20.62 (-1.45%) |
| recursive_join | recursive-join-bidirection | 288.56 | 296.22 | -7.66 (-2.59%) |
| recursive_join | recursive-join-dense | 7380.09 | 7444.01 | -63.92 (-0.86%) |
| recursive_join | recursive-join-path | 24337.68 | 24117.33 | 220.35 (0.91%) |
| recursive_join | recursive-join-sparse | 1060.58 | 1057.45 | 3.13 (0.30%) |
| recursive_join | recursive-join-trail | 7368.17 | 7418.08 | -49.91 (-0.67%) |
| scan_after_filter | q01 | 175.95 | 175.01 | 0.94 (0.54%) |
| scan_after_filter | q02 | 161.15 | 159.85 | 1.29 (0.81%) |
| shortest_path_ldbc100 | q37 | 94.90 | 97.65 | -2.75 (-2.81%) |
| shortest_path_ldbc100 | q38 | 405.30 | 377.28 | 28.02 (7.43%) |
| shortest_path_ldbc100 | q39 | 62.79 | 64.85 | -2.05 (-3.17%) |
| shortest_path_ldbc100 | q40 | 458.84 | 464.15 | -5.31 (-1.14%) |
| var_size_expr_evaluator | q03 | 2114.96 | 2149.45 | -34.48 (-1.60%) |
| var_size_expr_evaluator | q04 | 2250.32 | 2203.44 | 46.87 (2.13%) |
| var_size_expr_evaluator | q05 | 2601.98 | 2620.11 | -18.13 (-0.69%) |
| var_size_expr_evaluator | q06 | 1356.99 | 1345.39 | 11.60 (0.86%) |
| var_size_seq_scan | q19 | 1465.68 | 1459.82 | 5.86 (0.40%) |
| var_size_seq_scan | q20 | 2641.64 | 2352.12 | 289.53 (12.31%) |
| var_size_seq_scan | q21 | 2287.95 | 2311.06 | -23.11 (-1.00%) |
| var_size_seq_scan | q22 | 127.97 | 128.13 | -0.16 (-0.12%) |

@royi-luo royi-luo changed the title from "Allow chunked node group to store a subset of the parent node group's columns" to "Add empty columns to chunked node group if needed during COPY" Feb 13, 2025
Contributor

@ray6080 ray6080 left a comment

Looks good! Thanks!

src/include/storage/store/chunked_node_group.h (outdated, resolved)
src/include/storage/store/csr_chunked_node_group.h (outdated, resolved)

for (column_id_t i = 0; i < columnTypes.size(); ++i) {
if (chunks[i] == nullptr) {
chunks[i] = std::make_unique<ColumnChunk>(mm, columnTypes[i].copy(), capacity,
Contributor

Can we not set capacity to 0 here for padding chunks?

Collaborator Author

updated

@royi-luo royi-luo force-pushed the royi/chunked-node-group-subcolumn branch from 78c925e to eba79fa February 14, 2025 15:20

Benchmark Result

Master commit hash: a88d57e3d3a1323feb88a31f946171cd17f20ac9
Branch commit hash: 996fc29ca678739aabec413d1299ab9b9d327114

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 724.99 | 724.38 | 0.61 (0.08%) |
| aggregation | q28 | 6400.13 | 6365.18 | 34.95 (0.55%) |
| filter | q14 | 127.90 | 126.15 | 1.75 (1.39%) |
| filter | q15 | 127.21 | 125.06 | 2.15 (1.72%) |
| filter | q16 | 306.85 | 301.31 | 5.54 (1.84%) |
| filter | q17 | 449.18 | 446.32 | 2.86 (0.64%) |
| filter | q18 | 1999.29 | 1939.86 | 59.43 (3.06%) |
| filter | zonemap-node | 88.88 | 90.69 | -1.81 (-1.99%) |
| filter | zonemap-node-lhs-cast | 90.30 | 88.98 | 1.32 (1.48%) |
| filter | zonemap-node-null | 90.76 | 88.59 | 2.16 (2.44%) |
| filter | zonemap-rel | 5566.60 | 5422.24 | 144.35 (2.66%) |
| fixed_size_expr_evaluator | q07 | 577.63 | 578.36 | -0.72 (-0.13%) |
| fixed_size_expr_evaluator | q08 | 807.60 | 807.96 | -0.37 (-0.05%) |
| fixed_size_expr_evaluator | q09 | 808.04 | 811.59 | -3.55 (-0.44%) |
| fixed_size_expr_evaluator | q10 | 244.31 | 245.64 | -1.34 (-0.54%) |
| fixed_size_expr_evaluator | q11 | 236.47 | 238.22 | -1.75 (-0.73%) |
| fixed_size_expr_evaluator | q12 | 233.78 | 236.79 | -3.01 (-1.27%) |
| fixed_size_expr_evaluator | q13 | 1458.63 | 1456.47 | 2.16 (0.15%) |
| fixed_size_seq_scan | q23 | 116.63 | 117.14 | -0.51 (-0.43%) |
| join | q29 | 775.24 | 732.30 | 42.95 (5.86%) |
| join | q30 | 10759.72 | 10505.58 | 254.14 (2.42%) |
| join | q31 | 7.91 | 7.93 | -0.02 (-0.28%) |
| join | SelectiveTwoHopJoin | 58.32 | 57.26 | 1.05 (1.84%) |
| ldbc_snb_ic | q35 | 2629.21 | 2603.11 | 26.10 (1.00%) |
| ldbc_snb_ic | q36 | 478.50 | 482.67 | -4.17 (-0.86%) |
| ldbc_snb_is | q32 | 5.42 | 5.32 | 0.10 (1.94%) |
| ldbc_snb_is | q33 | 15.15 | 17.42 | -2.27 (-13.05%) |
| ldbc_snb_is | q34 | 1.24 | 1.10 | 0.14 (12.72%) |
| multi-rel | multi-rel-large-scan | 1356.06 | 1642.31 | -286.25 (-17.43%) |
| multi-rel | multi-rel-lookup | 33.23 | 21.48 | 11.75 (54.69%) |
| multi-rel | multi-rel-small-scan | 105.73 | 92.81 | 12.93 (13.93%) |
| order_by | q25 | 136.22 | 131.07 | 5.14 (3.93%) |
| order_by | q26 | 455.98 | 454.88 | 1.10 (0.24%) |
| order_by | q27 | 1397.35 | 1406.59 | -9.24 (-0.66%) |
| recursive_join | recursive-join-bidirection | 284.28 | 310.91 | -26.63 (-8.57%) |
| recursive_join | recursive-join-dense | 7390.74 | 7376.56 | 14.17 (0.19%) |
| recursive_join | recursive-join-path | 24078.63 | 24322.43 | -243.80 (-1.00%) |
| recursive_join | recursive-join-sparse | 1053.21 | 1047.53 | 5.69 (0.54%) |
| recursive_join | recursive-join-trail | 7401.62 | 7353.06 | 48.57 (0.66%) |
| scan_after_filter | q01 | 168.44 | 170.35 | -1.91 (-1.12%) |
| scan_after_filter | q02 | 157.87 | 160.39 | -2.52 (-1.57%) |
| shortest_path_ldbc100 | q37 | 97.06 | 88.63 | 8.43 (9.51%) |
| shortest_path_ldbc100 | q38 | 368.66 | 371.13 | -2.47 (-0.67%) |
| shortest_path_ldbc100 | q39 | 61.67 | 66.35 | -4.67 (-7.05%) |
| shortest_path_ldbc100 | q40 | 415.37 | 426.33 | -10.95 (-2.57%) |
| var_size_expr_evaluator | q03 | 2098.50 | 2082.96 | 15.54 (0.75%) |
| var_size_expr_evaluator | q04 | 2276.88 | 2225.81 | 51.06 (2.29%) |
| var_size_expr_evaluator | q05 | 2624.40 | 2628.58 | -4.18 (-0.16%) |
| var_size_expr_evaluator | q06 | 1325.26 | 1334.23 | -8.98 (-0.67%) |
| var_size_seq_scan | q19 | 1468.95 | 1464.03 | 4.92 (0.34%) |
| var_size_seq_scan | q20 | 2480.72 | 2346.12 | 134.60 (5.74%) |
| var_size_seq_scan | q21 | 2304.52 | 2278.22 | 26.31 (1.15%) |
| var_size_seq_scan | q22 | 127.00 | 127.74 | -0.74 (-0.58%) |

@royi-luo royi-luo merged commit 8a29797 into master Feb 14, 2025
24 of 27 checks passed
@royi-luo royi-luo deleted the royi/chunked-node-group-subcolumn branch February 14, 2025 19:31