GH-43040: [C++] Reduce the recursion of many-join test #43042

Merged: 3 commits, Jun 26, 2024

zanmato1984
Contributor

@zanmato1984 zanmato1984 commented Jun 25, 2024

Rationale for this change

The current recursion depth of 64 in the many-join test is too aggressive: it can overflow the C program stack on Alpine or Emscripten, causing failures like #43040.

What changes are included in this PR?

Reduce the recursion depth to 16, which is still enough to serve the purpose of #41335, the change that introduced this test.

Are these changes tested?

The change is to the test itself.

Are there any user-facing changes?

None.

@zanmato1984 zanmato1984 requested a review from westonpace as a code owner June 25, 2024 11:35
@zanmato1984 zanmato1984 marked this pull request as draft June 25, 2024 11:35

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp

Revision: 7251176

Submitted crossbow builds: ursacomputing/crossbow @ actions-5714627e25

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp

Revision: 3e6acd8

Submitted crossbow builds: ursacomputing/crossbow @ actions-d143fbd7c0

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@zanmato1984
Contributor Author

Revision: 7251176

Ran with join recursion = 16.

@zanmato1984
Contributor Author

Revision: 3e6acd8

Ran with join recursion = 72.

@zanmato1984
Contributor Author

@github-actions crossbow submit -g cpp

Revision: 6ccaa6c

Submitted crossbow builds: ursacomputing/crossbow @ actions-368846a25e

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

@zanmato1984 zanmato1984 changed the title from "EXPERIMENT: [C++] Reduce the recursion of many join test" to "GH-43040: [C++] Reduce the recursion of many-join test" Jun 25, 2024

⚠️ GitHub issue #43040 has been automatically assigned in GitHub to PR creator.

@zanmato1984
Contributor Author

Revision: 6ccaa6c

Ran with join recursion = 16 again.

@zanmato1984 zanmato1984 marked this pull request as ready for review June 25, 2024 12:20
@zanmato1984
Contributor Author

Hi @pitrou @felipecrv, could you take a look? This will fix two long-failing jobs.

cc @jorisvandenbossche

@@ -3220,7 +3220,7 @@ TEST(HashJoin, ManyJoins) {
   // stack), which is essentially the recursive usage of the temp vector stack.

   // A fair number of joins to guarantee temp vector stack overflow before GH-41335.
-  const int num_joins = 64;
+  const int num_joins = 16;
Contributor Author

@zanmato1984 zanmato1984 Jun 25, 2024

To make sure this more conservative value still serves the same protective purpose, I verified locally that, with commit 6c386da reverted, the test fails (with "temp stack overflow") at 16 joins (the minimum number of joins needed to trigger the failure is actually 14).
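
As a rough illustration of why a fixed-size temp stack used recursively overflows at some join count, here is a toy model. It is a sketch only: `ToyTempStack`, `RunNestedJoins`, `MinJoinsToOverflow`, and all sizes are hypothetical and are not Arrow's actual temp vector stack or its real capacities.

```cpp
#include <cstddef>

// Hypothetical fixed-capacity scratch allocator; NOT Arrow's temp vector stack.
class ToyTempStack {
 public:
  explicit ToyTempStack(std::size_t capacity) : capacity_(capacity) {}

  // Reserve `bytes`; returns false instead of aborting when capacity is exceeded.
  bool Push(std::size_t bytes) {
    if (bytes > capacity_ - used_) return false;
    used_ += bytes;
    return true;
  }

  // Release the most recent reservation of `bytes`.
  void Pop(std::size_t bytes) { used_ -= bytes; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
};

// Each nested join keeps its scratch space reserved while the joins below it
// run, so peak usage grows linearly with the number of joins.
bool RunNestedJoins(ToyTempStack& stack, int remaining, std::size_t bytes_per_join) {
  if (remaining == 0) return true;
  if (!stack.Push(bytes_per_join)) return false;  // models "temp stack overflow"
  const bool ok = RunNestedJoins(stack, remaining - 1, bytes_per_join);
  stack.Pop(bytes_per_join);
  return ok;
}

// Smallest join count that overflows, or -1 if none up to `max_joins` does.
int MinJoinsToOverflow(std::size_t capacity, std::size_t bytes_per_join, int max_joins) {
  for (int n = 1; n <= max_joins; ++n) {
    ToyTempStack stack(capacity);
    if (!RunNestedJoins(stack, n, bytes_per_join)) return n;
  }
  return -1;
}
```

In this model any join count at or above the threshold reproduces the overflow, which is the sense in which a smaller `num_joins` can still guard against the regression as long as it stays above the minimum failing count.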

Contributor

Can you condition the reduction on the specific platforms that can't handle num_joins=64? That way, regression tests would still catch possible bugs with a high number of joins.

Contributor Author

Yeah, that's a nice idea. The catch is that the right condition is tricky to pin down. So far I've seen the following behavior with 64 joins:

  1. Ubuntu with or without ASAN (the CI jobs): all good.
  2. macOS with ASAN: stack overflow; macOS without ASAN: good.
  3. Alpine and Emscripten without ASAN (the CI jobs): segfault or out-of-bounds memory access (presumably also caused by stack overflow).

I also haven't found macros that distinguish Linux distributions such as Alpine and Ubuntu. To keep at least one build running 64 joins, the only safe condition seems to be enabling 64 joins on Linux with ASAN, but that only works because our sanitizer builds happen to run only on Ubuntu.

Any suggestions?
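
For reference, one possible shape for such a condition, as a sketch only. It assumes that testing for glibc (`__GLIBC__`, visible once a libc header is included) is an acceptable stand-in for "not Alpine", since Alpine's musl libc does not identify itself with a macro. The `SKETCH_*` macros, `SketchNumJoins`, and the policy itself are hypothetical, extrapolated from the three observations above and not verified on CI.

```cpp
#include <cstdlib>  // any libc header; on glibc this makes __GLIBC__ visible

// AddressSanitizer: GCC defines __SANITIZE_ADDRESS__, Clang exposes __has_feature.
#if defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN 1
#elif defined(__has_feature)
#if __has_feature(address_sanitizer)
#define SKETCH_ASAN 1
#endif
#endif
#ifndef SKETCH_ASAN
#define SKETCH_ASAN 0
#endif

// glibc-based Linux (e.g. Ubuntu). Alpine (musl) and Emscripten fall through to 0.
#if defined(__linux__) && defined(__GLIBC__) && !defined(__EMSCRIPTEN__)
#define SKETCH_GLIBC_LINUX 1
#else
#define SKETCH_GLIBC_LINUX 0
#endif

#if defined(__APPLE__)
#define SKETCH_APPLE 1
#else
#define SKETCH_APPLE 0
#endif

// Hypothetical policy: keep 64 joins only where they were observed to pass
// (glibc Linux with or without ASAN, macOS without ASAN); otherwise use 16.
constexpr int SketchNumJoins(bool glibc_linux, bool apple, bool asan) {
  return (glibc_linux || (apple && !asan)) ? 64 : 16;
}

// Value selected for the platform this translation unit is compiled on.
constexpr int kSketchNumJoins =
    SketchNumJoins(SKETCH_GLIBC_LINUX != 0, SKETCH_APPLE != 0, SKETCH_ASAN != 0);
```

Keeping the policy in a plain `constexpr` function separates the platform detection from the decision, so the decision table can be checked on any machine regardless of which macros happen to be defined there.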

Member

I think we can stick with 16 if it's enough to reproduce the issue.

Contributor

I think we can stick with 16 if it's enough to reproduce the issue.

What do you mean by "enough to repro the issue"? Reducing to 16 is making the issue "go away".

Contributor Author

I think the "issue" here means this:

by reverting commit 6c386da, the test failed (with "temp stack overflow") with 16 joins

In other words, 16 joins still serves the purpose this test was originally designed for.

Contributor

Ahh. Now I see it.

@github-actions github-actions bot added the awaiting committer review label and removed the awaiting review label Jun 25, 2024
@github-actions github-actions bot added the awaiting changes label and removed the awaiting committer review label Jun 25, 2024
@felipecrv
Contributor

I will let @pitrou approve and merge this one.

@github-actions github-actions bot removed the awaiting changes label Jun 26, 2024
@github-actions github-actions bot added the awaiting merge label Jun 26, 2024
Member

@pitrou pitrou left a comment

LGTM, thanks for fixing this @zanmato1984

@pitrou pitrou merged commit 2a8fa3e into apache:main Jun 26, 2024
45 of 50 checks passed
@pitrou pitrou removed the awaiting merge label Jun 26, 2024
@github-actions github-actions bot added the awaiting committer review label Jun 26, 2024

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 2a8fa3e.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

zanmato1984 added a commit to zanmato1984/arrow that referenced this pull request Jul 9, 2024
…43042)

* GitHub Issue: apache#43040

Authored-by: Ruoxi Sun <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>