GH-43254: [C++] Always prefer mimalloc to jemalloc #40875

Merged: pitrou merged 4 commits into apache:main from exp_mimalloc on Jul 16, 2024

Member

pitrou commented Mar 28, 2024

Rationale for this change

As discussed on the mailing-list, this PR switches the default memory pool to mimalloc for all platforms. This should have several desirable effects:

  • less variability between platforms
  • mimalloc generally has a nicer, more consistent API and is easier to work with (in particular, jemalloc's configuration scheme is slightly abstruse)
  • potentially better performance than the status quo, or at least not significantly worse
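
To make the intended effect concrete, here is a minimal Python sketch of a backend-preference rule of the kind this PR changes. It is an illustrative model only, not Arrow's C++ implementation: the function `pick_default_backend`, its arguments, and the `PREFERENCE` tuple are hypothetical names, and the handling of the `ARROW_DEFAULT_MEMORY_POOL` environment variable is a simplified assumption about how a user override interacts with the default.

```python
import os

# Assumed preference order after this PR: mimalloc first, then jemalloc,
# then the system allocator. This tuple is illustrative, not Arrow source.
PREFERENCE = ("mimalloc", "jemalloc", "system")


def pick_default_backend(compiled_in, env=None):
    """Return the backend name a build would use by default.

    compiled_in: set of backend names enabled at build time
                 ("system" is assumed to be always available).
    env:         mapping used to look up ARROW_DEFAULT_MEMORY_POOL;
                 defaults to os.environ.
    """
    env = os.environ if env is None else env
    available = set(compiled_in) | {"system"}

    # A user override wins if it names an available backend.
    requested = env.get("ARROW_DEFAULT_MEMORY_POOL", "").strip().lower()
    if requested in available:
        return requested

    # Otherwise take the first available backend in preference order.
    for name in PREFERENCE:
        if name in available:
            return name
    return "system"


# Same answer on every platform that compiles mimalloc in.
print(pick_default_backend({"mimalloc", "jemalloc"}, env={}))
```

Under this model, a platform that builds both allocators ends up on the same default as one that builds only mimalloc, which is the "less variability between platforms" point above.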

Are these changes tested?

Yes, by existing CI configurations.

Are there any user-facing changes?

Behavior should not change. Performance characteristics of some user workloads might improve or regress, but this is something we cannot predict in advance.

Member Author

pitrou commented Mar 28, 2024

@ursabot please benchmark


ursabot commented Mar 28, 2024

Benchmark runs are scheduled for commit 107f99d. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

This comment was marked as outdated.

Member Author

pitrou commented Mar 29, 2024

Ok, it's not obvious if this actually changed anything in the benchmarking setup, because the benchmark numbers don't seem to show any relevant change in performance (it could also be that our benchmarks are not that sensitive to memory allocation details).

cc @austin3dickey FYI

Contributor

austin3dickey commented Apr 9, 2024

> Ok, it's not obvious if this actually changed anything in the benchmarking setup

I'm not exactly sure either. Before my recent changes, we had set ARROW_MIMALLOC=ON and ARROW_JEMALLOC=OFF in the environment:

But we weren't necessarily picking up the environment variables during the archery build. I think both might have been OFF during cmake because that's what archery did by default. Not sure what happens then.
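
The failure mode described here can be sketched as follows. This is a hypothetical helper, not archery's code: `cmake_allocator_flags`, `DEFAULTS`, and the `honor_env` switch are invented for illustration, and the OFF/OFF defaults simply mirror the guess made in the comment above.

```python
# Assumed defaults when the environment is not consulted (per the guess above).
DEFAULTS = {"ARROW_MIMALLOC": "OFF", "ARROW_JEMALLOC": "OFF"}


def cmake_allocator_flags(env, honor_env=True):
    """Build the -D flags for the two allocator options.

    When honor_env is False the environment is ignored and the defaults
    are used, modeling a build that never reads the variables.
    """
    flags = []
    for option, default in sorted(DEFAULTS.items()):
        value = env.get(option, default) if honor_env else default
        flags.append(f"-D{option}={value}")
    return flags


env = {"ARROW_MIMALLOC": "ON", "ARROW_JEMALLOC": "OFF"}
print(cmake_allocator_flags(env, honor_env=True))
print(cmake_allocator_flags(env, honor_env=False))
```

With `honor_env=False` both options come out OFF even though the environment asks for mimalloc, which would explain benchmark builds that did not use the allocator people assumed.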

I recently made a series of PRs so we know we're picking up those environment variables for the build we're using for benchmarks:

That last PR has yet to be merged but I'm running the final benchmark tests on it now. Those should definitely be running with mimalloc, not jemalloc.

Member Author

pitrou commented May 28, 2024

Benchmark numbers of jemalloc vs. mimalloc should appear in #41205 (comment)

Member Author

pitrou commented May 28, 2024

@github-actions crossbow submit -g cpp

This comment was marked as outdated.

Member Author

pitrou commented Jun 3, 2024

The benchmarking experiment in #41205 shows many regressions when switching the default allocator from mimalloc to jemalloc in the benchmarking setup. This supports the idea of always using mimalloc as the default allocator, regardless of operating system.

Member Author

pitrou commented Jul 15, 2024

@github-actions crossbow submit -g cpp


Revision: cca0edf

Submitted crossbow builds: ursacomputing/crossbow @ actions-f6738f965b

Task Status
test-alpine-linux-cpp GitHub Actions
test-build-cpp-fuzz GitHub Actions
test-conda-cpp GitHub Actions
test-conda-cpp-valgrind GitHub Actions
test-cuda-cpp GitHub Actions
test-debian-12-cpp-amd64 GitHub Actions
test-debian-12-cpp-i386 GitHub Actions
test-fedora-39-cpp GitHub Actions
test-ubuntu-20.04-cpp GitHub Actions
test-ubuntu-20.04-cpp-bundled GitHub Actions
test-ubuntu-20.04-cpp-minimal-with-formats GitHub Actions
test-ubuntu-20.04-cpp-thread-sanitizer GitHub Actions
test-ubuntu-22.04-cpp GitHub Actions
test-ubuntu-22.04-cpp-20 GitHub Actions
test-ubuntu-22.04-cpp-emscripten GitHub Actions
test-ubuntu-22.04-cpp-no-threading GitHub Actions
test-ubuntu-24.04-cpp GitHub Actions
test-ubuntu-24.04-cpp-gcc-14 GitHub Actions

pitrou changed the title from "EXPERIMENT: [C++] Always prefer mimalloc to jemalloc" to "GH-43254: [C++] Always prefer mimalloc to jemalloc" on Jul 15, 2024
pitrou marked this pull request as ready for review on July 15, 2024 14:41
pitrou requested review from assignUser, kou and raulcd as code owners on July 15, 2024 14:41

⚠️ GitHub issue #43254 has been automatically assigned in GitHub to PR creator.

Member Author

pitrou commented Jul 15, 2024

@github-actions crossbow submit -g python -g wheel -g linux


Revision: cca0edf

Submitted crossbow builds: ursacomputing/crossbow @ actions-573bff6e17

Task Status
almalinux-8-amd64 GitHub Actions
almalinux-8-arm64 GitHub Actions
almalinux-9-amd64 GitHub Actions
almalinux-9-arm64 GitHub Actions
amazon-linux-2023-amd64 GitHub Actions
amazon-linux-2023-arm64 GitHub Actions
centos-7-amd64 GitHub Actions
centos-8-stream-amd64 GitHub Actions
centos-8-stream-arm64 GitHub Actions
centos-9-stream-amd64 GitHub Actions
centos-9-stream-arm64 GitHub Actions
debian-bookworm-amd64 GitHub Actions
debian-bookworm-arm64 GitHub Actions
debian-trixie-amd64 GitHub Actions
debian-trixie-arm64 GitHub Actions
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.10-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0-numpy-1.19 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest-numpy-latest GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-20.04-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
ubuntu-focal-amd64 GitHub Actions
ubuntu-focal-arm64 GitHub Actions
ubuntu-jammy-amd64 GitHub Actions
ubuntu-jammy-arm64 GitHub Actions
ubuntu-noble-amd64 GitHub Actions
ubuntu-noble-arm64 GitHub Actions
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

Member Author

pitrou commented Jul 15, 2024

@kou These errors on Redhat-like builds do not seem relevant?

error: %changelog not in descending chronological order
Failed ./build.sh
rake aborted!

Edit: these errors might be caused by pitrou/arrow being out of date with our git main. I synced the fork and restarted those jobs.

Edit again: this doesn't seem to have fixed the issue.

Member

kou commented Jul 16, 2024

Hmm. I haven't seen this error before, and I can't reproduce it locally.
I'll add some debug prints to this branch.

Member

kou commented Jul 16, 2024

@github-actions crossbow submit almalinux-*-amd64

This comment was marked as outdated.

Member

kou commented Jul 16, 2024

@github-actions crossbow submit almalinux-*-amd64

This comment was marked as outdated.

Member

kou commented Jul 16, 2024

I got it.

Our RPM build script adds a changelog entry based on the latest commit date.
In this case cca0edf is the latest commit, and its date is Thu, 28 Mar 2024 16:51:09 +0100.

The latest changelog entry on main is:

* Thu May 09 2024 Raúl Cumplido <[email protected]> - 16.1.0-1

The RPM build job prepends a new changelog entry dated Thu, 28 Mar 2024 16:51:09 +0100, which generates the following changelog entries:

* Thu Mar 28 2024 ...
- New upstream release.

* Thu May 09 2024 Raúl Cumplido <[email protected]> - 16.1.0-1
- New upstream release.

The entries aren't sorted in descending chronological order, so the error occurs.

I've added a commit so that the RPM build job always uses "now" as the changelog date.
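
The ordering problem and the effect of using "now" can be reproduced with a small Python sketch. It is not the project's RPM build script: `prepend_entry`, `entry_dates`, and `is_descending` are hypothetical helpers, the packager name and address in `EXISTING` are placeholders, and the check only approximates what rpm enforces.

```python
from datetime import datetime, timedelta, timezone

# Placeholder standing in for the newest changelog entry on main.
EXISTING = (
    "* Thu May 09 2024 Example Packager <packager@example.invalid> - 16.1.0-1\n"
    "- New upstream release.\n"
)


def prepend_entry(changelog, when):
    """Put a new entry dated `when` on top of the changelog text."""
    header = when.strftime("* %a %b %d %Y") + " Example Packager"
    return f"{header}\n- New upstream release.\n\n{changelog}"


def entry_dates(changelog):
    """Return the date of every '* Day Mon DD YYYY ...' header, top to bottom."""
    dates = []
    for line in changelog.splitlines():
        if line.startswith("* "):
            # The first four fields after '* ' form the date.
            stamp = " ".join(line[2:].split()[:4])
            dates.append(datetime.strptime(stamp, "%a %b %d %Y").date())
    return dates


def is_descending(dates):
    """True when no entry is newer than the entry above it."""
    return all(upper >= lower for upper, lower in zip(dates, dates[1:]))


# Date of commit cca0edf as quoted above (+0100 offset).
commit_date = datetime(2024, 3, 28, 16, 51, 9, tzinfo=timezone(timedelta(hours=1)))
from_commit = prepend_entry(EXISTING, commit_date)
from_now = prepend_entry(EXISTING, datetime.now(timezone.utc))

print(is_descending(entry_dates(from_commit)))  # False: Mar 28 sits above May 09
print(is_descending(entry_dates(from_now)))
```

Stamping the new entry with the current time keeps it at least as recent as anything already in the changelog, so the order check no longer depends on how old the branch's latest commit is.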

Member

kou commented Jul 16, 2024

@github-actions crossbow submit almalinux-* amazon-linux-* centos-*


Revision: 9483626

Submitted crossbow builds: ursacomputing/crossbow @ actions-d4d35d1386

Task Status
almalinux-8-amd64 GitHub Actions
almalinux-8-arm64 GitHub Actions
almalinux-9-amd64 GitHub Actions
almalinux-9-arm64 GitHub Actions
amazon-linux-2023-amd64 GitHub Actions
amazon-linux-2023-arm64 GitHub Actions
centos-7-amd64 GitHub Actions
centos-8-stream-amd64 GitHub Actions
centos-8-stream-arm64 GitHub Actions
centos-9-stream-amd64 GitHub Actions
centos-9-stream-arm64 GitHub Actions

Member Author

pitrou commented Jul 16, 2024

Thanks for the fix @kou ! Could you review this PR?

Member

kou left a comment

+1

Oh, sorry. I forgot to add a review comment...!

github-actions bot added the "awaiting merge" label and removed the "awaiting review" label on Jul 16, 2024
pitrou merged commit 36fe1da into apache:main on Jul 16, 2024
40 checks passed
pitrou removed the "awaiting merge" label on Jul 16, 2024
pitrou deleted the exp_mimalloc branch on July 16, 2024 12:07

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 36fe1da.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.
