GH-43254: [C++] Always prefer mimalloc to jemalloc #40875

pitrou · 2024-03-28T15:53:18Z

Rationale for this change

As discussed on the mailing-list, this PR switches the default memory pool to mimalloc for all platforms. This should have several desirable effects:

less variability between platforms
mimalloc generally has a nicer, more consistent API and is easier to work with (in particular, jemalloc's configuration scheme is slightly abstruse)
potentially better performance, or at least not significantly worse, than the status quo
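
As a rough illustration of what "always prefer mimalloc" means, the following Python sketch models the selection of a default allocator backend from the set of allocators compiled into a build. It is not Arrow's actual implementation: the function name `choose_default_backend`, the `PREFERENCE` tuple, and the `override` parameter are all hypothetical names introduced here for illustration.

```python
from typing import Iterable, Optional

# Hypothetical preference order modelling this change:
# mimalloc first, then jemalloc, then the system allocator.
PREFERENCE = ("mimalloc", "jemalloc", "system")


def choose_default_backend(compiled_in: Iterable[str],
                           override: Optional[str] = None) -> str:
    """Return the backend name to use as the default memory pool.

    `compiled_in` lists the optional allocators enabled at build time.
    `override` stands in for an explicit user choice (an assumption of
    this sketch); it is honoured only if that backend is available.
    """
    # The system allocator is assumed to be always available.
    available = set(compiled_in) | {"system"}
    if override is not None and override in available:
        return override
    for name in PREFERENCE:
        if name in available:
            return name
    return "system"
```

With both allocators compiled in, the sketch returns `"mimalloc"` regardless of platform, which is the "less variability between platforms" property described above.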

Are these changes tested?

Yes, by existing CI configurations.

Are there any user-facing changes?

Behavior should not change. Performance characteristics of some user workloads might improve or regress, but this is something we cannot predict in advance.

GitHub Issue: [C++] Always prefer mimalloc over jemalloc #43254

pitrou · 2024-03-28T15:53:26Z

@ursabot please benchmark

ursabot · 2024-03-28T15:53:36Z

Benchmark runs are scheduled for commit 107f99d. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

pitrou · 2024-03-29T12:55:08Z

Ok, it's not obvious if this actually changed anything in the benchmarking setup, because the benchmark numbers don't seem to show any relevant change in performance (it could also be that our benchmarks are not that sensitive to memory allocation details).

cc @austin3dickey FYI

austin3dickey · 2024-04-09T20:17:26Z

Ok, it's not obvious if this actually changed anything in the benchmarking setup

I'm not exactly sure either. Before my recent changes, we had set ARROW_MIMALLOC=ON and ARROW_JEMALLOC=OFF in the environment:

arrow/dev/conbench_envs/benchmarks.env

Line 30 in dbedcfc

ARROW_MIMALLOC=ON

ARROW_JEMALLOC=OFF

But we weren't necessarily picking up the environment variables during the archery build. I think both might have been OFF during cmake because that's what archery did by default. Not sure what happens then.

I recently made a series of PRs so we know we're picking up those environment variables for the build we're using for benchmarks:

That last PR has yet to be merged but I'm running the final benchmark tests on it now. Those should definitely be running with mimalloc, not jemalloc.
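
To make the concern about environment variables concrete, here is a minimal Python sketch of one way settings in an env file such as `benchmarks.env` could be turned into explicit CMake `-D` options, so that the build cannot silently fall back to defaults. This is an assumption-laden illustration, not how archery or the benchmark tooling actually works; the function name `cmake_flags_from_env` and the `prefix` parameter are hypothetical.

```python
from typing import List


def cmake_flags_from_env(env_text: str, prefix: str = "ARROW_") -> List[str]:
    """Translate KEY=VALUE lines into -DKEY=VALUE CMake options.

    Blank lines, comment lines starting with '#', lines without '=',
    and keys that do not start with `prefix` are skipped.
    """
    flags = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key.startswith(prefix):
            flags.append(f"-D{key}={value.strip()}")
    return flags


if __name__ == "__main__":
    sample = "ARROW_MIMALLOC=ON\nARROW_JEMALLOC=OFF\n"
    print(" ".join(cmake_flags_from_env(sample)))
```

Passing the options explicitly on the command line makes the allocator choice visible in the build log, which helps when checking which allocator a benchmark run really used.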

pitrou · 2024-05-28T16:17:20Z

Benchmark numbers of jemalloc vs. mimalloc should appear in #41205 (comment)

pitrou · 2024-05-28T16:19:14Z

@github-actions crossbow submit -g cpp

pitrou · 2024-06-03T16:52:03Z

The benchmarking experiment in #41205 shows many regressions when switching the default allocator from mimalloc to jemalloc in the benchmarking setup. This supports the idea of always using mimalloc as the default allocator, regardless of operating system.

pitrou · 2024-07-15T14:24:48Z

@github-actions crossbow submit -g cpp

github-actions · 2024-07-15T14:27:29Z

Revision: cca0edf

Submitted crossbow builds: ursacomputing/crossbow @ actions-f6738f965b

Task	Status
test-alpine-linux-cpp
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-cuda-cpp
test-debian-12-cpp-amd64
test-debian-12-cpp-i386
test-fedora-39-cpp
test-ubuntu-20.04-cpp
test-ubuntu-20.04-cpp-bundled
test-ubuntu-20.04-cpp-minimal-with-formats
test-ubuntu-20.04-cpp-thread-sanitizer
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-emscripten
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-gcc-14

github-actions · 2024-07-15T14:41:34Z

⚠️ GitHub issue #43254 has been automatically assigned in GitHub to PR creator.

pitrou · 2024-07-15T14:41:47Z

@github-actions crossbow submit -g python -g wheel -g linux

github-actions · 2024-07-15T14:46:30Z

Revision: cca0edf

Submitted crossbow builds: ursacomputing/crossbow @ actions-573bff6e17

Task	Status
almalinux-8-amd64
almalinux-8-arm64
almalinux-9-amd64
almalinux-9-arm64
amazon-linux-2023-amd64
amazon-linux-2023-arm64
centos-7-amd64
centos-8-stream-amd64
centos-8-stream-arm64
centos-9-stream-amd64
centos-9-stream-arm64
debian-bookworm-amd64
debian-bookworm-arm64
debian-trixie-amd64
debian-trixie-arm64
example-python-minimal-build-fedora-conda
example-python-minimal-build-ubuntu-venv
test-conda-python-3.10
test-conda-python-3.10-cython2
test-conda-python-3.10-hdfs-2.9.2
test-conda-python-3.10-hdfs-3.2.1
test-conda-python-3.10-pandas-latest-numpy-1.26
test-conda-python-3.10-pandas-latest-numpy-latest
test-conda-python-3.10-pandas-nightly-numpy-nightly
test-conda-python-3.10-spark-v3.5.0
test-conda-python-3.10-substrait
test-conda-python-3.11
test-conda-python-3.11-dask-latest
test-conda-python-3.11-dask-upstream_devel
test-conda-python-3.11-hypothesis
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly
test-conda-python-3.11-spark-master
test-conda-python-3.12
test-conda-python-3.8
test-conda-python-3.8-pandas-1.0-numpy-1.19
test-conda-python-3.8-spark-v3.5.0
test-conda-python-3.9
test-conda-python-3.9-pandas-latest-numpy-latest
test-conda-python-emscripten
test-cuda-python
test-debian-12-python-3-amd64
test-debian-12-python-3-i386
test-fedora-39-python-3
test-ubuntu-20.04-python-3
test-ubuntu-22.04-python-3
ubuntu-focal-amd64
ubuntu-focal-arm64
ubuntu-jammy-amd64
ubuntu-jammy-arm64
ubuntu-noble-amd64
ubuntu-noble-arm64
wheel-macos-big-sur-cp310-arm64
wheel-macos-big-sur-cp311-arm64
wheel-macos-big-sur-cp312-arm64
wheel-macos-big-sur-cp38-arm64
wheel-macos-big-sur-cp39-arm64
wheel-macos-catalina-cp310-amd64
wheel-macos-catalina-cp311-amd64
wheel-macos-catalina-cp312-amd64
wheel-macos-catalina-cp38-amd64
wheel-macos-catalina-cp39-amd64
wheel-manylinux-2-28-cp310-amd64
wheel-manylinux-2-28-cp310-arm64
wheel-manylinux-2-28-cp311-amd64
wheel-manylinux-2-28-cp311-arm64
wheel-manylinux-2-28-cp312-amd64
wheel-manylinux-2-28-cp312-arm64
wheel-manylinux-2-28-cp38-amd64
wheel-manylinux-2-28-cp38-arm64
wheel-manylinux-2-28-cp39-amd64
wheel-manylinux-2-28-cp39-arm64
wheel-manylinux-2014-cp310-amd64
wheel-manylinux-2014-cp310-arm64
wheel-manylinux-2014-cp311-amd64
wheel-manylinux-2014-cp311-arm64
wheel-manylinux-2014-cp312-amd64
wheel-manylinux-2014-cp312-arm64
wheel-manylinux-2014-cp38-amd64
wheel-manylinux-2014-cp38-arm64
wheel-manylinux-2014-cp39-amd64
wheel-manylinux-2014-cp39-arm64
wheel-windows-cp310-amd64
wheel-windows-cp311-amd64
wheel-windows-cp312-amd64
wheel-windows-cp38-amd64
wheel-windows-cp39-amd64

pitrou · 2024-07-15T15:04:45Z

@kou These errors on Redhat-like builds do not seem relevant?

error: %changelog not in descending chronological order
Failed ./build.sh
rake aborted!

Edit: these errors might be caused by pitrou/arrow being out of date with our git main. I synced the fork and restarted those jobs.

Edit again: this doesn't seem to have fixed the issue.

kou · 2024-07-16T05:39:39Z

Hmm. I haven't seen this error before and I can't reproduce it locally.
I'll add some debug prints to this branch.

kou · 2024-07-16T05:41:49Z

@github-actions crossbow submit almalinux-*-amd64

kou · 2024-07-16T05:55:58Z

@github-actions crossbow submit almalinux-*-amd64

kou · 2024-07-16T06:04:03Z

I got it.

Our RPM build script adds a changelog entry based on the latest commit date.
In this case cca0edf is the latest commit and its date is Date: Thu, 28 Mar 2024 16:51:09 +0100.

The latest changelog entry on main is:

arrow/dev/tasks/linux-packages/apache-arrow/yum/arrow.spec.in

Line 890 in 21238a7

* Thu May 09 2024 Raúl Cumplido <[email protected]> - 16.1.0-1

The RPM build job prepends a new changelog entry dated Thu, 28 Mar 2024 16:51:09 +0100, which generates the following changelog entries:

* Thu Mar 28 2024 ...
- New upstream release.

* Thu May 09 2024 Raúl Cumplido <[email protected]> - 16.1.0-1
- New upstream release.

They aren't sorted in descending chronological order, so the error occurs.

I've added a commit so that the RPM build job always uses "now" as the changelog date.
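
The ordering rule behind this error can be sketched in Python. The snippet below is only an illustration of the check, not the logic used by `rpmbuild` or by the Arrow packaging scripts; the names `changelog_dates` and `is_descending` are hypothetical, and the parser assumes entry headers of the form `* Thu Mar 28 2024 ...` as shown above.

```python
import re
from datetime import date
from typing import List

# English month abbreviations, mapped by hand to avoid locale-dependent parsing.
MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Matches entry headers such as "* Thu Mar 28 2024 ...".
ENTRY_RE = re.compile(r"^\* \w{3} (\w{3}) (\d{1,2}) (\d{4})\b")


def changelog_dates(text: str) -> List[date]:
    """Extract the dates of changelog entry headers, in file order."""
    dates = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m and m.group(1) in MONTHS:
            dates.append(date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2))))
    return dates


def is_descending(text: str) -> bool:
    """True if every entry is at least as recent as the one after it."""
    dates = changelog_dates(text)
    return all(a >= b for a, b in zip(dates, dates[1:]))


if __name__ == "__main__":
    broken = (
        "* Thu Mar 28 2024 ...\n- New upstream release.\n\n"
        "* Thu May 09 2024 ...\n- New upstream release.\n"
    )
    print(is_descending(broken))
```

Prepending an entry dated from an old commit (28 Mar 2024) ahead of a newer entry (09 May 2024) fails this check, whereas an entry dated "now" is always at least as recent as any existing one.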

kou · 2024-07-16T06:04:48Z

@github-actions crossbow submit almalinux-* amazon-linux-* centos-*

github-actions · 2024-07-16T06:07:12Z

Revision: 9483626

Submitted crossbow builds: ursacomputing/crossbow @ actions-d4d35d1386

Task	Status
almalinux-8-amd64
almalinux-8-arm64
almalinux-9-amd64
almalinux-9-arm64
amazon-linux-2023-amd64
amazon-linux-2023-arm64
centos-7-amd64
centos-8-stream-amd64
centos-8-stream-arm64
centos-9-stream-amd64
centos-9-stream-arm64

pitrou · 2024-07-16T08:09:24Z

Thanks for the fix @kou ! Could you review this PR?

kou

+1

Oh, sorry. I forgot to add a review comment...!

conbench-apache-arrow · 2024-07-16T20:07:51Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 36fe1da.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

pitrou mentioned this pull request Mar 28, 2024

[C++] Performance of numeric casts #40874

Open

github-actions bot added the Component: C++ and awaiting review labels Mar 28, 2024

pitrou force-pushed the exp_mimalloc branch from 107f99d to dc1a9bf Compare May 28, 2024 16:16

EXPERIMENT: [C++] Always prefer mimalloc to jemalloc

cca0edf

pitrou force-pushed the exp_mimalloc branch from dc1a9bf to cca0edf Compare July 15, 2024 14:24

pitrou changed the title ~~EXPERIMENT: [C++] Always prefer mimalloc to jemalloc~~ GH-43254: [C++] Always prefer mimalloc to jemalloc Jul 15, 2024

pitrou marked this pull request as ready for review July 15, 2024 14:41

pitrou requested review from assignUser, kou and raulcd as code owners July 15, 2024 14:41

Relevant doc changes

271fb71

github-actions bot added the Component: Documentation label Jul 15, 2024

Add a debug log

005a753

Specify ARROW_RELEASE_TIME explicitly

9483626

kou approved these changes Jul 16, 2024

github-actions bot added the awaiting merge label and removed the awaiting review label Jul 16, 2024

pitrou merged commit 36fe1da into apache:main Jul 16, 2024
40 checks passed

pitrou removed the awaiting merge label Jul 16, 2024

pitrou mentioned this pull request Jul 16, 2024

[C++] Always prefer mimalloc over jemalloc #43254

Closed

pitrou deleted the exp_mimalloc branch July 16, 2024 12:07

pitrou mentioned this pull request Jul 17, 2024

EXPERIMENT: [C++] Benchmark jemalloc against mimalloc #41205

Closed

nwalters512 mentioned this pull request Dec 5, 2024

Python 3.12 os.fork deprecation PrairieLearn/PrairieLearn#9817

Closed
