A profiler of "malloc" activities.
It traces the location (stack trace) and size of memory allocations (malloc, etc.) for each thread and reports them at the end of the process.
The dump tries to reproduce the collapsed-stack flamegraph input format (https://github.com/jlfwong/speedscope/wiki/Importing-from-custom-sources#brendan-greggs-collapsed-stack-format), which is also accepted by speedscope.
An API is provided to configure it and get reports on user request.
Besides providing such a detailed map, the tool also accumulates statistics for both total memory and live memory in the form of counters and histograms.
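To make the collapsed-stack format more concrete, here is a minimal Python sketch that reads lines of the form `frame1;frame2;...;frameN value` and sums the values per innermost frame. The function names and the sample stacks are invented for illustration; they are not output of the profiler, and the sketch assumes the value column is an integer.

```python
from collections import Counter

def parse_collapsed(lines):
    """Yield (frames, value) pairs from collapsed-stack lines.

    The value is taken as the last space-separated token, so frame
    names themselves may contain spaces.
    """
    for line in lines:
        stack, _, value = line.strip().rpartition(" ")
        if not stack:
            continue  # empty line or no value column
        try:
            yield stack.split(";"), int(value)
        except ValueError:
            continue  # value column is not an integer: skip the line

def total_by_leaf(lines):
    """Sum the values per innermost (last) frame."""
    totals = Counter()
    for frames, value in parse_collapsed(lines):
        totals[frames[-1]] += value
    return totals

# Invented sample data, for illustration only.
sample = [
    "main;fill_map;operator new 4096",
    "main;fill_map;rehash;operator new 16384",
    "main;read_input;malloc 512",
]
for leaf, total in total_by_leaf(sample).most_common():
    print(leaf, total)
```

The same parser can be pointed at a file produced by the filtering step, provided the selected field is an integer count or size.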
GCC 12 or newer (a version not older than Nov 15, 2023), configured with --enable-libstdcxx-backtrace=yes.
This tool has been tested with GCC14.
clone this repository and cd into it.
source compile
export LD_PRELOAD=./mallocProfiler.so
invoke the application to profile and filter its output using grep _mpTrace, selecting the field of your choice (see below)
export LD_PRELOAD=""
drop the resulting file in speedscope (or generate a flamegraph svg)
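As an alternative to exporting and then clearing LD_PRELOAD in the shell, the preload can be confined to a single child process. The following is a minimal Python sketch of that idea. The helper names `preload_env` and `run_profiled` are hypothetical, and it assumes the profiler writes its trace lines to the standard output of the profiled process and that they contain the marker passed in `marker`.

```python
import os
import subprocess

def preload_env(library, base_env=None):
    """Return a copy of the environment with LD_PRELOAD set to `library`."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = library
    return env

def run_profiled(command, library="./mallocProfiler.so", marker="_mpTrace"):
    """Run `command` with the profiler preloaded and return the trace lines.

    Assumption: trace lines go to stdout and contain `marker`.
    """
    result = subprocess.run(
        command,
        env=preload_env(library),
        capture_output=True,
        text=True,
        check=False,
    )
    return [line for line in result.stdout.splitlines() if marker in line]

# Example call (the application name is a placeholder):
# trace = run_profiled(["./myApplication"])
```

Because only the child's environment is modified, the calling shell and any other tools started from it are not profiled.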
go to the demos directory and run the trivial python example (taken from a numpy tutorial)
export LD_PRELOAD=../mallocProfiler.so
python3 demo.py | grep _mpTrace | cut -f1,3 -d'$' | tr '$' ' ' > pyDemo.md
drop the resulting file (pyDemo.md) in https://www.speedscope.app
select the sandwich view, sort by total, and click on file_rules: you should see the typical huge call stacks of python
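For readers who prefer to post-process the output in Python, the grep/cut/tr pipeline used above can be mirrored in a few lines. This is only a sketch of the same text manipulation: the function name `to_collapsed` is hypothetical, and it makes no assumption about what each `$`-separated field means.

```python
def to_collapsed(lines, marker="_mpTrace", fields=(1, 3), sep="$"):
    """Mimic `grep marker | cut -f1,3 -d'$' | tr '$' ' '` on an iterable of lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if marker not in line:
            continue  # grep: keep only lines containing the marker
        parts = line.split(sep)
        if len(parts) == 1:
            out.append(line)  # cut prints delimiter-free lines unchanged
            continue
        # cut: 1-based field numbers, emitted in input order, missing ones skipped
        picked = [parts[i - 1] for i in sorted(fields) if i <= len(parts)]
        out.append(" ".join(picked))  # tr: the '$' joiner becomes a space
    return out
```

Choosing a different pair in `fields` corresponds to changing the `-f` argument of cut in the shell version.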
demos/instrumentationDemo.cpp contains a simple example of how to instrument user code: it is supposed to track and report all allocations performed while filling a hash-map (std::unordered_map)
compile it with
c++ -g instrumentationDemo.cpp ../dummyMallocProfiler.so -o instrumentationDemo
preload the version of the profiler that is disabled by default, then run the demo
export LD_PRELOAD=../mallocProfilerOFF.so
./instrumentationDemo
compile it again, activating the reserve call
c++ -g instrumentationDemo.cpp ../dummyMallocProfiler.so -o instrumentationDemo -DRESERVE
and compare the two outputs
The user API, to configure the profiler and to instrument the code, is all in the header file include/mallocProfiler.h and is documented inline.
A simple mechanism to configure the profiler without instrumenting the code is to introduce an intermediate library that is preloaded after the profiler itself. An example can be found in tests/testConfiguration.cc.
It is easy to switch off detailed tracing and just accumulate global statistics. The ready-to-use statOnlyThread.so library starts a thread that, every 10 seconds, dumps to a file (named memstat_PID.mdr) three lines containing the global statistics, the histogram of total memory, and the histogram of live memory.
This file can then be split into three csv files with some trivial grep and sed, and read using a visualization tool.
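The split can also be done in Python. The sketch below is not the grep and sed recipe mentioned above: it relies on the assumption that the file is nothing but a sequence of three-line dumps in a fixed order (global statistics, total-memory histogram, live-memory histogram), and it does not interpret or reformat the content of each line. The labels in `NAMES` and the function name are hypothetical.

```python
# Hypothetical labels for the three kinds of line in each dump.
NAMES = ("global", "total_histogram", "live_histogram")

def split_dumps(lines):
    """Group non-empty lines into consecutive triples, one list per kind.

    Assumption: every dump contributes exactly three lines, always in
    the same order. A trailing incomplete dump is dropped.
    """
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    usable = len(rows) - len(rows) % 3
    return {name: rows[i:usable:3] for i, name in enumerate(NAMES)}

# Example call (replace PID with the actual process id):
# with open("memstat_PID.mdr") as f:
#     streams = split_dumps(f)
```

If the lines carry a tag identifying their kind, filtering on that tag is more robust than relying on their position.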
Examples of such files can be found in the demos directory, together with a jupyter notebook to visualize them in the form of time-series plots and histogram animations.