Parallel builds of clang on Windows
We have found that it is easy to end up with too much build parallelism on Windows when using msbuild or Visual Studio to build clang or LLVM (see issue #268). This can lead to virtual memory paging, which can slow down your clang or LLVM build by a factor of 2 to 3. You can avoid these problems by passing additional flags to msbuild or changing settings in Visual Studio.
If the settings below do not work well for you, read the Details section to figure out what to do next. Note that parallel debug builds of LLVM bottleneck on LLVM table generation, which is very slow in debug builds (this is Amdahl's law in action).
Here are recommended settings for parallel builds of clang/LLVM using msbuild. These seem to provide a good trade-off between parallelism and memory usage.
msbuild /p:CL_MPCount=3 /m
If you are building from the Visual Studio IDE, in VS 2015 or VS 2017 go to Debug->Options->Projects and Solutions->VC++ Project Settings and set the Maximum Number of concurrent C++ compilations to 3.
CL_MPCount is a build variable that controls the number of parallel compilations launched by the cl C++ compiler driver. We recommend setting it to 2 or 3 if you have at least 1 GByte of memory per CPU core on your machine. If you have less than 1 GByte of memory per CPU core (or a lot more memory than that), you should set it to Total amount of memory on your machine / (# cores * 333 MBytes). For msbuild, we also recommend using the /m switch (short for /maxcpucount), which causes msbuild to try to create one build task per core.
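The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and rounding down and clamping the result to at least 1 are assumptions that the text does not spell out.

```python
MB = 1024 * 1024
MB_PER_COMPILER_PROCESS = 333  # rule-of-thumb figure quoted in the text


def recommended_cl_mpcount(total_memory_bytes: int, cores: int) -> int:
    """Total memory / (# cores * 333 MBytes), rounded down, never below 1.

    Hypothetical helper; the rounding and the lower bound are assumptions.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    budget_per_unit = cores * MB_PER_COMPILER_PROCESS * MB
    return max(1, total_memory_bytes // budget_per_unit)


# 8 cores with 8 GBytes (1 GByte per core) gives the recommended value of 3.
print(recommended_cl_mpcount(8 * 1024 * MB, 8))  # 3
# 8 cores with only 4 GBytes drops to a single compiler process per cl call.
print(recommended_cl_mpcount(4 * 1024 * MB, 8))  # 1
```

The result would then be passed to msbuild as /p:CL_MPCount=<value>.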
The setting in the IDE corresponds to the CL_MPCount variable in msbuild. Its default value, 0, sets it to the number of available CPU cores on your machine, which is too many. Visual Studio already sets the /m flag by default. The VS 2017 setting under Debug->Options->Projects and Solutions->Build and Run for the maximum number of parallel project builds corresponds to the /m option to msbuild.
CMake generates MSBuild files that set the /MP flag for the Microsoft Visual C++ compiler. With this flag, when the C++ compiler is applied to a long list of files, it launches up to as many compiler processes as there are CPU cores on your machine. For example,
cl /MP A.c B.c C.c D.c
causes the Microsoft Visual C++ compiler to launch 4 processes (one per file), if your machine has at least 4 processors available (see this blog article). You can limit this in msbuild by setting the CL_MPCount property using the /p option. For example,
msbuild /p:CL_MPCount=nnn
where nnn is the maximum number of processes to launch.
When you use the /m:nnn option to msbuild, it launches as many build processes as are specified by nnn. If you omit nnn and use only /m, it launches as many build processes as there are CPU cores on your machine. In our automated scripts, we were setting nnn to be 1/4 the number of CPU cores. The end result was that at times a quadratic number of compiler processes was being launched: if p is the number of processors, each of the p/4 build processes could launch up to p compilers, for p^2/4 compilations in total. This caused build machines with ample amounts of memory to page. It is known that you can accidentally misuse build parallelism with Visual Studio when using cmake: see the comments section for this Kitware CMake blog post.
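To make the quadratic growth concrete, here is a small Python sketch that compares the worst-case number of compiler processes under the old script settings with the recommended ones. The function names are hypothetical, and the counts are upper bounds that assume every build process is inside a cl invocation with enough source files at the same moment.

```python
def worst_case_compiler_processes(msbuild_nodes: int, cl_mpcount: int) -> int:
    """Upper bound: every msbuild node runs cl with cl_mpcount processes."""
    return msbuild_nodes * cl_mpcount


def old_script_worst_case(cores: int) -> int:
    # /m:(cores/4) with CL_MPCount left at its default (one process per core)
    return worst_case_compiler_processes(cores // 4, cores)


def recommended_worst_case(cores: int, cl_mpcount: int = 3) -> int:
    # /m (one node per core) with CL_MPCount capped at a small constant
    return worst_case_compiler_processes(cores, cl_mpcount)


for cores in (8, 16, 64):
    print(cores, old_script_worst_case(cores), recommended_worst_case(cores))
```

For 16 cores the old settings allow up to 64 compiler processes (16^2/4) versus 48 with the cap; for 64 cores the gap widens to 1024 versus 192, because the capped configuration grows linearly with the core count.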
The clang build does other things besides invoke the C++ compiler. It invokes tools like tblgen and builds libraries. There is lots of parallelism available in a clang/LLVM build, and the build system is better situated to recognize it and take advantage of it than invocations of the compiler driver. At the same time, the generated build system is invoking the C++ compiler with long lists of file arguments, so ratcheting individual build nodes down to no parallelism seems like a bad idea.
We need to make a trade-off: it seems better to set the build system parallelism to be high, and limit the individual compiler node parallelism to a constant. Modern OSes are very good at time-slicing CPU time across processes. They are not so good at time-slicing physical memory across processes. When there is more demand for physical memory than is actually available, this leads to virtual memory thrashing. Our goal is to create high CPU utilization while limiting memory usage to physical memory.
We have found that limiting the number of parallel C compiler processes spawned by the C++ compiler driver to 2 to 3 and setting the build system parallelism to the number of cores seems to achieve this. This assumes 1 GB of physical memory per core, which works out to 333 MBytes to 512 MBytes of memory per C++ compiler process. For 3 processes, you can do this by adding the following options to MSBuild:
msbuild /p:CL_MPCount=3 /m
If you have less memory per core, you'll need to reduce the amount of build parallelism accordingly. We would suggest reducing the number of parallel C compiler launches.
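One way to reason about that reduction is to estimate peak compiler memory demand and lower the compiler count until it fits in physical memory. The Python sketch below does this under stated assumptions: the helper names are hypothetical, the per-process figure is the 333 to 512 MByte range quoted above, and memory used by msbuild itself and by other tools is ignored.

```python
def peak_compiler_memory_mb(msbuild_nodes: int, cl_mpcount: int,
                            mb_per_process: int) -> int:
    """Worst-case memory demand if every node runs cl_mpcount compilers."""
    return msbuild_nodes * cl_mpcount * mb_per_process


def reduce_cl_mpcount_to_fit(physical_mb: int, msbuild_nodes: int,
                             start: int = 3, mb_per_process: int = 512) -> int:
    """Lower CL_MPCount from `start` until the estimate fits, stopping at 1."""
    count = start
    while count > 1 and peak_compiler_memory_mb(
            msbuild_nodes, count, mb_per_process) > physical_mb:
        count -= 1
    return count


# 8 cores, 8 GBytes, pessimistic 512 MBytes per compiler process
print(reduce_cl_mpcount_to_fit(8192, 8))                       # 2
# Same machine with the optimistic 333 MBytes per process
print(reduce_cl_mpcount_to_fit(8192, 8, mb_per_process=333))   # 3
# 8 cores with only 4 GBytes
print(reduce_cl_mpcount_to_fit(4096, 8))                       # 1
```

The first two cases reproduce the "2 to 3" recommendation for a machine with 1 GByte per core; the last shows the count dropping to 1 when memory per core is halved.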