Makefile updates #198

manodeep · 2020-04-06T02:20:18Z

Thanks for releasing the code-base. I took a quick look at the Makefile and fixed a couple of issues.

Optimisation flags should be on the compile-time and not link-time. I moved the -O3 flag to the compile stage. Plus, the -march=k8 was only targeting the AMD K8 architecture that was released in ~2003-2004 (SSE2 is the "fastest" instructions used on K8, see details below). Modern CPUs have more advanced instructions, and the generic -march=native will optimise for the CPU that COMPAS is being compiled on. (If you are compiling on OzSTAR, then I would recommend using -march=skylake-avx512 instead of -march=native). You guys should re-run any benchmarks and check if there are any speedups.
Duplicate CFLAGS on two different lines containing -g and the include options. Replaced with the more appropriate CXXFLAGS for C++ code
Replaced all hard-coded COMPAS with a single-sourced variable ($EXE)
Embedded the path to the BOOST library into the executable by using the -Xlinker -rpath -Xlinker </path/to/boost/lib>. This will allow the executable to always find the BOOST library used at compile-time (regardless of whether that library is loaded at run-time) -- I suspect this will remove the need for the static linking
Numerous other small-updates to the Makefile structure, including removing of include flags at link-time
I have also attempted to line-up trailing backslash on the SOURCES lines.

Comparison of instruction-sets enabled by compile-time options.

Here are the instruction sets enabled by -march=k8 and -march=skylake-avx512, checked with g++-7.5.0 on my OSX laptop:

[~/temp/COMPAS/src @Manodeeps-MacBook-Pro] g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin18/7.5.0/lto-wrapper
Target: x86_64-apple-darwin18
Configured with:
/opt/local/var/macports/build/_opt_bblocal_var_buildworker_ports_build_ports_lang_gcc7/gcc7/work/gcc-7.5.0/configure
--prefix=/opt/local --build=x86_64-apple-darwin18
--enable-languages=c,c++,objc,obj-c++,lto,fortran --libdir=/opt/local/lib/gcc7
--includedir=/opt/local/include/gcc7 --infodir=/opt/local/share/info
--mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-7
--with-local-prefix=/opt/local --with-system-zlib --disable-nls
--program-suffix=-mp-7 --with-gxx-include-dir=/opt/local/include/gcc7/c++/
--with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local
--with-isl=/opt/local --enable-stage1-checking --disable-multilib --enable-lto
--enable-libstdcxx-time --with-build-config=bootstrap-debug
--with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld
--with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket
--disable-tls --with-pkgversion='MacPorts gcc7 7.5.0_0'
--with-sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX10.14.sdk
Thread model: posix
gcc version 7.5.0 (MacPorts gcc7 7.5.0_0)

[~/temp/COMPAS/src @Manodeeps-MacBook-Pro] g++ -dM -E -x c++ -std=c++11 -O3
-march=k8  - < /dev/null | grep -i -e sse -e avx
#define __cpp_static_assert 200410
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE__ 1
#define __SSE_MATH__ 1

[~/temp/COMPAS/src @Manodeeps-MacBook-Pro] g++ -dM -E -x c++ -std=c++11 -O3
-march=skylake-avx512  - < /dev/null | grep -i -e sse -e avx
#define __AVX512F__ 1
#define __cpp_static_assert 200410
#define __SSE4_1__ 1
#define __core_avx2__ 1
#define __SSE4_2__ 1
#define __AVX512BW__ 1
#define __SSE2_MATH__ 1
#define __AVX__ 1
#define __AVX512DQ__ 1
#define __AVX512VL__ 1
#define __AVX512CD__ 1
#define __AVX2__ 1
#define __tune_core_avx2__ 1
#define __SSE2__ 1
#define __SSSE3__ 1
#define __core_avx2 1
#define __SSE__ 1

Pull request from dev to master to create COMPAS beta

…akefile

SimonStevenson · 2020-04-06T02:29:08Z

This should be a pull request into dev not master

manodeep · 2020-04-06T02:30:13Z

@SimonStevenson Oops - fixed now

SimonStevenson · 2020-04-06T02:50:08Z

This compiled fine for me, would like to get @jeffriley 's eyes on it as well if possible

SimonStevenson · 2020-04-06T02:58:10Z

time make -j $(nproc)
real 7m39.157s
user 24m9.879s
sys 0m37.284s

jeffriley · 2020-04-06T04:08:44Z

The changes look fine. Happy somebody is looking at the makefile - I've basically been ignoring it. I'll officially review a bit later.

SimonStevenson · 2020-04-06T04:42:00Z

running ./COMPAS -n 1000 on master and this branch
master:
Clock time = 160.132 CPU seconds
Wall time = 0:2:45 (hh:mm:ss)

makefile branch
Clock time = 58.9174 CPU seconds
Wall time = 0:1:3 (hh:mm:ss)

So this leads to a factor of >2 speed up.

The outputs are however not identical, I need to check that this is expected, and that they produce identical outputs for identical inputs.

SimonStevenson · 2020-04-06T04:46:42Z

OK spotted a possible bug in default random seed (issue #199 ). Fixing the seed to be 42:
./COMPAS -n 1000 --random-seed 42

I now get identical outputs, and these are the run times:

master
Clock time = 166.763 CPU seconds
Wall time = 0:2:51 (hh:mm:ss)

makefile
Clock time = 59.2974 CPU seconds
Wall time = 0:1:1 (hh:mm:ss)

manodeep · 2020-04-06T04:46:54Z

@SimonStevenson Given that -Ofast is not enabled, the likely culprit is the fused-multiply-add operations (e.g., a*y + b becomes one operation rather than 2). If you recompile with -mno-fma and re-run, then the differences might disappear.

Never-mind - fixed :)

SimonStevenson · 2020-04-06T04:46:54Z

This is awesome @manodeep !

manodeep · 2020-04-06T04:50:35Z

As a point of reference, SSE2 operates on 128 bit vector registers (4x floats, 2x doubles) and AVX-512 operates on 512-bit vector registers (16x floats, 8x doubles) - so the best case speedup would be 4x.

SimonStevenson

I have checked that this compiles and runs and produces the same binaries as master. I found that for a small test run this leads to a speed up of >2! I am happy signing off on this.

SimonStevenson · 2020-04-06T05:06:50Z

I just approved this, but @jeffriley pointed out that we should document Manodeep's suggestion
If you are compiling on OzSTAR, then I would recommend using -march=skylake-avx512 instead of -march=native somewhere. Is it possible to add a flag to the makefile to specify this?

ilyamandel · 2020-04-06T05:11:37Z

"time make -j $(nproc)
real 7m39.157s
user 24m9.879s
sys 0m37.284s"

Query: does it make sense to have two compilation options, one that can compile quickly (for quick tests) at the cost of less efficient running, and another that optimises runtime for longer runs at the cost of long compilation times?

ilyamandel · 2020-04-06T05:12:47Z

Thank you very much for your advice and help, @manodeep -- we really appreciate it!

manodeep · 2020-04-06T05:23:44Z

I just approved this, but @jeffriley pointed out that we should document Manodeep's suggestion
If you are compiling on OzSTAR, then I would recommend using -march=skylake-avx512 instead of -march=native somewhere. Is it possible to add a flag to the makefile to specify this?

Btw, the reason why I recommended -march=skylake-avx512 is because gcc < 7 seems not fully enable the available instructions with -march=native (see here). Given that you only see a 2x speedup, just leaving -march=native on by default everywhere is probably fine.

If you guys haven't done so already, I would recommend using Intel Vtune to visually profile the code and see if there are any easy gains. Intel Vtune is available as a module on OzSTAR, just make sure to compile with icpc and with -xHost. Instructions on how to run Vtune is on the Cookies-n-Code page.

@SimonStevenson Happy to chat via slack if that would be easier.

manodeep · 2020-04-06T05:27:16Z

@ilyamandel Not a problem - glad I could help!

jeffriley · 2020-04-06T06:32:30Z

When I compile with this makefile, using gcc version

gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

I get the following warning for every source file:

utils.cpp: In function ‘void __static_initialization_and_destruction_0(int, int)’:
utils.cpp:501:1: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
 }

If I add

--param=max-vartrack-size=0

to CXXFLAGS the warning goes away

manodeep · 2020-04-06T06:44:11Z

@jeffriley Good solve - that error message was annoying. I wonder if your solution is identical to -fno-var-tracking-assignments

jeffriley · 2020-04-06T07:08:24Z

@manodeep I think -fno-var-tracking-assignments turns assignment tracking off, whereas --param=max-vartrack-size=0 changes the default limit from 10000 (I think) to unlimited. I'm not sure we need it, so we could just use -fno-var-tracking-assignments

manodeep · 2020-04-06T07:16:06Z

From a quick googling (earlier), that option seems to facilitate debugging. May be worth removing from production runs, particularly so if the compile times change depending on that parameter settiing.

manodeep · 2020-04-07T00:34:55Z

@SimonStevenson @jeffriley I made a few more changes that incorporate good Makefile practices.

Should not affect the build, but now has better dependency checker. For instance, if the header files are altered, then the associated object file will be rebuilt. Same for all object files, if the Makefile is updated

I am done with the edits. Any other edits will be in response to code-review from you guys.

SimonStevenson · 2020-04-07T00:47:11Z

Thanks @manodeep , I'll rerun my tests. Can you respond to @ilyamandel 's query above regarding optimising for compilation time vs optimising for runtime?

Query: does it make sense to have two compilation options, one that can compile quickly (for quick tests) at the cost of less efficient running, and another that optimises runtime for longer runs at the cost of long compilation times?

SimonStevenson · 2020-04-07T00:58:04Z

Compiles and runs fine for me!

manodeep/makefile
./COMPAS -n 1000 --random-seed 42
Clock time = 55.4875 CPU seconds
Wall time = 0:0:56 (hh:mm:ss)

manodeep · 2020-04-07T01:16:23Z

@SimonStevenson Ohh I didn't realise that query was directed at me! Yup - totally possible to create two compilation pathways, but that requires a choice about usability. How do you guys envision running these two different compilation tracks? I have implemented one solution, where the non-optimised build is the default (i.e., mirrors the setup prior to this PR). Running make fast or make staticfast creates the optimised builds (faster runtime, slower compile time), whereas simply typing make or make static creates the non-optimised (slower runtime, faster compile time).

@SimonStevenson When you re-ran the code just now - was the output still identical to before?

Substantial changes since approved, should be reapproved

ilyamandel · 2020-04-07T01:39:29Z

@manodeep -- thank you, exactly what I had in mind!

SimonStevenson · 2020-04-07T02:02:17Z

Hi @manodeep , thanks for this!

Compiling with
time make -j $(nproc)

real 3m6.504s
user 8m9.177s
sys 0m28.017s

Compiling with
time make fast -j $(nproc)

real 7m10.527s
user 23m42.493s
sys 0m34.970s

Checked that it runs and produces the same output in each case using
./COMPAS -n 1000 --random-seed 42

When you re-ran the code just now - was the output still identical to before?
Yes

SimonStevenson

I have checked that the code compiles using both
make
which compiles faster and runs slower
and
make fast
which compiles slower and runs faster
and produces the same output in each case.

jeffriley and others added 3 commits April 4, 2020 09:19

Merge pull request TeamCOMPAS#194 from TeamCOMPAS/dev

ab04727

Pull request from dev to master to create COMPAS beta

Optimisation flags needed at compile-time. Modernised remaineder of M…

520916a

…akefile

Attempting white-space fix

e9d2a1c

manodeep changed the base branch from master to dev April 6, 2020 02:29

SimonStevenson self-requested a review April 6, 2020 02:35

SimonStevenson previously approved these changes Apr 6, 2020

View reviewed changes

manodeep added 5 commits April 6, 2020 22:06

Removed unused variable

1d794a8

Changed to the fixed assignment

daa540a

Remove the executables when cleaning

32baaaf

Added the header files as a dependency

bca3ad4

Altered the sequence of targets to how match how make works

11a73b3

Added a way for a non-optimised build

0c574b4

SimonStevenson approved these changes Apr 7, 2020

View reviewed changes

SimonStevenson merged commit 8a298af into TeamCOMPAS:dev Apr 7, 2020

manodeep deleted the makefile branch April 7, 2020 02:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Makefile updates #198

Makefile updates #198

manodeep commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

manodeep commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

jeffriley commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020 •

edited

Loading

manodeep commented Apr 6, 2020 •

edited

Loading

SimonStevenson commented Apr 6, 2020

manodeep commented Apr 6, 2020

SimonStevenson left a comment

SimonStevenson commented Apr 6, 2020

ilyamandel commented Apr 6, 2020

ilyamandel commented Apr 6, 2020

manodeep commented Apr 6, 2020

manodeep commented Apr 6, 2020

jeffriley commented Apr 6, 2020

manodeep commented Apr 6, 2020

jeffriley commented Apr 6, 2020

manodeep commented Apr 6, 2020

manodeep commented Apr 7, 2020

SimonStevenson commented Apr 7, 2020 •

edited

Loading

SimonStevenson commented Apr 7, 2020

manodeep commented Apr 7, 2020

ilyamandel commented Apr 7, 2020

SimonStevenson commented Apr 7, 2020

SimonStevenson left a comment

Makefile updates #198

Makefile updates #198

Conversation

manodeep commented Apr 6, 2020

Comparison of instruction-sets enabled by compile-time options.

SimonStevenson commented Apr 6, 2020

manodeep commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

jeffriley commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020

SimonStevenson commented Apr 6, 2020 • edited Loading

manodeep commented Apr 6, 2020 • edited Loading

SimonStevenson commented Apr 6, 2020

manodeep commented Apr 6, 2020

SimonStevenson left a comment

Choose a reason for hiding this comment

SimonStevenson commented Apr 6, 2020

ilyamandel commented Apr 6, 2020

ilyamandel commented Apr 6, 2020

manodeep commented Apr 6, 2020

manodeep commented Apr 6, 2020

jeffriley commented Apr 6, 2020

manodeep commented Apr 6, 2020

jeffriley commented Apr 6, 2020

manodeep commented Apr 6, 2020

manodeep commented Apr 7, 2020

SimonStevenson commented Apr 7, 2020 • edited Loading

SimonStevenson commented Apr 7, 2020

manodeep commented Apr 7, 2020

ilyamandel commented Apr 7, 2020

SimonStevenson commented Apr 7, 2020

SimonStevenson left a comment

Choose a reason for hiding this comment

SimonStevenson commented Apr 6, 2020 •

edited

Loading

manodeep commented Apr 6, 2020 •

edited

Loading

SimonStevenson commented Apr 7, 2020 •

edited

Loading