Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducible Docker build for test cases in src/test #288

Open
wants to merge 86 commits into
base: main
Choose a base branch
from

Conversation

katrinafyi
Copy link
Member

@katrinafyi katrinafyi commented Dec 13, 2024

This has been my pet project for a while and my continuous focus for the last week and a bit. It has been through a fair bit of beta-testing to work out some kinks. Hopefully, it is a well-prepared change.

The effect of this PR is to remove all of the committed lifter outputs from src/test/ and replace them with a system to compile and lift them in a reproducible way. This is done with a Docker image which contains fixed versions of the required tools, a number of scripts to tie it together, and hashes recorded within the repository in place of the committed files.

Please see https://github.com/UQ-PAC/BASIL/blob/nix-docker-build/src/test/readme.md for documentation about the Docker build. If you want to try out the changes, that readme should have all the steps needed (let me know if anything is not clear).

Note! The SystemTests are not passing. I will need some help to work these out, since the changes to ADT/GTIRB files might be very large since they were last updated in the repository.

There are obvious benefits:

  • Clear provenance for lifted examples. Versions of tools are clearly tracked by the Docker image, generated by Nix.
  • Consistency across different operating systems. The existing makefile implicitly assumes some version of Ubuntu and a x86_64-linux system.
  • Reduced repository size and preventing further increases as the binary files are updated.

We foresaw the following pain points and have worked to reduce them:

  • Strong dependency on Docker for CI and running tests. We avoid this by uploading a tarball of the Docker-compiled files and adding a Makefile rule which simply downloads and extracts them. This is fast and suitable for CI as well as users who do not need to change test cases. We hope this will not worsen the Basil onboarding experience.
  • Accidentally mixing locally-compiled files and Docker-compiled files when making the hashes. The md5sum-updating rules are written to ensure they are run within Docker. To avoid files being built without Docker, then activating Docker and updating hashes, we track whether each intermediate file was made inside or outside Docker. Updating hashes is only allowed when they are all from Docker.

There are also drawbacks / limitations.

  • Lifted files differences are not visible in web view. This may be a benefit as it cleans up diffs. However, it is more troublesome to find the diffs if you want to see them.
  • Size and complexity of Docker image. It is a little bulky to handle and takes up sizable disk space. Hopefully, this burden is lessened by the pre-built tarball. Unused Docker images can also be pruned frequently from your computer.
  • Dependency on Docker to start developing test cases. This is true, but I think the trade-off is worthwhile to get consistency. Building using local tools is still possible using the old Makefile infrastructure.
  • To ensure identical file hashes, BAP adt and bir files are normalised by replacing variable Tid addresses with deterministic ones, starting from %00000000 and incrementing. Basil might have made assumptions on these IDs.
  • The use of some scripts here might have bugs. We make efforts to program defensively and have tested on Linux and MacOS, but some bugs might remain.
  • The Makefiles inherit the existing limitation that test cases must be a single C file. This can be extracted to only handle the lifting / hashing part, to make this more flexible and usable with binaries such as cntlm.

Anticipated questions:

  • Why not Git LFS? LFS only solves one part of the problem (the distribution of binaries). The versioning of compilers and tools is still unknown. Github also has tight limits on LFS storage and bandwidth.
  • Why not Docker by itself instead of Docker via Nix? I think that it would be fairly difficult to make a traditional Dockerfile build have the level of consistency which we want. If the Docker build is not repeatable, it has much the same problems as the existing committed binaries, just in GHCR instead of the git repository. Nix is already used to distribute some of the lifter tools, and there is Nix tooling to easily bundle these into a Docker image which can be built by anyone with Nix.
  • Okay, but who knows Nix? Sadly not too many. People should learn! However, for the purposes of building this Docker image, it pulls directly from the packages already in Nix. Updating these packages is already automatic, so all you have to do here is update the docker-flake reference.
  • Why not what we already had? See benefits.

Catalogue of changed files:

To make the changes easier to see, I have made a diff which excludes all the generated file changes: https://gist.github.com/katrinafyi/23f9a7de5b99cf1843814ccb72302e18.

In summary,

  • Most of the changes are to the Makefiles in src/test, including the .mk files in make/.
  • Helper scripts in make/docker-helper.sh and make/bap-normalise.py.
    • Maybe bap-normalise should live in the bap-aslp Nix package. That way, it would also be versioned by the Docker image.
  • make/docker-flake.txt pins the Docker flake to a particular pac-nix commit (and hence the packages of that commit). src/test/docker-contents.txt lists all Nix derivations within the Docker image.
  • Each test case's compiler directory, e.g. correct/arrays_simple/clang, has a new md5sums file which tracks the hashes of a.out, adt, bir, relf, and gts files.
  • src/test/compiled.md5sums collates the md5sums from all subdirectories, for easier checking
  • src/test/compiled.url.txt is a URL to an online file holding the pre-compiled files, in a .tar.zst file. This is extracted on top of src/test to populate the files.
    • maybe we should hash the tarball, in case our file host ever becomes malicious [update: tarball is hashed now]
  • src/test/readme.md extensively documents this system. docs/... is updated to point to this.

Orthogonal but related changes:

  • Makefile fix and improvements to be more robust and customisable (e.g. lifting individual tests or folders).
  • Restructuring DSA and irreducible loops folders to be lifted by the makefile.

TODO

  • update expected files
  • ci tarball creation and download
  • docker-hash in one directory, next to big md5sum. name could be docker-contents.txt or similar
  • problem: since makefiles wildcard directories, it is easy for leftover empty directories to cause unknowable errors like "clang no input files". maybe fixed by writing makefile expression to filter for directories containing C files.
  • document above problem

@l-kent
Copy link
Contributor

l-kent commented Dec 17, 2024

I think it was 11.3.0

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

The docker-helper.sh script seems to assume podman is being used.

@katrinafyi
Copy link
Member Author

assume podman

It does, but it should be overridable. I'll update the docs to mention this.

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

For

+ exec docker pull --platform linux/amd64 ghcr.io/uq-pac/basil-tools-docker:flake-014a-ef9c2374

when running docker-helper.sh pull I get

Error response from daemon: manifest unknown

@katrinafyi
Copy link
Member Author

katrinafyi commented Dec 18, 2024

flake-014a-ef9c2374

You seem to have \r\n line endings in the src/test/make/docker-flake.txt file. I'll see if i can force \n line endings in a gitattributes.

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

Oh I thought I'd fixed that but I guess not

@katrinafyi
Copy link
Member Author

Just checking, do you use git through WSL or through just Windows? WSL, I'd think, wouldn't have this problem.

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

A much better solution is to add a .gitattributes file to mandate that .sh files (& whichever other files you are referencing in your scripts) keep LF line endings and but I hadn't bothered to do that previously because it hadn't really been much of a nuisance with previous bash scripts.

@katrinafyi
Copy link
Member Author

I was planning to do that, I was just curious.

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

At the very least, all of the following probably should have eol=lf but I'm probably missing something since I'm still having an issue with make md5sum-check

Makefile
*.sh 
*.mk
*.md5sum

Docker works to lift everything, it's just incredibly slow (unsurprisingly). The md5 sums all match.

There's still a bunch of test cases where it needs to be added to the config files not to compile the BAP or GTIRB variants - it's mostly GTIRB variants in the indirect_calls folder I think.

The .expected files will need to be updated before merging since they're still for the earlier version I think?

@katrinafyi
Copy link
Member Author

What happens with md5sum-check?

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

md5sum: 'correct/arrays_simple/clang/a.out'$'\r': No such file or directory
: FAILED open or read/clang/a.out

but I can't track down where the \r is being introduced into it easily

Comment on lines 89 to 92
elif [[ "$1" == stop ]]; then
# starts the instance of the docker image.
set -x
exec $DOCKER rm -f -t 1 $unique_container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work with docker, only podman, since docker doesn't have a -t flag for rm

docker stop supports -t, unlike docker rm.
@katrinafyi
Copy link
Member Author

issue with make md5sum-check

I can't encounter this, even after forcing git to use crlf by default. You can try (very carefully) resetting the src/test directory: rm -rf src/test && git checkout src/test. Otherwise, does make extract work? how about md5sum -c compiled.md5sum?

@l-kent
Copy link
Contributor

l-kent commented Dec 18, 2024

md5sum -c compiled.md5sum

worked fine once I fixed the line endings in compiled.md5sum, which is how I knew the md5sums were fine, but the \rs were still getting into make md5sum-check somewhere. I'll try it again

@katrinafyi
Copy link
Member Author

katrinafyi commented Dec 18, 2024

@l-kent For correct/syscall/gcc_O2, can I trouble you to build the test case with -O2 -v , then post the log output and binary? I didn't have success with either gcc 11, 12, or 13. It would be good to do this on an OS as similar to the one you think was used before.

update: i've docker-installed ubuntu:22.04 and it looks like it has -pie by default:

Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin
>>>>>>>>>>>--enable-default-pie<<<<<<< --with-system-zlib --enable-libphobos-checking=release --without-target-system-zlib --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=aarch64-linux-gnu --program-prefix=aarch64-linux-gnu- --includedir=/usr/aarch64-linux-gnu/include --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04) 

This leads to -pie in the collect2 (linker?) call:

 /usr/lib/gcc-cross/aarch64-linux-gnu/11/collect2
-plugin /usr/lib/gcc-cross/aarch64-linux-gnu/11/liblto_plugin.so -plugin-opt=/usr/lib/gcc-cross/aarch64-linux-gnu/11/lto-wrapper -plugin-opt=-fresolution=/tmp/ccUysZL9.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --sysroot=/ --build-id --eh-frame-hdr --hash-style=gnu --as-needed -dynamic-linker /lib/ld-linux-aarch64.so.1 -X -EL -maarch64linux --fix-cortex-a53-843419
>>>>>>>-pie<<<<<<<< -z now -z relro /usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib/../lib/Scrt1.o /usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib/../lib/crti.o /usr/lib/gcc-cross/aarch64-linux-gnu/11/crtbeginS.o -L/usr/lib/gcc-cross/aarch64-linux-gnu/11 -L/usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib/../lib -L/lib/aarch64-linux-gnu -L/lib/../lib -L/usr/lib/aarch64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib /tmp/cccMXoL0.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc-cross/aarch64-linux-gnu/11/crtendS.o /usr/lib/gcc-cross/aarch64-linux-gnu/11/../../../../aarch64-linux-gnu/lib/../lib/crtn.o

So, I feel justified in adding -pie to all the gcc configs. I have every hope that this will fix the correct/syscall/gcc_O2 example. Let me know if you agree.

@katrinafyi
Copy link
Member Author

katrinafyi commented Dec 19, 2024

Diffs in ./mill test output as of time of this comment: https://gist.github.com/katrinafyi/01181184e153a75a8c0d710ee089c6da

We fail 54 tests compared to 30 failing in main. I'll take a look, maybe it's BAP-related.

Edit: As of 05e7deb, indirect call tests are fixed and 38 tests fail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants