add build caching option (#514)
First touch on #499

Depends on: google/oss-fuzz#12284

This works by saving a cached image of the `build_fuzzers` container
after `compile` has run, and then modifying a project's Dockerfile to
use this cached build image together with an adjusted build script.

For example, for brotli the Dockerfile is originally:

```dockerfile
FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y cmake libtool make

RUN git clone --depth 1 https://github.com/google/brotli.git
WORKDIR brotli
COPY build.sh $SRC/

COPY 01.c /src/brotli/c/fuzz/decode_fuzzer.c
```

A Dockerfile that relies on the cached image is then created, and it
looks like:

```dockerfile
FROM cached_image_brotli
# RUN apt-get update && apt-get install -y cmake libtool make
#
# RUN git clone --depth 1 https://github.com/google/brotli.git
# WORKDIR brotli
# COPY build.sh $SRC/
#
COPY 01.c /src/brotli/c/fuzz/decode_fuzzer.c
#
COPY adjusted_build.sh $SRC/build.sh
```

`adjusted_build.sh` is then a script that only builds the fuzzers. This
means we can keep using the `build_fuzzers`/`compile` workflows as we
know them.
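
The Dockerfile rewrite described above can be sketched as a pure string transformation. This is a minimal sketch that mirrors the logic of `rewrite_project_to_cached_project` in the diff of this commit; `to_cached_dockerfile` and `BROTLI_DOCKERFILE` are illustrative names that are not part of the commit, and the image name assumes the `gcr.io/oss-fuzz/{project}_{sanitizer}_cache` pattern used in the diff.

```python
BASE_IMAGE_LINE = 'FROM gcr.io/oss-fuzz-base/base-builder'

# Illustrative input: the original brotli Dockerfile from the example above.
BROTLI_DOCKERFILE = """FROM gcr.io/oss-fuzz-base/base-builder
RUN apt-get update && apt-get install -y cmake libtool make
RUN git clone --depth 1 https://github.com/google/brotli.git
WORKDIR brotli
COPY build.sh $SRC/
COPY 01.c /src/brotli/c/fuzz/decode_fuzzer.c
"""


def to_cached_dockerfile(dockerfile: str, cached_image: str) -> str:
  """Returns a Dockerfile that starts from `cached_image` (hypothetical helper).

  Keeps the first FROM line and the last two COPY lines; every other line is
  commented out, as in the commit's rewrite step.
  """
  content = dockerfile.replace(BASE_IMAGE_LINE, f'FROM {cached_image}')
  content += '\nCOPY adjusted_build.sh $SRC/build.sh\n'
  lines = content.split('\n')

  # Index of the first FROM line (-1 if there is none).
  from_idx = next(
      (i for i, line in enumerate(lines) if line.startswith('FROM')), -1)
  # Indices of all COPY lines; only the last two stay active.
  copy_idxs = [i for i, line in enumerate(lines) if line.startswith('COPY')]
  keep = {from_idx, *copy_idxs[-2:]}

  rewritten = []
  for i, line in enumerate(lines):
    rewritten.append(line if i in keep else f'# {line}'.rstrip())
  return '\n'.join(rewritten)


if __name__ == '__main__':
  print(to_cached_dockerfile(BROTLI_DOCKERFILE,
                             'gcr.io/oss-fuzz/brotli_address_cache'))
```

Note that the "last two COPY lines" heuristic assumes the generated fuzzer is the last file copied by the original Dockerfile, which holds for the brotli example.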

More specifically, this PR:

- Makes it possible to build Docker images of fuzzer build containers.
It does this by running `build_fuzzers`, saving the Docker container, and
then committing the container to an image. This image holds a
project's build state as it is after `compile` has run, and OFG then
uses it when building fuzzers.
- Supports only ASAN mode for now. It should be easy to extend to
coverage too.
- Currently builds the images first and then uses them locally. We could
extend this, probably in a later step, to use containers pushed by
OSS-Fuzz itself.
- Only does the caching if a "cache-build-script" exists (a few are added
for some projects), which contains the instructions for building the
fuzzers after the main build has completed. It should be easy to extend
this so that we can also rely on some DB of auto-generated build scripts
(ref: google/oss-fuzz#11937), but I think it's nice to have both
options: scripts we create ourselves and an auto-generated DB.
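
The build, commit, and clean-up sequence from the first bullet can be sketched as a dry-run planner that only returns the commands instead of running them. This is a sketch under assumptions: `plan_cache_steps` is a hypothetical helper, while the container name, image name, `OSS_FUZZ_SAVE_CONTAINERS_NAME` variable, and command lines are taken from `_prepare_image_cache` in the diff of this commit. The default covers only `address`, matching the description above; the diff itself loops over both `address` and `coverage`.

```python
def plan_cache_steps(
    project: str,
    sanitizers: tuple[str, ...] = ('address',),
) -> list[tuple[dict[str, str], list[str]]]:
  """Returns (extra_env, argv) pairs for caching a project (hypothetical)."""
  container = f'gcr.io.oss-fuzz.{project}_cache'
  steps = []
  for sanitizer in sanitizers:
    image = f'gcr.io/oss-fuzz/{project}_{sanitizer}_cache'
    # 1. Build the fuzzers; the env variable asks OSS-Fuzz to keep the
    #    build container around under the given name.
    steps.append(({'OSS_FUZZ_SAVE_CONTAINERS_NAME': container}, [
        'python3', 'infra/helper.py', 'build_fuzzers', project, '--sanitizer',
        sanitizer
    ]))
    # 2. Commit the saved container to an image.
    steps.append(({}, ['docker', 'commit', container, image]))
    # 3. Remove the container so the name can be reused.
    steps.append(({}, ['docker', 'container', 'rm', container]))
  return steps


if __name__ == '__main__':
  for extra_env, argv in plan_cache_steps('brotli'):
    print(extra_env, ' '.join(argv))
```

Returning the commands rather than executing them keeps the sketch runnable without Docker or an OSS-Fuzz checkout.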

---------

Signed-off-by: David Korczynski <[email protected]>
DavidKorczynski authored Aug 10, 2024
1 parent abb5a5f commit f9a8df9
Showing 11 changed files with 285 additions and 1 deletion.
20 changes: 19 additions & 1 deletion experiment/builder_runner.py
@@ -42,7 +42,6 @@

# The directory in the oss-fuzz image
JCC_DIR = '/usr/local/bin'

RUN_TIMEOUT: int = 30
CLOUD_EXP_MAX_ATTEMPT = 5

@@ -534,7 +533,24 @@ def build_target_local(self,
                         log_path: str,
                         sanitizer: str = 'address') -> bool:
    """Builds a target with OSS-Fuzz."""

    logger.info('Building %s with %s', generated_project, sanitizer)

    if oss_fuzz_checkout.ENABLE_CACHING and oss_fuzz_checkout.is_image_cached(
        self.benchmark.project, sanitizer):
      logger.info('We should use cached instance.')
      # Rewrite for caching.
      oss_fuzz_checkout.rewrite_project_to_cached_project(
          self.benchmark.project, generated_project, sanitizer)

      # Prepare build
      oss_fuzz_checkout.prepare_build(self.benchmark.project, sanitizer,
                                      generated_project)

    else:
      logger.info('The project does not have any cache')

    # Build the image
    command = [
        'docker', 'build', '-t', f'gcr.io/oss-fuzz/{generated_project}',
        os.path.join(oss_fuzz_checkout.OSS_FUZZ_DIR, 'projects',
@@ -639,13 +655,15 @@ def get_coverage_local(
    sample_id = os.path.splitext(benchmark_target_name)[0]
    log_path = os.path.join(self.work_dirs.build_logs,
                            f'{sample_id}-coverage.log')
    logger.info('Building project for coverage')
    built_coverage = self.build_target_local(generated_project,
                                             log_path,
                                             sanitizer='coverage')
    if not built_coverage:
      logger.info('Failed to make coverage build for %s', generated_project)
      return None, None

    logger.info('Extracting coverage')
    corpus_dir = self.work_dirs.corpus(benchmark_target_name)
    command = [
        'python3',
173 changes: 173 additions & 0 deletions experiment/oss_fuzz_checkout.py
@@ -24,10 +24,13 @@

import yaml

from experiment import benchmark as benchmarklib

logger = logging.getLogger(__name__)

BUILD_DIR: str = 'build'
GLOBAL_TEMP_DIR: str = ''
ENABLE_CACHING = bool(int(os.getenv('OFG_USE_CACHING', '0')))
# Assume OSS-Fuzz is at repo root dir by default.
# This will change if temp_dir is used.
OSS_FUZZ_DIR: str = os.path.join(
@@ -174,3 +177,173 @@ def get_project_repository(project: str) -> str:
  with open(project_yaml_path, 'r') as benchmark_file:
    data = yaml.safe_load(benchmark_file)
    return data.get('main_repo', '')


def _get_project_cache_name(project: str) -> str:
  """Gets name of cached container for a project."""
  return f'gcr.io.oss-fuzz.{project}_cache'


def _get_project_cache_image_name(project: str, sanitizer: str) -> str:
  """Gets name of cached Docker image for a project and a respective
  sanitizer."""
  return f'gcr.io/oss-fuzz/{project}_{sanitizer}_cache'


def _has_cache_build_script(project: str) -> bool:
  """Checks if a project has cached fuzzer build script."""
  cached_build_script = os.path.join('fuzzer_build_script', project)
  return os.path.isfile(cached_build_script)


def _prepare_image_cache(project: str) -> bool:
  """Prepares cached images of fuzzer build containers."""
  # Only create a cached image if we have a post-build build script
  if not _has_cache_build_script(project):
    logger.info('No cached script for %s', project)
    return False
  logger.info('%s has a cached build script', project)

  cached_container_name = _get_project_cache_name(project)
  adjusted_env = os.environ | {
      'OSS_FUZZ_SAVE_CONTAINERS_NAME': cached_container_name
  }

  logger.info('Creating cached images')
  for sanitizer in ['address', 'coverage']:
    # Create cached image by building using OSS-Fuzz with set variable
    command = [
        'python3', 'infra/helper.py', 'build_fuzzers', project, '--sanitizer',
        sanitizer
    ]
    try:
      sp.run(command, cwd=OSS_FUZZ_DIR, env=adjusted_env, check=True)
    except sp.CalledProcessError:
      logger.info('Failed to build fuzzer for %s.', project)
      return False

    # Commit the container to an image
    cached_image_name = _get_project_cache_image_name(project, sanitizer)

    command = ['docker', 'commit', cached_container_name, cached_image_name]
    try:
      sp.run(command, check=True)
    except sp.CalledProcessError:
      logger.info('Could not commit container to image.')
      return False
    logger.info('Created cached image %s', cached_image_name)

    # Delete the container we created
    command = ['docker', 'container', 'rm', cached_container_name]
    try:
      sp.run(command, check=True)
    except sp.CalledProcessError:
      logger.info('Could not remove container.')
  return True


def prepare_cached_images(
    experiment_targets: list[benchmarklib.Benchmark]) -> None:
  """Builds cached Docker images for a set of targets."""
  all_projects = set()
  for benchmark in experiment_targets:
    all_projects.add(benchmark.project)

  logger.info('Preparing cache for %d projects', len(all_projects))

  for project in all_projects:
    _prepare_image_cache(project)


def is_image_cached(project_name: str, sanitizer: str) -> bool:
  """Checks whether a project has a cached Docker image post fuzzer
  building."""
  cached_image_name = _get_project_cache_image_name(project_name, sanitizer)
  try:
    sp.run(
        ['docker', 'inspect', '--type=image', cached_image_name],
        check=True,
        stdin=sp.DEVNULL,
        stdout=sp.DEVNULL,
        stderr=sp.STDOUT,
    )
    return True
  except sp.CalledProcessError:
    return False


def rewrite_project_to_cached_project(project_name: str, generated_project: str,
                                      sanitizer: str) -> None:
  """Rewrites Dockerfile of a project to enable cached build scripts."""
  cached_image_name = _get_project_cache_image_name(project_name, sanitizer)

  generated_project_folder = os.path.join(OSS_FUZZ_DIR, 'projects',
                                          generated_project)

  cached_dockerfile = os.path.join(generated_project_folder,
                                   f'Dockerfile_{sanitizer}_cached')
  if os.path.isfile(cached_dockerfile):
    logger.info('Already converted')
    return

  # Check if there is an original Dockerfile and use it if so, as otherwise
  # the "Dockerfile" may be a copy made for another sanitizer.
  original_dockerfile = os.path.join(generated_project_folder,
                                     'Dockerfile_original')
  if not os.path.isfile(original_dockerfile):
    dockerfile = os.path.join(generated_project_folder, 'Dockerfile')
    shutil.copy(dockerfile, original_dockerfile)

  with open(original_dockerfile, 'r') as f:
    docker_content = f.read()

  docker_content = docker_content.replace(
      'FROM gcr.io/oss-fuzz-base/base-builder', f'FROM {cached_image_name}')
  docker_content += '\n' + 'COPY adjusted_build.sh $SRC/build.sh\n'

  # Now comment out everything except the first FROM and the last two COPY
  # instructions
  from_line = -1
  copy_fuzzer_line = -1
  copy_build_line = -1

  for line_idx, line in enumerate(docker_content.split('\n')):
    if line.startswith('FROM') and from_line == -1:
      from_line = line_idx
    if line.startswith('COPY'):
      copy_fuzzer_line = copy_build_line
      copy_build_line = line_idx

  lines_to_keep = {from_line, copy_fuzzer_line, copy_build_line}
  new_content = ''
  for line_idx, line in enumerate(docker_content.split('\n')):
    if line_idx not in lines_to_keep:
      new_content += f'# {line}\n'
    else:
      new_content += f'{line}\n'

  # Overwrite the existing one
  with open(cached_dockerfile, 'w') as f:
    f.write(new_content)

  # Copy over adjusted build script
  shutil.copy(os.path.join('fuzzer_build_script', project_name),
              os.path.join(generated_project_folder, 'adjusted_build.sh'))


def prepare_build(project_name, sanitizer, generated_project):
  """Prepares the correct Dockerfile to be used for cached builds."""
  generated_project_folder = os.path.join(OSS_FUZZ_DIR, 'projects',
                                          generated_project)
  if not ENABLE_CACHING:
    return
  dockerfile_to_use = os.path.join(generated_project_folder, 'Dockerfile')
  original_dockerfile = os.path.join(generated_project_folder,
                                     'Dockerfile_original')
  if is_image_cached(project_name, sanitizer):
    logger.info('Using cached dockerfile')
    cached_dockerfile = os.path.join(generated_project_folder,
                                     f'Dockerfile_{sanitizer}_cached')
    shutil.copy(cached_dockerfile, dockerfile_to_use)
  else:
    logger.info('Using original dockerfile')
    shutil.copy(original_dockerfile, dockerfile_to_use)
27 changes: 27 additions & 0 deletions fuzzer_build_script/bluez
@@ -0,0 +1,27 @@
INCLUDES="-I. -I./src -I./lib -I./gobex -I/usr/local/include/glib-2.0/ -I/src/glib/_build/glib/"
STATIC_LIBS="./src/.libs/libshared-glib.a ./lib/.libs/libbluetooth-internal.a -l:libical.a -l:libicalss.a -l:libicalvcal.a -l:libdbus-1.a /src/glib/_build/glib/libglib-2.0.a"

$CC $CFLAGS $INCLUDES $SRC/fuzz_xml.c -c
$CC $CFLAGS $INCLUDES $SRC/fuzz_sdp.c -c
$CC $CFLAGS $INCLUDES $SRC/fuzz_textfile.c -c
$CC $CFLAGS $INCLUDES $SRC/fuzz_gobex.c -c
$CC $CFLAGS $INCLUDES $SRC/fuzz_hci.c -c

$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
./src/bluetoothd-sdp-xml.o fuzz_xml.o -o $OUT/fuzz_xml \
$STATIC_LIBS -ldl -lpthread

$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
fuzz_sdp.o -o $OUT/fuzz_sdp $STATIC_LIBS -ldl -lpthread

$CXX $CXXFLAGS $LIB_FUZZING_ENGINE fuzz_textfile.o -o $OUT/fuzz_textfile \
$STATIC_LIBS -ldl -lpthread src/textfile.o

$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
fuzz_gobex.o ./gobex/gobex*.o -o $OUT/fuzz_gobex \
$STATIC_LIBS -ldl -lpthread

$CXX $CXXFLAGS $LIB_FUZZING_ENGINE \
fuzz_hci.o ./gobex/gobex*.o -o $OUT/fuzz_hci \
$STATIC_LIBS -ldl -lpthread

7 changes: 7 additions & 0 deletions fuzzer_build_script/brotli
@@ -0,0 +1,7 @@
$CC $CFLAGS -c -std=c99 -I. -I./c/include c/fuzz/decode_fuzzer.c

$CXX $CXXFLAGS ./decode_fuzzer.o -o $OUT/decode_fuzzer \
$LIB_FUZZING_ENGINE ./libbrotlidec.a ./libbrotlicommon.a

cp java/org/brotli/integration/fuzz_data.zip $OUT/decode_fuzzer_seed_corpus.zip
chmod a-x $OUT/decode_fuzzer_seed_corpus.zip # we will try to run it otherwise
4 changes: 4 additions & 0 deletions fuzzer_build_script/htslib
@@ -0,0 +1,4 @@
make -j$(nproc) libhts.a test/fuzz/hts_open_fuzzer.o

# build fuzzers
$CXX $CXXFLAGS -o "$OUT/hts_open_fuzzer" test/fuzz/hts_open_fuzzer.o $LIB_FUZZING_ENGINE libhts.a -lz -lbz2 -llzma -lcurl -lcrypto -lpthread
16 changes: 16 additions & 0 deletions fuzzer_build_script/libraw
@@ -0,0 +1,16 @@
# build fuzzers
$CXX $CXXFLAGS -std=c++11 -Ilibraw \
$SRC/libraw_fuzzer.cc -o $OUT/libraw_fuzzer \
$LIB_FUZZING_ENGINE -lz lib/.libs/libraw.a

$CXX $CXXFLAGS -std=c++11 -Ilibraw \
$SRC/libraw_fuzzer.cc -o $OUT/libraw_cr2_fuzzer \
$LIB_FUZZING_ENGINE -lz lib/.libs/libraw.a

$CXX $CXXFLAGS -std=c++11 -Ilibraw \
$SRC/libraw_fuzzer.cc -o $OUT/libraw_nef_fuzzer \
$LIB_FUZZING_ENGINE -lz lib/.libs/libraw.a

$CXX $CXXFLAGS -std=c++11 -Ilibraw \
$SRC/libraw_fuzzer.cc -o $OUT/libraw_raf_fuzzer \
$LIB_FUZZING_ENGINE -lz lib/.libs/libraw.a
7 changes: 7 additions & 0 deletions fuzzer_build_script/libsndfile
@@ -0,0 +1,7 @@
./ossfuzz/ossfuzz.sh

# To make CIFuzz fast, see here for details: https://github.com/libsndfile/libsndfile/pull/796
for fuzzer in sndfile_alt_fuzzer sndfile_fuzzer; do
  echo "[libfuzzer]" > ${OUT}/${fuzzer}.options
  echo "close_fd_mask = 3" >> ${OUT}/${fuzzer}.options
done
12 changes: 12 additions & 0 deletions fuzzer_build_script/mosh
@@ -0,0 +1,12 @@
cd src/fuzz

make -j$n CFLAGS+="$CFLAGS" CXXFLAGS+="$CXXFLAGS"

for fuzzer in *_fuzzer; do
  cp $fuzzer $OUT

  corpus=${fuzzer%_fuzzer}_corpus
  if [ -d $corpus ]; then
    zip -j $OUT/${fuzzer}_seed_corpus.zip $corpus/*
  fi
done
13 changes: 13 additions & 0 deletions fuzzer_build_script/quickjs
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
#!/bin/bash -eu

build_fuzz_target () {
local target=$1
shift
$CC $CFLAGS -I. -c fuzz/$target.c -o $target.o
$CXX $CXXFLAGS $target.o -o $OUT/$target $@ $LIB_FUZZING_ENGINE
}

build_fuzz_target fuzz_eval .obj/fuzz_common.o libquickjs.fuzz.a
build_fuzz_target fuzz_compile .obj/fuzz_common.o libquickjs.fuzz.a
build_fuzz_target fuzz_regexp .obj/libregexp.fuzz.o .obj/cutils.fuzz.o .obj/libunicode.fuzz.o

4 changes: 4 additions & 0 deletions fuzzer_build_script/tmux
@@ -0,0 +1,4 @@
make -j"$(nproc)" check
find "${SRC}/tmux/fuzz/" -name '*-fuzzer' -exec cp -v '{}' "${OUT}"/ \;
find "${SRC}/tmux/fuzz/" -name '*-fuzzer.options' -exec cp -v '{}' "${OUT}"/ \;
find "${SRC}/tmux/fuzz/" -name '*-fuzzer.dict' -exec cp -v '{}' "${OUT}"/ \;
3 changes: 3 additions & 0 deletions run_all_experiments.py
@@ -368,6 +368,9 @@ def main():
  experiment_targets = prepare_experiment_targets(args)
  experiment_results = []

  if oss_fuzz_checkout.ENABLE_CACHING:
    oss_fuzz_checkout.prepare_cached_images(experiment_targets)

  logger.info('Running %s experiment(s) in parallels of %s.',
              len(experiment_targets), str(NUM_EXP))
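
Caching is switched on through the `OFG_USE_CACHING` environment variable, which `oss_fuzz_checkout.py` parses as `bool(int(...))`. The following minimal sketch shows how that expression treats different values; `caching_enabled` is a hypothetical helper that takes the environment as a mapping so it can be exercised without touching the real process environment.

```python
import os
from typing import Mapping


def caching_enabled(env: Mapping[str, str] = os.environ) -> bool:
  """Mirrors the ENABLE_CACHING expression from the diff (hypothetical)."""
  # Unset defaults to '0'; any integer other than 0 enables caching.
  # Non-integer values such as 'true' make int() raise ValueError.
  return bool(int(env.get('OFG_USE_CACHING', '0')))


if __name__ == '__main__':
  print(caching_enabled({'OFG_USE_CACHING': '1'}))
  print(caching_enabled({}))
```

In practice this means the variable should be set to an integer, for example `OFG_USE_CACHING=1`, before starting `run_all_experiments.py`.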
