add build caching option #514

DavidKorczynski · 2024-07-27T21:11:22Z

First touch on #499

The way this work is by saving a cached version of build_fuzzers post running of compile and then modifying the Dockerfiles of a project to use this cached build image + an adjusted build script.

For example, for brotli the Dockerfile is originally:

                                                                                
FROM gcr.io/oss-fuzz-base/base-builder                                          
RUN apt-get update && apt-get install -y cmake libtool make                     
                                                                                
RUN git clone --depth 1 https://github.com/google/brotli.git                    
WORKDIR brotli                                                                  
COPY build.sh $SRC/                                                             
                                                                                
COPY 01.c /src/brotli/c/fuzz/decode_fuzzer.c

a Dockerfile is then created which relies on the cached version, and it loosk like:

FROM cached_image_brotli                                                        
# RUN apt-get update && apt-get install -y cmake libtool make                   
#                                                                               
# RUN git clone --depth 1 https://github.com/google/brotli.git                  
# WORKDIR brotli                                                                
# COPY build.sh $SRC/                                                           
#                                                                               
COPY 01.c /src/brotli/c/fuzz/decode_fuzzer.c                                    
#                                                                               
COPY adjusted_build.sh $SRC/build.sh

adjusted_build.sh is then the script that only builds fuzzers. This means we can also use build_fuzzers/compile workflows as we know it.

More specifically, this PR:

Makes it possible to build Docker images of fuzzer build containers. Does this by running build_fuzzers, saving the docker container and then commit the docker container to an image. This image will have a projects' build set up post running of compile. This is then used when building fuzzers by OFG.
Supports only ASAN mode for now. Should be easy to extend to coverage too.
Currently builds images first and then uses them locally. We could extend, probably on another step of this, to use containers pushed by OSS-Fuzz itself.
Only does the caching if a "cache-build-script" exists (added a few for some projects) which contains the build instructions post-build process. It should be easy to extend such that we can rely on some DB of auto-generated build scripts as well (ref: chronos: Pause compile just before compiling the fuzz target so that we can reuse it later. oss-fuzz#11937) but I think it's nice to have both the option of us creating the scripts ourselves + an auto-generated DB.

DavidKorczynski · 2024-07-27T21:17:14Z

Can confirm from local runs that this saves time. I assume a lot of time will be saved by (1) using OSS-Fuzz bots to generated the containers; (2) extend to coverage as well.

Signed-off-by: David Korczynski <[email protected]>

DonggeLiu

Thanks for looking into this, @DavidKorczynski.
Would you plan to support more projects before merging?

DonggeLiu · 2024-07-30T07:15:56Z

experiment/builder_runner.py

@@ -529,12 +529,108 @@ def run_target_local(self, generated_project: str, benchmark_target_name: str,
    else:
      logger.info(f'Successfully run {generated_project}.')

+  def is_image_cached(self, project_name):
+    cached_image_name = f'cached_image_{project_name}'


nit: Normally the original image name from OSS-Fuzz is gcr.io/oss-fuzz/project-name, hence suffixes might be more suitable for this case (e.g., gcr.io/oss-fuzz/bluez-ofg-cache)

Done, i've now renamed caches so the images are called e.g.:

gcr.io/oss-fuzz/libraw_coverage_cache gcr.io/oss-fuzz/libraw_coverage_cache

Note that there is difference between container/image names, and container names have some additional naming limitations. However, we the containers are still temporary and it's the image names we care about.

experiment/builder_runner.py

DonggeLiu · 2024-07-30T10:31:53Z

fuzzer_build_script/brotli

+$CC $CFLAGS -c -std=c99 -I. -I./c/include c/fuzz/decode_fuzzer.c
+
+$CXX $CXXFLAGS ./decode_fuzzer.o  -o $OUT/decode_fuzzer \
+    $LIB_FUZZING_ENGINE ./libbrotlidec.a ./libbrotlicommon.a


Glad to see this works, but an automatic solution is preferred given there are more ~300 projects in total, some of which builds the fuzz target with make.

It's actually fine that make is used because in most, if not all (am not sure if time sometimes matters), times make will only recompile the targets that have change, i.e. the harness

+1, let's keep this as more of an "override" option that's used in if the automatic solution fails.

run_all_experiments.py

DonggeLiu · 2024-07-30T10:36:13Z

run_all_experiments.py

+  }
+
+  # Create cached image by building using OSS-Fuzz with set variable
+  command = ['python3', 'infra/helper.py', 'build_fuzzers', project]


I reckon we can build the cache image of all projects every day (if we have their fuzzer_build_script in your solution, or address the TODOs in mine).
OSS-Fuzz is already building all original projects anyway.

We run multiple experiments every day, no need to rebuild the cache images in every experiment.

I'm fine with that, I say we land it now as is and then do the migration of OSS-Fuzz build images/images build from cloud in the next steps. It's working well locally and I think that's a nice stage

DavidKorczynski · 2024-07-30T21:08:32Z

Would you plan to support more projects before merging?

Yep. I'm also planning on supporting caching of code coverage before merging. I was asking for review from a perspective of the architecture. In short, I agree on #514 (comment) and #514 (comment) but thought this would be a good start

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski · 2024-08-05T13:57:14Z

Would you plan to support more projects before merging?

Yep. I'm also planning on supporting caching of code coverage before merging. I was asking for review from a perspective of the architecture. In short, I agree on #514 (comment) and #514 (comment) but thought this would be a good start

@DonggeLiu this has now been achieved. I think the PR is now in a good state to land in that it works well on local runs.

next steps will be:

Tie this together with cached builds uploaded by OSS-Fuzz
Migrate changes onto the cloud environment

I would prefer to do both of these steps in follow-ups

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski · 2024-08-05T13:57:59Z

/gcbrun exp -n dk-5439 -m vertex_ai_gemini-1-5 -b comparison

Signed-off-by: David Korczynski <[email protected]>

DonggeLiu · 2024-08-06T02:04:34Z

Cached image generation is almost ready on the OSS-Fuzz side: google/oss-fuzz#11937

Once merged, we can integrate it into the automatic daily build workflow.

DonggeLiu

Thanks @DavidKorczynski !

DonggeLiu · 2024-08-06T03:21:44Z

experiment/builder_runner.py

+
+  def _rewrite_project_to_cached_project(self, generated_project,
+                                         sanitizer: str) -> None:
+    """Rewrites Dockerfile of a project to enable cached build scripts."""


I've cleaned up #11937 and documented an alternative way to cache and reuse images.
Could you please have a look and let me know if changes are needed to integrate it better into your code here?
(E.g., maybe no need to set ENV in Dockerfile or use OSS_FUZZ_SAVE_CONTAINERS_NAME?)
Format/Linter/Structure/Coding style suggestions are welcome, too.

Thanks!

experiment/builder_runner.py

DonggeLiu · 2024-08-06T04:01:07Z

experiment/builder_runner.py

+        from_line = line_idx
+      if line.startswith('COPY'):
+        copy_fuzzer_line = copy_build_line
+        copy_build_line = line_idx


I am not 100% sure if this is reliable, because some Dockerfiles copy other things (e.g., dictionaries, fuzzer options, seeds, or others) which may break this.

Maybe keep all COPY lines?
Alternatively we keep each file copied if it can be a fuzz target or is the build.sh, but I am a bit worried that missing *.options/*.dict/seeds may cause file not found error during fuzzing.

With that said, if this is only used for the manual examples and none of them has this issue, then it is OK for now.

Just in case I forget, if we decide to change this loop, could we merge it with the loop below?
E.g., (modify and) add lines to new_content directly in each condition.

We don't need COPY for this, as the dictionaries will already existing due to the containers being a snapshot post the first running of the original Dockerfile, i.e. the seeds/dicts will already exist in the Dockerfile.

However, the new build.sh will not copy those to $OUT, which is also okay, since we're change the fuzzer and should not use the .options/corpus etc provided by the original harness (rather we should use our own)

Thanks for the explanation about no need to COPY dictionary/options.
My primary concern is the first sentence: I meant to say 'The code above does not reliably store the lines that COPYs fuzzer and build script'.

IIUC, the code assumes copy_build_line is always the last line starting with COPY, and the copy_fuzzer_line is always the second last. The links I added are some counter-examples to show this assumption is unreliable. For example, in the first case, the second last COPY is COPY sksl.dict $SRC/skia_data/sksl.dict.

If we use this on more projects later, then projects will be tripped up by this.

Ah no, what these two lines are not commenting out is:

the COPY added by OFG which copies in the new auto-generated harness

the COPY added for the adjusted build script COPY adjusted_build.sh $SRC/build.sh

It's always guaranteed to be those two

experiment/builder_runner.py

run_all_experiments.py

DonggeLiu · 2024-08-06T05:46:33Z

fuzzer_build_script/bluez

In a later PR, would it be better to store these file content in their benchmark YAML file?

Or better, can we store these upstream in oss-fuzz instead with some kind of convention? Are these reusable enough such that the upstream build.sh can call these so there's a single implementation of build script everywhere?

Storing them here will inevitably cause things to diverge over time.

Oh nice, OSS-Fuzz is definitely a better place.
If the upstream project maintainers do not mind, maybe we can separate this part from build.sh into a new file:

# Original commands to build the project ... # Call this script to generate fuzz target binaries. /src/build_fuzzer

I probably prefer keeping as is rather than storing in benchmark or oss-fuzz.

I don't think it will be a big burden to maintain these, and if the auto-gen works well then we can most likely get rid of many manual build scripts.

I'm not too much for putting it in OSS-Fuzz as I don't think it will necessarily be intuitive why it's there, and I would probably think it's best not to increase complexity of the fuzzing build setups in OSS-Fuzz. it's kind of mixing things a bit IMO.

I lean towards storing them here for now -- and if after some time it's clear storing them in OSS-Fuzz provides much benefit then we can move them there.

For OFG's need, I am not very confident that chronos will work for all projects immediately, so we might need manual scripts for a while.
A benefit of storing them in OSS-Fuzz is that Cloud Build instances can directly use them from OSS-Fuzz without needing to download them from OFG.

@oliverchang Is there any other way that OSS-Fuzz can benefit from these files?

Either case should work, but I still think it would be, at least for now, good to not land them in OSS-Fuzz projects' folders until we have a stable set up.

I'll look to land this as it's useful as is and then we can adjust the OSS-Fuzz set up in further steps.

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski · 2024-08-07T21:21:39Z

/gcbrun exp -n dk-5440 -m vertex_ai_gemini-1-5 -b comparison

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski · 2024-08-08T21:26:11Z

/gcbrun exp -n dk-5442 -m vertex_ai_gemini-1-5 -b comparison

DavidKorczynski requested review from DonggeLiu and oliverchang July 27, 2024 21:17

DavidKorczynski added 2 commits July 27, 2024 14:29

add build caching option

4a4cde4

Signed-off-by: David Korczynski <[email protected]>

nit

106d136

Signed-off-by: David Korczynski <[email protected]>

DonggeLiu reviewed Jul 30, 2024

View reviewed changes

DavidKorczynski added 4 commits August 2, 2024 22:43

Merge branch 'main' into add-caching-option

ede8b0d

address review and enable coverage sanitizer support

fec8b7b

Signed-off-by: David Korczynski <[email protected]>

fix style

72fe5e2

Signed-off-by: David Korczynski <[email protected]>

add quickjs script

998695d

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski marked this pull request as ready for review August 5, 2024 13:19

DavidKorczynski added 4 commits August 5, 2024 06:30

add more build scripts

26250f8

Signed-off-by: David Korczynski <[email protected]>

docs

e4c82ac

Signed-off-by: David Korczynski <[email protected]>

style

787dc3e

Signed-off-by: David Korczynski <[email protected]>

fix style

f679113

Signed-off-by: David Korczynski <[email protected]>

nit

8ef2270

Signed-off-by: David Korczynski <[email protected]>

nit

38d3a3b

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski requested a review from DonggeLiu August 5, 2024 14:30

DonggeLiu approved these changes Aug 6, 2024

View reviewed changes

DavidKorczynski added 3 commits August 7, 2024 21:39

fix minor nits

241a481

Signed-off-by: David Korczynski <[email protected]>

restructure according to review

2d5c794

Signed-off-by: David Korczynski <[email protected]>

fix style

f099447

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski added 2 commits August 8, 2024 14:05

Merge branch 'main' into add-caching-option

00b4892

make functions private

a5b03ce

Signed-off-by: David Korczynski <[email protected]>

DavidKorczynski merged commit f9a8df9 into main Aug 10, 2024
6 checks passed

DavidKorczynski deleted the add-caching-option branch August 10, 2024 17:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add build caching option #514

add build caching option #514

DavidKorczynski commented Jul 27, 2024 •

edited

Loading

DavidKorczynski commented Jul 27, 2024

DonggeLiu left a comment

DonggeLiu Jul 30, 2024

DavidKorczynski Aug 5, 2024

DonggeLiu Jul 30, 2024

DavidKorczynski Aug 2, 2024

oliverchang Aug 6, 2024

DonggeLiu Jul 30, 2024

DavidKorczynski Aug 5, 2024

DavidKorczynski commented Jul 30, 2024

DavidKorczynski commented Aug 5, 2024

DavidKorczynski commented Aug 5, 2024

DonggeLiu commented Aug 6, 2024

DonggeLiu left a comment

DonggeLiu Aug 6, 2024

DonggeLiu Aug 6, 2024

DonggeLiu Aug 6, 2024

DavidKorczynski Aug 7, 2024 •

edited

Loading

DonggeLiu Aug 7, 2024

DavidKorczynski Aug 7, 2024

DonggeLiu Aug 6, 2024 •

edited

Loading

oliverchang Aug 6, 2024

DonggeLiu Aug 6, 2024

DavidKorczynski Aug 7, 2024 •

edited

Loading

DonggeLiu Aug 9, 2024

DavidKorczynski Aug 9, 2024

DavidKorczynski commented Aug 7, 2024

DavidKorczynski commented Aug 8, 2024

add build caching option #514

add build caching option #514

Conversation

DavidKorczynski commented Jul 27, 2024 • edited Loading

DavidKorczynski commented Jul 27, 2024

DonggeLiu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DavidKorczynski commented Jul 30, 2024

DavidKorczynski commented Aug 5, 2024

DavidKorczynski commented Aug 5, 2024

DonggeLiu commented Aug 6, 2024

DonggeLiu left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DavidKorczynski Aug 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DonggeLiu Aug 6, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DavidKorczynski Aug 7, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

DavidKorczynski commented Aug 7, 2024

DavidKorczynski commented Aug 8, 2024

DavidKorczynski commented Jul 27, 2024 •

edited

Loading

DavidKorczynski Aug 7, 2024 •

edited

Loading

DonggeLiu Aug 6, 2024 •

edited

Loading

DavidKorczynski Aug 7, 2024 •

edited

Loading