add build caching option #514
Conversation
Can confirm from local runs that this saves time. I assume a lot of time will be saved by (1) using OSS-Fuzz bots to generate the containers; (2) extending this to coverage as well.
Signed-off-by: David Korczynski <[email protected]>
Thanks for looking into this, @DavidKorczynski.
Do you plan to support more projects before merging?
experiment/builder_runner.py
Outdated
```
@@ -529,12 +529,108 @@ def run_target_local(self, generated_project: str, benchmark_target_name: str,
    else:
      logger.info(f'Successfully run {generated_project}.')

  def is_image_cached(self, project_name):
    cached_image_name = f'cached_image_{project_name}'
```
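For context, a minimal sketch of how such a cache check could work, assuming the `docker` CLI is available (the exact helper in this PR may differ):

```python
import subprocess

def is_image_cached(project_name: str) -> bool:
  """Returns True if a cached build image for |project_name| exists locally.

  Sketch only: `docker images -q NAME` prints the image ID when the image
  exists locally and nothing otherwise.
  """
  cached_image_name = f'cached_image_{project_name}'
  result = subprocess.run(['docker', 'images', '-q', cached_image_name],
                          capture_output=True, text=True, check=False)
  return bool(result.stdout.strip())
```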
nit: Normally the original image name from OSS-Fuzz is gcr.io/oss-fuzz/project-name, hence suffixes might be more suitable for this case (e.g., gcr.io/oss-fuzz/bluez-ofg-cache).
Done, I've now renamed the caches so the images are called e.g.:
gcr.io/oss-fuzz/libraw_coverage_cache
Note that there is a difference between container and image names, and container names have some additional naming limitations. However, the containers are still temporary and it's the image names we care about.
```
$CC $CFLAGS -c -std=c99 -I. -I./c/include c/fuzz/decode_fuzzer.c

$CXX $CXXFLAGS ./decode_fuzzer.o -o $OUT/decode_fuzzer \
    $LIB_FUZZING_ENGINE ./libbrotlidec.a ./libbrotlicommon.a
```
Glad to see this works, but an automatic solution is preferred given there are ~300 projects in total, some of which build the fuzz target with make.
It's actually fine that make is used, because in most, if not all, cases (I'm not sure if time sometimes matters) make will only recompile the targets that have changed, i.e. the harness.
+1, let's keep this as more of an "override" option that's used if the automatic solution fails.
run_all_experiments.py
Outdated
```
}

# Create cached image by building using OSS-Fuzz with set variable
command = ['python3', 'infra/helper.py', 'build_fuzzers', project]
```
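To illustrate the flow under discussion, a hedged sketch of how the cache image could be produced; the container/image names here are hypothetical, and the use of OSS_FUZZ_SAVE_CONTAINERS_NAME is an assumption based on the linked OSS-Fuzz PRs, not necessarily this PR's exact code:

```python
import os
import subprocess

def build_cache_image(project: str) -> None:
  """Builds |project| once with OSS-Fuzz and snapshots the result as a
  reusable cache image. Sketch only."""
  container_name = f'{project}_cache_container'  # hypothetical name
  env = os.environ.copy()
  # Ask OSS-Fuzz to keep the build container around under a known name
  # (upstream support: google/oss-fuzz#11937 / google/oss-fuzz#12284).
  env['OSS_FUZZ_SAVE_CONTAINERS_NAME'] = container_name

  # Create the cached image by building with OSS-Fuzz as usual.
  subprocess.run(['python3', 'infra/helper.py', 'build_fuzzers', project],
                 env=env, check=True)

  # Commit the post-build container as an image for later builds to reuse.
  cached_image = f'gcr.io/oss-fuzz/{project}-ofg-cache'  # hypothetical tag
  subprocess.run(['docker', 'commit', container_name, cached_image],
                 check=True)
```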
I reckon we can build the cache images of all projects every day (if we have their fuzzer_build_script in your solution, or address the TODOs in mine).
OSS-Fuzz is already building all original projects anyway.
We run multiple experiments every day, so there's no need to rebuild the cache images in every experiment.
I'm fine with that. I say we land it now as is, and then do the migration to OSS-Fuzz build images / images built in the cloud in the next steps. It's working well locally and I think that's a nice stage.

Yep. I'm also planning on supporting caching of code coverage before merging. I was asking for review from an architecture perspective. In short, I agree with #514 (comment) and #514 (comment) but thought this would be a good start.
Signed-off-by: David Korczynski <[email protected]>
@DonggeLiu this has now been achieved. I think the PR is now in a good state to land, in that it works well on local runs. Next steps will be:
I would prefer to do both of these steps in follow-ups.
Signed-off-by: David Korczynski <[email protected]>
/gcbrun exp -n dk-5439 -m vertex_ai_gemini-1-5 -b comparison
Signed-off-by: David Korczynski <[email protected]>
Cached image generation is almost ready on the OSS-Fuzz side: google/oss-fuzz#11937. Once merged, we can integrate it into the automatic daily build workflow.
Thanks @DavidKorczynski!
experiment/builder_runner.py
Outdated
```python
  def _rewrite_project_to_cached_project(self, generated_project,
                                         sanitizer: str) -> None:
    """Rewrites Dockerfile of a project to enable cached build scripts."""
```
I've cleaned up #11937 and documented an alternative way to cache and reuse images.
Could you please have a look and let me know if changes are needed to integrate it better into your code here?
(E.g., maybe no need to set ENV in Dockerfile or use OSS_FUZZ_SAVE_CONTAINERS_NAME?)
Format/Linter/Structure/Coding style suggestions are welcome, too.
Thanks!
experiment/builder_runner.py
Outdated
```python
          from_line = line_idx
        if line.startswith('COPY'):
          copy_fuzzer_line = copy_build_line
          copy_build_line = line_idx
```
I am not 100% sure if this is reliable, because some Dockerfiles copy other things (e.g., dictionaries, fuzzer options, seeds, or others), which may break this. Maybe keep all COPY lines?
Alternatively, we keep each copied file if it could be a fuzz target or is the build.sh, but I am a bit worried that missing *.options/*.dict/seeds may cause file-not-found errors during fuzzing.
With that said, if this is only used for the manual examples and none of them has this issue, then it is OK for now.
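A sketch of the more conservative alternative suggested above (illustrative only, and ignoring multi-line instructions for brevity): keep every COPY line when deriving the cached Dockerfile so dictionaries, options, and seeds still land in the image, and comment out the build instructions that the snapshot has already executed:

```python
def keep_all_copy_lines(dockerfile_lines: list[str]) -> list[str]:
  """Keeps FROM/COPY lines and comments out everything else.

  Sketch of the 'keep all COPY lines' suggestion, so auxiliary files
  (*.dict, *.options, seed corpora) are still copied into the image.
  """
  new_content = []
  for line in dockerfile_lines:
    if line.startswith(('FROM', 'COPY')):
      new_content.append(line)
    else:
      # Already executed in the cached snapshot; no need to re-run.
      new_content.append(f'# {line}')
  return new_content
```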
Just in case I forget: if we decide to change this loop, could we merge it with the loop below? E.g., (modify and) add lines to new_content directly in each condition.
We don't need COPY for this, as the dictionaries will already exist due to the containers being a snapshot taken after the first run of the original Dockerfile, i.e. the seeds/dicts will already exist in the image.
However, the new build.sh will not copy those to $OUT, which is also okay, since we're changing the fuzzer and should not use the .options/corpus etc. provided by the original harness (rather, we should use our own).
Thanks for the explanation about there being no need to COPY dictionaries/options.
My primary concern is the first sentence: I meant to say 'The code above does not reliably store the lines that COPY the fuzzer and build script'.
IIUC, the code assumes copy_build_line is always the last line starting with COPY, and copy_fuzzer_line is always the second-to-last. The links I added are some counter-examples to show this assumption is unreliable. For example, in the first case, the second-to-last COPY is COPY sksl.dict $SRC/skia_data/sksl.dict.
If we use this on more projects later, then projects will be tripped up by this.
Ah no, what these two lines are not commenting out is:

- the COPY added by OFG, which copies in the new auto-generated harness
- the COPY added for the adjusted build script: COPY adjusted_build.sh $SRC/build.sh

It's always guaranteed to be those two.
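In other words, the invariant being described could be sketched like this (variable names borrowed from the diff above; illustrative only):

```python
def find_last_two_copy_lines(dockerfile_lines: list[str]):
  """Returns the indices of the second-to-last and last COPY lines.

  Sketch of the invariant: OFG appends exactly two COPY lines (the
  generated harness and `COPY adjusted_build.sh $SRC/build.sh`), so the
  last two COPYs are always the ones to keep.
  """
  copy_fuzzer_line = None
  copy_build_line = None
  for line_idx, line in enumerate(dockerfile_lines):
    if line.startswith('COPY'):
      copy_fuzzer_line = copy_build_line  # second-to-last COPY so far
      copy_build_line = line_idx          # last COPY so far
  return copy_fuzzer_line, copy_build_line
```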
fuzzer_build_script/bluez
Outdated
In a later PR, would it be better to store these files' contents in their benchmark YAML files?
Or better, can we store these upstream in OSS-Fuzz instead, with some kind of convention? Are these reusable enough that the upstream build.sh can call them, so there's a single implementation of the build script everywhere?
Storing them here will inevitably cause things to diverge over time.
Oh nice, OSS-Fuzz is definitely a better place.
If the upstream project maintainers do not mind, maybe we can separate this part from build.sh into a new file:

```
# Original commands to build the project
...
# Call this script to generate fuzz target binaries.
/src/build_fuzzer
```
I probably prefer keeping these as is, rather than storing them in the benchmark or OSS-Fuzz.
I don't think it will be a big burden to maintain these, and if the auto-gen works well then we can most likely get rid of many manual build scripts.
I'm not too keen on putting them in OSS-Fuzz, as I don't think it will necessarily be intuitive why they're there, and I would probably think it's best not to increase the complexity of the fuzzing build setups in OSS-Fuzz. It's kind of mixing things a bit, IMO.
I lean towards storing them here for now -- and if after some time it's clear that storing them in OSS-Fuzz provides much benefit, then we can move them there.
For OFG's needs, I am not very confident that chronos will work for all projects immediately, so we might need manual scripts for a while.
A benefit of storing them in OSS-Fuzz is that Cloud Build instances can use them directly from OSS-Fuzz without needing to download them from OFG.
@oliverchang Is there any other way that OSS-Fuzz can benefit from these files?
Either case should work, but I still think it would be good, at least for now, not to land them in OSS-Fuzz projects' folders until we have a stable setup.
I'll look to land this as it's useful as is, and then we can adjust the OSS-Fuzz setup in further steps.
Signed-off-by: David Korczynski <[email protected]>
/gcbrun exp -n dk-5440 -m vertex_ai_gemini-1-5 -b comparison
Signed-off-by: David Korczynski <[email protected]>
/gcbrun exp -n dk-5442 -m vertex_ai_gemini-1-5 -b comparison
First touch on #499

Depends on: google/oss-fuzz#12284

The way this works is by saving a cached version of build_fuzzers after compile has run, and then modifying the Dockerfiles of a project to use this cached build image + an adjusted build script. For example, for brotli a Dockerfile is created which relies on the cached version of the original image. adjusted_build.sh is then the script that only builds the fuzzers. This means we can also use the build_fuzzers/compile workflows as we know them.

More specifically, this PR:

- runs build_fuzzers, saving the Docker container, and then commits the container to an image. This image will have a project's build set up as it is after compile has run. This is then used when building fuzzers by OFG.
- adds manual build scripts that rebuild only the fuzz target (an auto-generated alternative is to save a snapshot of compile just before compiling the fuzz target so that we can reuse it later, oss-fuzz#11937), but I think it's nice to have both the option of us creating the scripts ourselves + an auto-generated DB.