
Use a bash init script for starting instances on batch. #4786


Open. Wants to merge 1 commit into base branch oss-fuzz.
47 changes: 47 additions & 0 deletions src/clusterfuzz/_internal/google_cloud_utils/batch-init-linux.bash
@@ -0,0 +1,47 @@
#!/usr/bin/env bash
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
set -euxo pipefail

# Performance tweaks.
SWAP="/var/swap"
fallocate -l 1G $SWAP
chmod 600 $SWAP
mkswap $SWAP

swapon $SWAP
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo core > /proc/sys/kernel/core_pattern # For AFL.
sysctl -w vm.disk_based_swap=1
sysctl -w vm.swappiness=10
# Disable hung task checking. Otherwise we may incorrectly panic when we use
# up CPU/disk from fuzzing or downloading large builds.
sysctl -w kernel.hung_task_timeout_secs=0

# More config.
useradd --system --home-dir /home/root --uid 1337 clusterfuzz
mkdir -p /home/root /var/scratch0
chown clusterfuzz:clusterfuzz /var/scratch0 /home/root
docker-credential-gcr configure-docker

# The bare -e flags below forward values from this script's environment, which
# Cloud Batch populates from the job/task environment, into the container.
# IMAGE is expected to be provided the same way; it is not set in this diff.
docker run --rm --net=host --privileged --cap-add=all \
  --name=clusterfuzz \
  --memory-swappiness=40 --shm-size=1.9g \
  -v /var/scratch0:/mnt/scratch0 \
  -P -e HOST_UID=1337 \
  -e CLUSTERFUZZ_RELEASE -e UNTRUSTED_WORKER=False -e UWORKER=True \
  -e UWORKER_INPUT_DOWNLOAD_URL \
  "$IMAGE"
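
To see how the pieces fit together, here is a minimal, self-contained sketch of the pattern the batch.py changes below implement: the script above becomes the text of a Cloud Batch script runnable, and the variables it forwards are supplied per task. The sketch uses the public google-cloud-batch client directly; the project, region, machine type, and the choice to also pass IMAGE through the task environment are assumptions made for illustration, not part of this PR.

from typing import List

from google.cloud import batch_v1


def submit_init_script_job(project: str, region: str, job_id: str,
                           script_text: str, image: str, release: str,
                           input_urls: List[str]) -> batch_v1.Job:
  """Submits a Batch job that runs the init script once per task."""
  # The init script is embedded as a script runnable (not a container runnable).
  runnable = batch_v1.Runnable()
  runnable.script = batch_v1.Runnable.Script(text=script_text)

  task_spec = batch_v1.TaskSpec(runnables=[runnable])

  task_group = batch_v1.TaskGroup(
      task_spec=task_spec,
      task_count=len(input_urls),
      task_count_per_node=1,  # One ClusterFuzz task (container) per VM.
      # Per-task variables land in the script's environment, where the bare
      # -e docker flags above pick them up.
      task_environments=[
          batch_v1.Environment(
              variables={
                  'IMAGE': image,  # Assumption for the sketch only.
                  'CLUSTERFUZZ_RELEASE': release,
                  'UWORKER_INPUT_DOWNLOAD_URL': url,
              }) for url in input_urls
      ])

  # Placeholder machine shape for the sketch.
  instance_policy = batch_v1.AllocationPolicy.InstancePolicy(
      machine_type='e2-standard-4')
  allocation_policy = batch_v1.AllocationPolicy(instances=[
      batch_v1.AllocationPolicy.InstancePolicyOrTemplate(policy=instance_policy)
  ])

  job = batch_v1.Job(
      task_groups=[task_group],
      allocation_policy=allocation_policy,
      logs_policy=batch_v1.LogsPolicy(
          destination=batch_v1.LogsPolicy.Destination.CLOUD_LOGGING))

  client = batch_v1.BatchServiceClient()
  return client.create_job(
      request=batch_v1.CreateJobRequest(
          parent=f'projects/{project}/locations/{region}',
          job_id=job_id,
          job=job))
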
31 changes: 19 additions & 12 deletions src/clusterfuzz/_internal/google_cloud_utils/batch.py
@@ -13,7 +13,9 @@
# limitations under the License.
"""Cloud Batch helpers."""
import collections
import functools
import os
import threading
from typing import Dict
from typing import List
from typing import Tuple
@@ -37,6 +39,8 @@

DEFAULT_RETRY_COUNT = 0

LINUX_INIT_SCRIPT_FILENAME = 'batch-init-linux.bash'

# Controls how many containers (ClusterFuzz tasks) can run on a single VM.
# THIS SHOULD BE 1 OR THERE WILL BE SECURITY PROBLEMS.
TASK_COUNT_PER_NODE = 1
@@ -122,19 +126,19 @@ def create_uworker_main_batch_jobs(batch_tasks: List[BatchTask]):
  return jobs


@functools.cache
def get_linux_init_script() -> str:
vitorguidi (Collaborator) commented on May 7, 2025:

This is an infrastructure definition; IMO it should live under the clusterfuzz-config repo, more specifically under configs/batch.

If this lives here, whenever we want to change the script we will have to open a public PR in a code repo to change an infrastructure configuration, which is precisely one of the problems @javanlacerda is trying to solve with his argocd work by moving Kubernetes definitions away from here.

Wdyt?

Collaborator:

+1, it would be good to move it there.

@jonathanmetzman is it easy to do this now? Otherwise this could be a TODO for now.

Author (Collaborator):

I think it isn't hard to move this. Will do.

  init_script_path = os.path.join(
      os.path.dirname(__file__), LINUX_INIT_SCRIPT_FILENAME)
  with open(init_script_path, 'r') as f:
    return f.read()


def _get_task_spec(batch_workload_spec):
  """Gets the task spec based on the batch workload spec."""
  runnable = batch.Runnable()
  runnable.container = batch.Runnable.Container()
  runnable.container.image_uri = batch_workload_spec.docker_image
  clusterfuzz_release = batch_workload_spec.clusterfuzz_release
  runnable.container.options = (
      '--memory-swappiness=40 --shm-size=1.9g --rm --net=host '
      '-e HOST_UID=1337 -P --privileged --cap-add=all '
      f'-e CLUSTERFUZZ_RELEASE={clusterfuzz_release} '
      '--name=clusterfuzz -e UNTRUSTED_WORKER=False -e UWORKER=True '
      '-e UWORKER_INPUT_DOWNLOAD_URL')
  runnable.container.volumes = ['/var/scratch0:/mnt/scratch0']
  runnable.script = batch.Runnable.Script()
  runnable.script = batch.Runnable.Script(text=get_linux_init_script())
  task_spec = batch.TaskSpec()
  task_spec.runnables = [runnable]
  if batch_workload_spec.retry:
@@ -195,8 +199,11 @@ def _create_job(spec, input_urls):
  task_group.task_count = len(input_urls)
  assert task_group.task_count < MAX_CONCURRENT_VMS_PER_JOB
  task_environments = [
      batch.Environment(variables={'UWORKER_INPUT_DOWNLOAD_URL': input_url})
      for input_url in input_urls
      batch.Environment(variables={
          'CLUSTERFUZZ_RELEASE': spec.clusterfuzz_release,
          'UWORKER_INPUT_DOWNLOAD_URL': input_url,
      })
      for input_url in input_urls
  ]
  task_group.task_environments = task_environments
  task_group.task_spec = _get_task_spec(spec)
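
Regarding the review discussion above about hosting the script in clusterfuzz-config instead of this repo, a rough sketch of what the loader could look like after such a move is below. The CLUSTERFUZZ_CONFIG_DIR variable and the configs/batch/ path are hypothetical names used only for illustration; they are not existing ClusterFuzz APIs.

import functools
import os

LINUX_INIT_SCRIPT_FILENAME = 'batch-init-linux.bash'


@functools.cache
def get_linux_init_script() -> str:
  """Reads the Linux init script from a deployed config checkout."""
  # Hypothetical: CLUSTERFUZZ_CONFIG_DIR points at a clusterfuzz-config
  # checkout, with the script living under configs/batch/.
  config_dir = os.environ['CLUSTERFUZZ_CONFIG_DIR']
  init_script_path = os.path.join(config_dir, 'configs', 'batch',
                                  LINUX_INIT_SCRIPT_FILENAME)
  with open(init_script_path) as f:
    return f.read()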