
WIP: Service rewrite #99

Draft · wants to merge 123 commits into base: main
Conversation

@1ntEgr8 (Contributor) commented Nov 4, 2024

Depends on #97, hence the larger-than-expected diff.

1ntEgr8 and others added 30 commits September 23, 2024 08:43
Dhruv Garg and others added 19 commits December 4, 2024 01:21
Fixes an issue where run_service_experiments.py hangs without reporting the
exception if start-master.sh or start-worker.sh exits with a nonzero code,
or more generally if an exception is raised in Service.__enter__().
When the last application is deregistered from the Spark service,
execute all remaining events from the simulator. This allows the
final LOG_STATS event to be processed so we can calculate SLO
attainment.

Unlike normal runs of the simulator, a SIMULATOR_END event is not
inserted, since some tasks might not have finished in the simulator and
it is unclear when they will. The simulator is patched to allow an
empty event queue in Simulator.simulate().
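The drain step described above can be sketched like this. Everything here is an assumption for illustration (a plain heap of `(time, name)` tuples standing in for the simulator's event queue); the event names only echo the message.

```python
import heapq

def drain_events(event_queue, handle):
    """On final deregistration, pop and handle every remaining event
    (e.g. the final LOG_STATS) instead of waiting for a SIMULATOR_END
    event that is never inserted. The loop simply runs the heap empty,
    which is the behavior the patched Simulator.simulate() must allow."""
    processed = []
    while event_queue:
        time, name = heapq.heappop(event_queue)  # earliest event first
        handle(time, name)
        processed.append(name)
    return processed
```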
On a TASK_FINISH event, set the task's completion time to the time of
the event rather than the last time the task was stepped. Resolves a
bug in the service where tasks that finish later than the simulator's
profiled runtime predicts get assigned the wrong completion time.
We found that task-graph deadlines weren't consistent between the
simulator and Spark even with the same RNG seed, because EventTime
keeps a single global RNG that it uses for all of its fuzzing, and
both deadlines and runtime variances are fuzzed. In simulator runs,
all task deadlines are calculated up front and runtime variances
later; in Spark, deadlines and runtime variances are calculated
interleaved throughout the experiment lifecycle. The two orderings
draw from the shared RNG in a different sequence, so the same
experiment yields different deadline variances in simulator and Spark
runs.

Our solution is to pass the fuzzer a separate RNG used only for
calculating deadline variances. This RNG is hardcoded with a seed of
42; that is fine for experiments, but it should probably be switched
to the random_seed command-line flag.
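The fix can be sketched as below. The function name and parameters are hypothetical; only the idea of a dedicated, fixed-seed RNG comes from the message.

```python
import random

# Dedicated RNG used only for deadline fuzzing; seeded with the
# hardcoded 42 the message describes (ideally the random_seed flag).
DEADLINE_RNG = random.Random(42)

def fuzz_deadline(base_deadline, max_slack, rng=DEADLINE_RNG):
    """Fuzz a deadline using its own RNG. Because runtime-variance
    fuzzing draws from a different generator, interleaving the two
    kinds of draws in any order (up front in the simulator, throughout
    the run in Spark) cannot perturb the deadline sequence."""
    return base_deadline + rng.randint(0, max_slack)
```

With a shared global RNG, every runtime-variance draw advances the generator state that the next deadline draw depends on; isolating the deadline stream makes it a pure function of its own seed and call count.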