-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 725 #731
base: main
Are you sure you want to change the base?
Issue 725 #731
Conversation
GawyWOOOHOOO
commented
Nov 16, 2024
- Adding a flag to track whether any process was parallelized
- Track execution time for each process
- Modify Schedule.run() to generate feedback messages based on the flag and execution time status.
2. Track execution time for each process 3. Modify Schedule.run() to generate feedback messages based on the flag and execution time status.
OS = |
OS:ubuntu-20.04 |
compiler/pash.py
Outdated
@@ -35,6 +35,9 @@ def main(): | |||
return_code = preprocess_and_execute_asts(input_script_path, args, input_script_arguments, shell_name) | |||
|
|||
log("-" * 40) #log end marker | |||
|
|||
if args.debug >= 1: | |||
log("Use the '-d 1' option for detailed debugging information.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not exactly sure what this is trying to do here.
compiler/pash_compilation_server.py
Outdated
@@ -295,6 +295,11 @@ def compile_and_add(self, compiled_script_file, var_file, input_ir_file): | |||
pass | |||
else: | |||
self.running_procs += 1 | |||
|
|||
if ast_or_ir is not None: | |||
compile_success = True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this variable set here? Isn't is set previously?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have left some comments with questions and here are some more needed changes to get this merged:
- Rebase against
binpash:future
which is the branch of PaSh where we make all new changes. - Add a short example usage of this (for example with an
echo hi
script and with acat README.md | grep "foo"
script indocs/tutorial/tutorial.md
- Add a test to check that this behavior happens (for
echo hi
andcat README.md | grep "foo"
). We need to create a new test category in this script (https://github.com/binpash/pash/blob/future/scripts/run_tests.sh) that we can call api_tests. And the test should check that if pash is invoked on these two scripts, its standard error contains these two messages.
compiler/pash_compilation_server.py
Outdated
|
||
if ast_or_ir is not None: | ||
compile_success = True | ||
if run_parallel: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not indicate whether a fragment of the script was parallelized successfully. compile_success
checks whether a script region was compiled successfully (which means that it was successfully translated into a dataflow graph, which means that we have annotations for all commands in it and the annotations for all commands are pure, parallelizable pure, or stateless). However, we need to also check if there was any parallelization transformation applied (which has to be kept as state and checked further in the compiler (see this function: https://github.com/binpash/pash/blob/future/compiler/pash_compiler.py#L227).
compiler/pash_compilation_server.py
Outdated
@@ -336,9 +341,19 @@ def handle_exit(self, input_cmd): | |||
## Get the execution time | |||
command_finish_exec_time = datetime.now() | |||
command_start_exec_time = self.process_id_input_ir_map[process_id].get_start_exec_time() | |||
exec_time = (command_finish_exec_time - command_start_exec_time) / timedelta(milliseconds=1) | |||
exec_time = (command_finish_exec_time - command_start_exec_time).total_seconds() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this changed?
compiler/pash_compilation_server.py
Outdated
log("Process:", process_id, "exited. Exec time was:", exec_time) | ||
self.handle_time_measurement(process_id, exec_time) | ||
|
||
proc_info = self.process_id_input_ir_map[process_id] | ||
if proc_info.compiler_config.width > 1: # Check if it was parallelized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't check if parallelization was successful, but just whether the compiler would even try to parallelize.
@@ -92,6 +92,8 @@ def compile_ir(ir_filename, compiled_script_file, args, compiler_config): | |||
ret = None | |||
try: | |||
ret = compile_optimize_output_script(ir_filename, compiled_script_file, args, compiler_config) | |||
if ret is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this means that there were no parallelization opportunities for the whole script, but only for this region. Also, I think it might be subsumed by the exception handling. Is this code ever called?
compiler/pash_compilation_server.py
Outdated
@@ -414,6 +429,16 @@ def run(self): | |||
self.parse_and_run_cmd(input_cmd) | |||
|
|||
self.connection_manager.close() | |||
if not self.parallelized_flag: | |||
log("No parts of the input script were parallelized. Ensure commands are annotated for parallelization.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages need to be logged no matter the debug level, so they should be given a different level (the way it is now they will only be printed if we have -d 1. Also it would be good to prefix them with [PaSh Warning]. Here is some wordsmithing to make them a bit clearer too:
[PaSh Warning] No region of the script was parallelized. Maybe you are missing relevant annotations? Use
-d 1for more info
.[PaSh Warning] Some script regions were parallelized but their execution times were negligible (<1s). If your script takes a long time maybe annotations are missing from relevant regions. Use
-d 1for more info.
if not self.parallelized_flag: | ||
log("No parts of the input script were parallelized. Ensure commands are annotated for parallelization.") | ||
elif all( | ||
proc_info.exec_time is not None and proc_info.exec_time < 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we checked that this ever actually passes? I am skeptical about the all proc_info.exec_time is not None. Is there anyway one of those is None and this becomes false?
compiler/pash_compilation_server.py
Outdated
for proc_info in self.process_id_input_ir_map.values() | ||
): | ||
log("Some script fragments were parallelized, but their execution times were negligible.") | ||
log("Consider optimizing your script to include longer-running tasks.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this message, this is not really an optimization that we are asking them to do, but rather making sure that they have annotations for the long-running parts that they care about.
compiler/pash_compilation_server.py
Outdated
log("Some script fragments were parallelized, but their execution times were negligible.") | ||
log("Consider optimizing your script to include longer-running tasks.") | ||
else: | ||
log("Parallelization completed successfully.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary, should be deleted.
OS = |
OS:ubuntu-20.04 |
OS:ubuntu-20.04 |
OS = |
OS:ubuntu-20.04 |
OS = |
This needs to be rebased for the future branch BTW :) |