Scalability test wall clock (#239)

* add gpu utilization decorator and begin work on plots
* add decorator for gpu energy utilization
* Added config option to hpo script, styling (#235)
* Update README.md
* Update README.md
* Update createEnvVega.sh
* remove unused dist file
* run black and isort to fix linting errors
* temporary changes
* remove redundant variable
* add absolute time plot
* remove trailing whitespace
* remove redundant variable
* remove trailing whitespace
* begin implementation of backup
* fix issues from PR
* fix issues from PR
* add backup to gpu monitoring
* fix import in eurac trainer
* cleanup backup mechanism slightly
* fix linting errors
* update logging directory and pattern
* update default pattern for gpu energy plots
* fix isort linting
* add support for none pattern and general cleanup
* fix linting errors with black and isort
* fix import in eurac trainer
* fix linting errors
* update logging directory and pattern
* update default pattern for gpu energy plots
* fix isort linting
* add support for none pattern and general cleanup
* fix linting errors with black and isort
* begin implementation of backup
* add backup to gpu monitoring
* add backup functionality to communication plot
* rewrite epochtimetracker and refactor scalability plot code
* cleanup scalability plot code
* updating some epochtimetracker dependencies
* add configurable and dynamic wait and warmup times for the profiler
* temporary changes
* add absolute time plot
* begin implementation of backup
* add backup to gpu monitoring
* cleanup backup mechanism slightly
* fix isort linting
* add support for none pattern and general cleanup
* fix linting errors with black and isort
* begin implementation of backup
* add backup functionality to communication plot
* rewrite epochtimetracker and refactor scalability plot code
* cleanup scalability plot code
* updating some epochtimetracker dependencies
* fix linting errors
* fix more linting errors
* add utilization percentage plot
* run isort for linting
* update default save path for metrics
* add decorators to virgo and some cleanup
* add contributions and cleanup
* fix linting errors
* change 'credits' to 'credit'
* update communication plot style
* update function names
* update scalability function for a more streamlined approach
* run isort
* move horovod import
* fix linting errors
* add contributors

---------

Co-authored-by: Anna Lappe <[email protected]>
Co-authored-by: Matteo Bunino <[email protected]>
interTwin-eu · Nov 20, 2024 · b2ceb4f
1 parent 1a34203
Showing 1 changed file with 11 additions and 12 deletions.
diff --git a/src/itwinai/torch/profiling/communication_plot.py b/src/itwinai/torch/profiling/communication_plot.py
@@ -142,39 +142,38 @@ def communication_overhead_stacked_bar_plot(
     return fig, ax
 
 
->>>>>> > d538510(Gpu monitoring(  # 237))
 def get_comp_fraction_full_array(
-    df: pd.DataFrame, print_table: bool=False
+    df: pd.DataFrame, print_table: bool = False
 ) -> np.ndarray:
     """Creates a MxN NumPy array where M is the number of strategies and N is the
     number of GPU configurations. The strategies are sorted alphabetically and the GPU
     configurations are sorted in ascending number of GPUs.
     """
-    unique_num_gpus=sorted(df["num_gpus"].unique(), key=lambda x: int(x))
-    unique_strategies=sorted(df["strategy"].unique())
-    values=[]
+    unique_num_gpus = sorted(df["num_gpus"].unique(), key=lambda x: int(x))
+    unique_strategies = sorted(df["strategy"].unique())
+    values = []
 
-    table_string=""
+    table_string = ""
 
     for strategy in unique_strategies:
-        strategy_values=[]
+        strategy_values = []
         for num_gpus in unique_num_gpus:
-            filtered_df=df[
+            filtered_df = df[
                 (df["strategy"] == strategy) & (df["num_gpus"] == num_gpus)
             ]
 
-            row_string=f"{strategy:>12} | {num_gpus:>10}"
+            row_string = f"{strategy:>12} | {num_gpus:>10}"
 
             # Allows some strategies or num GPUs to not be included
             if len(filtered_df) == 0:
-                comp_time, comm_time=np.NaN, np.NaN
+                comp_time, comm_time = np.NaN, np.NaN
                 strategy_values.append(np.NaN)
 
                 row_string += f" | {'(NO DATA)':>15}"
             else:
-                comp_time, comm_time=calculate_comp_and_comm_time(df=filtered_df)
+                comp_time, comm_time = calculate_comp_and_comm_time(df=filtered_df)
                 # Avoid division-by-zero errors (1e-10)
-                comp_fraction=comp_time / (comp_time + comm_time + 1e-10)
+                comp_fraction = comp_time / (comp_time + comm_time + 1e-10)
                 strategy_values.append(comp_fraction)
 
                 row_string += f" | {comp_time:>8.2f}s"
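
Review note: the hunk above only changes formatting, so the logic of `get_comp_fraction_full_array` is unchanged. To make that logic easier to check in isolation, here is a minimal sketch of the same idea. It is not the code in this commit: `calculate_comp_and_comm_time` is not shown in the diff, so the sketch assumes hypothetical `comp_time` and `comm_time` columns and simply sums them. It also uses `np.nan`, since the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd


def comp_fraction_array(df: pd.DataFrame) -> np.ndarray:
    """Build an MxN array of computation fractions.

    M is the number of strategies (sorted alphabetically) and N is the number
    of GPU configurations (sorted in ascending order). Missing combinations
    are left as NaN.

    Assumption: `df` has hypothetical "comp_time" and "comm_time" columns.
    The real code derives these through calculate_comp_and_comm_time, which
    is not part of this diff.
    """
    unique_num_gpus = sorted(df["num_gpus"].unique(), key=int)
    unique_strategies = sorted(df["strategy"].unique())

    # Start from an all-NaN array so missing combinations need no special case
    values = np.full((len(unique_strategies), len(unique_num_gpus)), np.nan)

    for i, strategy in enumerate(unique_strategies):
        for j, num_gpus in enumerate(unique_num_gpus):
            subset = df[(df["strategy"] == strategy) & (df["num_gpus"] == num_gpus)]
            if subset.empty:
                continue
            comp_time = subset["comp_time"].sum()
            comm_time = subset["comm_time"].sum()
            # Small epsilon avoids division by zero, as in the original code
            values[i, j] = comp_time / (comp_time + comm_time + 1e-10)

    return values


# Made-up example data: "horovod" has no run with 8 GPUs
df = pd.DataFrame(
    {
        "strategy": ["ddp", "ddp", "horovod"],
        "num_gpus": [4, 8, 4],
        "comp_time": [3.0, 2.0, 1.0],
        "comm_time": [1.0, 2.0, 3.0],
    }
)
result = comp_fraction_array(df)
print(result)
```

With this made-up data the result has one row per strategy (`ddp`, then `horovod`) and one column per GPU count (4, then 8). The `horovod`/8 entry stays NaN, which corresponds to the `(NO DATA)` branch in the diff.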