Updates to Docker worker image entrypoints #405

robertbartel · 2023-07-27T17:51:25Z

Moving entrypoint scripts for NWM/ngen/ngen-cal images to flag-based approach for receiving arguments
Updating the scheduler library's Launcher class to properly generate flag-style Docker CMD arguments when creating job workers
Updating ngen and ngen-cal images to copy configuration data files to output datasets
Other miscellaneous image improvements

Relates to #168.

Closes #268.

aaraney

Overall this looks good, thanks @robertbartel! I left a few minor comments that should be trivial to address.

aaraney · 2023-08-04T16:54:44Z

docker/main/ngen-calibration/entrypoint.sh

+else
+    MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
+fi
+RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"


Should we put this in /var instead?

Hmm. To go that route, I'd want us to add a dedicated directory under /var as part of the image. It would hold job-execution-items like this and be chown-ed to mpiuser. Otherwise we eventually won't be able to create the sentinel file in the container (see #238).

We also aren't really writing anything to this file, just creating, watching, and removing it. We can discuss further, but I'm inclined to leave it as-is for now.

I'm totally okay with leaving it as it is. The only reason I asked was in case an MPI_USER did not have a /home directory.
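For reference, one way to cover the missing-home case without moving the file is to look the home directory up in the passwd database instead of hardcoding /home. This is only a sketch: `user_home` and `sentinel_path` are hypothetical helpers that do not exist in the entrypoint scripts, the `mpiuser` default is taken from the discussion above, and the fallback to a temporary directory is an assumption, not something decided in this thread.

```shell
#!/bin/bash

# Hypothetical helper (not in the PR): print the home directory that the
# passwd database records for the given user; prints nothing if unknown.
user_home() {
    getent passwd "$1" 2>/dev/null | cut -d: -f6
}

# Hypothetical helper: choose a sentinel path under the user's actual home
# directory, falling back to a temp location (an assumption) otherwise.
sentinel_path() {
    home_dir="$(user_home "$1")"
    if [ -n "${home_dir}" ] && [ -d "${home_dir}" ]; then
        echo "${home_dir}/.run_sentinel"
    else
        echo "${TMPDIR:-/tmp}/.run_sentinel_$1"
    fi
}

RUN_SENTINEL="$(sentinel_path "${MPI_USER:-mpiuser}")"
echo "Sentinel file: ${RUN_SENTINEL}"
```

A dedicated, chown-ed directory under /var, as suggested above, would replace the fallback branch.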

aaraney · 2023-08-04T17:21:01Z

docker/main/ngen/entrypoint.sh

@@ -36,6 +14,7 @@ if [ "$(whoami)" = "${MPI_USER}" ]; then
 else
    MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
 fi
+RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"


Same as the comment above. Just marking it.

- Moving to bash for access to trap command
- Move logic for closing of remote workers to function
- Add trap on any exit (though specific to when MPI user is active) for remote worker cleanup
- Add trap on any exit (though specific to when SSHD user is active) for closing sshd process
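The trap-based cleanup described in these notes could look roughly like the following. It is a minimal sketch, not the entrypoint's actual code: `close_remote_workers` here is a hypothetical stand-in that only records that it ran, and `run_with_cleanup` and `CLEANUP_LOG` are illustrative names.

```shell
#!/bin/bash

# Hypothetical stand-in for the real cleanup; it only records that it ran.
close_remote_workers() {
    echo "closing remote workers" >> "${CLEANUP_LOG:-/dev/null}"
}

# Run a command in a subshell with an EXIT trap, so the cleanup fires whether
# the command succeeds or fails. The command's exit status is preserved.
run_with_cleanup() {
    (
        trap close_remote_workers EXIT
        "$@"
        exit $?
    )
}

CLEANUP_LOG="$(mktemp)"
run_with_cleanup false || echo "job failed with status $?"
cat "${CLEANUP_LOG}"
```

Registering the trap once covers every exit path, which is easier to keep correct than calling the cleanup function before each individual `exit`.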


aaraney · 2023-08-07T16:42:58Z

docker/main/ngen-calibration/entrypoint.sh

+else
+    MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
+fi
+RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"


I'm totally okay with leaving it as it is. The only reason I asked was in case an MPI_USER did not have a /home directory.

aaraney · 2023-08-07T16:44:19Z

Merge at will. I had a question above that does not hold up merging this.

robertbartel added the enhancement (New feature or request) and maas (MaaS Workstream) labels Jul 27, 2023

robertbartel requested review from hellkite500 and aaraney July 27, 2023 17:51

robertbartel mentioned this pull request Aug 2, 2023

Cleanup and centralize ngen-related images #408

Merged

aaraney requested changes Aug 4, 2023

robertbartel and others added 10 commits August 4, 2023 16:48

Move NWM image entrypoint args to explicit flags.

f5a1c61

Updates to ngen Docker image entrypoint script.

eef9f3a

Move ngen entrypoint args to use of explicit flags and to use of single composite config dataset arg/name/directory.
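As an illustration of the flag-based approach, an entrypoint might parse its arguments along these lines. The flag names (`--job-id`, `--worker-index`, `--node-count`) and the `JOB_ID` variable are hypothetical placeholders, not the flags this commit defines; only `WORKER_INDEX` and `MPI_NODE_COUNT` are names that appear in the commit messages.

```shell
#!/bin/bash

# Parse flag-style arguments. The flag names here are hypothetical
# placeholders, not the flags defined by the actual entrypoint scripts.
parse_args() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --job-id|--worker-index|--node-count)
                # Every flag in this sketch takes exactly one value.
                if [ $# -lt 2 ]; then
                    echo "Error: $1 requires a value" >&2
                    return 1
                fi
                ;;
            *)
                echo "Error: unrecognized argument '$1'" >&2
                return 1
                ;;
        esac
        case "$1" in
            --job-id)       JOB_ID="$2" ;;
            --worker-index) WORKER_INDEX="$2" ;;
            --node-count)   MPI_NODE_COUNT="$2" ;;
        esac
        shift 2
    done
}

# Unlike positional arguments, flags can be supplied in any order.
parse_args --node-count 2 --job-id abc123 --worker-index 0 || exit 1
echo "job=${JOB_ID} index=${WORKER_INDEX} nodes=${MPI_NODE_COUNT}"
```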

Update Launcher funcs for generating Docker args.

a10f96f

Updating functions for generating Docker CMD args to account for a flag-based approach rather than a positional one.

Have ngen entrypoint copy configs to output.

17d244f

Updating entrypoint.sh to copy various job-related configs into output dataset for record keeping and reproducibility.

Fix a few sanity checks in ngen image entrypoint.

0fe280b

Fixing initial sanity checks that use tests and print error messages but were not actually exiting in error, and updating test for WORKER_INDEX to account for any non-integer values properly.
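The failure mode fixed here is a check that prints its message but lets the script carry on. A sketch of a check that both reports and exits, and that rejects any non-integer value, might look like this; `fail` and `require_integer` are hypothetical names, not functions from the entrypoint.

```shell
#!/bin/bash

# Print an error and terminate the current shell with a failing status.
fail() {
    echo "Error: $*" >&2
    exit 1
}

# Reject anything that is not a non-empty string of digits, so values such
# as "abc", "1.5", "-1", and "" are all caught.
require_integer() {
    case "$2" in
        ''|*[!0-9]*) fail "$1 must be a non-negative integer, got '$2'" ;;
    esac
}

# Run in a subshell here only so the demonstration itself keeps running.
if (require_integer WORKER_INDEX "abc") 2>/dev/null; then
    echo "check passed"
else
    echo "check exited with an error, as intended"
fi
```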

Add ngen entrypoint node count sanity check.

5269c9e

Adding sanity check for received MPI_NODE_COUNT value in ngen worker image entrypoint.sh that ensures both that a value was provided and that it is a positive integer.
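A two-part check of this kind (value provided, value is a positive integer) can be sketched as below. The function names are hypothetical, and the default of 1 in the final line exists only so the sketch runs on its own.

```shell
#!/bin/bash

# Succeed only if the argument is a positive integer.
is_positive_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$1" -gt 0 ]                 # rejects 0 (and 00, 000, ...)
}

# Report which of the two conditions failed, and return a failing status.
validate_node_count() {
    if [ -z "$1" ]; then
        echo "Error: no MPI node count was provided" >&2
        return 1
    elif ! is_positive_integer "$1"; then
        echo "Error: MPI node count '$1' is not a positive integer" >&2
        return 1
    fi
}

validate_node_count "${MPI_NODE_COUNT:-1}" || exit 1
```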

Copy configs to output in ngen-cal Docker image.

08eabe3

Copying config files to output datasets for record keeping, and making sure calibration routine uses (i.e., alters) the copy written to output rather than the original.
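The copy-then-use-the-copy pattern can be sketched as follows. The `configs` subdirectory, the `copy_configs_to_output` helper, and the file contents are all hypothetical; temporary directories stand in for the real config and output datasets.

```shell
#!/bin/bash

# Copy every file from a config directory into a subdirectory of the output
# dataset and print the path of the copy. The "configs" subdirectory name is
# a hypothetical layout choice.
copy_configs_to_output() {
    src_dir="$1"
    dest_dir="$2/configs"
    mkdir -p "${dest_dir}" || return 1
    cp -R "${src_dir}/." "${dest_dir}/" || return 1
    echo "${dest_dir}"
}

# Demonstration with throwaway directories standing in for datasets.
CONFIG_DIR="$(mktemp -d)"
OUTPUT_DIR="$(mktemp -d)"
echo "iterations: 10" > "${CONFIG_DIR}/calib.yaml"

WORKING_CONFIGS="$(copy_configs_to_output "${CONFIG_DIR}" "${OUTPUT_DIR}")" || exit 1

# Anything that alters the config now touches the copy, not the original.
echo "best_params: pending" >> "${WORKING_CONFIGS}/calib.yaml"
```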

Update ngen-cal Docker image to match ngen.

387abce

Updating to use bash shell and to use flag-based approach for args.

Update docker/main/ngen-calibration/entrypoint.sh

df67199

Account for ".yml" and ".yaml" extensions when trying to find calibration config file, as well as ignore the case.

Co-authored-by: Austin Raney
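A case-insensitive lookup covering both extensions can be done with `find`, roughly as sketched here. `find_calibration_config` is a hypothetical name, and the suggestion actually committed may use a different mechanism.

```shell
#!/bin/bash

# Print the first regular file directly inside the given directory whose name
# ends in .yml or .yaml, ignoring case; print nothing if there is none.
# Note: -maxdepth and -iname are GNU/BSD find extensions, not strict POSIX.
find_calibration_config() {
    find "$1" -maxdepth 1 -type f \( -iname '*.yml' -o -iname '*.yaml' \) \
        | sort | head -n 1
}

DEMO_DIR="$(mktemp -d)"
touch "${DEMO_DIR}/Calibration.YAML" "${DEMO_DIR}/notes.txt"
find_calibration_config "${DEMO_DIR}"
```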

robertbartel force-pushed the f/update_ngen_docker/update_entrypoint branch from e9d3ccc to df67199 August 4, 2023 22:02

aaraney approved these changes Aug 7, 2023

robertbartel merged commit 5b46b8e into NOAA-OWP:master Aug 28, 2023
1 check passed

robertbartel deleted the f/update_ngen_docker/update_entrypoint branch August 28, 2023 12:47
