Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to Docker worker image entrypoints #405

Merged

Conversation

robertbartel
Copy link
Contributor

@robertbartel robertbartel commented Jul 27, 2023

  • Moving entrypoint scripts for NWM/ngen/ngen-cal images to flag-based approach for receiving arguments
  • Updated scheduler lib's Launcher class to properly generate flag-style Docker CMD arguments when creating job workers
  • Updating ngen and ngen-cal images to copy configuration data files to output datasets
  • Other miscellaneous image improvements

Relates to #168.

Closes #268.

@robertbartel robertbartel added enhancement New feature or request maas MaaS Workstream labels Jul 27, 2023
Copy link
Member

@aaraney aaraney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good, thanks @robertbartel! I left a few minor comments that should be trivial to address.

else
MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
fi
RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put this in /var instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. To go that route, I'd want us to add a dedicated directory under /var as part of the image. It would hold job-execution-items like this and be chown-ed to mpiuser. Otherwise we eventually won't be able to create the sentinel file in the container (see #238).

We also aren't really writing anything to this file, just creating, watching, and removing it. We can discuss further, but I'm inclined to leave it as-is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im totally okay with leave it how it is. The only reason I asked was in the situation that an MPI_USER did not have a /home directory.

docker/main/ngen-calibration/entrypoint.sh Outdated Show resolved Hide resolved
@@ -36,6 +14,7 @@ if [ "$(whoami)" = "${MPI_USER}" ]; then
else
MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
fi
RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above comment. Just marking

robertbartel and others added 10 commits August 4, 2023 16:48
Move ngen entrypoint args to use of explicit flags and to use of single
composite config dataset arg/name/directory.
Updating functions for generating Docker CMD args to account for a
flag-based approach rather than a positional one.
Updating entrypoint.sh to copy various job-related configs into output
dataset for record keeping and reproducibility.
Fixing initial sanity checks that use tests and print error messages but
were not actually exiting in error, and updating test for WORKER_INDEX
to account for any non-integer values properly.
Adding sanity check for received MPI_NODE_COUNT value in ngen worker
image entrypoint.sh that ensures both that a value was provided and
that it is a positive integer.
- Moving to bash for access to trap command
- Move logic for closing of remote workers to function
- Add trap on any exit (though specific to when MPI user is active) for
  remote worker cleanup
- Add trap on any exit (though specific to SSHD user is active) for
  closing sshd process
Copying config files to output datasets for record keeping, and making
sure calibration routine uses (i.e., alters) the copy written to output
rather than the original.
Updating to use bash shell and to use flag-based approach for args.
Account for ".yml" and ".yaml" extensions when trying to find calibration config file, as well as ignore the case.

Co-authored-by: Austin Raney <[email protected]>
@robertbartel robertbartel force-pushed the f/update_ngen_docker/update_entrypoint branch from e9d3ccc to df67199 Compare August 4, 2023 22:02
else
MPI_HOSTS_FILE="$(su ${MPI_USER} -c 'echo "${HOME}"')/.mpi_hosts"
fi
RUN_SENTINEL="/home/${MPI_USER}/.run_sentinel"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im totally okay with leave it how it is. The only reason I asked was in the situation that an MPI_USER did not have a /home directory.

@aaraney
Copy link
Member

aaraney commented Aug 7, 2023

Merge at will. I had a question above that does not hold up merging this.

@robertbartel robertbartel merged commit 5b46b8e into NOAA-OWP:master Aug 28, 2023
1 check passed
@robertbartel robertbartel deleted the f/update_ngen_docker/update_entrypoint branch August 28, 2023 12:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request maas MaaS Workstream
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update entrypoint param logic and Launcher arg assembly for ngen and ngen-calibration images
2 participants