Git-based provenance artifact #43

Open
qwofford opened this issue Jun 29, 2023 · 1 comment

@qwofford
Collaborator

Dockerfiles provide provenance for how a container is built: they are scripts executed to generate a container image. There is a similar provenance mechanism for run-time behavior, which a Dockerfile can declare with the CMD instruction. The command given with CMD is the default command the container runs, and it can be overridden to run custom commands. I propose a new kind of run-time provenance data structure with an intent similar to CMD, but with arbitrary flexibility. It fits within the DSI model (core/plugin/driver).

The Consumer Plugin

This plugin would be a child class of the SystemKernel Environment plugin, targeting a Charliecloud container run-time. The plugin should initialize with either a path to a file or a git remote. The input file should use a known syntax, similar to a Dockerfile. The first line of this Dockerfile-like file should identify the requested container by sha, so the plugin can validate that the container is available in storage, or acquire it if it is not. The remaining text in the file describes the run-time behavior, similar to CMD. Inside the plugin data structure, this file should be committed to a new or existing git repo. Alternatively, when the plugin is initialized with a git remote, it pulls down a similar repo. Once the git repo is init'd or downloaded, it should be stored as a key-val pair in the DSI Middleware, perhaps as a simple tar of the directory containing .git. This could be a column called "container_run_commands" or similar, whose value is either a reference to the git tarball or the tarball itself.
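A minimal sketch of the two pieces described above: parsing the Dockerfile-like file and packing the repo directory into a tarball. The `CONTAINER <sha>` first-line syntax, the `RuntimeSpec` class, and both function names are assumptions made for illustration; none of them exist in DSI today.

```python
import io
import tarfile
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class RuntimeSpec:
    container_sha: str   # reference on the container registry
    commands: List[str]  # run-time behavior, in file order


def parse_runtime_file(text: str) -> RuntimeSpec:
    """Parse the hypothetical format: 'CONTAINER <sha>' first, commands after."""
    lines = [ln.strip() for ln in text.splitlines()]
    # Skip blank lines and '#' comments (also an assumed convention).
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    if not lines or not lines[0].startswith("CONTAINER "):
        raise ValueError("first line must be 'CONTAINER <sha>'")
    return RuntimeSpec(lines[0].split(maxsplit=1)[1], lines[1:])


def tar_repo(repo_dir: str) -> bytes:
    """Pack the directory containing .git into an in-memory gzip tarball."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        tar.add(repo_dir, arcname=Path(repo_dir).name)
    return buf.getvalue()
```

The bytes returned by `tar_repo` could then be stored under a key such as "container_run_commands". Whether to store the tarball itself or only a reference to it is still an open choice.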

This plugin is most useful when paired with a Driver designed to interact with the git repo. The Driver would get and put the git-based run-time artifact to a back-end as usual. The artifact_handler method should set a git remote and pull or push to the remote as specified by the user. The remote should not be stored in the backend. The idea is not to make the user interact with git directly through DSI, but to give them a way to work somewhere more appropriate for highly interactive development, and move that work in and out of DSI easily.
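One way the remote handling could look, as a sketch only: the remote is added just for the duration of the push or pull and removed afterwards, so it never ends up in the repo that gets stored in the backend. The function names, the temporary remote name `dsi-tmp`, and the default branch `main` are assumptions; only the `git remote add`, `git pull`, `git push`, and `git remote remove` commands are real.

```python
import subprocess
from typing import Callable, List


def sync_commands(remote_url: str, direction: str, branch: str = "main",
                  remote_name: str = "dsi-tmp") -> List[List[str]]:
    """Build the add / sync / remove git commands for one transfer."""
    if direction not in ("pull", "push"):
        raise ValueError("direction must be 'pull' or 'push'")
    return [
        ["git", "remote", "add", remote_name, remote_url],
        ["git", direction, remote_name, branch],
        ["git", "remote", "remove", remote_name],
    ]


def sync_repo(repo_dir: str, remote_url: str, direction: str,
              branch: str = "main", run: Callable = subprocess.run) -> None:
    """Push or pull, always removing the temporary remote afterwards."""
    add, sync, remove = sync_commands(remote_url, direction, branch)
    run(add, cwd=repo_dir, check=True)
    try:
        run(sync, cwd=repo_dir, check=True)
    finally:
        # Runs even if the transfer fails, so the remote is never persisted.
        run(remove, cwd=repo_dir, check=True)
```

The `run` parameter exists only so the function can be exercised without a real git repo; an artifact_handler implementation would presumably call `subprocess.run` directly.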

The Producer Plugin

This plugin initializes with a git repo like the one described above, and a commit hash present in that git source tree. This plugin, when transloaded, will validate or pull the container by sha, and execute it with the run-time provenance file stored at the given commit hash. To be clear, the container sha is a reference on the container registry, but the commit hash is a reference to the run-time provenance repo controlled by this plugin.
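To make the two references concrete, here is a rough sketch of how the run-time file could be read at a given commit and turned into a Charliecloud invocation. It relies on `git show <commit>:<path>` and on the `ch-run IMAGE -- COMMAND` form; the function names and the file path argument are hypothetical, and validating or pulling the container by sha is deliberately left out.

```python
import re
import subprocess
from typing import List

# Abbreviated or full hex commit hashes (SHA-1 or SHA-256 repos).
_HASH_RE = re.compile(r"[0-9a-f]{7,64}")


def git_show_command(commit: str, path: str) -> List[str]:
    """Build the git command that prints `path` as it was at `commit`."""
    if not _HASH_RE.fullmatch(commit):
        raise ValueError("commit must be a hex hash, not a branch or tag")
    return ["git", "show", f"{commit}:{path}"]


def read_file_at_commit(repo_dir: str, commit: str, path: str) -> str:
    """Return the run-time provenance file content at the given commit."""
    result = subprocess.run(git_show_command(commit, path), cwd=repo_dir,
                            check=True, capture_output=True, text=True)
    return result.stdout


def ch_run_command(image: str, command: List[str]) -> List[str]:
    """Build a Charliecloud invocation for one run-time command."""
    return ["ch-run", image, "--", *command]
```

Restricting `commit` to a hex hash rather than accepting any ref keeps the reference pinned, which is the point of recording it as provenance.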

This plugin is a child of the SystemKernel plugin, so it will inherit all of the kernel metadata required to ensure run-time compatibility between the system configuration of previous runs and the current system config. It is not yet clear to me where this checking and error handling should take place. Maybe the metadata should just be recorded, to start. If it ends up being useful, we can add error handling.
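The record-first approach could be as small as the sketch below, which captures a few kernel fields and reports which ones differ from a previous run without raising anything. It uses `platform.uname()` from the standard library as a stand-in; the metadata the SystemKernel plugin actually collects may differ, and both function names are hypothetical.

```python
import platform
from typing import Dict, List


def record_kernel_metadata() -> Dict[str, str]:
    """Capture a small, assumed subset of kernel metadata for this system."""
    u = platform.uname()
    return {
        "system": u.system,
        "release": u.release,
        "version": u.version,
        "machine": u.machine,
    }


def kernel_differences(previous: Dict[str, str],
                       current: Dict[str, str]) -> List[str]:
    """List the keys whose values differ between two recorded runs."""
    keys = sorted(set(previous) | set(current))
    return [k for k in keys if previous.get(k) != current.get(k)]
```

Returning the differing keys instead of raising keeps this purely observational; error handling could be layered on later if the comparison proves useful.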

@qwofford qwofford self-assigned this Jun 29, 2023
@qwofford
Collaborator Author

Before we spend substantial time on this, we should evaluate this: https://github.com/GoogleCloudPlatform/ramble
