Support for secure data movement #256
Comments
Yes, fetching the eventlog as a guest user requires accessing the KVS through the job-info service, which restricts users to data only within their own jobs. E.g. if I try to look at the eventlog of another user's job:
$ flux job eventlog -H f445SZmdRv7q
flux-job: flux_job_eventlog_lookup_get: Operation not permitted
Neither can I access the KVS directory of the job directly:
$ flux kvs dir $(flux job id --to=kvs f445SZmdRv7q)
flux-kvs: job.1240.c1e8.b100.0800: Operation not permitted
However, users with |
We do tend to heedlessly copy and paste eventlogs into issues, Slack and MM messages, since there is not much, if anything, security-significant in there now. I wonder if there's some way we can automatically obscure sensitive information in an eventlog context, kind of like GitHub workflows do in output. Are these secrets going to be the same for every job, or will a new secret be created per job, reducing the impact of a potential compromise? |
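A minimal sketch of the masking idea raised above, assuming the secret values are known to whatever tool prints the eventlog (as with masking in CI output). The function name `mask_secrets`, the event line, and the token value are all hypothetical and invented for illustration; nothing like this exists in Flux as far as this thread shows.

```python
def mask_secrets(text, secrets, placeholder="***"):
    """Replace every occurrence of each known secret value with a placeholder."""
    # Mask longer secrets first so a secret that contains a shorter one
    # is hidden completely rather than partially.
    for secret in sorted(set(secrets), key=len, reverse=True):
        if secret:  # an empty string would match at every position
            text = text.replace(secret, placeholder)
    return text


# Hypothetical eventlog line and token value (not real Flux output).
line = '{"name":"example-event","context":{"token":"abc123.def456.ghi789"}}'
print(mask_secrets(line, ["abc123.def456.ghi789"]))
```

This only hides values the tool already knows about, so it reduces accidental leaks from copy and paste but does not replace keeping secrets out of the eventlog.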
The secret will be unique per Workflow. James, is there a 1-1 relationship between Workflow and job? |
There is, yeah. |
Just discussed this issue a bit offline with @garlick and here's a summary of our conclusions:

Best practice will be to keep secrets out of the job eventlog and encrypt them. This could be done in stages.

First step would be to move any sensitive data from the dws jobtap plugin to a KVS key in the job's KVS directory. This probably includes the random integer in the cray-pals-port-distribution event as well as the workflow token. If the prolog-finish event is delayed until the KVS commit completes, and the keys are in a well-known location, then the coral2 shell plugin should no longer need to read the job eventlog. It can fetch the KVS key from the job-info.lookup service, and if this returns ENOENT, then it can be assumed the jobtap plugin was not loaded and the shell plugin can issue the appropriate error. Otherwise, the KVS key is guaranteed to be present since the job shells are not started until the last prolog-finish event. The first step solves the issue of sensitive data in the eventlog.

The second step would be to encode the sensitive data using munge_encode(3) with MUNGE_OPT_UID_RESTRICTION set to the job userid. Then in the job shell, this credential would be decoded after it is fetched from the KVS. There's a wrinkle here in that |
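The lookup-and-fallback logic of the first step can be sketched as follows. This is only an illustration of the control flow, not the coral2 shell plugin: `fetch_job_secret`, the `lookup` callable, the key name `dws_secret`, and the in-memory `fake_kvs` are all assumptions made up for the example, with `lookup` standing in for a request to the job-info.lookup service.

```python
import errno


def fetch_job_secret(lookup, jobid, key):
    """Return the value stored under `key` for `jobid` via the given lookup callable.

    `lookup` is expected to raise OSError with errno ENOENT when the key
    does not exist, mirroring the ENOENT case discussed above.
    """
    try:
        return lookup(jobid, key)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Key absent: assume the jobtap plugin was not loaded.
            raise RuntimeError(
                f"{key} not found for job {jobid}: is the dws jobtap plugin loaded?"
            ) from exc
        raise  # other failures (e.g. EPERM) are passed through unchanged


# In-memory stand-in for the KVS, used only to exercise the logic.
fake_kvs = {("f1234", "dws_secret"): "opaque-value"}


def fake_lookup(jobid, key):
    try:
        return fake_kvs[(jobid, key)]
    except KeyError:
        raise OSError(errno.ENOENT, "No such file or directory") from None


print(fetch_job_secret(fake_lookup, "f1234", "dws_secret"))
```

Because the job shells are not started until the last prolog-finish event, a missing key can be treated as a configuration error rather than a race, which is why the sketch does not retry.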
@roehrich-hpe I know you've explained this in our calls but can you explain here how the secret will be used and what for? My understanding is the user will make some library calls or invoke some tool or something, which is going to use the secret in its environment variable to trigger some copy offload action? |
The user's compute application will link with a new libcopyoffload library that is a frontend for libcurl. This library knows how to configure and use libcurl to talk to the new copy-offload server which will be running on the rabbit. The secret is actually a JWT--a token. The libcopyoffload library will use this as the bearer token in its https messages when it communicates with the server. A serialized JWT looks like this (taken directly from https://jwt.io):
The NNF software will generate one token for the Workflow and will store it in a kubernetes "Secret". I'd like to have Flux read the token from the secret and provide it in an environment variable for the user's compute application. If using kubectl, you'd get it this way: |
In the rabbit meeting today we discussed a need for Flux to grab secrets from Kubernetes and export them as environment variables to Flux jobs. As far as I understand it, Flux will need to look for a flag in a workflow (or maybe a directivebreakdown) and then, if the flag is set, grab a secret from Kubernetes and add it as an environment variable.
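A rough sketch of that flow, under stated assumptions: the flag is passed in as a plain boolean, the Secret is already available as a JSON object, and the key `token` and variable name `EXAMPLE_WORKFLOW_TOKEN` are invented for the example. The only Kubernetes fact relied on is that values under a Secret's `data` field are base64-encoded. How Flux would actually read the flag or the Secret is not decided in this thread.

```python
import base64
import json


def env_for_job(flag_set, secret_json, data_key, env_name, env=None):
    """Return a copy of `env`, with the decoded secret added if the flag is set."""
    env = dict(env or {})
    if not flag_set:
        return env  # nothing requested: leave the environment untouched
    secret = json.loads(secret_json)
    # Values under "data" in a Kubernetes Secret object are base64-encoded.
    env[env_name] = base64.b64decode(secret["data"][data_key]).decode("utf-8")
    return env


# Hypothetical Secret object, built here so the example is self-contained.
example_secret = json.dumps({
    "apiVersion": "v1",
    "kind": "Secret",
    "data": {"token": base64.b64encode(b"dummy-token").decode("ascii")},
})

job_env = env_for_job(True, example_secret, "token",
                      "EXAMPLE_WORKFLOW_TOKEN", {"PATH": "/usr/bin"})
print(sorted(job_env))
```

The function returns a new dict instead of mutating its input, so a caller can decide where the resulting variables are stored, which matters for the eventlog question raised next.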
@grondo I think this will require that the eventlog be secure, so that one user can't look at the events for another user's job, because flux-coral2 currently puts the environment variables in the eventlog for the shell to fetch and set. We already guarantee that security though, right?
I remember some months ago there was a discussion (with Kalan I think) about the safety of sharing event logs, but maybe that was just informally, like between us.