Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document use of SQL Runner #16

Closed
iaindillingham opened this issue Sep 28, 2022 · 6 comments
Closed

Document use of SQL Runner #16

iaindillingham opened this issue Sep 28, 2022 · 6 comments
Assignees

Comments

@iaindillingham
Copy link
Member

iaindillingham commented Sep 28, 2022

We need to document who should use SQL Runner, when, and why. This documentation should replace this playbook, which describes how we update the database notebooks and how we query L2, more generally.

When we have documented the use of SQL Runner, we may wish to enforce it by, for example, requiring code reviews on the repositories; or permitting some team members but not others to execute the studies. There should be separate issues for these tasks.

In this Slack thread, which relates to the database notebooks, Will emphasizes that:

  • there's no dummy data, so development tends to be done within a backend;1
  • there's no study definition to provide some protection against disclosure;
  • external collaborator and policy checks are less relevant to the database notebooks than to a study, because the database notebooks are created for OpenSAFELY users.

Consequently, the database notebooks aren't fully covered by the publication process for automated reports.

Footnotes

  1. Will mentions that development tends to be done within L2 and L3, where L2 is the database and L3 is a "highly sensitive" workspace. With SQL Runner, development will tend to be done within L3 and L4, where L4 is a "moderately sensitive" workspace.

@iaindillingham iaindillingham changed the title Document use of sqlrunner Document use of SQL Runner Oct 3, 2022
@iaindillingham iaindillingham self-assigned this Oct 3, 2022
@inglesp
Copy link
Contributor

inglesp commented Oct 3, 2022

there's no dummy data, so development tends to be done within a backend;

Note that with latest ideas for dummy data generation, this might one day be possible.

@iaindillingham
Copy link
Member Author

I think that the "latest ideas for dummy data generation" are that Data Builder could write a directory of CSV files that represent L2-like data (Slack thread). We could point SQL Runner at these files, to reduce the need to do development within a backend.

A study would probably include a Data Builder action in the pipeline, even though we'd not run that action in a backend: we'd run that action locally, when doing development locally.

@iaindillingham
Copy link
Member Author

iaindillingham commented Oct 19, 2022

We discussed this issue during today's team meeting; our discussion was prompted by this Slack thread.

Summary of the Slack thread

My straw man proposal was "Don't use SQL Runner for new projects". Peter and I followed this up with "For each new project (i.e. a project that isn't a translation of an existing project), write a short summary and explain why the new project needs to use SQL Runner. Ask for sign off on the short summary before starting work on the new project". Dave suggested that people with L2 (i.e. database) access shouldn't need to write a short summary and ask for sign off, because they already had the means and permission to query the database directly.

Summary of the discussion

The short-term need is to clarify who should use SQL Runner, when, and why. The (potential) medium-term need is to clarify who should have access to L2, L3, and L4. Ideally, we'd like only people with L2 access to be able to use SQL Runner; this would make SQL Runner an effective replacement for querying the database directly.

However...

  • We don't think there's a clear mapping between access and role. This makes it hard to say "This role has this access". (Our access policies describe two roles: researcher and platform developer. Are there other roles?)

  • People with L3 access can inspect outputs in all workspaces. In some cases, SQL Runner outputs may need to be restricted to only people with L2 access. How many people have L3 access but don't have L2 access?

  • Outputs marked as highly sensitive are written to L3; outputs marked as moderately sensitive are written to L4. However, job-runner doesn't enforce that cohort-extractor outputs, which are often highly sensitive,1 are marked as such. Similarly, if we required SQL Runner outputs to be marked highly sensitive, then job-runner wouldn't enforce our requirement.

Existing documentation:

  • Our access policies describe what researchers and platform developers can do.
  • Our publication process describes how results should be released. SQL Runner fits somewhere between a written report (first run) and an automated report (subsequent runs), with caveats about external collaborator and policy checks given above.

Straw man proposal

If you have L2 access, then you can use SQL Runner. However, you should be very mindful that the results will be written to L3 and so may be inspected by people who don’t also have L2 access.

Footnotes

  1. Are measures moderately sensitive?

@iaindillingham
Copy link
Member Author

We discussed this in the tech group meeting on Thursday 20th October and also in a meeting with Alex. I've written up some notes.

@iaindillingham
Copy link
Member Author

@CLStables and I are going to write a one-pager to accompany this value statement, which encompasses the above discussion:

Fix the current confusion about data access related to roles/purposes so that the whole team are clear on this and we can demonstrate that all access and use of tools (including SQL Runner) is aligned with our policies.

@iaindillingham
Copy link
Member Author

This Google doc contains the one-pager. I'm going to mark this issue as closed, as the focus moves from this issue to that doc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants