Document use of SQL Runner #16

iaindillingham · 2022-09-28T11:05:33Z

We need to document who should use SQL Runner, when, and why. This documentation should replace this playbook, which describes how we update the database notebooks and how we query L2, more generally.

When we have documented the use of SQL Runner, we may wish to enforce it by, for example, requiring code reviews on the repositories; or permitting some team members but not others to execute the studies. There should be separate issues for these tasks.

In this Slack thread, which relates to the database notebooks, Will emphasizes that:

there's no dummy data, so development tends to be done within a backend;¹
there's no study definition to provide some protection against disclosure;
external collaborator and policy checks are less relevant to the database notebooks than to a study, because the database notebooks are created for OpenSAFELY users.

Consequently, the database notebooks aren't fully covered by the publication process for automated reports.

Will mentions that development tends to be done within L2 and L3, where L2 is the database and L3 is a "highly sensitive" workspace. With SQL Runner, development will tend to be done within L3 and L4, where L4 is a "moderately sensitive" workspace. ↩

inglesp · 2022-10-03T12:01:19Z

there's no dummy data, so development tends to be done within a backend;

Note that with latest ideas for dummy data generation, this might one day be possible.

iaindillingham · 2022-10-03T14:03:59Z

I think that the "latest ideas for dummy data generation" are that Data Builder could write a directory of CSV files that represent L2-like data (Slack thread). We could point SQL Runner at these files, to reduce the need to do development within a backend.

A study would probably include a Data Builder action in the pipeline, even though we'd not run that action in a backend: we'd run that action locally, when doing development locally.

iaindillingham · 2022-10-19T11:45:37Z

We discussed this issue during today's team meeting; our discussion was prompted by this Slack thread.

Summary of the Slack thread

My straw man proposal was "Don't use SQL Runner for new projects". Peter and I followed this up with "For each new project (i.e. a project that isn't a translation of an existing project), write a short summary and explain why the new project needs to use SQL Runner. Ask for sign off on the short summary before starting work on the new project". Dave suggested that people with L2 (i.e. database) access shouldn't need to write a short summary and ask for sign off, because they already had the means and permission to query the database directly.

Summary of the discussion

The short-term need is to clarify who should use SQL Runner, when, and why. The (potential) medium-term need is to clarify who should have access to L2, L3, and L4. Ideally, we'd like only people with L2 access to be able to use SQL Runner; this would make SQL Runner an effective replacement for querying the database directly.

However...

We don't think there's a clear mapping between access and role. This makes it hard to say "This role has this access". (Our access policies describe two roles: researcher and platform developer. Are there other roles?)
People with L3 access can inspect outputs in all workspaces. In some cases, SQL Runner outputs may need to be restricted to only people with L2 access. How many people have L3 access but don't have L2 access?
Outputs marked as highly sensitive are written to L3; outputs marked as moderately sensitive are written to L4. However, job-runner doesn't enforce that cohort-extractor outputs, which are often highly sensitive,¹ are marked as such. Similarly, if we required SQL Runner outputs to be marked highly sensitive, then job-runner wouldn't enforce our requirement.

Existing documentation:

Our access policies describe what researchers and platform developers can do.
Our publication process describes how results should be released. SQL Runner fits somewhere between a written report (first run) and an automated report (subsequent runs), with caveats about external collaborator and policy checks given above.

Straw man proposal

If you have L2 access, then you can use SQL Runner. However, you should be very mindful that the results will be written to L3 and so may be inspected by people who don’t also have L2 access.

Are measures moderately sensitive? ↩

iaindillingham · 2022-10-21T12:25:13Z

We discussed this in the tech group meeting on Thursday 20th October and also in a meeting with Alex. I've written up some notes.

iaindillingham · 2022-10-26T08:10:54Z

@CLStables and I are going to write a one-pager to accompany this value statement, which encompasses the above discussion:

Fix the current confusion about data access related to roles/purposes so that the whole team are clear on this and we can demonstrate that all access and use of tools (including SQL Runner) is aligned with our policies.

iaindillingham · 2022-11-07T09:57:02Z

This Google doc contains the one-pager. I'm going to mark this issue as closed, as the focus moves from this issue to that doc.

iaindillingham mentioned this issue Oct 3, 2022

Replace notebooks with pipelines that use SQL Runner #9

Closed

iaindillingham changed the title ~~Document use of sqlrunner~~ Document use of SQL Runner Oct 3, 2022

iaindillingham self-assigned this Oct 3, 2022

iaindillingham closed this as completed Nov 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Document use of SQL Runner #16

Document use of SQL Runner #16

iaindillingham commented Sep 28, 2022 •

edited

Loading

inglesp commented Oct 3, 2022 •

edited by iaindillingham

Loading

iaindillingham commented Oct 3, 2022

iaindillingham commented Oct 19, 2022 •

edited

Loading

iaindillingham commented Oct 21, 2022

iaindillingham commented Oct 26, 2022

iaindillingham commented Nov 7, 2022

Document use of SQL Runner #16

Document use of SQL Runner #16

Comments

iaindillingham commented Sep 28, 2022 • edited Loading

Footnotes

inglesp commented Oct 3, 2022 • edited by iaindillingham Loading

iaindillingham commented Oct 3, 2022

iaindillingham commented Oct 19, 2022 • edited Loading

Summary of the Slack thread

Summary of the discussion

Straw man proposal

Footnotes

iaindillingham commented Oct 21, 2022

iaindillingham commented Oct 26, 2022

iaindillingham commented Nov 7, 2022

iaindillingham commented Sep 28, 2022 •

edited

Loading

inglesp commented Oct 3, 2022 •

edited by iaindillingham

Loading

iaindillingham commented Oct 19, 2022 •

edited

Loading