Merge pull request #1164 from uc-cdis/docs/alembic
Reorganizing fence documentation.
AlbertSnows authored Jul 30, 2024
2 parents 7b0aa60 + d46ed58 commit 58f8164
Showing 24 changed files with 573 additions and 568 deletions.
561 changes: 35 additions & 526 deletions README.md

Large diffs are not rendered by default.

8 changes: 8 additions & 0 deletions docs/additional_documentation/authorization.md
@@ -0,0 +1,8 @@

## Access Control / Authz

Currently fence works with another Gen3 service named
[arborist](https://github.com/uc-cdis/arborist) to implement attribute-based access
control for commons users. The YAML file of access control information (see
[#create-user-access-file](setup.md#create-user-access-file)) contains an `authz` section, whose contents are sent to
arborist in order to set up the access control model.
22 changes: 22 additions & 0 deletions docs/additional_documentation/data_access.md
@@ -0,0 +1,22 @@
## Accessing Data

Fence provides several mechanisms for accessing data. Access to data can be
controlled through authorization information in a User Access File.

Users can be granted specific privileges on `projects` in the User Access
File. A `project` is identified by a unique authorization identifier, also known as an `auth_id`.

A `project` can be associated with various storage backends that store
object data for that given `project`. You can assign `read-storage` and `write-storage`
privileges to users who should have access to that stored object data. `read` and
`write` allow access to the data stored in a graph database.

Depending on the backend, Fence can be configured to provide users access to
the data in different ways.


### Signed URLs

Temporary signed URLs are supported in all major commercial clouds. Signed URLs are the most 'cloud agnostic' way to allow users to access data located in different platforms.

Fence lets a client request a specific file by its GUID (globally unique identifier) and returns a temporary signed URL for object data in AWS or GCP that provides direct access to that object.
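
As a rough illustration of requesting a signed URL by GUID, the sketch below only builds the request URL. The base URL `https://commons.example`, the `/user/data/download/` path, and the `expires_in` query parameter are assumptions made for this example; check them against the API documentation of your deployment before relying on them.

```bash
# Sketch only: the endpoint path and the "expires_in" parameter are assumptions,
# not taken from this document.

# presigned_url_request BASE_URL GUID [EXPIRES_IN_SECONDS]
# Prints the URL that would be requested to obtain a signed URL for one object.
presigned_url_request() (
  base="${1%/}"                           # drop a trailing slash, if any
  url="${base}/user/data/download/${2}"   # assumed path for signed URL requests
  if [ -n "${3:-}" ]; then
    url="${url}?expires_in=${3}"          # assumed name of the optional lifetime parameter
  fi
  printf '%s\n' "$url"
)

# Usage (needs a valid access token in $ACCESS_TOKEN):
#   curl --header "Authorization: Bearer $ACCESS_TOKEN" \
#     "$(presigned_url_request https://commons.example GUID 600)"
```

Keeping the URL construction separate from the `curl` call makes it easy to check the request before sending it.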
File renamed without changes.
19 changes: 19 additions & 0 deletions docs/additional_documentation/default_expiration_times.md
@@ -0,0 +1,19 @@
## Default Expiration Times in Fence

The table below lists the artifacts in fence that have temporary lifetimes, along with their default values.

> NOTE: "SA" in the table below stands for Service Account

| Name | Lifetime | Extendable? | Maximum Lifetime | Details |
|-------------------------------------|--------------|-------------|-----------------------|-----------------------------------------------------------------------------------------------------------------------------|
| Access Token | 20 minutes | TRUE | Life of Refresh Token | |
| Refresh Token | 30 days | FALSE | N/A | |
| User's SA Account Access | 7 days | TRUE | N/A | Access to data (e.g. length it stays in the proxy group). Can optionally provide an expiration less than 7 days |
| User's Google Account Access | 1 day | TRUE | N/A | After AuthN, how long we associate a Google email with the given user. Can optionally provide an expiration less than 1 day |
| User's Google Account Linkage | Indefinite | N/A | N/A | Can optionally provide an expiration less than 1 hour |
| Google Signed URL | Up to 1 hour | FALSE | N/A | Can optionally provide an expiration less than 1 hour |
| AWS Signed URL | Up to 1 hour | FALSE | N/A | Obtained by an oauth client through /credentials/google |
| Client SA (for User) Key | 10 days | FALSE | N/A | Obtained by the user themselves for temp access. Can optionally provide an expiration less than 10 days |
| User Primary SA Key | 10 days | FALSE | N/A | Used for Google URL signing |
| User Primary SA Key for URL Signing | 30 days | FALSE | N/A | |
| Sliding Session Window | 15 minutes | TRUE | 8 hours | access_token cookies get generated automatically when expired if session is still active |
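
One way to read the "Sliding Session Window" row is that each activity extends the session by 15 minutes, but never beyond 8 hours after the session started. The sketch below encodes that interpretation as plain arithmetic; it is an illustration of the table values, not fence's actual implementation, and the function and variable names are hypothetical.

```bash
# Values taken from the table: 15-minute sliding window, 8-hour maximum lifetime.
SESSION_WINDOW_SECONDS=$((15 * 60))
SESSION_MAX_SECONDS=$((8 * 60 * 60))

# session_expiry SESSION_START LAST_ACTIVITY   (both in epoch seconds)
# Prints the epoch second at which the session would end under this reading:
# the earlier of "last activity + window" and "session start + maximum".
session_expiry() (
  window_end=$(( $2 + SESSION_WINDOW_SECONDS ))
  hard_end=$(( $1 + SESSION_MAX_SECONDS ))
  if [ "$window_end" -lt "$hard_end" ]; then
    echo "$window_end"
  else
    echo "$hard_end"
  fi
)

# Usage:
#   session_expiry "$SESSION_START" "$(date +%s)"
```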
File renamed without changes.
126 changes: 126 additions & 0 deletions docs/additional_documentation/fence_create.md
@@ -0,0 +1,126 @@
## fence-create: Automating common tasks with a command line interface

fence-create is a command line utility that is bundled with fence and allows you to automate some common tasks within fence. For the most up-to-date list of commands, run `fence-create --help`.

WARNING: fence-create directly modifies the database in some cases and may circumvent security checks (most of these utilities are used for testing). BE CAREFUL when running these commands and make sure you know what they do.


### Register Internal OAuth Client

As a Gen3 commons administrator, if you want to create an OAuth client that skips the user consent step, use the following command:

```bash
fence-create client-create --client CLIENT_NAME --urls OAUTH_REDIRECT_URL --username USERNAME --auto-approve (--expires-in 30)
```

The optional `--expires-in` parameter allows specifying the number of days until this client expires.

### Register an Implicit OAuth Client

As a Gen3 commons administrator, if you want to create an implicit OAuth client for a webapp, use the following command:

```bash
fence-create client-create --client fancywebappname --urls 'https://betawebapp.example/fence' 'https://webapp.example/fence' --public --username fancyapp --grant-types authorization_code refresh_token implicit
```

If there is more than one URL to add, separate them with spaces like this:

```bash
fence-create client-create --urls 'https://url1/' 'https://url2/' --client ...
```

To specify allowed scopes, use the `--allowed-scopes` argument:
```bash
fence-create client-create ... --allowed-scopes openid user data
```

### Register an OAuth Client for a Client Credentials flow

The OAuth2 Client Credentials flow is used for machine-to-machine communication and scenarios in which typical authentication schemes like username + password do not make sense. The system authenticates and authorizes the app rather than a user. See the [OAuth2 specification](https://www.rfc-editor.org/rfc/rfc6749#section-4.4) for more details.

As a Gen3 commons administrator, if you want to create an OAuth client for a client credentials flow:

```bash
fence-create client-create --client CLIENT_NAME --grant-types client_credentials (--expires-in 30)
```

This command will return a client ID and client secret, which you can then use to obtain an access token:

```bash
curl --request POST "https://FENCE_URL/oauth2/token?grant_type=client_credentials" -d scope="openid user" --user CLIENT_ID:CLIENT_SECRET
```
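
The token endpoint replies with a JSON document that, per the OAuth2 specification, carries the token in an `access_token` field. The helper below is a minimal sketch for pulling that field out with `sed`; the function name is hypothetical, and a real JSON parser such as `jq` would be more robust if it is available.

```bash
# extract_access_token: read a JSON token response on stdin and print the value
# of its "access_token" field (prints nothing if the field is absent).
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (same request as above, with the token captured in a variable):
#   TOKEN=$(curl --silent --request POST \
#     "https://FENCE_URL/oauth2/token?grant_type=client_credentials" \
#     -d scope="openid user" --user CLIENT_ID:CLIENT_SECRET | extract_access_token)
# The token is then sent as a bearer token on later requests:
#   curl --header "Authorization: Bearer $TOKEN" ...
```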

The optional `--expires-in` parameter allows specifying the number of *days* until this client expires. The recommendation is to rotate credentials with the `client_credentials` grant at least once a year (see [Rotate client credentials](#rotate-client-credentials) section).

NOTE: In Gen3, you can grant specific access to a client the same way you would to a user. See the [user.yaml guide](https://github.com/uc-cdis/fence/blob/master/docs/user.yaml_guide.md) for more details.

NOTE: Client credentials tokens are not linked to a user (the claims contain no `sub` or `context.user.name` like other tokens). Some Gen3 endpoints that assume the token is linked to a user, or whose logic requires a user, do not support them. For an example of how to adapt an endpoint to support client credentials tokens, see [here](https://github.com/uc-cdis/requestor/commit/a5078fae27fa258ac78045cf2bb89cb2104f53cf). For an example of how to explicitly reject client credentials tokens, see [here](https://github.com/uc-cdis/requestor/commit/0f4974c25343d2185c7cdb48dcdeb58f97800672).

### Modify OAuth Client

```bash
fence-create client-modify --client CLIENT_NAME --urls http://localhost/api/v0/oauth2/authorize
```

That command should output any modifications to the client. As with `client-create`, multiple URLs are
allowed here too.

To add new callback URLs or allowed scopes to an existing client instead of replacing them, add the `--append` argument, as in `--append --urls` or `--append --allowed-scopes`:
```bash
fence-create client-modify --client CLIENT_NAME --urls http://localhost/api/v0/new/oauth2/authorize --append (--expires-in 30)
```

### Rotate client credentials

Use the `client-rotate` command to receive a new set of credentials (client ID and secret) for a client. The old credentials are NOT deactivated and must be deleted or expired separately (see [Delete Expired OAuth Clients](#delete-expired-oauth-clients) section). This allows for a rotation without downtime.

```bash
fence-create client-rotate --client CLIENT_NAME (--expires-in 30)
```

Note that the `usersync` job must be run after rotating the credentials so that the new client ID is granted the same access as the old one.
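
The rotation steps described above can be laid out in order as a dry run. The sketch below only prints the steps and does not execute anything; `rotation_plan` is a hypothetical helper name, and how the `usersync` job is triggered depends on your deployment, so that step is left as a comment.

```bash
# rotation_plan CLIENT_NAME EXPIRES_IN_DAYS
# Prints the rotation steps in order without running them.
rotation_plan() {
  echo "fence-create client-rotate --client $1 --expires-in $2"
  echo "# run the usersync job so the new client ID is granted the old client's access"
  echo "# switch the application over to the new client ID and secret"
  echo "# once the old credentials have expired, clean them up:"
  echo "fence-create client-delete-expired"
}

# Usage:
#   rotation_plan CLIENT_NAME 30
```

Because the old credentials stay valid until they are deleted or expire, the application can be switched over between the second and last steps without downtime.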

### Delete OAuth Client

```bash
fence-create client-delete --client CLIENT_NAME
```
That command should output the result of the deletion attempt.

### Delete Expired OAuth Clients

```bash
fence-create client-delete-expired
```

To post a warning in Slack about any clients that expired or are about to expire:

```bash
fence-create client-delete-expired --slack-webhook <url> --warning-days <default 7: only post about clients expiring in under 7 days>
```
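
To make the `--warning-days` window concrete, the sketch below classifies a client by its expiration time. It is an illustration of the idea (a default window of 7 days), not the logic fence-create actually runs, and the function name and status labels are hypothetical.

```bash
# client_expiry_status EXPIRES_AT NOW [WARNING_DAYS]   (times in epoch seconds)
# Prints "expired", "expiring-soon", or "ok".
client_expiry_status() (
  warning_days="${3:-7}"                  # default window of 7 days
  remaining=$(( $1 - $2 ))                # seconds until expiration
  if [ "$remaining" -le 0 ]; then
    echo "expired"
  elif [ "$remaining" -lt $(( warning_days * 86400 )) ]; then
    echo "expiring-soon"
  else
    echo "ok"
  fi
)

# Usage:
#   client_expiry_status "$CLIENT_EXPIRES_AT" "$(date +%s)" 14
```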


### List OAuth Clients

```bash
fence-create client-list
```
That command should output the full records for any registered OAuth clients.

### Set up for External Buckets on Google

```bash
fence-create link-external-bucket --bucket-name demo-bucket
fence-create link-bucket-to-project --bucket_id demo-bucket --bucket_provider google --project_auth_id test-project
```

The `link-external-bucket` command returns the email of a Google group, which must be granted access to the bucket `demo-bucket`.

### Notify users who are blocking service account registration

```bash
fence-create notify-problem-users --emails USER_EMAIL_1 USER_EMAIL_2 --auth_ids test --google_project_id test-google
```

`notify-problem-users` emails the users in the provided list (either fence user emails or linked Google emails) who do not have access to any of the provided auth_ids. It also accepts a `check_linking` flag to check that each user has linked their Google account.
@@ -21,7 +21,7 @@ The `/login/shib` endpoint accepts the query parameter `shib_idp`. Fence checks

After the user logs in and is redirected to `/login/shib/login`, we get the `eppn` (EduPerson Principal Name) from the request headers to use as username. If the `eppn` is not available, we use the `persistent-id` (or `cn`) instead.

-![Shibboleth Login Flow](images/seq_diagrams/shibboleth_flow.png)
+![Shibboleth Login Flow](../images/seq_diagrams/shibboleth_flow.png)

Notes about the NIH login implementation:
- NIH login is used as the default when the `idp` is fence and no `shib_idp` is specified (for backwards compatibility).
@@ -32,7 +32,7 @@ Notes about the NIH login implementation:

### In the multi-tenant Fence instance

-The [Shibboleth dockerfile](../DockerfileShib) image is at https://quay.io/repository/cdis/fence-shib and is NOT compatible yet with python 3/the latest Fence (for now, use Fence 2.7.x).
+The [Shibboleth dockerfile](../../DockerfileShib) image is at https://quay.io/repository/cdis/fence-shib and is NOT compatible yet with python 3/the latest Fence (for now, use Fence 2.7.x).

The deployment only includes `revproxy` and `fenceshib`. The Fence configuration enables the `shibboleth` provider:

@@ -25,19 +25,19 @@ References:

This shows external DRS Client(s) communicating with Gen3 Framework Services (as a GA4GH DRS Server) and how G3FS interacts with Passport Brokers to validate and verify JWTs.

-![Passport and Visa JWT Handling](images/ga4gh/passport_jwt_handling.png)
+![Passport and Visa JWT Handling](../images/ga4gh/passport_jwt_handling.png)

## G3FS: Configurable Roles for Data Access

Gen3 Framework Services are capable of acting in many different roles: as data repositories (or DRS Servers in GA4GH terminology), as authorization decision makers (GA4GH Claims Clearinghouses), and/or as token issuers (GA4GH Passport Brokers). G3FS is also capable of being a client to other Passport Brokers. G3FS must be a client to an upstream Identity Provider (IdP), as it never stores user passwords but relies on authentication from another trusted source.

In order to describe the role of the passport in these various configurations, the following diagrams may help.

-![Gen3 as DRS Server](images/ga4gh/gen3_as_drs.png)
+![Gen3 as DRS Server](../images/ga4gh/gen3_as_drs.png)

-![Gen3 as Client](images/ga4gh/gen3_as_client.png)
+![Gen3 as Client](../images/ga4gh/gen3_as_client.png)

-![Gen3 as Both](images/ga4gh/gen3_as_client_and_drs_server.png)
+![Gen3 as Both](../images/ga4gh/gen3_as_client_and_drs_server.png)

## Performance Improvements

@@ -52,22 +52,22 @@ We added a number of things to mitigate the performance impact on researchers' w

To illustrate the need for such a cache, see the images below for before and after.

-![Before Caching](images/ga4gh/caching_before.png)
+![Before Caching](../images/ga4gh/caching_before.png)

-![After Caching](images/ga4gh/caching_after.png)
+![After Caching](../images/ga4gh/caching_after.png)

## User Identities

Different GA4GH Visas may refer to the same subject differently. In order to maintain the known mappings between different representations of the same identity, we are creating an Issuer+Subject to User mapping table. The primary key on this table is the combination of the `iss` and `sub` from JWTs.

-![User Identities](images/ga4gh/users.png)
+![User Identities](../images/ga4gh/users.png)

## Backend Updates and Expiration

In order to ensure the removal of access at the right time, the cronjobs we have are updated based on the figure and notes below. We are requiring movement away from the deprecated, legacy, limited Fence authorization support in favor of the new policy engine (which allows expiration of policies out of the box).

There is an argument here for event-based architecture, but Gen3 does not currently support such an architecture. We are instead extending the support of our cronjobs to ensure expirations occur at the right time.

-![Cronjobs and Expirations](images/ga4gh/expiration.png)
+![Cronjobs and Expirations](../images/ga4gh/expiration.png)

> _All diagrams are originally from an **internal** CTDS Document. The link to that document is [here](https://lucid.app/lucidchart/5c52b868-5cd2-4c6e-b53b-de2981f7da98/edit?invitationId=inv_9a757cb1-fc81-4189-934d-98c3db06d2fc) for internal people who need to edit the above diagrams._
@@ -30,7 +30,7 @@ To support the 3 methods of access mentioned above, we have a generic architectu

That architecture involves Google's concept of **groups** and use of their **IAM Policies** in the Google Cloud Platform. The following diagram shows the layers between the user themselves and the bucket.

-![Google Access Architecture](images/g_architecture.png)
+![Google Access Architecture](../images/g_architecture.png)

Working backwards from the Google Bucket itself, we have a **Google Bucket Access Group**, which, as you probably guessed, is a Google Group that provides access to the bucket. That group is assigned a **role** on the Google **resource** (the Google Bucket). **Roles** provide a set of permissions (like read privileges). The combinations of those roles on the bucket become the bucket's **Policy**. You can read more about Google's IAM terms and concepts in [their docs](https://cloud.google.com/iam/docs).

@@ -46,7 +46,7 @@ Google groups contain **members** (another Google term) and a Google group can b

A more representative diagram of the structures that allow users to get access to the buckets may look something like this:

-![Representative Google Access Architecture](images/rep_g_architecture.png)
+![Representative Google Access Architecture](../images/rep_g_architecture.png)

#### User's Proxy Group

@@ -169,7 +169,7 @@ In the above script, `google-project-to-bill` is either the `userProject` provid

Fence facilitates the creation of Signed URLs to access Google Storage objects. These URLs provide temporary, authenticated access to anyone with the URL but must be generated by someone who has access.

-![Signed URLs](images/signed_urls.png)
+![Signed URLs](../images/signed_urls.png)

Design Requirements:

@@ -195,7 +195,7 @@ This allows clients to manage their temporary credentials without the chance of

Each Client Service Account is a member of the User's Proxy Group, meaning it has the same access that the user has.

-![Temporary Service Account Credentials](images/g_sa_creds.png)
+![Temporary Service Account Credentials](../images/g_sa_creds.png)

> WARNING: By default, Google Service Account Keys have an expiration of 10 years. To create a more manageable and secure expiration you must manually "expire" the keys by deleting them with a cronjob (once they are alive longer than a configured expiration). Fence's command line tool `fence-create` has a function for expiring keys that you should run on a schedule. Check out `fence-create google-manage-keys --help`
@@ -229,7 +229,7 @@ A user logs into fence with their eRA Commons ID. To get access to data through

Google Account Linking is achieved by sending the user through the beginning of the OIDC flow with Google. The user is redirected to a Google Login page and whichever account they successfully log in to becomes linked to their fence identity.

-![Google Account Linking](images/g_accnt_link.png)
+![Google Account Linking](../images/g_accnt_link.png)

We require the user to log in so that we can authenticate them and only link an account they actually own.

@@ -239,7 +239,7 @@ Once linked, the user's Google Account is then placed *temporarily* inside their
At the moment, the *link* between the User and their Google Account does not expire. The access to data *does* expire though. Explicit refreshing of access must be done by an authenticated user or valid client with those permissions through Fence's API.

-![Google Account Linking After Expiration](images/g_accnt_link_2.png)
+![Google Account Linking After Expiration](../images/g_accnt_link_2.png)

#### Service Account Registration

@@ -312,7 +312,7 @@ The Service Accounts are validated first in the cronjob so that if multiple SA's

This diagram shows a single Google Project with 3 users (`UserA`, `UserB`, and `UserC`). All of them have already gone through the linking process with fence to associate their Google Account with their fence identity.

-![Service Account Registration](images/sa_reg.png)
+![Service Account Registration](../images/sa_reg.png)

The project service account, `Service Account A`, has been registered for access to a fence `Project` which has data in `Bucket Y`. The service account is given access by placing it *directly in the Google Bucket Access Group*.

@@ -326,6 +326,6 @@ The user must request fence `Projects` that the service account should have acce

If someone attempts to register `Service Account A` with fence `Projects` that have data in *both* `Bucket X` and `Bucket Y`, registration will fail. Why? Because not every user in the Google Project has access to that data.

-![Service Account Registration](images/sa_invalid_reg.png)
+![Service Account Registration](../images/sa_invalid_reg.png)

---
File renamed without changes.
File renamed without changes.
File renamed without changes.