Add support for Open Policy Agent #19532

Merged
merged 1 commit into from
Jan 31, 2024
6 changes: 6 additions & 0 deletions core/trino-server/src/main/provisio/trino.xml
@@ -325,4 +325,10 @@
            <unpack />
        </artifact>
    </artifactSet>

    <artifactSet to="plugin/opa">
        <artifact id="${project.groupId}:trino-opa:zip:${project.version}">
            <unpack />
        </artifact>
    </artifactSet>
</runtime>
247 changes: 247 additions & 0 deletions plugin/trino-opa/README.md
@@ -0,0 +1,247 @@
# trino-opa

This plugin enables Trino to use Open Policy Agent (OPA) as an authorization engine.

For more information on OPA, please refer to the Open Policy Agent [documentation](https://www.openpolicyagent.org/).

> While every attempt will be made to keep backwards compatibility, this plugin is a recent addition
> and as such the API may change.

## Configuration

You will need to configure Trino to use the OPA plugin as its access control engine, then configure the
plugin to contact your OPA endpoint.

`config.properties` - **enabling the plugin**:

Make sure to enable the plugin by configuring Trino to pull in the relevant config file for the OPA
authorizer, e.g.:

```properties
access-control.config-files=/etc/trino/access-control-file-based.properties,/etc/trino/access-control-opa.properties
```

`access-control-opa.properties` - **configuring the plugin**:

Set the access control name to `opa` and specify the policy URI, for example:

```properties
access-control.name=opa
opa.policy.uri=https://your-opa-endpoint/v1/data/allow
```

If you also want to enable the _batch_ mode (see [Batch mode](#batch-mode)), you must additionally set up an
`opa.policy.batched-uri` configuration entry.

> Batch mode is _not_ a replacement for the "main" URI. Batch mode is _only_
> used for certain authorization queries where batching is applicable. Even when using
> `opa.policy.batched-uri`, you _must_ still provide an `opa.policy.uri`.

For instance:

```properties
access-control.name=opa
opa.policy.uri=https://your-opa-endpoint/v1/data/allow
opa.policy.batched-uri=https://your-opa-endpoint/v1/data/batch
```

### All configuration entries

| Configuration name | Required | Default | Description |
|----------------------------------------------|:--------:|:-------:|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `opa.policy.uri` | Yes | N/A | Endpoint to query OPA |
| `opa.policy.batched-uri` | No | Unset | Endpoint for batch OPA requests |
| `opa.log-requests` | No | `false` | Determines whether requests (URI, headers and entire body) are logged prior to sending them to OPA |
| `opa.log-responses` | No | `false` | Determines whether OPA responses (URI, status code, headers and entire body) are logged |
| `opa.allow-permission-management-operations` | No | `false` | Determines whether permission / role management operations will be allowed. These operations will be allowed or denied based on this setting, no request is sent to OPA |
| `opa.http-client.*` | No | Unset | Additional HTTP client configurations that get passed down. E.g. `opa.http-client.http-proxy` for configuring the HTTP proxy |

> When request / response logging is enabled, entries are logged at DEBUG level under the `io.trino.plugin.opa.OpaHttpClient` logger,
> so you will need to update your log configuration accordingly.
>
> Be aware that enabling these options will produce very large amounts of logs.

#### About permission management operations

The following operations are controlled by the `opa.allow-permission-management-operations` setting. If this setting is `true`, these
operations will be allowed; they will otherwise be denied. No request is sent to OPA either way:

- `GrantSchemaPrivilege`
- `DenySchemaPrivilege`
- `RevokeSchemaPrivilege`
- `GrantTablePrivilege`
- `DenyTablePrivilege`
- `RevokeTablePrivilege`
- `CreateRole`
- `DropRole`
- `GrantRoles`
- `RevokeRoles`

This is due to the complexity and potential unexpected consequences of having SQL-style grants / roles together with OPA, as per [discussion](https://github.com/trinodb/trino/pull/19532#discussion_r1380776593)
on the initial PR.

Additionally, users are always allowed to show information about roles (`SHOW ROLES`), regardless of this setting. The following operations are _always_ allowed:
- `ShowRoles`
- `ShowCurrentRoles`
- `ShowRoleGrants`
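
The rules above can be summarised as a small decision function. This is a minimal Python sketch rather than the plugin's actual code: `decide` and `ask_opa` are hypothetical names, and only the operation names are taken from the lists above.

```python
from typing import Callable

# Operation names exactly as listed in this README.
PERMISSION_MANAGEMENT_OPERATIONS = frozenset({
    "GrantSchemaPrivilege", "DenySchemaPrivilege", "RevokeSchemaPrivilege",
    "GrantTablePrivilege", "DenyTablePrivilege", "RevokeTablePrivilege",
    "CreateRole", "DropRole", "GrantRoles", "RevokeRoles",
})
ALWAYS_ALLOWED_OPERATIONS = frozenset({
    "ShowRoles", "ShowCurrentRoles", "ShowRoleGrants",
})


def decide(operation: str,
           allow_permission_management: bool,
           ask_opa: Callable[[str], bool]) -> bool:
    """Hypothetical helper: return the authorization decision for one operation."""
    if operation in ALWAYS_ALLOWED_OPERATIONS:
        # Always allowed, no request is sent to OPA.
        return True
    if operation in PERMISSION_MANAGEMENT_OPERATIONS:
        # Decided purely by opa.allow-permission-management-operations.
        return allow_permission_management
    # Every other operation is delegated to OPA.
    return ask_opa(operation)
```

With the default setting (`false`), `decide("CreateRole", False, ask_opa)` is denied without `ask_opa` ever being called.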

## OPA queries

The plugin will contact OPA for each authorization request, as defined in the SPI.

OPA must return a response containing a boolean `allow` field, which will determine whether the operation
is permitted or not.

The plugin will pass as much context as possible within the OPA request. A simple way of checking
what data is passed in from Trino is to run OPA locally in verbose mode.
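
The example URIs in this README point at OPA's Data API (`/v1/data/...`), which accepts the input document under an `input` key and returns the evaluated value under `result`. The Python sketch below assumes the plugin follows that envelope (the Rego further down does read the query from `input`). `wrap_query` and `is_allowed` are hypothetical helpers, and treating a missing `result` as a denial is an assumption, not documented plugin behaviour.

```python
import json


def wrap_query(query: dict) -> str:
    """Serialise a query as an OPA Data API request body (assumed envelope)."""
    return json.dumps({"input": query})


def is_allowed(response_body: str) -> bool:
    """Interpret an OPA Data API response as an allow/deny decision.

    Assumption: anything other than a literal `true` result, including an
    undefined (missing) result, is treated as a denial.
    """
    result = json.loads(response_body).get("result")
    return result is True
```

For example, `is_allowed('{"result": true}')` is `True`, while an empty response object `{}` (OPA's answer when the rule is undefined) yields `False`.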

### Query structure

A query will contain a `context` and an `action` as its top level fields.

#### Query context:

While the `action` object contains information about _what_ action is being performed, the `context` object
contains all other contextual information about it. The `context` object contains the following fields:
- `identity`: The identity of the user performing the operation, containing the following two fields:
    - `user` (string): username
    - `groups` (array of strings): list of groups this user belongs to
- `softwareStack`: Information about the software stack running in the Trino server. More fields may be added later; currently it contains:
    - `trinoVersion` (string): Trino version

#### Query action:

This determines _what_ action is being performed and upon what resources. The top level fields are as follows:

- `operation` (string): operation being performed
- `resource` (object, nullable): information about the object being operated upon
- `targetResource` (object, nullable): information about the _new object_ being created, if applicable
- `grantee` (object, nullable): grantee of a grant operation.

Fields that are not applicable for a specific operation (e.g. `targetResource` if not modifying a table/schema/catalog, or `grantee` if not granting
permissions) are null, and null fields are omitted altogether from the `action` object.
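
To make the structure concrete, here is a minimal Python sketch that assembles a query with the fields described above and drops null `action` fields. `build_query` is a hypothetical helper written for illustration; it is not part of the plugin.

```python
from typing import Any, Dict, List, Optional


def build_query(user: str,
                groups: List[str],
                trino_version: str,
                operation: str,
                resource: Optional[Dict[str, Any]] = None,
                target_resource: Optional[Dict[str, Any]] = None,
                grantee: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: build a query with `context` and `action` fields."""
    action = {
        "operation": operation,
        "resource": resource,
        "targetResource": target_resource,
        "grantee": grantee,
    }
    return {
        "context": {
            "identity": {"user": user, "groups": groups},
            "softwareStack": {"trinoVersion": trino_version},
        },
        # Null fields are omitted from the action object altogether.
        "action": {key: value for key, value in action.items() if value is not None},
    }
```

Calling `build_query("foo", ["some-group"], "434", "SelectFromColumns", resource={...})` produces an `action` with only `operation` and `resource`.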

#### Examples

Accessing a table will result in a query like the one below:

```json
{
  "context": {
    "identity": {
      "user": "foo",
      "groups": ["some-group"]
    },
    "softwareStack": {
      "trinoVersion": "434"
    }
  },
  "action": {
    "operation": "SelectFromColumns",
    "resource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "my_table",
        "columns": [
          "column1",
          "column2",
          "column3"
        ]
      }
    }
  }
}
```

`targetResource` is used in cases where a new resource, distinct from the one in `resource`, is being created,
for instance when renaming a table:

```json
{
  "context": {
    "identity": {
      "user": "foo",
      "groups": ["some-group"]
    },
    "softwareStack": {
      "trinoVersion": "434"
    }
  },
  "action": {
    "operation": "RenameTable",
    "resource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "my_table"
      }
    },
    "targetResource": {
      "table": {
        "catalogName": "my_catalog",
        "schemaName": "my_schema",
        "tableName": "new_table_name"
      }
    }
  }
}
```


## Batch mode

A very powerful feature provided by OPA is its ability to respond to authorization queries with
more complex answers than a `true`/`false` boolean value.

Many features in Trino require _filtering_ to be performed to determine, given a list of resources
(e.g. tables, queries, views, etc.), which of those a user should be entitled to see/interact with.

If `opa.policy.batched-uri` is _not_ configured, the plugin will send one request to OPA _per item_ being
filtered, then use the responses from OPA to construct a filtered list containing only those items for which
a `true` response was returned.
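
The non-batch behaviour can be sketched as follows. This is an illustrative Python outline under stated assumptions: `filter_without_batching` is a hypothetical name, and `ask_opa` stands in for whatever sends a single query to `opa.policy.uri` and returns the boolean decision.

```python
from typing import Any, Callable, Dict, List

Query = Dict[str, Any]


def filter_without_batching(context: Dict[str, Any],
                            operation: str,
                            resources: List[Dict[str, Any]],
                            ask_opa: Callable[[Query], bool]) -> List[Dict[str, Any]]:
    """Hypothetical helper: one OPA request per item, keep the allowed ones."""
    allowed = []
    for resource in resources:
        query = {
            "context": context,
            "action": {"operation": operation, "resource": resource},
        }
        # Each item costs one round trip to OPA.
        if ask_opa(query):
            allowed.append(resource)
    return allowed
```

Filtering N items this way issues N requests, which is the cost that batch mode is meant to avoid.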

Configuring `opa.policy.batched-uri` will allow the plugin to send a request to that _batch_ endpoint instead,
with a **list** of the resources being filtered under `action.filterResources` (as opposed to `action.resource`).

> The other fields in the request are identical to the non-batch endpoint.

An OPA policy supporting batch operations should return a (potentially empty) list containing the _indices_
of the items for which authorization is granted (if any). Returning a `null` value instead of a list
is equivalent to returning an empty list.
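
The request/response contract for batch mode can be sketched in Python as below. `filter_with_batching` and `ask_opa_batch` are hypothetical names; rejecting out-of-range indices and preserving the input order are assumptions made for this sketch, not documented plugin behaviour.

```python
from typing import Any, Callable, Dict, List, Optional

Query = Dict[str, Any]


def filter_with_batching(context: Dict[str, Any],
                         operation: str,
                         resources: List[Dict[str, Any]],
                         ask_opa_batch: Callable[[Query], Optional[List[int]]]
                         ) -> List[Dict[str, Any]]:
    """Hypothetical helper: a single batch request, answered with allowed indices."""
    query = {
        "context": context,
        # The list goes under filterResources instead of resource.
        "action": {"operation": operation, "filterResources": list(resources)},
    }
    indices = ask_opa_batch(query)
    if indices is None:
        # A null response is equivalent to an empty list.
        return []
    for index in indices:
        # Assumption: indices outside the request list are treated as an error.
        if not 0 <= index < len(resources):
            raise ValueError(f"index {index} is out of range")
    keep = set(indices)
    # Assumption: the filtered list keeps the order of the original request.
    return [item for position, item in enumerate(resources) if position in keep]
```

Because the response only carries integers, checking it is a simple range test, which matches the validation argument made for indices in the note below.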

> We may want to reconsider the choice of using _indices_ in the response as opposed to returning a list
> containing copies of elements from the `filterResources` field in the request for which access should
> be granted. Indices were chosen over copying elements as it made validation in the plugin easier,
> and from the few examples we tried, it also made certain policies a bit simpler. Any feedback is appreciated!

An interesting side effect of this is that we can quite easily add batching support to policies that didn't originally
have it. Consider the following Rego:

```rego
package foo

# ... rest of the policy ...
# this assumes the non-batch response field is called "allow"
batch contains i {
    some i
    raw_resource := input.action.filterResources[i]
    allow with input.action.resource as raw_resource
}

# Corner case: filtering columns is done with a single table item, and many columns inside
# We cannot use our normal logic in other parts of the policy as they are based on sets
# and we need to retain order
batch contains i {
    some i
    input.action.operation == "FilterColumns"
    count(input.action.filterResources) == 1
    raw_resource := input.action.filterResources[0]
    count(raw_resource["table"]["columns"]) > 0
    new_resources := [
        object.union(raw_resource, {"table": {"column": column_name}})
        | column_name := raw_resource["table"]["columns"][_]
    ]
    allow with input.action.resource as new_resources[i]
}
```
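
For readers less familiar with Rego's `with` keyword, the same idea can be expressed in Python. This is only an analogue of the policy above, not something OPA or the plugin runs: `batch_from_allow` is a hypothetical name, and `allow` stands in for the existing non-batch rule, taking a full query and returning a boolean.

```python
from typing import Any, Callable, Dict, List

Query = Dict[str, Any]


def _with_resource(query: Query, resource: Dict[str, Any]) -> Query:
    """Copy of the query with action.resource replaced (mirrors Rego's `with`)."""
    return {**query, "action": {**query["action"], "resource": resource}}


def batch_from_allow(query: Query, allow: Callable[[Query], bool]) -> List[int]:
    """Hypothetical helper: derive batch indices from a per-item `allow` check."""
    action = query["action"]
    resources = action.get("filterResources") or []

    # First rule: evaluate `allow` once per item in filterResources.
    indices = {
        i for i, resource in enumerate(resources)
        if allow(_with_resource(query, resource))
    }

    # Corner case: FilterColumns sends one table with many columns, so the
    # returned indices refer to positions in the columns list instead.
    if action.get("operation") == "FilterColumns" and len(resources) == 1:
        table = resources[0]["table"]
        for i, column in enumerate(table.get("columns", [])):
            per_column = {**resources[0], "table": {**table, "column": column}}
            if allow(_with_resource(query, per_column)):
                indices.add(i)

    return sorted(indices)
```

As in the Rego version, both rules contribute to the same result set, so for `FilterColumns` an `allow` rule that accepts the table without a `column` field would also add index `0`.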