Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for row filtering and column masking access control #24277

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

BryanCutler
Copy link
Contributor

@BryanCutler BryanCutler commented Dec 18, 2024

Description

This adds support for row filtering and column masking to access control as part of governance requirements. Filters/masks are supplied as part of an access control implementation and then applied to queries according to matching table and column names supplied by access control.

These changes are cherry-picked from the following commits:

Adding support for row filters
Cherry-pick of trinodb/trino@fae3147
Co-authored-by: Martin Traverso [email protected]

Adding support for column masks
Cherry-pick of trinodb/trino@7e0d88e
Co-authored-by: Martin Traverso [email protected]

Disallow multiple masks on a given column
Cherry-pick of trinodb/trino@bdd1cb5
Co-authored-by: Martin Traverso [email protected]

Add access control SPI function for table column masks.
Cherry-pick of trinodb/trino@5da64a9
Co-authored-by: Cristian Osiac [email protected]

Update core access control to fetch column masks in bulk
Cherry-pick of trinodb/trino@1dbbcb3
Co-authored-by: Cristian Osiac [email protected]

For original IBM Presto port
Co-authored-by: Reetika Agrawal [email protected]

From #24278

Motivation and Context

Governance requirements include the ability to restrict access to certain rows of tables or sensitive data in certain columns. This adds the ability to provide filters/masks as expressions to Presto where they will then be applied to relevant queries.

Impact

Additions to SPI include access control interfaces to get row filters and column masks.

ConnectorAccessControl

    /**
     * Get row filters associated with the given table and identity.
     * <p>
     * Each filter must be a scalar SQL expression of boolean type over the columns in the table.
     *
     * @return the list of filters, or empty list if not applicable
     */
    default List<ViewExpression> getRowFilters(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName)
    {
        return Collections.emptyList();
    }

    /**
     * Bulk method for getting column masks for a subset of columns in a table.
     * <p>
     * Each mask must be a scalar SQL expression of a type coercible to the type of the column being masked. The expression
     * must be written in terms of columns in the table.
     *
     * @return a mapping from columns to masks, or an empty map if not applicable. The keys of the return Map are a subset of {@code columns}.
     */
    default Map<ColumnMetadata, ViewExpression> getColumnMasks(ConnectorTransactionHandle transactionHandle, ConnectorIdentity identity, AccessControlContext context, SchemaTableName tableName, List<ColumnMetadata> columns)
    {
        return Collections.emptyMap();
    }

Test Plan

Unit tests added.

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* ... :pr:`12345`
* ... :pr:`12345`

Hive Connector Changes
* ... :pr:`12345`
* ... :pr:`12345`

If release note is NOT required, use:

== NO RELEASE NOTE ==

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 18, 2024
@prestodb-ci prestodb-ci requested review from a team, imjalpreet and wanglinsong and removed request for a team December 18, 2024 19:07
@BryanCutler
Copy link
Contributor Author

There are some followup commits that I did not include here, we can add these in or look into as a followup:

Apply filters and masks for view object
trinodb/trino@6721cfa

Apply filters and masks for tables referenced by views
trinodb/trino@84429d8

Add test for join on masked column
trinodb/trino@705ed21

Record filter and mask information in query event
trinodb/trino@1b55b86

@jaystarshot
Copy link
Member

@tdcmeehan Its high time that we consider merging this stuff, as we discussed last time in the meeting doing in connector optimizer is not feasible for all cases and offerrs no benefit

@BryanCutler BryanCutler force-pushed the qaag-forward-fit-trino branch 2 times, most recently from 0c28887 to 3d1ecda Compare December 19, 2024 19:35
@tdcmeehan
Copy link
Contributor

I'm in favor of this approach. While it does increase our SPI footprint, I think it's nice to align with Trino's approach in this case because it makes porting plugins easier for folks. I think this is worth the small increase in footprint.

@BryanCutler
Copy link
Contributor Author

Updated to include followup commits to retrieve a list of row filters and multiple column masks with a single call to the interface.

{
String column = columnMetadata.getName();
if (analysis.hasColumnMask(tableName, column, currentIdentity)) {
throw new PrestoException(INVALID_ROW_FILTER, format("Column mask for '%s.%s' is recursive", tableName, column), null);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you please use INVALID_COLUMN_MASK error code in this method?


planBuilder = subqueryPlanner.handleSubqueries(planBuilder, mask, mask, context);

Map<VariableReferenceExpression, RowExpression> assignments = new LinkedHashMap<>();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please move the creation of assignments and the new planBuilder outside the for loop? This will allow invoking planBuilder.withNewRoot only once per method invocation, rather than creating a new planBuilder for each mask. It will improve efficiency and make the logic cleaner.

Map<String, Expression> columnMasks = analysis.getColumnMasks(table);

List<VariableReferenceExpression> mappings = plan.getFieldMappings();
TranslationMap translations = new TranslationMap(plan, analysis, lambdaDeclarationToVariableMap);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can simplify this section by reusing some code. Instead of directly using translations.rewrite(mask), you could initialize the PlanBuilder with PlanBuilder planBuilder = initializePlanBuilder(plan);. Then, use planBuilder.rewrite(mask) to leverage the translations map already stored internally within planBuilder.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants