Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add row-level security filter in query #17564

Open
wants to merge 19 commits into
base: master
Choose a base branch
from

Conversation

cecemei
Copy link
Contributor

@cecemei cecemei commented Dec 13, 2024

This PR adds the ability to attach row filters to a query, thus restrict row-level data access for given users.

Description

A query follows these steps: initialize -> authorize -> execute. In the authorize step, the permissions are checked for all the required resources in the query. Before this PR, the authorize step only returns allow or deny access on a table. Granting access to a table means a user can see all data in this table. After this PR, the authorize step can return allow access along with restrictions (i.e. a row filter that must be applied to the table ), which restrict users' data access at row level. For example, customers can only see rows relevant to their company.

The authorizeAllResourceActions now returns a AuthorizationResult instead of Access, this class also replaces DruidPlanner.AuthResult class. The main difference between AuthorizationResult and Access is that the former contains a map of table with DimFilter. It can also have ResourceAction Iterables which DruidPlanner cares about.

In the authorize step of QueryLifecycle, it would enforce the filters on tables in the datasource tree, transform TableDataSource to RestrictedDataSource. In the execute step, filters are applied through RestrictedSegment and RestrictedCursorFactory.

Key changed/added classes in this PR
  • a new class AuthorizationResult. The class should be used for all the authorization calls, while the Access class is still used in Authorizer interface. It has an static variable ALLOW_ALL, which should be used for all internal calls. getPermissionErrorMessage(boolean policyFilterNotPermitted) is called to get a failure message, which replaced access.toString(), access.toMessage(), access.getMessage()`. The class contains:
    • deny/ allow results for checking permissions on a list of resource actions.
    • failure message if authorization fails, this is null when auth is allowed, and is the error message of the first resource action authorization failure (there might be more failures, but we don't try further)
    • a map of table name with row-level policy filters.
  • Access. Added Optional<DimFilter> rowFilter field, which represents a restrictions returned from authorizer. Also updated constructor.
  • AbstractStatement. Replace DruidPlanner.AuthResult with AuthorizationResult.
  • AuthConfig. Added flag enableStrictPolicyCheck, when enabled, it would check every table needs to have a restriction in place, meaning it has an entry in the restrictions map, could be Optional.empty().
  • AuthorizationUtils. It now consolidates all restrictions for authorizing resource actions into a restrictions map, which is included in AuthorizationResult. Also updated javadoc for all public methods.
  • a new class RestrictedDataSource, which wraps a TableDataSource with a DimFilter. If the filter is null, meaning there's no applied.
  • a new class RestrictedSegment, which represents a segment with a filter.
  • a new class RestrictedCursorFactory, can be created by RestrictedSegment.asCursorFactory, enforces the DimFilter on Cursor.
  • DataSource interface, added a sub type of restrict, added a default method mapWithRestriction.
  • TableDataSource, added the impl of mapWithRestriction.
  • JoinDataSource can accept RestrictedDataSource as left-hand side datasource.
  • Query interfaced, added a default method withPolicyRestrictions.
  • SegmentMetadataQuery, added the impl of withDataSource.
  • QueryLifeCycle, replace baseQuery with baseQuery.withPolicyRestrictions if authorization result is not ALLOW_ALL (calls from internal services).

Caveats

  • The restrictions don't apply to VIEWs.
  • UnionDataSource doesn't work with RestrictedDataSource, planning to fix that later.
  • The restrictions don't apply to DartQueryMaker and MSQTaskQueryMaker, for now they would throw an error if there's any policy restrictions.

This PR has:

  • been self-reviewed.
  • [] added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • [] been tested in a test Druid cluster.

public class RestrictedDataSourceTest
{
@Rule
public ExpectedException expectedException = ExpectedException.none();

Check notice

Code scanning / CodeQL

Deprecated method or constructor invocation Note test

Invoking
ExpectedException.none
should be avoided because it has been deprecated.
@cecemei cecemei marked this pull request as ready for review December 13, 2024 20:45
…it as the default interface for checking permission
Comment on lines 613 to +617
resourceAction,
authorizerMapper
);
if (!authResult.isAllowed()) {

authResult.getPermissionErrorMessage(true).ifPresent(error -> {

Check failure

Code scanning / CodeQL

User-controlled bypass of sensitive method High

Sensitive method may not be executed depending on a
this condition
, which flows from
user-controlled value
.
Copy link
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review includes some high level design comments.

I haven't checked the tests yet (will do so in a follow up review). The things I'd be looking for in the tests are:

  • we should have tests for the resources, including negative tests for resources (such as the Dart resource) that don't support policies yet. The negative tests should verify that we get an error like "this endpoint doesn't support policies".
  • we should also have tests for the lower level pieces like QueryLifecycle, the DataSource mapping, and RestrictedSegment.

* @return authorization result
*/
public Access authorize(HttpServletRequest req)
public AuthorizationResult authorize(HttpServletRequest req)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because AuthorizationResult includes policies, this method signature makes it ambiguous as to whether the caller should apply policies.

In this case, the QueryLifecycle itself applies the policies, and the caller therefore doesn't need to. We should make that clear somehow. Javadoc could do it, or perhaps returning something other than AuthorizationResult.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the javadoc here to indicate query/datasource is transformed here

import java.util.Set;

/**
* Static utility functions for performing authorization checks.
*/
public class AuthorizationUtils
{
static final ImmutableSet<String> RESTRICTION_APPLICABLE_RESOURCE_TYPES = ImmutableSet.of(
ResourceType.DATASOURCE,
ResourceType.VIEW
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we should do datasource-only for now, since VIEW would be another bundle of stuff to think about: views are resolved in the SQL planner, so the restrictions would need to be applied in a different place.

This does bring up a question about the model though. If a user has restricted access to a DATASOURCE, should those restrictions apply when the datasource is accessed through a SQL view? My stance is "yes" and I think the way we're doing it will achieve that. Please include some tests for this just to be sure it works as expected.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually that's opposite of the status quo, see the restrictedView calcite tests. the view is created on forbiddenDatasource which users don't have access to, but they can query the restrictedView.


private final boolean allowed;
private final String message;
// A row-level policy filter on top of table-level read access. It should be empty if there are no policy restrictions
// or if access is requested for an action other than reading the table.
private final Optional<DimFilter> rowFilter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would be good to keep the value as some more general class, like Policies rather than Optional<DimFilter>, so when we want to add other kinds of policies (such as column restrictions, possibly) they can fit right in without further changes to the Access object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a Policy class, plz review

* @param rowFilters a mapping of table names to row filters, every table in the datasource tree must have an entry
* @return the updated datasource, with restrictions applied in the datasource tree
*/
default DataSource mapWithRestriction(Map<String, Optional<DimFilter>> rowFilters, boolean enableStrictPolicyCheck)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should design the strict check a bit differently. With this current design, a single literally TRUE filter would pass, but we don't want that. I think it would be better for this to not take enableStrictPolicyCheck, but instead for the strict check to happen in QueryLifecycle after the query is mapped. That would enable the check to be even stricter: it should really check not just that there is a filter, but also that the filter is actually doing something. To allow for the druid_internal or admin case, we can bypass the strict check if the user has permission for STATE READ (a broad administrative permission).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow for the druid_internal or admin case, we can bypass the strict check if the user has permission for STATE READ (a broad administrative permission).

Actually upon further reflection this seems too complex. We don't want to have to consider both policies and STATE permissions. Instead, let's introduce a Policy that is of type admin. It doesn't apply any restrictions, but it's something an authorizer can return to signify that the user is OK to query unrestricted.

Btw, the strict check in QueryLifecycle would need to happen even if the authorized returns ALLOW. (Strict check should fail in this case)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes strict check should happen. btw, i added this flag because some tests in views are failing, which i assume won't be an easy solution to fix.

in theory if there's no views, we could default to strict check, it just wants to see table has an entry in policyMap (could Optional.empty() if authorizer says there's no policy).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. The strict check should be even stricter: there should be a mode that requires all authorization results to have some non-empty set of policies. The idea with that check is it's a defense against the authorizer being mis-configured in such a way that policies aren't being reported properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally if we have a PolicyConfig class similar to AuthConfig, or put it in a policy context, maybe could be more flexible.

* Returns an updated datasource based on the policy restrictions on tables. If this datasource contains no table, no
* changes should occur.
*
* @param rowFilters a mapping of table names to row filters, every table in the datasource tree must have an entry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should take Map<String, Policies> instead, so other types of policies can be applied in the future without changing the DataSource interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Takes Map<String, Optional> now, since policy is optional for druid tables, ppl can config any (or all) tables to be policy restricted. It's a single policy, since policy is returned from authorizer and it's a merged result of (policy rule, or policy template, which could be a serialized format and supports crud and stuff).


import javax.annotation.Nullable;

public class RestrictedSegment extends WrappedSegmentReference
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include javadoc describing the purpose of this class, and how its defensive mechanisms work.

The way they should work is something like: that if you call asCursorFactory or as(CursorFactory.class) (plus perhaps some other small list), restrictions are handled automatically. But if you call asQueryableIndex or as for something other than that small list, the query gets the unrestricted internal object and it needs to call some method on the RestrictedSegment confirming that it applied the restrictions on its own.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs to call some method on the RestrictedSegment confirming that it applied the restrictions on its own

To make this robust, we should actually have a 1-1 relationship between as calls and "i know what i'm doing" calls. If a query engine calls as three different times it should call "i know what i'm doing" three times.

@@ -798,6 +799,20 @@

}

@MethodSource("data")
@ParameterizedTest(name = "{index}:with context {0}")
public void testSelectRestricted(String contextName, Map<String, Object> context)

Check notice

Code scanning / CodeQL

Useless parameter Note test

The parameter 'contextName' is never used.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants