[RFC]: Access Control and Authentication #13

abhiaagarwal · 2024-06-29T13:48:14Z

abhiaagarwal
Jun 29, 2024
Maintainer

TL;DR

The Unity Catalog in its platonic ideal is an access-control server for data assets living in cloud ecosystems. As such, how do we actually scope the assets to users with proper access control?

The current situation

The upstream OSS unitycatalog in Java does not implement any sort of authentication yet.
- It also does not implement vending of temporary credentials yet; it hands out the same AWS lease that it uses.
The example CLI tool assumes bearer auth, but the server itself doesn't do anything with it.

Some questions, loosely organized

How should we implement RBAC?

It should probably be done in the database. I personally haven't implemented RBAC before, but here are a few resources I found clicking around on Reddit and Hacker News:

https://tailscale.com/blog/rbac-like-it-was-meant-to-be (excellent article as a conceptual overview)
https://axellarsson.com/blog/rbac-and-nodejs/ (great for showing how this actually looks in practice)

In the case of Databricks, while the user-facing GRANT involves the use of compute, it's implemented as a REST API. We could implement this API ourselves pretty easily; the only question becomes "how does the service principal, who is the only one with initial access to the server, get access to start creating users/granting permissions to users?" Maybe the server prints a token to stdout at startup that can be used to perform these initial operations.

Token-bearer auth will be the method of choice since 1) it's simple, 2) it's stateless, and 3) upstream does it. I think it's done via JWTs, but I'm not sure. We probably have some flexibility here and should wait for/work with the FOSS Java implementation.
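
To make the bootstrap idea concrete, here is a minimal sketch of parsing an `Authorization: Bearer <token>` header and checking it against a token the server printed at startup. It assumes an opaque shared secret rather than a JWT, and the function names (`parse_bearer`, `is_service_principal`) are hypothetical, not taken from upstream.

```rust
/// Extract the token from an `Authorization: Bearer <token>` header value.
/// Returns `None` if the scheme is not `Bearer` or the token is empty.
pub fn parse_bearer(header_value: &str) -> Option<&str> {
    let (scheme, token) = header_value.trim().split_once(' ')?;
    let token = token.trim();
    if scheme.eq_ignore_ascii_case("bearer") && !token.is_empty() {
        Some(token)
    } else {
        None
    }
}

/// Hypothetical bootstrap check: the server prints `bootstrap_token` to stdout
/// once at startup, and whoever presents it acts as the service principal.
pub fn is_service_principal(header_value: Option<&str>, bootstrap_token: &str) -> bool {
    match header_value.and_then(parse_bearer) {
        Some(presented) => constant_time_eq(presented.as_bytes(), bootstrap_token.as_bytes()),
        None => false,
    }
}

/// Compare two byte strings without returning early on the first mismatching
/// byte (the length check still returns early).
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

If we end up on JWTs, `parse_bearer` would stay the same and only the comparison step would be replaced by signature and claim validation.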

How do we create users?

There probably needs to be an authentication flow where users can register via email/password. However, they should start with zero permissions; only the service principal can grant them access to any assets. If we're going the JWT route, then they can issue a token to themselves to perform operations.
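
A rough sketch of that rule, assuming an in-memory table of direct grants (no roles yet, so this is simpler than real RBAC). The `Grants` type, its methods, and the `Privilege` enum are all hypothetical names for illustration.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Privilege {
    Read,
    Write,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrantError {
    NotServicePrincipal,
    UnknownUser,
}

/// Hypothetical in-memory grant table; a real server would persist this.
pub struct Grants {
    service_principal: String,
    // user -> set of (asset name, privilege)
    by_user: HashMap<String, HashSet<(String, Privilege)>>,
}

impl Grants {
    pub fn new(service_principal: &str) -> Self {
        Grants {
            service_principal: service_principal.to_string(),
            by_user: HashMap::new(),
        }
    }

    /// Registering a user creates an empty grant set: zero permissions.
    pub fn register(&mut self, user: &str) {
        self.by_user.entry(user.to_string()).or_default();
    }

    /// Only the service principal may grant privileges to a registered user.
    pub fn grant(
        &mut self,
        actor: &str,
        user: &str,
        asset: &str,
        privilege: Privilege,
    ) -> Result<(), GrantError> {
        if actor != self.service_principal {
            return Err(GrantError::NotServicePrincipal);
        }
        let grants = self.by_user.get_mut(user).ok_or(GrantError::UnknownUser)?;
        grants.insert((asset.to_string(), privilege));
        Ok(())
    }

    /// Default-deny lookup: unknown users and missing grants are both denied.
    pub fn is_allowed(&self, user: &str, asset: &str, privilege: Privilege) -> bool {
        self.by_user
            .get(user)
            .map_or(false, |g| g.contains(&(asset.to_string(), privilege)))
    }
}
```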

How do we vend credentials?

I have zero experience here and I can't find any concrete examples online, but here are a few links anyway:

AWS: https://github.com/aws-samples/data-governance-w-temp-credentials-vending/blob/main/sagemaker-lf-credential-vending.ipynb
Azure: https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/generate-temporary-credentials (maybe?)
GCP: https://cloud.google.com/iam/docs/configuring-temporary-access

Unity Catalog needs to have the ability to vend credentials; downstream consumers should never be able to observe anything about the asset beyond what we give them.

Should this integrate with external providers?

The answer is yes, obviously, Microsoft Entra, AWS STS, LDAP(S) probably, etc. But it's quite complicated, so maybe just focus on internal auth with username/password to begin with?

How long do we hand out leases to authenticated users?

Let's say a user wants to use their compute engine of choice to run a query on a Delta Table managed by Unity Catalog. This is a long-running query. How do we actually hand out a lease that lasts for the relevant time?

There isn't really a good solution to this. It should probably be configurable at the server level or per asset (as a property?). We should allow users to give up their leases, but this isn't behavior we should rely on. At the very least, for observability, the server should maintain some sort of state tracking which users have access to an asset at any given time.
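
As a sketch of what that state could look like, assuming a server-wide default TTL with optional per-asset overrides: the `LeaseRegistry` type and its methods are hypothetical, and the current time is passed in explicitly so expiry can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Lease {
    pub user: String,
    pub asset: String,
    pub expires_at: Instant,
}

/// Hypothetical registry tracking who currently holds a lease on which asset.
pub struct LeaseRegistry {
    default_ttl: Duration,
    per_asset_ttl: HashMap<String, Duration>,
    active: Vec<Lease>,
}

impl LeaseRegistry {
    pub fn new(default_ttl: Duration) -> Self {
        LeaseRegistry {
            default_ttl,
            per_asset_ttl: HashMap::new(),
            active: Vec::new(),
        }
    }

    /// Override the server-wide TTL for one asset (e.g. read from a property).
    pub fn set_asset_ttl(&mut self, asset: &str, ttl: Duration) {
        self.per_asset_ttl.insert(asset.to_string(), ttl);
    }

    /// Record and return a lease expiring after the asset's TTL.
    pub fn issue(&mut self, user: &str, asset: &str, now: Instant) -> Lease {
        let ttl = self
            .per_asset_ttl
            .get(asset)
            .copied()
            .unwrap_or(self.default_ttl);
        let lease = Lease {
            user: user.to_string(),
            asset: asset.to_string(),
            expires_at: now + ttl,
        };
        self.active.push(lease.clone());
        lease
    }

    /// Voluntary release; expiry still applies if the user never calls this.
    pub fn release(&mut self, user: &str, asset: &str) {
        self.active
            .retain(|l| !(l.user == user && l.asset == asset));
    }

    /// Users holding an unexpired lease on `asset` at time `now`.
    pub fn holders(&self, asset: &str, now: Instant) -> Vec<&str> {
        self.active
            .iter()
            .filter(|l| l.asset == asset && l.expires_at > now)
            .map(|l| l.user.as_str())
            .collect()
    }
}
```

The registry only tracks what the server handed out. The vended cloud credentials would need a matching expiry for the TTL to mean anything.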

Do we allow multiple users to have a lease on the same asset at the same time?

This is an interesting question. Normally, the compute engine is what handles mutual exclusion for a particular asset (at least in Databricks). We're compute agnostic, so maybe we just allow users to go wild with modifying an asset at a given time. In the case of Delta, that's not a huge deal since ACID is built into the format. But what about Parquet files? Or CSVs?

Maybe this is something we implement by using the generic property tag to make UC handle locking if we want that asset to be locked (not necessarily a mutex; it could be a RW lock). UC can't control how the asset is handled (though I believe it's possible to hand out read-only and read-write temp credentials), but if we're confident we're the only handler of said data, we can assert that only one person has write access at a time and that all reads are denied while they hold it.

However, this ties into the point above: a rogue user can DoS everyone else by just holding an infinite write-only lease on a mutexed asset. This definitely needs a better design if we choose to support this feature.
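
One possible mitigation, sketched under the assumption that UC is the only party handing out access to the asset: a reader/writer lock whose holds are capped at a server-chosen maximum TTL, so a writer can never hold the asset forever. `AssetLock` and its API are hypothetical.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Read,
    Write,
}

#[derive(Debug, PartialEq, Eq)]
pub enum LockError {
    Locked,
}

/// Hypothetical per-asset RW lock with expiring holds.
pub struct AssetLock {
    max_ttl: Duration,
    // (user, mode, expiry) for every hold that has been granted
    holders: Vec<(String, Mode, Instant)>,
}

impl AssetLock {
    pub fn new(max_ttl: Duration) -> Self {
        AssetLock {
            max_ttl,
            holders: Vec::new(),
        }
    }

    /// Try to acquire the lock; the granted duration is capped at `max_ttl`.
    /// Returns the expiry of the new hold on success.
    pub fn acquire(
        &mut self,
        user: &str,
        mode: Mode,
        requested: Duration,
        now: Instant,
    ) -> Result<Instant, LockError> {
        // Expired holds no longer block anyone.
        self.holders.retain(|(_, _, expiry)| *expiry > now);

        let conflict = match mode {
            // Readers are blocked only by an active writer.
            Mode::Read => self.holders.iter().any(|(_, m, _)| *m == Mode::Write),
            // A writer needs the asset entirely to itself.
            Mode::Write => !self.holders.is_empty(),
        };
        if conflict {
            return Err(LockError::Locked);
        }

        let expires_at = now + requested.min(self.max_ttl);
        self.holders.push((user.to_string(), mode, expires_at));
        Ok(expires_at)
    }
}
```

This bounds how long a rogue writer can block others, but it does not stop them from re-acquiring in a loop; fairness or per-user quotas would be a separate design question.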

kulte · 2024-07-01T11:21:12Z

kulte
Jul 1, 2024

"how does the service principal, who is the only one with initial access to the server, get access to start creating users/granting permissions to users?"

Somewhat related to this is a key question that needs to be answered: long-term, does/should UC OSS plan to be able to run in a no-auth mode like it does today? For instance, Trino can run without authentication yet it supports a wide variety of authentication methods as well. So is the vision to allow a no-auth mode to exist or will that be completely done away with eventually?

1 reply

abhiaagarwal Jul 1, 2024
Maintainer Author

Good question — I think the main point of UC should be access control, so I assume no-auth is just a temporary stopgap. It's a security nightmare without it :D

kulte · 2024-07-01T19:23:38Z

kulte
Jul 1, 2024

Leaning heavily on Trino as an analogy here for whatever reason 😄 but looking at how some of the authentication types work could be helpful in terms of thinking through how these sections could work:

How do we create users?

Should this integrate with external providers?

Assuming the answer is yes, how to do that, I mean

https://trino.io/docs/current/security/authentication-types.html

1 reply

abhiaagarwal Jul 8, 2024
Maintainer Author

Haha I'm my company's Starburst/Trino administrator, so I know it quite well

I agree with all of this, but Trino doesn't have native cloud authentication, everything goes through LDAP(S) (at my company, we use Microsoft Entra -> LDAP(S) -> Ranger for ACL).

It would be nice if we could have a solution that bypasses the use of LDAP(S) entirely since it's a bit archaic to deal with and not really a necessity with the modern cloud ecosystem. I don't believe a generalized solution exists yet, and maybe that's something we should tackle in relation to what @roeap mentioned at #15

That isn't to say that LDAPS shouldn't be supported, but rather that instead of everything looking like:

Cloud provider (AWS IAM, Azure Entra, etc) -> LDAPS -> Server Auth

it's more:

Cloud Provider -> Server Auth
LDAPS -> Server Auth

(where we can use LDAPS as a generic plugin for solutions we don't natively support)

roeap · 2024-07-07T10:18:37Z

roeap
Jul 7, 2024

In the context of delta-sharing-rs, we have recently been thinking about something similar, i.e. how to provide a flexible / pluggable means of authentication and authorization. While the current implementation might lean a bit heavily toward flexibility, I think the learnings might be valuable to this discussion.

First thing is to have a very clear separation between the two - authentication would usually be handled in a middleware or some reverse proxy before the server. As such we defined a simple Authenticator trait.

pub trait Authenticator: Send + Sync {
    type Request;
    type Recipient: Send;

    /// Authenticate a request.
    ///
    /// This method should return the recipient of the request, or an error if the request
    /// is not authenticated or the recipient cannot be determined from the request.
    fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient>;
}

The Request is modeled as an associated type to be agnostic of the server framework, which I guess is one example of what I was referring to as maybe being overly flexible 😆. Implementations would usually either validate the auth header and extract the principal's identity, or simply extract it from a header injected by a proxy.
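
For illustration, here is a minimal sketch of the second kind of implementation, one that trusts a header injected by a reverse proxy. The trait is repeated so the snippet compiles on its own, with an explicit `AuthError` standing in for the crate-local `Result` alias used above. `ProxyHeaderAuthenticator`, `AuthError`, and the choice of a plain `HashMap` of lowercased header names as the request type are all assumptions, not part of delta-sharing-rs.

```rust
use std::collections::HashMap;

// Assumed stand-in for the crate's own error type.
#[derive(Debug, PartialEq, Eq)]
pub enum AuthError {
    Unauthenticated,
}

// Same shape as the trait above, with the error type spelled out.
pub trait Authenticator: Send + Sync {
    type Request;
    type Recipient: Send;

    fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient, AuthError>;
}

/// Hypothetical authenticator that trusts an identity header set by a proxy.
/// Only safe when the server is unreachable except through that proxy.
pub struct ProxyHeaderAuthenticator {
    /// Lowercased header name carrying the identity.
    pub header: String,
}

impl Authenticator for ProxyHeaderAuthenticator {
    // Headers modeled as a map with lowercased names, to stay framework-agnostic.
    type Request = HashMap<String, String>;
    type Recipient = String;

    fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient, AuthError> {
        request
            .get(&self.header)
            .map(|value| value.trim())
            .filter(|value| !value.is_empty())
            .map(str::to_owned)
            .ok_or(AuthError::Unauthenticated)
    }
}
```

A bearer-token or JWT authenticator would implement the same trait with a different `authenticate` body, which is what makes swapping them in tests cheap.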

As authorization may need to be handled deep within the code, we defined a Policy trait that is passed along with every request, together with some related types ...

pub enum Securable {
    Catalog(String),
    Schema(String),
    Table(String),
    Function(String),
    Volume(String),
    Model(String),
}

pub enum Permission {
    Read,
    Write,
    Manage,
}

pub enum Decision {
    Allow,
    Deny,
}

/// Policy for access control.
#[async_trait::async_trait]
pub trait Policy: Send + Sync {
    type Recipient: Send;

    /// Check if the policy allows the action.
    ///
    /// Specifically, this method should return [`Decision::Allow`] if the recipient
    /// is granted the requested permission on the resource, and [`Decision::Deny`] otherwise.
    async fn authorize(
        &self,
        securable: Securable,
        permission: Permission,
        recipient: &Self::Recipient,
    ) -> Result<Decision>;
}

The basic idea is that we may either keep track of permissions as part of the catalog itself (i.e. in the database) or defer that decision to an external service like Open Policy Agent or any other policy engine / implementation for that matter.
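
A sketch of the first option, with grants kept in the catalog itself. To keep it dependency-free, this uses a synchronous variant of the trait rather than the `async_trait` version above. The types are repeated with the derives needed for hashing, and `InMemoryPolicy` and `PolicyError` are hypothetical names.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Securable {
    Catalog(String),
    Schema(String),
    Table(String),
    Function(String),
    Volume(String),
    Model(String),
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum Permission {
    Read,
    Write,
    Manage,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

// Assumed stand-in for the crate's own error type.
#[derive(Debug, PartialEq, Eq)]
pub enum PolicyError {
    Backend(String),
}

/// Synchronous simplification of the async `Policy` trait above.
pub trait Policy: Send + Sync {
    type Recipient: Send;

    fn authorize(
        &self,
        securable: Securable,
        permission: Permission,
        recipient: &Self::Recipient,
    ) -> Result<Decision, PolicyError>;
}

/// Hypothetical policy backed by a set of explicit grants; default is deny.
pub struct InMemoryPolicy {
    grants: HashSet<(String, Securable, Permission)>,
}

impl InMemoryPolicy {
    pub fn new() -> Self {
        InMemoryPolicy {
            grants: HashSet::new(),
        }
    }

    pub fn grant(&mut self, recipient: &str, securable: Securable, permission: Permission) {
        self.grants
            .insert((recipient.to_string(), securable, permission));
    }
}

impl Policy for InMemoryPolicy {
    type Recipient = String;

    fn authorize(
        &self,
        securable: Securable,
        permission: Permission,
        recipient: &Self::Recipient,
    ) -> Result<Decision, PolicyError> {
        // Exact-match lookup: `Manage` does not imply `Read` or `Write` here.
        let key = (recipient.clone(), securable, permission);
        if self.grants.contains(&key) {
            Ok(Decision::Allow)
        } else {
            Ok(Decision::Deny)
        }
    }
}
```

An implementation that defers to an external policy engine would satisfy the same trait by issuing a request and mapping the response to `Decision`, which is where the async version earns its keep.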

As stated earlier in this thread, authorization is probably the core thing that unitycatalog is doing, as well as something that might look quite different across adopters.

@tdas, not sure if you already settled on a design in the JVM implementation, but it would of course be great if external services (IdP, policy engine, ...) could be used transparently between the OSS implementations.

5 replies

roeap Jul 7, 2024

as a side note, having things nicely tucked away in traits also makes testing different scenarios on the server quite simple 😄.

abhiaagarwal Jul 7, 2024
Maintainer Author

@roeap this is great stuff! I didn't mention this in this RFC, but take a look at axum-login, it provides generic middleware with traits representing backends.

abhiaagarwal Jul 7, 2024
Maintainer Author

Axum-login notably doesn't have any sort of specific ACL control, but we can probably wrap our own middleware around it relatively easily. If we want to model this exactly like a chain of Request->Response, then that's just a tower middleware we can plug in.

abrassel Jul 8, 2024
Maintainer

just chiming in to say that I like the traits quite a lot. Testability is important.

abhiaagarwal Jul 8, 2024
Maintainer Author

Actually, I lied: I didn't realize that axum-login does have a sort of ACL control built in. See https://docs.rs/axum-login/latest/axum_login/trait.AuthzBackend.html

These traits are probably good enough for our purposes, but we might not want to depend on a third-party crate. We could probably define our own subtraits that inherit the upstream ones, though.
