[RFC]: Access Control and Authentication #13
-
Somewhat related to this is a key question that needs to be answered: long-term, does/should UC OSS plan to be able to run in a no-auth mode like it does today? For instance, Trino can run without authentication yet it supports a wide variety of authentication methods as well. So is the vision to allow a no-auth mode to exist or will that be completely done away with eventually?
-
Assuming the answer is yes, the next question is how to do that. I'm leaning heavily on Trino as an analogy here for whatever reason 😄 but looking at how some of its authentication types work could be helpful in terms of thinking through how these sections could work: https://trino.io/docs/current/security/authentication-types.html
-
In the context of delta-sharing-rs, we have recently been thinking about something similar, i.e. how to provide a flexible / pluggable means of authentication and authorization. While the current implementation might lean a bit heavily on the side of flexibility, I think the learnings might be valuable to this discussion. First thing is to have a very clear separation between the two. Authentication would usually be handled in a middleware or some reverse proxy before the server. As such we defined a simple `Authenticator` trait:

    pub trait Authenticator: Send + Sync {
        type Request;
        type Recipient: Send;

        /// Authenticate a request.
        ///
        /// This method should return the recipient of the request, or an error if the request
        /// is not authenticated or the recipient cannot be determined from the request.
        fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient>;
    }

As authorization may need to be handled deep within the code, we defined a `Policy` trait together with the types it operates on:

    pub enum Securable {
        Catalog(String),
        Schema(String),
        Table(String),
        Function(String),
        Volume(String),
        Model(String),
    }

    pub enum Permission {
        Read,
        Write,
        Manage,
    }

    pub enum Decision {
        Allow,
        Deny,
    }

    /// Policy for access control.
    #[async_trait::async_trait]
    pub trait Policy: Send + Sync {
        type Recipient: Send;

        /// Check if the policy allows the action.
        ///
        /// Specifically, this method should return [`Decision::Allow`] if the recipient
        /// is granted the requested permission on the resource, and [`Decision::Deny`] otherwise.
        async fn authorize(
            &self,
            securable: Securable,
            permission: Permission,
            recipient: &Self::Recipient,
        ) -> Result<Decision>;
    }

The basic idea is that we may either keep track of permissions as part of the catalog itself (i.e. in the database) or defer that decision to an external service like Open Policy Agent or any other policy engine / implementation for that matter. As stated earlier in this thread, authorization is probably the core thing that Unity Catalog is doing, as well as something that might look quite different across adopters. @tdas, not sure if you already settled on a design in the JVM implementation, but it would of course be great if external services (IdP, policy engine, ...) could be used transparently between the OSS implementations.
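To make the separation concrete, here is a minimal sketch of one possible implementation pair. Everything beyond the trait shapes is an assumption for illustration: `StaticTokenAuthenticator` and `InMemoryPolicy` are hypothetical names, the request is modelled as a plain header map, the error type is a `String`, and `Policy` is written synchronously so the example does not need the `async_trait` crate.

```rust
use std::collections::{HashMap, HashSet};

// Assumption: the traits in the thread use a crate-level `Result`; a String error stands in here.
type Result<T> = std::result::Result<T, String>;

pub trait Authenticator: Send + Sync {
    type Request;
    type Recipient: Send;
    fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient>;
}

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum Securable {
    Catalog(String),
    Schema(String),
    Table(String),
    Function(String),
    Volume(String),
    Model(String),
}

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Permission {
    Read,
    Write,
    Manage,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny,
}

// Synchronous stand-in for the async `Policy` trait (simplification for this sketch).
pub trait Policy: Send + Sync {
    type Recipient: Send;
    fn authorize(
        &self,
        securable: Securable,
        permission: Permission,
        recipient: &Self::Recipient,
    ) -> Result<Decision>;
}

/// Hypothetical authenticator: maps a bearer token from the headers to a recipient name.
pub struct StaticTokenAuthenticator {
    pub tokens: HashMap<String, String>, // token -> recipient
}

impl Authenticator for StaticTokenAuthenticator {
    type Request = HashMap<String, String>; // header name -> header value
    type Recipient = String;

    fn authenticate(&self, request: &Self::Request) -> Result<Self::Recipient> {
        let header = request
            .get("authorization")
            .ok_or("missing authorization header")?;
        let token = header
            .strip_prefix("Bearer ")
            .ok_or("expected a bearer token")?;
        self.tokens
            .get(token)
            .cloned()
            .ok_or_else(|| "unknown token".to_string())
    }
}

/// Hypothetical policy: grants held in memory, standing in for a database table
/// or a call to an external policy engine.
pub struct InMemoryPolicy {
    pub grants: HashSet<(String, Securable, Permission)>,
}

impl Policy for InMemoryPolicy {
    type Recipient = String;

    fn authorize(
        &self,
        securable: Securable,
        permission: Permission,
        recipient: &Self::Recipient,
    ) -> Result<Decision> {
        let key = (recipient.clone(), securable, permission);
        // Anything not explicitly granted is denied.
        Ok(if self.grants.contains(&key) {
            Decision::Allow
        } else {
            Decision::Deny
        })
    }
}
```

A handler would call `authenticate` once at the edge and pass the resulting recipient down to wherever `authorize` is needed. Swapping `InMemoryPolicy` for a database-backed or OPA-backed implementation would not change the call sites.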
-
TL;DR
Unity Catalog, in its platonic ideal, is an access-control server for data assets living in cloud ecosystems. As such, how do we actually scope the assets to users with proper access control?
The current situation
Some questions, loosely organized
How should we implement RBAC?
It should probably be done in the database. I personally haven't implemented RBAC before, but here are a few resources I found clicking around on Reddit and Hacker News:
In the case of Databricks, while the user-facing GRANT involves the use of compute, it's implemented as a REST API. We could implement this API ourselves pretty easily; the only question becomes "how does the service principal, who is the only one with initial access to the server, get access to start creating users/granting permissions to users?" Maybe the server prints a token to stdout at startup that can be used to perform operations.
Bearer-token auth will be the method of choice since it's 1) simple 2) stateless 3) what upstream does. I think it's done via JWTs, but I'm not sure. We probably have some flexibility here and should wait for/work with the FOSS Java implementation.
How do we create users?
There probably needs to be an authentication flow where users can register via email/pw. However, they should start with zero permissions; only the service principal can grant them access to any assets. If we're going the JWT route, then they can issue a token to themselves to perform operations.
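The registration and grant flow described above could be sketched roughly as follows. This is an in-memory illustration under assumptions, not a proposed design: the `Rbac` type, its method names, and the idea of identifying the service principal by a plain string are all hypothetical, and a real server would persist this in database tables.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Permission {
    Read,
    Write,
    Manage,
}

/// Hypothetical in-memory RBAC store: users hold roles, roles hold grants.
#[derive(Default)]
pub struct Rbac {
    admin: String, // the service principal with initial access
    role_grants: HashMap<String, HashSet<(String, Permission)>>, // role -> (securable, permission)
    user_roles: HashMap<String, HashSet<String>>, // user -> roles
}

impl Rbac {
    pub fn new(admin: &str) -> Self {
        Rbac {
            admin: admin.to_string(),
            ..Default::default()
        }
    }

    /// Register a user with no roles, i.e. zero permissions.
    pub fn register(&mut self, user: &str) {
        self.user_roles.entry(user.to_string()).or_default();
    }

    /// Attach a permission on a securable to a role. Only the admin may do this.
    pub fn grant_to_role(
        &mut self,
        actor: &str,
        role: &str,
        securable: &str,
        permission: Permission,
    ) -> Result<(), String> {
        self.require_admin(actor)?;
        self.role_grants
            .entry(role.to_string())
            .or_default()
            .insert((securable.to_string(), permission));
        Ok(())
    }

    /// Give a registered user a role. Only the admin may do this.
    pub fn assign_role(&mut self, actor: &str, user: &str, role: &str) -> Result<(), String> {
        self.require_admin(actor)?;
        let roles = self
            .user_roles
            .get_mut(user)
            .ok_or_else(|| format!("unknown user: {}", user))?;
        roles.insert(role.to_string());
        Ok(())
    }

    /// A user is allowed if any of their roles carries the grant.
    pub fn is_allowed(&self, user: &str, securable: &str, permission: Permission) -> bool {
        let key = (securable.to_string(), permission);
        match self.user_roles.get(user) {
            Some(roles) => roles.iter().any(|role| {
                self.role_grants
                    .get(role)
                    .map_or(false, |grants| grants.contains(&key))
            }),
            None => false,
        }
    }

    fn require_admin(&self, actor: &str) -> Result<(), String> {
        if actor == self.admin {
            Ok(())
        } else {
            Err(format!("{} may not manage grants", actor))
        }
    }
}
```

In this sketch the bootstrap question reduces to how the caller proves it is `admin`, which is where a startup token printed by the server could come in.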
How do we vend credentials?
I have zero experience here and I can't find any concrete examples online, but here are a few anyway:
AWS: https://github.com/aws-samples/data-governance-w-temp-credentials-vending/blob/main/sagemaker-lf-credential-vending.ipynb
Azure: https://learn.microsoft.com/en-us/azure/databricks/ingestion/copy-into/generate-temporary-credentials (maybe?)
GCP: https://cloud.google.com/iam/docs/configuring-temporary-access
Unity Catalog needs to be able to vend credentials; the downstream consumers should never be able to observe anything about the asset outside what we give them.
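As a rough illustration of what "never observe anything outside what we give them" means for a vended credential, the sketch below models a credential as a storage prefix, an access mode, and an expiry. It is purely hypothetical: `ScopedCredential`, `vend`, and `permits` are made-up names, and a real implementation would wrap a cloud-issued temporary token (for example from AWS STS) whose scoping is enforced by the cloud provider rather than by this check.

```rust
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Access {
    ReadOnly,
    ReadWrite,
}

/// Hypothetical scoped credential: valid only under `prefix`, only until `expires_at`.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct ScopedCredential {
    pub prefix: String,
    pub access: Access,
    pub expires_at: Instant,
}

/// Vend a credential limited to one asset's storage location.
pub fn vend(location: &str, access: Access, ttl: Duration, now: Instant) -> ScopedCredential {
    // Normalize to a trailing slash so ".../orders" does not also cover ".../orders_archive".
    let mut prefix = location.trim_end_matches('/').to_string();
    prefix.push('/');
    ScopedCredential {
        prefix,
        access,
        expires_at: now + ttl,
    }
}

impl ScopedCredential {
    /// Would this credential allow the given operation on `path` at time `now`?
    /// Note: this sketch does not handle path traversal such as "..".
    pub fn permits(&self, path: &str, write: bool, now: Instant) -> bool {
        now < self.expires_at
            && path.starts_with(self.prefix.as_str())
            && (!write || self.access == Access::ReadWrite)
    }
}
```

The trailing-slash normalization is the one design choice worth noting: a bare prefix match without it would leak sibling locations that share a name prefix.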
Should this integrate with external providers?
The answer is yes, obviously, Microsoft Entra, AWS STS, LDAP(S) probably, etc. But it's quite complicated, so maybe just focus on internal auth with username/password to begin with?
How long do we hand out leases to authenticated users?
Let's say a user wants to use their compute engine of choice to run a query on a Delta table managed by Unity Catalog. This is a long-running query. How do we actually hand out the appropriate lease for the relevant time?
There isn't really a good solution to this. It should probably be configurable at the server level or per asset (as a property?). We should allow users to give up their leases, but this isn't behavior we should rely on. At the very least, for observability, the server should maintain some sort of state tracking which users have access to an asset at any given time.
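One way to picture the state tracking described above is a small lease registry with a server-wide default TTL, optional per-asset overrides, voluntary release, and a query for current holders. The sketch is an assumption-laden illustration: `LeaseRegistry` and its methods are hypothetical, state lives in memory, and the current time is passed in explicitly to keep the behavior easy to reason about.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical registry tracking which user holds a lease on which asset, and until when.
pub struct LeaseRegistry {
    default_ttl: Duration,                      // server-level setting
    per_asset_ttl: HashMap<String, Duration>,   // per-asset override (e.g. from a property)
    leases: HashMap<(String, String), Instant>, // (asset, user) -> expiry
}

impl LeaseRegistry {
    pub fn new(default_ttl: Duration) -> Self {
        LeaseRegistry {
            default_ttl,
            per_asset_ttl: HashMap::new(),
            leases: HashMap::new(),
        }
    }

    pub fn set_asset_ttl(&mut self, asset: &str, ttl: Duration) {
        self.per_asset_ttl.insert(asset.to_string(), ttl);
    }

    /// Grant (or renew) a lease and return its expiry.
    pub fn grant(&mut self, asset: &str, user: &str, now: Instant) -> Instant {
        let ttl = self
            .per_asset_ttl
            .get(asset)
            .copied()
            .unwrap_or(self.default_ttl);
        let expires = now + ttl;
        self.leases
            .insert((asset.to_string(), user.to_string()), expires);
        expires
    }

    /// Voluntary release. Returns whether a lease was on record.
    pub fn release(&mut self, asset: &str, user: &str) -> bool {
        self.leases
            .remove(&(asset.to_string(), user.to_string()))
            .is_some()
    }

    /// Users whose lease on `asset` is still valid at `now`, sorted by name.
    pub fn holders(&self, asset: &str, now: Instant) -> Vec<String> {
        let mut users = Vec::new();
        for ((a, u), expires) in &self.leases {
            if a.as_str() == asset && *expires > now {
                users.push(u.clone());
            }
        }
        users.sort();
        users
    }
}
```

Because expiry is checked on read, the server never depends on users releasing their leases; `release` only shortens the window.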
Do we allow multiple users to have a lease on the same asset at the same time?
This is an interesting question. Normally, the compute engine is what handles mutual exclusion for a particular asset (at least in Databricks). We're compute agnostic, so maybe we just allow users to go wild with modifying an asset at a given time. In the case of Delta, that's not a huge deal since ACID is built into the format. But what about Parquet or CSV files?
Maybe this is something we implement by using the generic property tag to make UC handle locking when we want an asset to be locked (not necessarily a mutex; it could be a RW lock). UC can't control how the asset is handled (though I believe it's possible to hand out read-only and read-write temp credentials), but if we're confident we're the only handler of said data, we can assert that only one person has write access and that all reads will be denied.
However, this ties into the point above: a rogue user can DoS an asset by just keeping an infinite write-only lease on a mutexed asset. Definitely needs a better design if we choose to support this feature.
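A possible mitigation for the infinite-lease problem is to cap every lease at a server-enforced maximum and to expire stale leases lazily on each acquire. The following is a sketch under assumptions, not a worked-out design: `LockTable`, `Mode`, and `acquire` are hypothetical names, renewal is deliberately not supported, and all state is in memory.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    Read,
    Write,
}

#[derive(Default)]
struct AssetLeases {
    readers: HashMap<String, Instant>, // user -> expiry
    writer: Option<(String, Instant)>, // at most one writer
}

/// Hypothetical reader-writer lease table with a hard cap on lease duration.
pub struct LockTable {
    max_ttl: Duration,
    assets: HashMap<String, AssetLeases>,
}

impl LockTable {
    pub fn new(max_ttl: Duration) -> Self {
        LockTable {
            max_ttl,
            assets: HashMap::new(),
        }
    }

    /// Try to acquire a lease; returns its expiry, or `None` if it conflicts.
    pub fn acquire(
        &mut self,
        asset: &str,
        user: &str,
        mode: Mode,
        ttl: Duration,
        now: Instant,
    ) -> Option<Instant> {
        // Clamp the requested TTL so no lease can be held indefinitely.
        let ttl = ttl.min(self.max_ttl);
        let state = self.assets.entry(asset.to_string()).or_default();

        // Lazily drop leases that have already expired.
        state.readers.retain(|_, expires| *expires > now);
        if matches!(&state.writer, Some((_, expires)) if *expires <= now) {
            state.writer = None;
        }

        let expires = now + ttl;
        match mode {
            // Reads are allowed while no writer holds the asset.
            Mode::Read if state.writer.is_none() => {
                state.readers.insert(user.to_string(), expires);
                Some(expires)
            }
            // A write needs the asset to be completely free.
            Mode::Write if state.writer.is_none() && state.readers.is_empty() => {
                state.writer = Some((user.to_string(), expires));
                Some(expires)
            }
            _ => None,
        }
    }
}
```

With the cap in place, a rogue writer can block an asset for at most `max_ttl` per acquisition, though a fairness policy would still be needed to stop it from re-acquiring immediately.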