-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add allowed_scopes
to all authenticators to allow some users based on granted scopes
#719
Conversation
We can already control what scopes we ask for by setting 'scope'. However, OAuth2 [states](https://datatracker.ietf.org/doc/html/rfc6749#section-3.3) that not all the scopes requested may be granted, for whatever reason. This could be because the user choose not to give us the grant, or because the user themselves doesn't have access to grant us this scope (This is how Auth0 uses it for example - https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow/call-your-api-using-the-authorization-code-flow). This PR adds a `required_scopes` property, listing all the scopes that *must* be granted for the user to be allowed access. By default it's empty, so no behavior is changed.
af20457
to
e801059
Compare
for more information, see https://pre-commit.ci
I've a local version of this deployed in 2i2c-org/infrastructure#3618, and it works well for use in auth0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable and clear for future maintenance! This looks almost ready to merge to me!
- Could you update the title to include the new config name?
- Could you clarify if "scope" is expected to be returned with the access token request response and/or the userinfo response from the oauth spec, and that we get it from where its specced to be returned?
- Could you provide in code (either config help or inline code comment) reference links to the oauth2 docs of relevance?
required_scopes
to all authenticators to allow gating access based on granted scopes
Thanks for the review, @consideRatio! I've done the three things :) |
oauthenticator/oauth2.py
Outdated
@@ -1025,6 +1047,18 @@ async def check_allowed(self, username, auth_model): | |||
if username in self.allowed_users: | |||
return True | |||
|
|||
# If we specific scope grants are required, validate that they have been granted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've considered if this should be in an earlier stage than check_allowed. Its nice to have it here if seen as authorization logic, but it could be seen as a technical requirement - for example to get relevant info for the username. Then, it would be more suitable that not meeting this criteria is seen ss an error rather than 403.
Hmmmm... Consider if we request email scope and username_claim email and declare it as email scope required but don't get it - should that be another kind of error rather than denied authorization 403?
If we think so, this could be more like an assertion early in update_auth_model instead raising an error maybe? Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what i think, but it wouldn't be breaking to relocate this check so I'm open with going onwards either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. There are two ways to think about this:
- It's part of authentication logic, as if we aren't granted a specific claim from a scope, we can't even log the user in (for username_claim, for example). So if the scope isn't granted, we can't actually validate who this user is. So it gets relocated elsewhere.
- It is part of authorization logic, where scopes are used as a way to 'gate' access after we could validate who the user is. So we know who the user is, but we don't want to give them access. In this case it stays where it is.
So how do we best determine if this is part of authentication or authorization?
I went to look at possible available scopes for a bunch of authenticators:
- GitHub scopes list
- Mediawiki rights
- Standard OpenID connect scopes (as supported by Auth0)
username_claim
feels clearly part of authentication to me, and right now, if we don't get a username_claim
back for any reason, it's already a 403. But the 403 is a little bit unclear on why exactly the permission was denied and can be frustrating for the user. However, the scopes that can be related to username_claim
are few in number. In GitHub for example, it really is just user
- its unlikely you would want the username to be derived from anything else (for the most part). I think this is generally true for most auth providers, particularly given the prevalance & design decisions of openid connect. This means that when configuring, if the username_claim
is not present due to a missing scope
, it most likely is because the hub admin did not ask for the scope, and it's going to be clear to them in initial testing.
However, you may want to deny authorization for some subsets - for example, someone may want to setup a specific ssh key in github for a JupyterHub, and hence reject authorization if they don't grant write:public_key
, even though it has nothing to do with authentication itself - we know who the user is, just don't want them in here.
I'm a big fan of @GeorgianaElena's work in #594, and think that separation of authentication and authorization here makes things easier to reason about. I think the most likely use of this feature is authorization and not authentication. username_claim
being set to something that isn't present if a specific scope is not granted is IMO a different problem, and we can tackle that differently if you desire - I'd say the primary extra bit there is to find a way to provide useful error information.
All this to say, I think it's ok for this check to remain here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #721 to make the missing username_claim situation slightly easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reasoning about this - in great depth as well @yuvipanda!! I think this makes a good case for considering this to be part of authorization (authz) logic rather than authentication (authn) logic!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes great sense. Approved by me, pending @consideRatio's more detailed feedback.
oauthenticator/oauth2.py
Outdated
granted_scopes = auth_model.get('auth_state', {}).get('scope', []) | ||
missing_scopes = set(self.required_scopes) - set(granted_scopes) | ||
if missing_scopes: | ||
self.log.info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is often useful to log which scopes are held and not granted, rather than just log what wasn't granted.
I suspect that the most common case for this branch will be that scope
is entirely missing due to some misconfiguration or something, and it's helpful for that to appear distinct from a completely successful authentication with insufficient permissions. Another example would be a typo in a scope name. Showing the held scopes would make that much easier to identify and resolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the scopes listed in this config are not granted, the user will not be allowed to log in.
I think this isn't correctly implemented currently.
- If
allow_all
is true, and therequired_scopes
isn't met, this will not deny authorization. - Further, if a subclass like Generic's
check_allowed
is called, it will simply return True if the base class returns true, but not return false if the base class return false.
In general terms, check_allowed
as implemented in Authenticator, replaced in OAuthenticator, and augmented in GenericOAuthenticator etc, needs to handle three situations.
- The authenticated user gets disallowed
- The authenticated user gets allowed
- The authenticated user doesn't get disallowed or allowed, which will in the end lead to a default of being disallowed.
This could be implemented if check_allowed
makes use of True/None/False, or if check_allowed
makes use of True/Raise error/False. I'm confident we can safely take the raise error path to handle this, but I'm not confident we can adopt a True/None/False approach without breaking interactions with subclasses or JupyterHub itself.
With this considered, I think the path forward is to:
- relocate the
required_scopes
check to the first thing happening incheck_allowed
after the mentioned workaround for jupyterhub<5 - make the
required_scopes
check do nothing or raise an error
@consideRatio I spent some more time thinking through this, and agree that as is worded it doesn't actually do what it says in the documentation. So I've made the following changes:
How does it look to you now? |
17e4347
to
ae734ef
Compare
And if we decide throwing a 403 here is ok, we should document it in JupyterHub too - I've done so at jupyterhub/jupyterhub#4682 |
To give allow_all special power in allowing, making it more allowing than allowed_users for example, is not intuitive at all in my mind. It would also mess with the use case of allowing all users as long as they show up with the required issued scope - isn't this a key use case? required_scopes and allow_all makes sense together imo - a user must be allowed and have the required scopes to be authorized - either the user is allowed via allow_all or allowed_users etc. |
The original idea behind If |
required_scopes
to all authenticators to allow gating access based on granted scopesallowed_scopes
to all authenticators to allow some users based on granted scopes
Ok so the core issue was that this was an authorization check that was kinda being implemented like an authentication check. Since 16, OAuthenticator has many different I've now changed the PR to make this much clearer. The traitlet is now This is different from But regardless, I think this fits in much nicer than |
I think the latest behavior as it is makes sense and is consistent with other If someone does want "block unless this scope is held" behavior, the method to implement is |
I see I think it should be allowed to configure |
@yuvipanda I think the pivot to providing NamingI'd like to propose the naming I propose another name than All scopes, or any of the scopes?If we start with allowing users issued all scopes in a set of scopes via this config, we can then easily either now or later let the config accept a set of "set of scopes", where having any set of scopes fully issued would allow the user. Example configurations # allow users for whom both a and b are issued
# this represents the PR in its current form - a single set of scopes is provided
allowing_scope = {"a", "b"}
# allow users for whom either a are issued or b1 and b2 are issued
allowing_scopes = {
{"a"},
{"b1", "b2"},
} |
I think
A warning could provide some experience improvement to make it clearer that some config will have no effect, but I agree an error is too far. |
If we wanted to be consistent we could go with the suggestion in #719 (comment) of makeing |
Considering naming, here is an overview of the config that allows users:
We have the prefixes Looking at the terminology in https://datatracker.ietf.org/doc/html/rfc6749#section-3.3, I think the terminology is "requested scope" and "granted scope" (the access token is "issued" and the scope the token has is "granted"). Do you consider |
|
If we support the sets of sets / lists of lists version, its like allowed_scope_sets, where each set of scopes are granted, making it consistent with being individual parts allowing a user access, but each part is a "scope set" instead of individual requested scope entries. Function when providing a single set could be made to mean "allow when any of scope entries are granted" (OR) or "allow when all of scope entries are granted" (AND), do you think it should be the OR version if just passing the config a single set/list of scopes @manics? |
I don't know..... given this is a new property we could go for the list-of-lists only? It's more complicated for admins but at least it's unambiguous? |
Thanks for the comments everyone, I'm very swamped this week but will come back to this next week. |
Sigh, momentum seems so important to me getting stuff done. There are merge conflicts here already :( |
ok, so my understanding is that there are two questions at play:
For naming, I would like to propose we keep this as is, as I agree with @minrk's comments in #719 (comment). I also think For list of lists, the question for me is if we can add that in the future in a way that doesn't break backwards compatibility. Can we meaningfully differentiate between a 'list of strings' vs 'list of lists' scenario? The way I approached this is by trying to figure out what documentation for such a property in the future would look like. If we add this feature in the future, here's what the docstring may be like
This reads natural enough to me. You can either have a list of list of strings, or for the most common case, a list of strings. It's unambiguous. We don't even know if admins would want to have a list of list of strings. We do know that they would want to have a list of strings. Given that we can evolve this later on in a backwards compatible way, I think it's safe to try this one as is. So to summarize:
I would love to see this be merged! I think it's ready to go as is, now that I've fixed the merge conflicts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like all of @yuvipanda's explanation and reasoning. 👍 to merge as-is from me.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
anyone wanna hit merge? :) |
yay, thank you @minrk |
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
The Auth0 authenticator made the EarthScope authentication mechanism look more custom than it really was - it's really just standard auth0, and if you look at the [auth0 authenticator](https://github.com/jupyterhub/oauthenticator/blob/main/oauthenticator/auth0.py) code it's fairly minimal - just a couple of convenience functions. This PR switches that (+ our auth0 documentation) to simply use GenericOAuthenticator. This has the following advantages: 1. The Auth0 documentation we have can be easily ported to just support any Generic OAuth provider if needed in the future. 2. The GenericOAuthenticator has features the Auth0 one does not - particularly around groups management that we do want to use. While eventually I think this should be made available to all authenticators (and will work with upstream in doing so), moving to GenericOAuthenticator unblocks planning & scheduling engineering work here as soon as 2i2c-org#3818 is merged. 3. It signals that the EarthScope hub is not *that* special, just the first as a way for us to develop and offer new features. We should work on structuring how we do this, and signal when features are available in what hubs. But in the meantime, this reduces the overall apparent complexity to match actual complexity 4. Removes the custom logout_url work, and instead just mentions you need to set the `client_id` in the logout_url, and adds that to our auth0 documentation. This removes more custom code we have for EarthScope. Once jupyterhub/oauthenticator#719 is merged, this helps us remove all custom code here from earthscope. This part fixes 2i2c-org#3715 This was triggered as cleanup by https://2i2c.freshdesk.com/a/tickets/1453. I'll create appropriate issues for next steps, and will prioritize any future work accordingly with our engineering processes.
We can already control what scopes we ask for by setting 'scope'. However, OAuth2
states that not all the scopes requested may be granted, for whatever reason. This could be because the user choose not to give us the grant, or because the user themselves doesn't have access to grant us this scope (This is how Auth0 uses it for example -
https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow/call-your-api-using-the-authorization-code-flow).
This PR adds an
allowed_scopes
property, which allows granting access to a subset of users based on the presence of a particular scope in the list of scopes granted. This can be used in addition to other authorization mechanisms to allow access to the hub.