-
Notifications
You must be signed in to change notification settings - Fork 24
Adding pilot registrations and authentification (Router) #421
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Adding pilot registrations and authentification (Router) #421
Conversation
e74fe72
to
9d1c062
Compare
The failed CI i'm not sure if I have to regenerate the client manually. |
Yes, you need to regenerate the client manually, here is the documentation: https://github.com/DIRACGrid/diracx/blob/main/docs/CLIENT.md#updating-the-client If you have any trouble, please let me know |
a269416
to
8645c01
Compare
if "foreign key" in str(e.orig).lower(): | ||
raise PilotNotFoundError(pilot_id=pilot_id) from e | ||
if "duplicate entry" in str(e.orig).lower(): | ||
raise PilotAlreadyExistsError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look a bit fragile (e.g. at the moment we are effectively only supporting MySQL, but what if we add support also for e.g. PG?).
Maybe there's nothing different that can be done, but worth checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just went to the code of SQLAlchemy, there's indeed an IntegrityError
, but nothing is generic. We have to get some db-specific error: psycopg2.errors.ForeignKeyViolation
for postgres, if error_code == 2291:
for oracle, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you rely on an error code instead of relying on a string at least?
Also, it seems you are not using and testing the case where PilotAlreadyExistsError
is raised (or I possibly missed it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we check if an error is an instance of another module pymysql
we could potentially catch some errors as code that are specific on a db
. And even with that, I saw errors where people had to use both IntegrityError
from sql-alchemy and pymy
integrity error because of a bad handling..
It is not pretty, and you can read this response: https://stackoverflow.com/a/70714697
Also, it seems you are not using and testing the case where PilotAlreadyExistsError is raised (or I possibly missed it)
This part add_pilot_credentials
is not used yet but soon will be when Dirac or another entity will register pilots on DiracX and add credentials. I currently didn't catch it, because HTTPExceptions
are to be raised on a router, and in the logic it will be automatically raised.
I don't know if it is fine to raise an error from the logic and raise the same one to the router: in a way it helps understand from the logic the potential, in another, it adds code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll open an issue for this, to later fix this
5e80165
to
b22d1dc
Compare
Modified from |
536c2a5
to
a38f6ea
Compare
Tested with this Pilot PR version and worked successfully. Could retrieve a DiracX token from a Pilot. |
252da7c
to
b3822cd
Compare
If someone has a solution for this CI, I'm all ears. I moved a function as suggested above to |
8730f95
to
44310ed
Compare
if "foreign key" in str(e.orig).lower(): | ||
raise PilotNotFoundError(pilot_id=pilot_id) from e | ||
if "duplicate entry" in str(e.orig).lower(): | ||
raise PilotAlreadyExistsError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you rely on an error code instead of relying on a string at least?
Also, it seems you are not using and testing the case where PilotAlreadyExistsError
is raised (or I possibly missed it)
44310ed
to
70acf70
Compare
[DB Specific bug:]
|
c1da39c
to
be87858
Compare
9823139
to
5bec6bd
Compare
|
|
5d46d0f
to
cf2dd5b
Compare
e38c107
to
d882974
Compare
4f66b63
to
06442f4
Compare
pilots_credentials = await self.get_pilot_credentials_by_stamp([pilot_stamp]) | ||
|
||
# 2. Get the pilot secret itself | ||
secrets = await self.get_secrets_by_hashed_secrets_bulk([pilot_hashed_secret]) | ||
secret = secrets[0] # Semantic, assured by fetch_records_bulk_or_raises | ||
|
||
matches = [ | ||
pilot_credential | ||
for pilot_credential in pilots_credentials | ||
if secret["SecretID"] == pilot_credential["PilotSecretID"] | ||
] | ||
|
||
# 3. Compare the secret_id | ||
if len(matches) == 0: | ||
|
||
raise BadPilotCredentialsError( | ||
data={ | ||
"pilot_stamp": pilot_stamp, | ||
"pilot_hashed_secret": pilot_hashed_secret, | ||
"real_hashed_secret": secret["HashedSecret"], | ||
"pilot_secret_id[]": str( | ||
[ | ||
pilot_credential["PilotSecretID"] | ||
for pilot_credential in pilots_credentials | ||
] | ||
), | ||
"secret_id": secret["SecretID"], | ||
"test": str(pilots_credentials), | ||
} | ||
) | ||
elif len(matches) > 1: | ||
|
||
raise DBInBadStateError( | ||
detail="This should not happen. Duplicates in the database." | ||
) | ||
pilot_credentials = matches[0] # Semantic | ||
|
||
# 4. Check if the secret is expired | ||
now = datetime.now(tz=timezone.utc) | ||
# Convert the timezone, TODO: Change with #454: https://github.com/DIRACGrid/diracx/pull/454 | ||
expiration = secret["SecretExpirationDate"].replace(tzinfo=timezone.utc) | ||
if expiration < now: | ||
|
||
try: | ||
await self.delete_secrets_bulk([secret["SecretID"]]) | ||
except SecretNotFoundError as e: | ||
await self.conn.rollback() | ||
|
||
raise DBInBadStateError( | ||
detail="This should not happen. Pilot should have a secret, but not found." | ||
) from e | ||
|
||
raise SecretHasExpiredError( | ||
data={ | ||
"pilot_hashed_secret": pilot_hashed_secret, | ||
"now": str(now), | ||
"expiration_date": secret["SecretExpirationDate"], | ||
} | ||
) | ||
|
||
# 5. Now the pilot is authorized, increment the counters (globally and locally). | ||
try: | ||
# 5.1 Increment the local count | ||
await self.increment_pilot_local_secret_and_last_time_use( | ||
pilot_secret_id=pilot_credentials["PilotSecretID"], | ||
pilot_stamp=pilot_credentials["PilotStamp"], | ||
) | ||
|
||
# 5.2 Increment the global count | ||
await self.increment_global_secret_use( | ||
secret_id=pilot_credentials["PilotSecretID"] | ||
) | ||
except Exception as e: # Generic, to catch it. | ||
# Should NOT happen | ||
# Wrapped in a try/catch to still catch in case of an error in the counters | ||
# Caught and raised here to avoid raising a 4XX error | ||
await self.conn.rollback() | ||
|
||
raise DBInBadStateError( | ||
detail="This should not happen. Pilot has credentials, but has a corrupted secret." | ||
) from e | ||
|
||
# 6. Delete all secrets if its count attained the secret_global_use_count_max | ||
if secret["SecretGlobalUseCountMax"]: | ||
if secret["SecretGlobalUseCount"] + 1 == secret["SecretGlobalUseCountMax"]: | ||
try: | ||
await self.delete_secrets_bulk([secret["SecretID"]]) | ||
except SecretNotFoundError as e: | ||
# Should NOT happen | ||
await self.conn.rollback() | ||
raise DBInBadStateError( | ||
detail="This should not happen. Pilot has credentials, but has corrupted secret." | ||
) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I have the feeling that this function should go in
diracx.logic
, it looks like it does not directly interact withsqlalchemy
- Aren't you supposed to use the
search
method instead ofget_pilot_credentials_by_stamp
,get_secrets_by_hashed_secrets_bulk
- I haven't checked what you do with
DBInBadStateError
, but if you don't catch it anywhere then the transaction should be automatically rolled back. Seediracx/docs/dev/explanations/components/routes.md
Lines 94 to 128 in b787d8d
### SQL Databases To depend on a SQL-backed database, use the classes in `diracx.routers.dependencies`. The connection is managed through a central pool, with transactions opened for the duration of a request. Successful requests commit the transaction, while requests with HTTP status code `>=400` roll back the transaction. Connections are returned to the pool for reuse. Example: ```python from diracx.routers.dependencies import JobDB, JobLoggingDB @router.delete("/{job_id}") async def delete_single_job(job_db: JobDB, job_logging_db: JobLoggingDB): ... ``` There are advanced and uncommon scenarios where committing a transaction is necessary even when returning an error response (e.g., revoking tokens in the database and returning an error to a potentially malicious user). In such cases, explicitly committing the transaction before raising an exception is crucial. Without this explicit commit, the intended changes would be rolled back along with the transaction, leading to unintended consequences: ```python from diracx.routers.dependencies import AuthDB @router.post("/token") async def token(auth_db: AuthDB, ...) ... if refresh_token_attributes["status"] == RefreshTokenStatus.REVOKED: # Revoke all the user tokens associated with the subject await auth_db.revoke_user_refresh_tokens(sub) # Explicitly commit the transaction to ensure the revocation is saved, # even though an error will be returned to the user. await auth_db.conn.commit() # Raise an HTTP exception to signal the error raise HTTPException(status_code=401) ``` Refer to the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/pooling.html) for more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this part:
- Maybe I need yes to add it to the logic 🤔
- For the search, I'm waiting it to be generic, because for now the mechanic of search does not completely do what I want (the error for example)
- For the DBInBadStateError, I don't catch it to raise a 500 error: if I catch it and raise a 4XX error, it is bad, because the problem does not come from the client but from the server.
For the last point, I prefer when there's an error inside the DB to rollback everything, because if I insert corrupted data, it will be a mess to find which one is corrupted
status_code=status.HTTP_400_BAD_REQUEST, | ||
detail="expiration_minutes must be strictly positive.", | ||
) | ||
if pilot_secret_use_count_max and pilot_secret_use_count_max <= 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as earlier here: don't you need these checks in create_credentials
?
Why do you think the Having a |
65cd1cd
to
9ed1a7b
Compare
Facing #417 , so set |
8471d6c
to
f7f4c4a
Compare
1a2e563
to
a5d2788
Compare
e8866db
to
cc8de96
Compare
As discussed offline:
|
720f302
to
a957480
Compare
Changes
Endpoints
Adding a pilot service with some endpoints:
POST /
creates a pilot with (if not prevented) a secretDELETE /
deletes pilots by stampDELETE /interval
deletes pilots that lived more than n daysPOST /token
exchanges a pilot secret for a tokenPOST /refresh-token
refresh a pilot tokenPOST /fields/secrets
creates secretsPATCH /fields/secrets
associates a pilot with a secretPATCH /fields/jobs
associates a pilot with jobsPATCH /fields
helps modifying pilot fields (benchmark, gridsite, ...)GET /search
searchs for pilots with parametersNote
The
DELETE /interval
is there because we need it directly and because it is faster, but we can simplify it withGET /search
thenDELETE /
.Security Model
As the security model dictates, pilot secrets are strings, and hashed in the db itself.
Important
For the JWT perspective, we need to chose whether a pilot will need refresh tokens or not, and how long a token will live to implement it.
These changes are mandatory for this PR.
After offline discussions: A pilot will have a different token (refresh and access), and with a different duration.