Adding pilot registrations and authentification (Router) #421

Robin-Van-de-Merghel · 2025-03-27T08:07:18Z

Changes

Endpoints

Adding a pilot service with some endpoints:

POST / creates a pilot and, unless prevented, a secret
DELETE / deletes pilots by stamp
DELETE /interval deletes pilots that have lived more than n days
POST /token exchanges a pilot secret for a token
POST /refresh-token refreshes a pilot token
POST /fields/secrets creates secrets
PATCH /fields/secrets associates a pilot with a secret
PATCH /fields/jobs associates a pilot with jobs
PATCH /fields modifies pilot fields (benchmark, gridsite, ...)
GET /search searches for pilots with parameters

Note

DELETE /interval is there because we need it directly and because it is faster, but it could be replaced by GET /search followed by DELETE /.
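To make the equivalence in the note concrete, here is a minimal in-memory sketch. The three functions are hypothetical stand-ins for the endpoints, not the router code; only the `PilotStamp` and `SubmissionTime` field names come from the `PilotAgents` table.

```python
from datetime import datetime, timedelta, timezone


def search_older_than(pilots: list, days: int, now: datetime) -> list:
    """Stand-in for GET /search: stamps of pilots submitted more than `days` ago."""
    cutoff = now - timedelta(days=days)
    return [p["PilotStamp"] for p in pilots if p["SubmissionTime"] < cutoff]


def delete_by_stamps(pilots: list, stamps: list) -> list:
    """Stand-in for DELETE /: return the pilots that remain after deletion."""
    to_delete = set(stamps)
    return [p for p in pilots if p["PilotStamp"] not in to_delete]


def delete_interval(pilots: list, days: int, now: datetime) -> list:
    """Stand-in for DELETE /interval, expressed as search followed by delete."""
    return delete_by_stamps(pilots, search_older_than(pilots, days, now))
```

A dedicated endpoint avoids the extra round trip and the transfer of the stamp list, which is presumably where the speed difference mentioned above comes from.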

Security Model

As the security model dictates, pilot secrets are strings and are stored hashed in the DB.
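As an illustration of storing only a hash, here is a minimal sketch of generating a secret, hashing it before storage, and checking a candidate at login. The function names and the choice of SHA-256 are assumptions for illustration; the hashing scheme actually used by the PR is not shown in this thread.

```python
import hashlib
import hmac
import secrets


def generate_pilot_secret(n_bytes: int = 32) -> str:
    """Return a random, URL-safe secret string to hand to the pilot."""
    return secrets.token_urlsafe(n_bytes)


def hash_secret(secret: str) -> str:
    """Hash the secret; only this digest would be stored (SHA-256 is an assumption)."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def secret_matches(candidate: str, stored_hash: str) -> bool:
    """Hash the candidate and compare it to the stored digest in constant time."""
    return hmac.compare_digest(hash_secret(candidate), stored_hash)
```

With this shape the clear-text secret never needs to be persisted: the DB row holds the digest, and the pilot holds the secret.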

Important

From the JWT perspective, we need to choose whether a pilot will need refresh tokens or not, and how long a token will live, in order to implement it.

These changes are mandatory for this PR.

After offline discussions: a pilot will have tokens (refresh and access) distinct from user tokens, with a different duration.

Robin-Van-de-Merghel · 2025-03-28T09:33:18Z

About the failed CI: I'm not sure whether I have to regenerate the client manually.

diracx-db/src/diracx/db/sql/pilot_agents/db.py

diracx-logic/src/diracx/logic/auth/token.py

diracx-routers/src/diracx/routers/auth/pilot_auth.py

diracx-routers/src/diracx/routers/pilots/access_policies.py

aldbr · 2025-03-28T10:05:29Z

> The failed CI i'm not sure if I have to regenerate the client manually.

Yes, you need to regenerate the client manually, here is the documentation: https://github.com/DIRACGrid/diracx/blob/main/docs/CLIENT.md#updating-the-client

If you have any trouble, please let me know

diracx-db/src/diracx/db/sql/pilot_agents/db.py

fstagni · 2025-03-31T11:18:27Z

diracx-db/src/diracx/db/sql/pilot_agents/db.py

+            if "foreign key" in str(e.orig).lower():
+                raise PilotNotFoundError(pilot_id=pilot_id) from e
+            if "duplicate entry" in str(e.orig).lower():
+                raise PilotAlreadyExistsError(


These look a bit fragile (e.g. at the moment we are effectively only supporting MySQL, but what if we add support also for e.g. PG?).
Maybe there's nothing different that can be done, but worth checking.

I just looked at the SQLAlchemy code: there is indeed an IntegrityError, but nothing more generic than that. We would have to check a DB-specific error: psycopg2.errors.ForeignKeyViolation for Postgres, error_code == 2291 for Oracle, ...

Can't you rely on an error code instead of relying on a string at least?
Also, it seems you are not using and testing the case where PilotAlreadyExistsError is raised (or I possibly missed it)

If we check whether the error is an instance of a class from another module such as pymysql, we could potentially catch some DB-specific errors by code. Even then, I have seen cases where people had to catch both SQLAlchemy's IntegrityError and pymysql's IntegrityError because of poor error handling.

It is not pretty; see this answer: https://stackoverflow.com/a/70714697
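For reference, a minimal sketch of classifying by error code rather than by message text. It assumes the driver-level exception (what SQLAlchemy exposes as `e.orig`) either carries the MySQL error number as its first argument, as pymysql does, or a SQLSTATE in a `pgcode` attribute, as psycopg2 does. Other drivers expose these differently, and `classify_integrity_error` is a hypothetical helper, not part of the PR.

```python
# MySQL server error numbers.
MYSQL_DUP_ENTRY = 1062  # ER_DUP_ENTRY
MYSQL_NO_REFERENCED_ROW = 1452  # ER_NO_REFERENCED_ROW_2

# PostgreSQL SQLSTATE codes.
PG_UNIQUE_VIOLATION = "23505"
PG_FOREIGN_KEY_VIOLATION = "23503"


def classify_integrity_error(orig: BaseException) -> str:
    """Return "duplicate", "foreign_key" or "unknown" for a driver-level error."""
    # psycopg2-style: SQLSTATE available as an attribute.
    pgcode = getattr(orig, "pgcode", None)
    if pgcode == PG_UNIQUE_VIOLATION:
        return "duplicate"
    if pgcode == PG_FOREIGN_KEY_VIOLATION:
        return "foreign_key"

    # pymysql-style: (error_number, message) stored in args.
    code = orig.args[0] if orig.args else None
    if code == MYSQL_DUP_ENTRY:
        return "duplicate"
    if code == MYSQL_NO_REFERENCED_ROW:
        return "foreign_key"

    return "unknown"
```

The DB layer could then raise `PilotAlreadyExistsError` or `PilotNotFoundError` based on the returned category, keeping the backend-specific codes in one place.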

> Also, it seems you are not using and testing the case where PilotAlreadyExistsError is raised (or I possibly missed it)

This part, add_pilot_credentials, is not used yet, but it soon will be, when Dirac or another entity registers pilots on DiracX and adds credentials. I currently don't catch it, because HTTPExceptions are to be raised in a router, and from the logic it will be raised automatically.
I don't know whether it is fine to raise an error from the logic and re-raise the same one in the router: on the one hand it helps to see from the logic what can go wrong, on the other it adds code...

I'll open an issue to fix this later.

diracx-db/src/diracx/db/sql/pilot_agents/db.py

Robin-Van-de-Merghel · 2025-04-01T09:53:05Z

Modified from (PilotID, secret) login request to (PilotRef, secret), see this issue I opened about it.

Robin-Van-de-Merghel · 2025-04-02T08:34:59Z

Tested with this Pilot PR version and it worked: I could retrieve a DiracX token from a Pilot.

Robin-Van-de-Merghel · 2025-04-02T14:22:30Z

If someone has a solution for this CI, I'm all ears.

I moved a function to diracx.logic as suggested above, and it seems to have broken OSDB? (I don't use OpenSearch).

diracx-logic/src/diracx/logic/auth/token.py

diracx-db/src/diracx/db/sql/pilot_agents/db.py

aldbr · 2025-04-04T12:40:18Z

diracx-db/src/diracx/db/sql/pilot_agents/db.py

+            if "foreign key" in str(e.orig).lower():
+                raise PilotNotFoundError(pilot_id=pilot_id) from e
+            if "duplicate entry" in str(e.orig).lower():
+                raise PilotAlreadyExistsError(


Can't you rely on an error code instead of relying on a string at least?
Also, it seems you are not using and testing the case where PilotAlreadyExistsError is raised (or I possibly missed it)

Robin-Van-de-Merghel · 2025-04-08T08:30:21Z

[DB Specific bug:]

(pymysql.err.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'RETURNING `PilotAgents`.`PilotID`' at line 1")
[SQL: INSERT INTO `PilotAgents` (`InitialJobID`, `CurrentJobID`, `PilotJobReference`, `PilotStamp`, `DestinationSite`, `Queue`, `GridSite`, `VO`, `GridType`, `BenchMark`, `SubmissionTime`, `LastUpdateTime`, `Status`, `StatusReason`, `AccountingSent`) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) RETURNING `PilotAgents`.`PilotID`]
[parameters: (0, 0, 'aa', '', 'NotAssigned', 'Unknown', 'Unknown', 'diracAdmin', 'DIRAC', 0.0, datetime.datetime(2025, 4, 8, 8, 27, 35, 874664, tzinfo=datetime.timezone.utc), datetime.datetime(2025, 4, 8, 8, 27, 35, 874664, tzinfo=datetime.timezone.utc), 'Submitted', 'Unknown', 'False')]

insert(PilotAgents).values(values).returning(PilotAgents.pilot_id) is not supported in MySQL, but the CI passes.
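A portable way to get the generated key without RETURNING is to read it from the cursor after the INSERT. The sketch below uses the standard-library sqlite3 driver and a cut-down, hypothetical version of the table purely for illustration. For single-row inserts, SQLAlchemy exposes the same information as `CursorResult.inserted_primary_key`.

```python
import sqlite3


def insert_pilot(conn: sqlite3.Connection, stamp: str, vo: str) -> int:
    """Insert one pilot and return its generated PilotID without RETURNING."""
    cursor = conn.execute(
        "INSERT INTO PilotAgents (PilotStamp, VO) VALUES (?, ?)", (stamp, vo)
    )
    # lastrowid holds the key generated by the INSERT that was just executed.
    return cursor.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PilotAgents ("
    "PilotID INTEGER PRIMARY KEY AUTOINCREMENT, PilotStamp TEXT, VO TEXT)"
)
first_id = insert_pilot(conn, "stamp-a", "diracAdmin")
second_id = insert_pilot(conn, "stamp-b", "diracAdmin")
```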

Robin-Van-de-Merghel · 2025-05-05T09:12:53Z

~~Added support for pilots in this diracx-charts PR~~

Robin-Van-de-Merghel · 2025-05-05T09:15:13Z

We could merge the CLI commands to have only dirac internal add-pilot*

diracx-cli/src/diracx/cli/internal/config.py

aldbr · 2025-05-09T08:29:18Z

diracx-db/src/diracx/db/sql/pilot_agents/db.py

+        pilots_credentials = await self.get_pilot_credentials_by_stamp([pilot_stamp])
+
+        # 2. Get the pilot secret itself
+        secrets = await self.get_secrets_by_hashed_secrets_bulk([pilot_hashed_secret])
+        secret = secrets[0]  # Semantic, assured by fetch_records_bulk_or_raises
+
+        matches = [
+            pilot_credential
+            for pilot_credential in pilots_credentials
+            if secret["SecretID"] == pilot_credential["PilotSecretID"]
+        ]
+
+        # 3. Compare the secret_id
+        if len(matches) == 0:
+
+            raise BadPilotCredentialsError(
+                data={
+                    "pilot_stamp": pilot_stamp,
+                    "pilot_hashed_secret": pilot_hashed_secret,
+                    "real_hashed_secret": secret["HashedSecret"],
+                    "pilot_secret_id[]": str(
+                        [
+                            pilot_credential["PilotSecretID"]
+                            for pilot_credential in pilots_credentials
+                        ]
+                    ),
+                    "secret_id": secret["SecretID"],
+                    "test": str(pilots_credentials),
+                }
+            )
+        elif len(matches) > 1:
+
+            raise DBInBadStateError(
+                detail="This should not happen. Duplicates in the database."
+            )
+        pilot_credentials = matches[0]  # Semantic
+
+        # 4. Check if the secret is expired
+        now = datetime.now(tz=timezone.utc)
+        # Convert the timezone, TODO: Change with #454: https://github.com/DIRACGrid/diracx/pull/454
+        expiration = secret["SecretExpirationDate"].replace(tzinfo=timezone.utc)
+        if expiration < now:
+
+            try:
+                await self.delete_secrets_bulk([secret["SecretID"]])
+            except SecretNotFoundError as e:
+                await self.conn.rollback()
+
+                raise DBInBadStateError(
+                    detail="This should not happen. Pilot should have a secret, but not found."
+                ) from e
+
+            raise SecretHasExpiredError(
+                data={
+                    "pilot_hashed_secret": pilot_hashed_secret,
+                    "now": str(now),
+                    "expiration_date": secret["SecretExpirationDate"],
+                }
+            )
+
+        # 5. Now the pilot is authorized, increment the counters (globally and locally).
+        try:
+            # 5.1 Increment the local count
+            await self.increment_pilot_local_secret_and_last_time_use(
+                pilot_secret_id=pilot_credentials["PilotSecretID"],
+                pilot_stamp=pilot_credentials["PilotStamp"],
+            )
+
+            # 5.2 Increment the global count
+            await self.increment_global_secret_use(
+                secret_id=pilot_credentials["PilotSecretID"]
+            )
+        except Exception as e:  # Generic, to catch it.
+            # Should NOT happen
+            # Wrapped in a try/catch to still catch in case of an error in the counters
+            # Caught and raised here to avoid raising a 4XX error
+            await self.conn.rollback()
+
+            raise DBInBadStateError(
+                detail="This should not happen. Pilot has credentials, but has a corrupted secret."
+            ) from e
+
+        # 6. Delete all secrets if its count attained the secret_global_use_count_max
+        if secret["SecretGlobalUseCountMax"]:
+            if secret["SecretGlobalUseCount"] + 1 == secret["SecretGlobalUseCountMax"]:
+                try:
+                    await self.delete_secrets_bulk([secret["SecretID"]])
+                except SecretNotFoundError as e:
+                    # Should NOT happen
+                    await self.conn.rollback()
+                    raise DBInBadStateError(
+                        detail="This should not happen. Pilot has credentials, but has corrupted secret."
+                    ) from e


I have the feeling that this function should go in diracx.logic, it looks like it does not directly interact with sqlalchemy

Aren't you supposed to use the search method instead of get_pilot_credentials_by_stamp and get_secrets_by_hashed_secrets_bulk?

I haven't checked what you do with DBInBadStateError, but if you don't catch it anywhere then the transaction should be automatically rolled back. See:

diracx/docs/dev/explanations/components/routes.md

Lines 94 to 128 in b787d8d

### SQL Databases

To depend on a SQL-backed database, use the classes in `diracx.routers.dependencies`. The connection is managed through a central pool, with transactions opened for the duration of a request. Successful requests commit the transaction, while requests with HTTP status code `>=400` roll back the transaction. Connections are returned to the pool for reuse.

Example:

```python
from diracx.routers.dependencies import JobDB, JobLoggingDB


@router.delete("/{job_id}")
async def delete_single_job(job_db: JobDB, job_logging_db: JobLoggingDB): ...
```

There are advanced and uncommon scenarios where committing a transaction is necessary even when returning an error response (e.g., revoking tokens in the database and returning an error to a potentially malicious user). In such cases, explicitly committing the transaction before raising an exception is crucial. Without this explicit commit, the intended changes would be rolled back along with the transaction, leading to unintended consequences:

```python
from diracx.routers.dependencies import AuthDB


@router.post("/token")
async def token(auth_db: AuthDB, ...):
    ...
    if refresh_token_attributes["status"] == RefreshTokenStatus.REVOKED:
        # Revoke all the user tokens associated with the subject
        await auth_db.revoke_user_refresh_tokens(sub)

        # Explicitly commit the transaction to ensure the revocation is saved,
        # even though an error will be returned to the user.
        await auth_db.conn.commit()

        # Raise an HTTP exception to signal the error
        raise HTTPException(status_code=401)
```

Refer to the [SQLAlchemy documentation](https://docs.sqlalchemy.org/en/20/core/pooling.html) for more details.

For this part:

Yes, maybe I need to move it to the logic 🤔

For the search, I'm waiting for it to become generic, because for now the search mechanism does not do exactly what I want (the errors, for example).

For the DBInBadStateError, I deliberately don't catch it, so that it results in a 500 error: catching it and raising a 4XX error would be wrong, because the problem comes from the server, not from the client.

For the last point, I prefer to roll back everything when there's an error inside the DB, because if I insert corrupted data, it will be a mess to find which part is corrupted.
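To illustrate what moving the check to diracx.logic could look like, here is a minimal sketch of the decision part of the quoted function as a pure function with no DB access. The exception classes are local stand-ins for the ones named in the diff, and `check_pilot_secret` is a hypothetical name. Unlike the diff, it uses `is not None` and `>=` for the use-count test, which is a deliberate simplification.

```python
from datetime import datetime, timezone


class BadPilotCredentialsError(Exception):
    """Stand-in for the exception of the same name in the diff."""


class DBInBadStateError(Exception):
    """Stand-in for the exception of the same name in the diff."""


class SecretHasExpiredError(Exception):
    """Stand-in for the exception of the same name in the diff."""


def check_pilot_secret(credentials: list, secret: dict, now: datetime) -> bool:
    """Validate a pilot secret against already-fetched rows.

    Returns True when this use reaches the secret's maximum use count,
    meaning the caller should delete the secret afterwards.
    """
    # Exactly one credential row must reference the secret.
    matches = [c for c in credentials if c["PilotSecretID"] == secret["SecretID"]]
    if not matches:
        raise BadPilotCredentialsError("no credential matches this secret")
    if len(matches) > 1:
        raise DBInBadStateError("duplicate credentials in the database")

    # Stored datetimes are treated as UTC, as in the diff.
    expiration = secret["SecretExpirationDate"].replace(tzinfo=timezone.utc)
    if expiration < now:
        raise SecretHasExpiredError("secret has expired")

    # Count the current use and compare it with the optional maximum.
    max_uses = secret["SecretGlobalUseCountMax"]
    return max_uses is not None and secret["SecretGlobalUseCount"] + 1 >= max_uses
```

The DB layer would then only fetch the rows, increment the counters and delete secrets, while the branching stays testable without a database.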

diracx-db/src/diracx/db/sql/pilot_agents/db.py

diracx-logic/src/diracx/logic/auth/pilot.py

diracx-routers/src/diracx/routers/pilots/fields.py

aldbr · 2025-05-09T09:55:44Z

diracx-routers/src/diracx/routers/pilots/fields.py

+            status_code=status.HTTP_400_BAD_REQUEST,
+            detail="expiration_minutes must be strictly positive.",
+        )
+    if pilot_secret_use_count_max and pilot_secret_use_count_max <= 0:


Same question as earlier here: don't you need these checks in create_credentials?
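One detail in the quoted check: because `0` is falsy in Python, `if x and x <= 0` never fires for `x == 0`, so a maximum use count of 0 passes unless 0 is meant as "no limit". A minimal sketch of a shared validator that both the router and create_credentials could call follows. The function name, the optional parameters and the use of `ValueError` are assumptions for illustration.

```python
from typing import Optional


def validate_secret_parameters(
    expiration_minutes: Optional[int], use_count_max: Optional[int]
) -> None:
    """Raise ValueError if a limit is provided but is not strictly positive."""
    # `is not None` (rather than truthiness) makes sure 0 is rejected too.
    if expiration_minutes is not None and expiration_minutes <= 0:
        raise ValueError("expiration_minutes must be strictly positive.")
    if use_count_max is not None and use_count_max <= 0:
        raise ValueError("pilot_secret_use_count_max must be strictly positive.")
```

The router could translate the `ValueError` into a 400 response, while the logic layer reuses the same function without depending on HTTP types.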

diracx-logic/src/diracx/logic/pilots/management.py

diracx-routers/src/diracx/routers/auth/token.py

diracx-db/src/diracx/db/sql/pilot_agents/schema.py

diracx-db/src/diracx/db/sql/utils/__init__.py

diracx-db/src/diracx/db/sql/utils/functions.py

diracx-logic/src/diracx/logic/auth/pilot.py

aldbr · 2025-05-12T11:37:04Z

@fstagni

> It is true that:
> a pilot is associated to at most 1 secret
> a secret could be associated to more than a pilot, but 1 by default.
> This table looks to me it is still needed.

Why do you think the PilotToSecretMapping is still needed if we have a 1-N relationship?
In PilotAgents we can have a secret_id foreign key that would point to PilotSecrets.secret_id.

Having a PilotToSecretMapping implies an N-N relationship, I think.
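A minimal sketch of the 1-N layout suggested here, using the standard-library sqlite3 driver so it runs on its own. The tables are cut down to the columns needed for the point and the column types are assumptions. The real schema lives in diracx-db/src/diracx/db/sql/pilot_agents/schema.py.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(
    """
    CREATE TABLE PilotSecrets (
        SecretID INTEGER PRIMARY KEY,
        HashedSecret TEXT NOT NULL
    );
    CREATE TABLE PilotAgents (
        PilotID INTEGER PRIMARY KEY,
        PilotStamp TEXT NOT NULL UNIQUE,
        -- At most one secret per pilot; many pilots may share one secret.
        SecretID INTEGER REFERENCES PilotSecrets (SecretID)
    );
    """
)

conn.execute("INSERT INTO PilotSecrets VALUES (1, 'abc')")
conn.execute("INSERT INTO PilotAgents VALUES (1, 'stamp-a', 1)")
conn.execute("INSERT INTO PilotAgents VALUES (2, 'stamp-b', 1)")  # shares secret 1

pilots_per_secret = conn.execute(
    "SELECT COUNT(*) FROM PilotAgents WHERE SecretID = 1"
).fetchone()[0]
```

A single nullable foreign-key column already expresses "a pilot has at most one secret, a secret may serve several pilots". A separate mapping table would only be needed if a pilot could hold several secrets at once.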

Robin-Van-de-Merghel · 2025-05-13T13:40:19Z

Facing #417, so I set require_auth to False.

diracx-cli/src/diracx/cli/internal/config.py

diracx-core/src/diracx/core/exceptions.py

diracx-logic/src/diracx/logic/pilots/utils.py

diracx-routers/src/diracx/routers/pilots/management.py

diracx-db/tests/pilot_agents/test_pilot_auth.py

Robin-Van-de-Merghel · 2025-05-26T09:42:38Z

As discussed offline:

We won't use the pilot user anymore
We will split the pilot routes into pilots/ and pilot_management/
Users and Pilots will have different tokens, and we will have to tell one from the other
Pilot refresh tokens will last longer than users', and we will refresh tokens as we fetch data from the Pilot

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 3 times, most recently from e74fe72 to 9d1c062 Compare March 28, 2025 09:11

Robin-Van-de-Merghel marked this pull request as ready for review March 28, 2025 09:31

aldbr reviewed Mar 28, 2025

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from a269416 to 8645c01 Compare March 28, 2025 13:09

fstagni reviewed Mar 31, 2025

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 3 times, most recently from 5e80165 to b22d1dc Compare April 1, 2025 07:19

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 536c2a5 to a38f6ea Compare April 2, 2025 08:03

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 252da7c to b3822cd Compare April 2, 2025 13:54

ryuwd reviewed Apr 2, 2025

diracx-logic/src/diracx/logic/auth/token.py (outdated)

aldbr reviewed Apr 2, 2025

diracx-db/src/diracx/db/sql/pilot_agents/db.py (outdated)

Robin-Van-de-Merghel requested a review from aldbr April 3, 2025 07:37

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 4 times, most recently from 8730f95 to 44310ed Compare April 4, 2025 07:56

aldbr reviewed Apr 4, 2025

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 44310ed to 70acf70 Compare April 7, 2025 14:30

Robin-Van-de-Merghel mentioned this pull request Apr 7, 2025

SQLAlchemy bad error handling #450

Open

Robin-Van-de-Merghel mentioned this pull request Apr 8, 2025

Testing db with Postgres or Mysql directly #455

Closed

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from c1da39c to be87858 Compare April 8, 2025 12:18

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 9823139 to 5bec6bd Compare May 2, 2025 15:29

Robin-Van-de-Merghel mentioned this pull request May 2, 2025

Add group for pilots, and pilot user in the CS DIRACGrid/diracx-charts#153

Closed

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 2 times, most recently from 5d46d0f to cf2dd5b Compare May 7, 2025 12:12

Robin-Van-de-Merghel mentioned this pull request May 7, 2025

Pilot Migration #520

Open

39 tasks

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from e38c107 to d882974 Compare May 8, 2025 09:06

Robin-Van-de-Merghel requested a review from aldbr May 8, 2025 09:15

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 2 times, most recently from 4f66b63 to 06442f4 Compare May 9, 2025 08:44

aldbr reviewed May 9, 2025

fstagni reviewed May 9, 2025

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 65cd1cd to 9ed1a7b Compare May 12, 2025 12:51

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 8471d6c to f7f4c4a Compare May 14, 2025 08:57

fstagni reviewed May 14, 2025

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch 3 times, most recently from 1a2e563 to a5d2788 Compare May 15, 2025 07:19

Robin-Van-de-Merghel marked this pull request as ready for review May 15, 2025 07:41

Robin-Van-de-Merghel requested review from aldbr and fstagni May 15, 2025 07:41

Robin-Van-de-Merghel mentioned this pull request May 15, 2025

feat: Add DiracX support into the database. DIRACGrid/DIRAC#8196

Draft

2 tasks

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from e8866db to cc8de96 Compare May 21, 2025 13:58

Robin-Van-de-Merghel mentioned this pull request May 22, 2025

Robin pilot logging #550

Open

feat: Adding pilot registrations, with secret management

a957480

Robin-Van-de-Merghel force-pushed the robin-pilot-registrations branch from 720f302 to a957480 Compare May 26, 2025 14:12

