The API part of the TESK project. For a general overview of the project, please visit TESK core project page
TESK is an implementation of GA4GH TES standard for Kubernetes and TESK API exposes TES-compliant endpoints.
TESK uses Kubernetes Batch API (Jobs) to schedule execution of TES tasks. The result of task execution is stored in and retrieved directly from Kubernetes API objects (Jobs and Pods objects); TESK currently does not use any other storage (DB) than Kubernetes itself. Persistent Volume Claims are used as a temporary storage to handle input and output files of a task and pass it over between executors of a task and are destroyed immediately after task completion.
When API gets a request to create a new TES task, TESK performs following steps:
- API creates a Taskmaster - a new single Kubernetes Job per task, that handles the logic of processing inputs, outputs and running task's executors in a sequence. API passes a parameter to Taskmaster - a JSON object composed of - unchanged - TES Inputs/Outputs/Volumes and TES executors packaged as Kubernetes Job objects. Request finishes immediately without waiting for the task completion. Auto-generated task ID is returned as a response.
- If any Inputs/Outputs/Volumes are present Taskmaster creates a single PVC per task.
- Taskmaster creates Inputs Filer, yet another Kubernetes Job (one per task), adds PVC from step 2. to it and maps Input/Output/Volume directories as volume mounts pointing to subpaths of the PVC.
- Inputs Filer downloads input files (currently supports HTTP and FTP), places them at paths inside PVC and finishes.
- Taskmaster for each executor in a list: adds PVC as a volume, maps Inputs/Outputs/Volumes as volume mounts (the same way as for Filer), creates Kubernetes Job and waits until job's completion (by polling get job endpoint). If job fails, Taskmaster finishes. If Executor job succeeds, Taskmaster repeats all the steps for next executor in the list. Taskmaster's loop can be interrupted by cancelling a task. If Taskmaster detects task cancellation, it immediately finishes.
- Each Executor runs actual computation and records its results in mounted PVC (or in stdout only).
- After successful completion of all Executors, Taskmaster creates Outputs Filer, yet another Kubernetes Job (one per task), adds PVC as its volume and maps Inputs/Outputs/Volumes as in previous steps.
- Outputs Filer upload output files from PVC to external storage (currently FTP) and finishes.
- Taskmaster deletes PVC and finishes.
When API gets a request to get details of a single task or a list of tasks, than it calls Kubernetes API endpoints, to retrieve: K8s Job objects corresponding to Taskmaster, Executors and Filers (of a single task or a list of tasks) and K8s Pod objects created by those Jobs. It also gets taskmaster's and executors' pod logs. Matching API objects (Jobs and Pods) and TES tasks heavily relies on the use of K8s labels. After retrieving all needed objects API uses its own logic to combine them in TES task details response.
When API gets a request to cancel a task, it labels both Taskmaster's Job and Pod objects with cancelled status. API then uses Job label to determine task's CANCELED status. Pod's label gets populated to Downward API file, which changes Taskmaster listens to. If Taskmaster detects a change in labels, it stops an execution of a currently running executor and finishes.
TESK supports OAuth2/OIDC to authorise API requests. Authentication and authorisation are optional and can be turned off completely. When turned on, TESK API expects an OIDC access token in Authorization Bearer header. TESK can be integrated with any standard OIDC provider, but the solution has been designed to support Elixir AAI in the first place and the authorisation part relies on Elixir's group model. For details, please see Authentication and Authorisation
TES spec does not define expected behaviour in case of errors. TESK API uses following HTTP codes:
HTTP Status Code | Meaning |
---|---|
200 | OK - successful request |
400 | Bad request: validation error, cancellation of not running task |
401 | Unauthorized - no access token in request, or the token is invalid |
403 | Forbidden - user is not authorised to perform API call (usually it means, user does not belong to the required Eixir group) |
404 | (Task) with a given ID not found |
500 | Something else (possibly configuration) went wrong |
- TESK API is a Maven Spring Boot (currently 2.0.1) application written in Java.
- API server stub is generated automatically with Swagger Code Generator (specifically Swagger Codegen Maven Plugin).
- Official Kuberneres API Java Client is used to communicate with Kubernetes API.
API can run both from outside and inside of Kubernetes cluster. To run API locally, run:
mvn spring-boot:run
then point your browser to http://localhost:8080, which should present Swagger UI page with API endpoints list.
To run from inside Kubernetes, API has to be packaged as a Docker image. Dockerfile is available here.
Current version of the image is hosted at GCR: eu.gcr.io/tes-wes/tesk-api:v0.3.0.2
Deployment descriptors for different K8s environments reside in TESK core project and description of how to install TESK at Kubernetes (including API) also lives there. Detailed explanation of TESK API configuration below
Authentication and authorisation at Kubernetes API are done by Kubernetes API Client (not well documented, refer to source code instead). When running API outside of Kubernetes, we used KUBECONFIG file succesfully. When run from within the cluster, service account is used. In both cases (in/external) API needs to run with permissions sufficient to create and query Jobs and Pods. We used built-in admin
and edit
roles successfully, more refined roles are on the way. When running from inside of the cluster, service account is defined in spec.template.spec.serviceAccountName
parameter of TESK API deployment file
TESK API contains a set of properties listed in application.properties file, that can be changed using any of the methods supported by Spring Boot. We usually use environment variables. When run from within the cluster, API should be deployed as K8s Deployment such as defined in TESK Core project and environment variables can be defined there ( spec.template.spec.containers[0].env
).
The meaning of chosen environment variables:
Environment variable | Meaning |
---|---|
TESK_API_TASKMASTER_DEBUG |
If true will switch on more verbose logs of the taskmaster (that will appear in a TESK task FULL response). false by default. |
TESK_API_TASKMASTER_SERVICE_ACCOUNT_NAME |
Service account with which each new taskmaster job will be created. Needs to have sufficient privileges granted (default edit role can be used). If not set, defaults to default . |
TESK_API_TASKMASTER_IMAGE_NAME |
The full name of taskmaster image, the API will use, when creating new taskmaster jobs. If not set, eu.gcr.io/tes-wes/taskmaster will be used. |
TESK_API_TASKMASTER_IMAGE_VERSION |
Version of taskmaster image, the API will use, when creating new taskmaster jobs. If not set, should default to stable version of taskmaster. |
TESK_API_TASKMASTER_FILER_IMAGE_NAME |
The full name of filer image, passed on as a parameter to the taskmaster. The taskmaster will create Inputs/Outputs filers using this image. If omitted, eu.gcr.io/tes-wes/filer will be used. |
TESK_API_TASKMASTER_FILER_IMAGE_VERSION |
Version of filer image, passed on as a parameter to taskmaster. Taskmaster will create Inputs/Outputs filer using the image in this version. If omitted, should default to latest stable version. |
TESK_API_K8S_NAMESPACE |
K8s namespace, where all the Job objects will be created. If omitted, defaults to default . |
TESK_API_TASKMASTER_FTP_SECRET_NAME |
Name of K8s secret storing credentials to a single FTP account. FTP account is used to demonstrate uploading output files to external storage. If ENV variable is set, FTP username and password will be included by API as taskmaster ENV variables. Otherwise (TESK_API_TASKMASTER_FTP_SECRET_NAME env variable not set), TESK should still work, but without the ability to upload files to a private FTP server. |
TESK_API_TASKMASTER_ENVIRONMENT_* |
Variables passed through to taskmaster as environment variables (the prefix TESK_API_TASKMASTER_ENVIRONMENT_ is stripped, so when you define TESK_API_TASKMASTER_ENVIRONMENT_XXX , the taskmaster will get XXX ). Of those currently implemented in taskmaster: TRANSFER_PVC_NAME , HOST_BASE_PATH , CONTAINER_BASE_PATH have been used to implement TESK using shared filesystem instead of FTP to exchange inputs and outputs. EXECUTOR_BACKOFF_LIMIT and FILER_BACKOFF_LIMIT decide how many times executor/filer jobs will retry pods on error. |
TESK_API_TASKMASTER_EXECUTOR_SECRET_NAME |
A name of a secret that (if variable not empty) will be mounted to each executor as a volume. The secret can contain multiple files. |
TESK_API_TASKMASTER_EXECUTOR_SECRET_MOUNT_PATH |
A path, where a secret named TESK_API_TASKMASTER_EXECUTOR_SECRET_NAME will be mounted. Defaults to /secret . |
SPRING_PROFILES_ACTIVE |
(default) noauth - authN/Z switched off. auth - authN/Z switched on. |
TESK_API_AUTHORISATION_* |
A set of env variables configuring authorisation using Elixir group membership |
TESK_API_SWAGGER_OAUTH_* |
A set of env variables configuring OAuth2/OIDC client built in Swagger UI |
The current version of TES specification (v1.0) is linked as a submodule. In case of the specification upgrade, the submodule needs to be updated. Then you need to run
mvn clean generate-sources -P generate-swagger
That will generate new versions of model and API stub files. The project model objects contains additional Bean Validation annotations than those derived from the spec, and those need to be manually restored in auto-generated code. Currently, the manual reconciliation of changes is always necessary after model regeneration.