wfx provides two RESTful APIs to interact with it: the northbound operator/management interface and the southbound interface used by clients as illustrated in the following figure:
Management
│
│
▼ Northbound API
┌──────────────────┐
│ wfx │
└──────────────────┘
▲ Southbound API
│
│
Device
The northbound API is used to create jobs and execute server-side state transitions, whereas the southbound API is used for client-side transitions.
The complete wfx API specification is accessible at runtime via the /api/wfx/v1/openapiv3.json
endpoint.
Clients may inspect this specification at run-time so to obey the various limits imposed, e.g, for parameter value ranges and array lengths.
Job events provide a notification mechanism that informs clients about certain operations happening on jobs. This approach eliminates the need for clients to continuously poll wfx, thus optimizing network usage and client resources. Job events can be useful for user interfaces (UIs) and other applications that demand near-instantaneous updates.
Below is a high-level overview of how the communication flow operates:
┌────────┐ ┌─────┐
│ Client │ │ wfx │
└────────┘ └─────┘
| |
| HTTP GET /jobs/events |
|-----------------------------------►|
| |
| |
| Event Loop |
┌───|────────────────────────────────────|───┐
│ | [Content-Type: text/event-stream] | │
│ | | │
│ | | │
│ | Push Event | │
│ |◄-----------------------------------| │
│ | | │
└───|────────────────────────────────────|───┘
| |
▼ ▼
┌────────┐ ┌─────┐
│ Client │ │ wfx │
└────────┘ └─────┘
- The client initiates communication by sending an HTTP
GET
request to the/jobs/events
endpoint. Clients may also include optional filter parameters within the request. - Upon receipt of the request,
wfx
sets theContent-Type
header totext/event-stream
. - The server then initiates a stream of job events in the response body, allowing clients to receive instant updates.
The job events stream is composed of server-sent events (SSE). Accordingly, the stream is structured as follows:
data: [...]
id: 1
data: [...]
id: 2
[...]
An individual event within the stream conforms to this format:
data: { "action": "<ACTION>", "ctime": <CTIME>, "tags": <TAGS>, "job": <JOB> }
id: <EVENT_ID>\n\n
Note: Each event is terminated by a pair of newline characters \n\n
(as required by the SSE spec).
The semantics of the individual fields is:
<ACTION>
specifies the type of event that occurred. The valid actions are:CREATE
: a new job has been createdDELETE
: an existing job has been deletedADD_TAGS
: tags were added to a jobDELETE_TAGS
: tags were removed from a jobUPDATE_STATUS
: job status has been updatedUPDATE_DEFINITION
: job definition has been updated
<CTIME>
: event creation time (ISO8601)<TAGS>
: JSON array of tags as provided by the client<JOB>
is a JSON object containing the portion of the job object which was changed, e.g., for anUPDATE_STATUS
event, the job status is sent but not its definition. To enable filtering, the fieldsid
,clientId
andworkflow.name
are always part of the response.<EVENT_ID>
: an integer which uniquely identifies each event, starting at 1 and incrementing by 1 for every subsequent event. Clients can use this to identify any missed messages. If an overflow occurs, the integer resets to zero, a scenario the client can recognize and address.
Example:
data: {"action":"UPDATE_STATUS","job":{"clientId":"Dana","id":"c6698105-6386-4940-a311-de1b57e3faeb","status":{"definitionHash":"adc1cfc1577119ba2a0852133340088390c1103bdf82d8102970d3e6c53ec10b","state":"PROGRESS"},"workflow":{"name":"wfx.workflow.kanban"}}}
id: 1\n\n
Job events can be filtered using any combination of the following parameters:
- Job IDs
- Client IDs
- Workflow Names
This enables more precise control over the dispatched events. Note that it is entirely possible to subscribe multiple times to job events using various filters in order to create a more advanced event recognition model.
wfxctl
offers a reference client implementation. The following command subscribes to all job events:
wfxctl job events
This may result in a large number of events, though. For a more targeted approach, filter parameters may be used. Assuming the job IDs are known (either because the jobs have been created already or the IDs are received via another subscription channel), the following will subscribe to events matching either of the two specified job IDs:
wfxctl job events --job-id=d305e539-1d41-4c95-b19a-2a7055c469d0 --job-id=e692ad92-45e6-4164-b3fd-8c6aa884011c
See wfxctl job events --help
for other filter parameters, e.g. workflow names.
- Asynchronous Job Status Updates: Job status updates are dispatched asynchronously to avoid the risk of a
subscriber interfering with the actual job operation. In particular, there is no guarantee that the messages sent to
the client arrive in a linear order (another reason for that may be networking-related). While this is typically not
a concern, it could become an issue in high-concurrency situations. For example, when multiple clients try to modify
the same job or when a single client issues a rapid sequence of status updates. As a result, messages could arrive in
a not necessarily linear order, possibly deviating from the client's expectation. However, the client can use the
(event)
id
andctime
fields to establish a natural ordering of events as emitted by wfx. - Unacknowledged Server-Sent Events (SSE): SSE operates on a one-way communication model and does not include an acknowledgment or handshake protocol to confirm message delivery. This design choice aligns with the fundamental principles of SSE but does mean that there's a possibility some events may not reach the intended subscriber (which the client can possibly detect by keeping track of SSE event IDs).
- Event Stream Orchestration: Each wfx instance only yields the events happening on that instance. Consequently, if there are multiple wfx instances, a consolidated "global" event stream can only be assembled by subscribing to all wfx instances (and aggregating the events).
- Browser Connection Limits for SSE: Web browsers typically restrict the number of SSE connections to six per domain. To overcome this limitation, HTTP/2 can be used, allowing up to 100 connections by default, or filter parameters can be utilized to efficiently manage the connections.
wfx allows server-side response content filtering prior to sending the response to the client so to tailor it to client information needs.
For example, if a client isn't interested in the job's history, it may request to omit it in wfx responses ― which also saves bandwidth.
To this end, clients send a custom HTTP header, X-Response-Filter
, with a jq
-like expression value.
For example, assuming a job with ID 1 exists,
curl -s -f http://localhost:8080/api/wfx/v1/jobs/1/status \
-H "X-Response-Filter: .state"
returns the current state
of the job as a string.
Note that the (filtered) response might no longer be a valid JSON expression as is the case in this example. It's the client's responsibility to handle the filtered response properly ― which it asked for being filtered in the first place.
wfx includes an internal health check service that's accessible at /health
, e.g., via
curl http://localhost:8080/health
in standard configuration or, alternatively,
wfxctl health
The version of wfx running is accessible at /version
, e.g., via
curl http://localhost:8080/version
in standard configuration or, alternatively,
wfxctl version
To assist a secure deployment, both, the northbound operator/management interface and the southbound client interface, are isolated from each other by being bound to distinct ports so that, e.g., a firewall can be used to steer and restrict access.
While this separation provides a basic level of security, it doesn't prevent clients interfering with each other:
For example, there is no mechanism in place to prevent a client A
from updating the jobs of another client B
.
This is a deliberate design choice following the Unix philosophy of "Make each program do one thing well".
It allows for flexible integration of wfx into existing or new infrastructure.
Existing infrastructure most probably has request authentication and authorization measures in place.
For new infrastructure, such a feature is likely better provided by specialized components/services supporting the overall deployment security architecture.
Thus, for productive deployments, a deployment along the lines of the following figure is recommended with an API Gateway subsuming the discussed security requirements and performing access steering with regard to, e.g., client access.
┌──────────────────┐
│ Operator / │
│ Management │
└──────────────────┘
│
│ Request Authentication & Authorization
▼
┌──────────────────┐
│ API Gateway │
└──────────────────┘
│
│ Northbound: Management API
▼
┌──────────────────┐
│ wfx │
└──────────────────┘
▲
│ Southbound: Client API
│
┌──────────────────┐
│ API Gateway │
└──────────────────┘
▲
│ Request Authentication & Authorization
│
┌──────────────────┐
┌┴─────────────────┐│
│ Client ├┘
└──────────────────┘
wfx offers a flexible (out-of-tree) plugin mechanism for extending its request processing capabilities. A plugin functions as a subprocess, both initiated and supervised by wfx. Communication between the plugin and wfx is facilitated through the exchange of flatbuffer messages over stdin/stdout, thus permitting plugins to be developed in any programming language.
Due to the potential use of plugins for authentication, it is critical that all requests are passed through the plugins before further processing and that that no request can slip through without being processed by the plugins. This has led to the following deliberate design choices:
- Should any plugin exit (e.g., due to a crash), wfx is designed to terminate gracefully. While it might be feasible for wfx to attempt restarting the affected plugins, this responsibility is more suitably handled by a dedicated process supervisor like systemd. The shutdown of wfx enables the process supervisor to restart wfx, which, in turn, starts all its plugins again.
- All plugins are initialized before wfx starts processing any requests. In particular, after the completion of wfx's startup phase, it's not possible to add or remove any plugins.
- Plugins are expected to function properly. Specifically, if a plugin returns an invalid response type or an unexpected response (for example, in response to a request that was never sent to the plugin), wfx will terminate gracefully. This is because such behavior usually indicates a misconfiguration. The overall strategy is to fail fast and early.
To use plugins at runtime, wfx must be compiled with the plugin
tag (enabled by default) and started with the
--mgmt-plugins-dir
resp. --client-plugins-dir
flag, specifying a directory containing the plugins to be used. This
enables the use of different plugin sets for the north- resp. southbound API.
Note: In a plugin directory, all executable files (including symlinks to executables) are assumed to be plugins. Non-executable files, like configuration files, are excluded. For deterministic behavior, plugins are sorted and executed in lexicographic order based on their filenames during the startup of wfx.
Communication between wfx and a plugin is achieved by exchanging flatbuffer messages via
stdin/stdout. The flatbuffer specification is available in the fbs directory.
A plugin can use stderr
for logging purposes (stderr
is forwarded and prefixed by wfx).
For every incoming request, wfx generates a unique number called cookie
. The cookie
, along with the complete request
(e.g., headers and body in the case of HTTP), is written to the plugin's stdin. The plugin then sends its response,
paired with the same cookie
, back to wfx by writing to its stdout. This cookie
mechanism ensures that wfx can
accurately associate responses with their corresponding requests.
Technical Note: Cookies are represented as unsigned 64-bit integers, which may lead to wraparound. This means there is a slight possibility that a cookie could be reused for more than one request over the lifespan of a plugin. However, this event occurs only once every 2^64 requests. By the time such a reuse might happen, the original request associated with the cookie would have already timed out.
Note:
- It is crucial for the plugin to read data from its stdin descriptor promptly to prevent blocking writes by wfx. The
cookie
mechanism facilitates (and encourages!) asynchronous processing. - The working directory for the plugin process is the same as the working directory of wfx, which is the directory from which wfx was launched.
Based on the plugin's response, wfx can:
- Modify the incoming request before it undergoes further processing by wfx in the usual manner.
- Send a preemptive response back to the client, such as a "permission denied" or "service unavailable" message.
- Leave the request unchanged.
Plugins are typically used for:
- Enforcing authentication and authorization for API endpoints.
- Handling URL rewriting and redirection tasks.
An example plugin written in Go demonstrates denying access to the /api/wfx/v1/workflows
endpoint.
No telemetry or user data is collected or processed by wfx.
Note that there is an indirect dependency on go.opentelemetry.io/otel
via Go OpenAPI by the client runtime (as used by wfxctl
).
Telemetry is deliberately turned off in wfx.
See Add support for tracing via OpenTelemetry for details.
wfx has been designed with performance and horizontal scalability in mind.
To regress-test and gauge the performance of wfx in particular scenarios, the wfx-loadtest
tool stressing the REST API
can be helpful.
It's build by default alongside the other wfx binaries, see Section Building wfx.
As an example, the following commands execute a benchmark of the default SQLite persistent storage:
wfx --log-format json --log-level warn &
wfx-loadtest --log-level warn --duration 60s
Note: In the above example, the log format is JSON since pretty-printing is an expensive operation.
The benchmark result including statistics is printed to the terminal after 60 seconds and also available in the
results
directory for further inspection.
*******************************************************************************
Summary
*******************************************************************************
Requests [total, rate, throughput] 6000, 100.02, 100.02
Duration [total, attack, wait] 59.987s, 59.986s, 792.516µs
Latencies [min, mean, 50, 90, 95, 99, max] 187.363µs, 612.69µs, 602.423µs, 744.281µs, 809.622µs, 1.062ms, 7.242ms
Bytes In [total, mean] 4041243, 673.54
Bytes Out [total, mean] 81693, 13.62
Success [ratio] 100.00%
Status Codes [code:count] 200:5625 201:375
Error Set:
and is to be interpreted as follows: There were 6,000 total requests at a rate of 100.02 requests per second. In terms of latency, the minimum response time was 187.363 microseconds, the mean was 612.69 microseconds, and the 99th percentile was 1.062 milliseconds. All requests were successful with a success ratio of 100%. The status codes indicate that a total of 5,625 status updates were sent with HTTP error code 200 and 375 jobs were created with HTTP error code 201. No errors were reported.
The latency over time distribution is illustrated in the following figure: