NOTE: this repository contains both the monolith API backend code (inside api/) and its new port to updated patterns based on FastAPI (see ooniapi/).
The backend infrastructure performs multiple functions:
- Provide APIs for data consumers
- Instruct probes on what measurements to perform
- Receive measurements from probes, process them and store them in the database
- Upload new measurements to the S3 data bucket
- Fetch data from external sources, e.g. fingerprints from a GitHub repository
OONI Probes generally run once every hour or once every day, depending on the platform. The following sequence diagram shows what happens during a probe run:
sequenceDiagram
participant OONIProbe as OONI Probe
participant ProbeServices as OONI Backend
participant Internet
OONIProbe ->>+ Internet: lookupProbeMeta()
Internet ->>- OONIProbe: ProbeMeta
OONIProbe ->>+ ProbeServices: checkIn(ProbeMeta)
ProbeServices -->>- OONIProbe: []Targets
loop Every target
OONIProbe ->>+ Internet: runExperiment(target)
opt Control
OONIProbe ->>+ ProbeServices: runControl(target)
ProbeServices ->>- OONIProbe: CtrlMeasurement
end
Internet ->>- OONIProbe: Measurement
OONIProbe ->> ProbeServices: upload(Measurement)
end
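The probe run in the sequence diagram above can be sketched as a simple loop. This is only an illustration of the call order, not the probe's actual code: every callable passed to `run_probe` is a hypothetical stand-in for the step of the same name in the diagram, and the control request is issued before the experiment here for simplicity, whereas the diagram shows it happening while the experiment is in flight.

```python
from typing import Any, Callable, Dict, List, Optional

Measurement = Dict[str, Any]


def run_probe(
    lookup_probe_meta: Callable[[], Dict[str, Any]],
    check_in: Callable[[Dict[str, Any]], List[str]],
    run_experiment: Callable[[str, Optional[Any]], Measurement],
    upload: Callable[[Measurement], None],
    run_control: Optional[Callable[[str], Any]] = None,
) -> int:
    """Run one probe session and return the number of uploaded measurements.

    All parameters are hypothetical stand-ins for the diagram steps.
    """
    meta = lookup_probe_meta()       # lookupProbeMeta()
    targets = check_in(meta)         # checkIn(ProbeMeta) -> []Targets
    uploaded = 0
    for target in targets:           # loop "Every target"
        # The control step is optional ("opt Control" in the diagram).
        control = run_control(target) if run_control is not None else None
        measurement = run_experiment(target, control)
        upload(measurement)          # upload(Measurement)
        uploaded += 1
    return uploaded
```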
The following diagram, on the other hand, represents the main flow of measurement data.
The dark rectangles represent processes. The cylinders represent data at rest: files on disk, files on S3 or records in database tables.
flowchart LR
A(("Measurement")):::measurement --> B["Measurement is uploaded"]
B --> C["Fastpath (realtime)"]:::gray8Node & D["Disk Queue"]
C --> E["Fastpath Table"]:::gray3Node@{ shape: cyl}
D --> F["S3 Uploader (every hour)"]:::gray8Node
F --> G["s3://ooni-data-eu-fra bucket"]@{shape: cyl}
E --> H["OONI API"]:::gray8Node
D --> decision{"`is older than 1h?`"}
G --> decision
decision --> H
G --> PipelineV5["OONI Pipeline v5 (every day)"]:::gray8Node
PipelineV5 --> O["Observation Tables"]:::gray3Node@{ shape: cyl}
O --> H
classDef measurement fill:#0588cb,color:#fff
classDef gray2Node fill:#e9ecef,color:#000000
classDef gray3Node fill:#ced4da,color:#000000
classDef gray8Node fill:#343a40,color:#fff
Probes submit measurements to the API with a POST request to the following path: https://api.ooni.io/apidocs/#/default/post_report__report_id_ The measurement is decompressed if zstd compression is detected. It is then parsed, assigned a unique ID and saved to disk. Very little validation is done at this stage, so that all incoming measurements are accepted.
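The receive step described above might look roughly like the following sketch. It makes several assumptions that are not taken from the backend code: compression is detected by checking for the zstd frame magic number, the decompressor is passed in as a callable so the example needs only the standard library, and the names `accept_measurement`, `spool_dir` and the `<id>.json` file layout are hypothetical.

```python
import json
import uuid
from pathlib import Path
from typing import Callable, Optional

# Magic number that starts every zstd frame (0xFD2FB528, little-endian).
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"


def looks_like_zstd(body: bytes) -> bool:
    """Return True if the body starts with the zstd frame magic number."""
    return body[:4] == ZSTD_MAGIC


def accept_measurement(
    body: bytes,
    spool_dir: Path,
    decompress: Optional[Callable[[bytes], bytes]] = None,
) -> str:
    """Decompress if needed, parse, assign an ID and save to disk.

    Hypothetical sketch: apart from JSON parsing, no validation is done.
    """
    if looks_like_zstd(body):
        if decompress is None:
            raise ValueError("zstd body received but no decompressor given")
        body = decompress(body)
    measurement = json.loads(body)   # parsing is the only check performed
    msmt_uid = uuid.uuid4().hex      # assumed ID scheme, for illustration
    spool_dir.mkdir(parents=True, exist_ok=True)
    # One file per measurement in the disk queue.
    (spool_dir / f"{msmt_uid}.json").write_text(json.dumps(measurement))
    return msmt_uid
```

Passing the decompressor as an argument keeps the sketch independent of any particular zstd binding; a real deployment would plug in whichever library it already uses.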
Measurements are enqueued on disk using one file per measurement. At hourly intervals they are batched together, compressed and uploaded to S3 by the Measurement uploader ⚙. The batching is performed to allow efficient compression. See the dedicated subchapter ⚙ for details.
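As a minimal sketch of the batching idea, the function below packs every queued file into a single gzip-compressed tarball. The archive format, the function name `batch_spool` and the `*.json` layout are assumptions made for illustration; the upload to S3 and the cleanup of the queue are deliberately left out.

```python
import tarfile
from pathlib import Path
from typing import List


def batch_spool(spool_dir: Path, archive_path: Path) -> List[str]:
    """Pack all queued measurement files into one compressed archive.

    Returns the names of the files included. Hypothetical sketch: a real
    uploader would then send the archive to S3 and remove the queued
    files only after the upload succeeded.
    """
    files = sorted(spool_dir.glob("*.json"))
    if not files:
        return []
    # Compressing many small, similar files together gives a far better
    # ratio than compressing each measurement on its own.
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in files:
            tar.add(path, arcname=path.name)
    return [path.name for path in files]
```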
The measurement is also sent to the Fastpath ⚙. The Fastpath runs as a dedicated daemon with a pool of workers. It calculates scoring for the measurement and writes a record in the fastpath table. Each measurement is processed individually in real time. See the dedicated subchapter ⚙ below.
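The worker-pool pattern described above can be illustrated with the standard library. This is a sketch under stated assumptions, not the Fastpath implementation: it uses a thread pool for brevity, `score` and `write_row` are hypothetical callables standing in for the scoring logic and the insert into the fastpath table, and the `uid` key is an invented field name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable

Measurement = Dict[str, Any]


def process_stream(
    measurements: Iterable[Measurement],
    score: Callable[[Measurement], Dict[str, Any]],
    write_row: Callable[[Dict[str, Any]], None],
    workers: int = 4,
) -> int:
    """Score each measurement individually and write one row per measurement.

    Returns the number of measurements processed.
    """

    def handle(msmt: Measurement) -> None:
        # Each measurement is handled on its own, never as part of a batch.
        row = {"measurement_uid": msmt["uid"], **score(msmt)}
        write_row(row)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the map iterator also re-raises any worker exception.
        return sum(1 for _ in pool.map(handle, measurements))
```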
The disk queue is also used by the API to access recent measurements that have not been uploaded to S3 yet. See the measurement API 🐝 for details.
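The lookup order implied here (disk queue first, S3 otherwise) could be sketched as follows. The function name, the `<id>.json` file layout and the `fetch_from_s3` callable are hypothetical and only serve to show the fallback.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict

Measurement = Dict[str, Any]


def fetch_measurement(
    msmt_uid: str,
    spool_dir: Path,
    fetch_from_s3: Callable[[str], Measurement],
) -> Measurement:
    """Return a measurement from the disk queue, falling back to S3."""
    path = spool_dir / f"{msmt_uid}.json"
    try:
        # Recent measurements are still sitting in the disk queue.
        return json.loads(path.read_text())
    except FileNotFoundError:
        # Older measurements have already been batched and uploaded.
        return fetch_from_s3(msmt_uid)
```

Catching `FileNotFoundError` rather than checking for the file first avoids a race with an uploader that removes the file between the check and the read.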
The measurement processing pipeline is meant to generate outputs that third parties, such as external researchers and other organizations, can reproduce on their own.
This keeps OONI accountable and serves as proof that we do not arbitrarily delete or alter measurements, and that we score them as accessible/anomaly/confirmed/failure in a predictable and transparent way.
Important: the only exceptions were due to privacy breaches that required removal of the affected measurements from the S3 data bucket 💡.
As such, the backend infrastructure is FOSS and can be deployed by 3rd parties. We encourage researchers to replicate our findings.
Incoming measurements are minimally altered by the Measurement uploader ⚙ and uploaded to S3.