Repour archives source code from many untrusted origins into trusted git repositories. It supports git. Tools like PME can optionally be applied as adjust
ments, and the results archived alongside the unmodified sources.
Why the name "Repour"? First, it contains the common short word for "repository" ("repo"); second, the action of "repouring" could be used to describe what the program does to the repositories.
Repour is not a source repository mirroring service, it archives source code for builds. This may sound the same but there is a big difference. Repour solves an entire class of problems surrounding the unreliability of "SCM coordinates" (a SCM url and commit/reference/revision), that a mirror is not able to solve.
A mirror can only be an exact copy of the upstream repository. Critically, any modifications to existing objects must be propagated, or the mirror ceases to be a mirror. If the upstream repository is not history protected, then neither the upstream repository or the mirror are suitable to build from. If the upstream repository is history protected, then there is no need for a mirror (except to ensure server availability). However, since virtually all upstreams are not history protected, or only weakly protected, a different solution is required.
How can the mutability conflict be handled if a mirror can’t solve it? Strip everything back to the minimum required and go from there.
The only firm requirement of storing source code for builds is that the same coordinates, point to the same tree of files, forever. Everything else is a nice-to-have, including the upstream history.
This section describes how Repour is intended to be employed within the collection of services that form a build system.
Repour does not mirror or in any way store more from the origins than the files that are required to build from. In exchange Repour provides these guarantees:
The git URL and tag Repour returns will always be valid, and will always point to the same file tree. This is an important part of ensuring builds can be reproduced.
For the same name
and file tree, the same tag will be returned, as was returned the first time the tree was seen. This can be used to avoid rebuilding the same sources.
Repour needs a git repository provider that grants it the following permissions:
-
Create repositories
-
Clone repositories
-
Push new tags
-
Push commits to new branches
-
Push fast-forward commits to existing branches
Also, the repository provider must not grant the following permissions to Repour and all other users:
-
Delete repositories
-
Delete tags or push existing tags (re-tagging)
-
Push non-fast-forward commits (force push)
In this configuration, commits and tags are immutable in the repository, i.e. history is protected.
Since Repour is probably not exposed at the edge of the build system, other parts will have access to the raw SCM coordinates before Repour processes them. For the vast majority of cases, the rest of the build system should not attempt to interpret the raw SCM coordinates. The coordinates should be merely passed along, so there is no duplication of effort, or bad assumptions made about mutability.
Once Repour is called, it returns a set of git coordinates that can be used by the rest of the build system. See What Repour will do and not do.
When a builder host gets the sources, it won’t communicate with Repour, but with the git repository provider where Repour has stored the sources.
The builder should not assume there is a default branch (master
for example) in the repository, there won’t be one there, as the repository is a collection of orphan branches.
Repour specifically returns a tag instead of a commit ID so the builder can perform a fast shallow clone. Cloning the full repository is of no benefit to the build, as it uses only the single file tree. Also, the full clone time will grow linearly with the number of unique file trees stored in the repository. This is the recommended git command to use:
git clone --depth 1 --branch $tag $readonlyurl
Note that --branch
is only usable with references (tags or branches), not commit IDs. The builder typically should not clone from a branch name as this is inherently mutable.
Checkout a git ref from the origin repo and push it to the target repo. If the ref is a branch, the branch will be pushed to the target repo. If the ref is a tag, the tag will be pushed to the target repo. If the ref is a SHA, then a branch will be created with name 'branch-<ref>' and pushed as a branch to the target repo. The latter is required to get a particular commit to the target repo.
You can specify an 'buildType' option to set if you want to run PME/GME/project-manipulator or not on it. You can also specify the 'adjustParameters'. The result of the PME run, together with the tag information is found in the response, under key 'adjust_result'. The value of 'adjust_result' is the same as for '/adjust'.
'buildType' can be 'MVN', 'GRADLE', and 'NODEJS'.
The response will contain the exact ref name as what was sent
URL |
/adjust |
||||||||
---|---|---|---|---|---|---|---|---|---|
Request |
|
||||||||
Response (Success) |
|
||||||||
Response (Invalid request body) |
|
||||||||
Response (Processing error) |
|
If the ref to align is a merge-request (Gitlab, with format '/merge-requests/<number>') or a pull-request (Github with format '/pull/<number>'), with sync switched on, the merge-request is checkout into a temporary branch, then the alignment tool is run, and the results are added in a commit and pushed in a tag that starts with 'Pull_Request-'. The latter is done to not create any confusing that the tag is from a merge-request.
With sync on for a merge-request, the checkout is not pushed into the downstream repository in a branch, unlike for a regular ref. The commits in the checkout are indirectly present via the 'Pull_Request-<tag>' only.
Checkout a git ref from the origin repo and force push it to the target repo. If ref is not a branch name, new branch named branch-{ref} pointing to the ref will be pushed instead. The response will contain the resulting ref name.
URL |
/clone |
||||||
---|---|---|---|---|---|---|---|
Request |
|
||||||
Response (Success) |
|
Create a SCM repository on Gerrit.
URL |
/internal-scm |
||||||
---|---|---|---|---|---|---|---|
Request |
|
||||||
Response (Success) |
|
All endpoints can operate in callback mode, which is activated by defining the optional callback
parameter. In this mode an immediate response is given instead of waiting for the required processing to complete.
A request that does not pass the initial validation check will return the documented "Invalid request body" response. Otherwise, the following response will be sent:
Status |
202 |
---|---|
Content-Type |
application/json |
Body (Schema) |
{
"callback": {
"id": str,
},
} |
Body (Example) |
{
"callback": {
"id": "YQSQOIGKB3TPJPB7Q6UARPULTASTXW7WOZF2JZCXLGQCBYSE"
}
} |
The body of the usual "Success" or "Processing error" response will then be sent at a later time, as an HTTP request to the URL specified in the callback
request parameter. A "callback" object will be added, containing the status code and the ID string previously returned.
Method |
POST (by default, or PUT if so specified) |
---|---|
Content-Type |
application/json |
Body (Schema) |
{
object: object,
"callback": {
"status": int,
"id": str,
},
} |
Body (Example) |
{
"branch": "pull-1439285353",
"tag": "pull-1439285353-root",
"url": {
"readwrite": "file:///tmp/repour-test-repos/example",
"readonly": "file:///tmp/repour-test-repos/example"
},
"callback": {
"status": 200,
"id": "YQSQOIGKB3TPJPB7Q6UARPULTASTXW7WOZF2JZCXLGQCBYSE"
}
} |
The docker images can be run in plain Docker or OpenShift. Some less-than-ideal design choices were made to fit the applications into the OSE-compatible containers:
-
pid1.py
is the entrypoint of both images, and remains running for the life of the container. It works around the "Docker PID1 Zombie Problem" by reaping adopted children in addition to the primary child defined by its arguments. -
au.py
runs second in both images, but finishes with an exec call, so it doesn’t remain running. It detects if the container UID has been forced to a non-existing user (as OpenShift does). If so, it activatesnss_wrapper
so git and ssh can continue to operate.-
The HTTP and SSH servers can’t be split into seperate images because OSE does not allow containers to share persistent volumes
-
The lack of shared persistent volumes in OSE also means the container is not scalable
-
The configuration can’t be included in the image because the working directory is intended to be the persistent volume mount, which will start empty in OSE.
-
Install poetry
in your machine.
To install all the dependencies, run:
poetry install
To add a new dependency:
poetry add <dependency>
To update to latest versions:
poetry update
To get a (virtualenv) shell with the dependencies installed:
poetry shell
To run a command inside the virtualenv with the dependencies installed:
poetry run 'python3 ...'
To run pre-commit checks and test, run:
poetry run tox
poetry run python3 -m repour run-container
For more information, add the -h
switch to the command.
To set the log level, run:
poetry run python3 -m repour -l DEBUG run-container
By default the log level is 'INFO'. Use any of the logging level defined here
To run Repour with multiple replicas, you have to define a shared folder (defined with env var SHARED_FOLDER) where all the replicas can write / read to.
There is 1 case to consider when running multiple instances of Repour:
-
cancel operation
When a request to cancel a task comes in, we first check if the task is present in the repour server which got the request.
If yes → cancel that task and report back to the client
If no → - Create an indicator file in a shared location to indicate that we want to cancel a task - Verify every PERIOD_CANCEL_LOOP_SLEEP seconds, for up to 10 times, if the indicator file got deleted - if yes → the cancel was successful - if no → cancel was unsuccessful. Delete the indicator file and tell the caller the cancel operation was unsuccessful
-
that indicator file is seen by other repour replicas (via the shared location) in the 'start_cancel_loop' - the other repour replicas check their event loop, and if they found and cancelled the task, they delete the indicator file
The 'start_cancel_loop' runs every PERIOD_CANCEL_LOOP_SLEEP seconds to check
We currently monitor endpoints requests and error by using the Prometheus client, exposing the metrics in the /metrics
endpoint. We can optionally export the data to Graphite if environment variable GRAPHITE_SERVER
and GRAPHITE_KEY`
are defined. GRAPHITE_KEY
is the prefix for the data, and is usually set to the url of the server.
You can also override the default port for the Graphite server with GRAPHITE_PORT
The metrics monitored are:
-
time request of
/adjust
,/clone
,/cancel
,/external-to-internal
,/
, and sending result to callback urls-
This covers latency and traffic
-
-
errors:
-
validation json error
-
400 response
-
500 response
-
can’t send response to callback
-
-
CPU, memory, GC
-
Covers saturation
-
Repour can send logs to a Kafka server if and only if the appropriate settings are defined as env variables:
-
REPOUR_KAFKA_SERVER (format <host>:<port>)
-
REPOUR_KAFKA_TOPIC
-
REPOUR_KAFKA_SASL_USERNAME
-
REPOUR_KAFKA_SASL_PASSWORD
-
REPOUR_KAFKA_CAFILE (location of the ca file to talk to kafka)
-
REPOUR_KAFKA_CONNECTIONS_MAX_IDLE_MS (optional)
-
REPOUR_KAFKA_SASL_MECHANISM (optional; default is PLAIN)
If both REPOUR_KAFKA_SASL_USERNAME and REPOUR_KAFKA_SASL_PASSWORD are defined, then Repour authenticates to Kafka using the SASL_PLAIN method. Otherwise it’ll use the kafka-cafile to authenticate with authentication 'SSL'
The following environment variable needs to be defined for authentication for callback: - REPOUR_OIDC_SERVICE_ACCOUNT_SECRET
We are now switching to our own service account for any callback
The content of this repository is released under the ASL 2.0, as provided in the LICENSE file. See the NOTICE file for the copyright statement and a list of contributors. By submitting a "pull request" or otherwise contributing to this repository, you agree to license your contribution under the license identified above.