Add rfc for script run stage #4603

Merged 3 commits on Dec 26, 2023
208 changes: 208 additions & 0 deletions docs/rfcs/0011-script-run-stage.md
@@ -0,0 +1,208 @@
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- Target Version: (1.x / 2.x)

# Summary

This RFC introduces "script run stages", a new kind of stage that lets users execute any commands in their pipelines.

# Motivation

Currently, users can only use the stages that PipeCD already defines. However, some users want to define new stages for their own use cases, such as:

- Deploying infrastructure with tools other than those PipeCD supports (Terraform and Kubernetes), such as SAM or CloudFormation
- Running end-to-end tests
- Interacting with external systems
- Performing database migrations
- Notifying others of the deployment result

`CUSTOM_SYNC` was implemented for the above use cases, but it is a stage for syncing.
Some users simply want to execute commands.

# Detailed design

## feature

1. Execute any commands in the pipeline.

```yaml
apiVersion: pipecd.dev/v1beta1
kind: LambdaApp
spec:
  encryptedSecrets:
    password: encrypted-secrets
  pipeline:
    stages:
      - name: SCRIPT_RUN
        with:
          env:
            AWS_PROFILE: default
          runs:
            - "echo {{ .encryptedSecrets.password }} | sudo -S su"
            - "sam build"
            - "sam deploy -g --profile $AWS_PROFILE"
```
2. Combine with other stages.
Unlike `CUSTOM_SYNC`, this stage can be combined with other stages.
For example:
```yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
      - name: WAIT_APPROVAL
        with:
          timeout: 30m
      - name: K8S_PRIMARY_ROLLOUT
      - name: K8S_CANARY_CLEAN
      - name: SCRIPT_RUN
        with:
          env:
            SLACK_WEBHOOK_URL: ""
          runs:
            - "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"successfully deployed!!\"}' $SLACK_WEBHOOK_URL"
```
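
The examples above list commands under `runs` and variables under `env`. As a rough illustration of how such a stage could behave, here is a minimal Python sketch. It is not piped's implementation: the function name `run_script_stage` is hypothetical, and stopping at the first failing command is an assumption that this RFC does not state.

```python
import os
import subprocess
from typing import List, Mapping, Sequence, Tuple


def run_script_stage(runs: Sequence[str], env: Mapping[str, str]) -> List[Tuple[str, int]]:
    """Run each command in order through the shell.

    Returns (command, exit_code) pairs for the commands that were actually run.
    """
    # Stage-level env entries are layered on top of the runner's own environment.
    merged_env = {**os.environ, **env}
    results = []
    for command in runs:
        completed = subprocess.run(command, shell=True, env=merged_env)
        results.append((command, completed.returncode))
        if completed.returncode != 0:
            # Assumption: a failing command fails the stage and skips the rest.
            break
    return results
```

With this shape, a stage counts as successful only when every entry in `runs` was executed and returned exit code 0.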
## when to rollback
Users can define the commands to execute on rollback with `onRollback`.
If `onRollback` is not set, nothing is executed on rollback.

```yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  pipeline:
    stages:
      - name: SCRIPT_RUN
        with:
          env:
            SLACK_WEBHOOK_URL: ""
          runs:
            - "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"successfully deployed!!\"}' $SLACK_WEBHOOK_URL"
          onRollback:
            - "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"failed to deploy: rollback\"}' $SLACK_WEBHOOK_URL"
```

**The SCRIPT_RUN stage is also rolled back** when the deployment status is `DeploymentStatus_DEPLOYMENT_CANCELLED` or `DeploymentStatus_DEPLOYMENT_FAILURE`, even when other rollback stages are also executed.

For example, here is a deploy pipeline that combines `SCRIPT_RUN` with other k8s stages.
If the resulting status of the pipeline is FAIL or CANCELED, piped rolls back the stages `K8S_CANARY_ROLLOUT`, `K8S_PRIMARY_ROLLOUT`, and `SCRIPT_RUN`.

```yaml
apiVersion: pipecd.dev/v1beta1
kind: KubernetesApp
spec:
  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
      - name: WAIT_APPROVAL
        with:
          timeout: 30m
      - name: K8S_PRIMARY_ROLLOUT
      - name: K8S_CANARY_CLEAN
      - name: SCRIPT_RUN
        with:
          env:
            SLACK_WEBHOOK_URL: ""
          runs:
            - "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"successfully deployed!!\"}' $SLACK_WEBHOOK_URL"
          onRollback:
            - "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"failed to deploy: rollback\"}' $SLACK_WEBHOOK_URL"
```
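
To make the rollback rule concrete, the following Python sketch selects the `onRollback` commands for a pipeline like the one above. It is only an illustration under stated assumptions: the function name `plan_script_rollbacks` and the dict layout are hypothetical, the status strings are shortened forms of the enum names quoted in this RFC, and producing one rollback step per `SCRIPT_RUN` stage in pipeline order (rather than one merged script) is an assumption, not a settled design.

```python
from typing import Any, Dict, List

# Deployment statuses that trigger rollback, per the RFC text (names shortened).
ROLLBACK_STATUSES = {"DEPLOYMENT_FAILURE", "DEPLOYMENT_CANCELLED"}


def plan_script_rollbacks(stages: List[Dict[str, Any]], status: str) -> List[List[str]]:
    """Return one list of rollback commands per SCRIPT_RUN stage, in pipeline order."""
    if status not in ROLLBACK_STATUSES:
        return []
    plan = []
    for stage in stages:
        if stage.get("name") != "SCRIPT_RUN":
            continue  # other stages are rolled back by their own rollback logic
        commands = stage.get("with", {}).get("onRollback", [])
        if commands:
            # Stages without onRollback contribute nothing to the plan.
            plan.append(list(commands))
    return plan
```

Keeping each stage's commands in a separate list means a failure can be logged against the stage that defined it.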

Member

Some other questions I think we should define before starting:

  • Do we support combining multiple script_run stages?
  • How will the user-defined onRollback be run when multiple script_run stages are combined?
  • How will the user-defined onRollback be run in combination with the XXX_ROLLBACK stage? I mean, if a k8s app is rolled back and the pipeline contains script_run, how do we perform the rollback process? Should it be XXX_ROLLBACK first, then SCRIPT_RUN_ROLLBACK?

Member Author

@ffjlabo ffjlabo Oct 25, 2023

Thank you for your questions!
I will cover these points in the RFC, but in the meantime, here are my thoughts.

> Do we support combining multiple script_run stages?

Yes, I think so.

> How will the user-defined onRollback be run when multiple script_run stages are combined?

Multiple script_run stages are executed in the order in which they are defined.

> How will the user-defined onRollback be run in combination with the XXX_ROLLBACK stage? I mean, if a k8s app is rolled back and the pipeline contains script_run, how do we perform the rollback process? Should it be XXX_ROLLBACK first, then SCRIPT_RUN_ROLLBACK?

I haven't thought about it yet. I will check the implementation and propose something 🙇

Member

@khanhtc1202 khanhtc1202 Oct 26, 2023

> Multiple script_run stages are executed in the order in which they are defined.

What about those stages' rollback scripts? In which order should they run, and which option will we choose (or neither)?

  1. xxx_canary -> script_run -> script_run -> rollback -> script_rollback (merge all script rollback)
  2. xxx_canary -> script_run -> script_run -> rollback -> script_rollback -> script_rollback

Member Author

I think it will be option 2 (but I don't know yet whether it is technically feasible 🙇):
xxx_canary -> script_run -> script_run -> rollback -> script_rollback -> script_rollback

Member Author

I think it would be better not to merge the rollback scripts, because keeping them separate makes it easier to handle or log errors when executing each rollback script.

## prepare environment for execution

Commands are executed in the piped container, or on the host OS when piped runs standalone.
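
Because commands run wherever piped runs, a tool such as `curl` is only usable if it is installed in that container or on that host. The sketch below shows one way a preflight check could look. It is purely illustrative: the helper name `missing_executables` is hypothetical, and it inspects only the first word of each command, so tools used after a pipe are not checked and shell builtins may be reported as missing.

```python
import shlex
import shutil
from typing import List


def missing_executables(runs: List[str]) -> List[str]:
    """Return the leading executable of each command that is not found on PATH."""
    missing = []
    for command in runs:
        tokens = shlex.split(command)
        if not tokens:
            continue  # skip empty commands
        executable = tokens[0]
        if shutil.which(executable) is None and executable not in missing:
            missing.append(executable)
    return missing
```

A check like this could surface a missing tool before the stage starts instead of in the middle of a deployment.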

## CUSTOM_SYNC in the future

The "CUSTOM_SYNC" stage will be deprecated because "SCRIPT_RUN" provides similar features.
I expect users to use "SCRIPT_RUN" to execute any commands before or after other stages.


# Alternatives

## What's the difference between "CUSTOM_SYNC" and "SCRIPT_RUN"?

"CUSTOM_SYNC" is a stage for **syncing**, whereas "SCRIPT_RUN" is a stage for **executing commands**.

## How about other CD tools?

### Argo

#### strategy

- **Execute commands with other k8s resources**

#### details

Resource Hooks
https://argo-cd.readthedocs.io/en/stable/user-guide/resource_hooks/#resource-hooks
> Hooks are ways to run scripts before, during, and after a Sync operation. Hooks can also be run if a Sync operation fails at any point.

- There are four points at which commands can be executed:
  - PreSync: before sync
  - Sync: during sync
  - PostSync: after sync
  - SyncFail: when sync fails
- **To execute commands, ArgoCD applies k8s resources such as Job, Pod, or Argo Workflows.**
  - Users set annotations, and ArgoCD detects them to control the order in which commands are executed
  - e.g. https://argo-cd.readthedocs.io/en/stable/user-guide/resource_hooks/#using-a-hook-to-send-a-slack-message

#### pros/cons

**pros**
- Separates the responsibility for delivery from the responsibility for executing commands

**cons**
- Users need to prepare and manage the resources that execute the commands

### FluxCD
- FluxCD has no equivalent feature

### Flagger

#### strategy
- Call the APIs registered as webhooks at each point, and execute the commands inside those APIs
- Flagger itself only calls the APIs registered as webhooks.

#### details

**Webhooks**
https://fluxcd.io/flagger/usage/webhooks/#load-testing
>Flagger will call each webhook URL and determine from the response status code (HTTP 2xx) if the canary is failing or not.

- There are several webhook points:
  - confirm-rollout
  - pre-rollout
  - rollout
  - confirm-traffic-increase
  - confirm-promotion
  - post-rollout
  - rollback
  - event
- Flagger calls the webhooks at each point.
  - e.g. load testing with Flagger: https://fluxcd.io/flagger/usage/webhooks/#load-testing
- If users want to execute a command with a webhook, they set the command text in the `metadata` section and prepare a webhook handler that executes the command parsed from the metadata.
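
The webhook approach described above can be sketched as a small handler function. This is a hedged illustration, not Flagger's code: the function name `handle_webhook` is hypothetical, and reading the command from a `cmd` key inside `metadata` is an assumption modeled on Flagger's load-testing example. A real handler would sit behind an HTTP server and should restrict which commands it accepts.

```python
import subprocess
from typing import Any, Dict


def handle_webhook(payload: Dict[str, Any]) -> int:
    """Run the command found in payload["metadata"]["cmd"] and return an HTTP status."""
    metadata = payload.get("metadata") or {}
    command = metadata.get("cmd")  # assumed key name
    if not command:
        return 400  # nothing to run
    completed = subprocess.run(command, shell=True)
    # The caller judges success from the response status code (HTTP 2xx).
    return 200 if completed.returncode == 0 else 500
```

The status code is the only signal returned to the caller, which is why the command's exit code is mapped onto it directly.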

#### pros/cons

**pros**
- Separates the responsibility for delivery from the responsibility for executing commands

**cons**
- Users need to prepare an API for the webhooks.


# Unresolved questions

- It might be better to change the base image from alpine to debian or similar, so that popular commands (e.g. curl) are available to users by default.
- It might be better to add `runsOnRollback` to each sync stage if users want to control what happens on rollback.
This is just an idea.
Comment on lines +207 to +208
Member

I don't get this part. Do you mean that when multiple script_run stages are combined, we should add runsOnRollback to each? And is this runsOnRollback the same as onRollback (the one used in the example above)?

Member Author

@khanhtc1202
This means that one option might be to add onRollback to each stage instead of creating a new stage.

e.g. K8S_CANARY_ROLLOUT

  pipeline:
    stages:
      - name: K8S_CANARY_ROLLOUT
        with:
          replicas: 10%
          onRollback: "curl -X POST -H 'Content-type: application/json' --data '{\"text\":\"failed to deploy: rollback\"}' $SLACK_WEBHOOK_URL"

Member

Why does it come to this place 🤔 And also, how do we combine your idea with the current rollback logic? Do we have any plans for this?

Member Author

> Why does it come to this place 🤔 And also, how do we combine your idea with the current rollback logic? Do we have any plans for this?

It is just an idea, so I don't have a plan to realize it yet 🙇
But I thought it might be more useful to be able to set onRollback on each stage than to add a stage that executes any commands.
