Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion hips/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,4 +34,5 @@ restricted markdown format and can be found in the
- [hip-0023: Utilize Server Side Apply for installs/upgrades](hip-0023.md)
- [hip-0024: Improve the Helm Documentation](hip-0024.md)
- [hip-0025: Better Support for Resource Creation Sequencing](hip-0025.md)
- [hip:0026: H4HIP: Wasm plugin system](hip-0026.md)
- [hip-0026: H4HIP: Wasm plugin system](hip-0026.md)
- [hip-0027: Handling unknown hook-delete-policy values](hip-0027.md)
181 changes: 181 additions & 0 deletions hips/hip-0027.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,181 @@
---
hip: 0027
title: "Handling unknown hook-delete-policy values"
authors: [ "Marcin Owsiany <[email protected]>" ]
created: "2025-07-23"
type: "feature"
status: "draft"
---

## Abstract

Currently, unknown _values_ of `helm.sh/hook-delete-policy` annotations are silently permitted.
They result in the given hook _never_ being deleted.
However this behaviour seems to be by coincidence, rather than intentional.
This HIP proposes how to handle them going forward.

## Motivation

There are [a number of _documented_ hook deletion policies][Docs].
All of them result in hook deletion _at some point_.
A default hook delete policy (namely `before-hook-creation`)
is [applied when the list of hook policies for a resource is empty][Code].

However, when an _unrecognized_ value (e.g. `badger`) is specified:
* Helm does not complain in any way, and
* a hook resource annotated this way is **never deleted**.

It seems that the current behaviour is a coincidence or mistake, rather than intended.

## Rationale

In line with [Hyrum's Law][Hyrum's Law], at least [one project][StackRox chart] depends on the current behaviour.
Namely, it specifies a hook deletion policy of `never` on most of its storage-related resources
(such as `PersistentVolumeClaim`).
These resources are applied as hooks, in order to guard against data loss,
see [appendix A](#appendix-a-why-create-storage-resources-as-hooks) if you would like to know more.

I was able to find [one more unrelated chart][Anance personal chart] that also specifies the same value.

Some charts also allow the users to specify the policy using a helm var,
and it's not clear whether they validate the value before using it in a template.

The current Helm behaviour in this case is unspecified.
There is a concern that a change in the current implementation
(for example applying the default policy more aggressively) could cause a catastrophic data loss.

## Specification

This proposal:
- suggests keeping status quo for helm 3, for the sake of backwards compatibility,
- outlines a few options for helm 4.

### For Helm 3

- No semantic changes to production code.
- Add a regression test to make sure that the current behaviour is not changed by mistake.
This is probably easiest to do by extracting the defaulting code to a separate unexported function and
adding a unit test for the function.

### For Helm 4

There are a few dimensions in which we can make a change, that are somewhat interconnected:
- whether to accept a deletion disabling policy at all, or break compatibility and reject it,
- if accepted, whether support it officially (documented, etc), or deprecate it,
- what to do about (other) undocumented hook deletion policy values,

#### 1. Official support for a policy which disables hook deletion

Add a new, documented hook deletion policy value: `never`.
When specified, such hook resource is not deleted.
Effectively, the same as suggestion for Helm 3, but explicitly supported.

Pros:
- Keeps compatibility with Helm 3.
- Allows a notion of resources which are installed by helm, but afterwards not managed by it in any way.
- StackRox chart continues to work as before without changing.
- Other charts (if any) which happen depend on current undocumented behaviour could easily be made to work by changing
whatever value they use to `never`.

Cons:
- It seems that hooks were generally intended to be resources whose previous instances are cleaned up
before (subsequent) chart applications. Supporting this policy breaks this assumption and
introduces some complexity when reasoning about possible scenarios. For example such `pre-install`
hook resources need to be guarded by a `if $.Release.IsInstall` condition in order not to break upgrades.
Comment on lines +67 to +84

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer this one and would call it a bug fix. I think the con you listed, while valid, is probably the expected behavior for anyone that was previously using a not-supported string. This would just make it more explicit.


#### 2. Keep the status quo

The same as described above for Helm 3.

Pros:
- Keeps compatibility with Helm 3.
- StackRox chart continues to work as before without changing.

#### 3. Reject all undocumented hook deletion policies (including `never`)

Make it an error to use undocumented hook deletion policies, including `never`.

Pros:
- Mental model for hook deletion stays simple: they are always deleted (at _some_ point).

Cons:
- Breaks compatibility with Helm 3. In particular:
- StackRox helm chart stops working as is. This would most likely force StackRox to stop supporting direct usage of `helm`
for installation; StackRox operator would still use Helm library internally, and manage the storage resources differently,
as it does now.

#### 4. Reject undocumented policies (_other than_ `never`)

Make it an error to use undocumented hook deletion policies, except `never` (which would be treated
as described in point either 1 or 2).

Pros:
- _Mostly_ keeps compatibility with Helm 3. Technically breaks it, but unlike (3),
there is a migration path for charts which use an undocumented value other than `never` -
they need to change it to `never`).
- A little easier to reason about the behaviour, since there are fewer options.

Cons:
- If an older chart exists out there that uses an undocumented value other than `never`,
then its legacy versions would be incompatible with Helm 4.

#### 5. Warn about undocumented policies

Add a linter check that complains if an undocumented value for hook deletion policy is used.

Pros:
- Raises awareness about this issue.

Cons:
- None?
Comment on lines +124 to +130

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also be true.


## Backwards compatibility

Described above, this HIP is all about compatibility.

## Security implications

None.

## How to teach this

Documentation and linter concerns mentioned above.

## Reference implementation

N/A ATM.

## Rejected ideas

N/A

## Open issues

See alternative points for Helm 4 above.

## Appendix A: Why create storage resources as hooks?

Disclaimer: some of the reasoning below are conjectures since the motivation is lost in the mists of time.

A Helm chart was introduced for the StackRox project around 2020.
At that time, there was a strong concern that a user mistake might lead to data loss,
in case the `PersistentVolume` or `PersistentVolumeClaim` were deleted in case the helm release was deleted accidentally.

Because of that it looks like all possible safeguards were applied, including `helm.sh/resource-policy: keep`
**and** `helm.sh/hook: pre-install,pre-upgrade` together with `helm.sh/hook-delete-policy: never`.
I do not know why this undocumented value was chosen.
I assume it might have been a mistake that was not caught until recently, since the result worked as desired.

According to [the documentation][Resource policy docs], using `helm.sh/resource-policy: keep` alone
would be sufficient to prevent deletion on uninstallation.
However, it is not completely clear (to me), given how helm behaves during `--force` rollbacks.
Deleting and recreating a `PersistentVolumeClaim` in that case would lead to data loss.

## References

- [Docs]: https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies
- [Code]: https://github.com/helm/helm/blob/bd897c96fbaf7546d6a5c57be009f16f9d38d6de/pkg/action/hooks.go#L47
- [Resource policy docs]: https://helm.sh/docs/howto/charts_tips_and_tricks/#tell-helm-not-to-uninstall-a-resource
- [StackRox chart]: https://github.com/stackrox/stackrox/tree/master/image/templates/helm/stackrox-central
- [Anance personal chart]: https://github.com/ananace/personal-charts/blob/0e60e96373c8d827c0723ec0bfaa336bd09ffb35/charts/matrix-synapse/templates/signing-key-job.yaml#L176
- [Hyrum's Law]: https://www.hyrumslaw.com/