From 372a7b265185740c009a49ac59558d1914f2f4e7 Mon Sep 17 00:00:00 2001 From: Marcin Owsiany Date: Wed, 23 Jul 2025 10:50:38 +0200 Subject: [PATCH 1/4] hip-9999: Handling unknown hook-delete-policy values Signed-off-by: Marcin Owsiany --- hips/hip-9999.md | 181 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 181 insertions(+) create mode 100644 hips/hip-9999.md diff --git a/hips/hip-9999.md b/hips/hip-9999.md new file mode 100644 index 00000000..25b9e36f --- /dev/null +++ b/hips/hip-9999.md @@ -0,0 +1,181 @@ +--- +hip: 9999 +title: "Handling unknown hook-delete-policy values" +authors: [ "Marcin Owsiany " ] +created: "2025-07-23" +type: "feature" +status: "draft" +--- + +## Abstract + +Currently, unknown _values_ of `helm.sh/hook-delete-policy` annotations are silently permitted. +They result in the given hook _never_ being deleted. +However this behaviour seems to be by coincidence, rather than intentional. +This HIP proposes how to handle them going forward. + +## Motivation + +There are [a number of _documented_ hook deletion policies][Docs]. +All of them result in hook deletion _at some point_. +A default hook delete policy (namely `before-hook-creation`) +is [applied when the list of hook policies for a resource is empty][Code]. + +However, when an _unrecognized_ value (e.g. `badger`) is specified: +* Helm does not complain in any way, and +* a hook resource annotated this way is **never deleted**. + +It seems that the current behaviour is a coincidence or mistake, rather than intended. + +## Rationale + +In line with [Hyrum's Law][Hyrum's Law], at least [one project][StackRox chart] depends on the current behaviour. +Namely, it specifies a hook deletion policy of `never` on most of its storage-related resources +(such as `PersistentVolumeClaim`). +These resources are applied as hooks, in order to guard against data loss, +see [appendix A](#appendix-a-why-create-storage-resources-as-hooks) if you would like to know more. + +I was able to find [one more unrelated chart][Anance personal chart] that also specifies the same value. + +Some charts also allow the users to specify the policy using a helm var, +and it's not clear whether they validate the value before using it in a template. + +The current Helm behaviour in this case is unspecified. +There is a concern that a change in the current implementation +(for example applying the default policy more aggressively) could cause a catastrophic data loss. + +## Specification + +This proposal: +- suggests keeping status quo for helm 3, for the sake of backwards compatibility, +- outlines a few options for helm 4. + +### For Helm 3 + +- No semantic changes to production code. +- Add a regression test to make sure that the current behaviour is not changed by mistake. + This is probably easiest to do by extracting the defaulting code to a separate unexported function and + adding a unit test for the function. + +### For Helm 4 + +There are a few dimensions in which we can make a change, that are somewhat interconnected: +- whether to accept a deletion disabling policy at all, or break compatibility and reject it, +- if accepted, whether support it officially (documented, etc), or deprecate it, +- what to do about (other) undocumented hook deletion policy values, + +#### 1. Official support for a policy which disables hook deletion + +Add a new, documented hook deletion policy value: `never`. +When specified, such hook resource is not deleted. +Effectively, the same as suggestion for Helm 3, but explicitly supported. + +Pros: +- Keeps compatibility with Helm 3. +- Allows a notion of resources which are installed by helm, but afterwards not managed by it in any way. +- StackRox chart continues to work as before without changing. +- Other charts (if any) which happen depend on current undocumented behaviour could easily be made to work by changing + whatever value they use to `never`. + +Cons: +- It seems that hooks were generally intended to be resources whose previous instances are cleaned up + before (subsequent) chart applications. Supporting this policy breaks this assumption and + introduces some complexity when reasoning about possible scenarios. For example such `pre-install` + hook resources need to be guarded by a `if $.Release.IsInstall` condition in order not to break upgrades. + +#### 2. Keep the status quo + +The same as described above for Helm 3. + +Pros: +- Keeps compatibility with Helm 3. +- StackRox chart continues to work as before without changing. + +#### 3. Reject all undocumented hook deletion policies (including `never`) + +Make it an error to use undocumented hook deletion policies, including `never`. + +Pros: +- Mental model for hook deletion stays simple: they are always deleted (at _some_ point). + +Cons: +- Breaks compatibility with Helm 3. In particular: + - StackRox helm chart stops working as is. This would most likely force StackRox to stop supporting direct usage of `helm` + for installation; StackRox operator would still use Helm library internally, and manage the storage resources differently, + as it does now. + +#### 4. Reject undocumented policies (_other than_ `never`) + +Make it an error to use undocumented hook deletion policies, except `never` (which would be treated +as described in point either 1 or 2). + +Pros: +- _Mostly_ keeps compatibility with Helm 3. Technically breaks it, but unlike (3), + there is a migration path for charts which use an undocumented value other than `never` - + they need to change it to `never`). +- A little easier to reason about the behaviour, since there are fewer options. + +Cons: +- If an older chart exists out there that uses an undocumented value other than `never`, + then its legacy versions would be incompatible with Helm 4. + +#### 5. Warn about undocumented policies + +Add a linter check that complains if an undocumented value for hook deletion policy is used. + +Pros: +- Raises awareness about this issue. + +Cons: +- None? + +## Backwards compatibility + +Described above, this HIP is all about compatibility. + +## Security implications + +None. + +## How to teach this + +Documentation and linter concerns mentioned above. + +## Reference implementation + +N/A ATM. + +## Rejected ideas + +N/A + +## Open issues + +See alternative points for Helm 4 above. + +## Appendix A: Why create storage resources as hooks? + +Disclaimer: some of the reasoning below are conjectures since the motivation is lost in the mists of time. + +A Helm chart was introduced for the StackRox project around 2020. +At that time, there was a strong concern that a user mistake might lead to data loss, +in case the `PersistentVolume` or `PersistentVolumeClaim` were deleted in case the helm release was deleted accidentally. + +Because of that it looks like all possible safeguards were applied, including `helm.sh/resource-policy: keep` +**and** `helm.sh/hook: pre-install,pre-upgrade` together with `helm.sh/hook-delete-policy: never`. +I do not know why this undocumented value was chosen. +I assume it might have been a mistake that was not caught until recently, since the result worked as desired. + +According to [the documentation][Resource policy docs], using `helm.sh/resource-policy: keep` alone +would be sufficient to prevent deletion on uninstallation. +However, it is not completely clear (to me), given how helm behaves during `--force` rollbacks. +Deleting and recreating a `PersistentVolumeClaim` in that case would lead to data loss. + +## References + +- [Docs]: https://helm.sh/docs/topics/charts_hooks/#hook-deletion-policies +- [Code]: https://github.com/helm/helm/blob/bd897c96fbaf7546d6a5c57be009f16f9d38d6de/pkg/action/hooks.go#L47 +- [Resource policy docs]: https://helm.sh/docs/howto/charts_tips_and_tricks/#tell-helm-not-to-uninstall-a-resource +- [StackRox chart]: https://github.com/stackrox/stackrox/tree/master/image/templates/helm/stackrox-central +- [Anance personal chart]: https://github.com/ananace/personal-charts/blob/0e60e96373c8d827c0723ec0bfaa336bd09ffb35/charts/matrix-synapse/templates/signing-key-job.yaml#L176 +- [Hyrum's Law]: https://www.hyrumslaw.com/ From 670df00e93ec6a7ce87721089958cab7e0653074 Mon Sep 17 00:00:00 2001 From: Marcin Owsiany Date: Wed, 23 Jul 2025 10:53:04 +0200 Subject: [PATCH 2/4] Update HIP index. Signed-off-by: Marcin Owsiany --- hips/README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/hips/README.md b/hips/README.md index 80c5a24d..fb9e6ab2 100644 --- a/hips/README.md +++ b/hips/README.md @@ -36,3 +36,4 @@ restricted markdown format and can be found in the - [hip-0024: Improve the Helm Documentation](hip-0024.md) - [hip-0025: Better Support for Resource Creation Sequencing](hip-0025.md) - [hip-0026: H4HIP: Wasm plugin system](hip-0026.md) +- [hip-9999: Handling unknown hook-delete-policy values](hip-9999.md) From 731168a8edc2063dec072edf31d31f3cdc940a52 Mon Sep 17 00:00:00 2001 From: Marcin Owsiany Date: Mon, 13 Oct 2025 10:07:59 +0200 Subject: [PATCH 3/4] fix: correct helm versioning - Refer to Helm charts API version, rather than Helm version. - Capitalize Helm. Signed-off-by: Marcin Owsiany --- hips/hip-9999.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/hips/hip-9999.md b/hips/hip-9999.md index 25b9e36f..ef51d99b 100644 --- a/hips/hip-9999.md +++ b/hips/hip-9999.md @@ -37,7 +37,7 @@ see [appendix A](#appendix-a-why-create-storage-resources-as-hooks) if you would I was able to find [one more unrelated chart][Anance personal chart] that also specifies the same value. -Some charts also allow the users to specify the policy using a helm var, +Some charts also allow the users to specify the policy using a Helm var, and it's not clear whether they validate the value before using it in a template. The current Helm behaviour in this case is unspecified. @@ -47,17 +47,17 @@ There is a concern that a change in the current implementation ## Specification This proposal: -- suggests keeping status quo for helm 3, for the sake of backwards compatibility, -- outlines a few options for helm 4. +- suggests keeping status quo for Helm charts API v2, for the sake of backwards compatibility, +- outlines a few options for charts API v3. -### For Helm 3 +### For Helm charts API v2 - No semantic changes to production code. - Add a regression test to make sure that the current behaviour is not changed by mistake. This is probably easiest to do by extracting the defaulting code to a separate unexported function and adding a unit test for the function. -### For Helm 4 +### For Helm charts API v3 There are a few dimensions in which we can make a change, that are somewhat interconnected: - whether to accept a deletion disabling policy at all, or break compatibility and reject it, @@ -68,11 +68,11 @@ There are a few dimensions in which we can make a change, that are somewhat inte Add a new, documented hook deletion policy value: `never`. When specified, such hook resource is not deleted. -Effectively, the same as suggestion for Helm 3, but explicitly supported. +Effectively, the same as suggestion for Helm charts API v2, but explicitly supported. Pros: -- Keeps compatibility with Helm 3. -- Allows a notion of resources which are installed by helm, but afterwards not managed by it in any way. +- Keeps compatibility with Helm charts API v2. +- Allows a notion of resources which are installed by Helm, but afterwards not managed by it in any way. - StackRox chart continues to work as before without changing. - Other charts (if any) which happen depend on current undocumented behaviour could easily be made to work by changing whatever value they use to `never`. @@ -85,10 +85,10 @@ Cons: #### 2. Keep the status quo -The same as described above for Helm 3. +The same as described above for Helm charts API v2. Pros: -- Keeps compatibility with Helm 3. +- Keeps compatibility with Helm charts API v2. - StackRox chart continues to work as before without changing. #### 3. Reject all undocumented hook deletion policies (including `never`) @@ -99,8 +99,8 @@ Pros: - Mental model for hook deletion stays simple: they are always deleted (at _some_ point). Cons: -- Breaks compatibility with Helm 3. In particular: - - StackRox helm chart stops working as is. This would most likely force StackRox to stop supporting direct usage of `helm` +- Breaks compatibility with Helm charts API v2. In particular: + - StackRox Helm chart stops working as is. This would most likely force StackRox to stop supporting direct usage of `helm` for installation; StackRox operator would still use Helm library internally, and manage the storage resources differently, as it does now. @@ -110,14 +110,14 @@ Make it an error to use undocumented hook deletion policies, except `never` (whi as described in point either 1 or 2). Pros: -- _Mostly_ keeps compatibility with Helm 3. Technically breaks it, but unlike (3), +- _Mostly_ keeps compatibility with Helm charts API v2. Technically breaks it, but unlike proposal (3), there is a migration path for charts which use an undocumented value other than `never` - they need to change it to `never`). - A little easier to reason about the behaviour, since there are fewer options. Cons: - If an older chart exists out there that uses an undocumented value other than `never`, - then its legacy versions would be incompatible with Helm 4. + then its legacy versions would be incompatible with Helm charts API v3. #### 5. Warn about undocumented policies @@ -151,7 +151,7 @@ N/A ## Open issues -See alternative points for Helm 4 above. +See alternative points for Helm charts API v3 above. ## Appendix A: Why create storage resources as hooks? @@ -159,7 +159,7 @@ Disclaimer: some of the reasoning below are conjectures since the motivation is A Helm chart was introduced for the StackRox project around 2020. At that time, there was a strong concern that a user mistake might lead to data loss, -in case the `PersistentVolume` or `PersistentVolumeClaim` were deleted in case the helm release was deleted accidentally. +in case the `PersistentVolume` or `PersistentVolumeClaim` were deleted in case the Helm release was deleted accidentally. Because of that it looks like all possible safeguards were applied, including `helm.sh/resource-policy: keep` **and** `helm.sh/hook: pre-install,pre-upgrade` together with `helm.sh/hook-delete-policy: never`. @@ -168,7 +168,7 @@ I assume it might have been a mistake that was not caught until recently, since According to [the documentation][Resource policy docs], using `helm.sh/resource-policy: keep` alone would be sufficient to prevent deletion on uninstallation. -However, it is not completely clear (to me), given how helm behaves during `--force` rollbacks. +However, it is not completely clear (to me), given how Helm behaves during `--force` rollbacks. Deleting and recreating a `PersistentVolumeClaim` in that case would lead to data loss. ## References From b785e3e41d9e3874c4b20107182ff3df1d990f92 Mon Sep 17 00:00:00 2001 From: Marcin Owsiany Date: Mon, 13 Oct 2025 10:36:39 +0200 Subject: [PATCH 4/4] Link to PR that adds the test. Signed-off-by: Marcin Owsiany --- hips/hip-9999.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/hips/hip-9999.md b/hips/hip-9999.md index ef51d99b..d8ac9321 100644 --- a/hips/hip-9999.md +++ b/hips/hip-9999.md @@ -54,8 +54,7 @@ This proposal: - No semantic changes to production code. - Add a regression test to make sure that the current behaviour is not changed by mistake. - This is probably easiest to do by extracting the defaulting code to a separate unexported function and - adding a unit test for the function. + Example: https://github.com/helm/helm/pull/31385 ### For Helm charts API v3