Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation for container checkpointing feature (KEP 2008) #31753

Conversation

@k8s-ci-robot k8s-ci-robot added this to the 1.24 milestone Feb 15, 2022
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Feb 15, 2022
@netlify
Copy link

netlify bot commented Feb 15, 2022

👷 Deploy Preview for kubernetes-io-vnext-staging processing.

Name Link
🔨 Latest commit 7db58a5
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/6242ae38d9c15e0008161012

@k8s-ci-robot
Copy link
Contributor

Welcome @adrianreber!

It looks like this is your first PR to kubernetes/website 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/website has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Feb 15, 2022
@k8s-ci-robot k8s-ci-robot added language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. labels Feb 15, 2022
@nate-double-u
Copy link
Contributor

/assign

sftim
sftim previously requested changes Feb 16, 2022
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrianreber thanks for this

Given that the feature is only available by directly interacting with a kubelet via HTTP, I suggest you start by writing a reference page about how to checkpoint a container via that API (to go inside https://kubernetes.io/docs/reference/node/).

I suggest that you document the body, or perhaps query string (I haven't looked) for an HTTP request that triggers a checkpoint. You should also document the local paths that the kubelet writes to (as per the existing PR), and the format of the checkpoint image.

At this stage, I wouldn't add any concept page. You could however link to that reference page from https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/). You might think it'd be better to link from the container page, and maybe eventually we would. For this early feature I want readers interested in checkpointing to be clear that you have to have a Pod in order to checkpoint one of its containers.

Please avoid linking to a KEP except for further reading at the end of a page. The aim of documentation is so that readers don't need to refer to the source code or the KEP.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from nate-double-u after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sftim
Copy link
Contributor

sftim commented Feb 16, 2022

An example of how we document API operations:
https://kubernetes.io/docs/reference/kubernetes-api/policy-resources/resource-quota-v1/#Operations

You don't have to match that style exactly; however, it'd be helpful if any new reference docs are broadly along those lines.

@adrianreber
Copy link
Contributor Author

@sftim Thanks for your ideas. Will try to rework this PR.

@adrianreber
Copy link
Contributor Author

@sftim I was thinking a bit more about your suggestions.

My understanding is that I should document the API, that make sense. Should I not include any general information about checkpointing as currently in this PR in the concept page. If I should not include it in a concept page, should it be somewhere else? Next to the API documentation or not at all?

@sftim
Copy link
Contributor

sftim commented Feb 16, 2022

Should I not include any general information about checkpointing as currently in this PR in the concept page. If I should not include it in a concept page, should it be somewhere else? Next to the API documentation or not at all?

I recommend being a little more terse - it's OK in reference documentation to say that the actual checkpointing implementation depends on the container runtime (which I believe is true) and to define only what CRI expects from that implementation.
Eg:

  • format of checkpoint archive
  • the archive path that the kubelet passes to the container runtime via CRI

As an example, I'd expect that you'd keep:

Checkpointing a container is the functionality to create a stateful copy of a running container. The stateful copy of the container can be moved to another system and after restore the container will continue to run at exactly the same point it was checkpointed. The concept of checkpointing a container and restoring a container can be seen as pausing and unpausing a container with the possibility to transfer the container to another system between pausing and unpausing.

Unless someone writes a blog about it, though, there'd be no mention yet of the term “forensic” in that initial reference page (related: I'd assume that it's not very useful to take such evidence to a court of law and then have to explain that the feature that generated it is still alpha). All that should change if / when the checkpointing feature is scheduled to move to beta; we want production-quality docs for beta features.

@adrianreber adrianreber force-pushed the 2022-02-15-forensic-container-checkpointing-documentation branch from 7b5f003 to 1d01d39 Compare February 16, 2022 17:31
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @adrianreber

I hope all these suggestions are useful. I haven't explained all of them in detail; feel free to query anything you're not sure about.

@sftim sftim dismissed their stale review February 16, 2022 18:36

Feedback was incorporated

@sftim
Copy link
Contributor

sftim commented Feb 16, 2022

/retitle Add documentation for container checkpointing feature (KEP 2008)

@k8s-ci-robot k8s-ci-robot changed the title KEP 2008 Forensic Container Checkpointing documentation Add documentation for container checkpointing feature (KEP 2008) Feb 16, 2022
@sftim
Copy link
Contributor

sftim commented Feb 16, 2022

Also: you're well ahead of the deadline! Nice work in getting reviewable changes in this early.

@adrianreber adrianreber force-pushed the 2022-02-15-forensic-container-checkpointing-documentation branch 2 times, most recently from 75e3a36 to 67a566d Compare February 17, 2022 08:35
@adrianreber
Copy link
Contributor Author

Thanks for your suggestions. I tried to include all of them.

@tengqm
Copy link
Contributor

tengqm commented Feb 21, 2022

/hold

@adrianreber Feel free to remove the hold when the upstream changes are merged.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2022
@adrianreber adrianreber force-pushed the 2022-02-15-forensic-container-checkpointing-documentation branch from 67a566d to fd881ce Compare February 21, 2022 18:52
@sftim
Copy link
Contributor

sftim commented Feb 21, 2022

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2022
@nate-double-u
Copy link
Contributor

/sig node
Could someone from @kubernetes/sig-node-pr-reviews take a look at this? (or suggest someone else for a tech review?)

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Feb 21, 2022
@nate-double-u
Copy link
Contributor

nate-double-u commented Mar 19, 2022

Just following up, could someone from @kubernetes/sig-node-pr-reviews take a look at this? (or suggest someone else for a tech review?)

@sftim
Copy link
Contributor

sftim commented Mar 21, 2022

Previews:

@tengqm
Copy link
Contributor

tengqm commented Mar 21, 2022

@sftim Why remove the hold? The upstream change kubernetes/kubernetes#104907 is still under review. The hold was to avoid accidental merge of this PR. Am I missing something?

@sftim
Copy link
Contributor

sftim commented Mar 21, 2022

Oops. I saw that a PR was merged - but it was the KEP, not the code.
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2022
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 28, 2022
@adrianreber adrianreber force-pushed the 2022-02-15-forensic-container-checkpointing-documentation branch from fd881ce to 7db58a5 Compare March 29, 2022 06:59
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 29, 2022
@k8s-ci-robot
Copy link
Contributor

@adrianreber: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2022
@adrianreber
Copy link
Contributor Author

Closing as it was dropped from 1.24

@nate-double-u
Copy link
Contributor

/milestone clear

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. language/en Issues or PRs related to English language needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants