Introduce WG Checkpoint Restore #8508
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3583,6 +3583,42 @@ workinggroups: | |
liaison: | ||
github: aojea | ||
name: Antonio Ojea | ||
- dir: wg-checkpoint-restore | ||
name: Checkpoint Restore | ||
mission_statement: > | ||
This working group aims to provide a central location for the community to discuss | ||
the integration of Checkpoint/Restore functionality into Kubernetes. | ||
|
||
charter_link: charter.md | ||
Comment: Is the charter included in this PR?
Comment: Now it is. I didn't add it initially because the lifecycle document mentions that it is added later, but looking at the WG PRs, it seems to be common to have a charter in the initial PR.
||
stakeholder_sigs: | ||
Comment: SIG Auth may have a big say in the security of this whole restoration pipeline.
Comment: Thank you for pointing this out! Security is definitely an important topic that we need to discuss with sig-auth, both for the checkpoint API and the restoration pipeline. The following paper and master thesis describe our recent work on this topic:
Comment: I added sig auth to the list of stakeholder sigs.
Comment: This is a valuable initiative. The charter mentions that the scope includes checkpointing and restoring 'workloads' and providing 'guidance for developers on checkpoint-friendly app design.' Given this focus, it's essential for SIG Apps to be involved as a key stakeholder.
Comment: @janetkuo This is a good idea, thank you so much for suggesting it!
||
- API Machinery | ||
- Auth | ||
- CLI | ||
- Node | ||
- Scheduling | ||
label: checkpoint-restore | ||
leadership: | ||
chairs: | ||
- github: adrianreber | ||
name: Adrian Reber | ||
company: Red Hat | ||
email: [email protected] | ||
- github: haircommander | ||
name: Peter Hunt | ||
company: Red Hat | ||
email: [email protected] | ||
- github: rst0git | ||
name: Radostin Stoyanov | ||
company: University of Oxford | ||
email: [email protected] | ||
- github: viktoriaas | ||
name: Viktória Spišaková | ||
company: Masaryk University | ||
email: [email protected] | ||
meetings: [] | ||
contact: | ||
slack: wg-checkpoint-restore | ||
mailing_list: https://groups.google.com/forum/#!forum/kubernetes-wg-checkpoint-restore | ||
- dir: wg-data-protection | ||
name: Data Protection | ||
mission_statement: > | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
<!--- | ||
This is an autogenerated file! | ||
Please do not edit this file directly, but instead make changes to the | ||
sigs.yaml file in the project root. | ||
To understand how this file is generated, see https://git.k8s.io/community/generator/README.md | ||
---> | ||
# Checkpoint Restore Working Group | ||
|
||
This working group aims to provide a central location for the community to discuss the integration of Checkpoint/Restore functionality into Kubernetes. | ||
|
||
The [charter](charter.md) defines the scope and governance of the Checkpoint Restore Working Group. | ||
|
||
## Stakeholder SIGs | ||
* [SIG API Machinery](/sig-api-machinery) | ||
* [SIG Auth](/sig-auth) | ||
* [SIG CLI](/sig-cli) | ||
* [SIG Node](/sig-node) | ||
* [SIG Scheduling](/sig-scheduling) | ||
|
||
|
||
|
||
## Organizers | ||
|
||
* Adrian Reber (**[@adrianreber](https://github.com/adrianreber)**), Red Hat | ||
* Peter Hunt (**[@haircommander](https://github.com/haircommander)**), Red Hat | ||
* Radostin Stoyanov (**[@rst0git](https://github.com/rst0git)**), University of Oxford | ||
* Viktória Spišaková (**[@viktoriaas](https://github.com/viktoriaas)**), Masaryk University | ||
|
||
## Contact | ||
- Slack: [#wg-checkpoint-restore](https://kubernetes.slack.com/messages/wg-checkpoint-restore) | ||
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-wg-checkpoint-restore) | ||
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/wg%2Fcheckpoint-restore) | ||
<!-- BEGIN CUSTOM CONTENT --> | ||
|
||
<!-- END CUSTOM CONTENT --> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
|
||
# WG Checkpoint Restore Charter | ||
|
||
This charter adheres to the conventions described in the [Kubernetes Charter README] and uses | ||
the Roles and Organization Management outlined in [sig-governance]. | ||
|
||
## Scope | ||
|
||
The Checkpoint/Restore Working Group aims to solve the problem of transparently | ||
checkpointing and restoring workloads in Kubernetes, a capability that has been | ||
discussed for over five years. The group will deliver the design and implementation of | ||
Checkpoint/Restore functionality in Kubernetes, serving as a central hub for | ||
community information and discussion. This initiative addresses a wide range of | ||
problems, including fault tolerance, improved resource utilization, and | ||
accelerated application startup times. | ||
|
||
### In scope | ||
|
||
- Identify core Kubernetes checkpoint/restore use cases (e.g., live migration, | ||
fault tolerance, debugging, snapshotting) and gather stakeholder requirements. | ||
- Investigate and propose Kubernetes APIs for checkpoint/restore operations. | ||
- Work with SIGs for the best integration of checkpoint/restore functionality | ||
and APIs. | ||
- Provide guidance for developers on checkpoint-friendly app design and | ||
recommendations for operators on feature management. | ||
- Work closely with relevant upstream projects (CRI-O, containerd, CRIU) | ||
for alignment and integration. | ||
- Revisit the existing implementations to find and remedy possible inefficiencies. | ||
One example is the existing checkpoint archive format which has already been | ||
identified as being a major source of slowdown. | ||
|
||
### Out of scope | ||
|
||
- Not focused on general OS-level checkpointing outside Kubernetes | ||
pods/containers. | ||
- Will not dictate internal application checkpointing logic; focuses on | ||
Kubernetes platform orchestration of container/pod state. | ||
|
||
## Stakeholders | ||
|
||
Stakeholders in this working group span multiple SIGs that own parts of the | ||
code in core Kubernetes components and addons. | ||
|
||
- SIG CLI | ||
- SIG API Machinery | ||
- SIG Node | ||
- SIG Scheduling | ||
- SIG Auth | ||
|
||
## Deliverables | ||
|
||
The list of deliverables includes the following high-level features: | ||
|
||
- In the early stage, we mainly want to offer a well-defined location for the | ||
community to find information, ask questions, and discuss the next steps of | ||
enabling checkpoint and restore in Kubernetes. | ||
|
||
Later: | ||
|
||
- Ability to checkpoint and restore a container using kubectl | ||
- Ability to checkpoint and restore a pod using kubectl | ||
- Integration of container/pod checkpointing in scheduling decisions | ||
|
||
## Roles and Organization Management | ||
|
||
This WG adheres to the Roles and Organization Management outlined in [wg-governance] | ||
and opts in to updates and modifications to [wg-governance]. | ||
|
||
[wg-governance]: /committee-steering/governance/wg-governance.md | ||
|
||
Additionally, the WG commits to: | ||
|
||
- maintain a solid communication line between the Kubernetes groups and the | ||
wider CNCF community | ||
- submit a proposal to the KubeCon/CloudNativeCon maintainers track | ||
|
||
## Timelines and Disbanding | ||
|
||
As a first mandate, the WG will define a roadmap and tasks in the first quarter | ||
of operation. | ||
|
||
After that, the WG will distribute the tasks among community members to | ||
define possible APIs and how they can be integrated into Kubernetes. | ||
|
||
Achieving the aforementioned deliverables, also mentioned in the `In scope` | ||
section, will allow us to decide when to disband this WG. There is no | ||
expectation that the Working Group will be converted into a SIG in the long term. | ||
|
||
[sig-governance]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/sig-governance.md | ||
[Kubernetes Charter README]: https://github.com/kubernetes/community/blob/master/committee-steering/governance/README.md |
Comment: Is everyone on this list a kubernetes org member?
Comment: #8508 (comment)
Comment: Unfortunately not. I talked with @haircommander and he would be willing to sponsor me. Still looking for a sponsor from another company.
Comment: https://github.com/kubernetes/community/blob/master/community-membership.md#requirements
Are there any other existing community members interested in helping run this effort?