cloud-init-network.service should have alias to cloud-init.service #5684
Comments
Hey @sshedi, thanks for filing this issue.
An alias might be reasonable in the short term until any services ordered after As for your concerns regarding CI / CD pipelines and scripts, do you have any examples of this usage? Cloud-init is not a long-running service and nowhere in the cloud-init documentation is it recommended to start or stop or restart cloud-init services - there shouldn't be a need for it. Cloud-init runs automatically as part of first boot on a cloud instance. Restarting services may be possible, but that's not how users are expected to interact with cloud-init. Users shouldn't need to "run cloud-init". Ultimately we don't want to break anyone that is doing unexpected things with cloud-init without reason, but we also don't want improvements to be held back by use cases that are "off the beaten path", so to say. Since this doesn't appear to be a bug or user complaint, I'm going to change this from a bug report to a feature request.
I'm not sure if you saw the announcement of this change on IRC, in the release, the changelog, the mailing list, the Discourse announcement, or on the breaking changes documentation page. If not, I would recommend those as a starting point. If you already read those, is there a specific piece of information that is missing? |
Thanks for the detailed explanation, @holmanb. In VMware solutions, we use GOSC (Guest OS Customisation) using open-vm-tools. Here we use a
We do customisations like password setting, network setting, hostname settings etc using this approach. This is a legacy solution. If direct invocation of cloud-init gets deprecated, this will cause a lot of trouble. Another question, in a different context: When I did
|
Why isn't this configuration being provided by the VMware datasource at boot time? Is there something that this can do that the datasource cannot? Can you please provide a link to this code? This should really be a separate issue from the other things mentioned here, so a separate issue would be more appropriate.
On Fedora[1], like Ubuntu and Debian, openbsd's netcat is actually the default netcat. Perhaps Photon could follow the other major distros? Alternatively, like the docs linked mention, this is trivially implemented using a Python one-liner. I can dig up the one I was using before if that would be helpful for you.
What is your end goal? What are you trying to accomplish by restarting cloud-init? Can you please provide a link to this code? Like I said before, this shouldn't be required, but if for some reason it is required this might be a gap that we need to address.
Trying to carry distro-specific systemd orderings in upstream is really hard to test and maintain - especially with bigger changes like this. We tried not to break the systemd ordering but in this case it clearly did. I'm happy to help fix this for PhotonOS. Are you building this from upstream source or do you have patches applied? If patches, can you please include a link? [1] on Fedora
|
I will create a new issue. In short, the original settings from the cab file come in INI format; we convert them to YAML by parsing the INI at run time and ultimately feed the result to cloud-init.
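A minimal sketch of the INI-to-YAML conversion described above, for readers following along. It is not the Photon or open-vm-tools code: the function name `ini_to_yaml`, the section names, and the keys are hypothetical, and it assumes single-line values and section/key names that are already valid plain YAML keys.

```python
import configparser


def ini_to_yaml(ini_text: str) -> str:
    """Convert flat INI text into a two-level YAML mapping (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case instead of lowercasing it
    parser.read_string(ini_text)

    lines = []
    for section in parser.sections():
        lines.append(f"{section}:")
        for key, value in parser.items(section):
            # Always emit double-quoted scalars so "yes", "1", etc. stay strings.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'  {key}: "{escaped}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical input; the real cab-file layout is not shown in this thread.
    sample = "[NETWORK]\nHOSTNAME = photon-test\n\n[PASSWORD]\nRESET = yes\n"
    print(ini_to_yaml(sample), end="")
```

Quoting every value avoids YAML's implicit typing turning `yes` into a boolean, which seems like the safer default when the output is handed to another tool.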
Yes, I have done that. My concern was with the binary name
As in my previous comment about GOSC, we feed a YAML file to vmtoolsd (open-vm-tools); in our CI/CD pipelines, we just run basic sanity tests for these services by restarting them and checking their status. We also carry a downstream patch: https://github.com/vmware/photon/blob/master/SPECS/cloud-init/0003-Patch-VMware-DS-to-handle-network-settings-from-vmto.patch
Later we simply restart the cloud-init services and check network file creation and the hostname. As for the cyclic ordering issue, I will check further. Not your fault, just wanted to let you know. You can ignore it for now. Thanks for all the answers. |
Thanks for the extra context @sshedi. I'll wait for the new bug to continue the conversation there.
Good point, the current implementation is using the Ubuntu binary name. There is no reason we can't use
Integration testing cloud-init is tricky. The reason that we don't do manual service restarts in our upstream integration tests is because this isn't representative of how users use cloud-init, and it can have unintended side effects. One example of this is that many cloud-config modules only run once per instance. The approach that we take in the upstream integration tests is something along the lines of this:
@sshedi Do you think that this approach could work for you? Also, do you think that the patch would benefit other distros? FYI: cloud-init's integration test backbone, pycloudlib, gained support for VMware last year. I don't think we currently use it in CI, but if you are interested in learning more about it, we could probably share some guidance to help get that set up. Running the upstream integration tests in your CI would certainly give you lots of test coverage.
Okay, let me know what we can do to assist. In an ideal world, cloud-init would have a single systemd ordering which behaves correctly on all distros. Cloud-init's ordering is complex, but it tries to behave correctly regardless of which components a distro is made up of. I'd be curious to know why PhotonOS uses |
I'm sorry, but rebooting the VM every time seems to be overkill. We just want to try some combination of yaml configs, nothing else. Also, we will try out different network settings, so we might not get the connection back after a reboot. Rebooting for each test configuration will increase the time and complexity severalfold. And in our test scripts, I do Also, I don't fully understand the rationale behind not allowing a service to be manually restarted. IMO, this flexibility should be there, and it will help big time while debugging issues and trying out things in production instances where rebooting is not an option. In the current implementation, if needed, we can simply share a yaml config with a customer and give commands to feed it to cloud-init without asking them to reboot their instance. One of the standout features of cloud-init is its flexibility with configurations, allowing us to quickly test changes without needing to reboot machines. Unlike critical services like audit or dbus, where manual restarts can have serious consequences, cloud-init has always supported explicit invocation and manual service restarts without issue. Suddenly shifting to a new model that restricts this capability feels like losing a vital tool in our toolkit.
Agree. I just kept it in alignment with RHEL in my initial PR while adding PhotonOS support to cloud-init. Thanks for this. |
Perhaps for your testing purposes it is overkill, but for cloud-init upstream this is absolutely vital. It's slow, but our CI gives us results daily so it works.
We do have at least one integration test that manually calls the old cloud-init entry points ( Some thoughts on how to implement this -> currently when the single process flag is used and stdin is a tty (which happens when invoked by a shell), cloud-init skips the socket synchronization logic. This was done to allow cloud-init to run under
Semantic question: by this do you only mean "run it again"? Or do you also mean "if running, kill it and run again"?
Thanks for engaging, these are all fair points. The reason for going this direction is that cloud-init imparts many side-effects on a system, yet has little testing of this feature. The code that makes restarting cloud-init possible is trivial, but since cloud-init makes many decisions at runtime based on image artifacts and system state, it's difficult to make promises about what cloud-init actually does when run this way. Cloud-init was implemented without idempotency in mind which further complicates things. It's easy to see how this functionality adds value, but it's also difficult to reason about and maintain. I'm not strongly opposed to keeping the ability to run cloud-init with
Happy to help! |
Hi @holmanb
Cloud-init is definitely implemented with idempotency in mind. We call out the importance in https://docs.cloud-init.io/en/latest/explanation/vendordata.html
And in places where we know we can't (or are not yet), it's noted:
https://docs.cloud-init.io/en/latest/reference/cli.html I say this not to derail the discussion here, but to point out that if we
Good. I would urge a path that does keep the existing functionality; it's been
I'm trying to understand the concern given that cloud-init already behaves Could you expand on what potential (or existing) scenarios would be
I don't think users stumble upon the If you label it one of the above, what does that mean for upstream (and It would also be helpful to understand what, if anything, would be |
Yes, this should work. I don't have a hard requirement on restarting services, all I need is a way to trigger init-local stage and other stages manually.
Just restart, no need to kill the process. But the newer configurations set by vmtoolsd or given by the yaml file should take effect.
This option has been there for a long time now. I think there should be an option to maintain backward compatibility. It can be a build-time flag as well or a run-time flag like Cloud-init is a crucial component for our use cases, and this proposed change feels too disruptive, making things significantly more challenging for us. I understand the complexity of maintaining legacy code, but with such a critical service, it becomes somewhat inevitable. If there's any way I can assist in this process, please don't hesitate to reach out. |
Hey @OddBloke thanks for engaging.
I meant to say "cloud-init is not idempotent", but it doesn't change the point I was making. We agree that idempotency is important, and that there should be more of it. I mentioned idempotency because the lack of idempotency complicates maintenance. Cloud-init would benefit greatly from tests that explicitly check for idempotency not just in cloud-config modules but additionally in many other parts of the code.
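To illustrate what an explicit idempotency check might look like, here is a minimal sketch. It is not taken from cloud-init's test suite: `is_idempotent`, `set_hostname`, and `append_user` are hypothetical names, and system state is modeled as a plain dictionary rather than a real machine.

```python
import copy


def is_idempotent(apply, initial_state):
    """Return True if applying `apply` a second time changes nothing."""
    once = apply(copy.deepcopy(initial_state))
    twice = apply(copy.deepcopy(once))
    return once == twice


def set_hostname(state):
    # Overwrites a value: running it again yields the same state.
    state["hostname"] = "node-1"
    return state


def append_user(state):
    # Appends unconditionally: running it again duplicates the entry.
    state.setdefault("users", []).append("admin")
    return state


if __name__ == "__main__":
    print(is_idempotent(set_hostname, {}))
    print(is_idempotent(append_user, {}))
```

The second operation fails the check because a re-run leaves two `admin` entries, which is the kind of surprise that re-applying configuration to a running system can produce.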
The last statement underscores the deficiencies of this feature. A tool which is thoughtfully designed, intuitive, ergonomic, and foolproof shouldn't require learning to work around the quirks.
First boot initialization is cloud-init's bread and butter and it is also where cloud-init really shines. Cloud-init provides features which overlap heavily with configuration management tools such as cfengine, puppet, chef, etc. Alternative CI tools are not a fair 1:1 comparison to cloud-init, because cloud-init's core offering is first boot configuration, however cloud-init can be used more as a configuration management tool using this feature. "Apply this configuration to an already running system" is one of the most basic tasks of configuration management tooling, and Manually applying user-data after initial boot has many potentially surprising outcomes, including the following handful:
I appreciate that users that already know how cloud-init works want to use cloud-init as their tool of choice - I don't want to stop them from doing so. I also think that cloud-init has potential to provide a much improved configuration management user experience over current state, especially with respect to the points listed above. A potential future with a more complete offering would include a UI that is intuitive, ergonomic, and well documented. This would also require configuration management behaviors that are more foolproof, intuitive, and have reasonable debugging workflows. A well-designed tool should not require the user to understand implementation details in order to use it. In the long term, I think that cloud-init can do better than just continuing to scratch the itches of those existing users that are using its existing
I need to shift focus to other things for now, so I'm going to momentarily sidestep the maintenance-related questions. We aren't enacting any immediate changes, and the community's voice is important before we do. |
Err @raharper ; no worries.
Great. My initial read on this thread was that |
/facepalm Oops, sorry
Agreed, thanks for the discussion! |
Bug report
cloud-init.service is a widely used service name, and many downstream scripts use that name to restart the service.
In the v24.3 release, it has been renamed to cloud-init-network.service. I think this is going to break a lot of CI/CD pipelines and scripts. Renaming such a widely used service is an invasive change, but if the rename is really needed, we should at least have an alias with the old name in the service file.
Please consider this request and provide some insights on renaming the service.
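Until an alias exists, a downstream script could tolerate both names. The following is only a sketch under that assumption: `pick_unit` is a hypothetical helper, and the set of available unit names is passed in by the caller (for example, collected from the system's list of unit files) rather than queried here.

```python
def pick_unit(available, candidates=("cloud-init-network.service",
                                     "cloud-init.service")):
    """Return the first candidate unit name present in `available`, else None."""
    for name in candidates:
        if name in available:
            return name
    return None


if __name__ == "__main__":
    # Hypothetical unit lists for a newer and an older cloud-init install.
    print(pick_unit({"cloud-init-local.service", "cloud-init-network.service"}))
    print(pick_unit({"cloud-init-local.service", "cloud-init.service"}))
```

Preferring the new name first means the script keeps working if a distro later ships both the renamed unit and a compatibility alias.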
Environment details