
Cloudcix ds #1351

Merged
merged 35 commits into from
Sep 19, 2024

Conversation

BrinKe-dev (Contributor)

Create datasource for CloudCIX

Add Cloud-Init support for instances running in CloudCIX

@blackboxsw blackboxsw self-assigned this Mar 28, 2022
@blackboxsw (Collaborator) left a comment

Thanks a lot for working on adding this new datasource to cloud-init, and for the unit tests you have provided, @BrinKe-dev.
We can definitely work with you to get this into cloud-init proper; thanks for the work so far.

It seems like this datasource's instance data is similar to the Oracle, Vultr, Ec2, or OpenStack instance data formats, but without versioning or a cloud-scope-specific route under the link-local address.

I'm wondering if CloudCIX has the ability to surface versioned instance metadata (IMDS) routes to protect cloud-init from any backward-incompatible changes that could be introduced in this cloud's IMDS in the future.
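The kind of version negotiation being suggested can be sketched as a small helper (a minimal sketch with hypothetical names, not cloud-init's actual API): with versioned routes like /v1, /v2, the client keeps using the newest version it understands even after the cloud introduces a newer, breaking one.

```python
def pick_metadata_version(available, supported):
    """Pick the highest IMDS version both served by the cloud and
    understood by the client, or None if there is no overlap."""
    candidates = set(available) & set(supported)
    return max(candidates) if candidates else None

# A client that supports v1 and v2 keeps working when the cloud
# later adds a backward-incompatible v3 route.
assert pick_metadata_version(available=[1, 2, 3], supported=[1, 2]) == 2
```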

I would like to see the output of sudo cloud-init query --all on an instance, to see what metadata content is generally provided.

I've added a number of suggestions we can discuss regarding the applicability and maintainability of this code, and about adopting some of the design principles used by other cloud-init datasources.

Beyond the other comments for discussion, I think we'll also need to document this datasource by adding an RST doc file at doc/rtd/topics/datasources/cloudcix.rst which describes the datasource a bit and provides a Configuration section that minimally enumerates the URL parameters that can be tweaked. See ec2.rst or openstack.rst for examples of that documentation format.
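A rough skeleton of what such a doc file might look like (the parameter names below are illustrative, borrowed from cloud-init's common URL parameters, and are not confirmed for this datasource):

```rst
CloudCIX
========

Describe the CloudCIX metadata service and how instances are
identified (e.g. via the DMI product name).

Configuration
-------------

The following URL parameters can be tweaked, e.g. via a file in
``/etc/cloud/cloud.cfg.d/``:

.. code-block:: yaml

   datasource:
     CloudCIX:
       max_wait: 120
       timeout: 50
       retries: 5
```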

Inline review threads (resolved):
tools/ds-identify
tools/.github-cla-signers
cloudinit/sources/DataSourceCloudCIX.py (multiple threads)
@BrinKe-dev (Contributor, Author)

Hi @blackboxsw, I've implemented some of the changes to bring the CloudCIX datasource in line with existing cloud-init code. You asked for the output of cloud-init query --all on an instance. We're primarily using cloud-init for userdata, so the metadata is minimal for now.

{
"_beta_keys": [
"subplatform"
],
"availability-zone": null,
"availability_zone": null,
"base64_encoded_keys": [],
"cloud-name": "cloudcix",
"cloud_name": "cloudcix",
"distro": "ubuntu",
"distro_release": "focal",
"distro_version": "20.04",
"ds": {
"_doc": "EXPERIMENTAL: The structure and format of content scoped under the 'ds' key may change in subsequent releases of cloud-init.",
"meta_data": {
"instance-id": "91_94",
"ip_addresses": [
{
"private_ip": "169.254.39.199",
"subnet": "169.254.0.1/16"
},
{
"private_ip": "10.10.10.2",
"public_ip": "185.49.60.70",
"subnet": "10.10.10.1/24"
}
]
}
},
"instance-id": "91_94",
"instance_id": "91_94",
"kernel_release": "5.4.0-26-generic",
"local-hostname": "ubuntu",
"local_hostname": "ubuntu",
"machine": "x86_64",
"merged_cfg": {
"_doc": "Merged cloud-init system config from /etc/cloud/cloud.cfg and /etc/cloud/cloud.cfg.d/",
"_log": [
"[loggers]\nkeys=root,cloudinit\n\n[handlers]\nkeys=consoleHandler,cloudLogHandler\n\n[formatters]\nkeys=simpleFormatter,arg0Formatter\n\n[logger_root]\nlevel=DEBUG\nhandlers=consoleHandler,cloudLogHandler\n\n[logger_cloudinit]\nlevel=DEBUG\nqualname=cloudinit\nhandlers=\npropagate=1\n\n[handler_consoleHandler]\nclass=StreamHandler\nlevel=WARNING\nformatter=arg0Formatter\nargs=(sys.stderr,)\n\n[formatter_arg0Formatter]\nformat=%(asctime)s - %(filename)s[%(levelname)s]: %(message)s\n\n[formatter_simpleFormatter]\nformat=[CLOUDINIT] %(filename)s[%(levelname)s]: %(message)s\n",
"[handler_cloudLogHandler]\nclass=FileHandler\nlevel=DEBUG\nformatter=arg0Formatter\nargs=('/var/log/cloud-init.log', 'a', 'UTF-8')\n",
"[handler_cloudLogHandler]\nclass=handlers.SysLogHandler\nlevel=DEBUG\nformatter=simpleFormatter\nargs=("/dev/log", handlers.SysLogHandler.LOG_USER)\n"
],
"cloud_config_modules": [
"emit_upstart",
"snap",
"ssh-import-id",
"keyboard",
"locale",
"set-passwords",
"grub-dpkg",
"apt-pipelining",
"apt-configure",
"ubuntu-advantage",
"ntp",
"timezone",
"disable-ec2-metadata",
"runcmd",
"byobu"
],
"cloud_final_modules": [
"package-update-upgrade-install",
"fan",
"landscape",
"lxd",
"ubuntu-drivers",
"write-files-deferred",
"puppet",
"chef",
"mcollective",
"salt-minion",
"reset_rmc",
"refresh_rmc_and_interface",
"rightscale_userdata",
"scripts-vendor",
"scripts-per-once",
"scripts-per-boot",
"scripts-per-instance",
"scripts-user",
"ssh-authkey-fingerprints",
"keys-to-console",
"install-hotplug",
"phone-home",
"final-message",
"power-state-change"
],
"cloud_init_modules": [
"migrator",
"seed_random",
"bootcmd",
"write-files",
"growpart",
"resizefs",
"disk_setup",
"mounts",
"set_hostname",
"update_hostname",
"update_etc_hosts",
"ca-certs",
"rsyslog",
"users-groups",
"ssh"
],
"datasource_list": [
"NoCloud",
"ConfigDrive",
"CloudCIX",
"OpenNebula",
"DigitalOcean",
"Azure",
"AltCloud",
"OVF",
"MAAS",
"GCE",
"OpenStack",
"CloudSigma",
"SmartOS",
"Bigstep",
"Scaleway",
"AliYun",
"Ec2",
"CloudStack",
"Hetzner",
"IBMCloud",
"Oracle",
"Exoscale",
"RbxCloud",
"UpCloud",
"VMware",
"Vultr",
"LXD",
"None"
],
"def_log_file": "/var/log/cloud-init.log",
"di_report": {
"datasource_list": []
},
"disable_root": true,
"log_cfgs": [
[
"[loggers]\nkeys=root,cloudinit\n\n[handlers]\nkeys=consoleHandler,cloudLogHandler\n\n[formatters]\nkeys=simpleFormatter,arg0Formatter\n\n[logger_root]\nlevel=DEBUG\nhandlers=consoleHandler,cloudLogHandler\n\n[logger_cloudinit]\nlevel=DEBUG\nqualname=cloudinit\nhandlers=\npropagate=1\n\n[handler_consoleHandler]\nclass=StreamHandler\nlevel=WARNING\nformatter=arg0Formatter\nargs=(sys.stderr,)\n\n[formatter_arg0Formatter]\nformat=%(asctime)s - %(filename)s[%(levelname)s]: %(message)s\n\n[formatter_simpleFormatter]\nformat=[CLOUDINIT] %(filename)s[%(levelname)s]: %(message)s\n",
"[handler_cloudLogHandler]\nclass=FileHandler\nlevel=DEBUG\nformatter=arg0Formatter\nargs=('/var/log/cloud-init.log', 'a', 'UTF-8')\n"
]
],
"output": {
"all": "| tee -a /var/log/cloud-init-output.log"
},
"preserve_hostname": false,
"syslog_fix_perms": [
"syslog:adm",
"root:adm",
"root:wheel",
"root:root"
],
"users": [
"default"
],
"vendor_data": {
"enabled": true,
"prefix": []
},
"vendor_data2": {
"enabled": true,
"prefix": []
}
},
"platform": "cloudcix",
"public_ssh_keys": [],
"python_version": "3.8.10",
"region": null,
"sensitive_keys": [
"merged_cfg"
],
"subplatform": "unknown",
"sys_info": {
"dist": [
"ubuntu",
"20.04",
"focal"
],
"platform": "Linux-5.4.0-26-generic-x86_64-with-glibc2.29",
"python": "3.8.10",
"release": "5.4.0-26-generic",
"system": "Linux",
"uname": [
"Linux",
"ubuntu",
"5.4.0-26-generic",
"#30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020",
"x86_64",
"x86_64"
],
"variant": "ubuntu"
},
"system_platform": "Linux-5.4.0-26-generic-x86_64-with-glibc2.29",
"userdata": "#cloud-config\nruncmd:\n - mkdir /home/administrator/startup",
"v1": {
"_beta_keys": [
"subplatform"
],
"availability-zone": null,
"availability_zone": null,
"cloud-name": "cloudcix",
"cloud_name": "cloudcix",
"distro": "ubuntu",
"distro_release": "focal",
"distro_version": "20.04",
"instance-id": "91_94",
"instance_id": "91_94",
"kernel_release": "5.4.0-26-generic",
"local-hostname": "ubuntu",
"local_hostname": "ubuntu",
"machine": "x86_64",
"platform": "cloudcix",
"public_ssh_keys": [],
"python_version": "3.8.10",
"region": null,
"subplatform": "unknown",
"system_platform": "Linux-5.4.0-26-generic-x86_64-with-glibc2.29",
"variant": "ubuntu"
},
"variant": "ubuntu",
"vendordata": ""
}

Inline review threads (resolved):
cloudinit/url_helper.py
cloudinit/sources/DataSourceCloudCIX.py
@blackboxsw (Collaborator) left a comment

All looks good on this front. A few minor nits and let's land this.

  1. Add CloudCIX to the supported public clouds in the README.md
  2. Add datasources/cloudcix.rst to doc/rtd/topics/datasources.rst so that docs get generated
  3. Rename your is_running_in_cloudcix method to _is_platform_viable and have it call a module-level function def is_platform_viable() -> bool. See DataSourceLXD for an example. (We are trying to standardize entrypoints in a module and datasource so they can be used either from the common base class or by external consumers/scripts at some point.)

Here's the combined diff of my suggestions:

diff --git a/README.md b/README.md
index f2a745f87..d65e99302 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ get in contact with that distribution and send them our way!
 
 | Supported OSes | Supported Public Clouds | Supported Private Clouds |
 | --- | --- | --- |
-| Alpine Linux<br />ArchLinux<br />Debian<br />DragonFlyBSD<br />Fedora<br />FreeBSD<br />Gentoo Linux<br />NetBSD<br />OpenBSD<br />openEuler<br />RHEL/CentOS/AlmaLinux/Rocky/PhotonOS/Virtuozzo/EuroLinux/CloudLinux/MIRACLE LINUX<br />SLES/openSUSE<br />Ubuntu<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /> | Amazon Web Services<br />Microsoft Azure<br />Google Cloud Platform<br />Oracle Cloud Infrastructure<br />Softlayer<br />Rackspace Public Cloud<br />IBM Cloud<br />DigitalOcean<br />Bigstep<br />Hetzner<br />Joyent<br />CloudSigma<br />Alibaba Cloud<br />OVH<br />OpenNebula<br />Exoscale<br />Scaleway<br />CloudStack<br />AltCloud<br />SmartOS<br />HyperOne<br />Vultr<br />Rootbox<br /> | Bare metal installs<br />OpenStack<br />LXD<br />KVM<br />Metal-as-a-Service (MAAS)<br />VMware<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />|
+| Alpine Linux<br />ArchLinux<br />Debian<br />DragonFlyBSD<br />Fedora<br />FreeBSD<br />Gentoo Linux<br />NetBSD<br />OpenBSD<br />openEuler<br />RHEL/CentOS/AlmaLinux/Rocky/PhotonOS<br />Virtuozzo/EuroLinux/CloudLinux/MIRACLE LINUX<br />SLES/openSUSE<br />Ubuntu<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /> | Amazon Web Services<br />Microsoft Azure<br />Google Cloud Platform<br />Oracle Cloud Infrastructure<br />Softlayer<br />Rackspace Public Cloud<br />IBM Cloud<br />DigitalOcean<br />Bigstep<br />Hetzner<br />Joyent<br />CloudSigma<br />Alibaba Cloud<br />OVH<br />OpenNebula<br />Exoscale<br />Scaleway<br />CloudStack<br />CloudCIX<br />AltCloud<br />SmartOS<br />HyperOne<br />Vultr<br />Rootbox<br /> | Bare metal installs<br />OpenStack<br />LXD<br />KVM<br />Metal-as-a-Service (MAAS)<br />VMware<br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br /><br />|
 
 ## To start developing cloud-init
 
diff --git a/cloudinit/sources/DataSourceCloudCIX.py b/cloudinit/sources/DataSourceCloudCIX.py
index 76f9fec68..71d769c50 100644
--- a/cloudinit/sources/DataSourceCloudCIX.py
+++ b/cloudinit/sources/DataSourceCloudCIX.py
@@ -28,7 +28,7 @@ class DataSourceCloudCIX(sources.DataSource):
         )
 
     def _get_data(self):
-        if not self.is_running_in_cloudcix():
+        if not self._is_platform_viable():
             return False
 
         try:
@@ -51,8 +51,15 @@ class DataSourceCloudCIX(sources.DataSource):
         self.userdata_raw = md["user-data"]
         return True
 
-    def is_running_in_cloudcix(self):
-        return dmi.read_dmi_data("system-product-name") == self.dsname
+    def _is_platform_viable(self):
+        return is_platform_viable()
+
+
+def is_platform_viable() -> bool:
+    """Return True when this platform appears to be CloudCIX."""
+    return (
+        dmi.read_dmi_data("system-product-name") == DataSourceCloudCIX.dsname
+    )
 
 
 def read_metadata(base_url, url_params):
diff --git a/doc/rtd/topics/datasources.rst b/doc/rtd/topics/datasources.rst
index fc08bb7de..2793ba9ff 100644
--- a/doc/rtd/topics/datasources.rst
+++ b/doc/rtd/topics/datasources.rst
@@ -33,6 +33,7 @@ The following is a list of documents for each supported datasource:
    datasources/azure.rst
    datasources/cloudsigma.rst
    datasources/cloudstack.rst
+   datasources/cloudcix.rst
    datasources/configdrive.rst
    datasources/digitalocean.rst
    datasources/e24cloud.rst
diff --git a/doc/rtd/topics/datasources/cloudcix.rst b/doc/rtd/topics/datasources/cloudcix.rst
index bba2c22ba..1c0d0bdf3 100644
--- a/doc/rtd/topics/datasources/cloudcix.rst
+++ b/doc/rtd/topics/datasources/cloudcix.rst
@@ -7,7 +7,7 @@ CloudCIX serves metadata through an internal server, accessible at
 ``http://169.254.169.254/v1``. The metadata and userdata can be fetched at
 the ``/metadata`` and ``/userdata`` paths respectively.
 
-CloudCIX instances are identified by the dmi Product Name.
+CloudCIX instances are identified by the DMI product name `CloudCIX`.
 
 Configuration
 -------------
diff --git a/tests/unittests/sources/test_cloudcix.py b/tests/unittests/sources/test_cloudcix.py
index 5099ff9e1..a2c1311fc 100644
--- a/tests/unittests/sources/test_cloudcix.py
+++ b/tests/unittests/sources/test_cloudcix.py
@@ -46,10 +46,12 @@ class TestDataSourceCloudCIX(CiTestCase):
         )
 
     def test_identifying_cloudcix(self):
-        self.assertTrue(self.datasource.is_running_in_cloudcix())
+        self.assertTrue(self.datasource._is_platform_viable())
+        self.assertTrue(ds_mod.is_platform_viable())
 
         self.m_read_dmi_data.return_value = "OnCloud9"
-        self.assertFalse(self.datasource.is_running_in_cloudcix())
+        self.assertFalse(self.datasource._is_platform_viable())
+        self.assertFalse(ds_mod.is_platform_viable())
 
     @mock.patch("cloudinit.url_helper.readurl")
     def test_reading_metadata_on_cloudcix(self, m_readurl):

@BrinKe-dev if you run git fetch upstream; git rebase upstream/main; tox -e doc; xdg-open doc/rtd_html/topics/datasources.html you should see your CloudCIX datasource docs rendered in a browser as they will appear on our Read the Docs site.

@BrinKe-dev (Contributor, Author)

Hi @blackboxsw, we're considering modifying our metadata service to enable us to use the Ephemeral DHCP option, which may take a while to complete. Would it be alright to shelve this pull request for now? Apologies for the inconvenience.

@blackboxsw (Collaborator)

> Hi @blackboxsw, we're considering modifying our metadata service to enable us to use the Ephemeral DHCP option, which may take a while to complete. Would it be alright to shelve this pull request for now? Apologies for the inconvenience.

This is actually good to hear, @BrinKe-dev. I think that's a good, solid choice for support. Our GitHub Actions will automatically close this PR in 2 weeks, but you will be able to re-open it without issue once you push another commit on top. Thanks for the heads up; glad to hear you are able to iterate so quickly.

@blackboxsw (Collaborator)

If you want to close it before our bot does, a simple "close with comment" on this PR will work too. Your choice.

@github-actions github-actions bot added the stale-pr Pull request is stale; will be auto-closed soon label Apr 30, 2022
@github-actions github-actions bot closed this May 8, 2022
@blackboxsw blackboxsw added this to the cloud-init-24.4 milestone Aug 30, 2024
@jgrassler

Ok, I'll look into converting the tests to pytest then. Hopefully I won't run into any sticky situations but if I do, I'd be grateful for help! Also, thanks for the pytestify tip - I would have tackled this entirely manually otherwise :-)

@jgrassler

Ok, I converted the unit tests to pytest now.

@blackboxsw (Collaborator) left a comment

Thank you for the test refactor here, @jgrassler. I have a couple of test nits inline.

I do think we need to back out that EphemeralIPNetwork usage, since the CIX datasource is only registered to be discovered once system networking is up.

Also, while running unit tests here I noticed a significant delay in tests which invoked the URL retry logic (I ran pytest --durations=0 to confirm the slowest unit tests). I have ensured we mock time.sleep to avoid sluggish tests, which also gives us a chance to assert that our retries use our default of 5 seconds between the one retry configured in the tests.

Below is a diff containing the majority of the test changes and CIX module changes that I suggested inline (with the exception, I think, of the docstring request).

Thanks again for your contributions; take what you will from my proposed diff.

diff --git a/cloudinit/sources/DataSourceCloudCIX.py b/cloudinit/sources/DataSourceCloudCIX.py
index af10dfccb..69e810258 100644
--- a/cloudinit/sources/DataSourceCloudCIX.py
+++ b/cloudinit/sources/DataSourceCloudCIX.py
@@ -72,7 +72,7 @@ class DataSourceCloudCIX(sources.DataSource):
         md_url = self.determine_md_url()
         if md_url is None:
             raise sources.InvalidMetaDataException(
-                "Could not determine MetaData url"
+                "Could not determine metadata URL"
             )
 
         data = read_metadata(md_url, self.get_url_params())
@@ -137,7 +137,7 @@ class DataSourceCloudCIX(sources.DataSource):
             name = macs_to_nics.get(iface["mac_address"])
             if name is None:
                 LOG.warning(
-                    "Metadata mac address %s not found.", iface["mac_address"]
+                    "Metadata MAC address %s not found.", iface["mac_address"]
                 )
                 continue
             netcfg["ethernets"][name] = {
@@ -155,7 +155,7 @@ def is_platform_viable() -> bool:
     return dmi.read_dmi_data("system-product-name") == CLOUDCIX_DMI_NAME
 
 
-def read_metadata(base_url, url_params):
+def read_metadata(base_url: str, url_params):
     md = {}
     leaf_key_format_callback = (
         ("metadata", "meta-data", util.load_json),
diff --git a/tests/unittests/sources/test_cloudcix.py b/tests/unittests/sources/test_cloudcix.py
index 876358f04..3d143491e 100644
--- a/tests/unittests/sources/test_cloudcix.py
+++ b/tests/unittests/sources/test_cloudcix.py
@@ -1,11 +1,11 @@
 # This file is part of cloud-init. See LICENSE file for license information.
 import json
-from unittest.mock import PropertyMock
+from unittest import mock
 
 import pytest
 import responses
 
-from cloudinit import distros, helpers, sources
+from cloudinit import distros, sources
 from cloudinit import url_helper as uh
 from cloudinit.sources import DataSourceCloudCIX as ds_mod
 from cloudinit.sources import InvalidMetaDataException
@@ -62,33 +62,31 @@ class TestDataSourceCloudCIX:
     Test reading the meta-data
     """
 
-    allowed_subp = True
-
     @pytest.fixture(autouse=True)
-    def setup(self, mocker, tmp_path):
-        self.paths = helpers.Paths({"run_dir": tmp_path})
+    def setup(self, mocker, tmpdir, paths):
+        self.paths = paths
         self.datasource = self._get_ds()
         self.m_read_dmi_data = mocker.patch(
             "cloudinit.dmi.read_dmi_data",
-            new_callable=PropertyMock,
+            new_callable=mock.PropertyMock,
         )
         self.m_read_dmi_data.return_value = "CloudCIX"
 
         self._m_find_fallback_nic = mocker.patch(
             "cloudinit.net.find_fallback_nic",
-            new_callable=PropertyMock,
+            new_callable=mock.PropertyMock,
         )
         self._m_find_fallback_nic.return_value = "cixnic0"
         self._m_EphemeralIPNetwork_enter = mocker.patch(
             "cloudinit.net.ephemeral.EphemeralIPNetwork.__enter__",
-            new_callable=PropertyMock,
+            new_callable=mock.PropertyMock,
         )
         self._m_EphemeralIPNetwork_enter.return_value = (
             MockEphemeralIPNetworkWithStateMsg()
         )
         self._m_EphemeralIPNetwork_exit = mocker.patch(
             "cloudinit.net.ephemeral.EphemeralIPNetwork.__exit__",
-            new_callable=PropertyMock,
+            new_callable=mock.PropertyMock,
         )
         self._m_EphemeralIPNetwork_exit.return_value = (
             MockEphemeralIPNetworkWithStateMsg()
@@ -198,40 +196,46 @@ class TestDataSourceCloudCIX:
         assert self.datasource.userdata_raw == USERDATA
 
     @responses.activate
-    def test_failing_imds_endpoints(self):
+    def test_failing_imds_endpoints(self, mocker):
+        sleep = mocker.patch("time.sleep")
+        base_url = ds_mod.METADATA_URLS[0]
         # Make request before imds is set up
-        pytest.raises(
+        with pytest.raises(
             sources.InvalidMetaDataException,
-            self.datasource.crawl_metadata_service,
-        )
+            match="Could not determine metadata URL",
+        ):
+            self.datasource.crawl_metadata_service()
 
-        # Make imds respond to healthcheck
-        base_url = ds_mod.METADATA_URLS[0]
+        # Make imds respond to healthcheck but fail v1/metadata
         responses.add_callback(
             responses.GET,
             base_url,
             callback=MockImds.base_response,
         )
-
-        pytest.raises(
+        version = ds_mod.METADATA_VERSION
+        with pytest.raises(
             sources.InvalidMetaDataException,
-            self.datasource.crawl_metadata_service,
-        )
+            match="Could not determine metadata URL",
+        ):
+            self.datasource.crawl_metadata_service()
 
-        # Make imds serve metadata
-        version = ds_mod.METADATA_VERSION
+        # No sleep/retries when md_url returns 404. No viable IMDS found.
+        assert 0 == sleep.call_count
+
+        # Make imds serve metadata but ConnectionError on userdata
         responses.add_callback(
             responses.GET,
             uh.combine_url(base_url, f"v{version}", "metadata"),
             callback=MockImds.metadata_response,
         )
-
         pytest.raises(
             sources.InvalidMetaDataException,
             self.datasource.crawl_metadata_service,
         )
+        # Sleep called number of default datasource configured "retries"
+        assert [mock.call(5)] == sleep.call_args_list
 
-        # Make imds serve userdata
+        # Make IMDS serve userdata
         responses.add_callback(
             responses.GET,
             uh.combine_url(base_url, f"v{version}", "userdata"),
@@ -239,7 +243,10 @@ class TestDataSourceCloudCIX:
         )
 
         data = self.datasource.crawl_metadata_service()
-        assert data != dict()
+        assert data == {
+            "meta-data": METADATA,
+            "user-data": USERDATA.encode("utf-8"),
+        }
 
     @responses.activate
     def test_read_malformed_metadata(self):
@@ -270,7 +277,7 @@ class TestDataSourceCloudCIX:
         )
 
     @responses.activate
-    def test_bad_response_code(self):
+    def test_bad_response_code(self, mocker):
         def bad_response(response):
             return 404, response.headers, ""
 
@@ -283,10 +290,12 @@ class TestDataSourceCloudCIX:
             uh.combine_url(versioned_url, "metadata"),
             callback=bad_response,
         )
-
-        pytest.raises(
+        sleep = mocker.patch("time.sleep")
+        with pytest.raises(
             InvalidMetaDataException,
-            ds_mod.read_metadata,
-            versioned_url,
-            self.datasource.get_url_params(),
-        )
+            match=f"Failed to fetch IMDS metadata: {versioned_url}",
+        ):
+            ds_mod.read_metadata(
+                versioned_url, self.datasource.get_url_params()
+            )
+        assert [mock.call(5)] == sleep.call_args_list


def __init__(self, sys_cfg, distro, paths):
super(DataSourceCloudCIX, self).__init__(sys_cfg, distro, paths)
self.distro = distro

distro is already set in the parent class; no need to do that here too.

Suggested change
self.distro = distro


Indeed. Removed.

Comment on lines 108 to 110
md_url = url_helper.combine_url(
base_url, "v{0}".format(version)
)

You already calculated this value above; no need to recalculate upon success. Also no need for the md_url local variable. Let's just set self._metadata_url.

Suggested change
md_url = url_helper.combine_url(
base_url, "v{0}".format(version)
)
self._metadata_url = url


Ok, I ripped the redundant bits out now (the second combine_url() operation is still needed since it adds the version).
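To make the point about that second call concrete, here is a simplified stand-in for cloudinit.url_helper.combine_url (the real helper may differ in edge cases); one call builds the versioned base, a second appends the leaf path:

```python
def combine_url(base, *parts):
    """Simplified stand-in for cloudinit.url_helper.combine_url:
    join URL path segments without doubling slashes."""
    url = base.rstrip("/")
    for part in parts:
        url += "/" + str(part).strip("/")
    return url

# The first call adds the version; the second adds the leaf endpoint.
versioned = combine_url("http://169.254.169.254", "v1")
metadata_url = combine_url(versioned, "metadata")
```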

return None

# Find the highest supported metadata version
md_url = None

No need for the local var

Suggested change
md_url = None


Done.

else:
LOG.debug("No metadata found at URL %s", url)

self._metadata_url = md_url

Already performed above in review comment.

Suggested change
self._metadata_url = md_url

cloudinit/sources/DataSourceCloudCIX.py (resolved inline thread)
md_url = self.determine_md_url()
if md_url is None:
raise sources.InvalidMetaDataException(
"Could not determine MetaData url"

Let's use the same capitalization and spelling in customer-facing messaging.

Suggested change
"Could not determine MetaData url"
"Could not determine metadata URL"

name = macs_to_nics.get(iface["mac_address"])
if name is None:
LOG.warning(
"Metadata mac address %s not found.", iface["mac_address"]

Suggested change
"Metadata mac address %s not found.", iface["mac_address"]
"Metadata MAC address %s not found.", iface["mac_address"]

Comment on lines 203 to 206
pytest.raises(
sources.InvalidMetaDataException,
self.datasource.crawl_metadata_service,
)

Let's nail down more specific assert matching to ensure we're getting the error messages we think we are for these asserts. We can do that by using pytest.raises as a context manager and providing the match param.

Suggested change
pytest.raises(
sources.InvalidMetaDataException,
self.datasource.crawl_metadata_service,
)
with pytest.raises(
sources.InvalidMetaDataException,
match="Could not determine metadata URL",
):
self.datasource.crawl_metadata_service()

return False

self.metadata = crawled_data["meta-data"]
self.userdata_raw = util.decode_binary(crawled_data["user-data"])

Makes sense. Thanks for the explanation.

Comment on lines 55 to 58
):
crawled_data = util.log_time(
logfunc=LOG.debug,
msg="Crawl of metadata service",

@jgrassler sorry about the back and forth here with my suggestion about ephemeral network setup. Given @BrinKe-dev's response stating that DHCP platform support is not currently possible, we don't need or want this initial EphemeralIPNetwork setup anymore, unless CloudCIX also registered the datasource for discovery in the init-local boot stage by adding a second item to the datasources registry at the bottom of the DataSourceCloudCIX module.

(DataSourceCloudCIX, (sources.DEP_FILESYSTEM,)),  # Run at init-local boot stage

It's only in the init-local boot stage that a datasource would need to attempt network bringup itself via one of the Ephemeral* context managers.
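The registry convention being described can be sketched as follows (a self-contained illustration with stub names, not the actual DataSourceCloudCIX module):

```python
# Stand-ins for cloudinit.sources dependency constants (illustrative).
DEP_FILESYSTEM = "FILESYSTEM"
DEP_NETWORK = "NETWORK"

class DataSourceCloudCIX:
    """Stub standing in for the real datasource class."""

# Declaring both deps means discovery runs only in the network-up init
# stage, so no EphemeralIPNetwork setup is needed by the datasource.
datasources = [
    (DataSourceCloudCIX, (DEP_FILESYSTEM, DEP_NETWORK)),
    # A second, filesystem-only entry would additionally run discovery
    # at init-local, before networking -- the only case where the
    # datasource must bring up networking itself:
    # (DataSourceCloudCIX, (DEP_FILESYSTEM,)),
]

def get_datasource_list(depends):
    """Return classes whose declared deps are satisfied by `depends`."""
    return [cls for cls, deps in datasources if set(deps) <= set(depends)]
```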

* Applied suggested diff
* Removed EphemeralIPNetwork setup
* Removed redundant local variable from determine_md_url()
* Check for actual error message in invalid JSON unit test
@jgrassler
Copy link

> I do think we need to back out that EphemeralIPNetwork usage, since the CIX datasource is only registered to be discovered once system networking is up.

Done. No worries about the back and forth, there is a fair amount of history on this pull request by now...

> Below is a diff containing the majority of the test changes and CIX module changes that I suggested inline (with the exception, I think, of the docstring request).

Thanks for the diff!

@blackboxsw (Collaborator)

blackboxsw commented Sep 9, 2024

@jgrassler things look good on my end. If you can attach the output of the following commands from an instance running this latest iteration of the PR, that would be great to assert success on your platform. I'll merge after that.

cat > /etc/cloud/cloud.cfg.d/95_enable_cix_datasource.cfg <<EOF
datasource_list: [ NoCloud, CloudCIX ]
EOF

DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force
cloud-init clean --logs --reboot

# await clean reboot
cloud-init status --wait --format=yaml

metadata service
- *timeout*: How long in seconds to wait for a response from the metadata
service
- *wait*: How long in seconds to wait between consecutive requests to the
@blackboxsw (Collaborator) commented Sep 9, 2024

Suggested change
- *wait*: How long in seconds to wait between consecutive requests to the
- *sec_between_retries*: How long in seconds to wait between consecutive requests to the


Fixed.
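The two documented knobs can be illustrated with a hypothetical retry helper (a sketch only, not cloud-init's actual url_helper; the parameter names mirror the documented settings, with the mockable sleep mirroring how the unit tests assert a single sleep(5) between retries):

```python
import time

def fetch_with_retries(fetch, retries=1, sec_between_retries=5,
                       timeout=10, sleep=time.sleep):
    """Hypothetical helper: each attempt gets `timeout` seconds; on
    failure, wait `sec_between_retries` seconds before each of the
    `retries` additional attempts."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(timeout=timeout)
        except OSError as exc:
            last_error = exc
            if attempt < retries:
                sleep(sec_between_retries)
    raise last_error
```

With retries=1 and the default 5-second spacing, one failed attempt produces a single sleep(5) before the second attempt, which is exactly what mocking time.sleep lets a test assert.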

@jgrassler

> @jgrassler things look good on my end. If you can attach the output of the following commands from an instance running this latest iteration of the PR, that would be great to assert success on your platform. I'll merge after that.

Sure. I presume you're mainly interested in this command's output (the datasource is already enabled and the VM has been initialized using cloud-init, so I don't think there's much point in rebooting/re-initializing it):

root@noble:/home/ubuntu# DI_LOG=stderr /usr/lib/cloud-init/ds-identify --force
[up 101.50s] ds-identify --force
policy loaded: mode=search report=false found=all maybe=all notfound=disabled
/etc/cloud/cloud.cfg.d/90_dpkg.cfg set datasource_list: [ NoCloud, ConfigDrive, OpenNebula, DigitalOcean, Azure, AltCloud, OVF, MAAS, GCE, OpenStack, CloudSigma, SmartOS, Bigstep, Scaleway, AliYun, Ec2, CloudCIX, CloudStack, Hetzner, IBMCloud, Oracle, Exoscale, RbxCloud, UpCloud, VMware, Vultr, LXD, NWCS, Akamai, WSL, None ]
DMI_PRODUCT_NAME=CloudCIX
DMI_SYS_VENDOR=QEMU
DMI_PRODUCT_SERIAL=
DMI_PRODUCT_UUID=f7f0e9bd-19b6-4276-9d18-886dae37f486
PID_1_PRODUCT_NAME=unavailable
DMI_CHASSIS_ASSET_TAG=
DMI_BOARD_NAME=unavailable
FS_LABELS=UEFI,UEFI,cloudimg-rootfs,BOOT
ISO9660_DEVS=
KERNEL_CMDLINE=BOOT_IMAGE=/vmlinuz-6.8.0-40-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
VIRT=kvm
UNAME_KERNEL_NAME=Linux
UNAME_KERNEL_VERSION=#40-Ubuntu SMP PREEMPT_DYNAMIC Fri Jul 5 10:34:03 UTC 2024
UNAME_MACHINE=x86_64
DSNAME=
DSLIST=NoCloud ConfigDrive OpenNebula DigitalOcean Azure AltCloud OVF MAAS GCE OpenStack CloudSigma SmartOS Bigstep Scaleway AliYun Ec2 CloudCIX CloudStack Hetzner IBMCloud Oracle Exoscale RbxCloud UpCloud VMware Vultr LXD NWCS Akamai WSL None
MODE=search
ON_FOUND=all
ON_MAYBE=all
ON_NOTFOUND=disabled
pid=1279 ppid=1269
is_container=false
is_ds_enabled(IBMCloud) = true.
is_ds_enabled(IBMCloud) = true.
ec2 platform is 'Unknown'.
check for 'CloudCIX' returned found
Found single datasource: CloudCIX
[up 101.55s] returning 0

That's from an instance already initialized using the CloudCIX data source.

Not sure if you'd like to see them as well, but I'll also include some log snippets for good measure.

/var/log/cloud-init.log

2024-09-10 07:31:16,417 - util.py[DEBUG]: Cloud-init v. 24.2-0ubuntu2~24.04.2 finished at Tue, 10 Sep 2024 07:31:16 +0000. Datasource DataSourceCloudCIX.  Up 62.11 seconds

/var/log/cloud-init-output.log

Cloud-init v. 24.2-0ubuntu2~24.04.2 running 'modules:config' at Tue, 10 Sep 2024 07:30:40 +0000. Up 26.39 seconds.
Cloud-init v. 24.2-0ubuntu2~24.04.2 running 'modules:final' at Tue, 10 Sep 2024 07:31:16 +0000. Up 61.89 seconds.
Cloud-init v. 24.2-0ubuntu2~24.04.2 finished at Tue, 10 Sep 2024 07:31:16 +0000. Datasource DataSourceCloudCIX.  Up 62.11 seconds

@jgrassler

jgrassler commented Sep 10, 2024

Please do not merge, yet. When I tested this by throwing an actual application at it, I noticed that I forgot adding metadata fields for configuring routes, DNS servers and DNS search lists. I'll update the pull request once I've added these bits.

@blackboxsw
Collaborator

Please do not merge, yet. When I tested this by throwing an actual application at it, I noticed that I forgot adding metadata fields for configuring routes, DNS servers and DNS search lists. I'll update the pull request once I've added these bits.

Thank you for your timely response. Will await your confirmation before I perform a final review and merge.

@blackboxsw blackboxsw added the incomplete Action required by submitter label Sep 10, 2024
* Added an expected network configuration that should be
  produced by the metadata used for tests.
* Added an extra assert in test_reading_metadata_on_cloudcix
  to make sure the network configuration generated from
  metadata matches the expected network configuration.
* Removed mocks for EphemeralIPNetwork
@jgrassler

jgrassler commented Sep 10, 2024

Ok, I added the plumbing for passing in routes and DNS config now, along with an assert that checks the network configuration generated from metadata against an expected one. Also, I removed the redundant mocks that used to be required with EphemeralIPNetwork in place. As usual I field tested the changes against Focal, Jammy and Noble cloud images and they create the routes/DNS configuration they are supposed to.

As far as I am concerned this is ready for final review.


for iface in metadata["network"]["interfaces"]:
name = macs_to_nics.get(iface["mac_address"])
routes = iface.get("routes", None)
Collaborator

@blackboxsw blackboxsw Sep 12, 2024
@jgrassler it appears you may have control over your datasource producing structured data for network config. I find it a bit strange that your IMDS example data surfaces "routes" content that is structured as viable netplan v2 values for routes and nameservers, and that you are essentially reconstructing that network v2 config in cloud-init. If your datasource can provide a full network: version: 2 config dictionary under the 'network' key, that would give you flexibility to grow/change your network config in the CloudCIX Instance Metadata Service directly, without waiting on any changes (and published releases) from cloud-init. If you have control over that element in the CloudCIX IMDS, that may be a slightly more flexible path for you in the future.

Side note: if cloud-init sees network: version: 2 from any IMDS and it is on an image with netplan installed, cloud-init passes that network config directly to netplan for writing and managing network configuration. So cloud-init doesn't have to have to grow support for any new netplan (network: version: 2) keys/values or structures and the IMDS network configuration dictionaries can evolve for more complex networking if needed without any changes made to cloud-init to support those future network needs.

Collaborator

@blackboxsw blackboxsw Sep 16, 2024

@jgrassler This is not a blocking review comment preventing us from landing this PR upstream in cloud-init, but it does make future uploads of CloudCIX a bit more complex if we have to worry about CloudCIX network config formats changing in the future, and about older images with already-released cloud-init versions not being able to consume a changed network config IMDS endpoint in CloudCIX.


@jgrassler it appears you may have control over your datasource producing structured data for network config.

Yes, we do indeed.

I find it a bit strange that your IMDS example data surfaces "routes" content that is structured as viable netplan v2 values for routes and nameservers and you are essentially reconstructing that network v2 in cloud-init.

That is correct, too. We structured it in that manner to reduce the translation work needed in cloud-init.

If your datasource has the ability to provide full network: version: 2 config dictionary under the 'network' key, then it would give you flexibility to grow/change your network config in the future in CloudCIX server Instance Metadata Service directly without waiting on any changes (and published releases) from cloud-init. If you have control over that element in CloudCIX IMDS, that may be a slightly more flexible path for you in the future

We had a discussion about this today and we decided to go with it because we're not 100% sure what else we may or may not need to configure in the future either. I'm making and testing the change right now and will update the pull request later.

Comment on lines 30 to 35
def _unpickle(self, ci_pkl_version: int) -> None:
super()._unpickle(ci_pkl_version)
if not hasattr(self, "_metadata_url"):
setattr(self, "_metadata_url", None)
if not hasattr(self, "_net_cfg"):
setattr(self, "_net_cfg", None)
Collaborator

We can drop this _unpickle logic: since this is the first iteration of this datasource, _unpickle guarding is unnecessary because we won't be deserializing an older version that doesn't yet have these instance attrs.

Suggested change
def _unpickle(self, ci_pkl_version: int) -> None:
super()._unpickle(ci_pkl_version)
if not hasattr(self, "_metadata_url"):
setattr(self, "_metadata_url", None)
if not hasattr(self, "_net_cfg"):
setattr(self, "_net_cfg", None)


While it may technically be unnecessary, it will cause the unit test that checks for it to fail:

 tests/unittests/test_upgrade.py::TestUpgrade::test_all_ds_init_vs_unpickle_attributes[mode1] - AssertionError: New CloudCIX attributes need unpickle coverage: {'_metadata_url', '_net_cfg'}

So unless that failure is ok, I'll leave it in for now.

Collaborator

@jgrassler this unit test failure is just due to the fact that the test is short-sighted and doesn't take into account brand new datasources which have yet to be released. The following diff fixes this test and drops your unpickle method, as it's only needed if CloudCIX pre-existed in published images and we needed to worry about that obj.pkl loading across a system reboot after a package upgrade of cloud-init which introduces new instance attributes.

    drop unneeded unpickle as CloudCIX has not released so no pickle issues

diff --git a/cloudinit/sources/DataSourceCloudCIX.py b/cloudinit/sources/DataSourceCloudCIX.py
index 6371434ee..8f6ef4a1b 100644
--- a/cloudinit/sources/DataSourceCloudCIX.py
+++ b/cloudinit/sources/DataSourceCloudCIX.py
@@ -27,13 +27,6 @@ class DataSourceCloudCIX(sources.DataSource):
         self._metadata_url = None
         self._net_cfg = None
 
-    def _unpickle(self, ci_pkl_version: int) -> None:
-        super()._unpickle(ci_pkl_version)
-        if not hasattr(self, "_metadata_url"):
-            setattr(self, "_metadata_url", None)
-        if not hasattr(self, "_net_cfg"):
-            setattr(self, "_net_cfg", None)
-
     def _get_data(self):
         """
         Fetch the user data and the metadata
diff --git a/tests/unittests/test_upgrade.py b/tests/unittests/test_upgrade.py
index 5c8eef5a5..32a3d7c2b 100644
--- a/tests/unittests/test_upgrade.py
+++ b/tests/unittests/test_upgrade.py
@@ -47,6 +47,7 @@ class TestUpgrade:
             "seed",
             "seed_dir",
         },
+        "CloudCIX": {"_metadata_url", "_net_cfg"},
         "CloudSigma": {"cepko", "ssh_public_key"},
         "CloudStack": {
             "api_ver",

func=self.crawl_metadata_service,
)
except sources.InvalidMetaDataException as error:
LOG.debug(
Collaborator

This should be LOG.error: we have failed to crawl metadata on a platform that passed is_platform_viable for CloudCIX based on DMI data. So something is broken in this environment and it should be reported as an error instead of debug.


Done.

@jgrassler

jgrassler commented Sep 17, 2024

Ok, I modified our IMDS to pass a full netplan v2 configuration data structure and adjusted the data source code accordingly. Tested and working on Ubuntu Focal, Jammy and Noble images. Ready for review once more.

@blackboxsw
Collaborator

@jgrassler LGTM! 🏆 If you can apply the following diff for the unittest/unpickle we can ship this.

--- a/cloudinit/sources/DataSourceCloudCIX.py
+++ b/cloudinit/sources/DataSourceCloudCIX.py
@@ -27,13 +27,6 @@ class DataSourceCloudCIX(sources.DataSource):
         self._metadata_url = None
         self._net_cfg = None
 
-    def _unpickle(self, ci_pkl_version: int) -> None:
-        super()._unpickle(ci_pkl_version)
-        if not hasattr(self, "_metadata_url"):
-            setattr(self, "_metadata_url", None)
-        if not hasattr(self, "_net_cfg"):
-            setattr(self, "_net_cfg", None)
-
     def _get_data(self):
         """
         Fetch the user data and the metadata
diff --git a/tests/unittests/test_upgrade.py b/tests/unittests/test_upgrade.py
index 5c8eef5a5..32a3d7c2b 100644
--- a/tests/unittests/test_upgrade.py
+++ b/tests/unittests/test_upgrade.py
@@ -47,6 +47,7 @@ class TestUpgrade:
             "seed",
             "seed_dir",
         },
+        "CloudCIX": {"_metadata_url", "_net_cfg"},
         "CloudSigma": {"cepko", "ssh_public_key"},
         "CloudStack": {
             "api_ver",

@blackboxsw blackboxsw removed the incomplete Action required by submitter label Sep 18, 2024
@jgrassler

@blackboxsw Thank you! I applied your patch now, so let's...

Collaborator

@blackboxsw blackboxsw left a comment

Thanks again for this effort and making cloud-init better. What remains once this lands for support in Ubuntu is a packaging branch against the https://github.com/canonical/cloud-init/tree/ubuntu/devel branch to enable CloudCIX in default debian packaging through a commit like 8be3470.

You can propose that changeset if you or we can when we get to it.

Will squash merge once we have CI pass here.

@blackboxsw blackboxsw merged commit 53857c8 into canonical:main Sep 19, 2024
20 of 21 checks passed
Labels
documentation This Pull Request changes documentation

5 participants