-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Aeza.net hosting provider #5299
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you much @cofob for thinking of adding cloud-init support and driving this pull request to make cloud-init better on your platform. I've added some intiial review comments, suggestions and questions in this first pass. I'd like to see if we can include this support in our next 24.3 release milestone and we'll work with you to get this across the line.
Also a general security question here. Given that user-data and vendor-data likely contains potentially sensitive information and Aeza.net's instance metadata endpoint is providing content over http that appears to be accessible to anyone with the URL http://77.221.156.49/v1/cloudinit/3c0fff62-b1db-4165-a2b2-f819c5ddf83f/meta-data and UUIDs can be guessable, what sort of authentication layer do you have planned to avoid potential exposure of sensitive information to unwanted parties? |
The UUID is not guessable, as it always uses UUIDv4, in which 122 bits are random and using system-level random source. I believe that this is a sufficient layer of authorization. |
@blackboxsw can you please take a look? |
@TheRealFalcon can you please take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pings on this Pull request and continued work here. PTO got in the way her . I think there are still a couple of open discussion points we should understand (mandatory metadata/user-data files) possibly the use of util.read_seeded and moving some unnecessary side-effects out of __init__
cloudinit/sources/DataSourceAeza.py
Outdated
def _get_data(self): | ||
system_uuid = dmi.read_dmi_data("system-uuid") | ||
|
||
md = read_metadata( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way read_metadata is written, 404s in any of these route returns a None value.
I don't think we generally want to allow for absent or optional metadata or user-data endpoints to have a correctly discovered datasource.
For cloud-init to be certain it has contacted a viable instance metadata service that matches the discovered datasource type and not some bogus service that happened to be stood up at this public IP address, each datasource generally requires minimally meta-data and user-data files or components to exist (see DataSourceNoCloud as an example).
The meta-data should really be a yaml dict that minimally contains an unique instance-id
which cloud-init caches and acts as an indicator to cloud-init that it has applied the desired configuration from the Aeza.net, if that instance-id changes, cloud-init then knows desired configuration has changed for this instance, and it needs to read and apply new configuration for this node. This includes re-creating SSH host keys, setting up default users or whatever other config operations are declared from meta-data/user-data files.
Other keys in meta-data that are optional:
local-hostname
: the cloud platform's desired hostname for this nodecloud-name
: if desired (it will fallback to theDataSourceAeza.dsname.lower()
)public-keys
: List of public keys to import for the default userregion
: string representing the physical regionavailability-zone
: string representing separate AZ if applicable
Again best practice is to provide minimally instance-id
that give the platform a knob to trigger a re-run of cloud-init across system boot if necessary due to mandated config/provisioning changes. The rest are optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The meta-data should really be a yaml dict that minimally contains an unique instance-id which cloud-init caches
I think we might be being a tad too prescriptive here. At the end of the day, the metadata is useful so that the cloud can get the desired functionality from cloud-init. The form it takes doesn't really matter as long as the cloud-init datasource can scrape it and set the internal structures correctly. Cloud-init can even work just fine without any metadata, but I agree that in order to get cloud-init to do the right thing on snapshots and new instances, a cloud should really be giving us an instance-id...but at the end of the day I wouldn't block on it.
Hi @cofob, I don't see that your github username has signed the cloud-init's Contributor License Agreement. So that I can enable CI on this PR and move toward merging this PR, would you mind reviewing and signing our CLA so that we may include this work in upstream cloud-init? |
I signed the CLA |
@blackboxsw I have made the changes according to your comments. |
@blackboxsw can you review my changes please? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update looks better thank you @cofob. I have a number of comments inline, one of which would be that we need to rebase this branch against tip of main and adapt the util.read_seeded to either expect an ignored _network_config return value as the 4th item of the 4-tuple, or add a new param to util.read_seeded to ignore_files=['network-config']
to avoid that round trip on your platform.
Below is a diff of all other inline changes/suggestions I've made in review
commit 1e5417d14fc44e868f5e87e01e707976b2c321ac (HEAD -> aeza)
Author: Chad Smith <[email protected]>
Date: Thu Aug 15 17:45:27 2024 -0600
review: move side-effects out of __init__ and into _get_data
- factor side-effects out of __init__ and into _get_data
- raise source.InvalidMetaDataError on unexpected format
- update docs to mention other datasource config options retries/timeout
- add unittests for ds_detect
- use pytest instead of CiTestCase
diff --git a/cloudinit/sources/DataSourceAeza.py b/cloudinit/sources/DataSourceAeza.py
index 62f757095..fd43be43b 100644
--- a/cloudinit/sources/DataSourceAeza.py
+++ b/cloudinit/sources/DataSourceAeza.py
@@ -6,7 +6,7 @@
import logging
-from cloudinit import sources, dmi, util
+from cloudinit import dmi, sources, util
LOG = logging.getLogger(__name__)
@@ -24,45 +24,35 @@ class DataSourceAeza(sources.DataSource):
self.ds_cfg = util.mergemanydict([self.ds_cfg, BUILTIN_DS_CONFIG])
- url_params = self.get_url_params()
- self.timeout_seconds = url_params.timeout_seconds
- self.max_wait_seconds = url_params.max_wait_seconds
- self.retries = url_params.num_retries
- self.sec_between_retries = url_params.sec_between_retries
+ @staticmethod
+ def ds_detect():
+ return dmi.read_dmi_data("system-manufacturer") == "Aeza"
+ def _get_data(self):
system_uuid = dmi.read_dmi_data("system-uuid")
- self.metadata_address = (
+ metadata_address = (
self.ds_cfg["metadata_url"].format(
id=system_uuid,
)
+ "%s"
)
-
- @staticmethod
- def ds_detect():
- return dmi.read_dmi_data("system-manufacturer") == "Aeza"
-
- def _get_data(self):
+ url_params = self.get_url_params()
md, ud, vd = util.read_seeded(
- self.metadata_address,
- timeout=self.timeout_seconds,
- retries=self.retries,
+ metadata_address,
+ timeout=url_params.timeout_seconds,
+ retries=url_params.num_retries,
)
if md is None:
- LOG.warn(
- "Failed to read metadata from %s",
- self.metadata_address,
+ raise sources.InvalidMetaDataException(
+ f"Failed to read metadata from {metadata_address}",
)
- return False
if not isinstance(md.get("instance-id"), str):
- LOG.warn(
- "Metadata does not contain instance-id",
+ raise sources.InvalidMetaDataException(
+ f"Metadata does not contain instance-id: {md}"
)
- return False
if not isinstance(ud, bytes):
- LOG.warn("Userdata is not bytes")
- return False
+ raise sources.InvalidMetaDataException("Userdata is not bytes")
self.metadata, self.userdata_raw, self.vendordata_raw = md, ud, vd
diff --git a/doc/rtd/reference/datasources/aeza.rst b/doc/rtd/reference/datasources/aeza.rst
index 767215594..430cf3fd2 100644
--- a/doc/rtd/reference/datasources/aeza.rst
+++ b/doc/rtd/reference/datasources/aeza.rst
@@ -21,6 +21,21 @@ Aeza's datasource can be configured as follows: ::
Specifies the URL to retrieve the VPS meta-data. (optional)
Default: ``http://77.221.156.49/v1/cloudinit/{id}/``
+* ``timeout``
+
+ The timeout value provided to ``urlopen`` for each individual http request.
+ This is used both when selecting a ``metadata_url`` and when crawling the
+ metadata service.
+
+ Default: 10
+
+* ``retries``
+
+ The number of retries that should be attempted for an http request. This
+ value is used only after ``metadata_url`` is selected.
+
+ Default: 5
+
.. note::
``{id}`` in URLs is system-uuid DMI value.
diff --git a/tests/unittests/sources/test_aeza.py b/tests/unittests/sources/test_aeza.py
index 4f9e0363e..47d05675f 100644
--- a/tests/unittests/sources/test_aeza.py
+++ b/tests/unittests/sources/test_aeza.py
@@ -3,10 +3,14 @@
# Author: Egor Ternovoy <[email protected]>
#
# This file is part of cloud-init. See LICENSE file for license information.
+import re
+
+import pytest
from cloudinit import helpers, settings, util
-from cloudinit.sources import DataSourceAeza
-from tests.unittests.helpers import CiTestCase, mock
+from cloudinit.sources import DataSourceAeza as aeza
+from cloudinit.sources import InvalidMetaDataException
+from tests.unittests.helpers import mock
METADATA = util.load_yaml(
"""---
@@ -25,42 +29,117 @@ runcmd:
"""
-class TestDataSourceAeza(CiTestCase):
- """
- Test reading the meta-data
- """
+M_PATH = "cloudinit.sources.DataSourceAeza."
- def setUp(self):
- super(TestDataSourceAeza, self).setUp()
- self.tmp = self.tmp_dir()
- def get_ds(self):
- distro = mock.MagicMock()
- distro.get_tmp_exec_path = self.tmp_dir
- ds = DataSourceAeza.DataSourceAeza(
- settings.CFG_BUILTIN,
- distro,
- helpers.Paths({"run_dir": self.tmp}),
- )
- return ds
+class TestDataSourceAeza:
+ """Test Aeza.net reading instance-data"""
+
+ @pytest.mark.parametrize(
+ "system_manufacturer,expected",
+ (
+ pytest.param("Aeza", True, id="dmi_platform_match_aeza"),
+ pytest.param("aeza", False, id="dmi_platform_match_case_sensitve"),
+ pytest.param("Aezanope", False, id="dmi_platform_strict_match"),
+ ),
+ )
+ @mock.patch(f"{M_PATH}dmi.read_dmi_data")
+ def test_ds_detect(self, m_read_dmi_data, system_manufacturer, expected):
+ """Only strict case-senstiive match on DMI system-manfacturer Aeza"""
+ m_read_dmi_data.return_value = system_manufacturer
+ assert expected is aeza.DataSourceAeza.ds_detect()
+ @pytest.mark.parametrize(
+ "sys_cfg,expected_calls",
+ (
+ pytest.param(
+ {},
+ [
+ mock.call(
+ "http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
+ timeout=10,
+ retries=5,
+ )
+ ],
+ id="default_sysconfig_ds_url_retry_and_timeout",
+ ),
+ pytest.param(
+ {
+ "datasource": {
+ "Aeza": {
+ "timeout": 7,
+ "retries": 8,
+ "metadata_url": "https://somethingelse/",
+ }
+ }
+ },
+ [
+ mock.call(
+ "https://somethingelse/%s",
+ timeout=7,
+ retries=8,
+ )
+ ],
+ id="custom_sysconfig_ds_url_retry_and_timeout_overrides",
+ ),
+ ),
+ )
@mock.patch("cloudinit.util.read_seeded")
- @mock.patch("cloudinit.sources.DataSourceAeza.DataSourceAeza.ds_detect")
+ @mock.patch(f"{M_PATH}dmi.read_dmi_data")
def test_read_data(
self,
- m_ds_detect,
+ m_read_dmi_data,
m_read_seeded,
+ sys_cfg,
+ expected_calls,
+ paths,
+ tmpdir,
):
- m_ds_detect.return_value = True
+ m_read_dmi_data.return_value = "1dd9a779-uuid"
m_read_seeded.return_value = (METADATA, USERDATA, VENDORDATA)
- with self.allow_subp(True):
- ds = self.get_ds()
- ret = ds.get_data()
- self.assertTrue(ret)
+ ds = aeza.DataSourceAeza(
+ sys_cfg=sys_cfg, distro=mock.Mock(), paths=paths
+ )
+ with mock.patch.object(ds, "ds_detect", return_value=True):
+ assert True is ds.get_data()
+
+ assert m_read_seeded.call_args_list == expected_calls
+ assert ds.get_public_ssh_keys() == METADATA.get("public-keys")
+ assert isinstance(ds.get_public_ssh_keys(), list)
+ assert ds.userdata_raw == USERDATA
+ assert ds.vendordata_raw == VENDORDATA
+
+ @pytest.mark.parametrize(
+ "metadata,userdata,error_msg",
+ (
+ ({}, USERDATA, "Metadata does not contain instance-id: {}"),
+ (
+ None,
+ USERDATA,
+ "Failed to read metadata from http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
+ ),
+ ({"instance-id": "yep"}, "non-bytes", "Userdata is not bytes"),
+ ),
+ )
+ @mock.patch("cloudinit.util.read_seeded")
+ @mock.patch(f"{M_PATH}dmi.read_dmi_data")
+ def test_not_detected_on_invalid_instance_data(
+ self,
+ m_read_dmi_data,
+ m_read_seeded,
+ metadata,
+ userdata,
+ error_msg,
+ paths,
+ ):
+ """Assert get_data returns False for unexpected format conditions."""
+ m_read_dmi_data.return_value = "1dd9a779-uuid"
+ m_read_seeded.return_value = (metadata, userdata, VENDORDATA)
- self.assertTrue(m_read_seeded.called)
- self.assertEqual(ds.get_public_ssh_keys(), METADATA.get("public-keys"))
- self.assertIsInstance(ds.get_public_ssh_keys(), list)
- self.assertEqual(ds.get_userdata_raw(), USERDATA)
- self.assertEqual(ds.get_vendordata_raw(), VENDORDATA)
+ ds = aeza.DataSourceAeza(sys_cfg={}, distro=mock.Mock(), paths=paths)
+ with pytest.raises(
+ InvalidMetaDataException, match=re.escape(error_msg)
+ ):
+ with mock.patch.object(ds, "ds_detect", return_value=True):
+ ds.get_data()
diff --git a/tests/unittests/sources/test_common.py b/tests/unittests/sources/test_common.py
index 6a7b2d527..1e06b0efc 100644
--- a/tests/unittests/sources/test_common.py
+++ b/tests/unittests/sources/test_common.py
@@ -4,6 +4,7 @@ from unittest.mock import patch
from cloudinit import importer, settings, sources, type_utils
from cloudinit.sources import DataSource
+from cloudinit.sources import DataSourceAeza as Aeza
from cloudinit.sources import DataSourceAkamai as Akamai
from cloudinit.sources import DataSourceAliYun as AliYun
from cloudinit.sources import DataSourceAltCloud as AltCloud
@@ -34,7 +35,6 @@ from cloudinit.sources import DataSourceUpCloud as UpCloud
from cloudinit.sources import DataSourceVMware as VMware
from cloudinit.sources import DataSourceVultr as Vultr
from cloudinit.sources import DataSourceWSL as WSL
-from cloudinit.sources import DataSourceAeza as Aeza
from tests.unittests import helpers as test_helpers
DEFAULT_LOCAL = [
I can push this commit to your PR if you wish, and rebase against main if you like to ensure we minimally get the updated util.read_seeded changes, but I wanted to leave the decision to you as to what you would prefer to do? Do you want your platform to ignore network-config by default? If so, I can provide a separate PR if you like against tip of main to allow avoiding network-config requests from read_seeded that we could just rebase into this branch after review or we could incorporate that changeset into this branch.
What would you prefer?
cloudinit/sources/DataSourceAeza.py
Outdated
url_params = self.get_url_params() | ||
self.timeout_seconds = url_params.timeout_seconds | ||
self.max_wait_seconds = url_params.max_wait_seconds | ||
self.retries = url_params.num_retries | ||
self.sec_between_retries = url_params.sec_between_retries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's avoid side-effects in init and setting instance variables for values that are only used in a single function _get_data
. We can move this into _get_data as locals and drop the unused sec_between_retries and max_wait_seconds
tests/unittests/sources/test_aeza.py
Outdated
""" | ||
|
||
|
||
class TestDataSourceAeza(CiTestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the unit test start here. Some things we hope to generally consider in unittests:
- Across the board we are moving away from CITestCase as we are looking to retire that test class. We prefer instead to use more pytest instead of builtin unittest module. It gives us access to fixtures like tmpdir/caplog and parametrizatoin etc. Let's refactor this test a bit.
- The unittest we have here are mocks ds_detect and doesn't exercise what ds_detect queries. Let's get some test coverage there to assert we are actually looking at the 'right' dmi value.
self.retries = url_params.num_retries | ||
self.sec_between_retries = url_params.sec_between_retries | ||
|
||
system_uuid = dmi.read_dmi_data("system-uuid") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect this value to change at all across system reboot? If so, __init__
isn't going to be called across system reboot due to datasource cache in /var/lib/cloud/instance/obj.pkl. We should likely be performing this inside _get_data to ensure it's called each boot.
cloudinit/sources/DataSourceAeza.py
Outdated
) | ||
|
||
@staticmethod | ||
def ds_detect(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs unittest coverage
cloudinit/sources/DataSourceAeza.py
Outdated
) | ||
|
||
if md is None: | ||
LOG.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In cases where instance data isn't the expected format we generally raise sources.InvalidMetaDataException instead of logging a plain warning level message
cloudinit/sources/DataSourceAeza.py
Outdated
return False | ||
if not isinstance(md.get("instance-id"), str): | ||
LOG.warn( | ||
"Metadata does not contain instance-id", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's parametrize log values to tell us what we actually did see, intead of static logs that provide no further information.
"Metadata does not contain instance-id", | |
"Metadata does not contain instance-id: %s", md |
cloudinit/sources/DataSourceAeza.py
Outdated
) | ||
return False | ||
if not isinstance(ud, bytes): | ||
LOG.warn("Userdata is not bytes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't parametrize user-data and add it to logs generally because it's highly likely to contain sensitive information (passwords) which we don't want to persist in cloud-init.log if possible.
cloudinit/sources/DataSourceAeza.py
Outdated
return dmi.read_dmi_data("system-manufacturer") == "Aeza" | ||
|
||
def _get_data(self): | ||
md, ud, vd = util.read_seeded( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A commit landed in tip of main that is incompatible with this changeset as read_seeded now returns a 4-tuple from util.read_seeded which looks for a supplemental/optional network-config file.We need to git rebase on this branch to refresh to latest commit on main and possibly ignore that network-config 4-tuple return value. I haven't rebased your PR against upstream/main and pushed my suggested changes as a commit because we may actually want to think about providing a supplemental parameter to allow read_seeded to ignore GETs of network-config since network-config isdisregared by this platform and we don't want to waste URL retries on an unexpected/irrelevant file.
@cofob if the proposed changes work for you, I think the final thing we'd need (once we agree on the approach for how to address
|
Ping @cofob. I think this PR needs a |
Co-authored-by: Chad Smith <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Minor diff remaining to fix mypy, doc-check and ruff and dropped that extra unittest.
And lastly, can we get output of a successful run of the suggested commands against the latest commit to confirm successful datasource discovery?
diff --git a/cloudinit/sources/DataSourceAeza.py b/cloudinit/sources/DataSourceAeza.py
index d55dc16e0..40917efb3 100644
--- a/cloudinit/sources/DataSourceAeza.py
+++ b/cloudinit/sources/DataSourceAeza.py
@@ -52,9 +52,6 @@ class DataSourceAeza(sources.DataSource):
raise sources.InvalidMetaDataException(
f"Metadata does not contain instance-id: {md}"
)
- if not isinstance(ud, bytes):
- raise sources.InvalidMetaDataException("Userdata is not bytes")
-
self.metadata, self.userdata_raw, self.vendordata_raw = md, ud, vd
return True
diff --git a/doc/rtd/reference/datasources/aeza.rst b/doc/rtd/reference/datasources/aeza.rst
index 430cf3fd2..3b5b452a7 100644
--- a/doc/rtd/reference/datasources/aeza.rst
+++ b/doc/rtd/reference/datasources/aeza.rst
@@ -37,6 +37,6 @@ Aeza's datasource can be configured as follows: ::
Default: 5
.. note::
- ``{id}`` in URLs is system-uuid DMI value.
+ ``{id}`` in URLs is ``system-uuid`` DMI value.
.. _Aeza: https://wiki.aeza.net/cloud-init
diff --git a/tests/unittests/sources/test_aeza.py b/tests/unittests/sources/test_aeza.py
index a07cf43c3..0374e168a 100644
--- a/tests/unittests/sources/test_aeza.py
+++ b/tests/unittests/sources/test_aeza.py
@@ -7,7 +7,7 @@ import re
import pytest
-from cloudinit import helpers, settings, util
+from cloudinit import util
from cloudinit.sources import DataSourceAeza as aeza
from cloudinit.sources import InvalidMetaDataException
from tests.unittests.helpers import mock
@@ -121,7 +121,6 @@ class TestDataSourceAeza:
USERDATA,
"Failed to read metadata from http://77.221.156.49/v1/cloudinit/1dd9a779-uuid/%s",
),
- ({"instance-id": "yep"}, "non-bytes", "Userdata is not bytes"),
),
)
@mock.patch("cloudinit.util.read_seeded")
if not isinstance(ud, bytes): | ||
raise sources.InvalidMetaDataException("Userdata is not bytes") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we really expect bytes here from util.read_seeded for ud. self.userdata_raw certainly expects str. I think we can drop this check here as this is also not needed for DataSourceNoCloudNet when it reaches out to a remote URL for user-data files.
if not isinstance(ud, bytes): | |
raise sources.InvalidMetaDataException("Userdata is not bytes") |
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
Proposed Commit Message
Additional Context
Test urls:
Test Steps
At the moment cloud-init support in our service is in draft state. We have tested the functionality of these changes.
Checklist
Merge type