Skip to content

feat(azure): report failure if missing customdata#6779

Open
cadejacobson wants to merge 37 commits intocanonical:mainfrom
cadejacobson:azure-missing-customdata
Open

feat(azure): report failure if missing customdata#6779
cadejacobson wants to merge 37 commits intocanonical:mainfrom
cadejacobson:azure-missing-customdata

Conversation

@cadejacobson
Copy link
Copy Markdown
Contributor

@cadejacobson cadejacobson commented Mar 2, 2026

Proposed Commit Message

feat(azure): report failure if missing customdata

Add experimental Azure datasource config key
experimental_fail_on_missing_customdata with a default of
false and use it to control reporting when OVF custom data is
missing despite IMDS indicating that hasCustomData=true.

Additional Context

On Azure, IMDS contains a field indicating whether the Custom Data field is expected to be found in the provisioning media. In some rare cases, this field is set to True but there is no custom data found in the provisioning media. In the future, this will be a failure reported to Azure by removing the experimental flag.

Merge type

  • Squash merge using "Proposed Commit Message"

When ovf-env.xml is present but does not contain custom data, yet IMDS
indicates that custom data was provided to the VM (via hasCustomData),
report a provisioning failure. This helps surface cases where custom
data is silently lost during provisioning.

The behavior is gated behind a new feature flag,
EXPERIMENTAL_FAIL_ON_MISSING_CUSTOMDATA, which is disabled by default
while undergoing scale testing. Once validated, it will be renamed and
enabled for new distro releases.

Changes:
- Add EXPERIMENTAL_FAIL_ON_MISSING_CUSTOMDATA feature flag
- Add _hascustomdata_from_imds() helper to query IMDS metadata
- Report ReportableErrorImdsInvalidMetadata when custom data is
  expected but missing from OVF provisioning media
@cadejacobson cadejacobson changed the title Azure missing customdata feat(azure): report failure if missing customdata Mar 2, 2026
@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging blackboxsw, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag blackboxsw to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Mar 17, 2026
@github-actions github-actions Bot removed the stale-pr Pull request is stale; will be auto-closed soon label Mar 18, 2026
Comment thread cloudinit/sources/DataSourceAzure.py
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
Comment thread tests/unittests/sources/test_azure.py Outdated
Comment thread tests/unittests/sources/test_azure.py Outdated
Comment thread cloudinit/sources/azure/errors.py Outdated
@cadejacobson cadejacobson marked this pull request as ready for review March 30, 2026 22:39
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
@cadejacobson cadejacobson force-pushed the azure-missing-customdata branch from 3950cc2 to 835512e Compare April 2, 2026 20:37
Comment thread tests/unittests/sources/test_azure.py Outdated
@cadejacobson cadejacobson force-pushed the azure-missing-customdata branch from 61dedf9 to 323f9e7 Compare April 3, 2026 17:26
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
@cadejacobson cadejacobson requested a review from cjp256 April 15, 2026 16:57
Copy link
Copy Markdown
Contributor

@peytonr18 peytonr18 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Comment thread tests/unittests/sources/test_azure.py Outdated
Comment thread cloudinit/sources/DataSourceAzure.py Outdated
Copy link
Copy Markdown
Contributor

@cjp256 cjp256 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, github won't let me resolve last comments.

Thanks!!

@cadejacobson
Copy link
Copy Markdown
Contributor Author

@blackboxsw @holmanb
This PR should be ready for review. Thanks!

@blackboxsw blackboxsw self-assigned this Apr 27, 2026
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @cadejacobson for this proposal. It looks good. I have a few minor suggestions (tests, comments and typing).

I also have a couple of questions about the hasCustomData expected values and whether we really need an ovf_source local variable.

Otherwise this looks good to me.

Comment thread cloudinit/sources/DataSourceAzure.py Outdated
return None


def _hascustomdata_from_imds(imds_data) -> Optional[bool]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please type hint params when we add new functions/methods.

Suggested change
def _hascustomdata_from_imds(imds_data) -> Optional[bool]:
def _hascustomdata_from_imds(imds_data: Dict) -> Optional[bool]:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to include the type hint!

Comment thread cloudinit/sources/DataSourceAzure.py Outdated
return None


def _hascustomdata_from_imds(imds_data) -> Optional[bool]:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's make this a strict boolean and avoid Optional[bool] truthiness behavior.
We don't need None, let's use return False instead.

Suggested change
def _hascustomdata_from_imds(imds_data) -> Optional[bool]:
def _hascustomdata_from_imds(imds_data) -> bool:


def _hascustomdata_from_imds(imds_data) -> Optional[bool]:
try:
return imds_data["extended"]["compute"]["hasCustomData"]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for this key to be present but with a value not in (True, False)? I can't seem to find Azure docs on this field, but I can state that our instance data represents what appears to be a strict bool value ```
"extended": {
"compute": {
"hasCustomData": false,

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. If present, this field should only be in the values of (True, False). There is a very very small chance that IMDS returns malformed JSON, at which point the field could be either incorrectly populated or not present, but this is an unexpected case.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have been using None to indicate the difference between False and not-present. Programmatically None and False are treated the same with the if statement, the but reported error has more clarity in case there's a situation where IMDS unexpectedly is missing data.

The alternative I suppose is to specifically raise ReportableErrorImdsMissingHasCustomData or something. The thing is that if IMDS metadata was fetched using the fallback api-version, we might not see it and force an error where there is none.

try:
return imds_data["extended"]["compute"]["hasCustomData"]
except KeyError:
return None
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return None
return False

Comment on lines +5491 to +5494
(True, None),
(True, "myCustomData"),
(True, ""),
(False, None),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that hasCustomData in IMDS compute are boolean values. Why is this test setting None, non-empty and empty strings for this value?are we testing.

I believe this test should probably only be checking hasCustomData in (True, False). Do I have an incorrect read on this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this test, we specify the hasCustomData on the left param and the customData field on the right. The hasCustomData does only stay in the group (True, False), but for the CustomData, we do specify None as well as empty and non-empty strings to verify that we catch various styles of empty CustomData that could be found in the OVF source.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the right is the actual customData found on provisioning media (ovf-env.xml), this is just enumerating the cases the possible values seen there (expected or not)

Comment thread cloudinit/sources/DataSourceAzure.py Outdated

# only use userdata from imds if OVF did not provide custom data
# userdata provided by IMDS is always base64 encoded
# Only use userdata from imds if OVF did not provide custom data.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please retain IMDS capitalization convention through comments where possible.

Suggested change
# Only use userdata from imds if OVF did not provide custom data.
# Only use userdata from IMDS if OVF did not provide custom data.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to improve this capitalization!

Comment thread cloudinit/sources/DataSourceAzure.py Outdated
report_diagnostic_event(msg)
raise sources.InvalidMetaDataException(msg)

self.seed = ovf_source or "IMDS"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not certain why we need a separate self.seed and ovf_source given the various call-sites throughout DataSourceAzure. They appear to be synonymous and we may be able to avoid using ovf_source in general right. What do Azure devs think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i agree it'd be nice to standardize on self.seed

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the references to ovf_source with self.seed in the DataSourceAzure.py file. All Azure tests continue to pass as expected.

Comment thread cloudinit/sources/DataSourceAzure.py Outdated
if cfg.get("ProvisionGuestProxyAgent"):
self._check_azure_proxy_agent_status()

ovf_source = "IMDS"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like setting this ovf_source to IMDS may be more conspicuously placed at the open of the if pps_type != PPSType.NONE: conditional to indicate the context of this section more clearly. Down on line 768 it feels more 'hidden'. Depending on what folks decide related to the question about with self.seed instance variable versus ovf_source local var this specific assignment may become self.seed = "IMDS".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

@cadejacobson cadejacobson May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved this assignment to the top of the if branch, and replaced the ovf_source with self.seed.

@cadejacobson
Copy link
Copy Markdown
Contributor Author

cadejacobson commented May 1, 2026

Thank you @cadejacobson for this proposal. It looks good. I have a few minor suggestions (tests, comments and typing).

I also have a couple of questions about the hasCustomData expected values and whether we really need an ovf_source local variable.

Otherwise this looks good to me.

@blackboxsw Thanks for the review! I will work with Chris Patterson when he is back on Monday to ensure I give you the most accurate information I can on the cases of malformed IMDS data. After that, I will get the rest of the comments addressed. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants