feat(azure): report failure if missing customdata#6779
feat(azure): report failure if missing customdata#6779cadejacobson wants to merge 37 commits intocanonical:mainfrom
Conversation
When ovf-env.xml is present but does not contain custom data, yet IMDS indicates that custom data was provided to the VM (via hasCustomData), report a provisioning failure. This helps surface cases where custom data is silently lost during provisioning. The behavior is gated behind a new feature flag, EXPERIMENTAL_FAIL_ON_MISSING_CUSTOMDATA, which is disabled by default while undergoing scale testing. Once validated, it will be renamed and enabled for new distro releases. Changes: - Add EXPERIMENTAL_FAIL_ON_MISSING_CUSTOMDATA feature flag - Add _hascustomdata_from_imds() helper to query IMDS metadata - Report ReportableErrorImdsInvalidMetadata when custom data is expected but missing from OVF provisioning media
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging blackboxsw, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag blackboxsw to reopen it.) |
3950cc2 to
835512e
Compare
61dedf9 to
323f9e7
Compare
cjp256
left a comment
There was a problem hiding this comment.
LGTM, github won't let me resolve last comments.
Thanks!!
|
@blackboxsw @holmanb |
blackboxsw
left a comment
There was a problem hiding this comment.
Thank you @cadejacobson for this proposal. It looks good. I have a few minor suggestions (tests, comments and typing).
I also have a couple of questions about the hasCustomData expected values and whether we really need an ovf_source local variable.
Otherwise this looks good to me.
| return None | ||
|
|
||
|
|
||
| def _hascustomdata_from_imds(imds_data) -> Optional[bool]: |
There was a problem hiding this comment.
Please type hint params when we add new functions/methods.
| def _hascustomdata_from_imds(imds_data) -> Optional[bool]: | |
| def _hascustomdata_from_imds(imds_data: Dict) -> Optional[bool]: |
There was a problem hiding this comment.
Updated to include the type hint!
| return None | ||
|
|
||
|
|
||
| def _hascustomdata_from_imds(imds_data) -> Optional[bool]: |
There was a problem hiding this comment.
Let's make this a strict boolean and avoid Optional[bool] truthiness behavior.
We don't need None, let's use return False instead.
| def _hascustomdata_from_imds(imds_data) -> Optional[bool]: | |
| def _hascustomdata_from_imds(imds_data) -> bool: |
|
|
||
| def _hascustomdata_from_imds(imds_data) -> Optional[bool]: | ||
| try: | ||
| return imds_data["extended"]["compute"]["hasCustomData"] |
There was a problem hiding this comment.
Is it possible for this key to be present but with a value not in (True, False)? I can't seem to find Azure docs on this field, but I can state that our instance data represents what appears to be a strict bool value ```
"extended": {
"compute": {
"hasCustomData": false,
There was a problem hiding this comment.
You are correct. If present, this field should only be in the values of (True, False). There is a very very small chance that IMDS returns malformed JSON, at which point the field could be either incorrectly populated or not present, but this is an unexpected case.
There was a problem hiding this comment.
We have been using None to indicate the difference between False and not-present. Programmatically None and False are treated the same with the if statement, the but reported error has more clarity in case there's a situation where IMDS unexpectedly is missing data.
The alternative I suppose is to specifically raise ReportableErrorImdsMissingHasCustomData or something. The thing is that if IMDS metadata was fetched using the fallback api-version, we might not see it and force an error where there is none.
| try: | ||
| return imds_data["extended"]["compute"]["hasCustomData"] | ||
| except KeyError: | ||
| return None |
There was a problem hiding this comment.
| return None | |
| return False |
| (True, None), | ||
| (True, "myCustomData"), | ||
| (True, ""), | ||
| (False, None), |
There was a problem hiding this comment.
It appears that hasCustomData in IMDS compute are boolean values. Why is this test setting None, non-empty and empty strings for this value?are we testing.
I believe this test should probably only be checking hasCustomData in (True, False). Do I have an incorrect read on this?
There was a problem hiding this comment.
In this test, we specify the hasCustomData on the left param and the customData field on the right. The hasCustomData does only stay in the group (True, False), but for the CustomData, we do specify None as well as empty and non-empty strings to verify that we catch various styles of empty CustomData that could be found in the OVF source.
There was a problem hiding this comment.
yeah the right is the actual customData found on provisioning media (ovf-env.xml), this is just enumerating the cases the possible values seen there (expected or not)
|
|
||
| # only use userdata from imds if OVF did not provide custom data | ||
| # userdata provided by IMDS is always base64 encoded | ||
| # Only use userdata from imds if OVF did not provide custom data. |
There was a problem hiding this comment.
Please retain IMDS capitalization convention through comments where possible.
| # Only use userdata from imds if OVF did not provide custom data. | |
| # Only use userdata from IMDS if OVF did not provide custom data. |
There was a problem hiding this comment.
Updated to improve this capitalization!
| report_diagnostic_event(msg) | ||
| raise sources.InvalidMetaDataException(msg) | ||
|
|
||
| self.seed = ovf_source or "IMDS" |
There was a problem hiding this comment.
I'm not certain why we need a separate self.seed and ovf_source given the various call-sites throughout DataSourceAzure. They appear to be synonymous and we may be able to avoid using ovf_source in general right. What do Azure devs think?
There was a problem hiding this comment.
i agree it'd be nice to standardize on self.seed
There was a problem hiding this comment.
Replaced the references to ovf_source with self.seed in the DataSourceAzure.py file. All Azure tests continue to pass as expected.
| if cfg.get("ProvisionGuestProxyAgent"): | ||
| self._check_azure_proxy_agent_status() | ||
|
|
||
| ovf_source = "IMDS" |
There was a problem hiding this comment.
It feels like setting this ovf_source to IMDS may be more conspicuously placed at the open of the if pps_type != PPSType.NONE: conditional to indicate the context of this section more clearly. Down on line 768 it feels more 'hidden'. Depending on what folks decide related to the question about with self.seed instance variable versus ovf_source local var this specific assignment may become self.seed = "IMDS".
There was a problem hiding this comment.
on a related note, I've never really looked closely at how it was used until now: https://github.com/cadejacobson/cloud-init/blob/26eb69fb1b2bb73797716c3e19d65aeabd7361ac/cloudinit/sources/DataSourceAzure.py#L361
There was a problem hiding this comment.
Moved this assignment to the top of the if branch, and replaced the ovf_source with self.seed.
@blackboxsw Thanks for the review! I will work with Chris Patterson when he is back on Monday to ensure I give you the most accurate information I can on the cases of malformed IMDS data. After that, I will get the rest of the comments addressed. Thanks! |
Proposed Commit Message
Additional Context
On Azure, IMDS contains a field indicating whether the Custom Data field is expected to be found in the provisioning media. In some rare cases, this field is set to
Truebut there is no custom data found in the provisioning media. In the future, this will be a failure reported to Azure by removing the experimental flag.Merge type