OCPBUGS-78420: preserve user-provided configs when multiple on-disk assets coexist #10390
pawanpinjarkar wants to merge 1 commit into openshift:main
Conversation
…ssets coexist

The asset store's dependency tracking previously marked any asset as "dirty" when a parent dependency was loaded from disk, triggering regeneration even when both parent and child were user-provided configuration files that should coexist.

This broke the OVE use case where agent-config.yaml (rendezvousIP only) and nmstateconfig.yaml (network config) can coexist without install-config.yaml. The installer would discard nmstateconfig.yaml, then fail to regenerate it since agent-config has no hosts.

Fix by adding "dirtyDueToUserConfigs" tracking to assetState:

- Propagates through dependency chains, including intermediate assets
- Preserves on-disk assets when dirty only due to user configs AND not in the state file (newly user-provided)
- Still regenerates assets in the state file when dependencies change (backward compatibility)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@pawanpinjarkar: This pull request references Jira Issue OCPBUGS-78420, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Walkthrough: The changes add a new test case for agent configuration with RendezvousIP only, and introduce tracking for configuration-induced dirtiness in the asset store to preserve on-disk assets and manage state-file interactions.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.
🔧 golangci-lint (2.11.3) — Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
@pawanpinjarkar: This pull request references Jira Issue OCPBUGS-78420, which is valid. 3 validation(s) were run on this bug.
Requesting review from QA contact.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/asset/store/store.go (1)
331-335: ⚠️ Potential issue | 🔴 Critical — Keep discarded on-disk assets attached to unfetched states.
When this branch sees `foundOnDisk`, `presentOnDisk` stays true but `assetToStore` remains nil. The new Line 326 fast path can then return from `fetch()` before this descendant is regenerated, and `purge()` later dereferences `assetState.asset` for every `presentOnDisk` entry on Lines 372-376. That will panic for graphs like `A (on disk, not in state) -> C (on disk + in state) -> B (on disk)`.

Possible fix:

```diff
 case anyParentsDirty:
 	if foundOnDisk {
 		logrus.Warningf("%sDiscarding the %s that was provided in the target directory because its dependencies are dirty and it needs to be regenerated", indent, a.Name())
+		assetToStore = onDiskAsset
 	}
 	source = unfetched
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/store/store.go` around lines 331 - 335, In the anyParentsDirty branch, when foundOnDisk is true we currently set source = unfetched but clear assetToStore, which leaves presentOnDisk true while assetState.asset is nil and later causes a panic in purge(); retain the on-disk asset instead of discarding it: in the case anyParentsDirty block (where foundOnDisk is checked) leave assetToStore (or reattach assetState.asset) populated when foundOnDisk is true so fetch()'s fast path and later purge() see a non-nil assetState.asset; ensure you only mark source = unfetched (so it will be regenerated) but do not drop assetToStore/assetState.asset for presentOnDisk entries.
🧹 Nitpick comments (1)
pkg/asset/store/store_test.go (1)
521-549: Please cover the public `Fetch()` path here.
These cases call `store.fetch()` and seed `stateFileAssets` by hand, so they never exercise `Fetch()`'s `saveStateFile()`/`purge()` path in `pkg/asset/store/store.go` (Lines 84-89 and Lines 167-195). A `Fetch()` + `newStore()` round-trip would catch the exact regressions this change is trying to prevent.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/asset/store/store_test.go` around lines 521 - 549, The test currently calls the unexported store.fetch() and mutates store.stateFileAssets directly, so update the test to exercise the public Fetch() path by creating the store via newStore() (or whichever public constructor creates a storeImpl and its on-disk state file handling), seed the on-disk state by writing the expected state JSON to the store's state file location (not by setting store.stateFileAssets), then call store.Fetch(ctx, assets[target]) and assert no error; this will exercise saveStateFile() and purge() paths. Reference storeImpl, Fetch(), fetch(), newStore(), saveStateFile(), purge(), and stateFileAssets when locating the code to change. Ensure the test still builds deterministically by using the same tempDir and assets setup as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c447f9d9-2bfd-4f13-8070-a190947c4147
📒 Files selected for processing (3)
- pkg/asset/agent/image/unconfigured_ignition_test.go
- pkg/asset/store/store.go
- pkg/asset/store/store_test.go
```go
// Special case: A parent is dirty, but the dirtiness is only due to user-provided configs from disk,
// AND this asset is also found on disk AND is not in the state file (newly user-provided).
// In this case, preserve the on-disk version instead of regenerating, since multiple user-provided
// configs should coexist (e.g., agent-config.yaml with rendezvousIP + nmstateconfig.yaml with
// network config in the OVE use case). If the asset IS in the state file, it was previously generated
// and should be regenerated when its inputs change.
case anyParentsDirty && dirtyDueToUserConfigs && foundOnDisk && !foundInStateFile:
	logrus.Debugf("%sUsing %s loaded from target directory (preserving user-provided config despite on-disk dependencies)", indent, a.Name())
	assetToStore = onDiskAsset
	source = onDiskSource
```
This preservation disappears after the first save/load cycle.
This branch treats `!foundInStateFile` as "user-provided", but `saveStateFile()` on Lines 172-180 still writes every `onDiskSource` asset into `.openshift_install_state.json`. After one successful run, the same on-disk config will look state-file-backed on the next invocation, so it falls out of this branch and gets regenerated/discarded again even though it was never generated in the first place. You need to persist provenance/source, not infer it from state-file presence alone.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/asset/store/store.go` around lines 320 - 329, The current logic treats
!foundInStateFile as "user-provided" but saveStateFile() still writes every
onDiskSource asset into the state file, so after one run the provenance is lost;
fix by persisting explicit provenance/source in the stored state and restoring
it on load instead of inferring from presence. Specifically: extend the state
entry struct written by saveStateFile() to include a Source/Provenance enum/flag
(e.g., "user-provided" vs "generated") and set that when you populate entries
from onDiskSource/onDiskAsset; update the state load path to read that field and
set a new boolean (or use it to compute foundInStateFile/provenanceUserProvided)
used by the branch that checks anyParentsDirty, dirtyDueToUserConfigs,
foundOnDisk, and foundInStateFile so the branch uses the persisted provenance
value rather than mere presence; ensure saveStateFile still writes the
provenance for on-disk assets created by users.
@pawanpinjarkar: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
```go
// dirtyDueToUserConfigs is true if this asset is dirty only because of user-provided
// WritableAssets loaded from disk (not because of actual code changes or regeneration needs).
// This is used to preserve on-disk WritableAssets when multiple user configs coexist.
dirtyDueToUserConfigs bool
```
The store type is a very fundamental type for the whole installer, and I don't think it's a good idea to introduce such a low-level change with a potentially huge impact area across the repo. It looks more like an issue with how we shaped the specific asset dependencies, so I'd prefer to review that first.
/hold
Agree this is NOT the best way as it changes the core asset store functionality.