[Chaos][OADP] Automate cases: Delete Pods during OADP Restore #3284
qwang1 wants to merge 1 commit into RedHatQE:main
Conversation
📝 Walkthrough
Adds OADP restore orchestration and a restore chaos test; introduces OADP-phase waiting helpers; tightens exception handling and pod-deletion semantics.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ❌ Failed checks (2 warnings) | ✅ Passed checks (1 passed)
Report bugs in Issues
Welcome! 🎉 This pull request will be automatically processed with the following features:
🔄 Automatic Actions
📋 Available Commands: PR Status Management, Review & Approval, Testing & Validation, Container Operations, Cherry-pick Operations, Label Management
✅ Merge Requirements: This PR will be automatically approved when the required conditions are met.
📊 Review Process: Approvers and Reviewers
Available Labels
💡 Tips
For more information, please refer to the project documentation or contact the maintainers.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #3284 +/- ##
==========================================
+ Coverage 98.63% 98.65% +0.02%
==========================================
Files 25 25
Lines 2420 2462 +42
==========================================
+ Hits 2387 2429 +42
Misses 33 33
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
/retest all
Auto-triggered: Files in this PR were modified by merged PR #3755. Overlapping files: utilities/constants.py
@coderabbitai resume
✅ Actions performed: Reviews resumed.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/utils.py`:
- Around line 602-613: The post-try unreachable check for vm is redundant
because the TimeoutSampler loop either breaks with a truthy sample (setting vm
via try_get_and_wait_running) or the TimeoutExpiredError is caught and re-raised
as TimeoutError; remove the unreachable block that checks "if not vm: raise
TimeoutError(...)" and rely on the existing try/except around TimeoutSampler and
the break condition in the loop (symbols: TimeoutSampler,
try_get_and_wait_running, TimeoutExpiredError, TimeoutError, vm, vm_name,
namespace, timeout).
- Around line 564-573: The loop using TimeoutSampler with func=is_running can
never fall through because TimeoutSampler raises TimeoutExpiredError on timeout,
so remove the unreachable raise or, better, catch TimeoutExpiredError around the
iteration and re-raise a clearer TimeoutError with context; wrap the for loop in
a try/except TimeoutExpiredError block, log via LOGGER.info/LOGGER.error as
needed, then raise TimeoutError(f"VM {vm.name} did not reach Running state in
{timeout}s") from the caught TimeoutExpiredError so callers get the original
traceback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: b4c66557-93df-459c-87d6-80cab5347d71
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/oadp/conftest.py (1)
150-169: ⚠️ Potential issue | 🟠 Major
HIGH: These `finally` blocks can replace the real OADP failure with a cleanup error.
Line 158 and Line 355 both call `pod_deleting_process_recover()` during teardown. If `wait_for_oadp_phase()` or the phase assertion fails first, any exception raised by recovery will override that original failure. That is especially risky here because `pod_deleting_process_recover()` explicitly raises `ResourceNotFoundError` when no matching workloads exist.
The simplest fix is to only enforce recovery after the operation reached `COMPLETED`, or to catch/log recovery errors here without re-raising when backup/restore has already failed. Right now a secondary recovery problem can hide the real backup/restore result.
Also applies to: 349-359
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 150 - 169, In the finally block, avoid letting pod_deleting_process_recover override earlier test failures: either only call pod_deleting_process_recover if the OADP operation reached the COMPLETED phase (check via wait_for_oadp_phase/phase result or a success flag on pod_deleting_thread_during_oadp_operations) OR wrap the pod_deleting_process_recover(...) call in a try/except that catches ResourceNotFoundError, ValueError, TypeError and logs the recovery failure without re-raising when an earlier exception is present; keep references to pod_deleting_process_recover and pod_deleting_thread_during_oadp_operations to locate the code and ensure stop_event/thread.join logic remains unchanged.
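To make the suggestion above concrete, here is a minimal sketch of a teardown that cannot mask an in-flight failure. The names `recover_workloads` and `WorkloadRecoveryError` are hypothetical stand-ins for `pod_deleting_process_recover()` and `ResourceNotFoundError`, not the repo's actual code:

```python
import logging
import sys

LOGGER = logging.getLogger(__name__)


class WorkloadRecoveryError(LookupError):
    """Hypothetical stand-in for ResourceNotFoundError."""


def recover_workloads() -> None:
    """Hypothetical stand-in for pod_deleting_process_recover()."""
    raise WorkloadRecoveryError("no matching workloads")


def run_with_safe_recovery(operation) -> None:
    try:
        operation()
    finally:
        # True while an exception from `operation()` is still propagating.
        operation_failed = sys.exc_info()[1] is not None
        try:
            recover_workloads()
        except (WorkloadRecoveryError, ValueError, TypeError) as err:
            LOGGER.error(f"Recovery failed: {err}")
            if not operation_failed:
                raise  # nothing to mask, so surface the recovery problem
            # Otherwise swallow: the real backup/restore failure propagates.
```

With this shape, a recovery error during an already-failed backup/restore is logged instead of replacing the original exception.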
♻️ Duplicate comments (1)
tests/chaos/utils.py (1)
578-620: ⚠️ Potential issue | 🔴 Critical
CRITICAL: Return a `VirtualMachineForTests` wrapper here, not a raw `VirtualMachine`.
`tests/chaos/oadp/test_oadp.py` passes this return value straight into `check_file_in_running_vm()`, and that helper opens `Console(vm=...)`. A raw `ocp_resources.VirtualMachine` can exist and reach `Running`, but it still lacks the QE wrapper's login context, so the restore can succeed and the console validation still fail.
🐛 Minimal fix
```diff
 def wait_for_restored_vm(
     admin_client,
     namespace,
     vm_name,
+    os_flavor,
     timeout=TIMEOUT_5MIN,
     sleep=5,
-) -> VirtualMachine:
+) -> VirtualMachineForTests:
@@
-    LOGGER.info(f"VM {vm.name} found, waiting for Running")
-    wait_until_vm_running(vm=vm, timeout=timeout, sleep=sleep)
-
-    LOGGER.info(f"Restored VM {vm_name} is Running")
-    return vm
+    restored_vm = VirtualMachineForTests(
+        name=vm.name,
+        namespace=namespace_name,
+        client=admin_client,
+        os_flavor=os_flavor,
+        generate_unique_name=False,
+    )
+    LOGGER.info(f"VM {restored_vm.name} found, waiting for Running")
+    wait_until_vm_running(vm=restored_vm, timeout=timeout, sleep=sleep)
+    return restored_vm
```
Then pass `os_flavor=rhel_vm_with_dv_running.os_flavor` from `test_delete_pods_during_restore()`.
Based on learnings: `VirtualMachineForTests` auto-generates unique names by default, so wrappers around existing restored VMs need `generate_unique_name=False` to preserve the real resource name.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 578 - 620, The function wait_for_restored_vm currently returns a raw VirtualMachine which breaks Console(vm=...) usage in check_file_in_running_vm; change wait_for_restored_vm to accept an optional os_flavor parameter and before returning wrap the found vm in the QE wrapper: return a VirtualMachineForTests wrapper around the existing VirtualMachine with generate_unique_name=False and the passed os_flavor (use VirtualMachineForTests(..., generate_unique_name=False, os_flavor=os_flavor) or the project’s wrapper constructor signature), and update the caller test_delete_pods_during_restore to pass os_flavor=rhel_vm_with_dv_running.os_flavor when calling wait_for_restored_vm so the wrapper preserves the real VM name and has the correct flavor for login context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 150-169: In the finally block, avoid letting
pod_deleting_process_recover override earlier test failures: either only call
pod_deleting_process_recover if the OADP operation reached the COMPLETED phase
(check via wait_for_oadp_phase/phase result or a success flag on
pod_deleting_thread_during_oadp_operations) OR wrap the
pod_deleting_process_recover(...) call in a try/except that catches
ResourceNotFoundError, ValueError, TypeError and logs the recovery failure
without re-raising when an earlier exception is present; keep references to
pod_deleting_process_recover and pod_deleting_thread_during_oadp_operations to
locate the code and ensure stop_event/thread.join logic remains unchanged.
---
Duplicate comments:
In `@tests/chaos/utils.py`:
- Around line 578-620: The function wait_for_restored_vm currently returns a raw
VirtualMachine which breaks Console(vm=...) usage in check_file_in_running_vm;
change wait_for_restored_vm to accept an optional os_flavor parameter and before
returning wrap the found vm in the QE wrapper: return a VirtualMachineForTests
wrapper around the existing VirtualMachine with generate_unique_name=False and
the passed os_flavor (use VirtualMachineForTests(...,
generate_unique_name=False, os_flavor=os_flavor) or the project’s wrapper
constructor signature), and update the caller test_delete_pods_during_restore to
pass os_flavor=rhel_vm_with_dv_running.os_flavor when calling
wait_for_restored_vm so the wrapper preserves the real VM name and has the
correct flavor for login context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: ad42c555-effc-41f0-8740-0c325ac78486
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
rnetser left a comment:
Did not complete the review; please check how the functionality is implemented in the repo and align the code here.
/retest all
Auto-triggered: Files in this PR were modified by merged PR #4171. Overlapping files: utilities/constants.py
/retest all
Auto-triggered: Files in this PR were modified by merged PR #4169. Overlapping files: utilities/constants.py
/retest all
Auto-triggered: Files in this PR were modified by merged PR #4267. Overlapping files: utilities/constants.py
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/oadp/conftest.py (1)
156-162: ⚠️ Potential issue | 🔴 Critical
CRITICAL: Parenthesize the multi-exception handler; this is a blocking SyntaxError.
Line 162 uses a bare `except ResourceNotFoundError, ValueError, TypeError:`, which Python versions before 3.14 reject with `SyntaxError: multiple exception types must be parenthesized`. This syntax error blocks parsing of the entire file. Wrap the exceptions in parentheses to fix the syntax:
Fix
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
Prior to Python 3.14, multiple exception types in except clauses must always be parenthesized. On the repo's current toolchain this is a syntax requirement, not optional style.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 156 - 162, The except clause after the call to pod_deleting_process_recover uses comma-separated exception names which raises a SyntaxError; change the handler to use a parenthesized tuple of exceptions (e.g., except (ResourceNotFoundError, ValueError, TypeError):) so the multi-exception catch is valid in Python 3.x; update the except that references ResourceNotFoundError, ValueError, TypeError in the conftest.py block around the pod_deleting_process_recover call.
♻️ Duplicate comments (1)
tests/chaos/oadp/conftest.py (1)
283-302: ⚠️ Potential issue | 🟠 Major
HIGH: Do not expose `final_status` before chaos teardown finishes.
The `yield` at Line 290 gives the test control before Lines 293-302 stop the deleter thread and recover the OADP workloads. That makes restored-VM validation race ongoing pod deletion instead of validating the post-restore steady state.
Return the status after cleanup:
```diff
     try:
         final_status = wait_for_oadp_phase(
             resource=oadp_restore_started,
             terminal_statuses=terminal_statuses,
             timeout=TIMEOUT_3MIN,
         )
         assert final_status in allowed_statuses, f"Restore ended in unexpected terminal phase: {final_status}"
-        yield final_status
     finally:
         pod_deleting_thread_during_oadp_operations["stop_event"].set()
         if thread.is_alive():
             thread.join(timeout=TIMEOUT_1MIN)
@@
         pod_deleting_process_recover(
             resources=[Deployment, DaemonSet],
             namespace=namespace,
             pod_prefix=pod_prefix,
         )
+
+    return final_status
```
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 283 - 302, The test yields control before the chaos teardown runs, causing validation to race with pod deletion; fix by moving the yield so it happens after cleanup. Specifically, declare final_status in the outer scope, assign it via wait_for_oadp_phase (and keep the assert) inside the try block, then perform pod_deleting_thread_during_oadp_operations["stop_event"].set(), thread.join(...), and pod_deleting_process_recover(...) in the finally block, and only after that finally block yield final_status; reference symbols: wait_for_oadp_phase, final_status, allowed_statuses, pod_deleting_thread_during_oadp_operations, thread, pod_deleting_process_recover.
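A minimal sketch of the return-after-cleanup pattern the diff above suggests, using hypothetical stand-ins (`wait_for_phase`, `recover_workloads`) rather than the repo's fixtures. Because the fixture uses a plain `return` instead of `yield`, the test body cannot start until the chaos thread is joined and recovery has run:

```python
import threading

import pytest


def wait_for_phase() -> str:
    """Hypothetical stand-in for wait_for_oadp_phase()."""
    return "Completed"


def recover_workloads() -> None:
    """Hypothetical stand-in for pod_deleting_process_recover()."""


@pytest.fixture
def restore_final_status() -> str:
    stop_event = threading.Event()
    deleter = threading.Thread(target=stop_event.wait)  # stand-in chaos thread
    deleter.start()
    try:
        final_status = wait_for_phase()
    finally:
        # Teardown happens here, before the test body ever sees the status.
        stop_event.set()
        if deleter.is_alive():
            deleter.join(timeout=60)
        recover_workloads()
    return final_status  # plain return: no teardown left to race against
```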
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 193-204: The ResourceEditor context is closed before
wait_for_oadp_phase runs, so the patched field
backup_obj.spec["defaultVolumesToFsBackup"]=False is reverted too early; move
the call to wait_for_oadp_phase inside the with ResourceEditor(backup.instance)
as backup_obj: block so the patch remains active until the backup reaches the
expected phase (use the same parameters currently passed to wait_for_oadp_phase
and keep references to ResourceEditor, backup.instance, backup_obj, and
backup.Backup.Status.COMPLETED).
In `@tests/chaos/oadp/test_oadp.py`:
- Around line 144-224: The test uses a module-scoped fixture
rhel_vm_with_dv_running while each parametrized case deletes and recreates the
VM/namespace, causing cross-test contamination; change the VM/namespace fixture
to function scope (or add a new function-scoped fixture used only by
test_delete_pods_during_restore) so each parametrized run gets a fresh
VM/namespace; update the fixture definition for rhel_vm_with_dv_running (or
create rhel_vm_per_test) to scope="function" and ensure
test_delete_pods_during_restore and the restore_with_pod_deletion_orchestration
flow consume that function-scoped fixture so state is isolated per parameter
set.
In `@tests/chaos/utils.py`:
- Around line 670-673: Remove the unreachable AssertionError at the end of the
TimeoutSampler loop: the iterator either returns a sample in TimeoutSampler or
raises TimeoutExpiredError (caught and re-raised), so delete the "raise
AssertionError(...)" statement after the exception handler to avoid dead code;
locate this in the TimeoutSampler/loop implementation where TimeoutExpiredError
is caught and re-raised.
---
Outside diff comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 156-162: The except clause after the call to
pod_deleting_process_recover uses comma-separated exception names which raises a
SyntaxError; change the handler to use a parenthesized tuple of exceptions
(e.g., except (ResourceNotFoundError, ValueError, TypeError):) so the
multi-exception catch is valid in Python 3.x; update the except that references
ResourceNotFoundError, ValueError, TypeError in the conftest.py block around the
pod_deleting_process_recover call.
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 283-302: The test yields control before the chaos teardown runs,
causing validation to race with pod deletion; fix by moving the yield so it
happens after cleanup. Specifically, declare final_status in the outer scope,
assign it via wait_for_oadp_phase (and keep the assert) inside the try block,
then perform pod_deleting_thread_during_oadp_operations["stop_event"].set(),
thread.join(...), and pod_deleting_process_recover(...) in the finally block,
and only after that finally block yield final_status; reference symbols:
wait_for_oadp_phase, final_status, allowed_statuses,
pod_deleting_thread_during_oadp_operations, thread,
pod_deleting_process_recover.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 23f3d04f-cead-45af-910a-c29663d3e88f
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
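Several of the utils.py findings above hinge on the same `TimeoutSampler` control flow: iterating the sampler either yields samples or raises `TimeoutExpiredError`, so code placed after the loop is unreachable. A self-contained sketch of that flow (the `timeout_sampler` import path is an assumption; the sampler signature matches the diffs in this PR):

```python
# Import path assumed; the signature matches the TimeoutSampler calls in this PR.
from timeout_sampler import TimeoutExpiredError, TimeoutSampler


def wait_for_truthy(func, timeout: int, sleep: int = 5):
    try:
        for sample in TimeoutSampler(wait_timeout=timeout, sleep=sleep, func=func):
            if sample:
                return sample  # the loop exits only through a truthy sample...
    except TimeoutExpiredError as error:
        # ...or through this exception, re-raised with clearer context.
        raise TimeoutError(f"{func.__name__} not truthy within {timeout}s") from error
    # Nothing belongs here: the two exits above are exhaustive, so a
    # post-loop "if not result: raise ..." check is dead code.
```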
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/oadp/conftest.py (1)
156-163: ⚠️ Potential issue | 🔴 Critical
Fix the Python 3 syntax error: parenthesize multiple exception types.
Line 162 blocks module collection with `SyntaxError: multiple exception types must be parenthesized`. When catching multiple exception types in a single clause, wrap them in parentheses:
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 156 - 163, The except clause incorrectly lists multiple exception types with commas causing a Python 3.14 SyntaxError; update the except in the pod_deleting_process_recover call to catch a tuple of exceptions like (ResourceNotFoundError, ValueError, TypeError) and leave the LOGGER.error(...) block unchanged so the handler runs for any of those exceptions.
♻️ Duplicate comments (1)
tests/chaos/oadp/conftest.py (1)
283-302: ⚠️ Potential issue | 🟠 Major
HIGH: Don't hand control back to the test until pod-deletion chaos is fully stopped.
Line 290 yields to `test_delete_pods_during_restore()` before Lines 293-302 stop the thread and recover the affected workloads. That makes `wait_for_restored_vm()` and `check_file_in_running_vm()` race ongoing deletions instead of validating a recovered post-restore state. The race noted earlier is still present.
🔧 Return a stable post-recovery state
```diff
     try:
         final_status = wait_for_oadp_phase(
             resource=oadp_restore_started,
             terminal_statuses=terminal_statuses,
             timeout=TIMEOUT_3MIN,
         )
         assert final_status in allowed_statuses, f"Restore ended in unexpected terminal phase: {final_status}"
-        yield final_status
     finally:
         pod_deleting_thread_during_oadp_operations["stop_event"].set()
         if thread.is_alive():
             thread.join(timeout=TIMEOUT_1MIN)
         # Recovery only — thread teardown handled by pod_deleting_thread fixture
         pod_deleting_process_recover(
             resources=[Deployment, DaemonSet],
             namespace=namespace,
             pod_prefix=pod_prefix,
         )
+
+    return final_status
```
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 283 - 302, The test yields control too early: move the yield of final_status (from wait_for_oadp_phase / variable final_status) so that pod-deleting chaos is fully stopped and workloads recovered first; specifically, set pod_deleting_thread_during_oadp_operations["stop_event"], join the thread (use thread.join with TIMEOUT_1MIN if alive), call pod_deleting_process_recover (passing Deployment/DaemonSet, namespace, pod_prefix) and only then yield final_status so downstream helpers like wait_for_restored_vm() and check_file_in_running_vm() run against a stable, post-recovery state.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/utils.py`:
- Line 614: Replace the runtime assert with an explicit runtime check after the
TimeoutSampler loop: verify that the local variable vm is not None (the one set
inside the TimeoutSampler loop) and if it is None raise a clear exception (e.g.,
raise RuntimeError or AssertionError with the same message) so the check cannot
be stripped by python -O and the type checker is satisfied; update the code
around the TimeoutSampler loop and the variable vm accordingly.
- Around line 558-575: The function wait_until_vm_running currently lacks an
explicit return type; update its signature to include a None return annotation
(i.e., def wait_until_vm_running(vm: VirtualMachine, timeout=TIMEOUT_5MIN,
sleep=5) -> None:) so the contract is explicit for IDEs/type-checkers while
keeping the same behavior involving TimeoutSampler and raising TimeoutError from
TimeoutExpiredError.
- Line 673: The trailing "assert False, 'Unreachable code reached'" in the
TimeoutSampler loop is unsafe and unnecessary because the loop either returns a
sample or raises TimeoutExpiredError; remove this assertion (preferred) or, if a
return-path is required by the type checker, replace it with an explicit raise
AssertionError() so the check is not stripped by python -O; update the code
around the TimeoutSampler generator/loop (the function/class named
TimeoutSampler and its surrounding try/except/return logic) to eliminate the
unreachable assert and ensure control flow remains clear.
---
Outside diff comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 156-163: The except clause incorrectly lists multiple exception
types with commas causing a Python 3.14 SyntaxError; update the except in the
pod_deleting_process_recover call to catch a tuple of exceptions like
(ResourceNotFoundError, ValueError, TypeError) and leave the LOGGER.error(...)
block unchanged so the handler runs for any of those exceptions.
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 283-302: The test yields control too early: move the yield of
final_status (from wait_for_oadp_phase / variable final_status) so that
pod-deleting chaos is fully stopped and workloads recovered first; specifically,
set pod_deleting_thread_during_oadp_operations["stop_event"], join the thread
(use thread.join with TIMEOUT_1MIN if alive), call pod_deleting_process_recover
(passing Deployment/DaemonSet, namespace, pod_prefix) and only then yield
final_status so downstream helpers like wait_for_restored_vm() and
check_file_in_running_vm() run against a stable, post-recovery state.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7acddf4b-e53e-4169-86dc-d616ce69b5dc
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
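The assert-related findings above come down to one Python behavior: `python -O` strips `assert` statements, so guards that must always fire need an explicit raise. A minimal illustration with hypothetical arguments:

```python
def check_phase(sample: str, expected_phase: str, name: str) -> None:
    # Stripped entirely when Python runs with -O:
    # assert sample == expected_phase, f"{name} ended with phase {sample}"

    # Survives -O, so the guard always fires:
    if sample != expected_phase:
        raise AssertionError(f"{name} ended with phase {sample}, expected {expected_phase}")
```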
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/utils.py`:
- Around line 614-615: Remove the redundant unreachable assertion after the
TimeoutSampler loop: the post-loop check "if vm is None: raise
AssertionError(...)" is unnecessary because the loop either breaks with a truthy
sample (setting vm via try_get_and_wait_running) or raises TimeoutExpiredError;
delete this check to reduce cognitive overhead and keep logic in the surrounding
function that uses TimeoutSampler and try_get_and_wait_running unchanged.
- Line 674: Remove the unreachable "raise AssertionError(...)" at the end of the
TimeoutSampler consumption block; the TimeoutSampler iterator either returns a
value or raises TimeoutExpiredError (which is already caught and re-raised), so
the assertion is dead code—delete the raise statement in the function using
TimeoutSampler (referenced as TimeoutSampler and TimeoutExpiredError) to clean
up the unreachable path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c46d35ae-61ea-4547-b7b2-6c84969031f4
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
@coderabbitai resume
✅ Actions performed: Reviews resumed.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/oadp/conftest.py`:
- Line 162: The except clause using multiple exception types is not
parenthesized and causes a SyntaxError; update the except handler that currently
lists ResourceNotFoundError, ValueError, TypeError to use the proper
parenthesized multi-except form (i.e., wrap the exception types in parentheses)
so the try/except in conftest.py compiles correctly.
In `@tests/chaos/oadp/test_oadp.py`:
- Around line 196-211: The test docstring currently uses a nonconforming
"High-level flow:" heading; update the docstring to follow the repo STD format
by replacing that heading with "Steps:", add a top-level "Preconditions:"
section summarizing setup (healthy VM, backup exists), and add a final
"Expected:" section describing pass criteria (restore reaches terminal state, VM
recreated and Running, guest data intact); ensure section names are exactly
"Preconditions:", "Steps:", and "Expected:" and keep the same bulleted content
under the appropriate headings so tooling and conventions (the docstring for
this OADP restore resilience test) remain compatible.
In `@tests/chaos/utils.py`:
- Around line 665-668: Replace the reachable assert in the conditional that
checks expected_phase with an explicit raise of AssertionError so the check
still runs under python -O; locate the block that currently reads "if
expected_phase: assert sample == expected_phase, (f\"{resource.name} ended with
phase {sample}, expected {expected_phase}\")" and change it to raise
AssertionError with the same formatted message (use the variables sample,
expected_phase and resource.name) so failures are never stripped by
optimization.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1093bff8-584d-4f20-b1d8-94407321e2e7
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
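A sketch of the STD docstring shape the review above asks for; the wording paraphrases the flow described in this PR rather than quoting the repo's actual test:

```python
def test_delete_pods_during_restore():
    """
    Delete OADP pods while a restore is in progress and verify recovery.

    Preconditions:
        - A healthy RHEL VM with a DV is running and a completed backup exists.

    Steps:
        1. Delete the VM and its namespace.
        2. Start a background process that deletes OADP pods.
        3. Trigger the restore and wait for a terminal phase.

    Expected:
        - The restore reaches a terminal state.
        - The VM is recreated and reaches Running.
        - Guest data written before the backup is intact.
    """
```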
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/utils.py (1)
350-350: ⚠️ Potential issue | 🔴 Critical
CRITICAL: Parameter name mismatch causes `TypeError` at runtime.
The function signature uses `resources` (plural) at tests/chaos/utils.py:350, but the call site at tests/chaos/conftest.py:404 passes `resource=` (singular). This will raise `TypeError: pod_deleting_process_recover() got an unexpected keyword argument 'resource'` when executed.
Fix by changing `resource=` to `resources=` at tests/chaos/conftest.py:404-405.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` at line 350, The test fails because the function pod_deleting_process_recover(resources, namespace, pod_prefix) expects the parameter name resources but the call uses resource=; update the call site to pass resources= (instead of resource=) so the keyword matches the function signature (locate the invocation that passes resource= and rename it to resources=).
♻️ Duplicate comments (2)
tests/chaos/oadp/test_oadp.py (1)
196-211: ⚠️ Potential issue | 🟡 Minor
MEDIUM: Use required STD section headers in the new restore test docstring.
Line 199 uses `High-level flow:`; this should be structured as `Preconditions:`, `Steps:`, and `Expected:` to match repository STD tooling/conventions.
As per coding guidelines: STD (Software Test Description) test docstrings MUST follow the format with required sections: Preconditions:, Steps:, Expected:.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/test_oadp.py` around lines 196 - 211, Update the test docstring to use the required STD section headers instead of "High-level flow:": replace that header with three sections "Preconditions:", "Steps:", and "Expected:", moving the existing numbered flow items so preconditions list setup (e.g., VM creation and backup), steps list actions performed (delete VM/namespace, start background pod-deletion process, trigger restore, wait), and expected list the outcomes (restore succeeds, VM recreated and Running, guest data verified); edit the triple-quoted docstring in tests/chaos/oadp/test_oadp.py that describes the restore test to reflect these STD headers.
tests/chaos/oadp/conftest.py (1)
162-162: ⚠️ Potential issue | 🔴 Critical
CRITICAL: SyntaxError — parenthesize multiple exception types.
This is invalid Python 3 syntax and will fail at import time:
```python
except ResourceNotFoundError, ValueError, TypeError:  # SyntaxError
```
WHY: Python 3 requires parentheses for multi-exception `except` clauses. The comma-separated form was Python 2 syntax for `except ExceptionType, variable:`.
🐛 Fix
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` at line 162, The except clause currently uses Python2-style comma-separated exceptions ("except ResourceNotFoundError, ValueError, TypeError:") which raises a SyntaxError in Python3; change it to use a parenthesized tuple of exception types like "except (ResourceNotFoundError, ValueError, TypeError):" (preserving the existing exception handling body) so the interpreter accepts multiple exception types correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 271-281: Replace the inline terminal statuses set in the fixture
with the shared constant from utilities.constants: remove the local
terminal_statuses definition and import and use OADP_RESTORE_TERMINAL_STATUSES
(which should mirror VeleroRestore.Status values) instead; keep allowed_statuses
but if it should be a subset derive it from OADP_RESTORE_TERMINAL_STATUSES (or
explicitly import/define OADP_RESTORE_TERMINAL_STATUSES in utilities.constants
as suggested) so the fixture uses the centralized constant rather than
hard-coded VeleroRestore.Status members.
- Around line 297-302: Wrap the call to pod_deleting_process_recover in the
recovery-only fixture in a try/except like the
backup_with_pod_deletion_orchestration fixture: catch Exception as err, log a
contextual error including namespace and pod_prefix (e.g., using the existing
logger or processLogger) and then re-raise the exception so failures are
visible; reference the pod_deleting_process_recover invocation and the
surrounding fixture to locate where to add the try/except and logging.
- Around line 35-36: POLL_INTERVAL is declared but unused; either remove the
constant or replace hard-coded poll values with it—update the TimeoutSampler
usages (the calls to TimeoutSampler that currently pass literal poll intervals
like 5 and 10) to use POLL_INTERVAL (or a second constant if different intervals
are needed) or simply delete the POLL_INTERVAL declaration if you prefer
literals; ensure you modify the TimeoutSampler invocations (the TimeoutSampler
calls in this module) to reference the constant name POLL_INTERVAL when
applicable.
In `@tests/chaos/utils.py`:
- Around line 665-668: The reachable assertion in the conditional block that
checks expected_phase (comparing sample to expected_phase and referencing
resource.name) must be replaced with an explicit raise so it isn't removed under
python -O; locate the conditional that reads "if expected_phase:" and replace
the assert that compares sample == expected_phase with an explicit raise (e.g.,
raise AssertionError or a ValueError) including the same formatted message
f"{resource.name} ended with phase {sample}, expected {expected_phase}" so
failures still raise when running optimized Python.
---
Outside diff comments:
In `@tests/chaos/utils.py`:
- Line 350: The test fails because the function
pod_deleting_process_recover(resources, namespace, pod_prefix) expects the
parameter name resources but the call uses resource=; update the call site to
pass resources= (instead of resource=) so the keyword matches the function
signature (locate the invocation that passes resource= and rename it to
resources=).
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Line 162: The except clause currently uses Python2-style comma-separated
exceptions ("except ResourceNotFoundError, ValueError, TypeError:") which raises
a SyntaxError in Python3; change it to use a parenthesized tuple of exception
types like "except (ResourceNotFoundError, ValueError, TypeError):" (preserving
the existing exception handling body) so the interpreter accepts multiple
exception types correctly.
In `@tests/chaos/oadp/test_oadp.py`:
- Around line 196-211: Update the test docstring to use the required STD section
headers instead of "High-level flow:": replace that header with three sections
"Preconditions:", "Steps:", and "Expected:", moving the existing numbered flow
items so preconditions list setup (e.g., VM creation and backup), steps list
actions performed (delete VM/namespace, start background pod-deletion process,
trigger restore, wait), and expected list the outcomes (restore succeeds, VM
recreated and Running, guest data verified); edit the triple-quoted docstring in
tests/chaos/oadp/test_oadp.py that describes the restore test to reflect these
STD headers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: cf4b306d-a0c6-400c-9d7e-10a9af0ca687
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
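For the `resources=`/`resource=` mismatch flagged in this review, a hedged sketch of keeping the helper backward-compatible while callers migrate; the deprecation warning is an added suggestion, not something the review mandates:

```python
import warnings


def pod_deleting_process_recover(resources=None, namespace=None, pod_prefix="", *, resource=None):
    """Sketch: accept the legacy singular keyword while callers migrate."""
    if resources is None and resource is not None:
        warnings.warn(
            "'resource=' is deprecated; pass 'resources=[...]'",
            DeprecationWarning,
            stacklevel=2,
        )
        resources = [resource]
    if not resources:
        raise TypeError("pod_deleting_process_recover() requires 'resources' (or legacy 'resource')")
    for workload in resources:
        # Recover each workload kind in `namespace` matching `pod_prefix`.
        ...
```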
Actionable comments posted: 2
♻️ Duplicate comments (1)
tests/chaos/oadp/conftest.py (1)
161-161: ⚠️ Potential issue | 🔴 Critical
CRITICAL: SyntaxError — parenthesize the multi-except handler.
WHY: Python 3 requires parentheses around multiple exception types. The syntax `except A, B, C:` was Python 2's way to bind an exception to a variable (like `except A as B`). In Python 3.11–3.13 (within your requires-python range), this raises `SyntaxError: multiple exception types must be parenthesized` and the code won't compile.
Fix
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` at line 161, The multi-except handler using `except ResourceNotFoundError, ValueError, TypeError:` is invalid in Python 3; change it to a parenthesized tuple of exceptions like `except (ResourceNotFoundError, ValueError, TypeError):` (or `except (ResourceNotFoundError, ValueError, TypeError) as e:` if the exception object is needed) in the relevant try/except block to fix the SyntaxError in conftest.py.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/oadp/conftest.py`:
- Line 298: The multi-except handler currently uses Python 2 comma syntax
(listing ResourceNotFoundError, ValueError, TypeError separated by commas);
update the handler to use a parenthesized tuple of exception types instead
(i.e., wrap the exception names in parentheses) so the except statement is valid
in Python 3; make the same change for both occurrences in
tests/chaos/oadp/conftest.py (the handlers that reference ResourceNotFoundError,
ValueError, TypeError).
In `@utilities/constants.py`:
- Around line 1044-1056: OADP_BACKUP_TERMINAL_STATUSES and
OADP_RESTORE_TERMINAL_STATUSES contain identical sets and risk drift;
consolidate them by defining a single canonical constant (e.g.,
OADP_TERMINAL_STATUSES) containing the set of terminal status strings and then
set OADP_BACKUP_TERMINAL_STATUSES and OADP_RESTORE_TERMINAL_STATUSES to
reference that canonical constant (or remove the duplicates and import/reference
the canonical name wherever used) so there is a single source of truth.
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Line 161: The multi-except handler using `except ResourceNotFoundError,
ValueError, TypeError:` is invalid in Python 3; change it to a parenthesized
tuple of exceptions like `except (ResourceNotFoundError, ValueError,
TypeError):` (or `except (ResourceNotFoundError, ValueError, TypeError) as e:`
if the exception object is needed) in the relevant try/except block to fix the
SyntaxError in conftest.py.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 095c8415-cbde-4fa3-bd50-1127e949a492
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
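A sketch of the single-source-of-truth consolidation suggested for `utilities/constants.py`. The status strings shown are the usual Velero terminal phases and are an assumption here; they should be checked against the actual `VeleroRestore`/`Backup` status values:

```python
# utilities/constants.py (sketch)

# Single source of truth; the concrete strings are assumptions to verify
# against the VeleroBackup/VeleroRestore status definitions.
OADP_TERMINAL_STATUSES = frozenset({
    "Completed",
    "Failed",
    "PartiallyFailed",
    "FailedValidation",
})

# Aliases reference the canonical set, so the two names cannot drift apart.
OADP_BACKUP_TERMINAL_STATUSES = OADP_TERMINAL_STATUSES
OADP_RESTORE_TERMINAL_STATUSES = OADP_TERMINAL_STATUSES
```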
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/utils.py (1)
54-104: ⚠️ Potential issue | 🟠 Major
HIGH: `ratio=0` still keeps the deleter alive for the full duration.
`delete_pods()` now treats `ratio == 0` as a supported no-op, but both continuous deleter paths still keep their loop/process alive until `max_duration`. That means "disabled chaos" still burns a worker for the whole interval and emits repeated no-op logs instead of exiting immediately.
♻️ Proposed fix
```diff
 def _delete_pods_continuously(client, pod_prefix, namespace_name, ratio, interval, max_duration):
+    if ratio == 0:
+        LOGGER.info(f"Skipping pod deleting process for prefix {pod_prefix}: ratio=0")
+        return
+
     try:
         for _ in TimeoutSampler(
             wait_timeout=max_duration,
             sleep=interval,
             func=delete_pods,
@@
 def delete_pods_continuously(
     stop_event,
     client,
     pod_prefix,
     namespace_name,
     ratio,
     interval,
     max_duration,
 ):
+    if ratio == 0:
+        LOGGER.info(f"Skipping continuous pod deletion for prefix {pod_prefix}: ratio=0")
+        return
+
     start_time = time.time()
```
Also applies to: 435-440, 468-512
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 54 - 104, The continuous pod-deleter currently keeps running until max_duration even when ratio == 0; update create_pod_deleting_process by having its inner function _delete_pods_continuously check if ratio == 0 (or falsy) at start and return immediately to avoid spawning a long-lived no-op process; locate the inner function _delete_pods_continuously and add an early "if ratio == 0: return" (or equivalent) before calling delete_pods/TimeoutSampler, and apply the same fix to the other similar deleter routines that use delete_pods (the other continuous deleter loops referenced in the diff).
♻️ Duplicate comments (3)
tests/chaos/oadp/conftest.py (3)
155-161: ⚠️ Potential issue | 🔴 Critical
CRITICAL: Parenthesize both multi-except handlers.
Line 161 and Line 298 use a multi-except form that the repo's current parser/Flake8 still rejects, so `tests/chaos/oadp/conftest.py` will not import in CI. Use `except (ResourceNotFoundError, ValueError, TypeError):` in both blocks.
🛠️ Minimal fix
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
@@
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
🔎 Verification
```bash
#!/bin/bash
set -euo pipefail

python --version
for cfg in pyproject.toml tox.ini setup.cfg .python-version; do
  if [ -f "$cfg" ]; then
    echo "== $cfg =="
    rg -n "requires-python|target-version|python_version" "$cfg" || true
  fi
done

python - <<'PY'
from pathlib import Path

source = Path("tests/chaos/oadp/conftest.py").read_text()
compile(source, "tests/chaos/oadp/conftest.py", "exec")
print("compile(): OK")
PY
```
Also applies to: 292-298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 155 - 161, The multi-except clauses around the pod_deleting_process_recover call use invalid syntax; change the except clauses from "except ResourceNotFoundError, ValueError, TypeError:" to a parenthesized tuple form "except (ResourceNotFoundError, ValueError, TypeError):" in the block that wraps the pod_deleting_process_recover invocation (and the similar multi-except block later in the file), so Python/Flake8 will parse the file correctly.
270-283: ⚠️ Potential issue | 🟠 Major
HIGH: Let the test assert the final restore phase.
Lines 272-283 narrow the result back to `Completed`/`Failed` even though the wait already uses `OADP_RESTORE_TERMINAL_STATUSES`. That turns `PartiallyFailed` and `FailedValidation` into fixture-setup errors instead of a regular `expected_status` mismatch in the test.
✂️ Minimal cleanup
```diff
-    allowed_statuses = {
-        oadp_restore_started.Status.COMPLETED,
-        oadp_restore_started.Status.FAILED,
-    }
-
     try:
         final_status = wait_for_oadp_phase(
             resource=oadp_restore_started,
             terminal_statuses=terminal_statuses,
             timeout=TIMEOUT_3MIN,
         )
-        assert final_status in allowed_statuses, f"Restore ended in unexpected terminal phase: {final_status}"
         yield final_status
```
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 270 - 283, The test narrows terminal restore outcomes to only COMPLETED/FAILED which hides other terminal phases; remove the allowed_statuses set and the restrictive assertion and instead assert that the returned final_status from wait_for_oadp_phase (called with terminal_statuses=OADP_RESTORE_TERMINAL_STATUSES) is one of the expected terminal statuses by checking membership against OADP_RESTORE_TERMINAL_STATUSES (use oadp_restore_started and final_status in the assertion message to preserve context).
284-299: ⚠️ Potential issue | 🟠 Major
HIGH: Recover the OADP workloads before returning from this fixture.
Line 284 yields `final_status` before the stop/join/recover path runs. The downstream restore validation then waits for the VM while `velero` / `controller-manager` / `node-agent` pods can still be getting deleted, which makes the post-restore checks race ongoing chaos. Return the phase only after the thread is stopped and `pod_deleting_process_recover(...)` completes, or split orchestration and post-recovery validation into separate fixtures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 284 - 299, The fixture yields final_status too early and must only return after the chaos thread is stopped and pod recovery completes; move the yield of final_status to after the finally block (after pod_deleting_thread_during_oadp_operations["stop_event"].set(), thread.join(...), and pod_deleting_process_recover(...)) or alternatively split orchestration and post-recovery validation into separate fixtures so that pod_deleting_process_recover (the function handling Deployment/DaemonSet recovery) finishes before the fixture returns; also ensure you use the existing pod_deleting_thread fixture for thread teardown and keep references to final_status, pod_deleting_thread_during_oadp_operations, and pod_deleting_process_recover to locate the changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/utils.py`:
- Around line 574-575: Before converting TimeoutExpiredError to TimeoutError,
add an ERROR-level log that records the full context and original exception;
specifically, in the except blocks that catch TimeoutExpiredError (the
occurrences handling VM state waits that reference vm.name and timeout) call the
module logger (e.g., logger.error or logger.exception) with a message like "VM
<vm.name> failed to reach 'Running' within <timeout>s; current state: <vm.status
or similar>" and include the original exception (exc_info=error or use
logger.exception) before raising the new TimeoutError; apply this change to the
three catch sites shown (the block that currently does `except
TimeoutExpiredError as error: raise TimeoutError(f"VM {vm.name}...") from error`
and the similar blocks at the other mentioned locations).
---
Outside diff comments:
In `@tests/chaos/utils.py`:
- Around line 54-104: The continuous pod-deleter currently keeps running until
max_duration even when ratio == 0; update create_pod_deleting_process by having
its inner function _delete_pods_continuously check if ratio == 0 (or falsy) at
start and return immediately to avoid spawning a long-lived no-op process;
locate the inner function _delete_pods_continuously and add an early "if ratio
== 0: return" (or equivalent) before calling delete_pods/TimeoutSampler, and
apply the same fix to the other similar deleter routines that use delete_pods
(the other continuous deleter loops referenced in the diff).
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 155-161: The multi-except clauses around the
pod_deleting_process_recover call use invalid syntax; change the except clauses
from "except ResourceNotFoundError, ValueError, TypeError:" to a parenthesized
tuple form "except (ResourceNotFoundError, ValueError, TypeError):" in the block
that wraps the pod_deleting_process_recover invocation (and the similar
multi-except block later in the file), so Python/Flake8 will parse the file
correctly.
- Around line 270-283: The test narrows terminal restore outcomes to only
COMPLETED/FAILED which hides other terminal phases; remove the allowed_statuses
set and the restrictive assertion and instead assert that the returned
final_status from wait_for_oadp_phase (called with
terminal_statuses=OADP_RESTORE_TERMINAL_STATUSES) is one of the expected
terminal statuses by checking membership against OADP_RESTORE_TERMINAL_STATUSES
(use oadp_restore_started and final_status in the assertion message to preserve
context).
- Around line 284-299: The fixture yields final_status too early and must only
return after the chaos thread is stopped and pod recovery completes; move the
yield of final_status to after the finally block (after
pod_deleting_thread_during_oadp_operations["stop_event"].set(),
thread.join(...), and pod_deleting_process_recover(...)) or alternatively split
orchestration and post-recovery validation into separate fixtures so that
pod_deleting_process_recover (the function handling Deployment/DaemonSet
recovery) finishes before the fixture returns; also ensure you use the existing
pod_deleting_thread fixture for thread teardown and keep references to
final_status, pod_deleting_thread_during_oadp_operations, and
pod_deleting_process_recover to locate the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2e62e786-e7ad-4a5b-8d93-449da7cc738c
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py, tests/chaos/oadp/test_oadp.py, tests/chaos/utils.py, utilities/constants.py
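A sketch combining two recurring asks from these reviews: log at ERROR level with full context, then translate `TimeoutExpiredError` into `TimeoutError`. The `vm.name`/`vm.status` attributes and the `timeout_sampler` import path are assumptions:

```python
import logging

# Import path assumed; the sampler signature below matches this PR's diffs.
from timeout_sampler import TimeoutExpiredError, TimeoutSampler

LOGGER = logging.getLogger(__name__)


def wait_until_vm_running(vm, timeout: int, sleep: int = 5) -> None:
    def is_running() -> bool:
        return vm.status == "Running"  # `status` attribute is an assumption

    try:
        for sample in TimeoutSampler(wait_timeout=timeout, sleep=sleep, func=is_running):
            if sample:
                return
    except TimeoutExpiredError as error:
        # ERROR-level context first: what failed, expected vs actual state.
        LOGGER.error(
            f"VM {vm.name} failed to reach 'Running' within {timeout}s; last state: {vm.status}"
        )
        raise TimeoutError(f"VM {vm.name} did not reach Running state in {timeout}s") from error
```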
♻️ Duplicate comments (4)
tests/chaos/utils.py (2)
591-618: ⚠️ Potential issue | 🟠 Major
HIGH: `timeout` is being spent twice here.
Lines 604-607 can consume the full budget waiting for the VM to exist, and Line 618 then starts a second full-budget wait for `Running`. A 5-minute timeout can therefore stretch to nearly 10 minutes and hide slow-restore regressions.
⏱️ Keep a single timeout budget
LOGGER.info(f"Waiting for restored VM {vm_name}") namespace_name = namespace.name if hasattr(namespace, "name") else namespace + deadline = time.monotonic() + timeout vm = None @@ if vm is None: raise AssertionError("Unreachable: vm should be set after TimeoutSampler loop") LOGGER.info(f"VM {vm.name} found, waiting for Running") - wait_until_vm_running(vm=vm, timeout=timeout, sleep=sleep) + remaining_timeout = max(1, int(deadline - time.monotonic())) + wait_until_vm_running(vm=vm, timeout=remaining_timeout, sleep=sleep)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 591 - 618, The current code can use the full timeout twice: first in the TimeoutSampler loop (try_get_and_wait_running) and then again in wait_until_vm_running, potentially doubling the wait budget. Fix by tracking elapsed time after the sampler returns (or while looping) and pass the remaining timeout to wait_until_vm_running (e.g., remaining = max(0, timeout - elapsed)), or refactor try_get_and_wait_running/TimeoutSampler to check vm.status == "Running" so only one TimeoutSampler call is needed; update calls to use the remaining timeout and keep sleep unchanged, referencing TimeoutSampler, try_get_and_wait_running, wait_until_vm_running, timeout, sleep, and vm.
574-575: ⚠️ Potential issue | 🟡 Minor
MEDIUM: Log translated timeouts before re-raising them.
These `except TimeoutExpiredError` blocks convert the failure straight into `TimeoutError`, but they never emit an `ERROR` log with the last-known VM or phase context. In long chaos runs, that drops the most useful triage signal from the log stream.
As per coding guidelines: ERROR-level logging is REQUIRED for exceptions, with full context: what failed, expected vs actual values, resource state.
Also applies to: 609-612, 670-671
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 574 - 575, The except TimeoutExpiredError handlers that currently raise a TimeoutError (the blocks matching "except TimeoutExpiredError as error:" that reference vm.name and the similar blocks at the other noted locations) must first emit an ERROR-level log containing full context (VM name, phase/state, expected vs actual/last-seen values and the timeout) and include the captured exception details, then re-raise the TimeoutError using "raise ... from error"; update each such handler in tests/chaos/utils.py (including the occurrences around the vm.name block and the ones noted at the other locations) to call the module's logger.error (or the existing process logger used in this file) with a descriptive message and the exception before re-raising.
tests/chaos/oadp/conftest.py (2)
284-289: ⚠️ Potential issue | 🟠 Major
HIGH: Line 284 yields before the chaos thread is stopped.
In pytest, Lines 287-299 run during fixture teardown, not before the test body. `test_delete_pods_during_restore()` therefore waits for the restored VM and checks guest data while the deleter thread can still be killing OADP pods, so the post-restore validation is racing the disruption instead of a recovered state. Stop/join and recover first, then return the phase.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 284 - 289, The fixture yields final_status too early — stop the pod deleter and wait for it to finish before yielding so tests don't race with chaos; move the pod_deleting_thread_during_oadp_operations["stop_event"].set() and the thread.join(timeout=TIMEOUT_1MIN) (and any recovery/wait-for-recovery steps if present) to execute before the yield of final_status (i.e., ensure the call to thread.join and cleanup happen prior to returning final_status from the fixture, referencing pod_deleting_thread_during_oadp_operations, thread, TIMEOUT_1MIN, and yield final_status).
161-161: ⚠️ Potential issue | 🔴 Critical
CRITICAL: Parenthesize both multi-except handlers or the current checks will fail to parse this file.
Flake8 is already reporting `E999` on Line 161, and Line 298 has the same form. The comma syntax only parses on Python 3.14+, so this blocks lint/import with the repo's current toolchain.
🔧 Minimal fix
```diff
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
@@
-    except ResourceNotFoundError, ValueError, TypeError:
+    except (ResourceNotFoundError, ValueError, TypeError):
```
Based on learnings: "with Python 3.14 the syntax `except ValueError, TypeError:` is valid if there is no `as` clause"; the current Flake8 E999 shows this repo's toolchain is not on that parser yet.
Also applies to: 298-298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` at line 161, The problematic multi-except uses comma-separated exceptions (ResourceNotFoundError, ValueError, TypeError) which is invalid for our parser; update each occurrence to either wrap those exception names in a single exception-tuple (parenthesize them) or split them into separate except blocks so each exception is handled properly; locate the offending except statement(s) that refer to ResourceNotFoundError, ValueError, and TypeError (there are two occurrences) and apply the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 284-289: The fixture yields final_status too early — stop the pod
deleter and wait for it to finish before yielding so tests don't race with
chaos; move the pod_deleting_thread_during_oadp_operations["stop_event"].set()
and the thread.join(timeout=TIMEOUT_1MIN) (and any recovery/wait-for-recovery
steps if present) to execute before the yield of final_status (i.e., ensure the
call to thread.join and cleanup happen prior to returning final_status from the
fixture, referencing pod_deleting_thread_during_oadp_operations, thread,
TIMEOUT_1MIN, and yield final_status).
- Line 161: The problematic multi-except uses comma-separated exceptions
(ResourceNotFoundError, ValueError, TypeError) which is invalid for our parser;
update each occurrence to either wrap those exception names in a single
exception-tuple (parenthesize them) or split them into separate except blocks so
each exception is handled properly; locate the offending except statement(s)
that refer to ResourceNotFoundError, ValueError, and TypeError (there are two
occurrences) and apply the change.
In `@tests/chaos/utils.py`:
- Around line 591-618: The current code can use the full timeout twice: first in
the TimeoutSampler loop (try_get_and_wait_running) and then again in
wait_until_vm_running, potentially doubling the wait budget. Fix by tracking
elapsed time after the sampler returns (or while looping) and pass the remaining
timeout to wait_until_vm_running (e.g., remaining = max(0, timeout - elapsed)),
or refactor try_get_and_wait_running/TimeoutSampler to check vm.status ==
"Running" so only one TimeoutSampler call is needed; update calls to use the
remaining timeout and keep sleep unchanged, referencing TimeoutSampler,
try_get_and_wait_running, wait_until_vm_running, timeout, sleep, and vm.
- Around line 574-575: The except TimeoutExpiredError handlers that currently
raise a TimeoutError (the blocks matching "except TimeoutExpiredError as error:"
that reference vm.name and the similar blocks at the other noted locations) must
first emit an ERROR-level log containing full context (VM name, phase/state,
expected vs actual/last-seen values and the timeout) and include the captured
exception details, then re-raise the TimeoutError using "raise ... from error";
update each such handler in tests/chaos/utils.py (including the occurrences
around the vm.name block and the ones noted at the other locations) to call the
module's logger.error (or the existing process logger used in this file) with a
descriptive message and the exception before re-raising.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f036865a-ad9d-4f7b-b86a-24f3e5b25572
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py
tests/chaos/oadp/test_oadp.py
tests/chaos/utils.py
utilities/constants.py
@coderabbitai resume
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/chaos/utils.py (1)
350-401: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Existing chaos recovery callers still use `resource=`, so this helper will fail before it reaches the wait logic.

`tests/chaos/conftest.py:404-407` calls `pod_deleting_process_recover(resource=pod_deletion_config["resource"], ...)`, but this API only accepts `resources`. That raises `TypeError` in the standard chaos cleanup path. Update that caller in the same PR, or keep this helper backward-compatible with the singular form.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 350 - 401, The helper pod_deleting_process_recover currently requires a list named resources and raises TypeError if callers pass the singular keyword resource; make it backward-compatible by accepting either resources (list) or resource (single Deployment/DaemonSet) and normalizing to a list early in the function (e.g., if resources is None and a resource arg exists, wrap it in a list, or if resources is not a list but is a single resource, treat it as [resources]); ensure the rest of the logic still validates each item via resource.kind and preserve existing error messages (adjust the initial TypeError message to reflect both accepted forms) so existing callers using resource= continue to work.
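A minimal sketch of the backward-compatible option described above; the signature is trimmed to the relevant keywords and the per-resource loop body is elided, since only the normalization is at issue here:

```python
def pod_deleting_process_recover(resources=None, resource=None, namespace=None, pod_prefix=None):
    # Accept the new plural keyword or the legacy singular one,
    # normalizing to a list before any validation runs.
    if resources is None and resource is not None:
        resources = [resource]
    if resources is not None and not isinstance(resources, list):
        resources = [resources]
    if not resources:
        raise TypeError("pass 'resources' (a list) or the legacy 'resource' (a single item)")
    for item in resources:
        ...  # existing per-resource recovery (validate item.kind, restore replicas, ...)
```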
♻️ Duplicate comments (3)
tests/chaos/utils.py (1)
579-620: ⚠️ Potential issue | 🟠 Major

HIGH: `wait_for_restored_vm()` can consume the timeout budget twice.

The first `TimeoutSampler` waits up to `timeout` for the VM object to appear, then `wait_until_vm_running()` can wait another full `timeout` for `Running`. A restore that produces the VM near the deadline can therefore block for almost 2× the advertised budget and make slow restores look healthy. Use a single sampler for "exists and is Running", or pass only the remaining time into `wait_until_vm_running()`.

⏱️ Minimal fix

```diff
+    start_time = time.monotonic()
     try:
         for sample in TimeoutSampler(wait_timeout=timeout, sleep=sleep, func=try_get_and_wait_running):
             if sample:
                 break
@@
     LOGGER.info(f"VM {vm.name} found, waiting for Running")
-    wait_until_vm_running(vm=vm, timeout=timeout, sleep=sleep)
+    remaining_timeout = max(0, timeout - (time.monotonic() - start_time))
+    wait_until_vm_running(vm=vm, timeout=remaining_timeout, sleep=sleep)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/utils.py` around lines 579 - 620, wait_for_restored_vm currently can spend the full timeout in TimeoutSampler and then another full timeout in wait_until_vm_running; change it to track elapsed time and only pass the remaining budget to wait_until_vm_running (or use a single TimeoutSampler that checks both existence and vm.status == "Running"). Concretely, in wait_for_restored_vm record start = time.time() before the TimeoutSampler loop, after the sampler finds vm compute elapsed = time.time() - start, compute remaining = timeout - elapsed, if remaining <= 0 raise TimeoutError with the same context, otherwise call wait_until_vm_running(vm=vm, timeout=remaining, sleep=sleep); references: wait_for_restored_vm, try_get_and_wait_running, TimeoutSampler, wait_until_vm_running.

tests/chaos/oadp/conftest.py (2)
161-161: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Parenthesize both multi-except handlers.
Line 161 and Line 298 use the unparenthesized multi-except form, but the repo’s current lint/parser setup still rejects it, so this file will not import in CI. Use a tuple at both sites.
🔧 Minimal fix
```diff
- except ResourceNotFoundError, ValueError, TypeError:
+ except (ResourceNotFoundError, ValueError, TypeError):
...
- except ResourceNotFoundError, ValueError, TypeError:
+ except (ResourceNotFoundError, ValueError, TypeError):
```

Quick repro: this should fail until both handlers are parenthesized.

```bash
#!/bin/bash
python -m py_compile tests/chaos/oadp/conftest.py
```

Also applies to: 298-298
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` at line 161, Replace the unparenthesized multi-except forms with a parenthesized tuple at both occurrences: locate the lines using the exact except pattern "except ResourceNotFoundError, ValueError, TypeError" and the other similar multi-except at the later occurrence and change them to use a tuple (e.g., except (ResourceNotFoundError, ValueError, TypeError):) so the file imports correctly under the current parser/lint rules.
277-299: ⚠️ Potential issue | 🟠 Major

HIGH: Yield the restore status only after chaos recovery finishes.

`yield final_status` at Line 284 hands control back to the test while the pod-deletion thread is still running. The restored-VM wait and file check then race ongoing pod deletions, so this validates restore under continued chaos instead of the post-chaos result.

♻️ Suggested restructuring

```diff
     try:
         final_status = wait_for_oadp_phase(
             resource=oadp_restore_started,
             terminal_statuses=terminal_statuses,
             timeout=TIMEOUT_3MIN,
         )
         assert final_status in allowed_statuses, (
             f"Restore ended in unexpected terminal phase: {final_status}"
         )
-        yield final_status
     finally:
         pod_deleting_thread_during_oadp_operations["stop_event"].set()
         if thread.is_alive():
             thread.join(timeout=TIMEOUT_1MIN)
@@
         except (ResourceNotFoundError, ValueError, TypeError):
             LOGGER.error(f"Recovery failed for prefix {pod_prefix} in namespace {namespace}")
             raise
+
+    return final_status
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 277 - 299, The test yields final_status before chaos recovery finishes, causing the post-restore checks to race with ongoing pod deletions; move the yield(final_status) to after the recovery/teardown block so that you first set pod_deleting_thread_during_oadp_operations["stop_event"], join thread (thread.join(timeout=TIMEOUT_1MIN)), run pod_deleting_process_recover(resources=[Deployment, DaemonSet], namespace=namespace, pod_prefix=pod_prefix) and only then yield final_status; also fix the except clause to use a tuple form except (ResourceNotFoundError, ValueError, TypeError): to correctly catch multiple exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/chaos/utils.py`:
- Around line 350-401: The helper pod_deleting_process_recover currently
requires a list named resources and raises TypeError if callers pass the
singular keyword resource; make it backward-compatible by accepting either
resources (list) or resource (single Deployment/DaemonSet) and normalizing to a
list early in the function (e.g., if resources is None and a resource arg
exists, wrap it in a list, or if resources is not a list but is a single
resource, treat it as [resources]); ensure the rest of the logic still validates
each item via resource.kind and preserve existing error messages (adjust the
initial TypeError message to reflect both accepted forms) so existing callers
using resource= continue to work.
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Line 161: Replace the unparenthesized multi-except forms with a parenthesized
tuple at both occurrences: locate the lines using the exact except pattern
"except ResourceNotFoundError, ValueError, TypeError" and the other similar
multi-except at the later occurrence and change them to use a tuple (e.g.,
except (ResourceNotFoundError, ValueError, TypeError):) so the file imports
correctly under the current parser/lint rules.
- Around line 277-299: The test yields final_status before chaos recovery
finishes, causing the post-restore checks to race with ongoing pod deletions;
move the yield(final_status) to after the recovery/teardown block so that you
first set pod_deleting_thread_during_oadp_operations["stop_event"], join thread
(thread.join(timeout=TIMEOUT_1MIN)), run
pod_deleting_process_recover(resources=[Deployment, DaemonSet],
namespace=namespace, pod_prefix=pod_prefix) and only then yield final_status;
also fix the except clause to use a tuple form except (ResourceNotFoundError,
ValueError, TypeError): to correctly catch multiple exceptions.
In `@tests/chaos/utils.py`:
- Around line 579-620: wait_for_restored_vm currently can spend the full timeout
in TimeoutSampler and then another full timeout in wait_until_vm_running; change
it to track elapsed time and only pass the remaining budget to
wait_until_vm_running (or use a single TimeoutSampler that checks both existence
and vm.status == "Running"). Concretely, in wait_for_restored_vm record start =
time.time() before the TimeoutSampler loop, after the sampler finds vm compute
elapsed = time.time() - start, compute remaining = timeout - elapsed, if
remaining <= 0 raise TimeoutError with the same context, otherwise call
wait_until_vm_running(vm=vm, timeout=remaining, sleep=sleep); references:
wait_for_restored_vm, try_get_and_wait_running, TimeoutSampler,
wait_until_vm_running.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f60bafbe-6956-4a01-9eaa-42c0b9b8e642
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py
tests/chaos/oadp/test_oadp.py
tests/chaos/utils.py
utilities/constants.py
✅ Actions performed

Reviews resumed.
Polarion cases: CNV-12023, CNV-12025, CNV-12027

Signed-off-by: Qixuan Wang <qwang@redhat.com>
Actionable comments posted: 3
♻️ Duplicate comments (2)
tests/chaos/oadp/conftest.py (2)
285-307: ⚠️ Potential issue | 🟠 Major

HIGH: Finish chaos teardown before returning the restore phase.

Line 292 yields control back to the test, but Line 295 through Line 307 do not run until fixture teardown. `test_delete_pods_during_restore()` then waits on the restored VM while the pod-deletion thread is still active, so the new restore checks race ongoing chaos instead of validating post-recovery state.

🔧 Suggested restructure

```diff
     try:
         final_status = wait_for_oadp_phase(
             resource=oadp_restore_started,
             terminal_statuses=terminal_statuses,
             timeout=TIMEOUT_30MIN,
         )
-        assert final_status in allowed_statuses, f"Restore ended in unexpected terminal phase: {final_status}"
-        yield final_status
-
     finally:
         pod_deleting_thread_during_oadp_operations["stop_event"].set()
         if thread.is_alive():
             thread.join(timeout=TIMEOUT_1MIN)
         # Recovery only — thread teardown handled by pod_deleting_thread fixture
         try:
             pod_deleting_process_recover(
                 resources=[Deployment, DaemonSet],
                 namespace=namespace,
                 pod_prefix=pod_prefix,
             )
-        except ResourceNotFoundError, ValueError, TypeError:
+        except (ResourceNotFoundError, ValueError, TypeError):
             LOGGER.error(f"Recovery failed for prefix {pod_prefix} in namespace {namespace}")
             raise
+
+    assert final_status in allowed_statuses, (
+        f"Restore ended in unexpected terminal phase: {final_status}"
+    )
+    return final_status
```
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` around lines 285 - 307, The fixture yields the restore phase before stopping the chaos thread and performing recovery, causing races; ensure teardown runs before yielding by moving the pod_deleting_thread_during_oadp_operations["stop_event"].set(), waiting for thread.join(timeout=TIMEOUT_1MIN) and calling pod_deleting_process_recover(resources=[Deployment, DaemonSet], namespace=namespace, pod_prefix=pod_prefix) (catching exceptions with a tuple: except (ResourceNotFoundError, ValueError, TypeError):) to execute prior to yield from the fixture (or alternatively, join/stop the thread and run recovery immediately after wait_for_oadp_phase and assert, then yield the final_status).
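For reference, the shape being requested in both review rounds, reduced to a self-contained pytest sketch. The thread target, the one-second join budget, and the status value are stand-ins, not the real OADP helpers:

```python
import threading

import pytest


@pytest.fixture
def restore_status_after_chaos():
    stop_event = threading.Event()
    # Stand-in for the pod-deleting chaos thread.
    chaos_thread = threading.Thread(target=stop_event.wait)
    chaos_thread.start()
    try:
        final_status = "Completed"  # stand-in for wait_for_oadp_phase(...)
    finally:
        # Stop and join the chaos thread before control returns to the test,
        # so post-restore checks never race ongoing pod deletions.
        stop_event.set()
        chaos_thread.join(timeout=1)
    yield final_status
```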
165-165: ⚠️ Potential issue | 🔴 Critical

CRITICAL: Parenthesize both multi-except handlers.
Line 165 and Line 306 still use the unparenthesized multi-except form. The repo’s current parser/lint toolchain rejects it, so this file will not pass validation as written.
🔧 Minimal fix
```diff
- except ResourceNotFoundError, ValueError, TypeError:
+ except (ResourceNotFoundError, ValueError, TypeError):
```

```bash
#!/bin/bash
set -euo pipefail
rg -n 'except ResourceNotFoundError, ValueError, TypeError:' tests/chaos/oadp/conftest.py
python - <<'PY'
from pathlib import Path

source = Path("tests/chaos/oadp/conftest.py").read_text()
try:
    compile(source, "tests/chaos/oadp/conftest.py", "exec")
    print("compile(): OK")
except SyntaxError as exc:
    print(f"compile(): SyntaxError -> {exc}")
PY
```

Also applies to: 306-306
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/chaos/oadp/conftest.py` at line 165, The except clauses use the deprecated comma-separated multi-except form ("except ResourceNotFoundError, ValueError, TypeError:"); update both occurrences to use a parenthesized tuple form ("except (ResourceNotFoundError, ValueError, TypeError):") so the parser accepts the file; search for the exact string "except ResourceNotFoundError, ValueError, TypeError:" in conftest.py and replace both instances with the parenthesized version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 301-305: The restore fixture call passes a resources argument to
pod_deleting_process_recover but the function still ignores it and uses a
hardcoded [Deployment, DaemonSet]; update the pod_deleting_process_recover
implementation to consume the resources parameter (e.g., iterate over the
provided resources list instead of the hardcoded list) so the restore-case
inputs from tests/chaos/oadp/test_oadp.py (which pass [Deployment] or
[DaemonSet]) take effect, or if you prefer remove the unused resources parameter
from the call/signature to eliminate dead code; ensure the function signature
(pod_deleting_process_recover) and any internal uses reference the passed-in
resources variable rather than a literal.
In `@tests/chaos/utils.py`:
- Around line 462-465: The current except block catching TimeoutExpiredError in
the pod deletion helper (around pod.wait_deleted and TIMEOUT_30SEC) only logs at
INFO and continues; change it to log at ERROR with full context (pod.name,
timeout value, expected deletion vs observed state) using LOGGER.error, and then
re-raise the TimeoutExpiredError so callers can fail or retry explicitly; keep
the original exception when re-raising to preserve traceback.
- Around line 592-629: The current flow in wait_for_restored_vm
(try_get_and_wait_running, TimeoutSampler loop, then wait_until_vm_running) can
double-consume the timeout; fix by making the two phases share a single timeout
budget: record start_time before the TimeoutSampler, compute end_time =
start_time + timeout, after the sampler loop calculate remaining = end_time -
now and if remaining <= 0 raise the same TimeoutError, otherwise call
wait_until_vm_running(vm=vm, timeout=remaining, sleep=sleep). Use the existing
symbols try_get_and_wait_running, TimeoutSampler, wait_until_vm_running and
preserve the same error/LOG lines but pass namespace_name consistently.
---
Duplicate comments:
In `@tests/chaos/oadp/conftest.py`:
- Around line 285-307: The fixture yields the restore phase before stopping the
chaos thread and performing recovery, causing races; ensure teardown runs before
yielding by moving the
pod_deleting_thread_during_oadp_operations["stop_event"].set(), waiting for
thread.join(timeout=TIMEOUT_1MIN) and calling
pod_deleting_process_recover(resources=[Deployment, DaemonSet],
namespace=namespace, pod_prefix=pod_prefix) (catching exceptions with a tuple:
except (ResourceNotFoundError, ValueError, TypeError):) to execute prior to
yield from the fixture (or alternatively, join/stop the thread and run recovery
immediately after wait_for_oadp_phase and assert, then yield the final_status).
- Line 165: The except clauses use the deprecated comma-separated multi-except
form ("except ResourceNotFoundError, ValueError, TypeError:"); update both
occurrences to use a parenthesized tuple form ("except (ResourceNotFoundError,
ValueError, TypeError):") so the parser accepts the file; search for the exact
string "except ResourceNotFoundError, ValueError, TypeError:" in conftest.py and
replace both instances with the parenthesized version.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: a2c1d712-7b34-4664-910b-d3d240d201ab
📒 Files selected for processing (4)
tests/chaos/oadp/conftest.py
tests/chaos/oadp/test_oadp.py
tests/chaos/utils.py
utilities/constants.py
```python
            pod_deleting_process_recover(
                resources=[Deployment, DaemonSet],
                namespace=namespace,
                pod_prefix=pod_prefix,
            )
```
MEDIUM: The new resources parameter is still inert.
The restore parametrization in tests/chaos/oadp/test_oadp.py now passes [Deployment] vs [DaemonSet], but this recovery call still hardcodes both resource types. Changing the case data currently has no effect, so either consume the payload here or remove it from the restore-case inputs.
♻️ Minimal fix
```diff
-                resources=[Deployment, DaemonSet],
+                resources=(
+                    pod_deleting_thread_during_oadp_operations["resources"]
+                    if pod_deleting_thread_during_oadp_operations["resources"] is not None
+                    else [Deployment, DaemonSet]
+                ),
```

As per coding guidelines: "No dead code - every function, variable, fixture MUST be used or removed; code marked with # skip-unused-code is excluded from dead code analysis".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/chaos/oadp/conftest.py` around lines 301 - 305, The restore fixture
call passes a resources argument to pod_deleting_process_recover but the
function still ignores it and uses a hardcoded [Deployment, DaemonSet]; update
the pod_deleting_process_recover implementation to consume the resources
parameter (e.g., iterate over the provided resources list instead of the
hardcoded list) so the restore-case inputs from tests/chaos/oadp/test_oadp.py
(which pass [Deployment] or [DaemonSet]) take effect, or if you prefer remove
the unused resources parameter from the call/signature to eliminate dead code;
ensure the function signature (pod_deleting_process_recover) and any internal
uses reference the passed-in resources variable rather than a literal.
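If the first option (consuming the parameter) is chosen, the change inside the helper is small. A sketch, under the assumption that some per-kind recovery step already exists in the real code; `recover_workload` is a hypothetical name for it:

```python
def pod_deleting_process_recover(resources, namespace, pod_prefix):
    # Iterate over the caller-supplied resource classes instead of a
    # hardcoded [Deployment, DaemonSet], so that [Deployment]-only or
    # [DaemonSet]-only restore cases actually change behavior.
    for resource_class in resources:
        recover_workload(resource_class=resource_class, namespace=namespace, pod_prefix=pod_prefix)


def recover_workload(resource_class, namespace, pod_prefix):
    ...  # existing recovery logic, now parameterized by resource_class
```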
```diff
             try:
                 pod.wait_deleted(timeout=TIMEOUT_30SEC)
             except TimeoutExpiredError:
-                LOGGER.warning(f"Pod {pod.name} was not deleted")
+                LOGGER.info(f"Pod {pod.name} was not deleted")
```
HIGH: Timed-out pod deletions are treated as success.
If wait_deleted() times out, this chaos round did not actually remove the selected pod, but the helper only emits an INFO log and keeps going. That can let the restore scenario pass without exercising the intended failure. Log this at ERROR and re-raise so callers can fail or retry explicitly.
🐛 Proposed fix
```diff
     for pod in pods_to_delete:
         LOGGER.info(f"Deleting pod {pod.name}")
         pod.clean_up()
         try:
             pod.wait_deleted(timeout=TIMEOUT_30SEC)
-        except TimeoutExpiredError:
-            LOGGER.info(f"Pod {pod.name} was not deleted")
+        except TimeoutExpiredError as error:
+            LOGGER.error(
+                f"Timed out waiting for pod {pod.name} in namespace {namespace_name} to be deleted"
+            )
+            raise TimeoutError(
+                f"Pod {pod.name} was not deleted within {TIMEOUT_30SEC}s"
+            ) from error
```

As per coding guidelines: ERROR level logging REQUIRED for exceptions with full context: what failed, expected vs actual values, resource state and NEVER silently swallow exceptions - at minimum, log the error before continuing in Python.
🧰 Tools
🪛 Ruff (0.15.9)
[warning] 465-465: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/chaos/utils.py` around lines 462 - 465, The current except block
catching TimeoutExpiredError in the pod deletion helper (around pod.wait_deleted
and TIMEOUT_30SEC) only logs at INFO and continues; change it to log at ERROR
with full context (pod.name, timeout value, expected deletion vs observed state)
using LOGGER.error, and then re-raise the TimeoutExpiredError so callers can
fail or retry explicitly; keep the original exception when re-raising to
preserve traceback.
```python
    namespace_name = namespace.name if hasattr(namespace, "name") else namespace

    vm = None

    def try_get_and_wait_running():
        nonlocal vm

        try:
            vm_iter = VirtualMachine.get(client=admin_client, namespace=namespace_name, name=vm_name)
            vm_obj = next(vm_iter, None)
            if vm_obj:
                test_vm = VirtualMachineForTests(
                    client=admin_client,
                    name=vm_name,
                    namespace=namespace_name,
                )
                test_vm.name = vm_name
                vm = test_vm
                return vm
        except ResourceNotFoundError:
            return None

    try:
        for sample in TimeoutSampler(wait_timeout=timeout, sleep=sleep, func=try_get_and_wait_running):
            if sample:
                break

    except TimeoutExpiredError as error:
        LOGGER.error(f"VM {vm_name} was not found in namespace {namespace_name} within {timeout}s")
        raise TimeoutError(
            f"VM {vm_name} not found or not running in namespace {namespace} within {timeout}s"
        ) from error

    if vm is None:
        raise AssertionError("Unreachable: vm should be set after TimeoutSampler loop")

    LOGGER.info(f"VM {vm.name} found, waiting for Running")
    wait_until_vm_running(vm=vm, timeout=timeout, sleep=sleep)
```
HIGH: timeout can be consumed twice in wait_for_restored_vm.
The first wait only covers “VM object exists”. After that breaks, wait_until_vm_running() gets the full timeout again. In the failure path, a 5-minute budget can stretch to nearly 10 minutes. Carry one timeout budget across both phases, or fold the Running check into the same wait.
♻️ Proposed fix
```diff
 def wait_for_restored_vm(
     admin_client,
     namespace,
     vm_name,
     timeout=TIMEOUT_5MIN,
     sleep=5,
 ) -> VirtualMachine:
@@
     LOGGER.info(f"Waiting for restored VM {vm_name}")
     namespace_name = namespace.name if hasattr(namespace, "name") else namespace
+    timeout_watch = TimeoutWatch(timeout=timeout)
     vm = None
@@
     LOGGER.info(f"VM {vm.name} found, waiting for Running")
-    wait_until_vm_running(vm=vm, timeout=timeout, sleep=sleep)
+    wait_until_vm_running(
+        vm=vm,
+        timeout=max(1, int(timeout_watch.remaining_time())),
+        sleep=sleep,
+    )
```
+ )🧰 Tools
🪛 Ruff (0.15.9)
[warning] 596-596: Missing return type annotation for private function try_get_and_wait_running
(ANN202)
[warning] 620-620: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
[warning] 620-620: Logging statement uses f-string
(G004)
[warning] 621-623: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 626-626: Avoid specifying long messages outside the exception class
(TRY003)
[warning] 628-628: Logging statement uses f-string
(G004)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/chaos/utils.py` around lines 592 - 629, The current flow in
wait_for_restored_vm (try_get_and_wait_running, TimeoutSampler loop, then
wait_until_vm_running) can double-consume the timeout; fix by making the two
phases share a single timeout budget: record start_time before the
TimeoutSampler, compute end_time = start_time + timeout, after the sampler loop
calculate remaining = end_time - now and if remaining <= 0 raise the same
TimeoutError, otherwise call wait_until_vm_running(vm=vm, timeout=remaining,
sleep=sleep). Use the existing symbols try_get_and_wait_running, TimeoutSampler,
wait_until_vm_running and preserve the same error/LOG lines but pass
namespace_name consistently.
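Independent of which helper carries the deadline, the shared-budget pattern itself is easy to isolate. A runnable sketch where both `wait_*` functions are placeholders for the real polls:

```python
import time


def wait_for_object_to_exist():
    time.sleep(0.1)  # placeholder for the existence poll


def wait_for_running(timeout):
    time.sleep(min(timeout, 0.1))  # placeholder for the Running poll


def wait_with_shared_budget(timeout):
    deadline = time.monotonic() + timeout

    wait_for_object_to_exist()  # phase 1 draws from the same deadline

    remaining = deadline - time.monotonic()
    if remaining <= 0:
        raise TimeoutError(f"budget of {timeout}s exhausted before the Running check")
    wait_for_running(timeout=remaining)  # phase 2 only receives what is left


wait_with_shared_budget(timeout=5)
```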
/retest all

Auto-triggered: Files in this PR were modified by merged PR #3748.

Overlapping files: utilities/constants.py
/retest all

Auto-triggered: Files in this PR were modified by merged PR #4725.

Overlapping files: utilities/constants.py
Short description:
Polarion cases: CNV-12023, CNV-12025, CNV-12027
Needs #3002 as a dependency.
More details:
What this PR does / why we need it:
Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:
Summary by CodeRabbit
Tests
Chores