CORENET-7206: Multus OTE: add OpenShift Tests Extension framework infrastructure#303
CORENET-7206: Multus OTE: add OpenShift Tests Extension framework infrastructure#303anuragthehatter wants to merge 7 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis PR adds a multus-cni e2e extension, new build targets, OTP test coverage, and Docker packaging for the compressed test binary. ChangesMultus e2e extension
Sequence Diagram(s)sequenceDiagram
participant Makefile
participant TestMakefile
participant Main
participant OpenShiftTestsExtension
participant Cobra
participant Dockerfile
Makefile->>TestMakefile: build-e2e-tests
TestMakefile->>Main: build multus-cni-tests-ext
Main->>OpenShiftTestsExtension: create and register extension specs
Main->>Cobra: execute root command
Dockerfile->>TestMakefile: run build-e2e-tests in rhel9 stage
Dockerfile->>Dockerfile: gzip multus-cni-tests-ext and copy artifact
🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 12 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/cmd/main.go`:
- Around line 31-33: The main() function currently uses
panic(fmt.Sprintf("couldn't build extension test specs from ginkgo: %+v",
err.Error())) on a recoverable setup error; replace this panic with a controlled
exit: write a clear error message including err (use err or err.Error()) to
stderr via fmt.Fprintln(os.Stderr, ...) and call os.Exit(nonZeroCode) to
terminate deterministically. Update imports to include "os" if missing and
remove the panic call in test/cmd/main.go so the code path uses the controlled
stderr+os.Exit behavior when the err variable is non-nil.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6d4d4554-adef-4a1b-b1ba-b8e2f8a3eb43
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
Dockerfile.openshiftMakefilego.modtest/Makefiletest/cmd/main.go
1713ca4 to
31bd7e6
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: anuragthehatter The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@anuragthehatter: This pull request references CORENET-7206 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/otp/multus.go`:
- Around line 68-70: The cleanup paths in the test helpers are ignoring errors,
which can leave stale namespaces or active tcpdump processes; update the
deferred cleanup and exec handling in the relevant test functions (including the
namespace delete and pkill tcpdump calls) to check returned errors instead of
discarding them. Only suppress expected cleanup cases like NotFound, and make
sure the same fix is applied across the referenced cleanup blocks in the test
file so no Go error return is ignored.
- Around line 599-637: The macvlan test pods created in the
ReplicationController are not guaranteed to land on the same node as the
sniffer, which can make the observation flaky on multi-node clusters. Update the
test setup around the `rc`/`ReplicationController` template to label the
`sniff-pod` and add pod affinity or node affinity on the six test pods so they
are scheduled onto the same `kubernetes.io/hostname` as `sniff-pod`. Keep the
fix localized to the `Creating 6 test pods with macvlan secondary network` block
and the pod template used by `Create`.
- Around line 720-721: The tcpdump count assertions are comparing the trimmed
stdout as a string, so unexpected non-numeric output can still pass; update the
Neighbor Advertisement and the other affected count checks to parse the result
numerically after strings.TrimSpace using strconv.Atoi, then assert the parsed
count is greater than zero. Keep the fix localized around the existing
stdout.String() handling and the o.Expect checks so the logic validates actual
packet counts instead of string content.
- Line 27: The Ginkgo suite name for the OTP Multus specs is missing the suite
qualifier token used by test/cmd/main.go selection. Update the g.Describe
declaration for OTP Multus to include the “[Suite:openshift/multus-cni]”
qualifier in the name so these specs are picked up by the new extension suite
selector.
- Around line 34-35: The test setup is using context.Background() for
Kubernetes-related API and exec calls, which prevents cancellation and can let a
stuck request outlive the surrounding Eventually timeout. Update the test
helpers in multus setup to use a bounded context instead of a background
context, and thread that context through the Kubernetes call sites so
cancellations/timeouts are respected; focus on the ctx initialization and any
functions in this flow that consume it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 889abbbd-989e-44df-8adb-08b3f83f47c0
📒 Files selected for processing (2)
test/cmd/main.gotest/otp/multus.go
🚧 Files skipped from review as they are similar to previous changes (1)
- test/cmd/main.go
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/otp/multus.go`:
- Line 927: The ping command construction in the multus test helpers uses
/bin/sh -c with pod-derived input, which should be removed. Update the ping
command setup at the referenced pingCmd assignments in multus.go to pass ping
and its flags as direct argv entries with the target IP as a separate argument,
and apply the same fix to the other matching call sites so no shell
interpolation is used for pod annotation values.
- Line 758: The new Ginkgo specs in multus test suite are missing the standard
OTP informing metadata in their It names. Update the affected It declarations to
include the same [OTP][informing] prefix used by the rest of this suite so these
LifecycleInforming tests are tagged consistently; use the existing test names in
the multus spec block to locate the two new cases.
- Around line 811-817: The node selection logic in the Multus OTP tests is too
restrictive because it assumes a worker-labeled node exists. Update the node
lookup in the affected test helpers (for example the block that uses
clientset.CoreV1().Nodes().List in multus.go and the related spots noted in the
review) to select any Ready, schedulable node instead of filtering on
node-role.kubernetes.io/worker, then continue using that chosen node for
pinning/affinity. Keep the existing assertions, but base them on readiness and
schedulability so the tests still validate bridge isolation on SNO and other
non-HA topologies.
- Line 895: The json.Unmarshal calls in multus.go cannot typecheck because
encoding/json is not imported. Add the encoding/json import alongside the other
imports in test/otp/multus.go, making sure the import is present for all four
unmarshaling sites that use json.Unmarshal.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 88083e4a-8848-4dc7-bea8-a7720bc96ed4
📒 Files selected for processing (1)
test/otp/multus.go
…IPv6 exclude ranges - [76652]: Dummy CNI plugin support - [66876]: Whereabouts dual-stack IPAM - [69947]: Macvlan Unsolicited Neighbor Advertisements - [80524]: Basic port isolation with bridge CNI - [80525]: Mixed port isolation networks These tests were moved as they are pure Multus CNI tests without ovn-kubernetes dependencies. Total OTP tests now: 9 (2 security + 5 networking tools + 2 port isolation) Related: openshift/multus-cni#303
|
@anuragthehatter: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info. |
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn openshift/origin#31231 |
|
@anuragthehatter: trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command |
|
/test-with openshift/origin#31231 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/da8e4da0-7036-11f1-8f14-4a9942044fcb-0 |
|
@SachinNinganure lets check above /payload-job-with-prs job results and debug if you see any failures in any of your cases going in via this PR |
|
@anuragthehatter The payload job build failures look like a CI infrastructure issue, similar to the recent tests-private:4.22 build failures that was fixed (Go version mismatch in builder images). |
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn openshift/origin#31231 |
|
@SachinNinganure: it appears that you have attempted to use some version of the payload command, but your comment was incorrectly formatted and cannot be acted upon. See the docs for usage info. |
|
/retest |
1 similar comment
|
/retest |
| "k8s.io/client-go/tools/remotecommand" | ||
| ) | ||
|
|
||
| var _ = g.Describe("[JIRA:Networking][OTP][sig-network] OTP Multus", func() { |
There was a problem hiding this comment.
Lets add openshift/conformance.serial or parallel to your g.Describe like
https://github.com/openshift/cluster-network-operator/pull/3015/changes#diff-30af132467d7a9df230f52564f78a0256952bbca5c96d190650b7cf6610280caR18
|
@SachinNinganure Your tests are not being running in any jobs IIUC. Your tests are not picked up due to |
16a9333 to
6cb94b3
Compare
|
/payload-job-with-prs periodic-ci-openshift-release-main-ci-5.0-e2e-aws-ovn openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/80ca6000-7168-11f1-82a5-8d7cb46534a1-0 |
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-fips openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/b4e33550-7169-11f1-83d2-00e1e58b39d1-0 |
33fb1ed to
020dce3
Compare
| RUN ./hack/build-go.sh && \ | ||
| cd /usr/src/multus-cni/bin | ||
| ENV GO111MODULE=on | ||
| RUN go mod vendor && make build-e2e-tests && gzip -9 test/bin/multus-cni-tests-ext |
There was a problem hiding this comment.
@SachinNinganure Just FYI, we need go mod vendor and GO1111MODULE=on undr dockerfile of any component OTE frameowkr code to enavle vendoring ot be generated dynamically duting tests. It prevents us to commit 100/1000's of vendoring files in a PR.
I juts did on this PR for now.
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-fips openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/839e6030-716b-11f1-8dbf-011d1ed04792-0 |
| "k8s.io/client-go/tools/remotecommand" | ||
| ) | ||
|
|
||
| var _ = g.Describe("[OTP] Multus CNI [Suite:openshift/conformance/parallel]", func() { |
There was a problem hiding this comment.
missing [sig-network]
Refer: https://github.com/openshift/cluster-network-operator/pull/3015/changes#diff-276ffa16a5e942d512e1b6a73da42da1bddd81f4fdf080f5be2e30eabff71046
Should be
[sig-network][OTP][Suite:openshift/conformance/parallel] Multus CNI
There was a problem hiding this comment.
tests are not running in CI due to this. Also fetch latest changes from this PR on your staging branch regarding go mode vendor change else that will be overritten
| }) | ||
|
|
||
| // High-57589: Whereabouts CNI Timeout with Large Exclude Range | ||
| g.It("57589-should handle large IPv6 exclude ranges without timeout", func() { |
There was a problem hiding this comment.
Follow https://github.com/openshift/cluster-network-operator/pull/3015/changes#diff-276ffa16a5e942d512e1b6a73da42da1bddd81f4fdf080f5be2e30eabff71046R112 tagging convention on all cases
020dce3 to
cfd5c92
Compare
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-fips openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/4f91c970-7176-11f1-96f7-87fcc13aecee-0 |
Set up the OTE framework for multus-cni with no test cases yet. This adds the test binary entry point, build infrastructure, and Dockerfile integration for the multus-cni-tests-ext binary. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add Multus-specific OTP tests:
- [57589]: Whereabouts IPv6 exclude ranges
- [76652]: Dummy CNI plugin
- [66876]: Whereabouts dual-stack IPAM
- [69947]: Macvlan Unsolicited NAs
Add 2 bridge CNI port isolation tests from ovn-kubernetes: Test 80524: Basic port isolation - Creates NAD with portIsolation:true on bridge CNI - Verifies 2 pods with isolated ports cannot communicate - Tests dual-stack (IPv4 + IPv6) isolation Test 80525: Mixed isolation networks - Creates 2 NADs: one isolated, one non-isolated - Verifies pods can communicate via non-isolated network - Verifies pods CANNOT communicate via isolated network - Demonstrates selective port isolation behavior moving all pure Multus tests here.
Fix 5 issues identified by CodeRabbit review:
1. Suite qualifier
- Add [Suite:openshift/multus-cni] to Describe block
- Ensures tests are selected by extension suite filter
2. Bounded context
- Replace context.Background() with context.WithTimeout(10min)
- Prevent hung API/exec calls from blocking test indefinitely
- Add proper cleanup with g.DeferCleanup(cancel)
3. Error handling
- Add apierrors import for proper error checking
- Check all namespace deletion errors, log non-NotFound failures
- Check pkill tcpdump errors and log with stdout/stderr
- Prevents stale resources from silent cleanup failures
4. Pod affinity
- Add 'app: sniffer' label to sniff-pod
- Add required pod affinity to RC template
- Schedule all 6 test pods on same node as sniffer
- Fixes flaky failures on multi-node clusters
5. Numeric validation
- Add strconv import
- Parse packet counts with strconv.Atoi()
- Use BeNumerically(">", 0) instead of string comparison
- Prevents false positives from empty/error output
Four fixes addressing CodeRabbit review feedback:
1. Test naming consistency (lines 55, 148, 244, 519, 793, 995):
- Remove [OTP][informing] labels from all 6 test names
- Format: g.It("57589-should...") matches ovn-k PR #3213
- Lifecycle controlled by test/cmd/main.go per @asood-rh
2. SNO compatibility (lines 848-869, 1090-1111):
- Replace worker node selector with Ready+schedulable check
- Works on SNO, TNF, TNA, HyperShift clusters
- Tests 80524, 80525 no longer require worker label
3. Missing import (line 6):
- Add encoding/json import for json.Unmarshal() calls
- Used in 4 places to parse network-status annotations
4. Security: Direct argv for ping (lines 980, 1235, 1266):
- Replace: []string{"/bin/sh", "-c", fmt.Sprintf("ping... %s")}
- With: []string{"ping", "-c", "3", "-W", "2", ip}
- Avoid shell interpolation of IPs from annotations
cfd5c92 to
49a0cdd
Compare
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-fips openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/3ce35b60-7188-11f1-8222-77be40ca4c89-0 |
49a0cdd to
2891903
Compare
|
/payload-job-with-prs periodic-ci-openshift-release-main-nightly-5.0-e2e-aws-ovn-fips openshift/origin#31231 |
|
@anuragthehatter: trigger 1 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/f90521a0-71ad-11f1-9d8e-e5a6357abb31-0 |
|
Seems all tests passing on above job except 77102 which needed PSA fix. Squashed that fix and re-trigerred payload job |
Add security.openshift.io/scc.podSecurityLabelSync: false label to test namespace to allow privileged pod creation for CNI file permission checks. Test 77102 requires hostNetwork, hostPID, and privileged container to access host filesystem and check CNI config file permissions (CIS compliance). Without this label, pod creation fails with PodSecurity restricted policy violation. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
|
/retest |
|
@anuragthehatter: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
test/cmd/main.go), build Makefile (test/Makefile), and Dockerfile integrationLifecycleInformingTest plan
multus-cni-tests-extbinary successfully/usr/bin/multus-cni-tests-ext.gz🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Chores