1 change: 1 addition & 0 deletions .claude/commands/run-conformance.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
I'd like you to help me run the Kubernetes Gateway API conformance suite. Please use the `gateway-conformance-runner` skill. I would like to run the tests locally.
233 changes: 233 additions & 0 deletions .claude/skills/development-loop/SKILL.md
@@ -0,0 +1,233 @@
---
name: development-loop
description: Red-green-refactor development loop for implementing Gateway API conformance tests. Use this skill when working on implementing new conformance tests for the multiway project. It guides the agent through selecting the next test to implement based on priority tiers, running the conformance suite, diagnosing failures, and implementing fixes.
allowed-tools: Bash(cargo:*) Bash(kubectl:*) Bash(kind:*) Bash(docker:*) Bash(cd:*) Bash(ls:*) Bash(make:*) Bash(direnv:*) Read Edit Write Grep Glob Task
---

# Development Loop for Gateway API Conformance Implementation

You are an expert in implementing Kubernetes Gateway API conformance tests. You follow a disciplined red-green-refactor development loop to systematically implement support for each test case in the official Gateway API conformance suite.

## Overview

This skill guides you through a development loop for implementing conformance tests one at a time. Each iteration of the loop:
1. Selects the highest-priority unimplemented test
2. Verifies the test is currently skipped
3. Enables the test and observes the failure
4. Diagnoses the root cause
5. Implements and verifies the fix
6. Documents the results

**CRITICAL**: All conformance tests MUST be run **locally** using the `gateway-conformance-runner` skill's local testing workflow. Never run tests in-cluster during development.

## Test Priority Tiers

Test cases have been prioritized into 7 tiers, stored in CSV files within this skill's directory:

| File | Priority | Description |
|------|----------|-------------|
| `test-tiers/tier-1-essential.csv` | Highest | Core functionality that must work |
| `test-tiers/tier-2-important-http.csv` | High | Important HTTP routing features |
| `test-tiers/tier-3-production.csv` | Medium-High | Production-ready features |
| `test-tiers/tier-4-advanced.csv` | Medium | Advanced routing capabilities |
| `test-tiers/tier-5-observability.csv` | Medium-Low | Observability features |
| `test-tiers/tier-6-validation.csv` | Low | Validation and edge cases |
| `test-tiers/tier-7-not-relevant.csv` | Lowest | Tests not relevant to this implementation |

Each CSV file has the following columns:
- `test_name`: The name of the conformance test
- `description`: A brief description of what the test validates
- `implemented`: Status - `false`, `in-progress`, or `true`
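
The three `implemented` values form a small state machine (`false`, then `in-progress`, then `true`). A minimal sketch of that lifecycle follows; the `Status` enum and its method names are hypothetical and not part of this repository.

```rust
/// Hypothetical model of the `implemented` column in the tier CSV files.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    NotStarted, // "false"
    InProgress, // "in-progress"
    Done,       // "true"
}

impl Status {
    /// Parse a CSV cell; unknown values yield `None` rather than a guess.
    fn parse(cell: &str) -> Option<Status> {
        match cell.trim() {
            "false" => Some(Status::NotStarted),
            "in-progress" => Some(Status::InProgress),
            "true" => Some(Status::Done),
            _ => None,
        }
    }

    /// Render the value back into the form stored in the CSV.
    fn as_str(self) -> &'static str {
        match self {
            Status::NotStarted => "false",
            Status::InProgress => "in-progress",
            Status::Done => "true",
        }
    }

    /// Move one step forward in the loop; `Done` stays `Done`.
    fn advance(self) -> Status {
        match self {
            Status::NotStarted => Status::InProgress,
            Status::InProgress | Status::Done => Status::Done,
        }
    }
}
```

Calling `Status::parse("false")` and then `advance()` would give the value written back in Step 1, and a second `advance()` the value written in Step 8.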

## Development Loop Steps

### Step 1: Select the Next Test

Scan the tier CSV files in order of priority (tier-1 first, tier-7 last) to find the first test where `implemented` is `false`.

Once you've selected a test:
1. Update the CSV file to change `implemented` from `false` to `in-progress`
2. Record the test name and its description for reference

**Example:**
```csv
HTTPRouteSimpleSameNamespace,Basic HTTP routing...,in-progress
```
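
The selection rule above can be sketched in a few lines of Rust. This is an illustration only: `TierRow`, `parse_tier_csv`, and `next_test` are hypothetical names, and the parser assumes the first line of each file is the header row.

```rust
/// One row of a tier CSV: `test_name,description,implemented`.
#[derive(Debug, Clone, PartialEq)]
struct TierRow {
    test_name: String,
    description: String,
    implemented: String,
}

/// Parse tier CSV text, skipping the header row and blank lines.
/// The name is taken before the first comma and the status after the last,
/// so a description that itself contains commas still parses.
fn parse_tier_csv(text: &str) -> Vec<TierRow> {
    text.lines()
        .skip(1) // header: test_name,description,implemented
        .filter_map(|line| {
            let (name, rest) = line.trim().split_once(',')?;
            let (description, implemented) = rest.rsplit_once(',')?;
            Some(TierRow {
                test_name: name.to_string(),
                description: description.to_string(),
                implemented: implemented.trim().to_string(),
            })
        })
        .collect()
}

/// Return the first row still marked `false`, scanning tiers in the order
/// given (callers pass tier-1 first, tier-7 last).
fn next_test(tiers: &[Vec<TierRow>]) -> Option<&TierRow> {
    tiers
        .iter()
        .flat_map(|rows| rows.iter())
        .find(|row| row.implemented == "false")
}
```

Rows marked `in-progress` or `true` are passed over, so a test that is parked as `in-progress` is not selected again.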

### Step 2: Verify Test is Currently Skipped

Before making any code changes, verify the current state:

1. Ensure the `GATEWAY_CONFORMANCE_SUITE` environment variable is set
2. Navigate to `$GATEWAY_CONFORMANCE_SUITE`
3. Use the `gateway-conformance-runner` skill to run the conformance suite locally
4. Verify:
   - The selected test is currently **skipped** (not running)
   - All other enabled tests are **passing**

If other tests are failing, stop and address those failures first before enabling a new test.
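
The two preconditions can be expressed as a simple gate over the run's results. The sketch below assumes the results have already been collected into a map from test name to outcome; the `Outcome` enum and `check_preconditions` function are hypothetical, and how the results are extracted from the suite's output is left open.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-test outcome taken from a conformance run.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Outcome {
    Passed,
    Failed,
    Skipped,
}

/// Check the Step 2 preconditions: the selected test must be skipped and no
/// other test may be failing. Returns every problem found, not just the first.
fn check_preconditions(
    results: &BTreeMap<String, Outcome>,
    selected: &str,
) -> Result<(), Vec<String>> {
    let mut problems = Vec::new();
    match results.get(selected) {
        Some(Outcome::Skipped) => {}
        Some(other) => problems.push(format!("{selected} is {other:?}, expected Skipped")),
        None => problems.push(format!("{selected} not found in results")),
    }
    for (name, outcome) in results {
        if name.as_str() != selected && *outcome == Outcome::Failed {
            problems.push(format!("{name} is failing"));
        }
    }
    if problems.is_empty() {
        Ok(())
    } else {
        Err(problems)
    }
}
```

An `Err` here corresponds to the "stop and address those failures first" case.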

### Step 3: Enable the Test and Observe Failure

1. Enable the test by removing it from the skip list or adding it to the enabled tests in the conformance configuration
2. Run the conformance suite again using `gateway-conformance-runner`
3. Observe and capture the test failure output
4. Document the specific failure message and any relevant stack traces

### Step 4: Handle Test Results

**If the test passes immediately:**
- Update the CSV file to change `implemented` from `in-progress` to `true`
- Document this finding (the feature was already implemented)
- Return to Step 1 to select the next test

**If the test fails:**
- Proceed to Step 5 (Diagnosis)

### Step 5: Diagnose the Failure

#### 5a: Attempt to Create a Unit Test (Recommended)

Before diving into the implementation, try to recreate the conformance test as a purely functional unit test within this repository:

1. Study the conformance test implementation in `$GATEWAY_CONFORMANCE_SUITE/conformance`
2. Understand what scenario the test is validating
3. Create a unit test using this project's testing patterns:
   - Use `snapshot` semantics for expected outputs
   - Use `world state` semantics for modeling the reconciler
   - Implement as a purely functional controller test

Having a local unit test provides:
- Faster iteration cycles
- Easier debugging
- Better test isolation
- Documentation of the expected behavior

If you cannot successfully create a unit test, proceed to the next step.

#### 5b: Investigate Root Cause

1. Analyze the failure message to identify the failing assertion
2. Trace through the code to understand the request flow:
   - Control plane: How are resources being reconciled?
   - Data plane: How are requests being routed?
3. Identify the specific code paths responsible for the failure
4. Document your findings

#### 5c: File a Bug Report

Create a Markdown file in `./bug-reports/` documenting:

```markdown
# Bug Report: [Test Name]

## Test Description
[What the conformance test is validating]

## Failure Message
[The exact error message from the conformance test]

## Root Cause Analysis
[Your findings about why the test is failing]

## Affected Code
- Control plane: [relevant files/functions]
- Data plane: [relevant files/functions]

## Proposed Fix
[Your plan to address the issue]
```

### Step 6: Implement the Fix

1. Make the necessary code changes to fix the identified issue
2. Keep changes minimal and focused on the specific test
3. Follow the project's coding conventions and patterns

### Step 7: Verify the Fix

1. If you created a unit test in Step 5a, run it first:
   ```bash
   cargo nextest run [test_name]
   ```
2. Run the full conformance suite using `gateway-conformance-runner`
3. Verify:
   - The previously failing test now **passes**
   - No other tests have regressed

If verification fails, return to Step 5 to continue diagnosis.
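
The "no regressions" check amounts to a set difference between the tests that passed before the change and the tests that pass after it. A minimal sketch, assuming the passing test names from each run are available as sets (the function names are hypothetical):

```rust
use std::collections::BTreeSet;

/// Tests that passed in the baseline run but no longer pass.
fn regressions<'a>(
    before: &'a BTreeSet<String>,
    after: &'a BTreeSet<String>,
) -> Vec<&'a str> {
    before.difference(after).map(|name| name.as_str()).collect()
}

/// The fix is verified when the target test now passes and nothing that
/// passed before has stopped passing.
fn fix_verified(target: &str, before: &BTreeSet<String>, after: &BTreeSet<String>) -> bool {
    after.contains(target) && regressions(before, after).is_empty()
}
```

Keeping the baseline from Step 2 makes this comparison possible without rerunning the suite on the old code.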

### Step 8: Document and Report

Once the test passes:

1. Update the CSV file to change `implemented` from `in-progress` to `true`

2. Create a summary report with the following format:

```markdown
## Test Completed: [Test Name]

### Summary
[Brief description of what was implemented]

### Changes Made

**Before:**
[Code or behavior before the fix]

**After:**
[Code or behavior after the fix]

### Files Modified
- `path/to/file1.rs`: [description of changes]
- `path/to/file2.rs`: [description of changes]

### Unit Test Added
[Yes/No - if yes, describe the test]

### Lessons Learned
[Any insights that might help with future tests]
```

3. Return to Step 1 to continue with the next test

## Running Conformance Tests Locally

Always use the `gateway-conformance-runner` skill for running conformance tests. The local testing workflow provides:
- Faster iteration cycles
- Real-time output for debugging
- Direct access to test logs
- Ability to run individual tests

Key commands:
```bash
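# Verify environment (fails with an error if the variable is unset or empty)
echo "${GATEWAY_CONFORMANCE_SUITE:?GATEWAY_CONFORMANCE_SUITE is not set}"

# Run conformance tests locally
cd "$GATEWAY_CONFORMANCE_SUITE" && make conformance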
```

## Best Practices

1. **One test at a time**: Focus on a single test per iteration
2. **Verify first**: Always confirm the test is skipped before enabling it
3. **Minimal changes**: Make the smallest change needed to pass the test
4. **Document everything**: Keep thorough records in bug reports and summaries
5. **Unit tests preferred**: Local unit tests make debugging much faster
6. **No regressions**: Ensure all previously passing tests continue to pass

## Error Recovery

If you encounter issues:
- **Wrong kubectl context**: Stop immediately, switch to the correct context
- **Conformance suite not found**: Verify `GATEWAY_CONFORMANCE_SUITE` is set correctly
- **Multiple tests failing**: Address failing tests before enabling new ones
- **Stuck on a test**: Document findings, leave the test marked `in-progress`, and consider moving to the next test with a note explaining why

## Files and Directories

- `./test-tiers/*.csv`: Test priority lists and implementation status
- `./bug-reports/`: Diagnostic reports for failing tests
- `$GATEWAY_CONFORMANCE_SUITE/conformance`: The official conformance test suite
Empty file.
@@ -0,0 +1,111 @@
# Bug Report: HTTPRouteSimpleSameNamespace

## Test Description
This conformance test validates basic HTTP routing from an HTTPRoute to a backend service in the same namespace. It creates:
- A Gateway named `same-namespace` in `gateway-conformance-infra` namespace
- An HTTPRoute named `gateway-conformance-infra-test` that routes to `infra-backend-v1:8080`
- Makes an HTTP request to the Gateway and expects a 200 response from the backend

## Issues Found

### Issue 1: Missing ResolvedRefs Condition on Gateway Listeners (FIXED)

**Failure Message:**
```
gateway gateway-conformance-infra/same-namespace doesn't have ResolvedRefs condition set to True on http listener
```

**Root Cause:**
Gateway listeners only had the `Accepted` condition set, but Gateway API requires listeners to also have `ResolvedRefs` and `Programmed` conditions.

**Fix Applied:**
Added `ResolvedRefs` and `Programmed` conditions to `GatewayStatusListeners` in `crates/controlplane/src/core/reconcile.rs:288-333`.
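
As an illustration of the shape of this fix (not the code in `reconcile.rs`), the sketch below builds all three listener conditions and reports which ones are absent or not `True`. The `ListenerCondition` struct is a hypothetical simplification of the real status type; the reason strings reuse the condition type names, as Gateway API does for the healthy case.

```rust
/// Hypothetical, simplified listener condition.
#[derive(Debug, Clone, PartialEq)]
struct ListenerCondition {
    type_: &'static str,
    status: &'static str,
    reason: &'static str,
}

/// Condition types the failure message and fix above refer to.
const REQUIRED: [&str; 3] = ["Accepted", "ResolvedRefs", "Programmed"];

/// Conditions for a listener that is accepted, resolved, and programmed.
fn healthy_listener_conditions() -> Vec<ListenerCondition> {
    REQUIRED
        .iter()
        .map(|&t| ListenerCondition { type_: t, status: "True", reason: t })
        .collect()
}

/// Required condition types that are absent or not `True`.
fn missing_conditions(conds: &[ListenerCondition]) -> Vec<&'static str> {
    REQUIRED
        .iter()
        .copied()
        .filter(|required| {
            !conds
                .iter()
                .any(|c| c.type_ == *required && c.status == "True")
        })
        .collect()
}
```

A listener carrying only `Accepted`, as before the fix, would be reported as missing `ResolvedRefs` and `Programmed`.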

### Issue 2: Missing observedGeneration on HTTPRoute Conditions (FIXED)

**Failure Message:**
```
HTTPRoute expected observedGeneration to be updated to 1 for all conditions, only 0/2 were updated. stale conditions are: Accepted (generation 0), ResolvedRefs (generation 0)
```

**Root Cause:**
HTTPRoute parent status conditions had `observed_generation: None` instead of the route's generation.

**Fix Applied:**
Updated `build_parent_status` function in `crates/controlplane/src/core/reconcile.rs:672-713` to accept and use the route's generation.
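
The sketch below illustrates the idea of threading the route's generation into each condition, plus a check that mirrors the suite's "stale conditions" message. It is a simplified stand-in, not the actual `build_parent_status`: the `Condition` struct and both function names are hypothetical. An absent `observedGeneration` is read as 0 by the Go-based suite, which is why the failure message reports "generation 0".

```rust
/// Hypothetical, simplified route parent condition.
#[derive(Debug, Clone, PartialEq)]
struct Condition {
    type_: String,
    status: String,
    observed_generation: Option<i64>,
}

/// Build the two parent conditions, stamping each with the route generation.
/// Passing `None` reproduces the behavior before the fix.
fn build_parent_conditions(generation: Option<i64>) -> Vec<Condition> {
    ["Accepted", "ResolvedRefs"]
        .iter()
        .map(|&t| Condition {
            type_: t.to_string(),
            status: "True".to_string(),
            observed_generation: generation,
        })
        .collect()
}

/// Condition types whose observed generation does not match the route's.
fn stale_conditions(conditions: &[Condition], generation: i64) -> Vec<&str> {
    conditions
        .iter()
        .filter(|c| c.observed_generation != Some(generation))
        .map(|c| c.type_.as_str())
        .collect()
}
```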

### Issue 3: ConfigMap Server-Side Apply Conflict (FIXED)

**Failure Message:**
```
Failed to execute upsert error=Kubernetes API error: ApiError: Apply failed with 1 conflict: conflict with "unknown" using v1: .data.config.json
```

**Root Cause Analysis:**
The controller uses Server-Side Apply (SSA) to update ConfigMaps containing data plane configuration. There was a conflict with an "unknown" field manager on the `.data.config.json` field.

**Investigation Findings:**

1. **Original Code Issue:** The upsert functions in `executor.rs` used a "get-then-create" pattern:
   - If resource exists: Update with SSA using field manager "multiway-controller"
   - If resource doesn't exist: Create with `api.create()` using `PostParams::default()` (no field manager = "unknown")

   This meant ConfigMaps created by our controller were tagged with field manager "unknown", causing conflicts on subsequent SSA updates.

2. **Initial Attempted Fix:** Changed all upsert functions to use pure SSA with `PatchParams::apply("multiway-controller").force()`:
   - SSA handles both creation and updates idempotently
   - The `.force()` flag should force the controller to take ownership of conflicting fields

3. **SSA with Force Still Failed:** Even with `.force()` enabled, the SSA conflict persisted. Investigation showed that k8s-openapi's `ConfigMap` type, when serialized via `Patch::Apply`, may not properly include TypeMeta (apiVersion/kind) fields required by SSA.

4. **Working Solution:** Implemented a create-or-replace pattern for ConfigMaps instead of SSA:
   - Use `api.get()` to check if ConfigMap exists
   - If exists: Use `api.replace()` with the existing `resource_version` to update
   - If not exists (404): Use `api.create()` to create

   This avoids SSA field manager conflicts entirely while still providing idempotent upsert semantics.
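
The control flow of the create-or-replace pattern can be sketched against an in-memory stand-in for the API server. Everything below is hypothetical (`FakeStore`, `StoredConfigMap`, `upsert`) and is not the code in `executor.rs`; in particular, the resource version is simplified to an integer, whereas Kubernetes uses an opaque string.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
struct StoredConfigMap {
    resource_version: u64,
    data: String,
}

#[derive(Debug, PartialEq)]
enum StoreError {
    NotFound,
    AlreadyExists,
    Conflict,
}

/// In-memory stand-in for the API server (an assumption for illustration).
#[derive(Default)]
struct FakeStore {
    objects: HashMap<String, StoredConfigMap>,
}

impl FakeStore {
    fn get(&self, name: &str) -> Result<StoredConfigMap, StoreError> {
        self.objects.get(name).cloned().ok_or(StoreError::NotFound)
    }

    fn create(&mut self, name: &str, data: &str) -> Result<StoredConfigMap, StoreError> {
        if self.objects.contains_key(name) {
            return Err(StoreError::AlreadyExists);
        }
        let obj = StoredConfigMap { resource_version: 1, data: data.to_string() };
        self.objects.insert(name.to_string(), obj.clone());
        Ok(obj)
    }

    /// Replace succeeds only if the caller holds the current resource version.
    fn replace(&mut self, name: &str, version: u64, data: &str) -> Result<StoredConfigMap, StoreError> {
        let current = self.objects.get_mut(name).ok_or(StoreError::NotFound)?;
        if current.resource_version != version {
            return Err(StoreError::Conflict);
        }
        current.resource_version += 1;
        current.data = data.to_string();
        Ok(current.clone())
    }
}

/// Create-or-replace: get, then replace with the observed version, or create on NotFound.
fn upsert(store: &mut FakeStore, name: &str, data: &str) -> Result<StoredConfigMap, StoreError> {
    match store.get(name) {
        Ok(existing) => store.replace(name, existing.resource_version, data),
        Err(StoreError::NotFound) => store.create(name, data),
        Err(other) => Err(other),
    }
}
```

One design consequence is that the get and the write are two separate calls, so a concurrent writer can slip in between them. The resource version check turns that race into a `Conflict` (or `AlreadyExists` on the create path), which a caller can handle by retrying the upsert.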

**Fix Applied:**
Updated `upsert_configmap` in `crates/controlplane/src/shell/executor.rs:216-264` to use the create-or-replace pattern.

### Issue 4: Data Plane Not Responding to HTTP Requests (IN PROGRESS)

**Failure Message:**
```
Request failed, not ready yet: Get "http://10.96.65.60/": context deadline exceeded
```

**Root Cause Analysis:**
After fixing the ConfigMap SSA conflict, the ConfigMaps are now being created successfully and the data plane pods are running. However, HTTP requests to the Gateway Service IP are timing out.

**Current Status:**
- ConfigMaps: ✅ Created successfully (`multiway-config-same-namespace`, etc.)
- Data plane pods: ✅ Running (`multiway-dp-same-namespace-*`)
- HTTP routing: ❌ Requests time out

This is a separate issue from the SSA conflict and requires further investigation into:
1. Data plane configuration loading
2. Envoy/Pingora routing setup
3. Service endpoints and networking
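
One way to narrow down the third item is to separate TCP reachability from HTTP routing: if a plain TCP connection to the Service address also times out, the problem is more likely in endpoints or networking than in route configuration. A minimal probe using only the standard library is sketched below; `tcp_probe` is a hypothetical helper, and the address to probe is whatever the Gateway Service exposes.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Try to open a TCP connection within `timeout` (which must be non-zero).
/// `Ok(())` means something accepted the connection; it says nothing about
/// whether HTTP requests are routed correctly afterwards.
fn tcp_probe(addr: SocketAddr, timeout: Duration) -> Result<(), String> {
    TcpStream::connect_timeout(&addr, timeout)
        .map(|_stream| ())
        .map_err(|err| format!("{addr}: {err}"))
}
```

For example, `tcp_probe("10.96.65.60:80".parse().unwrap(), Duration::from_secs(2))` would test the address from the failure message, assuming it is run from somewhere that can reach the cluster's Service network.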

## Test Status

The status condition and ConfigMap fixes are working:
- Gateway listener ResolvedRefs condition: ✅ PASSING
- HTTPRoute observedGeneration on conditions: ✅ PASSING
- ConfigMap creation/update (SSA conflict fix): ✅ FIXED

The HTTP request test fails due to data plane routing issues:
- Simple HTTP request should reach infra-backend: ❌ FAILING (timeout - data plane not routing traffic)

## Files Modified

1. `crates/controlplane/src/core/reconcile.rs`:
   - Lines 288-334: Added ResolvedRefs and Programmed conditions to listener status
   - Lines 639-646: Pass route generation to build_parent_status
   - Lines 672-713: Updated build_parent_status to accept and use generation parameter
   - Lines 385-405: Updated build_configmap to explicitly set all fields

2. `crates/controlplane/src/shell/executor.rs`:
   - Lines 216-264: Changed `upsert_configmap` from SSA to create-or-replace pattern
   - This fix avoids SSA field manager conflicts that persisted even with `.force()` enabled
@@ -0,0 +1,8 @@
test_name,description,implemented
HTTPRouteSimpleSameNamespace,Basic HTTP routing from a route to a backend service in the same namespace. Foundation of all routing.,in-progress
HTTPRouteMatching,Path and header matching for routing requests to different backends based on request criteria.,false
HTTPRouteExactPathMatching,Exact path matching where /foo matches only /foo and not /foo/bar.,false
HTTPRouteWeight,Traffic distribution across multiple backends based on specified weights for load balancing.,false
GatewayWithAttachedRoutes,Core Gateway-Route attachment model verifying routes attach correctly and status is tracked.,false
HTTPRouteListenerHostnameMatching,Multiple listeners with different hostnames routing to different backends.,false
HTTPRouteListenerPortMatching,HTTP listeners on different ports with port-based routing.,false
@@ -0,0 +1,11 @@
test_name,description,implemented
HTTPRouteHeaderMatching,Header-based routing rules enabling request filtering and routing based on HTTP headers.,false
HTTPRouteMethodMatching,HTTP method-based routing (GET/POST/PUT/DELETE etc) for REST API routing.,false
HTTPRouteQueryParamMatching,Query parameter-based routing for feature flags and conditional routing.,false
HTTPRouteRequestHeaderModifier,Adding/removing/replacing request headers before forwarding to backends.,false
HTTPRouteResponseHeaderModifier,Response header modification for security headers and CORS.,false
HTTPRouteRewritePath,Path rewriting and prefix stripping for backend URL compatibility.,false
HTTPRouteRewriteHost,Host header rewriting for multi-domain backend support.,false
HTTPRouteHostnameIntersection,Hostname matching with wildcard and specific hostname handling and precedence.,false
HTTPRouteMatchingAcrossRoutes,Routing rules across multiple HTTPRoutes on same gateway verifying rule precedence.,false
HTTPRoutePathMatchOrder,Correct matching order for path-based rules ensuring predictable routing behavior.,false
11 changes: 11 additions & 0 deletions .claude/skills/development-loop/test-tiers/tier-3-production.csv
@@ -0,0 +1,11 @@
test_name,description,implemented
HTTPRouteHTTPSListener,HTTPS listener support with TLS certificate management and termination.,false
HTTPRouteRedirectScheme,HTTP to HTTPS redirect for standard security enforcement.,false
HTTPRouteRedirectPath,Path-based redirects for URL migration and restructuring.,false
HTTPRouteRedirectPort,Port-based redirects for traffic management.,false
HTTPRouteRedirectHostAndStatus,Host redirect with HTTP status code control (301/302/etc).,false
HTTPRouteRedirectPortAndScheme,Combined port and scheme redirects in a single rule.,false
HTTPRouteTimeoutRequest,Request timeout handling to prevent hung connections.,false
HTTPRouteTimeoutBackendRequest,Backend connection timeout to handle slow backends.,false
HTTPRouteCrossNamespace,Cross-namespace route attachment with ReferenceGrant for namespace isolation.,false
GatewayWithAttachedRoutesWithPort8080,Gateway with routes on non-standard ports (8080).,false
13 changes: 13 additions & 0 deletions .claude/skills/development-loop/test-tiers/tier-4-advanced.csv
@@ -0,0 +1,13 @@
test_name,description,implemented
HTTPRouteRequestMirror,Request mirroring/shadowing to additional backends for canary testing.,false
HTTPRouteRequestMultipleMirrors,Multiple request mirrors to several backends simultaneously.,false
HTTPRouteRequestPercentageMirror,Percentage-based traffic mirroring for gradual testing rollout.,false
HTTPRouteBackendRequestHeaderModifier,Per-backend header modification separate from route-level modification.,false
HTTPRouteRequestHeaderModifierBackendWeights,Header modification combined with weighted backend distribution.,false
HTTPRouteCORSAllowCredentialsBehavior,CORS credential handling for web applications with authentication.,false
HTTPRouteNamedRule,Named HTTPRoute rules for better observability and metrics reference.,false
HTTPRouteServiceTypes,Support for various Kubernetes service types (headless/manual endpoint slices).,false
GatewayModifyListeners,Dynamic listener modification and status updates at runtime.,false
GatewayHTTPListenerIsolation,Listener isolation ensuring requests don't cross listener boundaries.,false
GatewayStaticAddresses,Gateway static IP address assignment and management.,false
GatewayOptionalAddressValue,Optional gateway address handling when address is not specified.,false