feat(instances): add pause and resume functionality #320
Conversation
Add PAUSED status to InstanceStatus enum and extend both ComputeBackend and InstanceService interfaces with PauseInstance and ResumeInstance methods.
Use ContainerPause and ContainerUnpause Docker API calls to freeze and unfreeze containers. Add ContainerUpdate for ResizeInstance support.
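For context, the underlying SDK calls look roughly like this — a minimal sketch using the Docker Engine SDK's `ContainerPause`/`ContainerUnpause`; the wrapper functions here are illustrative, not the PR's exact adapter code:

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

// pauseContainer freezes every process in the container via the cgroup
// freezer; state stays in memory, so unpausing resumes where it left off.
// Sketch only — the PR's adapter wires these calls differently.
func pauseContainer(ctx context.Context, cli *client.Client, id string) error {
	if err := cli.ContainerPause(ctx, id); err != nil {
		return fmt.Errorf("pause container %s: %w", id, err)
	}
	return nil
}

// resumeContainer thaws a previously paused container.
func resumeContainer(ctx context.Context, cli *client.Client, id string) error {
	if err := cli.ContainerUnpause(ctx, id); err != nil {
		return fmt.Errorf("unpause container %s: %w", id, err)
	}
	return nil
}
```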
Add DomainSuspend and DomainResume to the LibvirtClient interface and implement them in RealLibvirtClient using the virDomainSuspend and virDomainResume libvirt calls.
Add placeholder pause/resume implementations to the Firecracker and Noop backends to satisfy the interface: Firecracker returns nil (not supported); Noop returns nil (success).
Add pause/resume to InstanceService: RBAC authorization with PermissionInstanceUpdate, state-transition validation (RUNNING -> PAUSED, PAUSED -> RUNNING), database status updates, and audit log events for pause/resume actions.
Wire Pause and Resume HTTP handlers with RBAC authorization, swagger documentation, and corresponding handler tests.
Add POST routes for pause and resume actions to the instance group with RBAC authorization middleware.
Add a Bulkhead type with semaphore-based concurrency limiting to prevent cascading failures, and enhance CircuitBreaker with NewCircuitBreakerWithOpts, a StateChangeFunc callback, and improved half-open state handling.
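The option names below come from this commit message; the StateChangeFunc signature and the concrete values are assumptions for illustration:

```go
// Illustrative only — the callback signature and all values are assumed,
// not the PR's exact API. Threshold/ResetTimeout/SuccessRequired are the
// option names referenced later in this review.
cb := platform.NewCircuitBreakerWithOpts(platform.CircuitBreakerOpts{
	Threshold:       5,                // consecutive failures before opening
	ResetTimeout:    30 * time.Second, // cool-down before half-open probes
	SuccessRequired: 2,                // half-open successes needed to close
	StateChangeFunc: func(name string, from, to platform.State) {
		log.Printf("circuit %s: %s -> %s", name, from, to)
	},
})
```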
Wrap ComputeBackend with circuit breaker and bulkhead patterns for resilience, including timeout-aware call protection and comprehensive test coverage for circuit tripping, bulkhead limits, and timeouts.
Update mock implementations to satisfy the updated ComputeBackend interface with pause/resume methods.
Update mock compute backends in worker tests to implement PauseInstance, ResumeInstance, and ResizeInstance to satisfy the updated ports.ComputeBackend interface.
📝 Walkthrough

Adds a PAUSED instance state and a pause/resume lifecycle across the stack: API routes and handlers, service logic with validation/audit/metrics, compute-backend interface extensions and implementations (Docker, Libvirt, Firecracker, noop), resilience/bulkhead plumbing, error definitions, and test/doc updates.
Sequence Diagram

```mermaid
sequenceDiagram
participant Client as Client
participant API as API Router/Handler
participant Svc as Instance Service
participant Backend as Compute Backend
participant DB as Database
participant Audit as Audit Logger
Client->>API: POST /instances/:id/pause
API->>Svc: PauseInstance(ctx, id)
Svc->>Svc: RBAC + Load instance (repo)
Svc->>Backend: PauseInstance(ctx, targetID)
Backend-->>Svc: success / error
alt success
Svc->>DB: Update instance status -> PAUSED
DB-->>Svc: success
Svc->>Audit: Write audit (instance.pause)
Svc-->>API: success
API-->>Client: 200 "instance paused"
else backend conflict
Svc-->>API: error (Conflict)
API-->>Client: 409
else db update fails
Svc->>Backend: ResumeInstance(ctx, targetID) // best-effort rollback
Svc-->>API: error
API-->>Client: 500
    end
```
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Pull request overview
Adds pause/resume lifecycle support for instances across API, services, and compute backends, and introduces a resilience wrapper (circuit breaker + bulkhead + timeouts) for compute operations.
Changes:
- Add `POST /instances/:id/pause` and `POST /instances/:id/resume` endpoints, routed with RBAC `PermissionInstanceUpdate`.
- Implement pause/resume in the service layer and the Docker/Libvirt backends, and extend the ports/status enums.
- Introduce `platform.ResilientCompute`, plus new/updated circuit breaker and bulkhead primitives (with tests); a usage sketch follows below.
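Wiring the wrapper presumably looks something like this — the option names are taken from `ResilientComputeOpts` as referenced later in this review, while the values are purely illustrative:

```go
// Sketch: assumes the repo's internal platform package; values are examples only.
opts := platform.ResilientComputeOpts{
	CallTimeout:     10 * time.Second, // budget for fast calls (pause, resume, ping)
	LongCallTimeout: 2 * time.Minute,  // budget for launch/resize-style calls
	CBThreshold:     5,                // failures before the circuit opens
	CBResetTimeout:  30 * time.Second, // cool-down before half-open probes
	BulkheadMaxConc: 16,               // concurrent backend calls allowed
	BulkheadWait:    5 * time.Second,  // max wait for a bulkhead slot
}
backend = platform.NewResilientCompute(backend, opts)
```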
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 12 comments.
Summary per file:
| File | Description |
|---|---|
| internal/workers/pipeline_worker_test.go | Extends worker test mock backend with new interface methods. |
| internal/workers/healing_worker_test.go | Extends worker test service mock with pause/resume methods. |
| internal/workers/database_failover_worker_test.go | Extends worker test mock backend with pause/resume/resize methods. |
| internal/repositories/noop/adapters.go | Adds no-op pause/resume/resize to satisfy updated compute interface. |
| internal/repositories/libvirt/real_client.go | Adds context-aware DomainSuspend/DomainResume wrappers. |
| internal/repositories/libvirt/mock_client_test.go | Extends libvirt mock client with suspend/resume methods. |
| internal/repositories/libvirt/libvirt_client.go | Extends LibvirtClient interface with suspend/resume. |
| internal/repositories/libvirt/lb_proxy_test.go | Updates libvirt LB proxy test mock to satisfy interface. |
| internal/repositories/libvirt/adapter.go | Implements Pause/Resume/Resize for libvirt backend. |
| internal/repositories/k8s/mocks_test.go | Extends k8s test mock service with pause/resume. |
| internal/repositories/k8s/kubeadm_provisioner_test.go | Extends kubeadm test mock service with pause/resume. |
| internal/repositories/firecracker/adapter_noop.go | Adds unsupported-method stubs for pause/resume/resize. |
| internal/repositories/firecracker/adapter.go | Adds pause/resume/resize methods (currently returning nil). |
| internal/repositories/docker/fakes_test.go | Extends docker fake client for pause/unpause + container update. |
| internal/repositories/docker/adapter.go | Implements Docker pause/resume and container resource update (resize). |
| internal/platform/resilient_compute.go | New compute decorator adding bulkhead/CB/timeouts to backend calls. |
| internal/platform/resilient_compute_test.go | Tests resilient compute passthrough, CB, bulkhead, timeouts, unwrap. |
| internal/platform/circuit_breaker.go | Enhances circuit breaker (opts, state strings, half-open single-flight). |
| internal/platform/bulkhead.go | New bulkhead (semaphore) concurrency limiter with wait timeouts. |
| internal/handlers/instance_handler_test.go | Extends handler test mock service with pause/resume methods. |
| internal/handlers/instance_handler.go | Adds pause/resume HTTP handlers + swagger annotations. |
| internal/core/services/mock_compute_test.go | Extends service mocks to satisfy updated interfaces. |
| internal/core/services/instance.go | Adds PauseInstance/ResumeInstance with RBAC + state validation + audit/metrics. |
| internal/core/ports/instance.go | Extends InstanceService interface with pause/resume. |
| internal/core/ports/compute.go | Extends ComputeBackend with pause/resume and resize. |
| internal/core/domain/instance.go | Adds StatusPaused to instance status enum. |
| internal/api/setup/router.go | Wires pause/resume routes under /instances/:id. |
```go
	Ping(ctx context.Context) error
	// Type returns a string identifier of the backend (e.g., "docker", "kvm").
	Type() string
	// ResizeInstance updates the CPU and memory limits of a running or stopped instance.
```
ResizeInstance's signature uses raw int64s but the interface comment doesn't define units. In this PR, ResilientCompute/Docker expect NanoCPUs + bytes while Libvirt converts cpu/1e9 to vCPU count. Please clarify the contract in the interface doc comment (e.g., cpu in NanoCPUs, memory in bytes) to avoid backend-specific interpretations.
Suggested change:

```diff
 // ResizeInstance updates the CPU and memory limits of a running or stopped instance.
+// cpu is specified in NanoCPUs (1 vCPU = 1e9 NanoCPUs), and memory is specified in bytes.
```
```go
	return nil // Firecracker does not support pause/resume
}

func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return nil // Firecracker does not support pause/resume
```
ResumeInstance returns nil even when Firecracker pause/resume isn't supported, which can lead to incorrect instance status transitions (PAUSED→RUNNING) without any backend change. Return a not-supported error when not in MockMode.
Suggested change:

```diff
-	return nil // Firecracker does not support pause/resume
+	return fmt.Errorf("pause instance %s: firecracker does not support pause/resume", id)
 }

 func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
 	if a.cfg.MockMode {
 		return nil
 	}
-	return nil // Firecracker does not support pause/resume
+	return fmt.Errorf("resume instance %s: firecracker does not support pause/resume", id)
```
| s.logger.Error("failed to pause container", "container_id", target, "error", err) | ||
| return errors.Wrap(errors.Internal, "failed to pause container", err) |
These error/log messages hardcode "container" (and container_id) for pause/resume failures, but the backend can be libvirt where the target is a domain name. Consider using backend-neutral wording ("instance"/"resource") to keep logs accurate across backends.
Suggested change:

```diff
-		s.logger.Error("failed to pause container", "container_id", target, "error", err)
-		return errors.Wrap(errors.Internal, "failed to pause container", err)
+		s.logger.Error("failed to pause instance", "target", target, "error", err)
+		return errors.Wrap(errors.Internal, "failed to pause instance", err)
```
```go
	// Modify memory (in KiB) and vCPU in the XML
	newDOMXML := a.applyDomainResize(domXML, int(memory/1024), int(cpu/1e9))
```
Libvirt vCPU count is derived as int(cpu/1e9). If callers provide fractional CPU (e.g., 0.5 core => 5e8 NanoCPUs), this becomes 0 and produces invalid domain XML. Validate inputs (cpu/memory > 0) and define a rounding strategy (e.g., ceil to at least 1 vCPU) or reject unsupported fractional values with a clear error.
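A minimal sketch of the suggested guard (the helper name is hypothetical):

```go
// nanoCPUsToVCPUs converts NanoCPUs into a whole vCPU count for libvirt,
// rejecting non-positive input and rounding up so fractional requests
// (e.g., 5e8 = 0.5 core) map to at least 1 vCPU instead of 0.
// Hypothetical helper illustrating the review suggestion.
func nanoCPUsToVCPUs(cpuNano int64) (int, error) {
	const nanoPerVCPU = int64(1_000_000_000)
	if cpuNano <= 0 {
		return 0, fmt.Errorf("cpu must be > 0 NanoCPUs, got %d", cpuNano)
	}
	return int((cpuNano + nanoPerVCPU - 1) / nanoPerVCPU), nil // ceil division
}
```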
```go
// applyDomainResize updates vCPU and memory in domain XML.
func (a *LibvirtAdapter) applyDomainResize(xml string, memoryKiB, vcpus int) string {
	// Replace memory allocation
	xml = regexp.MustCompile(`<memory unit="KiB">\d+</memory>`).
		ReplaceAllString(xml, fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB))
	xml = regexp.MustCompile(`<currentMemory unit="KiB">\d+</currentMemory>`).
		ReplaceAllString(xml, fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB))
	// Replace vCPU count
	xml = regexp.MustCompile(`<vcpu[^>]*>\d+</vcpu>`).
		ReplaceAllString(xml, fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus))
```
applyDomainResize recompiles regexps on every call via regexp.MustCompile. Since ResizeInstance can be called frequently, consider precompiling these regexps once (package-level vars) to avoid repeated compilation overhead.
Suggested change:

```diff
+var (
+	domainMemoryXMLRegexp        = regexp.MustCompile(`<memory unit="KiB">\d+</memory>`)
+	domainCurrentMemoryXMLRegexp = regexp.MustCompile(`<currentMemory unit="KiB">\d+</currentMemory>`)
+	domainVCPUXMLRegexp          = regexp.MustCompile(`<vcpu[^>]*>\d+</vcpu>`)
+)
+
 // applyDomainResize updates vCPU and memory in domain XML.
 func (a *LibvirtAdapter) applyDomainResize(xml string, memoryKiB, vcpus int) string {
 	// Replace memory allocation
-	xml = regexp.MustCompile(`<memory unit="KiB">\d+</memory>`).
-		ReplaceAllString(xml, fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB))
-	xml = regexp.MustCompile(`<currentMemory unit="KiB">\d+</currentMemory>`).
-		ReplaceAllString(xml, fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB))
+	xml = domainMemoryXMLRegexp.ReplaceAllString(
+		xml,
+		fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB),
+	)
+	xml = domainCurrentMemoryXMLRegexp.ReplaceAllString(
+		xml,
+		fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB),
+	)
 	// Replace vCPU count
-	xml = regexp.MustCompile(`<vcpu[^>]*>\d+</vcpu>`).
-		ReplaceAllString(xml, fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus))
+	xml = domainVCPUXMLRegexp.ReplaceAllString(
+		xml,
+		fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus),
+	)
```
```go
func (a *DockerAdapter) PauseInstance(ctx context.Context, name string) error {
	if err := a.cli.ContainerPause(ctx, name); err != nil {
		return fmt.Errorf("failed to pause container %s: %w", name, err)
	}
	return nil
}

func (a *DockerAdapter) ResumeInstance(ctx context.Context, name string) error {
	if err := a.cli.ContainerUnpause(ctx, name); err != nil {
		return fmt.Errorf("failed to resume container %s: %w", name, err)
	}
	return nil
}

// ResizeInstance updates the CPU and memory limits of a container.
func (a *DockerAdapter) ResizeInstance(ctx context.Context, id string, cpuNanoCPUs, memoryBytes int64) error {
	resp, err := a.cli.ContainerUpdate(ctx, id, container.UpdateConfig{
		Resources: container.Resources{
			NanoCPUs:   cpuNanoCPUs,
			Memory:     memoryBytes,
			MemorySwap: memoryBytes, // Must be >= Memory; setting equal disables swap while allowing memory update
		},
	})
	if err != nil {
		return fmt.Errorf("failed to update container %s: %w", id, err)
	}
	if resp.Warnings != nil {
		a.logger.Warn("container update warnings", "container_id", id, "warnings", resp.Warnings)
	}
	return nil
```
DockerAdapter gained PauseInstance/ResumeInstance/ResizeInstance but docker/adapter_unit_test.go has no tests for these new methods. Add unit tests that verify the adapter forwards to ContainerPause/ContainerUnpause/ContainerUpdate and wraps errors consistently (similar to existing Start/Stop tests).
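A sketch of what such a test might look like in the repo's table-driven style — the `newTestAdapter` constructor and the `fakeDockerClient` error field are assumptions, not the repo's actual test helpers:

```go
// Assumed helpers: newTestAdapter wraps a fakeDockerClient whose err field
// is returned from ContainerPause/ContainerUnpause.
func TestDockerAdapterPauseResume(t *testing.T) {
	tests := []struct {
		name    string
		fakeErr error
		call    func(ctx context.Context, a *DockerAdapter) error
		wantErr bool
	}{
		{"pause ok", nil, func(ctx context.Context, a *DockerAdapter) error { return a.PauseInstance(ctx, "c1") }, false},
		{"pause wraps error", errors.New("boom"), func(ctx context.Context, a *DockerAdapter) error { return a.PauseInstance(ctx, "c1") }, true},
		{"resume ok", nil, func(ctx context.Context, a *DockerAdapter) error { return a.ResumeInstance(ctx, "c1") }, false},
		{"resume wraps error", errors.New("boom"), func(ctx context.Context, a *DockerAdapter) error { return a.ResumeInstance(ctx, "c1") }, true},
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			a := newTestAdapter(&fakeDockerClient{err: tc.fakeErr}) // hypothetical constructor
			if err := tc.call(context.Background(), a); (err != nil) != tc.wantErr {
				t.Fatalf("got err=%v, wantErr=%v", err, tc.wantErr)
			}
		})
	}
}
```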
```go
	return nil // Firecracker does not support pause/resume
}

func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return nil // Firecracker does not support pause/resume
```
PauseInstance returns nil even when Firecracker pause/resume isn't supported. This will make callers think the instance was paused successfully and may update persisted status/metrics incorrectly. Return a clear error (e.g., NotImplemented/InvalidOperation) when not in MockMode, consistent with adapter_noop.go and other unimplemented Firecracker operations.
Suggested change:

```diff
-	return nil // Firecracker does not support pause/resume
+	return fmt.Errorf("pause instance is not supported by Firecracker")
 }

 func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
 	if a.cfg.MockMode {
 		return nil
 	}
-	return nil // Firecracker does not support pause/resume
+	return fmt.Errorf("resume instance is not supported by Firecracker")
```
```go
	if a.cfg.MockMode {
		return nil
	}
	return nil // Firecracker does not support resize
```
ResizeInstance returns nil even though the comment says Firecracker doesn't support resize. This silently reports success to callers and can desync desired vs actual resources. Return an explicit not-supported error when not in MockMode.
Suggested change:

```diff
-	return nil // Firecracker does not support resize
+	return fmt.Errorf("resize is not supported by Firecracker")
```
```go
	inst.Status = domain.StatusPaused
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.pause", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.pause", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("pause", "success").Inc()
	s.logger.Info("instance paused", "instance_id", inst.ID)
	return nil
```
PauseInstance changes the instance status to PAUSED but does not update InstancesTotal gauges (e.g., decrement running / increment paused). StartInstance/StopInstance maintain these metrics, so pause/resume will currently leave instance counts incorrect in Prometheus.
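The bookkeeping being asked for is roughly this — the gauge name comes from this review, but the label scheme is an assumption:

```go
// Sketch: assumes InstancesTotal is labeled by (backend, status); adjust to
// the gauge's actual labels. Runs after a successful pause.
backendType := s.compute.Type()
platform.InstancesTotal.WithLabelValues(backendType, string(domain.StatusRunning)).Dec()
platform.InstancesTotal.WithLabelValues(backendType, string(domain.StatusPaused)).Inc()
// On resume, mirror it: Dec() the paused gauge and Inc() the running one.
```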
```go
	inst.Status = domain.StatusRunning
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.resume", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.resume", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("resume", "success").Inc()
	s.logger.Info("instance resumed", "instance_id", inst.ID)
	return nil
```
ResumeInstance sets status back to RUNNING but does not update InstancesTotal gauges for a PAUSED→RUNNING transition. If you introduce the PAUSED status label, make sure the gauges are kept consistent here as well.
…ause/resume tests
- Move regex compilation in applyDomainResize to struct fields to avoid recompilation on every call (Must Fix from PR #320 review)
- Add an ErrNotSupported sentinel error in the Firecracker adapter and return it from the PauseInstance/ResumeInstance/ResizeInstance stubs (Consider item)
- Add unit tests for pause/resume state transitions in the instance service covering success paths, wrong-state errors, and compute backend errors
Pull request overview
Copilot reviewed 28 out of 28 changed files in this pull request and generated 8 comments.
```go
	// Type returns a string identifier of the backend (e.g., "docker", "kvm").
	Type() string
	// ResizeInstance updates the CPU and memory limits of a running or stopped instance.
	ResizeInstance(ctx context.Context, id string, cpu, memory int64) error
```
ResizeInstance's cpu/memory parameters are undocumented in the interface, but implementations treat them as nano-CPUs and bytes (e.g., Docker uses NanoCPUs/Memory; libvirt divides cpu by 1e9 and memory by 1024). Please document the required units in the ComputeBackend interface comment (and consider renaming params to cpuNano/memoryBytes for clarity) to prevent callers from passing vCPU counts or MB and getting invalid 0 vCPU / incorrect memory updates.
Suggested change:

```diff
-	ResizeInstance(ctx context.Context, id string, cpu, memory int64) error
+	// cpuNano must be specified in nano-CPUs (for example, 1 vCPU = 1_000_000_000).
+	// memoryBytes must be specified in bytes (for example, 512 MiB = 536_870_912).
+	// Callers must not pass vCPU counts or MB/MiB values directly.
+	ResizeInstance(ctx context.Context, id string, cpuNano, memoryBytes int64) error
```
```go
	inst.Status = domain.StatusPaused
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.pause", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.pause", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("pause", "success").Inc()
	s.logger.Info("instance paused", "instance_id", inst.ID)
	return nil
}

// ResumeInstance resumes a paused instance back to running state.
func (s *InstanceService) ResumeInstance(ctx context.Context, idOrName string) error {
	userID := appcontext.UserIDFromContext(ctx)
	tenantID := appcontext.TenantIDFromContext(ctx)

	if err := s.rbacSvc.Authorize(ctx, userID, tenantID, domain.PermissionInstanceUpdate, idOrName); err != nil {
		return err
	}

	inst, err := s.GetInstance(ctx, idOrName)
	if err != nil {
		return err
	}

	if inst.Status != domain.StatusPaused {
		return errors.New(errors.Conflict, "instance must be PAUSED to resume, got: "+string(inst.Status))
	}

	target := inst.ContainerID
	if target == "" {
		target = s.formatContainerName(inst.ID)
	}

	if err := s.compute.ResumeInstance(ctx, target); err != nil {
		platform.InstanceOperationsTotal.WithLabelValues("resume", "failure").Inc()
		s.logger.Error("failed to resume container", "container_id", target, "error", err)
		return errors.Wrap(errors.Internal, "failed to resume container", err)
	}

	inst.Status = domain.StatusRunning
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.resume", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.resume", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("resume", "success").Inc()
	s.logger.Info("instance resumed", "instance_id", inst.ID)
```
PauseInstance/ResumeInstance update the instance status in the DB but don't update the InstancesTotal gauge (unlike StartInstance/StopInstance). This will leave Prometheus instance counts inconsistent once instances move into/out of PAUSED. Consider decrementing running/incrementing paused on pause, and decrementing paused/incrementing running on resume (using the same backend label).
```go
	if err != nil {
		return fmt.Errorf("failed to update container %s: %w", id, err)
	}
	if resp.Warnings != nil {
```
In ResizeInstance, resp.Warnings is a slice; it can be non-nil but empty. Logging a warning when len(resp.Warnings)==0 can create noisy logs. Consider checking len(resp.Warnings) > 0 instead of resp.Warnings != nil.
Suggested change:

```diff
-	if resp.Warnings != nil {
+	if len(resp.Warnings) > 0 {
```
```go
func (r *ResilientCompute) GetInstanceLogs(ctx context.Context, id string) (io.ReadCloser, error) {
	var rc io.ReadCloser
	err := r.callProtected(ctx, r.opts.CallTimeout, func(ctx context.Context) error {
		var e error
		rc, e = r.inner.GetInstanceLogs(ctx, id)
		return e
	})
	return rc, err
}

func (r *ResilientCompute) GetInstanceStats(ctx context.Context, id string) (io.ReadCloser, error) {
	var rc io.ReadCloser
	err := r.callProtected(ctx, r.opts.CallTimeout, func(ctx context.Context) error {
		var e error
		rc, e = r.inner.GetInstanceStats(ctx, id)
		return e
	})
	return rc, err
}
```
ResilientCompute's timeout context is canceled before the caller can consume the returned stream. For GetInstanceLogs/GetInstanceStats, callProtected creates ctx2 and defers cancel(), so ctx2 is canceled as soon as the backend returns the ReadCloser, which can prematurely terminate Docker/libvirt log/stats streams. Consider either (1) not using callProtected for streaming methods, or (2) returning a wrapped ReadCloser whose Close() triggers cancel, so the context stays alive while the stream is read.
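Option (2) could be a small wrapper along these lines (a sketch, not the repo's code):

```go
// cancelOnCloseReader keeps the per-call timeout context alive until the
// caller finishes consuming the stream, releasing it only on Close.
type cancelOnCloseReader struct {
	io.ReadCloser
	cancel context.CancelFunc
}

func (c *cancelOnCloseReader) Close() error {
	defer c.cancel() // release the context only after the stream is done
	return c.ReadCloser.Close()
}
```

The streaming methods would then hand their `cancel` to the wrapper instead of deferring it inside `callProtected`.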
```go
// callProtected runs fn through bulkhead → circuit breaker → timeout.
func (r *ResilientCompute) callProtected(ctx context.Context, timeout time.Duration, fn func(ctx context.Context) error) error {
	return r.bulkhead.Execute(ctx, func() error {
		return r.cb.Execute(func() error {
			ctx2, cancel := context.WithTimeout(ctx, timeout)
			defer cancel()
```
callProtected applies the per-call timeout only inside the circuit-breaker closure, after a bulkhead slot is acquired. This means total request time can exceed CallTimeout by up to BulkheadWait (or indefinitely if BulkheadWait=0 and ctx has no deadline). If the intent is a true end-to-end per-call timeout, create the timeout context before bulkhead acquisition and use that context throughout (bulkhead + circuit breaker + backend call).
Suggested change:

```diff
-// callProtected runs fn through bulkhead → circuit breaker → timeout.
+// callProtected runs fn through timeout → bulkhead → circuit breaker so the
+// per-call timeout applies to the full operation, including bulkhead wait time.
 func (r *ResilientCompute) callProtected(ctx context.Context, timeout time.Duration, fn func(ctx context.Context) error) error {
-	return r.bulkhead.Execute(ctx, func() error {
-		return r.cb.Execute(func() error {
-			ctx2, cancel := context.WithTimeout(ctx, timeout)
-			defer cancel()
+	ctx2, cancel := context.WithTimeout(ctx, timeout)
+	defer cancel()
+	return r.bulkhead.Execute(ctx2, func() error {
+		return r.cb.Execute(func() error {
```
```go
func (b *Bulkhead) acquire(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ErrBulkheadFull
	default:
	}

	if b.timeout > 0 {
		timer := time.NewTimer(b.timeout)
		defer timer.Stop()
		select {
		case b.sem <- struct{}{}:
			return nil
		case <-timer.C:
			return ErrBulkheadFull
		case <-ctx.Done():
			return ErrBulkheadFull
		}
	}
	// No explicit timeout — rely on context.
	select {
	case b.sem <- struct{}{}:
		return nil
	case <-ctx.Done():
		return ErrBulkheadFull
	}
}
```
Bulkhead.acquire returns ErrBulkheadFull for all ctx.Done() cases (including caller cancellation/deadline exceeded). This loses the original cancellation reason and can misclassify failures as bulkhead saturation. Consider returning ctx.Err() when the context is already done, and reserve ErrBulkheadFull for actual semaphore wait timeouts (timer.C) or when ctx expires specifically while waiting for a slot (wrapping ctx.Err() if you still want a sentinel).
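Concretely, the acquire path could separate the cases like this (sketch; the multi-`%w` wrapping requires Go 1.20+):

```go
// Preserve the caller's cancellation reason; reserve ErrBulkheadFull for
// genuine saturation while waiting on the semaphore.
select {
case <-ctx.Done():
	return ctx.Err() // caller cancellation/deadline, not bulkhead saturation
default:
}

select {
case b.sem <- struct{}{}:
	return nil
case <-timer.C:
	return ErrBulkheadFull // semaphore wait timed out
case <-ctx.Done():
	return fmt.Errorf("%w: %w", ErrBulkheadFull, ctx.Err()) // expired while waiting for a slot
}
```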
```go
		reMemory:        regexp.MustCompile(`<memory unit="KiB">\d+</memory>`),
		reCurrentMemory: regexp.MustCompile(`<currentMemory unit="KiB">\d+</currentMemory>`),
```
The precompiled memory/currentMemory regexes only match double-quoted attributes (unit="KiB"), but the domain XML generated in templates.go uses single quotes (unit='KiB') and does not include a `<currentMemory>` tag. As a result, ResizeInstance will not update memory at all for domains created by this adapter. Update the regexes to handle the actual template output (both quote styles) or, preferably, parse/modify the XML structurally instead of using regex substitution.
Suggested change:

```diff
-		reMemory:        regexp.MustCompile(`<memory unit="KiB">\d+</memory>`),
-		reCurrentMemory: regexp.MustCompile(`<currentMemory unit="KiB">\d+</currentMemory>`),
+		reMemory:        regexp.MustCompile(`<memory unit=['"]KiB['"]>\d+</memory>`),
+		reCurrentMemory: regexp.MustCompile(`<currentMemory unit=['"]KiB['"]>\d+</currentMemory>`),
```
```go
// applyDomainResize updates vCPU and memory in domain XML.
func (a *LibvirtAdapter) applyDomainResize(xml string, memoryKiB, vcpus int) string {
	// Replace memory allocation
	xml = a.reMemory.ReplaceAllString(xml, fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB))
	xml = a.reCurrentMemory.ReplaceAllString(xml, fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB))
	// Replace vCPU count
	xml = a.reVCPU.ReplaceAllString(xml, fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus))
```
applyDomainResize replaces the entire <vcpu ...>... element with a bare %d, which drops existing attributes like placement='static' from the template and may change libvirt behavior. Consider preserving attributes via capturing groups (e.g., keep the opening tag) or using an XML parser to update only the numeric text content.
Suggested change:

```diff
+// replaceVCPUCountPreservingAttributes updates the vCPU count while preserving any existing
+// attributes on the <vcpu> element, such as placement="static".
+func (a *LibvirtAdapter) replaceVCPUCountPreservingAttributes(xml string, vcpus int) string {
+	reVCPUContent := regexp.MustCompile(`(?s)^(<vcpu\b[^>]*>)[^<]*(</vcpu>)$`)
+	return a.reVCPU.ReplaceAllStringFunc(xml, func(match string) string {
+		parts := reVCPUContent.FindStringSubmatch(match)
+		if len(parts) != 3 {
+			return match
+		}
+		return fmt.Sprintf(`%s%d%s`, parts[1], vcpus, parts[2])
+	})
+}
+
 // applyDomainResize updates vCPU and memory in domain XML.
 func (a *LibvirtAdapter) applyDomainResize(xml string, memoryKiB, vcpus int) string {
 	// Replace memory allocation
 	xml = a.reMemory.ReplaceAllString(xml, fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB))
 	xml = a.reCurrentMemory.ReplaceAllString(xml, fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB))
-	// Replace vCPU count
-	xml = a.reVCPU.ReplaceAllString(xml, fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus))
+	// Replace vCPU count while preserving existing <vcpu> attributes.
+	xml = a.replaceVCPUCountPreservingAttributes(xml, vcpus)
```
Actionable comments posted: 13
🧹 Nitpick comments (4)
internal/platform/circuit_breaker.go (1)
84-92: Replace hard-coded default values with named constants

Lines 85, 88, and 91 hard-code breaker defaults. Promote them to constants to keep defaults centralized and avoid scattered literals.
Proposed refactor
```diff
+const (
+	defaultFailureThreshold = 5
+	defaultResetTimeout     = 30 * time.Second
+	defaultSuccessRequired  = 1
+)
+
 func NewCircuitBreakerWithOpts(opts CircuitBreakerOpts) *CircuitBreaker {
 	if opts.Threshold <= 0 {
-		opts.Threshold = 5
+		opts.Threshold = defaultFailureThreshold
 	}
 	if opts.ResetTimeout <= 0 {
-		opts.ResetTimeout = 30 * time.Second
+		opts.ResetTimeout = defaultResetTimeout
 	}
 	if opts.SuccessRequired <= 0 {
-		opts.SuccessRequired = 1
+		opts.SuccessRequired = defaultSuccessRequired
 	}
```

As per coding guidelines, "**/*.go: Do not use magic numbers - use named constants instead".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/platform/circuit_breaker.go` around lines 84-92: replace the hard-coded magic numbers in the circuit breaker defaults with named constants: declare DefaultThreshold (int = 5), DefaultResetTimeout (time.Duration = 30 * time.Second), and DefaultSuccessRequired (int = 1) near the top of the circuit breaker implementation, then use these constants in the initialization block that currently checks opts.Threshold, opts.ResetTimeout, and opts.SuccessRequired (replace the literals 5, 30 * time.Second, and 1 with the corresponding constants) so defaults are centralized and self-documented.

internal/repositories/docker/fakes_test.go (1)
98-104: Track pause/unpause calls in the fake for better test assertions
`ContainerPause` and `ContainerUnpause` currently return success without recording calls, unlike other fake methods. Incrementing counters here improves verification of pause/resume flows.

Proposed refactor
```diff
 func (f *fakeDockerClient) ContainerPause(ctx context.Context, containerID string) error {
+	f.inc("ContainerPause")
 	return nil
 }

 func (f *fakeDockerClient) ContainerUnpause(ctx context.Context, containerID string) error {
+	f.inc("ContainerUnpause")
 	return nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/repositories/docker/fakes_test.go` around lines 98-104: the fake Docker client methods ContainerPause and ContainerUnpause don't record calls; update the fakeDockerClient to track these invocations (e.g., add pauseCount/unpauseCount or record entries in an operations slice, protected with the same mutex used by other fake methods) and increment the appropriate counter/append an entry inside ContainerPause and ContainerUnpause so tests can assert pause/unpause behavior; follow the existing fake's concurrency pattern and naming conventions used by other methods in fakeDockerClient.

internal/core/services/instance_unit_test.go (1)
981-1096: Consider table-driving the pause/resume cases.

The new suite repeats the same setup and assertion pattern several times. A table would cut the duplication and make it easier to add more pause/resume permutations later.
As per coding guidelines, use table-driven tests in test files.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/instance_unit_test.go` around lines 981-1096: the tests for PauseInstance_* and ResumeInstance_* duplicate setup/expectation logic; convert them into a single table-driven loop that defines cases (name, method: "PauseInstance" or "ResumeInstance", initial Status, compute error or nil, expected final Status or error substring, whether Update and Audit should be asserted). In the loop, call svc.PauseInstance or svc.ResumeInstance based on the case, set up common mocks once per case (repo.GetByName Maybe, repo.GetByID -> inst, rbacSvc.Authorize for PermissionInstanceUpdate and PermissionInstanceRead), conditionally set compute.On("PauseInstance"/"ResumeInstance", mock.Anything, "cid-1").Return(...) and compute.On("Type").Return("docker").Maybe(), and conditionally assert repo.On("Update", ...) and auditSvc.On("Log", ...).Return(...). Use mock.MatchedBy for status checks and assert error/no-error and inst.Status per case; this collapses the repeated logic while keeping behavior for PauseInstance and ResumeInstance.

internal/platform/resilient_compute.go (1)
33-52: Extract the default limits into named constants.

The constructor hard-codes several fallback durations and thresholds, which makes the defaults harder to audit. As per coding guidelines, do not use magic numbers - use named constants instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/platform/resilient_compute.go` around lines 33 - 52, Extract the hard-coded defaults in ResilientComputeOpts.withDefaults into named package-level constants (e.g. defaultCallTimeout, defaultLongCallTimeout, defaultCBThreshold, defaultCBResetTimeout, defaultBulkheadMaxConc, defaultBulkheadWait) declared near the top of the file; then replace the literal values used when setting o.CallTimeout, o.LongCallTimeout, o.CBThreshold, o.CBResetTimeout, o.BulkheadMaxConc and o.BulkheadWait with those constants to remove magic numbers and make the defaults easy to audit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/core/services/instance_unit_test.go`:
- Line 303: The test expectations that call repo.On("GetByName", mock.Anything,
instanceID.String()).Return(...).Maybe() should be required instead of optional;
locate the GetByName expectations in instance_unit_test.go (the repo.On calls
that use "GetByName" with instanceID.String()) and remove the .Maybe() so the
mock requires that GetByName is invoked; apply the same change for each
occurrence noted (lines corresponding to the other GetByName expectations) to
ensure the name-first resolution path exercised by InstanceService is enforced
by the tests.
In `@internal/core/services/instance.go`:
- Around line 644-653: When s.compute.PauseInstance (and symmetrically
s.compute.ResumeInstance) succeeds but s.repo.Update fails, add a best-effort
compensating rollback before returning to avoid state drift: after setting
inst.Status and detecting s.repo.Update error inside the methods that call
s.compute.PauseInstance/ResumeInstance (e.g., the current Pause flow and the
Resume flow), call the opposite compute operation (call s.compute.ResumeInstance
on Pause failure path, call s.compute.PauseInstance on Resume failure path) and
log both the repo update error and any rollback error (using s.logger and
platform.InstanceOperationsTotal metrics); ensure the original repo update error
is still returned but include rollback outcome in logs so the system records
both attempts.
In `@internal/platform/bulkhead.go`:
- Around line 22-38: The Bulkhead default WaitTimeout from BulkheadOpts isn't
applied in NewBulkhead, so callers get a zero timeout and acquire may block
until context cancel; modify NewBulkhead to set opts.WaitTimeout to
5*time.Second when it's zero (e.g., if opts.WaitTimeout == 0 { opts.WaitTimeout
= 5 * time.Second }) and assign that value to the Bulkhead.timeout field so
functions like acquire honor the documented 5s default.
In `@internal/platform/circuit_breaker.go`:
- Around line 195-206: The recordSuccess logic currently treats any non-HalfOpen
state the same and resets failureCount/state, which lets a late success close an
already-OPEN breaker; modify the switch in recordSuccess so only an actual
StateClosed path clears failureCount and sets state (i.e., add an explicit case
StateClosed that resets failureCount/state) and do nothing when cb.state ==
StateOpen so stale successes cannot transition an open breaker to closed; keep
the HalfOpen handling (successCount, successRequired and
transitionLocked(StateClosed)) as-is and reference cb.successCount,
cb.failureCount, cb.successRequired, cb.transitionLocked, and the state
constants StateHalfOpen/StateClosed/StateOpen.
In `@internal/platform/resilient_compute_test.go`:
- Around line 128-330: Multiple tests in resilient_compute_test.go repeat the
same setup and assertions; refactor them into table-driven subtests: consolidate
scenarios like TestResilientComputePassthrough,
TestResilientComputeCircuitTrips, TestResilientComputeBulkheadLimits,
TestResilientComputeTimeout, TestResilientComputePingBypassesBulkhead and the
ResizeInstance subtests into one or more table-driven tests where each case
defines a name, mock (mockCompute with delay/err), ResilientComputeOpts, the
invoked method(s) (e.g., StartInstance, Ping, ResizeInstance,
LaunchInstanceWithOptions, Exec), and expected outcomes (error type like
ErrCircuitOpen/ErrBulkheadFull or expected return values/call counts); iterate
cases using t.Run and reuse NewResilientCompute and common ctx/setup code,
asserting results per-case so duplication is removed while keeping checks for
callCount, timing expectations, and bypass behavior.
- Around line 19-125: Replace the hand-rolled mockCompute with a testify-based
mock by defining a MockCompute struct that embeds testify/mock.Mock and
implementing each interface method to call m.Called(...) and return the expected
typed results (implement LaunchInstanceWithOptions, StartInstance, StopInstance,
DeleteInstance, GetInstanceLogs, GetInstanceStats, GetInstancePort,
GetInstanceIP, GetConsoleURL, Exec, RunTask, WaitTask, CreateNetwork,
DeleteNetwork, AttachVolume, DetachVolume, Ping, ResizeInstance, PauseInstance,
ResumeInstance, and Type). Remove atomic.Int64 callCount and delay fields from
the mock; if tests need call counting or simulated delay use testify's
mock.AssertNumberOfCalls and the Run(func(args mock.Arguments){...}) hook on
specific method expectations to increment counters or sleep. Ensure each method
converts mock.Called return values to the correct concrete types (string,
[]string, int, io.ReadCloser, error, etc.) to match the original signatures.
In `@internal/platform/resilient_compute.go`:
- Around line 152-156: The ResizeInstance method currently uses r.callProtected
with r.opts.CallTimeout which is too short for destructive/long-running
operations; change the call to use r.opts.LongCallTimeout (or a dedicated resize
timeout field) so ResilientCompute.ResizeInstance calls r.callProtected(ctx,
r.opts.LongCallTimeout, ...) instead of r.opts.CallTimeout to avoid backend
cancellations during VM resize transitions.
In `@internal/repositories/firecracker/adapter.go`:
- Around line 154-173: The MockMode branches in PauseInstance, ResumeInstance
and ResizeInstance currently return nil and silently report success; change them
to return ErrNotSupported (or a specific ErrMockModeNotSupported if you prefer)
when a.cfg.MockMode is true so callers get a non-nil error for unsupported
operations; update the functions PauseInstance, ResumeInstance and
ResizeInstance to return the error rather than nil and ensure ErrNotSupported is
the error used so behavior is consistent with non-mock paths.
- Around line 27-29: Remove the package-level ErrNotSupported sentinel from
adapter.go and stop exporting a global error; instead return a locally created
error where needed (e.g., use fmt.Errorf or errors.New inside the
function/method that currently returns ErrNotSupported) or define a
function-scoped or method-scoped typed error if callers need type assertions;
update all references to ErrNotSupported to use the new local error creation
(search for ErrNotSupported usages) so no package-level global var remains
alongside idRegex.
In `@internal/repositories/libvirt/adapter.go`:
- Around line 221-243: The code currently undefines the live domain
(DomainUndefine) before ensuring the new XML can be successfully defined,
risking the VM being left offline; instead, validate and create the replacement
definition first by calling DomainDefineXML with a temporary/unique name/UUID
(generate a temp name or modify the <name> or <uuid> in newDOMXML returned from
applyDomainResize) to ensure the XML parses and a definition can be created,
then if that succeeds, undefine the original domain (DomainUndefine on dom) and
redefine the final newDOMXML (or rename the temp definition to the original
name/uuid as needed), and only then start the domain (DomainCreate) if
wasRunning; reference functions/methods: DomainGetXMLDesc, applyDomainResize,
DomainDefineXML, DomainUndefine, DomainCreate.
- Around line 249-257: The applyDomainResize function currently mutates domain
XML with regex (see applyDomainResize and the regexes reMemory, reCurrentMemory,
reVCPU), losing vCPU attributes (placement/cpuset) and mishandling memory units;
replace the regex approach by parsing the XML into an element tree (e.g., using
Go's encoding/xml or an XML DOM library), locate the <vcpu> element and update
only its inner text while preserving all attributes, locate <memory> and
<currentMemory> elements, detect/convert their existing unit attribute to the
desired KiB value (or set the unit attribute to "KiB" after converting) instead
of hardcoding unit="KiB", and then marshal the modified XML back to string;
update applyDomainResize to use these parsed-element updates and remove reliance
on reMemory/reCurrentMemory/reVCPU regex replacements.
In `@internal/workers/pipeline_worker_test.go`:
- Around line 161-169: The three mock methods on mockComputeBackendExtended
(PauseInstance, ResumeInstance, ResizeInstance) currently return nil and
silently succeed; update each to record the call via m.Called(...) (e.g., for
ResizeInstance pass ctx, id, cpu, memory) and return the mocked error/result
from the call (use args.Error(0) / args.Get(0) as appropriate) so tests can
assert invocations and errors instead of masking unexpected calls.
---
Nitpick comments:
In `@internal/core/services/instance_unit_test.go`:
- Around line 981-1096: The tests for PauseInstance_* and ResumeInstance_*
duplicate setup/expectation logic; convert them into a single table-driven loop
that defines cases (name, method: "PauseInstance" or "ResumeInstance", initial
Status, compute error or nil, expected final Status or error substring, whether
Update and Audit should be asserted). In the loop call svc.PauseInstance or
svc.ResumeInstance based on the case, set up common mocks once per case
(repo.GetByName Maybe, repo.GetByID -> inst, rbacSvc.Authorize for
PermissionInstanceUpdate and PermissionInstanceRead), conditionally set
compute.On("PauseInstance"/"ResumeInstance", mock.Anything, "cid-1").Return(...)
and compute.On("Type").Return("docker").Maybe(), and conditionally assert
repo.On("Update", ...) and auditSvc.On("Log", ...).Return(...).Use
mock.MatchedBy for status checks and assert error/no-error and inst.Status per
case; this collapses the repeated logic while keeping behavior for PauseInstance
and ResumeInstance.
In `@internal/platform/circuit_breaker.go`:
- Around line 84-92: Replace the hard-coded magic numbers in the circuit breaker
defaults with named constants: declare DefaultThreshold (int = 5),
DefaultResetTimeout (time.Duration = 30 * time.Second) and
DefaultSuccessRequired (int = 1) near the top of the circuit breaker
implementation, then use these constants in the initialization block that
currently checks opts.Threshold, opts.ResetTimeout and opts.SuccessRequired
(replace the literals 5, 30 * time.Second and 1 with the corresponding
constants) so defaults are centralized and self-documented.
In `@internal/platform/resilient_compute.go`:
- Around line 33-52: Extract the hard-coded defaults in
ResilientComputeOpts.withDefaults into named package-level constants (e.g.
defaultCallTimeout, defaultLongCallTimeout, defaultCBThreshold,
defaultCBResetTimeout, defaultBulkheadMaxConc, defaultBulkheadWait) declared
near the top of the file; then replace the literal values used when setting
o.CallTimeout, o.LongCallTimeout, o.CBThreshold, o.CBResetTimeout,
o.BulkheadMaxConc and o.BulkheadWait with those constants to remove magic
numbers and make the defaults easy to audit.
In `@internal/repositories/docker/fakes_test.go`:
- Around line 98-104: The fake Docker client methods ContainerPause and
ContainerUnpause don't record calls; update the fakeDockerClient to track these
invocations (e.g., add pauseCount/unpauseCount or record entries in an
operations slice and protect with the same mutex used by other fake methods) and
increment the appropriate counter/append an entry inside ContainerPause and
ContainerUnpause so tests can assert pause/unpause behavior; ensure you follow
the existing fake's concurrency pattern and naming conventions used by other
methods in fakeDockerClient.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b1fd7891-edeb-490e-ac8b-384c0d8e1d6d
📒 Files selected for processing (28)
- internal/api/setup/router.go
- internal/core/domain/instance.go
- internal/core/ports/compute.go
- internal/core/ports/instance.go
- internal/core/services/instance.go
- internal/core/services/instance_unit_test.go
- internal/core/services/mock_compute_test.go
- internal/handlers/instance_handler.go
- internal/handlers/instance_handler_test.go
- internal/platform/bulkhead.go
- internal/platform/circuit_breaker.go
- internal/platform/resilient_compute.go
- internal/platform/resilient_compute_test.go
- internal/repositories/docker/adapter.go
- internal/repositories/docker/fakes_test.go
- internal/repositories/firecracker/adapter.go
- internal/repositories/firecracker/adapter_noop.go
- internal/repositories/k8s/kubeadm_provisioner_test.go
- internal/repositories/k8s/mocks_test.go
- internal/repositories/libvirt/adapter.go
- internal/repositories/libvirt/lb_proxy_test.go
- internal/repositories/libvirt/libvirt_client.go
- internal/repositories/libvirt/mock_client_test.go
- internal/repositories/libvirt/real_client.go
- internal/repositories/noop/adapters.go
- internal/workers/database_failover_worker_test.go
- internal/workers/healing_worker_test.go
- internal/workers/pipeline_worker_test.go
```go
	// StatusPaused indicates the instance is paused (frozen CPU, retained memory/network).
	StatusPaused InstanceStatus = "PAUSED"
```
StatusPaused introduction is not reflected in existing status switch paths
Adding PAUSED at line 37 needs corresponding handling in the current status-based switches; the provided snippets show gaps in:
- `internal/core/services/instance.go:868-879` (termination metrics)
- `internal/core/services/dashboard.go:55-66` (running/stopped summary)
- `internal/autoscaler/server.go:248-260` (proto state mapping)
Without those updates, paused instances are silently miscounted or unmapped.
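For example, a running/stopped summary switch would need a new arm along these lines (a sketch — the `summary` fields are assumptions; the real switch bodies live in the files listed above):

```go
switch inst.Status {
case domain.StatusRunning:
	summary.Running++
case domain.StatusStopped:
	summary.Stopped++
case domain.StatusPaused: // new arm so paused instances aren't silently miscounted
	summary.Paused++
}
```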
```go
	switch cb.state {
	case StateHalfOpen:
		cb.successCount++
		if cb.successCount >= cb.successRequired {
			cb.failureCount = 0
			cb.successCount = 0
			cbFunc, name, from, to, changed = cb.transitionLocked(StateClosed)
		}
	default:
		cb.failureCount = 0
		cb.state = StateClosed
	}
```
Do not close an OPEN breaker from stale successes
At Line 205, recordSuccess can set StateOpen -> StateClosed in the default branch. Under concurrency, a request admitted while closed can finish successfully after another request has already opened the breaker, and this code will close it prematurely.
Proposed fix

```diff
 func (cb *CircuitBreaker) recordSuccess() {
 @@
 	switch cb.state {
 	case StateHalfOpen:
 		cb.successCount++
 		if cb.successCount >= cb.successRequired {
 			cb.failureCount = 0
 			cb.successCount = 0
 			cbFunc, name, from, to, changed = cb.transitionLocked(StateClosed)
 		}
-	default:
+	case StateClosed:
 		cb.failureCount = 0
-		cb.state = StateClosed
+	case StateOpen:
+		// Ignore stale successes from requests admitted before breaker opened.
 	}
```

📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 	switch cb.state {
 	case StateHalfOpen:
 		cb.successCount++
 		if cb.successCount >= cb.successRequired {
 			cb.failureCount = 0
 			cb.successCount = 0
 			cbFunc, name, from, to, changed = cb.transitionLocked(StateClosed)
 		}
-	default:
+	case StateClosed:
 		cb.failureCount = 0
-		cb.state = StateClosed
+	case StateOpen:
+		// Ignore stale successes from requests admitted before breaker opened.
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/platform/circuit_breaker.go` around lines 195 - 206, The
recordSuccess logic currently treats any non-HalfOpen state the same and resets
failureCount/state, which lets a late success close an already-OPEN breaker;
modify the switch in recordSuccess so only an actual StateClosed path clears
failureCount and sets state (i.e., add an explicit case StateClosed that resets
failureCount/state) and do nothing when cb.state == StateOpen so stale successes
cannot transition an open breaker to closed; keep the HalfOpen handling
(successCount, successRequired and transitionLocked(StateClosed)) as-is and
reference cb.successCount, cb.failureCount, cb.successRequired,
cb.transitionLocked, and the state constants
StateHalfOpen/StateClosed/StateOpen.
```go
func (a *FirecrackerAdapter) PauseInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return ErrNotSupported
}

func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return ErrNotSupported
}

func (a *FirecrackerAdapter) ResizeInstance(ctx context.Context, id string, cpu, memory int64) error {
	if a.cfg.MockMode {
		return nil
	}
	return ErrNotSupported
}
```
MockMode should not silently return success for unsupported operations.
Returning nil for pause/resume/resize in MockMode can falsely mark instances as paused/resumed/resized even though no backend action exists.
♻️ Proposed fix

```diff
 func (a *FirecrackerAdapter) PauseInstance(ctx context.Context, id string) error {
 	if a.cfg.MockMode {
-		return nil
+		return errNotSupported()
 	}
-	return ErrNotSupported
+	return errNotSupported()
 }

 func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
 	if a.cfg.MockMode {
-		return nil
+		return errNotSupported()
 	}
-	return ErrNotSupported
+	return errNotSupported()
 }

 func (a *FirecrackerAdapter) ResizeInstance(ctx context.Context, id string, cpu, memory int64) error {
 	if a.cfg.MockMode {
-		return nil
+		return errNotSupported()
 	}
-	return ErrNotSupported
+	return errNotSupported()
 }
```

As per coding guidelines, **/*.go: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/repositories/firecracker/adapter.go` around lines 154 - 173, The
MockMode branches in PauseInstance, ResumeInstance and ResizeInstance currently
return nil and silently report success; change them to return ErrNotSupported
(or a specific ErrMockModeNotSupported if you prefer) when a.cfg.MockMode is
true so callers get a non-nil error for unsupported operations; update the
functions PauseInstance, ResumeInstance and ResizeInstance to return the error
rather than nil and ensure ErrNotSupported is the error used so behavior is
consistent with non-mock paths.
```go
// applyDomainResize updates vCPU and memory in domain XML.
func (a *LibvirtAdapter) applyDomainResize(xml string, memoryKiB, vcpus int) string {
	// Replace memory allocation
	xml = a.reMemory.ReplaceAllString(xml, fmt.Sprintf(`<memory unit="KiB">%d</memory>`, memoryKiB))
	xml = a.reCurrentMemory.ReplaceAllString(xml, fmt.Sprintf(`<currentMemory unit="KiB">%d</currentMemory>`, memoryKiB))
	// Replace vCPU count
	xml = a.reVCPU.ReplaceAllString(xml, fmt.Sprintf(`<vcpu>%d</vcpu>`, vcpus))
	return xml
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
cat -n internal/repositories/libvirt/adapter.go | head -300 | tail -80
```

Repository: poyrazK/thecloud
Length of output: 3345
🏁 Script executed:
rg "reMemory|reCurrentMemory|reVCPU" internal/repositories/libvirt/adapter.go -A 1 -B 1Repository: poyrazK/thecloud
Length of output: 885
🌐 Web query:
libvirt XML memory unit variants KiB B MiB specification
💡 Result:
In libvirt domain XML, memory-related elements (such as `<memory>`, `<currentMemory>`, `<maxMemory>`, and sizes in memory devices) support an optional `unit` attribute that specifies the unit for the numeric value. The default unit is "KiB" (kibibytes, 2^10 bytes). Valid unit variants include:

- "b" or "bytes": bytes (1 byte)
- "KB": kilobytes (10^3 = 1000 bytes)
- "k" or "KiB": kibibytes (2^10 = 1024 bytes)
- "MB": megabytes (10^6 = 1,000,000 bytes)
- "M" or "MiB": mebibytes (2^20 = 1,048,576 bytes)
- "GB": gigabytes (10^9 = 1,000,000,000 bytes)
- "G" or "GiB": gibibytes (2^30 = 1,073,741,824 bytes)
- "TB": terabytes (10^12 bytes)
- "T" or "TiB": tebibytes (2^40 bytes)

Values are rounded up to the nearest kibibyte by libvirt and may be further adjusted by the hypervisor. These units apply consistently across memory allocation, backing, devices, and related elements like memtune. Examples (all 1 GiB): `<memory unit="KiB">1048576</memory>`, `<memory unit="MiB">1024</memory>`, `<memory unit="GiB">1</memory>`.
Citations:
- 1: https://www.claudiokuenzler.com/blog/1471/kvm-qemu-libvirt-memory-allocation-unit-gb-gib
- 2: https://libvirt.org/formatdomain.html
- 3: https://libvirt.org/kbase/memorydevices.html
- 4: https://avdv.github.io/libvirt/formatdomain.html
- 5: https://libvirt.org/formatstorage.html
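To make the unit table concrete, here is a small hedged helper (not from the PR; assumes `fmt` is imported) that normalizes a libvirt memory value to KiB, rounding up as the specification quoted above requires:

```go
// unitToKiB converts a libvirt memory value plus its unit attribute to KiB.
// Illustrative sketch only; the factors mirror the unit table above.
func unitToKiB(value uint64, unit string) (uint64, error) {
	bytesPer := map[string]uint64{
		"b": 1, "bytes": 1,
		"KB": 1_000, "k": 1 << 10, "KiB": 1 << 10,
		"MB": 1_000_000, "M": 1 << 20, "MiB": 1 << 20,
		"GB": 1_000_000_000, "G": 1 << 30, "GiB": 1 << 30,
		"TB": 1_000_000_000_000, "T": 1 << 40, "TiB": 1 << 40,
	}
	if unit == "" {
		unit = "KiB" // libvirt's documented default
	}
	factor, ok := bytesPer[unit]
	if !ok {
		return 0, fmt.Errorf("unknown memory unit %q", unit)
	}
	return (value*factor + 1023) / 1024, nil // round up to whole KiB
}
```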
🌐 Web query:
libvirt vcpu tag attributes placement cpuset specification
💡 Result:
In libvirt domain XML configuration, the `<vcpu>` element supports two key attributes related to CPU placement and cpuset specification: `placement` and `cpuset`. The `placement` attribute (since libvirt 0.9.11, QEMU/KVM only) indicates the CPU placement mode for the domain process:

- 'static' (default if cpuset specified): pins the domain process to the cpuset or all available CPUs.
- 'auto': pins to the advisory nodeset from the numad daemon, ignoring cpuset.

The `cpuset` attribute (since 0.4.4) is a comma-separated list of host physical CPU numbers or ranges for default pinning of the domain process and vCPUs:

- Syntax: single numbers (e.g., '0'), ranges (e.g., '1-4'), exclusions (e.g., '^3' excludes CPU 3 from a prior range).
- Example: cpuset="1-4,^3,6" pins to CPUs 1, 2, 4, 6.
- Overrides: ignored if cputune/emulatorpin is specified (for the process) or vcpupin for specific vCPUs. Unspecified vCPUs without vcpupin use this cpuset.

Example XML: `<vcpu placement="static" cpuset="1-4,^3,6">2</vcpu>`. For finer control, use `<cputune>` with `<vcpupin>` and `<emulatorpin>`. These apply primarily to QEMU/KVM; check hypervisor support.
Citations:
- 1: https://libvirt.org/formatdomain.html
- 2: https://libvirt.gitlab.io/libvirt-appdev-guide-python/libvirt_application_development_guide_using_python-Guest_Domains-Domain_Config-Memory_CPU.html
- 3: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/6/html/virtualization_administration_guide/sect-cpu-allocation
Use proper XML parsing instead of regex-based mutation for domain XML modifications.
The applyDomainResize function has two problems:
- The `<vcpu>` regex matches attributes like `placement` and `cpuset` (`<vcpu[^>]*>`), but the replacement `<vcpu>%d</vcpu>` strips them entirely, causing data loss.
- The memory and currentMemory regexes only match `unit="KiB"`, but libvirt supports multiple units (KiB, MiB, GiB, bytes, etc.). Existing XML with different units won't be replaced, and the hardcoded `unit="KiB"` replacement assumes a specific variant.
Parsing the XML tree would safely preserve attributes and handle unit variations correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/repositories/libvirt/adapter.go` around lines 249 - 257, The
applyDomainResize function currently mutates domain XML with regex (see
applyDomainResize and the regexes reMemory, reCurrentMemory, reVCPU), losing
vCPU attributes (placement/cpuset) and mishandling memory units; replace the
regex approach by parsing the XML into an element tree (e.g., using Go's
encoding/xml or an XML DOM library), locate the <vcpu> element and update only
its inner text while preserving all attributes, locate <memory> and
<currentMemory> elements, detect/convert their existing unit attribute to the
desired KiB value (or set the unit attribute to "KiB" after converting) instead
of hardcoding unit="KiB", and then marshal the modified XML back to string;
update applyDomainResize to use these parsed-element updates and remove reliance
on reMemory/reCurrentMemory/reVCPU regex replacements.
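One possible shape for the attribute-preserving rewrite, sketched here with the third-party `github.com/beevik/etree` DOM package purely as an assumption; the repository may prefer `encoding/xml` or another approach:

```go
import (
	"strconv"

	"github.com/beevik/etree"
)

// applyDomainResizeXML is an illustrative alternative to regex mutation:
// parse the XML, edit only element text, and re-serialize, so attributes
// such as placement/cpuset on <vcpu> survive untouched.
func applyDomainResizeXML(domainXML string, memoryKiB, vcpus int) (string, error) {
	doc := etree.NewDocument()
	if err := doc.ReadFromString(domainXML); err != nil {
		return "", err
	}
	for _, path := range []string{"//memory", "//currentMemory"} {
		if el := doc.FindElement(path); el != nil {
			el.SetText(strconv.Itoa(memoryKiB))
			el.RemoveAttr("unit")        // drop whatever unit was there...
			el.CreateAttr("unit", "KiB") // ...and declare the one we wrote
		}
	}
	if el := doc.FindElement("//vcpu"); el != nil {
		el.SetText(strconv.Itoa(vcpus)) // placement/cpuset attributes preserved
	}
	return doc.WriteToString()
}
```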
```go
func (m *mockComputeBackendExtended) PauseInstance(ctx context.Context, id string) error {
	return nil
}

func (m *mockComputeBackendExtended) ResumeInstance(ctx context.Context, id string) error {
	return nil
}

func (m *mockComputeBackendExtended) ResizeInstance(ctx context.Context, id string, cpu, memory int64) error {
	return nil
}
```
New mock methods silently succeed and can mask unexpected calls
At Lines 162, 165, and 168 the methods return nil without recording invocations. Wire them through m.Called(...) so tests can assert expected behavior and catch accidental usage.
Proposed fix

```diff
 func (m *mockComputeBackendExtended) PauseInstance(ctx context.Context, id string) error {
-	return nil
+	return m.Called(ctx, id).Error(0)
 }

 func (m *mockComputeBackendExtended) ResumeInstance(ctx context.Context, id string) error {
-	return nil
+	return m.Called(ctx, id).Error(0)
 }

 func (m *mockComputeBackendExtended) ResizeInstance(ctx context.Context, id string, cpu, memory int64) error {
-	return nil
+	return m.Called(ctx, id, cpu, memory).Error(0)
 }
```

As per coding guidelines, "**/*.go: Do not use silent failures - avoid blank identifier assignment like `_ = someFunc()`".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/workers/pipeline_worker_test.go` around lines 161 - 169, The three
mock methods on mockComputeBackendExtended (PauseInstance, ResumeInstance,
ResizeInstance) currently return nil and silently succeed; update each to record
the call via m.Called(...) (e.g., for ResizeInstance pass ctx, id, cpu, memory)
and return the mocked error/result from the call (use args.Error(0) /
args.Get(0) as appropriate) so tests can assert invocations and errors instead
of masking unexpected calls.
Add DomainGetState check to PauseInstance and ResumeInstance in the libvirt adapter, consistent with existing patterns in other methods (ResizeInstance, stopDomainIfRunning). This prevents silently succeeding when the domain is already in the target state.
- Bulkhead.acquire now returns ctx.Err() when the context is cancelled, instead of always returning ErrBulkheadFull, making the error type distinguishable from capacity exhaustion (see the sketch below)
- ResumeInstance error log now explicitly notes the instance is left in PAUSED state so operators can detect and recover from partial failures
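A short sketch of what the distinguishable error buys callers; `bh` and `doWork` are illustrative placeholders, and `errors` here is the standard library:

```go
// Stdlib errors.Is now separates load-shedding from caller cancellation.
err := bh.Execute(ctx, doWork)
switch {
case errors.Is(err, platform.ErrBulkheadFull):
	// Capacity exhausted: shed load (e.g., return 503) or retry with backoff.
case errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded):
	// The caller gave up while waiting: propagate the cancellation.
}
```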
Resolve merge conflicts:
- Add PauseInstance/ResumeInstance to NoopComputeBackend
- Remove duplicate ResizeInstance in docker/adapter.go
- Add PauseInstance/ResumeInstance to ResilientCompute
- Add PauseInstance/ResumeInstance to mockCompute test struct
- Remove duplicate ResizeInstance in mock_compute_test.go
- Fix bulkhead test to expect ctx.Err() instead of ErrBulkheadFull
- Add domainStatePaused constant to libvirt adapter
- Update bulkhead acquire to return ctx.Err() on context cancellation
…tations
- Remove duplicate ResizeInstance from firecracker/adapter.go (origin/main has correct version)
- Remove ErrNotSupported from firecracker PauseInstance/ResumeInstance (no-op backend)
- Ensure firecracker adapter_noop.go is correct
Pull request overview
Copilot reviewed 27 out of 27 changed files in this pull request and generated 9 comments.
```go
// PauseInstance suspends a running domain (freezes CPU, retains memory/network).
func (a *LibvirtAdapter) PauseInstance(ctx context.Context, id string) error {
	dom, err := a.client.DomainLookupByName(ctx, id)
	if err != nil {
		return fmt.Errorf(errDomainNotFound, err)
	}

	state, _, err := a.client.DomainGetState(ctx, dom, 0)
	if err != nil {
		return fmt.Errorf("failed to get domain state: %w", err)
	}
	if state != domainStateRunning {
		return fmt.Errorf("domain is not running (state=%d), cannot pause", state)
	}

	if err := a.client.DomainSuspend(ctx, dom); err != nil {
		return fmt.Errorf("failed to suspend domain: %w", err)
	}
	a.logger.Info("domain paused", "domain", id)
	return nil
}

// ResumeInstance resumes a paused domain back to running state.
func (a *LibvirtAdapter) ResumeInstance(ctx context.Context, id string) error {
	dom, err := a.client.DomainLookupByName(ctx, id)
	if err != nil {
		return fmt.Errorf(errDomainNotFound, err)
	}

	state, _, err := a.client.DomainGetState(ctx, dom, 0)
	if err != nil {
		return fmt.Errorf("failed to get domain state: %w", err)
	}
	if state != domainStatePaused {
		return fmt.Errorf("domain is not paused (state=%d), cannot resume", state)
	}

	if err := a.client.DomainResume(ctx, dom); err != nil {
		return fmt.Errorf("failed to resume domain: %w", err)
	}
	a.logger.Info("domain resumed", "domain", id)
	return nil
}
```
```go
ContainerExecInspect(ctx context.Context, execID string) (container.ExecInspect, error)
ContainerRename(ctx context.Context, containerID string, newName string) error
ContainerUpdate(ctx context.Context, containerID string, updateConfig container.UpdateConfig) (container.UpdateResponse, error)
```
```go
func (b *NoopComputeBackend) Ping(ctx context.Context) error { return nil }
func (b *NoopComputeBackend) Type() string                   { return "noop" }
func (b *NoopComputeBackend) PauseInstance(ctx context.Context, id string) error  { return nil }
func (b *NoopComputeBackend) ResumeInstance(ctx context.Context, id string) error { return nil }
func (b *NoopComputeBackend) ResizeInstance(ctx context.Context, id string, cpu, memory int64) error {
	return nil
}
```
```go
s.logger.Error("failed to resume container", "container_id", target, "error", err)
return errors.Wrap(errors.Internal, "failed to resume container", err)
```
| PR | Title | Implementation Status |
|----|-------|-----------------------|
| PR 1 | Image Management | ✅ COMPLETED |
| PR 2 | Instance Resize/Scale | ✅ COMPLETED |
| PR 3 | Instance Pause/Resume | ❌ NOT IMPLEMENTED |
| PR 4 | Provision Retry & Error Handling | ❌ NOT IMPLEMENTED |
| PR 5 | Instance Tags | ❌ NOT IMPLEMENTED |
| PR 6 | WebSocket Progress Events | ⚠️ PARTIALLY IMPLEMENTED |
| PR 7 | Instance Snapshots | ❌ NOT IMPLEMENTED |
| PR 8 | Instance Live Migration | ❌ NOT IMPLEMENTED |
```go
func (a *FirecrackerAdapter) PauseInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return nil // Firecracker does not support pause/resume
}

func (a *FirecrackerAdapter) ResumeInstance(ctx context.Context, id string) error {
	if a.cfg.MockMode {
		return nil
	}
	return nil // Firecracker does not support pause/resume
}
```
```diff
@@ -65,15 +63,15 @@ func (b *Bulkhead) acquire(ctx context.Context) error {
 	case <-timer.C:
 		return ErrBulkheadFull
 	case <-ctx.Done():
-		return ErrBulkheadFull
+		return ctx.Err()
```
```go
	inst.Status = domain.StatusPaused
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.pause", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.pause", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("pause", "success").Inc()
	s.logger.Info("instance paused", "instance_id", inst.ID)
	return nil
}

// ResumeInstance resumes a paused instance back to running state.
func (s *InstanceService) ResumeInstance(ctx context.Context, idOrName string) error {
	userID := appcontext.UserIDFromContext(ctx)
	tenantID := appcontext.TenantIDFromContext(ctx)

	if err := s.rbacSvc.Authorize(ctx, userID, tenantID, domain.PermissionInstanceUpdate, idOrName); err != nil {
		return err
	}

	inst, err := s.GetInstance(ctx, idOrName)
	if err != nil {
		return err
	}

	if inst.Status != domain.StatusPaused {
		return errors.New(errors.Conflict, "instance must be PAUSED to resume, got: "+string(inst.Status))
	}

	target := inst.ContainerID
	if target == "" {
		target = s.formatContainerName(inst.ID)
	}

	if err := s.compute.ResumeInstance(ctx, target); err != nil {
		platform.InstanceOperationsTotal.WithLabelValues("resume", "failure").Inc()
		s.logger.Error("failed to resume container", "container_id", target, "error", err)
		return errors.Wrap(errors.Internal, "failed to resume container", err)
	}

	inst.Status = domain.StatusRunning
	if err := s.repo.Update(ctx, inst); err != nil {
		return err
	}

	if err := s.auditSvc.Log(ctx, inst.UserID, "instance.resume", "instance", inst.ID.String(), map[string]interface{}{
		"name": inst.Name,
	}); err != nil {
		s.logger.Warn("failed to log audit event", "action", "instance.resume", "instance_id", inst.ID, "error", err)
	}

	platform.InstanceOperationsTotal.WithLabelValues("resume", "success").Inc()
	s.logger.Info("instance resumed", "instance_id", inst.ID)
	return nil
}
```
- ResumeInstance: roll back to RUNNING state if backend resume fails, preventing instances from being left in an inconsistent PAUSED state
- PauseInstance: detect and return a Conflict error when the backend reports a state violation (instance not in RUNNING state) instead of Internal
- Add sentinel errors ErrInstanceNotPausable and ErrInstanceNotResumable for state-based failures in the libvirt adapter

Fixes review findings from PR #320
- Add pause and resume endpoint documentation to api-reference.md
- Update FEATURES.md compute section with pause/resume and resilient compute details
- Add Pause/Resume row to the Docker vs Libvirt comparison table
- TestInstanceHandlerPause: success case
- TestInstanceHandlerPauseNotFound: 404 case
- TestInstanceHandlerPauseConflict: 409 case for wrong state
- TestInstanceHandlerResume: success case
- TestInstanceHandlerResumeNotFound: 404 case
- TestInstanceHandlerResumeConflict: 409 case for wrong state
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/platform/bulkhead.go (1)
9-11: ⚠️ Potential issue | 🟡 Minor
Update comments to match the new error contract.
`ErrBulkheadFull` is now only returned on bulkhead wait-timeout, but Lines 9-11 and 41-44 still say context expiry also returns `ErrBulkheadFull`. Please align the comments with current behavior to prevent misuse.

✏️ Suggested doc-only patch

```diff
-// ErrBulkheadFull is returned when the bulkhead's concurrency limit is reached
-// and the caller's timeout/context expires before a slot opens.
+// ErrBulkheadFull is returned when the bulkhead's concurrency limit is reached
+// and the bulkhead wait timeout expires before a slot opens.
 var ErrBulkheadFull = errors.New("bulkhead: concurrency limit reached")

 // Execute runs fn within the bulkhead's concurrency limit.
-// If the bulkhead is full and the wait timeout (or context) expires,
-// ErrBulkheadFull is returned without calling fn.
+// If the caller context is canceled or its deadline is exceeded while waiting,
+// the corresponding context error is returned.
+// If the bulkhead wait timeout expires first, ErrBulkheadFull is returned
+// without calling fn.
 func (b *Bulkhead) Execute(ctx context.Context, fn func() error) error {
```

Also applies to: 41-44
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/platform/bulkhead.go` around lines 9 - 11, Update the comment for ErrBulkheadFull to reflect the new contract: state that ErrBulkheadFull is returned only when the bulkhead's wait times out (wait-timeout) and not when the caller's context is cancelled or expires; remove or change any text that says context expiry/cancellation also returns ErrBulkheadFull (including the other comment block referencing the same behavior). Ensure the comment clearly distinguishes context cancellation errors from ErrBulkheadFull so callers won't mistake context expiry for a wait-timeout.
♻️ Duplicate comments (1)
internal/core/services/instance.go (1)
661-664: ⚠️ Potential issue | 🟠 Major
Add compensating backend rollback when DB update fails after successful pause/resume.
After a successful backend transition, returning on `repo.Update` failure can leave runtime and DB state inconsistent. Use a best-effort inverse backend action before returning the DB error (a sketch follows this comment's prompt).
Also applies to: 712-715
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/instance.go` around lines 661 - 664, After performing the backend state transition (e.g., setting inst.Status = domain.StatusPaused / domain.StatusRunning) but before returning when s.repo.Update(ctx, inst) fails, invoke the inverse backend operation as a best-effort compensating rollback (e.g., call the service method that undoes the pause when repo.Update fails after pausing, or undoes the resume when repo.Update fails after resuming), swallow and log any errors from that rollback, then return the original DB error; apply the same pattern for the other identical block (around the inst.Status assignment and s.repo.Update at the second location).
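A best-effort sketch of the compensation described above, shown for the pause path; identifiers follow the surrounding service code, but this is an illustration, not the repository's final implementation:

```go
inst.Status = domain.StatusPaused
if err := s.repo.Update(ctx, inst); err != nil {
	// DB no longer matches the runtime state: try to unfreeze the
	// container again, logging (not returning) any rollback failure.
	if rbErr := s.compute.ResumeInstance(ctx, target); rbErr != nil {
		s.logger.Error("compensating resume failed", "container_id", target, "error", rbErr)
	}
	return err
}
```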
🧹 Nitpick comments (1)
compute-roadmap.md (1)
90-90: Optional: Address markdown formatting issues.
Static analysis detected minor formatting issues that should be fixed for better markdown rendering:

- Tables missing surrounding blank lines (lines 90, 118, 213, 266, 321, 387, 464, 527): add blank lines before and after each table per MD058.
- Fenced code block missing language specifier (line 378): add a language identifier (e.g., `text`) to the code fence.

These changes improve rendering consistency across different markdown parsers.
Also applies to: 118-118, 213-213, 266-266, 321-321, 387-387, 464-464, 527-527, 378-378
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute-roadmap.md` at line 90, Add blank lines before and after each markdown table (for example the table starting with "| Action | File |") at the occurrences noted (lines flagged in the review) so each table is separated by a blank line above and below per MD058, and update the fenced code block at the flagged location (the fence around the snippet at the review’s line 378) to include a language specifier such as ```text (or an explicit empty fence ```) to satisfy the code-block rule; make these edits across the noted occurrences (tables at the flagged lines and the single fenced block) to fix rendering.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute-roadmap.md`:
- Line 11: The roadmap incorrectly marks "PR 3 | Instance Pause/Resume" as "❌
NOT IMPLEMENTED"; update the status table entry for PR 3 to "✅ COMPLETED" and
rework the section currently titled "What needs to be built" into "What was
built" for PR 3: list the implemented artifacts (StatusPaused in InstanceStatus
enum, POST /instances/:id/pause and POST /instances/:id/resume endpoints,
service-layer validation for RUNNING↔PAUSED transitions, ComputeBackend
implementations for Docker/Libvirt/Firecracker/Noop with
ContainerPause/ContainerUnpause and DomainSuspend/DomainResume, audit events,
resume rollback behavior, and tests) following the same structure/pattern used
for PR 1 and PR 2.
In `@docs/FEATURES.md`:
- Line 35: The docs state uses STARTING but the domain model defines
INITIALIZING; update the transition line to use INITIALIZING (so the sequence
reads INITIALIZING → RUNNING → STOPPED / PAUSED → DELETED) to match the instance
status constants declared in the domain (e.g., the INITIALIZING/ RUNNING /
STOPPED / PAUSED / DELETED status constants in instance.go); ensure the term
exactly matches the domain constant name to avoid contract drift.
In `@docs/swagger/swagger.json`:
- Around line 3969-4020: The pause/resume instance endpoint docs (e.g., the
"/instances/{id}/pause" POST operation and the corresponding
"/instances/{id}/resume" operation) are missing authentication/authorization
failure responses; add 401 and 403 response entries to each protected
operation’s "responses" object (using the same schema ref
"#/definitions/httputil.Response") so generated clients/docs include
unauthorized and forbidden cases.
In `@internal/core/services/instance.go`:
- Around line 700-709: Before calling s.compute.ResumeInstance capture the
instance's current status (e.g., oldStatus := inst.Status) and on resume failure
restore inst.Status = oldStatus instead of forcibly setting
domain.StatusRunning; then call s.repo.Update(ctx, inst) and include both the
resume error and any repo error in s.logger.Error; update any related
logs/metrics (platform.InstanceOperationsTotal) accordingly and return the
wrapped error as before.
In `@internal/handlers/instance_handler.go`:
- Around line 254-257: The Swagger comment blocks above the instance handler(s)
are missing a 400 response even though the handler returns errors.InvalidInput
when the request id is empty; update the comment blocks that currently list
`@Success` 200, `@Failure` 404, `@Failure` 409, `@Failure` 500 to also include `@Failure`
400 {object} httputil.Response so the generated docs reflect the
errors.InvalidInput path (apply the same change to the other similar comment
block that validates id and returns errors.InvalidInput).
In `@internal/repositories/libvirt/adapter.go`:
- Around line 217-219: The resume path currently returns a plain fmt.Errorf when
the domain is not in domainStatePaused; replace that plain error with a wrapped
typed error using apierrors.ErrInstanceNotResumable (mirroring the pause path)
so callers can detect Conflict; e.g. change the return to wrap the message with
apierrors.ErrInstanceNotResumable (using the project’s usual Wrapf/Wrap pattern)
and include the same descriptive text and state value.
---
Outside diff comments:
In `@internal/platform/bulkhead.go`:
- Around line 9-11: Update the comment for ErrBulkheadFull to reflect the new
contract: state that ErrBulkheadFull is returned only when the bulkhead's wait
times out (wait-timeout) and not when the caller's context is cancelled or
expires; remove or change any text that says context expiry/cancellation also
returns ErrBulkheadFull (including the other comment block referencing the same
behavior). Ensure the comment clearly distinguishes context cancellation errors
from ErrBulkheadFull so callers won't mistake context expiry for a wait-timeout.
---
Duplicate comments:
In `@internal/core/services/instance.go`:
- Around line 661-664: After performing the backend state transition (e.g.,
setting inst.Status = domain.StatusPaused / domain.StatusRunning) but before
returning when s.repo.Update(ctx, inst) fails, invoke the inverse backend
operation as a best-effort compensating rollback (e.g., call the service method
that undoes the pause when repo.Update fails after pausing, or undoes the resume
when repo.Update fails after resuming), swallow and log any errors from that
rollback, then return the original DB error; apply the same pattern for the
other identical block (around the inst.Status assignment and s.repo.Update at
the second location).
---
Nitpick comments:
In `@compute-roadmap.md`:
- Line 90: Add blank lines before and after each markdown table (for example the
table starting with "| Action | File |") at the occurrences noted (lines flagged
in the review) so each table is separated by a blank line above and below per
MD058, and update the fenced code block at the flagged location (the fence
around the snippet at the review’s line 378) to include a language specifier
such as ```text (or an explicit empty fence ```) to satisfy the code-block rule;
make these edits across the noted occurrences (tables at the flagged lines and
the single fenced block) to fix rendering.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 19fda502-3ac2-4bf7-9a7a-57b157afdcca
📒 Files selected for processing (32)
- compute-roadmap.md
- docs/FEATURES.md
- docs/api-reference.md
- docs/guides/libvirt-backend.md
- docs/swagger/docs.go
- docs/swagger/swagger.json
- docs/swagger/swagger.yaml
- internal/api/setup/router.go
- internal/core/ports/compute.go
- internal/core/ports/instance.go
- internal/core/services/instance.go
- internal/core/services/instance_unit_test.go
- internal/core/services/mock_compute_test.go
- internal/errors/errors.go
- internal/handlers/instance_handler.go
- internal/handlers/instance_handler_test.go
- internal/platform/bulkhead.go
- internal/platform/bulkhead_test.go
- internal/platform/resilient_compute.go
- internal/platform/resilient_compute_test.go
- internal/repositories/docker/adapter.go
- internal/repositories/docker/fakes_test.go
- internal/repositories/firecracker/adapter.go
- internal/repositories/firecracker/adapter_noop.go
- internal/repositories/k8s/kubeadm_provisioner_test.go
- internal/repositories/k8s/mocks_test.go
- internal/repositories/libvirt/adapter.go
- internal/repositories/libvirt/lb_proxy_test.go
- internal/repositories/noop/adapters.go
- internal/workers/database_failover_worker_test.go
- internal/workers/healing_worker_test.go
- internal/workers/pipeline_worker_test.go
✅ Files skipped from review due to trivial changes (7)
- internal/workers/pipeline_worker_test.go
- internal/errors/errors.go
- internal/platform/resilient_compute_test.go
- docs/guides/libvirt-backend.md
- internal/platform/bulkhead_test.go
- docs/api-reference.md
- internal/repositories/k8s/kubeadm_provisioner_test.go
🚧 Files skipped from review as they are similar to previous changes (10)
- internal/repositories/docker/fakes_test.go
- internal/repositories/libvirt/lb_proxy_test.go
- internal/api/setup/router.go
- internal/workers/database_failover_worker_test.go
- internal/repositories/firecracker/adapter.go
- internal/platform/resilient_compute.go
- internal/repositories/firecracker/adapter_noop.go
- internal/repositories/docker/adapter.go
- internal/core/services/mock_compute_test.go
- internal/core/services/instance_unit_test.go
- compute-roadmap.md: update PR 3 status to COMPLETED with implemented artifacts
- docs/FEATURES.md: fix instance state name from STARTING to INITIALIZING
- docs/swagger: add 400/401/403 responses to pause/resume endpoints
- internal/handlers/instance_handler.go: add 400/401/403 to swagger comments
- internal/core/services/instance.go:
  - resume rollback now preserves old status instead of hardcoding RUNNING
  - add compensating rollback when repo.Update fails after pause/resume
  - rollback now attempts both repo Update (to restore status) and backend undo
- internal/platform/bulkhead.go: clarify ErrBulkheadFull only on wait timeout
- internal/repositories/libvirt/adapter.go: add ErrInstanceNotResumable to resume path
- instance_unit_test.go: add RepoError_Rollback test cases for pause and resume

Fixes all code review findings.
Actionable comments posted: 5
♻️ Duplicate comments (3)
internal/platform/bulkhead.go (1)
27-39: ⚠️ Potential issue | 🟠 Major
Wait-timeout contract is still inconsistent with constructor behavior.
Line 27 says `WaitTimeout` defaults to 5s, but Lines 31-39 do not apply that default, so zero means "wait on context only." This mismatch can still cause unbounded waits when callers use a non-deadline context.

♻️ Proposed fix

```diff
+const (
+	defaultBulkheadMaxConc     = 10
+	defaultBulkheadWaitTimeout = 5 * time.Second
+)
+
 type BulkheadOpts struct {
 	Name        string        // Identifier for logging/metrics.
-	MaxConc     int           // Maximum concurrent requests. Default 10.
-	WaitTimeout time.Duration // How long to wait for a slot. Default 5s. 0 means use context deadline.
+	MaxConc     int           // Maximum concurrent requests. Default 10.
+	WaitTimeout time.Duration // How long to wait for a slot. Default 5s.
 }

 // NewBulkhead creates a new concurrency-limiting bulkhead.
 func NewBulkhead(opts BulkheadOpts) *Bulkhead {
 	if opts.MaxConc <= 0 {
-		opts.MaxConc = 10
+		opts.MaxConc = defaultBulkheadMaxConc
+	}
+	if opts.WaitTimeout <= 0 {
+		opts.WaitTimeout = defaultBulkheadWaitTimeout
 	}
 	return &Bulkhead{
 		name:    opts.Name,
 		sem:     make(chan struct{}, opts.MaxConc),
 		timeout: opts.WaitTimeout,
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/platform/bulkhead.go` around lines 27 - 39, The WaitTimeout field on BulkheadOpts is documented to default to 5s but NewBulkhead currently treats zero as "use context deadline"; update NewBulkhead to apply the documented default: if opts.WaitTimeout == 0 set it to 5*time.Second before assigning to the Bulkhead.timeout field (and ensure the comment on WaitTimeout matches this behavior); reference the WaitTimeout field on BulkheadOpts, the NewBulkhead function, and the Bulkhead.timeout field when making the change.

internal/core/services/instance_unit_test.go (1)
304-304: ⚠️ Potential issue | 🟡 Minor
Keep these `GetByName` expectations required.
`TerminateInstance` and `Exec` still try `GetByName` before falling back to `GetByID` for UUID inputs. Leaving these expectations optional lets that resolution path disappear without failing the tests.
Also applies to: 345-356, 676-676, 731-731, 781-781, 836-836, 896-896, 956-956
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/instance_unit_test.go` at line 304, The mock expectation for repo.On("GetByName", mock.Anything, instanceID.String()) is marked optional with .Maybe(), which allows the name-resolution path to disappear and hide regressions; change those expectations to required (remove .Maybe() and use the default required expectation or call .Once()/.Times(1)) so TerminateInstance and Exec tests must exercise GetByName before they fall back to GetByID; update the same pattern at the other occurrences that reference repo.On("GetByName", mock.Anything, ...) in the test (lines noted in the review) to ensure the name lookup path is enforced.

internal/core/services/instance.go (1)
711-721: ⚠️ Potential issue | 🟠 Major
Don't collapse backend resume conflicts into 500s.
`ComputeBackend.ResumeInstance` can return a Conflict-typed error such as `internal/errors.ErrInstanceNotResumable`. This branch always wraps `Internal`, so `/instances/{id}/resume` returns 500 instead of the documented 409.

🧩 Suggested fix

```diff
 if err := s.compute.ResumeInstance(ctx, target); err != nil {
 	platform.InstanceOperationsTotal.WithLabelValues("resume", "failure").Inc()
+	if errors.Is(err, errors.Conflict) {
+		s.logger.Warn("resume not possible in current state", "container_id", target, "error", err)
+		return errors.New(errors.Conflict, err.Error())
+	}
 	oldStatus := inst.Status
 	s.logger.Error("failed to resume container, rolling back", "container_id", target, "instance_id", inst.ID, "old_status", oldStatus, "error", err)
 	inst.Status = oldStatus
```

Please add a unit case for a Conflict returned from `ResumeInstance` so this stays covered.

🤖 Prompt for AI Agents
ResumeInstanceso this stays covered.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/core/services/instance.go` around lines 711 - 721, The resume error handling currently always wraps errors.Internal, collapsing conflict errors into 500s; change the s.compute.ResumeInstance error branch to detect conflict-type errors (e.g., internal/errors.ErrInstanceNotResumable or using errors.Is/As) and handle them specially: increment the platform.InstanceOperationsTotal metric with "resume","conflict", log a clear conflict message including "container_id" and "instance_id", set inst.Status back to oldStatus and attempt repo.Update as before, but return a wrapped conflict error (not errors.Internal). Also add a unit test case for s.compute.ResumeInstance returning the conflict error to assert a 409-equivalent response and the "conflict" metric increment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute-roadmap.md`:
- Around line 90-98: Several tables in the markdown are missing surrounding
blank lines which triggers MD058; locate each table starting with the header row
"| Action | File |" (and other similar pipe-delimited tables) and add a single
blank line before and after each table block (e.g., the tables near the shown
ranges and any other occurrences), ensuring there is an empty line above the
header row and an empty line after the final table row so markdownlint MD058 is
satisfied.
- Around line 24-50: The PR summary is inconsistent: the top section claims
ImportImage and six methods (including ImportImage) are implemented, but the
"What exists" list shows only five methods and ImageHandler with five endpoints
while "What needs to be built" still lists ImportImage; align the document by
choosing the true current state and updating the lists accordingly—either remove
ImportImage from the initial "All features implemented" list or update "What
exists" to include ImportImage across domain.Image, ports.ImageService (include
ImportImage), internal/core/services/image.go (ImageService implementation
containing ImportImage), internal/handlers/image_handler.go (ImageHandler with
POST /api/v1/images/import), and tests (TestImageService_Unit); also ensure RBAC
and router notes (PermissionImageCreate, PermissionImageRead*, router wiring)
are consistent with the chosen state.
- Around line 334-340: The fenced code block using triple backticks in the
Markdown (the block that starts with ``` and contains the flow "LaunchInstance →
enqueue job → return 202" through "Client connects to
/ws/instances/{id}/events") needs a language identifier to satisfy markdownlint
MD040; update the opening fence from ``` to ```text (or another appropriate
language) so the block becomes a labeled fenced code block.
In `@internal/core/services/instance_unit_test.go`:
- Around line 982-1155: The test testInstanceServicePauseResumeUnit reuses the
same mock objects across subtests which accumulates expectations; refactor into
a table-driven test where each table entry describes the case (name, initial
instance state, expected compute/repo/rbac/audit behaviors and expected
error/state), and inside the loop create fresh mocks (new(MockInstanceRepo),
new(MockComputeBackend), new(MockRBACService), new(MockAuditService)), construct
svc with those mocks, set up per-case repo.On/compute.On/rbacSvc.On/auditSvc.On
expectations (using Once/Maybe as appropriate), call svc.PauseInstance or
svc.ResumeInstance, and assert results; ensure each case sets its own instance
value (domain.Instance) and that mock.AssertExpectationsForObjects is called for
the mocks created in that iteration to guarantee isolation and avoid cross-test
expectation leakage.
In `@internal/core/services/instance.go`:
- Around line 663-673: The repo update rollback currently restores inst.Status
(oldStatus) before attempting the compute backend compensating call, which can
leave DB and backend inconsistent; change the order in both rollback branches so
you first call the compensating backend method (use
s.compute.ResumeInstance(ctx, target) in the pause-failure path and
s.compute.PauseInstance(ctx, target) in the resume-failure path), check/log any
backend error, and only after a successful or best-effort backend call attempt
to write the old status back via s.repo.Update(ctx, inst) (use the existing
error logging pattern for repo update failures); apply the same ordering change
to the symmetric rollback block that uses s.compute.PauseInstance and
s.repo.Update.
---
Duplicate comments:
In `@internal/core/services/instance_unit_test.go`:
- Line 304: The mock expectation for repo.On("GetByName", mock.Anything,
instanceID.String()) is marked optional with .Maybe(), which allows the
name-resolution path to disappear and hide regressions; change those
expectations to required (remove .Maybe() and use the default required
expectation or call .Once()/.Times(1)) so TerminateInstance and Exec tests must
exercise GetByName before they fall back to GetByID; update the same pattern at
the other occurrences that reference repo.On("GetByName", mock.Anything, ...) in
the test (lines noted in the review) to ensure the name lookup path is enforced.
In `@internal/core/services/instance.go`:
- Around line 711-721: The resume error handling currently always wraps
errors.Internal, collapsing conflict errors into 500s; change the
s.compute.ResumeInstance error branch to detect conflict-type errors (e.g.,
internal/errors.ErrInstanceNotResumable or using errors.Is/As) and handle them
specially: increment the platform.InstanceOperationsTotal metric with
"resume","conflict", log a clear conflict message including "container_id" and
"instance_id", set inst.Status back to oldStatus and attempt repo.Update as
before, but return a wrapped conflict error (not errors.Internal). Also add a
unit test case for s.compute.ResumeInstance returning the conflict error to
assert a 409-equivalent response and the "conflict" metric increment.
In `@internal/platform/bulkhead.go`:
- Around line 27-39: The WaitTimeout field on BulkheadOpts is documented to
default to 5s but NewBulkhead currently treats zero as “use context deadline”;
update NewBulkhead to apply the documented default: if opts.WaitTimeout == 0 set
it to 5*time.Second before assigning to the Bulkhead.timeout field (and ensure
the comment on WaitTimeout matches this behavior); reference the WaitTimeout
field on BulkheadOpts, the NewBulkhead function, and the Bulkhead.timeout field
when making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 66a88b06-6736-4d62-b629-2e550acf8a50
📒 Files selected for processing (11)
- compute-roadmap.md
- docs/FEATURES.md
- docs/swagger/docs.go
- docs/swagger/swagger.json
- docs/swagger/swagger.yaml
- internal/core/services/instance.go
- internal/core/services/instance_unit_test.go
- internal/handlers/instance_handler.go
- internal/handlers/instance_handler_test.go
- internal/platform/bulkhead.go
- internal/repositories/libvirt/adapter.go
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/FEATURES.md
- internal/handlers/instance_handler_test.go
- docs/swagger/swagger.yaml
- bulkhead: apply 5s default WaitTimeout when zero (matches docs)
- instance: add conflict detection in ResumeInstance (matches PauseInstance)
- instance: fix rollback order - compensating backend call first, then repo
- instance_unit_test: refactor pause/resume tests to table-driven with fresh mocks per case; add ResumeInstance_ConflictError test case (sketched below)
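A hedged sketch of what the `ResumeInstance_ConflictError` case might look like; the mock and helper names follow the conventions visible in this review, not necessarily the file's exact ones:

```go
// Backend reports a state violation; the service should surface Conflict,
// not Internal, so the handler can map it to HTTP 409.
compute.On("ResumeInstance", mock.Anything, mock.Anything).
	Return(errors.New(errors.Conflict, "domain is not paused")).Once()

err := svc.ResumeInstance(ctx, instanceID.String())
require.Error(t, err)
require.True(t, errors.Is(err, errors.Conflict))
```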
- docker adapter: fix indentation regression in ContainerUpdate interface method
- libvirt adapter: improve error messages with human-readable state names instead of raw numeric values (e.g., "domain is PAUSED" not "domain is 3")
Fixes lint failure: internal/core/services/instance_unit_test.go:1012:12: test helper function should start from t.Helper() (thelper)
Fixes lint failure across all 9 assert closures in the pause/resume table-driven test cases (lines 1012, 1028, 1047, 1071, 1095, 1111, 1131, 1148, 1172).
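For context, `thelper` requires test helper closures to call `t.Helper()` first, so assertion failures are reported at the table case's call site rather than inside the closure; a minimal sketch (the `assert` field name is an assumption based on the table-driven refactor described earlier):

```go
assert: func(t *testing.T, err error) {
	t.Helper() // attribute assertion failures to the calling test case
	require.Error(t, err)
	require.True(t, errors.Is(err, errors.Conflict))
},
```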
After merging main, testComputeBackend was missing the new interface methods.
The previous log said "failed to resume container, rolling back" which was misleading since no backend rollback occurs when ResumeInstance fails - the container stays PAUSED. Updated to "failed to resume container, instance left in PAUSED state".
Summary
- `POST /instances/:id/pause` and `POST /instances/:id/resume` API endpoints
- `StatusPaused` added to the instance status enum
- RBAC authorization via `PermissionInstanceUpdate`
- `ResilientCompute` wrapper with circuit breaker and bulkhead patterns

Test plan

- `go build ./...`: compiles
- `go test ./internal/platform/... -v`: platform tests pass
- `go test ./internal/repositories/libvirt/... -v`: libvirt tests pass
- `go test ./internal/repositories/docker/... -v`: docker tests pass
- `go test ./internal/handlers/... -v`: handler tests pass
- `go vet ./...`: no issues

Commits (11 total)
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Tests