feat(instances): add ResizeInstance for CPU/memory resizing#183
Merged
feat(instances): add ResizeInstance for CPU/memory resizing#183
Conversation
Add PermissionInstanceResize RBAC permission and ResizeInstance method to both ComputeBackend and InstanceService interfaces. ResizeInstance(ctx, id, cpu, memory) allows changing the CPU/memory of a running or stopped instance.
Adds ContainerUpdate to dockerClient interface and implements ResizeInstance which calls docker client.ContainerUpdate with cpu (NanoCPUs) and memory (bytes) limits for live resize.
Implements ResizeInstance for Libvirt using a cold resize: stop → update domain XML (memory/vcpu) → start. Uses regex replacement on domain XML to update <memory> and <vcpu> elements, then redefines and restarts the domain.
Resolves instance by id/name, validates target instance type, checks quota delta (new-old resources for vcpus/memory), calls compute.ResizeInstance, updates instance record, and logs the resize operation. Decrements quota on downscale.
POST /api/v1/instances/:id/resize accepts { instance_type: "basic-4" }
and calls service.ResizeInstance. Requires PermissionInstanceResize.
Firecracker returns "resize not supported" error. Noop backend returns nil (no-op).
Add ResizeInstance to MockInstanceService, MockComputeBackend, instanceServiceMock (handler tests), and mockCompute (libvirt).
Fixes lint failure where k8s test mocks did not implement the new ResizeInstance method added to ports.InstanceService.
Fixes lint failures in k8s and workers test files where mocks did not implement the ResizeInstance method added to ports.InstanceService and ports.ComputeBackend interfaces.
Add comprehensive test coverage for the ResizeInstance feature: - Unit tests: 12 subtests covering upsize, downsize, same-size, by-uuid lookup, not found, type validation, quota enforcement, compute errors, record errors, and RBAC failures - Handler tests: 6 subtests covering success, invalid ID, invalid body, empty instance type, not found, and quota exceeded - E2E tests: upsize, downsize, and invalid type scenarios E2E tests require a running server with Docker backend.
- Document POST /instances/:id/resize in api-reference.md - Update FEATURES.md instance lifecycle description - Add ADR-025 documenting the resize decision (cold vs warm, quota delta enforcement, Libvirt vs Docker strategy)
The resize endpoint returns {"message": "..."}, not the instance
data. Verification happens via GET /instances/:id after resize.
basic-4 doesn't exist in the seeded instance_types table. Resize upsize: basic-2 → standard-1 (1→2 vCPU, 1024→2048MB). Resize downsize: standard-1 → basic-2 (2→1 vCPU, 2048→1024MB).
Adds instance:resize permission and seeds it to developer role so that users with developer role can resize instances. File: internal/repositories/postgres/migrations/107_seed_instance_resize_permission.up.sql File: internal/repositories/postgres/migrations/107_seed_instance_resize_permission.down.sql
…esize - Pass vCPU count (cpu/1e9) instead of NanoCPUs to applyDomainResize - Only call DomainCreate if instance was originally running, preserving stopped state after resize File: internal/repositories/libvirt/adapter.go File: internal/repositories/firecracker/adapter_noop.go
- Add same-size short-circuit to avoid unnecessary compute calls - Validate Status and ContainerID before proceeding with resize - Use deltaMemMB in MB (not GB) to preserve sub-GB changes - Preserve QuotaExceeded error kind instead of wrapping as Forbidden - Capture quota errors and log failures after successful resize - Add rollback: revert compute and undo quota on repo update failure - Record INSTANCE_RESIZE event for audit trail File: internal/core/services/instance.go
… coverage - Remove uuid.Parse block so handler passes raw idStr to service - Fix binding error to use errors.New instead of errors.Wrap - Update InvalidID test to verify name-based resize works - Add response body assertion to Success test File: internal/handlers/instance_handler.go File: internal/handlers/instance_handler_test.go
- Add 401/403 responses and minLength:1 to instance_type in swagger.json - Update unit tests: add eventSvc mock, fix memory quota values (MB not GB), add Status field to test instances, add rollback expectations for UpdateRecordError test - Add containerUpdateResp and containerUpdateErr to fakeDockerClient for error/warning path testing File: docs/swagger/swagger.json File: internal/core/services/instance_unit_test.go File: internal/repositories/docker/fakes_test.go
applyDomainResize now uses encoding/xml to unmarshal domain XML into a struct, modify memory/vcpu fields, and marshal back. This is more robust than regex replacement which breaks on XML format changes. File: internal/repositories/libvirt/adapter.go
Replaces 4 identical polling loops with a shared helper function that accepts desired status and timeout. Removes ~70 lines of duplicated code across TestComputeE2E and TestResizeInstance tests. File: tests/compute_e2e_test.go
The 107_seed_instance_resize_permission migration incorrectly referenced a non-existent "permissions" table and used permission_id column that doesn't exist in role_permissions. The role_permissions table uses a "permission" TEXT column (not permission_id UUID). This fixes the migration to match the actual schema defined in 027_create_rbac.up.sql.
The swagger generator removed minLength:1 from instance_type since the handler validation now validates empty string separately. This brings docs in sync with the actual generated swagger output.
Two changes to address gocyclo lint failure (>30 complexity threshold): 1. Return error when quota updates fail post-resize instead of just logging (early return reduces complexity) 2. The thelper lint was already fixed via t.Helper() in prior session This keeps ResizeInstance below the 30-complexity guard while preserving all rollback and error-handling behavior.
…omplexity Breaks ResizeInstance into focused helpers: resolveInstance, resolveInstanceTypes, validateResize, executeResize, completeResize. Each helper handles one concern, keeping ResizeInstance's main flow at ~10 branches (below 30 threshold). Also removes unused `desired` parameter from waitForInstanceStatus since all callers pass domain.StatusRunning (fixes unparam lint).
All callers pass the same 90-second timeout, so hardcode it directly in the helper to satisfy unparam lint.
The QuotaExceeded_CPU and QuotaExceeded_Memory tests were missing Compute mock expectations, causing nil pointer dereference panics when executeResize tried to call Compute.ResizeInstance.
1. Magic constants: Add NanoCPUsPerVCPU (1e9) and BytesPerMB (1024*1024) constants, replace raw literals in executeResize, completeResize, and rollback 2. Lookup error masking: Only fall through to UUID lookup when GetByName returns a NotFound error (not arbitrary errors) 3. Handler empty-id guard: Add id != "" check in ResizeInstance handler, consistent with Stop/Get/Terminate 4. Log key snake_case: Fix memory_ki_b → memory_kib in domain resize log 5. DB rollback: Stop discarding errors from compensating actions; add rollbackErrs slice with structured error messages; only attempt quota reversal when forward increment succeeded (cpuIncremented/memIncremented flags); compose and return original update error + rollback errors 6. applyDomainResize XML: Remove struct-based marshal/unmarshal (stripped elements); revert to targeted regex replacements preserving all other XML elements/attributes; remove unused domainXML struct and encoding/xml import 7. Down migration: Restrict DELETE to same role targeted by up migration (role = 'developer') instead of deleting for all roles 8. Undefine/define order: Add rollback DomainDefineXML on define failure to prevent permanent domain loss if redefine fails
798ed07 to
532a03a
Compare
532a03a to
588f74f
Compare
# Conflicts: # internal/repositories/libvirt/adapter.go
6337712 to
9924cac
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
| // 3. DB update | ||
| inst.InstanceType = newInstanceType | ||
| inst.Version++ |
Comment on lines
827
to
+830
| deltaMemMB := newIT.MemoryMB - oldIT.MemoryMB | ||
| memoryGB := deltaMemMB / 1024 | ||
|
|
||
| // 1. Quota changes first — fail fast before any VM state change |
Comment on lines
256
to
270
| if wasRunning { | ||
| if err := a.client.DomainCreate(ctx, newDom); err != nil { | ||
| if rbErr := a.rollbackFromSnapshot(ctx, id, snapshotName, domXML); rbErr != nil { | ||
| return fmt.Errorf("failed to start domain after resize (instance_id=%s): %w", id, errors.Join(err, rbErr)) | ||
| // Rollback: undefine the new definition and restore the original domain. | ||
| undefineErr := a.client.DomainUndefine(ctx, newDom) | ||
| if undefineErr != nil { | ||
| a.logger.Error("failed to undefine new domain after DomainCreate failure", "domain", id, "error", undefineErr) | ||
| } | ||
| restoredDom, rollbackErr := a.client.DomainDefineXML(ctx, domXML) | ||
| if rollbackErr != nil { | ||
| return fmt.Errorf("failed to start domain after resize (instance_id=%s), rollback also failed: original error: %w; rollback error: %w", id, err, rollbackErr) | ||
| } | ||
| if restartErr := a.client.DomainCreate(ctx, restoredDom); restartErr != nil { | ||
| return fmt.Errorf("failed to start domain after resize (instance_id=%s), rollback redef succeeded but restart failed: %w", id, restartErr) | ||
| } | ||
| return fmt.Errorf("failed to start domain after resize: %w", err) |
Comment on lines
154
to
160
| t.Run("TerminateInstance", func(t *testing.T) { | ||
| if !instanceReady { | ||
| t.Skip("Instance did not reach RUNNING state, skipping terminate") | ||
| } | ||
| resp := deleteRequest(t, client, fmt.Sprintf(testutil.TestRouteFormat, testutil.TestBaseURL, testutil.TestRouteInstances, instanceID), token) | ||
| defer func() { _ = resp.Body.Close() }() | ||
|
|
Comment on lines
256
to
263
| // 5. Terminate Instance | ||
| t.Run("TerminateInstance", func(t *testing.T) { | ||
| if !instanceReady { | ||
| t.Skip("Instance did not reach RUNNING state, skipping terminate") | ||
| } | ||
| resp := deleteRequest(t, client, fmt.Sprintf(testutil.TestRouteFormat, testutil.TestBaseURL, testutil.TestRouteInstances, instanceID), token) | ||
| defer func() { _ = resp.Body.Close() }() | ||
|
|
- Add ResizeInstanceResponse struct with message, instance_type, and status fields - Update swagger annotation to use new response schema - Regenerate swagger docs
4e65712 to
66a5495
Compare
# Conflicts: # internal/api/setup/router.go # internal/core/services/notify_unit_test.go # internal/repositories/libvirt/adapter.go # internal/repositories/noop/adapters.go
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
POST /instances/:id/resizeendpoint to resize running/stopped instances to different instance typesBreaking Change
Response format change (for API consumers):
The success response for
POST /instances/:id/resizenow includes additional fields:{"message": "instance resized"}{"message": "instance resized", "instance_type": "...", "status": "..."}Clients should be updated to handle the new
instance_typeandstatusfields.Test plan
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests