feat(instances): enhance stats API with network and disk I/O#375
feat(instances): enhance stats API with network and disk I/O#375
Conversation
Adds NetworkRxBytes, NetworkTxBytes, DiskReadBytes, DiskWriteBytes, and CPUTimeNanoseconds to InstanceStats. Also extends RawDockerStats with NetworkStats, BlkioStats, and BlkioStatEntry to support decoding these fields from Docker's stats payload.
Processes NetworkStats and BlkioStats from RawDockerStats to populate the new NetworkRxBytes, NetworkTxBytes, DiskReadBytes, DiskWriteBytes, and CPUTimeNanoseconds fields in InstanceStats.
Adds DomainGetCPUStats method to support retrieving detailed CPU stats (nanoseconds, vCPU time) for instance stats enhancement.
Enhances GetInstanceStats to call DomainGetCPUStats and include cpu_time in the returned JSON, matching the RawDockerStats structure so the service layer can populate CPUTimeNanoseconds.
Includes NetworkRxBytes, NetworkTxBytes, DiskReadBytes, DiskWriteBytes, and CPUTimeNanoseconds in the InstanceStats schema.
Documents the stats endpoint with all new fields: network_rx_bytes, network_tx_bytes, disk_read_bytes, disk_write_bytes, and cpu_time_nanoseconds.
- Extends TestInstanceService_CalculateInstanceStats with cases for: - Network I/O across multiple interfaces (rx/tx summing) - Block I/O read/write (Op=="read"/"Read", Op=="write"/"Write") - CPUTimeNanoseconds from CPUStats.CPUTime - Combined all fields in one test case - Adds GetInstanceStats_ComputeError test in RepoErrors suite - Updates TestInstanceHandlerGetStats to verify all new fields are present in the JSON response wrapper
- Improve DomainGetCPUStats discriminator comment: "4 = ULLONG discriminator per go-libvirt TypedParamValue" (was "D==4 means ULLONG") - Add explanation comment on CPUTimeNanoseconds json tag noting it's Libvirt-only (Docker uses delta-based percentage instead)
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR extends the instance statistics API with additional real-time metrics: network I/O (rx/tx bytes), disk I/O (read/write bytes), and CPU time in nanoseconds. The libvirt adapter now fetches CPU stats via a new ChangesInstance Statistics Enhancement
Sequence DiagramsequenceDiagram
participant Handler
participant Service
participant Adapter
participant LibvirtClient
participant Domain
Handler->>Service: GetInstanceStats(instanceName)
Service->>Adapter: GetInstanceStats(domainName)
Adapter->>LibvirtClient: DomainMemoryStats()
LibvirtClient-->>Adapter: memory usage & limit
Adapter->>LibvirtClient: DomainGetCPUStats()
LibvirtClient-->>Adapter: cpu_time (typed param)
Adapter-->>Service: RawDockerStats {<br/> memory_stats,<br/> cpu_stats { cpu_time },<br/> network_stats { interfaces },<br/> blkio_stats { read/write ops }<br/>}
Service->>Service: calculateInstanceStats()
Note over Service: Aggregate network Rx/Tx<br/>across interfaces
Note over Service: Sum disk read/write<br/>from blkio entries
Service-->>Handler: InstanceStats {<br/> cpu_percentage,<br/> memory_usage/limit/percentage,<br/> network_rx/tx_bytes,<br/> disk_read/write_bytes,<br/> cpu_time_nanoseconds<br/>}
Handler-->>Handler: Marshal JSON response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 53 minutes and 41 seconds.Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/swagger/swagger.json (2)
4186-4186:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUpdate
/instances/{id}/statsdescription to match expanded payloadLine 4186 still says “CPU and Memory usage,” but the response now includes network I/O, disk I/O, and CPU time nanoseconds. This will mislead API consumers reading the endpoint docs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/swagger/swagger.json` at line 4186, Update the OpenAPI description for the /instances/{id}/stats endpoint so it accurately reflects the full response payload: mention CPU and Memory usage plus network I/O, disk I/O, and CPU time in nanoseconds; locate the "description" field under the path "/instances/{id}/stats" in the swagger JSON and replace the existing text ("Gets real-time CPU and Memory usage for a compute instance") with a concise description that lists all returned metrics (CPU usage, memory usage, network I/O, disk I/O, and CPU time in nanoseconds).
9051-9074:⚠️ Potential issue | 🟠 MajorAdd
format: int64to integer counter fields to prevent client-side overflowThe new counter fields on lines 9052, 9055, 9058, 9070, and 9073 are declared as
"type": "integer"without"format". Popular Swagger/OpenAPI 2.0 code generators (swagger-codegen, OpenAPI Generator) default unformatted integers to 32-bit types, which overflow quickly for byte counts and nanosecond timestamps. Adding"format": "int64"ensures 64-bit mapping in generated client code.Also update the endpoint description at line 4186 from "Gets real-time CPU and Memory usage" to reflect that the response now includes disk I/O and network statistics.
💡 Suggested patch
"cpu_time_nanoseconds": { "description": "only populated by Libvirt backend; Docker uses delta-based percentage instead", - "type": "integer" + "type": "integer", + "format": "int64" }, "disk_read_bytes": { - "type": "integer" + "type": "integer", + "format": "int64" }, "disk_write_bytes": { - "type": "integer" + "type": "integer", + "format": "int64" }, "network_rx_bytes": { - "type": "integer" + "type": "integer", + "format": "int64" }, "network_tx_bytes": { - "type": "integer" + "type": "integer", + "format": "int64" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/swagger/swagger.json` around lines 9051 - 9074, Update the OpenAPI schema properties cpu_time_nanoseconds, disk_read_bytes, disk_write_bytes, network_rx_bytes, and network_tx_bytes to include "format": "int64" so generators map them to 64-bit integers (preventing 32-bit overflow); locate these properties by their names in the snippet and add the format key alongside "type": "integer". Also update the endpoint description text (the operation that currently reads "Gets real-time CPU and Memory usage") to mention that the response now includes disk I/O and network statistics so documentation matches the extended response fields.
🧹 Nitpick comments (2)
internal/handlers/instance_handler_test.go (1)
485-499: ⚡ Quick winUse typed JSON decoding instead of
map[string]interface{}castsLine 485 onward uses panic-prone type assertions (
.(float64)) for every field. Prefer unmarshallingdataintodomain.InstanceStatsso failures are assertion-driven and easier to diagnose.💡 Suggested change
- var wrapper struct { - Data map[string]interface{} `json:"data"` - } + var wrapper struct { + Data domain.InstanceStats `json:"data"` + } err := json.Unmarshal(w.Body.Bytes(), &wrapper) require.NoError(t, err) - assert.InDelta(t, 10.5, wrapper.Data["cpu_percentage"], 0.01) - assert.InDelta(t, 128, wrapper.Data["memory_usage_bytes"], 0.01) - assert.InDelta(t, 256, wrapper.Data["memory_limit_bytes"], 0.01) - assert.InDelta(t, 50.0, wrapper.Data["memory_percentage"], 0.01) - assert.Equal(t, uint64(1024), uint64(wrapper.Data["network_rx_bytes"].(float64))) - assert.Equal(t, uint64(512), uint64(wrapper.Data["network_tx_bytes"].(float64))) - assert.Equal(t, uint64(4096), uint64(wrapper.Data["disk_read_bytes"].(float64))) - assert.Equal(t, uint64(2048), uint64(wrapper.Data["disk_write_bytes"].(float64))) - assert.Equal(t, uint64(3000000000), uint64(wrapper.Data["cpu_time_nanoseconds"].(float64))) + assert.InDelta(t, 10.5, wrapper.Data.CPUPercentage, 0.01) + assert.InDelta(t, 128, wrapper.Data.MemoryUsageBytes, 0.01) + assert.InDelta(t, 256, wrapper.Data.MemoryLimitBytes, 0.01) + assert.InDelta(t, 50.0, wrapper.Data.MemoryPercentage, 0.01) + assert.Equal(t, uint64(1024), wrapper.Data.NetworkRxBytes) + assert.Equal(t, uint64(512), wrapper.Data.NetworkTxBytes) + assert.Equal(t, uint64(4096), wrapper.Data.DiskReadBytes) + assert.Equal(t, uint64(2048), wrapper.Data.DiskWriteBytes) + assert.Equal(t, uint64(3000000000), wrapper.Data.CPUTimeNanoseconds)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/handlers/instance_handler_test.go` around lines 485 - 499, The test unmarshals response into a generic map with panic-prone float64 casts; change the local wrapper to unmarshal "data" into the concrete domain.InstanceStats type (e.g. struct { Data domain.InstanceStats `json:"data"` }) instead of map[string]interface{}, update the json.Unmarshal call to use that wrapper, and replace the current assertions that use type assertions (wrapper.Data["..."].(float64)) with direct, typed checks against wrapper.Data's fields (like CpuPercentage, MemoryUsageBytes, NetworkRxBytes, DiskReadBytes, CpuTimeNanoseconds) to make failures deterministic and safer.docs/swagger/docs.go (1)
9059-9082: Consider adding explicitformat: int64to large integer fields for downstream tool clarity.Lines 9059-9082 define byte counters and nanosecond timestamps as
"type": "integer"without format. While OpenAPI 2.0 spec allows this and does not default to 32-bit, explicitly addingformat: int64would improve clarity for client generators and ensure consistent type interpretation across tools.Update the source struct annotations (not this generated file) to emit
format: int64, then regenerate Swagger. This is a polish improvement, not a spec compliance issue.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/swagger/docs.go` around lines 9059 - 9082, The generated schema omits explicit int64 formats for large integer counters; update the source Go structs' field annotations for the big counters (e.g., CpuTimeNanoseconds, DiskReadBytes, DiskWriteBytes, NetworkRxBytes, NetworkTxBytes and any other byte/nanosecond counter fields) to emit format:int64 in the Swagger metadata (via the struct tag / swagger annotation your generator recognizes), then regenerate docs so the generated docs.go contains "format": "int64" for those properties. Ensure you modify the source struct tags/annotations (not docs.go) and run the swagger generation step to apply the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/api-reference.md`:
- Around line 295-297: The docs list a `503` response but the handler currently
returns `500` via the httputil.Error default mapping; update either the docs or
the code so they match. If you intend to keep `503`, change the error mapping in
the handler (where httputil.Error is produced/returned) to emit an HTTP 503 for
backend/internal stats failures (adjust the error creation or mapping logic in
the endpoint handler function that calls into httputil.Error); otherwise, update
the docs in docs/api-reference.md to document `500` instead of `503` so the
documentation matches the current implementation.
In `@docs/swagger/swagger.yaml`:
- Around line 666-683: The integer counters in the schema (cpu_time_nanoseconds,
disk_read_bytes, disk_write_bytes, memory_limit_bytes, memory_usage_bytes,
network_rx_bytes, network_tx_bytes) should be annotated with format: uint64 so
code generators treat them as 64-bit unsigned counters; update those property
definitions in swagger.yaml to add format: uint64 and leave memory_percentage as
number (float). Also add a note in the schema docs or examples recommending
string encoding for values >53-bit when consumed by JavaScript clients to avoid
precision loss.
In `@internal/repositories/libvirt/adapter.go`:
- Around line 606-616: The code currently swallows errors from
a.client.DomainGetCPUStats and leaves cpuTime as 0; instead check err != nil
after calling DomainGetCPUStats and propagate or surface the error (e.g., return
fmt.Errorf("DomainGetCPUStats failed: %w", err) or pass the wrapped error up the
call chain) rather than continuing with a zero cpuTime; update the surrounding
function that uses cpuParams/cpuTime to accept and handle the propagated error
so failures are not silently ignored.
- Line 610: Replace the magic number 4 used as the TypedParamValue discriminator
in the cpu_time check with a named constant: declare a package-level constant
(e.g., TypedParamValueDiscriminatorULLONG or TypedParamValueULLONG = 4) near
other constants in adapter.go and update the condition p.Value.D == 4 to use
that constant (p.Value.D == TypedParamValueULLONG); ensure the constant name
clearly indicates it represents the ULLONG discriminator used by go-libvirt and
update any related comments to reflect the constant usage.
---
Outside diff comments:
In `@docs/swagger/swagger.json`:
- Line 4186: Update the OpenAPI description for the /instances/{id}/stats
endpoint so it accurately reflects the full response payload: mention CPU and
Memory usage plus network I/O, disk I/O, and CPU time in nanoseconds; locate the
"description" field under the path "/instances/{id}/stats" in the swagger JSON
and replace the existing text ("Gets real-time CPU and Memory usage for a
compute instance") with a concise description that lists all returned metrics
(CPU usage, memory usage, network I/O, disk I/O, and CPU time in nanoseconds).
- Around line 9051-9074: Update the OpenAPI schema properties
cpu_time_nanoseconds, disk_read_bytes, disk_write_bytes, network_rx_bytes, and
network_tx_bytes to include "format": "int64" so generators map them to 64-bit
integers (preventing 32-bit overflow); locate these properties by their names in
the snippet and add the format key alongside "type": "integer". Also update the
endpoint description text (the operation that currently reads "Gets real-time
CPU and Memory usage") to mention that the response now includes disk I/O and
network statistics so documentation matches the extended response fields.
---
Nitpick comments:
In `@docs/swagger/docs.go`:
- Around line 9059-9082: The generated schema omits explicit int64 formats for
large integer counters; update the source Go structs' field annotations for the
big counters (e.g., CpuTimeNanoseconds, DiskReadBytes, DiskWriteBytes,
NetworkRxBytes, NetworkTxBytes and any other byte/nanosecond counter fields) to
emit format:int64 in the Swagger metadata (via the struct tag / swagger
annotation your generator recognizes), then regenerate docs so the generated
docs.go contains "format": "int64" for those properties. Ensure you modify the
source struct tags/annotations (not docs.go) and run the swagger generation step
to apply the change.
In `@internal/handlers/instance_handler_test.go`:
- Around line 485-499: The test unmarshals response into a generic map with
panic-prone float64 casts; change the local wrapper to unmarshal "data" into the
concrete domain.InstanceStats type (e.g. struct { Data domain.InstanceStats
`json:"data"` }) instead of map[string]interface{}, update the json.Unmarshal
call to use that wrapper, and replace the current assertions that use type
assertions (wrapper.Data["..."].(float64)) with direct, typed checks against
wrapper.Data's fields (like CpuPercentage, MemoryUsageBytes, NetworkRxBytes,
DiskReadBytes, CpuTimeNanoseconds) to make failures deterministic and safer.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4ffbd0d9-1c22-4090-a558-75039d7a401b
📒 Files selected for processing (14)
docs/api-reference.mddocs/swagger/docs.godocs/swagger/swagger.jsondocs/swagger/swagger.yamlinternal/core/domain/instance.gointernal/core/services/instance.gointernal/core/services/instance_internal_test.gointernal/core/services/instance_unit_test.gointernal/handlers/instance_handler_test.gointernal/repositories/libvirt/adapter.gointernal/repositories/libvirt/adapter_unit_test.gointernal/repositories/libvirt/libvirt_client.gointernal/repositories/libvirt/mock_client_test.gointernal/repositories/libvirt/real_client.go
| **Error Responses:** | ||
| - `404` — Instance not found | ||
| - `503` — Backend stats unavailable |
There was a problem hiding this comment.
Documented error code does not match actual handler behavior
Line 297 documents 503, but this endpoint currently returns 500 for internal/backend failures via httputil.Error default mapping. Please align docs with implementation (or implement explicit 503 mapping if that is the intended contract).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/api-reference.md` around lines 295 - 297, The docs list a `503` response
but the handler currently returns `500` via the httputil.Error default mapping;
update either the docs or the code so they match. If you intend to keep `503`,
change the error mapping in the handler (where httputil.Error is
produced/returned) to emit an HTTP 503 for backend/internal stats failures
(adjust the error creation or mapping logic in the endpoint handler function
that calls into httputil.Error); otherwise, update the docs in
docs/api-reference.md to document `500` instead of `503` so the documentation
matches the current implementation.
| cpu_time_nanoseconds: | ||
| description: only populated by Libvirt backend; Docker uses delta-based percentage | ||
| instead | ||
| type: integer | ||
| disk_read_bytes: | ||
| type: integer | ||
| disk_write_bytes: | ||
| type: integer | ||
| memory_limit_bytes: | ||
| type: number | ||
| memory_percentage: | ||
| type: number | ||
| memory_usage_bytes: | ||
| type: number | ||
| network_rx_bytes: | ||
| type: integer | ||
| network_tx_bytes: | ||
| type: integer |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
In OpenAPI 2.0 (Swagger), how do code generators treat type: integerwhenformat is omitted, and what is the recommended format for uint64-like counters?
💡 Result:
- In OpenAPI/Swagger 2.0, when a schema/parameter uses type: integer but omits format, code generators commonly treat it as the default 32-bit signed integer (int32). For example, OpenAPI Generator’s Go generator maps integer (without format) to an int32-like type, rather than treating it as a generic integer. 2) For “uint64-like counters”, the recommended approach is to use the uint64 format (not integer/int64). In the OpenAPI format registry, uint64 is defined as an unsigned 64-bit integer and recommends representing values as JSON strings when they are outside the 53-bit safe range.
Citations:
- 1: https://swagger.io/specification/v2/
- 2: https://spec.openapis.org/registry/format/uint64
- 3: [REQ] [Go] Map integer type without format qualifier to int OpenAPITools/openapi-generator#3055
- 4: https://github.com/swagger-api/swagger-spec/blob/master/versions/2.0.md
Specify uint64 format for high-range counters in schema
Lines 666-683 define byte/time counters as bare type: integer. Code generators interpret this as 32-bit signed integers by default, which is insufficient for backend uint64 fields and risks overflow/truncation in SDK consumers.
Use format: uint64 to explicitly match backend semantics. For JSON serialization of values exceeding 53-bit safe range, consider representing large values as strings when consumed by JavaScript clients.
🛠️ Suggested schema patch
cpu_time_nanoseconds:
description: only populated by Libvirt backend; Docker uses delta-based percentage
instead
type: integer
+ format: uint64
disk_read_bytes:
type: integer
+ format: uint64
disk_write_bytes:
type: integer
+ format: uint64
@@
network_rx_bytes:
type: integer
+ format: uint64
network_tx_bytes:
type: integer
+ format: uint64🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/swagger/swagger.yaml` around lines 666 - 683, The integer counters in
the schema (cpu_time_nanoseconds, disk_read_bytes, disk_write_bytes,
memory_limit_bytes, memory_usage_bytes, network_rx_bytes, network_tx_bytes)
should be annotated with format: uint64 so code generators treat them as 64-bit
unsigned counters; update those property definitions in swagger.yaml to add
format: uint64 and leave memory_percentage as number (float). Also add a note in
the schema docs or examples recommending string encoding for values >53-bit when
consumed by JavaScript clients to avoid precision loss.
- Add typedParamULLONG constant (4) to clarify magic number - Add else branch comment for graceful degradation on DomainGetCPUStats error - Add GetInstanceStats_WithCPUStats subtest exercising cpu_time extraction - Fix api-reference.md field types: int → uint64
Summary
network_rx_bytes,network_tx_bytes,disk_read_bytes,disk_write_bytes, andcpu_time_nanosecondstoInstanceStatsRawDockerStatswithNetworkStats,BlkioStatsto decode Docker's full stats payloadDomainGetCPUStatstoLibvirtClientinterface for Libvirt backend CPU timecalculateInstanceStatsto sum network/disk fields from both backendsTest plan
go test ./internal/core/services/... -v -run "CalculateInstanceStats"— 5 subcases passgo test ./internal/handlers/... -v -run "GetStats"— validates full response schemago test ./internal/repositories/libvirt/...— DomainGetCPUStats mock verifiedgo build ./...— compiles cleango vet ./...— no issuesSummary by CodeRabbit
New Features
GET /instances/:id/statsendpoint for real-time compute instance metricsDocumentation