Skip to content

feat: add slow request tracking, request size histogram, and upstream health gauge#7

Merged
abtreece merged 1 commit into
mainfrom
feature/enhanced-metrics
Jan 16, 2026
Merged

feat: add slow request tracking, request size histogram, and upstream health gauge#7
abtreece merged 1 commit into
mainfrom
feature/enhanced-metrics

Conversation

@abtreece

Copy link
Copy Markdown
Owner

Summary

  • Add slow request counter (nginx_server_zone_slow_requests_total) - counts requests exceeding configurable threshold (default 1 second)
  • Add request size histogram (nginx_server_zone_request_length_bytes_bucket) - tracks request body size distribution with buckets from 100B to 100MB
  • Add upstream health gauge (nginx_upstream_health) - provides calculated health status (1=healthy, 0=unhealthy) based on failure rate threshold (default 10%)

Changes

  • lib/resty/ngxstats/common.lua: Added constants (REQUEST_SIZE_BUCKETS, SLOW_REQUEST_THRESHOLD, UPSTREAM_HEALTH_FAILURE_THRESHOLD) and calculate_upstream_health() function
  • lib/resty/ngxstats/log.lua: Refactored to extract record_ssl_metrics() and record_upstream_metrics() helper functions, added new metric collection
  • lib/resty/ngxstats/prometheus.lua: Added metric definitions and handlers for new metrics
  • spec/common_spec.lua: Added 11 new unit tests for constants and health calculation
  • spec/prometheus_spec.lua: Added 3 new unit tests for Prometheus format output
  • README.md: Updated metrics table and feature list
  • CHANGELOG.md: Documented new features

Test plan

  • All 76 unit tests pass
  • All 55 integration tests pass
  • Linting passes with 0 warnings/errors
  • CI pipeline passes

… health gauge

- Add slow request counter (requests exceeding 1s threshold)
- Add request size histogram with buckets from 100B to 100MB
- Add upstream health gauge (1=healthy, 0=unhealthy based on failure rate)
- Refactor log.lua to extract helper functions for SSL and upstream metrics
- Add unit tests for new constants and functions
- Update README with new metrics documentation
- Update CHANGELOG with new features
Copilot AI review requested due to automatic review settings January 16, 2026 01:41

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds three new monitoring capabilities to the ngxstats library: slow request tracking, request size histograms, and upstream health status calculation.

Changes:

  • Introduces slow request counter to track requests exceeding 1 second threshold
  • Adds request size histogram with buckets ranging from 100 bytes to 100MB
  • Implements upstream health gauge based on failure rate (10% threshold)

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
lib/resty/ngxstats/common.lua Added constants for new metrics and health calculation function
lib/resty/ngxstats/log.lua Refactored metric recording into helper functions and integrated new metrics
lib/resty/ngxstats/prometheus.lua Added Prometheus format definitions for new metrics
spec/common_spec.lua Added test coverage for new constants and health calculation logic
spec/prometheus_spec.lua Added tests verifying Prometheus output format for new metrics
README.md Updated documentation with new metrics and feature comparison
CHANGELOG.md Documented new features in unreleased section

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +164 to 167
if proxy_host ~= "" then
local health = common.calculate_upstream_health(stats, upstream_name)
common.update_num(stats, common.key({'upstreams', upstream_name, 'health'}), health)
end

Copilot AI Jan 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The health calculation should be scoped within the upstream_response_time conditional block (lines 152-161) to ensure it only executes when upstream metrics are being recorded. Currently, it relies on upstream_name which is only set when an upstream exists, but the logic would be clearer if this block was inside the upstream metrics section.

Copilot uses AI. Check for mistakes.
Comment thread spec/common_spec.lua
Comment on lines +357 to +364
it("should return 1 (healthy) when exactly at threshold", function()
-- 90 requests, 10 failures = 10% failure rate (at threshold, not above)
mock_stats:set("upstreams:edge:requests", 90)
mock_stats:set("upstreams:edge:failures", 10)

local health = common.calculate_upstream_health(mock_stats, "edge")
assert.equals(1, health)
end)

Copilot AI Jan 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test comment indicates the failure rate is 'at threshold, not above', but with 90 requests and 10 failures, the total is 100 and the failure rate is 10%, which equals the threshold (0.1). The implementation uses > on line 267 of common.lua, so this test correctly expects healthy (1). However, consider adding a test case with 91 requests and 10 failures (9.9% failure rate) to verify the boundary condition more explicitly.

Copilot uses AI. Check for mistakes.
local requests = stats:get(requests_key) or 0
local failures = stats:get(failures_key) or 0

-- Need at least some requests to determine health

Copilot AI Jan 16, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says 'Need at least some requests to determine health' but the code also returns healthy (1) when there is no data. Consider clarifying the comment to explicitly state the assumption, such as: 'Assume healthy if no data is available yet'.

Suggested change
-- Need at least some requests to determine health
-- Need at least some requests to determine health; assume healthy if no data is available yet

Copilot uses AI. Check for mistakes.
@abtreece abtreece merged commit d1d770b into main Jan 16, 2026
8 checks passed
@abtreece abtreece deleted the feature/enhanced-metrics branch January 16, 2026 01:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants