229 changes: 229 additions & 0 deletions docs/CONNECTION_POOL.md
@@ -0,0 +1,229 @@
# Connection Pool Configuration

This document explains how `rust_loadtest` manages HTTP connections and how to
configure pooling behavior for different test scenarios.

## How Connection Pooling Works

Each load test builds a single `reqwest::Client` that maintains a connection
pool per target host. When a request completes, the underlying TCP connection
(including its TLS session) is returned to the pool. Subsequent requests grab
an existing connection from the pool instead of performing a new TCP handshake
and TLS negotiation.

This is the **default behavior** — no special configuration is needed to reuse
connections.

### When connections are reused

- Workers fire requests continuously (e.g., RPS >= 1)
- Idle connections haven't exceeded the idle timeout
- The pool hasn't reached the max idle limit

### When new connections are created

- First request from each worker (no pooled connection exists yet)
- Idle timeout expired — the pooled connection was closed
- `maxIdlePerHost` is set to 0 — pooling is effectively disabled
- The server closed the connection (e.g., server-side idle timeout)

## Configuration

Pool settings can be configured via **environment variables** (applied at
startup) or via the **YAML config** (applied per-test on `POST /config`).
YAML values override environment variables when present.

### Environment Variables

| Variable | Default | Description |
|--------------------------|---------|--------------------------------------------------|
| `POOL_MAX_IDLE_PER_HOST` | `32` | Maximum idle connections kept per host |
| `POOL_IDLE_TIMEOUT_SECS` | `30` | Seconds an idle connection stays in the pool |
| `TCP_NODELAY` | `true` | Disable Nagle's algorithm for lower latency |
| `REQUEST_TIMEOUT_SECS` | `30` | Per-request timeout |

### YAML Config

Add an optional `pool` section under `config`:

```yaml
config:
  baseUrl: https://example.com
  pool:
    maxIdlePerHost: 32
    idleTimeoutSecs: 30
    metricsReuseThresholdMs: 100
```

| Field | Default | Description |
|--------------------------|---------|--------------------------------------------------|
| `maxIdlePerHost` | `32` | Max idle connections per host. Set to `0` to disable pooling. |
| `idleTimeoutSecs` | `30` | Seconds before idle connections are closed. Set to `0` to close immediately. |
| `metricsReuseThresholdMs`| `100` | Latency threshold (ms) for the Prometheus metrics heuristic. Does **not** affect actual connection behavior — only how metrics classify requests as "new" vs "reused". |

## Use Case: Force New Connection Per Request

Use this when you need every request to perform a full TCP + TLS handshake.
Useful for testing:

- TLS handshake latency and overhead
- Server-side connection establishment handling under load
- Certificate validation performance
- Load balancer connection distribution

```yaml
version: "1.0"
config:
  baseUrl: https://api.example.com
  workers: 10
  duration: 5m
  timeout: 30s
  pool:
    maxIdlePerHost: 0
    idleTimeoutSecs: 0
load:
  model: rps
  target: 100
scenarios:
  - name: new-connection-test
    weight: 100
    steps:
      - name: request
        request:
          method: GET
          path: /health
        assertions:
          - type: statusCode
            expected: 200
```

With environment variables:

```bash
export POOL_MAX_IDLE_PER_HOST=0
export POOL_IDLE_TIMEOUT_SECS=0
```

## Use Case: Reuse Connections (Default)

Use this for standard load testing where you want realistic connection behavior.
Connections are established once and reused across requests, which is how most
production clients behave.

```yaml
version: "1.0"
config:
  baseUrl: https://api.example.com
  workers: 25
  duration: 10m
  timeout: 30s
  # No pool section needed — defaults reuse connections
load:
  model: rps
  target: 1000
scenarios:
  - name: reuse-connection-test
    weight: 100
    steps:
      - name: request
        request:
          method: GET
          path: /health
        assertions:
          - type: statusCode
            expected: 200
```

## Use Case: Long-Lived Connection Reuse with Infrequent Requests

Use this when requests are spaced far apart (e.g., every 5 minutes) but you
want to keep the same TCP/TLS session alive between them. Increase the idle
timeout to prevent the pool from closing connections during gaps.

```yaml
version: "1.0"
config:
  baseUrl: https://api.example.com
  workers: 1
  duration: 1h
  timeout: 30s
  pool:
    maxIdlePerHost: 1
    idleTimeoutSecs: 600
load:
  model: rps
  target: 1
scenarios:
  - name: keepalive-test
    weight: 100
    steps:
      - name: request
        request:
          method: POST
          path: /oauth2/v1/token
          body: "grant_type=client_credentials&client_id=my_id&client_secret=my_secret"
          headers:
            Content-Type: application/x-www-form-urlencoded
        assertions:
          - type: statusCode
            expected: 200
        thinkTime:
          min: 4m
          max: 5m
standby:
  workers: 1
  rps: 1.0
```

**Note:** Even with a high idle timeout, the remote server may close the
connection on its side (common server idle timeouts are 60-120s). The pool
will transparently open a new connection when this happens.

## Monitoring Connection Reuse

Prometheus metrics are available on port 9090:

| Metric | Type | Description |
|-----------------------------------------|------------|------------------------------------------|
| `connection_pool_likely_new_total` | Counter | Requests classified as new connections |
| `connection_pool_likely_reused_total` | Counter | Requests classified as reused connections|
| `connection_pool_reuse_rate_percent` | Gauge | Current reuse percentage |
| `connection_pool_requests_total` | Counter | Total requests tracked |
| `connection_pool_max_idle_per_host` | Gauge | Configured max idle setting |
| `connection_pool_idle_timeout_seconds` | Gauge | Configured idle timeout setting |

### Important: Metrics Are Heuristic-Based

The "new" vs "reused" classification uses a **latency heuristic**, not actual
connection state (reqwest does not expose this). Requests whose latency meets
or exceeds `metricsReuseThresholdMs` (default: 100ms) are classified as
"likely new connection" because a TLS handshake typically adds 50-150ms.

This means:

- Fast targets where TLS completes in <100ms will **undercount** new connections
- Slow targets where reused requests take >100ms will **overcount** new connections

Tune `metricsReuseThresholdMs` in the YAML to match your target's typical TLS
handshake time for more accurate classification. For definitive connection
tracking, check server-side access logs.

### Grafana Queries

**New vs reused connections over time (time series panel):**

| Query | Legend |
|-------------------------------------------------|----------|
| `rate(connection_pool_likely_reused_total[1m])` | Reused |
| `rate(connection_pool_likely_new_total[1m])` | New |

**Reuse rate (single stat panel):**

```promql
connection_pool_reuse_rate_percent
```

**Percentage of new connections (single stat panel):**

```promql
connection_pool_likely_new_total / connection_pool_requests_total * 100
```
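
Because both series are monotonic counters, the lifetime ratio above is dominated by history; a windowed variant over the same metric names reflects recent behavior instead:

```promql
rate(connection_pool_likely_new_total[5m])
  / rate(connection_pool_requests_total[5m]) * 100
```
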
31 changes: 23 additions & 8 deletions src/config.rs
@@ -101,6 +101,7 @@ pub struct Config {
// When Some, these override env-var defaults when building the HTTP client.
pub pool_max_idle_per_host: Option<usize>,
pub pool_idle_timeout_secs: Option<u64>,
pub pool_metrics_reuse_threshold_ms: Option<u64>,
}

/// Helper to get a required environment variable.
@@ -235,10 +236,15 @@ impl Config {
let auto_disable_percentiles_on_warning =
env_bool("AUTO_DISABLE_PERCENTILES_ON_WARNING", true);

let (pool_max_idle_per_host, pool_idle_timeout_secs) = match &yaml_config.config.pool {
Some(p) => (p.max_idle_per_host, p.idle_timeout_secs),
None => (None, None),
};
let (pool_max_idle_per_host, pool_idle_timeout_secs, pool_metrics_reuse_threshold_ms) =
match &yaml_config.config.pool {
Some(p) => (
p.max_idle_per_host,
p.idle_timeout_secs,
p.metrics_reuse_threshold_ms,
),
None => (None, None, None),
};

let config = Config {
target_url,
@@ -263,6 +269,7 @@
cluster: ClusterConfig::from_env(),
pool_max_idle_per_host,
pool_idle_timeout_secs,
pool_metrics_reuse_threshold_ms,
};

config.validate()?;
@@ -330,10 +337,15 @@ impl Config {
let auto_disable_percentiles_on_warning =
env_bool("AUTO_DISABLE_PERCENTILES_ON_WARNING", true);

let (pool_max_idle_per_host, pool_idle_timeout_secs) = match &yaml_config.config.pool {
Some(p) => (p.max_idle_per_host, p.idle_timeout_secs),
None => (None, None),
};
let (pool_max_idle_per_host, pool_idle_timeout_secs, pool_metrics_reuse_threshold_ms) =
match &yaml_config.config.pool {
Some(p) => (
p.max_idle_per_host,
p.idle_timeout_secs,
p.metrics_reuse_threshold_ms,
),
None => (None, None, None),
};

let config = Config {
target_url,
@@ -358,6 +370,7 @@
cluster: ClusterConfig::from_env(),
pool_max_idle_per_host,
pool_idle_timeout_secs,
pool_metrics_reuse_threshold_ms,
};

config.validate()?;
@@ -525,6 +538,7 @@ impl Config {
cluster: ClusterConfig::from_env(),
pool_max_idle_per_host: None,
pool_idle_timeout_secs: None,
pool_metrics_reuse_threshold_ms: None,
};

config.validate()?;
@@ -730,6 +744,7 @@ impl Config {
cluster: ClusterConfig::for_testing(),
pool_max_idle_per_host: None,
pool_idle_timeout_secs: None,
pool_metrics_reuse_threshold_ms: None,
}
}

22 changes: 13 additions & 9 deletions src/connection_pool.rs
@@ -191,7 +191,7 @@ pub struct PoolStatsTracker {

/// Threshold for considering a connection "likely new" (milliseconds)
/// Requests slower than this are likely establishing new connections
new_connection_threshold_ms: u64,
new_connection_threshold_ms: Arc<Mutex<u64>>,
}

impl PoolStatsTracker {
@@ -203,17 +203,23 @@ impl PoolStatsTracker {
pub fn new(new_connection_threshold_ms: u64) -> Self {
Self {
stats: Arc::new(Mutex::new(ConnectionStats::default())),
new_connection_threshold_ms,
new_connection_threshold_ms: Arc::new(Mutex::new(new_connection_threshold_ms)),
}
}

/// Update the latency threshold used to classify new vs reused connections.
pub fn set_threshold_ms(&self, threshold_ms: u64) {
*self.new_connection_threshold_ms.lock().unwrap() = threshold_ms;
}

/// Record a request with timing information.
///
/// Uses latency to infer connection reuse. Requests faster than the
/// configured threshold likely reused an existing connection; slower
/// requests likely established a new connection (including the TLS handshake).
pub fn record_request(&self, latency_ms: u64) {
let now = Instant::now();
let threshold = *self.new_connection_threshold_ms.lock().unwrap();
let mut stats = self.stats.lock().unwrap();

stats.total_requests += 1;
@@ -228,21 +234,19 @@
// Infer connection type based on latency
// Fast requests (<threshold) likely reused connections
// Slow requests likely established new connections (TLS handshake adds ~50-100ms)
if latency_ms >= self.new_connection_threshold_ms {
if latency_ms >= threshold {
stats.likely_new_connections += 1;
CONNECTION_POOL_LIKELY_NEW.inc();
debug!(
latency_ms = latency_ms,
threshold = self.new_connection_threshold_ms,
"Request latency suggests new connection"
latency_ms,
threshold, "Request latency suggests new connection"
);
} else {
stats.likely_reused_connections += 1;
CONNECTION_POOL_LIKELY_REUSED.inc();
debug!(
latency_ms = latency_ms,
threshold = self.new_connection_threshold_ms,
"Request latency suggests reused connection"
latency_ms,
threshold, "Request latency suggests reused connection"
);
}

6 changes: 6 additions & 0 deletions src/main.rs
@@ -1136,6 +1136,12 @@ async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
h.abort();
}

// Apply pool stats threshold from YAML and reset counters for new test.
if let Some(threshold_ms) = new_cfg.pool_metrics_reuse_threshold_ms {
GLOBAL_POOL_STATS.set_threshold_ms(threshold_ms);
}
GLOBAL_POOL_STATS.reset();

// Rebuild HTTP client in case TLS/pool config changed.
let new_client =
match rust_loadtest::client::build_client(&new_cfg.to_client_config()) {
7 changes: 7 additions & 0 deletions src/yaml_config.rs
@@ -118,6 +118,13 @@ pub struct YamlPoolConfig {
/// Set to 0 to immediately close connections after each request.
#[serde(rename = "idleTimeoutSecs")]
pub idle_timeout_secs: Option<u64>,

/// Latency threshold in milliseconds used by Prometheus metrics to classify
/// a request as a new connection vs a reused one (default: 100). Requests
/// slower than this are counted as "likely new connection". Does NOT affect
/// actual connection behavior — only the metrics heuristic.
#[serde(rename = "metricsReuseThresholdMs")]
pub metrics_reuse_threshold_ms: Option<u64>,
}

fn default_timeout() -> YamlDuration {