Automated regression test suite for validator consensus failure modes#118

Open
ekwe7 wants to merge 2 commits into VeriNode-Labs:main from ekwe7:Automated-Regression-Test-Suite-for-Validator-Consensus-Failure-Modes

ekwe7 commented Jun 27, 2026

closes #73

Summary

This PR introduces a fault-injection testing framework for the consensus layer. The new regression suite simulates four classes of failure (network partitions, message delays, validator equivocation, and consensus timeouts) and validates protocol recovery under controlled scenarios.

The implementation enables automated reproduction of failure modes that previously required manual investigation, improving reliability, regression detection, and confidence in consensus changes.

Problem

The consensus layer currently lacks an automated mechanism for validating behavior under common distributed-system failure conditions.

As a result:

  • Production failures require manual reproduction
  • Recovery behavior is difficult to validate consistently
  • Consensus regressions can go undetected
  • Failure-mode testing is not standardized
  • CI pipelines cannot verify protocol resilience automatically

Solution

This PR introduces a configurable simulation environment capable of injecting faults into consensus communication and validating protocol recovery.

The framework supports:

  • Configurable validator topologies
  • Controlled fault injection
  • Automated recovery verification
  • CI execution with reporting
  • Repeatable regression testing

Key Features

Fault Injection API

Implemented a fault injection layer for consensus messaging that enables controlled simulation of network and validator failures.

Supported fault types:

  • Network partitions
  • Message delays
  • Validator equivocation
  • Consensus timeouts

The API is extensible and can support additional fault classes in future testing efforts.
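As an illustration of the idea, here is a minimal sketch of a fault layer sitting between validators and message delivery. It is not the code in this PR: `Message`, `FaultInjector`, `partition`, and `delay` are hypothetical names, and time is modeled as integer ticks purely for simplicity. Each fault is a function that either returns a delivery tick or `None` to drop the message, which is one way additional fault classes could be added without changing the injector.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class Message:
    sender: int
    receiver: int
    payload: str


# Hypothetical fault signature: (message, scheduled tick) -> new tick, or None to drop.
Fault = Callable[[Message, int], Optional[int]]


def partition(group_a: Set[int], group_b: Set[int]) -> Fault:
    """Drop every message that crosses between the two groups."""
    def apply(msg: Message, tick: int) -> Optional[int]:
        crosses = (
            (msg.sender in group_a and msg.receiver in group_b)
            or (msg.sender in group_b and msg.receiver in group_a)
        )
        return None if crosses else tick
    return apply


def delay(ticks: int) -> Fault:
    """Postpone delivery of every message by a fixed number of ticks."""
    def apply(msg: Message, tick: int) -> Optional[int]:
        return tick + ticks
    return apply


class FaultInjector:
    """Applies the active faults, in order, to each outgoing message."""

    def __init__(self) -> None:
        self.faults: List[Fault] = []

    def add(self, fault: Fault) -> None:
        self.faults.append(fault)

    def clear(self) -> None:
        # Removing all faults models healing the network.
        self.faults.clear()

    def schedule(self, msg: Message, tick: int) -> Optional[int]:
        deliver_at = tick
        for fault in self.faults:
            result = fault(msg, deliver_at)
            if result is None:
                return None  # message dropped
            deliver_at = result
        return deliver_at
```

With a partition between `{0, 1}` and `{2, 3}` plus a three-tick delay, a message from validator 0 to validator 2 is dropped, while one from 0 to 1 is delivered three ticks late.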

Simulation Runner

Added a simulation runner capable of:

  • Creating configurable validator sets
  • Executing consensus rounds under fault conditions
  • Measuring recovery outcomes
  • Running deterministic regression scenarios

The runner supports 8 or more concurrent validators, in accordance with project requirements.
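To make "deterministic regression scenarios" concrete, the sketch below shows one common way to get repeatable runs: drive all randomness from a single seeded generator. This is an assumption-laden toy, not the runner in this PR. `run_simulation`, `RoundResult`, and `drop_rate` are hypothetical names, and the quorum rule (strictly more than two thirds of validators) is assumed from typical BFT protocols rather than stated in this PR.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class RoundResult:
    round_no: int
    votes_received: int
    committed: bool


def run_simulation(num_validators: int = 8, rounds: int = 5,
                   drop_rate: float = 0.0, seed: int = 0) -> List[RoundResult]:
    """Run voting rounds in which each vote is lost with probability drop_rate."""
    if num_validators < 1:
        raise ValueError("need at least one validator")
    rng = random.Random(seed)  # the only source of randomness in the run
    quorum = (2 * num_validators) // 3 + 1  # assumed: more than 2/3 of validators
    results = []
    for round_no in range(rounds):
        votes = sum(1 for _ in range(num_validators) if rng.random() >= drop_rate)
        results.append(RoundResult(round_no, votes, votes >= quorum))
    return results
```

Two runs with the same seed and parameters produce identical results, so a failing scenario can be replayed exactly from its seed.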

Failure Mode Coverage

Added dedicated test scenarios covering:

Network Partition

  • Partial validator isolation
  • Majority/minority partitions
  • Partition healing and recovery
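The expected outcome of the partition scenarios above can be sketched as a simple quorum check. This assumes a quorum of strictly more than two thirds of all validators, which this PR does not state; `quorum` and `sides_able_to_commit` are hypothetical helpers for illustration only.

```python
from typing import List, Set


def quorum(total: int) -> int:
    """Smallest vote count strictly greater than 2/3 of total (assumed rule)."""
    return (2 * total) // 3 + 1


def sides_able_to_commit(total: int, groups: List[Set[int]]) -> List[bool]:
    """For each isolated group, report whether it alone can reach quorum."""
    return [len(group) >= quorum(total) for group in groups]


# 8 validators split 6/2: only the majority side can make progress.
majority_minority = sides_able_to_commit(8, [{0, 1, 2, 3, 4, 5}, {6, 7}])

# 8 validators split 4/4: neither side reaches quorum, so consensus stalls.
even_split = sides_able_to_commit(8, [{0, 1, 2, 3}, {4, 5, 6, 7}])

# After healing, all validators are in one group and progress resumes.
healed = sides_able_to_commit(8, [set(range(8))])
```

Under this assumption, at most one side of any partition can commit, so a partition test would assert that no conflicting commits appear during the split and that commits resume after healing.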

Message Delay

  • Delayed consensus messages
  • Out-of-order delivery scenarios
  • Recovery after delayed communication
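Delayed and out-of-order delivery can be modeled with a priority queue keyed on delivery time. The sketch below is illustrative only: `DelayedNetwork` and its methods are hypothetical names, and ticks stand in for whatever clock the real suite uses.

```python
import heapq
from typing import List, Tuple


class DelayedNetwork:
    """Holds messages until their delivery tick, releasing them in tick order."""

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str]] = []
        self._seq = 0  # tie-breaker so equal ticks keep send order

    def send(self, payload: str, now: int, delay: int = 0) -> None:
        heapq.heappush(self._queue, (now + delay, self._seq, payload))
        self._seq += 1

    def deliver_until(self, now: int) -> List[str]:
        """Return every message whose delivery tick is at or before now."""
        delivered = []
        while self._queue and self._queue[0][0] <= now:
            delivered.append(heapq.heappop(self._queue)[2])
        return delivered


net = DelayedNetwork()
net.send("prevote", now=0, delay=5)    # sent first, arrives at tick 5
net.send("precommit", now=1, delay=1)  # sent second, arrives at tick 2
early = net.deliver_until(2)
late = net.deliver_until(5)
```

Here the message sent second arrives first, which is the out-of-order case a consensus implementation has to tolerate.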

Equivocation

  • Conflicting validator messages
  • Detection of invalid behavior
  • Protocol recovery after equivocation
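Equivocation detection usually reduces to spotting two different votes from the same validator for the same slot. A minimal sketch follows, assuming votes are identified by validator, height, and round; the `Vote` fields and `find_equivocations` are hypothetical and not taken from this PR.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Vote:
    validator: int
    height: int
    round: int
    block_hash: str


def find_equivocations(votes: Iterable[Vote]) -> List[Tuple[Vote, Vote]]:
    """Return (first, conflicting) pairs where one validator voted for two blocks."""
    first_seen: Dict[Tuple[int, int, int], Vote] = {}
    evidence: List[Tuple[Vote, Vote]] = []
    for vote in votes:
        key = (vote.validator, vote.height, vote.round)
        earlier = first_seen.get(key)
        if earlier is None:
            first_seen[key] = vote
        elif earlier.block_hash != vote.block_hash:
            evidence.append((earlier, vote))
    return evidence
```

A repeated identical vote is not flagged, only a vote for a different block in the same slot, so retransmissions under message delay do not produce false positives.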

Timeout Conditions

  • Delayed participation
  • Missing responses
  • Consensus timeout recovery
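One way to reason about timeout recovery is to ask how many rounds pass before enough responses arrive in time. The sketch below assumes a timeout that grows linearly per round, which is a common design but not something this PR specifies; `rounds_until_commit` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def rounds_until_commit(latencies: List[Optional[int]], quorum: int,
                        base_timeout: int, step: int,
                        max_rounds: int = 10) -> Optional[Tuple[int, int]]:
    """Return (round, timeout) of the first round that reaches quorum, or None.

    latencies holds each validator's response time in ticks; None means the
    validator never responds.
    """
    for round_no in range(max_rounds):
        timeout = base_timeout + round_no * step  # assumed linear growth
        on_time = sum(1 for t in latencies if t is not None and t <= timeout)
        if on_time >= quorum:
            return round_no, timeout
    return None


# Two slow validators (9 ticks) and one silent one, quorum of 6 out of 8.
slow_case = rounds_until_commit([1, 1, 2, 2, 3, 9, 9, None],
                                quorum=6, base_timeout=4, step=3)

# Too many silent validators: quorum is unreachable at any timeout.
stuck_case = rounds_until_commit([1, 1, None, None, None, None, None, None],
                                 quorum=6, base_timeout=4, step=3)
```

In the first case the timeout has to grow to 10 ticks before the slow validators count, so recovery happens in round 2. In the second case no timeout helps, which a test would report as a liveness failure rather than wait indefinitely.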

CI Integration

Added automated execution in CI.

Features include:

  • Full regression suite execution
  • Execution time bounded for CI environments
  • JUnit report generation
  • Failure visibility in build pipelines

Target execution time: under 10 minutes for the complete suite.
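For readers unfamiliar with the report format, the sketch below builds a small JUnit-style XML document with Python's standard library. It is only an illustration of the commonly used `testsuite` / `testcase` / `failure` layout; `junit_report` and the scenario names are hypothetical, and the actual suite may rely on its test framework's built-in reporter instead.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

# (scenario name, duration in seconds, failure message or None if passed)
ScenarioResult = Tuple[str, float, Optional[str]]


def junit_report(suite_name: str, results: List[ScenarioResult]) -> str:
    """Serialize scenario results as a JUnit-style XML string."""
    failures = sum(1 for _, _, failure in results if failure is not None)
    total_time = sum(seconds for _, seconds, _ in results)
    suite = ET.Element("testsuite", {
        "name": suite_name,
        "tests": str(len(results)),
        "failures": str(failures),
        "time": f"{total_time:.3f}",
    })
    for name, seconds, failure in results:
        case = ET.SubElement(suite, "testcase", {
            "classname": suite_name,
            "name": name,
            "time": f"{seconds:.3f}",
        })
        if failure is not None:
            ET.SubElement(case, "failure", {"message": failure})
    return ET.tostring(suite, encoding="unicode")


report = junit_report("consensus_regression", [
    ("network_partition_heal", 1.5, None),
    ("equivocation_detect", 0.25, "validator 3 not flagged"),
])
```

The summed `time` attribute also gives CI a single number to compare against the 10-minute budget.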

Testing Strategy

The regression framework validates:

  • Consensus safety
  • Consensus liveness
  • Fault recovery correctness
  • Protocol stability after recovery
  • Validator agreement guarantees
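The safety and liveness properties listed above are typically checked as invariants over each validator's committed chain after a scenario ends. A minimal sketch, assuming each validator exposes a mapping from height to committed block hash; `check_safety`, `check_liveness`, and that data shape are hypothetical.

```python
from typing import Dict

# validator id -> {height -> committed block hash}
Chains = Dict[int, Dict[int, str]]


def check_safety(chains: Chains) -> bool:
    """Safety: no two validators committed different blocks at the same height."""
    agreed: Dict[int, str] = {}
    for chain in chains.values():
        for height, block_hash in chain.items():
            if agreed.setdefault(height, block_hash) != block_hash:
                return False
    return True


def check_liveness(chains: Chains, target_height: int) -> bool:
    """Liveness: every validator eventually committed up to target_height."""
    return all(max(chain, default=-1) >= target_height for chain in chains.values())
```

A validator that lags behind breaks liveness but not safety, while two validators holding different blocks at one height breaks safety regardless of progress. Keeping the checks separate lets a failing scenario report which guarantee was violated.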

Coverage includes:

  • Known production failure scenarios
  • Historical regression cases
  • Fault combinations where applicable

Validation

Validation was run for:

  • Fault injection correctness
  • Simulation determinism
  • Recovery verification logic
  • Multi-validator execution
  • CI reporting integration
  • Performance constraints

Acceptance Criteria

  • Fault injection API implemented
  • Simulation runner supports at least 8 validators
  • Network partition scenarios covered
  • Message delay scenarios covered
  • Equivocation scenarios covered
  • Timeout scenarios covered
  • Minimum 95% failure-mode coverage achieved
  • CI integration added
  • JUnit reports generated
  • Complete suite executes within CI time constraints

Benefits

Reliability

  • Detects consensus regressions before deployment
  • Validates recovery behavior automatically
  • Reduces production risk

Developer Productivity

  • Eliminates manual failure reproduction workflows
  • Provides repeatable fault testing
  • Simplifies debugging of protocol changes

Operational Confidence

  • Improves confidence in consensus updates
  • Verifies resilience under realistic fault conditions
  • Ensures recovery guarantees remain intact
