Skip to content

Latest commit

 

History

History
476 lines (379 loc) · 15.6 KB

File metadata and controls

476 lines (379 loc) · 15.6 KB

KAI Network Performance

This document provides guidance on network performance considerations, optimizations, and benchmarking for KAI's peer-to-peer networking system.

Performance Characteristics

KAI's networking system is designed to balance performance with reliability in distributed environments. Understanding these characteristics helps in building high-performance networked applications.

Key Performance Factors

  1. Latency

    • Round-trip time for commands and responses
    • Ping times between peers
    • Effect of geographic distribution on command execution
  2. Throughput

    • Maximum packets per second
    • Maximum connections per node
    • Data volume limitations
  3. Scalability

    • Connection scaling with node count
    • Message routing efficiency
    • Resource usage at scale
  4. Reliability

    • Packet loss handling
    • Connection failure recovery
    • Message ordering guarantees

Performance Measurements

The KAI network system's performance has been measured under various conditions:

Baseline Performance

Metric Value Notes
Minimum Latency 1-2 ms Local network, minimal load
Maximum Throughput ~10,000 commands/sec Local network, simple commands
Maximum Connections 32 per node Default ENet configuration
Serialization Overhead 20-100 bytes Depends on command complexity
Connection Setup Time 50-200 ms Includes discovery and handshake

Scaling Characteristics

┌────────────────────────────────────────────────┐
│                                                │
│  Latency vs. Number of Connected Peers         │
│                                                │
│  50 ┼                                  *       │
│     │                               *          │
│     │                            *             │
│  40 ┼                         *                │
│     │                      *                   │
│     │                   *                      │
│  30 ┼                *                         │
│     │             *                            │
│     │          *                               │
│  20 ┼       *                                  │
│     │    *                                     │
│     │ *                                        │
│  10 ┼                                          │
│     │                                          │
│     └┬─────┬─────┬─────┬─────┬─────┬─────┬────┐
│      0     5     10    15    20    25    30    │
│              Number of Connected Peers         │
└────────────────────────────────────────────────┘

ENet is designed to maintain reasonable performance up to its connection limit, with a gradual increase in resource usage as peer count grows.

Optimizing Performance

Configuration Parameters

The following configuration parameters can significantly affect network performance:

  1. ENet Configuration

    • Maximum connections
    • MTU size
    • Send/receive buffer sizes
    • ENet compression level
    • Congestion control parameters
  2. KAI Node Configuration

    • Update frequency
    • Connection timeouts
    • Peer discovery intervals
    • Command batch size

Command Execution Optimization

Optimize command execution performance with these strategies:

  1. Command Batching

    // Instead of sending multiple individual commands
    @peer command1
    @peer command2
    @peer command3
    
    // Batch them into a single command
    @peer batch command1; command2; command3
    
  2. Result Filtering

    // Specify only the fields needed to reduce response size
    @peer get_object --fields=name,position,status
    
  3. Selective Broadcasting

    // Only broadcast to peers that need the information
    @peers[group=workers] process_data
    
  4. Local Fallbacks

    // Execute locally if remote execution would be slower
    @peer[timeout=50ms] calculate || calculate_locally
    

Network Topology Considerations

The network topology significantly impacts performance:

┌─────────────────┐   ┌─────────────────┐
│                 │   │                 │
│  Mesh Topology  │   │  Star Topology  │
│                 │   │                 │
│    A───────B    │   │       A         │
│    │\      │    │   │      /│\        │
│    │ \     │    │   │     / │ \       │
│    │  \    │    │   │    /  │  \      │
│    │   \   │    │   │   /   │   \     │
│    │    \  │    │   │  B    C    D    │
│    │     \ │    │   │  │    │    │    │
│    C──────D     │   │  E    F    G    │
│                 │   │                 │
└─────────────────┘   └─────────────────┘
  1. Mesh Topology

    • Every node connects directly to every other node
    • Lowest latency for direct communication
    • Highest connection overhead (n² connections)
    • Best for smaller networks (<10 nodes)
  2. Star Topology

    • Central hub node with spoke connections
    • Reduced connection count (n connections)
    • Single point of failure
    • Additional hop for peer-to-peer communication
  3. Hybrid Topology

    • Clusters of nodes in mesh arrangement
    • Designated bridge nodes between clusters
    • Balances connection count and latency
    • Good for medium to large networks

Data Transmission Optimization

Optimize data transmission with these techniques:

  1. Object Caching

    • Cache frequently used objects locally
    • Use version numbers to detect changes
    • Refresh cache only when necessary
  2. Incremental Updates

    • Send only changed fields of objects
    • Use delta compression for large objects
    • Track object versions for change detection
  3. Compression

    • Use ENet's built-in compression for efficient transmission
    • Consider custom compression for specific data types
    • Balance compression ratio with CPU overhead
  4. Prioritization

    • Assign priorities to different message types
    • Ensure critical messages are sent first
    • Use lower priorities for bulk data transfers

Performance Testing

Benchmarking Methodology

A comprehensive benchmarking approach includes:

  1. Latency Testing

    • Measure round-trip time for commands
    • Test with varying payload sizes
    • Measure under different network conditions
  2. Throughput Testing

    • Determine maximum commands per second
    • Measure with different command complexities
    • Test with multiple concurrent clients
  3. Scalability Testing

    • Measure performance with increasing peer counts
    • Test different network topologies
    • Evaluate resource usage scaling

Sample Benchmark Script

#!/bin/bash
# Network performance test script

# Start 10 peer nodes in a mesh configuration
for i in {1..10}; do
  port=$((14590 + $i))
  ./Bin/NetworkPeer "config/bench_peer${i}.json" &
  PEER_PIDS[$i]=$!
  sleep 0.5
done

echo "All peers started, waiting for connections..."
sleep 5

# Run a series of benchmarks
echo "Starting latency test..."
for i in {1..100}; do
  start=$(date +%s%N)
  result=$(./Bin/CommandClient "127.0.0.1:14591" "echo ping")
  end=$(date +%s%N)
  echo "$i: $((($end - $start)/1000000)) ms"
done

echo "Starting throughput test..."
start=$(date +%s%N)
for i in {1..1000}; do
  ./Bin/CommandClient "127.0.0.1:14591" "noop" &>/dev/null
done
end=$(date +%s%N)
echo "1000 commands in $((($end - $start)/1000000)) ms"
echo "Throughput: $(echo "scale=2; 1000000 / $((($end - $start)/1000000))" | bc) cmds/sec"

# Clean up
for pid in ${PEER_PIDS[@]}; do
  kill $pid
done

echo "Benchmark complete"

Visualization and Analysis

Performance data can be visualized to identify bottlenecks:

┌────────────────────────────────────────────────┐
│                                                │
│  Command Throughput by Payload Size            │
│                                                │
│  10000 ┼   *                                   │
│        │    \                                  │
│        │     \                                 │
│   7500 ┼      \                               │
│        │       \                              │
│        │        \                             │
│   5000 ┼         *                            │
│        │          \                           │
│        │           \                          │
│   2500 ┼            *                         │
│        │             \                        │
│        │              *──────*──────*         │
│      0 ┼                                      │
│        │                                      │
│        └┬──────┬──────┬──────┬──────┬──────┬─┐
│         0B    10KB    100KB   1MB    10MB     │
│                   Payload Size                 │
└────────────────────────────────────────────────┘

Performance Monitoring

Key Metrics to Monitor

  1. Connection Metrics

    • Active connections
    • Connection attempts
    • Failed connections
    • Connection timeouts
  2. Traffic Metrics

    • Packets sent/received
    • Bytes sent/received
    • Packet loss rate
    • Retransmission rate
  3. Command Metrics

    • Commands executed
    • Command execution time
    • Command failure rate
    • Command queue length
  4. Resource Usage

    • CPU utilization
    • Memory consumption
    • Network bandwidth
    • Thread count

Implementing Monitoring

// Example monitoring integration
class NetworkMonitor {
public:
    static void RecordLatency(const std::string& command, int64_t microseconds) {
        // Record command execution latency
    }
    
    static void RecordTraffic(size_t bytesSent, size_t bytesReceived) {
        // Record traffic statistics
    }
    
    static void RecordConnectionEvent(const std::string& event) {
        // Record connection events
    }
    
    static void ExportMetrics(const std::string& filename) {
        // Export metrics to file
    }
};

Performance Tuning Guidelines

General Guidelines

  1. Start Simple

    • Begin with default configuration
    • Measure baseline performance
    • Identify bottlenecks before optimizing
  2. Profile Before Optimizing

    • Use profiling tools to identify slowdowns
    • Focus on the most significant bottlenecks first
    • Measure improvements after each change
  3. Test in Realistic Conditions

    • Test with expected network latency and packet loss
    • Simulate real-world usage patterns
    • Include error conditions and recovery scenarios

Environment-Specific Tuning

  1. Local Network

    • Increase update frequency for lower latency
    • Use higher throughput settings
    • Minimize compression to reduce CPU usage
  2. Wide Area Network

    • Increase timeouts for higher latency
    • Use more aggressive compression
    • Implement command batching to reduce round trips
  3. Internet Deployment

    • Implement reconnection handling
    • Use more conservative timeouts
    • Enable all reliability features

Resource Constrained Environments

  1. Low Memory

    • Reduce buffer sizes
    • Limit maximum connections
    • Use incremental object updates
  2. Low CPU

    • Disable compression for simple data
    • Reduce update frequency
    • Batch processing where possible
  3. Limited Bandwidth

    • Enable maximum compression
    • Implement aggressive filtering
    • Use delta updates for changed data only

Advanced Performance Techniques

Load Balancing

Implement load balancing across peers:

┌─────────┐     ┌─────────┐     ┌─────────┐
│ Peer A  │     │ Peer B  │     │ Peer C  │
└─────────┘     └─────────┘     └─────────┘
     │               │               │
     │  Task 1 (30%) │  Task 2 (20%) │ Task 3 (50%)
     │               │               │
  1. Workload Distribution

    • Assign tasks based on peer capabilities
    • Monitor peer load and adjust distribution
    • Implement work stealing for dynamic balancing
  2. Specialization

    • Designate peers for specific roles
    • Optimize peers for their specific tasks
    • Route related commands to specialized peers

Asynchronous Command Execution

Implement asynchronous execution for better throughput:

┌─────────┐                 ┌─────────┐
│ Peer A  │                 │ Peer B  │
└─────────┘                 └─────────┘
     │                           │
     │ Command (TaskID=123)      │
     │──────────────────────────>│
     │                           │
     │ Accepted (TaskID=123)     │
     │<──────────────────────────│
     │                           │ Processing
     │ Other operations...       │
     │                           │
     │ Result (TaskID=123)       │
     │<──────────────────────────│
     │                           │
  1. Task Tracking

    • Assign unique IDs to each command
    • Return immediate acknowledgment
    • Deliver results when available
  2. Progress Updates

    • Provide intermediate progress for long-running tasks
    • Allow cancellation of in-progress tasks
    • Implement timeout handling

Intelligent Command Routing

┌─────────┐     ┌─────────┐     ┌─────────┐
│ Peer A  │     │ Peer B  │     │ Peer C  │
└─────────┘     └─────────┘     └─────────┘
     │               │               │
     │ Command       │               │
     │───────────────┼───────────────┘
     │               │
     │               │ Has Resource?
     │               │
     │               │ Yes, Execute
     │               │
     │ Result        │
     │<──────────────┘
  1. Resource-Based Routing

    • Route commands to peers with required resources
    • Consider peer capabilities and current load
    • Implement fallbacks if preferred peer is unavailable
  2. Geography-Based Routing

    • Route to nearest peer for latency-sensitive operations
    • Consider network topology in routing decisions
    • Implement region-aware command distribution

Related Documentation