Refactor shard handling to use configurable shard number from config#62

Open
Salty-SourKey wants to merge 2 commits into main from configure_numshard

Conversation

@Salty-SourKey
Contributor

This pull request makes the number of shards (ShardNum) fully configurable throughout the codebase, replacing the previous use of a hardcoded constant (NumShards). The value is now read from the global configuration, allowing for dynamic adjustment of the sharding scheme without code changes. Key updates include removing the NumShards constant, updating all shard calculations to use the configuration value, and ensuring the configuration is accessible where needed.

Configuration refactor and usage:

  • Removed the NumShards constant and all its usages, centralizing the shard count in the config package as ShardNum within the global configuration ([[1]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-86825c80bb79bd22c948318cf94fe4b66e4665be4a3e6c9629f7ac48d1abca07L18-L21), [[2]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1204)).
  • Updated all shard assignment logic (e.g., address-to-shard mapping, transaction routing) to use config.GetConfig().ShardNum instead of the hardcoded constant ([[1]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L600-R600), [[2]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L672-R672), [[3]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1029-R1029), [[4]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1053-R1053), [[5]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1138-R1138), [[6]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1254-R1261), [[7]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1310-R1309), [[8]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-92e3f20f15139f3a0f08b79893a50c6a572c0a0a6113fa3646ecb27c24c1e601L329-R329)).
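The shard-assignment change described above can be sketched as a small, self-contained illustration. The last-byte-modulo mapping and the `ShardNum` field follow this PR; the local `configuration` variable and function shapes here are simplified stand-ins, not the repository's actual code:

```go
package main

import "fmt"

// Config mirrors the ShardNum field described in the PR; other fields omitted.
type Config struct {
	ShardNum int
}

// configuration stands in for the package-level global the PR adds.
var configuration = Config{ShardNum: 8}

// GetConfig returns a copy of the loaded configuration.
func GetConfig() Config { return configuration }

// shardForAddress maps an address to a shard by its last byte modulo the
// configurable shard count (previously a hardcoded constant).
func shardForAddress(addr []byte) int {
	return int(addr[len(addr)-1]) % GetConfig().ShardNum
}

func main() {
	addr := []byte{0xde, 0xad, 0xbe, 0xef}
	fmt.Println(shardForAddress(addr)) // 0xef = 239; 239 % 8 = 7
}
```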

Configuration access and initialization:

  • Added a global Configuration variable and a GetConfig() accessor in config/config.go to provide easy access to the loaded configuration throughout the application ([config/config.goR24-R29](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-fe44f09c4d5977b5f5eaea29170b6a0748819c9d02271746a20d81a5f3efca17R24-R29)).
  • Updated constructors and function calls that previously required NumShards to instead use the value from the configuration ([[1]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-ac581fb50a47860df636cb782d9b0d799cb03fc7a2269862499bad2a2d549516L58-R59), [[2]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-ce0efdc296e84775ac5f762e0e12b55f9b13c4b0232acb70120618501e543d0eL652-R652), [[3]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L588-R588), [[4]](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-f46d0f5ecb747bfa1413f2a709dd83173502c13e32221cd521e07d58bb1dcbe9L1289-R1288)).

Imports and cleanup:

  • Added necessary imports for the config package in files that now require access to the configuration ([internal/orchestrator/simulator.goR14](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-ac581fb50a47860df636cb782d9b0d799cb03fc7a2269862499bad2a2d549516R14)).
  • Minor whitespace cleanup in statedb.go ([internal/orchestrator/statedb.goL713](https://github.com/The-Sharding-Resurrection/Matrix/pull/62/files#diff-86825c80bb79bd22c948318cf94fe4b66e4665be4a3e6c9629f7ac48d1abca07L713)).

@claude

claude bot commented Jan 22, 2026

Code Review: Configurable Shard Number Refactor

Thank you for this PR! The goal of making shard count configurable is valuable. However, there's a critical bug that will cause runtime panics in tests and potentially in production.


🔴 Critical Issue: Uninitialized Global Configuration

Problem: The PR introduces a global config.Configuration variable that is never initialized. All calls to config.GetConfig() will return a zero-valued Config struct where ShardNum = 0, causing division-by-zero panics in modulo operations.

Location: config/config.go:24-28

var Configuration Config  // ⚠️ Never set anywhere!

func GetConfig() Config {
    return Configuration  // Always returns {ShardNum: 0, ...}
}

Impact:

  • TestShardAssignment and other tests will panic: shard := int(addr[len(addr)-1]) % config.GetConfig().ShardNum → division by zero
  • All shard assignment logic (internal/shard/server.go:600, 672, 1029, 1053, 1138, 1254, 1261, 1310) will fail
  • Orchestrator simulator won't work (internal/orchestrator/simulator.go:59)

Root Cause: Neither cmd/shard/main.go nor cmd/orchestrator/main.go sets config.Configuration after loading the config file. The config is loaded but discarded.


🔧 Recommended Fix

Option 1: Initialize the global in Load functions (recommended)

// config/config.go
func Load(configPath string) (*Config, error) {
    // ... existing code ...
    if err := json.Unmarshal(data, cfg); err != nil {
        return nil, fmt.Errorf("failed to parse config file: %w", err)
    }
    
    Configuration = *cfg  // ✅ Set global
    return cfg, nil
}

func LoadDefault() (*Config, error) {
    // ... existing code ...
    cfg, err := Load(path)
    if err == nil {
        Configuration = *cfg  // ✅ Set global (or rely on Load())
        return cfg, nil
    }
    // ...
}

Option 2: Initialize in main entry points

In both cmd/shard/main.go and cmd/orchestrator/main.go:

cfg, err := config.LoadDefault()
if err != nil {
    log.Fatal(err)
}
config.Configuration = *cfg  // ✅ Set global

For tests: Add initialization in test setup:

// internal/shard/shard_test.go
func setupTestServer(t *testing.T, shardID int, orchestratorURL string) *Server {
    t.Helper()
    // Initialize config for tests
    config.Configuration = config.Config{ShardNum: 8}  // ✅ Match test expectations
    return NewServerForTest(shardID, orchestratorURL, config.NetworkConfig{})
}

📋 Other Issues

  1. Inconsistent initialization in main entry points

    • cmd/shard/main.go:39: Config is loaded but only NetworkConfig is extracted
    • cmd/orchestrator/main.go:49: Still uses numShards from flags/env instead of config
    • These should align: if config exists and has ShardNum, use it; otherwise fall back to flags/env
  2. Missing validation

    • No validation that ShardNum > 0 in config.Load()
    • Could add: if cfg.ShardNum <= 0 { return nil, fmt.Errorf("shard_num must be > 0") }
  3. Whitespace-only changes

    • internal/orchestrator/statedb.go: Many whitespace changes make the diff harder to review
    • While formatting improvements are good, consider separating them into a dedicated formatting commit
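The alignment suggested in item 1 could be captured in a small precedence helper: config wins when set, otherwise fall back to the flag/env value. This is a sketch only; the function name, parameters, and the default of 8 are assumptions for illustration:

```go
package main

import "fmt"

// resolveShardNum picks the shard count: the config value wins when set,
// then the flag/env value, then a default matching the old constant.
func resolveShardNum(cfgShardNum, flagShardNum int) int {
	if cfgShardNum > 0 {
		return cfgShardNum // config is the source of truth when present
	}
	if flagShardNum > 0 {
		return flagShardNum // fall back to flags/env
	}
	return 8 // last-resort default
}

func main() {
	fmt.Println(resolveShardNum(6, 4)) // config set: 6
	fmt.Println(resolveShardNum(0, 4)) // fall back to flag: 4
	fmt.Println(resolveShardNum(0, 0)) // default: 8
}
```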

✅ What Works Well

  1. Complete migration: All NumShards constant references successfully removed
  2. Consistent pattern: Uses config.GetConfig().ShardNum uniformly across codebase
  3. Clean API: GetConfig() provides centralized access

🧪 Testing Recommendations

Before merging:

  1. Run go test ./... to verify tests pass (currently they will panic)
  2. Add a test in config/config_test.go that verifies LoadDefault() sets Configuration
  3. Test docker-compose startup to ensure config flows correctly to all services
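The recommended config test boils down to checking that loading publishes the global. A self-contained sketch of that behavior (struct shape, `shard_num` JSON key, and validation follow this review; parsing from bytes instead of a file path is a simplification):

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Config struct {
	ShardNum int `json:"shard_num"`
}

// Configuration is the package-level global; Load must publish into it.
var Configuration Config

func GetConfig() Config { return Configuration }

// Load parses JSON config bytes, validates ShardNum, and sets the global.
func Load(data []byte) (*Config, error) {
	cfg := &Config{}
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, fmt.Errorf("failed to parse config file: %w", err)
	}
	if cfg.ShardNum <= 0 {
		return nil, fmt.Errorf("invalid shard_num: must be > 0, got %d", cfg.ShardNum)
	}
	Configuration = *cfg // publish so GetConfig() sees the loaded value
	return cfg, nil
}

func main() {
	if _, err := Load([]byte(`{"shard_num": 8}`)); err != nil {
		panic(err)
	}
	fmt.Println(GetConfig().ShardNum) // 8
}
```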

📝 Documentation Updates Needed

Per CLAUDE.md requirements, please update:

  • docs/architecture.md: Document the global config pattern and initialization flow
  • docs/TODO.md: Check off the NumShards configurability item if it exists
  • README.md: Update configuration examples to show shard_num usage

Summary

Cannot merge until the initialization bug is fixed. Once Configuration is properly set, the refactor is well-executed and will enable dynamic shard count configuration. Please fix the critical issue and re-test before merging.

Let me know if you need any clarification on the fix!

@claude

claude bot commented Jan 22, 2026

Code Review - PR #62: Refactor shard handling to use configurable shard number

Thank you for this refactoring! The goal of making the shard count configurable is excellent. However, I've identified several critical issues that need to be addressed before merging.


🚨 Critical Issues

1. Uninitialized Configuration in Tests (Race Condition & Panics)

Problem: The global Configuration variable is accessed via config.GetConfig().ShardNum throughout the codebase, but tests never initialize it. This causes:

  • Zero value bugs: ShardNum defaults to 0, causing division-by-zero panics in modulo operations
  • Race conditions: Multiple tests running concurrently may have undefined behavior

Affected locations:

  • internal/shard/shard_test.go:329 - TestShardAssignment will panic
  • internal/orchestrator/simulator_test.go - Any test using the simulator
  • All tests that indirectly call shard assignment logic

Fix required:

// Add to test files or a test_utils.go
func init() {
    config.Configuration = config.Config{
        ShardNum: 8,  // or load from test config
        // ... other fields
    }
}

Or better yet, add a test helper:

func setupTestConfig(t *testing.T) {
    t.Helper()
    config.Configuration = config.Config{
        ShardNum: 8,
        StorageDir: t.TempDir(),
        TestAccountNum: 100,
    }
}

2. Orchestrator Still Uses Parameter Instead of Config

Problem: The orchestrator's main.go passes numShards as a parameter to NewService(), but the service should be reading from the global config like the shards do.

Location: cmd/orchestrator/main.go:49

Current (inconsistent):

service, err := orchestrator.NewService(*numShards, *bytecodePath, networkConfig)

Should be:

// Ensure config is loaded before creating service
cfg, err := config.LoadDefault()
if err != nil {
    log.Fatalf("Failed to load config: %v", err)
}

service, err := orchestrator.NewService(cfg.ShardNum, *bytecodePath, networkConfig)
// OR refactor NewService to read from config.GetConfig().ShardNum internally

Why this matters: Currently, the orchestrator and shards can have different shard counts, breaking the entire system. The config must be the single source of truth.


3. Thread Safety Concern: Global Variable Access

Problem: config.GetConfig() returns a copy of the Configuration struct by value:

func GetConfig() Config {
    return Configuration  // Returns a copy
}

This is actually good for thread safety (prevents concurrent modification), but the initialization in Load() is not thread-safe:

Configuration = *cfg  // Unprotected write to global variable

Potential race condition:

  1. Main goroutine calls LoadDefault()
  2. Background goroutines (block producer, simulator) call GetConfig()
  3. If they interleave during startup, reads might see partial/corrupted config

Recommended fix:

var (
    Configuration Config
    configMu      sync.RWMutex
)

func GetConfig() Config {
    configMu.RLock()
    defer configMu.RUnlock()
    return Configuration
}

func Load(configPath string) (*Config, error) {
    // ... parse config ...
    
    configMu.Lock()
    Configuration = *cfg
    configMu.Unlock()
    return cfg, nil
}

Alternative (simpler): Use sync.Once to ensure config is loaded exactly once before any goroutines start.
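A `sync.Once` variant could look like the sketch below. The real accessor would invoke `Load`/`LoadDefault` inside the `Do`; a fixed value stands in here so the example is self-contained:

```go
package main

import (
	"fmt"
	"sync"
)

type Config struct {
	ShardNum int
}

var (
	configuration Config
	loadOnce      sync.Once
)

// GetConfig performs the load exactly once before any caller observes the
// value; later reads are race-free because the global is never written again.
func GetConfig() Config {
	loadOnce.Do(func() {
		// The real code would call Load/LoadDefault here.
		configuration = Config{ShardNum: 8}
	})
	return configuration
}

func main() {
	fmt.Println(GetConfig().ShardNum) // 8
}
```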


⚠️ Moderate Issues

4. Config File Has Wrong Default (6 shards vs 8)

Location: config/config.json:2

The config file says "shard_num": 6, but:

  • Docker Compose defines 8 shards (shard-0 through shard-7)
  • Previous hardcoded constant was 8
  • Test expectations assume 8 shards

Fix: Change to "shard_num": 8 to match infrastructure.


5. Missing Config Validation

The Load() function doesn't validate that ShardNum > 0. This can cause panics when used in modulo operations.

Recommended:

func Load(configPath string) (*Config, error) {
    // ... existing parsing ...
    
    if cfg.ShardNum <= 0 {
        return nil, fmt.Errorf("invalid shard_num: must be > 0, got %d", cfg.ShardNum)
    }
    
    Configuration = *cfg
    return cfg, nil
}

✅ Positive Aspects

  1. Complete removal of hardcoded constant - All 13 usages of NumShards were successfully replaced
  2. Clean separation of concerns - Config package is properly isolated
  3. Backward compatibility - The change is transparent to users; API doesn't change
  4. Good documentation - PR description clearly explains the changes

📋 Testing Recommendations

Before merging, please verify:

  1. Unit tests pass:

    go test ./... -v
  2. Integration tests with config.json:

    docker compose up --build
    python scripts/test_cross_shard.py
  3. Different shard counts work:

    • Test with shard_num: 4
    • Test with shard_num: 16
    • Verify address mapping works correctly for all counts
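The address-mapping check above can be table-driven: for each candidate shard count, the last-byte-modulo mapping must land in `[0, shardNum)`. A minimal sketch (helper name is hypothetical; the mapping rule follows the shard assignment logic quoted earlier):

```go
package main

import "fmt"

// shardFor maps an address's last byte onto [0, shardNum).
func shardFor(addr []byte, shardNum int) int {
	return int(addr[len(addr)-1]) % shardNum
}

func main() {
	addr := []byte{0xff} // worst-case last byte
	for _, n := range []int{4, 8, 16} {
		s := shardFor(addr, n)
		if s < 0 || s >= n {
			panic(fmt.Sprintf("shard %d out of range for shardNum %d", s, n))
		}
		fmt.Printf("shardNum=%d -> shard %d\n", n, s)
	}
}
```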

🔧 Suggested Action Plan

  1. Fix critical issue 1 (test initialization) - blocking
  2. Fix critical issue 2 (orchestrator consistency) - blocking
  3. Fix critical issue 3 (thread safety) - recommended
  4. Fix config file default (6 → 8) - recommended
  5. Add config validation - recommended
  6. Run full test suite
  7. Update documentation if needed (mention config initialization requirements)

Let me know if you'd like help implementing any of these fixes!
