bug: two portfolio-monitoring systems run simultaneously — race conditions and potential double-rebalancing #5

Description

@Uchechukwu-Ekezie

The backend starts two independent systems that both scan portfolios and trigger rebalancing on overlapping schedules. There is no coordination, locking, or deduplication between them, creating race conditions where the same portfolio can be rebalanced twice simultaneously.

The Two Systems

System 1: RebalancingService (legacy cron — monitoring/rebalancer.ts)

// monitoring/rebalancer.ts, line 20
start() {
    cron.schedule('*/2 * * * *', async () => {  // Every 2 minutes
        await this.checkAllPortfolios()
    })
}

System 2: AutoRebalancerService (queue-backed — services/autoRebalancer.ts)

// autoRebalancer.ts, line 17
private readonly CHECK_INTERVAL = 30 * 60 * 1000  // Every 30 minutes via BullMQ

Both Started in index.ts

// index.ts, lines 224-231
// Note: comment says "now queue-backed, no cron" but cron still runs
const rebalancingService = new RebalancingService(wss)
rebalancingService.start()   // ← Cron fires every 2 minutes

// ... later ...
await autoRebalancer.start() // ← BullMQ queue fires every 30 minutes

The comment on line 224 reads "// now queue-backed, no cron", which suggests RebalancingService was meant to be removed when AutoRebalancerService was introduced. It never was, and its cron still runs.

Failure Scenarios

Scenario A — Double rebalance:
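At T=0, the cron fires, finds that Portfolio X needs rebalancing, and enqueues a rebalance job. If the BullMQ check fires while that job is still processing, it also finds Portfolio X out of balance, because the first rebalance has not yet settled, and enqueues a second job. Both jobs execute, resulting in over-trading and an incorrect final allocation. Because the cron runs every 2 minutes, every BullMQ check falls close to a cron check, so this window is hit regularly rather than by rare coincidence.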
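The timing in Scenario A can be checked with a little arithmetic. The sketch below is illustrative only: it assumes both schedulers fire at fixed intervals measured from process start, and the 7.5-minute offset for the queue's first run is a made-up value. tickTimes and gapsToNearestCronTick are hypothetical helpers, not functions from the codebase.

```typescript
const MINUTE = 60 * 1000

// Times (ms from startup) at which a fixed-interval scheduler fires within [0, horizonMs].
function tickTimes(intervalMs: number, horizonMs: number, offsetMs: number = 0): number[] {
  const ticks: number[] = []
  for (let t = offsetMs; t <= horizonMs; t += intervalMs) {
    ticks.push(t)
  }
  return ticks
}

// For each queue tick, the distance (ms) to the closest cron tick.
function gapsToNearestCronTick(cronTicks: number[], queueTicks: number[]): number[] {
  return queueTicks.map((q) => Math.min(...cronTicks.map((c) => Math.abs(q - c))))
}

// Cron every 2 minutes; queue every 30 minutes with an assumed 7.5-minute startup offset.
const cronTicks = tickTimes(2 * MINUTE, 60 * MINUTE)
const queueTicks = tickTimes(30 * MINUTE, 60 * MINUTE, 7 * MINUTE + 30 * 1000)
const gaps = gapsToNearestCronTick(cronTicks, queueTicks)
```

Under these assumptions each queue-triggered check lands within one minute of a cron-triggered check (half the cron interval at worst), so any rebalance that takes longer than that gap to settle is seen by both systems as unresolved drift.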

Scenario B — Duplicate database writes:
Both systems independently call recordRebalanceEvent(), inserting two records for the same rebalance into rebalance_events. The history endpoint returns duplicate entries.
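One way to make the write in Scenario B idempotent is to derive a deduplication key for each event and reject a second insert with the same key. The sketch below uses an in-memory map as a stand-in for rebalance_events. RebalanceEventLog, the event shape, and the time-bucket key are all assumptions for illustration; in a real database the equivalent would likely be a unique constraint on the key columns. The bucket is a heuristic and can still admit two events that straddle a bucket boundary.

```typescript
// Hypothetical event shape; the real recordRebalanceEvent() payload may differ.
interface RebalanceEvent {
  portfolioId: string
  triggeredAt: number // epoch milliseconds
  trigger: 'cron' | 'queue'
}

class RebalanceEventLog {
  private readonly events = new Map<string, RebalanceEvent>()
  private readonly windowMs: number

  constructor(windowMs: number) {
    this.windowMs = windowMs
  }

  // Returns true if the event was stored, false if it was treated as a duplicate.
  record(event: RebalanceEvent): boolean {
    // Events for the same portfolio in the same time bucket share a key.
    const bucket = Math.floor(event.triggeredAt / this.windowMs)
    const key = `${event.portfolioId}:${bucket}`
    if (this.events.has(key)) {
      return false
    }
    this.events.set(key, event)
    return true
  }

  history(): RebalanceEvent[] {
    return Array.from(this.events.values())
  }
}
```

With a five-minute window, a cron-triggered event and a queue-triggered event for the same portfolio one minute apart collapse into a single history entry, while events for other portfolios are unaffected.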

Scenario C — Conflicting circuit breaker states:
A cron-triggered rebalance opens a circuit breaker. The queue-triggered rebalance then sees the circuit as open and skips its run, even though it played no part in tripping it. Neither system knows that the other changed the breaker's state, or why.

Steps to Identify

grep -n "rebalancingService\|autoRebalancer" backend/src/index.ts
# Lines 224-231: both are started sequentially with no exclusion
grep -rn "checkAllPortfolios\|portfolioCheckWorker" backend/src/
# Both scan the same portfolios table

Proposed Fix

Option A (Recommended): Remove RebalancingService entirely since AutoRebalancerService is the intended replacement. Delete monitoring/rebalancer.ts and remove its startup from index.ts.

Option B: If RebalancingService has functionality not present in AutoRebalancerService (e.g., WebSocket broadcast on drift detection), extract that behavior into the queue-backed system and then remove the cron.
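For Option B, the cron-only behavior could be pulled out behind a small interface that the queue-backed worker calls. The sketch below is a guess at the shape, not the existing code: DriftNotifier, Broadcaster, the drift_detected message type, and the threshold parameter are all hypothetical, and the real WebSocket server (wss) would need a thin adapter to satisfy Broadcaster.

```typescript
// Hypothetical shapes; the real services may expose different names and payloads.
interface DriftEvent {
  portfolioId: string
  driftPercent: number
}

// Minimal abstraction over whatever delivers messages to clients (e.g. a WebSocket server).
interface Broadcaster {
  broadcast(message: string): void
}

class DriftNotifier {
  private readonly broadcaster: Broadcaster
  private readonly thresholdPercent: number

  constructor(broadcaster: Broadcaster, thresholdPercent: number) {
    this.broadcaster = broadcaster
    this.thresholdPercent = thresholdPercent
  }

  // Broadcasts when drift meets the threshold; returns true if a message was sent.
  notifyIfDrifted(event: DriftEvent): boolean {
    if (event.driftPercent < this.thresholdPercent) {
      return false
    }
    this.broadcaster.broadcast(JSON.stringify({ type: 'drift_detected', ...event }))
    return true
  }
}
```

Keeping the notifier independent of the scheduler means the queue-backed worker can call it after each portfolio check, and the cron can then be removed without losing the broadcast.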

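Option C (Minimum viable): Add a distributed lock (Redis SET NX EX) around portfolio checks so only one system can scan at a time. This prevents concurrent execution without requiring a refactor, but it does not stop the two systems from rebalancing the same portfolio in back-to-back runs. The lock value should be a unique token that is checked on release; otherwise a check that outlives the 120-second expiry would delete a lock that the other system now holds.

// In both systems' check loops (argument order assumes an ioredis-style client):
import { randomUUID } from 'node:crypto'

const token = randomUUID() // Identifies this lock holder
const lock = await redis.set('portfolio-check-lock', token, 'EX', 120, 'NX')
if (!lock) return // Another check is in progress
try {
    await checkAllPortfolios()
} finally {
    // Release only if we still hold the lock (it may have expired and been re-acquired)
    const release = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) else return 0 end"
    await redis.eval(release, 1, 'portfolio-check-lock', token)
}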

Files Affected

  • backend/src/index.ts — lines 224–231
  • backend/src/monitoring/rebalancer.ts — remove or repurpose
  • backend/src/services/autoRebalancer.ts — add deduplication if needed
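If the cron path stays in place for any period, the deduplication mentioned for autoRebalancer.ts could start as a per-portfolio in-flight guard. The following is a minimal sketch under stated assumptions: InFlightRegistry and rebalanceOnce are hypothetical names, and the guard only works within a single Node.js process. It would cover two systems started from the same index.ts, but not multiple backend instances or workers in separate processes, which would need a shared store such as Redis.

```typescript
// Tracks which portfolios currently have a rebalance running in this process.
class InFlightRegistry {
  private readonly inFlight = new Set<string>()

  // Returns true if the caller now owns the claim, false if a rebalance is already running.
  tryClaim(portfolioId: string): boolean {
    if (this.inFlight.has(portfolioId)) {
      return false
    }
    this.inFlight.add(portfolioId)
    return true
  }

  release(portfolioId: string): void {
    this.inFlight.delete(portfolioId)
  }
}

// Runs the rebalance only if no other rebalance for this portfolio is in progress.
// Resolves to true if the rebalance ran, false if it was skipped as a duplicate.
async function rebalanceOnce(
  registry: InFlightRegistry,
  portfolioId: string,
  rebalance: () => Promise<void>
): Promise<boolean> {
  if (!registry.tryClaim(portfolioId)) {
    return false
  }
  try {
    await rebalance()
    return true
  } finally {
    // Always release, even if the rebalance throws, so the portfolio is not blocked forever.
    registry.release(portfolioId)
  }
}
```

Because tryClaim is synchronous, the check and the claim happen with no await between them, so two callers in the same event loop cannot both win the claim for one portfolio.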

Metadata

Labels

GrantFox OSS, Maybe Rewarded, Official Campaign
