feat: learned routing weights from human-labeled junctions #211

Description

@jameshgrn

Summary

Replaced the handcrafted lexicographic 3-tuple (effective_width, log_facc, pathlen) junction scoring with a weighted scalar score learned from 1,967 human-labeled junction decisions.

Methodology

  1. Training data: 1,967 unique junctions from 906 backwater QC paths, labeled by JHG. At each junction, the human selected the correct upstream branch.
  2. Feature engineering: Pairwise log1p-difference features comparing two candidate branches (A vs B): log1p(attr_A) - log1p(attr_B) for width, facc, slope, pathlen_hw; integer diff for stream_order.
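  3. Model: Logistic regression on mirrored pairwise features (no regularization). Because every feature is a difference between the two branches, the fitted linear model is comparing one per-branch score against another, so its coefficients carry over directly as routing weights.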
  4. Validation: Leave-junction-out GroupKFold CV: 89.7% junction accuracy vs 88.3% for old 3-tuple.
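A minimal sketch of steps 2 and 3, to make the pairwise encoding concrete. The dict keys, function names, and sample values below are hypothetical and not taken from the pipeline; only the log1p-difference and integer-difference recipe comes from the description above.

```python
import math

# Attributes compared on a log1p scale; stream_order uses a plain integer difference.
LOG_FEATURES = ("width", "facc", "slope", "pathlen_hw")

def pairwise_features(a, b):
    """Feature vector for 'branch a vs branch b' at one junction."""
    feats = [math.log1p(a[k]) - math.log1p(b[k]) for k in LOG_FEATURES]
    feats.append(a["stream_order"] - b["stream_order"])
    return feats

def mirrored_rows(chosen, rejected):
    """Return both orderings of one labeled pair as (features, label) rows."""
    forward = pairwise_features(chosen, rejected)
    backward = [-x for x in forward]  # same pair with A and B swapped
    return [(forward, 1), (backward, 0)]

# Hypothetical junction: the labeler picked the wider, flatter branch.
chosen = {"width": 120.0, "facc": 5.0e4, "slope": 0.0002,
          "pathlen_hw": 8.0e5, "stream_order": 6}
rejected = {"width": 40.0, "facc": 1.2e4, "slope": 0.0010,
            "pathlen_hw": 3.0e5, "stream_order": 4}
rows = mirrored_rows(chosen, rejected)
```

Mirroring gives every junction one positive and one negative row, so the classes are balanced by construction. For the leave-junction-out CV, both rows of a junction would need to share a group ID so they never straddle a train/test split.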

Learned Weights

score = 1.972*log1p(ew) + 0.227*log1p(facc) - 0.228*log1p(slope) + 0.234*log1p(pathlen) + 0.288*stream_order

| Feature | Weight | Share | Note |
|---|---|---|---|
| effective_width | +1.972 | 67% | SWOT-preferred (n_obs≥5), else GRWL |
| stream_order | +0.288 | 10% | new; not in old 3-tuple |
| pathlen | +0.234 | 8% | log1p-transformed cumulative path length |
| slope | -0.228 | 8% | new, negative: prefers lower gradient (mainstem) |
| facc | +0.227 | 8% | log1p(flow accumulation) |
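As an illustration, the formula above could be applied as follows. `ROUTING_WEIGHTS` and `routing_score()` are the names listed under Files, but the dict layout, the function signature, and the sample reaches here are assumptions rather than the actual code in `graph.py`. The helper `weight_shares` is hypothetical; it shows that the Share column matches each weight's fraction of the total absolute weight.

```python
import math

# Weights copied from this issue; the dict layout and signature are assumptions.
ROUTING_WEIGHTS = {
    "effective_width": 1.972,
    "facc": 0.227,
    "slope": -0.228,
    "pathlen": 0.234,
    "stream_order": 0.288,
}

def routing_score(reach):
    """Weighted scalar score for one candidate reach (higher is preferred)."""
    w = ROUTING_WEIGHTS
    return (
        w["effective_width"] * math.log1p(reach["effective_width"])
        + w["facc"] * math.log1p(reach["facc"])
        + w["slope"] * math.log1p(reach["slope"])
        + w["pathlen"] * math.log1p(reach["pathlen"])
        + w["stream_order"] * reach["stream_order"]  # not log-transformed
    )

def weight_shares(weights):
    """Each weight's share of the total absolute weight, as a rounded percent."""
    total = sum(abs(v) for v in weights.values())
    return {k: round(100 * abs(v) / total) for k, v in weights.items()}

# Hypothetical candidates at one junction.
main = {"effective_width": 120.0, "facc": 5.0e4, "slope": 0.0002,
        "pathlen": 8.0e5, "stream_order": 6}
trib = {"effective_width": 40.0, "facc": 1.2e4, "slope": 0.0010,
        "pathlen": 3.0e5, "stream_order": 4}
best = max([main, trib], key=routing_score)
shares = weight_shares(ROUTING_WEIGHTS)
```

Note that the shares describe the weights, not the score itself: how much each term contributes at a given junction also depends on the spread of that feature between the candidate branches.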

Two new signals vs old 3-tuple

  • slope (negative weight): mainstem channels have lower gradients than tributaries. This is a fundamental geomorphic pattern the old ranking ignored entirely.
  • stream_order: higher-order reaches are preferred. It carries about 10% of the total absolute weight.

Impact

Pipeline rerun on all 6 regions:

  • Junction accuracy: 88.3% → 89.7% on 1,967 labeled junctions
  • Net improvement: +29 junctions (146 pipeline-wins vs 117 old-wins across the 263 junctions where the two rankings disagree)
  • Routing changes: ~61K best_headwater and ~85K best_outlet assignments changed across 248K reaches
  • Mainstem overlap: 99.9% agreement with old algorithm on major rivers (Mississippi, Amazon, Nile, Danube, Mekong, Murray)
  • All post-save gates passed (V001, V005, V007, V008, T001, T002)
  • 0 monotonicity violations in hydro_dist_out
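The junction numbers above can be cross-checked with a few lines of arithmetic. The variable names are illustrative; the counts are the ones reported in this issue.

```python
labeled = 1967                 # labeled junctions
new_wins, old_wins = 146, 117  # disagreements won by each ranking

disagreements = new_wins + old_wins  # junctions where the two rankings differ
net = new_wins - old_wins            # junctions gained by the learned score
gain_points = 100 * net / labeled    # accuracy gain in percentage points
```

This gives 263 disagreements, a net gain of 29 junctions, and roughly 1.5 percentage points, which is consistent with the rounded 88.3% → 89.7% figures.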

Files

  • Weights: src/sword_v17c_pipeline/stages/graph.py (ROUTING_WEIGHTS, routing_score())
  • Used by: stages/distances.py (best_headwater/outlet), stages/mainstem.py (mainstem walk + main neighbors)
  • Training labels: data/backwater_junction_labels_v003.parquet
  • GBM comparison model: data/routing_gbm_v3.joblib

GBM comparison

Also trained a GBM (gradient boosting) on the same data: 90.0% CV accuracy vs 89.7% for LogReg. The 0.3-percentage-point difference doesn't justify a model file dependency. LogReg weights are hardcoded, so there is no runtime dependency and the score is fully interpretable.

Error analysis

158 errors (8%) on labeled junctions fall into 3 categories:

  • CAT1 (42%): Human picked narrower + less facc (deltas, tidal — unlearnable from reach attributes)
  • CAT2 (40%): Width misleading, facc correct (lakes, wide side channels)
  • CAT3 (18%): Facc misleading, width correct (already handled)

The ~90% ceiling is structural — CAT1 errors require river name continuity or geographic knowledge no reach-level attribute can provide.
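One way the three categories could be assigned automatically is sketched below. This is an assumed reading of the category definitions, comparing the human-picked branch with the branch the model preferred; the function name, the dict keys, and the `OTHER` bucket are hypothetical and not part of the analysis above.

```python
def error_category(human, model):
    """Bucket a misrouted junction by comparing the human's pick with the model's pick."""
    wider = human["width"] > model["width"]
    more_facc = human["facc"] > model["facc"]
    if not wider and not more_facc:
        return "CAT1"   # human picked the narrower, lower-facc branch
    if not wider and more_facc:
        return "CAT2"   # width misleading, facc correct
    if wider and not more_facc:
        return "CAT3"   # facc misleading, width correct
    return "OTHER"      # width and facc both favor the human's pick

# Hypothetical delta junction: the labeled branch is smaller on both attributes.
example = error_category({"width": 80.0, "facc": 2.0e5},
                         {"width": 150.0, "facc": 9.0e5})
```

Ties on width or facc fall on the "not wider" / "not more facc" side in this sketch; a real analysis might want an explicit tolerance instead.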
