skill: add divergence-regime proxy test for collinear signals #1

Open
erik683 wants to merge 1 commit into CSS-Electronics:master from erik683:pr-divergence-proxy

@erik683 erik683 commented Jul 1, 2026

**Hi! Erik here. Felt like I should drop a human line before Clanker Claude takes over again (no offense, we obviously all love the guy... but check out all those em-dashes!). We were running into problems where certain signals like RPM, torque, or MAP rise and fall in sync to an extent. From what I understand as a non-mathematics PhD student, the formulas used to find signals can sometimes (or often) get "confused" by this and fail to differentiate between them or, worse, mistake one for the other. Luckily there are still differences that will separate all signals if you look hard enough. Torque, for example, can go negative as you pull your foot off the gas and into coast. In that scenario it's clear we're not talking about a possible RPM signal any more!

I'm hopeful these additions to the reverse-engineering SKILL.md plus the helper scripts will mitigate false positives or negatives due to signals moving seemingly in tandem.**

Add divergence-regime proxy test for collinear signals

The problem

Some signals ride shotgun on the bus. Torque and airflow climb together on every normal pull, so if you just correlate candidate bytes against a torque reference, a byte carrying airflow (or RPM, or pedal position) will score just as well as the real torque byte — sometimes better. Global correlation can't tell "this is torque" from "this moves whenever torque moves." You can end up shipping a DBC entry for the wrong signal and not find out until it disagrees with reality out on the road.

The fix isn't a better formula — it's finding the moment the two signals actually disagree, and looking only there. Engine overrun (foot off the pedal, RPM still up) is that moment for torque vs. airflow: torque goes negative (engine braking), airflow stays low and positive. A byte that's really torque will follow torque into negative territory there. A byte that's really airflow won't.
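To make the argument concrete, here is a minimal sketch on made-up numbers (not data from this PR). It builds a synthetic pull followed by a synthetic overrun, then compares the torque-vs-airflow correlation over the whole trace with the correlation inside the overrun window. The channel names, the signal shapes, and the `rpm > 1800 and pedal < 2` window are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def synthetic_trace():
    """Made-up samples: a 50-sample pull followed by a 30-sample overrun."""
    rows = []
    for i in range(50):  # pull: pedal ramps up, torque and airflow climb together
        pedal = 2.0 * i
        rows.append({"rpm": 1500 + 40 * i, "pedal": pedal,
                     "torque": 2.0 * pedal + 10.0,
                     "airflow": 1.5 * pedal + 8.0})
    for i in range(30):  # overrun: pedal released, RPM still high and decaying
        rpm = 3500 - 50 * i
        rows.append({"rpm": rpm, "pedal": 0.0,
                     "torque": -0.01 * rpm,          # engine braking: negative
                     "airflow": 3.0 + 0.001 * rpm})  # low but positive
    return rows

def column(rows, name):
    return [r[name] for r in rows]

rows = synthetic_trace()
# Hypothetical overrun window, same shape as the condition quoted in this PR.
window = [r for r in rows if r["rpm"] > 1800 and r["pedal"] < 2]

global_r = pearson(column(rows, "torque"), column(rows, "airflow"))
window_r = pearson(column(window, "torque"), column(window, "airflow"))

print(f"whole trace   : r = {global_r:+.3f} over {len(rows)} samples")
print(f"overrun window: r = {window_r:+.3f} over {len(window)} samples")
```

On this toy data the whole-trace correlation is strongly positive, so an airflow byte would pass a global torque search, while inside the window the two signals move in opposite directions and the airflow byte no longer looks like torque.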

What this adds

filter_regime.py — slices a trace down to a physical operating window you define from co-variate channels (RPM, pedal, MAP, whatever you've already got sidecars for). Point it at your trace, your target sidecar, your co-variate sidecars, and a plain condition like "rpm > 1800 and pedal < 2". It hands back the trace and target restricted to just that window, tells you what % of the log survived, and prints each co-variate's range inside vs. outside the window so you can sanity-check the slice actually caught the regime you meant (not just noise). The --where parser only understands comparisons and boolean logic on the channels you bind — no code execution risk from a crafted expression.
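One way a restricted `--where` evaluator of this kind could work is to parse the expression with Python's `ast` module and walk only an allow-list of node types. The sketch below is an illustration of that idea, not the code in filter_regime.py; the function name `eval_where` and the exact set of supported operators are assumptions.

```python
import ast
import operator

# Comparison operators the sketch accepts; anything else is rejected.
_COMPARE = {
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
}

def eval_where(expr, channels):
    """Evaluate a condition like "rpm > 1800 and pedal < 2" for one sample.

    `channels` maps bound channel names to numeric values. Only comparisons,
    and/or/not, bound names, and numeric literals are allowed (hypothetical
    helper; the real tool may differ).
    """
    def ev(node):
        if isinstance(node, ast.BoolOp):
            values = [ev(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not ev(node.operand)
        if (isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub)
                and isinstance(node.operand, ast.Constant)):
            return -ev(node.operand)  # negative literal such as -5
        if isinstance(node, ast.Compare):
            left = ev(node.left)
            for op, right_node in zip(node.ops, node.comparators):
                if type(op) not in _COMPARE:
                    raise ValueError(f"unsupported operator: {type(op).__name__}")
                right = ev(right_node)
                if not _COMPARE[type(op)](left, right):
                    return False
                left = right  # supports chains like 1000 < rpm < 3000
            return True
        if isinstance(node, ast.Name):
            if node.id not in channels:
                raise ValueError(f"unbound channel: {node.id}")
            return channels[node.id]
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        # Calls, attribute access, subscripts, etc. all land here.
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return bool(ev(ast.parse(expr, mode="eval").body))

samples = [{"rpm": 2500, "pedal": 0.5}, {"rpm": 2500, "pedal": 40.0},
           {"rpm": 900, "pedal": 0.0}, {"rpm": 3100, "pedal": 1.0}]
kept = [s for s in samples if eval_where("rpm > 1800 and pedal < 2", s)]
print(f"retained {100.0 * len(kept) / len(samples):.0f}% of samples")
```

Because nothing is ever passed to `eval` or `exec`, an expression such as `__import__('os').system(...)` parses to a call node and is rejected with a `ValueError` instead of running.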

correlate.py --covariate: once you're inside that window, re-run correlate and hand it the co-variates too. For every candidate byte it now reports a margin: how much better it tracks your target, by Spearman rank correlation, than the best-fitting co-variate. Positive and it's following your target; negative and you've found a proxy wearing your target's clothes. It also flags proxy_suspect candidates: bytes with a high Spearman correlation against the reference but a low linear R^2, the fingerprint of a stand-in signal rather than the real one.
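The margin and the proxy_suspect flag can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code in correlate.py: the function names, the use of absolute correlations, and the 0.9 / 0.8 thresholds are all made up for the example, and the sample values are synthetic.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def margin(candidate, target, covariates):
    """|rho| against the target minus the best |rho| against any co-variate."""
    best_cov = max(abs(spearman(candidate, c)) for c in covariates.values())
    return abs(spearman(candidate, target)) - best_cov

def proxy_suspect(candidate, target, rho_min=0.9, r2_max=0.8):
    """High rank correlation but weak linear fit (thresholds are assumptions)."""
    rho = abs(spearman(candidate, target))
    r2 = pearson(candidate, target) ** 2
    return rho >= rho_min and r2 < r2_max

# Synthetic in-window samples: torque reference and an airflow co-variate.
torque = [-30, -25, -20, -15, -10, -5, 0, 5]
airflow = [6, 4, 7, 5, 8, 5.5, 6.5, 4.5]

real_torque_byte = [2 * t + 100 for t in torque]   # linear in torque
airflow_byte = [10 * a for a in airflow]           # linear in airflow
monotone_byte = [1, 2, 3, 4, 5, 6, 7, 1000]        # rank-tracks torque, not linear

for name, cand in [("real_torque_byte", real_torque_byte),
                   ("airflow_byte", airflow_byte),
                   ("monotone_byte", monotone_byte)]:
    m = margin(cand, torque, {"airflow": airflow})
    print(f"{name:17s} margin={m:+.2f} proxy_suspect={proxy_suspect(cand, torque)}")
```

In this toy setup the real torque byte gets a positive margin, the airflow byte a negative one, and the monotone-but-nonlinear byte trips the proxy_suspect check even though its rank correlation with the target is perfect.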

Bug fix (the one that mattered on this dataset) — torque is a signed value, and on overrun it crosses zero. Read as unsigned, that crossing looks like a huge discontinuity (two's-complement wraps to a big positive number), which tanks the correlation score. The tool was correctly picking the signed interpretation as the winner, but it was still computing the target-vs-co-variate margin from the earlier, worse unsigned score — so the real torque byte looked like a weaker match than it was, right in the one window we built this feature to check. Now, once the signed read wins, the lag search re-runs on the signed data so the score and the margin both come from the interpretation actually being kept.
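The zero-crossing effect is easy to reproduce. The sketch below assumes a 16-bit field and a tiny made-up torque series around zero; it scores the unsigned and signed reads against the reference, then keeps the series of the winning read so that any later score or margin is computed from that same read. The variable names and the choice of Pearson as the score are assumptions for illustration.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def to_signed(raw, bits):
    """Reinterpret an unsigned raw value as two's complement of `bits` width."""
    sign_bit = 1 << (bits - 1)
    return raw - (1 << bits) if raw & sign_bit else raw

# Reference torque crossing zero, and the raw 16-bit values on the bus.
reference = [-3, -2, -1, 0, 1, 2, 3]
raw = [v & 0xFFFF for v in reference]   # -1 is carried as 65535, and so on

reads = {
    "unsigned": raw,
    "signed": [to_signed(v, 16) for v in raw],
}
scores = {name: pearson(reference, series) for name, series in reads.items()}

# Pick the winning interpretation, then use ITS series for everything after.
winner = max(scores, key=lambda name: abs(scores[name]))
kept_series = reads[winner]

print("raw values  :", raw)
print("scores      :", {k: round(v, 3) for k, v in scores.items()})
print("winning read:", winner, kept_series)
```

Read as unsigned, the series jumps from 65535 to 0 at the crossing and the correlation with the reference turns negative; read as signed it matches the reference exactly. Computing a margin from the unsigned score while reporting the signed read as the winner would mix the two.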

Docs — a walkthrough of the divergence-regime workflow (build co-variate sidecars → define the regime → filter → re-correlate with --covariate), plus a note on reporting discipline: a finite search that doesn't find a signal is evidence, not proof. Report what regime you tested, what you found, and a recommendation — not a flat "this isn't broadcast."

Why it's worth having

Collinear signals are everywhere on a real bus — every load-related channel tends to move with every other load-related channel under normal driving. Up to now, separating them meant staring at plots and hoping to spot the divergence by eye. This gives you a repeatable, scriptable way to find the window where two signals disagree and let the data settle which byte is which, instead of trusting a global correlation that both signals will pass.

Separate a target from a co-variate it rides on (e.g. torque vs airflow)
by restricting the search to the operating window where they diverge:

- filter_regime.py: slice a trace + target sidecar to a physical regime
  defined from co-variate channels (safe --where evaluator, retained-%
  and inside/outside-range report).
- correlate.py --covariate: score each candidate against co-variates too
  and emit a target-vs-co-variate Spearman margin, plus a proxy_suspect
  flag (high Spearman / low R^2).
- Fix: when the signed read wins, re-run the lag search on it so score
  and margin use that read; a signed field's margin was computed from the
  unsigned score, which collapses across a two's-complement zero-crossing
  (torque on overrun) and biased the margin against the real target.
- Docs: divergence-regime recipe, and reporting discipline (report the
  evidence and a recommendation, not a flat "not broadcast" from a finite
  search).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>