Skip to content

Optimize conntrack monitor time window.#1794

Merged
jasonlyc merged 1 commit intofirewalla:masterfrom
chgchi:my_dev
Mar 13, 2026
Merged

Optimize conntrack monitor time window.#1794
jasonlyc merged 1 commit intofirewalla:masterfrom
chgchi:my_dev

Conversation

@chgchi
Copy link
Contributor

@chgchi chgchi commented Mar 10, 2026

use nodejs spawn + kill child when any data matched.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the WAN “fallback connectivity” check that monitors conntrack -E updates, aiming to shorten average detection time by streaming output via child_process.spawn and terminating the monitor as soon as qualifying traffic is observed.

Changes:

  • Replace the exec(timeout conntrack | grep ...) pipeline with a spawn('sudo', ['conntrack', ...]) stream parser that can stop early on first match.
  • Simplify conntrack monitoring window logic by removing adaptive timing and using a single CONNTRACK_MONITOR_SEC_DEFAULT value (now 10s).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +1558 to +1559
child.stderr.on('data', () => {});
child.on('error', () => finish(false));
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

child.stderr is fully ignored, which makes it hard to diagnose common failure modes (e.g., sudo not permitted, conntrack missing, or kernel permissions). Capturing a small amount of stderr (or logging it at debug level) would make fallback behavior more observable and help distinguish “no traffic seen” from “monitor failed to run.”

Copilot uses AI. Check for mistakes.
Comment on lines +1522 to +1525
return new Promise((resolve) => {
const child = spawn('sudo', [
'conntrack', '-E', '-e', 'UPDATE', '--reply-dst', replyDst
], { stdio: ['ignore', 'pipe', 'pipe'] });
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new spawn/stream parsing + early-kill logic introduced here is behaviorally significant and currently has no targeted tests. Add unit tests that stub child_process.spawn to validate: resolves true on first matching line (public dst=), ignores private dst, handles stdout chunk/line splitting, resolves false on timeout/error/close, and terminates the child when finishing early to avoid lingering conntrack -E processes.

Copilot generated this review using guidance from organization custom instructions.
Comment on lines 1575 to 1578
const results = await Promise.all(
replyDsts.map(d => this._checkWanStatusByConntrackForReplyDst(d, monitorSec))
replyDsts.map(d => this._checkWanStatusByConntrackForReplyDst(d, CONNTRACK_MONITOR_SEC_DEFAULT))
);
return results.some(Boolean);
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Promise.all(...) here means _checkWanStatusByConntrack() will still wait for every per-IP conntrack monitor to finish/timeout, even if one replyDst already matched and resolved true. That largely negates the intended optimization (early exit) and can add up to CONNTRACK_MONITOR_SEC_DEFAULT delay to WAN checks. Consider resolving as soon as any monitor returns true (e.g., race/any on truthy) and cancelling the remaining children.

Copilot uses AI. Check for mistakes.
Comment on lines +1530 to +1532
// Process runs as pi; sudo/conntrack run as root so we can't kill via process.kill(-pgid). Use sudo pkill.
const pkill = spawn('sudo', [
'pkill', '-9', '-f', `conntrack -E -e UPDATE --reply-dst ${replyDst}`
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sudo pkill -9 -f ...${replyDst} is risky here: pkill -f treats the pattern as a regex, so the dots in an IPv4 address will match any character and could kill unrelated conntrack -E monitors. It can also interfere with concurrent checks for the same replyDst by killing the other instance. Safer approach is to terminate by PID (e.g., sudo kill/sudo kill -9 using child.pid) or otherwise make the match exact/escaped and scoped to just this spawned process.

Suggested change
// Process runs as pi; sudo/conntrack run as root so we can't kill via process.kill(-pgid). Use sudo pkill.
const pkill = spawn('sudo', [
'pkill', '-9', '-f', `conntrack -E -e UPDATE --reply-dst ${replyDst}`
// Process runs as pi; sudo/conntrack run as root so we can't kill via process.kill(-pgid). Use sudo pkill on the child PID.
const pkill = spawn('sudo', [
'pkill', '-9', '-P', String(child.pid)

Copilot uses AI. Check for mistakes.
@chgchi chgchi force-pushed the my_dev branch 2 times, most recently from 4041083 to 7e4d4c9 Compare March 11, 2026 02:39
@MelvinTo
Copy link
Contributor

Re-reviewed this PR (latest commit 7e4d4c9).

Overall direction (streaming + early match) is good, but I found a couple of correctness issues that should be fixed before merge.

Critical Issues

[plugins/interface/intf_base_plugin.js:1529]
pkill arguments are ordered incorrectly:

spawn('sudo', ['pkill', '-P', '-9', child.pid], ...)

For pkill, -P expects parent PID list immediately after it. With current argv, -9 is consumed as the -P argument and child.pid becomes a name pattern, so the intended kill path can fail or target the wrong processes. This can leave conntrack -E running after finish().

Please change to a valid form (e.g. signal first, then -P <pid>), or avoid pkill and terminate the tracked process tree deterministically.

Potential Bugs

[plugins/interface/intf_base_plugin.js:1569]
Promise.all(replyDsts.map(...)) still waits for every destination monitor to finish. Even if one destination returns true quickly, overall return is delayed by slower/timeout peers (up to full monitor window). This weakens the early-exit benefit and may delay WAN-up detection.

[plugins/interface/intf_base_plugin.js:1530]
finish() resolves first and then cleanup happens best-effort without verifying child termination. If kill fails, long-lived monitor processes may accumulate under repeated checks.

Performance Issues

[plugins/interface/intf_base_plugin.js:56, 1570]
The adaptive monitor window was removed and default raised to 10s for all states. Combined with Promise.all, this increases worst-case latency and process lifetime during flap/noisy periods.

Maintainability / Style

[plugins/interface/intf_base_plugin.js:1542]
Regex readability: prefer /(udp|ESTABLISHED)/ over /udp|ESTABLISHED/.

Suggestions

  • Replace aggregate Promise.all with a cancellable Promise.any-style flow so one confirmed-positive destination can short-circuit the whole fallback.
  • Use explicit child lifecycle handling (track PID, SIGTERM/SIGKILL escalation, and close verification) instead of pkill argument-dependent behavior.
  • Add debug logs when early success occurs but other monitors are still pending, to aid WAN-state latency diagnosis.

@jasonlyc jasonlyc merged commit 0024d92 into firewalla:master Mar 13, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants