fix: power brownout causing early shutdown #2627

Open
NickDunklee wants to merge 3 commits into meshcore-dev:dev from NickDunklee:power-fix
@NickDunklee NickDunklee commented May 26, 2026

Update: Removed the contact expiration ShutdownHandler and associated code so this PR is just the brownout fix. I will open another PR that is just about contact flushing based on comments in this PR.

This is probably another "needs soaking" fix as it touches power.

Backstory on this one:

I noticed the sensor firmware build was aggressively sending
"Battery is low" messages constantly when a RAK19007 was below
50%. (These messages only show up in third-party clients, to all
node admins, as the stock MeshCore mobile client doesn't let
one see messages from a sensor node. Sending that message seems like
another power draw, but that is not part of this PR.)

Then people on the local mesh have been on and off talking
about certain nodes randomly losing their contact lists on
some node types, and others were talking about Heltec V4
brownouts. I also observed Heltec v4 die prematurely around
50% and I started thinking they were all related.

Started digging into the code and found a few potential leads:

  • MeshCore does a "lazy" write on dirty_contacts_expiry
    in a 5 second window.
  • The shutdown/restart paths do not clean this up
  • Low battery check is a poll every 8 seconds with
    no awareness of other things going on in the node
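
To make the first two leads concrete, here is a minimal sketch of a lazy contact write. It is not MeshCore's implementation: `ContactStore`, `markDirty()`, `writePending()`, and the `writes` counter are hypothetical stand-ins, and only `dirty_contacts_expiry`, `saveContacts()`, and the 5 second window come from the notes above.

```cpp
#include <cstdint>

// Lazy-write window from the notes above (5 seconds).
constexpr uint32_t LAZY_WRITE_MS = 5000;

// Hypothetical stand-in for the part of the firmware that owns the contact list.
struct ContactStore {
  uint32_t dirty_contacts_expiry = 0;  // 0 means no write is pending
  int writes = 0;                      // counts simulated flash writes

  // Called when a contact changes; arms the timer if it is not already armed.
  // (Sketch only: ignores the rare case where the deadline wraps to exactly 0.)
  void markDirty(uint32_t now_ms) {
    if (dirty_contacts_expiry == 0) {
      dirty_contacts_expiry = now_ms + LAZY_WRITE_MS;
    }
  }

  // Simulated write to storage; clears the pending flag.
  void saveContacts() {
    ++writes;
    dirty_contacts_expiry = 0;
  }

  // Called from the main loop; writes only once the deadline has passed.
  void loop(uint32_t now_ms) {
    if (dirty_contacts_expiry != 0 &&
        static_cast<int32_t>(now_ms - dirty_contacts_expiry) >= 0) {
      saveContacts();
    }
  }

  bool writePending() const { return dirty_contacts_expiry != 0; }
};
```

In this sketch, a shutdown or reboot that lands between `markDirty()` and the deadline never reaches `saveContacts()`, which is the gap the second bullet describes.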

On the power piece:

Heltec V4 and other higher-powered nodes can hit the battery
harder when transmitting. Below 50%, lithium batteries sag
more dramatically than they do at higher charge states.

If the power check happens at the same time as transmit,
the shutdown code gets called prematurely and shuts down
the node.

On the file write piece:

If the shutdown or restart paths are called, the code just calls
shutdown() or reboot() without checking and calling
saveContacts(). There do not appear to be any other file writes
that act this way.

The Fix

The change keeps using AUTO_SHUTDOWN_MILLIVOLTS so it respects
previous power threshold decisions across all node types.

With this change, all restart or shutdown paths will make
sure to call saveContacts() before shutting down to stop
the list from becoming corrupted.

It also suspends reading battery level for 250ms during transmit
(adjustable) so a power sag doesn't trigger an early shutdown.

On Heltec V4 at least, the MeshCore software power threshold is
much higher than the board's internal brownout/shutdown threshold.
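
A minimal sketch of the hold-off idea, not the code in this PR. `BatteryGuard`, `noteTransmit()`, `shouldShutdown()`, and the 3300 mV value are assumptions for illustration; only `AUTO_SHUTDOWN_MILLIVOLTS` and the 250ms figure come from the description above.

```cpp
#include <cstdint>

// Hypothetical threshold for illustration; real boards define their own value.
constexpr uint16_t AUTO_SHUTDOWN_MILLIVOLTS = 3300;
// Hold-off after a transmit, per the 250 ms figure in the description.
constexpr uint32_t TX_HOLDOFF_MS = 250;

// Hypothetical helper that gates the low-battery decision around transmits.
class BatteryGuard {
  uint32_t last_tx_ms_ = 0;
  bool tx_seen_ = false;

public:
  // Call whenever the radio starts, or is still, transmitting.
  void noteTransmit(uint32_t now_ms) {
    last_tx_ms_ = now_ms;
    tx_seen_ = true;
  }

  // True only when the reading is below the threshold AND the reading
  // was not taken inside the hold-off window after a transmit.
  bool shouldShutdown(uint32_t now_ms, uint16_t battery_mv) const {
    // Unsigned subtraction stays correct across a millisecond-counter wrap.
    if (tx_seen_ && (now_ms - last_tx_ms_) < TX_HOLDOFF_MS) {
      return false;  // reading may be depressed by transmit sag; ignore it
    }
    return battery_mv < AUTO_SHUTDOWN_MILLIVOLTS;
  }
};
```

A low reading taken 100 ms after a transmit is ignored here, while the same reading 250 ms or more after the transmit still triggers the shutdown path, so a genuinely flat battery is still caught on a later poll.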

Tested on

  • Heltec v4
  • RAK 19007
  • Heltec T096
  • RAK 19003

On the Heltec v4, I can now pass 50% and get down to 36% before it shuts
down, although the voltage at 36% should probably actually say 5%
based on some voltage curve sites like this one:
https://voltagebasics.com/lithium-polymer-battery-voltage-chart/

That is probably an idea for future mobile app improvements: the MCU
temp and battery voltage could be used to calculate the battery percentage
in the app itself, and it would likely seem a bit more "accurate"
on all board types without having to add math in the node code.

@NickDunklee
Author

From other PR research, I looked to see if it was possible the shutdown or restart paths might get triggered in "bad" states and edge cases that could lead to file corruption. It appears that the code path can't be called on a board that is only powered by USB and gets unplugged, because unplugged means no power to execute code. It also looks like AUTO_SHUTDOWN_MILLIVOLTS, on nodes that use it, is well above the hardware's brownout shutdown level, so it should be exceedingly rare that the battery is too low to write before shutdown. RAK nRF52 nodes, it looks like, just run until they can't, and upon an attempted reboot, LPCOMP catches the low battery threshold and keeps the node from bootlooping.

@ripplebiz
Member

I recently modified the hasPendingWork() rule to include dirty_contacts_expiry != 0.
Just wondering if this rule can be incorporated to delay shutdowns as well as the low power stuff?
(i.e. rather than having to have the hook for checking the pending contacts write)

@NickDunklee
Author

Oooh cool, will check that out and see!

@NickDunklee
Author

Yeah, it looks like hasPendingWork could replace the ShutdownHandler class in my PR entirely. It also looks like I missed adding any shutdown handling to ui-orig and ui-tiny, woops.

Question, since this is your party and I'm just sampling the whiskey: do you think it's reasonable to have an up to 5 second delay on companion power off? And not trying to lead like a greasy car salesman.

If I'm reading it right, waiting for hasPendingWork during shutdown via button press/etc. would result in shutdown message and/or buzzer, then 0 to 5 seconds of arbitrary wait time if the loop has to loop, and only then reboot or display off, radio off, board off.

Code-wise, likely cleaner to manage. User experience might seem random, or user might think it hung. Or I could probably rig it so regardless if it has to wait for a work loop or not, it always takes 5 seconds to turn off.
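
For discussion, a rough sketch of the "wait up to 5 seconds, but leave early if nothing is pending" flow. This is an assumption about how it could be wired, not existing MeshCore code: `waitForPendingWork()`, `MAX_SHUTDOWN_WAIT_MS`, and the injected `loopOnce` and `millis` callbacks are hypothetical, and only the `hasPendingWork` name and the 5 second figure come from this thread.

```cpp
#include <cstdint>
#include <functional>

// Upper bound on how long shutdown may be delayed (5 seconds, per the thread).
constexpr uint32_t MAX_SHUTDOWN_WAIT_MS = 5000;

// Keeps running the main loop until no work is pending or the time budget
// is spent, then returns how many milliseconds were spent waiting.
// The caller would power off (or reboot) after this returns.
uint32_t waitForPendingWork(const std::function<bool()>& hasPendingWork,
                            const std::function<void()>& loopOnce,
                            const std::function<uint32_t()>& millis) {
  const uint32_t start = millis();
  // Unsigned subtraction stays correct across a millisecond-counter wrap.
  while (hasPendingWork() && (millis() - start) < MAX_SHUTDOWN_WAIT_MS) {
    loopOnce();  // lets the lazy write (and anything else pending) complete
  }
  return millis() - start;
}
```

With nothing pending this returns immediately, so the delay the user sees ranges from zero up to the cap, which is the variable wait described above.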

@NickDunklee
Author

I have another thought that might solve the code and user problems both in one, as this PR is actually two features not one.

  • I can update this PR to just handle the brownout code so it's just about brownouts and can be committed sooner and remove any shutdown handling code. Should be able to diff that out pretty quick.

I can start a new PR based on some of my shutdown-write code, but with the below, and discussion can continue there:

  • hasPendingWork - dirty_contacts_expiry path is the standard flow on shutdown for companion/repeater
  • The node takes up to 5 seconds to shut down, but will also shut down faster if it doesn't have to wait
  • No added wait loop for the sole purpose of user experience for visual purposes, which could still introduce a slow-writes-fail-to-commit scenario in certain conditions
  • "Shutting down" is replaced with "Power off in 5 seconds..." which may sometimes complete faster, but gives the user a time contract
  • On eInk display models, it will also say "Power off in 5 seconds..." but then just before switching the display off, it updates the eInk display to say "shut down" - so if an eInk user comes across it after it turned off, they know that it was able to successfully turn off.
  • ShutdownHandler code is removed.

  - This PR is now just brownout protection for sensor and
    companion
  - Will open another PR with dirty write flush changes per PR
    review, that PR can sort how the write improvements

@NickDunklee changed the title from "fix: power brownout and dirty write flush on restart or shutdown" to "fix: power brownout causing early shutdown" on Jun 2, 2026