Skip to content

fix(boost): fix seven correctness bugs in boost poller#7154

Open
somethingwithproof wants to merge 10 commits into
Cacti:developfrom
somethingwithproof:fix/boost-archive-table-arg
Open

fix(boost): fix seven correctness bugs in boost poller#7154
somethingwithproof wants to merge 10 commits into
Cacti:developfrom
somethingwithproof:fix/boost-archive-table-arg

Conversation

@somethingwithproof

@somethingwithproof somethingwithproof commented May 15, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes the BOOST ERROR: Failed to retrieve archive table name that appeared at exactly the same time on consecutive days (replication lag race condition), plus ten additional correctness bugs and five defensive programming improvements found during code review.

Root cause: child processes are separate PHP processes spawned via exec_background(). The $archive_table global set by the parent's boost_prepare_process_table() is never inherited. Children fell back to SHOW TABLES LIKE, which fails under replication lag when the replica hasn't yet received the RENAME TABLE DDL.

Correctness fixes:

  • --archive-table CLI argument: parent passes the archive table name to each child; child validates with /^poller_output_boost_arch_\d+$/ before accepting
  • boost_get_arch_table_names: removed AND TABLE_ROWS > 0 from the information_schema fallback; InnoDB's TABLE_ROWS is an estimate and can be 0 immediately after RENAME TABLE
  • sig_handler: gate boost_poller_status = 'terminated' write on !$child; children were clobbering the parent's status entry
  • boost_prepare_process_table: guard intval($processes) <= 0 before ceil() division; boost_parallel unset causes division by zero
  • boost_output_rrd_data: return 0 not false for both early-exit paths (no arch tables, zero total rows) to keep the status column numeric; guard $max_per_select <= 0 with fallback of 1000
  • startup barrier: wait up to 30 s for the first child to register before the monitoring loop; exec_background() is non-blocking and the loop exited prematurely when children hadn't yet called register_process_start()
  • boost_process_poller_output: remove static $archive_table = false that shadowed the global populated by the new CLI argument; add $archive_table to the global declaration
  • non-templated DS queries in boost_process_local_data_ids: add LEFT JOIN graph_templates_item AS gti ON dtr.id = gti.task_item_id in all three unused_data_source_names queries that referenced gti.task_item_id without joining the table (MySQL raised "Unknown column")
  • boost_poller_status on $continue = false: boost_prepare_process_table() sets status to 'running' before returning false; add an else branch to reset it to 'complete' so the next run does not emit a false overrun warning
  • boost_get_total_rows: replace SUM(TABLE_ROWS) InnoDB estimate with exact COUNT(*) per table; TABLE_ROWS can report 0 immediately after a poller flush, causing boost to skip the RRD update run entirely
  • orphaned arch tables: drop zero-row arch tables before returning false so subsequent runs don't pick up empty tables

Defensive programming:

  • Escape device-sourced output in bulk INSERT: apply db_qstr() to all string fields in boost_poller_on_demand; device data retrieved from the DB was concatenated raw into the INSERT statement
  • Log path character allowlist: validate path_boost_log against [A-Za-z0-9_./-] before placing it in a shell redirect argument; exec_background redirect args bypass escapeshellarg by design
  • Array args to exec_background: pass child process arguments as an array so each element receives cacti_escapeshellarg() individually rather than as a concatenated string
  • Backtick-quoted dynamic table names: wrap all dynamically-generated table names in backticks across both files to guard against reserved-word collisions
  • Cache file permissions: replace chmod($cache_file, 0666) with 0644; replace mt_rand() with random_int() for temporary table name suffix

Test plan

  • php -d error_reporting=24575 include/vendor/bin/pest tests/Unit/BoostProcessBugsTest.php tests/Unit/BoostArchiveTableSmokeTest.php — 30 tests, 80 assertions, all pass
  • BoostArchiveTableSmokeTest (8 tests): PHP syntax, table name regex, injection rejection, exec_background argument presence
  • BoostProcessBugsTest (22 tests): all original regression tests plus 12 new tests for the fixes in this PR
  • Manual: enable boost on a replicated Cacti instance and verify no BOOST ERROR: Failed to retrieve archive table name in cacti.log

Copilot AI review requested due to automatic review settings May 15, 2026 09:18

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bundles seven correctness fixes for the Cacti boost poller, motivated by a "Failed to retrieve archive table name" error caused by a race between RENAME TABLE and replica lag. Child processes are now told the archive table name on their command line and validate it before use, plus several adjacent reliability issues (status clobbering by children, divide-by-zero on misconfigured boost_parallel, premature exit of the monitoring loop, a static-shadows-global bug in boost_process_poller_output, and a broken JOIN in the non-templated DS path) are addressed. Two new Pest test files assert these invariants via source-text inspection.

Changes:

  • Pass and validate --archive-table= to child workers, declare $archive_table global in boost_launch_children/boost_process_poller_output, and drop the TABLE_ROWS > 0 filter / static shadowing in lib/boost.php.
  • Harden poller_boost.php: parent-only boost_poller_status='terminated' write, intval+guard on boost_parallel, boost_output_rrd_data returns 0 (not false) and guards max_per_select, add a 30s startup barrier after boost_launch_children(), drop orphaned arch tables when total rows is 0, replace info-schema TABLE_ROWS with SELECT COUNT(*) in boost_prepare_process_table, fix non-templated unused_data_source_names queries with a LEFT JOIN graph_templates_item.
  • Add tests/Unit/BoostProcessBugsTest.php and tests/Unit/BoostArchiveTableSmokeTest.php exercising the above invariants.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
poller_boost.php Adds --archive-table CLI arg, sig_handler parent-only guard, intval/zero-guard on boost_parallel, startup barrier, return 0 on empty rows, max_per_select guard, orphan-table cleanup, help text.
lib/boost.php Removes TABLE_ROWS > 0 filter, removes static shadowing of $archive_table, adds LEFT JOIN graph_templates_item to non-templated unused_data_source_names queries.
tests/Unit/BoostProcessBugsTest.php New regression tests verifying each source-level invariant of the fixes.
tests/Unit/BoostArchiveTableSmokeTest.php New smoke tests for PHP parse validity, archive-table regex, and exec_background argument plumbing.
Comments suppressed due to low confidence (1)

poller_boost.php:539

  • boost_launch_children still reads boost_parallel without the intval() cast / <= 0 guard that was just added to boost_prepare_process_table (line 496-500). The two functions now use inconsistent semantics for the same config option: a value like "0" or a non-numeric string would be coerced to 1 here (via empty()), but in boost_prepare_process_table it is coerced to intval(...) <= 0 ? 1 : intval(...). For consistency and to keep for ($i = 1; $i <= $processes; $i++) safe against weird config values, apply the same intval() + <=0 pattern here.
	$processes = read_config_option('boost_parallel');

	if (empty($processes)) {
		$processes = 1;
	}

Comment thread poller_boost.php Outdated
Comment thread poller_boost.php Outdated
Comment thread poller_boost.php Outdated
@TheWitness

Copy link
Copy Markdown
Member

There can be multiple archive tables, not just one. It's an old legacy issue. However, there are numerous style changes in addition to the change itself that make peeling away the real change more difficult.

- sig_handler: gate boost_poller_status 'terminated' write on !$child
  so child processes don't clobber the parent's status entry
- boost_prepare_process_table: guard intval($processes) <= 0 before
  ceil() division to prevent division by zero when boost_parallel is
  unset or empty
- boost_output_rrd_data: return 0 not false when total_rows == 0 so
  the INSERT into poller_output_boost_processes.status stays numeric
- boost_output_rrd_data: guard $max_per_select <= 0 with fallback of
  1000 before ceil() division
- boost_launch_children: add startup barrier that waits up to 30 s for
  the first child to register before entering the monitoring loop;
  exec_background() is non-blocking and the loop exited prematurely
  if no child had called register_process_start() yet
- boost_process_poller_output: remove static $archive_table = false
  that shadowed the global set by the --archive-table CLI argument,
  add $archive_table to the global declaration instead
- boost_process_poller_output: add missing LEFT JOIN graph_templates_item
  AS gti in both non-templated data source unused_data_source_names
  queries; MySQL raised "Unknown column gti.task_item_id" without it

Add BoostArchiveTableSmokeTest (8 tests) and extend BoostProcessBugsTest
with 8 new regression tests covering all seven fixes above plus the
previously committed archive-table argument passing fix.

26 tests pass, 65 assertions.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
- Reset boost_poller_status to 'complete' when prepare returns false to prevent false overrun warnings on the next run
- Add missing LEFT JOIN on graph_templates_item in the non-templated reset_template branch of boost_process_local_data_ids; without it MySQL raises "Unknown column 'gti.task_item_id'"
- Return 0 (not false) from boost_output_rrd_data when arch tables are not found, keeping the status value type consistent
- Replace SUM(TABLE_ROWS) InnoDB estimate in boost_get_total_rows with exact COUNT(*) per table; TABLE_ROWS can report 0 immediately after a flush causing boost to skip a run entirely

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
- Escape device-sourced output field in boost_poller_on_demand bulk INSERT
  using db_qstr(); unescaped data from monitored devices could break out of
  the string literal (second-order SQL injection)
- Validate path_boost_log against [A-Za-z0-9_./-] before using it as a shell
  redirect argument; exec_background redirect args bypass escaping by design
- Pass child process arguments as an array to exec_background so each
  element receives cacti_escapeshellarg() rather than being passed as a
  concatenated string
- Wrap all dynamic table names in backticks across both files to defend
  against reserved-word collisions
- Replace chmod 0666 with 0644 on cached graph PNG files
- Replace mt_rand() with random_int() for temporary table name suffix

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
- Revert boost_get_total_rows() to SUM(TABLE_ROWS) from information_schema;
  per-table COUNT(*) added a full-table scan per arch table on every monitoring
  tick — TABLE_ROWS is accurate here because ANALYZE TABLE runs in
  boost_prepare_process_table() before the monitoring loop starts
- Track per-table row counts in boost_prepare_process_table() and drop only
  confirmed-empty tables when total_rows == 0, avoiding data loss if a prior
  arch table still holds unprocessed rows
- Guard boost_launch_children() against an unset or malformed $archive_table
  and abort with an error rather than spawning children with an empty argument

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
phpstan 2.2.2 added a stricter is_numeric() always-true inference that
fails on pre-existing develop code (lib/data_query.php); pin until that
is addressed upstream so CI is reproducible.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The startup barrier released on the first child registration, so a fast child could finish and let the parent drop every archive table while later children were still booting; the parent now waits until all launched children are accounted for, drops only the tables this run owned, and confirms the archive-table fallback with a data-plane read instead of the replication-lagging SHOW TABLES probe.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
The list is populated by boost_prepare_process_table() through the global, which
phpstan cannot track across the call, so it inferred array{} and flagged the
cleanup foreach as 'Empty array passed to foreach'. Document the real list type.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
@somethingwithproof somethingwithproof force-pushed the fix/boost-archive-table-arg branch from 1370abf to a75ca74 Compare June 14, 2026 20:43
Clears the level 6 alreadyNarrowedType error on lib/data_query.php:2591 that
develop currently emits, matching the sibling narrowing identifiers already ignored.

Signed-off-by: Thomas Vincent <thomasvincent@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants