
Fixes #353 - Add health check command #431

Open
jakduch wants to merge 1 commit into theforeman:master from jakduch:feature/healthcheck

Conversation


@jakduch jakduch commented Mar 29, 2026

Adds a new foremanctl health command that performs post-install health checks on all Foreman services.

Checks performed:

  • Core services running (foreman, httpd, redis, postgresql)
  • Dynflow workers running (orchestrator, worker, worker-hosts-queue)
  • Pulp services running (pulp-api, pulp-content)
  • Candlepin service running
  • Foreman API responding (GET /api/v2/ping)
  • Foreman tasks status (via Katello ping)

The command reports individual results for each check and provides a summary. It exits non-zero when any check fails, making it suitable for scripting and CI use.

As discussed in the issue, this can also be integrated as a final step of deployment in the future.

Contributor

pablomh commented Mar 29, 2026

Could you please show the output? Thx!

Author

jakduch commented Mar 29, 2026

Example output when all checks pass:

$ foremanctl health

TASK [Check core services are running] *********
ok: [localhost] => (item=foreman) => Service foreman is running
ok: [localhost] => (item=httpd) => Service httpd is running
ok: [localhost] => (item=redis) => Service redis is running
ok: [localhost] => (item=postgresql) => Service postgresql is running

TASK [Check dynflow services are running] ******
ok: [localhost] => (item=dynflow-sidekiq@orchestrator) => Service dynflow-sidekiq@orchestrator is running
ok: [localhost] => (item=dynflow-sidekiq@worker) => Service dynflow-sidekiq@worker is running
ok: [localhost] => (item=dynflow-sidekiq@worker-hosts-queue) => Service dynflow-sidekiq@worker-hosts-queue is running

TASK [Check Pulp services are running] *********
ok: [localhost] => (item=pulp-api) => Service pulp-api is running
ok: [localhost] => (item=pulp-content) => Service pulp-content is running

TASK [Check Candlepin service is running] ******
ok: [localhost] => Service candlepin is running

TASK [Report Foreman API status] ***************
ok: [localhost] => Foreman API: OK

TASK [Report Foreman tasks status] *************
ok: [localhost] => Foreman tasks: ok

TASK [Health check summary] ********************
ok: [localhost] => All health checks passed.

Example output when a service is down:

$ foremanctl health

TASK [Check core services are running] *********
ok: [localhost] => (item=foreman) => Service foreman is running
ok: [localhost] => (item=httpd) => Service httpd is running
failed: [localhost] => (item=redis) => Service redis is not running
ok: [localhost] => (item=postgresql) => Service postgresql is running

...

TASK [Health check summary] ********************
ok: [localhost] => Health check issues found: Failed services: redis

TASK [Fail if any checks failed] ***************
fatal: [localhost]: FAILED! => Health check failed: 1 service(s) down, Foreman API OK

The command exits non-zero on failure, so it can be used in scripts like foremanctl health && echo "All good" || echo "Issues found".

Member

ehelms commented Mar 29, 2026

This is a good start. Can you look into following the conventions we've established for check_ roles (https://github.com/theforeman/foremanctl/tree/master/src/roles)? I also think you can drop the "post-install" nomenclature and just focus on this being a health check command.

@jakduch jakduch force-pushed the feature/healthcheck branch from d6a97eb to d702027 Compare March 29, 2026 16:01
Author

jakduch commented Mar 29, 2026

Good point, refactored to follow the check_ role conventions:

  • Added check_services role — verifies all Foreman services are running (core, dynflow, pulp, candlepin)
  • Added check_foreman_api role — checks API responds and Foreman tasks status via /api/v2/ping
  • Health playbook now uses checks/tasks/execute_check.yml to run them, same pattern as the existing checks role
  • Dropped "post-install" wording from metadata and description

Force-pushed.
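The check_services role described above is not shown in the diff excerpts here, but a task of that shape might look roughly like this (a hypothetical sketch; the service names come from the PR description, and the use of the systemd module's status output is illustrative, not the PR's actual implementation):

```yaml
# Hypothetical sketch of a check_services task.
# Queries each unit's status and fails the item if it is not active.
- name: Check core services are running
  ansible.builtin.systemd:
    name: "{{ item }}"
  register: svc
  failed_when: svc.status.ActiveState != "active"
  loop:
    - foreman
    - httpd
    - redis
    - postgresql
```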

Member

ehelms commented Mar 29, 2026

Design question for you:

  • Do we need both service and API checks? Shouldn't they tell us the same information?
  • If we check services, should those be conditional based on what features are enabled?
  • This new command could use some sort of validation

@jakduch jakduch force-pushed the feature/healthcheck branch from d702027 to 9ea20cc Compare March 29, 2026 19:33
Author

jakduch commented Mar 29, 2026

Good questions.

Service vs API checks - do we need both?

They operate at different layers. Service checks verify that systemd units are running - this works offline and does not depend on a functional API. The API check (/api/v2/ping) verifies end-to-end functionality including database connectivity and task runner status. A service can be running but not responding (misconfiguration, certificate issues), and conversely the API might not report on every systemd unit. That said, if the ping endpoint already covers everything we need, we could simplify to just the API check. Happy to go either way.

Conditional checks based on features - yes, agreed.

Updated Pulp and Candlepin checks to only run when content/ features are enabled (enabled_features | select("contains", "content/")). Same condition on the Katello tasks status check in check_foreman_api. Core services and dynflow are always checked. Force-pushed.

Validation - could you clarify what you have in mind?

An ansible-playbook --syntax-check for the new playbook, a test scenario in CI, or something else?

@jakduch jakduch force-pushed the feature/healthcheck branch 2 times, most recently from 2c44427 to 4c3bd96 Compare March 30, 2026 11:01
- "../../vars/base.yaml"
tasks:
- name: Execute health checks
ansible.builtin.include_tasks: ../../roles/checks/tasks/execute_check.yml
Member


I think this is a better pattern:

  ansible.builtin.include_role:
    name: common
    tasks_from: login

Author


Done - switched to include_role: name: checks, tasks_from: execute_check to reuse the existing checks role and avoid the relative path.
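In playbook form, the switch described above would amount to something like this (a sketch based on the reviewer's suggested pattern and the role names mentioned in this thread):

```yaml
- name: Execute health checks
  ansible.builtin.include_role:
    name: checks
    tasks_from: execute_check
```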

loop:
- pulp-api
- pulp-content
when: enabled_features | select('contains', 'content/') | list | length > 0
Member


Seems like we could use a better way to do this pattern of checking. Is there a filter or some other Ansible construct we can encapsulate feature checking logic in?

Author


Added a has_content_features filter to foremanctl.py - the condition is now just enabled_features | has_content_features.
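The filter's implementation is not shown in this thread; a minimal sketch of what it might look like in foremanctl.py, assuming the same substring semantics as the original `select('contains', 'content/')` condition:

```python
# Hypothetical sketch of the has_content_features filter; the actual
# implementation in foremanctl.py may differ.
def has_content_features(features):
    """Return True if any enabled feature belongs to the content/ group."""
    return any("content/" in feature for feature in features)
```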

Member

ehelms commented Mar 30, 2026

I would expect to add this to the end of our GitHub Action tests to verify the command works and ensure that it passes.

Author

jakduch commented Mar 30, 2026

Updated based on your feedback:

  1. include_role pattern - Replaced include_tasks with relative path by include_role: name: checks, tasks_from: execute_check. This reuses the existing checks role and avoids the ../../roles/ path.

  2. Feature check filter - Added has_content_features filter to foremanctl.py so the condition is now just enabled_features | has_content_features instead of the verbose select('contains', 'content/') | list | length > 0.

  3. CI validation - Created a separate PR Refs #353 - Add health check validation to CI #434 to add ./foremanctl health to the test workflow, to keep this PR focused on the command itself.

Member

ehelms commented Mar 30, 2026

3. CI validation - Created a separate PR Refs #353 - Add health check validation to CI #434 to add ./foremanctl health to the test workflow, to keep this PR focused on the command itself.

I can't know in the PR if this is working if the CI is split off to another PR :) Testing and changes to CI should generally be kept together with the change.

return compact_list(plugins)


def has_content_features(features):
Member


Do you foresee having a method per feature? Or should it be a more generic method that a feature can be passed in to?

Author


Good point - a generic filter makes more sense. I'll change it to has_feature_prefix(features, prefix) so it can be reused for any feature group, e.g. enabled_features | has_feature_prefix('content/').
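The generic variant could be sketched like this (hypothetical; the prefix semantics match the example `has_feature_prefix('content/')` given above, and the real code in foremanctl.py may differ):

```python
# Hypothetical sketch of a generic feature-prefix filter, reusable for
# any feature group (e.g. 'content/').
def has_feature_prefix(features, prefix):
    """Return True if any enabled feature starts with the given prefix."""
    return any(feature.startswith(prefix) for feature in features)
```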

Member

ehelms commented Mar 30, 2026

What are your thoughts on how foreman-proxy which is an optional service that can be present should be handled?

Author

jakduch commented Mar 30, 2026

Fair enough - I'll merge the CI change from #434 into this PR and close #434.

Author

jakduch commented Mar 30, 2026

foreman-proxy could be added to check_services with a condition similar to content features, e.g. when: 'foreman-proxy' in enabled_features. Same pattern as the existing deploy.yaml which already checks 'foreman-proxy' in enabled_features before including the proxy role. I'll add it.
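A sketch of how that conditional check might look, following the loop/when pattern from the Pulp check earlier in this thread (hypothetical; the failure handling is illustrative):

```yaml
# Hypothetical sketch: only check foreman-proxy when the feature is enabled.
- name: Check foreman-proxy service is running
  ansible.builtin.systemd:
    name: foreman-proxy
  register: proxy_svc
  failed_when: proxy_svc.status.ActiveState != "active"
  when: "'foreman-proxy' in enabled_features"
```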

Add a new foremanctl health command that verifies the state of all
Foreman services after installation or during troubleshooting.

Checks performed:
- Core services running (foreman, httpd, redis, postgresql)
- Dynflow workers running (orchestrator, worker, worker-hosts-queue)
- Pulp services running (pulp-api, pulp-content)
- Candlepin service running
- Foreman API responding (GET /api/v2/ping)
- Foreman tasks status (via Katello ping response)

Reports a summary of all failures and exits non-zero if any check
fails, making it suitable for scripting and CI use.
@jakduch jakduch force-pushed the feature/healthcheck branch from 15e6135 to 845cd50 Compare March 30, 2026 15:00