docs/module/health_monitor/architecture/assets/hm_shutdown.puml (new file, 64 lines)
@startuml

box "User process"
participant "main"
participant "HealthMonitorBuilder"
participant "HealthMonitor"
participant "Lifecycle"
end box

box "LaunchDaemon process"
participant "LaunchDaemon"
end box

group APPLICATION_SELF_TERMINATING

...

main -> main++: end_of_scope()
main -> HealthMonitor: destroy()
HealthMonitor -> LaunchDaemon: notify_stopped(timestamp)

...

LaunchDaemon -> LaunchDaemon: stop_alive_monitoring()
main--
end

group APPLICATION_TERMINATING_ON_LAUNCH_DAEMON_REQUEST

== LaunchDaemon Side ==
...

alt EXTERNAL_SHUTDOWN_REQUEST
LaunchDaemon -> LaunchDaemon: stop_alive_monitoring()
note left
Stop alive monitoring; from now on only the shutdown timeout
configured per app is monitored
end note

loop app in apps
LaunchDaemon -[#blue]> Lifecycle: notify_shutdown_request()
end
end alt

== Application Side ==

LaunchDaemon -[#blue]> Lifecycle: notify_shutdown_request()
Lifecycle -> Lifecycle: release_main_for_shutdown()
...

main -> main++: end_of_scope()
main -> HealthMonitor: destroy()
HealthMonitor -> LaunchDaemon: notify_stopped(timestamp)
note left
Notification is sent to stay consistent with the self-terminating case
end note

Review comment:
Do we have to show the case where notify_stopped was sent but the app did not shut down within the timeout?

HealthMonitor -> HealthMonitor: stop_background_thread()
HealthMonitor --> main

main--
end

@enduml
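The shutdown flow above can be sketched as a minimal, single-process Python simulation. Everything in it is hypothetical: `FakeLaunchDaemon`, `Lifecycle`, and `HealthMonitor` are illustrative stand-ins named after the diagram participants, direct method calls replace the real IPC (whose design is still open), `notify_alive` is an assumed name for the periodic alive ping, and the per-app shutdown timeout is not modeled. The sketch follows the ordering in the diagrams: `start()` sends `notify_started` before the background thread starts, `destroy()` sends `notify_stopped(timestamp)` before stopping the background thread, and the LaunchDaemon ignores alive pings once it has stopped alive monitoring.

```python
import threading
import time


class FakeLaunchDaemon:
    """Hypothetical in-process stand-in for the LaunchDaemon IPC endpoint."""

    def __init__(self):
        self.events = []            # messages accepted from the application
        self.monitoring = True      # is alive monitoring active?
        self._lock = threading.RLock()

    def notify_started(self, timestamp):
        with self._lock:
            self.events.append("started")

    def notify_alive(self, timestamp):
        with self._lock:
            if self.monitoring:     # pings after stop are simply ignored
                self.events.append("alive")

    def notify_stopped(self, timestamp):
        with self._lock:
            self.events.append("stopped")
            self.stop_alive_monitoring()

    def stop_alive_monitoring(self):
        with self._lock:
            self.monitoring = False

    def external_shutdown_request(self, lifecycles):
        # EXTERNAL_SHUTDOWN_REQUEST: stop alive monitoring first; afterwards
        # only the per-app shutdown timeout would be supervised (not modeled).
        self.stop_alive_monitoring()
        for lifecycle in lifecycles:
            lifecycle.notify_shutdown_request()


class Lifecycle:
    """Hypothetical application-side Lifecycle client."""

    def __init__(self):
        self._shutdown = threading.Event()

    def notify_shutdown_request(self):
        self._shutdown.set()        # models release_main_for_shutdown()

    def wait_for_shutdown(self, timeout=None):
        return self._shutdown.wait(timeout)


class HealthMonitor:
    """Hypothetical application-side HealthMonitor."""

    def __init__(self, daemon, cycle_time_s):
        self._daemon = daemon
        self._cycle_time_s = cycle_time_s
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        # notify_started must complete before the background thread starts.
        self._daemon.notify_started(time.monotonic())
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Send one alive notification per cycle until asked to stop.
        while not self._stop.wait(self._cycle_time_s):
            self._daemon.notify_alive(time.monotonic())

    def destroy(self):
        self._daemon.notify_stopped(time.monotonic())  # notify_stopped(timestamp)
        self._stop.set()                               # stop_background_thread()
        if self._thread is not None:
            self._thread.join()

    # Leaving a `with` block models main's end_of_scope() -> destroy().
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.destroy()
        return False


# APPLICATION_SELF_TERMINATING
self_daemon = FakeLaunchDaemon()
with HealthMonitor(self_daemon, cycle_time_s=0.01) as hm:
    hm.start()
    time.sleep(0.05)                # the application does its work

# APPLICATION_TERMINATING_ON_LAUNCH_DAEMON_REQUEST
req_daemon = FakeLaunchDaemon()
lifecycle = Lifecycle()
with HealthMonitor(req_daemon, cycle_time_s=0.01) as hm2:
    hm2.start()
    req_daemon.external_shutdown_request([lifecycle])
    lifecycle.wait_for_shutdown(timeout=1.0)   # main is released for shutdown
```

Ignoring late alive pings on the LaunchDaemon side, instead of requiring the application to stop sending them first, is one of the options raised in the review discussion; the sketch simply adopts it.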
docs/module/health_monitor/architecture/assets/hm_startup.puml (new file, 87 lines)
@startuml

box "User process"
actor "user"
participant "HealthMonitorBuilder"
participant "HealthMonitor"
participant "Lifecycle"
end box

box "LaunchDaemon process"
participant "LaunchDaemon"
end box

== Application Side ==
note right of user #lightblue
Each application has a **configuration**
for health monitoring that is sent to the LaunchDaemon
end note

user -> HealthMonitorBuilder : build(supervisor_api_notification_cycle_time, ...)
Review comment (Contributor):
I would like to add to this diagram:

  • During startup, launch_manager reads its config for the application and knows whether it should be monitored or not.
  • If monitoring is enabled, launch_manager sets up the required IPC communication when starting the process.

HealthMonitorBuilder -> HealthMonitor: create
HealthMonitor -> LaunchDaemon: register_health_monitor(supervisor_api_notification_cycle_time, ...)
note left
All required configuration can be sent here
end note

HealthMonitorBuilder --> user: HealthMonitor instance

user -> HealthMonitor: start()
HealthMonitor -> LaunchDaemon: notify_started(timestamp)
Review comment (Contributor):
One point of discussion could be whether we want to explicitly start the monitoring via a separate IPC call notify_started(timestamp), as described here, or just use the existing report_running as the trigger for launch_manager to start the monitoring.

From the user's point of view, having both could seem redundant, as both say roughly "I finished initialization".

I am not sure which one is better; both approaches seem to have advantages and disadvantages.

A related question: do we want to support monitoring of applications that do not use the Lifecycle API and do not report the running state? The diagram currently assumes no.

Reply (Contributor Author):
  1. notify_started is not something the user calls; the user calls HealthMonitor.start(). This call is needed anyway, and I thought it would be better for the LaunchDaemon to have clear information that the user has started it. Otherwise the setup is entirely bogus and ends with the LaunchDaemon concluding that there is no alive ping. But I guess both options are valid.
  2. Yes, it assumes no. This does not cover system monitoring, though, so I still expect that you may be monitoring at the process level; that was just not in the scope of this flow.

Review comment (vinodreddy-g, Jan 29, 2026):
Who sets up the IPC channel, and how?

Reply (Contributor Author):
This is not yet in scope, as we have not yet tackled using mw::com as the API. We will add this once we design the mw::com API, after having a first working version.

HealthMonitor -> HealthMonitor: start_background_thread()
note left
The notification has to complete before the background thread starts,
so that it does not race with the Lifecycle API.
end note
...

user -> Lifecycle: report_running()
Review comment (vinodreddy-g, Jan 29, 2026):
Should we show the case where the startup-to-running timeout is exceeded?

Lifecycle -> LaunchDaemon
Review comment (NicolasFussberger, Contributor, Jan 27, 2026):
I think we should also depict when the monitoring stops during shutdown of the application.

My assumption would be that:

  • launch_manager stops monitoring at the point in time it sends SIGTERM to the application process.

Open questions:
Does the sending of alive notifications on the application side need to be stopped, or is it enough that they are just ignored by launch_manager?
Do we want to support monitoring for "self-terminating" applications that terminate on their own?

Reply (Contributor Author):
  1. Since the in-app library does not take any actions, it should be fine to just do nothing there. But we should describe it, correct.
  2. "My assumption would be that": do you mean the point at which the app receives the shutdown request via Lifecycle?



== LaunchDaemon Side ==

note left of LaunchDaemon #lightblue
Each application has a **configuration entry**
for Lifecycle parameters (as part of the LaunchDaemon config), such as:
- self-terminating or not
- health monitored or not
- timeouts for startup, shutdown, ...

This config **does not include** any health monitoring parameters,
as those are sent during HealthMonitor registration.
end note

...
alt APPLICATION_USES_LIFECYCLE_API
user -> Lifecycle: report_running()
Lifecycle -> LaunchDaemon

LaunchDaemon -> LaunchDaemon: check_if_register_was_received()
note left #lightblue
This point is taken as the **timestamp** from which
application health monitoring is supervised. It is chosen
because, before report_running, the LaunchDaemon already monitors
the configured startup time per app and handles errors in case
of a timeout.
end note

alt not received
LaunchDaemon -> LaunchDaemon: error_reaction()
end

LaunchDaemon -> LaunchDaemon: check_if_notify_started_was_received()
alt not received
LaunchDaemon -> LaunchDaemon: error_reaction()
end


LaunchDaemon -> LaunchDaemon: start_monitor_user_application()
else APPLICATION_DOES_NOT_USE_LIFECYCLE_API
note left of LaunchDaemon
**Health monitoring is not allowed**; any registration from this app
shall cause an error reaction
end note
end

@enduml
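The LaunchDaemon-side checks performed when `report_running()` arrives can be sketched as plain bookkeeping. This is a hypothetical Python model, not the real LaunchDaemon: `AppConfig`, `AppState`, the error strings, and recording an error instead of a concrete error reaction are all illustrative, and IPC is replaced by direct calls. Two behaviours go beyond what the diagram states and are assumptions: monitoring starts only if no error reaction was triggered, and apps whose config entry is not health monitored skip the checks.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AppConfig:
    """Hypothetical per-app Lifecycle entry from the LaunchDaemon config."""
    uses_lifecycle_api: bool
    health_monitored: bool


@dataclass
class AppState:
    hm_config: Optional[dict] = None          # set by register_health_monitor
    notify_started_at: Optional[float] = None
    monitoring_since: Optional[float] = None
    errors: List[str] = field(default_factory=list)


class LaunchDaemon:
    """Hypothetical LaunchDaemon-side bookkeeping for the startup flow."""

    def __init__(self, configs: Dict[str, AppConfig],
                 clock: Callable[[], float] = time.monotonic):
        self._configs = configs
        self._clock = clock
        self.apps = {name: AppState() for name in configs}

    def error_reaction(self, app: str, reason: str) -> None:
        self.apps[app].errors.append(reason)

    def register_health_monitor(self, app: str, **hm_config) -> None:
        if not self._configs[app].uses_lifecycle_api:
            # APPLICATION_DOES_NOT_USE_LIFECYCLE_API: registration not allowed
            self.error_reaction(app, "register from app without Lifecycle API")
            return
        self.apps[app].hm_config = hm_config

    def notify_started(self, app: str, timestamp: float) -> None:
        self.apps[app].notify_started_at = timestamp

    def report_running(self, app: str) -> None:
        state = self.apps[app]
        if not self._configs[app].health_monitored:
            return                            # assumption: nothing to check
        if state.hm_config is None:
            self.error_reaction(app, "register not received")
        if state.notify_started_at is None:
            self.error_reaction(app, "notify_started not received")
        if not state.errors:
            # report_running is the timestamp from which monitoring is supervised
            state.monitoring_since = self._clock()


daemon = LaunchDaemon({
    "app_ok": AppConfig(uses_lifecycle_api=True, health_monitored=True),
    "app_no_start": AppConfig(uses_lifecycle_api=True, health_monitored=True),
    "app_no_lifecycle": AppConfig(uses_lifecycle_api=False, health_monitored=True),
})

# Full sequence: build() -> register, start() -> notify_started, report_running()
daemon.register_health_monitor("app_ok", supervisor_api_notification_cycle_time=0.1)
daemon.notify_started("app_ok", time.monotonic())
daemon.report_running("app_ok")

# Registered, but HealthMonitor.start() was never called
daemon.register_health_monitor("app_no_start", supervisor_api_notification_cycle_time=0.1)
daemon.report_running("app_no_start")

# Application without the Lifecycle API tries to register
daemon.register_health_monitor("app_no_lifecycle", supervisor_api_notification_cycle_time=0.1)
```

Passing the `clock` in makes the supervision start point (the moment `report_running` arrives) easy to control in tests.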
docs/module/health_monitor/architecture/index.rst (18 lines added in section "Dynamic Architecture")

.. uml:: assets/hbm_usage.puml

.. comp_arc_dyn:: Health Monitoring Startup Interaction
   :id: comp_arc_dyn__health_monitor__startup_view
   :security: NO
   :safety: ASIL_B
   :status: valid
   :fulfils: comp_req__health_monitor__dummy

.. uml:: assets/hm_startup.puml

.. comp_arc_dyn:: Health Monitoring Shutdown Interaction
   :id: comp_arc_dyn__health_monitor__shutdown_view
   :security: NO
   :safety: ASIL_B
   :status: valid
   :fulfils: comp_req__health_monitor__dummy

.. uml:: assets/hm_shutdown.puml

Interfaces
----------
