-
Notifications
You must be signed in to change notification settings - Fork 12
Health Monitoring interaction view #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Health Monitoring interaction view #52
Conversation
License Check Results🚀 The license check job ran with the Bazel command: bazel run //:license-checkStatus: Click to expand output |
|
The created documentation from the pull request is available at: docu-html |
6f2302a to
3493f35
Compare
| HealthMonitorBuilder --> user: HealthMonitor instance | ||
|
|
||
| user -> HealthMonitor: start() | ||
| HealthMonitor -> LaunchDaemon: notify_started(timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One point of discussion could be if we want to explicitly start the monitoring via a separate IPC call notify_started(timestamp) as described here or just use the existing report_running as a trigger for launch_manager to start the monitoring.
From the user point of view it could be viewed as redundant to have both as both say somewhat "I finished initialization".
I am not sure which one is better, there seem to be advantages/disadvantages in both approaches.
A related question would be: Do we want to support monitoring of applications that do not use the lifecycle api and do not report running state? The diagram currently assumes no.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
notify_startedis nothing user calls, user callsHealthMonitor.start()- this call is needed anyway and I thought it will be better if we have at LaunchDaemon clear information that user have started it, otherwise its fully bogus setup that will finalize with LaunchDaemon thinking there is no alive ping. But, guess both optios are valid.- Yes, assumes no - but this does not cover system monitoring, so i still expect you may be monitoring on process level, but this was not the scope of this flow.
|
|
||
| == Application Side == | ||
|
|
||
| user -> HealthMonitorBuilder : build(supervisor_api_notification_cycle_time, ...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to add to this diagram:
- During startup launch_manager reads its config for the application and will know whether it should be monitored or not.
- If monitoring is enabled, the launch_manager will setup the required IPC communication when starting the process
| ... | ||
|
|
||
| user -> Lifecycle: report_running() | ||
| Lifecycle -> LaunchDaemon |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also depict when the monitoring stops during shutdown of the application.
My assumption would be that:
- Launch_manager stops monitoring at the point in time it sends SIGTERM to the application process
Open question:
Does the sending of alive notification on the application side need to be stopped, or is it enough that these are just ignored by launch_manager?
Do we want to support monitoring for "self-terminating" applications that terminate on their own?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Since in app library does not take actions, it shall be fine just to do nothing there. But we shall describe it - correct.
- "My assumption would be that" - you mean a point in which the app receive shutdown request via lifecycle ?
| main -> HealthMonitor: destroy() | ||
| HealthMonitor -> LaunchDaemon: notify_stopped(timestamp) | ||
| note left | ||
| Notification is send to keep consistent with self terminating case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to show the case when the notify_stopped was send but the app did not shutdown within the timeout?
| end note | ||
| ... | ||
|
|
||
| user -> Lifecycle: report_running() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should show the case when the timeout of startup to running was crossed?
| HealthMonitorBuilder --> user: HealthMonitor instance | ||
|
|
||
| user -> HealthMonitor: start() | ||
| HealthMonitor -> LaunchDaemon: notify_started(timestamp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who is setting up the ipc channel and how?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not yet in scope as we did not tackled usage of mw::com as API. So this we will add once we design MW:COM api after having first working version.
For now kept in HM component, shall be actually in some common place as it shows all 3 components working together.
Closes eclipse-score/score#2476