Add COS integration to the applications support it #799
gboutry
left a comment
What is the goal of this PR?
Cos agent
We don't want to relate grafana-agents to every app in the deployment. Is that going to increase the number of units of the otelcol agent? Let's not do that unless there's a valid reason.
What are we trying to observe? If, semantically, we want to observe all logs from a machine, then sunbeam-machine is the right principal to relate to.
If we want "workload" observability, then we should only relate to apps exposing the cos-agent relation or an equivalent one.
Let's not blindly increase the matrix of relations / units too much please.
hardware observer
If hardware observer is only about observing hardware, why should we relate it to every app? It should only be related to sunbeam-machine.
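To make the unit-count concern concrete: in Juju, a subordinate charm gets one unit for every unit of each principal application it is related to. The sketch below only does that arithmetic. The application names other than sunbeam-machine and microovn, and all unit counts, are hypothetical placeholders, not taken from a real deployment.

```python
def subordinate_units(principal_units: dict[str, int], related: set[str]) -> int:
    """Count subordinate units: one per unit of every related principal app."""
    return sum(count for app, count in principal_units.items() if app in related)


# Hypothetical three-node model; "app-a" and "app-b" are made-up principals.
principals = {"sunbeam-machine": 3, "microovn": 3, "app-a": 3, "app-b": 3}

# Relating the agent only to sunbeam-machine versus to every principal app.
only_machine = subordinate_units(principals, {"sunbeam-machine"})
every_app = subordinate_units(principals, set(principals))
```

Under these assumed numbers, relating to every principal multiplies the agent units by the number of related applications, which is the growth the review asks to avoid.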
Have you tested this PR on a multi-node deployment? How much more time does this take to settle? Should we modify our heuristics on observability readiness to accommodate?
@gboutry thanks for the feedback. Let me convert this to a draft first and schedule a meeting to discuss the goal. Performance is indeed something to consider. Please review together with canonical/sunbeam-terraform#142.
I've tested this on a single node (all-in-one deployment) and on a MAAS-mode deployment as described here: https://github.com/canonical/sunbeam-maas-ps6/tree/main.
d5b3631 to 926095f
Example juju status after enabling observability (single node; MAAS mode is similar but the output is too long): https://pastebin.ubuntu.com/p/HZjhgqTJXN/
Example CLI: https://pastebin.ubuntu.com/p/VjmRyb3nBR/
4a918fe to 1412cf3
@gboutry @hemanthnakkina please have a look when you have time.
- Relate microovn to the observability agent if it is present in the model; the network role is optional.
- Deploy hardware observer and relate it to the sunbeam machine charm in the openstack machine model to monitor various hardware devices. Also relate the observability agent (opentelemetry-collector) to the sunbeam machine charm, since hardware observer and otelcol need to be colocated on the same principal charm to forward metrics and dashboards.
- Add a sub-command that lets users configure hardware observer's third-party resources.
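The relation logic described in this commit message can be sketched as a pure function that decides which application pairs to relate. This is only an illustration of the rule "always relate both subordinates to sunbeam-machine, relate microovn only when present"; the function name is hypothetical, and the exact application names and endpoints used in the PR are assumptions here.

```python
def plan_relations(apps_in_model: set[str]) -> list[tuple[str, str]]:
    """Return (application, application) pairs to relate in the machine model."""
    relations = [
        # Both subordinates share sunbeam-machine as their principal.
        ("hardware-observer", "sunbeam-machine"),
        ("opentelemetry-collector", "sunbeam-machine"),
    ]
    # microovn is optional: only relate it when it exists in the model.
    if "microovn" in apps_in_model:
        relations.append(("microovn", "opentelemetry-collector"))
    return relations
```

Keeping the decision in a small function like this makes the optional microovn case easy to unit-test without a live model.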
1412cf3 to 3bd4bcf
@click.argument("resource-name", type=str)
@click.argument("resource-path", type=click.Path(exists=True, dir_okay=False))
@pass_method_obj
def attach_resource(
The UX currently looks like
sunbeam observability attach-resource RESOURCE_NAME RESOURCE_PATH
sunbeam observability list-resources
First of all, "resources" is charm terminology and may not make sense to an operator. Second, the UX does not make it clear that this is about hardware monitoring tools (although the help text does). Can we change the UX to
sunbeam observability add-hardware-monitoring-utility RESOURCE_NAME RESOURCE_PATH
sunbeam observability list-hardware-monitoring-utilities
- Will there be a case to completely reset the hardware monitoring utility once added? If so, we need another command to reset per RESOURCE_NAME
- Does the hardware observer charm expect a few known values for RESOURCE_NAME?
@tytus-kurek can you let us know your thoughts on the UX?
OK, I see the resource names at https://charmhub.io/hardware-observer/resources/
Maybe the help text can link to that page so that operators know which resource names are supported.
About the UX: I checked across different features (e.g. validation, vault, etc.) and they all follow different conventions, so I am not sure which is best. Another proposal from me is:
sunbeam observability hardware-monitoring-utility add RESOURCE_NAME RESOURCE_PATH
sunbeam observability hardware-monitoring-utility list
but that might make the command group longer...
> Will there be a case to completely reset the hardware monitoring utility once added? If so, we need another command to reset per RESOURCE_NAME
I think hardware observer (hwo) does not support resetting or removing an uploaded resource.
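For comparison, the nested layout proposed above (hardware-monitoring-utility add / list) can be sketched as follows. The PR itself uses click; this sketch deliberately uses the standard-library argparse so it runs without extra dependencies, and the program name, the helper `existing_file`, and the help wording are hypothetical. The help text links to the Charmhub resources page mentioned in this thread so operators can look up supported names.

```python
import argparse
from pathlib import Path

RESOURCES_URL = "https://charmhub.io/hardware-observer/resources/"


def existing_file(value: str) -> Path:
    """Roughly mirror click.Path(exists=True, dir_okay=False) for argparse."""
    path = Path(value)
    if not path.is_file():
        raise argparse.ArgumentTypeError(f"{value!r} is not an existing file")
    return path


def build_parser() -> argparse.ArgumentParser:
    # "observability-sketch" is a placeholder, not the real sunbeam entry point.
    parser = argparse.ArgumentParser(prog="observability-sketch")
    groups = parser.add_subparsers(dest="group", required=True)

    utility = groups.add_parser("hardware-monitoring-utility")
    actions = utility.add_subparsers(dest="action", required=True)

    add = actions.add_parser(
        "add",
        help="Upload a hardware monitoring utility",
        epilog=f"Supported RESOURCE_NAME values are listed at {RESOURCES_URL}",
    )
    add.add_argument("resource_name", metavar="RESOURCE_NAME")
    add.add_argument("resource_path", metavar="RESOURCE_PATH", type=existing_file)

    actions.add_parser("list", help="List uploaded hardware monitoring utilities")
    return parser
```

The trade-off noted in the thread still applies: the nested group reads more naturally but makes the full command longer than the flat add-/list- variants.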
- Add monitoring for MicroOVN if it exists
- Add monitoring for openstack machine model
- Add sub-command to observability feature to upload resource for hardware observer
- Set observability agent's tls_insecure_skip_verify config to true (see "insecure_skip_verify in MetricsEndpointProvider is overwritten by tls_insecure_skip_verify config option", opentelemetry-collector-k8s-operator#265)