Add COS integration to the applications support it #799

Open
chanchiwai-ray wants to merge 2 commits into canonical:main from chanchiwai-ray:observability-enablement
Conversation

@chanchiwai-ray

@chanchiwai-ray chanchiwai-ray commented May 21, 2026

Add monitoring for MicroOVN if it exists

  • Relate microovn to the observability agent only if it is present in the model, because the network role is optional.

Add monitoring for openstack machine model

  • Deploy hardware observer and relate it to the sunbeam-machine charm in the openstack machine model to monitor various hardware devices.
  • Relate the observability agent to the sunbeam-machine charm via juju-info, because hardware-observer and the observability agent must share the same principal charm to forward alerts / metrics / dashboards.
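
The relation wiring described in the two sections above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the helper `plan_relations`, the application name `otelcol`, and every endpoint name other than `juju-info` are assumptions made for the example.

```python
from typing import List, Set, Tuple

# Each relation is a pair of "application:endpoint" strings.
Relation = Tuple[str, str]


def plan_relations(apps_in_model: Set[str]) -> List[Relation]:
    """Return the relations to create in the machine model.

    Hypothetical helper: application and endpoint names other than
    "juju-info" are assumptions, not taken from the PR diff.
    """
    relations: List[Relation] = [
        # Hardware observer attaches to sunbeam-machine (endpoint name assumed).
        ("hardware-observer:general-info", "sunbeam-machine:juju-info"),
        # The agent attaches to the same principal via juju-info.
        ("otelcol:juju-info", "sunbeam-machine:juju-info"),
        # Sharing a principal lets the agent forward hardware observer data.
        ("otelcol:cos-agent", "hardware-observer:cos-agent"),
    ]
    # The network role is optional, so microovn may be absent from the model.
    if "microovn" in apps_in_model:
        relations.append(("otelcol:cos-agent", "microovn:cos-agent"))
    return relations


with_ovn = plan_relations({"sunbeam-machine", "microovn"})
without_ovn = plan_relations({"sunbeam-machine"})
print(len(with_ovn), len(without_ovn))  # prints: 4 3
```

The point of the sketch is that the microovn relation is the only conditional one; the sunbeam-machine relations are always created.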

Add sub-command to observability feature to upload resource for hardware observer

  • Hardware observer requires third-party proprietary resources.
  • The UI is tested, but I don't have the relevant hardware to test the resource upload itself.

Set observability agent's tls_insecure_skip_verify config to true

Collaborator

@gboutry gboutry left a comment

What is the goal of this PR?

Cos agent

We don't want to relate grafana-agents to every app in the deployment. Is that going to increase the number of units of the otelcol agent? Let's not do that unless there's a valid reason.

What are we trying to observe? If, semantically, we want to observe all logs from a machine, then sunbeam-machine is the right principal to relate to.

If we want "workload" observability, then we should only relate to apps exposing the cos-agent relation or an equivalent one.

Let's not blindly increase the matrix of relations / units too much please.
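
One way to read the rule suggested above, as a rough sketch only: relate the agent to the machine principal, plus any app that exposes a `cos-agent` endpoint, and nothing else. The function `select_agent_targets` and the `workload-*` application names are hypothetical and exist only for this example.

```python
from typing import Dict, List, Set


def select_agent_targets(
    app_endpoints: Dict[str, Set[str]],
    machine_principal: str = "sunbeam-machine",
    workload_endpoint: str = "cos-agent",
) -> List[str]:
    """Pick the apps the observability agent should be related to.

    Hypothetical helper: keeps the machine principal (machine-level logs)
    and apps exposing the workload endpoint, and skips everything else.
    """
    targets: List[str] = []
    for app, endpoints in sorted(app_endpoints.items()):
        if app == machine_principal or workload_endpoint in endpoints:
            targets.append(app)
    return targets


# Illustrative model contents; the workload names are made up.
model = {
    "sunbeam-machine": {"juju-info"},
    "workload-a": {"juju-info", "cos-agent"},
    "workload-b": {"juju-info"},
}
targets = select_agent_targets(model)
print(targets)  # prints: ['sunbeam-machine', 'workload-a']
```

Here `workload-b` is skipped even though it offers juju-info, which keeps the relation and unit count from growing with every app in the model.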

hardware observer

If hardware observer is only about observing hardware, why should we relate it to every app? It should only be related to sunbeam-machine.

@gboutry
Collaborator

gboutry commented May 21, 2026

Have you tested this PR on a multi-node deployment? How much more time does this take to settle? Should we modify our heuristics on observability readiness to accommodate?

@chanchiwai-ray
Author

@gboutry thanks for the feedback, let me convert to draft first and schedule a meeting to discuss this goal. Performance is indeed something to be considered. Please review together with canonical/sunbeam-terraform#142

Have you tested this PR on a multi-node deployment? How much more time does this take to settle? Should we modify our heuristics on observability readiness to accommodate?

I've tested this on a single node (all-in-one deployment) and on a MAAS mode deployment, as described here: https://github.com/canonical/sunbeam-maas-ps6/tree/main.

@chanchiwai-ray chanchiwai-ray marked this pull request as draft May 21, 2026 10:58
@chanchiwai-ray chanchiwai-ray force-pushed the observability-enablement branch from d5b3631 to 926095f May 26, 2026 08:49
@chanchiwai-ray
Author

Example juju status after enabling observability (single node, MAAS mode is similar but the output is too long): https://pastebin.ubuntu.com/p/HZjhgqTJXN/

Example CLI: https://pastebin.ubuntu.com/p/VjmRyb3nBR/

@chanchiwai-ray chanchiwai-ray marked this pull request as ready for review May 26, 2026 08:54
@chanchiwai-ray chanchiwai-ray force-pushed the observability-enablement branch from 4a918fe to 1412cf3 May 26, 2026 09:55
@chanchiwai-ray chanchiwai-ray requested a review from gboutry May 26, 2026 10:00
@chanchiwai-ray
Author

@gboutry @hemanthnakkina please have a look when you have time

- Relate microovn to observability agent if it is present in the model;
  network role is optional.
- Deploy hardware observer and relate it to sunbeam machine charm in
  openstack machine model to monitor various hardware devices. Also
  relate observability agent (opentelemetry-collector) to sunbeam
  machine charm since hardware observer and otelcol need to colocate to
  the same principal charm to forward metrics / dashboard.
- Add sub-command to allow users to configure hardware observer's 3rd
  party resources.

@chanchiwai-ray chanchiwai-ray force-pushed the observability-enablement branch from 1412cf3 to 3bd4bcf June 3, 2026 02:11
@chanchiwai-ray chanchiwai-ray marked this pull request as ready for review June 3, 2026 02:39
@click.argument("resource-name", type=str)
@click.argument("resource-path", type=click.Path(exists=True, dir_okay=False))
@pass_method_obj
def attach_resource(
Collaborator

The UX currently looks like

sunbeam observability attach-resource RESOURCE_NAME RESOURCE_PATH
sunbeam observability list-resources

First of all, "resources" is charm terminology and may not make sense to an operator. Second, the UX does not make it clear that this is about hardware monitoring tools (although the help text does say so). Can we change the UX to

sunbeam observability add-hardware-monitoring-utility RESOURCE_NAME RESOURCE_PATH
sunbeam observability list-hardware-monitoring-utilities
  • Will there be a case to completely reset the hardware monitoring utility once added? If so, we need another command to reset per RESOURCE_NAME
  • Does the hardware observer charm expect a few known values for RESOURCE_NAME?

Collaborator

@tytus-kurek can you let us know your thoughts on the UX?

Collaborator

Ok, I see the resource names @ https://charmhub.io/hardware-observer/resources/

Maybe the help text could link to the resources page so that operators know which resource names are supported.

Author

About the UX, I checked across different features (e.g. validation, vault, etc.) and they all follow different conventions, so I am not sure which is best. Another proposal from me is:

sunbeam observability hardware-monitoring-utility add RESOURCE_NAME RESOURCE_PATH
sunbeam observability hardware-monitoring-utility list 

but that might make the command group longer...
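
To make the nested shape concrete, here is a minimal sketch of the command tree proposed above. The PR itself uses click; this sketch uses the standard library's argparse only to stay dependency-free, and `build_parser`, `parse`, and the example resource name and path are all hypothetical.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Build: sunbeam observability hardware-monitoring-utility {add,list}."""
    parser = argparse.ArgumentParser(prog="sunbeam")
    features = parser.add_subparsers(dest="feature", required=True)

    observability = features.add_parser("observability")
    groups = observability.add_subparsers(dest="group", required=True)

    utility = groups.add_parser("hardware-monitoring-utility")
    actions = utility.add_subparsers(dest="action", required=True)

    # "add" takes the same two positional arguments as the current command.
    add = actions.add_parser("add", help="Add a hardware monitoring utility")
    add.add_argument("resource_name", metavar="RESOURCE_NAME")
    add.add_argument("resource_path", metavar="RESOURCE_PATH")

    # "list" takes no arguments.
    actions.add_parser("list", help="List hardware monitoring utilities")
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse an argument list without touching sys.argv."""
    return build_parser().parse_args(argv)


args = parse(
    ["observability", "hardware-monitoring-utility", "add",
     "example-resource", "/tmp/example.deb"]
)
print(args.action, args.resource_name)  # prints: add example-resource
```

The extra nesting level costs one more word on the command line, but it groups `add` and `list` under a name an operator can discover from the help output.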

Will there be a case to completely reset the hardware monitoring utility once added? If so, we need another command to reset per RESOURCE_NAME

I think hardware observer (hwo) does not support resetting or removing an uploaded resource.

3 participants