Open
Signed-off-by: toby lorne <toby@toby.codes>
adds an optional prometheus server which handles /metrics

adds flags for enabling prometheus server, configuring listen address and port

adds 3 metrics:
- skuttle_node_termination_errors_total
- skuttle_node_termination_skips_total
- skuttle_node_terminations_total

this will enable monitoring node skuttling activity via prometheus

Signed-off-by: toby lorne <toby@toby.codes>
vixus0 (Owner) requested changes on Jun 24, 2021 and left a comment:
Looks good overall, just the topology label keys can be updated :)
Re: testing, I have a vague idea around integration testing with kind since you can define multiple nodes with labels in its configuration file. Just need to get round to setting it up.
Setting the topology labels is dependent on the cloud-controller, right? The Kubernetes docs suggest /zone and /region should be set but I'm not sure about /role. It's not in the "well known labels": https://github.com/kubernetes/api/blob/ccc65c06cccc78a07b45598ec7c135dca7d84ed2/core/v1/well_known_labels.go#L22
role -> instance type

role is not available, instance type is useful

referring to https://kubernetes.io/docs/reference/labels-annotations-taints/

Signed-off-by: toby lorne <toby@toby.codes>
Co-authored-by: vixus0 <vixus0@gmail.com>
Disclaimer
I have not tested this as I have not been particularly energised to set up mocks, nor do I have access to an AWS k8s cluster
What
Prometheus can be enabled via flags with configurable address and port
Prometheus exposes binary metrics, as well as three controller metrics:
- skuttle_node_termination_errors_total
- skuttle_node_termination_skips_total
- skuttle_node_terminations_total

These metrics record errors when skuttling, skips (when nodes exist but are picked up by skuttle) and skuttles (node terminations)
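To make the shape of this concrete, here is a minimal stdlib-only Python sketch of three counters, a `/metrics` handler and the enabling flags. It is a stand-in for illustration, not the controller's actual code: only the three metric names come from this PR, while the flag names (`--prometheus`, `--prometheus-address`, `--prometheus-port`), their defaults, the help strings and the `zone` label are assumptions.

```python
import argparse
from collections import Counter
from http.server import BaseHTTPRequestHandler

# Metric names are taken from the PR; the help strings are illustrative.
METRICS = {
    "skuttle_node_termination_errors_total": "Errors encountered while skuttling a node",
    "skuttle_node_termination_skips_total": "Nodes skipped by skuttle",
    "skuttle_node_terminations_total": "Nodes skuttled (terminated)",
}

# One monotonically increasing value per (metric name, sorted label pairs).
_values = Counter()


def inc(name, **labels):
    """Increment a counter by one; unknown metric names are rejected."""
    if name not in METRICS:
        raise KeyError(name)
    _values[(name, tuple(sorted(labels.items())))] += 1


def render():
    """Render all counters in the Prometheus text exposition format."""
    lines = []
    for name, help_text in METRICS.items():
        lines.append(f"# HELP {name} {help_text}")
        lines.append(f"# TYPE {name} counter")
        for (metric, labels), value in sorted(_values.items()):
            if metric != name:
                continue
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{name}{suffix} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the rendered counters on GET /metrics and 404s elsewhere."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render().encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def parse_args(argv=None):
    """Parse the (hypothetical) flags that enable and place the server."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--prometheus", action="store_true")
    parser.add_argument("--prometheus-address", default="127.0.0.1")
    parser.add_argument("--prometheus-port", type=int, default=9090)
    return parser.parse_args(argv)
```

With `args = parse_args()`, the server would only be started when `args.prometheus` is set, for example via `http.server.HTTPServer((args.prometheus_address, args.prometheus_port), MetricsHandler).serve_forever()`.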
Why
So we can have time-series charts showing EC2 node terminations, when skuttle decides to drop depth charges
How to use
When prometheus exposes metrics, they appear like the following:
When Kubernetes does not have sufficient label metadata that should be set by default (refer to the code), the metrics appear as:
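As a rough sketch of that fallback, the snippet below maps a node's Kubernetes labels to metric labels and substitutes a placeholder when a label is absent. The three label keys are the well-known ones from the Kubernetes labels reference linked in the review; the metric label names (`region`, `zone`, `instance_type`) and the `"unknown"` placeholder are assumptions, so the code in this PR may differ.

```python
# Well-known Kubernetes node label keys.
LABEL_KEYS = {
    "region": "topology.kubernetes.io/region",
    "zone": "topology.kubernetes.io/zone",
    "instance_type": "node.kubernetes.io/instance-type",
}

# Assumed placeholder for missing metadata; the PR may use another value.
UNKNOWN = "unknown"


def metric_labels(node_labels):
    """Map a node's labels to metric labels, falling back to UNKNOWN
    when a label is missing or empty."""
    return {
        name: node_labels.get(key) or UNKNOWN
        for name, key in LABEL_KEYS.items()
    }
```

A node without cloud-controller metadata then yields `unknown` for every metric label rather than dropping the sample.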
These metrics enable a skuttle SLI:
This could be drilled down by region or availability zone, which may provide insights for optimising spot instance placement
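One plausible form of such an SLI, sketched in Python, is the share of successful terminations per zone. This is not necessarily the query the author has in mind: it assumes `skuttle_node_terminations_total` counts only successful terminations, that samples carry a `zone` label, and that counter values have already been scraped into `(name, labels, value)` tuples.

```python
from collections import defaultdict


def success_ratio(samples, group_by="zone"):
    """Return terminations / (terminations + errors) per label value.

    `samples` is an iterable of (metric_name, labels_dict, value) tuples.
    """
    ok = defaultdict(float)
    failed = defaultdict(float)
    for name, labels, value in samples:
        group = labels.get(group_by, "unknown")
        if name == "skuttle_node_terminations_total":
            ok[group] += value
        elif name == "skuttle_node_termination_errors_total":
            failed[group] += value
    ratios = {}
    for group in set(ok) | set(failed):
        total = ok[group] + failed[group]
        if total:  # skip groups with no attempts to avoid dividing by zero
            ratios[group] = ok[group] / total
    return ratios
```

Passing `group_by="region"` gives the same ratio at region granularity.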
How to review