Skip to content

feat: elasticsearch_up 指标添加cluster标签#1413

Merged
kongfei605 merged 6 commits intoflashcatcloud:mainfrom
0x0034:feat/esAddClusterName
Mar 3, 2026
Merged

feat: elasticsearch_up 指标添加cluster标签#1413
kongfei605 merged 6 commits intoflashcatcloud:mainfrom
0x0034:feat/esAddClusterName

Conversation

@0x0034
Copy link
Contributor

@0x0034 0x0034 commented Mar 3, 2026

No description provided.

Signed-off-by: ruochen <wanxialianwei@gmail.com>
@0x0034 0x0034 closed this Mar 3, 2026
@0x0034 0x0034 reopened this Mar 3, 2026
0x0034 added 2 commits March 3, 2026 15:50
…pes 指标添加cluster标签

Signed-off-by: ruochen <wanxialianwei@gmail.com>
Signed-off-by: ruochen <wanxialianwei@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an Elasticsearch cluster identifier as a label so that “up”/node stats metrics can be distinguished across clusters when scraping multiple Elasticsearch endpoints.

Changes:

  • Extends serverInfo to store clusterName and attempts to fetch it during Gather().
  • Adds a cluster label to the elasticsearch up sample.
  • Threads clusterName into collector.NewNodes(...) and applies it as a const label on node-stats “up/scrape/failure” metrics.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
inputs/elasticsearch/elasticsearch.go Fetches cluster name, adds cluster label to elasticsearch up, and passes cluster name into node stats collector.
inputs/elasticsearch/collector/nodes.go Extends NewNodes signature, adds const cluster labels to a few metrics, and introduces GetClusterName.
inputs/elasticsearch/collector/nodes_test.go Updates NewNodes call site for the new function signature.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +2281 to +2312
func GetClusterName(client *http.Client, user, password, s string) (string, error) {
u, err := url.Parse(s)
if err != nil {
return "", fmt.Errorf("failed to parse URL %s: %s", s, err)
}
if user != "" && password != "" {
u.User = url.UserPassword(user, password)
}

var cir ClusterInfoResponse
res, err := client.Get(u.String())
if err != nil {
return "", fmt.Errorf("failed to get cluster info from %s: %s", u.String(), err)
}
defer func() {
err = res.Body.Close()
if err != nil {
log.Println("failed to close http.Client, err: ", err)
}
}()
if res.StatusCode != http.StatusOK {
return "", fmt.Errorf("HTTP Request failed with code %d", res.StatusCode)
}
bts, err := io.ReadAll(res.Body)
if err != nil {
return "", err
}
if err := json.Unmarshal(bts, &cir); err != nil {
return "", err
}
return cir.ClusterName, nil
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetClusterName is new behavior and currently has no unit test coverage. Since this package already has HTTP fixture-based tests (e.g., nodes_test.go), consider adding a small test that validates cluster name extraction and non-200 handling.

Copilot uses AI. Check for mistakes.

// Always gather node stats
if err := inputs.Collect(collector.NewNodes(ins.Client, EsUrl, ins.AllNodes, ins.Node, ins.Local, ins.NodeStats), slist); err != nil {
if err := inputs.Collect(collector.NewNodes(ins.Client, EsUrl, ins.AllNodes, ins.Node, ins.Local, ins.NodeStats, ins.serverInfo[ins.Node].clusterName), slist); err != nil {
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ins.serverInfo is keyed by server URL (s), but here the lookup uses ins.Node (often a selector like "_local"), so clusterName will usually be empty and the cluster label on node stats metrics will be wrong. Use the current server key (s) when retrieving clusterName (and consider locking/using a local copy if needed).

Suggested change
if err := inputs.Collect(collector.NewNodes(ins.Client, EsUrl, ins.AllNodes, ins.Node, ins.Local, ins.NodeStats, ins.serverInfo[ins.Node].clusterName), slist); err != nil {
clusterName := ""
if info, ok := ins.serverInfo[s]; ok {
clusterName = info.clusterName
}
if err := inputs.Collect(collector.NewNodes(ins.Client, EsUrl, ins.AllNodes, ins.Node, ins.Local, ins.NodeStats, clusterName), slist); err != nil {

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +211
if info.clusterName, err = collector.GetClusterName(ins.Client, ins.UserName, ins.Password, s); err != nil {
slist.PushSample("elasticsearch", "up", 0, map[string]string{"address": s})
log.Println("E! failed to get cluster name:", err)
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetClusterName adds an extra HTTP request per server per Gather, but the existing GetNodeID call already unmarshals a nodeStatsResponse that includes cluster_name (see nodeStatsResponse.ClusterName). To reduce overhead, consider returning the cluster name from the same request (e.g., a new helper that returns both nodeID and clusterName) and avoid this additional GET / call.

Copilot uses AI. Check for mistakes.
kongfei605 and others added 3 commits March 3, 2026 16:04
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…pes 指标添加 address 标签

Signed-off-by: ruochen <wanxialianwei@gmail.com>
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 209 to +215
up: prometheus.NewGauge(prometheus.GaugeOpts{
Name: prometheus.BuildFQName(namespace, "node_stats", "up"),
Help: "Was the last scrape of the Elasticsearch nodes endpoint successful.",
ConstLabels: prometheus.Labels{
"cluster": clusterName,
"address": url.String(),
},
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewNodes sets address ConstLabels to url.String(). Since the URL passed in is configured with basic auth (EsUrl.User = url.UserPassword(...)), this will embed credentials (user/password or API key) into Prometheus labels/metrics output. Please ensure the address label is derived from a sanitized URL (e.g., clone the URL and clear User, or use Scheme+Host only) so secrets are never exported as labels.

Copilot uses AI. Check for mistakes.
Comment on lines +2290 to +2303
func GetClusterName(client *http.Client, user, password, s string) (string, error) {
u, err := url.Parse(s)
if err != nil {
return "", fmt.Errorf("failed to parse URL %s: %s", s, err)
}
if user != "" && password != "" {
u.User = url.UserPassword(user, password)
}

var cir ClusterInfoResponse
res, err := client.Get(u.String())
if err != nil {
return "", fmt.Errorf("failed to get cluster info from %s: %s", u.String(), err)
}
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetClusterName is newly introduced but doesn’t have unit test coverage. Since similar helpers (e.g., fetchAndDecodeNodeStats) are tested in nodes_test.go, please add tests for GetClusterName covering: successful parsing of cluster_name, non-200 responses, and basic auth handling.

Copilot uses AI. Check for mistakes.
Comment on lines +208 to +217
if info.clusterName, err = collector.GetClusterName(ins.Client, ins.UserName, ins.Password, s); err != nil {
slist.PushSample("elasticsearch", "up", 0, map[string]string{"address": s})
log.Println("E! failed to get cluster name:", err)
return
}

slist.PushSample("elasticsearch", "up", 1, map[string]string{
"address": s,
"cluster": info.clusterName,
})
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The elasticsearch/up metric now uses different label sets depending on success vs failure: on failure you emit only {address: s}, but on success you emit {address: s, cluster: ...}. This creates separate time series for the same target and can leave stale series behind. Consider always including the cluster label (set to ""/"unknown" when unavailable) or keeping labels consistent across both paths.

Copilot uses AI. Check for mistakes.
Comment on lines +285 to 286
if err := inputs.Collect(collector.NewNodes(ins.Client, EsUrl, ins.AllNodes, ins.Node, ins.Local, ins.NodeStats, ins.serverInfo[ins.Node].clusterName), slist); err != nil {
log.Println("E! failed to collect nodes metrics:", err)
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewNodes(..., ins.serverInfo[ins.Node].clusterName) looks up serverInfo using ins.Node (e.g. "_local"), but serverInfo is populated with keys of the server URL string (s). This will typically pass an empty cluster name. Also, ins.serverInfo is only refreshed inside the earlier if ins.ClusterStats || len(ins.IndicesInclude) > 0 block; when that block is skipped, this read can use stale data from a previous run. Please derive the cluster name for the current server (s / EsUrl) with consistent key normalization and ensure it’s populated (or explicitly empty) on every Gather run before passing into NewNodes.

Copilot uses AI. Check for mistakes.
@kongfei605 kongfei605 merged commit 6567a38 into flashcatcloud:main Mar 3, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants