[Orchestrator] add new check to collect ecs tasks by kangyili · Pull Request #22060 · DataDog/datadog-agent

kangyili · 2024-01-15T15:03:54Z

What does this PR do?

This PR requires

New check `orchestrator_ecs`

Add a new check, `orchestrator_ecs`, to the core checks to collect running tasks from ecs-ec2 and ecs-fargate.
The new check is controlled by `DD_ORCHESTRATOR_EXPLORER_ENABLED` and `DD_ORCHESTRATOR_EXPLORER_ECS_COLLECTION_ENABLED`.
It pulls tasks from Workloadmeta on every run: https://github.com/DataDog/datadog-agent/blob/f67c3d667e00dcf6350a9d80a7075998e57ad9e3/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs/task.go#L57
The payloads are sent to the orchestrator endpoints.

Motivation

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

cit-pr-commenter · 2024-01-15T15:09:09Z

Go Package Import Differences

Baseline: 46c7bd1
Comparison: e78b81b

binary	os	arch	change
agent	linux	amd64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
agent	linux	arm64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
agent	windows	amd64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
agent	windows	386	+1, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
agent	darwin	amd64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
agent	darwin	arm64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
iot-agent	linux	amd64	+1, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
iot-agent	linux	arm64	+1, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
heroku-agent	linux	amd64	+1, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
cluster-agent	linux	amd64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs
cluster-agent	linux	arm64	+4, -0 +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/collectors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/processors/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/cluster/orchestrator/transformers/ecs +github.com/DataDog/datadog-agent/pkg/collector/corechecks/orchestrator/ecs

pr-commenter · 2024-01-15T17:23:18Z

Bloop Bleep... Dogbot Here

Regression Detector Results

Run ID: d9233538-ba17-4993-bdc8-1153e942e6ee
Baseline: b4f0a17
Comparison: 9fd8b7a3d7ac7cdf81ad97a36cf575b91067344e

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+0.02	[-6.28, +6.32]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	otel_to_otel_logs	ingress throughput	+1.30	[+0.64, +1.95]
➖	process_agent_standard_check	memory utilization	+0.63	[+0.59, +0.67]
➖	idle	memory utilization	+0.38	[+0.34, +0.43]
➖	file_to_blackhole	% cpu utilization	+0.02	[-6.28, +6.32]
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.02	[-0.04, +0.07]
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.06, +0.06]
➖	trace_agent_json	ingress throughput	+0.00	[-0.01, +0.01]
➖	trace_agent_msgpack	ingress throughput	-0.02	[-0.03, -0.01]
➖	process_agent_standard_check_with_stats	memory utilization	-0.04	[-0.09, -0.00]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-0.18	[-2.98, +2.62]
➖	file_tree	memory utilization	-0.68	[-0.78, -0.58]
➖	process_agent_real_time_mode	memory utilization	-0.83	[-0.87, -0.78]
➖	tcp_syslog_to_blackhole	ingress throughput	-0.95	[-1.03, -0.87]
➖	basic_py_check	% cpu utilization	-2.04	[-4.28, +0.21]
➖	pycheck_1000_100byte_tags	% cpu utilization	-2.63	[-7.52, +2.25]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
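
The three criteria above can be sketched as a small predicate. This is an illustrative reading of the bot's explanation, not the Regression Detector's implementation; the `result` type, its field names, and `isRegression` are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// result holds one row of the per-experiment table (illustrative field names).
type result struct {
	experiment    string
	deltaMeanPct  float64 // "Δ mean %"
	ciLow, ciHigh float64 // bounds of "Δ mean % CI"
	erratic       bool    // experiment is configured with erratic: true
}

// effectSizeTolerance is the |Δ mean %| threshold quoted in the report.
const effectSizeTolerance = 5.00

// isRegression applies the three criteria: the effect is large enough,
// the confidence interval excludes zero, and the experiment is not erratic.
func isRegression(r result) bool {
	bigEnough := math.Abs(r.deltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := r.ciLow > 0 || r.ciHigh < 0
	return bigEnough && ciExcludesZero && !r.erratic
}

func main() {
	rows := []result{
		{"otel_to_otel_logs", 1.30, 0.64, 1.95, false},
		{"file_to_blackhole", 0.02, -6.28, 6.32, true},
	}
	for _, r := range rows {
		fmt.Printf("%s: regression=%v\n", r.experiment, isRegression(r))
	}
}
```

Note that `otel_to_otel_logs` has an interval that excludes zero but still falls well under the 5.00% tolerance, which is why the table marks it as no significant change.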

cswatt

release notes approved by docs

ogaca-dd

LGTM for files owned by agent shared components

kangyili · 2024-03-18T11:36:37Z

I created a new PR, #23823, that only updates the code structure.

pr-commenter · 2024-03-19T15:55:38Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=30572198 --os-family=ubuntu

pr-commenter · 2024-03-19T15:55:46Z

Regression Detector

Regression Detector Results

Run ID: cf9295b8-f32c-4900-9c2c-31aeda2746e4
Baseline: 46c7bd1
Comparison: e78b81b

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+1.32	[-5.10, +7.75]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+1.32	[-5.10, +7.75]
➖	idle	memory utilization	+0.29	[+0.26, +0.32]
➖	tcp_syslog_to_blackhole	ingress throughput	+0.15	[+0.07, +0.23]
➖	otel_to_otel_logs	ingress throughput	+0.11	[-0.31, +0.52]
➖	process_agent_standard_check_with_stats	memory utilization	+0.06	[+0.04, +0.09]
➖	trace_agent_msgpack	ingress throughput	+0.03	[+0.01, +0.04]
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.02	[-0.02, +0.05]
➖	trace_agent_json	ingress throughput	+0.00	[-0.03, +0.04]
➖	uds_dogstatsd_to_api	ingress throughput	+0.00	[-0.20, +0.20]
➖	pycheck_1000_100byte_tags	% cpu utilization	-0.07	[-4.97, +4.83]
➖	file_tree	memory utilization	-0.24	[-0.33, -0.16]
➖	process_agent_real_time_mode	memory utilization	-0.59	[-0.62, -0.56]
➖	process_agent_standard_check	memory utilization	-0.63	[-0.67, -0.59]
➖	basic_py_check	% cpu utilization	-1.14	[-3.62, +1.34]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	-2.94	[-5.60, -0.28]

xlucas · 2024-03-20T14:05:30Z

+	}
+
+	taskModel.Tags = tags
+	taskModel.EcsTags = toTags(task.Task.Tags)


Is there any normalization needed for those tags?

kangyili · 2024-03-21

I'm not sure we should normalise the tags here: the ECS and Datadog tag standards differ, and changing tags could confuse users. We can probably leave them as they are for now.
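
To make the trade-off in this thread concrete, here is a minimal sketch of a pass-through conversion that leaves ECS tags untouched. The body of `toTags` is not shown in the thread, so this version is a hypothetical stand-in: it assumes the input is a `map[string]string` and that each entry becomes a `key:value` string, with sorting added only to make the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// toTags is a hypothetical stand-in for the helper called in the diff above.
// It copies each ECS tag through as "key:value" without lowercasing or
// replacing characters, so users see the tags exactly as set in ECS.
func toTags(tags map[string]string) []string {
	out := make([]string, 0, len(tags))
	for k, v := range tags {
		out = append(out, k+":"+v)
	}
	sort.Strings(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	ecsTags := map[string]string{
		"Team": "Container App",
		"env":  "Prod",
	}
	fmt.Println(toTags(ecsTags))
}
```

A normalising variant would lowercase keys and values and replace characters such as spaces, which is exactly the kind of change the reply above suggests could surprise users.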

kangyili · 2024-03-21T16:41:50Z

/merge

dd-devflow · 2024-03-21T16:41:55Z

🚂 MergeQueue

Pull request added to the queue.

There are 3 builds ahead! (estimated merge in less than 29m)

Use /merge -c to cancel this operation!

dd-devflow · 2024-03-21T16:42:04Z

❌ MergeQueue

This merge request conflicts with another merge request ahead in the queue.

The merge requests in front of this one are:

#23963 with merge sha 9819a78
#23912 with merge sha fd8f744
#23621 with merge sha 3ab1ce3

If you need support, contact us on Slack #ci-interfaces with those details!

kangyili · 2024-03-21T17:09:20Z

/merge

dd-devflow · 2024-03-21T17:09:26Z

🚂 MergeQueue

Pull request added to the queue.

There are 3 builds ahead! (estimated merge in less than 29m)

Use /merge -c to cancel this operation!

dd-devflow · 2024-03-21T17:09:32Z

❌ MergeQueue

This merge request conflicts with another merge request ahead in the queue.

The merge requests in front of this one are:

#23621 with merge sha 0683c6d
#23931 with merge sha c6656b7
#23912 with merge sha 130c484

If you need support, contact us on Slack #ci-interfaces with those details!

kangyili · 2024-03-21T17:51:52Z

/merge

dd-devflow · 2024-03-21T17:51:57Z

🚂 MergeQueue

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

dd-devflow · 2024-03-21T19:12:50Z

🚂 MergeQueue

Added to the queue.

This build is next! (estimated merge in less than 28m)

Use /merge -c to cancel this operation!

* add new check to collect ecs tasks
* address feedback
* bump agent payload version

kangyili requested review from a team as code owners January 15, 2024 15:03

kangyili mentioned this pull request Jan 15, 2024

Update Workloadmeta ECS collector to use metadata v4 endpoint #21836

Merged

10 tasks

kangyili added this to the 7.52.0 milestone Jan 15, 2024

kangyili added the [deprecated] team/container-app label Jan 15, 2024

kangyili force-pushed the kangyi/ecs-new-core-check branch from 0a9bc12 to 19014bd Compare January 15, 2024 16:10

cswatt approved these changes Jan 16, 2024

ogaca-dd approved these changes Jan 23, 2024

kangyili commented Feb 1, 2024

Comment thread go.mod Outdated

xlucas requested changes Feb 8, 2024

Comment thread pkg/collector/corechecks/cluster/orchestrator/processors/common/base.go Outdated

Comment thread pkg/orchestrator/model/types.go Outdated

xlucas reviewed Feb 8, 2024

Comment thread pkg/collector/corechecks/cluster/orchestrator/collector_bundle.go Outdated

xlucas reviewed Feb 8, 2024

Comment thread pkg/collector/corechecks/cluster/orchestrator/collectors/collector.go

xlucas reviewed Feb 8, 2024

Comment thread pkg/collector/corechecks/cluster/orchestrator/collectors/ecs/task.go Outdated

mfpierre modified the milestones: 7.52.0, 7.53.0 Feb 15, 2024

kangyili force-pushed the kangyi/ecs branch from c390568 to e7d3c14 Compare February 28, 2024 14:31

kangyili requested a review from a team February 28, 2024 14:31

kangyili force-pushed the kangyi/ecs branch 3 times, most recently from 8d02b6a to 09f57df Compare March 1, 2024 09:19

kangyili marked this pull request as draft March 1, 2024 09:32

kangyili force-pushed the kangyi/ecs-new-core-check branch from 19014bd to 7bac94c Compare March 1, 2024 09:33

kangyili changed the base branch from kangyi/ecs to kangyi/ecs-fargate March 1, 2024 09:33

kangyili force-pushed the kangyi/ecs-new-core-check branch 3 times, most recently from b6cfa1d to 66bbe5d Compare March 1, 2024 13:28

kangyili force-pushed the kangyi/ecs-new-core-check branch from 58f15d5 to 076db95 Compare March 18, 2024 12:33

fisherevans approved these changes Mar 18, 2024

Comment thread go.mod Outdated

Comment thread pkg/collector/corechecks/cluster/orchestrator/collectors/ecs/task.go Outdated

vickenty approved these changes Mar 18, 2024

kangyili force-pushed the kangyi/update-code-structure branch from e9f189e to 52fb6a1 Compare March 19, 2024 08:50

Base automatically changed from kangyi/update-code-structure to main March 19, 2024 11:01

add new check to collect ecs tasks

d8faa74

kangyili force-pushed the kangyi/ecs-new-core-check branch from 9fd8b7a to d8faa74 Compare March 19, 2024 14:02

xlucas approved these changes Mar 20, 2024

address feedback

2cc7e05

kangyili force-pushed the kangyi/ecs-new-core-check branch from 7225ef1 to 2cc7e05 Compare March 21, 2024 12:17

kangyili added 2 commits March 21, 2024 13:28

bump agent payload version

2cdca36

Merge branch 'main' of github.com:DataDog/datadog-agent into kangyi/ecs-new-core-check

1b9e971

clamoriniere approved these changes Mar 21, 2024

Merge branch 'main' of github.com:DataDog/datadog-agent into kangyi/ecs-new-core-check

e78b81b

dd-mergequeue bot merged commit e6ec7d5 into main Mar 21, 2024

dd-mergequeue bot deleted the kangyi/ecs-new-core-check branch March 21, 2024 19:45

alexgallotta pushed a commit that referenced this pull request May 9, 2024

[Orchestrator] add new check to collect ecs tasks (#22060)

477a99b

* add new check to collect ecs tasks
* address feedback
* bump agent payload version

kangyili mentioned this pull request Jul 26, 2024

enable ecs collection in all e2e tests DataDog/test-infra-definitions#979

Merged
