feat(datadog-aws-lambda): add distributed tracing for Rust Lambda functions #190
Dogbu-cyber wants to merge 26 commits into david.ogbureke/aws-sdk-rust
We should have all the code for inferred spans and trace extraction in libdatadog. cc @duncanista
It's not currently ready to be in libdatadog so the relevant trace extraction code was taken from the extension and placed here.
So we now have the same code in two locations? I think it makes more sense to work on moving the code from the extension to libdatadog than it does to duplicate all of this code.
Also, pretty sure that @duncanista moved the trace extraction code to libdatadog already.
When I spoke to him he said his branch would not be ready in time, and that he thinks it's fine if this is duplicated here and then removed later.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9f0960ba57
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
ee01b82 to 612838b
aa4a9db to decba09
612838b to 4f7442a
b1d472b to fd24910
80fedd0 to 18c7914
f5a3f43 to 5fd8f52
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5fd8f52f48
    .as_ref()
    .map(|s| s.is_async)
    .unwrap_or(false);
let upstream_cx = if validate_carrier(&result.carrier).is_some() {
Extract context without requiring Datadog trace-id header
TriggerExtractor::extract only calls the configured propagator when validate_carrier finds a numeric x-datadog-trace-id, so valid upstream contexts that are propagated as W3C-only headers (traceparent/tracestate) are dropped and a new root trace is created. This breaks distributed tracing for setups that use tracecontext-only injection (for example via propagation-style config) even though global::get_text_map_propagator can parse those headers; extraction should not be gated on a Datadog-specific key.
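One way to read this suggestion is that the pre-check should accept any header family the propagator can parse. The sketch below is a minimal, std-only illustration; `carrier_has_trace_context` is a hypothetical name, the carrier keys are assumed to be lowercased already, and real validation of `traceparent` is left to the configured propagator.

```rust
use std::collections::HashMap;

/// Returns true when the carrier holds trace context in either header family.
/// Hypothetical helper: the name and the exact checks are assumptions,
/// not the code under review.
fn carrier_has_trace_context(carrier: &HashMap<String, String>) -> bool {
    // Datadog style: a numeric, non-zero trace id.
    let datadog = carrier
        .get("x-datadog-trace-id")
        .and_then(|v| v.parse::<u64>().ok())
        .is_some_and(|id| id != 0);

    // W3C style: any non-empty `traceparent`; the propagator does the
    // full parsing later, so this is only a presence check.
    let w3c = carrier
        .get("traceparent")
        .is_some_and(|v| !v.trim().is_empty());

    datadog || w3c
}
```

With a check like this, a W3C-only carrier would still reach the propagator instead of starting a new root trace.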
27e53f9 to 53797ee
bc510bc to 35b2e4f
3debc49 to 0b7351d
35b2e4f to 7d1fbbe
29211f0 to d999596
c856bff to ad914b1
5cf7cc6 to 778b9a8
…r inference
Adds `datadog-lambda` (`integrations/aws/datadog-lambda/`), a crate that provides Datadog distributed tracing for Rust Lambda functions. Wrap your handler with `WrappedHandler` and each invocation automatically extracts upstream trace context, creates inferred trigger spans, and instruments the invocation with an `aws.lambda` root span.
Supported triggers:
- SQS: `_datadog` MessageAttribute (String, JSON)
- SNS: `_datadog` MessageAttribute (Binary or String, JSON)
- EventBridge: `_datadog` key in `Detail` JSON
- SNS -> SQS: SQS body contains SNS notification
- EventBridge -> SQS: SQS body contains EB event
- EventBridge -> SNS: SNS message contains EB event
- API Gateway REST v1 / HTTP v2: `headers` object (case-insensitive)
- Lambda Function URL: `headers` object (case-insensitive)
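The "case-insensitive" note on the HTTP-style triggers can be illustrated with a small lookup helper. This is a sketch only: `header_ci` is a hypothetical name, and the headers are modeled as a plain `HashMap` rather than the crate's actual event types.

```rust
use std::collections::HashMap;

/// Looks up a header by name, ignoring ASCII case.
/// Hypothetical helper used for illustration only.
fn header_ci<'a>(headers: &'a HashMap<String, String>, name: &str) -> Option<&'a str> {
    headers
        .iter()
        // Header names are ASCII, so an ASCII-only comparison is sufficient.
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| v.as_str())
}
```

A linear scan is fine here because header maps are small; a caller that looks up many keys could lowercase the map once instead.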
Add the dd_resource_key meta tag to inferred spans for API Gateway HTTP (v2) and REST (v1) triggers, matching the Datadog Lambda Extension behavior. This tag is used by the Datadog backend to link inferred spans to AWS resources (e.g. the API Gateway "Invoked Functions" view). Also adds trigger_arn computation for both API Gateway trigger types and aws_region/get_aws_partition_by_region helpers mirroring the extension.
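As a rough illustration of the region-to-partition helper mentioned above, the sketch below maps a region name to its AWS partition by prefix. The function name `partition_for_region` and the prefix rules are assumptions based on the public AWS partition naming (`aws`, `aws-cn`, `aws-us-gov`); the extension's actual helper may differ.

```rust
/// Maps an AWS region name to its partition by prefix.
/// Hypothetical sketch; not the helper from this PR or the extension.
fn partition_for_region(region: &str) -> &'static str {
    if region.starts_with("cn-") {
        "aws-cn" // China regions, e.g. cn-north-1
    } else if region.starts_with("us-gov-") {
        "aws-us-gov" // GovCloud regions, e.g. us-gov-west-1
    } else {
        "aws" // standard commercial partition
    }
}
```

The partition is the second field of an ARN, so a trigger ARN built for a China or GovCloud region needs it to be correct.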
- Allow disallowed_methods on aws_region(): AWS_REGION / AWS_DEFAULT_REGION are Lambda platform variables, not application config
- Allow type_complexity on test_handler() helper
… backend
Replace the hand-rolled trigger detection logic (~1800 lines across triggers/) with libdd-trace-inferrer from libdatadog. The shared crate handles event parsing, carrier extraction, and inferred span construction; this crate now owns only the OTel span lifecycle on top of those results.
- Add libdd-trace-inferrer workspace dependency (jordan.gonzalez/libdd-trace-inferrer branch)
- Replace InferredSpan + triggers/ with InferenceResult from the shared crate
- Delete the triggers/ directory (kept locally for reference, not compiled)
- Drop DD_RESOURCE_KEY constant (no longer emitted)
- Update tests to construct InferenceResult directly
…lue>
Accept LambdaEvent<Box<RawValue>> in the Service impl so the runtime copies the payload bytes once without parsing them into a Value tree. Trigger extraction and typed deserialization both call .get() on the same RawValue, eliminating the intermediate serde_json::Value allocation.
- Enable serde_json raw_value feature
- Switch Service impl from LambdaEvent<Value> to LambdaEvent<Box<RawValue>>
- Propagate &str through Invocation::start and TriggerExtractor::extract
- Replace extract_from_headers(&Value) with RawHeaders<'_> zero-copy struct
- Call inferrer.infer_span(&str) instead of infer_span_from_value(&Value)
…InferredSpanScope
InferenceResult has at most one level of wrapping, so the span chain is always 0-2 elements. Replace the Vec with explicit outer and inner Option fields to reflect that constraint in the type.
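The idea of encoding "zero to two elements" in the type can be sketched with a generic stand-in. `SpanChain` is a hypothetical name chosen for illustration; the real `InferredSpanScope` holds span handles rather than an arbitrary `T`.

```rust
/// A chain of at most two items: an optional outer wrapper and an optional inner item.
/// Hypothetical stand-in for a `Vec` that never holds more than two elements.
struct SpanChain<T> {
    outer: Option<T>,
    inner: Option<T>,
}

impl<T> SpanChain<T> {
    /// Number of spans present; can only be 0, 1, or 2 by construction.
    fn len(&self) -> usize {
        usize::from(self.outer.is_some()) + usize::from(self.inner.is_some())
    }

    /// Iterates outermost first, without allocating.
    fn iter(&self) -> impl Iterator<Item = &T> {
        self.outer.iter().chain(self.inner.iter())
    }
}
```

Compared with a `Vec`, this avoids a heap allocation per invocation and makes a three-element chain unrepresentable.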
global::get_text_map_propagator acquires a RwLock on every call. Resolve both the inferrer carrier and the header fallback carrier before entering the closure so the lock is taken exactly once.
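The pattern described here, choosing the carrier first and locking once, can be sketched with std types only. `Propagator` and `extract_once` below are hypothetical stand-ins; the real code goes through the closure passed to `global::get_text_map_propagator`, which this sketch does not reproduce.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

type Carrier = HashMap<String, String>;

/// Stand-in for a globally registered propagator guarded by a lock (assumption).
struct Propagator;

impl Propagator {
    /// Toy extraction: returns the raw Datadog trace id if present.
    fn extract(&self, carrier: &Carrier) -> Option<String> {
        carrier.get("x-datadog-trace-id").cloned()
    }
}

/// Picks the carrier before locking, then takes the read lock exactly once.
fn extract_once(
    lock: &RwLock<Propagator>,
    primary: Option<&Carrier>,
    fallback: Option<&Carrier>,
) -> Option<String> {
    // Resolve which carrier to use without touching the lock.
    let carrier = primary.or(fallback)?;
    // Single read-lock acquisition, regardless of which branch supplied the carrier.
    let guard = lock.read().ok()?;
    guard.extract(carrier)
}
```

When neither carrier is present the function returns before locking at all, so the no-context path takes no lock.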
…[allow(dead_code)]
SpanInferrer and AWS_REGION were previously reconstructed on every invocation. Store the inferrer in WrappedHandler (built once in new()) and thread it through Invocation::start and TriggerExtractor::extract.
…r-invocation lookup
…sult, avoid duplicate Strings
bacc2de to 81e8e60
libdd-trace-inferrer already handles carrier extraction for all trigger
types including API Gateway (REST/HTTP/WebSocket), ALB, and Lambda
Function URLs — it populates result.carrier directly from the event
headers. The local extract_from_headers_str fallback and carrier.rs
module were dead code.
Also replace format!("{err}") with err.to_string() in set_error.
Improves documentation density to match the style established in libdd-trace-inferrer:
- Module-level `//!` docs on `invocation` and `span_inferrer` explaining each module's role and typical call sequence
- Field-level `///` docs on `LambdaSpan`, `Invocation`, `TriggerContext`, `ActiveInferredSpan`, `TriggerExtraction`
- Function docs on `LambdaSpan::start` (parent fallback logic), `Invocation::handler_context`, `finish`, `finish_spans`, `build_inferred_span` (start_ns fallback), `InferredSpanScope::start`, and `TriggerExtractor::extract` (zero trace-id sentinel)
- `#[must_use]` on `handler_context`
- `#![cfg_attr(not(test), deny(clippy::panic/unwrap_used/expect_used))]` to prevent tracing from silently crashing Lambda invocations
…th free function
TriggerExtractor carried no state and its single method was a free
function in disguise. Replace it with extract_trigger() at module level.
Move the trigger_tags key lookups ("function_trigger.event_source" /
"function_trigger.event_source_arn") from Invocation::start into
extract_trigger(), exposing them as typed fields on TriggerExtraction.
Invocation no longer reaches into InferenceResult's internal tag map.
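A minimal sketch of the refactor described in this commit: a free function reads the two tag keys once and hands back typed fields. The names `extract_trigger` and `TriggerExtraction` come from the commit message, but the signature and the `HashMap` input are assumptions; the real function works on an `InferenceResult`.

```rust
use std::collections::HashMap;

/// Typed view of the trigger tags, so callers never touch the raw tag map.
/// Field names are illustrative assumptions.
#[derive(Debug, Default, PartialEq)]
struct TriggerExtraction {
    event_source: Option<String>,
    event_source_arn: Option<String>,
}

/// Free function replacing a stateless extractor struct (assumed signature).
fn extract_trigger(trigger_tags: &HashMap<String, String>) -> TriggerExtraction {
    TriggerExtraction {
        // The string keys live in exactly one place.
        event_source: trigger_tags.get("function_trigger.event_source").cloned(),
        event_source_arn: trigger_tags
            .get("function_trigger.event_source_arn")
            .cloned(),
    }
}
```

Keeping the key strings inside one function means a rename in the shared crate only has to be tracked here.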
…add innermost_context
The tuple return (Context, Self) forced callers to juggle the innermost context as a separate value even though Self already owns it via self.inner. Drop the context from the return type and add innermost_context(&self, fallback) -> Context so the scope is self-contained. Callers pass the upstream context as the fallback for the no-inferred-spans case.
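The fallback behaviour of `innermost_context` might look roughly like the sketch below. `Context` and `ActiveSpan` are simplified stand-ins for the OpenTelemetry types, the method takes the fallback by reference for simplicity, and the inner-then-outer lookup order is an assumption.

```rust
/// Simplified stand-in for an OpenTelemetry context (assumption).
#[derive(Clone, Debug, PartialEq)]
struct Context(String);

/// Simplified stand-in for an active inferred span that owns its context.
struct ActiveSpan {
    cx: Context,
}

struct InferredSpanScope {
    outer: Option<ActiveSpan>,
    inner: Option<ActiveSpan>,
}

impl InferredSpanScope {
    /// Returns the context of the innermost inferred span, or the
    /// caller-supplied fallback when no inferred span exists.
    fn innermost_context(&self, fallback: &Context) -> Context {
        self.inner
            .as_ref()
            .or(self.outer.as_ref())
            .map(|span| span.cx.clone())
            .unwrap_or_else(|| fallback.clone())
    }
}
```

A caller would pass the upstream context as `fallback`, so the `aws.lambda` span is parented correctly whether or not a trigger span was inferred.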
What does this PR do?
Adds `datadog-aws-lambda` (`instrumentation/datadog-aws-lambda/`), a crate that provides Datadog distributed tracing for Rust Lambda functions. Wrap your handler with `WrappedHandler` and each invocation automatically extracts upstream trace context, creates inferred trigger spans, and instruments the invocation with an `aws.lambda` root span.
Trigger detection and carrier extraction are delegated to `libdd-trace-inferrer`, an experimental shared crate in development in `libdatadog`. This crate is a PoC implementation based on the work outlined in the Serverless Rust tracing design doc, originally started by @duncanista on the `jordan.gonzalez/libdd-trace-inferrer` branch. This PR depends on a fork of that work at `david.ogbureke/libdd-trace-inferrer` to unblock the consumer side while the upstream crate matures.
Supported triggers (as implemented by `libdd-trace-inferrer`):
- `aws.sqs`
- `aws.sns`
- `aws.eventbridge`
- `aws.sns` -> `aws.sqs`
- `aws.eventbridge` -> `aws.sqs`
- `aws.eventbridge` -> `aws.sns`
- `aws.apigateway`
- `aws.httpapi`
- `aws.apigateway.websocket`
- `aws.lambda.url`
- `aws.kinesis`
- `aws.dynamodb`
- `aws.s3`
- `aws.msk`
For all trigger types, trace context carrier extraction is also handled by `libdd-trace-inferrer`. A header-based fallback covers payloads not matched by any known trigger shape.
Motivation
Completes the consumer side of distributed tracing through AWS managed services for Rust Lambdas. The producer side is handled by `datadog-aws` (#189).
Notes
- `lambda_runtime` crate.
- `LambdaEvent<Box<RawValue>>`: the runtime passes raw JSON bytes without deserializing into a `Value`, eliminating a redundant allocation before the user's type is constructed.
- The propagator lock (`global::get_text_map_propagator`) is acquired once per invocation, not once per carrier branch.
- `SpanInferrer` and `SdkTracer` are constructed once at cold start and reused across invocations.
- `datadog-opentelemetry` is pulled in with `features = ["test-utils"]` because `set_trace_writer_synchronous_write` is currently gated behind that feature. Synchronous flush ensures spans are flushed from the handler's in-process buffer to the local Datadog extension before the handler returns, reducing span loss when the process freezes. Enabling the feature also compiles test-only deps (criterion, gRPC and HTTP exporters) into the production binary, which increases binary size and can slow cold starts.