Standalone spark connect trace per session by aboitreaud · Pull Request #11521 · DataDog/dd-trace-java

aboitreaud · 2026-06-01T15:32:05Z

What Does This Do

Emit one spark.application trace per Spark Connect session instead of a single global one for the whole server lifetime.

When the listener detects a Connect job (via the SparkConnect_OperationTag_..._Session_<id>_Operation_<id> tag on spark.jobTags), it parents spark.sql / spark.job spans under a per-session spark.application root span (tagged with session_id and spark.connect.server=true) rather than the global one. The per-session spans are kept in bounded maps and finished on server shutdown.
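As an illustration of the detection step, here is a minimal sketch of pulling the session id out of spark.jobTags. It assumes the property is a comma-separated list of tags and that a Connect tag contains the literal markers `_Session_` and `_Operation_` shown above. The class and method names (ConnectSessionTags, sessionId) are hypothetical and not taken from the PR; the segment elided as `...` in the tag format is skipped rather than interpreted.

```java
import java.util.Optional;

final class ConnectSessionTags {
  // Prefix and markers as written in the PR description.
  static final String PREFIX = "SparkConnect_OperationTag_";
  static final String SESSION_MARKER = "_Session_";
  static final String OPERATION_MARKER = "_Operation_";

  /**
   * Returns the session id of the first Connect operation tag found in a
   * comma-separated tag list, or empty if no such tag is present.
   */
  static Optional<String> sessionId(String jobTags) {
    if (jobTags == null || jobTags.isEmpty()) {
      return Optional.empty();
    }
    for (String raw : jobTags.split(",")) {
      String tag = raw.trim();
      if (!tag.startsWith(PREFIX)) {
        continue;
      }
      // The prefix ends with '_', which may also be the first char of the marker.
      int sessionStart = tag.indexOf(SESSION_MARKER, PREFIX.length() - 1);
      if (sessionStart < 0) {
        continue;
      }
      int idStart = sessionStart + SESSION_MARKER.length();
      int idEnd = tag.indexOf(OPERATION_MARKER, idStart);
      if (idEnd <= idStart) {
        continue; // operation marker missing, or the session id is empty
      }
      return Optional.of(tag.substring(idStart, idEnd));
    }
    return Optional.empty();
  }
}
```

With a made-up tag such as `SparkConnect_OperationTag_User_u1_Session_abc-123_Operation_op-9`, this sketch would yield `abc-123`. An empty result means the job is treated as a non-Connect job. The sketch would misparse a tag whose elided segment itself contains `_Session_`; a real implementation may need stricter matching.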

Motivation

A Spark Connect server is long-lived and hosts many independent client sessions. Today every session's work rolls up into one spark.application trace, which is unusable for per-user / per-session analysis and grows without bound.

https://datadoghq.atlassian.net/wiki/spaces/DNAP/pages/6756174072/Problem+Statement+Spark+Connect+and+Data+Jobs+Monitoring?atlOrigin=eyJpIjoiZmM0NmExYzI5OWRjNGVlMDk5NDg4NjExNDkzMDg5ZGIiLCJwIjoiY29uZmx1ZW5jZS1jaGF0cy1pbnQifQ

Additional Notes

- Pure Connect servers (no global app span) now flush traces synchronously in finishApplication so per-session spans are not lost on JVM shutdown.
- All per-session maps are guarded by the same maxCollectionSize() cap used elsewhere in the listener; on overflow, jobs fall back to the global spark.application span.
- TODO left in finishApplication: per-session spans are currently closed on server shutdown rather than on session close, because Spark Connect does not surface a session-close event through SparkListener. To be revisited when that hook becomes available.
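The cap-and-fallback behavior described above can be sketched as a small bounded registry. This is an illustrative sketch only: BoundedSessionRegistry, getOrCreate, and the fallback parameter are hypothetical names, and the maxSize constructor argument stands in for whatever maxCollectionSize() returns in the listener.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class BoundedSessionRegistry<V> {
  private final Map<String, V> bySession = new HashMap<>();
  private final int maxSize; // assumption: mirrors the listener's collection cap

  BoundedSessionRegistry(int maxSize) {
    this.maxSize = maxSize;
  }

  /**
   * Returns the value already stored for sessionId, or creates one if the map
   * still has room. Once the cap is reached, new sessions get the fallback
   * (for example the global application span) and nothing is stored.
   */
  synchronized V getOrCreate(String sessionId, Function<String, V> factory, V fallback) {
    V existing = bySession.get(sessionId);
    if (existing != null) {
      return existing;
    }
    if (bySession.size() >= maxSize) {
      return fallback;
    }
    V created = factory.apply(sessionId);
    bySession.put(sessionId, created);
    return created;
  }

  synchronized int size() {
    return bySession.size();
  }
}
```

Sessions registered before the cap was hit keep resolving to their own entry; only sessions first seen after overflow are routed to the fallback, so memory stays bounded at the cost of coarser attribution for late sessions.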

Contributor Checklist

Format the title according to the contribution guidelines
Assign the type: and (comp: or inst:) labels in addition to any other useful labels
Avoid using close, fix, or any linking keywords when referencing an issue
Use solves instead, and assign the PR milestone to the issue
Update the CODEOWNERS file on source file addition, migration, or deletion
Update public documentation with any new configuration flags or behaviors
Add your completed PR to the merge queue by commenting `/merge`. You can also:
- Customize the commit message associated with the merge with `/merge --commit-message "..."`
- Remove your PR from the merge queue with `/merge -c`
- Skip all merge queue checks with `/merge -f --reason "reason"`; please use this judiciously, as some checks do not run at the PR-level
- Get more information in this doc

Jira ticket: [PROJ-IDENT]

aboitreaud · 2026-06-01T15:32:19Z

@codex review

dd-octo-sts · 2026-06-01T15:32:22Z

Hi! 👋 Thanks for your pull request! 🎉

To help us review it, please make sure to:

Add at least one type, and one component or instrumentation label to the pull request

If you need help, please check our contributing guidelines.

datadog-datadog-prod-us1-2 · 2026-06-01T15:33:11Z

⚠️ Warnings

🚦 8 Pipeline jobs failed

DataDog/apm-reliability/dd-trace-java | test_inst: [17, 5/8]

🔄 Retry job. This looks flaky and may succeed on retry.
7 failed tests in ReactorNettyHttpClientTest due to SSLException and TimeoutException.

Check pull requests | Check pull requests

platform error
Please add at least one type, and one component or instrumentation label to the pull request.

DataDog/apm-reliability/dd-trace-java | java-startup-parallel-check-slo-breaches

platform error
Failed to retrieve GitHub token. The policy 'gitlab.github-access.read' might not exist. Error: No such file or directory when accessing 'artifacts/fail-on-breach.github-token'.

View all 8 failed jobs.

Useful? React with 👍 / 👎

This comment will be updated automatically if new data arrives.

Commit SHA: 2e460d8

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e62cd6dbc1

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-06-01T15:35:06Z

+      if (tag.startsWith("spark-connect-session-")) {
+        String id = tag.substring("spark-connect-session-".length());


Recognize Spark's actual session job tag

In Spark Connect, jobs are tagged with the session tag generated by SparkSession.sessionJobTag, which is spark-session-$sessionUUID and is propagated through SQLExecution.withSessionTagsApplied; I could not find any Spark source path that emits spark-connect-session-. With this prefix check, real Connect jobs will return null here and continue to be parented under the global application span, so the new per-session trace behavior never activates outside these tests' synthetic properties.

Useful? React with 👍 / 👎.

aboitreaud · 2026-06-01T17:12:28Z

@codex review

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e460d84d3

chatgpt-codex-connector · 2026-06-01T17:15:21Z

+    } else if (connectSessionId != null) {
+      AgentSpan sessionAppSpan =
+          getOrCreatePerSessionApplicationSpan(connectSessionId, queryStart.time(), jobProperties);
+      spanBuilder.asChildOf(sessionAppSpan.context());
    } else if (isRunningOnDatabricks) {
      addDatabricksSpecificTags(spanBuilder, jobProperties, true);


Preserve Databricks parenting for Connect SQL jobs

When a SQL job on a Databricks cluster also carries Spark Connect spark.jobTags, this new branch wins before isRunningOnDatabricks, so addDatabricksSpecificTags(...) is skipped for the SQL span; because onJobStart then parents the job under that SQL span via the sqlSpan != null branch, neither span gets the Databricks job/run/task tags or parent context that this listener previously added for Databricks jobs. This only affects Spark Connect SQL workloads running in a Databricks environment; keep the Databricks path ahead of the Connect-session path or apply the Databricks tags/parenting as well.
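One of the two remedies suggested here, keeping the Databricks path ahead of the Connect-session path, can be sketched as a plain precedence function. This is a simplified illustration under stated assumptions: SqlSpanParenting, Parent, and choose are hypothetical names, and any branch that precedes these two in the real listener is not modeled.

```java
final class SqlSpanParenting {
  enum Parent {
    DATABRICKS,
    CONNECT_SESSION,
    GLOBAL_APPLICATION
  }

  /**
   * Picks the parenting strategy for a SQL span. Databricks is checked first so
   * a Connect job on Databricks still receives the Databricks tags and parent.
   */
  static Parent choose(boolean isRunningOnDatabricks, String connectSessionId) {
    if (isRunningOnDatabricks) {
      return Parent.DATABRICKS;
    }
    if (connectSessionId != null) {
      return Parent.CONNECT_SESSION;
    }
    return Parent.GLOBAL_APPLICATION;
  }
}
```

The alternative mentioned in the comment, applying the Databricks tags in addition to the per-session parent, would need both branches to run and is not shown.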

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-01T17:15:21Z

+      sessionAppSpan.finish(time * 1000);
+    }
+    perSessionApplicationSpans.clear();
+    perSessionApplicationMetrics.clear();
+    perSessionLastJobFailed.clear();
+    perSessionLastJobFailedMessage.clear();
+    perSessionLastJobFailedStackTrace.clear();


Flush finished session spans before the early return

For a pure Spark Connect server, the comment above notes that applicationSpan == null with jobCount > 0, so after these per-session application spans are finished the existing guard at line 410 returns before reaching the tracer.flush() used at application shutdown. Since these session root spans remain open until finishApplication, they can be left only in the writer queue when the JVM/SparkContext is stopping and may be dropped instead of being synchronously written; flush after finishing the per-session spans when this early-return path is taken.
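The suggested fix can be sketched as follows: finish the per-session spans, then flush before returning when there is no global application span. The Span and Tracer interfaces below are hypothetical stand-ins, not the dd-trace-java types, and the method is a simplified model of finishApplication rather than the listener's actual code.

```java
import java.util.Map;

final class ShutdownFlushSketch {
  /** Hypothetical minimal span: finish takes a timestamp in microseconds. */
  interface Span {
    void finish(long micros);
  }

  /** Hypothetical minimal tracer exposing a synchronous flush. */
  interface Tracer {
    void flush();
  }

  static void finishApplication(
      Span applicationSpan, Map<String, Span> perSessionSpans, Tracer tracer, long timeMillis) {
    boolean finishedSessionSpans = !perSessionSpans.isEmpty();
    for (Span sessionSpan : perSessionSpans.values()) {
      sessionSpan.finish(timeMillis * 1000); // milliseconds to microseconds
    }
    perSessionSpans.clear();

    if (applicationSpan == null) {
      // Pure Connect server: no global span, so flush here before the early
      // return, otherwise the session spans may still sit in the writer queue.
      if (finishedSessionSpans) {
        tracer.flush();
      }
      return;
    }

    applicationSpan.finish(timeMillis * 1000);
    tracer.flush();
  }
}
```

In this model the flush happens exactly once on either path, and it is skipped only when there is neither a global span nor any session span to write.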

Useful? React with 👍 / 👎.

aboitreaud added 4 commits June 1, 2026 15:31

- 00ab7e6: Emit one spark.application trace per Spark Connect session
- 5f862e9: Fix cap overflow, lastSqlFailed cross-session clearing, and empty session id
- c323779: Add cap guard, TODO comment, and missing tests for per-session spark.application spans
- e62cd6d: Replace mutable MAX_COLLECTION_SIZE with protected getter hook and strengthen overflow test

aboitreaud requested a review from a team as a code owner June 1, 2026 15:32

chatgpt-codex-connector Bot reviewed Jun 1, 2026

- 2e460d8: Fix Connect session tag prefix: use SparkConnect_OperationTag_ instead of non-existent spark-connect-session-

chatgpt-codex-connector Bot reviewed Jun 1, 2026

aboitreaud added the "inst: apache spark" and "type: enhancement" labels Jun 2, 2026

aboitreaud wants to merge 5 commits into master from adrien.boitreaud/spark-connect-per-session-traces
