Fix telemetry misattribution across connections on the same thread#1531
Open
sreekanth-db wants to merge 1 commit into
Open
Fix telemetry misattribution across connections on the same thread#1531sreekanth-db wants to merge 1 commit into
sreekanth-db wants to merge 1 commit into
Conversation
Telemetry export previously read the connection context from a static ThreadLocal (DatabricksThreadContextHolder) at export time. When multiple connections were used on the same thread (e.g. Thrift and SEA), the ThreadLocal held whichever connection was created last, so per-statement telemetry events were tagged with the wrong connection's context (transport mode, connection parameters, etc.). Each TelemetryCollector is already per-connection (keyed by connectionUuid), so it now stores its own connection context and passes it explicitly to TelemetryHelper.exportTelemetryLog instead of relying on the ThreadLocal. The old 2-arg overload is retained (deprecated) for backward compatibility. Fixes ES-1961329. Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Telemetry export previously read the connection context from a static
ThreadLocal(DatabricksThreadContextHolder) at export time. When multiple connections are used on the same thread (e.g. a Thrift connection and a SEA connection), theThreadLocalholds whichever connection was created last, so per-statement telemetry events were tagged with the wrong connection's context — including the transport mode (driverMode), connection UUID, and other connection parameters.This caused a measurable data-quality issue: ~22% of distinct SEA statement IDs in JDBC telemetry had no corresponding server-side record, because Thrift statements executed on the same thread were mislabeled as SEA (ES-1961329).
Root cause
TelemetryHelper.exportTelemetryLog(...)resolved the connection viaDatabricksThreadContextHolder.getConnectionContext(). ThatThreadLocalis set once perDatabricksConnectionconstruction and never refreshed per statement, so the last-created connection on a thread "wins" and all subsequent telemetry on that thread inherits its context.Fix
Each
TelemetryCollectoris already per-connection (keyed byconnectionUuidinTelemetryCollectorManager). It now:IDatabricksConnectionContextat construction time.TelemetryHelper.exportTelemetryLog(...)instead of relying on the sharedThreadLocal.A new
exportTelemetryLog(connectionContext, details, logLevel)overload takes the context explicitly; the old 2-arg overload is retained (deprecated) for backward compatibility.Real connections always have a unique
connectionUuid, so each gets its own collector holding its own context. Only context-less calls share theDEFAULT_CONNECTIONcollector; those events carry a null context and are skipped at export time, so no cross-connection misattribution can occur (documented inline).Files changed
TelemetryCollector.java— storesconnectionContext; passes it to all threeexportTelemetryLogcall sites; addsgetConnectionContext()accessor.TelemetryCollectorManager.java— forwards the context into the collector constructor; documents theDEFAULT_CONNECTIONtrade-off.TelemetryHelper.java— new explicit-contextexportTelemetryLogoverload; old overload deprecated.NEXT_CHANGELOG.md.Testing
Automated unit tests (added)
TelemetryCollectorTest: collector stores its own context (testCollectorStoresConnectionContext);recordOperationLatencycalls the 3-argexportTelemetryLog.TelemetryCollectorManagerTest: each collector stores the correct per-connection context (testCollectorStoresCorrectConnectionContext); a collector's context is not affected by subsequently creating another connection's collector (testCollectorContextNotAffectedBySubsequentConnectionCreation).TelemetryHelperTest:exportTelemetryLoguses the provided context and not the ThreadLocal (testExportTelemetryLogWithExplicitContext_UsesProvidedContext); null context short-circuits without export (testExportTelemetryLogWithExplicitContext_NullContextSkipsExport).Test runs
TelemetryCollectorTest,TelemetryCollectorManagerTest,TelemetryHelperTest: 128 tests, 0 failures.Manual end-to-end reproduction (SEA + Thrift on one thread)
Created a SEA connection and a Thrift connection on the same thread, executed a query on the SEA connection, and dumped the actual
DriverConnectionParameterstelemetry payload.Before the fix — payload for the SEA statement was built from the stale ThreadLocal (which held the Thrift connection):
After the fix — the collector uses its own stored context:
Fixes ES-1961329.