Add multi-database profiling support for SQL Server #2536

Draft
m-abulazm wants to merge 3 commits into main from feat/profiler/mssql-multidb

m-abulazm commented Jun 26, 2026

Contributor

Changes

What does this PR do?

The mssql profiler previously assessed only a single database per run. This PR adds automatic detection between single-database and multi-database profiling based on the SQL Server edition and the configured database, and introduces a reusable variant-resolution mechanism to support it.

mssql auto-detection

  • A configured database scopes profiling to that one database (single_db).
  • A blank database lets the edition decide: Azure SQL Database (EngineEdition = 5) falls back to single_db; on-prem SQL Server and Azure SQL Managed Instance profile every accessible database (multi_db).
  • Credential configuration now accepts a blank database name (blank = all databases) for mssql.
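
The selection rules above can be sketched as a small pure function. This is a minimal illustration, not the code in this PR: the names `Variant`, `resolve_variant`, and `AZURE_SQL_DATABASE` are hypothetical, and the `engine_edition` argument is assumed to hold the integer returned by `SERVERPROPERTY('EngineEdition')`.

```python
from enum import Enum
from typing import Optional


class Variant(str, Enum):
    """Hypothetical names for the two profiler variants described above."""

    SINGLE_DB = "single_db"
    MULTI_DB = "multi_db"


# SERVERPROPERTY('EngineEdition') reports 5 for Azure SQL Database.
AZURE_SQL_DATABASE = 5


def resolve_variant(database: Optional[str], engine_edition: int) -> Variant:
    """Pick a variant from the configured database and the engine edition."""
    # A configured database always scopes profiling to that one database.
    if database and database.strip():
        return Variant.SINGLE_DB
    # Blank database on Azure SQL Database: fall back to the connected database.
    if engine_edition == AZURE_SQL_DATABASE:
        return Variant.SINGLE_DB
    # Blank database on on-prem SQL Server or Managed Instance: profile everything.
    return Variant.MULTI_DB
```

Keeping the decision free of I/O means the edition probe can be stubbed out in unit tests, for example `resolve_variant("", 8)` for a Managed Instance with a blank database.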

Multi-database extracts (mssql/multi_db/, new)

  • New pipeline config and SQL extracts that fan out across every online, accessible user database (system DBs excluded), tagging each row with a database_name column.
  • UNION-based dynamic SQL for catalog views (tables, views, columns, routines, indexed views); cursor + USE for database-scoped DMVs (db_sizes, table_sizes) that can't use three-part naming.
  • All extracts and DDL gain a DATABASE_NAME column. The original config moves to mssql/single_db/.
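
To make the fan-out shape concrete, here is a rough Python sketch of how a per-database query over a catalog view could be assembled with three-part naming. It is an assumption-laden illustration: the extracts in this PR build their dynamic SQL on the server side, and the helper names (`quote_ident`, `quote_literal`, `build_tables_query`), the choice of `INFORMATION_SCHEMA.TABLES`, and the use of `UNION ALL` are all choices made for this example.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def quote_literal(value: str) -> str:
    """Render a Unicode string literal, doubling any single quote."""
    return "N'" + value.replace("'", "''") + "'"


def build_tables_query(databases: List[str]) -> str:
    """Build one statement that lists tables from every given database.

    Each branch reads [db].INFORMATION_SCHEMA.TABLES (three-part naming)
    and tags its rows with a database_name column.
    """
    if not databases:
        raise ValueError("at least one database is required")
    selects = [
        f"SELECT {quote_literal(db)} AS database_name, "
        f"TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE "
        f"FROM {quote_ident(db)}.INFORMATION_SCHEMA.TABLES"
        for db in databases
    ]
    # Rows from different databases already differ in database_name,
    # so this sketch uses UNION ALL rather than a deduplicating UNION.
    return "\nUNION ALL\n".join(selects)
```

Database-scoped DMVs cannot be addressed this way, which is why the description above falls back to a cursor with `USE` for db_sizes and table_sizes.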

Linked issues

Resolves #2483

Functionality

  • added relevant user documentation
  • added new CLI command
  • modified existing command: databricks labs lakebridge execute-database-profiler --source-tech mssql

Tests

  • manually tested (Azure SQL)
  • added unit tests
  • added integration tests

Adds single_db and multi_db mssql profiler variants, auto-selected at execute time by probing SERVERPROPERTY('EngineEdition').
When a database is configured for mssql, the profiler scopes to just that database (single_db) on any edition; leaving it blank profiles all databases (multi_db on on-prem / Managed Instance, or the connected database on Azure SQL Database). The configure-time database prompt becomes optional for mssql (blank = all databases); legacy_synapse, which shares the configurator, still requires it.
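
A possible shape for the configure-time rule, sketched under assumptions: `normalize_database` and `BLANK_DATABASE_ALLOWED` are hypothetical names, and the real configurator may structure the prompt differently. The idea is that a blank answer maps to `None` (meaning "all databases") for mssql only, while any other source that shares the configurator still rejects it.

```python
from typing import Optional

# Assumption for this sketch: only mssql treats a blank database as "all databases".
BLANK_DATABASE_ALLOWED = {"mssql"}


def normalize_database(source_tech: str, raw_value: str) -> Optional[str]:
    """Return the trimmed database name, or None when blank is permitted."""
    value = raw_value.strip()
    if value:
        return value
    if source_tech in BLANK_DATABASE_ALLOWED:
        return None
    raise ValueError(f"A database name is required for {source_tech}")
```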
m-abulazm force-pushed the feat/profiler/mssql-multidb branch from 881335e to 21177f5 on June 26, 2026 10:06
codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 84.31373% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.27%. Comparing base (5eb38d1) to head (a13b6b7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...databricks/labs/lakebridge/assessments/variants.py | 88.09% | 2 Missing and 3 partials ⚠️ |
| ...databricks/labs/lakebridge/assessments/profiler.py | 40.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2536      +/-   ##
==========================================
+ Coverage   69.18%   69.27%   +0.08%     
==========================================
  Files         105      106       +1     
  Lines        9503     9548      +45     
  Branches     1052     1060       +8     
==========================================
+ Hits         6575     6614      +39     
- Misses       2731     2734       +3     
- Partials      197      200       +3     

github-actions Bot commented Jun 26, 2026

❌ 168/169 passed, 1 flaky, 1 failed, 2 skipped, 1h11m7s total

❌ test_recon_for_report_type_is_data: pyspark.errors.exceptions.connect.AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/tmp/42b38c5f19834f07b916a736d4e3b073. SQLSTATE: 42K03 (50.604s)
pyspark.errors.exceptions.connect.AnalysisException: [PATH_NOT_FOUND] Path does not exist: dbfs:/tmp/42b38c5f19834f07b916a736d4e3b073. SQLSTATE: 42K03

JVM stacktrace:
org.apache.spark.sql.AnalysisException
	at org.apache.spark.sql.errors.QueryCompilationErrors$.dataPathNotExistError(QueryCompilationErrors.scala:2748)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$1(DataSource.scala:1114)
	at scala.collection.immutable.List.flatMap(List.scala:294)
	at scala.collection.immutable.List.flatMap(List.scala:79)
	at org.apache.spark.sql.execution.datasources.DataSource$.checkAndGlobPathIfNecessary(DataSource.scala:1095)
	at org.apache.spark.sql.execution.datasources.DataSource.checkAndGlobPathIfNecessary(DataSource.scala:715)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:536)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.org$apache$spark$sql$catalyst$analysis$ResolveDataSource$$loadV1BatchSource(ResolveDataSource.scala:317)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.$anonfun$applyOrElse$5(ResolveDataSource.scala:124)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:123)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource$$anonfun$apply$1.applyOrElse(ResolveDataSource.scala:64)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:141)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:121)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:141)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:418)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:137)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:133)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:45)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp(AnalysisHelper.scala:114)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUp$(AnalysisHelper.scala:113)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUp(LogicalPlan.scala:45)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:64)
	at org.apache.spark.sql.catalyst.analysis.ResolveDataSource.apply(ResolveDataSource.scala:62)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$17(RuleExecutor.scala:510)
	at org.apache.spark.sql.catalyst.rules.RecoverableRuleExecutionHelper.processRule(RuleExecutor.scala:664)
	at org.apache.spark.sql.catalyst.rules.RecoverableRuleExecutionHelper.processRule$(RuleExecutor.scala:648)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.processRule(RuleExecutor.scala:144)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$16(RuleExecutor.scala:510)
	at com.databricks.spark.util.MemoryTracker$.withThreadAllocatedBytes(MemoryTracker.scala:51)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.measureRule(QueryPlanningTracker.scala:350)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$15(RuleExecutor.scala:508)
	at com.databricks.spark.util.FrameProfiler$.$anonfun$record$1(FrameProfiler.scala:114)
	at com.databricks.spark.util.FrameProfilerExporter$.maybeExportFrameProfiler(FrameProfilerExporter.scala:198)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:105)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$14(RuleExecutor.scala:507)
	at scala.collection.LinearSeqOps.foldLeft(LinearSeq.scala:183)
	at scala.collection.LinearSeqOps.foldLeft$(LinearSeq.scala:179)
	at scala.collection.immutable.List.foldLeft(List.scala:79)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$13(RuleExecutor.scala:499)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at com.databricks.spark.util.FrameProfiler$.$anonfun$record$1(FrameProfiler.scala:114)
	at com.databricks.spark.util.FrameProfilerExporter$.maybeExportFrameProfiler(FrameProfilerExporter.scala:198)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:105)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:473)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$23(RuleExecutor.scala:620)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$23$adapted(RuleExecutor.scala:620)
	at scala.collection.immutable.List.foreach(List.scala:334)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:620)
	at com.databricks.spark.util.FrameProfiler$.$anonfun$record$1(FrameProfiler.scala:114)
	at com.databricks.spark.util.FrameProfilerExporter$.maybeExportFrameProfiler(FrameProfilerExporter.scala:198)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:105)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:366)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.super$execute(Analyzer.scala:665)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeSameContext$1(Analyzer.scala:665)
	at com.databricks.sql.unity.SAMSnapshotHelper$.visitPlansDuringAnalysis(SAMSnapshotHelper.scala:43)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeSameContext(Analyzer.scala:664)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:637)
	at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:473)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:637)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:554)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:354)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:265)
	at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:354)
	at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.resolveInFixedPoint(HybridAnalyzer.scala:418)
	at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.$anonfun$apply$1(HybridAnalyzer.scala:99)
	at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.withTrackedAnalyzerBridgeState(HybridAnalyzer.scala:136)
	at org.apache.spark.sql.catalyst.analysis.resolver.HybridAnalyzer.apply(HybridAnalyzer.scala:92)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$2(Analyzer.scala:614)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:425)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:614)
	at com.databricks.sql.unity.SAMSnapshotHelper$.visitPlansDuringAnalysis(SAMSnapshotHelper.scala:43)
	at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:603)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$3(QueryExecution.scala:523)
	at com.databricks.spark.util.FrameProfiler$.$anonfun$record$1(FrameProfiler.scala:114)
	at com.databricks.spark.util.FrameProfilerExporter$.maybeExportFrameProfiler(FrameProfilerExporter.scala:198)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:105)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:721)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$8(QueryExecution.scala:1050)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withExecutionPhase$1(SQLExecution.scala:161)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:348)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:344)
	at com.databricks.util.TracingSpanUtils$.$anonfun$withTracing$4(TracingSpanUtils.scala:235)
	at com.databricks.util.TracingSpanUtils$.withTracing(TracingSpanUtils.scala:129)
	at com.databricks.util.TracingSpanUtils$.withTracing(TracingSpanUtils.scala:233)
	at com.databricks.tracing.TracingUtils$.withTracing(TracingUtils.scala:296)
	at com.databricks.spark.util.DatabricksTracingHelper.withSpan(DatabricksSparkTracingHelper.scala:112)
	at com.databricks.spark.util.DBRTracing$.withSpan(DBRTracing.scala:47)
	at org.apache.spark.sql.execution.SQLExecution$.withExecutionPhase(SQLExecution.scala:142)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$7(QueryExecution.scala:1050)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1709)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$5(QueryExecution.scala:1043)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$4(QueryExecution.scala:1040)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$3(QueryExecution.scala:1040)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:1039)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution.withQueryExecutionId(QueryExecution.scala:1027)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:1038)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:860)
	at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:1037)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$2(QueryExecution.scala:515)
	at com.databricks.sql.util.MemoryTrackerHelper.withMemoryTracking(MemoryTrackerHelper.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$lazyAnalyzed$1(QueryExecution.scala:514)
	at scala.util.Try$.apply(Try.scala:217)
	at org.apache.spark.util.Utils$.doTryWithCallerStacktrace(Utils.scala:1749)
	at org.apache.spark.util.Utils$.getTryWithCallerStacktrace(Utils.scala:1810)
	at org.apache.spark.util.LazyTry.get(LazyTry.scala:75)
	at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:564)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:488)
	at org.apache.spark.sql.classic.Dataset$.$anonfun$ofRows$1(Dataset.scala:128)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:860)
	at org.apache.spark.sql.classic.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1157)
	at com.databricks.spark.util.FrameProfiler$.$anonfun$record$1(FrameProfiler.scala:114)
	at com.databricks.spark.util.FrameProfilerExporter$.maybeExportFrameProfiler(FrameProfilerExporter.scala:198)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:105)
	at org.apache.spark.sql.classic.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1157)
	at org.apache.spark.sql.classic.Dataset$.ofRows(Dataset.scala:126)
	at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:170)
	at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:148)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformReadRel(SparkConnectPlanner.scala:2251)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.$anonfun$transformRelation$1(SparkConnectPlanner.scala:234)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$8(SessionHolder.scala:743)
	at org.apache.spark.sql.connect.service.SessionHolder.measureSubtreeRelationNodes(SessionHolder.scala:759)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$usePlanCache$6(SessionHolder.scala:742)
	at scala.Option.getOrElse(Option.scala:201)
	at org.apache.spark.sql.connect.service.SessionHolder.usePlanCache(SessionHolder.scala:740)
	at org.apache.spark.sql.connect.planner.SparkConnectPlanner.transformRelation(SparkConnectPlanner.scala:229)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.transformRelation$1(SparkConnectAnalyzeHandler.scala:103)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.transformRelationPlan$1(SparkConnectAnalyzeHandler.scala:111)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.process(SparkConnectAnalyzeHandler.scala:128)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$3(SparkConnectAnalyzeHandler.scala:86)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$3$adapted(SparkConnectAnalyzeHandler.scala:78)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$2(SessionHolder.scala:536)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:860)
	at org.apache.spark.sql.connect.service.SessionHolder.$anonfun$withSession$1(SessionHolder.scala:536)
	at org.apache.spark.JobArtifactSet$.withActiveJobArtifactState(JobArtifactSet.scala:97)
	at org.apache.spark.sql.artifact.ArtifactManager.$anonfun$withResources$1(ArtifactManager.scala:124)
	at org.apache.spark.sql.artifact.ArtifactManager.withClassLoaderIfNeeded(ArtifactManager.scala:118)
	at org.apache.spark.sql.artifact.ArtifactManager.withResources(ArtifactManager.scala:123)
	at org.apache.spark.sql.connect.service.SessionHolder.withSession(SessionHolder.scala:535)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$1(SparkConnectAnalyzeHandler.scala:78)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.$anonfun$handle$1$adapted(SparkConnectAnalyzeHandler.scala:50)
	at com.databricks.spark.connect.logging.rpc.SparkConnectRpcMetricsCollectorUtils$.collectMetrics(SparkConnectRpcMetricsCollector.scala:279)
	at org.apache.spark.sql.connect.service.SparkConnectAnalyzeHandler.handle(SparkConnectAnalyzeHandler.scala:49)
	at org.apache.spark.sql.connect.service.SparkConnectService.analyzePlan(SparkConnectService.scala:116)
	at org.apache.spark.connect.proto.SparkConnectServiceGrpc$MethodHandlers.invoke(SparkConnectServiceGrpc.java:870)
	at org.sparkproject.connect.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:182)
	at org.sparkproject.connect.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at org.sparkproject.connect.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.$anonfun$onHalfClose$1(AuthenticationInterceptor.scala:419)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
	at com.databricks.unity.HandleImpl.runWith(UCSHandle.scala:104)
	at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$4(RequestContext.scala:367)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:80)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:348)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:344)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:78)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:75)
	at com.databricks.spark.util.DatabricksTracingHelper.withAttributionContext(DatabricksSparkTracingHelper.scala:62)
	at com.databricks.spark.util.DatabricksTracingHelper.withSpanFromRequest(DatabricksSparkTracingHelper.scala:89)
	at com.databricks.spark.util.DBRTracing$.withSpanFromRequest(DBRTracing.scala:43)
	at com.databricks.spark.connect.service.RequestContext.runWithSpanFromTags(RequestContext.scala:390)
	at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$3(RequestContext.scala:367)
	at com.databricks.spark.connect.service.RequestContext$.com$databricks$spark$connect$service$RequestContext$$withLocalProperties(RequestContext.scala:602)
	at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$2(RequestContext.scala:366)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:80)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:348)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:59)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:344)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:78)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:75)
	at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:29)
	at com.databricks.spark.util.UniverseAttributionContextWrapper.withValue(AttributionContextUtils.scala:242)
	at com.databricks.spark.connect.service.RequestContext.$anonfun$runWith$1(RequestContext.scala:365)
	at com.databricks.spark.connect.service.RequestContext.withContext(RequestContext.scala:398)
	at com.databricks.spark.connect.service.RequestContext.runWith(RequestContext.scala:358)
	at com.databricks.spark.connect.service.AuthenticationInterceptor$AuthenticatedServerCallListener.onHalfClose(AuthenticationInterceptor.scala:419)
	at org.sparkproject.connect.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
	at org.sparkproject.connect.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
	at org.sparkproject.connect.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:356)
	at org.sparkproject.connect.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:861)
	at org.sparkproject.connect.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at org.sparkproject.connect.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:133)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.$anonfun$run$1(SparkThreadLocalForwardingThreadPoolExecutor.scala:165)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$6(SparkThreadLocalForwardingThreadPoolExecutor.scala:119)
	at com.databricks.sql.transaction.tahoe.mst.MSTThreadHelper$.runWithMstTxnId(MSTThreadHelper.scala:57)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$5(SparkThreadLocalForwardingThreadPoolExecutor.scala:118)
	at com.databricks.spark.util.IdentityClaim$.withClaim(IdentityClaim.scala:48)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.$anonfun$runWithCaptured$4(SparkThreadLocalForwardingThreadPoolExecutor.scala:117)
	at com.databricks.unity.UCSEphemeralState$Handle.runWith(UCSEphemeralState.scala:51)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:116)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingHelper.runWithCaptured$(SparkThreadLocalForwardingThreadPoolExecutor.scala:93)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.runWithCaptured(SparkThreadLocalForwardingThreadPoolExecutor.scala:162)
	at org.apache.spark.util.threads.SparkThreadLocalCapturingRunnable.run(SparkThreadLocalForwardingThreadPoolExecutor.scala:165)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.lang.Thread.run(Thread.java:840)
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw catalog: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added catalog fixture: CatalogInfo(browse_only=False, catalog_type=<CatalogType.MANAGED_CATALOG: 'MANAGED_CATALOG'>, comment=None, connection_name=None, created_at=1782480789782, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', effective_predictive_optimization_flag=EffectivePredictiveOptimizationFlag(value=<EnablePredictiveOptimization.DISABLE: 'DISABLE'>, inherited_from_name='primary', inherited_from_type=None), enable_predictive_optimization=<EnablePredictiveOptimization.INHERIT: 'INHERIT'>, full_name='dummy_ceop6yclw', isolation_mode=<CatalogIsolationMode.OPEN: 'OPEN'>, metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_ceop6yclw', options=None, owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', properties={'RemoveAfter': '2026062615'}, provider_name=None, provisioning_info=None, securable_type=<SecurableType.CATALOG: 'CATALOG'>, share_name=None, storage_location=None, storage_root=None, updated_at=1782480789782, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e')
13:33 INFO [tests.integration.reconcile.conftest] Created catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw.dummy_swu6rjw2d schema: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added schema fixture: SchemaInfo(browse_only=None, catalog_name='dummy_ceop6yclw', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d', metastore_id=None, name='dummy_swu6rjw2d', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
13:33 INFO [tests.integration.reconcile.conftest] Created schema dummy_swu6rjw2d in catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_swu6rjw2d volume: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added volume fixture: VolumeInfo(access_point=None, browse_only=None, catalog_name='dummy_ceop6yclw', comment=None, created_at=1782480819376, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', encryption_details=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d.dummy_swu6rjw2d', metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_swu6rjw2d', owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', schema_name='dummy_swu6rjw2d', storage_location='abfss://labs-CLOUD_ENV-TEST_CATALOG-container@databrickslabsstorage.dfs.core.windows.net/8952c1e3-b265-4adf-98c3-6f755e2e1453/volumes/9146b2c7-2b7e-4850-ba1b-fba68ce69dad', updated_at=1782480819376, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', volume_id='9146b2c7-2b7e-4850-ba1b-fba68ce69dad', volume_type=<VolumeType.MANAGED: 'MANAGED'>)
[gw1] linux -- Python 3.13.12 /home/runner/work/lakebridge/lakebridge/.venv/bin/python
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw catalog: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added catalog fixture: CatalogInfo(browse_only=False, catalog_type=<CatalogType.MANAGED_CATALOG: 'MANAGED_CATALOG'>, comment=None, connection_name=None, created_at=1782480789782, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', effective_predictive_optimization_flag=EffectivePredictiveOptimizationFlag(value=<EnablePredictiveOptimization.DISABLE: 'DISABLE'>, inherited_from_name='primary', inherited_from_type=None), enable_predictive_optimization=<EnablePredictiveOptimization.INHERIT: 'INHERIT'>, full_name='dummy_ceop6yclw', isolation_mode=<CatalogIsolationMode.OPEN: 'OPEN'>, metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_ceop6yclw', options=None, owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', properties={'RemoveAfter': '2026062615'}, provider_name=None, provisioning_info=None, securable_type=<SecurableType.CATALOG: 'CATALOG'>, share_name=None, storage_location=None, storage_root=None, updated_at=1782480789782, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e')
13:33 INFO [tests.integration.reconcile.conftest] Created catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw.dummy_swu6rjw2d schema: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added schema fixture: SchemaInfo(browse_only=None, catalog_name='dummy_ceop6yclw', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d', metastore_id=None, name='dummy_swu6rjw2d', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
13:33 INFO [tests.integration.reconcile.conftest] Created schema dummy_swu6rjw2d in catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_swu6rjw2d volume: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added volume fixture: VolumeInfo(access_point=None, browse_only=None, catalog_name='dummy_ceop6yclw', comment=None, created_at=1782480819376, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', encryption_details=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d.dummy_swu6rjw2d', metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_swu6rjw2d', owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', schema_name='dummy_swu6rjw2d', storage_location='abfss://labs-CLOUD_ENV-TEST_CATALOG-container@databrickslabsstorage.dfs.core.windows.net/8952c1e3-b265-4adf-98c3-6f755e2e1453/volumes/9146b2c7-2b7e-4850-ba1b-fba68ce69dad', updated_at=1782480819376, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', volume_id='9146b2c7-2b7e-4850-ba1b-fba68ce69dad', volume_type=<VolumeType.MANAGED: 'MANAGED'>)
13:33 INFO [databricks.labs.lakebridge.reconcile.trigger_recon_service] report_type: data, data_source: databricks 
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Running on Databricks check completed with result: False
13:33 INFO [databricks.labs.lakebridge.reconcile.query_builder.hash_query] Hash Query for source: SELECT LOWER(SHA2(TRIM(s_address) || TRIM(s_name) || COALESCE(TRIM(`s_nationkey`), '_null_recon_') || TRIM(s_phone) || COALESCE(TRIM(`s_suppkey`), '_null_recon_'), 256)) AS hash_value_recon, `s_nationkey` AS `s_nationkey`, `s_suppkey` AS `s_suppkey` FROM :tbl WHERE s_name = 't' AND s_address = 'a'
13:33 INFO [databricks.labs.lakebridge.reconcile.query_builder.hash_query] Hash Query for target: SELECT LOWER(SHA2(TRIM(s_address_t) || TRIM(s_name) || COALESCE(TRIM(`s_nationkey_t`), '_null_recon_') || TRIM(s_phone_t) || COALESCE(TRIM(`s_suppkey_t`), '_null_recon_'), 256)) AS hash_value_recon, `s_nationkey_t` AS `s_nationkey`, `s_suppkey_t` AS `s_suppkey` FROM :tbl WHERE s_name = 't' AND s_address_t = 'a'
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Writing DF on parquet to path: /tmp/42b38c5f19834f07b916a736d4e3b073
13:33 INFO [databricks.labs.lakebridge.reconcile.recon_capture] Wrote DF on parquet
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Reading DF on parquet from path: /tmp/42b38c5f19834f07b916a736d4e3b073
13:33 INFO [databricks.labs.lakebridge.reconcile.recon_capture] Read DF on parquet
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw catalog: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added catalog fixture: CatalogInfo(browse_only=False, catalog_type=<CatalogType.MANAGED_CATALOG: 'MANAGED_CATALOG'>, comment=None, connection_name=None, created_at=1782480789782, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', effective_predictive_optimization_flag=EffectivePredictiveOptimizationFlag(value=<EnablePredictiveOptimization.DISABLE: 'DISABLE'>, inherited_from_name='primary', inherited_from_type=None), enable_predictive_optimization=<EnablePredictiveOptimization.INHERIT: 'INHERIT'>, full_name='dummy_ceop6yclw', isolation_mode=<CatalogIsolationMode.OPEN: 'OPEN'>, metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_ceop6yclw', options=None, owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', properties={'RemoveAfter': '2026062615'}, provider_name=None, provisioning_info=None, securable_type=<SecurableType.CATALOG: 'CATALOG'>, share_name=None, storage_location=None, storage_root=None, updated_at=1782480789782, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e')
13:33 INFO [tests.integration.reconcile.conftest] Created catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_ceop6yclw.dummy_swu6rjw2d schema: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added schema fixture: SchemaInfo(browse_only=None, catalog_name='dummy_ceop6yclw', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d', metastore_id=None, name='dummy_swu6rjw2d', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
13:33 INFO [tests.integration.reconcile.conftest] Created schema dummy_swu6rjw2d in catalog dummy_ceop6yclw for recon tests
13:33 INFO [databricks.labs.pytester.fixtures.baseline] Created dummy_swu6rjw2d volume: https://DATABRICKS_HOST/#explore/data/dummy_ceop6yclw/dummy_swu6rjw2d/dummy_swu6rjw2d
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] added volume fixture: VolumeInfo(access_point=None, browse_only=None, catalog_name='dummy_ceop6yclw', comment=None, created_at=1782480819376, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', encryption_details=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d.dummy_swu6rjw2d', metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_swu6rjw2d', owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', schema_name='dummy_swu6rjw2d', storage_location='abfss://labs-CLOUD_ENV-TEST_CATALOG-container@databrickslabsstorage.dfs.core.windows.net/8952c1e3-b265-4adf-98c3-6f755e2e1453/volumes/9146b2c7-2b7e-4850-ba1b-fba68ce69dad', updated_at=1782480819376, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', volume_id='9146b2c7-2b7e-4850-ba1b-fba68ce69dad', volume_type=<VolumeType.MANAGED: 'MANAGED'>)
13:33 INFO [databricks.labs.lakebridge.reconcile.trigger_recon_service] report_type: data, data_source: databricks 
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Running on Databricks check completed with result: False
13:33 INFO [databricks.labs.lakebridge.reconcile.query_builder.hash_query] Hash Query for source: SELECT LOWER(SHA2(TRIM(s_address) || TRIM(s_name) || COALESCE(TRIM(`s_nationkey`), '_null_recon_') || TRIM(s_phone) || COALESCE(TRIM(`s_suppkey`), '_null_recon_'), 256)) AS hash_value_recon, `s_nationkey` AS `s_nationkey`, `s_suppkey` AS `s_suppkey` FROM :tbl WHERE s_name = 't' AND s_address = 'a'
13:33 INFO [databricks.labs.lakebridge.reconcile.query_builder.hash_query] Hash Query for target: SELECT LOWER(SHA2(TRIM(s_address_t) || TRIM(s_name) || COALESCE(TRIM(`s_nationkey_t`), '_null_recon_') || TRIM(s_phone_t) || COALESCE(TRIM(`s_suppkey_t`), '_null_recon_'), 256)) AS hash_value_recon, `s_nationkey_t` AS `s_nationkey`, `s_suppkey_t` AS `s_suppkey` FROM :tbl WHERE s_name = 't' AND s_address_t = 'a'
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Writing DF on parquet to path: /tmp/42b38c5f19834f07b916a736d4e3b073
13:33 INFO [databricks.labs.lakebridge.reconcile.recon_capture] Wrote DF on parquet
13:33 DEBUG [databricks.labs.lakebridge.reconcile.recon_capture] Reading DF on parquet from path: /tmp/42b38c5f19834f07b916a736d4e3b073
13:33 INFO [databricks.labs.lakebridge.reconcile.recon_capture] Read DF on parquet
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] clearing 1 volume fixtures
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] removing volume fixture: VolumeInfo(access_point=None, browse_only=None, catalog_name='dummy_ceop6yclw', comment=None, created_at=1782480819376, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', encryption_details=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d.dummy_swu6rjw2d', metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_swu6rjw2d', owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', schema_name='dummy_swu6rjw2d', storage_location='abfss://labs-CLOUD_ENV-TEST_CATALOG-container@databrickslabsstorage.dfs.core.windows.net/8952c1e3-b265-4adf-98c3-6f755e2e1453/volumes/9146b2c7-2b7e-4850-ba1b-fba68ce69dad', updated_at=1782480819376, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', volume_id='9146b2c7-2b7e-4850-ba1b-fba68ce69dad', volume_type=<VolumeType.MANAGED: 'MANAGED'>)
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] clearing 1 schema fixtures
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] removing schema fixture: SchemaInfo(browse_only=None, catalog_name='dummy_ceop6yclw', catalog_type=None, comment=None, created_at=None, created_by=None, effective_predictive_optimization_flag=None, enable_predictive_optimization=None, full_name='dummy_ceop6yclw.dummy_swu6rjw2d', metastore_id=None, name='dummy_swu6rjw2d', owner=None, properties=None, schema_id=None, storage_location=None, storage_root=None, updated_at=None, updated_by=None)
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] clearing 1 catalog fixtures
13:33 DEBUG [databricks.labs.pytester.fixtures.baseline] removing catalog fixture: CatalogInfo(browse_only=False, catalog_type=<CatalogType.MANAGED_CATALOG: 'MANAGED_CATALOG'>, comment=None, connection_name=None, created_at=1782480789782, created_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', effective_predictive_optimization_flag=EffectivePredictiveOptimizationFlag(value=<EnablePredictiveOptimization.DISABLE: 'DISABLE'>, inherited_from_name='primary', inherited_from_type=None), enable_predictive_optimization=<EnablePredictiveOptimization.INHERIT: 'INHERIT'>, full_name='dummy_ceop6yclw', isolation_mode=<CatalogIsolationMode.OPEN: 'OPEN'>, metastore_id='8952c1e3-b265-4adf-98c3-6f755e2e1453', name='dummy_ceop6yclw', options=None, owner='3fe685a1-96cc-4fec-8cdb-6944f5c9787e', properties={'RemoveAfter': '2026062615'}, provider_name=None, provisioning_info=None, securable_type=<SecurableType.CATALOG: 'CATALOG'>, share_name=None, storage_location=None, storage_root=None, updated_at=1782480789782, updated_by='3fe685a1-96cc-4fec-8cdb-6944f5c9787e')
[gw1] linux -- Python 3.13.12 /home/runner/work/lakebridge/lakebridge/.venv/bin/python

Flaky tests:

  • 🤪 test_recon_redshift_job_succeeds (1m57.927s)

Running from acceptance #4944

@m-abulazm m-abulazm force-pushed the feat/profiler/mssql-multidb branch from d8575e6 to a13b6b7 Compare June 26, 2026 13:29
@m-abulazm m-abulazm changed the title Feat/profiler/mssql multidb Add multi-database profiling support for SQL Server Jun 26, 2026
@m-abulazm m-abulazm self-assigned this Jun 26, 2026

@dgomez04 dgomez04 left a comment


Let's align on the engine-variant vs. database-scope model (See variants.py).

"database": self.prompts.question("Enter the database name"),
# mssql: blank profiles every accessible database (on-prem / Managed Instance); a name scopes
# to that one database. legacy_synapse (shares this configurator) needs the dedicated-pool name.
"database": (


This skips a common middle case: customers want to scope the migration to a subset of databases on a shared instance. A comma-separated allowlist here would be a natural fit.

We can either address the changes in this pull request or open an issue and follow-up.
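To make the allowlist idea concrete, here is a minimal sketch of how the prompt value could be parsed. The function name `parse_database_scope` and the convention that `None` means "all databases" are assumptions for illustration, not part of this PR.

```python
from __future__ import annotations


def parse_database_scope(raw: str | None) -> list[str] | None:
    """Parse a comma-separated allowlist of database names.

    Hypothetical helper: returns None when the input is blank (meaning
    "profile every accessible database"), otherwise the ordered,
    de-duplicated list of names.
    """
    if raw is None or not raw.strip():
        return None  # blank input keeps the current "all databases" behaviour
    names: dict[str, None] = {}  # dict preserves insertion order and drops repeats
    for part in raw.split(","):
        name = part.strip()
        if name:
            names.setdefault(name, None)
    # Input made only of separators (e.g. ",,") is treated like blank input.
    return list(names) or None
```

A single name still yields a one-element list, so the existing "one configured database" case would be a special case of the same input rather than a separate code path.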


Stepping back on the architecture before we land anything. I don't think single_db vs multi_db is the right axis.

Comparing the two pipeline configurations, they are identical except for 7 of the ~13 extracts, and those 7 differ only in database scope.

What EngineEdition actually tells us is the engine variant - each variant differs in system-table surface as we had discussed. That's the thing that should select a query pool / pipeline configuration.

The current == 5 ? single_db : multi_db check both ignores this axis and mis-fires on it.

Synapse (6), SQL Edge (9), serverless (11), and Fabric (12) all fall into the else and would run the on-prem-style sys.databases + USE + three-part-INFORMATION_SCHEMA SQL, which isn't valid there.

The shape I'd propose:

  • **EngineEdition maps to engine variant, which selects a query pool.** Pick the dialect-appropriate set of extracts, with a safe default for unknown editions instead of assuming on-prem multi database.
  • Database scope is a user input (all or comma-separated list), independent of the engine.
  • Engine capability decides how a multi-database scope executes: cross-database dynamic SQL on on-prem/MI; per-connection looping (or "scope the connected DB only") on Azure SQL Database, which can't do cross-DB from one connection.
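
As a rough sketch of the first and third bullets, the edition-to-variant lookup could look something like the following. The `EngineVariant` names and both function names are hypothetical; only editions 1-4 (on-prem), 5 (Azure SQL Database), and 8 (Managed Instance) are mapped here, and every other value deliberately falls through to a conservative default.

```python
from enum import Enum


class EngineVariant(Enum):
    """Hypothetical variant names, for illustration only."""

    ON_PREM = "on_prem"
    MANAGED_INSTANCE = "managed_instance"
    AZURE_SQL_DB = "azure_sql_db"
    UNKNOWN = "unknown"


# Assumption: only these editions are mapped; anything else is UNKNOWN.
_EDITION_TO_VARIANT = {
    1: EngineVariant.ON_PREM,
    2: EngineVariant.ON_PREM,
    3: EngineVariant.ON_PREM,
    4: EngineVariant.ON_PREM,
    5: EngineVariant.AZURE_SQL_DB,
    8: EngineVariant.MANAGED_INSTANCE,
}


def resolve_engine_variant(engine_edition: int) -> EngineVariant:
    """Map SERVERPROPERTY('EngineEdition') to a variant, defaulting to UNKNOWN."""
    return _EDITION_TO_VARIANT.get(engine_edition, EngineVariant.UNKNOWN)


def supports_cross_database_sql(variant: EngineVariant) -> bool:
    """Only on-prem and Managed Instance can fan out from a single connection."""
    return variant in {EngineVariant.ON_PREM, EngineVariant.MANAGED_INSTANCE}
```

With this shape, an unrecognised edition never reaches the cross-database SQL: it resolves to `UNKNOWN`, and the caller can fall back to scoping the connected database only.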

Note: scoping a single query pool by all/subset needs a way to inject WHERE [name] IN (...) into the SQL, which the type: sql pipeline can't do today, so this likely comes with adding parameter substitution to the pipeline.
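
One possible shape for that substitution, sketched under the assumption that the pipeline would pass bound parameters through to the driver rather than splice names into the SQL text. `build_database_filter` is a hypothetical helper; the `?` placeholders follow the qmark style that pyodbc uses.

```python
from __future__ import annotations


def build_database_filter(databases: list[str] | None) -> tuple[str, list[str]]:
    """Return a SQL predicate fragment and its bound parameters.

    Hypothetical helper: an empty or None scope means "all databases",
    so no predicate is produced.
    """
    if not databases:
        return "", []
    # One "?" per name; the names travel as parameters, never as SQL text.
    placeholders = ", ".join("?" for _ in databases)
    return f"[name] IN ({placeholders})", list(databases)
```

The caller would append the fragment to the existing `sys.databases` filter with `AND` when it is non-empty and hand the parameter list to the query call.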

return None

if AUTO in spec:
if variant != AUTO:


Every normal mssql run reaches here with variant=None, so None != AUTO is true and we emit the warning on each run. The user never passed a variant, so "ignoring" one is confusing.

Suggest guarding the warning: if variant and variant != AUTO: so it only fires when an explicit choice is discarded.
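
A small sketch of the guard, isolated so the behaviour is easy to see. The value of `AUTO` and the helper name `should_warn_ignored_variant` are assumptions for illustration; the real constant lives in the PR's variant-resolution code.

```python
from __future__ import annotations

AUTO = "auto"  # assumption: stand-in for the PR's AUTO sentinel


def should_warn_ignored_variant(variant: str | None) -> bool:
    """True only when the user explicitly chose a variant that is being discarded."""
    # None (or empty) means no variant was passed, so there is nothing to "ignore".
    return bool(variant) and variant != AUTO
```

With this guard, a default run (`variant=None`) stays silent, while an explicit choice such as `"single_db"` still produces the warning.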

with DatabaseManager("mssql", connect_config) as db_manager:
# SERVERPROPERTY returns sql_variant, which pyodbc cannot fetch (ODBC type -16); CAST to int.
result = db_manager.fetch("SELECT CAST(SERVERPROPERTY('EngineEdition') AS INT) AS engine_edition")
engine_edition = int(result.rows[0][0])


Optional improvement: int(result.rows[0][0]) will raise IndexError if the probe ever returns no rows. Low risk for SERVERPROPERTY, but a guarded failure ("could not determine SQL Server edition; defaulting to ...") reads better than a raw IndexError.
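
For illustration, the guarded version could be factored roughly like this. `parse_engine_edition` and its `default` parameter are hypothetical; `rows` stands in for `result.rows` from the snippet above.

```python
from __future__ import annotations

import logging

logger = logging.getLogger(__name__)


def parse_engine_edition(rows, default: int | None = None) -> int | None:
    """Extract the EngineEdition value from probe rows, falling back to `default`.

    Hypothetical helper: covers an empty result set (IndexError), a NULL or
    missing value (TypeError), and a non-numeric value (ValueError).
    """
    try:
        return int(rows[0][0])
    except (IndexError, TypeError, ValueError):
        logger.warning("Could not determine SQL Server edition; defaulting to %s", default)
        return default
```

Which default is appropriate is a separate decision; returning `None` lets the caller choose the conservative variant explicitly.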

"""
cred_manager = create_credential_manager("mssql", EnvGetter(), creds_path=cred_file_path)
connect_config = cred_manager.get_credentials("mssql")
if connect_config.get("database"):


Blank database on Azure SQL Database (EngineEdition == 5) returns single_db, after which MSSQLConnector connects with database=None (the login's default, often master). Is profiling the default/master database the intended result, or should blank+Azure-SQL-DB require a database name?


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MSSQL profiler: a single profiler run can only profile one user database

2 participants