Add databricks doctor command#4730
Conversation
Integration test reportCommit: 1ac3fb3
25 interesting tests: 15 SKIP, 7 RECOVERED, 3 flaky
Top 24 slowest tests (at least 2 minutes):
|
shreyas-goenka
left a comment
There was a problem hiding this comment.
Note: This review was posted by Claude (AI assistant). Shreyas will do a separate, more thorough review pass.
Priority: HIGH — Config resolution diverges from real CLI auth path
MAJOR: resolveConfig diverges from real CLI auth
The resolveConfig function in databricks doctor constructs its own config resolution path instead of going through the standard SDK/CLI authentication flow. This means the doctor command could report "config is fine" while the real CLI fails (or vice versa). If the goal is to diagnose auth issues, it should use the same code path the CLI uses.
MEDIUM: Network check bypasses SDK HTTP client
The connectivity check uses http.DefaultClient directly instead of going through the SDK's configured HTTP client. In enterprise environments with proxies or custom TLS, this will give misleading results — the check might fail even though the SDK would succeed (or vice versa).
Other Observations
- Good idea for a diagnostic command overall
- The step-by-step output format is user-friendly
- Missing test coverage for the core diagnostic logic
4f9375e to
b2e9d7a
Compare
Approval status: pending
|
Adds a new top-level 'databricks doctor' command that runs diagnostic checks against the user's CLI setup and reports a checklist of results (text) or a JSON array. Checks: - CLI version - Updates (queries the GitHub releases API, skipped for dev builds) - Toolchain (git, python3, uv, terraform versions) - Proxy/TLS environment variables (HTTPS_PROXY, NO_PROXY, etc., with credentials masked) - Log file path - Config file readability and profile count - Current profile source - Authentication validity and auth type - Identity via CurrentUser.Me (skipped for account-level profiles) - Network reachability Design follows go-code-structure.md: check functions take context and primitives, Cobra RunE is a thin adapter, rendering is a pure function, and external dependencies (exec, HTTP client) are injected for tests. SPOG / unified-host account profiles are correctly classified via the existing auth.ResolveConfigType helper, so the SDK's ConfigType() returning InvalidConfig for those hosts no longer causes a misclassification. Also fixes a latent bug in libs/env/loader.go: Set was replaced with SetS so that non-string env-backed config attributes (e.g. ints, bools) are parsed correctly. Co-authored-by: Isaac
- Reword isAccountLevelConfig godoc to "can target" so callers don't read it as "account-exclusive". - Drop the withCheckTimeout wrapper and inline context.WithTimeout at call sites; the wrapper was not adding readability. - Move config-resolution error handling out of checkAuth and into runChecks, so checkAuth is only responsible for authenticating a resolved config. Co-authored-by: Isaac
… simonfaltum/doctor-rebuild
WithProfiler was removed as dead code on main (PR #4974). Match the rest of the doctor command by injecting the profiler directly into checkConfigFile, the same way exec and http.Client are injected into the other checks.
…mmand # Conflicts: # cmd/experimental/experimental.go
Two changes landed on main that broke the doctor branch: 1. Experimental_IsUnifiedHost was removed from the SDK Config (#5047). Update the doctor tests to express unified-host detection via DiscoveryURL containing /oidc/accounts/, matching the new auth.IsSPOG / HasUnifiedHostSignal mechanism. 2. fatih/color was replaced with in-tree cmdio ANSI helpers (#5178). Migrate doctor renderText to use cmdio.Red/Green/Yellow/Cyan/Bold. Adds a Bold helper to libs/cmdio/color.go since the existing migration didn't need bold but the doctor output does. Co-authored-by: Isaac
…mmand # Conflicts: # cmd/experimental/experimental.go # libs/cmdio/color.go
Post-merge fixes after merging origin/main: - CurrentUser.Me now takes iam.MeRequest - replace errors.As with errors.AsType (forbidigo) - regenerate out.test.toml in the new inline EnvMatrix format Address review feedback on config resolution: resolveConfig now applies the same pre-resolution steps as the real CLI auth path in cmd/root (DATABRICKS_HOST query param normalization and the [__settings__].default_profile setting), with tests covering both. The network check helpers document how their transport mirrors the SDK's selection and why the full ApiClient is not used for the probe. Co-authored-by: Isaac
|
Addressed the review feedback and refreshed the PR on top of current main (merge, no rebase). Summary of where each point landed, as of 0c382dc: resolveConfig diverges from real CLI auth path (major): Network check bypasses SDK HTTP client (medium): the check no longer uses a bare Missing test coverage: the command now has unit tests for every check ( Also fixed post-merge breakage from main: |
Why
Users debugging CLI setup issues (auth failures, config problems, network issues, proxy/TLS, missing toolchain) have no single command to diagnose their environment. They must manually run separate commands to check each layer.
Changes
Before: Users had to manually check auth, config, connectivity, and toolchain separately.
Now: A new experimental command
databricks experimental doctorruns all diagnostic checks and reports results as a checklist:git,python3,uv,terraform; missing binaries are flaggednot found, tools that ran and errored are flaggedexited N/error: <msg>HTTPS_PROXY,NO_PROXY,SSL_CERT_FILE,REQUESTS_CA_BUNDLE, etc. are set; userinfo in proxy URLs is replaced with***(including token-only forms likeTOKEN@proxy)DATABRICKS_LOG_FILE)CurrentUser.Me; account-level profiles callWorkspaces.Listso invalid account creds fail instead of being skippedText output uses colored status icons. JSON output (
--output json) returns a structured array. Auth failures are reported as check results, not command errors.Command placement
The command is registered under the
experimentalsubtree, so it is accessed as:It is hidden from top-level help, following the convention used by
experimental aitools. Source lives underexperimental/doctor/cmd/.Example output
Failure mode (no profile, missing credentials):
JSON output (
--output json) emits the same checks as a structured array.Key design decisions
go-code-structure.md, check functions takecontext.Contextand primitives (profile string, fromFlag bool), not*cobra.Command. The CobraRunEis a thin adapter. Rendering is a purerender(w, results, outputType)function. External dependencies (exec for toolchain, HTTP client for updates) are injected, so tests don't shell out or hit the network.Workspaces.Listcall, matching the patternauth describeuses, so invalid account PAT/OAuth tokens fail the identity check rather than being silently skipped.maskProxyValuereplaces the full userinfo segment with***whenever present, covering bothuser:pass@hostand token-onlyTOKEN@hostURLs.exec.ErrNotFoundrenders asnot found(install it),*exec.ExitErrorrenders asexited N(it's broken), other errors render with the message.auth.ResolveConfigTypehelper rather than rolling its own so SPOG profiles are classified correctly (the SDK's ownConfigType()returnsInvalidConfigfor them in v0.127.0+).libs/env/loader.go:Set→SetSso non-string env-backed config attributes (ints, bools) parse correctly.Changelog
No
NEXT_CHANGELOG.mdentry: experimental commands are omitted from release notes until they graduate out ofexperimental. The PR template makes NEXT_CHANGELOG additions opt-in.Open item
experimentaland so no longer collides with the auto-generated command namespace.Test plan
Workspaces.List, fail on 401)acceptance/cmd/experimental/doctor/(toolchain line masked for machine-independent output)make lintfullpassesmake checkspassesmake fmtfullclean