[telemetry] Detect Python package manager(s) at project setup #1918
Open
rugpanov wants to merge 1 commit into
Why: We need first-party data on which Python package manager(s) our users' projects actually use (pip/conda/uv/poetry) to prioritize VPEX setup-flow investment, replacing public-survey estimates. Measurement only -- no setup behavior changes.

What:
- Add packageManagerDetection.ts: a pure, signal-based classifier that reports all applicable managers plus a best-guess primary (uv > poetry > conda > pip), the firing signals, hasLockfile, and interpreter source. Treats bare uv/poetry on PATH as weak signals.
- Add Events.PYTHON_ENV_SETUP_DETECTED with a typed, documented schema in telemetry/constants.ts (reuses existing Telemetry client; opt-out honored; categorical data only, no paths/package/cluster names).
- Add telemetry/packageManagerExtensions.ts: the emit half, layered onto the Telemetry class via the commandExtensions declare-module pattern (recordPackageManagerDetection). Keeps disk/Python-extension deps out of the Telemetry client.
- Add PackageManagerTelemetry.ts: the collection half -- a best-effort, non-blocking collector (disk + already-resolved interpreter metadata) that gathers signals, runs the pure classifier, and calls the emit method. Deduplicated per session on (trigger, projectRoot); failures degrade to unknown and are swallowed.
- Wire emission into three touchpoints: project-open env check (auto_open), the set-up-environment command (explicit_command), and first Run/Debug with Databricks Connect (run/debug).
- Add unit tests for the detector and pure helpers, and a dashboard-owner handoff note.

Detection correctness:
- interpreterSource is derived from the active interpreter alone, never from project files: a uv.lock project on a conda/venv/system interpreter reports that interpreter's real source, keeping the setup-flow gap visible. A genuinely uv-provisioned venv is identified by the `uv =` marker in pyvenv.cfg (pure pyvenvCfgMarksUv), not by uv.lock.
- conda is attributed only when the active interpreter resides under CONDA_PREFIX (pure interpreterUnderCondaPrefix, with a path-boundary check), not on the bare env var, which is session-global in the extension host (launching from an activated conda shell) and would otherwise over-count conda for uv/poetry/pip projects.
- pyproject [tool.uv]/[tool.poetry] detection uses a pure, bounded table-header scan (pyprojectHasToolSection) instead of substring matching: ignores comments and in-value mentions, rejects prefix collisions (e.g. tool.uvicorn), and matches subtable and array-of-table headers (e.g. [tool.uv.sources], [[tool.poetry.source]]) that the substring check missed.
- No external executable is run for telemetry: the uv-on-PATH probe was removed (it spawned a PATH-resolved `uv` for a weak, non-attributing signal); detection now only reads disk and already-resolved interpreter metadata.

Verification:
- yarn run build (typecheck) passes.
- eslint clean; prettier formatted.
- yarn run test:unit: 228 passing, 0 failing (includes detector + helper tests).

Co-authored-by: Isaac
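To make two of the detection-correctness points in the commit message above concrete, here is a minimal sketch of what a bounded table-header scan and a path-boundary check could look like. The function names match the helpers named in this PR, but the signatures, the `maxLines` bound, and the bodies are assumptions made for illustration; they are not the PR's actual code. The sketch does not handle quoted TOML keys, multi-line strings, or case-insensitive file systems.

```typescript
// Hypothetical sketch only: names mirror the helpers mentioned in the PR,
// but the signatures and logic here are illustrative assumptions.

/**
 * Returns true if `text` contains a TOML table header for [tool.<name>],
 * one of its subtables, or an array-of-tables under it.
 * Only the first `maxLines` lines are scanned (assumed bound).
 */
function pyprojectHasToolSection(text: string, name: string, maxLines = 2000): boolean {
    const wanted = `tool.${name}`;
    const lines = text.split(/\r?\n/).slice(0, maxLines);
    for (const raw of lines) {
        // A header is a line that starts with "[" or "[[" and ends with "]" or "]]",
        // optionally followed by a comment. Comment lines and key/value lines never match.
        const match = /^\[{1,2}\s*([^\[\]#]+?)\s*\]{1,2}\s*(#.*)?$/.exec(raw.trim());
        if (!match) {
            continue;
        }
        const key = match[1].replace(/\s+/g, "");
        // Exact table or a dotted subtable; "tool.uvicorn" does not match "tool.uv".
        if (key === wanted || key.startsWith(wanted + ".")) {
            return true;
        }
    }
    return false;
}

/**
 * Returns true only if `interpreterPath` lies strictly inside `condaPrefix`,
 * comparing on a path-separator boundary rather than a raw string prefix.
 */
function interpreterUnderCondaPrefix(interpreterPath: string, condaPrefix: string | undefined): boolean {
    if (!condaPrefix) {
        return false;
    }
    // Normalize separators and drop trailing slashes so both inputs compare alike.
    const normalize = (p: string): string => p.replace(/\\/g, "/").replace(/\/+$/, "");
    const prefix = normalize(condaPrefix);
    if (prefix === "") {
        return false;
    }
    return normalize(interpreterPath).startsWith(prefix + "/");
}
```

Under these assumptions, a `CONDA_PREFIX` of `/opt/conda/envs/dev` attributes `/opt/conda/envs/dev/bin/python` to conda but not `/opt/conda/envs/dev2/bin/python`, and a `pyproject.toml` that only mentions `[tool.uvicorn]` or has `[tool.uv]` inside a comment does not count as a uv project.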
c236f29 to be9f174
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below:

Trigger:
Inputs:

Checks will be approved automatically on success.
Changes
Measurement-only telemetry to learn which Python package manager(s) our users' projects actually use (pip / conda / uv / poetry), so the VPEX setup-flow investment can be prioritized from first-party data instead of public-survey estimates. No setup behavior changes — this is detection only.
The work splits cleanly into three layers so each is independently testable and the dependency direction stays correct (high-level → low-level):

- Pure classifier (`packageManagerDetection.ts`): given a set of already-collected signals, reports every applicable manager, a best-guess primary (priority `uv > poetry > conda > pip`), the firing signals, `hasLockfile`, and interpreter source. Side-effect free and total.
- Emit half (`telemetry/packageManagerExtensions.ts`): adds `recordPackageManagerDetection` to the existing `Telemetry` class via the same `declare module` pattern as `commandExtensions.ts`. Keeps disk/Python-extension dependencies out of the telemetry client.
- Collection half (`PackageManagerTelemetry.ts`): a best-effort, non-blocking collector that reads disk and already-resolved interpreter metadata, runs the pure classifier, and calls the emit method. Deduplicated per session on `(trigger, projectRoot)`; any failure degrades to `unknown` and is swallowed so it never disrupts setup.

Emission is wired into three setup touchpoints: project-open environment check (`auto_open`), the set-up-environment command (`explicit_command`), and first Run/Debug with Databricks Connect (`run`/`debug`).

A new `Events.PYTHON_ENV_SETUP_DETECTED` event carries a typed, documented schema (reuses the existing telemetry transport; opt-out honored; categorical data only, with no paths, package names, or cluster names). A handoff note for the analytics/dashboard owner is included at `src/telemetry/PACKAGE_MANAGER_DETECTION.md`.

Detection correctness (the parts most worth reviewing):

- `interpreterSource` is derived from the active interpreter alone, never from project files. A `uv.lock` project running a conda/venv/system interpreter reports that interpreter's real source, keeping the "uv project, interpreter not uv-managed yet" setup-flow gap visible. A genuinely uv-provisioned venv is identified by the `uv =` marker in `pyvenv.cfg`, not by `uv.lock`.
- conda is attributed only when the active interpreter resides under `CONDA_PREFIX` (path-boundary checked), not on the bare env var, which is session-global in the extension host (launching VS Code from an activated conda shell) and would otherwise over-count conda for uv/poetry/pip projects.
- `pyproject` `[tool.uv]`/`[tool.poetry]` detection uses a bounded table-header scan, not substring matching: ignores comments and in-value mentions, rejects prefix collisions (e.g. `tool.uvicorn`), and matches subtable and array-of-table headers (`[tool.uv.sources]`, `[[tool.poetry.source]]`).
- No external executable is run for telemetry: the uv-on-PATH probe was removed (it spawned a PATH-resolved `uv` for a weak, non-attributing signal). Detection reads only disk and already-resolved interpreter metadata.

Scope / privacy: measurement only, with no changes to setup behavior (the VPEX flows are a separate effort). Only enum/categorical data and a closed set of signal identifiers are emitted; the existing telemetry opt-out (`telemetry.telemetryLevel`) is respected by the transport.

Tests

- `yarn run test:unit`: 202 passing, 0 failing. Includes the pure classifier (each manager, interpreter sources, overlaps like uv+pip / conda+pip / poetry+uv, weak signals, none) and pure helpers (`pyprojectHasToolSection`, `pyvenvCfgMarksUv`, `interpreterUnderCondaPrefix`), covering the conda-prefix boundary and shell-global false-positive cases.
- `yarn run build` (typecheck) passes.
- `eslint` clean; `prettier` formatted.

Reviewer can validate with: