Open source the internal Fabric AutoML fork #1545

Open

thinkall wants to merge 28 commits into main from lijiang/open-source-internal-merge

Conversation

thinkall (Collaborator) commented May 9, 2026

Summary

This PR brings the internal Microsoft Fabric AutoML fork (msdata/A365/FLAML-Internal) into the open-source microsoft/FLAML repository, fully open-sourcing the work that has been maintained internally for the Fabric AutoML offering.

The change is delivered as a single squash commit so that internal Azure DevOps PR history (with internal PR numbers, GitOps bot signatures, etc.) is not exposed in the public repo. Per-PR provenance is preserved in the internal repo. A few small follow-up commits scope the PR to source/test code only (no docs site changes, no AzDO operational artifacts).

What's added (from the internal fork)

  • flaml/fabric/ — autofe, lowcode, mlflow, telemetry, visualization, and the Optuna-backed fANOVA evaluator (replacing the previous Cython implementation, internal PR 2045210)
  • flaml/visualization/ — visualization helpers
  • flaml/automl/ — coverage-driven improvements (utils.py, etc., internal PR 1970441)
  • benchmark/pmlb/ — PMLB benchmarking notebooks/results
  • notebook/trident/ — Fabric demo and test notebooks
  • test/automl/test_*_coverage.py, test/fabric/, test/test_misc_coverage.py, test/tune/test_tune_coverage.py — expanded coverage suites

What's preserved from ms/main

  • pyproject.toml PEP 621 migration (#1531, #1538) — setup.py is now a minimal stub
  • Python 3.13 classifier and editable-install fix
  • pandas 3.0 / sklearn 1.7 / catboost compatibility fixes
  • OpenML test fallbacks using make_classification (#1534/#1537)
  • Recent website dependency bumps (#1521-#1543)

Conflict resolutions

  • flaml/version.py — keep ms/main 2.6.0; add internal conda-version comment
  • setup.py — keep ms/main minimal stub (pyproject.toml is now authoritative)
  • pyproject.toml — port internal-only autofe, fabric_python, and full synapse extras into [project.optional-dependencies]
  • test/automl/test_constraints.py, test_score.py, test_split.py, test_xgboost2d.py — keep ms/main make_classification fallbacks (more robust and consistent)
  • website/yarn.lock — keep ms/main version (deleted on internal)

Files intentionally NOT brought over

These are internal-only operational artifacts that are broken or meaningless on GitHub:

  • .pipelines/, azurepipelines-coverage.yml — Azure DevOps pipeline definitions and coverage config (OneBranch images, internal pools and feeds; OSS CI continues to be GitHub Actions)
  • conda-build/ — Fabric conda packaging metadata; recipes reference internal blob storage and are part of the internal release pipeline
  • lowcode/handlebars/ — internal low-code notebook generation templates and mock data used by the Fabric low-code AutoML UI; not consumed by the OSS flaml package
  • HowToMergeGithub.md — internal-to-OSS sync runbook (no longer needed once the fork is fully open-sourced)
  • HowToTestFLAML4Fabric.md — Fabric-specific manual test instructions for the internal CI / Fabric runtime
  • website/.npmrc — pinned to internal Azure DevOps NPM feed; would break public website builds
  • .azuredevops/policies/approvercountpolicy.yml — Azure DevOps PR policy for the internal repo
  • es-metadata.yml — internal Engineering System routing metadata
  • owners.txt — internal Azure DevOps owners file
  • All website/docs/ additions and modifications from the internal branch — reverted to keep the docs site untouched in this PR; a separate, focused docs PR can re-introduce any internal docs that are still relevant

Other notes

  • Added a .pre-commit-config.yaml exclude for notebook/trident/featurization.ipynb (3.7 MB demo notebook with embedded outputs) so the existing check-added-large-files hook still guards future contributions.
  • pre-commit run --all-files passes locally on the resulting tree.
  • The synthesized 3-way merge used the last logical sync point (OSS commit 158ff7d99 ≈ internal commit 8a2e7b376, "Merge github till 158ff7d") as the merge base, since the two branches share no git ancestor. A post-merge file-by-file audit confirmed that no internal customizations were silently lost: of the 10 files modified both internally before the sync point and in OSS after it, 7 surfaced as conflicts (resolved manually) and the other 3 (test_extra_models.py, test_forecast.py, test_regression.py) auto-merged on disjoint lines and were verified line by line.

cc @thinkall — let me know how you want to handle CI: many internal-only test files (test/automl/test_*_coverage.py, test/fabric/test_telemetry.py, etc.) depend on Fabric / synapse.ml libraries that aren't present on GitHub Actions runners and may need pytest skip markers in a follow-up.
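
For the follow-up mentioned above, a dependency-gated skip could look roughly like this (a minimal sketch using standard pytest idioms; the module name checked below is illustrative, not the final marker layout):

    import importlib.util

    import pytest

    # Skip every test in this module unless synapse.ml is importable.
    pytestmark = pytest.mark.skipif(
        importlib.util.find_spec("synapse") is None,
        reason="requires synapse.ml, which is absent on GitHub Actions runners",
    )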

thinkall and others added 13 commits May 9, 2026 13:23
Squash-merge the internal FLAML-Internal main branch into the open-source
ms/main branch so the public repository contains the full Fabric AutoML
feature set developed internally. Internal commit history is collapsed
into a single commit; per-PR provenance is preserved in the internal
Azure DevOps repo.

What this brings from the internal branch:
- flaml/fabric/ — autofe, lowcode, mlflow, telemetry, visualization, and
  the Optuna-backed fANOVA evaluator (replacing the previous Cython
  implementation, internal PR 2045210)
- flaml/visualization/ — visualization helpers
- flaml/automl/ — coverage-driven improvements (utils.py and friends,
  internal PR 1970441)
- conda-build/ — Fabric conda packaging metadata
- .pipelines/, azurepipelines-coverage.yml — Azure DevOps build config
  (informational; OSS CI continues to be GitHub Actions)
- benchmark/pmlb/ — PMLB benchmarking notebooks/results
- lowcode/handlebars/ — low-code notebook generation templates and mocks
- notebook/trident/ — Fabric demo and test notebooks
- test/automl/test_*_coverage.py, test/fabric/, test/test_misc_coverage.py,
  test/tune/test_tune_coverage.py — expanded coverage suites
- HowToMergeGithub.md, HowToTestFLAML4Fabric.md — internal-process docs
  retained for historical reference

What is preserved from ms/main:
- pyproject.toml PEP 621 migration (#1531, #1538) — setup.py is now a
  minimal stub
- Python 3.13 classifier and editable-install fix
- pandas 3.0 / sklearn 1.7 / catboost compatibility fixes
- OpenML test fallbacks using make_classification (#1534/#1537)
- Recent website dependency bumps (#1521-#1543)

Conflict resolutions:
- flaml/version.py: keep ms/main 2.6.0 plus internal conda-version comment
- setup.py: keep ms/main minimal stub (pyproject.toml is now authoritative);
  port the additional 'autofe', 'fabric_python', and full 'synapse' extras
  from the internal setup.py into pyproject.toml
- test/automl/test_constraints.py, test_score.py, test_split.py, test_xgboost2d.py:
  keep ms/main make_classification fallbacks (more robust and consistent)
- website/yarn.lock: keep ms/main version (deleted on internal)

Files intentionally NOT brought over (internal-only operational artifacts
that are broken or meaningless on GitHub):
- website/.npmrc — pinned to internal Azure DevOps NPM feed; would break
  public website builds
- .azuredevops/policies/approvercountpolicy.yml — Azure DevOps PR policy
  for the internal repo
- es-metadata.yml — internal Engineering System routing metadata
- owners.txt — internal Azure DevOps owners file

A .pre-commit-config.yaml exclusion was added for
notebook/trident/featurization.ipynb (3.7 MB demo notebook with embedded
outputs) so the existing check-added-large-files hook still guards future
contributions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These pipeline YAML files target the internal OneBranch images, pools,
and feeds and have no use in the public repository, where CI is run
through GitHub Actions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- conda-build/ — Fabric conda packaging metadata; the conda recipes
  reference internal blob storage and are part of the internal release
  pipeline, not generally useful in the public repo.
- lowcode/handlebars/ — internal low-code notebook generation templates
  and mock data used by the Fabric low-code AutoML UI; not consumed by
  the OSS flaml package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This Azure ML pipeline tuning example existed on ms/main but was absent
from the internal branch, so the squash merge unintentionally deleted
it. Restore it verbatim from ms/main since it's a public example users
may rely on.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove three more internal-only files that have no place in the public
repository:
- azurepipelines-coverage.yml — Azure Pipelines coverage configuration
- HowToMergeGithub.md — internal-to-OSS sync runbook (no longer
  needed once the fork is fully open-sourced)
- HowToTestFLAML4Fabric.md — Fabric-specific manual test instructions
  for the internal CI / Fabric runtime

Also revert all website/docs/ additions and modifications brought in
by the internal merge so the public docs site is unchanged by this PR.
A separate, focused docs PR can introduce any of the internal docs
content that is still relevant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The fanova/ adapter has been migrated to wrap Optuna's pure-Python
FanovaImportanceEvaluator (see flaml/fabric/fanova/evaluator.py and
the README in the same folder, which explicitly states 'No local
Cython extension or build_ext step is required.'). The legacy
fanova.pyx file is no longer compiled or imported by any code path,
so drop it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
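
For context, the Optuna evaluator the adapter now wraps is exercised roughly like this (a minimal sketch against Optuna's public API; the toy objective is illustrative, not FLAML's adapter code):

    import optuna
    from optuna.importance import FanovaImportanceEvaluator

    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        y = trial.suggest_int("y", 0, 5)
        return (x - 2) ** 2 + y

    study = optuna.create_study()
    study.optimize(objective, n_trials=50)

    # fANOVA importances from the pure-Python evaluator -- no Cython
    # extension or build_ext step involved.
    print(optuna.importance.get_param_importances(
        study, evaluator=FanovaImportanceEvaluator()
    ))
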
The newly merged test/spark/test_internal_mlflow.py does an
unconditional 'import pyspark' at module level and depends on the
internal MLflow tracking server, so it cannot be collected on the
GitHub Actions matrix entries that do not install pyspark
(all Python 3.10 jobs and all Windows jobs) and would not work even
where pyspark is present. The internal .pipelines/build.yml already
adds the same --ignore for this file in both its 'spark' and
'notspark' test variants.

Fixes the 'collected 838 items / 1 error' failure on PR #1545.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the structure of the internal Azure DevOps pipeline
(.pipelines/build.yml in the internal fork) in the public GitHub
Actions workflow:

* New matrix axis 'test-type' with two values, 'notspark' and 'spark',
  each with its own --ignore list and pytest -m filter:
  - notspark: --ignore=test/autogen --ignore=test/spark -m 'not spark'
  - spark:    --ignore=test/autogen
              --ignore=test/spark/test_internal_mlflow.py
              -m 'spark'
  This keeps test/spark/test_internal_mlflow.py (which depends on the
  internal MLflow tracking server and unconditionally imports pyspark
  at module level) from breaking collection in either variant.

* The 'spark' variant only runs where pyspark is installed by the
  workflow today: ubuntu-latest with Python 3.11 / 3.12 / 3.13. It is
  excluded for windows-latest and for Python 3.10.

* All Linux test jobs now run under 'coverage run' (not just the 3.11
  job). Each Linux job combines its parallel-mode coverage shards into
  one .coverage file and uploads it as a uniquely named artifact. A new
  'coverage' aggregator job downloads every per-job artifact, runs
  'coverage combine' across them, generates a single coverage.xml and
  uploads that one combined report to Codecov. This replaces the
  previous per-3.11-job Codecov upload.

* The 'Save dependencies' step is now gated to a single matrix entry
  (ubuntu / 3.11 / notspark) on push to main so that parallel jobs do
  not race on the unit-tests-installed-dependencies branch.

Coverage is intentionally not collected on Windows runners to avoid
Linux/Windows path mismatches when combining .coverage data files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three test modules each rebuilt the same SynapseML/MLflow-Spark Spark
session from scratch with identical configuration:

  test/spark/test_0sparkml.py
  test/spark/test_internal_mlflow.py (in _init_spark_for_main)
  test/automl/test_extra_models.py

Move the SparkSession.builder configuration, log_model_allowlist
override, and disable/restore_spark_ansi_mode + atexit cleanup into
a single test-only helper at test/spark/_init_spark.py exposing:

  init_spark_session(app_name, master)        - build/fetch the session
  setup_spark_for_tests(app_name, master)     - returns (spark, skip_spark)
                                                with platform/import gating
                                                and ANSI mode handling

Each test module now calls the helper instead of duplicating the
Maven-coordinates / config block. No behavioural change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The first OSS CI run on this PR uncovered 4 real test failures:

  test/automl/test_ts_coverage.py::test_prettify_no_test_ndarray_raises
  test/automl/test_ts_coverage.py::test_prettify_no_test_series_raises
    OSS PR #1536 ("Generate timestamps for time series predictions
    without test data") changed prettify_prediction to auto-generate
    timestamps via create_forward_frame instead of raising
    ValueError / NotImplementedError when test_data is None.
    Update the two internal tests to assert the new graceful behaviour
    (a DataFrame with the time column populated) instead of expecting
    an exception that no longer fires.

  test/nlp/test_hf_utils_coverage.py::test_summarization_with_y_true
  test/nlp/test_hf_utils_coverage.py::test_summarization_without_y_true
    Both fail with 'Resource punkt_tab not found' because nltk data
    is not pre-downloaded on the public GitHub Actions runners.
    The internal Azure DevOps pipeline (.pipelines/build.yml) explicitly
    ignores test/nlp for the same reason -- mirror that behaviour by
    adding --ignore=test/nlp to both notspark and spark CI invocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
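
An alternative to ignoring test/nlp would be pre-fetching the missing resource in CI (a sketch of the standard nltk remedy; this PR instead mirrors the internal --ignore behaviour):

    import nltk

    # Download the tokenizer data the summarization tests need; a no-op
    # if the resource is already present.
    nltk.download("punkt_tab")
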
scikit-learn 1.8 (pulled in by default on Python 3.11+ in the GitHub
Actions runner image) tightened check_is_fitted via the new estimator
tags system. FLAML's autofe Pipeline (DataTransformer + Featurization)
trips this validation and three tests in test/fabric/test_autofe.py
fail with NotFittedError on py3.11 / py3.13 (py3.10 still resolves to
sklearn 1.7.2 and passes):

  test_numpy_autofe
  test_autofe
  test_autofe_force

The internal Azure DevOps pipeline (.pipelines/build.yml) pins
scikit-learn=1.5.2 via conda for the same reason. Mirror that intent
by capping below 1.8 in the GitHub Actions workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Windows notspark jobs were failing because the pandas<3 and scikit-learn<1.8
pins were gated to ubuntu-latest only. Windows ended up with pandas 3.0.2 and
scikit-learn 1.8.0, which produced 8 distinct failures:

  * test/fabric/test_autofe.py (3 tests)
      sklearn.exceptions.NotFittedError on autofe Pipeline -- sklearn 1.8
      tightened check_is_fitted via the new estimator-tags system.

  * test/automl/test_data_coverage.py TestAddTimeIdxCol (2 tests)
      AttributeError: 'Series' object has no attribute 'view' -- pandas 3.0
      removed the deprecated Series.view API.

  * test/automl/test_ts_coverage.py TestDataTransformerTS (2 tests)
      TypeError: Invalid value for dtype 'str' -- pandas 3.0 stricter dtype
      validation.

  * test/automl/test_ts_coverage.py TestSimpleForecaster (1 test)
      KeyError: 0 in test_seasonal_naive_fit_predict_int -- pandas 3.0 index
      semantics change.

The internal Azure DevOps pipeline pins both packages via conda for every
environment (scikit-learn=1.5.2; pandas via the conda env yml). Mirror that
intent here by removing the matrix.os filter, so Windows gets the same
constraints. The py3.10 carve-out for pandas remains because its older
transitive deps already pull in pandas<3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
thinkall requested a review from Copilot May 10, 2026 23:10

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

thinkall and others added 14 commits May 11, 2026 02:37
The previous CI cycle pinned pandas<3 and scikit-learn<1.8 to mask test
failures on the new releases. Pinning is the wrong fix in non-spark envs
(FLAML supports modern pandas/sklearn there); pyspark is the only reason
to keep pandas<3. This commit fixes the underlying FLAML bugs so tests
pass on pandas 3.0.2 + scikit-learn 1.8.0, and narrows the workflow pins.

Code fixes
----------
* flaml/automl/data.py
  - add_time_idx_col: replace Series.view('int64') (removed in pandas 3)
    with .astype('int64') for the datetime->int64-nanoseconds cast.
  - Use .iloc[0] when extracting a scalar from .mode() to avoid the
    pandas FutureWarning that becomes an error in pandas 3.
  - DataTransformer: add __sklearn_is_fitted__ so the transformer
    satisfies sklearn 1.8's stricter check_is_fitted when wrapped in
    a Pipeline (e.g. AutoML.feature_transformer).

* flaml/automl/time_series/ts_data.py
  - DataTransformerTS.transform: building y in-place via
    'y.iloc[:] = y_tr' raises 'Invalid value for dtype str' on pandas 3
    when y has string dtype but the encoder produces ints. Build a new
    Series/DataFrame of the appropriate dtype instead.

* flaml/automl/time_series/ts_model.py
  - SeasonalNaive.predict: 'forecast(...)[0]' performs label-based
    lookup on Series with non-integer indexes in pandas 3 and raises
    KeyError(0). Use .iloc[0] for positional access.

* flaml/fabric/autofe.py
  - Featurization: set self._is_fitted = True at the start of fit() so
    even no-op fits register as fitted, and add __sklearn_is_fitted__
    so sklearn 1.8's check_is_fitted accepts the Pipeline wrapping
    [DataTransformer, Featurization] returned by automl.feature_transformer.
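
The patterns above boil down to the following (a minimal sketch with illustrative names, not FLAML's actual classes):

    import pandas as pd

    ts = pd.Series(pd.date_range("2024-01-01", periods=3, freq="D"))

    # pandas 3: Series.view('int64') was removed; astype performs the
    # datetime -> int64-nanoseconds cast.
    nanos = ts.astype("int64")

    # pandas 3: pull a scalar out of mode() positionally, not by label.
    most_common = ts.mode().iloc[0]

    # sklearn 1.8: check_is_fitted honours __sklearn_is_fitted__, so a
    # duck-typed transformer can declare fittedness explicitly.
    class MyTransformer:
        def fit(self, X, y=None):
            self._is_fitted = True  # register even no-op fits as fitted
            return self

        def transform(self, X):
            return X

        def __sklearn_is_fitted__(self):
            return getattr(self, "_is_fitted", False)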

Workflow
--------
* .github/workflows/python-package.yml
  - Drop the scikit-learn<1.8 pin entirely (FLAML now supports 1.8).
  - Restore the pandas<3 pin to ubuntu-latest only (pyspark, which is
    only installed on Ubuntu in this matrix, doesn't yet support
    pandas 3.0). Windows runs against pandas 3 directly.

Verification
------------
Reproduced both failure modes locally with a fresh venv (pandas 3.0.2,
scikit-learn 1.8.0) and confirmed the previously-failing tests now pass:
  test/automl/test_data_coverage.py::TestAddTimeIdxCol (3 tests)
  test/automl/test_ts_coverage.py::TestDataTransformerTS::test_transform_with_label_transformer
  test/automl/test_ts_coverage.py::TestDataTransformerTS::test_transform_y_dataframe_with_label_transformer
  test/automl/test_ts_coverage.py::TestSimpleForecaster::test_seasonal_naive_fit_predict_int
  test/fabric/test_autofe.py::test_numpy_autofe
  test/fabric/test_autofe.py::test_autofe
Also re-ran the same suite against the older pandas 2.3 / sklearn 1.5
combo to confirm no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…load

The Codecov dashboard wasn't showing data for branch
'lijiang/open-source-internal-merge' because the coverage upload was
silently failing. The 'Combine coverage and upload to Codecov' job log
showed:

    [info] -> No token specified or token is empty
    [error] There was an error running the uploader: Error: There was
            an error fetching the storage URL during POST: 429
    [info] Codecov will exit with status code 0. ...

i.e. the v3 uploader was hitting Codecov's tokenless rate limit and then
exiting 0 (so the job appeared green even though no data was uploaded).

Switch to codecov/codecov-action@v5 (and pass CODECOV_TOKEN if it's set
as a repo secret). v5 uses GitHub OIDC for authenticated tokenless
uploads on public repos, which has much higher quota and avoids the
429s. Also enable verbose logging so future upload failures are obvious.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous codecov-action@v5 attempt still failed with HTTP 400:
    {"message":"Token required because branch is protected"}

The action defaults to anonymous tokenless uploads, which Codecov is
phasing out -- in particular, anonymous POSTs are rejected for
upstream-source branches in repos with protected branches (which
microsoft/FLAML has on main). Set 'use_oidc: true' to make the action
exchange a GitHub OIDC token for an authenticated Codecov upload, and
add 'id-token: write' to the workflow permissions so the OIDC token can
actually be minted.

CODECOV_TOKEN is still honored when present; OIDC is the fallback that
removes the need for a repo admin to provision a secret.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR coverage uploads now succeed (since 7813ef0 + fcb411a), but
Codecov still wasn't posting per-PR comments because:

  1. The repo had no codecov.yml, so Codecov fell back to its old
     defaults (no explicit comment layout, no flags config).
  2. The Codecov GitHub App has to be installed on microsoft/FLAML
     before any comments can be posted -- this is an org-admin
     action that must be done via https://github.com/apps/codecov
     (the workflow can't bootstrap it).

This commit covers (1). Once a maintainer installs the Codecov GitHub
App per (2), every PR will get a comment matching the standard layout
(coverage delta table + flags + impacted files + footer link to
Codecov). The config also defines:

  * coverage.status.project (target=auto, threshold=1%) -- a soft
    project-coverage status check
  * coverage.status.patch (target=70%, informational=true) -- patch
    coverage tracked but non-blocking until the team gets used to it
  * coverage.range '60...95' -- color thresholds for the comment
  * flag 'unittests' -> paths: flaml/ -- matches the flag the workflow
    uploads (see .github/workflows/python-package.yml)
  * ignore patterns for test/, notebook/, website/, setup.py,
    flaml/version.py, and flaml/autogen/ (the latter is legacy code
    that has moved to microsoft/autogen)
  * codecov.notify.after_n_builds: 1 + wait_for_ci: true so the
    comment appears as soon as the single combined upload finishes
    (the Build workflow already aggregates per-job coverage shards
    into one report)

Validated against https://codecov.io/validate (200 / 'Valid!').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov was processing uploads (status 200) and computing the PR
comparison correctly (base 65.76% -> head 84.38%, +17.4%), but
codecov-commenter was never posting a PR comment. Simplify the config
to a minimal layout (no codecov.notify overrides; drop the flag-paths
mapping and carryforward toggles) to match microsoft/SynapseML, which
has no codecov.yml at all and reliably gets a codecov-commenter
comment on every PR.

Removed:
- codecov.notify (after_n_builds, wait_for_ci, require_ci_to_pass) --
  defaults are equivalent and known to work
- comment.behavior, require_base, require_head, show_carryforward_flags --
  any of these may have been suppressing the comment in combination
- flags.unittests block -- inferred automatically from the upload

Kept:
- coverage.status.project / patch defaults
- comment.layout + require_changes:false
- ignore patterns for tests/notebooks/website/etc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the exact codecov-action invocation that microsoft/physical-ai-toolchain
uses (codecov-action@v6 + a 'name' parameter), which reliably
produces a codecov-commenter PR comment on every PR. Our v5 invocation
uploads + computes the comparison successfully (verified in Codecov
API), but no codecov-commenter comment is being posted on PR #1545
even after CI passes.

The 'name' field is what shows up in the Codecov 'sessions' table on
the PR page and is documented as required for the notification engine
to route the comment correctly when a single repo has multiple upload
sources (we have 11 build matrix entries that all funnel into one
combined coverage.xml).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CODECOV_TOKEN secret was just added to the repo. Trigger a full CI
cycle so the codecov-action can pick it up and authenticate the upload
explicitly (instead of falling back to OIDC). With an explicit token,
codecov-commenter should reliably post a PR comment on completion --
this is how SynapseML/physical-ai-toolchain reliably get comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The codecov-action@v6 logic prefers OIDC when both `use_oidc: true` and
an explicit token are set: 'Token set from env' (CC_OIDC_TOKEN) wins
over 'Token set from input' (CODECOV_TOKEN). The upload log on the
prior run confirmed this -- Token length: 2048 (an OIDC JWT), not the
~10-char repo upload token from Codecov's settings page.

OIDC uploads succeed (status 200, comparison computed, ci_passed=true)
but for this repo they don't trigger the codecov-commenter PR comment.
Forcing the explicit repo-scoped upload token makes Codecov treat the
upload as 'owner-authenticated', which is what reliably unlocks the
notification path on PRs (matches what microsoft/SynapseML does).

Also dropped the now-unused 'id-token: write' workflow permission.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov upload to the dashboard works fine (CODECOV_TOKEN auth, status
200, coverage 84.39% recorded), but codecov-commenter has been silent
on this repo since Feb 2024 even with all four config variants tried
(custom yml, default yml, OIDC, explicit token). Rather than wait on
Codecov support to refresh their internal repo state, post the coverage
comment directly via GITHUB_TOKEN using MishaKav/pytest-coverage-comment.

Codecov upload remains intact -- the dashboard at app.codecov.io
continues to receive uploads for trend tracking and PR comparison,
codecov.yml still defines status checks. Only the in-PR comment is
now produced by the new action.

Changes:
- pull-requests: write workflow permission (required so GITHUB_TOKEN
  can comment on PRs).
- coverage report -m output is now tee'd into pytest-coverage.txt --
  this is the text format MishaKav/pytest-coverage-comment expects.
- New 'Post coverage comment to PR' step pinned to @main; runs only on
  pull_request events when the coverage file exists.
  unique-id-for-comment ensures the action updates its existing comment
  in place rather than spamming on every CI run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The notspark CI jobs were taking ~75-100 min wall time, dominating
the entire CI cycle. Parallelize them across 2 worker procs using
pytest-xdist to roughly halve wall time without CPU oversubscription.

Why -n 2 (not -n auto / -n 4):
- GitHub-hosted runners have 4 vCPU.
- lightgbm / xgboost / sklearn / openblas all use internal threading.
- 2 pytest workers x ~2 internal threads each fits 4 vCPU well;
  4 workers would oversubscribe and likely hurt wall time.

Why --dist=loadfile:
- Tests in the same file often share heavy module-level imports
  (transformers, torch, prophet, etc). Keeping them in the same
  worker avoids re-importing those per test.

Why spark stays serial:
- SynapseML's SparkSession is a single global JVM instance and the
  Spark workers themselves already parallelize work. Pytest-level
  parallelism would contend on the same JVM, hurting rather than
  helping. Spark jobs already run in ~20 min.

Coverage in xdist worker subprocesses:
- Coverage.py needs explicit subprocess instrumentation -- when
  pytest-xdist spawns workers, they don't inherit coverage from the
  parent. Add a coverage_subprocess.pth file in site-packages that
  invokes coverage.process_startup(), gated by the
  COVERAGE_PROCESS_START env var (set per test step).
- .coveragerc already sets parallel=true, so each worker writes its
  own .coverage.<host>.<pid>.<rand> shard which we already combine.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
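
The subprocess hook described above amounts to this (a sketch; the site-packages path resolution is illustrative):

    import site
    from pathlib import Path

    # A .pth line that starts with "import" is executed by site.py at
    # every interpreter startup; process_startup() is a no-op unless
    # COVERAGE_PROCESS_START points at a coverage config file.
    pth = Path(site.getsitepackages()[0]) / "coverage_subprocess.pth"
    pth.write_text("import coverage; coverage.process_startup()\n")
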
Two distinct failures showed up after enabling pytest-xdist on
notspark:

1) MLflowIntegration PicklingError in test_forecast.py / test_score.py
   / test_notebook_example.py:
   _pickle.PicklingError: Can't pickle <class 'flaml.fabric.mlflow.MLflowIntegration'>:
       it's not the same object as flaml.fabric.mlflow.MLflowIntegration

   Root cause: test/fabric/test_mlflow_coverage.py was calling
   importlib.reload(flaml.fabric.mlflow) in-process to test the
   pyspark-missing import fallback. After reload, the class object
   bound at flaml.fabric.mlflow.MLflowIntegration is brand new --
   different identity from the one already imported into
   flaml.automl.automl via 'from flaml.fabric.mlflow import
   MLflowIntegration'. Any pre-existing AutoML instances pickle-fail
   because pickle's class lookup-by-qualname returns the post-reload
   class, mismatching obj.__class__.

   With serial pytest, automl/* runs before fabric/* (lexical order)
   so the reload happens after all AutoML pickling. Under
   --dist=loadfile, files distribute non-deterministically across
   workers and the order can interleave.

   Fix: rewrite the test to spawn a fresh subprocess instead of
   reloading the module in-process. The subprocess can mutate
   sys.modules freely without polluting the parent worker.

2) JAVA_GATEWAY_EXITED on test/automl/test_extra_models.py:
   The file has pytestmark = pytest.mark.spark so all tests would
   be deselected by -m 'not spark', but pytest still IMPORTS the
   module during collection -- and the module's top-level code
   calls setup_spark_for_tests('MyApp'), starting a SparkSession.
   Two pytest-xdist workers each starting their own SparkSession
   collide on the JVM gateway port.

   Fix: add --ignore=test/automl/test_extra_models.py to the
   notspark pytest invocation. The spark variant still picks it up
   (spark step doesn't ignore test/automl).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
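
The fix for (1) follows this shape (a sketch; the inline child script is illustrative, the real test lives in test/fabric/test_mlflow_coverage.py):

    import subprocess
    import sys

    def test_import_fallback_without_pyspark():
        # Run the import in a throwaway interpreter so the parent
        # worker's class identities (and hence pickling) are untouched.
        script = (
            "import sys; sys.modules['pyspark'] = None\n"  # forces ImportError
            "import flaml.fabric.mlflow\n"
        )
        result = subprocess.run(
            [sys.executable, "-c", script], capture_output=True, text=True
        )
        assert result.returncode == 0, result.stderr
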
Previous attempt (tee'd 'coverage report -m' into pytest-coverage.txt)
was rejected by MishaKav with 'Coverage file ... has bad format or
wrong data' because the action's text-mode parser also requires the
pytest test-session summary line (e.g. "=== N passed in T s ===")
which plain 'coverage report' doesn't emit.

MishaKav supports an alternative input pytest-xml-coverage-path that
takes a Cobertura XML coverage file directly -- the same one we
already generate for the Codecov upload. Switch to that and drop the
pytest-coverage.txt artifact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit accidentally duplicated 'unique-id-for-comment'
in the action's 'with:' mapping, which the check-yaml pre-commit
hook caught:
  found duplicate key 'unique-id-for-comment' with value 'pytest-coverage-comment'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot commented May 13, 2026

coverage
