Open source the internal Fabric AutoML fork #1545

Open

thinkall wants to merge 28 commits into main from lijiang/open-source-internal-merge

Conversation

thinkall (Collaborator) commented May 9, 2026

Summary

This PR brings the internal Microsoft Fabric AutoML fork (msdata/A365/FLAML-Internal) into the open-source microsoft/FLAML repository, fully open-sourcing the work that has been maintained internally for the Fabric AutoML offering.

The change is delivered as a single squash commit so that internal Azure DevOps PR history (with internal PR numbers, GitOps bot signatures, etc.) is not exposed in the public repo. Per-PR provenance is preserved in the internal repo. A few small follow-up commits scope the PR to source/test code only (no docs site changes, no AzDO operational artifacts).

What's added (from the internal fork)

  • flaml/fabric/ — autofe, lowcode, mlflow, telemetry, visualization, and the Optuna-backed fANOVA evaluator (replacing the previous Cython implementation, internal PR 2045210)
  • flaml/visualization/ — visualization helpers
  • flaml/automl/ — coverage-driven improvements (utils.py, etc., internal PR 1970441)
  • benchmark/pmlb/ — PMLB benchmarking notebooks/results
  • notebook/trident/ — Fabric demo and test notebooks
  • test/automl/test_*_coverage.py, test/fabric/, test/test_misc_coverage.py, test/tune/test_tune_coverage.py — expanded coverage suites

What's preserved from ms/main

  • pyproject.toml PEP 621 migration (#1531, #1538) — setup.py is now a minimal stub
  • Python 3.13 classifier and editable-install fix
  • pandas 3.0 / sklearn 1.7 / catboost compatibility fixes
  • OpenML test fallbacks using make_classification (#1534/#1537)
  • Recent website dependency bumps (#1521-#1543)

Conflict resolutions

  • flaml/version.py — keep ms/main 2.6.0; add internal conda-version comment
  • setup.py — keep ms/main minimal stub (pyproject.toml is now authoritative)
  • pyproject.toml — port internal-only autofe, fabric_python, and full synapse extras into [project.optional-dependencies]
  • test/automl/test_constraints.py, test_score.py, test_split.py, test_xgboost2d.py — keep ms/main make_classification fallbacks (more robust and consistent)
  • website/yarn.lock — keep ms/main version (deleted on internal)

Files intentionally NOT brought over

These are internal-only operational artifacts that are broken or meaningless on GitHub:

  • .pipelines/, azurepipelines-coverage.yml — Azure DevOps pipeline definitions and coverage config (OneBranch images, internal pools and feeds; OSS CI continues to be GitHub Actions)
  • conda-build/ — Fabric conda packaging metadata; recipes reference internal blob storage and are part of the internal release pipeline
  • lowcode/handlebars/ — internal low-code notebook generation templates and mock data used by the Fabric low-code AutoML UI; not consumed by the OSS flaml package
  • HowToMergeGithub.md — internal-to-OSS sync runbook (no longer needed once the fork is fully open-sourced)
  • HowToTestFLAML4Fabric.md — Fabric-specific manual test instructions for the internal CI / Fabric runtime
  • website/.npmrc — pinned to internal Azure DevOps NPM feed; would break public website builds
  • .azuredevops/policies/approvercountpolicy.yml — Azure DevOps PR policy for the internal repo
  • es-metadata.yml — internal Engineering System routing metadata
  • owners.txt — internal Azure DevOps owners file
  • All website/docs/ additions and modifications from the internal branch — reverted to keep the docs site untouched in this PR; a separate, focused docs PR can re-introduce any internal docs that are still relevant

Other notes

  • Added a .pre-commit-config.yaml exclude for notebook/trident/featurization.ipynb (3.7 MB demo notebook with embedded outputs) so the existing check-added-large-files hook still guards future contributions.
  • pre-commit run --all-files passes locally on the resulting tree.
  • The synthesized 3-way merge used the last logical sync point (OSS commit 158ff7d99 ≈ internal commit 8a2e7b376, "Merge github till 158ff7d") as the merge base, since the two branches share no git ancestor. A post-merge file-by-file audit confirmed that no internal customizations were silently lost: of the 10 files modified both internally before the sync point and in OSS after it, 7 surfaced as conflicts (resolved manually) and the other 3 (test_extra_models.py, test_forecast.py, test_regression.py) auto-merged on disjoint lines and were verified line by line.

cc @thinkall — let me know how you want to handle CI: many internal-only test files (test/automl/test_*_coverage.py, test/fabric/test_telemetry.py, etc.) depend on Fabric / synapse.ml libraries that aren't present on GitHub Actions runners and may need pytest skip markers in a follow-up.
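
For the follow-up mentioned above, a dependency-gated skip could look roughly like this (a minimal sketch using standard pytest idioms; the module name checked below is illustrative, not the final marker layout):

    import importlib.util

    import pytest

    # Skip every test in this module unless synapse.ml is importable.
    pytestmark = pytest.mark.skipif(
        importlib.util.find_spec("synapse") is None,
        reason="requires synapse.ml, which is absent on GitHub Actions runners",
    )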

thinkall and others added 13 commits May 9, 2026 13:23
Squash-merge the internal FLAML-Internal main branch into the open-source
ms/main branch so the public repository contains the full Fabric AutoML
feature set developed internally. Internal commit history is collapsed
into a single commit; per-PR provenance is preserved in the internal
Azure DevOps repo.

What this brings from the internal branch:
- flaml/fabric/ — autofe, lowcode, mlflow, telemetry, visualization, and
  the Optuna-backed fANOVA evaluator (replacing the previous Cython
  implementation, internal PR 2045210)
- flaml/visualization/ — visualization helpers
- flaml/automl/ — coverage-driven improvements (utils.py and friends,
  internal PR 1970441)
- conda-build/ — Fabric conda packaging metadata
- .pipelines/, azurepipelines-coverage.yml — Azure DevOps build config
  (informational; OSS CI continues to be GitHub Actions)
- benchmark/pmlb/ — PMLB benchmarking notebooks/results
- lowcode/handlebars/ — low-code notebook generation templates and mocks
- notebook/trident/ — Fabric demo and test notebooks
- test/automl/test_*_coverage.py, test/fabric/, test/test_misc_coverage.py,
  test/tune/test_tune_coverage.py — expanded coverage suites
- HowToMergeGithub.md, HowToTestFLAML4Fabric.md — internal-process docs
  retained for historical reference

What is preserved from ms/main:
- pyproject.toml PEP 621 migration (#1531, #1538) — setup.py is now a
  minimal stub
- Python 3.13 classifier and editable-install fix
- pandas 3.0 / sklearn 1.7 / catboost compatibility fixes
- OpenML test fallbacks using make_classification (#1534/#1537)
- Recent website dependency bumps (#1521-#1543)

Conflict resolutions:
- flaml/version.py: keep ms/main 2.6.0 plus internal conda-version comment
- setup.py: keep ms/main minimal stub (pyproject.toml is now authoritative);
  port the additional 'autofe', 'fabric_python', and full 'synapse' extras
  from the internal setup.py into pyproject.toml
- test/automl/test_constraints.py, test_score.py, test_split.py, test_xgboost2d.py:
  keep ms/main make_classification fallbacks (more robust and consistent)
- website/yarn.lock: keep ms/main version (deleted on internal)

Files intentionally NOT brought over (internal-only operational artifacts
that are broken or meaningless on GitHub):
- website/.npmrc — pinned to internal Azure DevOps NPM feed; would break
  public website builds
- .azuredevops/policies/approvercountpolicy.yml — Azure DevOps PR policy
  for the internal repo
- es-metadata.yml — internal Engineering System routing metadata
- owners.txt — internal Azure DevOps owners file

A .pre-commit-config.yaml exclusion was added for
notebook/trident/featurization.ipynb (3.7 MB demo notebook with embedded
outputs) so the existing check-added-large-files hook still guards future
contributions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
These pipeline YAML files target the internal OneBranch images, pools,
and feeds and have no use in the public repository, where CI is run
through GitHub Actions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- conda-build/ — Fabric conda packaging metadata; the conda recipes
  reference internal blob storage and are part of the internal release
  pipeline, not generally useful in the public repo.
- lowcode/handlebars/ — internal low-code notebook generation templates
  and mock data used by the Fabric low-code AutoML UI; not consumed by
  the OSS flaml package.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This Azure ML pipeline tuning example existed on ms/main but was absent
from the internal branch, so the squash merge unintentionally deleted
it. Restore it verbatim from ms/main since it's a public example users
may rely on.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove three more internal-only files that have no place in the public
repository:
- azurepipelines-coverage.yml — Azure Pipelines coverage configuration
- HowToMergeGithub.md — internal-to-OSS sync runbook (no longer
  needed once the fork is fully open-sourced)
- HowToTestFLAML4Fabric.md — Fabric-specific manual test instructions
  for the internal CI / Fabric runtime

Also revert all website/docs/ additions and modifications brought in
by the internal merge so the public docs site is unchanged by this PR.
A separate, focused docs PR can introduce any of the internal docs
content that is still relevant.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The fanova/ adapter has been migrated to wrap Optuna's pure-Python
FanovaImportanceEvaluator (see flaml/fabric/fanova/evaluator.py and
the README in the same folder, which explicitly states 'No local
Cython extension or build_ext step is required.'). The legacy
fanova.pyx file is no longer compiled or imported by any code path,
so drop it.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
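
For context, the Optuna evaluator the adapter now wraps is exercised roughly like this (a minimal sketch against Optuna's public API; the toy objective is illustrative, not FLAML's adapter code):

    import optuna
    from optuna.importance import FanovaImportanceEvaluator

    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        y = trial.suggest_int("y", 0, 5)
        return (x - 2) ** 2 + y

    study = optuna.create_study()
    study.optimize(objective, n_trials=50)

    # fANOVA importances from the pure-Python evaluator -- no Cython
    # extension or build_ext step involved.
    print(optuna.importance.get_param_importances(
        study, evaluator=FanovaImportanceEvaluator()
    ))
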
The newly merged test/spark/test_internal_mlflow.py does an
unconditional 'import pyspark' at module level and depends on the
internal MLflow tracking server, so it cannot be collected on the
GitHub Actions matrix entries that do not install pyspark
(all Python 3.10 jobs and all Windows jobs) and would not work even
where pyspark is present. The internal .pipelines/build.yml already
adds the same --ignore for this file in both its 'spark' and
'notspark' test variants.

Fixes the 'collected 838 items / 1 error' failure on PR #1545.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the structure of the internal Azure DevOps pipeline
(.pipelines/build.yml in the internal fork) in the public GitHub
Actions workflow:

* New matrix axis 'test-type' with two values, 'notspark' and 'spark',
  each with its own --ignore list and pytest -m filter:
  - notspark: --ignore=test/autogen --ignore=test/spark -m 'not spark'
  - spark:    --ignore=test/autogen
              --ignore=test/spark/test_internal_mlflow.py
              -m 'spark'
  This keeps test/spark/test_internal_mlflow.py (which depends on the
  internal MLflow tracking server and unconditionally imports pyspark
  at module level) from breaking collection in either variant.

* The 'spark' variant only runs where pyspark is installed by the
  workflow today: ubuntu-latest with Python 3.11 / 3.12 / 3.13. It is
  excluded for windows-latest and for Python 3.10.

* All Linux test jobs now run under 'coverage run' (not just the 3.11
  job). Each Linux job combines its parallel-mode coverage shards into
  one .coverage file and uploads it as a uniquely named artifact. A new
  'coverage' aggregator job downloads every per-job artifact, runs
  'coverage combine' across them, generates a single coverage.xml and
  uploads that one combined report to Codecov. This replaces the
  previous per-3.11-job Codecov upload.

* The 'Save dependencies' step is now gated to a single matrix entry
  (ubuntu / 3.11 / notspark) on push to main so that parallel jobs do
  not race on the unit-tests-installed-dependencies branch.

Coverage is intentionally not collected on Windows runners to avoid
Linux/Windows path mismatches when combining .coverage data files.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Three test modules each rebuilt the same SynapseML/MLflow-Spark Spark
session from scratch with identical configuration:

  test/spark/test_0sparkml.py
  test/spark/test_internal_mlflow.py (in _init_spark_for_main)
  test/automl/test_extra_models.py

Move the SparkSession.builder configuration, log_model_allowlist
override, and disable/restore_spark_ansi_mode + atexit cleanup into
a single test-only helper at test/spark/_init_spark.py exposing:

  init_spark_session(app_name, master)        - build/fetch the session
  setup_spark_for_tests(app_name, master)     - returns (spark, skip_spark)
                                                with platform/import gating
                                                and ANSI mode handling

Each test module now calls the helper instead of duplicating the
Maven-coordinates / config block. No behavioural change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The first OSS CI run on this PR uncovered 4 real test failures:

  test/automl/test_ts_coverage.py::test_prettify_no_test_ndarray_raises
  test/automl/test_ts_coverage.py::test_prettify_no_test_series_raises
    OSS PR #1536 ("Generate timestamps for time series predictions
    without test data") changed prettify_prediction to auto-generate
    timestamps via create_forward_frame instead of raising
    ValueError / NotImplementedError when test_data is None.
    Update the two internal tests to assert the new graceful behaviour
    (a DataFrame with the time column populated) instead of expecting
    an exception that no longer fires.

  test/nlp/test_hf_utils_coverage.py::test_summarization_with_y_true
  test/nlp/test_hf_utils_coverage.py::test_summarization_without_y_true
    Both fail with 'Resource punkt_tab not found' because nltk data
    is not pre-downloaded on the public GitHub Actions runners.
    The internal Azure DevOps pipeline (.pipelines/build.yml) explicitly
    ignores test/nlp for the same reason -- mirror that behaviour by
    adding --ignore=test/nlp to both notspark and spark CI invocations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
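
An alternative to ignoring test/nlp would be pre-fetching the missing resource in CI (a sketch of the standard nltk remedy; this PR instead mirrors the internal --ignore behaviour):

    import nltk

    # Download the tokenizer data the summarization tests need; a no-op
    # if the resource is already present.
    nltk.download("punkt_tab")
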
scikit-learn 1.8 (pulled in by default on Python 3.11+ in the GitHub
Actions runner image) tightened check_is_fitted via the new estimator
tags system. FLAML's autofe Pipeline (DataTransformer + Featurization)
trips this validation and three tests in test/fabric/test_autofe.py
fail with NotFittedError on py3.11 / py3.13 (py3.10 still resolves to
sklearn 1.7.2 and passes):

  test_numpy_autofe
  test_autofe
  test_autofe_force

The internal Azure DevOps pipeline (.pipelines/build.yml) pins
scikit-learn=1.5.2 via conda for the same reason. Mirror that intent
by capping below 1.8 in the GitHub Actions workflow.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Windows notspark jobs were failing because the pandas<3 and scikit-learn<1.8
pins were gated to ubuntu-latest only. Windows ended up with pandas 3.0.2 and
scikit-learn 1.8.0, which produced 8 distinct failures:

  * test/fabric/test_autofe.py (3 tests)
      sklearn.exceptions.NotFittedError on autofe Pipeline -- sklearn 1.8
      tightened check_is_fitted via the new estimator-tags system.

  * test/automl/test_data_coverage.py TestAddTimeIdxCol (2 tests)
      AttributeError: 'Series' object has no attribute 'view' -- pandas 3.0
      removed the deprecated Series.view API.

  * test/automl/test_ts_coverage.py TestDataTransformerTS (2 tests)
      TypeError: Invalid value for dtype 'str' -- pandas 3.0 stricter dtype
      validation.

  * test/automl/test_ts_coverage.py TestSimpleForecaster (1 test)
      KeyError: 0 in test_seasonal_naive_fit_predict_int -- pandas 3.0 index
      semantics change.

The internal Azure DevOps pipeline pins both packages via conda for every
environment (scikit-learn=1.5.2; pandas via the conda env yml). Mirror that
intent here by removing the matrix.os filter, so Windows gets the same
constraints. The py3.10 carve-out for pandas remains because its older
transitive deps already pull in pandas<3.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
thinkall requested a review from Copilot May 10, 2026 23:10

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

thinkall and others added 14 commits May 11, 2026 02:37
The previous CI cycle pinned pandas<3 and scikit-learn<1.8 to mask test
failures on the new releases. Pinning is the wrong fix in non-spark envs
(FLAML supports modern pandas/sklearn there); pyspark is the only reason
to keep pandas<3. This commit fixes the underlying FLAML bugs so tests
pass on pandas 3.0.2 + scikit-learn 1.8.0, and narrows the workflow pins.

Code fixes
----------
* flaml/automl/data.py
  - add_time_idx_col: replace Series.view('int64') (removed in pandas 3)
    with .astype('int64') for the datetime->int64-nanoseconds cast.
  - Use .iloc[0] when extracting a scalar from .mode() to avoid the
    pandas FutureWarning that becomes an error in pandas 3.
  - DataTransformer: add __sklearn_is_fitted__ so the transformer
    satisfies sklearn 1.8's stricter check_is_fitted when wrapped in
    a Pipeline (e.g. AutoML.feature_transformer).

* flaml/automl/time_series/ts_data.py
  - DataTransformerTS.transform: building y in-place via
    'y.iloc[:] = y_tr' raises 'Invalid value for dtype str' on pandas 3
    when y has string dtype but the encoder produces ints. Build a new
    Series/DataFrame of the appropriate dtype instead.

* flaml/automl/time_series/ts_model.py
  - SeasonalNaive.predict: 'forecast(...)[0]' performs label-based
    lookup on Series with non-integer indexes in pandas 3 and raises
    KeyError(0). Use .iloc[0] for positional access.

* flaml/fabric/autofe.py
  - Featurization: set self._is_fitted = True at the start of fit() so
    even no-op fits register as fitted, and add __sklearn_is_fitted__
    so sklearn 1.8's check_is_fitted accepts the Pipeline wrapping
    [DataTransformer, Featurization] returned by automl.feature_transformer.
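
The patterns above boil down to the following (a minimal sketch with illustrative names, not FLAML's actual classes):

    import pandas as pd

    ts = pd.Series(pd.date_range("2024-01-01", periods=3, freq="D"))

    # pandas 3: Series.view('int64') was removed; astype performs the
    # datetime -> int64-nanoseconds cast.
    nanos = ts.astype("int64")

    # pandas 3: pull a scalar out of mode() positionally, not by label.
    most_common = ts.mode().iloc[0]

    # sklearn 1.8: check_is_fitted honours __sklearn_is_fitted__, so a
    # duck-typed transformer can declare fittedness explicitly.
    class MyTransformer:
        def fit(self, X, y=None):
            self._is_fitted = True  # register even no-op fits as fitted
            return self

        def transform(self, X):
            return X

        def __sklearn_is_fitted__(self):
            return getattr(self, "_is_fitted", False)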

Workflow
--------
* .github/workflows/python-package.yml
  - Drop the scikit-learn<1.8 pin entirely (FLAML now supports 1.8).
  - Restore the pandas<3 pin to ubuntu-latest only (pyspark, which is
    only installed on Ubuntu in this matrix, doesn't yet support
    pandas 3.0). Windows runs against pandas 3 directly.

Verification
------------
Reproduced both failure modes locally with a fresh venv (pandas 3.0.2,
scikit-learn 1.8.0) and confirmed the previously-failing tests now pass:
  test/automl/test_data_coverage.py::TestAddTimeIdxCol (3 tests)
  test/automl/test_ts_coverage.py::TestDataTransformerTS::test_transform_with_label_transformer
  test/automl/test_ts_coverage.py::TestDataTransformerTS::test_transform_y_dataframe_with_label_transformer
  test/automl/test_ts_coverage.py::TestSimpleForecaster::test_seasonal_naive_fit_predict_int
  test/fabric/test_autofe.py::test_numpy_autofe
  test/fabric/test_autofe.py::test_autofe
Also re-ran the same suite against the older pandas 2.3 / sklearn 1.5
combo to confirm no regressions.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…load

The Codecov dashboard wasn't showing data for branch
'lijiang/open-source-internal-merge' because the coverage upload was
silently failing. The 'Combine coverage and upload to Codecov' job log
showed:

    [info] -> No token specified or token is empty
    [error] There was an error running the uploader: Error: There was
            an error fetching the storage URL during POST: 429
    [info] Codecov will exit with status code 0. ...

i.e. the v3 uploader was hitting Codecov's tokenless rate limit and then
exiting 0 (so the job appeared green even though no data was uploaded).

Switch to codecov/codecov-action@v5 (and pass CODECOV_TOKEN if it's set
as a repo secret). v5 uses GitHub OIDC for authenticated tokenless
uploads on public repos, which has much higher quota and avoids the
429s. Also enable verbose logging so future upload failures are obvious.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous codecov-action@v5 attempt still failed with HTTP 400:
    {"message":"Token required because branch is protected"}

The action defaults to anonymous tokenless uploads, which Codecov is
phasing out -- in particular, anonymous POSTs are rejected for
upstream-source branches in repos with protected branches (which
microsoft/FLAML has on main). Set 'use_oidc: true' to make the action
exchange a GitHub OIDC token for an authenticated Codecov upload, and
add 'id-token: write' to the workflow permissions so the OIDC token can
actually be minted.

CODECOV_TOKEN is still honored when present; OIDC is the fallback that
removes the need for a repo admin to provision a secret.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR coverage uploads now succeed (since 7813ef0 + fcb411a), but
Codecov still wasn't posting per-PR comments because:

  1. The repo had no codecov.yml, so Codecov fell back to its old
     defaults (no explicit comment layout, no flags config).
  2. The Codecov GitHub App has to be installed on microsoft/FLAML
     before any comments can be posted -- this is an org-admin
     action that must be done via https://github.com/apps/codecov
     (the workflow can't bootstrap it).

This commit covers (1). Once a maintainer installs the Codecov GitHub
App per (2), every PR will get a comment matching the standard layout
(coverage delta table + flags + impacted files + footer link to
Codecov). The config also defines:

  * coverage.status.project (target=auto, threshold=1%) -- a soft
    project-coverage status check
  * coverage.status.patch (target=70%, informational=true) -- patch
    coverage tracked but non-blocking until the team gets used to it
  * coverage.range '60...95' -- color thresholds for the comment
  * flag 'unittests' -> paths: flaml/ -- matches the flag the workflow
    uploads (see .github/workflows/python-package.yml)
  * ignore patterns for test/, notebook/, website/, setup.py,
    flaml/version.py, and flaml/autogen/ (the latter is legacy code
    that has moved to microsoft/autogen)
  * codecov.notify.after_n_builds: 1 + wait_for_ci: true so the
    comment appears as soon as the single combined upload finishes
    (the Build workflow already aggregates per-job coverage shards
    into one report)

Validated against https://codecov.io/validate (200 / 'Valid!').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov was processing uploads (status 200) and computing the PR
comparison correctly (base 65.76% -> head 84.38%, +17.4%), but
codecov-commenter was never posting a PR comment. Simplify the config
to a minimal layout (no codecov.notify overrides; drop the flag-paths
mapping and carryforward toggles) to match microsoft/SynapseML, which
has no codecov.yml at all and reliably gets a codecov-commenter
comment on every PR.

Removed:
- codecov.notify (after_n_builds, wait_for_ci, require_ci_to_pass) --
  defaults are equivalent and known to work
- comment.behavior, require_base, require_head, show_carryforward_flags --
  any of these may have been suppressing the comment in combination
- flags.unittests block -- inferred automatically from the upload

Kept:
- coverage.status.project / patch defaults
- comment.layout + require_changes:false
- ignore patterns for tests/notebooks/website/etc.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Mirror the exact codecov-action invocation that microsoft/physical-ai-toolchain
uses (codecov-action@v6 + a 'name' parameter), which reliably
produces a codecov-commenter PR comment on every PR. Our v5 invocation
uploads + computes the comparison successfully (verified in Codecov
API), but no codecov-commenter comment is being posted on PR #1545
even after CI passes.

The 'name' field is what shows up in the Codecov 'sessions' table on
the PR page and is documented as required for the notification engine
to route the comment correctly when a single repo has multiple upload
sources (we have 11 build matrix entries that all funnel into one
combined coverage.xml).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CODECOV_TOKEN secret was just added to the repo. Trigger a full CI
cycle so the codecov-action can pick it up and authenticate the upload
explicitly (instead of falling back to OIDC). With an explicit token,
codecov-commenter should reliably post a PR comment on completion --
this is how SynapseML/physical-ai-toolchain reliably get comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The codecov-action@v6 logic prefers OIDC when both `use_oidc: true` and
an explicit token are set: 'Token set from env' (CC_OIDC_TOKEN) wins
over 'Token set from input' (CODECOV_TOKEN). The upload log on the
prior run confirmed this -- Token length: 2048 (an OIDC JWT), not the
~10-char repo upload token from Codecov's settings page.

OIDC uploads succeed (status 200, comparison computed, ci_passed=true)
but for this repo they don't trigger the codecov-commenter PR comment.
Forcing the explicit repo-scoped upload token makes Codecov treat the
upload as 'owner-authenticated', which is what reliably unlocks the
notification path on PRs (matches what microsoft/SynapseML does).

Also dropped the now-unused 'id-token: write' workflow permission.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Codecov upload to the dashboard works fine (CODECOV_TOKEN auth, status
200, coverage 84.39% recorded), but codecov-commenter has been silent
on this repo since Feb 2024 even with all four config variants tried
(custom yml, default yml, OIDC, explicit token). Rather than wait on
Codecov support to refresh their internal repo state, post the coverage
comment directly via GITHUB_TOKEN using MishaKav/pytest-coverage-comment.

Codecov upload remains intact -- the dashboard at app.codecov.io
continues to receive uploads for trend tracking and PR comparison,
codecov.yml still defines status checks. Only the in-PR comment is
now produced by the new action.

Changes:
- pull-requests: write workflow permission (required so GITHUB_TOKEN
  can comment on PRs).
- coverage report -m output is now tee'd into pytest-coverage.txt --
  this is the text format MishaKav/pytest-coverage-comment expects.
- New 'Post coverage comment to PR' step pinned to @main; runs only on
  pull_request events when the coverage file exists.
  unique-id-for-comment ensures the action updates its existing comment
  in place rather than spamming on every CI run.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The notspark CI jobs were taking ~75-100 min wall time, dominating
the entire CI cycle. Parallelize them across 2 worker procs using
pytest-xdist to roughly halve wall time without CPU oversubscription.

Why -n 2 (not -n auto / -n 4):
- GitHub-hosted runners have 4 vCPU.
- lightgbm / xgboost / sklearn / openblas all use internal threading.
- 2 pytest workers x ~2 internal threads each fits 4 vCPU well;
  4 workers would oversubscribe and likely hurt wall time.

Why --dist=loadfile:
- Tests in the same file often share heavy module-level imports
  (transformers, torch, prophet, etc). Keeping them in the same
  worker avoids re-importing those per test.

Why spark stays serial:
- SynapseML's SparkSession is a single global JVM instance and the
  Spark workers themselves already parallelize work. Pytest-level
  parallelism would contend on the same JVM, hurting rather than
  helping. Spark jobs already run in ~20 min.

Coverage in xdist worker subprocesses:
- Coverage.py needs explicit subprocess instrumentation -- when
  pytest-xdist spawns workers, they don't inherit coverage from the
  parent. Add a coverage_subprocess.pth file in site-packages that
  invokes coverage.process_startup(), gated by the
  COVERAGE_PROCESS_START env var (set per test step).
- .coveragerc already sets parallel=true, so each worker writes its
  own .coverage.<host>.<pid>.<rand> shard which we already combine.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
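
The subprocess hook described above amounts to this (a sketch; the site-packages path resolution is illustrative):

    import site
    from pathlib import Path

    # A .pth line that starts with "import" is executed by site.py at
    # every interpreter startup; process_startup() is a no-op unless
    # COVERAGE_PROCESS_START points at a coverage config file.
    pth = Path(site.getsitepackages()[0]) / "coverage_subprocess.pth"
    pth.write_text("import coverage; coverage.process_startup()\n")
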
Two distinct failures showed up after enabling pytest-xdist on
notspark:

1) MLflowIntegration PicklingError in test_forecast.py / test_score.py
   / test_notebook_example.py:
   _pickle.PicklingError: Can't pickle <class 'flaml.fabric.mlflow.MLflowIntegration'>:
       it's not the same object as flaml.fabric.mlflow.MLflowIntegration

   Root cause: test/fabric/test_mlflow_coverage.py was calling
   importlib.reload(flaml.fabric.mlflow) in-process to test the
   pyspark-missing import fallback. After reload, the class object
   bound at flaml.fabric.mlflow.MLflowIntegration is brand new --
   different identity from the one already imported into
   flaml.automl.automl via 'from flaml.fabric.mlflow import
   MLflowIntegration'. Any pre-existing AutoML instances pickle-fail
   because pickle's class lookup-by-qualname returns the post-reload
   class, mismatching obj.__class__.

   With serial pytest, automl/* runs before fabric/* (lexical order)
   so the reload happens after all AutoML pickling. Under
   --dist=loadfile, files distribute non-deterministically across
   workers and the order can interleave.

   Fix: rewrite the test to spawn a fresh subprocess instead of
   reloading the module in-process. The subprocess can mutate
   sys.modules freely without polluting the parent worker.

2) JAVA_GATEWAY_EXITED on test/automl/test_extra_models.py:
   The file has pytestmark = pytest.mark.spark so all tests would
   be deselected by -m 'not spark', but pytest still IMPORTS the
   module during collection -- and the module's top-level code
   calls setup_spark_for_tests('MyApp'), starting a SparkSession.
   Two pytest-xdist workers each starting their own SparkSession
   collide on the JVM gateway port.

   Fix: add --ignore=test/automl/test_extra_models.py to the
   notspark pytest invocation. The spark variant still picks it up
   (spark step doesn't ignore test/automl).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
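
The fix for (1) follows this shape (a sketch; the inline child script is illustrative, the real test lives in test/fabric/test_mlflow_coverage.py):

    import subprocess
    import sys

    def test_import_fallback_without_pyspark():
        # Run the import in a throwaway interpreter so the parent
        # worker's class identities (and hence pickling) are untouched.
        script = (
            "import sys; sys.modules['pyspark'] = None\n"  # forces ImportError
            "import flaml.fabric.mlflow\n"
        )
        result = subprocess.run(
            [sys.executable, "-c", script], capture_output=True, text=True
        )
        assert result.returncode == 0, result.stderr
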
Previous attempt (tee'd 'coverage report -m' into pytest-coverage.txt)
was rejected by MishaKav with 'Coverage file ... has bad format or
wrong data' because the action's text-mode parser also requires the
pytest test-session summary line (e.g. "=== N passed in T s ===")
which plain 'coverage report' doesn't emit.

MishaKav supports an alternative input pytest-xml-coverage-path that
takes a Cobertura XML coverage file directly -- the same one we
already generate for the Codecov upload. Switch to that and drop the
pytest-coverage.txt artifact.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The previous commit accidentally duplicated 'unique-id-for-comment'
in the action's 'with:' mapping, which the check-yaml pre-commit
hook caught:
  found duplicate key 'unique-id-for-comment' with value 'pytest-coverage-comment'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
github-actions bot commented May 13, 2026

coverage
