Skip to content

Add 'pyproject' to types_or for ruff checks#158

Open
mxr wants to merge 1 commit intoastral-sh:mainfrom
mxr:patch-1
Open

Add 'pyproject' to types_or for ruff checks#158
mxr wants to merge 1 commit intoastral-sh:mainfrom
mxr:patch-1

Conversation

@mxr
Copy link
Copy Markdown
Contributor

@mxr mxr commented Mar 3, 2026

Summary

This allows ruff to check RUF200 and other checks in pyproject.toml

Test Plan

pre-commit try-repo

Copy link
Copy Markdown

@bn-andrew bn-andrew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one, pending that upgrade in identify 👍🏻

@mxr mxr marked this pull request as ready for review March 15, 2026 18:55
@mxr mxr force-pushed the patch-1 branch 2 times, most recently from f004428 to 836250e Compare March 16, 2026 13:46
@mxr mxr requested a review from ntBre March 16, 2026 13:46
Copy link
Copy Markdown
Contributor

@ntBre ntBre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thank you!

@ntBre
Copy link
Copy Markdown
Contributor

ntBre commented Mar 17, 2026

Hmm, I'm wondering if maybe we should only suggest this in the documentation for now rather than changing the hook itself. The version of pre-commit I have installed from the Arch repos errors out on unknown type tags, so I'm a bit worried that this will break users with older pre-commit/identify installations. uvx pre-commit was fine, on the other hand, but probably only because I hadn't run it before and got a brand-new installation. I guess it's not really clear when we could add this to the hook, though, unless pre-commit bumps its minimum identify version.

I'm curious to get your thoughts (and maybe @MichaReiser's as well).

@mxr
Copy link
Copy Markdown
Contributor Author

mxr commented Mar 17, 2026

Good point. Let me use https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/ to see how long it will take for the vast majority of downloads to be on identify>=2.6.18. The other thing I can think of is that pre-commit could be patched to ignore unknown types but that also requires waiting for adoption. That solution would also has some edge cases (for example if none of the types are known then that would error out anyway)

@MichaReiser
Copy link
Copy Markdown
Member

@ntBre and I very briefly discussed this change in our 1:1 and concluded that we can't change our default hook in a way that breaks most users. Mentioning this improvement in the documentation does make sense.

@mxr
Copy link
Copy Markdown
Contributor Author

mxr commented Mar 18, 2026

Makes sense. Here's a query I put together that helps us understand adoption. Let's define adoption as a weekly download share of a given version (or higher). For each release since v2.6.9 (March 2025):

  • look at each week since release
  • compute downloads of target version or greater during that week divided by all identify downloads during that week
  • then return the first weeks_since_release where that weekly download share is at least 50/60/80/80/90%
Query
WITH releases AS (
  SELECT '2.6.9' AS target_version, DATE '2025-03-08' AS release_date, 2 AS target_major, 6 AS target_minor, 9 AS target_patch
  UNION ALL SELECT '2.6.10', DATE '2025-04-19', 2, 6, 10
  UNION ALL SELECT '2.6.11', DATE '2025-05-23', 2, 6, 11
  UNION ALL SELECT '2.6.12', DATE '2025-05-23', 2, 6, 12
  UNION ALL SELECT '2.6.13', DATE '2025-08-09', 2, 6, 13
  UNION ALL SELECT '2.6.14', DATE '2025-09-06', 2, 6, 14
  UNION ALL SELECT '2.6.15', DATE '2025-10-02', 2, 6, 15
  UNION ALL SELECT '2.6.16', DATE '2026-01-12', 2, 6, 16
  UNION ALL SELECT '2.6.17', DATE '2026-03-01', 2, 6, 17
  UNION ALL SELECT '2.6.18', DATE '2026-03-15', 2, 6, 18
),
thresholds AS (
  SELECT '50' AS threshold_label, 0.50 AS threshold_value
  UNION ALL SELECT '60', 0.60
  UNION ALL SELECT '70', 0.70
  UNION ALL SELECT '80', 0.80
  UNION ALL SELECT '90', 0.90
),
downloads AS (
  SELECT
    DATE(timestamp) AS download_date,
    SAFE_CAST(REGEXP_EXTRACT(file.version, r'^(\d+)') AS INT64) AS major,
    SAFE_CAST(REGEXP_EXTRACT(file.version, r'^\d+\.(\d+)') AS INT64) AS minor,
    SAFE_CAST(REGEXP_EXTRACT(file.version, r'^\d+\.\d+\.(\d+)') AS INT64) AS patch
  FROM `bigquery-public-data.pypi.file_downloads`
  WHERE file.project = 'identify'
    AND DATE(timestamp) >= DATE '2025-03-08'
    AND DATE(timestamp) <= CURRENT_DATE()
),
weekly_counts AS (
  SELECT
    r.target_version,
    r.release_date,
    DATE_DIFF(d.download_date, r.release_date, WEEK) AS weeks_since_release,
    COUNT(*) AS total_downloads_all_versions,
    COUNTIF(
      d.major IS NOT NULL
      AND d.minor IS NOT NULL
      AND d.patch IS NOT NULL
      AND (
        d.major > r.target_major
        OR (d.major = r.target_major AND d.minor > r.target_minor)
        OR (d.major = r.target_major AND d.minor = r.target_minor AND d.patch >= r.target_patch)
      )
    ) AS target_or_higher_downloads
  FROM releases r
  JOIN downloads d
    ON d.download_date >= r.release_date
  GROUP BY 1, 2, 3
),
weekly_share AS (
  SELECT
    target_version,
    release_date,
    weeks_since_release,
    SAFE_DIVIDE(target_or_higher_downloads, total_downloads_all_versions) AS share_for_week
  FROM weekly_counts
),
threshold_hits AS (
  SELECT
    target_version,
    threshold_label,
    weeks_since_release
  FROM (
    SELECT
      w.target_version,
      t.threshold_label,
      w.weeks_since_release,
      ROW_NUMBER() OVER (
        PARTITION BY w.target_version, t.threshold_label
        ORDER BY w.weeks_since_release
      ) AS rn
    FROM weekly_share w
    JOIN thresholds t
      ON w.share_for_week >= t.threshold_value
  )
  WHERE rn = 1
)
SELECT
  r.target_version,
  MAX(IF(t.threshold_label = '50', t.weeks_since_release, NULL)) AS weeks_since_release_50,
  MAX(IF(t.threshold_label = '60', t.weeks_since_release, NULL)) AS weeks_since_release_60,
  MAX(IF(t.threshold_label = '70', t.weeks_since_release, NULL)) AS weeks_since_release_70,
  MAX(IF(t.threshold_label = '80', t.weeks_since_release, NULL)) AS weeks_since_release_80,
  MAX(IF(t.threshold_label = '90', t.weeks_since_release, NULL)) AS weeks_since_release_90
FROM releases r
LEFT JOIN threshold_hits t
  ON r.target_version = t.target_version
GROUP BY r.target_version, r.release_date
ORDER BY r.release_date;

Results (blanks mean the threshold is never reached):

target_version weeks_since_
release_50
weeks_since_
release_60
weeks_since_
release_70
weeks_since_
release_80
weeks_since_
release_90
2.6.9 1 1 5 17 39
2.6.10 1 1 5 17 36
2.6.11 1 1 6 14 31
2.6.12 1 1 6 14 31
2.6.13 1 1 9 17
2.6.14 1 1 9 14
2.6.15 1 2 8 12
2.6.16 0 1 9
2.6.17 1
2.6.18

In summary, it takes 14-17 weeks to hit 80% adoption and 31-39 weeks to reach 90% adoption across all of the pypi ecosystem.

If you agree with the methodology, I can table this PR until ~40 weeks from now, and update the README with how to update types_or to lint pyproject.toml files instead. Does that work?

@mxr
Copy link
Copy Markdown
Contributor Author

mxr commented Mar 19, 2026

Filed #161 for the README

@bn-andrew
Copy link
Copy Markdown

Out of interest, what would be the downsides of making that version of identify a minimum requirement for this project? 🤔

Adding identify>=2.6.18 as a dependency should allow this change to be rolled out now?

I can't think of any major downsides but I feel like there is something I am not considering.

@mxr
Copy link
Copy Markdown
Contributor Author

mxr commented Mar 24, 2026

I don't think that would work. I believe the additional_dependencies apply to the environment that ruff executes in, but not the environment of the pre-commit runner (which is what needs identify>=2.6.18 to know what files to pass to ruff). But you could try it out?

@bn-andrew
Copy link
Copy Markdown

bn-andrew commented Mar 25, 2026

Oh of course, yeah that makes sense. It might help purely because of caching in uv / pip but not something that would be a guarantee.
I can see the hooks have the https://pre-commit.com/#hooks-minimum_pre_commit_version and https://pre-commit.com/#config-additional_dependencies
But I don't think either of those will quite lets us manipulate the root pre-commit environment enough either ☹️

Flat out this week but might try and have a play next week, although it doesn't seem likely to be a workable option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants