Add 'pyproject' to types_or for ruff checks#158
Conversation
bn-andrew
left a comment
There was a problem hiding this comment.
Nice one, pending that upgrade in identify 👍🏻
f004428 to
836250e
Compare
|
Hmm, I'm wondering if maybe we should only suggest this in the documentation for now rather than changing the hook itself. The version of pre-commit I have installed from the Arch repos errors out on unknown type tags, so I'm a bit worried that this will break users with older pre-commit/identify installations. I'm curious to get your thoughts (and maybe @MichaReiser's as well). |
|
Good point. Let me use https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/ to see how long it will take for the vast majority of downloads to be on |
|
@ntBre and I very briefly discussed this change in our 1:1 and concluded that we can't change our default hook in a way that breaks most users. Mentioning this improvement in the documentation does make sense. |
|
Makes sense. Here's a query I put together that helps us understand adoption. Let's define adoption as a weekly download share of a given version (or higher). For each release since v2.6.9 (March 2025):
QueryWITH releases AS (
SELECT '2.6.9' AS target_version, DATE '2025-03-08' AS release_date, 2 AS target_major, 6 AS target_minor, 9 AS target_patch
UNION ALL SELECT '2.6.10', DATE '2025-04-19', 2, 6, 10
UNION ALL SELECT '2.6.11', DATE '2025-05-23', 2, 6, 11
UNION ALL SELECT '2.6.12', DATE '2025-05-23', 2, 6, 12
UNION ALL SELECT '2.6.13', DATE '2025-08-09', 2, 6, 13
UNION ALL SELECT '2.6.14', DATE '2025-09-06', 2, 6, 14
UNION ALL SELECT '2.6.15', DATE '2025-10-02', 2, 6, 15
UNION ALL SELECT '2.6.16', DATE '2026-01-12', 2, 6, 16
UNION ALL SELECT '2.6.17', DATE '2026-03-01', 2, 6, 17
UNION ALL SELECT '2.6.18', DATE '2026-03-15', 2, 6, 18
),
thresholds AS (
SELECT '50' AS threshold_label, 0.50 AS threshold_value
UNION ALL SELECT '60', 0.60
UNION ALL SELECT '70', 0.70
UNION ALL SELECT '80', 0.80
UNION ALL SELECT '90', 0.90
),
downloads AS (
SELECT
DATE(timestamp) AS download_date,
SAFE_CAST(REGEXP_EXTRACT(file.version, r'^(\d+)') AS INT64) AS major,
SAFE_CAST(REGEXP_EXTRACT(file.version, r'^\d+\.(\d+)') AS INT64) AS minor,
SAFE_CAST(REGEXP_EXTRACT(file.version, r'^\d+\.\d+\.(\d+)') AS INT64) AS patch
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = 'identify'
AND DATE(timestamp) >= DATE '2025-03-08'
AND DATE(timestamp) <= CURRENT_DATE()
),
weekly_counts AS (
SELECT
r.target_version,
r.release_date,
DATE_DIFF(d.download_date, r.release_date, WEEK) AS weeks_since_release,
COUNT(*) AS total_downloads_all_versions,
COUNTIF(
d.major IS NOT NULL
AND d.minor IS NOT NULL
AND d.patch IS NOT NULL
AND (
d.major > r.target_major
OR (d.major = r.target_major AND d.minor > r.target_minor)
OR (d.major = r.target_major AND d.minor = r.target_minor AND d.patch >= r.target_patch)
)
) AS target_or_higher_downloads
FROM releases r
JOIN downloads d
ON d.download_date >= r.release_date
GROUP BY 1, 2, 3
),
weekly_share AS (
SELECT
target_version,
release_date,
weeks_since_release,
SAFE_DIVIDE(target_or_higher_downloads, total_downloads_all_versions) AS share_for_week
FROM weekly_counts
),
threshold_hits AS (
SELECT
target_version,
threshold_label,
weeks_since_release
FROM (
SELECT
w.target_version,
t.threshold_label,
w.weeks_since_release,
ROW_NUMBER() OVER (
PARTITION BY w.target_version, t.threshold_label
ORDER BY w.weeks_since_release
) AS rn
FROM weekly_share w
JOIN thresholds t
ON w.share_for_week >= t.threshold_value
)
WHERE rn = 1
)
SELECT
r.target_version,
MAX(IF(t.threshold_label = '50', t.weeks_since_release, NULL)) AS weeks_since_release_50,
MAX(IF(t.threshold_label = '60', t.weeks_since_release, NULL)) AS weeks_since_release_60,
MAX(IF(t.threshold_label = '70', t.weeks_since_release, NULL)) AS weeks_since_release_70,
MAX(IF(t.threshold_label = '80', t.weeks_since_release, NULL)) AS weeks_since_release_80,
MAX(IF(t.threshold_label = '90', t.weeks_since_release, NULL)) AS weeks_since_release_90
FROM releases r
LEFT JOIN threshold_hits t
ON r.target_version = t.target_version
GROUP BY r.target_version, r.release_date
ORDER BY r.release_date;
Results (blanks mean the threshold is never reached):
In summary, it takes 14-17 weeks to hit 80% adoption and 31-39 weeks to reach 90% adoption across all of the pypi ecosystem. If you agree with the methodology, I can table this PR until ~40 weeks from now, and update the README with how to update |
|
Filed #161 for the README |
|
Out of interest, what would be the downsides of making that version of identify a minimum requirement for this project? 🤔 Adding I can't think of any major downsides but I feel like there is something I am not considering. |
|
I don't think that would work. I believe the |
|
Oh of course, yeah that makes sense. It might help purely because of caching in uv / pip but not something that would be a guarantee. Flat out this week but might try and have a play next week, although it doesn't seem likely to be a workable option. |
Summary
This allows ruff to check RUF200 and other checks in pyproject.toml
Test Plan
pre-commit try-repo