From c3ac49025f12a5be765d3d83c08ad1f0a2e3cf16 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Wed, 23 Mar 2022 15:28:29 +0800
Subject: [PATCH 01/11] Update tidb-lightning-faq.md

---
 tidb-lightning/tidb-lightning-faq.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tidb-lightning/tidb-lightning-faq.md b/tidb-lightning/tidb-lightning-faq.md
index f12af703887db..45ce601ad200a 100644
--- a/tidb-lightning/tidb-lightning-faq.md
+++ b/tidb-lightning/tidb-lightning-faq.md
@@ -119,7 +119,7 @@ If `tidb-lightning` abnormally exited, the cluster might be stuck in the "import
 {{< copyable "shell-regular" >}}

 ```sh
-tidb-lightning-ctl --fetch-mode
+tidb-lightning-ctl --config tidb-lightning.toml --fetch-mode
 ```

 You can force the cluster back to "normal mode" using the following command:

 {{< copyable "shell-regular" >}}

 ```sh
-tidb-lightning-ctl --switch-mode=normal
+tidb-lightning-ctl --config tidb-lightning.toml --switch-mode=normal
 ```

 ## Can TiDB Lightning be used with 1-Gigabit network card?

From c4e83db28e689e722b75b2e0fa2a7c08b15ac702 Mon Sep 17 00:00:00 2001
From: xixirangrang <35301108+hfxsd@users.noreply.github.com>
Date: Mon, 6 Jun 2022 10:41:48 +0800
Subject: [PATCH 02/11] Update alert-rules.md

---
 alert-rules.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/alert-rules.md b/alert-rules.md
index c06f3330addbc..abe9503b24b90 100644
--- a/alert-rules.md
+++ b/alert-rules.md
@@ -419,7 +419,7 @@ This section gives the alert rules for the TiKV component.

 * Solution:

-    Adjust the `block-cache-size` value of both `rockdb.defaultcf` and `rocksdb.writecf`.
+    Adjust the `block-cache-size` value of both `rocksdb.defaultcf` and `rocksdb.writecf`.
#### `TiKV_GC_can_not_work` From e11f2901d79cdf88b23d59bafd4de3b2c154587c Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Mon, 6 Jun 2022 10:43:10 +0800 Subject: [PATCH 03/11] Revert "Update alert-rules.md" This reverts commit c4e83db28e689e722b75b2e0fa2a7c08b15ac702. --- alert-rules.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/alert-rules.md b/alert-rules.md index abe9503b24b90..c06f3330addbc 100644 --- a/alert-rules.md +++ b/alert-rules.md @@ -419,7 +419,7 @@ This section gives the alert rules for the TiKV component. * Solution: - Adjust the `block-cache-size` value of both `rocksdb.defaultcf` and `rocksdb.writecf`. + Adjust the `block-cache-size` value of both `rockdb.defaultcf` and `rocksdb.writecf`. #### `TiKV_GC_can_not_work` From 0db7df99f2ab4562d4961b7fd2803491c9d5915d Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Tue, 1 Aug 2023 11:26:57 +0800 Subject: [PATCH 04/11] Update dumpling-overview.md --- dumpling-overview.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/dumpling-overview.md b/dumpling-overview.md index de8bf19a336d9..f4cf5c6ace2a0 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -108,7 +108,7 @@ In the command above: + The `-t` option specifies the number of threads for the export. Increasing the number of threads improves the concurrency of Dumpling and the export speed, and also increases the database's memory consumption. Therefore, it is not recommended to set the number too large. Usually, it's less than 64. -+ The `-r` option specifies the maximum number of rows in a single file. With this option specified, Dumpling enables the in-table concurrency to speed up the export and reduce the memory usage. 
When the upstream database is TiDB v3.0 or later versions, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting and the specific `-r` value does not affect the split algorithm. When the upstream database is MySQL and the primary key is of the `int` type, specifying `-r` can also enable the in-table concurrency.
++ The `-r` option enables the in-table concurrency to speed up the export and reduce the memory usage. When the source database is TiDB, a `-r` value greater than 0 indicates that the TiDB region information is used for splitting; the specific `-r` value does not affect the split algorithm. When the source database is MySQL and the primary key is of the `int` type, specifying `-r` can also enable the in-table concurrency.
 + The `-F` option is used to specify the maximum size of a single file (the unit here is `MiB`; inputs like `5GiB` or `8KB` are also acceptable). It is recommended to keep its value to 256 MiB or less if you plan to use TiDB Lightning to load this file into a TiDB instance.
> **Note:** From ae9709d906e7ffdaec0e461810a5427061603e7a Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Tue, 12 Sep 2023 16:26:10 +0800 Subject: [PATCH 05/11] update go to v1.21 --- hardware-and-software-requirements.md | 2 +- pd-control.md | 2 +- pd-recover.md | 2 +- tidb-control.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hardware-and-software-requirements.md b/hardware-and-software-requirements.md index 15d824567af30..bb8e7ae2ecfd5 100644 --- a/hardware-and-software-requirements.md +++ b/hardware-and-software-requirements.md @@ -47,7 +47,7 @@ As an open-source distributed SQL database with high performance, TiDB can be de | Libraries required for compiling and running TiDB | Version | | :--- | :--- | -| Golang | 1.20 or later | +| Golang | 1.21 or later | | Rust | nightly-2022-07-31 or later | | GCC | 7.x | | LLVM | 13.0 or later | diff --git a/pd-control.md b/pd-control.md index 20e391c4b9852..53eb4a9321d6d 100644 --- a/pd-control.md +++ b/pd-control.md @@ -33,7 +33,7 @@ To obtain `pd-ctl` of the latest version, download the TiDB server installation ### Compile from source code -1. [Go](https://golang.org/) 1.20 or later is required because the Go modules are used. +1. [Go](https://golang.org/) 1.21 or later is required because the Go modules are used. 2. In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make` or `make pd-ctl` command to compile and generate `bin/pd-ctl`. ## Usage diff --git a/pd-recover.md b/pd-recover.md index 1910e6c46ba24..24b3bd3049d80 100644 --- a/pd-recover.md +++ b/pd-recover.md @@ -10,7 +10,7 @@ PD Recover is a disaster recovery tool of PD, used to recover the PD cluster whi ## Compile from source code -+ [Go](https://golang.org/) 1.20 or later is required because the Go modules are used. ++ [Go](https://golang.org/) 1.21 or later is required because the Go modules are used. 
+ In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make pd-recover` command to compile and generate `bin/pd-recover`. > **Note:** diff --git a/tidb-control.md b/tidb-control.md index 921a27a761ded..2c79e6ff347f4 100644 --- a/tidb-control.md +++ b/tidb-control.md @@ -26,7 +26,7 @@ After installing TiUP, you can use `tiup ctl:v tidb` command to ### Compile from source code -- Compilation environment requirement: [Go](https://golang.org/) 1.20 or later +- Compilation environment requirement: [Go](https://golang.org/) 1.21 or later - Compilation procedures: Go to the root directory of the [TiDB Control project](https://github.com/pingcap/tidb-ctl), use the `make` command to compile, and generate `tidb-ctl`. - Compilation documentation: you can find the help files in the `doc` directory; if the help files are lost or you want to update them, use the `make doc` command to generate the help files. From f0b8b492a6596caf9b78d23428211449fe42b92e Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Tue, 12 Sep 2023 16:27:52 +0800 Subject: [PATCH 06/11] Revert "update go to v1.21" This reverts commit ae9709d906e7ffdaec0e461810a5427061603e7a. 
--- hardware-and-software-requirements.md | 2 +- pd-control.md | 2 +- pd-recover.md | 2 +- tidb-control.md | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/hardware-and-software-requirements.md b/hardware-and-software-requirements.md index bb8e7ae2ecfd5..15d824567af30 100644 --- a/hardware-and-software-requirements.md +++ b/hardware-and-software-requirements.md @@ -47,7 +47,7 @@ As an open-source distributed SQL database with high performance, TiDB can be de | Libraries required for compiling and running TiDB | Version | | :--- | :--- | -| Golang | 1.21 or later | +| Golang | 1.20 or later | | Rust | nightly-2022-07-31 or later | | GCC | 7.x | | LLVM | 13.0 or later | diff --git a/pd-control.md b/pd-control.md index 53eb4a9321d6d..20e391c4b9852 100644 --- a/pd-control.md +++ b/pd-control.md @@ -33,7 +33,7 @@ To obtain `pd-ctl` of the latest version, download the TiDB server installation ### Compile from source code -1. [Go](https://golang.org/) 1.21 or later is required because the Go modules are used. +1. [Go](https://golang.org/) 1.20 or later is required because the Go modules are used. 2. In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make` or `make pd-ctl` command to compile and generate `bin/pd-ctl`. ## Usage diff --git a/pd-recover.md b/pd-recover.md index 24b3bd3049d80..1910e6c46ba24 100644 --- a/pd-recover.md +++ b/pd-recover.md @@ -10,7 +10,7 @@ PD Recover is a disaster recovery tool of PD, used to recover the PD cluster whi ## Compile from source code -+ [Go](https://golang.org/) 1.21 or later is required because the Go modules are used. ++ [Go](https://golang.org/) 1.20 or later is required because the Go modules are used. + In the root directory of the [PD project](https://github.com/pingcap/pd), use the `make pd-recover` command to compile and generate `bin/pd-recover`. 
> **Note:** diff --git a/tidb-control.md b/tidb-control.md index 2c79e6ff347f4..921a27a761ded 100644 --- a/tidb-control.md +++ b/tidb-control.md @@ -26,7 +26,7 @@ After installing TiUP, you can use `tiup ctl:v tidb` command to ### Compile from source code -- Compilation environment requirement: [Go](https://golang.org/) 1.21 or later +- Compilation environment requirement: [Go](https://golang.org/) 1.20 or later - Compilation procedures: Go to the root directory of the [TiDB Control project](https://github.com/pingcap/tidb-ctl), use the `make` command to compile, and generate `tidb-ctl`. - Compilation documentation: you can find the help files in the `doc` directory; if the help files are lost or you want to update them, use the `make doc` command to generate the help files. From 9e88a6e364585f84a6176eb2dabb9c3a6e01b6f5 Mon Sep 17 00:00:00 2001 From: xixirangrang <35301108+hfxsd@users.noreply.github.com> Date: Fri, 19 Apr 2024 10:14:53 +0800 Subject: [PATCH 07/11] add encoding='utf-8 for windows --- scripts/release_notes_update_pr_author_info_add_dup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/release_notes_update_pr_author_info_add_dup.py b/scripts/release_notes_update_pr_author_info_add_dup.py index 92039bf89380d..22f7c0cf56a49 100644 --- a/scripts/release_notes_update_pr_author_info_add_dup.py +++ b/scripts/release_notes_update_pr_author_info_add_dup.py @@ -174,7 +174,7 @@ def create_release_file(version, dup_notes_levels, dup_notes): release_file = os.path.join(ext_path, f'release-{version}.md') shutil.copyfile(template_file, release_file) # Replace the file content - with open(release_file, 'r+') as file: + with open(release_file, 'r+', encoding='utf-8') as file: content = file.read() content = content.replace('x.y.z', version) version_parts = version.split('.') From e25b4afaf0a5c4f98a688d4aecd6409c53298579 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Mon, 23 Dec 2024 12:10:58 +0800 Subject: [PATCH 08/11] Merge remote-tracking 
branch 'upstream/master'

From 0c77222e5336b8a430a08f74d17f5bfbd3d5d8fe Mon Sep 17 00:00:00 2001
From: houfaxin
Date: Tue, 21 Jan 2025 18:09:45 +0800
Subject: [PATCH 09/11] Merge remote-tracking branch 'upstream/master'

From e6edea0666f177a89b4a32842a16bbd2f4632d72 Mon Sep 17 00:00:00 2001
From: houfaxin
Date: Thu, 7 Aug 2025 11:46:59 +0800
Subject: [PATCH 10/11] Update tidb-performance-tuning-config.md

---
 tidb-performance-tuning-config.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tidb-performance-tuning-config.md b/tidb-performance-tuning-config.md
index ed1c5fab3f63a..6bb43748c1ba8 100644
--- a/tidb-performance-tuning-config.md
+++ b/tidb-performance-tuning-config.md
@@ -272,8 +272,8 @@ The following table compares throughput (operations per second) between the base

 | Item | Baseline (OPS) | Optimized (OPS) | Improvement |
 | ---------| ---- | ----| ----|
-| load data | 2858.5 | 5074.3 | +77.59% |
-| workloada | 2243.0 | 12804.3 | +470.86% |
+| Load data | 2858.5 | 5074.3 | +77.59% |
+| Workloada | 2243.0 | 12804.3 | +470.86% |

 #### Performance analysis

@@ -461,7 +461,7 @@ You can control the execution mode of DML statements using the [`tidb_dml_type`]

 To use the bulk DML execution mode, set `tidb_dml_type` to `"bulk"`. This mode optimizes bulk data loading without conflicts and reduces memory usage during large write operations. Before using this mode, ensure that:

-- Auto-commit is enabled.
+- [`autocommit`](/system-variables.md#autocommit) is enabled.
 - The [`pessimistic-auto-commit`](/tidb-configuration-file.md#pessimistic-auto-commit-new-in-v600) configuration item is set to `false`.
```sql From 15bf44e8888023ee0385181bd3d94f61b7cedfd9 Mon Sep 17 00:00:00 2001 From: houfaxin Date: Fri, 24 Apr 2026 15:06:48 +0800 Subject: [PATCH 11/11] Weekly TiDB PR Doc Check --- .../workflows/tidb-pr-weekly-doc-check.yml | 102 ++++++ .../check_tidb_prs_and_create_docs_cn_pr.py | 309 ++++++++++++++++++ 2 files changed, 411 insertions(+) create mode 100644 .github/workflows/tidb-pr-weekly-doc-check.yml create mode 100644 scripts/check_tidb_prs_and_create_docs_cn_pr.py diff --git a/.github/workflows/tidb-pr-weekly-doc-check.yml b/.github/workflows/tidb-pr-weekly-doc-check.yml new file mode 100644 index 0000000000000..1a2db56ec3563 --- /dev/null +++ b/.github/workflows/tidb-pr-weekly-doc-check.yml @@ -0,0 +1,102 @@ +name: Weekly TiDB PR Doc Check (docs-cn) + +on: + schedule: + # 01:00 every Monday in Asia/Shanghai (UTC+8) => 17:00 every Sunday UTC + - cron: "0 17 * * 0" + workflow_dispatch: + +jobs: + weekly-check: + if: github.repository == 'pingcap/docs' + runs-on: ubuntu-latest + + permissions: + contents: read + + env: + SOURCE_REPO: pingcap/tidb + OUTPUT_DIR: tmp/tidb-doc-check + DOCS_CN_BASE_BRANCH: master + + steps: + - name: Checkout docs repo + uses: actions/checkout@v4 + + - name: Setup Python + uses: actions/setup-python@v5 + with: + python-version: "3.11" + + - name: Scan merged TiDB PRs in last weekly window + id: scan + env: + GITHUB_TOKEN: ${{ secrets.DOCS_CN_BOT_TOKEN || github.token }} + SOURCE_REPO: ${{ env.SOURCE_REPO }} + OUTPUT_DIR: ${{ env.OUTPUT_DIR }} + DOCS_CN_BASE_BRANCH: ${{ env.DOCS_CN_BASE_BRANCH }} + run: | + set -euo pipefail + python scripts/check_tidb_prs_and_create_docs_cn_pr.py + + - name: Skip when no docs updates are needed + if: steps.scan.outputs.needs_update != 'true' + run: | + echo "No doc-impact PRs found in this weekly window." 
+ + - name: Checkout docs-cn repo + if: steps.scan.outputs.needs_update == 'true' + uses: actions/checkout@v4 + with: + repository: pingcap/docs-cn + token: ${{ secrets.DOCS_CN_BOT_TOKEN }} + ref: ${{ steps.scan.outputs.docs_cn_base_branch }} + path: docs-cn + persist-credentials: false + + - name: Copy weekly report into docs-cn + if: steps.scan.outputs.needs_update == 'true' + shell: bash + run: | + set -euo pipefail + mkdir -p docs-cn/weekly-doc-sync + cp "${{ steps.scan.outputs.report_path }}" "docs-cn/weekly-doc-sync/${{ steps.scan.outputs.report_filename }}" + + - name: Create docs-cn PR + if: steps.scan.outputs.needs_update == 'true' + uses: peter-evans/create-pull-request@v7 + with: + path: docs-cn + token: ${{ secrets.DOCS_CN_BOT_TOKEN }} + branch: ${{ steps.scan.outputs.branch_name }} + base: ${{ steps.scan.outputs.docs_cn_base_branch }} + commit-message: "docs: weekly TiDB PR doc-impact check (${{ steps.scan.outputs.window_start_date }} to ${{ steps.scan.outputs.window_end_date }})" + title: "docs: weekly TiDB PR doc-impact check (${{ steps.scan.outputs.window_start_date }} to ${{ steps.scan.outputs.window_end_date }})" + body: | + ### What is changed, added or deleted? (Required) + + Add a weekly report that checks merged `pingcap/tidb` code PRs from the previous week and identifies PRs that likely require docs updates. + + - Source repo: `${{ env.SOURCE_REPO }}` + - Time window (Asia/Shanghai): `${{ steps.scan.outputs.window_start_date }} 00:00` to `${{ steps.scan.outputs.window_end_date }} 00:00` + - Report file: `weekly-doc-sync/${{ steps.scan.outputs.report_filename }}` + + This report is heuristic-based and requires maintainer confirmation before making detailed docs edits. + + ### Which TiDB version(s) do your changes apply to? (Required) + + - [x] master + + ### What is the related PR or file link(s)? 
+ + - TiDB merged PR search: https://github.com/pingcap/tidb/pulls?q=sort%3Aupdated-desc+is%3Apr+is%3Amerged + + ### Do your changes match any of the following descriptions? + + - [ ] Delete files + - [ ] Change aliases + - [ ] Need modification after applied to another branch + - [ ] Might cause conflicts after applied to another branch + add-paths: | + weekly-doc-sync/${{ steps.scan.outputs.report_filename }} + delete-branch: true diff --git a/scripts/check_tidb_prs_and_create_docs_cn_pr.py b/scripts/check_tidb_prs_and_create_docs_cn_pr.py new file mode 100644 index 0000000000000..27ea4704d8298 --- /dev/null +++ b/scripts/check_tidb_prs_and_create_docs_cn_pr.py @@ -0,0 +1,309 @@ +#!/usr/bin/env python3 +"""Weekly checker for merged TiDB PRs that might require docs updates. + +This script: +1. Collects merged PRs in pingcap/tidb during the previous Monday-to-Monday window + in Asia/Shanghai timezone. +2. Uses lightweight heuristics to decide whether a PR likely needs docs updates. +3. Writes a markdown report and json summary for downstream CI steps. +4. Exposes outputs for GitHub Actions via GITHUB_OUTPUT. 
+""" + +from __future__ import annotations + +import datetime as dt +import json +import os +import pathlib +import urllib.parse +import urllib.request +from typing import Dict, List, Tuple + + +SOURCE_REPO = os.environ.get("SOURCE_REPO", "pingcap/tidb") +OUTPUT_DIR = pathlib.Path(os.environ.get("OUTPUT_DIR", "tmp/tidb-doc-check")).resolve() +DOCS_CN_BASE_BRANCH = os.environ.get("DOCS_CN_BASE_BRANCH", "master") +TOKEN = os.environ.get("GITHUB_TOKEN", "").strip() + + +POSITIVE_LABELS = { + "type/compatibility", + "type/compatibility or feature change", + "type/feature", + "type/enhancement", + "release-note", +} + +NEGATIVE_LABELS = { + "type/ci", + "type/chore", + "type/refactor", + "type/test", + "type/build", +} + +POSITIVE_KEYWORDS = [ + "compatibility", + "deprecate", + "deprecated", + "new feature", + "sql", + "syntax", + "default value", + "system variable", + "configuration", + "config", + "api", + "planner", + "optimizer", + "ddl", +] + +WATCH_PATH_PREFIXES = [ + "pkg/sessionctx/variable/", + "pkg/config/", + "pkg/parser/", + "pkg/ddl/", + "pkg/planner/", + "pkg/executor/", + "br/", + "lightning/", + "dumpling/", +] + + +def gh_api_json(url: str) -> Dict: + headers = { + "Accept": "application/vnd.github+json", + "X-GitHub-Api-Version": "2022-11-28", + "User-Agent": "tidb-doc-weekly-checker", + } + if TOKEN: + headers["Authorization"] = f"Bearer {TOKEN}" + req = urllib.request.Request(url, headers=headers) + with urllib.request.urlopen(req, timeout=30) as resp: + return json.loads(resp.read().decode("utf-8")) + + +def list_search_results(query: str) -> List[Dict]: + all_items: List[Dict] = [] + page = 1 + while True: + params = urllib.parse.urlencode( + { + "q": query, + "sort": "updated", + "order": "desc", + "per_page": 100, + "page": page, + } + ) + data = gh_api_json(f"https://api.github.com/search/issues?{params}") + items = data.get("items", []) + if not items: + break + all_items.extend(items) + if len(items) < 100: + break + page += 1 + return 
all_items + + +def list_pr_files(repo: str, number: int) -> List[str]: + files: List[str] = [] + page = 1 + while True: + url = ( + f"https://api.github.com/repos/{repo}/pulls/{number}/files" + f"?per_page=100&page={page}" + ) + data = gh_api_json(url) + if not data: + break + files.extend(item.get("filename", "") for item in data) + if len(data) < 100: + break + page += 1 + return [f for f in files if f] + + +def weekly_window_shanghai(now_utc: dt.datetime) -> Tuple[dt.datetime, dt.datetime]: + utc8 = dt.timezone(dt.timedelta(hours=8)) + now_sh = now_utc.astimezone(utc8) + monday_this_week = (now_sh - dt.timedelta(days=now_sh.weekday())).date() + end_sh = dt.datetime.combine(monday_this_week, dt.time(0, 0), tzinfo=utc8) + start_sh = end_sh - dt.timedelta(days=7) + return start_sh, end_sh + + +def classify_pr(pr: Dict, pr_files: List[str]) -> Tuple[bool, List[str], int]: + score = 0 + reasons: List[str] = [] + + labels = {label.get("name", "").lower() for label in pr.get("labels", [])} + title = (pr.get("title") or "").lower() + body = (pr.get("body") or "").lower() + text = f"{title}\n{body}" + + hit_positive_labels = sorted(POSITIVE_LABELS.intersection(labels)) + hit_negative_labels = sorted(NEGATIVE_LABELS.intersection(labels)) + + if hit_positive_labels: + score += 2 + reasons.append(f"Hit labels: {', '.join(hit_positive_labels)}") + if hit_negative_labels and not hit_positive_labels: + score -= 1 + reasons.append(f"Only maintenance labels: {', '.join(hit_negative_labels)}") + + kw_hits = sorted({kw for kw in POSITIVE_KEYWORDS if kw in text}) + if kw_hits: + score += 1 + reasons.append(f"Keyword hints: {', '.join(kw_hits[:5])}") + + path_hits = sorted( + { + path + for path in pr_files + if any(path.startswith(prefix) for prefix in WATCH_PATH_PREFIXES) + } + ) + if path_hits: + score += 1 + reasons.append(f"Core behavior paths touched (count={len(path_hits)})") + + only_tests_or_tools = bool(pr_files) and all( + path.startswith("tests/") + or 
path.startswith("pkg/util/") + or path.startswith(".github/") + for path in pr_files + ) + if only_tests_or_tools and not hit_positive_labels: + score -= 1 + reasons.append("Files look test/tooling-only") + + needs_docs_update = score >= 2 + if not reasons: + reasons.append("No clear doc-impact signal found") + return needs_docs_update, reasons, score + + +def write_github_output(kv: Dict[str, str]) -> None: + output_path = os.environ.get("GITHUB_OUTPUT", "").strip() + if not output_path: + return + with open(output_path, "a", encoding="utf-8") as f: + for k, v in kv.items(): + f.write(f"{k}={v}\n") + + +def main() -> None: + if not TOKEN: + raise SystemExit("GITHUB_TOKEN is required.") + + now_utc = dt.datetime.now(dt.timezone.utc) + start_sh, end_sh = weekly_window_shanghai(now_utc) + start_date = start_sh.date().isoformat() + end_date = end_sh.date().isoformat() + + query = f"repo:{SOURCE_REPO} is:pr is:merged merged:{start_date}..{end_date}" + merged_prs = list_search_results(query) + + results: List[Dict] = [] + needs_update_prs: List[Dict] = [] + for item in merged_prs: + number = item["number"] + pr_detail = gh_api_json(f"https://api.github.com/repos/{SOURCE_REPO}/pulls/{number}") + pr_files = list_pr_files(SOURCE_REPO, number) + + needs_docs_update, reasons, score = classify_pr(pr_detail, pr_files) + row = { + "number": number, + "title": pr_detail.get("title", ""), + "url": pr_detail.get("html_url", ""), + "merged_at": pr_detail.get("merged_at", ""), + "labels": [x.get("name", "") for x in pr_detail.get("labels", [])], + "score": score, + "needs_docs_update": needs_docs_update, + "reasons": reasons, + } + results.append(row) + if needs_docs_update: + needs_update_prs.append(row) + + OUTPUT_DIR.mkdir(parents=True, exist_ok=True) + window_tag = f"{start_date}_to_{end_date}" + report_filename = f"tidb-weekly-doc-check-{window_tag}.md" + json_filename = f"tidb-weekly-doc-check-{window_tag}.json" + + report_path = OUTPUT_DIR / report_filename + json_path = 
OUTPUT_DIR / json_filename + + lines: List[str] = [] + lines.append("# TiDB weekly merged PR doc-impact check") + lines.append("") + lines.append(f"- Source repo: `{SOURCE_REPO}`") + lines.append(f"- Time window (Asia/Shanghai): `{start_date} 00:00` to `{end_date} 00:00`") + lines.append(f"- Total merged PRs found: `{len(results)}`") + lines.append(f"- PRs judged as docs-update-needed: `{len(needs_update_prs)}`") + lines.append("") + + if needs_update_prs: + lines.append("## PRs that likely need docs updates") + lines.append("") + for pr in needs_update_prs: + lines.append(f"### #{pr['number']} {pr['title']}") + lines.append(f"- PR: {pr['url']}") + lines.append(f"- Merged at: `{pr['merged_at']}`") + lines.append(f"- Labels: `{', '.join(pr['labels']) if pr['labels'] else 'none'}`") + lines.append(f"- Heuristic score: `{pr['score']}`") + lines.append(f"- Reasons: {'; '.join(pr['reasons'])}") + lines.append("") + lines.append("## Suggested next action") + lines.append("") + lines.append("- Confirm each candidate PR and update matching docs pages in `pingcap/docs-cn`.") + lines.append("- This report is heuristic-based and should be reviewed by a maintainer.") + lines.append("") + else: + lines.append("## Result") + lines.append("") + lines.append("No PR reached the docs-update threshold in this window.") + lines.append("") + + report_path.write_text("\n".join(lines), encoding="utf-8") + + json_payload = { + "source_repo": SOURCE_REPO, + "time_window": { + "timezone": "Asia/Shanghai", + "start": start_sh.isoformat(), + "end": end_sh.isoformat(), + }, + "total_merged_prs": len(results), + "docs_update_needed_count": len(needs_update_prs), + "pull_requests": results, + } + json_path.write_text(json.dumps(json_payload, ensure_ascii=False, indent=2) + "\n", encoding="utf-8") + + branch_tag = end_date.replace("-", "") + branch_name = f"weekly/tidb-doc-check-{branch_tag}" + + write_github_output( + { + "needs_update": "true" if needs_update_prs else "false", + "report_path": 
str(report_path), + "json_path": str(json_path), + "report_filename": report_filename, + "branch_name": branch_name, + "docs_cn_base_branch": DOCS_CN_BASE_BRANCH, + "window_start_date": start_date, + "window_end_date": end_date, + } + ) + + print(f"Report: {report_path}") + print(f"Summary JSON: {json_path}") + print(f"Needs update: {'yes' if needs_update_prs else 'no'}") + + +if __name__ == "__main__": + main()
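Note on PATCH 11: the workflow's cron fires at 17:00 UTC on Sundays, which is 01:00 Monday in Asia/Shanghai, so `weekly_window_shanghai` resolves the Monday-to-Monday window that has just closed. A minimal standalone sketch of that window computation (same logic as the function added in the script; the sample timestamp is illustrative):

```python
# Sketch of the weekly-window logic from check_tidb_prs_and_create_docs_cn_pr.py:
# given "now" in UTC, find the previous Monday-to-Monday window in Asia/Shanghai (UTC+8).
import datetime as dt


def weekly_window_shanghai(now_utc: dt.datetime):
    utc8 = dt.timezone(dt.timedelta(hours=8))
    now_sh = now_utc.astimezone(utc8)
    # date.weekday(): Monday == 0, so this steps back to the current week's Monday.
    monday_this_week = (now_sh - dt.timedelta(days=now_sh.weekday())).date()
    end_sh = dt.datetime.combine(monday_this_week, dt.time(0, 0), tzinfo=utc8)
    start_sh = end_sh - dt.timedelta(days=7)
    return start_sh, end_sh


# A Sunday 17:00 UTC run (the cron trigger) is Monday 01:00 in Shanghai,
# so the window covers exactly the seven days that just ended.
start, end = weekly_window_shanghai(
    dt.datetime(2026, 4, 19, 17, 0, tzinfo=dt.timezone.utc)
)
print(start.date(), end.date())  # → 2026-04-13 2026-04-20
```

Because the window ends at Monday 00:00 local time, no merged PR can fall into two consecutive weekly reports.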
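The classifier in PATCH 11 is additive scoring: +2 for a doc-relevant label, +1 for keyword hints in the title/body, +1 for touching watched source paths, with `score >= 2` as the docs-update threshold. A trimmed, standalone rendition of that heuristic (label/keyword/path sets abbreviated from the script; the sample PRs are hypothetical):

```python
# Trimmed doc-impact scoring heuristic, mirroring classify_pr in
# check_tidb_prs_and_create_docs_cn_pr.py. Threshold: score >= 2.
POSITIVE_LABELS = {"type/feature", "type/compatibility", "release-note"}
POSITIVE_KEYWORDS = ["system variable", "syntax", "default value", "ddl"]
WATCH_PATH_PREFIXES = ["pkg/ddl/", "pkg/config/", "pkg/sessionctx/variable/"]


def doc_impact_score(labels, title, files):
    score = 0
    if POSITIVE_LABELS & set(labels):
        score += 2  # a doc-relevant label is the strongest signal
    text = title.lower()
    if any(kw in text for kw in POSITIVE_KEYWORDS):
        score += 1  # keyword hints count once, however many match
    if any(f.startswith(p) for p in WATCH_PATH_PREFIXES for f in files):
        score += 1  # core behavior paths touched
    return score


# A hypothetical feature PR changing DDL syntax crosses the threshold.
score = doc_impact_score(
    labels=["type/feature"],
    title="ddl: support new syntax for global indexes",
    files=["pkg/ddl/index.go", "tests/ddl_test.go"],
)
print(score, score >= 2)  # → 4 True
```

The additive design means a single weak signal (for example, only a keyword hit) stays below the threshold, which is why the report still asks a maintainer to confirm each candidate.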