feat: add typos spell checker CI workflow and fix typo (fixes #1266)#1319
feat: add typos spell checker CI workflow and fix typo (fixes #1266)#1319shivamtiwari3 wants to merge 2 commits intogoogle:mainfrom
Conversation
…oogle#1266) Root cause: the codebase had no automated spell checking, allowing typos to accumulate silently. PR google#1240 fixed many manually but there was no CI gate to prevent regressions. Fix: - Add .typos.toml with exclusions for generated/binary files and domain-specific terms (HashiCorp, LHarc, COBOL .cpy extension) that are intentional and would otherwise produce false positives. - Add .github/workflows/typos.yml that runs crate-ci/typos on every push and pull request with the action pinned to a specific SHA. - Fix typo found during triage: "indentify" -> "identify" in python/tests/test_magika_python_module.py.
There was a problem hiding this comment.
Code Review
This pull request introduces a CI workflow for spell checking using typos and fixes a typo in a test file. The configuration for the spell checker in .typos.toml is well-structured with clear exclusions and an allowlist for domain-specific terms. The typo fix is correct. I have one suggestion regarding the use of a print statement in the test file.
|
|
||
| for ws_num in ws_nums: | ||
| print(f"Calling indentify_bytes with {ws_num} whitespaces") | ||
| print(f"Calling identify_bytes with {ws_num} whitespaces") |
There was a problem hiding this comment.
While fixing the typo is good, using print() in automated tests is generally discouraged as it can clutter test output. If this output is useful for debugging, consider using the logging module, which integrates well with pytest's output capturing. If it was for temporary debugging, it should be removed.
print() in tests clutters captured output; no information is lost since pytest's test name already identifies the failing scenario.
|
Applied — removed the |
Summary
Fixes #1266.
Adds automated spell checking via
typos(crate-ci/typos) on every push and pull request, with a.typos.tomlconfiguration that excludes generated/binary files and allowlists domain-specific terms to keep the false-positive rate near zero.Root Cause
There was no CI gate for spelling errors, so typos accumulated undetected between manual passes (the last manual fix being PR #1240). Without automation, any contributor could inadvertently introduce new misspellings with no warning.
Solution
.typos.toml— Configures the checker:*.json,*.lock,tests_data/,assets/models/to avoid scanning generated and binary fileshve(ONNX internals),afterall(test variable),hashi(HashiCorp vendor name),harc(LHarc archive format),cpy/CPY(COBOL file extensions),licens(stem of "LICENSEs", the plural of LICENSE in comments).github/workflows/typos.yml— Runs on push tomainand all PRs, usingcrate-ci/typospinned to its SHA for supply-chain security (same pattern as the existingactions/checkoutpin).Bonus fix: Corrected a real typo found during triage —
indentify→identifyinpython/tests/test_magika_python_module.py:277.typos --config .typos.toml --format briefnow exits 0 on the full repository.Testing
typos --config .typos.toml --format brieflocally — zero errorsuv run pytest tests -m "not slow"ruff checkandruff formatpass cleanChecklist
.typos.toml(not inline ignores)