Skip to content

feat: add typos spell checker CI workflow and fix typo (fixes #1266)#1319

Open
shivamtiwari3 wants to merge 2 commits intogoogle:mainfrom
shivamtiwari3:feat/add-typos-spell-checker-workflow
Open

feat: add typos spell checker CI workflow and fix typo (fixes #1266)#1319
shivamtiwari3 wants to merge 2 commits intogoogle:mainfrom
shivamtiwari3:feat/add-typos-spell-checker-workflow

Conversation

@shivamtiwari3
Copy link

Summary

Fixes #1266.

Adds automated spell checking via typos (crate-ci/typos) on every push and pull request, with a .typos.toml configuration that excludes generated/binary files and allowlists domain-specific terms to keep the false-positive rate near zero.


Root Cause

There was no CI gate for spelling errors, so typos accumulated undetected between manual passes (the last manual fix being PR #1240). Without automation, any contributor could inadvertently introduce new misspellings with no warning.


Solution

.typos.toml — Configures the checker:

  • Excludes *.json, *.lock, tests_data/, assets/models/ to avoid scanning generated and binary files
  • Allowlists six domain-specific terms found during initial triage: hve (ONNX internals), afterall (test variable), hashi (HashiCorp vendor name), harc (LHarc archive format), cpy/CPY (COBOL file extensions), licens (stem of "LICENSEs", the plural of LICENSE in comments)

.github/workflows/typos.yml — Runs on push to main and all PRs, using crate-ci/typos pinned to its SHA for supply-chain security (same pattern as the existing actions/checkout pin).

Bonus fix: Corrected a real typo found during triage — indentifyidentify in python/tests/test_magika_python_module.py:277.

typos --config .typos.toml --format brief now exits 0 on the full repository.


Testing

  • Ran typos --config .typos.toml --format brief locally — zero errors
  • All 33 existing Python tests pass: uv run pytest tests -m "not slow"
  • ruff check and ruff format pass clean

Checklist

  • Workflow triggers on push and pull_request
  • Action pinned to a commit SHA (supply-chain hygiene)
  • False positives excluded via .typos.toml (not inline ignores)
  • One real typo fixed as part of triage
  • All existing tests pass
  • Read CONTRIBUTING.md

…oogle#1266)

Root cause: the codebase had no automated spell checking, allowing typos
to accumulate silently. PR google#1240 fixed many manually but there was no CI
gate to prevent regressions.

Fix:
- Add .typos.toml with exclusions for generated/binary files and
  domain-specific terms (HashiCorp, LHarc, COBOL .cpy extension) that
  are intentional and would otherwise produce false positives.
- Add .github/workflows/typos.yml that runs crate-ci/typos on every
  push and pull request with the action pinned to a specific SHA.
- Fix typo found during triage: "indentify" -> "identify" in
  python/tests/test_magika_python_module.py.
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a CI workflow for spell checking using typos and fixes a typo in a test file. The configuration for the spell checker in .typos.toml is well-structured with clear exclusions and an allowlist for domain-specific terms. The typo fix is correct. I have one suggestion regarding the use of a print statement in the test file.


for ws_num in ws_nums:
print(f"Calling indentify_bytes with {ws_num} whitespaces")
print(f"Calling identify_bytes with {ws_num} whitespaces")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

While fixing the typo is good, using print() in automated tests is generally discouraged as it can clutter test output. If this output is useful for debugging, consider using the logging module, which integrates well with pytest's output capturing. If it was for temporary debugging, it should be removed.

print() in tests clutters captured output; no information is lost since
pytest's test name already identifies the failing scenario.
@shivamtiwari3
Copy link
Author

Applied — removed the print() call from the test loop. The test name and pytest's parametrized output already provide enough context if the assertion fails.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add spell checker to github workflows

1 participant