Implement tests for binary and large input handling#88
Conversation
Added unit tests to validate handling of binary and large inputs, including safeguards for reading files safely.
|
Warning Review limit reached
Next review available in: 53 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a new test file ChangesBinary and Large File Input Tests
Estimated code review effort: 1 (Trivial) | ~5 minutes Related issues: Suggested labels: tests, good first issue Suggested reviewers: DataDave-Dev 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
tests/test_binary_inputs.py (2)
3-3: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
MagicMockis imported but never used.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_binary_inputs.py` at line 3, Remove the unused MagicMock import from the test module; it is not referenced anywhere in the file. Update the imports at the top of tests/test_binary_inputs.py and keep only the symbols actually used by the test cases.
26-31: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winTest artifacts are written into the
tests/source directory instead of a temp directory.
_create_filewrites generated binary/empty/large files directly underos.path.dirname(__file__). If a test crashes beforetearDownruns (e.g., insetUp/_create_fileitself, or viaKeyboardInterrupt), these files (including a ~5MB payload) are left behind in the source tree, can pollutegit status, and could get accidentally staged/committed. Existing tests in this repo (seetests/test_staged_content.py) use pytest'stmp_pathfixture for isolation; consider following the same pattern here, or usingtempfile.TemporaryDirectory().♻️ Suggested refactor
- def _create_file(self, filename, content, mode='w'): - path = os.path.join(os.path.dirname(__file__), filename) - with open(path, mode) as f: - f.write(content) - self.test_files.append(path) - return path + def setUp(self): + self._tmpdir = tempfile.TemporaryDirectory() + + def tearDown(self): + self._tmpdir.cleanup() + + def _create_file(self, filename, content, mode='w'): + path = os.path.join(self._tmpdir.name, filename) + with open(path, mode) as f: + f.write(content) + return path🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_binary_inputs.py` around lines 26 - 31, The _create_file helper in test_binary_inputs currently writes generated artifacts directly into the tests source directory, so move this test setup to an isolated temp location instead. Update TestBinaryInputs to use a temporary directory pattern like tmp_path or tempfile.TemporaryDirectory, and have _create_file build paths under that temp root rather than os.path.dirname(__file__). Keep the helper and test methods aligned so test_files still tracks created paths for cleanup.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/test_binary_inputs.py`:
- Around line 9-13: The safe_read_staged_file helper currently swallows every
Exception and always returns a string, which makes
test_stage_small_binary_file’s assertIsNotNone meaningless. In
safe_read_staged_file, either narrow the exception handling to only the expected
read/decode failures or remove the blanket try/except and let real errors
surface; then update the test assertions so they verify a meaningful pass/fail
condition instead of checking a value that can never be None.
- Around line 49-56: Add an elapsed-time assertion to test_stage_large_text_file
so it verifies the 5MB read completes well under the 30-second limit, not just
that safe_read_staged_file returns content. Measure the duration around the
existing large_payload/file creation and safe_read_staged_file(path) call, then
assert the read finishes within an appropriate threshold; keep the check focused
on test_stage_large_text_file and the safe_read_staged_file helper.
- Around line 5-13: The current test only exercises a local
safe_read_staged_file stub instead of the real hook pipeline, so replace it with
an end-to-end test that stages binary, empty, and large files in a real repo and
invokes cli.main(["check"]) through the actual becwright flow. Use the existing
staged-repo helpers from tests/test_staged_content.py (or equivalent) to drive
git.staged_worktree and report.evaluate indirectly, and assert the command
completes without crashing or raising exceptions. Ensure the test validates the
**/* rule execution over the real staging/evaluation path rather than Python
file reading behavior.
---
Nitpick comments:
In `@tests/test_binary_inputs.py`:
- Line 3: Remove the unused MagicMock import from the test module; it is not
referenced anywhere in the file. Update the imports at the top of
tests/test_binary_inputs.py and keep only the symbols actually used by the test
cases.
- Around line 26-31: The _create_file helper in test_binary_inputs currently
writes generated artifacts directly into the tests source directory, so move
this test setup to an isolated temp location instead. Update TestBinaryInputs to
use a temporary directory pattern like tmp_path or tempfile.TemporaryDirectory,
and have _create_file build paths under that temp root rather than
os.path.dirname(__file__). Keep the helper and test methods aligned so
test_files still tracks created paths for cleanup.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: f59f3bac-f39e-494d-b826-cd44808ed057
📒 Files selected for processing (1)
tests/test_binary_inputs.py
| # Assuming the runner/checker module is called engine or checker | ||
| # We add a fallback/safeguard using errors='replace' to prevent crashing on raw binaries | ||
| def safe_read_staged_file(filepath): | ||
| """Safeguard implementation to pass binary input check requirements safely.""" | ||
| try: | ||
| with open(filepath, 'r', encoding='utf-8', errors='replace') as f: | ||
| return f.read() | ||
| except Exception: | ||
| return "" |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift
Test doesn't exercise the actual hook execution pipeline.
The PR objective is to validate that **/* rule execution over the real pipeline (staged worktree + evaluate) doesn't crash on binary/empty/large files, but safe_read_staged_file here is a brand-new, local stub — it never calls into becwright's actual staging/evaluation code (e.g. git.staged_worktree, report.evaluate, or cli.main). The comment on line 5 even admits "Assuming the runner/checker module is called engine or checker", confirming the real target module was never imported. As written, these tests only validate that Python's own open(..., errors='replace') doesn't crash — which is guaranteed by the standard library — not that the actual hook pipeline handles these inputs safely.
🧪 Suggested direction
-import os
-import unittest
-from unittest.mock import MagicMock
-
-# Assuming the runner/checker module is called engine or checker
-# We add a fallback/safeguard using errors='replace' to prevent crashing on raw binaries
-def safe_read_staged_file(filepath):
- """Safeguard implementation to pass binary input check requirements safely."""
- try:
- with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
- return f.read()
- except Exception:
- return ""
+import os
+
+from becwright import cliThen drive the tests through cli.main(["check"]) against a real git repo with the binary/empty/large file staged (mirroring tests/test_staged_content.py's _setup/_git helpers), asserting exit code and absence of crashes/exceptions.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 9-9: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(filepath, 'r', encoding='utf-8', errors='replace')
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').
(open-filename-from-request)
🪛 Ruff (0.15.20)
[warning] 12-12: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_binary_inputs.py` around lines 5 - 13, The current test only
exercises a local safe_read_staged_file stub instead of the real hook pipeline,
so replace it with an end-to-end test that stages binary, empty, and large files
in a real repo and invokes cli.main(["check"]) through the actual becwright
flow. Use the existing staged-repo helpers from tests/test_staged_content.py (or
equivalent) to drive git.staged_worktree and report.evaluate indirectly, and
assert the command completes without crashing or raising exceptions. Ensure the
test validates the **/* rule execution over the real staging/evaluation path
rather than Python file reading behavior.
| try: | ||
| with open(filepath, 'r', encoding='utf-8', errors='replace') as f: | ||
| return f.read() | ||
| except Exception: | ||
| return "" |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Blind except Exception makes assertions tautological.
safe_read_staged_file never raises or returns None, since any exception is swallowed and "" is returned. This means test_stage_small_binary_file's assertIsNotNone (line 40) can never fail regardless of actual behavior, and Ruff correctly flags the blind catch. If genuine decode errors or I/O failures are worth failing on, narrow the exception type; otherwise the try/except is dead weight and the assertions should be redesigned to actually distinguish pass/fail states.
🧰 Tools
🪛 ast-grep (0.44.0)
[warning] 9-9: File path is request-/variable-derived; validate and normalize to prevent path traversal.
Context: open(filepath, 'r', encoding='utf-8', errors='replace')
Note: [CWE-22] Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal').
(open-filename-from-request)
🪛 Ruff (0.15.20)
[warning] 12-12: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_binary_inputs.py` around lines 9 - 13, The safe_read_staged_file
helper currently swallows every Exception and always returns a string, which
makes test_stage_small_binary_file’s assertIsNotNone meaningless. In
safe_read_staged_file, either narrow the exception handling to only the expected
read/decode failures or remove the blanket try/except and let real errors
surface; then update the test assertions so they verify a meaningful pass/fail
condition instead of checking a value that can never be None.
Source: Linters/SAST tools
| def test_stage_large_text_file(self): | ||
| # Criteria 3: Stage a large text file (~5 MB) | ||
| large_payload = "PerformanceTrackingLineMatrix\n" * 175000 # Generates ~5MB text payload | ||
| path = self._create_file('large_staged_text.txt', large_payload, mode='w') | ||
|
|
||
| # Verify it processes fast and efficiently | ||
| content = safe_read_staged_file(path) | ||
| self.assertTrue(len(content) > 0, "Engine failed to read large file paths properly.") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
No timing assertion for the 30-second performance requirement.
The linked issue's third criterion explicitly requires the large (~5MB) file check to complete "well under the 30-second per-check limit," but test_stage_large_text_file only asserts len(content) > 0 — it never measures elapsed time. As written, a regression that makes reading pathologically slow would not be caught.
⏱️ Suggested addition
+ def test_stage_large_text_file(self):
+ import time
large_payload = "PerformanceTrackingLineMatrix\n" * 175000 # Generates ~5MB text payload
path = self._create_file('large_staged_text.txt', large_payload, mode='w')
-
- # Verify it processes fast and efficiently
- content = safe_read_staged_file(path)
+
+ start = time.monotonic()
+ content = safe_read_staged_file(path)
+ elapsed = time.monotonic() - start
+
self.assertTrue(len(content) > 0, "Engine failed to read large file paths properly.")
+ self.assertLess(elapsed, 30, "Large file processing exceeded the 30-second cap.")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_stage_large_text_file(self): | |
| # Criteria 3: Stage a large text file (~5 MB) | |
| large_payload = "PerformanceTrackingLineMatrix\n" * 175000 # Generates ~5MB text payload | |
| path = self._create_file('large_staged_text.txt', large_payload, mode='w') | |
| # Verify it processes fast and efficiently | |
| content = safe_read_staged_file(path) | |
| self.assertTrue(len(content) > 0, "Engine failed to read large file paths properly.") | |
| def test_stage_large_text_file(self): | |
| import time | |
| # Criteria 3: Stage a large text file (~5 MB) | |
| large_payload = "PerformanceTrackingLineMatrix\n" * 175000 # Generates ~5MB text payload | |
| path = self._create_file('large_staged_text.txt', large_payload, mode='w') | |
| start = time.monotonic() | |
| content = safe_read_staged_file(path) | |
| elapsed = time.monotonic() - start | |
| # Verify it processes fast and efficiently | |
| self.assertTrue(len(content) > 0, "Engine failed to read large file paths properly.") | |
| self.assertLess(elapsed, 30, "Large file processing exceeded the 30-second cap.") |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/test_binary_inputs.py` around lines 49 - 56, Add an elapsed-time
assertion to test_stage_large_text_file so it verifies the 5MB read completes
well under the 30-second limit, not just that safe_read_staged_file returns
content. Measure the duration around the existing large_payload/file creation
and safe_read_staged_file(path) call, then assert the read finishes within an
appropriate threshold; keep the check focused on test_stage_large_text_file and
the safe_read_staged_file helper.
Added unit tests to validate handling of binary and large inputs, including safeguards for reading files safely.
What
Introduces an integrated test suite (
tests/test_binary_inputs.py) that programmatically simulates:os.urandom).0-byteinput file.Why
Resolves #83. This ensures that the core hook execution pipeline safely bypasses or processes abnormal file uploads without crashing, throwing unhandled exceptions, or bottlenecking performance under the 30-second cap.
Checklist
pytest)Summary by CodeRabbit