Skip to content

Add __str__/__repr__ to all public classes and fix stale lock cleanup#62

Closed
rfrenchseti wants to merge 1 commit intomainfrom
fix/issue-21-repr-and-issue-56-stale-locks
Closed

Add __str__/__repr__ to all public classes and fix stale lock cleanup#62
rfrenchseti wants to merge 1 commit intomainfrom
fix/issue-21-repr-and-issue-56-stale-locks

Conversation

@rfrenchseti
Copy link
Copy Markdown
Collaborator

@rfrenchseti rfrenchseti commented Mar 24, 2026

Summary

  • Fix Add __str__ and __repr__ to all classes #21: Add __str__ and __repr__ methods to all public classes that were missing them:

    • FileCache: __str__ returns the cache directory path; __repr__ includes cache name and non-default configuration (delete_on_exit, time_sensitive, mp_safe, anonymous, lock_timeout, nthreads). Also adds cache_name property.
    • FileCacheSource (base class): __str__ returns the source prefix (e.g., gs://bucket); __repr__ includes class name, scheme, remote, and anonymous flag. All subclasses (FileCacheSourceFile, FileCacheSourceHTTP, FileCacheSourceGS, FileCacheSourceS3) inherit these.
    • FileCacheSourceFake: Overrides __repr__ to also include storage_dir when it differs from the default.
    • FCPath: Enhanced __repr__ to include non-default anonymous, lock_timeout, and nthreads parameters alongside the path.
  • Fix Programs can leave behind locks that never go away #56: Address stale lock files left behind by crashed processes:

    • Add FileCache.clean_up_stale_locks() method that walks the cache directory, identifies lock files not actively held by any process (by attempting to acquire them with zero timeout), and removes stale ones. Returns the count of removed lock files.
    • Improve the _retrieve_multi_locked wait loop to automatically detect stale locks from crashed processes. Instead of waiting until timeout when a lock file exists but no process holds it, the code now attempts to acquire the lock; if successful, it cleans up the stale lock and reports the download failure immediately.
    • Update README with documentation on stale lock cleanup.

Test plan

  • New tests for clean_up_stale_locks(): stale lock removal, active lock preservation, empty cache, multiple stale locks
  • New tests for FileCache.__str__ and __repr__ with various configurations
  • New tests for FileCache.cache_name property
  • New tests for FileCacheSource*.__str__ and __repr__ for all subclasses
  • New tests for FileCacheSourceFake.__repr__ with custom storage directory
  • Updated FCPath.__repr__ tests for enhanced detail
  • All 408 existing tests pass
  • flake8 clean
  • mypy clean

Made with Cursor

Summary by CodeRabbit

  • New Features

    • Added automatic detection and cleanup of stale lock files during multi-file retrieval operations, preserving active locks
    • Introduced a new method to manually clean up orphaned lock files left by crashed processes
    • Enhanced string representations for better debugging and inspection of cache objects
  • Documentation

    • Added documentation explaining how stale lock files are handled in shared cache scenarios with multiprocessor-safe locking enabled

Fix #21: Add __str__ and __repr__ methods to FileCache, FileCacheSource
(and all subclasses), and enhance FCPath's __repr__ to include non-default
parameters (anonymous, lock_timeout, nthreads). Also add cache_name property
to FileCache.

Fix #56: Add clean_up_stale_locks() method to FileCache that walks the
cache directory and removes orphaned lock files left by crashed processes.
Also improve the _retrieve_multi_locked wait loop to automatically detect
and clean up stale locks by attempting to acquire them, rather than waiting
until timeout.

Made-with: Cursor
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 24, 2026

Walkthrough

Added string representation methods (__str__ and __repr__) to FileCache, FileCacheSource, and FileCacheSourceFake classes. Implemented a public clean_up_stale_locks() method that identifies and removes orphaned lock files. Extended multi-file retrieval logic to detect and handle stale locks during download wait loops. Documentation and comprehensive test coverage included.

Changes

Cohort / File(s) Summary
Core Implementation
filecache/file_cache.py, filecache/file_cache_path.py, filecache/file_cache_source.py
Added __str__ and __repr__ methods to FileCache (with cache_name property), FileCacheSourceFake, and FileCacheSource. Implemented clean_up_stale_locks() method to walk cache directory and remove inaccessible lock files. Extended _retrieve_multi_locked logic with stale lock detection during wait loops.
Documentation
README.md
Added "Stale Lock Cleanup" section documenting the clean_up_stale_locks() method, its behavior, and integration with multi-file retrieval.
Test Coverage
tests/test_file_cache.py, tests/test_file_cache_path.py, tests/test_file_cache_source.py
Added tests for clean_up_stale_locks() behavior under multiple scenarios (single/multiple/no stale locks and active locks). Added validation tests for __str__ and __repr__ output across FileCache, FCPath, and FileCacheSource variants.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: adding str/repr methods to public classes and fixing stale lock cleanup, which matches the primary objectives of the PR.
Description check ✅ Passed The description comprehensively addresses both linked issues (#21 and #56), details all changes across multiple classes, includes a test plan, and states all existing tests pass with clean linting and type checking.
Linked Issues check ✅ Passed The PR fully addresses both linked issues: #21 by adding str/repr to FileCache, FileCacheSource, FileCacheSourceFake, and FCPath with appropriate details, and #56 by implementing clean_up_stale_locks() and integrating stale lock detection into _retrieve_multi_locked.
Out of Scope Changes check ✅ Passed All code changes are scoped to the two linked issues: string representations for debugging/logging and stale lock cleanup. Documentation updates in README are directly related to the stale lock cleanup feature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 24, 2026

Codecov Report

❌ Patch coverage is 84.00000% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.38%. Comparing base (3ecac5a) to head (ba2d4da).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
filecache/file_cache.py 81.13% 7 Missing and 3 partials ⚠️
filecache/file_cache_source.py 85.71% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #62      +/-   ##
==========================================
- Coverage   89.55%   89.38%   -0.18%     
==========================================
  Files           5        5              
  Lines        2413     2486      +73     
  Branches      507      523      +16     
==========================================
+ Hits         2161     2222      +61     
- Misses        149      157       +8     
- Partials      103      107       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
filecache/file_cache.py (1)

1723-1746: ⚠️ Potential issue | 🟠 Major

Keep exception_on_fail=True consistent when the other process drops the lock without a file.

The new stale-lock branch records url in files_not_exist, but the existing if not lock_path.is_file() branch still doesn't. That path can therefore return a FileNotFoundError object inside the result list instead of raising.

🩹 Proposed fix
                 if not lock_path.is_file():
                     func_ret[idx] = FileNotFoundError(
                         f'Another process failed to download {url}')
                     self._log_debug(f'  Download elsewhere failed: {url}')
+                    files_not_exist.append(url)
                     continue
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@filecache/file_cache.py` around lines 1723 - 1746, The branch that handles a
missing lock file (the "if not lock_path.is_file()" block) sets func_ret[idx] to
FileNotFoundError and logs the event but does not add the URL to
files_not_exist, causing inconsistent behavior vs the stale-lock branch; update
that branch to also append url to files_not_exist (same as the stale_lock
branch) so both paths record the URL for the later exception_on_fail handling
and preserve the existing func_ret and logging logic for lock_path, func_ret,
files_not_exist, and the logged message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@filecache/file_cache.py`:
- Around line 489-500: The cleanup code acquires a FileLock then unlinks the
lock file before releasing it (lock_path.unlink then lock.release), which can
fail on Windows and split locks on POSIX; change the order so the lock is
released prior to deleting the lock file: after successfully acquiring the lock
(lock.acquire()) call lock.release() first, then unlink the path
(lock_path.unlink(...)), and keep the removed increment and logging
(_log_info/_log_debug) intact; ensure this adjustment is applied in the
stale-lock cleanup block that uses filelock.FileLock, lock.acquire,
lock.release, and lock_path.unlink.
- Around line 451-465: The __repr__ implementation omits boolean
lifecycle/locking flags when they are False, making non-default states
indistinguishable; change FileCache.__repr__ to always include the boolean
fields (self._delete_on_exit, self._time_sensitive, self._is_mp_safe,
self._anonymous) in the parts list by appending their actual values (e.g.,
f'delete_on_exit={self._delete_on_exit!r}') instead of only appending when True,
while keeping the existing conditional logic for lock_timeout and nthreads if
desired.

In `@README.md`:
- Around line 207-211: The example uses FileCache.clean_up_stale_locks() but
omits the import and the mp_safe=True context; update the snippet to import
FileCache and construct it with mp_safe=True (i.e., use
FileCache('shared-cache', mp_safe=True)) before calling clean_up_stale_locks()
so the example is self-contained and matches the multiprocessing-safe
discussion.

In `@tests/test_file_cache.py`:
- Around line 1073-1116: Add a regression test that exercises the wait-loop
stale-lock recovery path in FileCache by simulating a waiter in
_retrieve_multi_locked: create a lock file at the _LOCK_PREFIX path for a target
returned by get_local_path, spawn or simulate a concurrent waiter that calls the
code path that would block in _retrieve_multi_locked (e.g., by invoking the
method that uses the wait loop or by mimicking the waiter behavior), then make
the lock file stale (modify its mtime or write invalid content) while the waiter
is waiting and assert the waiter exits/fails immediately (no sleep until
timeout) and that clean_up_stale_locks() removes the stale lock; reference
FileCache, _retrieve_multi_locked, clean_up_stale_locks, get_local_path and
_LOCK_PREFIX to locate the relevant code paths.

---

Outside diff comments:
In `@filecache/file_cache.py`:
- Around line 1723-1746: The branch that handles a missing lock file (the "if
not lock_path.is_file()" block) sets func_ret[idx] to FileNotFoundError and logs
the event but does not add the URL to files_not_exist, causing inconsistent
behavior vs the stale-lock branch; update that branch to also append url to
files_not_exist (same as the stale_lock branch) so both paths record the URL for
the later exception_on_fail handling and preserve the existing func_ret and
logging logic for lock_path, func_ret, files_not_exist, and the logged message.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1aa274b7-d372-4df7-acb5-57e66715a206

📥 Commits

Reviewing files that changed from the base of the PR and between 3ecac5a and ba2d4da.

📒 Files selected for processing (7)
  • README.md
  • filecache/file_cache.py
  • filecache/file_cache_path.py
  • filecache/file_cache_source.py
  • tests/test_file_cache.py
  • tests/test_file_cache_path.py
  • tests/test_file_cache_source.py

Comment on lines +451 to +465
def __repr__(self) -> str:
parts = [repr(self._cache_name)]
if self._delete_on_exit:
parts.append('delete_on_exit=True')
if self._time_sensitive:
parts.append('time_sensitive=True')
if self._is_mp_safe:
parts.append('mp_safe=True')
if self._anonymous:
parts.append('anonymous=True')
if self._lock_timeout != 60:
parts.append(f'lock_timeout={self._lock_timeout!r}')
if self._nthreads != 8:
parts.append(f'nthreads={self._nthreads!r}')
return f'FileCache({", ".join(parts)})'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

__repr__ drops the non-default lifecycle and locking states.

delete_on_exit and mp_safe are only rendered when they are True. That makes FileCache(None, delete_on_exit=False) and FileCache('shared', mp_safe=False) look like default instances even though their behavior is the opposite.

🛠️ Proposed fix
     def __repr__(self) -> str:
         parts = [repr(self._cache_name)]
-        if self._delete_on_exit:
-            parts.append('delete_on_exit=True')
+        default_delete_on_exit = (self._cache_name is None)
+        if self._delete_on_exit != default_delete_on_exit:
+            parts.append(f'delete_on_exit={self._delete_on_exit!r}')
         if self._time_sensitive:
             parts.append('time_sensitive=True')
-        if self._is_mp_safe:
-            parts.append('mp_safe=True')
+        default_mp_safe = (self._cache_name is not None)
+        if self._is_mp_safe != default_mp_safe:
+            parts.append(f'mp_safe={self._is_mp_safe!r}')
         if self._anonymous:
             parts.append('anonymous=True')
         if self._lock_timeout != 60:
             parts.append(f'lock_timeout={self._lock_timeout!r}')
         if self._nthreads != 8:
             parts.append(f'nthreads={self._nthreads!r}')
         return f'FileCache({", ".join(parts)})'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def __repr__(self) -> str:
parts = [repr(self._cache_name)]
if self._delete_on_exit:
parts.append('delete_on_exit=True')
if self._time_sensitive:
parts.append('time_sensitive=True')
if self._is_mp_safe:
parts.append('mp_safe=True')
if self._anonymous:
parts.append('anonymous=True')
if self._lock_timeout != 60:
parts.append(f'lock_timeout={self._lock_timeout!r}')
if self._nthreads != 8:
parts.append(f'nthreads={self._nthreads!r}')
return f'FileCache({", ".join(parts)})'
def __repr__(self) -> str:
parts = [repr(self._cache_name)]
default_delete_on_exit = (self._cache_name is None)
if self._delete_on_exit != default_delete_on_exit:
parts.append(f'delete_on_exit={self._delete_on_exit!r}')
if self._time_sensitive:
parts.append('time_sensitive=True')
default_mp_safe = (self._cache_name is not None)
if self._is_mp_safe != default_mp_safe:
parts.append(f'mp_safe={self._is_mp_safe!r}')
if self._anonymous:
parts.append('anonymous=True')
if self._lock_timeout != 60:
parts.append(f'lock_timeout={self._lock_timeout!r}')
if self._nthreads != 8:
parts.append(f'nthreads={self._nthreads!r}')
return f'FileCache({", ".join(parts)})'
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@filecache/file_cache.py` around lines 451 - 465, The __repr__ implementation
omits boolean lifecycle/locking flags when they are False, making non-default
states indistinguishable; change FileCache.__repr__ to always include the
boolean fields (self._delete_on_exit, self._time_sensitive, self._is_mp_safe,
self._anonymous) in the parts list by appending their actual values (e.g.,
f'delete_on_exit={self._delete_on_exit!r}') instead of only appending when True,
while keeping the existing conditional logic for lock_timeout and nthreads if
desired.

Comment on lines +489 to +500
lock = filelock.FileLock(lock_path, timeout=0)
try:
lock.acquire()
except filelock._error.Timeout:
self._log_debug(f'Lock file is active, skipping: {lock_path}')
continue
try:
lock_path.unlink(missing_ok=True)
removed += 1
self._log_info(f'Removed stale lock file: {lock_path}')
finally:
lock.release()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Release the stale lock before deleting its lock file.

This new cleanup path reverses the release/unlink order already used elsewhere in FileCache. That sequence is there for a reason: unlinking an acquired lock can fail on Windows and can split the lock on POSIX.

🔒 Proposed fix
-                try:
-                    lock_path.unlink(missing_ok=True)
-                    removed += 1
-                    self._log_info(f'Removed stale lock file: {lock_path}')
-                finally:
-                    lock.release()
+                lock.release()
+                lock_path.unlink(missing_ok=True)
+                removed += 1
+                self._log_info(f'Removed stale lock file: {lock_path}')
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
lock = filelock.FileLock(lock_path, timeout=0)
try:
lock.acquire()
except filelock._error.Timeout:
self._log_debug(f'Lock file is active, skipping: {lock_path}')
continue
try:
lock_path.unlink(missing_ok=True)
removed += 1
self._log_info(f'Removed stale lock file: {lock_path}')
finally:
lock.release()
lock = filelock.FileLock(lock_path, timeout=0)
try:
lock.acquire()
except filelock._error.Timeout:
self._log_debug(f'Lock file is active, skipping: {lock_path}')
continue
lock.release()
lock_path.unlink(missing_ok=True)
removed += 1
self._log_info(f'Removed stale lock file: {lock_path}')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@filecache/file_cache.py` around lines 489 - 500, The cleanup code acquires a
FileLock then unlinks the lock file before releasing it (lock_path.unlink then
lock.release), which can fail on Windows and split locks on POSIX; change the
order so the lock is released prior to deleting the lock file: after
successfully acquiring the lock (lock.acquire()) call lock.release() first, then
unlink the path (lock_path.unlink(...)), and keep the removed increment and
logging (_log_info/_log_debug) intact; ensure this adjustment is applied in the
stale-lock cleanup block that uses filelock.FileLock, lock.acquire,
lock.release, and lock_path.unlink.

Comment on lines +207 to +211
```python
fc = FileCache('shared-cache')
removed = fc.clean_up_stale_locks()
print(f'Cleaned up {removed} stale lock(s)')
```
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make the stale-lock example self-contained and explicit about mp_safe.

The snippet currently omits the import and does not reflect the mp_safe=True context described above it.

📝 Proposed doc fix
 ```python
-fc = FileCache('shared-cache')
+from filecache import FileCache
+
+fc = FileCache('shared-cache', mp_safe=True)
 removed = fc.clean_up_stale_locks()
 print(f'Cleaned up {removed} stale lock(s)')
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>

Verify each finding against the current code and only fix it if needed.

In @README.md around lines 207 - 211, The example uses
FileCache.clean_up_stale_locks() but omits the import and the mp_safe=True
context; update the snippet to import FileCache and construct it with
mp_safe=True (i.e., use FileCache('shared-cache', mp_safe=True)) before calling
clean_up_stale_locks() so the example is self-contained and matches the
multiprocessing-safe discussion.


</details>

<!-- fingerprinting:phantom:poseidon:hawk -->

<!-- This is an auto-generated comment by CodeRabbit -->

Comment on lines +1073 to +1116
def test_clean_up_stale_locks():
with FileCache(cache_name=None, mp_safe=True) as fc:
local_path = fc.get_local_path('gs://test/test.txt')
lock_path = local_path.parent / f'{fc._LOCK_PREFIX}{local_path.name}'
lock_path.parent.mkdir(parents=True, exist_ok=True)
lock_path.write_text('stale')
assert lock_path.is_file()
removed = fc.clean_up_stale_locks()
assert removed == 1
assert not lock_path.is_file()


def test_clean_up_stale_locks_active():
with FileCache(cache_name=None, mp_safe=True) as fc:
local_path = fc.get_local_path('gs://test/test.txt')
lock_path = local_path.parent / f'{fc._LOCK_PREFIX}{local_path.name}'
lock_path.parent.mkdir(parents=True, exist_ok=True)
lock = filelock.FileLock(lock_path, timeout=0)
lock.acquire()
try:
removed = fc.clean_up_stale_locks()
assert removed == 0
assert lock_path.is_file()
finally:
lock.release()
lock_path.unlink(missing_ok=True)


def test_clean_up_stale_locks_empty():
with FileCache(cache_name=None) as fc:
removed = fc.clean_up_stale_locks()
assert removed == 0


def test_clean_up_stale_locks_multiple():
with FileCache(cache_name=None, mp_safe=True) as fc:
for i in range(3):
local_path = fc.get_local_path(f'gs://test/test{i}.txt')
lock_path = local_path.parent / f'{fc._LOCK_PREFIX}{local_path.name}'
lock_path.parent.mkdir(parents=True, exist_ok=True)
lock_path.write_text('stale')
removed = fc.clean_up_stale_locks()
assert removed == 3

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add a regression test for the wait-loop stale-lock recovery path.

These tests only call clean_up_stale_locks() directly. The other half of the change lives in _retrieve_multi_locked(), where a waiter should detect a lock that becomes stale and fail immediately instead of sleeping until timeout; that behavior is still uncovered.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_file_cache.py` around lines 1073 - 1116, Add a regression test
that exercises the wait-loop stale-lock recovery path in FileCache by simulating
a waiter in _retrieve_multi_locked: create a lock file at the _LOCK_PREFIX path
for a target returned by get_local_path, spawn or simulate a concurrent waiter
that calls the code path that would block in _retrieve_multi_locked (e.g., by
invoking the method that uses the wait loop or by mimicking the waiter
behavior), then make the lock file stale (modify its mtime or write invalid
content) while the waiter is waiting and assert the waiter exits/fails
immediately (no sleep until timeout) and that clean_up_stale_locks() removes the
stale lock; reference FileCache, _retrieve_multi_locked, clean_up_stale_locks,
get_local_path and _LOCK_PREFIX to locate the relevant code paths.

@rfrenchseti
Copy link
Copy Markdown
Collaborator Author

Done on a different branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Programs can leave behind locks that never go away Add __str__ and __repr__ to all classes

1 participant