Skip to content

fix: improve chunk joining process with error handling#158

Merged
lohanidamodar merged 3 commits intomainfrom
fix-chunks-join
Feb 23, 2026
Merged

fix: improve chunk joining process with error handling#158
lohanidamodar merged 3 commits intomainfrom
fix-chunks-join

Conversation

@lohanidamodar
Copy link
Contributor

@lohanidamodar lohanidamodar commented Feb 23, 2026

  • Use stream copy to improve join chunks

Summary by CodeRabbit

  • Refactor
    • Chunked upload assembly now streams data, with stronger error handling and reliable cleanup of temporary files to reduce failures and resource leaks.
  • Tests
    • New tests cover assembly correctness, cleanup on success, recovery when parts are missing, and handling of stale assembly artifacts.

@coderabbitai
Copy link

coderabbitai bot commented Feb 23, 2026

Walkthrough

Refactors the private joinChunks method in src/Storage/Device/Local.php to return void and assemble chunked uploads via streaming: creates temporary assembly paths, opens a destination stream, copies each chunk part with stream_copy_to_stream, handles open/copy/finalize errors with cleanup and exceptions, unlinks processed parts, renames the assembled file into place, and attempts to remove temporary logs/directories with non-fatal warnings on failure. Adds tests in tests/Storage/Device/LocalTest.php including a makeJoinTestStorage helper and four tests covering successful assembly, cleanup, missing-part error behavior, and overwriting stale assembly files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: refactoring the chunk joining process with improved error handling and streaming implementation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix-chunks-join

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/Storage/Device/Local.php (2)

184-184: Declare : void return type on joinChunks.

♻️ Proposed fix
-    private function joinChunks(string $path, int $chunks)
+    private function joinChunks(string $path, int $chunks): void
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Storage/Device/Local.php` at line 184, The method signature for
joinChunks currently lacks an explicit return type; update the function
declaration for joinChunks(string $path, int $chunks) to include the void return
type (joinChunks(string $path, int $chunks): void), and ensure the method body
contains no return statements that return a value (only bare returns or none),
updating any logic that expects a value accordingly; this makes the intent
explicit and satisfies static analysis/type checks.

212-213: \dirname($tmp) evaluated twice — extract to a variable.

♻️ Proposed refactor
+        $tmpDir = \dirname($tmp);
-        if (! \rmdir(\dirname($tmp))) {
-            \trigger_error('Failed to remove temporary chunk directory '.dirname($tmp).' (chunk log: '.$tmp.')', E_USER_WARNING);
+        if (! \rmdir($tmpDir)) {
+            \trigger_error('Failed to remove temporary chunk directory '.$tmpDir.' (chunk log: '.$tmp.')', E_USER_WARNING);
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Storage/Device/Local.php` around lines 212 - 213, The code calls
\dirname($tmp) twice when attempting to remove the temporary chunk directory;
extract the directory into a local variable (e.g., $tmpDir) before the if check
and use that variable both in the rmdir call and in the \trigger_error message
so \dirname($tmp) is evaluated only once (update the block around the rmdir and
trigger_error in Local.php where \dirname($tmp) currently appears).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Storage/Device/Local.php`:
- Around line 201-208: The code currently unlinks each part file inside the loop
(using unlink) right after a successful stream_copy_to_stream, making the
operation non-retryable on later failures; change this so parts are only removed
after the entire assembly completed successfully: remove the immediate
unlink($part) from the loop in the method handling chunk assembly (the
stream_copy_to_stream / fopen / fclose block), instead push the part paths into
a local array (e.g., $partsToUnlink) and after the loop completes without errors
and after joinChunks/finish logic succeeds, iterate that array and unlink each
file; ensure failures during the loop throw/return before any unlink happens (or
are caught and rethrown) so partial uploads keep their .part files for retries.
- Around line 188-205: The current assembly opens the final destination via
fopen($path, 'wb') and truncates it immediately, leaving a partial/zero-byte
file on any error during chunk reads (opening $part) or stream_copy_to_stream
failures; change the logic to write to a temporary file in the same directory
(e.g. use dirname($tmp) plus a unique temp name), open that temp with fopen(...
'wb'), stream_copy_to_stream chunks into the temp, close handles, then
atomically rename(temp, $path) on success; if any chunk open or copy fails,
ensure you fclose any open handles and unlink the temp file (or as a simpler
alternative, call unlink($path) in both existing error paths if you keep the
current approach). Use the existing symbols ($path, $tmp, $part, fopen,
stream_copy_to_stream, unlink, rename) to locate and update the code.

---

Nitpick comments:
In `@src/Storage/Device/Local.php`:
- Line 184: The method signature for joinChunks currently lacks an explicit
return type; update the function declaration for joinChunks(string $path, int
$chunks) to include the void return type (joinChunks(string $path, int $chunks):
void), and ensure the method body contains no return statements that return a
value (only bare returns or none), updating any logic that expects a value
accordingly; this makes the intent explicit and satisfies static analysis/type
checks.
- Around line 212-213: The code calls \dirname($tmp) twice when attempting to
remove the temporary chunk directory; extract the directory into a local
variable (e.g., $tmpDir) before the if check and use that variable both in the
rmdir call and in the \trigger_error message so \dirname($tmp) is evaluated only
once (update the block around the rmdir and trigger_error in Local.php where
\dirname($tmp) currently appears).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/Storage/Device/LocalTest.php (2)

478-483: Redundant per-part assertions once $tmpDir absence is already confirmed.

Since line 476 already asserts \is_dir($tmpDir) is false, any file path constructed from $tmpDir . DIRECTORY_SEPARATOR . ... is guaranteed not to exist. The loop adds no independent signal — it cannot fail unless line 476 also fails.

♻️ Proposed simplification
 $this->assertFalse(\is_dir($tmpDir), 'Temp chunk directory should be removed after assembly');
 $this->assertFalse(\file_exists($tmpAssemble), 'Temp assembly file should be removed after rename');
-for ($i = 1; $i <= 3; $i++) {
-    $this->assertFalse(
-        \file_exists($tmpDir.DIRECTORY_SEPARATOR.'test.part.'.$i),
-        "Part file $i should be removed after assembly"
-    );
-}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Storage/Device/LocalTest.php` around lines 478 - 483, The loop
asserting non-existence of "$tmpDir.DIRECTORY_SEPARATOR.'test.part.'.$i" is
redundant because the test already asserts \is_dir($tmpDir) is false; remove the
for-loop (the three \file_exists assertions) in
tests/Storage/Device/LocalTest.php so the single \is_dir($tmpDir) check
suffices, or if you want explicit per-file checks keep them and remove the prior
\is_dir($tmpDir) assertion—update the test to use only one of these mutually
redundant checks referencing $tmpDir and the 'test.part.$i' filenames.

442-448: Temp directory leaks when any assertion fails before the final cleanup call.

All four new tests call $storage->delete($storage->getRoot(), true) as their very last line. A failed assertion throws AssertionFailedError (which extends Error, not Exception), so it bypasses the cleanup line. Since tearDown() is empty, temp directories under sys_get_temp_dir() accumulate on any test failure.

Wrap the assertions and cleanup in a try/finally block in each test:

♻️ Example fix for testJoinChunksAssemblesContentCorrectly (apply the same pattern to all four tests)
 public function testJoinChunksAssemblesContentCorrectly(): void
 {
     $storage = $this->makeJoinTestStorage();
     $dest = $storage->getRoot().DIRECTORY_SEPARATOR.'test.dat';
 
-    $storage->uploadData('AAAA', $dest, 'application/octet-stream', 1, 3);
-    $storage->uploadData('BBBB', $dest, 'application/octet-stream', 2, 3);
-    $storage->uploadData('CCCC', $dest, 'application/octet-stream', 3, 3);
-
-    $this->assertTrue(\file_exists($dest));
-    $this->assertSame('AAAABBBBCCCC', \file_get_contents($dest));
-
-    $storage->delete($storage->getRoot(), true);
+    try {
+        $storage->uploadData('AAAA', $dest, 'application/octet-stream', 1, 3);
+        $storage->uploadData('BBBB', $dest, 'application/octet-stream', 2, 3);
+        $storage->uploadData('CCCC', $dest, 'application/octet-stream', 3, 3);
+
+        $this->assertTrue(\file_exists($dest));
+        $this->assertSame('AAAABBBBCCCC', \file_get_contents($dest));
+    } finally {
+        $storage->delete($storage->getRoot(), true);
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Storage/Device/LocalTest.php` around lines 442 - 448, The temp dirs
created by makeJoinTestStorage() leak if a test assertion fails because cleanup
is done after assertions; wrap each test that uses makeJoinTestStorage() (e.g.,
testJoinChunksAssemblesContentCorrectly and the three other new tests) so that
assertions run inside a try block and the cleanup call
$storage->delete($storage->getRoot(), true) is moved into a finally block;
alternatively, implement tearDown() to detect and remove the temp directory
returned by makeJoinTestStorage(), but do not rely on post-assertion cleanup
outside a finally/tearDown to avoid leaks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/Storage/Device/LocalTest.php`:
- Around line 478-483: The loop asserting non-existence of
"$tmpDir.DIRECTORY_SEPARATOR.'test.part.'.$i" is redundant because the test
already asserts \is_dir($tmpDir) is false; remove the for-loop (the three
\file_exists assertions) in tests/Storage/Device/LocalTest.php so the single
\is_dir($tmpDir) check suffices, or if you want explicit per-file checks keep
them and remove the prior \is_dir($tmpDir) assertion—update the test to use only
one of these mutually redundant checks referencing $tmpDir and the
'test.part.$i' filenames.
- Around line 442-448: The temp dirs created by makeJoinTestStorage() leak if a
test assertion fails because cleanup is done after assertions; wrap each test
that uses makeJoinTestStorage() (e.g., testJoinChunksAssemblesContentCorrectly
and the three other new tests) so that assertions run inside a try block and the
cleanup call $storage->delete($storage->getRoot(), true) is moved into a finally
block; alternatively, implement tearDown() to detect and remove the temp
directory returned by makeJoinTestStorage(), but do not rely on post-assertion
cleanup outside a finally/tearDown to avoid leaks.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/Storage/Device/Local.php (1)

217-220: unlink($tmpAssemble) return value silently ignored on rename failure.

Every other cleanup operation in this method emits a trigger_error(E_USER_WARNING) on failure (lines 223–224, 228–229, 232–233). The unlink at line 218 is the only exception — a failed unlink here leaves a stale tmp_assemble_* file with no diagnostic signal, inconsistent with the rest of the cleanup pattern.

♻️ Proposed fix for consistency
     if (! \rename($tmpAssemble, $path)) {
-        \unlink($tmpAssemble);
+        if (! \unlink($tmpAssemble)) {
+            \trigger_error('Failed to remove temporary assembly file '.$tmpAssemble, E_USER_WARNING);
+        }
         throw new Exception('Failed to finalize assembled file '.$path);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Storage/Device/Local.php` around lines 217 - 220, The
unlink($tmpAssemble) call inside the rename failure branch currently ignores its
return value; update this to mirror the other cleanup calls by checking the
return and emitting a trigger_error(E_USER_WARNING) on failure (same pattern
used for other cleanup lines), so when the rename fails the subsequent unlink
failure is logged; locate the block handling rename($tmpAssemble, $path) and the
surrounding cleanup logic that references $tmpAssemble and modify that unlink to
test its boolean result and call trigger_error with a descriptive message on
failure before throwing the Exception.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Storage/Device/Local.php`:
- Around line 186-188: The abort() routine currently only removes files under
tmpDir (the dirname($path)/'tmp_'.basename($path)) and therefore leaves orphaned
partial assembly files created as $tmpAssemble
(dirname($path).'/tmp_assemble_'.basename($path)); update abort() to also check
for and unlink the $tmpAssemble path (and any similarly named tmp_assemble_* for
the same basename) before returning so orphaned files created by
joinChunks/opening $tmpAssemble are cleaned up on abort. Ensure you reference
the same basename logic used to construct $tmpAssemble and handle
missing-file/no-permission errors silently as the existing abort() does.

---

Nitpick comments:
In `@src/Storage/Device/Local.php`:
- Around line 217-220: The unlink($tmpAssemble) call inside the rename failure
branch currently ignores its return value; update this to mirror the other
cleanup calls by checking the return and emitting a
trigger_error(E_USER_WARNING) on failure (same pattern used for other cleanup
lines), so when the rename fails the subsequent unlink failure is logged; locate
the block handling rename($tmpAssemble, $path) and the surrounding cleanup logic
that references $tmpAssemble and modify that unlink to test its boolean result
and call trigger_error with a descriptive message on failure before throwing the
Exception.

Comment on lines +186 to +188
$tmpDir = \dirname($path).DIRECTORY_SEPARATOR.'tmp_'.\basename($path);
$tmp = $tmpDir.DIRECTORY_SEPARATOR.\basename($path).'_chunks.log';
$tmpAssemble = \dirname($path).DIRECTORY_SEPARATOR.'tmp_assemble_'.\basename($path);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

abort() does not clean up orphaned tmp_assemble_* files.

$tmpAssemble lives in dirname($path), outside tmpDir. If the process is killed after fopen($tmpAssemble, 'wb') opens the file but before any error-path unlink($tmpAssemble) can fire, the partial assembly file is orphaned. abort() (lines 277–295) iterates only within dirname($path)/tmp_$basename/ and never touches dirname($path) directly, so calling abort() after such an interruption leaves tmp_assemble_$basename on disk indefinitely — the next joinChunks self-heal only fires if the same filename is uploaded again.

Consider deleting tmp_assemble_$basename inside abort() as well:

🐛 Proposed fix in abort()
 public function abort(string $path, string $extra = ''): bool
 {
     if (file_exists($path)) {
         \unlink($path);
     }

+    $tmpAssemble = \dirname($path).DIRECTORY_SEPARATOR.'tmp_assemble_'.\basename($path);
+    if (\file_exists($tmpAssemble)) {
+        \unlink($tmpAssemble);
+    }
+
     $tmp = \dirname($path).DIRECTORY_SEPARATOR.'tmp_'.\basename($path).DIRECTORY_SEPARATOR;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$tmpDir = \dirname($path).DIRECTORY_SEPARATOR.'tmp_'.\basename($path);
$tmp = $tmpDir.DIRECTORY_SEPARATOR.\basename($path).'_chunks.log';
$tmpAssemble = \dirname($path).DIRECTORY_SEPARATOR.'tmp_assemble_'.\basename($path);
public function abort(string $path, string $extra = ''): bool
{
if (file_exists($path)) {
\unlink($path);
}
$tmpAssemble = \dirname($path).DIRECTORY_SEPARATOR.'tmp_assemble_'.\basename($path);
if (\file_exists($tmpAssemble)) {
\unlink($tmpAssemble);
}
$tmp = \dirname($path).DIRECTORY_SEPARATOR.'tmp_'.\basename($path).DIRECTORY_SEPARATOR;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Storage/Device/Local.php` around lines 186 - 188, The abort() routine
currently only removes files under tmpDir (the
dirname($path)/'tmp_'.basename($path)) and therefore leaves orphaned partial
assembly files created as $tmpAssemble
(dirname($path).'/tmp_assemble_'.basename($path)); update abort() to also check
for and unlink the $tmpAssemble path (and any similarly named tmp_assemble_* for
the same basename) before returning so orphaned files created by
joinChunks/opening $tmpAssemble are cleaned up on abort. Ensure you reference
the same basename logic used to construct $tmpAssemble and handle
missing-file/no-permission errors silently as the existing abort() does.

@lohanidamodar lohanidamodar merged commit f014be4 into main Feb 23, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants