fix(importer): handle non-fixed-width multi-volume RAR volume naming #686
Merged
Multi-volume RAR sets numbered without consistent zero-padding (part01..part09 then part010..part0NN) failed to import. rardecode follows a set by computing the next volume name at the first volume's digit width (part10.rar) while the real file is part010.rar, so volume following AND the part->segment mapping both stopped after volume 9. The file imported with only ~3.5% of its segments and failed at playback with "No segments to download".

- Add internal/importer/rarname leaf package (SetKey / VolumeNumber / Scheme + regexes) as the single source of truth; archive re-exports them and rar aliases archive.* (avoids an import cycle: archive -> iso -> filesystem).
- UsenetFileSystem and DecryptingFileSystem now resolve a requested volume by (set, scheme, volume-number) when the exact filename misses, so width mismatches still follow the whole set.
- convertAggregatedFilesToRarContent and the nested map functions resolve rardecode part.Path -> NZB file the same way via a generic partLocator, so every followed volume's segments are attached.
- Add a coverage guard in AnalyzeRarContentFromNzb that fails an import when followed volume bytes are < 80% of supplied volume bytes (catches truncated sets at import instead of at playback).

Tests: unit tests for rarname, the volume_index/partLocator resolvers and the segment mapping, plus an end-to-end import battery test (TestImportBattery_RarWidthMismatchVolumeNaming) backed by a committed 14-volume fixture (part01..part09, part010..part014). Verified RED without the fallback (reproduces "no files were successfully processed") and GREEN with it.
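To make the mismatch concrete, here is a minimal Go sketch. It is not the rardecode code: `nextNameAtFixedWidth`, `actualName`, and the `movie.` prefix are hypothetical names chosen for illustration, and the sketch only assumes that the next name is formatted at the first volume's digit width, as the commit message describes.

```go
package main

import "fmt"

// nextNameAtFixedWidth is a hypothetical stand-in for a "follow the set" step
// that formats the next volume number at the first volume's digit width.
func nextNameAtFixedWidth(prefix string, width, next int) string {
	return fmt.Sprintf("%spart%0*d.rar", prefix, width, next)
}

// actualName mimics the naming of the affected sets: two-digit padding up to
// volume 9, then a literal "part0" followed by the unpadded number.
func actualName(prefix string, n int) string {
	if n < 10 {
		return fmt.Sprintf("%spart%02d.rar", prefix, n)
	}
	return fmt.Sprintf("%spart0%d.rar", prefix, n)
}

func main() {
	// The first volume is part01, so the assumed width is 2.
	for _, n := range []int{9, 10, 14} {
		computed := nextNameAtFixedWidth("movie.", 2, n)
		actual := actualName("movie.", n)
		fmt.Printf("volume %d: computed=%s actual=%s match=%t\n",
			n, computed, actual, computed == actual)
	}
}
```

Under these assumptions the computed and actual names agree through volume 9 and diverge from volume 10 onward, which is where an exact-name lookup starts to miss.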
Problem
Multi-volume RAR sets numbered without consistent zero-padding (`…part01.rar` … `…part09.rar`, then `…part010.rar` … `…part0NN.rar`, literally `part0` + the unpadded number) failed to import and play. Example: a 259-volume, 27.2 GB stored REMUX imported with only 946 MB of segments (9 volumes) and failed at playback with `No segments to download`.

Root cause
The `rardecode` fork follows a multi-volume set by computing the next volume name at the first volume's digit width (`nextNewVolName`). After `…part09.rar` (2 digits) it asks for `…part10.rar`, but the real file is `…part010.rar`. This width mismatch broke the set in two places:

- `UsenetFileSystem.Open` missed `…part10.rar` → rardecode stopped at volume 9.
- `convertAggregatedFilesToRarContent` mapped rardecode's computed `part.Path` (`…part10.rar`) back to the NZB file via exact-name lookup, so even when all volumes were followed, only `part01`–`part09` matched and the rest were dropped → import failed with `no files were successfully processed`.

Fix
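The idea behind the changes listed in this section (exact name first, then a padding-independent lookup) can be sketched as follows. This is an illustration under assumptions, not the PR's code: `partRe`, `volumeKey`, `parseVolume`, and `paddingTolerantIndex` are hypothetical names, the regex covers only the `.partNN.rar` style, and the `scheme` component of the real `(set, scheme, volume-number)` key is omitted for brevity.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// partRe matches "<set>.partNN.rar" with any digit width. Hypothetical
// pattern; the regexes in the PR's rarname package are not shown here.
var partRe = regexp.MustCompile(`(?i)^(.*)\.part(\d+)\.rar$`)

// volumeKey identifies a volume independently of zero-padding.
type volumeKey struct {
	set    string
	number int
}

// parseVolume extracts the set name and numeric volume from a file name.
func parseVolume(name string) (volumeKey, bool) {
	m := partRe.FindStringSubmatch(name)
	if m == nil {
		return volumeKey{}, false
	}
	n, err := strconv.Atoi(m[2]) // "010" and "10" both parse to 10
	if err != nil {
		return volumeKey{}, false
	}
	return volumeKey{set: strings.ToLower(m[1]), number: n}, true
}

// paddingTolerantIndex resolves exact names first, then (set, number).
type paddingTolerantIndex struct {
	byName map[string]bool
	byKey  map[volumeKey]string
}

func newIndex(names []string) *paddingTolerantIndex {
	idx := &paddingTolerantIndex{
		byName: map[string]bool{},
		byKey:  map[volumeKey]string{},
	}
	for _, name := range names {
		idx.byName[name] = true
		if k, ok := parseVolume(name); ok {
			idx.byKey[k] = name
		}
	}
	return idx
}

// resolve returns the real file name for a requested volume name.
func (idx *paddingTolerantIndex) resolve(requested string) (string, bool) {
	if idx.byName[requested] {
		return requested, true // exact hit, no fallback needed
	}
	if k, ok := parseVolume(requested); ok {
		name, found := idx.byKey[k]
		return name, found
	}
	return "", false
}

func main() {
	idx := newIndex([]string{
		"movie.part09.rar", "movie.part010.rar", "other.part010.rar",
	})
	fmt.Println(idx.resolve("movie.part10.rar"))
}
```

Keying on the set name as well as the volume number is what keeps two sets in the same NZB from resolving to each other's volumes, which is the cross-set collision case the tests mention.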
- `internal/importer/rarname`: `SetKey` / `VolumeNumber` / `Scheme` + regexes as the single source of truth. `archive` re-exports them and `rar` aliases `archive.*`. (A leaf package is required because `archive → archive/iso → filesystem`, so the helpers can't live in `archive` or `rar` without an import cycle.)
- `UsenetFileSystem` & `DecryptingFileSystem`: on an exact-name miss, `Open`/`Stat` resolve a volume by `(set, scheme, volume-number)`, so `…part10.rar` → `…part010.rar` regardless of padding width.
- `convertAggregatedFilesToRarContent` + nested `mapNestedFile*`: resolve `part.Path` → source the same way via a generic `partLocator[T]`, so every followed volume's segments are attached.
- `AnalyzeRarContentFromNzb`: fail an import when followed volume bytes are < 80% of supplied volume bytes, surfacing truncated/incomplete sets at import time instead of at playback.

Tests
- Unit: `rarname` (width-independent parsing), the `volume_index` / `partLocator` resolvers (incl. cross-set collision), the truncation guard, and the segment mapping (`TestConvertAggregatedFilesWidthMismatch`).
- End-to-end: `TestImportBattery_RarWidthMismatchVolumeNaming`, backed by a committed 14-volume fixture (`part01`…`part09`, `part010`…`part014`) generated by `testdata/gen_fixtures.sh`. Runs the full `ProcessNzbFile` pipeline over the in-memory fakepool and asserts full segment coverage.
- Verified RED without the fallback (reproduces `no files were successfully processed`) and GREEN with it.
- `go test -race ./internal/importer/...` passes; build + gofmt clean.

Reviewer notes
- Regenerating the fixture needs `rar` plus a rename step in `gen_fixtures.sh`; CI needs no `rar` (fixtures are committed, `loadManifest` `t.Skip`s if absent).
- `-m0` stored, unencrypted; `has_password=true` (from the NZB `<meta password>`) is harmless, and the `Password.txt` in such releases is a filename de-obfuscation password, not a RAR password.