feat(libstore): make cycle errors more verbose#6
Open
milahu wants to merge 61 commits into
Open
Conversation
Co-Authored-By: Milan Hauth <milahu@gmail.com>
Previously, `CycleEdgeScanSink::operator()` copied the entire `getResult()` `StringSet` twice on every 64KB chunk to detect newly found hashes. For large files, this created O(n * chunks) overhead. Now we track which hashes have been recorded for the current file using `recordedForCurrentFile`, avoiding the set copies. The insert() returns true only for newly seen hashes, making this O(1) per hash found.
The `hashPathMap` was being passed to `CycleEdgeScanSink` and stored as a member variable, but was never actually used. The sink only needs the hash strings for detection via `RefScanSink`, not the full `StorePath` mapping.
Moved the `edges` member variable from public to private section for better encapsulation. Access is provided through the `getEdges()` method.
Replaced raw `read()` with `readFull()` helper, which properly handles partial reads and `EINTR`. The previous code manually checked for errors but didn't handle the case where `read()` returns fewer bytes than requested.
After refactoring to use `RefScanSink`, we no longer manually search for hashes in buffers, so the `refLength` constant (hash length) is unused. `RefScanSink` handles this internally.
The single-pass greedy algorithm could fail to connect all edges if they arrived in certain orders. For example, edges A→B, C→D, D→A, B→C would result in two paths [D→A→B→C] and [C→D] instead of one complete cycle. Added a second pass that repeatedly tries to merge existing multiedges with each other until no more merges are possible. This ensures we find complete cycle paths regardless of edge discovery order.
…File APIs Replaced POSIX-specific file operations with portable alternatives to improve Windows compatibility.
Added explicit `#include "nix/util/archive.hh"` since we use the caseHackSuffix constant from it.
Replace string concatenation for path joining with type-safe `std::filesystem::path operator/`.
…anPath The "2" suffix was unclear and didn't communicate the function's purpose. The new name better describes what it does: walks the filesystem tree and scans each file using the provided sink.
Changed `walkAndScanPath` to take `std::filesystem::path` instead of `string`, eliminating repeated string→path conversions throughout the function.
…acOS only Since caseHackSuffix is only used inside `#ifdef __APPLE__` blocks, gate the archive.hh include with the same condition.
Move `find-cycles.hh` to `src/libstore/include/nix/store/build/` to ensure it is properly installed as a public header and can be used by tests and other consumers of the library.
Replace the multi-pass O(n²) algorithm with a single-pass O(n) approach using hashmaps to track path endpoints. Before: - First pass: O(n*m) to join each edge with existing multiedges - Second pass: O(m²) repeated merging until no more merges possible - Required multiple iterations through the entire dataset Now: - Single O(n) pass through edges - Two hashmaps (pathStartingAt, pathEndingAt) enable O(1) lookups - Immediately identifies connections without linear searches - Handles all cases in one pass: * Extending existing paths (prepend/append) * Joining two separate paths * Forming cycles when a path connects to itself
Replace direct `std::filesystem` operations with `SourceAccessor`..
revert parts of feat(libstore/find-cycles): try to find file paths revert back to type using StoreCycleEdge = std::deque<std::string>; restore the function transformEdgesToMultiedges
before this change, also buildInputs were found as cycle edges but actual cycles cannot include buildInputs because buildInputs dont know this derivation's output paths fix failing test cyclic-fullpaths-buildInputs
fix failing test cyclic-fullpaths-self
fix failing test cyclic-fullpaths-duplicate
so the file path with the shortest derivation path comes first
2b370ff to
e5e567d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
continue NixOS#14218
example output with c1ea0d4
note: to make this work i had to patch multiple-outputs.nix
if
getDetailedCycleErrorfails to find a cyclethen we also show an internal error
real world example: libtorrent
edges 1 + 2 = cycle: out → dev → out
edge 3 may be a false positive
ping @lovesegfault @ConnorBaker @xokdvium @Ericson2314 @edolstra @Dessix @nrdxp @SuperSandro2000 @jonringer @copumpkin @vcunat @datakurre @jtojnar @Artturin @arianvp @charles-dyfis-net @peterwaller-arm @abbradar @gahara @jb55