Fix files missing from large directories#25
Open
buttercookie42 wants to merge 4 commits into FileSysOrg:master from
Conversation
The problem is that in order to do the comparison between the file name from the iterator and the passed-in FileInfo, we always need to consume the respective Path from the DirectoryStream iterator, so even when the comparison declares success, we have already consumed that Path. This means that the next FIND_NEXT2 call will skip that entry and start one Path too late when it accesses this search context again. For correctness, we therefore have to rewind the iterator by one after having found the correct Path, so that the next nextFileInfo() call restarts at the right point. The current solution isn't the best for large directories performance-wise due to having to iterate twice over all files up to the resume point, but at least it is correct and better than omitting the first file from every FIND_NEXT2 response (or equivalent split-up Find request reply for SMB2).
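The consume-then-rewind behaviour described above can be illustrated with a minimal, self-contained sketch. The directory entries are stand-ins for the Paths produced by the DirectoryStream iterator, and `findResumeIndex` is a hypothetical helper (not part of jfileserver) that mimics searching for the resume-point entry: the comparison necessarily consumes the matching element, so the caller has to restart one entry earlier than the number of elements consumed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class RewindDemo {

    // Hypothetical stand-in for scanning the DirectoryStream iterator:
    // each comparison against the resume-point name consumes an entry.
    static int findResumeIndex(List<String> names, String resumeName) {
        Iterator<String> it = names.iterator();
        int consumed = 0;
        while (it.hasNext()) {
            String name = it.next();   // the comparison consumes this entry
            consumed++;
            if (name.equals(resumeName)) {
                // The matching entry itself has already been consumed, so
                // iteration must restart one position earlier than 'consumed',
                // otherwise the next nextFileInfo() call skips this entry.
                return consumed - 1;
            }
        }
        return -1;   // resume point not found
    }

    public static void main(String[] args) {
        List<String> dir = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");
        // Resuming at "b.txt": without rewinding by one, the next call
        // would start at "c.txt" and "b.txt" would be missing from the reply.
        System.out.println(findResumeIndex(dir, "b.txt")); // prints 1
    }
}
```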
Since I'm adding another caller for FileInfo.copyFrom(), I've had a look at that method, and it seems we ought to handle m_shortname there as well. And while we're at it, the same applies to resetInfo().
…isting

The problem with the way our search code is implemented is that when we need to split up a large search response (with lots of files) into multiple packets, at the boundary between each packet we need to retrieve the FileInfo object for the same file twice: once to find out that it no longer fits into the previous packet, and a second time to actually transmit it in the follow-up packet. The protocol handler therefore calls restartAt() on the SearchContext in order to effectively rewind it by one entry, so that the next call to nextFileInfo() during the subsequent response packet returns the correct FileInfo entry (compare also the previous commit).

With the NIO files API, this turns into a problem, because the Streams-based iterator cannot be rewound, so every time we need to backtrack by one entry, we have to reiterate through all of the directory's contents up to the desired restart point. (Even worse, the restartAt(FileInfo)-based method actually needs to iterate twice for every call.) For large directories with thousands of files (or more), this turns into a very noticeable overhead when listing the directory contents.

To work around this issue, we now cache the last returned FileInfo object and check in restartAt(FileInfo) whether the call corresponds to the common case of going back by merely one entry. If so, instead of expensively rewinding the iterator, we simply arrange for the next call to nextFileInfo() to return the previously cached FileInfo object, and only subsequently resume iterating normally through the directory.
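The caching idea can be sketched as follows. Method names like nextFileInfo() and restartAt() mirror the SearchContext API described above, but this class is a simplified, self-contained stand-in (using plain strings instead of FileInfo objects), not the real JavaNIOSearchContext implementation.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Minimal sketch, assuming a forward-only iterator like the NIO
// DirectoryStream: cache the last entry returned so that the common
// "restart one entry earlier" case avoids re-iterating the directory.
public class CachedRestartSearch {
    private final Iterator<String> iter;   // stand-in for the directory stream
    private String lastReturned;           // cache of the last entry handed out
    private String replayNext;             // set when restartAt() rewinds by one

    CachedRestartSearch(List<String> entries) {
        this.iter = entries.iterator();
    }

    String nextFileInfo() {
        if (replayNext != null) {          // cheap path: replay the cached entry
            String info = replayNext;
            replayNext = null;
            return info;
        }
        lastReturned = iter.hasNext() ? iter.next() : null;
        return lastReturned;
    }

    boolean restartAt(String info) {
        // Common case: the protocol layer backs up by exactly one entry at a
        // packet boundary. Replay the cached entry instead of re-iterating.
        if (info != null && info.equals(lastReturned)) {
            replayNext = lastReturned;
            return true;
        }
        return false;   // anything else would need the expensive full rescan
    }

    public static void main(String[] args) {
        CachedRestartSearch s = new CachedRestartSearch(Arrays.asList("a", "b", "c"));
        s.nextFileInfo();                     // "a" fits in this packet
        String b = s.nextFileInfo();          // "b" does not fit anymore
        s.restartAt(b);                       // rewind by one, no re-iteration
        System.out.println(s.nextFileInfo()); // prints "b" again
    }
}
```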
In large directories I've observed that a number of files are missing on jfileserver shares. After some more investigation, it turned out that the missing files correspond to FIND_NEXT2 requests – the first file from every FIND_NEXT2 response is missing. (And I suspect that SMB2/3 would be similarly affected if a search needs to be split up into multiple request/response pairs.)
Fixing this issue in turn exacerbates an existing performance issue in the restartAt(FileInfo) code in JavaNIOSearchContext. Due to the way our search and SMB APIs are structured, when we need to split up a large search response (with lots of files) into multiple packets, at the boundary between each packet we need to retrieve the FileInfo object for the same file twice: once to find out that it no longer fits into the previous packet, and a second time to actually transmit it in the follow-up packet. Because the NIO directory iterator cannot be rewound by one entry, we have to iterate through the whole directory up to the target file again, and due to the fix above we now actually need to do so twice.
To fix the resulting performance issue in large directories (where, on a phone with a few thousand files, a single FIND_FIRST/NEXT2 call can take hundreds of ms and usually returns at most ~100 files), I propose special-casing the common scenario of restarting the SearchContext exactly one entry earlier.