
Very slow parsing of large .ost files due to DeflateStream.CopyTo reading to end of file #28

@dtzuul


(First - thank you for maintaining this!)

Reading relatively large .OST files of about 5 GB (with compressed data blocks) is very slow: reading the folder structure takes more than 1 minute, and reading a single message takes about 4 seconds.
Large PST files are fine.

=======
The profiler shows 95% of CPU time in `DeflateStream.CopyTo`.
This happens because `CopyTo` keeps inflating and advancing toward the end of the file on every block read request.
The attached image shows that for this particular block it inflates approximately 32 KB instead of 1040 bytes, taking 200 ms or more. This was measured against the OST file on a RAM disk, so no disk access is involved (I tried a few things to make it faster :) ).

The change below works correctly for my datasets, bringing throughput up to hundreds of messages per second.
In NDB.cs, `ReadAndDecompress`, use a read loop instead of `CopyTo`:

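```csharp
using (DeflateStream decompressionStream = new DeflateStream(fs, CompressionMode.Decompress, true))
{
    // Size the output buffer from the block's known inflated length.
    if (buffer == null)
        buffer = new byte[rb.InflatedLength];

    int curPos = 0;
    int buffLen = buffer.Length;
    // Read in batches; larger blocks use a larger batch size.
    int batchSize = buffLen > 65536 ? 16384 : 4096;

    // Stop as soon as the buffer is full, instead of letting CopyTo
    // keep inflating until the stream reports end of data.
    while (curPos < buffLen)
    {
        int count = Math.Min(batchSize, buffLen - curPos);
        int bytesRead = decompressionStream.Read(buffer, curPos, count);
        if (bytesRead == 0) break; // deflate data ended early
        curPos += bytesRead;
    }
}
```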
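For reference, here is a minimal self-contained sketch of the same idea, assuming the caller already knows each block's inflated length (as `rb.InflatedLength` provides in `ReadAndDecompress`) and that the stream is positioned at the start of raw deflate data. `DeflateBlockReader` and `InflateExactly` are hypothetical names, not part of the library. Unlike the loop in `ReadAndDecompress`, it throws if the data ends early instead of silently returning a partially filled buffer.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper, not part of the library.
public static class DeflateBlockReader
{
    // Inflates exactly `inflatedLength` bytes of raw deflate data starting at
    // the current position of `source`, then stops. CopyTo, by contrast, keeps
    // reading until the decompression stream reports end of data.
    public static byte[] InflateExactly(Stream source, int inflatedLength)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (inflatedLength < 0) throw new ArgumentOutOfRangeException(nameof(inflatedLength));

        var buffer = new byte[inflatedLength];
        // leaveOpen: true so the caller's stream (e.g. the .ost file) stays usable.
        using (var inflater = new DeflateStream(source, CompressionMode.Decompress, leaveOpen: true))
        {
            int total = 0;
            while (total < buffer.Length)
            {
                // Read may return fewer bytes than requested, so keep looping.
                int n = inflater.Read(buffer, total, buffer.Length - total);
                if (n == 0)
                    throw new EndOfStreamException(
                        $"Deflate data ended after {total} of {inflatedLength} bytes.");
                total += n;
            }
        }
        return buffer;
    }
}
```

Because `DeflateStream` reads its input in buffered chunks, the underlying stream's position after the call may be past the end of the compressed block, so the next block should be located by seeking to its offset rather than relying on the current position. On .NET 7 and later, `Stream.ReadExactly` implements the same fill-the-buffer loop and also throws `EndOfStreamException` when the data ends early.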

Here are the stream positions before and after the read, along with the file size:

[Image: stream positions before and after the read, with the file size]

Thank you again!

P.S. Sorry for not using a pull request.
