Version 3 ldm opt by senhuang42 · Pull Request #4 · senhuang42/zstd

senhuang42 · 2020-09-29T22:17:46Z

Draft PR
Preliminary results regarding compressed size:

5259889920 (5GB) bytes: 5 Linux kernels concatenated: versions 4.19.148, 5.4.68, 5.8.12, 5.9-rc7, 5.8.12:
- Old: Level 22 - long mode - window log = 27
  - 622643668 → 11.84%
- Old: Level 22 - no long mode - window log = 27
  - 624737369 → 11.88%
- New: Level 22 - long mode - window log = 27
  - 622639116 → 11.84%
1995028480 (2GB) bytes: 2 relatively close Linux kernels concatenated: 5.8.12, 5.9-rc7
- Old: Level 22 - long mode - window log = 30
  - 123668734 → 6.20%
- Old: Level 22 - no long mode - window log = 30
  - 131944438 → 6.61%
- New: Level 22 - long mode - window log = 30
  - 122466509 → 6.14%
211950592 (200MB) bytes: silesia.tar
- Old: Level 22 - long mode - window log = 27
  - 52777588 → 24.90%
- Old: Level 22 - no long mode - window log = 27
  - 52701591 → 24.87%
- New: Level 22 - long mode - window log = 27
  - 52699843 → 24.86%

New algorithm approach:

Essentially, we use the LDM rawSeqStore given to us in much the same way as the current LDM insertion algorithm does. For each new block, the LDM sequence at position pos within the LDM seqStore is guaranteed to start at the same relative position as the block. That is, for each n bytes processed in the block, n bytes are also consumed in the LDM seqStore, so the member pos always points to the relevant sequence.
As such, all of our calculations are purely relative within each block, and the approach is robust against issues such as window updates.
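
To make the "purely relative" bookkeeping concrete, here is a minimal sketch of how the next LDM match could be located from the current position in the block. The `Demo*` types and `demo_nextMatchRange()` are hypothetical stand-ins written for this illustration, not the definitions used in the PR or in zstd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LDM sequence types (not zstd's own). */
typedef struct { unsigned offset, litLength, matchLength; } DemoSeq;
typedef struct { const DemoSeq* seq; size_t pos; size_t size; } DemoSeqStore;
typedef struct { unsigned startPosInBlock, endPosInBlock, offset; } DemoMatchRange;

/* Locate the match of the current sequence, relative to the block start.
 * Assumes store->seq[store->pos] begins exactly at posInBlock.
 * Returns 1 and fills *out if a match starts inside the block, else 0. */
static int demo_nextMatchRange(const DemoSeqStore* store, unsigned posInBlock,
                               unsigned blockSize, DemoMatchRange* out)
{
    const DemoSeq* s;
    unsigned start;
    assert(posInBlock <= blockSize);
    if (store->pos >= store->size) return 0;   /* no sequence left */
    s = &store->seq[store->pos];
    if (s->offset == 0) return 0;              /* literals only */
    if (s->litLength >= blockSize - posInBlock) return 0; /* match starts past the block */
    start = posInBlock + s->litLength;
    out->startPosInBlock = start;
    /* Clamp the end of the match to the end of the block. */
    out->endPosInBlock = (s->matchLength > blockSize - start)
                       ? blockSize : start + s->matchLength;
    out->offset = s->offset;
    return 1;
}
```

Because only block-relative positions and lengths are involved, nothing here depends on absolute addresses or on the window base.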

Cyan4973 · 2020-09-29T22:32:24Z

Little trick :
if you want to be sure that I'll notice your PR into your own fork,
just cc @Cyan4973 in a comment,
this will be enough to generate a notification.

Cyan4973 · 2020-09-29T22:46:47Z

Compression ratio :
These are good results, and within expected range.

Could you also add a comparison for a use case where --long improves ratio,
for both "old" and "new" implementations ?
We want to show that, on top of avoiding compression ratio regressions, which is key,
the new implementation also improves compression ratio (a bit) for favorable cases.

I believe it can be achieved by using the same tarball of linux sources,
but extending the window to 1 GB,
using command --long=30.

If you are afraid of long compression delays when using --ultra in combination with 5 GB input,
you may want to reduce the tarball to 2 kernel versions, consecutive or distant,
to reduce budget to compress around ~2 GB.

senhuang42 · 2020-09-30T00:14:09Z

@Cyan4973 Good suggestion - I've updated the original post with a benchmark comparing the case you mentioned, and indeed the new method seems to still maintain the compression ratio gains.

I will be including much more comprehensive benchmarks in the actual version of the PR against the real Zstd repo, including some rough estimates on speed.

Cyan4973 · 2020-09-30T00:59:40Z

Quick comparison on my laptop, compressing silesia.tar

| branch | conf | size | time | cpu% |
| --- | --- | --- | --- | --- |
| `v1.3.4` | -19 -T0 | 54118223 | 36.4 s | 259% |
| `v1.3.5` | -19 -T0 | 53451696 | 39.7 s | 290% |
| `v1.3.6` | -19 -T0 | 53451696 | 38.6 s | 287% |
| `v1.4.0` | -19 -T0 | 53285591 | 42.7 s | 295% |
| `v1.4.4` | -19 -T0 | 53278346 | 41.8 s | 293% |
| `dev` | -19 -T0 | 53279750 | 42.6 s | 293% |
| `ldm3` | -19 -T0 | 53279750 | 43.3 s | 296% |
| | | | | |
| `v1.3.4` | -19 -T0 --long=23 | 54312876 | 47.1 s | 162% |
| `v1.3.5` | -19 -T0 --long=23 | 53554099 | 99.5 s | 99% |
| `v1.3.6` | -19 -T0 --long=23 | 53554099 | 99.8 s | 99% |
| `v1.4.0` | -19 -T0 --long=23 | 53545769 | 101.1 s | 99% |
| `v1.4.4` | -19 -T0 --long=23 | 53542105 | 99.2 s | 99% |
| `dev` | -19 -T0 --long=23 | 53543099 | 100.3 s | 99% |
| `ldm3` | -19 -T0 --long=23 | 53276362 | 108.0 s | 99% |
| | | | | |
| `dev` | -19 -T0 --long=23 | 53537474 | 46.3 s | 288% |

Level 19 offers a full search over the 8 MB window.
In this situation, --long=23 is expected to provide basically nothing, since it targets the same window size.
This is a "hard" scenario for the ldm.
As expected, combining ldm with the optimal parser translates into a loss with current and previous versions of zstd.
In contrast, this branch manages to generate a tiny gain, proving once again that it's no longer detrimental.

What worries me though is the discovery that enabling the ldm mode kills multithreading performance.
This effect was unexpected. The combination of ldm with multithreading is supposed to work since v1.3.4.

However, this effect is unrelated to this PR. It was present before!
But it's concerning, because it indirectly impacts the usefulness of --long command in combination with high compression mode and multithreading (most users of high compression mode appreciate the multi-threading feature).

This probably warrants an investigation, though in a different follow up PR.

cc @terrelln : the feature (ldm + mt) seems to have disappeared almost immediately after its introduction into v1.3.4. By any chance, do you remember anything about it ?

edit : this is confirmed : the reduction in the number of active threads comes from the large expansion of job size when triggering the --long mode. When forcing it back to the expected 32 MB, one can observe multithreading active again.
There's probably a topic around revisiting the algorithm determining job size, though this should be a follow-up PR.

senhuang42 · 2020-09-30T15:37:20Z

+    U32 ldmStartPosInBlock = 0;
+    U32 ldmEndPosInBlock = 0;
+    U32 ldmOffset = 0;
+


@terrelln The comment below goes into more explicit detail about the surprising behavior I mentioned during our 1:1. The code below is my supposed "mitigation": it delays adding the match by adding more bytes to the litLength of the ldm.

senhuang42 · 2020-09-30T20:24:18Z

The multi-threading issue definitely seems to be just a configuration issue: multithreading was enabled when I tried compressing a 5GB file with --long -22 --ultra -T0. It does seem worth adjusting in a follow-up PR, though, since there were pretty good speed gains on silesia.tar when multithreading was enabled for ldm mode.

terrelln · 2020-09-30T20:16:06Z

+        bytesToSkip -= seq->litLength;
+        seq->litLength = 0;
+        if (bytesToSkip < seq->matchLength) {
+            seq->matchLength -= (U32)bytesToSkip;


What happens when matchLength < 3 after this line? That matchLength would be unrepresentable.

I actually allow matchLength < 3, since we will always reject any matchLength < MINMATCH later on when we try to add LDMs. I also wanted to keep the functions as simple as possible to avoid any issues with miscounting the number of bytes we actually skipped. It would definitely make sense to note this explicitly, though.
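
The trade-off described here can be sketched as follows: the skip routine only counts bytes and may leave a match shorter than the minimum, while a separate check rejects such a match at the point where it would be used. `DemoSeq`, `DEMO_MINMATCH`, `demo_skipInSequence()` and `demo_isUsableMatch()` are hypothetical names for this illustration; the value 3 is taken from the question above.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MINMATCH 3  /* assumption: mirrors the "matchLength < 3" limit discussed above */

/* Hypothetical stand-in for an LDM sequence (not zstd's own type). */
typedef struct { unsigned offset, litLength, matchLength; } DemoSeq;

/* Skip bytesToSkip bytes inside one sequence: literals first, then the match.
 * Returns the number of bytes still to be skipped in later sequences. */
static size_t demo_skipInSequence(DemoSeq* seq, size_t bytesToSkip)
{
    assert(seq != NULL);
    if (bytesToSkip <= seq->litLength) {
        seq->litLength -= (unsigned)bytesToSkip;
        return 0;
    }
    bytesToSkip -= seq->litLength;
    seq->litLength = 0;
    if (bytesToSkip < seq->matchLength) {
        /* The remainder may be shorter than DEMO_MINMATCH; that is tolerated here. */
        seq->matchLength -= (unsigned)bytesToSkip;
        return 0;
    }
    bytesToSkip -= seq->matchLength;
    seq->matchLength = 0;
    return bytesToSkip;
}

/* The rejection happens at the point of use, not while skipping. */
static int demo_isUsableMatch(const DemoSeq* seq)
{
    return seq->offset != 0 && seq->matchLength >= DEMO_MINMATCH;
}
```

Keeping the length check out of the skip routine means the byte accounting has a single code path, at the cost of having to remember the check wherever a match is consumed.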

terrelln · 2020-09-30T20:17:43Z

+    if (remainingBytes <= currSeq.litLength) {
+        currSeq.offset = 0;
+    } else if (remainingBytes < currSeq.litLength + currSeq.matchLength) {
+        currSeq.matchLength = remainingBytes - currSeq.litLength;


Same question here about matchLength < 3.
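
For reference, the boundary split in the hunk above can be isolated into a small function. This is a sketch with hypothetical names (`DemoSeq`, `demo_splitAtBoundary()`), written to show where a truncated match shorter than 3 bytes can come from; it is not the code from the PR.

```c
#include <assert.h>

/* Hypothetical stand-in for an LDM sequence (not zstd's own type). */
typedef struct { unsigned offset, litLength, matchLength; } DemoSeq;

/* Return the part of seq that fits in the remainingBytes left in the block.
 * offset == 0 in the result means that no match lands in this block. */
static DemoSeq demo_splitAtBoundary(DemoSeq seq, unsigned remainingBytes)
{
    if (remainingBytes <= seq.litLength) {
        seq.offset = 0;   /* only literals fit before the block ends */
    } else if (remainingBytes - seq.litLength < seq.matchLength) {
        /* The match straddles the boundary: keep only the part inside the block.
         * The kept part can be as short as 1 byte. */
        seq.matchLength = remainingBytes - seq.litLength;
    }
    return seq;
}
```

With a sequence of 10 literals and a 20-byte match, 12 remaining bytes yield a 2-byte match, which is the case the question is about.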

terrelln · 2020-09-30T20:28:38Z

+    U32 ldmStartPosInBlock = 0;
+    U32 ldmEndPosInBlock = 0;
+    U32 ldmOffset = 0;
+


@senhuang42 I see the problem now. This is not the correct solution. This will likely break on large files.

I believe this is caused by btultra2, which does 2 passes over the first block. You can double check by seeing if the problem goes away at level 18, which uses btultra, not btultra2. You'll need to reset the LDM seqStore after the first pass, to re-consume the sequences.

https://github.com/facebook/zstd/blob/cc88eb7594d9c70ec2440ce6122f7861dbd64af2/lib/compress/zstd_opt.c#L1123-L1129
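
A minimal sketch of why a second pass needs the reset, assuming a store whose read position is tracked by a cursor rather than by modifying the sequences. `DemoCursorStore`, `demo_consume()` and `demo_reset()` are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins (not zstd's own types). */
typedef struct { unsigned offset, litLength, matchLength; } DemoSeq;
typedef struct {
    const DemoSeq* seq;
    size_t size;           /* number of sequences */
    size_t pos;            /* index of the sequence being consumed */
    size_t posInSequence;  /* bytes already consumed inside seq[pos] */
} DemoCursorStore;

/* Consume up to nbBytes; returns how many bytes the store could supply. */
static size_t demo_consume(DemoCursorStore* st, size_t nbBytes)
{
    size_t consumed = 0;
    assert(st != NULL);
    while (consumed < nbBytes && st->pos < st->size) {
        size_t const seqLen = (size_t)st->seq[st->pos].litLength
                            + st->seq[st->pos].matchLength;
        size_t const left = seqLen - st->posInSequence;
        size_t const want = nbBytes - consumed;
        if (want < left) {            /* stop in the middle of this sequence */
            st->posInSequence += want;
            return nbBytes;
        }
        consumed += left;             /* finish this sequence, move to the next */
        st->pos++;
        st->posInSequence = 0;
    }
    return consumed;
}

/* Rewind the cursor so that the same block can be parsed a second time. */
static void demo_reset(DemoCursorStore* st)
{
    st->pos = 0;
    st->posInSequence = 0;
}
```

Without the call to `demo_reset()` between the two passes, the second pass starts with an exhausted store and sees no LDM sequences for the first block.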

terrelln · 2020-09-30T20:34:39Z

@senhuang42 LDM + MT at level 19 will create 256 MB jobs by default. The code controlling it is here

https://github.com/facebook/zstd/blob/cc88eb7594d9c70ec2440ce6122f7861dbd64af2/lib/compress/zstdmt_compress.c#L1188

You can control it on the CLI with -B. E.g. a 32MB job size is -B32MB.

We should use the cycleLog instead of the chainLog. That will reduce it to a 128 MB job size. Then we could consider using cycleLog+3 instead of +4.

We could also take the strategy into account, and have a more fine grained approach.
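
The sizes quoted above follow from a power-of-two computation. The sketch below reproduces the arithmetic under two assumptions that are not stated in the thread: that chainLog is 24 at level 19 (inferred from the 256 MB figure, since 2^(24+4) bytes is 256 MB), and that cycleLog is one less than chainLog for binary-tree strategies (consistent with the 128 MB figure). The function names are hypothetical.

```c
#include <assert.h>

/* Job size in bytes when derived as 2^(tableLog + extraLog). */
static unsigned long long demo_jobSizeBytes(unsigned tableLog, unsigned extraLog)
{
    assert(tableLog + extraLog < 64);
    return 1ULL << (tableLog + extraLog);
}

/* Assumption: binary-tree strategies use one bit less than chainLog. */
static unsigned demo_cycleLog(unsigned chainLog, int usesBinaryTree)
{
    return chainLog - (usesBinaryTree ? 1u : 0u);
}
```

Under these assumptions, chainLog + 4 gives 256 MB, cycleLog + 4 gives 128 MB, and cycleLog + 3 gives 64 MB.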

senhuang42 · 2020-09-30T21:17:42Z

@terrelln the note about btultra2 was super helpful and cleared up this mystery that had been bugging me the entire time. Thanks!

senhuang42 added 22 commits September 26, 2020 11:07

Add rawSeqStore to match state

6c3c86e

Modify codepath to use opt parser exclusively if the compression level is high enough

ef464e3

Add ldm helper function declarations into opt parser

1ded541

Add callsites to appropriate locations in ..opt_generic()

cfa4afb

Flesh out required args for ldm_handleLdm()

848bfa9

Implement basic splitSequence and skipSequence functions

8d03d44

Implement ldm_getNextMatch()

bd02f2c

Implement part of ldm_maybeAddLdm()

26b6859

Get zstd to build with new functions and callsites, fix arguments

fa1501f

Add debug statements, flesh out functions

4262d70

Add re-copying of ldmSeqStore after processing

98378bc

Added more debugging

c4c0652

Add initial getNextMatch() in opt parser

5643c4d

Add base adjustment correction

753a90e

Adjustments to no longer segfault on nci

55930bb

Fix function argument to getNextMatch()

b2390a4

Add a function ldm_voidSequences()

6e02a1d

Add proper bounds check on adding ldms

c8dcc46

Fixed end of match boundary update issues

2b76c76

Fixed sifting algorithm

48a2886

Cleanups, add comments and explanations

fd10641

ldm_getNextMatch fixed return values

d59921e

senhuang42 added 2 commits September 30, 2020 11:06

Address mixed variables C90 warning

c6b4e63

Add extra bounds check to prevent heap access after free ASAN error

ebf8794

senhuang42 force-pushed the version_3_ldm_opt branch from 35e9845 to ebf8794 Compare September 30, 2020 15:32

senhuang42 commented Sep 30, 2020

senhuang42 added 2 commits September 30, 2020 12:54

Prevent duplicate LDMs from being inserted

c07f0fe

Add unit tests into playTests.sh

3ceaf38

terrelln reviewed Sep 30, 2020

senhuang42 added 7 commits September 30, 2020 17:20

Remove rawSeqStore.base and add rawSeqStore.posInSequence

5f2ef90

Add ldm_calculateMatchRange() function

84b1872

Adjustments to ldm_calculateMatchRange() to calculate bounds correctly

8a511ef

Refactor existing functions to use posInSequence

4c35482

Reset ldmSeqStore after initStats_ultra() pass for btultra2

ed40d4e

Correct matchLength calculation and remove unnecessary functions

16fea7c

Improve documentation of relevant structs

04a5763

senhuang42 force-pushed the version_3_ldm_opt branch from d6ff12c to de77b83 Compare October 1, 2020 15:56

Make function descriptions more accurate

c979b2c

senhuang42 force-pushed the version_3_ldm_opt branch 2 times, most recently from ce09360 to 167276d Compare October 1, 2020 16:49

Add DEBUGLOG() calls in ldm helpers

05fd1be

senhuang42 force-pushed the version_3_ldm_opt branch from 167276d to 05fd1be Compare October 1, 2020 16:49

Prefix new static ldm helpers with ZSTD_opt

09eedf6

senhuang42 force-pushed the version_3_ldm_opt branch from a7c9887 to 09eedf6 Compare October 1, 2020 22:46

senhuang42 mentioned this pull request Oct 12, 2020

Multithreading with --long has potentially suboptimal settings facebook/zstd#2352

Closed
