I've opened this issue to discuss some changes we can make to the determination of `jobLog` (the value that `ZSTDMT_computeTargetJobLog()` returns) when using multithreading.
When compressing `silesia.tar` (~200MB) with `-T0 -19 --long=23`, @Cyan4973 discovered that as of version `1.3.5`, multithreading is not truly enabled with `-T0`. This has led to some discussion about better settings for multithreading while using LDM, and in particular, reducing the job sizes across the board.
Currently, LDM uses `chainLog` exclusively to compute the job size. Here, I present a table of some compression configurations showing the `jobLog` that LDM currently derives, alongside two possible changes that we could make. I'm in favor of proposal 2 and of being more liberal with smaller job sizes. On `silesia.tar`, we can compress nearly twice as fast as with no multithreading, in particular in the `-19 wlog=23`, `-19 wlog=27` and `-22 wlog=27` cases.
With `-T0`, I generally expect the program to err on the side of parallelizing more rather than less, and the existing 256MB default job size at `-19 -T0 --long` seems too large.
| Configuration | dev, no `--long` | dev, `--long`: `MAX(21, chainLog+4)` | proposal 1, `--long`: `MAX(21, ZSTD_cycleLog(hashLog, strategy)+4)` | proposal 2, `--long`: `MAX(21, ZSTD_cycleLog(hashLog, strategy)+3)` |
| --- | --- | --- | --- | --- |
| -22 wlog=27 | 29 | 30 | 28 | 27 |
| -19 wlog=27 | 29 | 28 | 25 | 24 |
| -19 wlog=23 | 25 | 28 | 25 | 24 |
| -16 wlog=23 | 25 | 26 | 25 | 24 |
| -13 wlog=22 | 24 | 25 | 25 | 24 |
| -11 wlog=22 | 24 | 25 | 26 | 25 |
| -11 wlog=27 | 29 | 25 | 26 | 25 |
| -9 wlog=21 | 23 | 23 | 24 | 23 |
| -7 wlog=21 | 23 | 23 | 23 | 22 |
| -3 wlog=21 | 23 | 21 | 21 | 21 |
| -3 wlog=27 | 29 | 21 | 21 | 21 |
| -1 wlog=19 | 21 | 21 | 21 | 21 |
| -1 wlog=22 | 25 | 21 | 21 | 21 |
| -1 wlog=27 | 29 | 21 | 21 | 21 |
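
To make the three `--long` formulas in the table headers easier to compare, here is a minimal standalone sketch. It is not the zstd source: every `sk_` name is hypothetical, the strategy enum is a stand-in whose ordering I'm assuming matches v1.3.5, and `sk_cycle_log()` encodes my understanding that `ZSTD_cycleLog()` subtracts one for the binary-tree strategies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zstd's strategy enum (ordering assumed to match
 * v1.3.5); only "is this a binary-tree strategy?" matters below. */
typedef enum {
    SK_FAST = 1, SK_DFAST, SK_GREEDY, SK_LAZY, SK_LAZY2,
    SK_BTLAZY2, SK_BTOPT, SK_BTULTRA
} sk_strategy;

unsigned sk_max(unsigned a, unsigned b) { return a > b ? a : b; }

/* Assumed behaviour of ZSTD_cycleLog(): binary-tree strategies get a cycle
 * that is one bit shorter than the log passed in. */
unsigned sk_cycle_log(unsigned hash_log, sk_strategy strategy)
{
    unsigned const bt_scale = (strategy >= SK_BTLAZY2) ? 1u : 0u;
    assert(hash_log >= bt_scale);   /* guard against unsigned wrap-around */
    return hash_log - bt_scale;
}

/* dev, --long: MAX(21, chainLog+4) */
unsigned sk_job_log_dev_long(unsigned chain_log)
{
    return sk_max(21, chain_log + 4);
}

/* proposal 1 (margin = 4) and proposal 2 (margin = 3):
 * MAX(21, ZSTD_cycleLog(hashLog, strategy) + margin) */
unsigned sk_job_log_proposal(unsigned hash_log, sk_strategy strategy,
                             unsigned margin)
{
    return sk_max(21, sk_cycle_log(hash_log, strategy) + margin);
}

/* jobLog is a base-2 logarithm, so the job size is 1 << jobLog bytes. */
size_t sk_job_size_bytes(unsigned job_log)
{
    return (size_t)1 << job_log;
}
```

For scale, `sk_job_size_bytes(28)` is 268435456 bytes, i.e. the 256MB job size mentioned above for `-19 --long`, while a `jobLog` of 24 gives 16MB jobs. The inputs you pass in are up to you; this sketch does not reproduce the actual per-level compression parameters.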