Multithreading with --long has potentially suboptimal settings #2352

@senhuang42

I've opened this issue to discuss changes to how jobLog (the value returned by ZSTDMT_computeTargetJobLog()) is determined when multithreading is used.

When compressing silesia.tar (~200MB) with -T0 -19 --long=23, @Cyan4973 discovered that as of version 1.3.5, multithreading is not truly enabled with -T0. This has led to some discussion about better settings for multithreading while using LDM, and in particular, reducing the job sizes across the board.

Currently, LDM uses chainLog exclusively to compute the job size. The table below lists some compression configurations, the jobLog that LDM currently derives for each, and the jobLog under two possible changes. I'm in favor of proposal 2 and of being more liberal with smaller job sizes. On silesia.tar, compared to no multithreading, we can compress nearly twice as fast in the -19 wlog=23, -19 wlog=27 and -22 wlog=27 cases in particular.

With -T0, I generally expect the program to err on the side of parallelizing more rather than less, and the existing 256MB default jobSize at -19 -T0 --long seems too large.

| Conf | dev, no --long | dev, --long: MAX(21, chainLog+4) | proposal 1, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+4) | proposal 2, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+3) |
| --- | --- | --- | --- | --- |
| -22 wlog=27 | 29 | 30 | 28 | 27 |
| -19 wlog=27 | 29 | 28 | 25 | 24 |
| -19 wlog=23 | 25 | 28 | 25 | 24 |
| -16 wlog=23 | 25 | 26 | 25 | 24 |
| -13 wlog=22 | 24 | 25 | 25 | 24 |
| -11 wlog=22 | 24 | 25 | 26 | 25 |
| -11 wlog=27 | 29 | 25 | 26 | 25 |
| -9 wlog=21 | 23 | 23 | 24 | 23 |
| -7 wlog=21 | 23 | 23 | 23 | 22 |
| -3 wlog=21 | 23 | 21 | 21 | 21 |
| -3 wlog=27 | 29 | 21 | 21 | 21 |
| -1 wlog=19 | 21 | 21 | 21 | 21 |
| -1 wlog=22 | 25 | 21 | 21 | 21 |
| -1 wlog=27 | 29 | 21 | 21 | 21 |
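
To make the three --long formulas in the table header concrete, here is a minimal C sketch. It is not the actual ZSTDMT_computeTargetJobLog() code. `cycle_log()` is a hypothetical stand-in for ZSTD_cycleLog() that assumes binary-tree strategies lower the log by 1 and other strategies leave it unchanged. The helper names and the `isBinaryTree` flag are invented for illustration. The sketch also assumes jobSize = 1 << jobLog, which is consistent with the 256MB figure above, and it models no upper clamp on jobLog.

```c
/* Hypothetical stand-in for ZSTD_cycleLog(hashLog, strategy).
 * Assumption: binary-tree strategies reduce the log by 1, others do not. */
static unsigned cycle_log(unsigned hashLog, int isBinaryTree)
{
    return hashLog - (isBinaryTree ? 1u : 0u);
}

static unsigned max_u(unsigned a, unsigned b)
{
    return (a > b) ? a : b;
}

/* dev, --long: MAX(21, chainLog+4) */
static unsigned job_log_dev_long(unsigned chainLog)
{
    return max_u(21u, chainLog + 4u);
}

/* proposal 1, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+4) */
static unsigned job_log_proposal1(unsigned hashLog, int isBinaryTree)
{
    return max_u(21u, cycle_log(hashLog, isBinaryTree) + 4u);
}

/* proposal 2, --long: MAX(21, ZSTD_cycleLog(hashLog, strategy)+3) */
static unsigned job_log_proposal2(unsigned hashLog, int isBinaryTree)
{
    return max_u(21u, cycle_log(hashLog, isBinaryTree) + 3u);
}

/* Assumption: the job size in bytes is 2^jobLog. */
static unsigned long long job_size_bytes(unsigned jobLog)
{
    return 1ULL << jobLog;
}
```

With the illustrative inputs chainLog=24, hashLog=22 and a binary-tree strategy (assumed values, not stated in this issue), the three functions return 28, 25 and 24, which matches the -19 wlog=23 row. `job_size_bytes(28)` is 256MB, while `job_size_bytes(24)` is 16MB, so proposal 2 would cut the job size by a factor of 16 in that configuration.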
