pydiskmark — Implementation Specification

Source of truth: jdm-java (smart branch) as of 2026-06-20. This document describes the behaviour of the Java implementation so that a functionally-equivalent Python port can be built without referencing the Java source directly.

1. Overview

pydiskmark is a cross-platform disk benchmark utility. It measures sustained sequential and random I/O performance and reports bandwidth (MB/s), latency (ms/block), and IOPS.

The tool runs in two modes:

CLI — primary mode; invoked with python -m pydiskmark run.
GUI — desktop interface built with Tkinter + matplotlib; launch with python -m pydiskmark gui.

2. Terminology

Term	Meaning
Benchmark	A single top-level run; may contain one WRITE and/or one READ operation.
Operation	One directed I/O phase — either WRITE or READ — within a benchmark.
Sample	A single timed measurement unit: writes/reads `numBlocks` blocks and records elapsed time.
Block	The atomic I/O unit; size is configurable (e.g. 512 KB, 4 KB).
txSize	Total KB transferred by one operation = `blockSizeKb × numBlocks × numSamples`.
Bandwidth	MB/s = bytes-transferred / elapsed-seconds.
Latency	Average time per block in milliseconds = `elapsed_ns / 1_000_000 / numBlocks`.
IOPS	Blocks (ops) per second across the entire operation = `totalOps / elapsed_sec`.
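
As a worked illustration of the bandwidth, latency, and IOPS formulas in the table, the sketch below computes all three for one timed interval. The function name `sample_metrics` and its parameters are hypothetical and exist only for this example; the spec reports IOPS per operation, so applying the formula to a single interval here is purely for illustration.

```python
MEGABYTE = 1_024 * 1_024  # bytes, binary units as in §3


def sample_metrics(block_size: int, num_blocks: int, elapsed_ns: int) -> dict[str, float]:
    """Derive bandwidth, latency and IOPS for num_blocks blocks of block_size bytes."""
    elapsed_sec = elapsed_ns / 1e9
    bytes_transferred = block_size * num_blocks
    return {
        "bw_mb_sec": (bytes_transferred / MEGABYTE) / elapsed_sec,
        "latency_ms": (elapsed_ns / 1_000_000) / num_blocks,
        "iops": num_blocks / elapsed_sec,
    }


# 32 blocks of 512 KB transferred in 40 ms
metrics = sample_metrics(block_size=512 * 1024, num_blocks=32, elapsed_ns=40_000_000)
print(metrics)
```

With these inputs the transfer is 16 MB in 0.04 s, so the bandwidth is about 400 MB/s, the latency 1.25 ms per block, and the rate about 800 blocks per second.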

3. Constants & Units

KILOBYTE = 1_024          # bytes
MEGABYTE = 1_024 * 1_024  # bytes
GIGABYTE = 1_024 * 1_024 * 1_024
APP_NAME  = "pydiskmark"
DATADIRNAME = "pdm-data"  # sub-directory created inside the chosen location
PROPERTIES_FILENAME = "pdm.properties"

Binary units (powers of 2) are used throughout — never decimal SI units.

4. Enumerations

4.1 `BenchmarkType`

READ       — read-only benchmark (files written silently in a prep phase)
WRITE      — write-only benchmark
READ_WRITE — write phase followed by a read phase on the same data

4.2 `IOMode`

READ
WRITE

4.3 `BlockSequence`

SEQUENTIAL — blocks accessed in order 0, 1, 2 … N-1
RANDOM     — each block index chosen via randint(0, numBlocks-1)

4.4 `IoEngine`

MODERN — positional I/O via os.pwrite / os.pread (POSIX) or
          CreateFileW / WriteFile (Windows), with optional Direct I/O
          (O_DIRECT on Linux; F_NOCACHE on macOS; FILE_FLAG_NO_BUFFERING on Windows).
          Aligned buffers are required when Direct I/O is active.

4.5 `SectorAlignment`

Constant	Bytes	Display Label
`NONE`	-1 (OS default)	None (OS Default)
`ALIGN_512`	512	512 B (Legacy)
`ALIGN_4K`	4 096	4 KB (Standard) — default
`ALIGN_8K`	8 192	8 KB (Enterprise)
`ALIGN_16K`	16 384	16 KB (High-End)
`ALIGN_64K`	65 536	64 KB (RAID/Stripe)

Each enum member stores a (bytes, display) tuple, with .bytes and .display properties and a __str__ that returns the display label.
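
One possible shape for this enum, shown as a sketch rather than the required implementation. The member values come from the table above (the "default" annotation on `ALIGN_4K` is left out of the label); everything else is an assumption about how the port might express it.

```python
from enum import Enum


class SectorAlignment(Enum):
    # Each member value is a (bytes, display) tuple
    NONE = (-1, "None (OS Default)")
    ALIGN_512 = (512, "512 B (Legacy)")
    ALIGN_4K = (4_096, "4 KB (Standard)")
    ALIGN_8K = (8_192, "8 KB (Enterprise)")
    ALIGN_16K = (16_384, "16 KB (High-End)")
    ALIGN_64K = (65_536, "64 KB (RAID/Stripe)")

    @property
    def bytes(self) -> int:
        return self.value[0]

    @property
    def display(self) -> str:
        return self.value[1]

    def __str__(self) -> str:
        return self.display


print(SectorAlignment.ALIGN_4K.bytes, str(SectorAlignment.ALIGN_4K))
```

Looking a member up by name, e.g. `SectorAlignment["ALIGN_64K"]`, is convenient when parsing the `sectorAlignment` property or the `--alignment` option.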

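When the MODERN engine is used with Direct I/O, the write/read buffer must be aligned to the selected value. In Python, VirtualAlloc (Windows) returns memory aligned to the 64 KB allocation granularity, which satisfies every alignment value. mmap (POSIX) returns page-aligned memory (commonly 4 KB), which satisfies alignments up to the page size; for larger values, over-allocate and offset into the mapping.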
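
A minimal sketch of one way to obtain a buffer at a chosen alignment with an anonymous `mmap`: allocate `alignment` extra bytes and skip forward to the first aligned address. The name `alloc_aligned` matches the helper listed for io_engine.py in §17, but the signature and return value here are assumptions made for illustration.

```python
import ctypes
import mmap


def alloc_aligned(size: int, alignment: int) -> tuple[memoryview, int]:
    """Return (writable view of `size` bytes, start address) with address % alignment == 0."""
    if alignment <= 0:                                # NONE (-1): accept the mapping as-is
        alignment = 1
    mapping = mmap.mmap(-1, size + alignment)         # anonymous, page-aligned mapping
    base = ctypes.addressof(ctypes.c_char.from_buffer(mapping))
    offset = (-base) % alignment                      # bytes to skip to reach alignment
    view = memoryview(mapping)[offset:offset + size]  # the view keeps the mapping alive
    return view, base + offset


buffer, address = alloc_aligned(size=512 * 1024, alignment=65_536)
print(len(buffer), address % 65_536)
```

The returned view can be passed directly to `os.pwrite` or filled by `os.preadv`; releasing the view and letting the mapping be garbage-collected frees the memory.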

5. Data Model

5.1 `BenchmarkConfig` (parameter snapshot)

Captured once at benchmark start; immutable during a run.

Field	Type	Default	Notes
`app_version`	str	from version file
`profile`	BenchmarkProfile	QUICK_TEST
`profile_modified`	bool	False
`benchmark_type`	BenchmarkType	READ_WRITE
`block_order`	BlockSequence	SEQUENTIAL
`num_blocks`	int	32	blocks per sample
`block_size`	int	bytes (e.g. 512*1024)	`block_size_kb * KILOBYTE`
`num_samples`	int	200
`num_threads`	int	1
`tx_size`	int	KB	`block_size_kb * num_blocks * num_samples`
`io_engine`	IoEngine	MODERN
`direct_io_enabled`	bool	False
`write_sync_enabled`	bool	False
`sector_alignment`	SectorAlignment	ALIGN_4K
`multi_file_enabled`	bool	True
`test_dir`	str	path	absolute path to data directory

Helper predicates:

def has_write_operation(self) -> bool:
    return self.benchmark_type in (BenchmarkType.WRITE, BenchmarkType.READ_WRITE)

def has_read_operation(self) -> bool:
    return self.benchmark_type in (BenchmarkType.READ, BenchmarkType.READ_WRITE)

5.2 `BenchmarkSystemInfo`

Captured once at benchmark start from the runtime environment.

Field	Type	Notes
`os`	str	`platform.system()`
`arch`	str	`platform.machine()`
`processor_name`	str	CPU brand string
`runtime`	str	`"Python x.y.z"`; replaces Java's `jdk` field
`location_dir`	str	absolute path of the test location

5.3 `BenchmarkDriveInfo`

Field	Type	Notes
`drive_model`	str	human-readable device model
`partition_id`	str	drive letter (Windows) or partition path (Linux/macOS)
`percent_used`	int	0–100
`used_gb`	float
`total_gb`	float

5.4 `Sample`

One timed I/O measurement.

Field	JSON key	Type	Notes
`type`	—	IOMode	not serialised
`sample_num`	`sn`	int	1-based sample index
`bw_mb_sec`	`bw`	float	bandwidth for this sample (MB/s)
`cum_avg`	`bt`	float	running cumulative average bandwidth
`cum_max`	`mx`	float	running cumulative maximum
`cum_min`	`mn`	float	running cumulative minimum
`access_time_ms`	`la`	float	latency: elapsed_ns/1e6 / num_blocks
`cum_acc_time_ms`	`lt`	float	running cumulative average latency

All float fields are rounded to 4 decimal places in JSON/YAML/CSV output.
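
The key mapping and rounding rule can be sketched as follows. The `Sample` dataclass here is a cut-down stand-in for the real class, and `sample_to_dict` is a hypothetical helper name; only the field names, JSON keys, and the 4-decimal rule come from the table above.

```python
from dataclasses import dataclass

# Field name -> JSON key, as listed in §5.4 (`type` is not serialised)
JSON_KEYS = {
    "sample_num": "sn", "bw_mb_sec": "bw", "cum_avg": "bt", "cum_max": "mx",
    "cum_min": "mn", "access_time_ms": "la", "cum_acc_time_ms": "lt",
}


@dataclass
class Sample:
    sample_num: int
    bw_mb_sec: float = 0.0
    cum_avg: float = 0.0
    cum_max: float = 0.0
    cum_min: float = 0.0
    access_time_ms: float = 0.0
    cum_acc_time_ms: float = 0.0


def sample_to_dict(sample: Sample) -> dict:
    """Map fields to their short JSON keys, rounding floats to 4 decimal places."""
    out = {}
    for field_name, key in JSON_KEYS.items():
        value = getattr(sample, field_name)
        out[key] = round(value, 4) if isinstance(value, float) else value
    return out


print(sample_to_dict(Sample(1, bw_mb_sec=523.456789, access_time_ms=0.9551234)))
```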

5.5 `BenchmarkOperation`

Aggregated results of one I/O phase.

Field	Type	Notes
`io_mode`	IOMode	READ or WRITE
`block_order`	BlockSequence
`num_blocks`	int
`block_size`	int	bytes
`num_samples`	int
`tx_size`	int	KB
`num_threads`	int
`write_sync_enabled`	bool \| None	None for READ operations
`start_time`	datetime	set on construction
`end_time`	datetime \| None	set after all samples complete
`samples`	list[Sample]	all samples in insertion order
`bw_avg`	float	final average bandwidth (MB/s)
`bw_max`	float
`bw_min`	float
`acc_avg`	float	final average latency (ms)
`iops`	int	total blocks / elapsed seconds (see §8.4)

Display helpers:

def get_mode_display(self) -> str:
    if self.io_mode == IOMode.WRITE and self.write_sync_enabled:
        return "Write*"
    return self.io_mode.value

def get_duration_ms(self) -> int | None:
    if self.end_time is None:
        return None
    return int((self.end_time - self.start_time).total_seconds() * 1000)

5.6 `Benchmark`

Top-level container for a complete benchmark run.

Field	JSON key	Type	Notes
`id`	`_id`	str (UUID4)	primary key; string in JSON
`username`	—	str	`os.getlogin()` or `"anonymous"`
`system_info`	—	BenchmarkSystemInfo
`drive_info`	—	BenchmarkDriveInfo
`config`	—	BenchmarkConfig
`start_time`	—	datetime	set by `record_start_time()`
`end_time`	—	datetime \| None	set by `record_end_time()`
`operations`	—	list[BenchmarkOperation]	1 or 2 elements

Result text output format:

-------------------------------------------
pydiskmark Benchmark Results (vX.Y)
-------------------------------------------
Profile: <name>
Benchmark: <type>
Drive: <model>
Capacity: <percent>% (<used>/<total> GB)
Timestamp: <start_time>
CPU: <processor_name>
System: <os> / <arch>
Runtime: <runtime_string>
Path: <location_dir>
-------------------------------------------
Order: SEQUENTIAL|RANDOM
IOMode: Read|Write
Thread(s): N
Blocks(size): N(B)
Samples: N
TxSize(KB): N
Speed(MB/s): N.NN
SpeedMin(MB/s): N.NN
SpeedMax(MB/s): N.NN
Latency(ms): N.NN
IOPS: N
-------------------------------------------

6. Pre-defined Profiles (`BenchmarkProfile`)

Each profile is a named constant with fixed defaults. CLI users can override individual parameters without switching profiles.

Symbol	Name	Type	Order	Threads	Samples	Blocks	BlockKB	Direct	Sync	Alignment	MultiFile
`QUICK_TEST`	Quick Test	READ_WRITE	SEQ	1	50	32	1024	Yes	No	4K	No
`MAX_THROUGHPUT`	Max Throughput	READ_WRITE	SEQ	1	100	256	1024	Yes	No	4K	No
`HIGH_LOAD_RANDOM_T32`	Random 4K (T32)	READ_WRITE	RAND	32	200	128	4	Yes	No	4K	Yes
`LOW_LOAD_RANDOM_T1`	Random 4K (T1)	READ_WRITE	RAND	1	150	64	4	Yes	No	4K	No
`MAX_WRITE_STRESS`	Max Write Stress (T4)	WRITE	SEQ	4	250	512	512	Yes	Yes	4K	Yes
`MEDIA_PLAYBACK`	Media Playback	READ	SEQ	1	160	64	2048	Yes	No	4K	No
`VIDEO_EXPORTING`	Video Exporting	WRITE	SEQ	4	500	128	1024	Yes	No	4K	No
`PHOTO_LIBRARY`	Photo Library	READ	RAND	8	1000	8	128	Yes	No	4K	Yes

7. Test File Layout

<location_dir>/
└── pdm-data/               ← DATADIRNAME
    ├── testdata.pdm         ← single-file mode
    └── testdata0.pdm        ← multi-file mode (one per sample)
        testdata1.pdm
        …
        testdataN.pdm

File path selection (per sample):

def get_test_file(sample_num: int, config: BenchmarkConfig) -> Path:
    if config.multi_file_enabled:
        return Path(config.test_dir) / f"testdata{sample_num}.pdm"
    return Path(config.test_dir) / "testdata.pdm"

8. Core Benchmark Algorithm

8.1 Thread Range Partitioning

Samples are divided evenly across threads. Remainder samples are distributed one-per-thread to the leading threads.

def divide_into_ranges(start_index: int, end_index: int, num_threads: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) ranges, one per thread; end is exclusive."""
    n = end_index - start_index
    ranges = []
    range_size, remainder = divmod(n, num_threads)
    start = start_index
    for i in range(num_threads):
        end = start + range_size + (1 if remainder > 0 else 0)
        remainder = max(0, remainder - 1)
        ranges.append((start, end))
        start = end
    return ranges
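
A short worked example of the partitioning rule. The function is repeated so the snippet runs on its own; the two calls show the remainder going to the leading thread, and the empty ranges that result when there are more threads than samples.

```python
def divide_into_ranges(start_index: int, end_index: int, num_threads: int) -> list[tuple[int, int]]:
    """Same logic as §8.1: half-open ranges, remainder spread over leading threads."""
    range_size, remainder = divmod(end_index - start_index, num_threads)
    ranges = []
    start = start_index
    for _ in range(num_threads):
        end = start + range_size + (1 if remainder > 0 else 0)
        remainder = max(0, remainder - 1)
        ranges.append((start, end))
        start = end
    return ranges


# 10 samples numbered 1..10 over 3 threads: the first thread takes the extra sample
print(divide_into_ranges(1, 11, 3))   # [(1, 5), (5, 8), (8, 11)]

# 2 samples over 4 threads: trailing threads receive empty ranges
print(divide_into_ranges(1, 3, 4))    # [(1, 2), (2, 3), (3, 3), (3, 3)]
```

A worker handed an empty range such as `(3, 3)` simply performs no iterations, so no special casing is needed.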

8.2 `BenchmarkRunner.execute()` — Top-Level Orchestration

1. Compute total units (for progress bar):
   blocks_per_phase = num_blocks × num_samples
   w_units = blocks_per_phase  if has_write OR is READ (prep phase reuses write counter)
   r_units = blocks_per_phase  if has_read
   units_total = w_units + r_units

2. Query drive info (model, partition, disk usage).

3. Construct Benchmark object; populate system/drive info.

4. Partition sample range into thread ranges:
   start = next_sample_number (1-based, global counter)
   end   = start + num_samples
   ranges = divide_into_ranges(start, end, num_threads)

5. record_start_time()

6. Execution:
   a. If has_write:  run_operation(WRITE, ranges)
   b. Else (READ-only): run_read_preparation(ranges)  # writes files silently

7. Force a progress update, bypassing the throttle (throttled_progress_update(force=True)), so the listener sees the final count for the phase.

8. Cache drop (if not cancelled AND has_read AND (not direct_io OR OS is macOS)):
   call listener.attempt_cache_drop()

9. If has_read AND not cancelled: run_operation(READ, ranges)

10. record_end_time()

11. Return benchmark object.
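
The cache-drop condition in step 8 is easy to get wrong because of operator precedence. A sketch of it as a predicate, read so that it agrees with the skip rules in §10.1 (the function and parameter names are hypothetical):

```python
def should_drop_cache(cancelled: bool, has_read: bool, direct_io: bool, is_macos: bool) -> bool:
    """True when the page cache should be dropped before the READ phase."""
    if cancelled or not has_read:
        return False
    # Direct I/O bypasses the page cache, except on macOS where a drop is still attempted
    return (not direct_io) or is_macos


print(should_drop_cache(cancelled=False, has_read=True, direct_io=True, is_macos=False))
```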

8.3 `run_operation(mode, ranges)` — Single I/O Phase

1. Create BenchmarkOperation for this mode (copy params from config).

2. Launch one thread per range.

3. Per thread, for each sample index s in range:
   a. Create Sample(type=mode, sample_num=s)
   b. Perform I/O (measure_write or measure_read)
   c. update_metrics(sample)  — update global running stats
   d. Update op cumulative stats: bw_max, bw_min, bw_avg, acc_avg
   e. op.samples.append(sample)
   f. Increment write_units_complete or read_units_complete (thread-safe)
   g. listener.on_sample_complete(sample)
   h. throttled_progress_update()

4. Wait for all threads to finish.

5. Set op.end_time; compute op.iops (see §8.4).
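
The threading part of these steps could be sketched as below, assuming a `ThreadPoolExecutor` with one worker per range and a single lock guarding the shared state. `run_phase` and the `measure` callback are hypothetical names; `measure` stands in for measure_write or measure_read, and listener calls are omitted.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_phase(ranges: list[tuple[int, int]], num_blocks: int,
              measure: Callable[[int], float]) -> tuple[list[tuple[int, float]], int]:
    """Run measure(sample_num) for every sample; return (results, blocks completed)."""
    lock = threading.Lock()
    results: list[tuple[int, float]] = []
    units_complete = 0

    def worker(start: int, end: int) -> None:
        nonlocal units_complete
        for s in range(start, end):        # half-open range, as produced in §8.1
            bw = measure(s)                # the timed I/O happens outside the lock
            with lock:                     # shared stats and counters need protection
                results.append((s, bw))
                units_complete += num_blocks

    with ThreadPoolExecutor(max_workers=max(1, len(ranges))) as pool:
        futures = [pool.submit(worker, a, b) for a, b in ranges]
        for future in futures:
            future.result()                # wait, and re-raise any worker exception
    return results, units_complete


results, units = run_phase([(1, 4), (4, 6)], num_blocks=32, measure=lambda s: float(s))
print(sorted(results), units)
```

Samples from different threads complete in arbitrary order, so `results` reflects insertion order rather than sample-number order.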

8.4 IOPS Calculation

def set_total_ops(op: BenchmarkOperation, total_ops: int) -> None:
    elapsed_ns = (op.end_time - op.start_time).total_seconds() * 1e9
    if elapsed_ns > 0:
        op.iops = round(total_ops / (elapsed_ns / 1e9))

total_ops is the sum of blocks completed across all threads for this mode (i.e. write_units_complete or read_units_complete).

8.5 Cumulative Metrics (`update_metrics`)

Called after each sample, before notifying the listener.

# For WRITE samples:
if w_max == -1 or sample.bw_mb_sec > w_max: w_max = sample.bw_mb_sec
if w_min == -1 or sample.bw_mb_sec < w_min: w_min = sample.bw_mb_sec

n = sample.sample_num
if w_avg == -1:
    w_avg = sample.bw_mb_sec
else:
    w_avg = ((n - 1) * w_avg + sample.bw_mb_sec) / n

if w_acc == -1:
    w_acc = sample.access_time_ms
else:
    w_acc = ((n - 1) * w_acc + sample.access_time_ms) / n

sample.cum_avg = w_avg
sample.cum_max = w_max
sample.cum_min = w_min
sample.cum_acc_time_ms = w_acc
# Mirror for READ using r_* variables
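
The update rule for w_avg and w_acc is the standard incremental mean. The sketch below checks that it reproduces the plain arithmetic mean, under the assumption that n counts samples consecutively from 1 within the operation (it uses `n == 1` in place of the -1 sentinel; the helper name is hypothetical).

```python
def running_mean(prev_avg: float, value: float, n: int) -> float:
    """Fold the n-th value (1-based) into the mean of the previous n - 1 values."""
    if n == 1:
        return value
    return ((n - 1) * prev_avg + value) / n


values = [500.0, 520.0, 480.0, 510.0]
avg = -1.0
for n, v in enumerate(values, start=1):
    avg = running_mean(avg, v, n)

print(avg, sum(values) / len(values))   # both are 502.5
```

If n does not start at 1 or samples arrive out of order, the result is a weighted value rather than the true mean, which is worth keeping in mind for multi-threaded runs.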

8.6 Progress Throttling

Progress updates to the listener are rate-limited:

UPDATE_INTERVAL_MS = 25

def throttled_progress_update(force: bool = False) -> None:
    # last_update_ms is reassigned below, so it must be declared global here
    # (or held as runner state, e.g. self.last_update_ms).
    global last_update_ms
    now_ms = time.monotonic_ns() // 1_000_000
    elapsed = now_ms - last_update_ms
    completed = write_units_complete + read_units_complete
    if force or elapsed >= UPDATE_INTERVAL_MS:
        percent = int(completed / units_total * 100)
        percent = max(0, min(100, percent))
        listener.on_progress_update(percent, 100)
        last_update_ms = now_ms

9. I/O Measurement (`Sample`)

9.1 WRITE — Modern Engine (`measure_write`)

Open file with: WRITE | CREATE
                + DSYNC if write_sync_enabled
                + O_DIRECT if direct_io_enabled (Linux; macOS and Windows use the equivalents in §10.6)

Allocate aligned buffer of block_size bytes (alignment = sector_alignment.bytes,
or default system alignment if NONE).

For b in range(num_blocks):
    if cancelled: break
    block_index = randint(0, num_blocks-1) if RANDOM else b
    byte_offset = block_index * block_size
    pwrite(fd, buffer, byte_offset)  or equivalent positional write
    total_bytes_written += block_size
    runner.update_write_progress()

elapsed_ns = end_ns - start_ns
access_time_ms = (elapsed_ns / 1e6) / num_blocks
bw_mb_sec = (total_bytes_written / MEGABYTE) / (elapsed_ns / 1e9)
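
A runnable sketch of the timed write loop, limited to buffered POSIX I/O: no Direct I/O, no write sync, no cancellation check, and no progress callback. The signature is an assumption made for illustration, and `os.pwrite` is not available on Windows.

```python
import os
import random
import time

MEGABYTE = 1_024 * 1_024


def measure_write(path: str, block_size: int, num_blocks: int,
                  random_order: bool = False) -> tuple[float, float]:
    """Write num_blocks blocks with os.pwrite; return (bw_mb_sec, access_time_ms)."""
    buffer = bytes(block_size)                        # zero-filled payload
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    total_bytes_written = 0
    try:
        start_ns = time.perf_counter_ns()
        for b in range(num_blocks):
            block_index = random.randint(0, num_blocks - 1) if random_order else b
            total_bytes_written += os.pwrite(fd, buffer, block_index * block_size)
        elapsed_ns = max(1, time.perf_counter_ns() - start_ns)  # avoid dividing by zero
    finally:
        os.close(fd)
    access_time_ms = (elapsed_ns / 1e6) / num_blocks
    bw_mb_sec = (total_bytes_written / MEGABYTE) / (elapsed_ns / 1e9)
    return bw_mb_sec, access_time_ms
```

Because nothing here forces data to the device, the figures mostly reflect page-cache speed; they become meaningful for the drive only once Direct I/O or write sync is added as described above.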

9.2 READ — Modern Engine (`measure_read`)

Open file with: READ
                + O_DIRECT if direct_io_enabled

For b in range(num_blocks):
    if cancelled: break
    block_index = randint(0, num_blocks-1) if RANDOM else b
    byte_offset = block_index * block_size
    pread(fd, buffer, byte_offset)
    total_bytes_read += block_size
    runner.update_read_progress()

elapsed_ns = end_ns - start_ns
access_time_ms = (elapsed_ns / 1e6) / num_blocks
bw_mb_sec = (total_bytes_read / MEGABYTE) / (elapsed_ns / 1e9)

9.3 READ Preparation (`prepare_read`)

Used when benchmark_type == READ (no prior WRITE phase). Writes sequential data to the test file without timing or recording bandwidth. Uses the write units counter so the progress bar reflects preparation work.

Open file with: WRITE | CREATE | TRUNCATE

For b in range(num_blocks):
    if cancelled: break
    byte_offset = b * block_size
    pwrite(fd, buffer, byte_offset)
    runner.update_write_progress()

10. OS-Specific Behaviours

10.1 Cache Drop

Must be attempted before the READ phase of a READ or READ_WRITE benchmark (to prevent reads hitting the page cache).

OS	Privileged	Action
Linux	root	`sync` then `echo 1 > /proc/sys/vm/drop_caches`
Linux	non-root	Print instructions; block until user presses Enter
macOS	root	`sync; sudo purge`
macOS	non-root	Print instructions; block until user presses Enter
Windows	admin	Run `EmptyStandbyList.exe` from install dir
Windows	non-admin	Print instructions; block until user presses Enter

Skip cache drop entirely when:

Direct I/O is enabled and OS is not macOS (kernel bypasses page cache).
Benchmark was cancelled.

10.2 Drive Model Detection

OS	Method
Linux	`lsblk -o NAME,MODEL`, resolve via `/sys/block/<dev>/device/model`
macOS	`diskutil info <device>` → `Device / Media Name`
Windows	WMI query `Win32_DiskDrive` → `Model`, mapped via drive letter

10.3 Partition / Drive Letter

OS	Field
Linux	`/proc/mounts` or `df` → resolves symlinks to `/dev/sdX` or `/dev/nvmeXnY`
macOS	`df` then `diskutil info`
Windows	Extract drive letter from path root (`C`, `D`, …)

10.4 Disk Usage

OS	Command
Linux / macOS	`df -k <path>` — parse `1K-blocks`, `Used`, `Use%` columns
Windows	WMI `Win32_LogicalDisk` or `GetDiskFreeSpaceEx`

Result fields: percent_used, used_gb, total_gb.
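
As an alternative to parsing `df` output, the three result fields can be derived with `shutil.disk_usage`, which §20.8 already uses for the fast drive refresh. This is a sketch with a hypothetical function name; the percentage may differ slightly from `df`'s `Use%` column, which is computed and rounded differently.

```python
import shutil

GIGABYTE = 1_024 * 1_024 * 1_024


def get_disk_usage(path: str) -> tuple[int, float, float]:
    """Return (percent_used, used_gb, total_gb) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    percent_used = round(usage.used / usage.total * 100) if usage.total else 0
    return percent_used, usage.used / GIGABYTE, usage.total / GIGABYTE


percent_used, used_gb, total_gb = get_disk_usage(".")
print(f"{percent_used}% ({used_gb:.1f}/{total_gb:.1f} GB)")
```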

10.5 Processor Name

OS	Method
Linux	Parse `/proc/cpuinfo` → `model name`
macOS	`sysctl -n machdep.cpu.brand_string`
Windows	`WMIC CPU get Name` or `winreg` → `HKLM\HARDWARE\DESCRIPTION\System\CentralProcessor\0\ProcessorNameString`

10.6 Direct I/O

OS	Support
Linux	`os.O_DIRECT` flag; buffer must be sector-aligned
macOS	`fcntl.F_NOCACHE` via `fcntl(fd, fcntl.F_NOCACHE, 1)`
Windows	`FILE_FLAG_NO_BUFFERING` via `CreateFile` (ctypes or pywin32)

If the Direct I/O open fails, fall back to buffered I/O without raising an error, and log a warning.
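
The fallback rule might look like the sketch below for the Linux flag-based path only; the macOS `F_NOCACHE` and Windows `CreateFile` paths are left out. `open_for_read`, its return value, and the logger name are assumptions made for illustration.

```python
import logging
import os

log = logging.getLogger("pydiskmark.io_engine")   # logger name is an assumption


def open_for_read(path: str, direct_io: bool) -> tuple[int, bool]:
    """Return (fd, direct_active); retry without O_DIRECT if the direct open fails."""
    flags = os.O_RDONLY
    # os.O_DIRECT only exists on some platforms; treat it as 0 (unsupported) elsewhere
    direct_flag = getattr(os, "O_DIRECT", 0) if direct_io else 0
    if direct_flag:
        try:
            return os.open(path, flags | direct_flag), True
        except OSError as exc:
            log.warning("Direct I/O open failed for %s (%s); using buffered I/O", path, exc)
    return os.open(path, flags), False
```

Returning the `direct_active` flag lets the caller decide whether a cache drop is still needed before the READ phase.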

10.7 Write Sync

OS	Mechanism
Linux	Open with `O_SYNC` or `O_DSYNC`; alternatively `os.fsync(fd)` per write
macOS	Same as Linux
Windows	`FILE_FLAG_WRITE_THROUGH` via `CreateFile`

11. Configuration Persistence

Settings are stored in a Java .properties-style flat-text file:

~/.pdm/<version>/pdm.properties

Format: one key=value pair per line; lines starting with # are comments.

Key	Default	Notes
`activeProfile`	`QUICK_TEST`
`profileModified`	`false`
`benchmarkType`	`READ_WRITE`
`blockSequence`	`SEQUENTIAL`
`numOfSamples`	`200`
`numOfBlocks`	`32`
`blockSizeKb`	`512`
`numOfThreads`	`1`
`ioEngine`	`MODERN`
`writeSyncEnable`	`false`
`directEnable`	`false`
`sectorAlignment`	`ALIGN_4K`
`multiFile`	`true`
`autoRemoveData`	`true`
`autoReset`	`true`

Note: The Python port also persists theme and locationDir to cover the most-used GUI preferences. Properties not relevant to the Python port (palette, sharePortal, uploadResourceLocator, uploadProtocol) are omitted.
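
A minimal reader for the subset of the format described here: `key=value` lines and `#` comments. It does not attempt the full Java .properties grammar (escapes, `:` separators, line continuations), and the helper names are hypothetical.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and # comments."""
    props: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:                                   # ignore lines without '='
            props[key.strip()] = value.strip()
    return props


def as_bool(value: str) -> bool:
    """Interpret 'true' / 'false' the way the file stores booleans."""
    return value.strip().lower() == "true"


example = """# pydiskmark settings
activeProfile=QUICK_TEST
numOfSamples=200
multiFile=true
"""
props = parse_properties(example)
print(props["activeProfile"], int(props["numOfSamples"]), as_bool(props["multiFile"]))
```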

12. CLI Interface

Entry point: python -m pydiskmark run [OPTIONS]

12.1 Sub-command: `run`

Option	Short	Type	Default	Description
`--profile`	`-p`	str	`QUICK_TEST`	Named profile
`--type`	`-t`	str	profile default	`READ`, `WRITE`, `READ_WRITE`
`--threads`	`-T`	int	profile default	Number of concurrent threads
`--order`	`-o`	str	profile default	`SEQUENTIAL`, `RANDOM`
`--blocks`	`-b`	int	profile default	Blocks per sample
`--block-size`	`-z`	int	profile default	Block size in KB
`--samples`	`-n`	int	profile default	Number of samples
`--direct`	`-d`	flag	False	Enable Direct I/O
`--write-sync`	`-y`	flag	False	Enable write-sync
`--alignment`	`-a`	str	`ALIGN_4K`	`NONE`, `ALIGN_512`, `ALIGN_4K`, `ALIGN_8K`, `ALIGN_16K`, `ALIGN_64K` (profile default used if not specified)
`--multi-file`	`-m`	flag	False	One file per sample
`--location`	`-l`	path	`$HOME`	Directory for test files
`--export`	`-e`	path	None	Export results to JSON file
`--save`	`-s`	flag	False	Persist to local database
`--clean`	`-c`	flag	False	Delete existing data dir first
`--verbose`	`-v`	flag	False	Verbose logging

Override precedence: explicit CLI option > profile default.
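
One way to implement this precedence with `argparse` is to give every overridable option a default of `None`, so that "not specified" can be told apart from an explicit value. The sketch covers four integer options only; `QUICK_TEST` is a hand-copied subset of the profile row in §6 and `resolve` is a hypothetical helper.

```python
import argparse

# Subset of the QUICK_TEST profile from §6, keyed by argparse destination name
QUICK_TEST = {"threads": 1, "samples": 50, "blocks": 32, "block_size": 1024}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="pydiskmark run")
    # default=None marks "not given on the command line"
    parser.add_argument("-T", "--threads", type=int, default=None)
    parser.add_argument("-n", "--samples", type=int, default=None)
    parser.add_argument("-b", "--blocks", type=int, default=None)
    parser.add_argument("-z", "--block-size", type=int, default=None)
    return parser


def resolve(argv: list[str], profile: dict[str, int]) -> dict[str, int]:
    """Explicit CLI value wins; otherwise fall back to the profile default."""
    args = vars(build_parser().parse_args(argv))
    return {name: (args[name] if args[name] is not None else default)
            for name, default in profile.items()}


print(resolve(["-n", "10", "--block-size", "512"], QUICK_TEST))
```

Boolean flags need the same treatment (for example a tri-state default) if a profile that enables Direct I/O is to remain distinguishable from the user not passing `--direct`.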

12.2 Execution Flow (CLI)

1. Load profile defaults into App state.
2. Apply any explicit CLI overrides.
3. Set location_dir; derive data_dir = location_dir / "pdm-data".
4. Validate location_dir is writable.
5. If --clean and data_dir exists: delete recursively.
6. Create data_dir if not present.
7. init() — collect OS/CPU info.
8. Run the benchmark, printing the progress bar while it runs.
9. Print result text after completion.
10. Export JSON if --export specified.
11. Save to DB if --save specified.
12. Remove data_dir if auto_remove_data is True.

12.3 Progress Bar (CLI)

Progress: [#########################                         ]  50% (50/100 units)

Length: 50 characters of # / space.
Rendered with \r (carriage return, no newline), at most once every UPDATE_INTERVAL_MS = 25 ms.
Cursor hidden during run (\x1b[?25l), restored after (\x1b[?25h).
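
A sketch of a renderer for this bar, assuming the 50-character width stated above. The function name is hypothetical, and the cursor hide/show escape sequences are left to the caller.

```python
def render_progress(completed: int, total: int, width: int = 50) -> str:
    """Build one progress line; the leading \\r returns the cursor to column 0."""
    total = max(1, total)
    completed = max(0, min(total, completed))
    filled = width * completed // total
    percent = 100 * completed // total
    bar = "#" * filled + " " * (width - filled)
    return f"\rProgress: [{bar}] {percent:3d}% ({completed}/{total} units)"


line = render_progress(50, 100)
print(line, end="", flush=True)   # no newline, so the next call overwrites this line
print()
```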

13. Export Formats

13.1 JSON (`.json`)

Full serialisation of the Benchmark object tree using the JSON field names defined in §5. Pretty-printed, 2-space indent.

{
  "_id": "a1b2c3d4-e5f6-...",
  "username": "james",
  "config": { ... },
  "systemInfo": { ... },
  "driveInfo": { ... },
  "startTime": "2026-06-20T14:30:00",
  "endTime": "2026-06-20T14:31:30",
  "operations": [
    {
      "ioMode": "WRITE",
      "samples": [
        { "sn": 1, "bw": 523.4, "bt": 523.4, "la": 0.95, "lt": 0.95, "mn": 523.4, "mx": 523.4 }
      ],
      "bandwidth": 523.4,
      "latency": 0.95,
      "iops": 12345
    }
  ]
}

13.2 YAML (`.yml`)

Same structure as JSON, emitted as YAML without --- document-start marker.

13.3 CSV (`.csv`)

Flat table of samples with metadata header as # comment lines.

# pydiskmark x.y Benchmark Summary
# ---------------------------
# Date: 2026-06-20 14:30:00
# Model: Samsung 990 Pro
# Profile: Quick Test
# Type: READ_WRITE
# Threads: 1
# Order: SEQUENTIAL
# Blocks: 32
# BlockSize: 524288
# Samples: 50
# WRITE Result: bw 523.40 MB/s, lat 0.95 ms, iops 12345
# READ Result: bw 610.20 MB/s, lat 0.82 ms, iops 14321
# ---------------------------

sn,ioMode,bw,bt,la,lt,mn,mx
1,WRITE,523.4,523.4,0.95,0.95,523.4,523.4
...

14. Local Database (optional — `--save`)

The Java version uses Apache Derby (embedded SQL) with JPA/Hibernate. For the Python port use SQLite (sqlite3 stdlib) as the embedded database.

Schema tables: benchmark, benchmark_operation. (The schema implemented in the GUI phase is a single benchmark_ops table; see §20.9.)

benchmark.id — UUID string (primary key)
benchmark_operation.benchmark_id — foreign key

The database file lives at:

~/.pdm/<version>/pdm.db

Operations to implement: save, find_all, delete_all, delete_by_ids.

15. Listener / Callback Interface

BenchmarkRunner is decoupled from output via a listener protocol:

class BenchmarkListener(Protocol):
    def on_sample_complete(self, sample: Sample) -> None: ...
    def on_progress_update(self, completed: int, total: int) -> None: ...
    def is_cancelled(self) -> bool: ...
    def attempt_cache_drop(self) -> None: ...

The CLI implementation of attempt_cache_drop calls the OS-specific cache drop logic (§10.1) and blocks until the drop completes or the user confirms.

16. Application State

These globals are equivalent to Java's static App.* fields. In Python, encapsulate in an AppState dataclass or module-level variables.

Variable	Type	Default	Notes
`location_dir`	Path	None	where test files go
`data_dir`	Path	None	`location_dir / "pdm-data"`
`export_path`	Path	None
`auto_save`	bool	False	persist to DB
`verbose`	bool	False
`multi_file`	bool	True
`auto_remove_data`	bool	True	delete data dir after run
`auto_reset`	bool	True	reset running stats before run
`direct_enable`	bool	False
`write_sync_enable`	bool	False
`io_engine`	IoEngine	MODERN
`sector_alignment`	SectorAlignment	ALIGN_4K
`active_profile`	BenchmarkProfile	QUICK_TEST
`profile_modified`	bool	False
`benchmark_type`	BenchmarkType	READ_WRITE
`block_sequence`	BlockSequence	SEQUENTIAL
`num_of_samples`	int	200
`num_of_blocks`	int	32
`block_size_kb`	int	512
`num_of_threads`	int	1
`next_sample_number`	int	1	global monotonic counter
`os`	str	—	from platform
`arch`	str	—	from platform
`processor_name`	str	—
`username`	str	—	from os

Running stats (reset between benchmarks when auto_reset = True):

w_max = w_min = w_avg = w_acc = w_iops = -1.0
r_max = r_min = r_avg = r_acc = r_iops = -1.0

17. Module Structure (Suggested)

jdm-python/
├── pydiskmark/
│   ├── __init__.py
│   ├── __main__.py          ← entry point: python -m pydiskmark
│   ├── app.py               ← global state, init(), get_config()
│   ├── benchmark.py         ← Benchmark, BenchmarkConfig, BenchmarkSystemInfo,
│   │                           BenchmarkDriveInfo, BenchmarkType, IOMode,
│   │                           BlockSequence, IoEngine, SectorAlignment
│   ├── benchmark_operation.py
│   ├── benchmark_profile.py ← enum of pre-defined profiles
│   ├── benchmark_runner.py  ← BenchmarkRunner, BenchmarkListener
│   ├── cli.py               ← argparse entry point, CliListener, results printer
│   ├── exporter.py          ← JSON / YAML / CSV export
│   ├── io_engine.py         ← cross-platform Direct I/O: alloc_aligned, open_file,
│   │                           pwrite, pread, close_file, free_aligned
│   ├── sample.py            ← Sample, measure_write, measure_read, prepare_read
│   ├── util.py              ← randint, delete_directory, etc.
│   └── util_os.py           ← OS-specific: drive model, cache drop, disk usage,
│                               processor name, partition id
├── tests/
│   ├── test_phase1.py
│   ├── test_phase2.py
│   └── test_phase3.py
├── pyproject.toml
├── README.md
└── SPEC.md                  ← this file

18. Key Differences from Java — Python Implementation Notes

Java concern	Python equivalent
`ExecutorService.newFixedThreadPool(N)`	`concurrent.futures.ThreadPoolExecutor(N)`
`LongAdder` / `AtomicLong`	`threading.Lock` + `int`, or `threading.local` partial sums
`RandomAccessFile` (legacy)	removed — only MODERN engine is implemented
`FileChannel` + `MemorySegment` (modern)	`os.open()` + `os.pwrite()` / `os.pread()`
`ExtendedOpenOption.DIRECT`	`os.O_DIRECT` (Linux); `fcntl.F_NOCACHE` (macOS)
`StandardOpenOption.DSYNC`	`os.O_DSYNC` (Linux/macOS) or `os.fsync()` per write
JPA/Derby database	`sqlite3` (stdlib)
Jackson JSON serialiser	`json` stdlib or `dataclasses-json`
`System.nanoTime()`	`time.perf_counter_ns()`
GC detection / hints	Not applicable in Python — removed from model
`picocli`	`argparse` or `click`
Single-instance lock (`FileLock`)	Not required for CLI-only port

19. Out of Scope

SMART data collection (Smart.java, SmartPanel.java)
Community portal upload (Portal.java)
Windows MSI / Linux DEB / macOS PKG packaging

20. GUI — Desktop Interface (Phase 5)

Status: Implemented. Launch with python -m pydiskmark gui.

20.1 Overview

The GUI provides a visual desktop interface that replicates the Java Swing frontend's layout and functionality. It uses the same BenchmarkRunner / BenchmarkListener pipeline as the CLI — no changes to the engine layer.

20.2 Toolkit Decision

Chosen: Tkinter + matplotlib + sv-ttk

Factor	Decision
Toolkit	`tkinter` (stdlib) — zero runtime cost, ships with Python
Theme	`sv-ttk` (Sun Valley) — modern flat dark/light appearance
Chart	`matplotlib` via `FigureCanvasTkAgg` — dual-axis, real-time, well-packaged
Threading	Queue + `root.after(50ms)` polling — all Tkinter mutations on main thread

Additional runtime dependencies: matplotlib>=3.7, sv-ttk>=2.5, Pillow>=9 (optional — used for splash and About icons; falls back to emoji if absent).

20.3 Layout

┌────────────────────────────────────────────────────────────────┐
│  Menu: File | Action | Options | Help                          │
├───┬────────────────────────────────────────────────────────────┤
│ D │  Drives tab selected:                                      │
│ r │    DrivesPanel fills the ENTIRE content area               │
│ i │    (drive selector, info card, all-drives table, test dir) │
│ v │                                                            │
│ e │  Benchmark tab selected:                                   │
│ s │  ┌──────────────┬──────────────────────────────────────┐  │
│   │  │ ControlPanel │          ChartPanel                  │  │
│ B │  │ (320 px)     │  (matplotlib dual-axis, fills rest)  │  │
│ e │  │ settings /   │                                      │  │
│ n │  │ start/stop / │                                      │  │
│ c │  │ metrics grid │                                      │  │
│ h │  └──────────────┴──────────────────────────────────────┘  │
├───┴────────────────────────────────────────────────────────────┤
│  [Benchmark Operations] [Events]                               │
│   HistoryPanel — treeview of past operations                   │
├────────────────────────────────────────────────────────────────┤
│  Status text          [progress bar]  Total Tx (KB): N         │
└────────────────────────────────────────────────────────────────┘

20.4 Module Structure

pydiskmark/
├── db.py                    ← SQLite persistence (Phase 5)
└── gui/
    ├── __init__.py          ← launch_gui() entry point
    ├── theme.py             ← sv_ttk dark/light + chart colour palette
    ├── listener.py          ← GuiListener — queue-based BenchmarkListener
    ├── chart_panel.py       ← matplotlib FigureCanvasTkAgg, dual-axis
    ├── control_panel.py     ← Benchmark tab — settings combos + results grid
    ├── drives_panel.py      ← Drives tab — drive list, info card, dir chooser
    ├── history_panel.py     ← Benchmark Operations tab — DB history treeview
    └── main_window.py       ← MainWindow orchestrator

20.5 Left-Tab Panels

Drives Tab (`DrivesPanel`)

Top: drive selector dropdown (Drive letter — capacity)
Left sub-panel: Drive Info card
- Drive model, partition, usage percentage, used/total GB
- Access indicators: Read ✓ / Write ✓
- Usage progress bar
Right sub-panel: All Drives table
- Columns: Drive/Mount | Total (GB) | Used (GB) | Free (GB) | Usage %
- Click a row → update drive info card + set app.location_dir
Bottom: Test Directory path display + Browse button

Benchmark Tab (`ControlPanel`)

Profile, Type, Threads, Block Order, Blocks/Sample, Block Size (KB), Samples — dropdowns
Start / Stop button (toggles, disables combos during run)
Results grid: 3-column (Metric | Write | Read) for MB/s, Lat (ms), IOPS

20.6 Chart (`ChartPanel`)

Left Y-axis: Bandwidth (MB/s)
Right Y-axis: Latency (ms)
X-axis: Sample number
Series:
- Write BW — solid orange line
- Write Avg — dashed orange line
- Write Latency — small orange square markers (right axis)
- Read BW — solid cyan line
- Read Avg — dashed cyan line
- Read Latency — small cyan square markers (right axis)
Batched redraws every 3rd sample to avoid lag at high sample counts
set_title(str) — updates the chart suptitle with drive info
retheme() — reapply colours after dark/light toggle

20.7 Threading Model

Worker thread (BenchmarkRunner)
  └─ GuiListener.on_sample_complete()  ──→  queue.put((EVT_SAMPLE, sample))
  └─ GuiListener.on_progress_update()  ──→  queue.put((EVT_PROGRESS, percent, 100))
  └─ GuiListener.attempt_cache_drop()  ──→  queue.put((EVT_CACHE_DROP, event))
                                             blocks until main thread sets event
  └─ _run_worker() completes           ──→  queue.put((EVT_COMPLETE, benchmark))

Main thread (Tkinter)
  └─ _poll_queue() every 50 ms via root.after()
       drains queue → updates chart, progress bar, metrics labels
       on EVT_COMPLETE → saves to DB, refreshes history, re-enables controls

After the worker thread dies, _poll_queue performs one extra drain to catch EVT_COMPLETE posted just before thread exit (eliminates the "Benchmark cancelled" false-positive race condition).
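
The queue handshake can be exercised without Tkinter. In the sketch below a plain loop stands in for the `root.after()` poller, and the event names and payloads are simplified assumptions; what it shows is the non-blocking drain and the `threading.Event` that keeps the worker blocked until the main thread acknowledges the cache-drop request.

```python
import queue
import threading
import time

EVT_SAMPLE, EVT_CACHE_DROP, EVT_COMPLETE = "sample", "cache_drop", "complete"


def drain(events: queue.Queue) -> list[tuple]:
    """Pop everything currently queued without blocking (one poll tick)."""
    items = []
    while True:
        try:
            items.append(events.get_nowait())
        except queue.Empty:
            return items


def worker(events: queue.Queue) -> None:
    events.put((EVT_SAMPLE, 1))
    acknowledged = threading.Event()
    events.put((EVT_CACHE_DROP, acknowledged))
    acknowledged.wait()                  # blocks until the main thread sets the event
    events.put((EVT_COMPLETE, None))


events: queue.Queue = queue.Queue()
thread = threading.Thread(target=worker, args=(events,), daemon=True)
thread.start()

seen = []
while EVT_COMPLETE not in seen:          # stands in for repeated root.after() ticks
    for kind, payload in drain(events):
        seen.append(kind)
        if kind == EVT_CACHE_DROP:
            payload.set()                # a GUI would do this after the dialog closes
    time.sleep(0.01)
thread.join()
print(seen)
```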

20.8 Startup Sequence

The main window is built entirely off-screen to avoid flicker:

1. root.withdraw()                 — hide window before building
2. _SplashScreen shown            — borderless Toplevel, progress bar 0–100
3. Each build step advances the progress bar (10 % → 80 %)
4. DrivesPanel.refresh() — FAST phase only (shutil.disk_usage, <5 ms)
   Background thread starts; results delivered via queue.Queue
5. update_idletasks() + retheme()
6. root.wm_attributes("-alpha", 0) — compositor-level invisible
7. root.deiconify()
8. root.update()                  — drain full event queue (background
                                     patches, matplotlib draws) while invisible
9. root.wm_attributes("-alpha", 1) — snap to fully-rendered in one frame
10. splash.close()                — dismiss only after main window is opaque

DrivesPanel two-phase refresh:

Phase	Thread	What happens
FAST	Main	`shutil.disk_usage()` only — populates table with `"..."` in Model column, <5 ms
SLOW	Daemon	`get_drive_model()`, `get_filesystem()`, `get_bus_type()`, `get_sector_sizes()` per drive; selected drive processed first

Results are handed back to the main thread via queue.Queue + a 50 ms after() poller (_poll_update_queue). self.after() is never called from the background thread — Tkinter's _register() is not thread-safe in Python 3.14+.

20.9 Persistence (`db.py`)

DB location: ~/.pdm/<version>/pdm.db (SQLite, stdlib sqlite3).

Schema — one row per benchmark operation:

CREATE TABLE benchmark_ops (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    group_id      TEXT,    -- UUID shared by ops from the same run
    drive_model   TEXT,
    partition_id  TEXT,
    profile       TEXT,
    benchmark_type TEXT,
    io_mode       TEXT,    -- WRITE | READ
    block_order   TEXT,
    num_samples   INTEGER,
    num_blocks    INTEGER,
    block_size_kb INTEGER,
    num_threads   INTEGER,
    start_time    TEXT,
    elapsed_ms    INTEGER,
    lat_avg_ms    REAL,
    iops          INTEGER,
    bw_mb_sec     REAL,
    data_json     TEXT     -- full benchmark JSON for chart replay
);

Benchmarks are auto-saved after each successful run. Double-clicking a row in the "Benchmark Operations" history tab replays that benchmark in the chart without re-running I/O.
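
A cut-down sketch of the save and list operations using `sqlite3`. To stay short it creates only a few of the columns from the schema above, uses an in-memory database, and invents the helper names `open_db`, `save_ops`, and `find_all`; the real module would use the full column list and the file path given at the top of this section.

```python
import sqlite3
import uuid


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS benchmark_ops ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " group_id TEXT, io_mode TEXT, bw_mb_sec REAL, iops INTEGER)"
    )
    return conn


def save_ops(conn: sqlite3.Connection, ops: list[tuple[str, float, int]]) -> str:
    """Insert one row per operation; all rows of a run share one group_id."""
    group_id = str(uuid.uuid4())
    with conn:                                    # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO benchmark_ops (group_id, io_mode, bw_mb_sec, iops)"
            " VALUES (?, ?, ?, ?)",
            [(group_id, mode, bw, iops) for mode, bw, iops in ops],
        )
    return group_id


def find_all(conn: sqlite3.Connection) -> list[tuple]:
    return conn.execute(
        "SELECT group_id, io_mode, bw_mb_sec, iops FROM benchmark_ops ORDER BY id"
    ).fetchall()


conn = open_db()
run_id = save_ops(conn, [("WRITE", 523.4, 1047), ("READ", 610.2, 1220)])
print(find_all(conn))
```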

20.10 Benchmark Execution Flow (GUI)

1. User selects drive (Drives tab) — sets location_dir
2. User selects profile / adjusts settings (Benchmark tab)
3. User clicks Start (or Ctrl+R)
4. GUI disables controls, clears chart, resets metrics
5. BenchmarkRunner.execute() runs in a daemon thread
6. on_sample_complete() → queue → chart.add_sample() every 50 ms
7. on_progress_update() → queue → progress bar update
8. attempt_cache_drop() → queue → modal dialog, blocks worker until dismissed
9. EVT_COMPLETE → db.save_benchmark() → history.refresh() → re-enable controls
10. Elapsed time shown in status bar; drive info refreshed
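
Step 8 is the only point where the worker waits on the GUI. One way to sketch that hand-off uses a threading.Event carried inside the queued message. The names EVT_CACHE_DROP, request_cache_drop, handle_event and show_dialog are hypothetical and chosen for illustration.

```python
import queue
import threading

EVT_CACHE_DROP = "cache_drop"  # hypothetical event tag


def request_cache_drop(events):
    """Worker thread: ask the GUI for the dialog, then wait for dismissal."""
    dismissed = threading.Event()
    events.put((EVT_CACHE_DROP, dismissed))
    dismissed.wait()  # blocks the worker only; the Tk main loop keeps running


def handle_event(event, show_dialog):
    """Main thread: show the modal dialog, then release the worker."""
    kind, dismissed = event
    if kind == EVT_CACHE_DROP:
        try:
            show_dialog()    # assumed to return once the user dismisses it
        finally:
            dismissed.set()  # release the worker even if the dialog fails
```

Because the Event travels with the message, the main thread needs no shared state to know which worker to release.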

20.11 Export from GUI

File → Export: file dialog → JSON / YAML / CSV (reuses exporter.export())
Double-click history row: replay historical benchmark in chart (no re-run)

20.12 Design Principles

Left-tab layout matches jdm-java (Drives | Benchmark vertical tabs)
Dark mode by default via sv_ttk; Options → Toggle Theme switches to light
Settings persisted to ~/.pdm/<version>/pdm.properties on exit; loaded at startup
Default benchmarkType is READ_WRITE when no config file exists
Keyboard shortcuts: Ctrl+R (start), Esc (stop)
About dialog centred on the parent window (transient + grab_set)
Window title: pydiskmark <version> — <arch> — <CPU>
Chart suptitle: <drive model> — <partition>: <pct>% (<used>/<total> GB)

20.13 Benchmark Tab — Control Panel Grid Layout

The ControlPanel widget uses a 3-column Tkinter grid that replicates jdiskmark's Swing layout. The container frame has a fixed width=320 px (pack_propagate(False)).

Column definitions

Col	Role	Weight	Effect
0	Narrow label anchor (Profile / Type rows only)	0 (fixed)	`minsize=60 px` — holds the short "Profile" / "Type" label text
1	Middle — label overflow + combo left-edge	2	Receives ~40 % of the expandable space
2	Combo-only column (rows 2–6)	3	Receives ~60 % of the expandable space

Columns 1 and 2 together fill all space beyond col 0's 60 px minimum. The 2 : 3 weight ratio produces a 40 % : 60 % split, matching jdiskmark.
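
The split can be checked with a little arithmetic. The helper below is a hypothetical illustration that assumes columns 1 and 2 have no natural width of their own; Tk's grid distributes only the space left over after each column's requested size, so real pixel widths depend on the widgets placed in those columns.

```python
def split_columns(total_px, col0_min_px=60, weights=(2, 3)):
    """Idealised column widths for the 3-column grid.

    Assumes columns 1 and 2 request no width themselves, so everything
    beyond column 0's minimum is shared out by weight.
    """
    extra = total_px - col0_min_px
    col1 = extra * weights[0] // sum(weights)
    col2 = extra - col1  # remainder goes to the last column
    return col0_min_px, col1, col2


print(split_columns(320))  # (60, 104, 156)
```

In Tkinter the corresponding configuration would be frame.columnconfigure(0, weight=0, minsize=60), frame.columnconfigure(1, weight=2) and frame.columnconfigure(2, weight=3).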

Row spanning rules

Row │ Widget          │ Col span (label) │ Col span (combo)
────┼─────────────────┼──────────────────┼──────────────────
 0  │ Profile         │ col 0 only       │ cols 1+2, sticky="ew"
 1  │ Type            │ col 0 only       │ cols 1+2, sticky="ew"
 2  │ Threads         │ cols 0+1         │ col 2 only, sticky="ew"
 3  │ Block Order     │ cols 0+1         │ col 2 only, sticky="ew"
 4  │ Blocks / Sample │ cols 0+1         │ col 2 only, sticky="ew"
 5  │ Block Size (KB) │ cols 0+1         │ col 2 only, sticky="ew"
 6  │ Samples         │ cols 0+1         │ col 2 only, sticky="ew"
 7  │ Start button    │ cols 0+1+2 (columnspan=3), sticky="ew"
 8  │ Separator       │ cols 0+1+2 (columnspan=3), sticky="ew"
 9  │ Results frame   │ cols 0+1+2 (columnspan=3), sticky="ew"
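
The spanning rules for rows 0 to 6 reduce to two cases, which the sketch below expresses as keyword arguments for widget.grid(). The function name grid_options is hypothetical, and the label's sticky value is left unset because the table does not specify one.

```python
WIDE_COMBO_ROWS = {0, 1}  # Profile and Type: combo spans columns 1+2


def grid_options(row):
    """Return (label_kwargs, combo_kwargs) for a label/combo row (0-6)."""
    if not 0 <= row <= 6:
        raise ValueError("rows 7-9 hold single widgets with columnspan=3")
    if row in WIDE_COMBO_ROWS:
        label = {"row": row, "column": 0}
        combo = {"row": row, "column": 1, "columnspan": 2, "sticky": "ew"}
    else:
        label = {"row": row, "column": 0, "columnspan": 2}
        combo = {"row": row, "column": 2, "sticky": "ew"}
    return label, combo
```

A caller would apply these with label.grid(**label_kwargs) and combo.grid(**combo_kwargs); rows 7 to 9 use grid(row=n, column=0, columnspan=3, sticky="ew") directly.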

Visual result

 Col 0 (≥60 px) │  Col 1 (40 %)  │  Col 2 (60 %)  │
────────────────┼────────────────────────────────────┤
 Profile        │  [Profile combo ── spans 1+2 ──]  ▼│
 Type           │  [Type combo ──── spans 1+2 ──]  ▼│
 [Threads ─── spans 0+1 ──────]  │  [  1  ]       ▼│
 [Block Order ─ spans 0+1 ──]    │  [Sequential]  ▼│
 [Blocks / Sample ─ spans 0+1]   │  [   512  ]    ▼│
 [Block Size (KB) ─ spans 0+1]   │  [   512  ]    ▼│
 [Samples ─── spans 0+1 ──────]  │  [   250  ]    ▼│
 [────────────── Start ──────────────────────────]  │

Key rules

No width= hint on combos. All combos use sticky="ew" to fill their column. Hardcoded width= values fight with Tkinter's grid geometry and should not be set.
pack_propagate(False) on ctrl_frame is essential. Without it, geometry propagation resizes the frame to fit its children and the fixed width=320 is ignored.
40 : 60 split ensures the shorter numeric combos (Threads, Samples, …) are proportionally narrower than the Profile / Type combos, matching jdiskmark.
Column 0 weight=0 prevents the label anchor from expanding; all growth goes to cols 1 and 2 in a 2 : 3 ratio.

Relevant source

control_panel.py — _build_ui() method
main_window.py — ctrl_frame = ttk.Frame(bench_page, width=320)

End of specification.
