sudoer0x0/yhash


Yhash: Complete Technical Documentation

Python ≥ 3.9 License: MIT PyPI


Version: 1.0.0
Developer: Adekunle Abdulmujeeb
License: MIT
Language: Python 3.9+


Table of Contents

  1. Overview
  2. Installation
  3. Project Structure
  4. Technology Stack
  5. Architecture & Internal Workings
  6. Algorithms
  7. Feature Reference
  8. Flag Compatibility Matrix
  9. Progress Bars
  10. Hash Display Design
  11. Security Model
  12. Memory Safety
  13. Error Handling
  14. Test Suite
  15. Module Reference

1. Overview

Yhash (Yung Hash) is a command-line interface tool for computing cryptographic hashes of files, directories, and text strings. It is written entirely in Python and exposes a single yhash command with composable flags.

Design goals:

  • Memory safety — files are never fully loaded into memory; they stream through 8 KB chunks.
  • Speed — multiple algorithms applied in a single file-read pass; directories hashed in parallel.
  • Clarity — every hash value always prints in full, regardless of terminal width.
  • Simplicity — single-dash flags, sensible defaults, no subcommands.
  • Security — all algorithm names are validated against a strict whitelist before any I/O begins; no eval, exec, or shell calls are made anywhere.

2. Installation

From PyPI

pip install yhash

Using pipx (recommended for CLI tools)

pipx installs the tool in an isolated virtual environment and adds yhash to your PATH without affecting your system Python packages.

pipx install yhash

To install from a local directory (from source):

pipx install .

From source with pip

git clone https://github.com/yhash/yhash.git
cd yhash
pip install .

Editable install (for development)

pip install -e .

Verify the installation:

yhash --version
# Yhash  v1.0.0

3. Project Structure

yhash/
├── pyproject.toml          # PEP 621 packaging metadata and entry point
├── README.md               # Quick-start reference
├── LICENSE                 # MIT licence
├── src/
│   └── yhash/
│       ├── __init__.py     # Package version metadata
│       ├── constants.py    # All shared constants — algorithms, thresholds, flags
│       ├── hasher.py       # Core hashing engine (the only file that calls hashlib)
│       ├── utils.py        # File collection, manifest I/O, clipboard, validation
│       ├── formatter.py    # All Rich terminal output
│       ├── cli.py          # Click CLI definition and dispatch logic
│       └── py.typed        # PEP 561 type-checking marker
└── tests/
    ├── test_hasher.py      # Unit tests — hashing engine
    ├── test_utils.py       # Unit tests — utilities
    ├── test_cli.py         # Integration tests — CLI flags and flows
    ├── test_security.py    # Security and vulnerability tests
    └── test_system.py      # System tests — end-to-end workflows

The src/ layout, a common packaging convention, ensures the package is never accidentally imported from the project root during development; only the installed version is importable.


4. Technology Stack

Click ≥ 8.1

What it is: A Python package for building command-line interfaces via decorators.

How it is used: Every flag (-sha256, -chain, -text, etc.) is declared as a @click.option decorator on the main() function. Click handles all argument parsing, type conversion, help generation, and version output. The CliRunner from click.testing is also used in the integration test suite to invoke the CLI in-process without spawning a subprocess.

Key settings used:

  • context_settings = {"help_option_names": ["-help", "--help"]} — enables -help as a valid help flag alongside the standard --help.
  • no_args_is_help=False — prevents Click from printing help when no arguments are given; Yhash handles that case itself (showing the banner and usage).
  • @click.argument("files", nargs=-1) — captures zero or more positional file paths into a tuple.
  • @click.option("-algo", "--algorithms", "show_algo", is_flag=True) — one option declared with two names, so both -algo and --algorithms work.

Rich ≥ 13.7

What it is: A Python library for rich text and beautiful terminal formatting.

How it is used:

| Rich component | Where used | Purpose |
| --- | --- | --- |
| Console(soft_wrap=True) | formatter.py | Shared output sink; soft_wrap=True prevents Rich from truncating long hash strings |
| Rule | formatter.py | Horizontal separator between output sections |
| Text | formatter.py | Styled inline text with per-character style control |
| Panel | formatter.py | Bordered panels for MATCH/MISMATCH verdict, About, Manifest Created |
| Table | formatter.py | Multi-file hash results table, algorithm reference table, verification meta info |
| Progress | cli.py | Progress bars for large files (> 5 MB) and recursive directory batches |
| SpinnerColumn | cli.py | Animated spinner on the left of the progress bar |
| BarColumn | cli.py | The actual filled progress bar |
| TaskProgressColumn | cli.py | Percentage complete |
| TransferSpeedColumn | cli.py | MB/s read speed |
| TimeRemainingColumn | cli.py | Estimated time to completion |

Why soft_wrap=True: By default, Rich measures the width of each print() call and inserts a newline at the terminal boundary. For a 128-character SHA-512 or BLAKE2b digest in a narrow terminal, this would produce a truncated line ending in an ellipsis (…). With soft_wrap=True, Rich emits the full string and leaves line-wrapping to the terminal, which visually wraps the characters but never drops any of them.

overflow="fold" on table columns: Long digests in multi-file tables are folded (wrapped) within the cell rather than being clipped. This is set on every hash-value column of the results table.

hashlib (Python standard library)

What it is: Python's built-in interface to cryptographic hash functions, backed by OpenSSL.

How it is used: All cryptographic operations live exclusively in hasher.py. No other module calls hashlib directly.

# For all algorithms except BLAKE2b:
hasher = hashlib.new("sha256")
hasher.update(chunk)
digest = hasher.hexdigest()

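# For BLAKE2b (created through its dedicated constructor):
hasher = hashlib.blake2b()

The hashlib.new(name) factory is used for all algorithms except blake2b, which is created through its dedicated constructor, hashlib.blake2b(). hashlib.new("blake2b") would also work; the dedicated constructor is the form that exposes BLAKE2-specific parameters such as digest_size. This choice is abstracted behind the internal _make_hasher(algorithm) function.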

pyperclip ≥ 1.9

What it is: A cross-platform Python clipboard module.

How it is used: Called only from utils.copy_to_clipboard(). The function wraps the call in a broad except Exception so that a clipboard failure (common on headless servers, CI environments, or Linux systems without xclip/xsel) is never a crash — it returns False and cli.py shows a warning instead.

concurrent.futures (Python standard library)

What it is: Python's high-level threading and multiprocessing interface.

How it is used: ThreadPoolExecutor is used in hasher.hash_files_parallel() and in the recursive directory flow inside cli.py. File hashing is I/O-bound, making threads (not processes) the correct tool — threads allow concurrent disk reads without the serialisation overhead of multiprocessing. Worker count is capped at min(8, number_of_files) to avoid thread thrashing on large directories.

pyproject.toml / setuptools ≥ 68

What it is: The modern Python packaging standard (PEP 517/621).

How it is used: All project metadata (name, version, dependencies, Python requirement, entry point) lives in pyproject.toml. The entry point yhash = "yhash.cli:cli_entry" wires the yhash shell command to the cli_entry() function. The [tool.setuptools] section uses an explicit package-dir = {"": "src"} declaration so that both pip install and pipx install resolve the package correctly without relying on auto-discovery.


5. Architecture & Internal Workings

Yhash is divided into five layers, one per module. Control flows strictly from top to bottom in this diagram; only cli.py imports from all other layers.

 Shell
  │
  ▼
cli_entry()           ← cli.py     (argv normalisation + Click dispatch)
  │
  ├──► hasher.py      (cryptographic operations — hashlib only)
  ├──► utils.py       (file I/O, manifest, clipboard, validation)
  └──► formatter.py   (all terminal output — Rich only)
            │
            └──► constants.py  (shared constants, no imports from other modules)

5.1 Entry Point & argv Normalisation

The shell command yhash calls cli_entry(), not main() directly.

def cli_entry() -> None:
    sys.argv = [sys.argv[0]] + _normalise_argv(sys.argv[1:])
    main()

Before Click parses anything, _normalise_argv() iterates over every argument and lower-cases any token whose lowercase form appears in NORMALISABLE_FLAGS:

NORMALISABLE_FLAGS = {"-sha256", "-sha512", "-sha384", "-sha1", "-md5", "-blake2b",
                      "-chain", "-copy", "-create", "-json", "-about", "-algo", "-text"}

This means -SHA256, -Sha256, and -sha256 are all normalised to -sha256 before Click sees them. File paths, hash values, and chain strings are left untouched because their lowercase forms do not match any flag in the set. This is a pure string-set lookup — O(1) per token.
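The normalisation step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name normalise_argv and the copy of the flag set are assumptions based on the description above.

```python
from typing import List

# Assumed copy of the flag set quoted above.
NORMALISABLE_FLAGS = {
    "-sha256", "-sha512", "-sha384", "-sha1", "-md5", "-blake2b",
    "-chain", "-copy", "-create", "-json", "-about", "-algo", "-text",
}


def normalise_argv(args: List[str]) -> List[str]:
    """Lower-case only the tokens that match a known flag; keep the rest as typed."""
    return [tok.lower() if tok.lower() in NORMALISABLE_FLAGS else tok for tok in args]
```

For example, normalise_argv(["-SHA256", "Report.PDF"]) leaves the file name untouched and lower-cases only the flag.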

5.2 CLI Dispatch Logic

main() is the single Click command. After argument parsing, it dispatches to one of nine flows in strict priority order:

1. show_algo  → display algorithm table, return
2. show_about → display about panel, return
3. No input   → display banner + help, return
4. chain      → chain hashing (file or text)
5. text_input → independent text hashing
6. check_hash → hash verification
7. recursive  → recursive directory hashing
8. files > 1  → multi-file hashing (sequential, see Section 7.6)
9. files == 1 → single file hashing

Mutual-exclusion checks happen before dispatch:

  • -chain + any algorithm flag → error (the chain string already specifies all algorithms)
  • -text + file arguments → error (text and file inputs cannot be mixed)

5.3 Hashing Engine

hasher.py is the only module that imports hashlib. All other modules call functions from hasher.py.

_make_hasher(algorithm)

Internal factory. Validates the algorithm against SUPPORTED_ALGORITHMS and returns the appropriate hashlib object. Raises ValueError for any unsupported name, which is the security boundary that prevents injection.
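A whitelist-guarded factory of this kind might look like the sketch below. The name make_hasher and the tuple contents are assumptions taken from the algorithm list in this document; the real _make_hasher() may differ in detail.

```python
import hashlib

# Assumed whitelist, mirroring the six algorithms documented in Section 6.
SUPPORTED_ALGORITHMS = ("sha256", "sha512", "sha384", "sha1", "md5", "blake2b")


def make_hasher(algorithm: str):
    """Return a fresh hashlib object, rejecting any name not on the whitelist."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        # The check runs before hashlib is touched, so arbitrary strings never reach it.
        raise ValueError(f"Unsupported algorithm: {algorithm!r}")
    if algorithm == "blake2b":
        return hashlib.blake2b()
    return hashlib.new(algorithm)
```

Because the lookup is a plain membership test, an unexpected string is rejected with ValueError rather than interpreted in any way.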

compute_hashes(file_path, algorithms, *, progress, task)

The core file-hashing function. Opens the file in binary mode and reads it in chunks of CHUNK_SIZE = 8192 bytes (8 KB). Every hasher is updated on every chunk, so N algorithms cost N times the CPU work per chunk, but the file is read only once regardless of N.

File on disk:
 ┌──────────┬──────────┬──────────┬──────────┐
 │ chunk 0  │ chunk 1  │ chunk 2  │ chunk 3  │  ...
 └──────────┴──────────┴──────────┴──────────┘
      │            │
      ▼            ▼
  sha256.update  sha256.update  ...
  sha512.update  sha512.update  ...
  blake2b.update blake2b.update ...

After the loop ends, hexdigest() is called on each hasher. The result dict preserves the order of the input algorithms list.

Optional progress and task parameters allow the caller (cli.py) to inject a Rich Progress context for live progress bars on large files; the engine calls progress.update(task, advance=len(chunk)) on every iteration if they are provided.
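The single-pass loop described above can be sketched like this. It is an illustration under stated assumptions: the name compute_hashes_sketch is hypothetical, and the on_chunk callback stands in for the Rich progress.update(task, advance=...) call so that the example needs only the standard library.

```python
import hashlib
from typing import Callable, Dict, List, Optional

CHUNK_SIZE = 8192  # 8 KB, as stated above


def compute_hashes_sketch(
    file_path: str,
    algorithms: List[str],
    on_chunk: Optional[Callable[[int], None]] = None,
) -> Dict[str, str]:
    """Hash one file with several algorithms while reading it only once."""
    # One hasher per algorithm, created before the file is opened.
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(file_path, "rb") as fh:
        while chunk := fh.read(CHUNK_SIZE):
            for hasher in hashers.values():
                hasher.update(chunk)
            if on_chunk is not None:
                on_chunk(len(chunk))  # stand-in for the progress bar update
    # Dicts are insertion-ordered, so the result follows the input order.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Memory use stays bounded by CHUNK_SIZE regardless of the file size, which is the property the design goals call memory safety.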

chain_hash_file(file_path, algorithms, *, progress, task)

Implements the chain hashing algorithm:

  1. Open and stream the file with the first algorithm only (8 KB chunks). Produce hex string H0.
  2. For each subsequent algorithm Ai, create a fresh hasher and call hasher.update(H_{i-1}.encode("utf-8")). Produce Hi.
  3. Return all intermediate and final hashes as an ordered dict.

The key design decision is that each chained step hashes the UTF-8 encoding of the previous hex string, not the raw bytes of the previous digest. This means the chain input is always a printable ASCII string of hexadecimal digits.

File bytes  →  [SHA256]  →  "a3f9..."   (64 hex chars)
                              │
                         .encode("utf-8")
                              │
                              ▼
               [SHA512]  →  "8b2e..."   (128 hex chars)  ← Final Chained Hash

chain_hash_text(text, algorithms)

Identical to chain_hash_file but the initial input is text.encode("utf-8") rather than file bytes. No streaming or progress bar is needed since text is always small.
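The chaining rule shared by chain_hash_file and chain_hash_text can be sketched on in-memory bytes. The function names below are hypothetical, and the sketch skips streaming and whitelist validation to keep the chaining step itself in focus.

```python
import hashlib
from typing import Dict, List


def chain_hash_bytes(data: bytes, algorithms: List[str]) -> Dict[str, str]:
    """Hash data with the first algorithm, then hash each hex digest with the next."""
    results: Dict[str, str] = {}
    current = data
    for name in algorithms:
        digest = hashlib.new(name, current).hexdigest()
        results[name] = digest
        # The next step sees the hex string as UTF-8 bytes, not the raw digest.
        current = digest.encode("utf-8")
    return results


def chain_hash_text_sketch(text: str, algorithms: List[str]) -> Dict[str, str]:
    """Text variant: the initial input is the UTF-8 encoding of the string."""
    return chain_hash_bytes(text.encode("utf-8"), algorithms)
```

The last value in the returned dict corresponds to the final chained hash.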

hash_text(text, algorithms)

Hashes the same UTF-8 bytes independently with each algorithm. All algorithms see the same input — this is not chained. Returns a dict mapping algorithm → hexdigest.

hash_files_parallel(file_paths, algorithms, *, max_workers)

Uses ThreadPoolExecutor to hash multiple files concurrently. Each file is submitted as an independent task. as_completed() yields results in completion order (not submission order), which is handled correctly — results are stored by path string key, not by order.

Worker count: min(max_workers, len(file_paths)) — capped to avoid creating more threads than files.

Error handling: if compute_hashes() raises for a specific file, the exception is caught inside the worker and stored as {"error": str(exc)} in the result dict. This prevents one failing file from aborting the entire batch.
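A rough sketch of this pattern follows. The helper names and the exact shape of the result dict are assumptions; only the worker cap, the use of as_completed(), and the per-file error capture come from the description above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def _hash_one(path: str, algorithms: List[str]) -> Dict[str, str]:
    """Worker: hash one file, turning I/O failures into an error entry."""
    try:
        hashers = {name: hashlib.new(name) for name in algorithms}
        with open(path, "rb") as fh:
            while chunk := fh.read(8192):
                for hasher in hashers.values():
                    hasher.update(chunk)
        return {name: hasher.hexdigest() for name, hasher in hashers.items()}
    except OSError as exc:
        return {"error": str(exc)}


def hash_files_parallel_sketch(
    file_paths: List[str], algorithms: List[str], max_workers: int = 8
) -> Dict[str, Dict[str, str]]:
    """Hash files concurrently; results are keyed by path, not by completion order."""
    if not file_paths:
        return {}
    workers = min(max_workers, len(file_paths))  # never more threads than files
    results: Dict[str, Dict[str, str]] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(_hash_one, p, algorithms): p for p in file_paths}
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Keying the results by path is what makes the completion order irrelevant to the caller.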

5.4 Output Layer

formatter.py owns all terminal output. cli.py imports display functions from it and never calls print() or console.print() itself.

Shared Console instance:

console = Console(soft_wrap=True)

This singleton is shared by formatter.py and cli.py so that Rich Progress bars (which need the console in order to avoid interleaving with other output) write to the same output stream.

Hash display — two-line layout:

Every hash value is printed on its own dedicated line below the algorithm label, using _print_hash():

def _print_hash(digest: str, style: str = CLR_HASH) -> None:
    console.print(Text(_HASH_INDENT + digest, style=style, no_wrap=True))

no_wrap=True on the Text object combined with soft_wrap=True on the Console means Rich will not split, fold, or truncate the hash regardless of terminal width. The terminal itself handles visual wrapping if needed, but no character is ever dropped.

Verification layout:

The display_verification() function prints file metadata in a Table(box=None) (invisible borders for alignment), then prints the expected and computed hashes each on their own _print_hash() line. The verdict (MATCH or MISMATCH) appears in a coloured Panel.

JSON output:

display_json_results() creates a fresh Console(highlight=False, markup=False) for each call to avoid any Rich markup processing interfering with JSON content:

def display_json_results(data: Any) -> None:
    _json_console = Console(soft_wrap=True, highlight=False, markup=False)
    _json_console.print(json.dumps(data, indent=2, ensure_ascii=False))

5.5 Utilities Layer

utils.py provides helpers with no knowledge of the CLI or display.

  • format_file_size(size_bytes) — converts bytes to a human-readable string using iterative division by 1024.
  • collect_files_recursive(path) — uses Path.rglob("*") to find all files under a directory, sorted alphabetically. If given a file path, returns that single file.
  • collect_files_flat(path) — uses Path.iterdir() for a non-recursive listing.
  • validate_algorithms(algos) — validates and deduplicates a list of algorithm names. Deduplication preserves insertion order using a seen set alongside the result list.
  • parse_chain_algorithms(chain_str) — splits on comma, strips whitespace from each part, calls validate_algorithms, and enforces the minimum length of 2.
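  • create_manifest_file(file_hashes, algorithms, output_path) — serialises a manifest dict to UTF-8 JSON and writes it with a single write_text call. This keeps the write simple, but it is not an atomic rename, so an interrupted write can leave a partial file. The manifest always ends with a newline.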
  • load_manifest_file(manifest_path) — reads and parses a manifest file, raising FileNotFoundError or json.JSONDecodeError on failure.
  • copy_to_clipboard(text) — wraps pyperclip.copy() in a try/except and returns bool.
  • get_final_hash(results) — returns the last value in a dict (Python 3.7+ dicts are insertion-ordered).
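Three of the helpers listed above can be sketched as follows. The bodies are assumptions reconstructed from the one-line descriptions: the unit labels and rounding in format_file_size, and the error messages, are illustrative choices rather than confirmed behaviour.

```python
from typing import List

SUPPORTED_ALGORITHMS = ("sha256", "sha512", "sha384", "sha1", "md5", "blake2b")


def format_file_size(size_bytes: int) -> str:
    """Divide by 1024 repeatedly until the value fits the current unit."""
    size = float(size_bytes)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"


def validate_algorithms(algos: List[str]) -> List[str]:
    """Validate names against the whitelist and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in algos:
        name = raw.strip().lower()
        if name not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"Unsupported algorithm: {raw!r}")
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def parse_chain_algorithms(chain_str: str) -> List[str]:
    """Split a comma-separated chain string and require at least two algorithms."""
    algos = validate_algorithms(chain_str.split(","))
    if len(algos) < 2:
        raise ValueError("A chain needs at least two distinct algorithms")
    return algos
```

With these definitions, a chain string such as "MD5, sha256 ,md5" reduces to the two names md5 and sha256, in that order.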

6. Algorithms

| Flag | Algorithm | Digest Length | Security Status | Notes |
| --- | --- | --- | --- | --- |
| -sha256 | SHA-256 | 64 hex chars | Recommended | Default when no flag is given |
| -sha512 | SHA-512 | 128 hex chars | Recommended | Higher security, larger output |
| -sha384 | SHA-384 | 96 hex chars | Recommended | SHA-2 family, truncated SHA-512 |
| -sha1 | SHA-1 | 40 hex chars | Deprecated | Collision attacks exist; legacy only |
| -md5 | MD5 | 32 hex chars | Weak | Checksums only; not cryptographically safe |
| -blake2b | BLAKE2b | 128 hex chars | Recommended | Fastest secure algorithm; no length-extension vulnerability |

All flags are case-insensitive. -SHA256, -Sha256, and -sha256 are all valid. This is achieved by normalising sys.argv before Click parses it.

The default algorithm is sha256. It is applied whenever no algorithm flag is specified.


7. Feature Reference

7.1 Default File Hashing

Hash a single file with the default SHA-256 algorithm.

Syntax:

yhash <file>

Example:

yhash document.pdf

Output:

────────────────────────────────────────────
  File    document.pdf  (1.2 MB)

  →  SHA-256
       a665a45920422f9d417e4867efdc4fb8...

How it works:

  1. cli_entry() normalises sys.argv and calls main().
  2. Click parses document.pdf into the files tuple.
  3. main() calls _build_algorithms() — no flags were set, so it returns ["sha256"].
  4. _hash_file() checks the file size. If ≤ 5 MB, calls compute_hashes() directly. If > 5 MB, wraps the call in a Rich Progress context.
  5. compute_hashes() opens the file, reads 8 KB chunks, feeds each chunk into the SHA-256 hasher.
  6. display_hash_results() prints the source name, file size, algorithm label, and digest.

7.2 Algorithm Selection Flags

Choose a specific hashing algorithm.

Syntax:

yhash -<algorithm> <file>

Examples:

yhash -sha512 archive.tar.gz
yhash -blake2b firmware.bin
yhash -md5 installer.exe
yhash -sha1 legacy_backup.zip
yhash -sha384 contract.pdf

How it works: Each algorithm flag is a Click boolean option (is_flag=True). _build_algorithms() inspects all six boolean parameters and appends matching algorithm names to the selected list in declaration order. If none are set, it returns [DEFAULT_ALGORITHM] ("sha256").
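The selection logic can be sketched as below. The declaration order in FLAG_ORDER is an assumption (the document does not list it explicitly), and the keyword-argument signature is a simplification of however main() actually passes the six booleans.

```python
from typing import List

DEFAULT_ALGORITHM = "sha256"

# Assumed declaration order of the six boolean options.
FLAG_ORDER = ("sha256", "sha512", "sha384", "sha1", "md5", "blake2b")


def build_algorithms(**flags: bool) -> List[str]:
    """Collect the enabled algorithm flags in declaration order, or fall back to the default."""
    selected = [name for name in FLAG_ORDER if flags.get(name)]
    return selected or [DEFAULT_ALGORITHM]
```

Under this sketch the resulting order depends on FLAG_ORDER, not on the order in which the flags were typed.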

Case insensitivity: -SHA512, -Sha512, and -sha512 are all accepted. The _normalise_argv() function checks each argv token against NORMALISABLE_FLAGS (a set of lowercase flag strings) and lowercases any match before Click processes them.


7.3 Multiple Algorithms in One Pass

Compute several hashes in a single file read.

Syntax:

yhash -<algo1> -<algo2> [-<algo3> ...] <file>

Example:

yhash -sha256 -sha512 -blake2b document.pdf

Output:

────────────────────────────────────────────
  File    document.pdf  (1.2 MB)

  →  SHA-256
       a665a45920422f9d417e4867efdc4fb8...

  →  SHA-512
       cf83e1357eefb8bdf1542850d66d8007...

  →  BLAKE2b
       786a02f742015903c6c6fd852552d272...

How it works: compute_hashes(file_path, ["sha256", "sha512", "blake2b"]) creates three hashlib objects before opening the file. On every 8 KB chunk, all three objects are updated in a single loop. The file is read exactly once. Adding more algorithms increases CPU work linearly but does not increase I/O.

This is more efficient than running yhash three times, which would read the file three times.


7.4 Chained Hashing

Apply algorithms sequentially: the output of each step becomes the input of the next.

Syntax:

yhash -chain <algo1,algo2,...> <file>

At least two algorithms must be specified. The chain string is case-insensitive and whitespace around commas is ignored.

Example:

yhash -chain md5,sha256,sha512 video.mp4

Output:

────────────────────────────────────────────
  Chain   video.mp4

  →  MD5
       d41d8cd98f00b204e9800998ecf8427e

  →  SHA-256
       e3b0c44298fc1c149afbf4c8996fb924...

  →  SHA-512   <- Final Chained Hash
       cf83e1357eefb8bdf1542850d66d8007...

How chaining works — step by step:

Step 1:  Read raw bytes of video.mp4 through MD5 hasher
         → MD5 hex digest: "d41d8cd98f00b204..." (32 hex chars)

Step 2:  Encode that hex string as UTF-8 bytes: b"d41d8cd98f00b204..."
         Feed those bytes into a fresh SHA-256 hasher
         → SHA-256 hex digest: "e3b0c44298fc1c..." (64 hex chars)

Step 3:  Encode that hex string as UTF-8 bytes: b"e3b0c44298fc1c..."
         Feed those bytes into a fresh SHA-512 hasher
         → SHA-512 hex digest: "cf83e1357eefb8..." (128 hex chars)  <- Final

Each subsequent step hashes the UTF-8 encoding of the previous hexadecimal string, not the raw bytes of the previous digest. In the example above, the input to step 2 is the 32-character string "d41d8cd98f00b204..." encoded as ASCII/UTF-8 bytes, not the 16-byte binary MD5 digest.

Chain with text:

yhash -chain sha256,blake2b -text "my passphrase"

Constraint: -chain cannot be combined with individual algorithm flags (-sha256, etc.). All algorithms in the chain must be specified in the chain string.

How it is implemented: cli.py calls parse_chain_algorithms(chain) which splits the comma-separated string, normalises case, deduplicates, and validates all names against the whitelist. Then chain_hash_file() or chain_hash_text() in hasher.py performs the sequential hashing.


7.5 Text / String Hashing

Hash an arbitrary string directly, without creating a file.

Syntax:

yhash -text "<string>"

Examples:

yhash -text "hello world"
yhash -sha512 -text "my secret key"
yhash -sha256 -sha512 -text "password123"
yhash -chain md5,sha256 -text "chain this string"

Output:

────────────────────────────────────────────
  Text    "hello world"

  →  SHA-256
       b94d27b9934d3e08a52e52d7da7dabfa...

How it works: The string is encoded to bytes using text.encode("utf-8") and passed to hash_text() or chain_hash_text() in hasher.py. No temporary file is created. Any string that can be encoded as UTF-8 is accepted, including spaces, special characters, emoji, and control characters. The hashing functions handle null bytes as well, although a command-line argument cannot carry one, so that case is only reachable when the functions are called from Python.

Constraints:

  • -text and file arguments cannot be combined in the same command.
  • -text "" (empty string) is valid and produces the well-known hash of an empty byte sequence.
  • -text is compatible with -chain, -copy, -json, and all algorithm flags.

7.6 Multiple Files

Hash several files in a single command, displayed as a table.

Syntax:

yhash [algo flags] <file1> <file2> [file3 ...]

Example:

yhash -sha256 -sha512 report.pdf data.csv archive.zip

Output:

┌──────────────┬─────────────────────────────┬─────────────────────────────┐
│ File         │ SHA-256                     │ SHA-512                     │
├──────────────┼─────────────────────────────┼─────────────────────────────┤
│ report.pdf   │ a665a459...                 │ cf83e135...                 │
│ data.csv     │ 7f8e9a0b...                 │ 4fddee6d...                 │
│ archive.zip  │ 3b00361b...                 │ 5d9c46ef...                 │
└──────────────┴─────────────────────────────┴─────────────────────────────┘

How it works: When len(valid_fps) > 1, cli.py iterates through each file and calls _hash_file() sequentially (not in parallel at this level — parallel processing is reserved for the -r recursive mode, which may have hundreds of files). Results are collected into results_map and displayed via display_multi_file_results().

The table uses Rich's overflow="fold" on each hash column so that long hashes (SHA-512, BLAKE2b — 128 chars) wrap inside the cell rather than being clipped.

If a file fails (e.g. permission denied), its row shows the error message in red; processing continues for the remaining files.


7.7 Recursive Directory Hashing

Hash all files under a directory and its subdirectories.

Syntax:

yhash -r <directory>

Examples:

yhash -r ./project
yhash -sha512 -r ./backups
yhash -sha256 -sha512 -r ./documents

Output:

Found 47 file(s) — processing...

┌────────────────────┬──────────────────────────────────────┐
│ File               │ SHA-256                              │
├────────────────────┼──────────────────────────────────────┤
│ main.py            │ a665a45920422f9d...                  │
│ utils.py           │ cf83e1357eefb8bd...                  │
│ README.md          │ 7f8e9a0b1c2d3e4f...                  │
│ ...                │ ...                                  │
└────────────────────┴──────────────────────────────────────┘

How it works:

  1. collect_files_recursive(path) walks the directory tree with Path.rglob("*"), returning a sorted list of all files.
  2. A ThreadPoolExecutor (capped at min(8, file_count) workers) submits each file as a separate _hash_one_rec() task.
  3. A Rich Progress bar tracks completion as as_completed() yields finished futures.
  4. Results are displayed in a table.

Multiple directories and files can be combined:

yhash -r ./src ./tests README.md

Combined with -create: When -r and -create are combined, instead of a table, a .yhash manifest file is written. See Section 7.9.


7.8 Hash Verification

Compare a file's computed hash against a known expected value.

Syntax:

yhash -check <expected_hash> <file>

Examples:

yhash -check a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3 document.pdf

# Verify with SHA-512 instead of the default SHA-256
yhash -sha512 -check cf83e1357eefb8bdf1542850d66d800... document.pdf

Output on match:

────────────────────────────────────────────
  File          document.pdf
  Algorithm     SHA-256

  Expected
       a665a45920422f9d417e4867efdc4fb8...

  Computed
       a665a45920422f9d417e4867efdc4fb8...

╭──────────────────────────────────────────╮
│  MATCH  —  The file is intact...         │
╰──────────────────────────────────────────╯

Output on mismatch:

╭──────────────────────────────────────────╮
│  MISMATCH  —  Hash does not match.      │
│  File may be corrupted or tampered with. │
╰──────────────────────────────────────────╯

How it works:

  1. The file is hashed using _hash_file() with only the first algorithm in algorithms list (defaults to SHA-256 unless an algorithm flag is set).
  2. The computed digest is compared to the expected value using case-insensitive string comparison: expected_hash.lower() == computed_hash.lower().
  3. Both hashes are displayed in full regardless of terminal width (using _print_hash()).
  4. The verdict is displayed in a green (MATCH) or red (MISMATCH) Panel.

The comparison is always case-insensitive. Uppercase and lowercase hex strings are treated identically.
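The comparison step can be sketched on in-memory bytes. The function names are hypothetical, and the sketch hashes bytes directly instead of streaming a file, so only the case-insensitive comparison is taken from the description above.

```python
import hashlib


def hashes_match(expected_hash: str, computed_hash: str) -> bool:
    """Case-insensitive comparison of two hex digests."""
    return expected_hash.lower() == computed_hash.lower()


def verify_bytes(data: bytes, expected_hash: str, algorithm: str = "sha256") -> bool:
    """Hash data with one algorithm and compare the result to the expected digest."""
    computed = hashlib.new(algorithm, data).hexdigest()
    return hashes_match(expected_hash, computed)
```

An upper-case expected value therefore verifies just as a lower-case one does.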


7.9 Manifest Creation

Generate a .yhash manifest file containing hash mappings for one or more files.

Syntax:

yhash -create [algo flags] <file1> [file2 ...]
yhash -create [algo flags] -r <directory>

Examples:

# Manifest for specific files
yhash -create report.pdf data.csv

# Recursive manifest with SHA-512
yhash -sha512 -create -r ./project

# Manifest with multiple algorithms
yhash -sha256 -sha512 -create -r ./dist

Manifest file format (manifest.yhash):

{
  "yhash_version": "1.0.0",
  "created_at": "2025-01-15T10:30:00.123456+00:00",
  "algorithms": ["sha256"],
  "entries": {
    "/absolute/path/to/report.pdf": {
      "sha256": "a665a45920422f9d417e4867efdc4fb8..."
    },
    "/absolute/path/to/data.csv": {
      "sha256": "cf83e1357eefb8bdf1542850d66d8007..."
    }
  }
}

How it works:

  1. Files are collected (either the provided list or recursively via -r).
  2. Each file is hashed with the selected algorithms using compute_hashes().
  3. create_manifest_file() builds a dict with version, timestamp (UTC ISO 8601), algorithm list, and per-file hash entries.
  4. The dict is serialised to UTF-8 JSON with json.dumps(..., indent=2, ensure_ascii=False) and written with a single write_text() call.
  5. The manifest is saved as manifest.yhash in the current working directory.

The timestamp uses datetime.now(tz=timezone.utc) for a timezone-aware UTC value. The manifest is always valid JSON: path keys and hash values containing special characters are safely escaped by Python's json module.
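A manifest writer along these lines could look like the sketch below. The function name write_manifest_sketch and its version parameter are assumptions; the field names follow the manifest format shown above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, List


def write_manifest_sketch(
    file_hashes: Dict[str, Dict[str, str]],
    algorithms: List[str],
    output_path: str,
    version: str = "1.0.0",
) -> Path:
    """Serialise a manifest dict to UTF-8 JSON and write it in one call."""
    manifest = {
        "yhash_version": version,
        "created_at": datetime.now(tz=timezone.utc).isoformat(),  # timezone-aware UTC
        "algorithms": list(algorithms),
        "entries": file_hashes,
    }
    out = Path(output_path)
    # Trailing newline so the file ends cleanly, as described in Section 5.5.
    out.write_text(json.dumps(manifest, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return out
```

Reading the file back with json.loads() returns the same structure, which is what a later verification step would consume.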


7.10 Clipboard Copy

Copy the final hash to the system clipboard.

Syntax:

yhash -copy <file>
yhash -copy -sha512 <file>
yhash -copy -text "string"
yhash -copy -chain md5,sha256 <file>

Examples:

# Copy SHA-256 of a file
yhash -copy firmware.bin

# Copy SHA-512
yhash -sha512 -copy firmware.bin

# Copy the final chained hash
yhash -copy -chain md5,sha256,sha512 archive.tar.gz

Output:

  →  SHA-256
       a665a45920422f9d...

  OK    Copied to clipboard  a665a45920422f9d...

How it works: After the hash result is computed and displayed, _clipboard() in cli.py calls get_final_hash(results) to get the last entry in the results dict (which is always the final algorithm's output — in chained mode, this is the final chained hash). It then calls copy_to_clipboard(final) from utils.py.

copy_to_clipboard() calls pyperclip.copy(text) inside a try/except. On failure (e.g., no display server, no clipboard tool), it returns False and cli.py shows a warning message with installation instructions rather than crashing.

With multiple algorithms: The hash of the last algorithm in the results is copied. Example: -sha256 -sha512 -copy file copies the SHA-512 hash, because SHA-512 is the later of the two flags declared and _build_algorithms() appends names in declaration order (see Section 7.2).

With chaining: The final chained hash (last algorithm in the chain) is always what gets copied.
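Picking the value to copy relies only on dict ordering, as this small sketch shows. The function name is hypothetical, and the clipboard call itself is left out so the example has no third-party dependency.

```python
from typing import Dict


def final_hash(results: Dict[str, str]) -> str:
    """Return the last value of an insertion-ordered results dict."""
    if not results:
        raise ValueError("No hashes to choose from")
    return list(results.values())[-1]
```

For a chained result, the last entry is the final chained hash; for independent hashes, it is the digest of the last algorithm in the list.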


7.11 JSON Output

Print results as machine-readable JSON instead of the default formatted display.

Syntax:

yhash -json [other flags] <file>

Examples:

yhash -json document.pdf
yhash -json -sha512 document.pdf
yhash -json -text "hello world"
yhash -json -chain md5,sha256 archive.zip
yhash -json -sha256 -r ./folder

Single file output:

{
  "mode": "file",
  "file": "/path/to/document.pdf",
  "size_bytes": 1258496,
  "algorithms": ["sha256"],
  "hashes": {
    "sha256": "a665a45920422f9d417e4867efdc4fb8..."
  }
}

Text output:

{
  "mode": "text",
  "input": "hello world",
  "algorithms": ["sha256"],
  "hashes": {
    "sha256": "b94d27b9934d3e08a52e52d7da7dabfa..."
  }
}

Chain output:

{
  "mode": "chain",
  "file": "/path/to/archive.zip",
  "size_bytes": 45056,
  "chain": {
    "md5": "d41d8cd98f00b204e9800998ecf8427e",
    "sha256": "e3b0c44298fc1c149afbf4c8996fb924..."
  }
}

Recursive output:

{
  "mode": "recursive",
  "algorithms": ["sha256"],
  "file_count": 3,
  "results": {
    "/path/to/a.txt": {"hashes": {"sha256": "..."}},
    "/path/to/b.txt": {"hashes": {"sha256": "..."}},
    "/path/to/c.txt": {"error": "Permission denied"}
  }
}

How it works: -json is a boolean flag that, when set, routes all result data through display_json_results() instead of the Rich formatter functions. display_json_results() creates a dedicated Console(highlight=False, markup=False) to print the JSON string without any Rich markup processing, ensuring the output is clean and parseable by tools like jq.

Using with jq:

yhash -json document.pdf | jq '.hashes.sha256'
yhash -json -r ./folder | jq '.results | to_entries[] | {file: .key, hash: .value.hashes.sha256}'
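The same recursive output can be consumed from Python. The sketch below assumes the JSON shape shown in the recursive example above; the function name is hypothetical.

```python
import json
from typing import Dict, Tuple


def split_recursive_results(payload: str) -> Tuple[Dict[str, Dict[str, str]], Dict[str, str]]:
    """Separate successfully hashed files from failed ones in recursive JSON output."""
    data = json.loads(payload)
    hashed = {}
    failed = {}
    for path, entry in data["results"].items():
        if "error" in entry:
            failed[path] = entry["error"]
        else:
            hashed[path] = entry["hashes"]
    return hashed, failed
```

In a script, payload would typically be the captured standard output of yhash -json -r on some directory.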

7.12 Algorithm Reference Table

Display a table of all supported algorithms with security ratings.

Syntax:

yhash -algo
# or
yhash --algorithms

Output:

┌──────────┬──────────┬─────────────┬─────────────┬────────────────────────────────────┐
│   Flag   │ Algorithm│ Output Size │ Security    │ Recommended Use                    │
├──────────┼──────────┼─────────────┼─────────────┼────────────────────────────────────┤
│ -sha256  │ SHA-256  │   256-bit   │ Recommended │ General-purpose cryptographic...   │
│ -sha512  │ SHA-512  │   512-bit   │ Recommended │ High-security / large data         │
│ ...      │ ...      │   ...       │ ...         │ ...                                │
└──────────┴──────────┴─────────────┴─────────────┴────────────────────────────────────┘

Security ratings are colour-coded: green for Recommended, yellow for Deprecated or Weak.

How it works: display_algo_table() in formatter.py reads the ALGO_INFO dict from constants.py and builds a Rich Table. The Security column value is wrapped in Rich markup at render time to apply the colour.


7.13 About Panel

Display tool metadata in a formatted panel.

Syntax:

yhash -about

Output:

╭─────────────────── About Yhash ────────────────────╮
│ Yhash  (Yung Hash)                                 │
│                                                    │
│ Version       1.0.0                                │
│ Description   Modern, fast CLI hashing utility.   │
│ License       MIT                                  │
│ Python        >= 3.9                               │
│ Algorithms    SHA-256, SHA-512, SHA-384, ...       │
│ Install       pip install yhash / pipx install ... │
│ Repository    https://github.com/yhash/yhash       │
╰────────────────────────────────────────────────────╯

7.14 Help

Display the full usage reference.

Syntax:

yhash -help
# or
yhash --help

Prints all flags organised into sections: Basic Usage, Algorithm Flags, Chained Hashing, Text Hashing, Files & Directories, Verification, Manifest, Output Options, Install, and Info.

How it works: Click's context_settings = {"help_option_names": ["-help", "--help"]} registers both -help and --help as equivalent help triggers. When no arguments are provided at all, main() also calls display_help() alongside print_banner().


7.15 Version

Print the version number.

Syntax:

yhash --version

Output:

Yhash  v1.0.0

Implemented via Click's @click.version_option(VERSION, "--version", prog_name="Yhash", message="%(prog)s v%(version)s").


8. Flag Compatibility Matrix

Flag -sha* / -md5 / -blake2b -chain -text -r -copy -check -create -json
-sha* / -md5 / -blake2b ✅ Combine freely ❌ Mutual exclusion
-chain ❌ Mutual exclusion
-text ❌ No files
-r
-copy
-check ✅ (first algo used)
-create
-json

Mutual exclusions enforced in main() before any hashing begins:

  • -chain + any algorithm flag: error — all algorithms must be in the chain string.
  • -text + file arguments: error — text and file inputs cannot be mixed.
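
The two exclusion rules can be sketched as a single pre-flight check. This is a minimal illustration, not the code in main(): the function name, parameter names, and message wording are assumptions.

```python
from typing import List, Optional


def find_flag_conflict(
    chain: Optional[str],
    algorithm_flags: List[str],
    text: Optional[str],
    files: List[str],
) -> Optional[str]:
    """Return an error message for the first conflict found, else None."""
    if chain is not None and algorithm_flags:
        return "-chain cannot be combined with algorithm flags"
    if text is not None and files:
        return "-text cannot be combined with file arguments"
    return None


print(find_flag_conflict("md5,sha256", ["sha512"], None, ["a.bin"]))
print(find_flag_conflict(None, ["sha512"], None, ["a.bin"]))
```

Running the check before any file is opened keeps a bad flag combination from producing partial output.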

9. Progress Bars

A progress bar is shown automatically for any single file larger than 5 MB (LARGE_FILE_THRESHOLD = 5 * 1024 * 1024 bytes). Progress bars are also shown during recursive directory hashing regardless of individual file size.

What the progress bar shows:

⠋ Hashing firmware.bin  (120.0 MB) ████████████░░░░░░ 68% 42.1 MB/s 0:00:01
  • Spinner: animated rotating character (Rich SpinnerColumn)
  • Description: filename and size
  • Bar: filled proportionally to bytes read (Rich BarColumn, width 38)
  • Percentage: TaskProgressColumn
  • Speed: TransferSpeedColumn — bytes read per second
  • ETA: TimeRemainingColumn

How it works: _hash_file() in cli.py uses _progress_ctx() to create a Progress context manager. The task total is set to the file size in bytes. Inside compute_hashes(), after each 8 KB chunk is processed, progress.update(task, advance=len(chunk)) advances the bar. The progress bar is transient=True — it disappears after completion, leaving only the hash result.

The progress bar and hash output share the same console instance (the singleton from formatter.py), so Rich can correctly erase the progress bar before printing the result without output corruption.
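
The threshold check and the per-chunk advance can be sketched without Rich. In this illustration a plain callback stands in for progress.update(task, advance=len(chunk)); the names needs_progress_bar and hash_stream are hypothetical, while the two constants use the values given in this document.

```python
import hashlib
import io
from typing import BinaryIO, Callable

CHUNK_SIZE = 8192                        # value given in this document
LARGE_FILE_THRESHOLD = 5 * 1024 * 1024   # value given in this document


def needs_progress_bar(size_bytes: int) -> bool:
    """A single file gets a bar only when it is larger than the threshold."""
    return size_bytes > LARGE_FILE_THRESHOLD


def hash_stream(stream: BinaryIO, on_advance: Callable[[int], None]) -> str:
    """Hash a binary stream in chunks, reporting bytes read after each chunk."""
    hasher = hashlib.sha256()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
        on_advance(len(chunk))  # stand-in for the Rich progress update
    return hasher.hexdigest()


data = b"x" * 20000
advances = []
digest = hash_stream(io.BytesIO(data), advances.append)
print(advances, sum(advances) == len(data))
```

Because the bar advances by len(chunk) rather than by CHUNK_SIZE, the final short chunk brings the total to exactly the file size.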


10. Hash Display Design

Problem: Standard terminal output in Rich enforces its internal line-width limit. A 128-character SHA-512 or BLAKE2b digest in an 80-column terminal would be truncated with an ellipsis (…).

Solution — two mechanisms combined:

  1. Console(soft_wrap=True) — tells Rich not to enforce its own line-length limit when printing plain text. Rich emits the full string; the terminal handles visual wrapping.

  2. Two-line layout — each hash is printed on its own dedicated line below the algorithm label. This gives the hash the full terminal width, not a portion of it shared with a label.

def _print_hash(digest: str, style: str = CLR_HASH) -> None:
    console.print(Text(_HASH_INDENT + digest, style=style, no_wrap=True))

no_wrap=True on the Text object reinforces the instruction: this object should not be wrapped by Rich at any level.

Effect in practice:

  • On a 200-column terminal: entire 128-char hash on one visual line.
  • On an 80-column terminal: hash wraps visually at column 80, continues on the next line. All characters are present.
  • The hash value is never truncated, ellipsised, or partially hidden regardless of terminal width.

For multi-file tables: hash columns use overflow="fold" (not overflow="ellipsis") so the content wraps within the cell visually but the full hash is always present in the output.
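
The difference between folding and ellipsising can be shown in plain Python. This sketch only emulates the two behaviours for illustration; it is not how Rich implements them, and the fold and ellipsise helpers are hypothetical.

```python
import hashlib
from typing import List


def fold(text: str, width: int) -> List[str]:
    """Split text into width-sized rows without dropping any character."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def ellipsise(text: str, width: int) -> str:
    """Truncate text to width characters, ending in an ellipsis."""
    return text if len(text) <= width else text[: width - 1] + "…"


digest = hashlib.sha512(b"abc").hexdigest()  # 128 hex characters
rows = fold(digest, 40)
print(len(rows), "".join(rows) == digest)
print(len(ellipsise(digest, 40)), ellipsise(digest, 40) == digest)
```

Joining the folded rows gives back the full digest, whereas the ellipsised form cannot be used for verification.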


11. Security Model

Algorithm Whitelist

The first line of defence is _make_hasher() in hasher.py:

def _make_hasher(algorithm: str) -> Any:
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported algorithm: {algorithm!r}. Supported: ...")
    return hashlib.blake2b() if algorithm == "blake2b" else hashlib.new(algorithm)

SUPPORTED_ALGORITHMS is a fixed dict with exactly six keys. Any string not in that set raises ValueError before any hashlib call is made. This prevents all forms of algorithm-name injection:

| Injection attempt | Result |
|---|---|
| sha256; rm -rf / | ValueError: Unsupported algorithm |
| sha256 && cat /etc/passwd | ValueError |
| ../../../etc/shadow | ValueError |
| $(whoami) | ValueError |
| sha256\x00sha512 | ValueError |

Validation happens at two independent layers: validate_algorithm() (single name) and validate_algorithms() (list). Both are called before any file is opened.
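
A whitelist check of this kind can be sketched in a few lines. The validate_name function below is a hypothetical stand-in for validate_algorithm(), and WHITELIST is deliberately partial: it lists only the five algorithm names that appear in this part of the document, since the sixth is not shown here.

```python
# Illustrative, partial whitelist (assumption: the real dict has six keys).
WHITELIST = {"sha256", "sha384", "sha512", "md5", "blake2b"}


def validate_name(name: str) -> str:
    """Normalise (strip, lowercase) and check a name against the whitelist."""
    normalised = name.strip().lower()
    if normalised not in WHITELIST:
        raise ValueError(f"Unsupported algorithm: {name!r}")
    return normalised


attempts = [
    "sha256; rm -rf /",
    "sha256 && cat /etc/passwd",
    "../../../etc/shadow",
    "$(whoami)",
    "sha256\x00sha512",
]
rejected = []
for attempt in attempts:
    try:
        validate_name(attempt)
    except ValueError:
        rejected.append(attempt)

print(rejected == attempts)
print(validate_name("  SHA256 "))
```

Exact set membership is what makes the check robust: a string is either one of the known names after normalisation or it is rejected, so no escaping or pattern matching is needed.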

No Shell Calls

Yhash never calls subprocess, os.system(), eval(), or exec(). hashlib.new(name) is a Python API call, not a shell command. pyperclip.copy() may invoke a system clipboard tool internally, but the hash value passed to it is a hex string (characters [0-9a-f] only), which cannot be misinterpreted as shell syntax.

File Reading Is Read-Only

compute_hashes() and chain_hash_file() open files with open(path, "rb") — binary read mode. No write, no execute, no interpretation. Hashing a shell script does not run it.

Memory Safety

Files are never fully loaded into memory. The maximum memory overhead of the hashing engine is one CHUNK_SIZE (8192-byte) buffer plus the hasher state objects. A 100 GB file and a 1 KB file use the same peak memory.

Manifest Integrity

Manifest files are written with json.dumps() which safely escapes all special characters in keys and values. The output is always valid JSON — never executable code. Paths containing quotes, backslashes, or newlines are correctly serialised.
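
The escaping behaviour is easy to check with a round trip. In this sketch the "entries" key and the sample paths are assumptions made for illustration; only the use of json.dumps() comes from the description above.

```python
import json

# Hypothetical manifest entries keyed by awkward paths
# (double quotes, backslashes, and an embedded newline).
entries = {
    'dir/with "quotes".txt': {"sha256": "aa"},
    "dir\\with\\backslash.txt": {"sha256": "bb"},
    "dir/with\nnewline.txt": {"sha256": "cc"},
}

serialised = json.dumps({"entries": entries}, indent=2) + "\n"
restored = json.loads(serialised)

print(restored["entries"] == entries)
```

If the restored keys match the originals, the special characters were escaped on write and unescaped on read without loss.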

Clipboard Safety

The value passed to pyperclip.copy() is always the output of hexdigest() — a string of lowercase hexadecimal characters ([0-9a-f]+). This cannot be misinterpreted as shell commands even if pasted into a terminal.

Error Isolation in Parallel Processing

In hash_files_parallel() and the recursive directory flow, each file's hashing is wrapped in an individual try/except. A permission error, broken symlink, or read failure on one file produces {"error": str(exc)} in the result dict for that file and does not abort processing of other files.
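
Per-file isolation in a thread pool can be sketched as follows. This is not the hash_files_parallel() implementation: hash_one and hash_many are hypothetical names, the sketch hashes with SHA-256 only, and it catches OSError specifically, which is an assumption about which failures matter.

```python
import hashlib
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Dict, List


def hash_one(path: Path) -> Dict[str, Any]:
    """Hash one file; convert any OS-level failure into an error entry."""
    try:
        hasher = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(8192), b""):
                hasher.update(chunk)
        return {"hashes": {"sha256": hasher.hexdigest()}}
    except OSError as exc:  # missing file, permission denied, broken symlink
        return {"error": str(exc)}


def hash_many(paths: List[Path], max_workers: int = 4) -> Dict[str, Any]:
    """Hash files concurrently; one failure never aborts the others."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(map(str, paths), pool.map(hash_one, paths)))


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.txt"
    good.write_bytes(b"abc")
    missing = Path(tmp) / "missing.txt"
    results = hash_many([good, missing])
    good_key, missing_key = str(good), str(missing)

print(results[good_key])
print("error" in results[missing_key])
```

Catching the exception inside the worker, rather than around the pool, is what keeps a single bad file from cancelling the batch.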


12. Memory Safety

The core design principle is that Yhash's memory usage is O(1) with respect to file size — it does not scale with how large the file is.

Implementation:

with open(file_path, "rb") as fh:
    while True:
        chunk = fh.read(CHUNK_SIZE)   # CHUNK_SIZE = 8192 bytes
        if not chunk:
            break
        for hasher in hashers.values():
            hasher.update(chunk)      # hash state updated, chunk discarded

On each iteration the name chunk is rebound to the next read, so the previous buffer has no remaining references and CPython can free it immediately. The hashers maintain only their internal state (a small fixed-size structure for each algorithm, on the order of a few hundred bytes), not a copy of the data.

Verified: In the test suite, test_30mb_file_memory_safe (in test_security.py) measures RSS memory before and after hashing a 30 MB file with three algorithms and asserts the increase is under 8 MB — well within normal Python interpreter overhead.
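
A similar check can be reproduced with the standard library. Note the swapped-in technique: this sketch uses tracemalloc, which tracks Python-level allocations, rather than the RSS measurement used by the test suite, and it uses a 4 MB file and a single algorithm to stay quick.

```python
import hashlib
import os
import tempfile
import tracemalloc

CHUNK_SIZE = 8192  # value given in this document

# Write a 4 MB file piece by piece so the test data is never held in memory.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    for _ in range(512):
        tmp.write(b"\0" * CHUNK_SIZE)
    path = tmp.name

tracemalloc.start()
hasher = hashlib.sha256()
with open(path, "rb") as fh:
    while True:
        chunk = fh.read(CHUNK_SIZE)
        if not chunk:
            break
        hasher.update(chunk)
_, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
os.remove(path)

file_size = 512 * CHUNK_SIZE
print(f"file: {file_size} bytes, peak traced allocation: {peak} bytes")
```

The peak should stay far below the file size, since only one chunk and the file object's read buffer are alive at any time.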


13. Error Handling

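Errors are displayed via display_error() in formatter.py, which prints a styled ERROR prefix followed by the message. For fatal conditions the process then exits with code 1 via sys.exit(1); in batch and recursive runs, a failure on one file is reported and the remaining files are still processed.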

Error conditions and their handling:

| Condition | Detection point | Behaviour |
|---|---|---|
| File not found | cli.py before hashing | Error message, sys.exit(1) for single file; skipped with error for batches |
| Directory given without -r | cli.py | Error message, sys.exit(1) |
| Permission denied | compute_hashes() open() | PermissionError caught in cli.py, error message |
| Unsupported algorithm | _make_hasher() | ValueError with list of supported algorithms |
| -chain with < 2 algorithms | parse_chain_algorithms() | ValueError with example |
| -chain + algorithm flags | main() | Error message, sys.exit(1) |
| -text + file arguments | main() | Error message, sys.exit(1) |
| -r without path | main() | Error message, sys.exit(1) |
| -check without file | main() | Error message, sys.exit(1) |
| Clipboard unavailable | copy_to_clipboard() | Returns False; warning shown; execution continues |
| Empty directory with -r | collect_files_recursive() | Warning message; no hash attempted |
| Malformed manifest | load_manifest_file() | json.JSONDecodeError propagates to caller |

Warnings (non-fatal) use display_warning() which prints a WARN prefix in yellow. Execution continues after a warning.


14. Test Suite

The test suite is written with Python's built-in unittest module and requires no additional testing framework. It consists of 199 tests across five modules.

Running the tests:

# From the project root, after pip install -e .
python3 -m unittest discover -s tests -v

Test modules:

tests/test_hasher.py — 48 tests (Unit)

Tests every public function in hasher.py in isolation.

Class What it tests
TestValidateAlgorithm All valid names, case normalisation, whitespace stripping, unsupported names, injection attempts
TestComputeHashes Each of the 6 algorithms independently, all 6 together, empty file, binary file, 12 MB streaming correctness, missing file, result key ordering, multi-algo vs individual consistency
TestChainHashFile 2-algo chain, 3-algo chain, 6-algo chain, order sensitivity, chain vs independent hash, key ordering, error on < 2 algorithms
TestChainHashText Same as above for text input; unicode, null bytes, empty string, single-algo rejection
TestHashText Each algorithm, multiple independent algorithms, empty string, unicode, null byte, 1M character string, control characters
TestHashFilesParallel All files hashed, hash correctness, multiple algorithms, nonexistent file returns error dict, deterministic across 5 concurrent runs, empty file list

tests/test_utils.py — 55 tests (Unit)

Tests every function in utils.py in isolation.

Class What it tests
TestFormatFileSize Zero, bytes boundary, KB, MB, GB, TB, return type
TestCollectFilesRecursive Nested structure (4 files), files only, single file input, sorted order, empty directory
TestCollectFilesFlat Non-recursive, single file, sorted order
TestValidateAlgorithms Valid list, case, deduplication, order preservation, all 6, unsupported, error message
TestParseChainAlgorithms 2 and 3 algos, case, whitespace, single-algo rejection, empty string, unsupported, injection, 6 algos, error message
TestCreateManifestFile File creation, valid JSON, all required fields, entries preserved, algorithms preserved, version type, default cwd path, unicode keys, special chars in keys, ISO timestamp, trailing newline
TestLoadManifestFile Valid load, missing file, malformed JSON, round-trip
TestGetFinalHash Single, multiple (returns last), empty dict
TestCopyToClipboard Returns bool, no crash on unavailable clipboard, no crash on empty string

tests/test_cli.py — 47 tests (Integration)

Tests the CLI end-to-end using Click's CliRunner, which invokes main() in-process and captures stdout.

Class What it tests
TestInfoCommands No args shows usage, --version, -algo, --algorithms, -about, -help
TestFileHashing All 6 algorithm flags, all 6 together, multiple algorithm flags, missing file, directory without -r, JSON output
TestTextHashing Default SHA-256, with -sha512, multiple algorithms, empty string, string with spaces, -text + file conflict, JSON output
TestChainHashing 2-algo file chain, 3-algo file chain, chain with -text, intermediate hashes in output, -chain + algo flag conflict, single-algo chain rejection, missing file graceful handling, JSON output
TestVerification Correct hash → MATCH, wrong hash → MISMATCH, case-insensitive comparison, SHA-512 check, BLAKE2b check, missing file
TestMultipleFiles All hashes in output, JSON output, with SHA-512
TestRecursive Finds all files in nested structure, missing path arg, nonexistent path
TestManifest Single file manifest creation, recursive manifest creation

tests/test_security.py — 25 tests (Security)

Verifies that adversarial inputs cannot exploit the tool.

Class What it tests
TestAlgorithmInjection 12 injection attempts rejected by validate_algorithm, validate_algorithms, parse_chain_algorithms, and the CLI chain flag
TestFileOperationSafety Nonexistent path raises, script file not executed during hashing, manifest written only to specified path, manifest contains no executable code, manifest always valid JSON with special chars, malformed manifest raises not crashes, missing manifest raises FileNotFoundError
TestInputSanitisation Null bytes, null bytes in chain, ANSI escape sequences, 7 unicode edge cases (zero-width space, BOM, emoji, combining chars, repeated nulls), 5 MB string, CLI verify with 10,000-char hash value
TestMemoryBounds 30 MB file with 3 algorithms; RSS increase must be < 8 MB
TestCLIAdversarialInputs Conflicting chain + algo flags, text + file conflict, -r without path, -check without file, nonexistent file, directory as file argument
TestConcurrencySafety 10 files hashed in parallel, 5 repeated runs — all results correct and deterministic

tests/test_system.py — 24 tests (System / End-to-End)

Tests complete user workflows without mocking any internal components.

Class What it tests
TestHashAndVerifyRoundTrip All 6 algorithms: hash then verify → MATCH; tampered file → MISMATCH; default algo is SHA-256
TestChainHashWorkflow 3-step chain with all intermediates correct; chain with BLAKE2b as final; chain order changes output; chain text 3 steps
TestMultiAlgorithmWorkflow All 6 simultaneously; JSON output contains correct digests; output contains algorithm labels
TestManifestWorkflow Manifest created for all files in a 3-file corpus; all hashes are correct (verified by reading files and comparing); SHA-512 manifest entries correct; algorithms field matches
TestRecursiveDirectoryWorkflow 4-file nested structure; all hashes found in output; SHA-512 digests found (handles table folding)
TestTextHashingWorkflow SHA-256 of "abc" correct; MD5 of "" correct; chain deterministic across two runs; unicode text uses UTF-8 encoding
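
For readers adding their own cases, a known-vector test in the same unittest style might look like the sketch below. It calls hashlib directly rather than Yhash's own functions, and the class name TestKnownVectors is hypothetical; the two vectors are the standard SHA-256 digest of "abc" and the MD5 digest of the empty string.

```python
import hashlib
import unittest


class TestKnownVectors(unittest.TestCase):
    def test_sha256_abc(self):
        # Text is encoded as UTF-8 before hashing.
        self.assertEqual(
            hashlib.sha256("abc".encode("utf-8")).hexdigest(),
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
        )

    def test_md5_empty(self):
        self.assertEqual(
            hashlib.md5(b"").hexdigest(),
            "d41d8cd98f00b204e9800998ecf8427e",
        )


# Run the two cases programmatically so the snippet also works when pasted
# into a REPL; `python3 -m unittest` would discover the class as usual.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKnownVectors)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```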

15. Module Reference

constants.py

Pure data — no imports from other yhash modules, no side effects.

| Name | Type | Value | Purpose |
|---|---|---|---|
| VERSION | str | "1.0.0" | Package version string |
| CHUNK_SIZE | int | 8192 | Bytes read per chunk during file streaming |
| LARGE_FILE_THRESHOLD | int | 5242880 | File size above which progress bars are shown |
| DEFAULT_ALGORITHM | str | "sha256" | Algorithm used when no flag is specified |
| MANIFEST_EXTENSION | str | ".yhash" | File extension for manifest files |
| SUPPORTED_ALGORITHMS | dict[str, str] | {"sha256": "SHA-256", ...} | Internal name → display name mapping |
| ALGO_FLAGS | set[str] | {"-sha256", ...} | CLI flags for algorithms (lowercase) |
| NORMALISABLE_FLAGS | set[str] | ALGO_FLAGS ∪ {...} | All flags normalised by argv pre-processor |
| ALGO_INFO | dict[str, tuple] | {"sha256": ("SHA-256", "256-bit", ...)} | Algorithm metadata for the -algo table |

hasher.py

| Function | Signature | Description |
|---|---|---|
| validate_algorithm | (str) -> str | Validate and normalise a single algorithm name |
| compute_hashes | (Path, List[str], *, progress, task) -> Dict[str, str] | Stream a file through all hashers simultaneously |
| chain_hash_file | (Path, List[str], *, progress, task) -> Dict[str, str] | Chain-hash a file; output of each step feeds the next |
| chain_hash_text | (str, List[str]) -> Dict[str, str] | Chain-hash a UTF-8 string |
| hash_text | (str, List[str]) -> Dict[str, str] | Hash a string independently with each algorithm |
| hash_files_parallel | (List[Path], List[str], *, max_workers) -> Dict[str, Any] | Hash multiple files concurrently via ThreadPoolExecutor |

utils.py

| Function | Signature | Description |
|---|---|---|
| format_file_size | (int) -> str | Convert bytes to human-readable string |
| collect_files_recursive | (Path) -> List[Path] | All files under a directory (sorted) |
| collect_files_flat | (Path) -> List[Path] | Files directly inside a directory (non-recursive) |
| validate_algorithms | (List[str]) -> List[str] | Validate, normalise, and deduplicate algorithm names |
| parse_chain_algorithms | (str) -> List[str] | Parse a comma-separated chain spec |
| create_manifest_file | (Dict, List[str], Optional[Path]) -> Path | Write a .yhash JSON manifest file |
| load_manifest_file | (Path) -> Dict[str, Any] | Read and parse a manifest file |
| copy_to_clipboard | (str) -> bool | Copy text to system clipboard via pyperclip |
| get_final_hash | (Dict[str, str]) -> str | Return the last value in a results dict |

formatter.py

Function Description
print_banner() ASCII art banner with version subtitle
display_error(message) Red ERROR prefix + message
display_warning(message) Yellow WARN prefix + message
display_success(message) Green OK prefix + message
display_hash_results(name, results, *, is_text, file_size) Two-line layout for file or text hash results
display_chain_results(name, chain_results, *, is_text) Chain layout with <- Final Chained Hash marker
display_multi_file_results(all_results, algorithms) Rich table for multiple files
display_verification(file_path, expected, computed, algorithm) MATCH/MISMATCH panel with full hash display
display_algo_table() Algorithm reference table
display_about() About panel
display_manifest_created(path, count) Success panel after manifest creation
display_clipboard_success(hash_value) Clipboard copy confirmation
display_json_results(data) Plain JSON output (no Rich markup)
display_help() Full usage reference

cli.py

Symbol Description
cli_entry() Shell entry point; normalises argv then calls main()
main() Click command; dispatches to the appropriate flow
_normalise_argv(argv) Lowercases recognisable flags for case-insensitive input
_build_algorithms(...) Converts boolean flag values to algorithm name list
_hash_file(path, algorithms) Hash one file; conditionally shows progress bar
_chain_file(path, algorithms) Chain-hash one file; conditionally shows progress bar
_clipboard(results) Copy final hash to clipboard; show warning on failure
_progress_ctx() Factory for a configured Rich Progress instance
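
Case-insensitive flag handling of the kind _normalise_argv() provides can be sketched as below. This is an assumption about the approach, not the actual implementation: KNOWN_FLAGS is a hypothetical subset of the flags named in this document, and normalise_argv is an illustrative name.

```python
from typing import List

# Hypothetical subset of flags, for illustration only.
KNOWN_FLAGS = {"-sha256", "-md5", "-blake2b", "-chain", "-text", "-json"}


def normalise_argv(argv: List[str]) -> List[str]:
    """Lowercase arguments that match a known flag; leave everything else as-is."""
    return [arg.lower() if arg.lower() in KNOWN_FLAGS else arg for arg in argv]


print(normalise_argv(["-SHA256", "-Json", "Report.PDF"]))
# → ['-sha256', '-json', 'Report.PDF']
```

Only recognised flags are lowercased, so file names and text values keep their original case.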

Yhash v1.0.0 — MIT License
