Version: 1.0.0
Developer: Adekunle Abdulmujeeb
License: MIT
Language: Python 3.8+
- Overview
- Installation
- Project Structure
- Technology Stack
- Architecture & Internal Workings
- 5.1 Entry Point & argv Normalisation
- 5.2 CLI Dispatch Logic
- 5.3 Hashing Engine
- 5.4 Output Layer
- 5.5 Utilities Layer
- Algorithms
- Feature Reference
- 7.1 Default File Hashing
- 7.2 Algorithm Selection Flags
- 7.3 Multiple Algorithms in One Pass
- 7.4 Chained Hashing
- 7.5 Text / String Hashing
- 7.6 Multiple Files
- 7.7 Recursive Directory Hashing
- 7.8 Hash Verification
- 7.9 Manifest Creation
- 7.10 Clipboard Copy
- 7.11 JSON Output
- 7.12 Algorithm Reference Table
- 7.13 About Panel
- 7.14 Help
- 7.15 Version
- Flag Compatibility Matrix
- Progress Bars
- Hash Display Design
- Security Model
- Memory Safety
- Error Handling
- Test Suite
- Module Reference
Yhash (Yung Hash) is a command-line interface tool for computing cryptographic hashes of files, directories, and text strings. It is written entirely in Python and exposes a single yhash command with composable flags.
Design goals:
- Memory safety — files are never fully loaded into memory; they stream through 8 KB chunks.
- Speed — multiple algorithms applied in a single file-read pass; directories hashed in parallel.
- Clarity — every hash value always prints in full, regardless of terminal width.
- Simplicity — single-dash flags, sensible defaults, no subcommands.
- Security — all algorithm names are validated against a strict whitelist before any I/O begins; no eval, exec, or shell calls are made anywhere.
pip install yhash
pipx installs the tool in an isolated virtual environment and adds yhash to your PATH without affecting your system Python packages.
pipx install yhash
To install from a local directory (from source):
pipx install .
git clone https://github.com/yhash/yhash.git
cd yhash
pip install .
pip install -e .
Verify the installation:
yhash --version
# Yhash v1.0.0
yhash/
├── pyproject.toml # PEP 621 packaging metadata and entry point
├── README.md # Quick-start reference
├── LICENSE # MIT licence
├── src/
│ └── yhash/
│ ├── __init__.py # Package version metadata
│ ├── constants.py # All shared constants — algorithms, thresholds, flags
│ ├── hasher.py # Core hashing engine (the only file that calls hashlib)
│ ├── utils.py # File collection, manifest I/O, clipboard, validation
│ ├── formatter.py # All Rich terminal output
│ ├── cli.py # Click CLI definition and dispatch logic
│ └── py.typed # PEP 561 type-checking marker
└── tests/
├── test_hasher.py # Unit tests — hashing engine
├── test_utils.py # Unit tests — utilities
├── test_cli.py # Integration tests — CLI flags and flows
├── test_security.py # Security and vulnerability tests
└── test_system.py # System tests — end-to-end workflows
The src/ layout ensures the package is never accidentally imported from the project root during development; only the installed version is importable.
What it is: A Python package for building command-line interfaces via decorators.
How it is used: Every flag (-sha256, -chain, -text, etc.) is declared as a @click.option decorator on the main() function. Click handles all argument parsing, type conversion, help generation, and version output. The CliRunner from click.testing is also used in the integration test suite to invoke the CLI in-process without spawning a subprocess.
Key settings used:
- context_settings = {"help_option_names": ["-help", "--help"]}: enables -help as a valid help flag alongside the standard --help.
- no_args_is_help=False: prevents Click from printing help when no arguments are given; Yhash handles that case itself (showing the banner and usage).
- @click.argument("files", nargs=-1): captures zero or more positional file paths into a tuple.
- @click.option("-algo", "--algorithms", "show_algo", is_flag=True): one option declared with two names, so both -algo and --algorithms work.
What it is: A Python library for rich text and beautiful terminal formatting.
How it is used:
| Rich component | Where used | Purpose |
|---|---|---|
| Console(soft_wrap=True) | formatter.py | Shared output sink; soft_wrap=True prevents Rich from truncating long hash strings |
| Rule | formatter.py | Horizontal separator between output sections |
| Text | formatter.py | Styled inline text with per-character style control |
| Panel | formatter.py | Bordered panels for MATCH/MISMATCH verdict, About, Manifest Created |
| Table | formatter.py | Multi-file hash results table, algorithm reference table, verification meta info |
| Progress | cli.py | Progress bars for large files (> 5 MB) and recursive directory batches |
| SpinnerColumn | cli.py | Animated spinner on the left of the progress bar |
| BarColumn | cli.py | The actual filled progress bar |
| TaskProgressColumn | cli.py | Percentage complete |
| TransferSpeedColumn | cli.py | MB/s read speed |
| TimeRemainingColumn | cli.py | Estimated time to completion |
Why soft_wrap=True: By default, Rich measures the width of each print() call and inserts a newline at the terminal boundary. For a 128-character SHA-512 or BLAKE2b digest in a narrow terminal, this would produce a truncated line ending in …. With soft_wrap=True, Rich emits the full string and leaves line-wrapping to the terminal, which visually wraps the characters but never drops any of them.
overflow="fold" on table columns: Long digests in multi-file tables are folded (wrapped) within the cell rather than being clipped. This is set on every hash-value column of the results table.
What it is: Python's built-in interface to cryptographic hash functions, backed by OpenSSL.
How it is used: All cryptographic operations live exclusively in hasher.py. No other module calls hashlib directly.
# For all algorithms except BLAKE2b:
hasher = hashlib.new("sha256")
hasher.update(chunk)
digest = hasher.hexdigest()
# For BLAKE2b (created through its dedicated constructor):
hasher = hashlib.blake2b()
The hashlib.new(name) factory method is used for all algorithms except blake2b, which Yhash creates through its dedicated hashlib.blake2b() constructor. This is abstracted behind the internal _make_hasher(algorithm) function.
What it is: A cross-platform Python clipboard module.
How it is used: Called only from utils.copy_to_clipboard(). The function wraps the call in a broad except Exception so that a clipboard failure (common on headless servers, CI environments, or Linux systems without xclip/xsel) is never a crash — it returns False and cli.py shows a warning instead.
What it is: Python's high-level threading and multiprocessing interface.
How it is used: ThreadPoolExecutor is used in hasher.hash_files_parallel() and in the recursive directory flow inside cli.py. File hashing is I/O-bound, making threads (not processes) the correct tool — threads allow concurrent disk reads without the serialisation overhead of multiprocessing. Worker count is capped at min(8, number_of_files) to avoid thread thrashing on large directories.
What it is: The modern Python packaging standard (PEP 517/621).
How it is used: All project metadata (name, version, dependencies, Python requirement, entry point) lives in pyproject.toml. The entry point yhash = "yhash.cli:cli_entry" wires the yhash shell command to the cli_entry() function. The [tool.setuptools] section uses an explicit package-dir = {"": "src"} declaration so that both pip install and pipx install resolve the package correctly without relying on auto-discovery.
Yhash is divided into five layers. Dependencies point strictly in the direction of the arrows in this diagram; only cli.py imports from all other layers.
Shell
│
▼
cli_entry() ← cli.py (argv normalisation + Click dispatch)
│
├──► hasher.py (cryptographic operations — hashlib only)
├──► utils.py (file I/O, manifest, clipboard, validation)
└──► formatter.py (all terminal output — Rich only)
│
└──► constants.py (shared constants, no imports from other modules)
The shell command yhash calls cli_entry(), not main() directly.
def cli_entry() -> None:
    sys.argv = [sys.argv[0]] + _normalise_argv(sys.argv[1:])
    main()
Before Click parses anything, _normalise_argv() iterates over every argument and lower-cases any token whose lowercase form appears in NORMALISABLE_FLAGS:
NORMALISABLE_FLAGS = {"-sha256", "-sha512", "-sha384", "-sha1", "-md5", "-blake2b",
                      "-chain", "-copy", "-create", "-json", "-about", "-algo", "-text"}
This means -SHA256, -Sha256, and -sha256 are all normalised to -sha256 before Click sees them. File paths, hash values, and chain strings are left untouched because their lowercase forms do not match any flag in the set. This is a pure string-set lookup, O(1) per token.
main() is the single Click command. After argument parsing, it first checks three informational cases (items 1 to 3) and then dispatches to one of six hashing flows (items 4 to 9), all in strict priority order:
1. show_algo → display algorithm table, return
2. show_about → display about panel, return
3. No input → display banner + help, return
4. chain → chain hashing (file or text)
5. text_input → independent text hashing
6. check_hash → hash verification
7. recursive → recursive directory hashing
8. files > 1 → multi-file hashing
9. files == 1 → single file hashing
Mutual-exclusion checks happen before dispatch:
- -chain + any algorithm flag → error (the chain string already specifies all algorithms)
- -text + file arguments → error (text and file inputs cannot be mixed)
hasher.py is the only module that imports hashlib. All other modules call functions from hasher.py.
_make_hasher(algorithm) is the internal factory. It validates the algorithm against SUPPORTED_ALGORITHMS and returns the appropriate hashlib object. It raises ValueError for any unsupported name, which is the security boundary that prevents injection.
compute_hashes() is the core file-hashing function. It opens the file in binary mode and reads it in CHUNK_SIZE = 8192 byte (8 KB) chunks. All hashers are updated on every chunk, so N algorithms cost N × (CPU work per chunk), but the file is only read once regardless of N.
File on disk:
┌──────────┬──────────┬──────────┬──────────┐
│ chunk 0 │ chunk 1 │ chunk 2 │ chunk 3 │ ...
└──────────┴──────────┴──────────┴──────────┘
│ │
▼ ▼
sha256.update sha256.update ...
sha512.update sha512.update ...
blake2b.update blake2b.update ...
After the loop ends, hexdigest() is called on each hasher. The result dict preserves the order of the input algorithms list.
Optional progress and task parameters allow the caller (cli.py) to inject a Rich Progress context for live progress bars on large files; the engine calls progress.update(task, advance=len(chunk)) on every iteration if they are provided.
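The single-pass loop described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name compute_hashes_sketch is hypothetical, the progress hook is omitted, and hashlib.new() is used for every algorithm for brevity.

```python
import hashlib
import os
import tempfile
from typing import Dict, List

CHUNK_SIZE = 8192  # 8 KB, the chunk size stated above


def compute_hashes_sketch(path: str, algorithms: List[str]) -> Dict[str, str]:
    """Hash one file with several algorithms while reading it only once."""
    # One hasher per algorithm, created before the file is opened.
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    # Dicts keep insertion order, so results follow the input list.
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demo on a temporary 20,000-byte file, which spans three chunks.
payload = b"y" * 20_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
try:
    digests = compute_hashes_sketch(tmp.name, ["sha256", "sha512"])
    # The streamed digest equals the digest of the whole payload.
    assert digests["sha256"] == hashlib.sha256(payload).hexdigest()
    print(list(digests))
finally:
    os.remove(tmp.name)
```

Memory use stays bounded by the chunk size no matter how large the file is, which is the point of the streaming design.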
chain_hash_file() implements the chain hashing algorithm:
- Open and stream the file with the first algorithm only (8 KB chunks). Produce hex string H0.
- For each subsequent algorithm Ai, create a fresh hasher and call hasher.update(H_{i-1}.encode("utf-8")). Produce Hi.
- Return all intermediate and final hashes as an ordered dict.
The key design decision is that each chained step hashes the UTF-8 encoding of the previous hex string, not the raw bytes of the previous digest. This means the chain input is always a printable ASCII string of hexadecimal digits.
File bytes → [SHA256] → "a3f9..." (64 hex chars)
│
.encode("utf-8")
│
▼
[SHA512] → "8b2e..." (128 hex chars) ← Final Chained Hash
chain_hash_text() is identical to chain_hash_file() except that the initial input is text.encode("utf-8") rather than file bytes. No streaming or progress bar is needed since text is always small.
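The chaining rule (each step hashes the UTF-8 bytes of the previous hex string) can be sketched for text input as follows. The function name chain_hash_text_sketch is hypothetical, and the sketch assumes the algorithm list has already been validated and deduplicated.

```python
import hashlib
from typing import Dict, List


def chain_hash_text_sketch(text: str, algorithms: List[str]) -> Dict[str, str]:
    """Hash text with the first algorithm, then re-hash each hex digest in turn."""
    if len(algorithms) < 2:
        raise ValueError("a chain needs at least two algorithms")
    results: Dict[str, str] = {}
    data = text.encode("utf-8")
    for name in algorithms:
        digest = hashlib.new(name, data).hexdigest()
        results[name] = digest
        # The next step hashes the hex string itself, not the raw digest bytes.
        data = digest.encode("utf-8")
    return results


steps = chain_hash_text_sketch("my passphrase", ["sha256", "sha512"])
first = hashlib.sha256(b"my passphrase").hexdigest()
# The second link is the SHA-512 of the first link's 64-character hex string.
assert steps["sha512"] == hashlib.sha512(first.encode("utf-8")).hexdigest()
print(len(steps["sha256"]), len(steps["sha512"]))  # 64 128
```

Because every link after the first consumes printable hex, the result differs from a chain built on raw digest bytes, so the two conventions are not interchangeable.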
hash_text() hashes the same UTF-8 bytes independently with each algorithm. All algorithms see the same input; this is not chained. It returns a dict mapping algorithm → hexdigest.
hash_files_parallel() uses ThreadPoolExecutor to hash multiple files concurrently. Each file is submitted as an independent task. as_completed() yields results in completion order (not submission order); this is safe because results are stored by path string key, not by order.
Worker count: min(max_workers, len(file_paths)) — capped to avoid creating more threads than files.
Error handling: if compute_hashes() raises for a specific file, the exception is caught inside the worker and stored as {"error": str(exc)} in the result dict. This prevents one failing file from aborting the entire batch.
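A minimal sketch of this pattern, assuming SHA-256 only and hypothetical names (_hash_one, hash_files_parallel_sketch). It shows the capped worker count, results keyed by path, and per-file error capture; the real function may differ in detail.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List


def _hash_one(path: str) -> Dict[str, str]:
    """Hash one file; report a failure as data instead of raising."""
    try:
        hasher = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(8192), b""):
                hasher.update(chunk)
        return {"sha256": hasher.hexdigest()}
    except OSError as exc:
        return {"error": str(exc)}


def hash_files_parallel_sketch(paths: List[str], max_workers: int = 8) -> Dict[str, Dict[str, str]]:
    """Hash files concurrently and key each result by its path."""
    if not paths:
        return {}  # ThreadPoolExecutor rejects max_workers=0
    results: Dict[str, Dict[str, str]] = {}
    workers = min(max_workers, len(paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(_hash_one, p): p for p in paths}
        # Completion order is arbitrary, so results are stored by path.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results


# Demo: one real temporary file plus one path that does not exist.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"yhash")
missing = tmp.name + ".missing"
try:
    results = hash_files_parallel_sketch([tmp.name, missing])
    print(sorted(results[missing]))   # ['error']
    print(sorted(results[tmp.name]))  # ['sha256']
finally:
    os.remove(tmp.name)
```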
formatter.py owns all terminal output. cli.py imports display functions from it and never calls print() or console.print() directly.
Shared Console instance:
console = Console(soft_wrap=True)
This singleton is shared by formatter.py and cli.py so that Rich Progress bars (which need access to the console to avoid interleaving with other output) use the same output stream.
Hash display — two-line layout:
Every hash value is printed on its own dedicated line below the algorithm label, using _print_hash():
def _print_hash(digest: str, style: str = CLR_HASH) -> None:
    console.print(Text(_HASH_INDENT + digest, style=style, no_wrap=True))
no_wrap=True on the Text object combined with soft_wrap=True on the Console means Rich will not split, fold, or truncate the hash regardless of terminal width. The terminal itself handles visual wrapping if needed, but no character is ever dropped.
Verification layout:
The display_verification() function prints file metadata in a Table(box=None) (invisible borders for alignment), then prints the expected and computed hashes each on their own _print_hash() line. The verdict (MATCH or MISMATCH) appears in a coloured Panel.
JSON output:
display_json_results() creates a fresh Console(highlight=False, markup=False) for each call to avoid any Rich markup processing interfering with JSON content:
def display_json_results(data: Any) -> None:
    _json_console = Console(soft_wrap=True, highlight=False, markup=False)
    _json_console.print(json.dumps(data, indent=2, ensure_ascii=False))
utils.py provides helpers with no knowledge of the CLI or display.
- format_file_size(size_bytes): converts bytes to a human-readable string using iterative division by 1024.
- collect_files_recursive(path): uses Path.rglob("*") to find all files under a directory, sorted alphabetically. If given a file path, returns that single file.
- collect_files_flat(path): uses Path.iterdir() for a non-recursive listing.
- validate_algorithms(algos): validates and deduplicates a list of algorithm names. Deduplication preserves insertion order using a seen set alongside the result list.
- parse_chain_algorithms(chain_str): splits on comma, strips whitespace from each part, calls validate_algorithms, and enforces the minimum length of 2.
- create_manifest_file(file_hashes, algorithms, output_path): serialises a manifest dict to UTF-8 JSON and writes it with a single write_text call. The manifest always ends with a newline.
- load_manifest_file(manifest_path): reads and parses a manifest file, raising FileNotFoundError or json.JSONDecodeError on failure.
- copy_to_clipboard(text): wraps pyperclip.copy() in a try/except and returns bool.
- get_final_hash(results): returns the last value in a dict (Python 3.7+ dicts are insertion-ordered).
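The validation helpers above might look roughly like this. It is a sketch under assumptions: the function names carry a _sketch suffix to mark them as illustrative, the SUPPORTED tuple stands in for the project's own whitelist, and lower-casing is assumed to happen inside validation.

```python
from typing import Iterable, List

# Stand-in for the project's whitelist of six algorithm names.
SUPPORTED = ("sha256", "sha512", "sha384", "sha1", "md5", "blake2b")


def validate_algorithms_sketch(algos: Iterable[str]) -> List[str]:
    """Validate names against the whitelist and drop duplicates, keeping order."""
    seen = set()
    result: List[str] = []
    for raw in algos:
        name = raw.strip().lower()
        if name not in SUPPORTED:
            raise ValueError(f"Unsupported algorithm: {raw!r}")
        if name not in seen:  # the set gives O(1) membership checks
            seen.add(name)
            result.append(name)  # the list preserves first-seen order
    return result


def parse_chain_algorithms_sketch(chain_str: str) -> List[str]:
    """Split a comma-separated chain string and require at least two algorithms."""
    names = validate_algorithms_sketch(chain_str.split(","))
    if len(names) < 2:
        raise ValueError("A chain needs at least two algorithms")
    return names


print(parse_chain_algorithms_sketch(" MD5 , sha256,md5 "))  # ['md5', 'sha256']
```

Validating before any file is opened means a bad name fails fast, with no partial output.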
| Flag | Algorithm | Digest Length | Security Status | Notes |
|---|---|---|---|---|
| -sha256 | SHA-256 | 64 hex chars | Recommended | Default when no flag is given |
| -sha512 | SHA-512 | 128 hex chars | Recommended | Higher security, larger output |
| -sha384 | SHA-384 | 96 hex chars | Recommended | SHA-2 family, truncated SHA-512 |
| -sha1 | SHA-1 | 40 hex chars | Deprecated | Collision attacks exist; legacy only |
| -md5 | MD5 | 32 hex chars | Weak | Checksums only; not cryptographically safe |
| -blake2b | BLAKE2b | 128 hex chars | Recommended | Fastest secure algorithm; no length-extension vulnerability |
All flags are case-insensitive. -SHA256, -Sha256, and -sha256 are all valid. This is achieved by normalising sys.argv before Click parses it.
The default algorithm is sha256. It is applied whenever no algorithm flag is specified.
Hash a single file with the default SHA-256 algorithm.
Syntax:
yhash <file>
Example:
yhash document.pdf
Output:
────────────────────────────────────────────
File document.pdf (1.2 MB)
→ SHA-256
a665a45920422f9d417e4867efdc4fb8...
How it works:
- cli_entry() normalises sys.argv and calls main().
- Click parses document.pdf into the files tuple.
- main() calls _build_algorithms(); no flags were set, so it returns ["sha256"].
- _hash_file() checks the file size. If ≤ 5 MB, it calls compute_hashes() directly. If > 5 MB, it wraps the call in a Rich Progress context.
- compute_hashes() opens the file, reads 8 KB chunks, and feeds each chunk into the SHA-256 hasher.
- display_hash_results() prints the source name, file size, algorithm label, and digest.
Choose a specific hashing algorithm.
Syntax:
yhash -<algorithm> <file>
Examples:
yhash -sha512 archive.tar.gz
yhash -blake2b firmware.bin
yhash -md5 installer.exe
yhash -sha1 legacy_backup.zip
yhash -sha384 contract.pdf
How it works:
Each algorithm flag is a Click boolean option (is_flag=True). _build_algorithms() inspects all six boolean parameters and appends matching algorithm names to the selected list in declaration order. If none are set, it returns [DEFAULT_ALGORITHM] ("sha256").
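The selection logic can be sketched without Click as a plain function over six booleans. The name build_algorithms_sketch and the exact declaration order are assumptions; the text above only states that names are appended in declaration order and that sha256 is the fallback.

```python
from typing import List

DEFAULT_ALGORITHM = "sha256"


def build_algorithms_sketch(
    sha256: bool = False,
    sha512: bool = False,
    sha384: bool = False,
    sha1: bool = False,
    md5: bool = False,
    blake2b: bool = False,
) -> List[str]:
    """Return the selected algorithm names in declaration order."""
    # Assumed declaration order; it mirrors the flag order used in this document.
    flags = [
        ("sha256", sha256),
        ("sha512", sha512),
        ("sha384", sha384),
        ("sha1", sha1),
        ("md5", md5),
        ("blake2b", blake2b),
    ]
    selected = [name for name, enabled in flags if enabled]
    # Fall back to the default when no flag was given.
    return selected or [DEFAULT_ALGORITHM]


print(build_algorithms_sketch())                          # ['sha256']
print(build_algorithms_sketch(blake2b=True, sha512=True)) # ['sha512', 'blake2b']
```

Note that the result follows declaration order, not the order in which flags were typed on the command line.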
Case insensitivity:
-SHA512, -Sha512, and -sha512 are all accepted. The _normalise_argv() function checks each argv token against NORMALISABLE_FLAGS (a set of lowercase flag strings) and lowercases any match before Click processes them.
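A minimal sketch of that normalisation step, using the flag set listed earlier in this document. The function name normalise_argv_sketch is hypothetical.

```python
from typing import List

# Flag set as listed in this document.
NORMALISABLE_FLAGS = {
    "-sha256", "-sha512", "-sha384", "-sha1", "-md5", "-blake2b",
    "-chain", "-copy", "-create", "-json", "-about", "-algo", "-text",
}


def normalise_argv_sketch(args: List[str]) -> List[str]:
    """Lower-case tokens that match a known flag; leave everything else untouched."""
    return [tok.lower() if tok.lower() in NORMALISABLE_FLAGS else tok for tok in args]


print(normalise_argv_sketch(["-SHA256", "-Text", "Hello World", "Report.PDF"]))
# ['-sha256', '-text', 'Hello World', 'Report.PDF']
```

File names and text values keep their original case because their lowercase forms are not in the set.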
Compute several hashes in a single file read.
Syntax:
yhash -<algo1> -<algo2> [-<algo3> ...] <file>
Example:
yhash -sha256 -sha512 -blake2b document.pdf
Output:
────────────────────────────────────────────
File document.pdf (1.2 MB)
→ SHA-256
a665a45920422f9d417e4867efdc4fb8...
→ SHA-512
cf83e1357eefb8bdf1542850d66d8007...
→ BLAKE2b
786a02f742015903c6c6fd852552d272...
How it works:
compute_hashes(file_path, ["sha256", "sha512", "blake2b"]) creates three hashlib objects before opening the file. On every 8 KB chunk, all three objects are updated in a single loop. The file is read exactly once. Adding more algorithms increases CPU work linearly but does not increase I/O.
This is more efficient than running yhash three times, which would read the file three times.
Apply algorithms sequentially: the output of each step becomes the input of the next.
Syntax:
yhash -chain <algo1,algo2,...> <file>
At least two algorithms must be specified. The chain string is case-insensitive and whitespace around commas is ignored.
Example:
yhash -chain md5,sha256,sha512 video.mp4
Output:
────────────────────────────────────────────
Chain video.mp4
→ MD5
d41d8cd98f00b204e9800998ecf8427e
→ SHA-256
e3b0c44298fc1c149afbf4c8996fb924...
→ SHA-512 <- Final Chained Hash
cf83e1357eefb8bdf1542850d66d8007...
How chaining works — step by step:
Step 1: Read raw bytes of video.mp4 through MD5 hasher
→ MD5 hex digest: "d41d8cd98f00b204..." (32 hex chars)
Step 2: Encode that hex string as UTF-8 bytes: b"d41d8cd98f00b204..."
Feed those bytes into a fresh SHA-256 hasher
→ SHA-256 hex digest: "e3b0c44298fc1c..." (64 hex chars)
Step 3: Encode that hex string as UTF-8 bytes: b"e3b0c44298fc1c..."
Feed those bytes into a fresh SHA-512 hasher
→ SHA-512 hex digest: "cf83e1357eefb8..." (128 hex chars) <- Final
Each subsequent step hashes the UTF-8 encoding of the previous hexadecimal string, not the raw bytes of the previous digest. This is important: the input to step 2 is the ASCII/UTF-8 representation of the hex string, e.g., "a3f9..." encoded as bytes, not the binary digest.
Chain with text:
yhash -chain sha256,blake2b -text "my passphrase"
Constraint: -chain cannot be combined with individual algorithm flags (-sha256, etc.). All algorithms in the chain must be specified in the chain string.
How it is implemented:
cli.py calls parse_chain_algorithms(chain) which splits the comma-separated string, normalises case, deduplicates, and validates all names against the whitelist. Then chain_hash_file() or chain_hash_text() in hasher.py performs the sequential hashing.
Hash an arbitrary string directly, without creating a file.
Syntax:
yhash -text "<string>"
Examples:
yhash -text "hello world"
yhash -sha512 -text "my secret key"
yhash -sha256 -sha512 -text "password123"
yhash -chain md5,sha256 -text "chain this string"
Output:
────────────────────────────────────────────
Text "hello world"
→ SHA-256
b94d27b9934d3e08a52e52d7da7dabfa...
How it works:
The string is encoded to bytes using text.encode("utf-8") and passed to hash_text() or chain_hash_text() in hasher.py. No temporary file is created. Any Unicode string is accepted — including spaces, special characters, emoji, null bytes, and control characters — because the encoding step handles all of them.
Constraints:
- -text and file arguments cannot be combined in the same command.
- -text "" (empty string) is valid and produces the well-known hash of an empty byte sequence.
- -text is compatible with -chain, -copy, -json, and all algorithm flags.
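The encoding step and the empty-string case can be illustrated with a short sketch. The function name hash_text_sketch is hypothetical; the real hash_text() may differ in detail.

```python
import hashlib
from typing import Dict, List


def hash_text_sketch(text: str, algorithms: List[str]) -> Dict[str, str]:
    """Hash the same UTF-8 bytes independently with each algorithm."""
    data = text.encode("utf-8")
    return {name: hashlib.new(name, data).hexdigest() for name in algorithms}


# The empty string encodes to zero bytes, giving the well-known empty-input digest.
empty = hash_text_sketch("", ["sha256"])
print(empty["sha256"])
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# Each algorithm sees the same input; digest lengths differ by algorithm.
both = hash_text_sketch("hello world", ["sha256", "sha512"])
print(len(both["sha256"]), len(both["sha512"]))  # 64 128
```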
Hash several files in a single command, displayed as a table.
Syntax:
yhash [algo flags] <file1> <file2> [file3 ...]
Example:
yhash -sha256 -sha512 report.pdf data.csv archive.zip
Output:
┌──────────────┬─────────────────────────────┬─────────────────────────────┐
│ File │ SHA-256 │ SHA-512 │
├──────────────┼─────────────────────────────┼─────────────────────────────┤
│ report.pdf │ a665a459... │ cf83e135... │
│ data.csv │ 7f8e9a0b... │ 4fddee6d... │
│ archive.zip │ 3b00361b... │ 5d9c46ef... │
└──────────────┴─────────────────────────────┴─────────────────────────────┘
How it works:
When len(valid_fps) > 1, cli.py iterates through each file and calls _hash_file() sequentially (not in parallel at this level — parallel processing is reserved for the -r recursive mode, which may have hundreds of files). Results are collected into results_map and displayed via display_multi_file_results().
The table uses Rich's overflow="fold" on each hash column so that long hashes (SHA-512, BLAKE2b — 128 chars) wrap inside the cell rather than being clipped.
If a file fails (e.g. permission denied), its row shows the error message in red; processing continues for the remaining files.
Hash all files under a directory and its subdirectories.
Syntax:
yhash -r <directory>
Examples:
yhash -r ./project
yhash -sha512 -r ./backups
yhash -sha256 -sha512 -r ./documents
Output:
Found 47 file(s) — processing...
┌────────────────────┬──────────────────────────────────────┐
│ File │ SHA-256 │
├────────────────────┼──────────────────────────────────────┤
│ main.py │ a665a45920422f9d... │
│ utils.py │ cf83e1357eefb8bd... │
│ README.md │ 7f8e9a0b1c2d3e4f... │
│ ... │ ... │
└────────────────────┴──────────────────────────────────────┘
How it works:
- collect_files_recursive(path) walks the directory tree with Path.rglob("*"), returning a sorted list of all files.
- A ThreadPoolExecutor (capped at min(8, file_count) workers) submits each file as a separate _hash_one_rec() task.
- A Rich Progress bar tracks completion as as_completed() yields finished futures.
- Results are displayed in a table.
Multiple directories and files can be combined:
yhash -r ./src ./tests README.md
Combined with -create:
When -r and -create are combined, instead of a table, a .yhash manifest file is written. See Section 7.9.
Compare a file's computed hash against a known expected value.
Syntax:
yhash -check <expected_hash> <file>
Examples:
yhash -check a665a45920422f9d417e4867efdc4fb8a04a1f3fff1fa07e998e86f7f7a27ae3 document.pdf
# Verify with SHA-512 instead of the default SHA-256
yhash -sha512 -check cf83e1357eefb8bdf1542850d66d800... document.pdf
Output on match:
────────────────────────────────────────────
File document.pdf
Algorithm SHA-256
Expected
a665a45920422f9d417e4867efdc4fb8...
Computed
a665a45920422f9d417e4867efdc4fb8...
╭──────────────────────────────────────────╮
│ MATCH — The file is intact... │
╰──────────────────────────────────────────╯
Output on mismatch:
╭──────────────────────────────────────────╮
│ MISMATCH — Hash does not match. │
│ File may be corrupted or tampered with. │
╰──────────────────────────────────────────╯
How it works:
- The file is hashed using _hash_file() with only the first algorithm in the algorithms list (defaults to SHA-256 unless an algorithm flag is set).
- The computed digest is compared to the expected value using case-insensitive string comparison: expected_hash.lower() == computed_hash.lower().
- Both hashes are displayed in full regardless of terminal width (using _print_hash()).
- The verdict is displayed in a green (MATCH) or red (MISMATCH) Panel.
The comparison is always case-insensitive. Uppercase and lowercase hex strings are treated identically.
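The comparison step can be sketched on in-memory bytes. The function name verify_bytes_sketch is hypothetical; per the description above, the real flow hashes a file through _hash_file() before comparing.

```python
import hashlib


def verify_bytes_sketch(data: bytes, expected_hash: str, algorithm: str = "sha256") -> bool:
    """Return True when the computed hex digest equals the expected one, ignoring case."""
    computed_hash = hashlib.new(algorithm, data).hexdigest()
    return expected_hash.lower() == computed_hash.lower()


empty_sha256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(verify_bytes_sketch(b"", empty_sha256))          # True
print(verify_bytes_sketch(b"", empty_sha256.upper()))  # True, case is ignored
print(verify_bytes_sketch(b"x", empty_sha256))         # False
```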
Generate a .yhash manifest file containing hash mappings for one or more files.
Syntax:
yhash -create [algo flags] <file1> [file2 ...]
yhash -create [algo flags] -r <directory>
Examples:
# Manifest for specific files
yhash -create report.pdf data.csv
# Recursive manifest with SHA-512
yhash -sha512 -create -r ./project
# Manifest with multiple algorithms
yhash -sha256 -sha512 -create -r ./dist
Manifest file format (manifest.yhash):
{
"yhash_version": "1.0.0",
"created_at": "2025-01-15T10:30:00.123456+00:00",
"algorithms": ["sha256"],
"entries": {
"/absolute/path/to/report.pdf": {
"sha256": "a665a45920422f9d417e4867efdc4fb8..."
},
"/absolute/path/to/data.csv": {
"sha256": "cf83e1357eefb8bdf1542850d66d8007..."
}
}
}
How it works:
- Files are collected (either the provided list or recursively via -r).
- Each file is hashed with the selected algorithms using compute_hashes().
- create_manifest_file() builds a dict with version, timestamp (UTC ISO 8601), algorithm list, and per-file hash entries.
- The dict is serialised to UTF-8 JSON with json.dumps(..., indent=2, ensure_ascii=False) and written with a single write_text() call.
- The manifest is saved as manifest.yhash in the current working directory.
The timestamp uses datetime.now(tz=timezone.utc) for a timezone-aware UTC value. The manifest is always valid JSON: path keys and hash values containing special characters are safely escaped by Python's json module.
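A sketch of how such a manifest string could be assembled, following the format shown above. The function name build_manifest_sketch and its version parameter are hypothetical, and the sketch returns the text instead of writing manifest.yhash.

```python
import json
from datetime import datetime, timezone
from typing import Dict, List


def build_manifest_sketch(
    file_hashes: Dict[str, Dict[str, str]],
    algorithms: List[str],
    version: str = "1.0.0",
) -> str:
    """Serialise a manifest dict to JSON text that ends with a newline."""
    manifest = {
        "yhash_version": version,
        # Timezone-aware UTC timestamp in ISO 8601 form.
        "created_at": datetime.now(tz=timezone.utc).isoformat(),
        "algorithms": algorithms,
        "entries": file_hashes,
    }
    return json.dumps(manifest, indent=2, ensure_ascii=False) + "\n"


text = build_manifest_sketch({'/tmp/odd "name".txt': {"sha256": "00ff"}}, ["sha256"])
parsed = json.loads(text)  # quotes in the path were escaped, so this round-trips
print(parsed["algorithms"], list(parsed["entries"]))
```

The caller could then pass the returned string to Path.write_text(..., encoding="utf-8").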
Copy the final hash to the system clipboard.
Syntax:
yhash -copy <file>
yhash -copy -sha512 <file>
yhash -copy -text "string"
yhash -copy -chain md5,sha256 <file>
Examples:
# Copy SHA-256 of a file
yhash -copy firmware.bin
# Copy SHA-512
yhash -sha512 -copy firmware.bin
# Copy the final chained hash
yhash -copy -chain md5,sha256,sha512 archive.tar.gz
Output:
→ SHA-256
a665a45920422f9d...
OK Copied to clipboard a665a45920422f9d...
How it works:
After the hash result is computed and displayed, _clipboard() in cli.py calls get_final_hash(results) to get the last entry in the results dict (which is always the final algorithm's output — in chained mode, this is the final chained hash). It then calls copy_to_clipboard(final) from utils.py.
copy_to_clipboard() calls pyperclip.copy(text) inside a try/except. On failure (e.g., no display server, no clipboard tool), it returns False and cli.py shows a warning message with installation instructions rather than crashing.
With multiple algorithms: The hash of the last algorithm in the selected list is copied. Example: -sha256 -sha512 -copy file copies the SHA-512 hash, because _build_algorithms() appends algorithms in declaration order and SHA-512 is declared after SHA-256.
With chaining: The final chained hash (last algorithm in the chain) is always what gets copied.
Print results as machine-readable JSON instead of the default formatted display.
Syntax:
yhash -json [other flags] <file>
Examples:
yhash -json document.pdf
yhash -json -sha512 document.pdf
yhash -json -text "hello world"
yhash -json -chain md5,sha256 archive.zip
yhash -json -sha256 -r ./folder
Single file output:
{
"mode": "file",
"file": "/path/to/document.pdf",
"size_bytes": 1258496,
"algorithms": ["sha256"],
"hashes": {
"sha256": "a665a45920422f9d417e4867efdc4fb8..."
}
}
Text output:
{
"mode": "text",
"input": "hello world",
"algorithms": ["sha256"],
"hashes": {
"sha256": "b94d27b9934d3e08a52e52d7da7dabfa..."
}
}
Chain output:
{
"mode": "chain",
"file": "/path/to/archive.zip",
"size_bytes": 45056,
"chain": {
"md5": "d41d8cd98f00b204e9800998ecf8427e",
"sha256": "e3b0c44298fc1c149afbf4c8996fb924..."
}
}
Recursive output:
{
"mode": "recursive",
"algorithms": ["sha256"],
"file_count": 3,
"results": {
"/path/to/a.txt": {"hashes": {"sha256": "..."}},
"/path/to/b.txt": {"hashes": {"sha256": "..."}},
"/path/to/c.txt": {"error": "Permission denied"}
}
}
How it works:
-json is a boolean flag that, when set, routes all result data through display_json_results() instead of the Rich formatter functions. display_json_results() creates a dedicated Console(highlight=False, markup=False) to print the JSON string without any Rich markup processing, ensuring the output is clean and parseable by tools like jq.
Using with jq:
yhash -json document.pdf | jq '.hashes.sha256'
yhash -json -r ./folder | jq '.results | to_entries[] | {file: .key, hash: .value.hashes.sha256}'
Display a table of all supported algorithms with security ratings.
Syntax:
yhash -algo
# or
yhash --algorithms
Output:
┌──────────┬──────────┬─────────────┬─────────────┬────────────────────────────────────┐
│ Flag │ Algorithm│ Output Size │ Security │ Recommended Use │
├──────────┼──────────┼─────────────┼─────────────┼────────────────────────────────────┤
│ -sha256 │ SHA-256 │ 256-bit │ Recommended │ General-purpose cryptographic... │
│ -sha512 │ SHA-512 │ 512-bit │ Recommended │ High-security / large data │
│ ... │ ... │ ... │ ... │ ... │
└──────────┴──────────┴─────────────┴─────────────┴────────────────────────────────────┘
Security ratings are colour-coded: green for Recommended, yellow for Deprecated or Weak.
How it works:
display_algo_table() in formatter.py reads the ALGO_INFO dict from constants.py and builds a Rich Table. The Security column value is wrapped in Rich markup at render time to apply the colour.
Display tool metadata in a formatted panel.
Syntax:
yhash -about
Output:
╭─────────────────── About Yhash ────────────────────╮
│ Yhash (Yung Hash) │
│ │
│ Version 1.0.0 │
│ Description Modern, fast CLI hashing utility. │
│ License MIT │
│ Python >= 3.9 │
│ Algorithms SHA-256, SHA-512, SHA-384, ... │
│ Install pip install yhash / pipx install ... │
│ Repository https://github.com/yhash/yhash │
╰────────────────────────────────────────────────────╯
Display the full usage reference.
Syntax:
yhash -help
# or
yhash --help
Prints all flags organised into sections: Basic Usage, Algorithm Flags, Chained Hashing, Text Hashing, Files & Directories, Verification, Manifest, Output Options, Install, and Info.
How it works:
Click's context_settings = {"help_option_names": ["-help", "--help"]} registers both -help and --help as equivalent help triggers. When no arguments are provided at all, main() also calls display_help() alongside print_banner().
Print the version number.
Syntax:
yhash --version
Output:
Yhash v1.0.0
Implemented via Click's @click.version_option(VERSION, "--version", prog_name="Yhash", message="%(prog)s v%(version)s").
| Flag | -sha* / -md5 / -blake2b | -chain | -text | -r | -copy | -check | -create | -json |
|---|---|---|---|---|---|---|---|---|
| -sha* / -md5 / -blake2b | ✅ Combine freely | ❌ Mutual exclusion | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| -chain | ❌ Mutual exclusion | — | ✅ | — | ✅ | — | — | ✅ |
| -text | ✅ | ✅ | — | ❌ No files | ✅ | — | — | ✅ |
| -r | ✅ | — | ❌ | — | — | — | ✅ | ✅ |
| -copy | ✅ | ✅ | ✅ | — | — | — | — | — |
| -check | ✅ (first algo used) | — | — | — | — | — | — | — |
| -create | ✅ | — | — | ✅ | — | — | — | — |
| -json | ✅ | ✅ | ✅ | ✅ | — | — | — | — |
Mutual exclusions enforced in main() before any hashing begins:
- -chain + any algorithm flag: error, because all algorithms must be in the chain string.
- -text + file arguments: error, because text and file inputs cannot be mixed.
A progress bar is shown automatically for any single file larger than 5 MB (LARGE_FILE_THRESHOLD = 5 * 1024 * 1024 bytes). Progress bars are also shown during recursive directory hashing regardless of individual file size.
What the progress bar shows:
⠋ Hashing firmware.bin (120.0 MB) ████████████░░░░░░ 68% 42.1 MB/s 0:00:01
- Spinner: animated rotating character (Rich SpinnerColumn)
- Description: filename and size
- Bar: filled proportionally to bytes read (Rich BarColumn, width 38)
- Percentage: TaskProgressColumn
- Speed: TransferSpeedColumn, bytes read per second
- ETA: TimeRemainingColumn
How it works:
_hash_file() in cli.py uses _progress_ctx() to create a Progress context manager. The task total is set to the file size in bytes. Inside compute_hashes(), after each 8 KB chunk is processed, progress.update(task, advance=len(chunk)) advances the bar. The progress bar is transient=True — it disappears after completion, leaving only the hash result.
The progress bar and hash output share the same console instance (the singleton from formatter.py), so Rich can correctly erase the progress bar before printing the result without output corruption.
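The per-chunk advance can be illustrated without Rich by passing a plain callback. In this sketch the hypothetical `on_advance` callback stands in for `progress.update(task, advance=len(chunk))`; the function name `sha256_with_progress` is also an assumption made for illustration.

```python
import hashlib
import os
import tempfile
from typing import Callable

CHUNK_SIZE = 8192  # same chunk size as described in this document


def sha256_with_progress(path: str, on_advance: Callable[[int], None]) -> str:
    """Stream a file through SHA-256, reporting each chunk's size."""
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(CHUNK_SIZE)
            if not chunk:
                break
            hasher.update(chunk)
            on_advance(len(chunk))  # stand-in for progress.update(task, advance=...)
    return hasher.hexdigest()


# Demo: hash a 20,000-byte temporary file and record every advance.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 20000)

advances = []
digest = sha256_with_progress(tmp.name, advances.append)
os.unlink(tmp.name)
print(advances)  # one entry per chunk; the entries sum to the file size
```

Because the advances always sum to the file size, a task whose total is the file size in bytes reaches 100% exactly when the last chunk has been hashed.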
Problem: Standard terminal output in Rich enforces its internal line-width limit. A 128-character SHA-512 or BLAKE2b digest in an 80-column terminal would be truncated with ….
Solution — two mechanisms combined:
- `Console(soft_wrap=True)`: tells Rich not to enforce its own line-length limit when printing plain text. Rich emits the full string; the terminal handles visual wrapping.
- Two-line layout: each hash is printed on its own dedicated line below the algorithm label. This gives the hash the full terminal width, not a portion of it shared with a label.
```python
def _print_hash(digest: str, style: str = CLR_HASH) -> None:
    console.print(Text(_HASH_INDENT + digest, style=style, no_wrap=True))
```
`no_wrap=True` on the `Text` object reinforces the instruction: this object should not be wrapped by Rich at any level.
Effect in practice:
- On a 200-column terminal: entire 128-char hash on one visual line.
- On an 80-column terminal: hash wraps visually at column 80, continues on the next line. All characters are present.
- The hash value is never truncated, ellipsised, or partially hidden regardless of terminal width.
For multi-file tables: hash columns use overflow="fold" (not overflow="ellipsis") so the content wraps within the cell visually but the full hash is always present in the output.
The first line of defence is _make_hasher() in hasher.py:
```python
def _make_hasher(algorithm: str) -> Any:
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"Unsupported algorithm: {algorithm!r}. Supported: ...")
    return hashlib.blake2b() if algorithm == "blake2b" else hashlib.new(algorithm)
```
`SUPPORTED_ALGORITHMS` is a fixed dict with exactly six keys. Any string not in that set raises `ValueError` before any hashlib call is made. This prevents all forms of algorithm-name injection:
| Injection attempt | Result |
|---|---|
| `sha256; rm -rf /` | `ValueError: Unsupported algorithm` |
| `sha256 && cat /etc/passwd` | `ValueError` |
| `../../../etc/shadow` | `ValueError` |
| `$(whoami)` | `ValueError` |
| `sha256\x00sha512` | `ValueError` |
Validation happens at two independent layers: validate_algorithm() (single name) and validate_algorithms() (list). Both are called before any file is opened.
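A minimal sketch of the two layers follows. The names `check_algorithm` and `check_algorithms` are hypothetical stand-ins for validate_algorithm() and validate_algorithms(), the `SUPPORTED` set is an illustrative subset (the real dict has six keys), and the strip/lowercase normalisation is assumed from the behaviours listed in the test suite.

```python
from typing import List

# Illustrative subset only; the real SUPPORTED_ALGORITHMS dict has six keys.
SUPPORTED = {"md5", "sha256", "sha512", "blake2b"}


def check_algorithm(name: str) -> str:
    """Normalise one name and reject anything outside the whitelist."""
    normalised = name.strip().lower()
    if normalised not in SUPPORTED:
        raise ValueError(f"Unsupported algorithm: {name!r}")
    return normalised


def check_algorithms(names: List[str]) -> List[str]:
    """Validate a list, dropping duplicates while preserving order."""
    seen: List[str] = []
    for name in names:
        algo = check_algorithm(name)
        if algo not in seen:
            seen.append(algo)
    return seen


rejected = []
for attempt in ["sha256; rm -rf /", "$(whoami)", "sha256\x00sha512"]:
    try:
        check_algorithm(attempt)
    except ValueError:
        rejected.append(attempt)

print(len(rejected))  # every injection attempt is rejected
```

The check is an exact membership test, so a string that merely contains a valid name is still rejected.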
Yhash never calls subprocess, os.system(), eval(), or exec(). hashlib.new(name) is a Python API call, not a shell command. pyperclip.copy() may invoke a system clipboard tool internally, but the hash value passed to it is a hex string (characters [0-9a-f] only), which cannot be misinterpreted as shell syntax.
compute_hashes() and chain_hash_file() open files with open(path, "rb") — binary read mode. No write, no execute, no interpretation. Hashing a shell script does not run it.
Files are never fully loaded into memory. The maximum memory overhead of the hashing engine is one CHUNK_SIZE (8192 byte) buffer plus the hasher state objects. A 100 GB file and a 1 KB file use the same peak memory.
Manifest files are written with json.dumps() which safely escapes all special characters in keys and values. The output is always valid JSON — never executable code. Paths containing quotes, backslashes, or newlines are correctly serialised.
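The escaping behaviour can be checked with a short round trip. The manifest layout here (a single `entries` key) is a hypothetical simplification; only the use of `json.dumps()` comes from the description above.

```python
import json

# Paths chosen to contain characters that need escaping in JSON.
entries = {
    'dir/with "quotes".txt': "ab" * 32,
    "back\\slash.bin": "cd" * 32,
    "new\nline.dat": "ef" * 32,
}

serialised = json.dumps({"entries": entries}, indent=2)
restored = json.loads(serialised)

print(restored["entries"] == entries)  # the round trip preserves every key
```

Quotes, backslashes, and newlines appear in the serialised text only as escape sequences, so they cannot terminate a string early or alter the document structure.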
The value passed to pyperclip.copy() is always the output of hexdigest() — a string of lowercase hexadecimal characters ([0-9a-f]+). This cannot be misinterpreted as shell commands even if pasted into a terminal.
In hash_files_parallel() and the recursive directory flow, each file's hashing is wrapped in an individual try/except. A permission error, broken symlink, or read failure on one file produces {"error": str(exc)} in the result dict for that file and does not abort processing of other files.
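The per-file isolation can be sketched with `ThreadPoolExecutor`. The helper names (`sha256_file`, `hash_one`, `hash_many`) are hypothetical, and this sketch catches only `OSError`; the exact exception types caught by hash_files_parallel() are not stated here.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def sha256_file(path: str) -> str:
    """Stream one file through SHA-256 in 8 KB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def hash_one(path: str) -> Dict[str, Any]:
    """Hash one file; a failure becomes an error entry instead of an exception."""
    try:
        return {"sha256": sha256_file(path)}
    except OSError as exc:  # missing file, permission error, broken symlink
        return {"error": str(exc)}


def hash_many(paths: List[str], max_workers: int = 4) -> Dict[str, Dict[str, Any]]:
    """Hash all paths concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(hash_one, paths)))


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    with open(good, "wb") as fh:
        fh.write(b"abc")
    missing = os.path.join(tmp, "missing.txt")
    results = hash_many([good, missing])

print(sorted(results[good]), sorted(results[missing]))
```

Catching the exception inside the worker function keeps one bad file from cancelling the rest of the batch.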
The core design principle is that Yhash's memory usage is O(1) with respect to file size — it does not scale with how large the file is.
Implementation:
```python
with open(file_path, "rb") as fh:
    while True:
        chunk = fh.read(CHUNK_SIZE)  # CHUNK_SIZE = 8192 bytes
        if not chunk:
            break
        for hasher in hashers.values():
            hasher.update(chunk)  # hash state updated, chunk discarded
```
After each `update()` call, Python's garbage collector can reclaim the chunk buffer. The hashers maintain only their internal state (a fixed-size structure for each algorithm, typically 32–200 bytes), not a copy of the data.
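A self-contained version of the loop, as a sketch: `hash_in_one_pass` is a hypothetical name, and it builds every hasher with `hashlib.new()` rather than going through `_make_hasher()`.

```python
import hashlib
import os
import tempfile
from typing import Dict, List

CHUNK_SIZE = 8192


def hash_in_one_pass(path: str, algorithms: List[str]) -> Dict[str, str]:
    """Read the file once and feed each chunk to every hasher."""
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(CHUNK_SIZE)
            if not chunk:
                break
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Demo: compare the streamed digests with hashing the whole buffer at once.
data = b"yhash" * 5000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)

streamed = hash_in_one_pass(tmp.name, ["sha256", "sha512", "blake2b"])
os.unlink(tmp.name)
print(streamed["sha256"] == hashlib.sha256(data).hexdigest())
```

Only one chunk is held at a time, so adding algorithms adds hasher state but no extra file reads.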
Verified: In the test suite, test_30mb_file_memory_safe (in test_security.py) measures RSS memory before and after hashing a 30 MB file with three algorithms and asserts the increase is under 8 MB — well within normal Python interpreter overhead.
All errors are displayed via display_error() in formatter.py which prints a styled ERROR prefix followed by the message. The process then exits with code 1 via sys.exit(1).
Error conditions and their handling:
| Condition | Detection point | Behaviour |
|---|---|---|
| File not found | `cli.py` before hashing | Error message, `sys.exit(1)` for single file; skipped with error for batches |
| Directory given without `-r` | `cli.py` | Error message, `sys.exit(1)` |
| Permission denied | `compute_hashes()` `open()` | `PermissionError` caught in `cli.py`, error message |
| Unsupported algorithm | `_make_hasher()` | `ValueError` with list of supported algorithms |
| `-chain` with < 2 algorithms | `parse_chain_algorithms()` | `ValueError` with example |
| `-chain` + algorithm flags | `main()` | Error message, `sys.exit(1)` |
| `-text` + file arguments | `main()` | Error message, `sys.exit(1)` |
| `-r` without path | `main()` | Error message, `sys.exit(1)` |
| `-check` without file | `main()` | Error message, `sys.exit(1)` |
| Clipboard unavailable | `copy_to_clipboard()` | Returns `False`; warning shown; execution continues |
| Empty directory with `-r` | `collect_files_recursive()` | Warning message; no hash attempted |
| Malformed manifest | `load_manifest_file()` | `json.JSONDecodeError` propagates to caller |
Warnings (non-fatal) use display_warning() which prints a WARN prefix in yellow. Execution continues after a warning.
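The split between fatal errors and non-fatal warnings can be sketched as follows. All names here (`report_error`, `report_warning`, `fail`, `copy_or_warn`) are hypothetical, and plain `print()` stands in for the styled Rich output.

```python
import sys


def report_error(message: str) -> None:
    print(f"ERROR {message}")


def report_warning(message: str) -> None:
    print(f"WARN {message}")


def fail(message: str) -> None:
    """Fatal path: show the error, then exit with status 1."""
    report_error(message)
    sys.exit(1)


def copy_or_warn(copied: bool) -> str:
    """Non-fatal path: a failed clipboard copy only produces a warning."""
    if not copied:
        report_warning("clipboard unavailable")
        return "continued"
    return "copied"


# sys.exit() raises SystemExit, so the exit status can be observed in-process.
try:
    fail("file not found: missing.bin")
except SystemExit as exc:
    exit_code = exc.code

print(exit_code)
```

Because `sys.exit()` raises `SystemExit` rather than killing the interpreter outright, an in-process runner can still observe the exit status.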
The test suite is written with Python's built-in unittest module and requires no additional testing framework. It consists of 199 tests across five modules.
Running the tests:
```bash
# From the project root, after pip install -e .
python3 -m unittest discover -s tests -v
```
Test modules:
Tests every public function in hasher.py in isolation.
| Class | What it tests |
|---|---|
| `TestValidateAlgorithm` | All valid names, case normalisation, whitespace stripping, unsupported names, injection attempts |
| `TestComputeHashes` | Each of the 6 algorithms independently, all 6 together, empty file, binary file, 12 MB streaming correctness, missing file, result key ordering, multi-algo vs individual consistency |
| `TestChainHashFile` | 2-algo chain, 3-algo chain, 6-algo chain, order sensitivity, chain vs independent hash, key ordering, error on < 2 algorithms |
| `TestChainHashText` | Same as above for text input; unicode, null bytes, empty string, single-algo rejection |
| `TestHashText` | Each algorithm, multiple independent algorithms, empty string, unicode, null byte, 1M character string, control characters |
| `TestHashFilesParallel` | All files hashed, hash correctness, multiple algorithms, nonexistent file returns error dict, deterministic across 5 concurrent runs, empty file list |
Tests every function in utils.py in isolation.
| Class | What it tests |
|---|---|
| `TestFormatFileSize` | Zero, bytes boundary, KB, MB, GB, TB, return type |
| `TestCollectFilesRecursive` | Nested structure (4 files), files only, single file input, sorted order, empty directory |
| `TestCollectFilesFlat` | Non-recursive, single file, sorted order |
| `TestValidateAlgorithms` | Valid list, case, deduplication, order preservation, all 6, unsupported, error message |
| `TestParseChainAlgorithms` | 2 and 3 algos, case, whitespace, single-algo rejection, empty string, unsupported, injection, 6 algos, error message |
| `TestCreateManifestFile` | File creation, valid JSON, all required fields, entries preserved, algorithms preserved, version type, default cwd path, unicode keys, special chars in keys, ISO timestamp, trailing newline |
| `TestLoadManifestFile` | Valid load, missing file, malformed JSON, round-trip |
| `TestGetFinalHash` | Single, multiple (returns last), empty dict |
| `TestCopyToClipboard` | Returns bool, no crash on unavailable clipboard, no crash on empty string |
Tests the CLI end-to-end using Click's CliRunner, which invokes main() in-process and captures stdout.
| Class | What it tests |
|---|---|
| `TestInfoCommands` | No args shows usage, `--version`, `-algo`, `--algorithms`, `-about`, `-help` |
| `TestFileHashing` | All 6 algorithm flags, all 6 together, multiple algorithm flags, missing file, directory without `-r`, JSON output |
| `TestTextHashing` | Default SHA-256, with `-sha512`, multiple algorithms, empty string, string with spaces, `-text` + file conflict, JSON output |
| `TestChainHashing` | 2-algo file chain, 3-algo file chain, chain with `-text`, intermediate hashes in output, `-chain` + algo flag conflict, single-algo chain rejection, missing file graceful handling, JSON output |
| `TestVerification` | Correct hash → MATCH, wrong hash → MISMATCH, case-insensitive comparison, SHA-512 check, BLAKE2b check, missing file |
| `TestMultipleFiles` | All hashes in output, JSON output, with SHA-512 |
| `TestRecursive` | Finds all files in nested structure, missing path arg, nonexistent path |
| `TestManifest` | Single file manifest creation, recursive manifest creation |
Verifies that adversarial inputs cannot exploit the tool.
| Class | What it tests |
|---|---|
| `TestAlgorithmInjection` | 12 injection attempts rejected by `validate_algorithm`, `validate_algorithms`, `parse_chain_algorithms`, and the CLI chain flag |
| `TestFileOperationSafety` | Nonexistent path raises, script file not executed during hashing, manifest written only to specified path, manifest contains no executable code, manifest always valid JSON with special chars, malformed manifest raises not crashes, missing manifest raises `FileNotFoundError` |
| `TestInputSanitisation` | Null bytes, null bytes in chain, ANSI escape sequences, 7 unicode edge cases (zero-width space, BOM, emoji, combining chars, repeated nulls), 5 MB string, CLI verify with 10,000-char hash value |
| `TestMemoryBounds` | 30 MB file with 3 algorithms; RSS increase must be < 8 MB |
| `TestCLIAdversarialInputs` | Conflicting chain + algo flags, text + file conflict, `-r` without path, `-check` without file, nonexistent file, directory as file argument |
| `TestConcurrencySafety` | 10 files hashed in parallel, 5 repeated runs; all results correct and deterministic |
Tests complete user workflows without mocking any internal components.
| Class | What it tests |
|---|---|
| `TestHashAndVerifyRoundTrip` | All 6 algorithms: hash then verify → MATCH; tampered file → MISMATCH; default algo is SHA-256 |
| `TestChainHashWorkflow` | 3-step chain with all intermediates correct; chain with BLAKE2b as final; chain order changes output; chain text 3 steps |
| `TestMultiAlgorithmWorkflow` | All 6 simultaneously; JSON output contains correct digests; output contains algorithm labels |
| `TestManifestWorkflow` | Manifest created for all files in a 3-file corpus; all hashes are correct (verified by reading files and comparing); SHA-512 manifest entries correct; algorithms field matches |
| `TestRecursiveDirectoryWorkflow` | 4-file nested structure; all hashes found in output; SHA-512 digests found (handles table folding) |
| `TestTextHashingWorkflow` | SHA-256 of "abc" correct; MD5 of "" correct; chain deterministic across two runs; unicode text uses UTF-8 encoding |
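The known-answer checks named in the last row can be sketched with `unittest`. This illustration calls `hashlib` directly instead of going through the Yhash CLI, and the class name `TestKnownVectors` is hypothetical; the two digests are the standard published values for those inputs.

```python
import hashlib
import unittest


class TestKnownVectors(unittest.TestCase):
    def test_sha256_abc(self):
        self.assertEqual(
            hashlib.sha256(b"abc").hexdigest(),
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
        )

    def test_md5_empty(self):
        self.assertEqual(
            hashlib.md5(b"").hexdigest(),
            "d41d8cd98f00b204e9800998ecf8427e",
        )

    def test_unicode_uses_utf8(self):
        # "é" encodes to the two bytes C3 A9 in UTF-8.
        self.assertEqual(
            hashlib.sha256("héllo".encode("utf-8")).hexdigest(),
            hashlib.sha256(b"h\xc3\xa9llo").hexdigest(),
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestKnownVectors)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Fixed vectors like these catch encoding mistakes, such as hashing a string's repr instead of its UTF-8 bytes, that a hash-then-verify round trip alone would miss.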
Pure data — no imports from other yhash modules, no side effects.
| Name | Type | Value | Purpose |
|---|---|---|---|
| `VERSION` | `str` | `"1.0.0"` | Package version string |
| `CHUNK_SIZE` | `int` | `8192` | Bytes read per chunk during file streaming |
| `LARGE_FILE_THRESHOLD` | `int` | `5242880` | File size above which progress bars are shown |
| `DEFAULT_ALGORITHM` | `str` | `"sha256"` | Algorithm used when no flag is specified |
| `MANIFEST_EXTENSION` | `str` | `".yhash"` | File extension for manifest files |
| `SUPPORTED_ALGORITHMS` | `dict[str, str]` | `{"sha256": "SHA-256", ...}` | Internal name → display name mapping |
| `ALGO_FLAGS` | `set[str]` | `{"-sha256", ...}` | CLI flags for algorithms (lowercase) |
| `NORMALISABLE_FLAGS` | `set[str]` | `ALGO_FLAGS ∪ {...}` | All flags normalised by argv pre-processor |
| `ALGO_INFO` | `dict[str, tuple]` | `{"sha256": ("SHA-256", "256-bit", ...)}` | Algorithm metadata for the `-algo` table |
| Function | Signature | Description |
|---|---|---|
| `validate_algorithm` | `(str) -> str` | Validate and normalise a single algorithm name |
| `compute_hashes` | `(Path, List[str], *, progress, task) -> Dict[str, str]` | Stream a file through all hashers simultaneously |
| `chain_hash_file` | `(Path, List[str], *, progress, task) -> Dict[str, str]` | Chain-hash a file; output of each step feeds the next |
| `chain_hash_text` | `(str, List[str]) -> Dict[str, str]` | Chain-hash a UTF-8 string |
| `hash_text` | `(str, List[str]) -> Dict[str, str]` | Hash a string independently with each algorithm |
| `hash_files_parallel` | `(List[Path], List[str], *, max_workers) -> Dict[str, Any]` | Hash multiple files concurrently via `ThreadPoolExecutor` |
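Chaining, where the output of each step feeds the next, can be sketched for text input as below. This rests on a loud assumption: each step is taken to hash the previous step's hex digest encoded as ASCII, which this reference does not confirm (the raw digest bytes would be the other plausible choice). The name `chain_text` is hypothetical.

```python
import hashlib
from typing import Dict, List


def chain_text(text: str, algorithms: List[str]) -> Dict[str, str]:
    """Hash text with the first algorithm, then hash each result with the next."""
    if len(algorithms) < 2:
        raise ValueError("a chain needs at least two algorithms, e.g. sha256,md5")
    results: Dict[str, str] = {}
    data = text.encode("utf-8")
    for name in algorithms:
        digest = hashlib.new(name, data).hexdigest()
        results[name] = digest
        # ASSUMPTION: the next step consumes the hex digest as ASCII text.
        data = digest.encode("ascii")
    return results


chain = chain_text("abc", ["sha256", "md5"])
final_hash = list(chain.values())[-1]  # last value, as get_final_hash is described
print(list(chain))
```

Keying the result by algorithm name means this sketch cannot represent a chain that repeats an algorithm; a list of pairs would be needed for that case.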
| Function | Signature | Description |
|---|---|---|
| `format_file_size` | `(int) -> str` | Convert bytes to human-readable string |
| `collect_files_recursive` | `(Path) -> List[Path]` | All files under a directory (sorted) |
| `collect_files_flat` | `(Path) -> List[Path]` | Files directly inside a directory (non-recursive) |
| `validate_algorithms` | `(List[str]) -> List[str]` | Validate, normalise, and deduplicate algorithm names |
| `parse_chain_algorithms` | `(str) -> List[str]` | Parse a comma-separated chain spec |
| `create_manifest_file` | `(Dict, List[str], Optional[Path]) -> Path` | Write a `.yhash` JSON manifest file |
| `load_manifest_file` | `(Path) -> Dict[str, Any]` | Read and parse a manifest file |
| `copy_to_clipboard` | `(str) -> bool` | Copy text to system clipboard via pyperclip |
| `get_final_hash` | `(Dict[str, str]) -> str` | Return the last value in a results dict |
| Function | Description |
|---|---|
| `print_banner()` | ASCII art banner with version subtitle |
| `display_error(message)` | Red ERROR prefix + message |
| `display_warning(message)` | Yellow WARN prefix + message |
| `display_success(message)` | Green OK prefix + message |
| `display_hash_results(name, results, *, is_text, file_size)` | Two-line layout for file or text hash results |
| `display_chain_results(name, chain_results, *, is_text)` | Chain layout with `<- Final Chained Hash` marker |
| `display_multi_file_results(all_results, algorithms)` | Rich table for multiple files |
| `display_verification(file_path, expected, computed, algorithm)` | MATCH/MISMATCH panel with full hash display |
| `display_algo_table()` | Algorithm reference table |
| `display_about()` | About panel |
| `display_manifest_created(path, count)` | Success panel after manifest creation |
| `display_clipboard_success(hash_value)` | Clipboard copy confirmation |
| `display_json_results(data)` | Plain JSON output (no Rich markup) |
| `display_help()` | Full usage reference |
| Symbol | Description |
|---|---|
| `cli_entry()` | Shell entry point; normalises argv then calls `main()` |
| `main()` | Click command; dispatches to the appropriate flow |
| `_normalise_argv(argv)` | Lowercases recognisable flags for case-insensitive input |
| `_build_algorithms(...)` | Converts boolean flag values to algorithm name list |
| `_hash_file(path, algorithms)` | Hash one file; conditionally shows progress bar |
| `_chain_file(path, algorithms)` | Chain-hash one file; conditionally shows progress bar |
| `_clipboard(results)` | Copy final hash to clipboard; show warning on failure |
| `_progress_ctx()` | Factory for a configured Rich `Progress` instance |
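The argv normalisation step can be sketched as follows. The flag set below is an assumed subset built from flags mentioned in this document, and `normalise_argv` is a hypothetical stand-in for `_normalise_argv`; the real contents of `NORMALISABLE_FLAGS` may differ.

```python
from typing import List

# Assumed subset of the flags the pre-processor recognises.
NORMALISABLE_FLAGS = {
    "-sha256", "-sha512", "-md5", "-blake2b",
    "-chain", "-text", "-r", "-copy", "-check", "-create", "-json",
}


def normalise_argv(argv: List[str]) -> List[str]:
    """Lowercase only arguments that match a known flag; leave values untouched."""
    return [arg.lower() if arg.lower() in NORMALISABLE_FLAGS else arg for arg in argv]


print(normalise_argv(["-SHA256", "-Text", "Hello World", "README.md"]))
# → ['-sha256', '-text', 'Hello World', 'README.md']
```

Matching against a fixed set, rather than lowercasing everything, keeps file names and text values case-sensitive.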
Yhash v1.0.0 — MIT License