diff --git a/.codex/AGENTS_EXTRA.md b/.codex/AGENTS_EXTRA.md index 5dc66f4f..dd0bd394 100644 --- a/.codex/AGENTS_EXTRA.md +++ b/.codex/AGENTS_EXTRA.md @@ -4,7 +4,7 @@ ## 1.1 Library Identity -FTLLexEngine is the Python runtime for the **Fluent Template Language specification**, with **CLDR-backed locale-aware formatting** and **fail-fast boot validation with structured audit evidence**. Every public symbol must arise from one of these three purposes. The library is not a general utilities collection, not a financial domain toolkit, not a concurrency framework — it is the i18n layer that production systems build directly on top of, and nothing else. +FTLLexEngine is the Python runtime for the **Fluent Template Language specification**, with **CLDR-backed locale-aware formatting** and **fail-fast boot validation with structured integrity evidence**. Every public symbol must arise from one of these three purposes. The library is not a general utilities collection, not a financial domain toolkit, not a concurrency framework — it is the i18n layer that production systems build directly on top of, and nothing else. The primary use case is production systems where every locale resource must load cleanly, every message schema must match exactly, and every failure must produce named, traceable evidence — regulated deployments, audited backends, compliance-constrained services. This purpose drives every API design decision. @@ -14,10 +14,10 @@ The primary use case is production systems where every locale resource must load Before adding any symbol to a public facade, ask: *what downstream composition does this replace?* Every public surface must eliminate a pattern that serious callers would otherwise implement themselves. `require_locale_code()` replaced per-caller trim/blank/length/normalize chains. `LocalizationBootConfig` replaced per-caller boot sequence assembly. `make_fluent_number()` replaced per-caller visible-precision inference. 
Primitives that serve only internal composition belong in submodules, not on `ftllexengine`, `ftllexengine.runtime`, or `ftllexengine.localization`. **Axiom 2 — Fail-Fast at Boot, Structured Evidence at Runtime.** -Validate everything before accepting traffic. The canonical boot chain — `LocalizationBootConfig.boot()`, or `FluentLocalization` + `require_clean()` + `validate_message_schemas()` — raises `IntegrityCheckFailedError` if any resource fails to load cleanly or any schema mismatches. At runtime, errors are returned as immutable structured evidence (`FrozenFluentError`, `WriteLogEntry`, `LoadSummary`) so callers can build auditable, loggable, compliant systems on top. Silent degradation is prohibited; all failures are explicit. +Validate everything before accepting traffic. The canonical boot chain — `LocalizationBootConfig.boot()`, or `FluentLocalization` + `require_clean()` + `validate_message_schemas()` — raises `IntegrityCheckFailedError` if any resource fails to load cleanly or any schema mismatches. At runtime, formatting and parsing errors are returned as immutable structured evidence (`FrozenFluentError`, `LoadSummary`), while cache evidence flows through `CacheDebugLogEntry` and `CacheIntegrityEvent`. Silent degradation is prohibited; all failures are explicit. **Axiom 3 — Explicit Failures, Immutable Evidence.** -Every failure produces a named, typed, immutable error object with structured context. `strict=True` is the default on `FluentBundle` and `FluentLocalization` — exceptions, not silent empty strings, are the correct response to integrity failures. `strict=False` is an explicit opt-in for soft-error return semantics where `format_pattern` returns a `(result, errors)` tuple. Audit structures (`WriteLogEntry`, `IntegrityContext`) carry dual timestamps (`timestamp` for monotonic ordering, `wall_time_unix` for cross-system correlation) because compliance traces must be reproducible across restarts. 
+Every failure produces a named, typed, immutable error object with structured context. `strict=True` is the default on `FluentBundle` and `FluentLocalization` — exceptions, not silent empty strings, are the correct response to integrity failures. `strict=False` is an explicit opt-in for soft-error return semantics where `format_pattern` returns a `(result, errors)` tuple. Cache evidence structures (`CacheDebugLogEntry`, `CacheIntegrityEvent`, `IntegrityContext`) carry dual timestamps (`timestamp_monotonic` for ordering, `wall_time_unix` for cross-system correlation) because compliance traces must be reproducible across restarts. **API design review — apply before any new public surface:** @@ -441,7 +441,7 @@ FluentBundle's high docstring-to-code ratio is expected — it is the primary pu | Bundle management | `FluentBundle` | Creates on demand, holds in `_bundles` dict | | Fallback resolution | Locale chain | Iterates locale list until format succeeds | | Boot validation | `require_clean()`, `validate_message_schemas()` | Provides pre-traffic validation API | -| Audit log | `FluentBundle.get_cache_audit_log()` | Aggregates per-locale logs into dict | +| Cache debug log | `FluentBundle.get_cache_debug_log()` | Aggregates per-locale debug logs into dict | ### 4.5.3 LocalizationBootConfig — strict-mode boot orchestrator diff --git a/.codex/AGENTS_SQLITE3MC233_SQLITE353.md b/.codex/AGENTS_SQLITE3MC234_SQLITE3531.md similarity index 96% rename from .codex/AGENTS_SQLITE3MC233_SQLITE353.md rename to .codex/AGENTS_SQLITE3MC234_SQLITE3531.md index a738f233..5ced1be6 100644 --- a/.codex/AGENTS_SQLITE3MC233_SQLITE353.md +++ b/.codex/AGENTS_SQLITE3MC234_SQLITE3531.md @@ -1,9 +1,9 @@ -# SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0 Agent Protocol +# SQLite3 Multiple Ciphers 2.3.4 / SQLite 3.53.1 Agent Protocol -**Version:** 2.0.0 -**Updated:** 2026-04-27 +**Version:** 2.0.1 +**Updated:** 2026-05-08 **Inherits:** 
[.codex/UNIVERSAL_ENGINEERING_CONTRACT.md](./UNIVERSAL_ENGINEERING_CONTRACT.md) v2.0.0+ -**Scope:** projects that build, vendor, link, wrap, configure, distribute, test, or operate **SQLite3 Multiple Ciphers 2.3.3**, based on **SQLite 3.53.0**. Includes C and C++ integrations, amalgamation builds, static or shared library packaging, embedded applications, CLIs, services, language bindings, JNI/JNA, Python/Rust/Node/.NET/Java/Kotlin wrappers, SQL migrations, encrypted database files, PRAGMA/URI configuration, key and rekey flows, backups, WAL/journal behavior, build flags, and cross-platform distribution. +**Scope:** projects that build, vendor, link, wrap, configure, distribute, test, or operate **SQLite3 Multiple Ciphers 2.3.4**, based on **SQLite 3.53.1**. Includes C and C++ integrations, amalgamation builds, static or shared library packaging, embedded applications, CLIs, services, language bindings, JNI/JNA, Python/Rust/Node/.NET/Java/Kotlin wrappers, SQL migrations, encrypted database files, PRAGMA/URI configuration, key and rekey flows, backups, WAL/journal behavior, build flags, and cross-platform distribution. ## 0. Scope and inheritance @@ -32,7 +32,7 @@ Per the Naurian frame, some theory the agent typically does not bring in cold an - Whether `TEMP` tables, in-memory databases, or bytes 16–23 of the database file are inside or outside the threat model. The encryption boundary is non-obvious and easy to assume away. - Whether old SQLCipher, sqleet, or SQLite Encryption Extension conventions still inform the codebase. SQLite3MC is API-compatible in many places but is not identical, and copy-pasted SQLCipher recipes can silently drift. - Whether SQLite 3.52.0 (withdrawn upstream) is still pinned anywhere as a fallback baseline. -- Whether the secure cipher-state nullification path that distinguishes SQLite3MC 2.3.3 from older releases is still intact. It looks redundant; removing it is a security regression. 
+- Whether the secure cipher-state nullification path that distinguishes SQLite3MC 2.3.4 from older releases is still intact. It looks redundant; removing it is a security regression. Where the answer is not derivable from code, history, or conversation, surface the gap explicitly; do not assume the convenient answer. @@ -135,43 +135,43 @@ Do not: --- -## 3. Baseline posture: SQLite3MC 2.3.3 and SQLite 3.53.0 +## 3. Baseline posture: SQLite3MC 2.3.4 and SQLite 3.53.1 ### 3.1 Version baseline For repositories governed by this protocol, assume: ```text -SQLite3 Multiple Ciphers: 2.3.3 -Underlying SQLite: 3.53.0 +SQLite3 Multiple Ciphers: 2.3.4 +Underlying SQLite: 3.53.1 ``` Use the repository's pinned version when it is more specific. Do not upgrade or downgrade SQLite3MC without a compatibility judgment, migration-risk assessment, and verification plan. -SQLite3MC 2.3.3 includes the upstream SQLite 3.53.0 baseline and fixes secure nullification of cipher data structures on freeing. Treat any edit around cipher state cleanup as security-sensitive. Do not remove zeroization, nullification, or cleanup paths because they look redundant — this is exactly the kind of code where Naur's "amorphous additions" warning bites in reverse. +SQLite3MC 2.3.4 includes the upstream SQLite 3.53.1 baseline and fixes secure nullification of cipher data structures on freeing. Treat any edit around cipher state cleanup as security-sensitive. Do not remove zeroization, nullification, or cleanup paths because they look redundant — this is exactly the kind of code where Naur's "amorphous additions" warning bites in reverse. -SQLite 3.53.0 includes a fix for the WAL-reset database corruption bug. Do not downgrade to a pre-fix SQLite baseline without explicitly accepting the risk and recording the justification (per universal contract §1.5). +SQLite 3.53.1 includes a fix for the WAL-reset database corruption bug. 
Do not downgrade to a pre-fix SQLite baseline without explicitly accepting the risk and recording the justification (per universal contract §1.5). -### 3.2 SQLite 3.53.0 feature posture +### 3.2 SQLite 3.53.1 feature posture -Use SQLite 3.53.0 capabilities only when the deployed runtime is guaranteed to be SQLite3MC 2.3.3 / SQLite 3.53.0 or newer. +Use SQLite 3.53.1 capabilities only when the deployed runtime is guaranteed to be SQLite3MC 2.3.4 / SQLite 3.53.1 or newer. -Notable 3.53.0 behavior for agents: +Notable 3.53.1 behavior for agents: - `ALTER TABLE` can add and remove `NOT NULL` and `CHECK` constraints. Use this only when migration compatibility is acceptable. - `REINDEX EXPRESSIONS` can rebuild expression indexes. Prefer it when repairing stale expression-index state rather than inventing application-level workarounds. -- `json_array_insert()` and `jsonb_array_insert()` are available in the 3.53.0 baseline. +- `json_array_insert()` and `jsonb_array_insert()` are available in the 3.53.1 baseline. - The CLI output defaults changed for interactive sessions through QRF. Tests and scripts must set explicit output modes instead of relying on human-oriented defaults. - Bare semicolons at the end of dot-commands are silently ignored. Treat CLI script compatibility deliberately. -- New C interfaces such as `sqlite3_str_truncate()`, `sqlite3_str_free()`, `sqlite3_carray_bind_v2()`, `SQLITE_PREPARE_FROM_DDL`, `SQLITE_UTF8_ZT`, `SQLITE_LIMIT_PARSER_DEPTH`, and `SQLITE_DBCONFIG_FP_DIGITS` are available only when the runtime really is 3.53.0+. +- New C interfaces such as `sqlite3_str_truncate()`, `sqlite3_str_free()`, `sqlite3_carray_bind_v2()`, `SQLITE_PREPARE_FROM_DDL`, `SQLITE_UTF8_ZT`, `SQLITE_LIMIT_PARSER_DEPTH`, and `SQLITE_DBCONFIG_FP_DIGITS` are available only when the runtime really is 3.53.1+. - Floating-point text conversion behavior changed to round by default to 17 significant digits instead of the previous 15. 
Review golden outputs, text dumps, hash inputs, and deterministic serialization tests. - The self-healing index feature may address stale expression index issues, but it does not replace tests for migration and query correctness. -Do not write code or migrations that silently require 3.53.0 if production, tests, system packages, or bundled artifacts may still load an older SQLite. +Do not write code or migrations that silently require 3.53.1 if production, tests, system packages, or bundled artifacts may still load an older SQLite. ### 3.3 SQLite 3.52 warning -SQLite 3.52.0 was withdrawn upstream. Do not select SQLite3MC 2.3.0 / SQLite 3.52.0 as a fallback baseline. If a repository already contains that version (see §0.1), surface the issue and prefer moving to SQLite3MC 2.3.3 or a project-approved fixed baseline. +SQLite 3.52.0 was withdrawn upstream. Do not select SQLite3MC 2.3.0 / SQLite 3.52.0 as a fallback baseline. If a repository already contains that version (see §0.1), surface the issue and prefer moving to SQLite3MC 2.3.4 or a project-approved fixed baseline. --- @@ -395,7 +395,7 @@ When a database uses WAL or rollback journaling: - avoid deleting sidecar files as a substitute for proper checkpoint/recovery logic; - test multiple connections if the application uses them. -SQLite 3.53.0 includes an upstream fix for a WAL-reset corruption bug, but this does not remove the need for connection, checkpoint, and backup discipline. +SQLite 3.53.1 includes an upstream fix for a WAL-reset corruption bug, but this does not remove the need for connection, checkpoint, and backup discipline. ### 7.3 Backup, restore, VACUUM, and export @@ -443,7 +443,7 @@ Do not convert all SQLite failures into generic booleans or generic exceptions. SQLite SQL compatibility is a runtime contract. -Before using a 3.53.0 SQL feature in migrations or generated SQL, verify that all deployment targets load SQLite3MC 2.3.3 / SQLite 3.53.0 or newer. 
+Before using a 3.53.1 SQL feature in migrations or generated SQL, verify that all deployment targets load SQLite3MC 2.3.4 / SQLite 3.53.1 or newer. Be especially cautious with: @@ -458,7 +458,7 @@ If a repository supports multiple SQLite baselines, write migrations and SQL to ### 8.3 CLI scripts and golden outputs -SQLite 3.53.0 changed human-oriented CLI formatting through QRF. +SQLite 3.53.1 changed human-oriented CLI formatting through QRF. For tests and automation: @@ -603,7 +603,7 @@ Performance-sensitive changes should consider: - synchronous mode; - hardware acceleration and target CPU features; - binding overhead; -- query planner changes in SQLite 3.53.0. +- query planner changes in SQLite 3.53.1. Do not weaken encryption, durability, or compatibility for unmeasured performance claims. diff --git a/.devcontainer/devcontainer-lock.json b/.devcontainer/devcontainer-lock.json new file mode 100644 index 00000000..57df6bf9 --- /dev/null +++ b/.devcontainer/devcontainer-lock.json @@ -0,0 +1,9 @@ +{ + "features": { + "ghcr.io/devcontainers/features/docker-outside-of-docker:1": { + "version": "1.9.1", + "resolved": "ghcr.io/devcontainers/features/docker-outside-of-docker@sha256:dc89605f01ff2f24252c61f7c8ba2a58ccdbc14f2ebf87a7952d9e2b89834850", + "integrity": "sha256:dc89605f01ff2f24252c61f7c8ba2a58ccdbc14f2ebf87a7952d9e2b89834850" + } + } +} diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml index dbe4537b..fdde0d85 100644 --- a/.github/workflows/publish.yml +++ b/.github/workflows/publish.yml @@ -3,7 +3,7 @@ name: Build and Publish on: push: tags: - - 'v*' + - "v*" workflow_dispatch: inputs: release_tag: @@ -17,49 +17,146 @@ on: type: boolean permissions: - contents: write - id-token: write + contents: read concurrency: group: publication-${{ github.workflow }}-${{ inputs.release_tag || github.ref_name }} cancel-in-progress: true jobs: + release-contract: + name: Resolve Release Contract + runs-on: ubuntu-latest + permissions: + contents: 
read + outputs: + minimum-version: ${{ steps.python-support.outputs.minimum-version }} + latest-version: ${{ steps.python-support.outputs.latest-version }} + supported-json: ${{ steps.python-support.outputs.supported-json }} + freethreaded-version: ${{ steps.python-support.outputs.freethreaded-version }} + unsupported-version: ${{ steps.python-support.outputs.unsupported-version }} + release-tag: ${{ steps.release-ref.outputs.release-tag }} + release-version: ${{ steps.release-ref.outputs.release-version }} + release-commit: ${{ steps.release-ref.outputs.release-commit }} + tag-object-sha: ${{ steps.release-ref.outputs.tag-object-sha }} + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + + - name: Export Python support contract + id: python-support + run: python3 scripts/python_support.py github-outputs >> "$GITHUB_OUTPUT" + + - name: Resolve immutable annotated release tag + id: release-ref + env: + GH_TOKEN: ${{ github.token }} + RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + run: | + set -euo pipefail + python3 - <<'PY' >> "$GITHUB_OUTPUT" + import json + import os + import re + import urllib.error + import urllib.request + + release_tag = os.environ["RELEASE_TAG"] + repo = os.environ["GITHUB_REPOSITORY"] + token = os.environ["GH_TOKEN"] + + if re.fullmatch(r"v\d+\.\d+\.\d+(?:[-+][A-Za-z0-9.]+)?", release_tag) is None: + raise SystemExit(f"::error::Invalid release tag format: {release_tag}") + + headers = { + "Accept": "application/vnd.github+json", + "Authorization": f"Bearer {token}", + "X-GitHub-Api-Version": "2022-11-28", + } + + def fetch(url: str) -> dict[str, object]: + request = urllib.request.Request(url, headers=headers) + try: + with urllib.request.urlopen(request) as response: + return json.load(response) + except urllib.error.HTTPError as error: # pragma: no cover - exercised in workflow + detail = error.read().decode("utf-8", errors="replace") + raise 
SystemExit( + f"::error::Failed to fetch release metadata from {url}: " + f"{error.code} {detail}" + ) from error + + ref = fetch(f"https://api.github.com/repos/{repo}/git/ref/tags/{release_tag}") + ref_object = ref.get("object", {}) + if ref_object.get("type") != "tag": + raise SystemExit( + "::error::Release tags must be annotated tag objects; " + f"{release_tag} resolved to {ref_object.get('type')!r}" + ) + + tag_sha = str(ref_object.get("sha", "")) + tag_object = fetch(f"https://api.github.com/repos/{repo}/git/tags/{tag_sha}") + if tag_object.get("tag") != release_tag: + raise SystemExit( + "::error::Resolved tag object does not match requested release tag" + ) + + target_object = tag_object.get("object", {}) + if target_object.get("type") != "commit": + raise SystemExit( + "::error::Release tag must point to a commit object" + ) + + verification = tag_object.get("verification") or {} + if not verification.get("verified"): + reason = verification.get("reason", "unknown") + raise SystemExit( + "::error::Release tag signature is not verified by GitHub " + f"(reason: {reason})" + ) + + commit_sha = str(target_object.get("sha", "")) + if len(commit_sha) != 40: + raise SystemExit("::error::Resolved release commit SHA is malformed") + + print(f"release-tag={release_tag}") + print(f"release-version={release_tag.removeprefix('v')}") + print(f"release-commit={commit_sha}") + print(f"tag-object-sha={tag_sha}") + PY + build: name: Build & Verify Quality (Python ${{ matrix.python-version }}) + needs: release-contract runs-on: ubuntu-latest timeout-minutes: 60 - # Force all uv commands (sync, run, etc.) to use the same versioned venv as the - # scripts. Without this, bare "uv sync" creates the default ".venv" while the - # scripts pivot to ".venv-", wasting ~150 MiB on a duplicate environment. 
+ permissions: + contents: read env: UV_PROJECT_ENVIRONMENT: .venv-${{ matrix.python-version }} - RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + RELEASE_TAG: ${{ needs.release-contract.outputs.release-tag }} + RELEASE_VERSION: ${{ needs.release-contract.outputs.release-version }} + RELEASE_COMMIT: ${{ needs.release-contract.outputs.release-commit }} strategy: fail-fast: false matrix: - # 3.13 is the minimum supported version (requires-python = ">=3.13"). - # 3.14 validates forward compatibility. - # When Python 3.15 releases (~late 2026), add "3.15" here. - python-version: ["3.13", "3.14"] + python-version: ${{ fromJSON(needs.release-contract.outputs.supported-json) }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + ref: ${{ needs.release-contract.outputs.release-commit }} - name: Verify clean working tree run: | - # Use porcelain format for maximum reliability across all git states (detached HEAD, etc.) 
if [ -n "$(git status --porcelain)" ]; then echo "::error::Working directory is not clean after checkout" - echo "Modified/untracked files detected:" - git status --porcelain - echo "" - git status + git status --short + exit 1 + fi + if [ "$(git rev-parse HEAD)" != "$RELEASE_COMMIT" ]; then + echo "::error::Checked out commit does not match resolved release commit" exit 1 fi - echo "Working tree is clean" - name: Set up uv uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 @@ -71,43 +168,31 @@ jobs: run: chmod +x ./scripts/*.sh - name: Verify shell syntax - run: bash -n scripts/*.sh + run: bash -n check.sh scripts/*.sh - name: Detect package name id: detect run: | set -euo pipefail - # Robust detection excluding egg-info and pycache PACKAGE_NAME=$(find src -mindepth 1 -maxdepth 1 -type d -not -name "*.egg-info" -not -name "__pycache__" | head -n 1 | xargs basename) if [ -z "$PACKAGE_NAME" ]; then echo "::error::Could not detect package in src/" exit 1 fi - echo "name=$PACKAGE_NAME" >> $GITHUB_OUTPUT - echo "PACKAGE_NAME=$PACKAGE_NAME" >> $GITHUB_ENV + echo "name=$PACKAGE_NAME" >> "$GITHUB_OUTPUT" + echo "PACKAGE_NAME=$PACKAGE_NAME" >> "$GITHUB_ENV" echo "Detected package: $PACKAGE_NAME" - name: Validate version tag run: | set -euo pipefail - TAG_VERSION="${RELEASE_TAG#v}" - - # Validate tag format - if ! 
[[ "$TAG_VERSION" =~ ^[0-9]+\.[0-9]+\.[0-9]+(-[a-zA-Z0-9.]+)?$ ]]; then - echo "::error::Invalid tag format: $TAG_VERSION (expected semver)" - exit 1 - fi - PACKAGE_VERSION=$(python -c "import tomllib; print(tomllib.load(open('pyproject.toml', 'rb'))['project']['version'])") - - echo "Tag version: $TAG_VERSION" + echo "Resolved tag version: $RELEASE_VERSION" echo "Package version: $PACKAGE_VERSION" - - if [ "$TAG_VERSION" != "$PACKAGE_VERSION" ]; then - echo "::error::Tag $TAG_VERSION doesn't match package version $PACKAGE_VERSION" + if [ "$RELEASE_VERSION" != "$PACKAGE_VERSION" ]; then + echo "::error::Tag version $RELEASE_VERSION does not match package version $PACKAGE_VERSION" exit 1 fi - echo "Version validation passed" - name: Install dependencies run: | @@ -117,22 +202,13 @@ jobs: - name: Validate runtime version matches tag run: | set -euo pipefail - PACKAGE="${{ steps.detect.outputs.name }}" - RUNTIME_VERSION=$(uv run python -c "import ${PACKAGE}; print(${PACKAGE}.__version__)") - TAG_VERSION="${RELEASE_TAG#v}" - - echo "Package: $PACKAGE" - echo "Runtime version: $RUNTIME_VERSION" - echo "Tag version: $TAG_VERSION" - - if [ "$RUNTIME_VERSION" != "$TAG_VERSION" ]; then - echo "::error::Runtime version $RUNTIME_VERSION doesn't match tag $TAG_VERSION" + if [ "$RUNTIME_VERSION" != "$RELEASE_VERSION" ]; then + echo "::error::Runtime version $RUNTIME_VERSION doesn't match tag $RELEASE_VERSION" echo "::error::Update __version__ in src/${PACKAGE}/__init__.py" exit 1 fi - echo "Runtime version validation passed" - name: Run Quality Checks (Lint & Test) run: | @@ -141,11 +217,21 @@ jobs: echo "::endgroup::" echo "::group::Testing" - uv run scripts/test.sh --ci + PY_VERSION=${{ matrix.python-version }} uv run scripts/test.sh --ci echo "::endgroup::" + - name: Verify quality gates left checkout unchanged + run: | + set -euo pipefail + if [ -n "$(git status --porcelain)" ]; then + echo "::error::Verification mutated the checkout" + git status --short + git diff --stat 
+ exit 1 + fi + - name: Upload coverage reports to Codecov - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0 with: token: ${{ secrets.CODECOV_TOKEN }} @@ -153,23 +239,22 @@ jobs: fail_ci_if_error: false - name: Build package - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: uv build - name: Check package metadata - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: uv run twine check dist/* - name: Verify package integrity - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: | set -euo pipefail PACKAGE="${{ steps.detect.outputs.name }}" echo "Verifying package: $PACKAGE" - # Check wheel contains required files - set +o pipefail # grep -q can cause SIGPIPE + set +o pipefail if ! unzip -l dist/*.whl | grep -q "${PACKAGE}/__init__.py"; then set -o pipefail echo "::error::Wheel missing ${PACKAGE}/__init__.py" @@ -177,7 +262,6 @@ jobs: fi set -o pipefail - # Ensure no compiled Python files in wheel set +o pipefail if unzip -l dist/*.whl | grep -qE "\.pyc$|__pycache__"; then set -o pipefail @@ -186,8 +270,7 @@ jobs: fi set -o pipefail - # Check source distribution contains source files - set +o pipefail # grep -q closes pipe early, causing tar SIGPIPE with pipefail + set +o pipefail if ! 
tar -tzf dist/*.tar.gz | grep -q "src/${PACKAGE}/__init__.py"; then set -o pipefail echo "::error::Source distribution missing src/${PACKAGE}/__init__.py" @@ -195,7 +278,6 @@ jobs: fi set -o pipefail - # Repository-only agent guidance must not ship in released artifacts set +o pipefail if unzip -l dist/*.whl | grep -qE '(^|[[:space:]])AGENTS\.md$|(^|[[:space:]])\.codex/'; then set -o pipefail @@ -215,11 +297,11 @@ jobs: echo "Package integrity verified" - name: Create release checksum receipt - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: | set -euo pipefail PACKAGE="${{ steps.detect.outputs.name }}" - VERSION="${RELEASE_TAG#v}" + VERSION="$RELEASE_VERSION" cd dist shasum -a 256 \ @@ -228,11 +310,11 @@ jobs: > "${PACKAGE}-${VERSION}.sha256" - name: Debug Artifacts - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: ls -laR dist/ - name: Stage publication artifacts - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version run: | set -euo pipefail mkdir -p publish-dist release-assets-dist @@ -258,7 +340,7 @@ jobs: } - name: Store PyPI distributions - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: name: pypi-dist @@ -267,7 +349,7 @@ jobs: if-no-files-found: error - name: Store GitHub release assets - if: matrix.python-version == '3.14' + if: matrix.python-version == needs.release-contract.outputs.latest-version uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1 with: name: release-assets @@ -275,8 +357,48 @@ jobs: retention-days: 7 if-no-files-found: error + verify-freethreaded: + name: Verify Free-Threaded Python (${{ needs.release-contract.outputs.freethreaded-version }}) + needs: 
release-contract + runs-on: ubuntu-latest + timeout-minutes: 60 + permissions: + contents: read + env: + PY_VERSION: ${{ needs.release-contract.outputs.freethreaded-version }} + UV_PROJECT_ENVIRONMENT: .venv-${{ needs.release-contract.outputs.freethreaded-version }} + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + with: + fetch-depth: 0 + ref: ${{ needs.release-contract.outputs.release-commit }} + + - name: Set up Python ${{ needs.release-contract.outputs.freethreaded-version }} + uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 + with: + python-version: ${{ needs.release-contract.outputs.freethreaded-version }} + + - name: Set up uv + uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 + with: + enable-cache: true + + - name: Make scripts executable + run: chmod +x ./scripts/*.sh + + - name: Install dependencies + run: | + uv sync --group dev --locked --python "$PY_VERSION" + echo "Locked free-threaded environment synchronized" + + - name: Run free-threaded test suite + run: | + echo "::group::Free-threaded Testing" + uv run scripts/test.sh --ci --quick + echo "::endgroup::" + test-publish: - needs: build + needs: [build, verify-freethreaded] runs-on: ubuntu-latest timeout-minutes: 30 if: github.event_name == 'workflow_dispatch' && inputs.publish_to_testpypi @@ -295,21 +417,21 @@ jobs: verify-test-publish: name: Verify TestPyPI Publication - needs: test-publish + needs: [release-contract, test-publish] runs-on: ubuntu-latest timeout-minutes: 30 if: github.event_name == 'workflow_dispatch' && inputs.publish_to_testpypi strategy: matrix: - # Test installation on both minimum and latest supported Python versions. 
- python-version: ["3.13", "3.14"] + python-version: ${{ fromJSON(needs.release-contract.outputs.supported-json) }} env: - RELEASE_TAG: ${{ inputs.release_tag }} + RELEASE_TAG: ${{ needs.release-contract.outputs.release-tag }} + RELEASE_VERSION: ${{ needs.release-contract.outputs.release-version }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ inputs.release_tag }} + ref: ${{ needs.release-contract.outputs.release-commit }} - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 @@ -325,7 +447,7 @@ jobs: echo "::error::Could not detect package in src/" exit 1 fi - echo "name=$PACKAGE_NAME" >> $GITHUB_OUTPUT + echo "name=$PACKAGE_NAME" >> "$GITHUB_OUTPUT" echo "Detected package: $PACKAGE_NAME" - name: Wait for TestPyPI CDN propagation @@ -337,21 +459,19 @@ jobs: run: | set -euo pipefail PACKAGE="${{ steps.detect.outputs.name }}" - VERSION="${RELEASE_TAG#v}" - + MAX_ATTEMPTS=5 ATTEMPT=1 DELAY=20 while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do - echo "Attempt $ATTEMPT of $MAX_ATTEMPTS: Installing ${PACKAGE}==${VERSION}..." + echo "Attempt $ATTEMPT of $MAX_ATTEMPTS: Installing ${PACKAGE}==${RELEASE_VERSION}..." 
pip cache purge || true - # Extra index url ensures dependencies are found on real PyPI - if pip install --no-cache-dir --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "${PACKAGE}==${VERSION}"; then - echo "Successfully installed ${PACKAGE}==${VERSION}" + if pip install --no-cache-dir --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ "${PACKAGE}==${RELEASE_VERSION}"; then + echo "Successfully installed ${PACKAGE}==${RELEASE_VERSION}" break fi - + if [ $ATTEMPT -eq $MAX_ATTEMPTS ]; then echo "::error::Failed to install from TestPyPI after $MAX_ATTEMPTS attempts" exit 1 @@ -366,16 +486,12 @@ jobs: set -euo pipefail PACKAGE="${{ steps.detect.outputs.name }}" INSTALLED_VERSION=$(python -c "import ${PACKAGE}; print(${PACKAGE}.__version__)") - EXPECTED_VERSION="${RELEASE_TAG#v}" - echo "Installed version: $INSTALLED_VERSION" - echo "Expected version: $EXPECTED_VERSION" - - if [ "$INSTALLED_VERSION" != "$EXPECTED_VERSION" ]; then - echo "::error::Version mismatch: $INSTALLED_VERSION != $EXPECTED_VERSION" + echo "Expected version: $RELEASE_VERSION" + if [ "$INSTALLED_VERSION" != "$RELEASE_VERSION" ]; then + echo "::error::Version mismatch: $INSTALLED_VERSION != $RELEASE_VERSION" exit 1 fi - echo "Version verification passed" - name: Smoke test run: | @@ -385,17 +501,19 @@ jobs: publish-release-assets: name: Publish GitHub Release Assets - needs: build + needs: [release-contract, build, verify-freethreaded] runs-on: ubuntu-latest timeout-minutes: 20 + permissions: + contents: write env: GH_TOKEN: ${{ github.token }} - RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + RELEASE_TAG: ${{ needs.release-contract.outputs.release-tag }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + ref: ${{ 
needs.release-contract.outputs.release-commit }} - name: Make scripts executable run: chmod +x ./scripts/*.sh @@ -410,17 +528,19 @@ jobs: verify-github-release: name: Verify GitHub Release - needs: publish-release-assets + needs: [release-contract, publish-release-assets] runs-on: ubuntu-latest timeout-minutes: 15 + permissions: + contents: read env: GH_TOKEN: ${{ github.token }} - RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + RELEASE_TAG: ${{ needs.release-contract.outputs.release-tag }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + ref: ${{ needs.release-contract.outputs.release-commit }} - name: Make scripts executable run: chmod +x ./scripts/*.sh @@ -429,9 +549,11 @@ jobs: run: ./scripts/verify-github-release.sh "$RELEASE_TAG" publish: - needs: build + needs: [build, verify-freethreaded] runs-on: ubuntu-latest timeout-minutes: 20 + permissions: + id-token: write steps: - uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1 with: @@ -446,20 +568,20 @@ jobs: verify-publish: name: Verify PyPI Publication - needs: publish + needs: [release-contract, publish] runs-on: ubuntu-latest timeout-minutes: 30 strategy: matrix: - # Test installation on both minimum and latest supported Python versions. 
- python-version: ["3.13", "3.14"] + python-version: ${{ fromJSON(needs.release-contract.outputs.supported-json) }} env: - RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + RELEASE_TAG: ${{ needs.release-contract.outputs.release-tag }} + RELEASE_VERSION: ${{ needs.release-contract.outputs.release-version }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + ref: ${{ needs.release-contract.outputs.release-commit }} - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 @@ -475,7 +597,7 @@ jobs: echo "::error::Could not detect package in src/" exit 1 fi - echo "name=$PACKAGE_NAME" >> $GITHUB_OUTPUT + echo "name=$PACKAGE_NAME" >> "$GITHUB_OUTPUT" echo "Detected package: $PACKAGE_NAME" - name: Wait for PyPI CDN propagation @@ -486,7 +608,6 @@ jobs: - name: Install from PyPI with retry run: | set -euo pipefail - TAG_VERSION="${RELEASE_TAG#v}" PACKAGE="${{ steps.detect.outputs.name }}" MAX_ATTEMPTS=5 @@ -494,10 +615,10 @@ jobs: DELAY=30 while [ $ATTEMPT -le $MAX_ATTEMPTS ]; do - echo "Attempt $ATTEMPT of $MAX_ATTEMPTS: Installing ${PACKAGE}==${TAG_VERSION}..." + echo "Attempt $ATTEMPT of $MAX_ATTEMPTS: Installing ${PACKAGE}==${RELEASE_VERSION}..." 
pip cache purge || true - if pip install --no-cache-dir "${PACKAGE}==${TAG_VERSION}"; then - echo "Successfully installed ${PACKAGE}==${TAG_VERSION}" + if pip install --no-cache-dir "${PACKAGE}==${RELEASE_VERSION}"; then + echo "Successfully installed ${PACKAGE}==${RELEASE_VERSION}" break fi @@ -515,16 +636,12 @@ jobs: set -euo pipefail PACKAGE="${{ steps.detect.outputs.name }}" INSTALLED_VERSION=$(python -c "import ${PACKAGE}; print(${PACKAGE}.__version__)") - TAG_VERSION="${RELEASE_TAG#v}" - echo "Installed version: $INSTALLED_VERSION" - echo "Expected version: $TAG_VERSION" - - if [ "$INSTALLED_VERSION" != "$TAG_VERSION" ]; then - echo "::error::Version mismatch: $INSTALLED_VERSION != $TAG_VERSION" + echo "Expected version: $RELEASE_VERSION" + if [ "$INSTALLED_VERSION" != "$RELEASE_VERSION" ]; then + echo "::error::Version mismatch: $INSTALLED_VERSION != $RELEASE_VERSION" exit 1 fi - echo "Version verification passed" - name: Smoke test run: | @@ -533,17 +650,18 @@ jobs: python -c "import ${PACKAGE} as pkg; r = pkg.parse_ftl('greeting = Hello, World!'); s = pkg.serialize_ftl(r); assert 'greeting' in s; print('Smoke test passed: ${PACKAGE} v' + pkg.__version__)" verify-python-requirement: - name: Verify Python 3.13+ Requirement - needs: publish + name: Verify Python Requirement Floor + needs: [release-contract, publish] runs-on: ubuntu-latest timeout-minutes: 15 env: - RELEASE_TAG: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + RELEASE_VERSION: ${{ needs.release-contract.outputs.release-version }} + UNSUPPORTED_VERSION: ${{ needs.release-contract.outputs.unsupported-version }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 with: fetch-depth: 0 - ref: ${{ github.event_name == 'workflow_dispatch' && inputs.release_tag || github.ref_name }} + ref: ${{ needs.release-contract.outputs.release-commit }} - name: Detect package name id: detect @@ -554,32 +672,22 @@ jobs: echo "::error::Could 
not detect package in src/" exit 1 fi - echo "name=$PACKAGE_NAME" >> $GITHUB_OUTPUT + echo "name=$PACKAGE_NAME" >> "$GITHUB_OUTPUT" echo "Detected package: $PACKAGE_NAME" - # Python 3.12 is intentionally used here — it is BELOW the package floor - # (requires-python = ">=3.13" in pyproject.toml). This step is a negative - # test: it verifies that pip correctly rejects installation on unsupported - # Python versions. 3.12 is chosen because it is the newest Python version - # available on GitHub Actions runners by default. - # GitHub Actions runners ship with Python 3.12 as their system Python as of 2025. - - name: Set up Python 3.12 + - name: Set up Python ${{ env.UNSUPPORTED_VERSION }} uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 with: - python-version: "3.12" + python-version: ${{ env.UNSUPPORTED_VERSION }} - - name: Verify install is blocked on Python 3.12 (below requires-python floor) + - name: Verify install is blocked below the support floor run: | set -euo pipefail - TAG_VERSION="${RELEASE_TAG#v}" PACKAGE="${{ steps.detect.outputs.name }}" - - echo "Attempting to install $PACKAGE==$TAG_VERSION on Python 3.12 (should fail)..." - - if pip install --no-cache-dir "${PACKAGE}==${TAG_VERSION}" 2>/dev/null; then - echo "::error::Installation succeeded on Python 3.12 (should have failed)" + echo "Attempting to install $PACKAGE==$RELEASE_VERSION on Python $UNSUPPORTED_VERSION (should fail)..." 
+ if pip install --no-cache-dir "${PACKAGE}==${RELEASE_VERSION}" 2>/dev/null; then + echo "::error::Installation succeeded below the supported Python floor" echo "::error::Check requires-python in pyproject.toml" exit 1 fi - - echo "Installation correctly blocked on Python 3.12" + echo "Installation correctly blocked on Python $UNSUPPORTED_VERSION" diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index af33804b..54a71979 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -14,22 +14,37 @@ concurrency: cancel-in-progress: true jobs: + python-support: + name: Resolve Python Support Contract + runs-on: ubuntu-latest + permissions: + contents: read + outputs: + minimum-version: ${{ steps.contract.outputs.minimum-version }} + latest-version: ${{ steps.contract.outputs.latest-version }} + supported-json: ${{ steps.contract.outputs.supported-json }} + freethreaded-version: ${{ steps.contract.outputs.freethreaded-version }} + unsupported-version: ${{ steps.contract.outputs.unsupported-version }} + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + + - name: Export Python support contract + id: contract + run: python3 scripts/python_support.py github-outputs >> "$GITHUB_OUTPUT" + test: name: Test (Python ${{ matrix.python-version }}) + needs: python-support runs-on: ubuntu-latest timeout-minutes: 45 - # Force all uv commands (sync, run, etc.) to use the same versioned venv as the - # scripts. Without this, bare "uv sync" creates the default ".venv" while the - # scripts pivot to ".venv-", wasting ~150 MiB on a duplicate environment. + permissions: + contents: read env: UV_PROJECT_ENVIRONMENT: .venv-${{ matrix.python-version }} strategy: fail-fast: false matrix: - # 3.13 is the minimum supported version (requires-python = ">=3.13"). - # 3.14 validates forward compatibility. - # When Python 3.15 releases (~late 2026), add "3.15" here. 
- python-version: ["3.13", "3.14"] + python-version: ${{ fromJSON(needs.python-support.outputs.supported-json) }} steps: - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 @@ -43,7 +58,7 @@ jobs: run: chmod +x ./scripts/*.sh - name: Verify shell syntax - run: bash -n scripts/*.sh + run: bash -n check.sh scripts/*.sh - name: Detect package name id: detect @@ -54,7 +69,7 @@ jobs: echo "::error::Could not detect package in src/" exit 1 fi - echo "name=$PACKAGE_NAME" >> $GITHUB_OUTPUT + echo "name=$PACKAGE_NAME" >> "$GITHUB_OUTPUT" echo "Detected package: $PACKAGE_NAME" - name: Install dependencies @@ -68,13 +83,63 @@ jobs: PY_VERSION=${{ matrix.python-version }} uv run scripts/lint.sh echo "::endgroup::" + - name: Verify lint left checkout unchanged + run: | + set -euo pipefail + if [ -n "$(git status --porcelain)" ]; then + echo "::error::Verification mutated the checkout" + git status --short + git diff --stat + exit 1 + fi + - name: Run Tests run: | echo "::group::Testing" - uv run scripts/test.sh --ci + PY_VERSION=${{ matrix.python-version }} uv run scripts/test.sh --ci echo "::endgroup::" - name: Verify package import run: | PACKAGE="${{ steps.detect.outputs.name }}" uv run python -c "import ${PACKAGE}; print(f'Successfully imported ${PACKAGE} v{${PACKAGE}.__version__}')" + + test-freethreaded: + name: Test (Python ${{ needs.python-support.outputs.freethreaded-version }}) + needs: python-support + runs-on: ubuntu-latest + timeout-minutes: 60 + permissions: + contents: read + env: + PY_VERSION: ${{ needs.python-support.outputs.freethreaded-version }} + UV_PROJECT_ENVIRONMENT: .venv-${{ needs.python-support.outputs.freethreaded-version }} + steps: + - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 + + - name: Set up Python ${{ needs.python-support.outputs.freethreaded-version }} + uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 + with: + python-version: ${{ 
needs.python-support.outputs.freethreaded-version }} + + - name: Set up uv + uses: astral-sh/setup-uv@08807647e7069bb48b6ef5acd8ec9567f424441b # v8.1.0 + with: + enable-cache: true + + - name: Make scripts executable + run: chmod +x ./scripts/*.sh + + - name: Verify shell syntax + run: bash -n check.sh scripts/*.sh + + - name: Install dependencies + run: | + uv sync --group dev --locked --python "$PY_VERSION" + echo "Locked free-threaded environment synchronized" + + - name: Run Tests + run: | + echo "::group::Free-threaded Testing" + uv run scripts/test.sh --ci --quick + echo "::endgroup::" diff --git a/AGENTS.md b/AGENTS.md index 55760050..4e656da2 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -1,7 +1,7 @@ # AGENTS.md — Agent Entry Protocol -**Version:** 2.3.0 -**Updated:** 2026-04-30 +**Version:** 2.4.1 +**Updated:** 2026-05-08 This file is the repository entry point for agent work. It defines load order, precedence, repository-wide exceptions, and the universal minimum that applies before any specialized language, framework, database/native, domain-modeling, or documentation rule. @@ -31,7 +31,7 @@ When opening a repository, load context in this order: 5. Load the application-framework protocol for each touched surface: - Tauri 2.10.x: `.codex/AGENTS_TAURI210.md` (v2.0.0+) 6. Load the database/native dependency protocol for each touched surface: - - SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0: `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` (v2.0.0+) + - SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0: `.codex/AGENTS_SQLITE3MC234_SQLITE3531.md` (v2.0.1+) 7. Load the domain-modeling lens **only when the change touches business meaning**: `.codex/DOMAIN_DRIVEN_DESIGN_LENS.md` (v1.0.0+). Triggers include domain state, business rules, workflow names, commands, domain events, permissions, policies, calculations, lifecycle transitions, user-facing business terms, or integration contracts between models. 
Do not load for purely mechanical work — build wiring, generic plumbing, infrastructure with no domain meaning. The Universal Engineering Contract §1.7 *Domain meaning gate* is the formal trigger. 8. For documentation authoring, documentation refactoring, or code changes that alter documented public contracts, load `.codex/PROTOCOL_AFAD.md` unless the only touched document is the repository root `README.md`. @@ -88,7 +88,7 @@ Application-framework surfaces: Database/native dependency surfaces: -- SQLite3 Multiple Ciphers 2.3.3 / SQLite 3.53.0 surfaces use `.codex/AGENTS_SQLITE3MC233_SQLITE353.md` in addition to any applicable language or framework protocol. +- SQLite3 Multiple Ciphers 2.3.4 / SQLite 3.53.1 surfaces use `.codex/AGENTS_SQLITE3MC234_SQLITE3531.md` in addition to any applicable language or framework protocol. Domain-modeling surfaces: @@ -202,10 +202,30 @@ For non-trivial work, the final report combines two shapes: Keep the report proportional to risk. For tiny edits, a concise sentence with verification is enough. -## 7.10 No emoji +### 7.10 No emoji Do not add, retain, or introduce emoji anywhere. This rule applies across all programming languages, markup languages, documentation formats, and plain text. This includes, without limitation, source code, inline comments, documentation comments, docstrings, commit messages, changelogs, release notes, configuration files, documentation, and this AGENTS.md file. There are no exceptions. Remove any emoji encountered while creating, editing, reviewing, or refactoring content. + +### 7.11 In-progress work awareness + +Before beginning any non-trivial task, inspect the repository's in-progress state, not just its committed state: + +```bash +gh pr list --state open \ + --json number,title,url,headRefName,isDraft,author \ + --jq '.[] | [.number, .headRefName, .title] | @tsv' +``` + +For each open PR whose branch or title overlaps the task area, read the PR body and the actual diff before proceeding. 
An open PR is existing theory-in-progress — work a prior session or contributor already built toward the same goal. Starting fresh without reading it destroys that theory rather than building on it.
+
+If an open PR substantially covers the task:
+
+- treat it as the starting point, not a parallel path;
+- understand what it does and does not yet do;
+- continue from it or explicitly explain why a fresh approach is better.
+
+This extends the §3 system map. The **Truth** axis is not only the committed HEAD; it includes everything currently in-flight. Discovering an open PR mid-task is late — discover it first.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 0117c6a8..25d2a210 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,13 +1,3 @@
----
-afad: "4.0"
-version: "0.166.0"
-domain: CHANGELOG
-updated: "2026-05-01"
-route:
-  keywords: [changelog, release notes, version history, breaking changes, migration, fixed, what's new]
-  questions: ["what changed in version X?", "what are the breaking changes?", "what was fixed in the latest release?", "what is the release history?"]
----
-
 # Changelog
 
 Notable changes to this project are documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
@@ -15,6 +5,38 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.167.0] - 2026-05-15
+
+### Changed
+
+- **The maintainer release runbook now distinguishes partial bootstrap payloads from final bootstrap commits.**
+  `docs/RELEASE_PROTOCOL.md` now gives two explicit Step 3 branch-cut flows: one for release
+  payloads that still need staged finalization work, and one for clean-clone pre-flight runs that
+  already proved a detached bootstrap commit containing the full final payload. That removes the
+  ambiguity around how to create `release/X.Y.Z` after bootstrap-path verification while keeping
+  the PR diff against `origin/main` as the authoritative scope checkpoint.
+- **Runtime boundaries, release contracts, and validation semantics now fail closed earlier and more explicitly.**
+  Streamed resource loading now enforces bounded input before allocation, filesystem loaders now
+  apply bounded no-follow reads, custom-function and parsing diagnostics redact payloads by
+  default, duplicate IDs and cross-resource shadows require explicit overwrite admission, the
+  async bundle owns its executor and bounded queue instead of delegating through ambient
+  `asyncio.to_thread()`, and the release/test workflows now derive Python support from one
+  canonical contract while verifying a dedicated Python 3.13 free-threaded lane and an immutable
+  annotated release tag.
+- **Runtime caching now uses one explicit integrity contract instead of a lenient audit-log grab bag.**
+  Cache corruption, key confusion, and write-once conflicts now fail fast regardless of bundle
+  formatting strictness; `CacheConfig.integrity_strict`, `enable_audit`, `max_audit_entries`, and
+  `max_entry_weight` are gone in favor of `enable_debug_log`, `max_debug_entries`,
+  `max_entry_payload_bytes`, keyed debug fingerprints, and structured integrity-event sinks.
+  Custom functions now default to `cacheable=False`, cache keys include the function-registry
+  generation, oversized retained payloads are measured by deterministic UTF-8 payload bytes, cache
+  snapshots sanitize retained fallback text, the runtime/localization facades now expose
+  `CacheDebugLogEntry` plus `CacheIntegrityEvent` as the public evidence surfaces, and the
+  fuzzing/docs/test inventories now use the renamed debug-log terminology consistently. The
+  runtime Atheris harness now treats injected cache corruption as the expected fail-closed
+  integrity outcome even when formatting mode is non-strict, so the canonical fuzz sweep asserts
+  the same cache contract as the shipped runtime.
+
 ## [0.166.0] - 2026-05-01
 
 ### Changed
@@ -7088,7 +7110,8 @@ Both validators are re-exported from `ftllexengine.introspection` and the root
 [0.29.0]: https://github.com/resoltico/ftllexengine/releases/tag/v0.29.0
 [0.28.1]: https://github.com/resoltico/ftllexengine/releases/tag/v0.28.1
 [0.28.0]: https://github.com/resoltico/ftllexengine/releases/tag/v0.28.0
-[Unreleased]: https://github.com/resoltico/FTLLexEngine/compare/v0.166.0...HEAD
+[Unreleased]: https://github.com/resoltico/FTLLexEngine/compare/v0.167.0...HEAD
+[0.167.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.166.0...v0.167.0
 [0.166.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.165.0...v0.166.0
 [0.165.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.164.0...v0.165.0
 [0.164.0]: https://github.com/resoltico/FTLLexEngine/compare/v0.163.0...v0.164.0
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index c5cf4cf1..3112628f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: CONTRIBUTING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [contributing, development, uv, lint, test, fuzz, benchmark, release, virtualenv]
   questions: ["how do I set up development?", "how do I run lint and tests?", "how do I work on fuzzing?", "how do I prepare a release?"]
@@ -65,8 +65,8 @@ Useful variants:
 Markdown changes should stay synchronized with the code and examples they describe.
```bash -uv run python scripts/validate_docs.py -uv run python scripts/validate_version.py +uv run --group dev --python 3.14 python scripts/validate_docs.py +uv run --group dev --python 3.14 python scripts/validate_version.py uv run python scripts/run_examples.py ``` @@ -136,4 +136,6 @@ When the change touches runtime behavior or supported Python versions, also run ```bash PY_VERSION=3.14 ./scripts/lint.sh PY_VERSION=3.14 ./scripts/test.sh +uv run --group dev --python 3.14 python scripts/validate_docs.py +uv run --group dev --python 3.14 python scripts/validate_version.py ``` diff --git a/PATENTS.md b/PATENTS.md index 1f91d03c..1cdc7db3 100644 --- a/PATENTS.md +++ b/PATENTS.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: LEGAL -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [patents, legal, license, fluent, apache, mit, babel] questions: ["what is the patent position?", "does the project include a patent grant?", "what about the Fluent specification license?"] diff --git a/README.md b/README.md index f9040f6d..41a8e9eb 100644 --- a/README.md +++ b/README.md @@ -1,31 +1,32 @@ -[![FTLLexEngine Art](https://raw.githubusercontent.com/resoltico/FTLLexEngine/main/images/FTLLexEngine.jpg)](https://github.com/resoltico/FTLLexEngine) +# FTLLexEngine — Fluent localization for Python -[![PyPI](https://img.shields.io/pypi/v/ftllexengine.svg)](https://pypi.org/project/ftllexengine/) -[![Python Versions](https://img.shields.io/pypi/pyversions/ftllexengine.svg)](https://pypi.org/project/ftllexengine/) -[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +FTLLexEngine is a Python library for Fluent `.ftl` resources. It loads and validates Fluent +messages, formats locale-aware output, and parses localized user input back into exact Python +types. 
-# FTLLexEngine — Fluent localization runtime and parser for Python +It is designed for applications that want one locale contract for both directions: -FTLLexEngine is a Python library for the Fluent `.ftl` specification: format locale-aware prices, -dates, and messages for 200+ locales, then parse localized user input back to exact Python types -in the same stack. +- Format messages, currency, dates, and plural forms for 200+ locales through CLDR +- Parse localized input back into `Decimal`, `date`, `datetime`, and related typed values +- Validate `.ftl` resources and message-variable contracts before serving traffic +- Keep locale state inside `FluentBundle` and `FluentLocalization` instances instead of process-global state -Most setups handle the two directions separately — one library brews the outbound message, -something hand-rolled to parse the reply back. Locale rules drift between them. FTLLexEngine runs -both from the same locale, validates `.ftl` resources at boot before the first request, and keeps -threads isolated without touching global state. +## What It Does -- Format currency, dates, and plural messages correctly for 200+ locales via CLDR -- Parse localized user input back to `Decimal`, `date`, or typed values — no float drift -- Validate `.ftl` resources and message schemas at boot, before the first request -- Thread-safe bundles, no global locale state +Use FTLLexEngine when your application already uses Fluent for output and also needs to accept +localized input such as prices, dates, or quantities. 
-[Copy-paste patterns](docs/QUICK_REFERENCE.md) · [Workflow tour](docs/WORKFLOW_TOUR.md) · [PyPI](https://pypi.org/project/ftllexengine/) +Typical fit: -## Both Ends of the Counter +- Web services that render localized messages and need to parse localized form input correctly +- Back-office or finance systems that must avoid float drift when reading user-entered amounts +- Applications that want boot-time validation of translation resources instead of discovering + broken `.ftl` files during a live request +- Concurrent programs that need locale handling without shared mutable global state -A specialty coffee exporter invoices buyers in German. Buyers reply in their local number format. -One runtime handles both ends: +## Example: Format Output And Parse Input With The Same Locale Rules + +This example formats a German quote and parses a German currency value back to exact data: ```python from decimal import Decimal @@ -42,46 +43,45 @@ parsed, _ = parse_currency("12.450,00 EUR", "de_DE", default_currency="EUR") # → (Decimal("12450.00"), "EUR") ``` -Same locale rules write the invoice and read the buyer's reply. No separate parser. No float -approximation. - -## Where It Fits - -Python apps using Fluent `.ftl` for messages, plural rules, and locale-aware formatting — -especially when users send localized prices, dates, or quantities that need to come back as exact -typed values. Systems that validate `.ftl` resources before accepting traffic, and concurrent apps -that need locale isolation without shared mutable state. +The same locale rules produce the outbound text and parse the inbound value. You do not need one +library for formatting and a different parser for localized input. ## Install -Full runtime — formatting, bidirectional parsing, CLDR locale data: +Python 3.13 or newer. + +Install `ftllexengine[babel]` for the full runtime. This includes Fluent formatting, locale-aware parsing, +and CLDR-backed locale data. 
Use this option if your application formats localized output or parses +localized user input. ```bash uv add ftllexengine[babel] # or: pip install "ftllexengine[babel]" ``` -Parser only — FTL syntax, AST, validation, zero Babel dependency: +Install `ftllexengine` without extras for the parser-only build. This includes FTL syntax support, +AST types, serialization, and validation, with no Babel dependency. Use this option if you only +need to parse, inspect, validate, or serialize Fluent resources. ```bash uv add ftllexengine # or: pip install ftllexengine ``` -Python 3.13+. Fully typed. Built on the [Fluent specification](https://projectfluent.org/) with -CLDR data via Babel. - -- [Copy-paste patterns](docs/QUICK_REFERENCE.md) -- [Workflow tour](docs/WORKFLOW_TOUR.md) -- [API reference](docs/DOC_00_Index.md) -- [Runnable examples](examples/) +## Documentation -Maintainers: [Release protocol](docs/RELEASE_PROTOCOL.md). +- [Documentation index](docs/DOC_00_Index.md) — complete map of every Markdown document under `docs/` +- [Quick reference](docs/QUICK_REFERENCE.md) — short copy-paste recipes +- [Workflow tour](docs/WORKFLOW_TOUR.md) — end-to-end examples +- [Runtime and API reference](docs/DOC_00_Index.md) — symbol routing plus guide links +- [Release protocol](docs/RELEASE_PROTOCOL.md) — release and publication workflow +- [Runnable examples](examples/) — executable sample programs ## Legal -MIT-licensed. The optional `[babel]` extra adds Babel under BSD-3-Clause. FTLLexEngine is an -independent implementation of the [Fluent syntax specification](https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf) +FTLLexEngine is MIT-licensed. The optional `[babel]` extra adds Babel under BSD-3-Clause. +FTLLexEngine is an independent implementation of the +[Fluent syntax specification](https://github.com/projectfluent/fluent/blob/master/spec/fluent.ebnf) and is not affiliated with or endorsed by Mozilla. 
[LICENSE](LICENSE) · [NOTICE](NOTICE) · [PATENTS.md](PATENTS.md) diff --git a/check.sh b/check.sh index 3e2e50ac..61d1e5bd 100755 --- a/check.sh +++ b/check.sh @@ -2,7 +2,14 @@ set -euo pipefail -PY_VERSION="${PY_VERSION:-3.13}" +ROOT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +# shellcheck source=scripts/lib/python_support_contract.sh +source "$ROOT_DIR/scripts/lib/python_support_contract.sh" + +# Premise: contributor shell gates default to the minimum supported Python. +# Reason: the latest supported interpreter remains available via PY_VERSION=... +# overrides, but the default path should always validate the support floor. +PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then UV_ENV=".venv-devcontainer-${PY_VERSION}" else @@ -13,7 +20,6 @@ if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then fi ATHERIS_TARGET_SMOKE_TIME="${ATHERIS_TARGET_SMOKE_TIME:-3}" -ROOT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" cd "$ROOT_DIR" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" != "1" ]]; then diff --git a/docs/CUSTOM_FUNCTIONS_GUIDE.md b/docs/CUSTOM_FUNCTIONS_GUIDE.md index 40d9ac4b..30125564 100644 --- a/docs/CUSTOM_FUNCTIONS_GUIDE.md +++ b/docs/CUSTOM_FUNCTIONS_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: CUSTOM_FUNCTIONS -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [custom functions, fluent_function, FunctionRegistry, locale injection, add_function] questions: ["how do I add a custom function?", "how does locale injection work?", "should I use a registry or add_function?"] diff --git a/docs/DATA_INTEGRITY_ARCHITECTURE.md b/docs/DATA_INTEGRITY_ARCHITECTURE.md index ce6db068..6402f6d3 100644 --- a/docs/DATA_INTEGRITY_ARCHITECTURE.md +++ b/docs/DATA_INTEGRITY_ARCHITECTURE.md @@ -1,11 +1,11 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: ARCHITECTURE -updated: "2026-05-01" +updated: "2026-05-15" 
 route:
-  keywords: [data integrity, strict mode, FrozenFluentError, IntegrityCheckFailedError, cache audit, boot validation]
-  questions: ["how does strict mode relate to integrity?", "what audit evidence does the runtime expose?", "what is boot validation for?"]
+  keywords: [data integrity, strict mode, FrozenFluentError, IntegrityCheckFailedError, cache debug log, cache integrity event, boot validation]
+  questions: ["how does strict mode relate to integrity?", "what cache evidence does the runtime expose?", "what is boot validation for?"]
 ---
 
 # Data Integrity Architecture
@@ -20,7 +20,8 @@ The library pushes validation as early as possible and represents runtime failur
 - `FrozenFluentError` captures formatting and parsing failures without mutable side channels.
 - `FormattingIntegrityError`, `SyntaxIntegrityError`, and `IntegrityCheckFailedError` surface strict-mode failures explicitly.
 - `LoadSummary`, `ResourceLoadResult`, and boot schema results provide startup evidence for localization initialization.
-- `CacheConfig(enable_audit=True)` exposes immutable audit-log entries for cache operations.
+- `CacheConfig(enable_debug_log=True)` exposes bounded debug-log entries for routine cache operations.
+- `CacheConfig(integrity_event_sink=...)` emits structured critical integrity events that applications can retain durably.
 
 ## Strict Mode
 
@@ -37,8 +38,8 @@ The library pushes validation as early as possible and represents runtime failur
 The public contract stays centered on the facade types, but the implementation is intentionally partitioned so integrity behavior can evolve without collapsing back into single large modules:
 
 - `runtime.bundle` remains the public home of `FluentBundle`, while lifecycle and mutation responsibilities are delegated into focused internal runtime modules.
-- `runtime.cache` remains the public cache surface, while audit-log behavior, stats helpers, and cache-key shaping live in dedicated internal cache modules.
+- `runtime.cache` remains the public cache surface, while debug-log behavior, integrity-event emission, stats helpers, and cache-key shaping live in dedicated internal cache modules.
 - `runtime.function_bridge` remains the public registry surface, while decorator metadata attachment and registry introspection helpers are separated internally.
 - `diagnostics.templates` remains the public diagnostic-template namespace, while reference, runtime, and parsing template families are maintained in smaller focused modules.
 
-This split does not change user imports. It preserves clearer ownership boundaries for audit evidence, strict-mode failures, and runtime mutation paths.
+This split does not change user imports. It preserves clearer ownership boundaries for debug evidence, incident-grade integrity events, strict-mode failures, and runtime mutation paths.
diff --git a/docs/DEVELOPER_DEVCONTAINER.md b/docs/DEVELOPER_DEVCONTAINER.md
index 685c9999..d9610961 100644
--- a/docs/DEVELOPER_DEVCONTAINER.md
+++ b/docs/DEVELOPER_DEVCONTAINER.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: CONTRIBUTING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [devcontainer, contributor workflow, docker, check.sh, atheris]
   questions: ["how do I open the contributor container?", "how do I run the full repo gate?", "how do I run Atheris in the supported environment?"]
@@ -58,6 +58,8 @@ From the host without opening an interactive shell:
 ```bash
 npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_hypofuzz.sh --preflight
 npx --yes @devcontainers/cli exec --workspace-folder . ./scripts/fuzz_atheris.sh --smoke-all --time 3
+npx --yes @devcontainers/cli exec --workspace-folder . env PY_VERSION=3.14 ./scripts/lint.sh
+npx --yes @devcontainers/cli exec --workspace-folder .
env PY_VERSION=3.14 ./scripts/test.sh
 ```
 
 ## Validation
diff --git a/docs/DOC_00_Index.md b/docs/DOC_00_Index.md
index 21353627..71614f33 100644
--- a/docs/DOC_00_Index.md
+++ b/docs/DOC_00_Index.md
@@ -1,14 +1,77 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: INDEX
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
-  keywords: [api index, routing, FluentBundle, FluentLocalization, parse_ftl, FunctionRegistry, FrozenFluentError, introspection, detect_cycles, entry_dependency_set]
-  questions: ["where is a symbol documented?", "which file documents the runtime APIs?", "which file documents locale parsing, introspection, and analysis APIs?", "where are syntax, parsing, diagnostics, and dependency-graph references?"]
+  keywords: [api index, docs index, documentation map, routing, FluentBundle, FluentLocalization, parse_ftl, FunctionRegistry, FrozenFluentError, introspection, detect_cycles, entry_dependency_set]
+  questions: ["where is a symbol documented?", "which file documents the runtime APIs?", "which file documents locale parsing, introspection, and analysis APIs?", "where are syntax, parsing, diagnostics, and dependency-graph references?", "where is the complete index of Markdown docs under docs/?"]
 ---
 
-# FTLLexEngine API Reference Index
+# FTLLexEngine Documentation And API Index
+
+
+
+## Documentation Map
+
+### Start Here
+
+| File | Purpose |
+|:-----|:--------|
+| [DOC_00_Index.md](DOC_00_Index.md) | This file: complete docs map plus API routing table |
+| [QUICK_REFERENCE.md](QUICK_REFERENCE.md) | Short recipes for common tasks |
+| [WORKFLOW_TOUR.md](WORKFLOW_TOUR.md) | End-to-end usage examples |
+| [TERMINOLOGY.md](TERMINOLOGY.md) | Definitions for project vocabulary |
+| [MIGRATION.md](MIGRATION.md) | Upgrade notes and breaking-change guidance |
+
+### Core Reference
+
+| File | Purpose |
+|:-----|:--------|
+| [DOC_01_Core.md](DOC_01_Core.md) | Core entry points such as `FluentBundle`, `FluentLocalization`, and resource loading
|
+| [DOC_02_Types.md](DOC_02_Types.md) | Public semantic and support types |
+| [DOC_02_SyntaxTypes.md](DOC_02_SyntaxTypes.md) | Fluent AST node types |
+| [DOC_02_SyntaxExpressions.md](DOC_02_SyntaxExpressions.md) | Fluent AST expression nodes |
+| [DOC_03_Parsing.md](DOC_03_Parsing.md) | FTL parsing, serialization, and validation APIs |
+| [DOC_03_LocaleParsing.md](DOC_03_LocaleParsing.md) | Locale-aware parsing APIs for numbers, dates, and currency |
+
+### Runtime, Introspection, And Diagnostics
+
+| File | Purpose |
+|:-----|:--------|
+| [DOC_04_Runtime.md](DOC_04_Runtime.md) | Runtime formatting, cache, function, and bundle support APIs |
+| [DOC_04_RuntimeUtilities.md](DOC_04_RuntimeUtilities.md) | Locale utilities, constants, and runtime helpers |
+| [DOC_04_Introspection.md](DOC_04_Introspection.md) | Message, locale, currency, and territory introspection APIs |
+| [DOC_04_Analysis.md](DOC_04_Analysis.md) | Dependency-graph and cycle-analysis APIs |
+| [DOC_05_Diagnostics.md](DOC_05_Diagnostics.md) | Diagnostics, validation results, and formatter APIs |
+| [DOC_05_Errors.md](DOC_05_Errors.md) | Error and integrity exception types |
+
+### Guides
+
+| File | Purpose |
+|:-----|:--------|
+| [CUSTOM_FUNCTIONS_GUIDE.md](CUSTOM_FUNCTIONS_GUIDE.md) | Registering and using custom Fluent functions |
+| [LOCALE_GUIDE.md](LOCALE_GUIDE.md) | Locale normalization, fallback, and orchestration behavior |
+| [PARSING_GUIDE.md](PARSING_GUIDE.md) | Practical locale-aware parsing examples |
+| [TYPE_HINTS_GUIDE.md](TYPE_HINTS_GUIDE.md) | Type-checking expectations and examples |
+| [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md) | Resource and message-schema validation workflow |
+| [THREAD_SAFETY.md](THREAD_SAFETY.md) | Thread-safety guarantees and concurrency model |
+| [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md) | Integrity model, cache evidence, and fail-closed boundaries |
+
+### Tooling And Operations
+
+| File | Purpose |
+|:-----|:--------|
+|
[DEVELOPER_DEVCONTAINER.md](DEVELOPER_DEVCONTAINER.md) | Canonical contributor container workflow |
+| [DOC_06_Testing.md](DOC_06_Testing.md) | Verification commands, docs validation, and example execution |
+| [FUZZING_GUIDE.md](FUZZING_GUIDE.md) | Fuzzing overview and workflows |
+| [FUZZING_GUIDE_ATHERIS.md](FUZZING_GUIDE_ATHERIS.md) | Atheris-specific fuzzing workflow |
+| [FUZZING_GUIDE_HYPOFUZZ.md](FUZZING_GUIDE_HYPOFUZZ.md) | HypoFuzz-specific workflow |
+| [RELEASE_PROTOCOL.md](RELEASE_PROTOCOL.md) | Release, packaging, and publication procedure |
 
 ## Routing Table
 
@@ -25,6 +88,8 @@ route:
 | `ResourceLoadResult` | [DOC_01_Core.md](DOC_01_Core.md) | `ResourceLoadResult` |
 | `FallbackInfo` | [DOC_01_Core.md](DOC_01_Core.md) | `FallbackInfo` |
 | `LocalizationCacheStats` | [DOC_01_Core.md](DOC_01_Core.md) | `LocalizationCacheStats` |
+| `UNLIMITED` | [DOC_02_Types.md](DOC_02_Types.md) | `UNLIMITED` |
+| `UnlimitedLimit` | [DOC_02_Types.md](DOC_02_Types.md) | `UnlimitedLimit` |
 | `FluentNumber` | [DOC_02_Types.md](DOC_02_Types.md) | `FluentNumber` |
 | `FluentValue` | [DOC_02_Types.md](DOC_02_Types.md) | `FluentValue` |
 | `ParseResult` | [DOC_02_Types.md](DOC_02_Types.md) | `ParseResult` |
@@ -108,8 +173,9 @@ route:
 | `select_plural_category` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `select_plural_category` |
 | `make_fluent_number` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `make_fluent_number` |
 | `clear_module_caches` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `clear_module_caches` |
-| `CacheAuditLogEntry` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `CacheAuditLogEntry` |
-| `WriteLogEntry` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `WriteLogEntry` |
+| `CacheDebugLogEntry` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `CacheDebugLogEntry` |
+| `CacheIntegrityEvent` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `CacheIntegrityEvent` |
+| `CacheIntegrityEventKind` | [DOC_04_Runtime.md](DOC_04_Runtime.md) | `CacheIntegrityEventKind` |
 | `detect_cycles` |
[DOC_04_Analysis.md](DOC_04_Analysis.md) | `detect_cycles` |
 | `entry_dependency_set` | [DOC_04_Analysis.md](DOC_04_Analysis.md) | `entry_dependency_set` |
 | `make_cycle_key` | [DOC_04_Analysis.md](DOC_04_Analysis.md) | `make_cycle_key` |
@@ -175,22 +241,3 @@ route:
 | `scripts/fuzz_hypofuzz.sh` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/fuzz_hypofuzz.sh` |
 | `scripts/fuzz_atheris.sh` | [DOC_06_Testing.md](DOC_06_Testing.md) | `scripts/fuzz_atheris.sh` |
 | `pytest.mark.fuzz` | [DOC_06_Testing.md](DOC_06_Testing.md) | `pytest.mark.fuzz` |
-
-## Guide Links
-
-- [QUICK_REFERENCE.md](QUICK_REFERENCE.md)
-- [CUSTOM_FUNCTIONS_GUIDE.md](CUSTOM_FUNCTIONS_GUIDE.md)
-- [DATA_INTEGRITY_ARCHITECTURE.md](DATA_INTEGRITY_ARCHITECTURE.md)
-- [DEVELOPER_DEVCONTAINER.md](DEVELOPER_DEVCONTAINER.md)
-- [FUZZING_GUIDE.md](FUZZING_GUIDE.md)
-- [FUZZING_GUIDE_ATHERIS.md](FUZZING_GUIDE_ATHERIS.md)
-- [FUZZING_GUIDE_HYPOFUZZ.md](FUZZING_GUIDE_HYPOFUZZ.md)
-- [LOCALE_GUIDE.md](LOCALE_GUIDE.md)
-- [MIGRATION.md](MIGRATION.md)
-- [PARSING_GUIDE.md](PARSING_GUIDE.md)
-- [RELEASE_PROTOCOL.md](RELEASE_PROTOCOL.md)
-- [TERMINOLOGY.md](TERMINOLOGY.md)
-- [THREAD_SAFETY.md](THREAD_SAFETY.md)
-- [TYPE_HINTS_GUIDE.md](TYPE_HINTS_GUIDE.md)
-- [VALIDATION_GUIDE.md](VALIDATION_GUIDE.md)
-- [WORKFLOW_TOUR.md](WORKFLOW_TOUR.md)
diff --git a/docs/DOC_01_Core.md b/docs/DOC_01_Core.md
index 0e2f129a..a5aa8056 100644
--- a/docs/DOC_01_Core.md
+++ b/docs/DOC_01_Core.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: CORE
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [FluentBundle, AsyncFluentBundle, FluentLocalization, LocalizationBootConfig, PathResourceLoader, LoadSummary, ResourceLoadResult, LocalizationCacheStats, require_clean, get_load_summary]
   questions: ["how do I format messages?", "how do I load multiple locales?", "how do I inspect localization load results?", "how do I boot localization safely?"]
@@ -31,9 +31,11 @@ class FluentBundle:
         use_isolating: bool = True,
         cache: CacheConfig | None = None,
         functions: FunctionRegistry | None = None,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_expansion_size: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
+        max_expansion_size: LimitArg = None,
         strict: bool = True,
     ) -> None:
 ```
@@ -47,9 +49,14 @@ class FluentBundle:
 | `functions` | N | Custom function registry |
 | `max_source_size` | N | FTL input bound |
 | `max_nesting_depth` | N | Nesting safety bound |
+| `max_parse_errors` | N | Junk-entry abort bound |
+| `max_stream_line_length` | N | Stream line-length bound |
 | `max_expansion_size` | N | Expansion safety bound |
 | `strict` | N | Raise on integrity failures |
 
+Security-limit note:
+`LimitArg` fields fail closed on zero and negative integers. Use public `ftllexengine.UNLIMITED` only for an intentional opt-out.
+
 ### Constraints
 - Return: Bundle with normalized locale and empty resource store
 - Raises: `ValueError` on invalid or unknown locale; `TypeError` on invalid registry
@@ -75,9 +82,13 @@ class AsyncFluentBundle:
         use_isolating: bool = True,
         cache: CacheConfig | None = None,
         functions: FunctionRegistry | None = None,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_expansion_size: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
+        max_expansion_size: LimitArg = None,
+        max_workers: int = 4,
+        max_pending_operations: int = 16,
         strict: bool = True,
     ) -> None:
 ```
@@ -91,14 +102,21 @@ class AsyncFluentBundle:
 | `functions` | N | Custom function registry |
 | `max_source_size` | N | FTL input bound |
 | `max_nesting_depth` | N | Nesting safety bound |
+| `max_parse_errors` | N | Junk-entry abort bound |
+| `max_stream_line_length` | N | Stream line-length bound |
 | `max_expansion_size` | N | Expansion safety
bound |
+| `max_workers` | N | Owned worker-thread count |
+| `max_pending_operations` | N | Async admission bound |
 | `strict` | N | Raise on integrity failures |
 
+Security-limit note:
+`LimitArg` fields fail closed on zero and negative integers. Use public `ftllexengine.UNLIMITED` only for an intentional opt-out.
+
 ### Constraints
 - Return: Async wrapper around the same runtime semantics as `FluentBundle`
 - State: Delegates to an internal bundle instance
 - Thread: Safe
-- Async: Formatting and mutation paths run through `asyncio.to_thread()`
+- Async: Formatting, mutation, and lock-taking read paths run through one owned executor plus a bounded async admission gate
 - Availability: full-runtime only
 
 ---
@@ -119,6 +137,10 @@ class FluentLocalization:
         use_isolating: bool = True,
         cache: CacheConfig | None = None,
         on_fallback: Callable[[FallbackInfo], None] | None = None,
+        max_source_size: LimitArg = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
+        max_expansion_size: LimitArg = None,
         strict: bool = True,
     ) -> None:
 ```
@@ -132,8 +154,15 @@ class FluentLocalization:
 | `use_isolating` | N | Enable bidi isolation |
 | `cache` | N | Per-bundle cache config |
 | `on_fallback` | N | Fallback callback hook |
+| `max_source_size` | N | Per-resource decoded source bound |
+| `max_parse_errors` | N | Per-resource Junk-entry abort bound |
+| `max_stream_line_length` | N | Stream line-length bound |
+| `max_expansion_size` | N | Per-format output bound |
 | `strict` | N | Raise on integrity failures |
 
+Security-limit note:
+`LimitArg` fields fail closed on zero and negative integers. Use public `ftllexengine.UNLIMITED` only for an intentional opt-out.
+
 ### Constraints
 - Return: Multi-locale runtime with canonicalized locale chain
 - Raises: `ValueError` on empty locales, invalid or unknown locales, or inconsistent loader inputs
@@ -199,6 +228,8 @@ Dataclass that loads FTL source from a locale-substituted path template.
 class PathResourceLoader:
     base_path: str
     root_dir: str | None = None
+    max_source_bytes: LimitArg = None
+    max_source_chars: LimitArg = None
 ```
 
 ### Parameters
@@ -206,6 +237,8 @@ class PathResourceLoader:
 |:-----|:----|:----------|
 | `base_path` | Y | Path template with `{locale}` |
 | `root_dir` | N | Root for path safety checks |
+| `max_source_bytes` | N | Raw-byte read budget enforced before full decode |
+| `max_source_chars` | N | Decoded-character budget enforced during streaming decode |
 
 ### Constraints
 - Raises: `ValueError` if `base_path` lacks `{locale}`
@@ -317,16 +350,17 @@ class FallbackInfo:
 
 ## `LocalizationCacheStats`
 
-Typed dict representing aggregate cache metrics across localization bundles.
+Frozen dataclass representing aggregate cache metrics across localization bundles.
 
 ### Signature
 ```python
-class LocalizationCacheStats(CacheStats, total=True):
+@dataclass(frozen=True, slots=True)
+class LocalizationCacheStats(CacheStats):
     bundle_count: int
 ```
 
 ### Constraints
 - Purpose: Summarize per-locale cache state from `FluentLocalization.get_cache_stats()`
 - Fields: Includes all `CacheStats` fields aggregated across initialized bundles, plus `bundle_count`
-- State: Read-only result object
+- State: Immutable snapshot with mapping-style access via `CacheStats`
 - Availability: full-runtime only
diff --git a/docs/DOC_02_SyntaxExpressions.md b/docs/DOC_02_SyntaxExpressions.md
index 2c76bd6d..0d0fdb6c 100644
--- a/docs/DOC_02_SyntaxExpressions.md
+++ b/docs/DOC_02_SyntaxExpressions.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: SYNTAX_EXPRESSIONS
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [TextElement, Placeable, SelectExpression, VariableReference, FunctionReference, Entry, Expression]
   questions: ["which AST node types model Fluent expressions and references?", "what public syntax union aliases exist?", "where are placeables and selectors documented?"]
diff --git a/docs/DOC_02_SyntaxTypes.md
b/docs/DOC_02_SyntaxTypes.md
index 69a13de1..226d56d9 100644
--- a/docs/DOC_02_SyntaxTypes.md
+++ b/docs/DOC_02_SyntaxTypes.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: SYNTAX_TYPES
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [AST, Resource, Message, Term, Pattern, Span, Annotation, syntax nodes]
   questions: ["how is FTL represented in the AST?", "which public AST container and declaration node types exist?", "where are spans and parser annotations documented?"]
diff --git a/docs/DOC_02_Types.md b/docs/DOC_02_Types.md
index c63354d0..22248b59 100644
--- a/docs/DOC_02_Types.md
+++ b/docs/DOC_02_Types.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: TYPES
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [FluentNumber, FluentValue, ParseResult, LocaleCode, CurrencyCode, TerritoryInfo, MessageIntrospection]
   questions: ["what public types does FTLLexEngine expose?", "what value types can formatting accept?", "which semantic aliases and lookup-result types exist?", "what introspection result types exist?"]
@@ -12,6 +12,41 @@ route:
 
 ---
 
+## `UNLIMITED`
+
+Explicit sentinel used to disable a supported security/resource limit intentionally.
+
+### Signature
+```python
+UNLIMITED: UnlimitedLimit
+```
+
+### Constraints
+- Import: `from ftllexengine import UNLIMITED`
+- Purpose: opt out of a `LimitArg` guard explicitly instead of using ambiguous magic values like `0` or `-1`
+- Used by: `FluentParserV1`, `FluentBundle`, `AsyncFluentBundle`, and `FluentLocalization` limit parameters
+- Invariant: compare by identity (`value is UNLIMITED`) rather than by value equality
+
+---
+
+## `UnlimitedLimit`
+
+Marker type for the public `UNLIMITED` sentinel.
+
+### Signature
+```python
+@final
+class UnlimitedLimit:
+    ...
+```
+
+### Constraints
+- Import: `from ftllexengine import UnlimitedLimit`
+- Purpose: gives the explicit opt-out sentinel a distinct static type in `LimitArg`
+- Construction: callers should reuse the exported singleton `UNLIMITED` instead of creating their own instance
+
+---
+
 ## `FluentNumber`
 
 Immutable wrapper that keeps a numeric value, its rendered string, and visible precision together.
diff --git a/docs/DOC_03_LocaleParsing.md b/docs/DOC_03_LocaleParsing.md
index 9b29482e..ca2ea26e 100644
--- a/docs/DOC_03_LocaleParsing.md
+++ b/docs/DOC_03_LocaleParsing.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: LOCALE_PARSING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [parse_decimal, parse_fluent_number, parse_date, parse_datetime, parse_currency, is_valid_decimal, clear_date_caches]
   questions: ["how do I parse localized numbers and dates?", "what do the locale-aware parse helpers return?", "which parsing type guards and cache-clear helpers are public?"]
diff --git a/docs/DOC_03_Parsing.md b/docs/DOC_03_Parsing.md
index 0bd3172c..cc8df662 100644
--- a/docs/DOC_03_Parsing.md
+++ b/docs/DOC_03_Parsing.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: PARSING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [parse_ftl, serialize_ftl, validate_resource, FluentParserV1, Cursor, ASTVisitor, ASTTransformer, ParseError]
   questions: ["how do I parse FTL?", "what does validate_resource return?", "what syntax traversal helpers are public?", "where is the syntax parser API documented?"]
@@ -124,9 +124,10 @@ class FluentParserV1:
     def __init__(
         self,
         *,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_parse_errors: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
     ) -> None:
 ```
 
@@ -136,6 +137,10 @@ class FluentParserV1:
 | `max_source_size` | N | Input length
bound |
 | `max_nesting_depth` | N | Nesting safety bound |
 | `max_parse_errors` | N | Recovery error bound |
+| `max_stream_line_length` | N | Per-line stream bound enforced before buffering |
+
+Security-limit note:
+`LimitArg` fields fail closed on zero and negative integers. Use public `ftllexengine.UNLIMITED` only for an intentional opt-out.
 
 ### Constraints
 - Return: Parser instance
diff --git a/docs/DOC_04_Analysis.md b/docs/DOC_04_Analysis.md
index 3a9fdba3..26ecd0d1 100644
--- a/docs/DOC_04_Analysis.md
+++ b/docs/DOC_04_Analysis.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: ANALYSIS
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [analysis, detect_cycles, entry_dependency_set, make_cycle_key, dependency graph, cycle key]
   questions: ["where are the dependency-graph helpers documented?", "how do I detect cycles in an FTL dependency graph?", "how do I build namespace-prefixed dependency sets?"]
diff --git a/docs/DOC_04_Introspection.md b/docs/DOC_04_Introspection.md
index 5c8effa9..d381369f 100644
--- a/docs/DOC_04_Introspection.md
+++ b/docs/DOC_04_Introspection.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: INTROSPECTION
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [introspection, validate_message_variables, extract_variables, extract_references, ISO 4217, ISO 3166, get_currency, get_territory]
   questions: ["how do I inspect a message's variables and references?", "which ISO lookup helpers exist?", "how do I validate message-variable schemas?", "which Babel-backed introspection helpers are public?"]
diff --git a/docs/DOC_04_Runtime.md b/docs/DOC_04_Runtime.md
index f0de2e0b..6a0b8843 100644
--- a/docs/DOC_04_Runtime.md
+++ b/docs/DOC_04_Runtime.md
@@ -1,20 +1,20 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: RUNTIME
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [CacheConfig, FunctionRegistry, fluent_function,
number_format, currency_format, select_plural_category, clear_module_caches]
-  questions: ["how do I configure runtime formatting?", "how do custom functions and registries work?", "where are cache config and write-log entry types documented?"]
+  questions: ["how do I configure runtime formatting?", "how do custom functions and registries work?", "where are cache debug and integrity event types documented?"]
 ---
 
 # Runtime Reference
 
-This reference covers cache configuration, function registries, built-in formatters, plural selection, cache/audit entry types, and the root-level `clear_module_caches()` helper.
+This reference covers cache configuration, function registries, built-in formatters, plural selection, cache debug/integrity evidence types, and the root-level `clear_module_caches()` helper.
 Runtime-adjacent utilities, validators, and package metadata constants are documented in [DOC_04_RuntimeUtilities.md](DOC_04_RuntimeUtilities.md).
 
 Parser-only facade note:
-- `CacheConfig`, `FunctionRegistry`, `fluent_function`, `make_fluent_number`, `CacheAuditLogEntry`, `WriteLogEntry`, and `ValidationResult` remain importable in parser-only installs.
+- `CacheConfig`, `FunctionRegistry`, `fluent_function`, `make_fluent_number`, `CacheDebugLogEntry`, `CacheIntegrityEvent`, `CacheIntegrityEventKind`, and `ValidationResult` remain importable in parser-only installs.
 - `create_default_registry`, `get_shared_registry`, `number_format`, `datetime_format`, `currency_format`, `select_plural_category`, `FluentBundle`, and `AsyncFluentBundle` require the full runtime install. In parser-only installs they resolve to lazy placeholders that raise `BabelImportError` on first use.
 - `clear_module_caches()` is a root-level helper that works in both parser-only and full-runtime installs.
 
@@ -32,17 +32,21 @@ Dataclass that configures optional format-result caching.
 class CacheConfig:
     size: int = 1000
     write_once: bool = False
-    integrity_strict: bool = True
-    enable_audit: bool = False
-    max_audit_entries: int = 10000
-    max_entry_weight: int = 10000
+    enable_debug_log: bool = False
+    max_debug_entries: int = 10000
+    max_entry_payload_bytes: int = 10000
     max_errors_per_entry: int = 50
+    integrity_event_sink: IntegrityEventSink | None = None
+    debug_fingerprint_key: bytes | None = None
 ```
 
 ### Constraints
 - Purpose: Single cache configuration object for bundle/localization runtime
 - State: Immutable
 - Thread: Safe
+- Integrity boundary: Cache corruption, key confusion, and write-once conflicts are system
+  integrity failures regardless of `FluentBundle.strict`; formatting softness does not downgrade
+  cache-integrity exceptions into fallback results
 
 ---
 
@@ -61,6 +65,7 @@ class FunctionRegistry:
 - State: Mutable until `freeze()`
 - Thread: Safe for normal runtime use after registration
 - Main methods: `register()`, `call()`, `get_callable()`, `list_functions()`, `copy()`
+- Cache contract: `register(..., cacheable=False)` is the safe default for custom functions; opt in with `cacheable=True` only for pure functions whose output depends solely on the cache key inputs
 
 ---
 
@@ -265,44 +270,87 @@ def clear_module_caches(components: frozenset[str] | None = None) -> None:
 
 ---
 
-## `CacheAuditLogEntry`
+## `CacheDebugLogEntry`
 
-Public alias for the cache audit-log record type.
+Immutable dataclass that represents one bounded cache debug-log record.
 
 ### Signature
 ```python
-CacheAuditLogEntry = WriteLogEntry
+@dataclass(frozen=True, slots=True)
+class CacheDebugLogEntry:
+    operation: str
+    key_fingerprint: str
+    timestamp_monotonic: float
+    wall_time_unix: float
+    debug_sequence: int
+    cache_sequence: int
+    cache_generation: int
+    checksum_hex: str
 ```
 
 ### Constraints
-- Purpose: Stable public alias returned by bundle/localization cache-audit APIs
-- Underlying type: `WriteLogEntry`
-- Import: `from ftllexengine.runtime import CacheAuditLogEntry` or `from ftllexengine.localization import CacheAuditLogEntry`
+- Purpose: Recent-operation debug evidence for hits, misses, puts, evictions, and write-once outcomes
+- State: Immutable
+- Thread: Safe
+- Import: `from ftllexengine.runtime import CacheDebugLogEntry` or `from ftllexengine.localization import CacheDebugLogEntry`
+- `debug_sequence`: Monotonic debug-ring order across cache operations
+- `cache_sequence`: Cache-entry sequence observed at the time of the event
+- `cache_generation`: Cache-clear generation active when the event was recorded
+- `key_fingerprint`: Keyed privacy-preserving fingerprint, not the raw cache key
 
 ---
 
-## `WriteLogEntry`
+## `CacheIntegrityEvent`
 
-Immutable dataclass that represents one cache audit-log record.
+Immutable dataclass that represents one critical cache-integrity event.
 
 ### Signature
 ```python
 @dataclass(frozen=True, slots=True)
-class WriteLogEntry:
-    operation: str
-    key_hash: str
-    timestamp: float
-    sequence: int
+class CacheIntegrityEvent:
+    kind: CacheIntegrityEventKind
+    message_id: str
+    locale_code: str
+    attribute: str | None
+    use_isolating: bool
+    key_fingerprint: str | None
+    event_sequence: int
     cache_sequence: int
-    checksum_hex: str
+    cache_generation: int
+    correlation_id: str | None
+    thread_id: int
+    task_name: str | None
+    detail: str
+    timestamp_monotonic: float
     wall_time_unix: float
 ```
 
 ### Constraints
-- Purpose: Underlying runtime cache dataclass behind the `CacheAuditLogEntry` public alias
+- Purpose: Critical evidence for corruption, write conflicts, key-contract failures, and immediate verification failures
 - State: Immutable
 - Thread: Safe
-- `sequence`: Monotonic audit-event order across all cache operations
-- `cache_sequence`: Cache-entry sequence observed at the time of the event
+- Import: `from ftllexengine.runtime import CacheIntegrityEvent` or `from ftllexengine.localization import CacheIntegrityEvent`
+
+---
+
+## `CacheIntegrityEventKind`
+
+String enum that classifies critical cache-integrity event types.
+
+### Signature
+```python
+class CacheIntegrityEventKind(StrEnum):
+    ENTRY_CORRUPTION = "entry_corruption"
+    KEY_CONFUSION = "key_confusion"
+    WRITE_CONFLICT = "write_conflict"
+    KEY_SERIALIZATION_FAILED = "key_serialization_failed"
+    ENTRY_VERIFICATION_FAILED = "entry_verification_failed"
+```
+
+### Constraints
+- Purpose: Stable event-category vocabulary for `CacheIntegrityEvent.kind`
+- State: Immutable enum values
+- Thread: Safe
+- Import: `from ftllexengine.runtime import CacheIntegrityEventKind`
 
 ---
diff --git a/docs/DOC_04_RuntimeUtilities.md b/docs/DOC_04_RuntimeUtilities.md
index 67ad24ba..d973274f 100644
--- a/docs/DOC_04_RuntimeUtilities.md
+++ b/docs/DOC_04_RuntimeUtilities.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: RUNTIME_UTILITIES
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [normalize_locale, get_system_locale, require_locale_code, __version__, require_date, require_datetime, require_fluent_number]
   questions: ["where are root-level runtime utility exports documented?", "what package metadata constants are public?", "which boundary validators and locale helpers are exported from the root package?"]
@@ -11,7 +11,7 @@ route:
 # Runtime Utilities Reference
 
 This reference covers root-level runtime-adjacent utilities, package metadata constants, locale helpers, and boundary validators.
-Formatting functions, registries, cache configuration, and audit entry types live in [DOC_04_Runtime.md](DOC_04_Runtime.md). Dependency-graph helpers live in [DOC_04_Analysis.md](DOC_04_Analysis.md).
+Formatting functions, registries, cache configuration, cache debug-log and integrity-event types live in [DOC_04_Runtime.md](DOC_04_Runtime.md). Dependency-graph helpers live in [DOC_04_Analysis.md](DOC_04_Analysis.md).
 
 ## `normalize_locale`
 
diff --git a/docs/DOC_05_Diagnostics.md b/docs/DOC_05_Diagnostics.md
index 6731725a..f06e6e39 100644
--- a/docs/DOC_05_Diagnostics.md
+++ b/docs/DOC_05_Diagnostics.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: DIAGNOSTICS
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [ParserAnnotation, ValidationResult, ValidationError, ValidationWarning, DiagnosticCode, DiagnosticFormatter, OutputFormat, SourceSpan]
   questions: ["what validation result types exist?", "how do I format diagnostics output?", "where are diagnostic codes and source spans documented?"]
@@ -75,6 +75,7 @@ class ValidationWarning:
 - Import: `from ftllexengine.diagnostics import ValidationWarning`
 - Produced by: `validate_resource()`
 - Formatting helper: `.format()` delegates to `DiagnosticFormatter`
+- Severity: `WarningSeverity.CRITICAL` marks a fail-closed semantic violation and contributes to `ValidationResult.critical_warning_count`
 
 ---
 
@@ -114,9 +115,10 @@ class ValidationResult:
 ### Constraints
 - Import: `from ftllexengine.diagnostics import ValidationResult`
 - Produced by: `validate_resource()`
-- Properties: `is_valid`, `error_count`, `warning_count`, `annotation_count`
+- Properties: `is_valid`, `error_count`, `warning_count`, `annotation_count`, `critical_warning_count`
 - Factories: `valid()`, `invalid()`, `from_annotations()`
 - Annotation contract: stores any object satisfying `ParserAnnotation`; parser AST `Annotation` nodes are the common implementation
+- Validity rule: `is_valid` is `False` when errors, parser annotations, or critical semantic warnings are present
 - Formatting helper: `.format()` delegates to `DiagnosticFormatter`
 
 ---
diff --git a/docs/DOC_05_Errors.md b/docs/DOC_05_Errors.md
index 672a8b3a..0cad65a0 100644
--- a/docs/DOC_05_Errors.md
+++ b/docs/DOC_05_Errors.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: ERRORS
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [FrozenFluentError, ErrorCategory, FrozenErrorContext, DataIntegrityError, BabelImportError, ErrorTemplate]
   questions: ["what errors does FTLLexEngine expose?", "how do parse and format failures surface?", "what integrity exceptions exist?", "how does missing Babel surface?"]
diff --git a/docs/DOC_06_Testing.md b/docs/DOC_06_Testing.md
index d0dbb2bc..5a8eef47 100644
--- a/docs/DOC_06_Testing.md
+++ b/docs/DOC_06_Testing.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: TESTING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [testing, lint, pytest, fuzz, HypoFuzz, Atheris, test.sh, lint.sh, check.sh, devcontainer]
   questions: ["how do I run lint and tests?", "what is the fuzz marker for?", "which scripts drive testing?", "how do I validate the contributor container?"]
@@ -22,7 +22,7 @@ Repository script that validates runnable Markdown examples against the live pac
 
 ### Signature
 ```bash
-uv run python scripts/validate_docs.py
+uv run --group dev --python 3.14 python scripts/validate_docs.py
 ```
 
 ### Constraints
@@ -40,7 +40,7 @@ Repository script that enforces package-version sync across code, metadata, and
 
 ### Signature
 ```bash
-uv run python scripts/validate_version.py
+uv run --group dev --python 3.14 python scripts/validate_version.py
 ```
 
 ### Constraints
diff --git a/docs/FUZZING_GUIDE.md b/docs/FUZZING_GUIDE.md
index 75f06de1..1cded070 100644
--- a/docs/FUZZING_GUIDE.md
+++ b/docs/FUZZING_GUIDE.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: FUZZING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [fuzzing, HypoFuzz, Atheris, Hypothesis, fuzz_hypofuzz.sh, fuzz_atheris.sh]
   questions: ["which fuzzer should I use?", "how do I start fuzzing?", "how do I reproduce a fuzz failure?"]
diff --git a/docs/FUZZING_GUIDE_ATHERIS.md b/docs/FUZZING_GUIDE_ATHERIS.md
index 16ec454b..89dbdfb3 100644
--- a/docs/FUZZING_GUIDE_ATHERIS.md
+++ b/docs/FUZZING_GUIDE_ATHERIS.md
@@
-1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: FUZZING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [atheris, libfuzzer, fuzz_atheris.sh, replay, minimize, corpus]
   questions: ["how do I run an Atheris target?", "how do I replay a finding?", "how does the Atheris environment get created?"]
diff --git a/docs/FUZZING_GUIDE_HYPOFUZZ.md b/docs/FUZZING_GUIDE_HYPOFUZZ.md
index 255ca3d5..778e3145 100644
--- a/docs/FUZZING_GUIDE_HYPOFUZZ.md
+++ b/docs/FUZZING_GUIDE_HYPOFUZZ.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: FUZZING
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [hypofuzz, hypothesis, fuzz_hypofuzz.sh, deep mode, preflight, repro]
   questions: ["how do I run HypoFuzz?", "what does --deep do?", "how do I reproduce a Hypothesis failure?"]
diff --git a/docs/LOCALE_GUIDE.md b/docs/LOCALE_GUIDE.md
index 892bc279..c149c767 100644
--- a/docs/LOCALE_GUIDE.md
+++ b/docs/LOCALE_GUIDE.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: LOCALE
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [locale, NUMBER, DATETIME, CURRENCY, normalize_locale, get_system_locale, use_isolating]
   questions: ["why did my number not format?", "what locale string should I use?", "what does use_isolating do?"]
diff --git a/docs/MIGRATION.md b/docs/MIGRATION.md
index ebd338be..d62c533e 100644
--- a/docs/MIGRATION.md
+++ b/docs/MIGRATION.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0"
 domain: MIGRATION
-updated: "2026-05-01"
+updated: "2026-05-15"
 route:
   keywords: [migration, fluent.runtime, FluentBundle, FluentLocalization, strict mode]
   questions: ["how do I migrate from fluent.runtime?", "what changes when I switch to FTLLexEngine?"]
diff --git a/docs/PARSING_GUIDE.md b/docs/PARSING_GUIDE.md
index a61fcf72..fd91bda7 100644
--- a/docs/PARSING_GUIDE.md
+++ b/docs/PARSING_GUIDE.md
@@ -1,8 +1,8 @@
 ---
 afad: "4.0"
-version: "0.166.0"
+version: "0.167.0" domain: PARSING -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [parsing, parse_decimal, parse_currency, parse_date, parse_datetime, parse_fluent_number] questions: ["how do I parse localized user input?", "how do I do roundtrip formatting and parsing?", "what do parse errors look like?"] diff --git a/docs/QUICK_REFERENCE.md b/docs/QUICK_REFERENCE.md index 7f76ed58..0c610677 100644 --- a/docs/QUICK_REFERENCE.md +++ b/docs/QUICK_REFERENCE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: REFERENCE -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [quick reference, cheat sheet, fluentbundle, fluentlocalization, parsing, validation, boot, strict mode] questions: ["show me the common patterns", "smallest working example", "how do I boot localization safely?", "strict vs soft mode"] @@ -133,6 +133,10 @@ assert result.is_valid assert result.error_count == 0 ``` +If validation returns duplicate IDs or other critical semantic warnings, `result.is_valid` is +`False` even when `error_count == 0`. Check `critical_warning_count` when you need to distinguish +syntax failures from fail-closed semantic violations. + --- ## Boot validation diff --git a/docs/RELEASE_PROTOCOL.md b/docs/RELEASE_PROTOCOL.md index c355177a..57a6d341 100644 --- a/docs/RELEASE_PROTOCOL.md +++ b/docs/RELEASE_PROTOCOL.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: RELEASE -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [release, gh, github release, pypi, tag, assets, publish, verify, clone, main] questions: ["how do I cut a release?", "how do I publish GitHub assets?", "how do I verify a release handoff?", "how do I rerun publish for an existing tag?"] @@ -166,10 +166,13 @@ Also confirm: Do not cut the release branch or tag anything while any gate is red. 
-## Step 3: Release Branch And Staging Checkpoint +## Step 3: Release Branch And Scope Checkpoint -Create the release branch and treat staging as a scope-verification checkpoint for the delta from -the branch point: +Choose the branch-cut flow that matches the state proven in Step 2: + +### Flow A: Pre-flight ran on `origin/main` or on a partially finalized bootstrap payload + +Use this flow when the release payload is not yet captured in one final bootstrap commit. ```bash git switch -c release/X.Y.Z @@ -192,6 +195,37 @@ Requirements before continuing: If the staged diff is incomplete or polluted, fix the branch before committing. +### Flow B: Pre-flight ran on a detached bootstrap payload that already contains the full final release tree + +Use this flow when Step 2 already proved the exact tree you intend to release and there is no +further release-finalization delta to stage. The payload may be one bootstrap commit or a short +bootstrap-only commit range created while refining release docs or metadata inside the clean clone. +In this case the cleanest history is to replay that proven payload onto a fresh `release/X.Y.Z` +branch rooted at current `origin/main` and collapse it into one canonical release commit: + +```bash +BOOTSTRAP_HEAD="$(git rev-parse HEAD)" +git switch --detach origin/main +git switch -c release/X.Y.Z +git cherry-pick --no-commit "origin/main..$BOOTSTRAP_HEAD" +git commit -m "release: bump version to X.Y.Z" +git status --short +git show --stat --summary --format=fuller HEAD +git diff --name-status origin/main...HEAD +git push origin release/X.Y.Z +``` + +Requirements before continuing: + +- `git status --short` is empty after the cherry-pick and commit. +- `git show --stat --summary --format=fuller HEAD` confirms the release branch contains exactly + one intentional release commit. +- `git diff --name-status origin/main...HEAD` matches the full intended release file set. 
+- Step 4 PR diff review remains the authoritative scope checkpoint against `origin/main`. + +If the cherry-pick does not replay cleanly or the diff is polluted, stop and repair the release +branch before opening the PR. + ## Step 4: Pull Request And CI Checkpoint Open the pull request: @@ -352,6 +386,17 @@ Use `uv` for this installability check even if the host shell does not expose a `python3.13` binary. The release verifier must exercise a real Python 3.13 environment, but it does not need to come from the system PATH. +Also verify the negative packaging floor explicitly. Python 3.12 is intentionally unsupported, so +the release smoke check must confirm that the published metadata rejects it: + +```bash +uv venv --python 3.12 --seed "$TMP_DIR/py312" +if "$TMP_DIR/py312/bin/pip" install --no-cache-dir "ftllexengine==X.Y.Z"; then + echo "Unexpected Python 3.12 install success" + exit 1 +fi +``` + The release is not complete until the release object, assets, and real install test all succeed. 
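The negative packaging-floor check above relies on the installer comparing the interpreter version with the package's declared minimum. The sketch below illustrates that comparison only. The `>=3.13` specifier and both helper names are assumptions made for illustration; the real metadata value is not shown here, and real installers implement the full version-specifier grammar rather than this one-operator subset.

```python
# Hypothetical specifier: the actual Requires-Python value is not shown in this diff.
ASSUMED_REQUIRES_PYTHON = ">=3.13"


def parse_floor(specifier: str) -> tuple[int, int]:
    """Parse a simple '>=X.Y' specifier into a (major, minor) tuple."""
    if not specifier.startswith(">="):
        raise ValueError(f"unsupported specifier: {specifier!r}")
    major, minor = specifier[2:].strip().split(".")[:2]
    return int(major), int(minor)


def interpreter_allowed(version: tuple[int, int], specifier: str) -> bool:
    """Return True when the interpreter version meets the declared floor."""
    return version >= parse_floor(specifier)


if __name__ == "__main__":
    for candidate in ((3, 12), (3, 13)):
        verdict = interpreter_allowed(candidate, ASSUMED_REQUIRES_PYTHON)
        print(f"Python {candidate[0]}.{candidate[1]} allowed: {verdict}")
```

Under the assumed floor, a 3.12 interpreter fails the comparison, which is the outcome the smoke check expects `pip install` to report as a failed install.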
## Step 8: Branch And Checkout Hygiene diff --git a/docs/TERMINOLOGY.md b/docs/TERMINOLOGY.md index 18c9ff1c..3342be34 100644 --- a/docs/TERMINOLOGY.md +++ b/docs/TERMINOLOGY.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: TERMINOLOGY -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [terminology, glossary, message, term, resource, locale code, strict mode] questions: ["what does resource mean here?", "what is the difference between a message and a term?", "what does strict mode mean in FTLLexEngine?"] diff --git a/docs/THREAD_SAFETY.md b/docs/THREAD_SAFETY.md index 10680192..52836aa4 100644 --- a/docs/THREAD_SAFETY.md +++ b/docs/THREAD_SAFETY.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: ARCHITECTURE -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [thread safety, concurrency, FluentBundle, FluentLocalization, AsyncFluentBundle, shared bundle] questions: ["is FluentBundle thread-safe?", "can I share a localization object across threads?", "what does AsyncFluentBundle do?"] @@ -23,10 +23,12 @@ These guarantees come from the runtime's own synchronization boundaries. They ar - Share a `FluentBundle` across threads when all requests use the same locale. - Share a `FluentLocalization` across threads when the locale fallback chain is fixed. -- Use `AsyncFluentBundle` in asyncio handlers when you want bundle work offloaded through `asyncio.to_thread()`. +- Use `AsyncFluentBundle` in asyncio handlers when you want bundle work offloaded through its owned worker pool and bounded async admission gate. - Treat custom functions as external code: if they share mutable process state outside the bundle, that state still needs its own synchronization. -- Do not try to mutate a bundle from inside a custom function triggered by that same bundle’s formatting call. 
+- Do not try to mutate or re-enter a bundle from a new thread inside a custom function triggered by that same bundle’s formatting call. ## Async -`AsyncFluentBundle` is not a separate resolver implementation. It wraps the same runtime behavior in an async-facing API and delegates the heavy work to worker threads so the event loop stays responsive. +`AsyncFluentBundle` is not a separate resolver implementation. It wraps the same runtime behavior in an async-facing API, owns its executor lifecycle, and bounds queued work so event-loop callers have an explicit concurrency contract instead of ambient `asyncio.to_thread()` behavior. + +The repository verifies these guarantees against both the normal supported interpreter set and a dedicated Python 3.13 free-threaded lane in CI. diff --git a/docs/TYPE_HINTS_GUIDE.md b/docs/TYPE_HINTS_GUIDE.md index 569dd52b..2684f59f 100644 --- a/docs/TYPE_HINTS_GUIDE.md +++ b/docs/TYPE_HINTS_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: TYPE_HINTS -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [type hints, mypy, FluentValue, ParseResult, TypeIs, LocaleCode] questions: ["what types does the library expose?", "how do I type parse results?", "which helpers are type guards?"] diff --git a/docs/VALIDATION_GUIDE.md b/docs/VALIDATION_GUIDE.md index d1acff5b..a73fb355 100644 --- a/docs/VALIDATION_GUIDE.md +++ b/docs/VALIDATION_GUIDE.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: VALIDATION -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [validation, validate_resource, ValidationResult, require_clean, boot validation, message schemas] questions: ["how do I validate FTL before loading it?", "how do I fail fast at startup?", "how do I validate message variables?"] @@ -29,8 +29,15 @@ assert result.warning_count == 0 `ValidationResult` separates: - `errors`: structural or syntax validation failures. 
-- `warnings`: semantic problems such as unresolved references. +- `warnings`: semantic problems such as unresolved references or duplicate IDs. - `annotations`: parser-level annotations recovered from junk input. +- `critical_warning_count`: semantic warnings that fail validation closed because runtime loading or strict boot checks reject them. + +`result.is_valid` is `False` when any of these blocking conditions exist: + +- `errors` +- `annotations` +- critical warnings such as duplicate IDs, invalid shadows, or undefined required references ## Loaded-Resource Validation diff --git a/docs/WORKFLOW_TOUR.md b/docs/WORKFLOW_TOUR.md index d1f90bc7..08fb9ef0 100644 --- a/docs/WORKFLOW_TOUR.md +++ b/docs/WORKFLOW_TOUR.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: GUIDE -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [workflow tour, multi-locale, bidirectional parsing, boot validation, thread safety, async, introspection, streaming] questions: ["how do I use FTLLexEngine end-to-end?", "multi-locale formatting example", "how do I parse localized user input?", "boot validation example", "thread-safe formatting", "async bundle example"] @@ -269,7 +269,7 @@ Multiple threads can format messages simultaneously. Adding resources or functio ## Use async bundles in event-loop applications -`AsyncFluentBundle` keeps the same strict-mode guarantees but offloads mutations and formatting through `asyncio.to_thread()`, keeping the event loop free. +`AsyncFluentBundle` keeps the same strict-mode guarantees but offloads mutations and formatting through an owned worker pool plus a bounded async admission gate, keeping the event loop free and the queue size explicit. 
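The "owned worker pool plus bounded async admission gate" shape described above can be sketched generically with the standard library. This is not `AsyncFluentBundle`'s implementation: `BoundedOffloader`, its parameters, and `main` are hypothetical names used only to show the pattern, under the assumption that the gate behaves like a semaphore around executor submissions.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, TypeVar

T = TypeVar("T")


class BoundedOffloader:
    """Hypothetical helper: an owned thread pool behind a bounded admission gate."""

    def __init__(self, max_workers: int = 4, max_pending: int = 16) -> None:
        # The helper owns its executor, so it also controls the shutdown.
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        # At most max_pending calls are admitted (running or queued) at once.
        self._gate = asyncio.Semaphore(max_pending)

    async def run(self, func: Callable[..., T], *args: object) -> T:
        # Callers beyond the bound wait here without blocking the event loop.
        async with self._gate:
            loop = asyncio.get_running_loop()
            return await loop.run_in_executor(self._executor, func, *args)

    def close(self) -> None:
        # Wait for admitted work to finish, then release the worker threads.
        self._executor.shutdown(wait=True)


async def main() -> list[str]:
    offloader = BoundedOffloader(max_workers=2, max_pending=4)
    try:
        # gather preserves argument order, so results line up with inputs.
        return await asyncio.gather(
            *(offloader.run(str.upper, word) for word in ("a", "b", "c"))
        )
    finally:
        offloader.close()
```

Unlike ambient `asyncio.to_thread()`, which submits to the loop's shared default executor, this shape makes both the worker count and the number of admitted calls explicit, and ties the executor's lifetime to the object that uses it.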
```python import asyncio diff --git a/examples/README.md b/examples/README.md index e8622431..c3a526f2 100644 --- a/examples/README.md +++ b/examples/README.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: EXAMPLES -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [examples, quickstart, parser-only, localization, custom functions, thread safety, streaming, benchmarks] questions: ["what examples are available?", "how do I run the examples?", "which example should I start with?", "which example covers streaming or parsing?"] diff --git a/examples/README_TYPE_CHECKING.md b/examples/README_TYPE_CHECKING.md index f9b9274c..768a4bb2 100644 --- a/examples/README_TYPE_CHECKING.md +++ b/examples/README_TYPE_CHECKING.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: EXAMPLES -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [examples, mypy, type checking, strict, explicit ownership, thread safety] questions: ["how do I type-check the examples?", "what mypy config do the examples use?", "how do the examples stay strict without local stubs?"] diff --git a/examples/async_bundle.py b/examples/async_bundle.py index dd68dd1b..03146c77 100644 --- a/examples/async_bundle.py +++ b/examples/async_bundle.py @@ -61,7 +61,7 @@ async def example_stream_loading() -> None: bundle = AsyncFluentBundle("en_US", use_isolating=False) junk = await bundle.add_resource_stream(["hello = Hello!\n", "status = Ready\n"]) assert junk == () - assert bundle.has_message("hello") + assert await bundle.has_message("hello") status, errors = await bundle.format_pattern("status") assert errors == () diff --git a/examples/parser_only.py b/examples/parser_only.py index 4709610c..c51c7f19 100644 --- a/examples/parser_only.py +++ b/examples/parser_only.py @@ -139,7 +139,7 @@ def example_3_validation() -> None: assert result.warning_count == 0 print(f"Valid FTL: is_valid={result.is_valid}, errors={result.error_count}") - # 
Warning-only FTL: semantic issues do not change is_valid + # Critical semantic warnings now fail validation just like syntax errors. warning_ftl = """ greeting = Hello, { $name }! greeting = Duplicate ID! @@ -147,12 +147,13 @@ def example_3_validation() -> None: """ result = validate_resource(warning_ftl) - assert result.is_valid + assert not result.is_valid + assert result.critical_warning_count == 2 assert result.warning_count == 2 print(f"Warning-only FTL: is_valid={result.is_valid}, warnings={result.warning_count}") for warning in result.warnings: print(f" - {warning.code.name}: {warning.message}") - print("[PASS] Warning-only validation semantics verified") + print("[PASS] Critical warning validation semantics verified") # Invalid syntax FTL: parser annotations make the result invalid invalid_syntax_ftl = """ diff --git a/examples/quickstart.py b/examples/quickstart.py index 99dfe2c7..29219be4 100644 --- a/examples/quickstart.py +++ b/examples/quickstart.py @@ -386,8 +386,8 @@ def describe_path(self, locale: str, resource_id: str) -> str: # Financial applications can use cache security features: # - write_once: Prevents cache overwrites (data race prevention) -# - integrity_strict: Raise on cache corruption/write conflicts -# - enable_audit: Maintains audit trail of cache operations +# - integrity events: Corruption and key-contract failures always raise +# - enable_debug_log: Keeps a bounded recent-operation ring for local inspection # - strict (bundle): Raises exceptions on formatting errors financial_bundle = FluentBundle( @@ -395,9 +395,8 @@ def describe_path(self, locale: str, resource_id: str) -> str: use_isolating=False, cache=CacheConfig( write_once=True, # Prevent data races - integrity_strict=True, # Raise on corruption (default) - enable_audit=True, # Compliance audit trail - max_entry_weight=5000, # Memory protection + enable_debug_log=True, # Local recent-operation debug ring + max_entry_payload_bytes=5000, # Retained payload budget 
max_errors_per_entry=10, # Error bloat protection ), strict=True, # Fail-fast on ANY formatting error @@ -424,25 +423,24 @@ def describe_path(self, locale: str, resource_id: str) -> str: cfg = financial_bundle.cache_config if cfg is not None: print(f" write_once: {cfg.write_once}") - print(f" integrity_strict: {cfg.integrity_strict}") - print(f" audit_enabled: {cfg.enable_audit}") - print(f" max_entry_weight: {cfg.max_entry_weight}") + print(f" debug_log_enabled: {cfg.enable_debug_log}") + print(f" max_entry_payload_bytes: {cfg.max_entry_payload_bytes}") print(f" max_errors_per_entry: {cfg.max_errors_per_entry}") -# Get cache stats and audit trail +# Get cache stats and recent debug ring stats = financial_bundle.get_cache_stats() if stats: - print(f" audit_entries: {stats.get('audit_entries', 0)}") + print(f" debug_log_entries: {stats.get('debug_log_entries', 0)}") print(f" cache_hits: {stats.get('hits', 0)}") print(f" cache_misses: {stats.get('misses', 0)}") -audit_log = financial_bundle.get_cache_audit_log() -if audit_log is not None: - print(f" audit_log_entries: {len(audit_log)}") - if audit_log: - latest_entry = audit_log[-1] - print(f" latest_audit_operation: {latest_entry.operation}") - print(f" latest_audit_sequence: {latest_entry.sequence}") +debug_log = financial_bundle.get_cache_debug_log() +if debug_log is not None: + print(f" debug_log_entries: {len(debug_log)}") + if debug_log: + latest_entry = debug_log[-1] + print(f" latest_debug_operation: {latest_entry.operation}") + print(f" latest_debug_sequence: {latest_entry.debug_sequence}") print(f" latest_cache_sequence: {latest_entry.cache_sequence}") print("\n" + "=" * 50) diff --git a/fuzz_atheris/README.md b/fuzz_atheris/README.md index 6713eb9b..547d0f50 100644 --- a/fuzz_atheris/README.md +++ b/fuzz_atheris/README.md @@ -1,8 +1,8 @@ --- afad: "4.0" -version: "0.166.0" +version: "0.167.0" domain: FUZZING -updated: "2026-05-01" +updated: "2026-05-15" route: keywords: [atheris, fuzz inventory, fuzz 
targets, libfuzzer, corpus] questions: ["what do the Atheris fuzzers cover?", "which targets exist?", "how do I map a target name to a file?"] @@ -18,7 +18,7 @@ The executable target registry lives in `targets.tsv`. This table is the human-r |:-------|:-----|:--------| | `bridge` | `fuzz_bridge.py` | FunctionRegistry bridge machinery | | `builtins` | `fuzz_builtins.py` | Built-in function Babel boundary | -| `cache` | `fuzz_cache.py` | Cache concurrency and audit behavior | +| `cache` | `fuzz_cache.py` | Cache concurrency plus debug-log and integrity-event behavior | | `currency` | `fuzz_currency.py` | Currency formatting oracle | | `cursor` | `fuzz_cursor.py` | Cursor and parse-position helpers | | `dates` | `fuzz_dates.py` | Locale-aware date and datetime parsing | diff --git a/fuzz_atheris/fuzz_cache.py b/fuzz_atheris/fuzz_cache.py index f72f34a7..29c6f8f1 100644 --- a/fuzz_atheris/fuzz_cache.py +++ b/fuzz_atheris/fuzz_cache.py @@ -4,8 +4,9 @@ Targets: ftllexengine.runtime.cache (via FluentBundle public API) Concern boundary: This fuzzer stress-tests the cache subsystem by systematically -varying all cache constructor parameters (size, entry weight, error limits, -write-once, audit mode) under concurrent multi-threaded access. This is distinct +varying all cache constructor parameters (size, payload-byte budget, error +limits, write-once, debug-log mode) under concurrent multi-threaded access. +This is distinct from the runtime fuzzer which tests the full resolver stack with fixed cache configs and only 2 threads. 
@@ -17,7 +18,7 @@ - Frozen bundle cache behavior - Cache key complexity (deeply nested args via _make_hashable) - Hotspot access patterns (same entry repeated) -- Memory weight enforcement +- Payload-byte budget enforcement Patterns: - variable_messages: Cache key variation @@ -25,7 +26,7 @@ - select_expressions: Complex pattern caching - message_references: Cross-message resolution cache - term_references: Namespace variation -- long_values: Memory weight stress +- long_values: Payload budget stress - many_variables: Key complexity - circular_refs: Error caching behavior - minimal_resource: Edge cases @@ -105,7 +106,7 @@ class CacheMetrics: concurrent_modify_tests: int = 0 frozen_cache_tests: int = 0 eviction_stress_tests: int = 0 - audit_log_checks: int = 0 + debug_log_checks: int = 0 # Hit rate tracking (rolling) cache_hits: int = 0 @@ -184,7 +185,7 @@ def _build_stats_dict() -> dict[str, Any]: stats["concurrent_modify_tests"] = _domain.concurrent_modify_tests stats["frozen_cache_tests"] = _domain.frozen_cache_tests stats["eviction_stress_tests"] = _domain.eviction_stress_tests - stats["audit_log_checks"] = _domain.audit_log_checks + stats["debug_log_checks"] = _domain.debug_log_checks stats["thread_timeouts"] = _domain.thread_timeouts stats["max_threads_used"] = _domain.max_threads_used @@ -229,7 +230,7 @@ def _emit_report() -> None: WriteConflictError, ) from ftllexengine.runtime.bundle import FluentBundle - from ftllexengine.runtime.cache import WriteLogEntry + from ftllexengine.runtime.cache import CacheDebugLogEntry from ftllexengine.runtime.cache_config import CacheConfig @@ -241,78 +242,82 @@ def _emit_report() -> None: _MSG_IDS: Sequence[str] = tuple(f"msg{i}" for i in range(20)) _ATTR_NAMES: Sequence[str] = ("tooltip", "aria-label", "placeholder", "title") -_VALID_AUDIT_OPERATIONS: frozenset[str] = frozenset({ +_VALID_DEBUG_OPERATIONS: frozenset[str] = frozenset({ "MISS", "PUT", "HIT", "EVICT", "CORRUPTION", + "KEY_CONFUSION", "WRITE_ONCE_IDEMPOTENT", 
"WRITE_ONCE_CONFLICT", + "ENTRY_VERIFICATION_FAILED", + "BYPASS_NONCACHEABLE_FUNCTION", }) -def _validate_cache_audit_entry( - entry: WriteLogEntry, +def _validate_cache_debug_entry( + entry: CacheDebugLogEntry, *, - last_timestamp: float, - last_sequence: int, + last_monotonic_timestamp: float, + last_debug_sequence: int, ) -> tuple[float, int]: - """Validate one audit-log entry and return timestamp plus audit sequence.""" - if entry.operation not in _VALID_AUDIT_OPERATIONS: - msg = f"Unexpected audit operation {entry.operation!r}" + """Validate one debug-log entry and return monotonic time plus sequence.""" + if entry.operation not in _VALID_DEBUG_OPERATIONS: + msg = f"Unexpected debug-log operation {entry.operation!r}" raise CacheFuzzError(msg) - if not entry.key_hash: - msg = "Audit log entry contained an empty key hash" + if not entry.key_fingerprint: + msg = "Debug log entry contained an empty key fingerprint" raise CacheFuzzError(msg) - if entry.timestamp < last_timestamp: + if entry.timestamp_monotonic < last_monotonic_timestamp: msg = ( - "Audit log timestamps must be non-decreasing: " - f"{last_timestamp} -> {entry.timestamp}" + "Debug-log timestamps must be non-decreasing: " + f"{last_monotonic_timestamp} -> {entry.timestamp_monotonic}" ) raise CacheFuzzError(msg) - if entry.sequence <= last_sequence: + if entry.debug_sequence <= last_debug_sequence: msg = ( - "Audit log sequences must be strictly increasing: " - f"{last_sequence} -> {entry.sequence}" + "Debug-log sequences must be strictly increasing: " + f"{last_debug_sequence} -> {entry.debug_sequence}" ) raise CacheFuzzError(msg) - # wall_time_unix is a Unix timestamp (time.time()); must be a positive float. - # It is the wall-clock companion to the monotonic timestamp field. 
if not isinstance(entry.wall_time_unix, float): msg = ( - f"WriteLogEntry.wall_time_unix must be float, " + f"CacheDebugLogEntry.wall_time_unix must be float, " f"got {type(entry.wall_time_unix).__name__!r}" ) raise CacheFuzzError(msg) if entry.wall_time_unix <= 0: - msg = f"WriteLogEntry.wall_time_unix must be positive, got {entry.wall_time_unix}" + msg = ( + "CacheDebugLogEntry.wall_time_unix must be positive, " + f"got {entry.wall_time_unix}" + ) raise CacheFuzzError(msg) if entry.operation == "MISS": if entry.checksum_hex != "" or entry.cache_sequence < 0: msg = ( - "MISS audit entries must have empty checksum and " + "MISS debug entries must have empty checksum and " "non-negative cache_sequence" ) raise CacheFuzzError(msg) - return entry.timestamp, entry.sequence + return entry.timestamp_monotonic, entry.debug_sequence if entry.checksum_hex == "" or entry.cache_sequence <= 0: msg = ( - f"{entry.operation} audit entries must carry a positive cache_sequence " + f"{entry.operation} debug entries must carry a positive cache_sequence " "and non-empty checksum" ) raise CacheFuzzError(msg) - return entry.timestamp, entry.sequence + return entry.timestamp_monotonic, entry.debug_sequence def _collect_cache_observability( bundle: FluentBundle, *, - enable_audit: bool, + enable_debug_log: bool, ) -> None: - """Accumulate cache stats and validate public audit-log accessors. + """Accumulate cache stats and validate public debug-log accessors. Each iteration creates a new FluentBundle/cache, so stats are per-iteration deltas that get accumulated into _domain totals. 
@@ -327,47 +332,47 @@ def _collect_cache_observability( _domain.oversize_skip_counts += int(stats.get("oversize_skips", 0)) _domain.error_bloat_counts += int(stats.get("error_bloat_skips", 0)) _domain.corruption_events += int(stats.get("corruption_detected", 0)) - _domain.audit_log_checks += 1 + _domain.debug_log_checks += 1 - audit_log = bundle.get_cache_audit_log() - if audit_log is None: - msg = "Cache-enabled bundle returned None from get_cache_audit_log()" + debug_log = bundle.get_cache_debug_log() + if debug_log is None: + msg = "Cache-enabled bundle returned None from get_cache_debug_log()" raise CacheFuzzError(msg) - if not isinstance(audit_log, tuple): - msg = f"get_cache_audit_log() returned {type(audit_log).__name__}, expected tuple" + if not isinstance(debug_log, tuple): + msg = f"get_cache_debug_log() returned {type(debug_log).__name__}, expected tuple" raise CacheFuzzError(msg) - if bool(stats.get("audit_enabled", False)) != enable_audit: + if bool(stats.get("debug_log_enabled", False)) != enable_debug_log: msg = ( - "get_cache_stats()['audit_enabled'] disagrees with CacheConfig: " - f"{stats.get('audit_enabled')} vs {enable_audit}" + "get_cache_stats()['debug_log_enabled'] disagrees with CacheConfig: " + f"{stats.get('debug_log_enabled')} vs {enable_debug_log}" ) raise CacheFuzzError(msg) - if len(audit_log) != int(stats.get("audit_entries", 0)): + if len(debug_log) != int(stats.get("debug_log_entries", 0)): msg = ( - "get_cache_audit_log() length disagrees with cache stats: " - f"{len(audit_log)} vs {stats.get('audit_entries')}" + "get_cache_debug_log() length disagrees with cache stats: " + f"{len(debug_log)} vs {stats.get('debug_log_entries')}" ) raise CacheFuzzError(msg) - if not enable_audit: - if audit_log != (): - msg = "Audit-disabled cache returned non-empty audit log" + if not enable_debug_log: + if debug_log != (): + msg = "Debug-log-disabled cache returned non-empty debug log" raise CacheFuzzError(msg) return last_timestamp = 
float("-inf") last_sequence = 0 - for entry in audit_log: - if not isinstance(entry, WriteLogEntry): - msg = "get_cache_audit_log() returned non-WriteLogEntry entries" + for entry in debug_log: + if not isinstance(entry, CacheDebugLogEntry): + msg = "get_cache_debug_log() returned non-CacheDebugLogEntry entries" raise CacheFuzzError(msg) - last_timestamp, last_sequence = _validate_cache_audit_entry( + last_timestamp, last_sequence = _validate_cache_debug_entry( entry, - last_timestamp=last_timestamp, - last_sequence=last_sequence, + last_monotonic_timestamp=last_timestamp, + last_debug_sequence=last_sequence, ) @@ -691,13 +696,13 @@ def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch if fdp.remaining_bytes() < 4: return - # Generate cache configuration (vary ALL parameters) + # Generate cache configuration (vary all constructor parameters). cache_size = fdp.ConsumeIntInRange(1, 50) - max_entry_weight = fdp.ConsumeIntInRange(100, 10000) + max_entry_payload_bytes = fdp.ConsumeIntInRange(100, 10000) max_errors_per_entry = fdp.ConsumeIntInRange(1, 50) write_once = fdp.ConsumeBool() strict_mode = fdp.ConsumeBool() - enable_audit = fdp.ConsumeBool() + enable_debug_log = fdp.ConsumeBool() locale = _generate_locale(fdp) ftl = _generate_ftl_for_pattern(fdp, pattern) @@ -712,10 +717,10 @@ def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch locale, cache=CacheConfig( size=cache_size, - max_entry_weight=max_entry_weight, + max_entry_payload_bytes=max_entry_payload_bytes, max_errors_per_entry=max_errors_per_entry, write_once=write_once, - enable_audit=enable_audit, + enable_debug_log=enable_debug_log, ), strict=strict_mode, ) @@ -751,7 +756,7 @@ def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch finally: try: - _collect_cache_observability(bundle, enable_audit=enable_audit) + _collect_cache_observability(bundle, enable_debug_log=enable_debug_log) except CacheFuzzError: _state.findings += 1 raise diff 
--git a/fuzz_atheris/fuzz_localization_entry.py b/fuzz_atheris/fuzz_localization_entry.py index c0b0b693..9cb5e871 100644 --- a/fuzz_atheris/fuzz_localization_entry.py +++ b/fuzz_atheris/fuzz_localization_entry.py @@ -14,7 +14,7 @@ from fuzz_localization_patterns_boot import _pattern_boot_config_api from fuzz_localization_patterns_introspection import ( _pattern_add_function_custom, - _pattern_cache_audit_api, + _pattern_cache_debug_log_api, _pattern_introspect_api, _pattern_locale_boundary_api, _pattern_on_fallback_callback, @@ -71,7 +71,7 @@ "validate_message_schemas_api": _pattern_validate_message_schemas_api, "add_function_custom": _pattern_add_function_custom, "introspect_api": _pattern_introspect_api, - "cache_audit_api": _pattern_cache_audit_api, + "cache_debug_log_api": _pattern_cache_debug_log_api, "locale_boundary_api": _pattern_locale_boundary_api, "on_fallback_callback": _pattern_on_fallback_callback, "loader_init_success": _pattern_loader_init_success, @@ -127,7 +127,7 @@ def test_one_input(data: bytes) -> None: "add_resource_mutation", "introspect_api", "ast_lookup_api", - "cache_audit_api", + "cache_debug_log_api", "locale_boundary_api", "validate_message_variables_api", "validate_message_schemas_api", diff --git a/fuzz_atheris/fuzz_localization_patterns_introspection.py b/fuzz_atheris/fuzz_localization_patterns_introspection.py index 7e81a232..e56d2205 100644 --- a/fuzz_atheris/fuzz_localization_patterns_introspection.py +++ b/fuzz_atheris/fuzz_localization_patterns_introspection.py @@ -4,11 +4,11 @@ _NON_STRING_LOCALES, _SINGLE_LOCALES, _STRUCTURALLY_INVALID_LOCALES, - _VALID_AUDIT_OPERATIONS, + _VALID_DEBUG_OPERATIONS, MAX_LOCALE_LENGTH_HARD_LIMIT, Any, - CacheAuditLogEntry, CacheConfig, + CacheDebugLogEntry, FallbackInfo, FluentLocalization, LocalizationCacheStats, @@ -90,68 +90,68 @@ def _pattern_introspect_api( raise LocalizationFuzzError(msg) -def _validate_localization_audit_log( +def _validate_localization_debug_log( locale: str, - 
audit_log: tuple[CacheAuditLogEntry, ...], + debug_log: tuple[CacheDebugLogEntry, ...], *, - enable_audit: bool, + enable_debug_log: bool, ) -> int: - """Validate one locale's audit log and return its entry count.""" - if not enable_audit and audit_log != (): - msg = f"Audit-disabled localization returned non-empty log for '{locale}'" + """Validate one locale's debug log and return its entry count.""" + if not enable_debug_log and debug_log != (): + msg = f"Debug-log-disabled localization returned non-empty log for '{locale}'" raise LocalizationFuzzError(msg) last_timestamp = float("-inf") last_sequence = 0 - for entry in audit_log: - if entry.operation not in _VALID_AUDIT_OPERATIONS: - msg = f"Unexpected audit operation {entry.operation!r} for locale '{locale}'" + for entry in debug_log: + if entry.operation not in _VALID_DEBUG_OPERATIONS: + msg = f"Unexpected debug-log operation {entry.operation!r} for locale '{locale}'" raise LocalizationFuzzError(msg) - if not entry.key_hash: - msg = f"Empty audit key hash for locale '{locale}'" + if not entry.key_fingerprint: + msg = f"Empty debug-log key fingerprint for locale '{locale}'" raise LocalizationFuzzError(msg) - if entry.timestamp < last_timestamp: + if entry.timestamp_monotonic < last_timestamp: msg = ( - f"Audit timestamps regressed for locale '{locale}': " - f"{last_timestamp} -> {entry.timestamp}" + f"Debug-log timestamps regressed for locale '{locale}': " + f"{last_timestamp} -> {entry.timestamp_monotonic}" ) raise LocalizationFuzzError(msg) - if entry.sequence <= last_sequence: + if entry.debug_sequence <= last_sequence: msg = ( - f"Audit sequence regressed for locale '{locale}': " - f"{last_sequence} -> {entry.sequence}" + f"Debug-log sequence regressed for locale '{locale}': " + f"{last_sequence} -> {entry.debug_sequence}" ) raise LocalizationFuzzError(msg) if entry.operation == "MISS": if entry.checksum_hex != "" or entry.cache_sequence < 0: msg = ( - f"MISS audit entry for locale '{locale}' must have " + 
f"MISS debug entry for locale '{locale}' must have " "empty checksum and non-negative cache_sequence" ) raise LocalizationFuzzError(msg) elif entry.checksum_hex == "" or entry.cache_sequence <= 0: msg = ( - f"{entry.operation} audit entry for locale '{locale}' must carry " + f"{entry.operation} debug entry for locale '{locale}' must carry " "a positive cache_sequence and non-empty checksum" ) raise LocalizationFuzzError(msg) - last_timestamp = entry.timestamp - last_sequence = entry.sequence + last_timestamp = entry.timestamp_monotonic + last_sequence = entry.debug_sequence - return len(audit_log) + return len(debug_log) def _validate_localization_cache_stats( stats: LocalizationCacheStats, *, - enable_audit: bool, + enable_debug_log: bool, expected_locales: list[str], ) -> None: """Validate aggregate localization cache stats against configuration.""" - if stats["audit_enabled"] != enable_audit: + if stats["debug_log_enabled"] != enable_debug_log: msg = ( - "get_cache_stats()['audit_enabled'] disagrees with CacheConfig: " - f"{stats['audit_enabled']} vs {enable_audit}" + "get_cache_stats()['debug_log_enabled'] disagrees with CacheConfig: " + f"{stats['debug_log_enabled']} vs {enable_debug_log}" ) raise LocalizationFuzzError(msg) if stats["bundle_count"] != len(expected_locales): @@ -162,39 +162,39 @@ def _validate_localization_cache_stats( raise LocalizationFuzzError(msg) -def _collect_localization_audit_entries( - audit_logs: dict[str, tuple[CacheAuditLogEntry, ...]], +def _collect_localization_debug_entries( + debug_logs: dict[str, tuple[CacheDebugLogEntry, ...]], *, - enable_audit: bool, + enable_debug_log: bool, ) -> int: - """Validate all per-locale audit logs and return their combined length.""" - total_audit_entries = 0 - for locale, audit_log in audit_logs.items(): - if any(not isinstance(entry, CacheAuditLogEntry) for entry in audit_log): - msg = f"get_cache_audit_log()['{locale}'] returned non-CacheAuditLogEntry data" + """Validate all per-locale debug 
logs and return their combined length.""" + total_debug_entries = 0 + for locale, debug_log in debug_logs.items(): + if any(not isinstance(entry, CacheDebugLogEntry) for entry in debug_log): + msg = f"get_cache_debug_log()['{locale}'] returned non-CacheDebugLogEntry data" raise LocalizationFuzzError(msg) - total_audit_entries += _validate_localization_audit_log( + total_debug_entries += _validate_localization_debug_log( locale, - audit_log, - enable_audit=enable_audit, + debug_log, + enable_debug_log=enable_debug_log, ) - return total_audit_entries + return total_debug_entries -def _pattern_cache_audit_api( +def _pattern_cache_debug_log_api( fdp: atheris.FuzzedDataProvider, ) -> None: - """get_cache_audit_log exposes per-locale immutable audit trails.""" - _domain.cache_audit_checks += 1 + """get_cache_debug_log exposes per-locale immutable debug histories.""" + _domain.cache_debug_log_checks += 1 primary, fallback = fdp.PickValueInList(list(_LOCALE_PAIRS)) - enable_audit = fdp.ConsumeBool() + enable_debug_log = fdp.ConsumeBool() initialize_fallback = fdp.ConsumeBool() - primary_msg_id = f"audit-{gen_ftl_identifier(fdp)}" + primary_msg_id = f"debug-{gen_ftl_identifier(fdp)}" fallback_msg_id = f"fallback-{gen_ftl_identifier(fdp)}" l10n = FluentLocalization( [primary, fallback], - cache=CacheConfig(enable_audit=enable_audit), + cache=CacheConfig(enable_debug_log=enable_debug_log), strict=False, ) l10n.add_resource(primary, f"{primary_msg_id} = primary\n") @@ -209,14 +209,14 @@ def _pattern_cache_audit_api( if initialize_fallback: l10n.format_value(fallback_msg_id) - audit_logs = l10n.get_cache_audit_log() - if audit_logs is None: - msg = "Cached FluentLocalization returned None from get_cache_audit_log()" + debug_logs = l10n.get_cache_debug_log() + if debug_logs is None: + msg = "Cached FluentLocalization returned None from get_cache_debug_log()" raise LocalizationFuzzError(msg) - if list(audit_logs) != expected_locales: + if list(debug_logs) != expected_locales: msg 
= ( - "get_cache_audit_log() returned wrong locale keys: " - f"{list(audit_logs)!r} vs {expected_locales!r}" + "get_cache_debug_log() returned wrong locale keys: " + f"{list(debug_logs)!r} vs {expected_locales!r}" ) raise LocalizationFuzzError(msg) @@ -226,28 +226,28 @@ def _pattern_cache_audit_api( raise LocalizationFuzzError(msg) _validate_localization_cache_stats( stats, - enable_audit=enable_audit, + enable_debug_log=enable_debug_log, expected_locales=expected_locales, ) - total_audit_entries = _collect_localization_audit_entries( - audit_logs, - enable_audit=enable_audit, + total_debug_entries = _collect_localization_debug_entries( + debug_logs, + enable_debug_log=enable_debug_log, ) - if total_audit_entries != int(stats.get("audit_entries", 0)): + if total_debug_entries != int(stats.get("debug_log_entries", 0)): msg = ( - "Localization audit log length disagrees with cache stats: " - f"{total_audit_entries} vs {stats.get('audit_entries')}" + "Localization debug-log length disagrees with cache stats: " + f"{total_debug_entries} vs {stats.get('debug_log_entries')}" ) raise LocalizationFuzzError(msg) primary_locale = normalize_locale(primary) fallback_locale = normalize_locale(fallback) - if enable_audit and len(audit_logs[primary_locale]) < 2: - msg = f"Primary locale '{primary_locale}' did not record expected audit entries" + if enable_debug_log and len(debug_logs[primary_locale]) < 2: + msg = f"Primary locale '{primary_locale}' did not record expected debug entries" raise LocalizationFuzzError(msg) - if initialize_fallback and enable_audit and len(audit_logs[fallback_locale]) < 2: - msg = f"Fallback locale '{fallback_locale}' did not record expected audit entries" + if initialize_fallback and enable_debug_log and len(debug_logs[fallback_locale]) < 2: + msg = f"Fallback locale '{fallback_locale}' did not record expected debug entries" raise LocalizationFuzzError(msg) diff --git a/fuzz_atheris/fuzz_localization_support.py 
b/fuzz_atheris/fuzz_localization_support.py index 441154fd..28d5a449 100644 --- a/fuzz_atheris/fuzz_localization_support.py +++ b/fuzz_atheris/fuzz_localization_support.py @@ -71,7 +71,7 @@ class LocalizationMetrics: validate_calls: int = 0 message_variable_validation_checks: int = 0 schema_validation_checks: int = 0 - cache_audit_checks: int = 0 + cache_debug_log_checks: int = 0 locale_boundary_checks: int = 0 loader_init_checks: int = 0 loader_junk_checks: int = 0 @@ -103,7 +103,7 @@ class LocalizationFuzzError(Exception): ("validate_message_schemas_api", 6), ("add_function_custom", 6), ("introspect_api", 7), - ("cache_audit_api", 6), + ("cache_debug_log_api", 6), ("locale_boundary_api", 5), ("on_fallback_callback", 6), ("loader_init_success", 5), @@ -168,15 +168,18 @@ class LocalizationFuzzError(Exception): ["en-US"], {"locale": "en-US"}, ) -_VALID_AUDIT_OPERATIONS: frozenset[str] = frozenset( +_VALID_DEBUG_OPERATIONS: frozenset[str] = frozenset( { "MISS", "PUT", "HIT", "EVICT", "CORRUPTION", + "KEY_CONFUSION", "WRITE_ONCE_IDEMPOTENT", "WRITE_ONCE_CONFLICT", + "ENTRY_VERIFICATION_FAILED", + "BYPASS_NONCACHEABLE_FUNCTION", } ) _state = BaseFuzzerState( @@ -209,7 +212,7 @@ def _build_stats_dict() -> dict[str, Any]: stats["validate_calls"] = _domain.validate_calls stats["message_variable_validation_checks"] = _domain.message_variable_validation_checks stats["schema_validation_checks"] = _domain.schema_validation_checks - stats["cache_audit_checks"] = _domain.cache_audit_checks + stats["cache_debug_log_checks"] = _domain.cache_debug_log_checks stats["locale_boundary_checks"] = _domain.locale_boundary_checks stats["loader_init_checks"] = _domain.loader_init_checks stats["loader_junk_checks"] = _domain.loader_junk_checks @@ -251,7 +254,7 @@ def _emit_report() -> None: SyntaxIntegrityError, ) from ftllexengine.localization import ( - CacheAuditLogEntry, + CacheDebugLogEntry, FluentLocalization, LocalizationBootConfig, LocalizationCacheStats, @@ -260,7 +263,7 @@ def 
_emit_report() -> None: from ftllexengine.runtime.cache_config import CacheConfig from ftllexengine.syntax import Message, Term - +# ruff: noqa: RUF022 - grouped re-exports mirror the shared fuzzer helper surface __all__ = [ "GC_INTERVAL", "MAX_LOCALE_LENGTH_HARD_LIMIT", @@ -272,9 +275,9 @@ def _emit_report() -> None: "_PATTERN_WEIGHTS", "_SINGLE_LOCALES", "_STRUCTURALLY_INVALID_LOCALES", - "_VALID_AUDIT_OPERATIONS", + "_VALID_DEBUG_OPERATIONS", "Any", - "CacheAuditLogEntry", + "CacheDebugLogEntry", "CacheConfig", "DataIntegrityError", "FallbackInfo", diff --git a/fuzz_atheris/fuzz_runtime_entry.py b/fuzz_atheris/fuzz_runtime_entry.py index a058f758..f2ddaf2b 100644 --- a/fuzz_atheris/fuzz_runtime_entry.py +++ b/fuzz_atheris/fuzz_runtime_entry.py @@ -17,6 +17,7 @@ TEST_LOCALES, CacheConfig, CacheCorruptionError, + CacheKeySerializationError, ComplexArgs, FluentBundle, FrozenFluentError, @@ -52,7 +53,7 @@ def _perform_differential_testing( alt_strict = not bundle.strict if fdp.ConsumeBool() else bundle.strict alt_cache = not bundle.cache_enabled if fdp.ConsumeBool() else bundle.cache_enabled - try: + with contextlib.suppress(Exception): alt_bundle = FluentBundle( alt_locale, strict=alt_strict, @@ -75,9 +76,6 @@ def _perform_differential_testing( with contextlib.suppress(Exception): alt_bundle.format_pattern(msg_id, args) - except Exception: # pylint: disable=broad-exception-caught - pass - def _run_concurrent_test( fdp: atheris.FuzzedDataProvider, @@ -91,18 +89,24 @@ def _run_concurrent_test( _domain.concurrent_tests += 1 barrier = threading.Barrier(2) + worker_failures: list[BaseException] = [] + failure_lock = threading.Lock() def worker() -> None: with contextlib.suppress(threading.BrokenBarrierError): barrier.wait(timeout=1.0) try: _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) - except CacheCorruptionError: - # Expected from corruption simulation in strict mode + except (CacheCorruptionError, CacheKeySerializationError): 
+ # These are fail-closed cache-integrity outcomes. They are valid regardless of + # formatting softness, and the concurrent harness must not lose them to stderr noise. pass except (RecursionError, MemoryError, FrozenFluentError): # FrozenFluentError: depth guard (MAX_DEPTH_EXCEEDED) pass + except BaseException as exc: + with failure_lock: + worker_failures.append(exc) threads = [threading.Thread(target=worker) for _ in range(2)] for t in threads: @@ -112,6 +116,12 @@ def worker() -> None: if t.is_alive(): msg = "RWLock deadlock detected." raise RuntimeIntegrityError(msg) + if worker_failures: + first_failure = worker_failures[0] + if isinstance(first_failure, RuntimeIntegrityError): + raise first_failure + msg = f"Concurrent worker raised unexpected {type(first_failure).__name__}." + raise RuntimeIntegrityError(msg) from first_failure def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch @@ -186,11 +196,10 @@ def test_one_input(data: bytes) -> None: # noqa: PLR0912, PLR0915 - dispatch else: _execute_runtime_invariants(fdp, bundle, args, strict, enable_cache, cache_write_once) - except CacheCorruptionError: - if strict: - return # Expected - _state.findings += 1 - raise + except (CacheCorruptionError, CacheKeySerializationError): + # Cache corruption and unencodable cache keys are expected fail-closed outcomes whenever + # the harness drives the bundle into those integrity boundaries. 
+ return except RuntimeIntegrityError: _state.findings += 1 diff --git a/fuzz_atheris/fuzz_runtime_scenarios.py b/fuzz_atheris/fuzz_runtime_scenarios.py index d0f4115f..c14886c0 100644 --- a/fuzz_atheris/fuzz_runtime_scenarios.py +++ b/fuzz_atheris/fuzz_runtime_scenarios.py @@ -8,6 +8,7 @@ Any, CacheConfig, CacheCorruptionError, + CacheKeySerializationError, ComplexArgs, FluentBundle, FormattingIntegrityError, @@ -85,10 +86,10 @@ def _execute_runtime_invariants( # noqa: PLR0912, PLR0915 - dispatch _simulate_corruption(bundle) try: bundle.format_pattern(msg_id, args, attribute=attribute) - except CacheCorruptionError as exc: - if not strict: - msg = "Non-strict cache raised CacheCorruptionError." - raise RuntimeIntegrityError(msg) from exc + except CacheCorruptionError: + # Cache corruption is a system-integrity failure, not a formatting-softness + # concern. Both strict and non-strict formatting modes must surface it. + pass except Exception as e: # pylint: disable=broad-exception-caught is_corruption = "corruption" in str(e).lower() if is_corruption and not isinstance(e, CacheCorruptionError): @@ -109,6 +110,12 @@ def _execute_runtime_invariants( # noqa: PLR0912, PLR0915 - dispatch msg = "WriteConflictError raised when write_once=False." raise RuntimeIntegrityError(msg) from e + except CacheKeySerializationError as e: + if not enable_cache: + msg = "CacheKeySerializationError raised while cache was disabled." 
+ raise RuntimeIntegrityError(msg) from e + _domain.integrity_checks += 1 + except (RecursionError, MemoryError, FrozenFluentError): # FrozenFluentError: depth guard fires MAX_DEPTH_EXCEEDED as a safety # mechanism regardless of strict mode to prevent stack overflow diff --git a/fuzz_atheris/fuzz_runtime_support.py b/fuzz_atheris/fuzz_runtime_support.py index f234f492..7fd055a6 100644 --- a/fuzz_atheris/fuzz_runtime_support.py +++ b/fuzz_atheris/fuzz_runtime_support.py @@ -338,6 +338,7 @@ def _emit_report() -> None: from ftllexengine.diagnostics.errors import FrozenFluentError from ftllexengine.integrity import ( CacheCorruptionError, + CacheKeySerializationError, FormattingIntegrityError, WriteConflictError, ) @@ -369,6 +370,7 @@ def _emit_report() -> None: "Any", "CacheConfig", "CacheCorruptionError", + "CacheKeySerializationError", "ComplexArgs", "FluentBundle", "FormattingIntegrityError", diff --git a/fuzz_atheris/fuzz_scope.py b/fuzz_atheris/fuzz_scope.py index 9651bf43..25b79def 100644 --- a/fuzz_atheris/fuzz_scope.py +++ b/fuzz_atheris/fuzz_scope.py @@ -42,15 +42,19 @@ _psutil_mod: Any = None _atheris_mod: Any = None -try: # noqa: SIM105 - need module ref for check_dependencies - import psutil as _psutil_mod # type: ignore[no-redef] +try: + import psutil except ImportError: pass +else: + _psutil_mod = psutil -try: # noqa: SIM105 - need module ref for check_dependencies - import atheris as _atheris_mod # type: ignore[no-redef] +try: + import atheris except ImportError: pass +else: + _atheris_mod = atheris from fuzz_common import ( # noqa: E402 - after dependency capture # pylint: disable=C0413 GC_INTERVAL, @@ -104,6 +108,7 @@ class ScopeMetrics: with atheris.instrument_imports(include=["ftllexengine"]): from ftllexengine.diagnostics.codes import DiagnosticCode from ftllexengine.diagnostics.errors import FrozenFluentError + from ftllexengine.integrity import ResourceConflictIntegrityError from ftllexengine.runtime.bundle import FluentBundle @@ -148,6 +153,10 
@@ class ScopeFuzzError(Exception): _ALLOWED_EXCEPTIONS = ( ValueError, TypeError, OverflowError, FrozenFluentError, RecursionError, RuntimeError, + # Adversarial FTL text may now fail closed during registration if injected + # bytes create duplicate/shadowing IDs. That is valid bundle-integrity + # behavior, not a scope-resolution defect. + ResourceConflictIntegrityError, ) # Node ID pool for FTL message/term identifiers diff --git a/fuzz_atheris/targets.tsv b/fuzz_atheris/targets.tsv index 89024794..8bca943b 100644 --- a/fuzz_atheris/targets.tsv +++ b/fuzz_atheris/targets.tsv @@ -1,7 +1,7 @@ # target module description bridge fuzz_bridge.py FunctionRegistry bridge machinery builtins fuzz_builtins.py Built-in function Babel boundary -cache fuzz_cache.py Cache concurrency and audit behavior +cache fuzz_cache.py Cache concurrency plus debug-log and integrity-event behavior currency fuzz_currency.py Currency formatting oracle cursor fuzz_cursor.py Cursor and parse-position helpers dates fuzz_dates.py Locale-aware date and datetime parsing diff --git a/images/FTLLexEngine.jpg b/images/FTLLexEngine.jpg deleted file mode 100644 index d2113fae..00000000 Binary files a/images/FTLLexEngine.jpg and /dev/null differ diff --git a/pyproject.toml b/pyproject.toml index 8857bc2d..9d5c9835 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -7,9 +7,10 @@ managed = true package = true [tool.validate-version] -# Check for `version: X.Y.Z` in YAML frontmatter for non-README documentation. +# Check for `version: X.Y.Z` in YAML frontmatter for AFAD-managed documentation and +# examples. Special root files such as README.md and CHANGELOG.md follow their own +# conventions and are validated by dedicated checks instead. 
frontmatter_globs = [ - "CHANGELOG.md", "CONTRIBUTING.md", "PATENTS.md", "docs/**/*.md", @@ -61,7 +62,7 @@ shell_exec_timeout_seconds = 180 [project] name = "ftllexengine" -version = "0.166.0" +version = "0.167.0" description = "Python runtime for the Fluent (FTL) specification: bidirectional parsing, CLDR-backed locale-aware formatting, and fail-fast boot validation with structured audit evidence." readme = "README.md" requires-python = ">=3.13" @@ -108,7 +109,6 @@ classifiers = [ "Programming Language :: Python :: 3 :: Only", "Programming Language :: Python :: 3.13", "Programming Language :: Python :: 3.14", - "Programming Language :: Python :: 3.15", "Topic :: Software Development :: Internationalization", "Topic :: Software Development :: Localization", "Topic :: Text Processing :: Linguistic", @@ -130,16 +130,16 @@ dev = [ "Babel>=2.18.0,<3.0.0", # Required for tests (locale formatting, parsing) "pytest>=9.0.3", "pytest-cov>=7.1.0", - "hypothesis>=6.152.4", - "mypy>=1.20.2", - "ruff>=0.15.12", + "hypothesis>=6.152.7", + "mypy>=2.1.0", + "ruff>=0.15.13", "pytest-benchmark>=5.2.3", "psutil>=7.2.2", "types-psutil>=7.2.2.20260408", ] fuzz = [ - "hypothesis[cli]>=6.152.4", + "hypothesis[cli]>=6.152.7", "hypofuzz>=25.11.1", ] @@ -398,33 +398,26 @@ ignore = [ # T201/S101: fuzzers print findings and assert invariants # EXE001: shebang present; file executability managed by git, not ruff # FBT001/FBT002: harness functions use bool flags for control flow -# BLE001: fuzz replay scripts catch all exceptions to report crashes # INP001: fuzz_atheris/ is not a package # SLF001: fuzzers access private members for invariant checking (white-box fuzzing) # TRY301: raise-in-try pattern used extensively in invariant guards; abstracting to helpers adds no value # TC001/TC003: fuzz harnesses do not need TYPE_CHECKING optimization -# S110: try-except-pass is used intentionally to skip non-fatal errors in crash-tolerant fuzzers -# S311: random used for test data generation, not 
cryptography # PERF401: fuzzer loops are not performance-critical relative to the fuzzing overhead # ARG001: sub-handler functions in dispatch-to-sub-handlers pattern must have uniform (fdp) signatures; # some handlers do not consume bytes from fdp but must accept it for the dispatch interface -"fuzz_atheris/**/*.py" = ["ERA001", "T201", "S101", "S110", "S311", "EXE001", "FBT001", "FBT002", "FBT003", "BLE001", "INP001", "C901", "PLC0415", "SLF001", "TRY301", "TC001", "TC003", "PERF401", "ARG001"] +"fuzz_atheris/**/*.py" = ["ERA001", "T201", "S101", "EXE001", "FBT001", "FBT002", "FBT003", "INP001", "C901", "PLC0415", "SLF001", "TRY301", "TC001", "TC003", "PERF401", "ARG001"] # SCRIPTS - Developer tooling; same rationale as fuzz_atheris # T201: scripts output to terminal by design # EXE001: shebang files; executability managed externally # FBT001/FBT002: script helpers use bool flags -# S603: subprocess calls in repro/replay scripts are developer-controlled (not user input) # TC003: scripts do not need TYPE_CHECKING optimization # INP001: scripts/ is not a package -"scripts/**/*.py" = ["T201", "EXE001", "FBT001", "FBT002", "FBT003", "INP001", "C901", "BLE001", "S101", "S603", "TC001", "TC003", "PERF401"] +"scripts/**/*.py" = ["T201", "EXE001", "FBT001", "FBT002", "FBT003", "INP001", "C901", "S101", "TC001", "TC003", "PERF401"] -# TESTS - Blanket waivers for all test files +# TESTS - Non-production architectural waivers only; security waivers are file-specific below # S101: assert is the standard pytest assertion mechanism; bandit S101 targets production code # S108: /tmp paths in tests are intentional for temp fixture directories (not production) -# S110: try-except-pass in tests catches expected failure paths without logging -# S310/S311/S603: security rules targeting production code; not applicable in controlled tests -# BLE001: broad-except used intentionally to test error isolation and exception handling # TRY301: raise-in-try is a natural pattern for test 
invariant guards # FBT001/FBT002/FBT003: test helpers and Hypothesis strategies may use bool flags # SLF001: integration and white-box tests verify internal state via private members @@ -434,7 +427,7 @@ ignore = [ # C901: shadow_bundle.py mirrors production dispatch; structural complexity # E501: test docstrings, FTL strings, and assertion messages may exceed 100 chars for clarity # PERF401/PERF402: test loops are not performance-critical paths -"tests/**/*.py" = ["S101", "S108", "S110", "S310", "S311", "S603", "BLE001", "TRY301", "FBT001", "FBT002", "FBT003", "SLF001", "TC001", "TC002", "TC003", "PGH003", "T201", "C901", "E501", "PERF401", "PERF402"] +"tests/**/*.py" = ["S101", "S108", "TRY301", "FBT001", "FBT002", "FBT003", "SLF001", "TC001", "TC002", "TC003", "PGH003", "T201", "C901", "E501", "PERF401", "PERF402"] # Per-file waivers for specific test architectural patterns "tests/test_syntax_visitor.py" = ["N802"] @@ -458,6 +451,37 @@ ignore = [ "tests/test_runtime_function_bridge_validation.py" = ["PLC0415"] "tests/test_init_module.py" = ["PLC0415", "EM101"] "tests/test_runtime_metamorphic_property.py" = ["PLC0415"] + +# Focused security-rule waivers. Premise: release, fuzz, and adversarial surfaces +# are worth linting. Reason: only the exact files that need a waiver keep one. 
+"fuzz_atheris/fuzz_*_entry.py" = ["BLE001"] +"fuzz_atheris/fuzz_cache.py" = ["BLE001"] +"fuzz_atheris/fuzz_cursor.py" = ["BLE001"] +"fuzz_atheris/fuzz_integrity.py" = ["BLE001"] +"fuzz_atheris/fuzz_iso.py" = ["BLE001"] +"fuzz_atheris/fuzz_lock.py" = ["BLE001"] +"fuzz_atheris/fuzz_parse_*.py" = ["BLE001"] +"fuzz_atheris/fuzz_plural.py" = ["BLE001"] +"fuzz_atheris/fuzz_roundtrip.py" = ["S110", "S311", "BLE001"] +"fuzz_atheris/fuzz_runtime_scenarios.py" = ["S110", "BLE001"] +"fuzz_atheris/fuzz_serializer_mutators.py" = ["S110", "S311", "BLE001"] +"fuzz_atheris/fuzz_structured.py" = ["S110", "S311", "BLE001"] +"scripts/fuzz_hypofuzz_repro.py" = ["S603"] +"scripts/run_examples.py" = ["S603"] +"scripts/validate_docs.py" = ["S603", "BLE001"] +"tests/diagnostics_frozen_error_cases/core_behavior.py" = ["BLE001"] +"tests/introspection_iso_cases/defensive_branches.py" = ["BLE001"] +"tests/introspection_message_cases/properties_and_branches.py" = ["BLE001"] +"tests/runtime_cache_integrity_cases/idempotence_and_hashes.py" = ["BLE001"] +"tests/test_architecture_contract.py" = ["S603"] +"tests/test_diagnostics_errors.py" = ["BLE001"] +"tests/test_documentation_tooling.py" = ["S603"] +"tests/test_introspection_iso_property.py" = ["S110", "BLE001"] +"tests/test_introspection_property.py" = ["S110", "BLE001"] +"tests/test_localization_cache_stats.py" = ["BLE001"] +"tests/test_runtime_bundle_property_core.py" = ["S110", "BLE001"] +"tests/test_runtime_cache_format.py" = ["BLE001"] +"tests/test_rwlock_property.py" = ["S311"] "tests/test_regression_parser_l10n_rounding.py" = ["PLC0415"] "tests/test_rwlock_core.py" = ["PLC0415"] "tests/conftest.py" = ["PLC0415"] @@ -478,7 +502,7 @@ ignore = [ "tests/test_regression_rwlock_currency_spans.py" = ["PLC0415"] "tests/test_runtime_locale_utils.py" = ["SIM117"] "tests/fuzz/test_parsing_currency_property.py" = ["PLC0415"] -"tests/fuzz/test_runtime_bundle_concurrent.py" = ["PLC0415"] +"tests/fuzz/test_runtime_bundle_concurrent.py" = ["PLC0415", 
"BLE001"] "tests/fuzz/test_syntax_validator_property.py" = ["PLC0415"] "tests/test_parsing_babel_compat.py" = ["PLC0415"] "tests/test_integration_coverage.py" = ["N802", "PLC0415"] @@ -491,7 +515,7 @@ ignore = [ "tests/syntax_visitor_cases/__init__.py" = ["N802"] "tests/syntax_visitor_transformer_cases/__init__.py" = ["N802"] # Fuzzing: pytestmark precedes later imports for file-level marker application -"tests/fuzz/test_syntax_parser_grammar.py" = ["E402"] +"tests/fuzz/test_syntax_parser_grammar.py" = ["E402", "BLE001"] "tests/fuzz/test_syntax_parser_whitespace.py" = ["E402"] "tests/fuzz/test_parsing_numbers_property.py" = ["PLC0415"] # DTZ001: datetime strategy bounds and @example fixtures are test data, not production datetimes diff --git a/scripts/benchmark.sh b/scripts/benchmark.sh index 374ff3db..ab899339 100755 --- a/scripts/benchmark.sh +++ b/scripts/benchmark.sh @@ -71,8 +71,17 @@ set -o nounset set -o pipefail shopt -s inherit_errexit +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" +# shellcheck source=scripts/lib/python_support_contract.sh +source "$PROJECT_ROOT/scripts/lib/python_support_contract.sh" + # [SECTION: ENVIRONMENT_ISOLATION] -PY_VERSION="${PY_VERSION:-3.13}" +# Premise: benchmark defaults should match the minimum support floor so +# performance evidence is anchored to a promised interpreter. +# Reason: higher-version benchmarking remains available via PY_VERSION=... +# overrides without duplicating the support contract. 
+PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then TARGET_VENV=".venv-devcontainer-${PY_VERSION}" else @@ -150,7 +159,7 @@ while [[ $# -gt 0 ]]; do echo " ./scripts/benchmark.sh --histogram --json benchmark_results.json" echo "" echo "Environment:" - echo " PY_VERSION Python version for the isolated venv (default: 3.13)" + echo " PY_VERSION Python version for the isolated venv (default: $FTLLEXENGINE_PYTHON_MIN)" echo " NO_COLOR=1 Disable colored output" exit 0 ;; diff --git a/scripts/fuzz_atheris.sh b/scripts/fuzz_atheris.sh index 9869f5e2..1f320d91 100755 --- a/scripts/fuzz_atheris.sh +++ b/scripts/fuzz_atheris.sh @@ -13,7 +13,16 @@ set -o nounset set -o pipefail shopt -s inherit_errexit -PY_VERSION="${PY_VERSION:-3.13}" +SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" +# shellcheck source=scripts/lib/python_support_contract.sh +source "$PROJECT_ROOT/scripts/lib/python_support_contract.sh" + +# Premise: the native Atheris lane must follow the same minimum-version owner +# as the rest of the repository gates. +# Reason: keeping the interpreter default in one contract file prevents the +# native-fuzz lane from silently diverging from the supported floor. 
+PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" WORKERS=1 TIME_LIMIT="" TARGET="" @@ -27,8 +36,6 @@ VERBOSE=0 DRY_RUN=0 ORIGINAL_ARGS=("$@") -SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)" -PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" readonly FUZZ_LIB_DIR="$SCRIPT_DIR/lib/fuzz_atheris" require_fuzz_lib() { diff --git a/scripts/fuzz_hypofuzz.sh b/scripts/fuzz_hypofuzz.sh index 3211f6b9..aa9c3781 100755 --- a/scripts/fuzz_hypofuzz.sh +++ b/scripts/fuzz_hypofuzz.sh @@ -25,7 +25,15 @@ if [[ "${BASH_VERSINFO[0]}" -ge 5 ]]; then shopt -s inherit_errexit 2>/dev/null || true fi -PY_VERSION="${PY_VERSION:-3.13}" +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" +# shellcheck source=scripts/lib/python_support_contract.sh +source "$PROJECT_ROOT/scripts/lib/python_support_contract.sh" + +# Premise: property/fuzz entrypoints should share the same default interpreter +# contract as the core lint/test gates. +# Reason: one source of truth keeps corpus, repro, and CI lanes from drifting. +PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then TARGET_VENV=".venv-devcontainer-${PY_VERSION}" else @@ -52,9 +60,6 @@ else fi export TMPDIR="/tmp" - -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -PROJECT_ROOT="$(dirname "$SCRIPT_DIR")" IS_GHA="${GITHUB_ACTIONS:-false}" readonly FUZZ_LIB_DIR="$SCRIPT_DIR/lib/fuzz_hypofuzz" diff --git a/scripts/lib/python_support_contract.sh b/scripts/lib/python_support_contract.sh new file mode 100644 index 00000000..5a6380f2 --- /dev/null +++ b/scripts/lib/python_support_contract.sh @@ -0,0 +1,18 @@ +# Canonical Python support contract for repository tooling, CI, and release flows. +# +# Premise: +# A Python support claim is not one number. 
The repository simultaneously owns a +# minimum supported interpreter, a tested supported set, a latest supported +# release-verification interpreter, a free-threaded verification lane, and an +# intentionally unsupported floor used for negative packaging checks. +# +# Reason: +# Keeping those values in one shell-readable contract file gives every shell +# gate, workflow, validator, and document a single owner. Drift becomes a build +# failure instead of a silent metadata or CI mismatch. + +FTLLEXENGINE_PYTHON_MIN="3.13" +FTLLEXENGINE_PYTHON_SUPPORTED="3.13 3.14" +FTLLEXENGINE_PYTHON_LATEST="3.14" +FTLLEXENGINE_PYTHON_FREETHREADED="3.13t" +FTLLEXENGINE_PYTHON_UNSUPPORTED_FLOOR="3.12" diff --git a/scripts/lint.sh b/scripts/lint.sh index 38187af5..dd5f725e 100755 --- a/scripts/lint.sh +++ b/scripts/lint.sh @@ -39,8 +39,19 @@ set -o nounset set -o pipefail shopt -s inherit_errexit +_script_src="${BASH_SOURCE[0]}"; [[ "$_script_src" != */* ]] && _script_src="./$_script_src" +SCRIPT_DIR="$(cd -- "${_script_src%/*}" && pwd)" +PROJECT_ROOT="$(cd -- "$SCRIPT_DIR/.." && pwd)" +unset _script_src +# shellcheck source=scripts/lib/python_support_contract.sh +source "$PROJECT_ROOT/scripts/lib/python_support_contract.sh" + # [SECTION: ENVIRONMENT_ISOLATION] -PY_VERSION="${PY_VERSION:-3.13}" +# Premise: verification defaults to the support floor so regressions surface at +# the narrowest promised interpreter first. +# Reason: contributors can opt into a wider lane with PY_VERSION=... without +# the repository carrying duplicate hard-coded version tables. 
+PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then TARGET_VENV=".venv-devcontainer-${PY_VERSION}" else @@ -71,15 +82,13 @@ fi # [SECTION: SETUP] CLEAN_CACHE=true VERBOSE=false +FIX_MODE=false declare -A STATUS declare -A TIMING declare -A METRICS FAILED=false IS_GHA="${GITHUB_ACTIONS:-false}" # Resolve script directory using Bash built-ins only — no dependency on /usr/bin/dirname. -_script_src="${BASH_SOURCE[0]}"; [[ "$_script_src" != */* ]] && _script_src="./$_script_src" -SCRIPT_DIR="$(cd -- "${_script_src%/*}" && pwd)" -unset _script_src PY_VERSION_NODOT="${PY_VERSION//./}" FAILED_ITEMS_FILE=$(mktemp) @@ -89,6 +98,7 @@ while [[ $# -gt 0 ]]; do case "$1" in --no-clean) CLEAN_CACHE=false; shift ;; --verbose) VERBOSE=true; shift ;; + --fix) FIX_MODE=true; shift ;; *) echo "Unknown argument: $1"; exit 1 ;; esac done @@ -240,7 +250,14 @@ run_ruff() { # Run on all targets at once (Ruff is safe for this) # Removed explicit --config to allow for nested ruff/pyproject config discovery - local cmd=(ruff check --fix $format_flag) + local cmd=(ruff check $format_flag) + if [[ "$FIX_MODE" == "true" ]]; then + # Premise: repair mode is useful locally, but CI evidence must never + # mutate the source tree during verification. + # Reason: --fix is opt-in so the lint gate proves the checked-in tree + # is already valid unless a contributor explicitly asks for repair. 
+ cmd+=(--fix) + fi log_info "Discovery: Native/Hierarchical (ruff.toml or pyproject.toml)" # Append target version if we can determine it, otherwise let ruff read pyproject.toml if [[ -n "${PY_VERSION_NODOT}" ]]; then @@ -287,6 +304,7 @@ run_mypy() { readonly -a SCRIPT_VALIDATORS=( "PISync:$SCRIPT_DIR/validate_pyi_sync.py" "ISO4217:$SCRIPT_DIR/verify_iso4217.py" + "PythonSupport:$SCRIPT_DIR/python_support.py validate" ) run_script_validators() { @@ -297,7 +315,9 @@ run_script_validators() { for validator_entry in "${SCRIPT_VALIDATORS[@]}"; do name="${validator_entry%%:*}" script_path="${validator_entry#*:}" - execute_tool "validator:$name" "all" "${TARGET_VENV}/bin/python" "$script_path" + # shellcheck disable=SC2206 + local -a validator_args=($script_path) + execute_tool "validator:$name" "all" "${TARGET_VENV}/bin/python" "${validator_args[@]}" done log_group_end } diff --git a/scripts/python_support.py b/scripts/python_support.py new file mode 100644 index 00000000..80987e17 --- /dev/null +++ b/scripts/python_support.py @@ -0,0 +1,36 @@ +#!/usr/bin/env python3 +"""CLI wrapper for the repository Python support contract tooling. + +Premise: + The executable entrypoint should stay small enough to make the command + surface obvious and keep the implementation owner easy to audit. + +Reason: + The heavy validation logic lives in a library module so this file remains a + thin command adapter rather than growing into another multi-purpose owner. 
+""" + +from __future__ import annotations + +import sys + +from python_support_lib import emit_github_outputs, load_contract, validate_contract + + +def main() -> int: + """Dispatch the requested contract command.""" + contract = load_contract() + + if len(sys.argv) != 2 or sys.argv[1] not in {"github-outputs", "validate"}: + print("Usage: python_support.py {github-outputs|validate}", file=sys.stderr) + return 2 + + if sys.argv[1] == "github-outputs": + emit_github_outputs(contract) + return 0 + + return validate_contract(contract) + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/scripts/python_support_lib.py b/scripts/python_support_lib.py new file mode 100644 index 00000000..2e72d640 --- /dev/null +++ b/scripts/python_support_lib.py @@ -0,0 +1,320 @@ +"""Own and validate the repository's Python support contract. + +Premise: + Python compatibility truth spans metadata, workflows, shell gates, and + public maintainer docs, so a one-file script quickly becomes a mixed owner. + +Reason: + This library keeps the structured contract logic in one importable module + while the CLI entrypoint remains small and the validation rules stay easy + to expand without violating the repository size budgets. +""" + +from __future__ import annotations + +import json +import re +import sys +import tomllib +from dataclasses import dataclass +from pathlib import Path +from typing import Final + +CONTRACT_PATH: Final = Path("scripts/lib/python_support_contract.sh") +SHELL_TARGETS: Final[tuple[Path, ...]] = ( + Path("check.sh"), + Path("scripts/lint.sh"), + Path("scripts/test.sh"), + Path("scripts/fuzz_hypofuzz.sh"), + Path("scripts/fuzz_atheris.sh"), + Path("scripts/benchmark.sh"), +) + + +@dataclass(frozen=True, slots=True) +class PythonSupportContract: + """Canonical repository Python support values.""" + + minimum: str + supported: tuple[str, ...] 
+ latest: str + freethreaded: str + unsupported_floor: str + + @property + def ruff_target(self) -> str: + """Return Ruff's target-version representation for the minimum version.""" + return f"py{self.minimum.replace('.', '')}" + + +def repo_root() -> Path: + """Return the repository root from this helper module.""" + return Path(__file__).resolve().parent.parent + + +def load_contract(root: Path | None = None) -> PythonSupportContract: + """Load and validate the canonical shell-readable support contract.""" + active_root = root or repo_root() + contract_text = (active_root / CONTRACT_PATH).read_text(encoding="utf-8") + values: dict[str, str] = {} + + for raw_line in contract_text.splitlines(): + line = raw_line.strip() + if not line or line.startswith("#"): + continue + match = re.fullmatch(r'([A-Z0-9_]+)="([^"]*)"', line) + if match is None: + msg = f"Unsupported contract line format: {raw_line!r}" + raise SystemExit(msg) + values[match.group(1)] = match.group(2) + + required = { + "FTLLEXENGINE_PYTHON_MIN", + "FTLLEXENGINE_PYTHON_SUPPORTED", + "FTLLEXENGINE_PYTHON_LATEST", + "FTLLEXENGINE_PYTHON_FREETHREADED", + "FTLLEXENGINE_PYTHON_UNSUPPORTED_FLOOR", + } + missing = sorted(required - values.keys()) + if missing: + msg = f"Missing contract keys: {missing}" + raise SystemExit(msg) + + contract = PythonSupportContract( + minimum=values["FTLLEXENGINE_PYTHON_MIN"], + supported=tuple(values["FTLLEXENGINE_PYTHON_SUPPORTED"].split()), + latest=values["FTLLEXENGINE_PYTHON_LATEST"], + freethreaded=values["FTLLEXENGINE_PYTHON_FREETHREADED"], + unsupported_floor=values["FTLLEXENGINE_PYTHON_UNSUPPORTED_FLOOR"], + ) + validate_contract_shape(contract) + return contract + + +def validate_contract_shape(contract: PythonSupportContract) -> None: + """Reject internally inconsistent contract declarations.""" + if contract.minimum not in contract.supported: + msg = "Minimum Python version must appear in supported set" + raise SystemExit(msg) + if contract.latest not in 
contract.supported: + msg = "Latest supported Python version must appear in supported set" + raise SystemExit(msg) + if contract.supported[-1] != contract.latest: + msg = "Latest supported Python version must be the final supported entry" + raise SystemExit(msg) + if not contract.freethreaded.startswith(contract.minimum): + msg = "Free-threaded lane must be anchored to the minimum CPython release" + raise SystemExit(msg) + + +def emit_github_outputs(contract: PythonSupportContract) -> None: + """Emit GitHub Actions outputs derived from the canonical contract.""" + print(f"minimum-version={contract.minimum}") + print(f"latest-version={contract.latest}") + print(f"supported-json={json.dumps(list(contract.supported))}") + print(f"freethreaded-version={contract.freethreaded}") + print(f"unsupported-version={contract.unsupported_floor}") + + +def validate_contract(contract: PythonSupportContract) -> int: + """Validate repository surfaces against the canonical Python contract.""" + root = repo_root() + errors: list[str] = [] + _validate_pyproject(root, contract, errors) + _validate_shell_scripts(root, errors) + _validate_workflows(root, errors) + _validate_docs(root, contract, errors) + + if errors: + for error in errors: + print(f"[FAIL] {error}", file=sys.stderr) + return 1 + + print("[PASS] Python support contract is consistent.") + return 0 + + +def _expect(condition: bool, message: str, *, errors: list[str]) -> None: + if not condition: + errors.append(message) + + +def _validate_pyproject(root: Path, contract: PythonSupportContract, errors: list[str]) -> None: + data = tomllib.loads((root / "pyproject.toml").read_text(encoding="utf-8")) + project = data["project"] + mypy = data["tool"]["mypy"] + ruff = data["tool"]["ruff"] + + _expect( + project["requires-python"] == f">={contract.minimum}", + "pyproject.toml [project].requires-python must match the contract minimum", + errors=errors, + ) + + minor_classifier_re = re.compile(r"Programming Language :: Python :: 
(\d+\.\d+)$") + actual_minors = { + match.group(1) + for classifier in project["classifiers"] + if (match := minor_classifier_re.fullmatch(classifier)) + } + _expect( + actual_minors == set(contract.supported), + ( + "pyproject.toml Python minor-version classifiers must equal the contract " + f"supported set: expected {sorted(contract.supported)!r}, got {sorted(actual_minors)!r}" + ), + errors=errors, + ) + + _expect( + str(mypy["python_version"]) == contract.minimum, + "pyproject.toml [tool.mypy].python_version must match the contract minimum", + errors=errors, + ) + _expect( + str(ruff["target-version"]) == contract.ruff_target, + "pyproject.toml [tool.ruff].target-version must match the contract minimum", + errors=errors, + ) + + tests_mypy = (root / "tests" / "mypy.ini").read_text(encoding="utf-8") + _expect( + f"python_version = {contract.minimum}" in tests_mypy, + "tests/mypy.ini must match the contract minimum", + errors=errors, + ) + + +def _validate_shell_scripts(root: Path, errors: list[str]) -> None: + for relative_path in SHELL_TARGETS: + text = (root / relative_path).read_text(encoding="utf-8") + _expect( + "python_support_contract.sh" in text, + f"{relative_path} must source the canonical Python support contract", + errors=errors, + ) + _expect( + 'PY_VERSION="${PY_VERSION:-3.13}"' not in text, + f"{relative_path} must not hard-code the default Python version", + errors=errors, + ) + + +def _validate_workflows(root: Path, errors: list[str]) -> None: + test_workflow = (root / ".github" / "workflows" / "test.yml").read_text(encoding="utf-8") + publish_workflow = (root / ".github" / "workflows" / "publish.yml").read_text( + encoding="utf-8" + ) + + for marker in ( + "python-support:", + "fromJSON(needs.python-support.outputs.supported-json)", + "needs.python-support.outputs.freethreaded-version", + "permissions:\n contents: read", + ): + _expect( + marker in test_workflow, + f"test workflow missing contract-driven marker: {marker}", + errors=errors, + ) 
+ + publish_markers = ( + "release-contract:", + "fromJSON(needs.release-contract.outputs.supported-json)", + "needs.release-contract.outputs.release-commit", + "needs.release-contract.outputs.freethreaded-version", + "Resolve immutable annotated release tag", + "/git/ref/tags/", + "/git/tags/", + "Release tags must be annotated tag objects", + "Release tag signature is not verified by GitHub", + ) + for marker in publish_markers: + _expect( + marker in publish_workflow, + f"publish workflow missing contract-driven marker: {marker}", + errors=errors, + ) + + forbidden_workflow_snippets = ( + 'python-version: ["3.13", "3.14"]', + "When Python 3.15 releases", + ( + "ref: ${{ github.event_name == 'workflow_dispatch' && " + "inputs.release_tag || github.ref_name }}" + ), + "permissions:\n contents: write\n id-token: write", + ) + for snippet in forbidden_workflow_snippets: + _expect( + snippet not in test_workflow and snippet not in publish_workflow, + f"workflow drift marker must be absent: {snippet}", + errors=errors, + ) + + +def _validate_docs(root: Path, contract: PythonSupportContract, errors: list[str]) -> None: + contributing = (root / "CONTRIBUTING.md").read_text(encoding="utf-8") + release_protocol = (root / "docs" / "RELEASE_PROTOCOL.md").read_text(encoding="utf-8") + developer_devcontainer = (root / "docs" / "DEVELOPER_DEVCONTAINER.md").read_text( + encoding="utf-8" + ) + testing_doc = (root / "docs" / "DOC_06_Testing.md").read_text(encoding="utf-8") + + expected_release_commands = ( + f"PY_VERSION={contract.latest} ./scripts/lint.sh", + f"PY_VERSION={contract.latest} ./scripts/test.sh", + f"uv run --group dev --python {contract.latest} python scripts/validate_docs.py", + f"uv run --group dev --python {contract.latest} python scripts/validate_version.py", + ) + for command in expected_release_commands: + _expect( + command in contributing, + f"CONTRIBUTING.md missing command: {command}", + errors=errors, + ) + _expect( + command in release_protocol, + 
f"docs/RELEASE_PROTOCOL.md missing command: {command}", + errors=errors, + ) + + _expect( + f"Python {contract.minimum} as the canonical contributor interpreter" + in developer_devcontainer, + ( + "docs/DEVELOPER_DEVCONTAINER.md must name the contract minimum " + "as the contributor interpreter" + ), + errors=errors, + ) + _expect( + f"PY_VERSION={contract.latest}" in developer_devcontainer, + ( + "docs/DEVELOPER_DEVCONTAINER.md must use the contract latest " + "version in forward-compat examples" + ), + errors=errors, + ) + _expect( + f".venv-{contract.minimum}" in testing_doc + and f".venv-devcontainer-{contract.minimum}" in testing_doc, + "docs/DOC_06_Testing.md must describe the contract minimum venv naming", + errors=errors, + ) + _expect( + f"uv venv --python {contract.minimum} --seed" in release_protocol, + ( + "docs/RELEASE_PROTOCOL.md must verify the minimum supported " + "Python installer path" + ), + errors=errors, + ) + _expect( + f"Python {contract.unsupported_floor}" in release_protocol, + ( + "docs/RELEASE_PROTOCOL.md must name the unsupported floor used " + "for negative packaging verification" + ), + errors=errors, + ) diff --git a/scripts/run_examples.py b/scripts/run_examples.py index a7acf581..cdab9ad1 100644 --- a/scripts/run_examples.py +++ b/scripts/run_examples.py @@ -68,7 +68,7 @@ def _validator(stdout: str) -> str | None: ), "locale_fallback.py": _require_output_markers("[SUCCESS] All examples complete!"), "parser_only.py": _require_output_markers( - "[PASS] Warning-only validation semantics verified", + "[PASS] Critical warning validation semantics verified", "[PASS] Invalid syntax semantics verified", "All examples completed successfully!", ), diff --git a/scripts/test.sh b/scripts/test.sh index 982ae242..5cefed61 100755 --- a/scripts/test.sh +++ b/scripts/test.sh @@ -38,8 +38,15 @@ set -o nounset set -o pipefail shopt -s inherit_errexit +PROJECT_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." 
&& pwd)" +# shellcheck source=scripts/lib/python_support_contract.sh +source "$PROJECT_ROOT/scripts/lib/python_support_contract.sh" + # [SECTION: ENVIRONMENT_ISOLATION] -PY_VERSION="${PY_VERSION:-3.13}" +# Premise: default verification must exercise the minimum supported runtime. +# Reason: contributors can widen coverage with PY_VERSION=... while one +# contract file remains the authoritative version owner. +PY_VERSION="${PY_VERSION:-$FTLLEXENGINE_PYTHON_MIN}" if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" ]]; then TARGET_VENV=".venv-devcontainer-${PY_VERSION}" else @@ -49,7 +56,6 @@ if [[ "${FTLLEXENGINE_DEVCONTAINER:-}" == "1" && -z "${UV_LINK_MODE:-}" ]]; then export UV_LINK_MODE="copy" fi -PROJECT_ROOT="$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")/.." && pwd)" cd "$PROJECT_ROOT" if [[ "${UV_PROJECT_ENVIRONMENT:-}" != "$TARGET_VENV" ]]; then diff --git a/scripts/validate-devcontainer.sh b/scripts/validate-devcontainer.sh index 7b385142..64e8a3a3 100755 --- a/scripts/validate-devcontainer.sh +++ b/scripts/validate-devcontainer.sh @@ -26,6 +26,8 @@ readonly repo_root readonly dockerfile_path="${repo_root}/.devcontainer/Dockerfile" readonly config_path="${repo_root}/.devcontainer/devcontainer.json" readonly user_home_repair_script="${repo_root}/scripts/devcontainer-prepare-user-home.sh" +# shellcheck source=scripts/lib/python_support_contract.sh +source "${repo_root}/scripts/lib/python_support_contract.sh" command -v docker >/dev/null 2>&1 || die "docker is required to validate the contributor devcontainer" command -v python3 >/dev/null 2>&1 || die "python3 is required to validate devcontainer.json" @@ -105,9 +107,14 @@ docker build \ --tag "${image_tag}" \ "${repo_root}/.devcontainer" >/dev/null -docker run --rm "${image_tag}" bash -lc ' +docker run --rm \ + --env "FTLLEXENGINE_PYTHON_MIN=${FTLLEXENGINE_PYTHON_MIN}" \ + "${image_tag}" bash -lc ' set -euo pipefail - python3.13 --version | grep -E "^Python 3\.13" >/dev/null + # Premise: the devcontainer owns the 
canonical contributor interpreter. + # Reason: validate against the shared support contract so container drift is + # caught by the same owner as shell gates and CI. + "python${FTLLEXENGINE_PYTHON_MIN}" --version | grep -E "^Python ${FTLLEXENGINE_PYTHON_MIN//./\\.}" >/dev/null uv --version >/dev/null git --version >/dev/null bash --version | head -1 | grep -E "version 5" >/dev/null diff --git a/src/ftllexengine/__init__.py b/src/ftllexengine/__init__.py index f3299d93..4d37d541 100644 --- a/src/ftllexengine/__init__.py +++ b/src/ftllexengine/__init__.py @@ -12,6 +12,7 @@ parse_stream_ftl - Parse FTL source from a line iterator, yields entries incrementally serialize_ftl - Serialize AST to FTL source (no external dependencies) validate_resource - Validate FTL resource for semantic errors (no external dependencies) + UNLIMITED - Explicit opt-out sentinel for security/resource limits FluentNumber - Immutable formatted-number wrapper preserving numeric identity FluentValue - Type alias for values accepted by formatting functions make_fluent_number - Construct FluentNumber from int/Decimal with inferred precision @@ -119,6 +120,7 @@ ) from .analysis import detect_cycles from .cache_management import clear_module_caches +from .core._limits import UNLIMITED, UnlimitedLimit from .core.babel_compat import get_cldr_version, is_babel_available from .core.locale_utils import get_system_locale, normalize_locale, require_locale_code from .core.semantic_types import FTLSource, LocaleCode, MessageId, ResourceId @@ -256,6 +258,8 @@ def __getattr__(name: str) -> object: "FrozenErrorContext", "FrozenFluentError", "ParseTypeLiteral", + "UNLIMITED", + "UnlimitedLimit", # Data integrity exceptions "CacheCorruptionError", "DataIntegrityError", diff --git a/src/ftllexengine/__init__.pyi b/src/ftllexengine/__init__.pyi index 75f6b7d7..2b9721ab 100644 --- a/src/ftllexengine/__init__.pyi +++ b/src/ftllexengine/__init__.pyi @@ -1,5 +1,7 @@ # ISO data utilities (call-time Babel requirement) 
from .analysis import detect_cycles as detect_cycles +from .core._limits import UNLIMITED as UNLIMITED +from .core._limits import UnlimitedLimit as UnlimitedLimit from .core.babel_compat import get_cldr_version as get_cldr_version # Locale utilities (no Babel dependency) @@ -85,8 +87,8 @@ from .localization import ResourceLoadResult as ResourceLoadResult # Babel-backed facades from .localization.boot import LocalizationBootConfig as LocalizationBootConfig +from .localization.cache_stats import LocalizationCacheStats as LocalizationCacheStats from .localization.orchestrator import FluentLocalization as FluentLocalization -from .localization.orchestrator import LocalizationCacheStats as LocalizationCacheStats from .runtime import AsyncFluentBundle as AsyncFluentBundle from .runtime import FluentBundle as FluentBundle from .runtime import FluentNumber as FluentNumber @@ -133,6 +135,8 @@ __all__: list[str] = [ "FrozenErrorContext", "FrozenFluentError", "ParseTypeLiteral", + "UNLIMITED", + "UnlimitedLimit", # Data integrity exceptions "CacheCorruptionError", "DataIntegrityError", diff --git a/src/ftllexengine/_optional_exports.py b/src/ftllexengine/_optional_exports.py index 2174f44b..f54d1e69 100644 --- a/src/ftllexengine/_optional_exports.py +++ b/src/ftllexengine/_optional_exports.py @@ -56,7 +56,7 @@ class OptionalFacadeExport: ), OptionalFacadeExport( public_name="LocalizationCacheStats", - source_module="ftllexengine.localization.orchestrator", + source_module="ftllexengine.localization.cache_stats", source_name="LocalizationCacheStats", ), ), @@ -73,7 +73,7 @@ class OptionalFacadeExport: ), OptionalFacadeExport( public_name="LocalizationCacheStats", - source_module="ftllexengine.localization.orchestrator", + source_module="ftllexengine.localization.cache_stats", source_name="LocalizationCacheStats", ), ), diff --git a/src/ftllexengine/constants.py b/src/ftllexengine/constants.py index 66225c8b..41500ccf 100644 --- a/src/ftllexengine/constants.py +++ 
b/src/ftllexengine/constants.py @@ -26,7 +26,7 @@ "MAX_TERRITORY_CACHE_SIZE", "MAX_CURRENCY_CACHE_SIZE", "DEFAULT_CACHE_SIZE", - "DEFAULT_MAX_ENTRY_WEIGHT", + "DEFAULT_MAX_ENTRY_PAYLOAD_BYTES", # Input limits "MAX_SOURCE_SIZE", "MAX_LOCALE_CODE_LENGTH", @@ -129,12 +129,12 @@ # 1000 entries is sufficient for most applications (typical UI has <500 messages). DEFAULT_CACHE_SIZE: int = 1000 -# Default maximum entry weight in characters (~10KB for typical strings). -# Prevents unbounded memory usage when caching very large formatted results. -# Results exceeding this limit are computed but not cached, protecting against -# scenarios where large variable values produce very large formatted strings -# (e.g., 10MB results cached 1000 times would consume 10GB of memory). -DEFAULT_MAX_ENTRY_WEIGHT: int = 10_000 +# Default maximum retained payload bytes for one cached entry. +# This bounds the UTF-8 payload of the formatted string plus the serialized +# diagnostic content retained alongside it. It is intentionally described as +# retained payload, not process memory, because Python object overhead varies by +# interpreter build while payload bytes are deterministic and portable. +DEFAULT_MAX_ENTRY_PAYLOAD_BYTES: int = 10_000 # ============================================================================ # INPUT LIMITS diff --git a/src/ftllexengine/core/_limits.py b/src/ftllexengine/core/_limits.py new file mode 100644 index 00000000..190b5512 --- /dev/null +++ b/src/ftllexengine/core/_limits.py @@ -0,0 +1,78 @@ +"""Internal helpers for validating explicit security and resource limits. + +These helpers centralize the limit contract so parser, runtime, and loader +boundaries do not drift apart. The premise is simple: security limits must be +validated once at the owning boundary and represented consistently everywhere +else. 
+""" + +from __future__ import annotations + +from typing import Final, final + +__all__ = ["UNLIMITED", "LimitArg", "UnlimitedLimit", "resolve_limit_arg"] + + +@final +class UnlimitedLimit: + """Explicit opt-out sentinel for callers that truly want unbounded behavior.""" + + __slots__ = () + + def __repr__(self) -> str: + """Render the sentinel with its semantic name for debugging and docs.""" + return "UNLIMITED" + + +UNLIMITED: Final[UnlimitedLimit] = UnlimitedLimit() +"""Canonical explicit opt-out sentinel for security limit configuration.""" + +type LimitArg = int | UnlimitedLimit | None + + +def _require_plain_int(value: object, field_name: str) -> int: + """Reject non-int inputs, including bool, at the configuration boundary.""" + if isinstance(value, bool): + msg = f"{field_name} must be int, got bool" + raise TypeError(msg) + if not isinstance(value, int): + msg = f"{field_name} must be int, got {type(value).__name__}" + raise TypeError(msg) + return value + + +def resolve_limit_arg( + value: LimitArg, + *, + field_name: str, + default: int, + allow_unlimited: bool = True, +) -> int | None: + """Resolve one limit argument to a validated integer or ``None`` for unlimited. + + Premise: + Security limits must fail closed. Invalid negatives and magic zero values + are rejected instead of silently disabling protection. + + Reason: + The rest of the system should not have to remember whether ``0`` or + ``-1`` means "off". Only the explicit ``UNLIMITED`` sentinel may disable + a limit intentionally. + """ + if value is None: + candidate = default + elif value is UNLIMITED: + if not allow_unlimited: + msg = f"{field_name} does not support unlimited mode" + raise ValueError(msg) + return None + else: + candidate = _require_plain_int(value, field_name) + + if candidate <= 0: + msg = ( + f"{field_name} must be positive. " + f"Use UNLIMITED for intentional unbounded operation." 
+ ) + raise ValueError(msg) + return candidate diff --git a/src/ftllexengine/core/locale_utils.py b/src/ftllexengine/core/locale_utils.py index 1b5a4fea..e6f7e3f2 100644 --- a/src/ftllexengine/core/locale_utils.py +++ b/src/ftllexengine/core/locale_utils.py @@ -261,7 +261,7 @@ def get_system_locale(*, raise_on_failure: bool = False) -> str: if system_locale: locale_code = system_locale.split(".", 1)[0] if not _is_pseudo_locale(locale_code): - return normalize_locale(locale_code) + return require_locale_code(locale_code, "system locale") except (ValueError, AttributeError): pass @@ -272,8 +272,13 @@ def get_system_locale(*, raise_on_failure: bool = False) -> str: locale_code = value.split(".", 1)[0] if _is_pseudo_locale(locale_code): continue - # Normalize to ensure consistent format - return normalize_locale(locale_code) + try: + return require_locale_code(locale_code, f"{var} locale") + except ValueError: + # Invalid environment values are not trustworthy inputs. + # Skip them and keep walking the fallback chain instead of + # returning text that downstream APIs would reject later. + continue # No locale detected if raise_on_failure: diff --git a/src/ftllexengine/core/validators.py b/src/ftllexengine/core/validators.py index 28c5882a..be183bab 100644 --- a/src/ftllexengine/core/validators.py +++ b/src/ftllexengine/core/validators.py @@ -27,6 +27,7 @@ from .value_types import FluentNumber __all__ = [ + "require_bool", "require_date", "require_datetime", "require_fluent_number", @@ -88,6 +89,34 @@ def require_positive_int(value: object, field_name: str) -> int: return value +def require_bool(value: object, field_name: str) -> bool: + """Validate that a boundary value is a real boolean. + + Premise: + Security and integrity toggles own operational posture, not just UI + preferences. + + Reason: + Accepting truthy strings, integers, or custom objects at configuration + boundaries silently changes fail-closed behavior. 
Booleans therefore + need the same strict gate as numeric limits. + + Args: + value: Raw boundary value to validate. + field_name: Human-readable field label used in error messages. + + Returns: + The validated boolean. + + Raises: + TypeError: If value is not exactly ``bool``. + """ + if not isinstance(value, bool): + msg = f"{field_name} must be bool, got {type(value).__name__}" + raise TypeError(msg) + return value + + def require_date(value: object, field_name: str) -> _date: """Validate that a boundary value is a stdlib date (not datetime). @@ -215,4 +244,3 @@ def require_fluent_number(value: object, field_name: str) -> FluentNumber: msg = f"{field_name} must be FluentNumber, got {type(value).__name__}" raise TypeError(msg) return value - diff --git a/src/ftllexengine/core/value_types.py b/src/ftllexengine/core/value_types.py index 39c56752..2c58fc49 100644 --- a/src/ftllexengine/core/value_types.py +++ b/src/ftllexengine/core/value_types.py @@ -145,10 +145,11 @@ def __post_init__(self) -> None: "Use int(your_bool) explicitly if you need 0 or 1." ) raise TypeError(msg) - if not isinstance(self.value, (int, Decimal)): - msg = ( # type: ignore[unreachable] + value_obj: object = self.value + if not isinstance(value_obj, (int, Decimal)): + msg = ( f"FluentNumber.value must be int or Decimal, " - f"got {type(self.value).__name__}" + f"got {type(value_obj).__name__}" ) raise TypeError(msg) if self.precision is not None and self.precision < 0: diff --git a/src/ftllexengine/diagnostics/_redaction.py b/src/ftllexengine/diagnostics/_redaction.py new file mode 100644 index 00000000..3d0afeb3 --- /dev/null +++ b/src/ftllexengine/diagnostics/_redaction.py @@ -0,0 +1,59 @@ +"""Internal helpers for redacting sensitive diagnostic payloads by default. + +The library handles untrusted localization content and user input. 
The owning +rule here is that diagnostics should preserve enough evidence to debug safely +without copying raw payloads into logs, exceptions, or cached error objects. +""" + +from __future__ import annotations + +import hashlib + +__all__ = [ + "fingerprint_text", + "redacted_custom_function_failure", + "redacted_loader_snippet", + "redacted_parse_failure", +] + +_FINGERPRINT_HEX_LEN = 16 + + +def _fingerprint_bytes(value: str) -> tuple[int, str]: + """Return UTF-8 byte length plus a short stable fingerprint.""" + encoded = value.encode("utf-8", errors="surrogatepass") + digest = hashlib.blake2b(encoded, digest_size=12).hexdigest() + return (len(encoded), digest[:_FINGERPRINT_HEX_LEN]) + + +def fingerprint_text(value: object, *, label: str) -> str: + """Summarize arbitrary text-like data without exposing the raw payload.""" + rendered = str(value) + byte_length, digest = _fingerprint_bytes(rendered) + return f"{label}[bytes={byte_length}, blake2b={digest}]" + + +def redacted_parse_failure(value: object, *, parse_type: str) -> str: + """Produce a stable redacted identifier for parse failure context.""" + return fingerprint_text(value, label=f"{parse_type}_input") + + +def redacted_loader_snippet(value: str) -> str: + """Summarize a malformed resource chunk without logging its content. + + Premise: + Resource text often contains customer-visible or regulated content. + + Reason: + Structured fingerprints let operators correlate repeated failures + without the library becoming the leak point for the original text. 
+ """ + return fingerprint_text(value, label="resource_snippet") + + +def redacted_custom_function_failure(error: BaseException) -> str: + """Describe a custom-function crash without disclosing exception text.""" + if error.args: + detail = fingerprint_text(" ".join(str(arg) for arg in error.args), label="detail") + return f"uncaught {type(error).__name__} ({detail})" + return f"uncaught {type(error).__name__}" diff --git a/src/ftllexengine/diagnostics/codes.py b/src/ftllexengine/diagnostics/codes.py index f06fb932..0d532f47 100644 --- a/src/ftllexengine/diagnostics/codes.py +++ b/src/ftllexengine/diagnostics/codes.py @@ -56,11 +56,17 @@ class FrozenErrorContext: making the error object mutable. Attributes: - input_value: String that failed to parse (empty if not applicable) + input_value: Redacted fingerprint summary of the input that failed to + parse (empty if not applicable). The library stores a stable + summary rather than raw user or resource text so diagnostics remain + safe to surface by default. locale_code: Locale used for parsing/formatting (empty if not applicable) parse_type: Type of parsing attempted; one of the known parse domains, or ``""`` (empty string sentinel) when not applicable. - fallback_value: Value to use in output when formatting fails + fallback_value: Fallback string selected for output when formatting + fails. Cache-retained snapshots may redact this value before + storage, but the live runtime context keeps the real fallback so + formatting behavior remains correct. 
""" input_value: str = "" @@ -105,6 +111,7 @@ class DiagnosticCode(Enum): PLURAL_SUPPORT_UNAVAILABLE = 2013 FORMATTING_FAILED = 2014 EXPANSION_BUDGET_EXCEEDED = 2015 + REENTRANT_FORMATTING_BLOCKED = 2016 # Syntax errors (3000-3999) # 3001: UNEXPECTED_EOF - parser cursor signals early end of input diff --git a/src/ftllexengine/diagnostics/errors.py b/src/ftllexengine/diagnostics/errors.py index ad9a50e7..60739d2c 100644 --- a/src/ftllexengine/diagnostics/errors.py +++ b/src/ftllexengine/diagnostics/errors.py @@ -16,10 +16,12 @@ import hashlib import hmac +from dataclasses import replace from typing import final from ftllexengine.integrity import PYTHON_EXCEPTION_ATTRS, ImmutabilityViolationError +from ._redaction import fingerprint_text from .codes import Diagnostic, ErrorCategory, FrozenErrorContext # ruff: noqa: RUF022 - __all__ organized by category for readability, not alphabetically @@ -379,6 +381,36 @@ def verify_integrity(self) -> bool: ) return hmac.compare_digest(self._content_hash, expected) + def sanitized_for_cache(self) -> FrozenFluentError: + """Return the retained cache snapshot for this error. + + Premise: + Runtime formatting errors may carry a user-visible fallback string + so the resolver can keep output behavior correct for the current + call. + + Reason: + Cache retention has a different contract from live resolution. The + cache keeps a sanitized evidence copy so cached error inspection, + strict-mode cache hits, and crash artifacts do not retain the raw + fallback payload longer than necessary. 
+ """ + if self._context is None or not self._context.fallback_value: + return self + if self._context.fallback_value.startswith("fallback[bytes="): + return self + + sanitized_context = replace( + self._context, + fallback_value=fingerprint_text(self._context.fallback_value, label="fallback"), + ) + return FrozenFluentError( + self._message, + self._category, + diagnostic=self._diagnostic, + context=sanitized_context, + ) + @property def message(self) -> str: """Human-readable error description.""" diff --git a/src/ftllexengine/diagnostics/template_parsing.py b/src/ftllexengine/diagnostics/template_parsing.py index 00686e09..479e3b30 100644 --- a/src/ftllexengine/diagnostics/template_parsing.py +++ b/src/ftllexengine/diagnostics/template_parsing.py @@ -1,140 +1,13 @@ -"""Parsing and locale-input error template mixins.""" +"""Parsing error template composition.""" from __future__ import annotations -from .codes import Diagnostic, DiagnosticCode +from .template_parsing_currency import _ParsingCurrencyErrorTemplateMixin +from .template_parsing_input import _ParsingInputErrorTemplateMixin -class _ParsingErrorTemplateMixin: - """ErrorTemplate methods for user-input parsing failures.""" - - @staticmethod - def parse_decimal_failed( - value: str, - locale_code: str, - reason: str, - ) -> Diagnostic: - """Decimal parsing failed.""" - msg = f"Failed to parse decimal '{value}' for locale '{locale_code}': {reason}" - return Diagnostic( - code=DiagnosticCode.PARSE_DECIMAL_FAILED, - message=msg, - span=None, - hint="Check that the decimal format matches the locale's conventions", - ) - - @staticmethod - def parse_date_failed( - value: str, - locale_code: str, - reason: str, - ) -> Diagnostic: - """Date parsing failed.""" - msg = f"Failed to parse date '{value}' for locale '{locale_code}': {reason}" - return Diagnostic( - code=DiagnosticCode.PARSE_DATE_FAILED, - message=msg, - span=None, - hint="Use ISO 8601 (YYYY-MM-DD) for unambiguous, locale-independent dates", - ) - - 
@staticmethod - def parse_datetime_failed( - value: str, - locale_code: str, - reason: str, - ) -> Diagnostic: - """Datetime parsing failed.""" - msg = f"Failed to parse datetime '{value}' for locale '{locale_code}': {reason}" - return Diagnostic( - code=DiagnosticCode.PARSE_DATETIME_FAILED, - message=msg, - span=None, - hint="Use ISO 8601 (YYYY-MM-DD HH:MM:SS) for unambiguous, locale-independent datetimes", - ) - - @staticmethod - def parse_currency_failed( - value: str, - locale_code: str, - reason: str, - ) -> Diagnostic: - """Currency parsing failed.""" - msg = f"Failed to parse currency '{value}' for locale '{locale_code}': {reason}" - return Diagnostic( - code=DiagnosticCode.PARSE_CURRENCY_FAILED, - message=msg, - span=None, - hint="Use ISO currency codes (USD, EUR, GBP) for unambiguous parsing", - ) - - @staticmethod - def parse_locale_unknown(locale_code: str) -> Diagnostic: - """Unknown locale for parsing.""" - msg = f"Unknown locale '{locale_code}'" - return Diagnostic( - code=DiagnosticCode.PARSE_LOCALE_UNKNOWN, - message=msg, - span=None, - hint="Use BCP 47 locale codes (e.g., 'en_US', 'de_DE', 'lv_LV')", - ) - - @staticmethod - def parse_currency_ambiguous( - symbol: str, - value: str, - ) -> Diagnostic: - """Ambiguous currency symbol.""" - msg = ( - f"Ambiguous currency symbol '{symbol}' in '{value}'. " - f"Symbol '{symbol}' is used by multiple currencies." 
- ) - return Diagnostic( - code=DiagnosticCode.PARSE_CURRENCY_AMBIGUOUS, - message=msg, - span=None, - hint="Use default_currency parameter, infer_from_locale=True, or ISO code (USD, EUR)", - ) - - @staticmethod - def parse_currency_symbol_unknown( - symbol: str, - value: str, - ) -> Diagnostic: - """Unknown currency symbol.""" - msg = f"Unknown currency symbol '{symbol}' in '{value}'" - return Diagnostic( - code=DiagnosticCode.PARSE_CURRENCY_SYMBOL_UNKNOWN, - message=msg, - span=None, - hint="Use ISO currency codes (USD, EUR, GBP) or supported symbols", - ) - - @staticmethod - def parse_currency_code_invalid( - code: str, - value: str, - ) -> Diagnostic: - """Invalid ISO 4217 currency code.""" - msg = f"Invalid ISO 4217 currency code '{code}' in '{value}'" - return Diagnostic( - code=DiagnosticCode.PARSE_CURRENCY_CODE_INVALID, - message=msg, - span=None, - hint="Use valid ISO 4217 codes (USD, EUR, GBP, JPY, etc.)", - ) - - @staticmethod - def parse_amount_invalid( - amount_str: str, - value: str, - reason: str, - ) -> Diagnostic: - """Invalid amount in currency string.""" - msg = f"Failed to parse amount '{amount_str}' from '{value}': {reason}" - return Diagnostic( - code=DiagnosticCode.PARSE_AMOUNT_INVALID, - message=msg, - span=None, - hint="Check that the amount format matches the locale's conventions", - ) +class _ParsingErrorTemplateMixin( + _ParsingCurrencyErrorTemplateMixin, + _ParsingInputErrorTemplateMixin, +): + """Compose parsing diagnostic template families into one mixin.""" diff --git a/src/ftllexengine/diagnostics/template_parsing_currency.py b/src/ftllexengine/diagnostics/template_parsing_currency.py new file mode 100644 index 00000000..3cbfbf3a --- /dev/null +++ b/src/ftllexengine/diagnostics/template_parsing_currency.py @@ -0,0 +1,95 @@ +"""Parsing diagnostics for currency-specific failures.""" + +from __future__ import annotations + +from ftllexengine.diagnostics._redaction import fingerprint_text, redacted_parse_failure + +from .codes import 
Diagnostic, DiagnosticCode + + +class _ParsingCurrencyErrorTemplateMixin: + """ErrorTemplate methods for currency parsing failures.""" + + @staticmethod + def parse_currency_failed( + value: object, + locale_code: str, + reason: str, + ) -> Diagnostic: + """Currency parsing failed.""" + value_summary = redacted_parse_failure(value, parse_type="currency") + msg = f"Failed to parse currency for locale '{locale_code}': {reason} ({value_summary})" + return Diagnostic( + code=DiagnosticCode.PARSE_CURRENCY_FAILED, + message=msg, + span=None, + hint="Use ISO currency codes (USD, EUR, GBP) for unambiguous parsing", + ) + + @staticmethod + def parse_currency_ambiguous( + symbol: object, + value: object, + ) -> Diagnostic: + """Ambiguous currency symbol.""" + value_summary = redacted_parse_failure(value, parse_type="currency") + symbol_summary = fingerprint_text(symbol, label="currency_symbol") + msg = ( + f"Ambiguous currency symbol in {value_summary}. " + f"Symbol {symbol_summary} is used by multiple currencies." 
+ ) + return Diagnostic( + code=DiagnosticCode.PARSE_CURRENCY_AMBIGUOUS, + message=msg, + span=None, + hint="Use default_currency parameter, infer_from_locale=True, or ISO code (USD, EUR)", + ) + + @staticmethod + def parse_currency_symbol_unknown( + symbol: object, + value: object, + ) -> Diagnostic: + """Unknown currency symbol.""" + value_summary = redacted_parse_failure(value, parse_type="currency") + symbol_summary = fingerprint_text(symbol, label="currency_symbol") + msg = f"Unknown currency symbol {symbol_summary} in {value_summary}" + return Diagnostic( + code=DiagnosticCode.PARSE_CURRENCY_SYMBOL_UNKNOWN, + message=msg, + span=None, + hint="Use ISO currency codes (USD, EUR, GBP) or supported symbols", + ) + + @staticmethod + def parse_currency_code_invalid( + code: object, + value: object, + ) -> Diagnostic: + """Invalid ISO 4217 currency code.""" + value_summary = redacted_parse_failure(value, parse_type="currency") + code_summary = fingerprint_text(code, label="currency_code") + msg = f"Invalid ISO 4217 currency code {code_summary} in {value_summary}" + return Diagnostic( + code=DiagnosticCode.PARSE_CURRENCY_CODE_INVALID, + message=msg, + span=None, + hint="Use valid ISO 4217 codes (USD, EUR, GBP, JPY, etc.)", + ) + + @staticmethod + def parse_amount_invalid( + amount_str: object, + value: object, + reason: str, + ) -> Diagnostic: + """Invalid amount in currency string.""" + value_summary = redacted_parse_failure(value, parse_type="currency") + amount_summary = fingerprint_text(amount_str, label="amount_fragment") + msg = f"Failed to parse amount {amount_summary} from {value_summary}: {reason}" + return Diagnostic( + code=DiagnosticCode.PARSE_AMOUNT_INVALID, + message=msg, + span=None, + hint="Check that the amount format matches the locale's conventions", + ) diff --git a/src/ftllexengine/diagnostics/template_parsing_input.py b/src/ftllexengine/diagnostics/template_parsing_input.py new file mode 100644 index 00000000..2de67c0c --- /dev/null +++ 
b/src/ftllexengine/diagnostics/template_parsing_input.py @@ -0,0 +1,70 @@ +"""Parsing diagnostics for locale and non-currency value input.""" + +from __future__ import annotations + +from ftllexengine.diagnostics._redaction import redacted_parse_failure + +from .codes import Diagnostic, DiagnosticCode + + +class _ParsingInputErrorTemplateMixin: + """ErrorTemplate methods for generic parsing failures.""" + + @staticmethod + def parse_decimal_failed( + value: object, + locale_code: str, + reason: str, + ) -> Diagnostic: + """Decimal parsing failed.""" + value_summary = redacted_parse_failure(value, parse_type="decimal") + msg = f"Failed to parse decimal for locale '{locale_code}': {reason} ({value_summary})" + return Diagnostic( + code=DiagnosticCode.PARSE_DECIMAL_FAILED, + message=msg, + span=None, + hint="Check that the decimal format matches the locale's conventions", + ) + + @staticmethod + def parse_date_failed( + value: object, + locale_code: str, + reason: str, + ) -> Diagnostic: + """Date parsing failed.""" + value_summary = redacted_parse_failure(value, parse_type="date") + msg = f"Failed to parse date for locale '{locale_code}': {reason} ({value_summary})" + return Diagnostic( + code=DiagnosticCode.PARSE_DATE_FAILED, + message=msg, + span=None, + hint="Use ISO 8601 (YYYY-MM-DD) for unambiguous, locale-independent dates", + ) + + @staticmethod + def parse_datetime_failed( + value: object, + locale_code: str, + reason: str, + ) -> Diagnostic: + """Datetime parsing failed.""" + value_summary = redacted_parse_failure(value, parse_type="datetime") + msg = f"Failed to parse datetime for locale '{locale_code}': {reason} ({value_summary})" + return Diagnostic( + code=DiagnosticCode.PARSE_DATETIME_FAILED, + message=msg, + span=None, + hint="Use ISO 8601 (YYYY-MM-DD HH:MM:SS) for unambiguous, locale-independent datetimes", + ) + + @staticmethod + def parse_locale_unknown(locale_code: str) -> Diagnostic: + """Unknown locale for parsing.""" + msg = f"Unknown locale 
'{locale_code}'" + return Diagnostic( + code=DiagnosticCode.PARSE_LOCALE_UNKNOWN, + message=msg, + span=None, + hint="Use BCP 47 locale codes (e.g., 'en_US', 'de_DE', 'lv_LV')", + ) diff --git a/src/ftllexengine/diagnostics/template_runtime.py b/src/ftllexengine/diagnostics/template_runtime.py index 9189a60d..2991c4b8 100644 --- a/src/ftllexengine/diagnostics/template_runtime.py +++ b/src/ftllexengine/diagnostics/template_runtime.py @@ -1,182 +1,23 @@ -"""Runtime and function error template mixins.""" +"""Runtime error template composition. -from __future__ import annotations - -from .codes import Diagnostic, DiagnosticCode -from .template_shared import docs_url - - -class _RuntimeErrorTemplateMixin: - """ErrorTemplate methods for runtime evaluation and function failures.""" - - @staticmethod - def function_not_found(function_name: str) -> Diagnostic: - """Function not found in registry.""" - msg = f"Function '{function_name}' not found" - return Diagnostic( - code=DiagnosticCode.FUNCTION_NOT_FOUND, - message=msg, - span=None, - hint="Built-in functions: NUMBER, DATETIME, CURRENCY. Check spelling.", - help_url=docs_url("functions.html"), - ) +Premise: + Runtime diagnostics cover two distinct concerns: function-boundary + failures and resolver/runtime-state failures. - @staticmethod - def function_failed(function_name: str, error_msg: str) -> Diagnostic: - """Function execution failed.""" - msg = f"Function '{function_name}' failed: {error_msg}" - return Diagnostic( - code=DiagnosticCode.FUNCTION_FAILED, - message=msg, - span=None, - hint="Check the function arguments and their types", - help_url=docs_url("functions.html"), - function_name=function_name, - ) +Reason: + Keeping the small composition owner here lets the focused mixin modules + stay below the architecture line budget while the public import surface + remains unchanged. 
+""" - @staticmethod - def formatting_failed( - function_name: str, - value: str, - error_reason: str, - ) -> Diagnostic: - """Locale-aware formatting failed.""" - msg = f"{function_name}() formatting failed for value '{value}': {error_reason}" - return Diagnostic( - code=DiagnosticCode.FORMATTING_FAILED, - message=msg, - span=None, - hint="Check that the value is valid for the specified format options", - help_url=docs_url("functions.html"), - function_name=function_name, - ) - - @staticmethod - def function_arity_mismatch( - function_name: str, - expected: int, - received: int, - ) -> Diagnostic: - """Function called with wrong number of positional arguments.""" - msg = ( - f"Function '{function_name}' expects {expected} argument(s), " - f"got {received}" - ) - return Diagnostic( - code=DiagnosticCode.FUNCTION_ARITY_MISMATCH, - message=msg, - span=None, - hint=f"Pass exactly {expected} value(s) to {function_name}()", - help_url=docs_url("functions.html"), - function_name=function_name, - ) - - @staticmethod - def type_mismatch( - function_name: str, - argument_name: str, - expected_type: str, - received_type: str, - *, - ftl_location: str | None = None, - ) -> Diagnostic: - """Type mismatch in function argument.""" - msg = f"Type mismatch in {function_name}(): expected {expected_type}, got {received_type}" - hint = f"Convert '{argument_name}' to {expected_type} before passing to {function_name}()" - return Diagnostic( - code=DiagnosticCode.TYPE_MISMATCH, - message=msg, - span=None, - hint=hint, - help_url=docs_url("functions.html"), - function_name=function_name, - argument_name=argument_name, - expected_type=expected_type, - received_type=received_type, - ftl_location=ftl_location, - ) - - @staticmethod - def invalid_argument( - function_name: str, - argument_name: str, - reason: str, - *, - ftl_location: str | None = None, - ) -> Diagnostic: - """Invalid argument value.""" - msg = f"Invalid argument '{argument_name}' in {function_name}(): {reason}" - return 
Diagnostic( - code=DiagnosticCode.INVALID_ARGUMENT, - message=msg, - span=None, - hint=f"Check the value of '{argument_name}' argument", - help_url=docs_url("functions.html"), - function_name=function_name, - argument_name=argument_name, - ftl_location=ftl_location, - ) - - @staticmethod - def argument_required( - function_name: str, - argument_name: str, - *, - ftl_location: str | None = None, - ) -> Diagnostic: - """Required argument not provided.""" - msg = f"Required argument '{argument_name}' not provided for {function_name}()" - return Diagnostic( - code=DiagnosticCode.ARGUMENT_REQUIRED, - message=msg, - span=None, - hint=f"Add '{argument_name}' argument to {function_name}() call", - help_url=docs_url("functions.html"), - function_name=function_name, - argument_name=argument_name, - ftl_location=ftl_location, - ) +from __future__ import annotations - @staticmethod - def pattern_invalid( - function_name: str, - pattern: str, - reason: str, - *, - ftl_location: str | None = None, - ) -> Diagnostic: - """Invalid format pattern.""" - msg = f"Invalid pattern in {function_name}(): {reason}" - return Diagnostic( - code=DiagnosticCode.PATTERN_INVALID, - message=msg, - span=None, - hint=f"Check pattern syntax: '{pattern}'", - help_url=docs_url("functions.html"), - function_name=function_name, - argument_name="pattern", - ftl_location=ftl_location, - severity="error", - ) +from .template_runtime_functions import _RuntimeFunctionErrorTemplateMixin +from .template_runtime_state import _RuntimeStateErrorTemplateMixin - @staticmethod - def unknown_expression(expr_type: str) -> Diagnostic: - """Unknown expression type encountered.""" - msg = f"Unknown expression type: {expr_type}" - return Diagnostic( - code=DiagnosticCode.UNKNOWN_EXPRESSION, - message=msg, - span=None, - hint="This is likely a bug in the parser or resolver", - ) - @staticmethod - def unexpected_eof(position: int) -> Diagnostic: - """Unexpected end of file.""" - msg = f"Unexpected EOF at position 
{position}" - return Diagnostic( - code=DiagnosticCode.UNEXPECTED_EOF, - message=msg, - span=None, - hint="Check for unclosed braces or incomplete syntax", - ) +class _RuntimeErrorTemplateMixin( + _RuntimeFunctionErrorTemplateMixin, + _RuntimeStateErrorTemplateMixin, +): + """Compose the runtime diagnostic template families into one mixin.""" diff --git a/src/ftllexengine/diagnostics/template_runtime_functions.py b/src/ftllexengine/diagnostics/template_runtime_functions.py new file mode 100644 index 00000000..4d545600 --- /dev/null +++ b/src/ftllexengine/diagnostics/template_runtime_functions.py @@ -0,0 +1,188 @@ +"""Runtime diagnostics for function and formatting boundaries.""" + +from __future__ import annotations + +from ._redaction import fingerprint_text +from .codes import Diagnostic, DiagnosticCode +from .template_shared import docs_url + + +class _RuntimeFunctionErrorTemplateMixin: + """ErrorTemplate methods for function calls and format helpers.""" + + @staticmethod + def function_not_found(function_name: str) -> Diagnostic: + """Function not found in registry.""" + msg = f"Function '{function_name}' not found" + return Diagnostic( + code=DiagnosticCode.FUNCTION_NOT_FOUND, + message=msg, + span=None, + hint="Built-in functions: NUMBER, DATETIME, CURRENCY. Check spelling.", + help_url=docs_url("functions.html"), + ) + + @staticmethod + def function_failed(function_name: str, error_detail: str | None = None) -> Diagnostic: + """Function execution failed. + + Premise: + Custom functions may wrap downstream systems and secrets. + + Reason: + The public diagnostic names the failing function and failure class + without echoing exception payloads into logs or API responses. 
+ """ + detail = error_detail if error_detail is not None else "custom function execution failed" + msg = f"Function '{function_name}' failed: {detail}" + return Diagnostic( + code=DiagnosticCode.FUNCTION_FAILED, + message=msg, + span=None, + hint="Check the function arguments and their types", + help_url=docs_url("functions.html"), + function_name=function_name, + ) + + @staticmethod + def formatting_failed( + function_name: str, + value: object, + error_reason: object, + *, + safe_reason: str | None = None, + ) -> Diagnostic: + """Locale-aware formatting failed. + + Premise: + Formatting helpers sit on the same trust boundary as parsing and + custom-function execution, so raw values and downstream exception + messages may contain user data or operational secrets. + + Reason: + The diagnostic surfaces stable fingerprints rather than the raw + payloads, preserving correlation value without turning error + reporting into an exfiltration path. A caller may optionally attach + one vetted high-level reason string when that improves usability + without disclosing user input. 
+ """ + value_summary = fingerprint_text(value, label="format_value") + reason_summary = fingerprint_text(error_reason, label="detail") + reason_prefix = f"{safe_reason} " if safe_reason is not None else "" + msg = ( + f"{function_name}() formatting failed for {value_summary}: " + f"{reason_prefix}{reason_summary}" + ) + return Diagnostic( + code=DiagnosticCode.FORMATTING_FAILED, + message=msg, + span=None, + hint="Check that the value is valid for the specified format options", + help_url=docs_url("functions.html"), + function_name=function_name, + ) + + @staticmethod + def function_arity_mismatch( + function_name: str, + expected: int, + received: int, + ) -> Diagnostic: + """Function called with wrong number of positional arguments.""" + msg = f"Function '{function_name}' expects {expected} argument(s), got {received}" + return Diagnostic( + code=DiagnosticCode.FUNCTION_ARITY_MISMATCH, + message=msg, + span=None, + hint=f"Pass exactly {expected} value(s) to {function_name}()", + help_url=docs_url("functions.html"), + function_name=function_name, + ) + + @staticmethod + def type_mismatch( + function_name: str, + argument_name: str, + expected_type: str, + received_type: str, + *, + ftl_location: str | None = None, + ) -> Diagnostic: + """Type mismatch in function argument.""" + msg = f"Type mismatch in {function_name}(): expected {expected_type}, got {received_type}" + hint = f"Convert '{argument_name}' to {expected_type} before passing to {function_name}()" + return Diagnostic( + code=DiagnosticCode.TYPE_MISMATCH, + message=msg, + span=None, + hint=hint, + help_url=docs_url("functions.html"), + function_name=function_name, + argument_name=argument_name, + expected_type=expected_type, + received_type=received_type, + ftl_location=ftl_location, + ) + + @staticmethod + def invalid_argument( + function_name: str, + argument_name: str, + reason: str, + *, + ftl_location: str | None = None, + ) -> Diagnostic: + """Invalid argument value.""" + msg = f"Invalid argument 
'{argument_name}' in {function_name}(): {reason}" + return Diagnostic( + code=DiagnosticCode.INVALID_ARGUMENT, + message=msg, + span=None, + hint=f"Check the value of '{argument_name}' argument", + help_url=docs_url("functions.html"), + function_name=function_name, + argument_name=argument_name, + ftl_location=ftl_location, + ) + + @staticmethod + def argument_required( + function_name: str, + argument_name: str, + *, + ftl_location: str | None = None, + ) -> Diagnostic: + """Required argument not provided.""" + msg = f"Required argument '{argument_name}' not provided for {function_name}()" + return Diagnostic( + code=DiagnosticCode.ARGUMENT_REQUIRED, + message=msg, + span=None, + hint=f"Add '{argument_name}' argument to {function_name}() call", + help_url=docs_url("functions.html"), + function_name=function_name, + argument_name=argument_name, + ftl_location=ftl_location, + ) + + @staticmethod + def pattern_invalid( + function_name: str, + pattern: str, + reason: str, + *, + ftl_location: str | None = None, + ) -> Diagnostic: + """Invalid format pattern.""" + msg = f"Invalid pattern in {function_name}(): {reason}" + return Diagnostic( + code=DiagnosticCode.PATTERN_INVALID, + message=msg, + span=None, + hint=f"Check pattern syntax: '{pattern}'", + help_url=docs_url("functions.html"), + function_name=function_name, + argument_name="pattern", + ftl_location=ftl_location, + severity="error", + ) diff --git a/src/ftllexengine/diagnostics/template_runtime_state.py b/src/ftllexengine/diagnostics/template_runtime_state.py new file mode 100644 index 00000000..d7b1fc95 --- /dev/null +++ b/src/ftllexengine/diagnostics/template_runtime_state.py @@ -0,0 +1,47 @@ +"""Runtime diagnostics for resolver and runtime-state failures.""" + +from __future__ import annotations + +from .codes import Diagnostic, DiagnosticCode +from .template_shared import docs_url + + +class _RuntimeStateErrorTemplateMixin: + """ErrorTemplate methods for runtime-state and resolver failures.""" + + 
@staticmethod + def unknown_expression(expr_type: str) -> Diagnostic: + """Unknown expression type encountered.""" + msg = f"Unknown expression type: {expr_type}" + return Diagnostic( + code=DiagnosticCode.UNKNOWN_EXPRESSION, + message=msg, + span=None, + hint="This is likely a bug in the parser or resolver", + ) + + @staticmethod + def unexpected_eof(position: int) -> Diagnostic: + """Unexpected end of file.""" + msg = f"Unexpected EOF at position {position}" + return Diagnostic( + code=DiagnosticCode.UNEXPECTED_EOF, + message=msg, + span=None, + hint="Check for unclosed braces or incomplete syntax", + ) + + @staticmethod + def reentrant_formatting_blocked() -> Diagnostic: + """Cross-thread bundle re-entry from a custom function was rejected.""" + msg = "Cross-thread format_pattern() re-entry from a custom function is blocked" + return Diagnostic( + code=DiagnosticCode.REENTRANT_FORMATTING_BLOCKED, + message=msg, + span=None, + hint=( + "Resolve nested formatting in the current call stack or return data " + "to the caller instead of invoking the bundle from a new thread" + ), + help_url=docs_url("functions.html"), + ) diff --git a/src/ftllexengine/diagnostics/validation.py b/src/ftllexengine/diagnostics/validation.py index 62cedf3b..31c1906a 100644 --- a/src/ftllexengine/diagnostics/validation.py +++ b/src/ftllexengine/diagnostics/validation.py @@ -37,7 +37,7 @@ class WarningSeverity(StrEnum): """Severity levels for validation warnings. 
Provides semantic differentiation between warning types: - - CRITICAL: Will cause runtime failure (e.g., undefined reference) + - CRITICAL: Validation must fail closed (e.g., undefined reference) - WARNING: May cause issues (e.g., duplicate ID, missing value) - INFO: Informational only (e.g., style suggestions) @@ -45,7 +45,7 @@ class WarningSeverity(StrEnum): critical_warnings = [w for w in warnings if w.severity == WarningSeverity.CRITICAL] """ - CRITICAL = "critical" # Will cause runtime failure + CRITICAL = "critical" # Validation must fail closed WARNING = "warning" # May cause issues INFO = "info" # Informational only @@ -166,8 +166,8 @@ class ValidationWarning: to display warning squiggles at the correct source location. Severity levels: - CRITICAL: Will cause runtime failure (e.g., undefined reference to message) - WARNING: May cause issues (e.g., duplicate ID overwrites previous) + CRITICAL: Validation must fail closed (e.g., undefined reference to message) + WARNING: May cause issues but does not invalidate the resource INFO: Informational only (e.g., unused term) """ @@ -252,14 +252,20 @@ class ValidationResult: @property def is_valid(self) -> bool: - """Check if validation passed (no errors or annotations). + """Check if validation passed. - Warnings do not affect validity - they're informational. + Critical warnings are validity failures because they represent + structural contradictions the runtime or registration layer will reject + or degrade on purpose. 
Returns: - True if no errors or annotations found + True if no errors, parser annotations, or critical warnings found """ - return len(self.errors) == 0 and len(self.annotations) == 0 + return ( + len(self.errors) == 0 + and len(self.annotations) == 0 + and self.critical_warning_count == 0 + ) @property def error_count(self) -> int: @@ -288,6 +294,13 @@ def warning_count(self) -> int: """ return len(self.warnings) + @property + def critical_warning_count(self) -> int: + """Get number of critical warnings that invalidate the resource.""" + return sum( + 1 for warning in self.warnings if warning.severity == WarningSeverity.CRITICAL + ) + @staticmethod def valid() -> ValidationResult: """Create a valid result with no errors, warnings, or annotations. @@ -303,7 +316,7 @@ def invalid( warnings: tuple[ValidationWarning, ...] = (), annotations: tuple[ParserAnnotation, ...] = (), ) -> ValidationResult: - """Create an invalid result with errors and/or annotations. + """Create an invalid result with errors, critical warnings, and/or annotations. 
Args: errors: Tuple of validation errors (default: empty) diff --git a/src/ftllexengine/integrity.py b/src/ftllexengine/integrity.py index 8f4d9e29..df1348ec 100644 --- a/src/ftllexengine/integrity.py +++ b/src/ftllexengine/integrity.py @@ -12,9 +12,11 @@ Hierarchy: DataIntegrityError (base - system failures) ├─ CacheCorruptionError (checksum mismatch) + ├─ CacheKeySerializationError (unsupported cache-key contract) ├─ FormattingIntegrityError (strict mode formatting failure) ├─ ImmutabilityViolationError (mutation attempt on frozen object) ├─ IntegrityCheckFailedError (generic verification failure) + ├─ ResourceConflictIntegrityError (duplicate or shadowed resource IDs) ├─ SyntaxIntegrityError (strict mode syntax error during resource loading) └─ WriteConflictError (write-once violation) @@ -32,11 +34,14 @@ __all__ = [ "CacheCorruptionError", + "CacheKeySerializationError", "DataIntegrityError", "FormattingIntegrityError", "ImmutabilityViolationError", "IntegrityCheckFailedError", "IntegrityContext", + "IntegrityEvidence", + "ResourceConflictIntegrityError", "SyntaxIntegrityError", "WriteConflictError", ] @@ -79,6 +84,24 @@ class IntegrityContext: wall_time_unix: float | None = None +@dataclass(frozen=True, slots=True) +class IntegrityEvidence: + """Immutable integrity payload detached from Python exception transport. + + Premise: + Python exception propagation mutates traceback and cause fields after + construction. + + Reason: + Incident evidence therefore needs its own immutable record so callers + can distinguish the integrity payload from mutable transport metadata. + """ + + error_type: str + message: str + context: IntegrityContext | None = None + + class DataIntegrityError(Exception): """Base exception for all data integrity failures. @@ -86,8 +109,10 @@ class DataIntegrityError(Exception): user-facing Fluent errors. They indicate corruption, bugs, or security incidents that should propagate to the top level. 
- This exception is immutable after construction to prevent - tampering with error evidence. + The integrity payload is immutable after construction to preserve the + evidence carried by ``.evidence``. Python's own exception transport fields + (traceback, cause, notes) remain mutable because the runtime sets them + during propagation. Subclasses are @final to prevent further inheritance. Both static analysis (mypy) and runtime (__init_subclass__) enforce finality on @@ -97,7 +122,7 @@ class DataIntegrityError(Exception): context: Structured diagnostic context for post-mortem analysis """ - __slots__ = ("_context", "_frozen") + __slots__ = ("_context", "_evidence", "_frozen") def __init_subclass__(cls, **kwargs: object) -> None: """Enforce @final on DataIntegrityError subclasses at class-definition time. @@ -123,6 +148,7 @@ def __init_subclass__(cls, **kwargs: object) -> None: # Type annotations for __slots__ attributes (mypy requirement) _context: IntegrityContext | None + _evidence: IntegrityEvidence _frozen: bool def __init__( @@ -138,6 +164,11 @@ def __init__( """ super().__init__(message) object.__setattr__(self, "_context", context) + object.__setattr__( + self, + "_evidence", + IntegrityEvidence(type(self).__name__, message, context), + ) object.__setattr__(self, "_frozen", True) def __setattr__(self, name: str, value: object) -> None: @@ -173,6 +204,11 @@ def context(self) -> IntegrityContext | None: """Structured diagnostic context.""" return self._context + @property + def evidence(self) -> IntegrityEvidence: + """Immutable evidence payload for post-mortem analysis.""" + return self._evidence + def __repr__(self) -> str: """Return detailed representation for debugging.""" return f"{self.__class__.__name__}({self.args[0]!r}, context={self._context!r})" @@ -183,7 +219,8 @@ class CacheCorruptionError(DataIntegrityError): """Checksum mismatch detected in cache entry. Raised when a cached value's checksum doesn't match the stored checksum. 
- This indicates memory corruption, hardware fault, or tampering. + This indicates accidental mutation, hardware fault, or a broken internal + cache invariant. This is a CRITICAL error that should trigger immediate investigation. The cache entry should be evicted and the operation retried. @@ -192,6 +229,20 @@ class CacheCorruptionError(DataIntegrityError): __slots__ = () +@final +class CacheKeySerializationError(DataIntegrityError): + """Cache request could not be encoded into the canonical key contract. + + Raised when cache-enabled formatting receives argument values that cannot be + converted into the versioned cache-key encoding. Because the cache's + integrity features depend on that key contract, the failure is surfaced as a + typed integrity error instead of silently bypassing write-once or corruption + checks. + """ + + __slots__ = () + + @final class ImmutabilityViolationError(DataIntegrityError): """Attempt to mutate an immutable object. @@ -220,6 +271,57 @@ class IntegrityCheckFailedError(DataIntegrityError): __slots__ = () +@final +class ResourceConflictIntegrityError(DataIntegrityError): + """Resource registration conflict detected before bundle mutation. + + The owning premise is that message and term registries are canonical bundle + state. Replacing definitions must therefore be an explicit caller choice, not + an incidental side effect of resource load order. + + Attributes: + duplicate_ids: Conflicting IDs defined more than once in the incoming resource + shadowed_ids: Existing bundle IDs the incoming resource attempts to replace + source_path: Optional source path for error context + """ + + __slots__ = ("_duplicate_ids", "_shadowed_ids", "_source_path") + + _duplicate_ids: tuple[str, ...] + _shadowed_ids: tuple[str, ...] + _source_path: str | None + + def __init__( + self, + message: str, + context: IntegrityContext | None = None, + *, + duplicate_ids: tuple[str, ...] = (), + shadowed_ids: tuple[str, ...] 
= (), + source_path: str | None = None, + ) -> None: + """Initialize ResourceConflictIntegrityError.""" + object.__setattr__(self, "_duplicate_ids", tuple(duplicate_ids)) + object.__setattr__(self, "_shadowed_ids", tuple(shadowed_ids)) + object.__setattr__(self, "_source_path", source_path) + super().__init__(message, context) + + @property + def duplicate_ids(self) -> tuple[str, ...]: + """IDs defined multiple times within the incoming resource.""" + return self._duplicate_ids + + @property + def shadowed_ids(self) -> tuple[str, ...]: + """Existing bundle IDs the incoming resource attempted to replace.""" + return self._shadowed_ids + + @property + def source_path(self) -> str | None: + """Optional source path for the conflicting resource.""" + return self._source_path + + @final class WriteConflictError(DataIntegrityError): """Write-once violation in cache. diff --git a/src/ftllexengine/introspection/message.py b/src/ftllexengine/introspection/message.py index ad3304a3..df53343e 100644 --- a/src/ftllexengine/introspection/message.py +++ b/src/ftllexengine/introspection/message.py @@ -18,7 +18,7 @@ import threading import weakref from dataclasses import dataclass, field -from typing import TYPE_CHECKING, assert_never +from typing import TYPE_CHECKING, assert_never, overload from ftllexengine.constants import MAX_DEPTH from ftllexengine.enums import ReferenceKind, VariableContext @@ -466,10 +466,29 @@ def _visit_variant(self, variant: Variant) -> None: # ============================================================================== +# Premise: these overloads are static typing contracts, not executable runtime +# paths. Excluding them keeps coverage focused on the concrete implementation +# below, which is the code that actually runs and can regress. +@overload # pragma: no cover def introspect_message( message: Message | Term, *, use_cache: bool = True, +) -> MessageIntrospection: ... 
+ + +@overload # pragma: no cover +def introspect_message( + message: object, + *, + use_cache: bool = True, +) -> MessageIntrospection: ... + + +def introspect_message( + message: object, + *, + use_cache: bool = True, ) -> MessageIntrospection: """Introspect a message or term and extract all metadata. @@ -498,7 +517,7 @@ def introspect_message( """ # Validate input type at API boundary (runtime check for callers ignoring type hints) if not isinstance(message, (Message, Term)): - msg = f"Expected Message or Term, got {type(message).__name__}" # type: ignore[unreachable] + msg = f"Expected Message or Term, got {type(message).__name__}" raise TypeError(msg) # Step 1: Check cache (lock briefly — O(1) dict lookup). diff --git a/src/ftllexengine/localization/__init__.py b/src/ftllexengine/localization/__init__.py index 372ad27d..9bc57ad6 100644 --- a/src/ftllexengine/localization/__init__.py +++ b/src/ftllexengine/localization/__init__.py @@ -2,7 +2,8 @@ Provides the full localization stack: type aliases, resource loading infrastructure, the multi-locale orchestrator, and the boot configuration -API for strict, audited localization initialization. +API for strict localization initialization with explicit cache-evidence +boundaries. Submodules: types - PEP 695 type aliases (MessageId, LocaleCode, ResourceId, FTLSource) @@ -12,7 +13,8 @@ boot - LocalizationBootConfig (one-call boot-validated assembly) Babel Optionality: - loading, types, CacheAuditLogEntry: Zero external dependencies; always importable. + loading, types, CacheDebugLogEntry, and CacheIntegrityEvent: + Zero external dependencies; always importable. orchestrator and boot require Babel (via FluentBundle). 
On parser-only installs the Babel-dependent names are absent from normal feature probing; direct access raises a missing-symbol error with runtime @@ -39,17 +41,17 @@ ResourceLoader, ResourceLoadResult, ) -from ftllexengine.runtime.cache import CacheAuditLogEntry +from ftllexengine.runtime.cache import CacheDebugLogEntry, CacheIntegrityEvent if TYPE_CHECKING: from ftllexengine.localization.boot import ( LocalizationBootConfig as LocalizationBootConfig, ) - from ftllexengine.localization.orchestrator import ( - FluentLocalization as FluentLocalization, + from ftllexengine.localization.cache_stats import ( + LocalizationCacheStats as LocalizationCacheStats, ) from ftllexengine.localization.orchestrator import ( - LocalizationCacheStats as LocalizationCacheStats, + FluentLocalization as FluentLocalization, ) _BABEL_AVAILABLE = is_babel_available() @@ -69,7 +71,8 @@ def __getattr__(name: str) -> object: optional_attrs=_BABEL_OPTIONAL_ATTRS, parser_only_hint=( "Parser-only usage still supports ResourceLoader, PathResourceLoader, " - "FallbackInfo, ResourceLoadResult, LoadSummary, and CacheAuditLogEntry." + "FallbackInfo, ResourceLoadResult, LoadSummary, CacheDebugLogEntry, " + "and CacheIntegrityEvent." 
), ) globals()[name] = value @@ -83,7 +86,8 @@ def __getattr__(name: str) -> object: # ruff: noqa: RUF022 - grouped localization exports mirror the reader-facing facade __all__: list[str] = [ - "CacheAuditLogEntry", + "CacheDebugLogEntry", + "CacheIntegrityEvent", "FallbackInfo", "FTLSource", "LoadStatus", diff --git a/src/ftllexengine/localization/cache_stats.py b/src/ftllexengine/localization/cache_stats.py new file mode 100644 index 00000000..3f7e3e94 --- /dev/null +++ b/src/ftllexengine/localization/cache_stats.py @@ -0,0 +1,25 @@ +"""Immutable cache statistics contracts for multi-locale localization.""" + +from __future__ import annotations + +from dataclasses import dataclass + +from ftllexengine.runtime.cache import CacheStats + +__all__ = ["LocalizationCacheStats"] + + +@dataclass(frozen=True, slots=True) +class LocalizationCacheStats(CacheStats): + """Aggregate cache statistics across all bundles in a ``FluentLocalization``. + + Premise: + Multi-locale cache reporting is a separate public contract from the + orchestrator implementation that happens to produce it. + + Reason: + Giving this snapshot its own module keeps cache-reporting imports + acyclic and makes the type the clear owner of its own semantics. 
+ """ + + bundle_count: int diff --git a/src/ftllexengine/localization/loading.py b/src/ftllexengine/localization/loading.py index 19c0194d..04d552a2 100644 --- a/src/ftllexengine/localization/loading.py +++ b/src/ftllexengine/localization/loading.py @@ -16,10 +16,15 @@ from __future__ import annotations +import codecs +import os +import stat as stat_module from dataclasses import dataclass, field from pathlib import Path from typing import TYPE_CHECKING, Protocol +from ftllexengine.constants import MAX_SOURCE_SIZE +from ftllexengine.core._limits import LimitArg, resolve_limit_arg from ftllexengine.core.locale_utils import require_locale_code from ftllexengine.enums import LoadStatus @@ -122,11 +127,17 @@ class PathResourceLoader: base_path: Path template with {locale} placeholder root_dir: Fixed root directory for path traversal validation. Defaults to parent of base_path if not specified. + max_source_bytes: Maximum bytes read from disk before aborting. + max_source_chars: Maximum decoded characters produced before aborting. """ base_path: str root_dir: str | None = None + max_source_bytes: LimitArg = None + max_source_chars: LimitArg = None _resolved_root: Path = field(init=False, repr=False) + _effective_max_source_bytes: int | None = field(init=False, repr=False) + _effective_max_source_chars: int | None = field(init=False, repr=False) def __post_init__(self) -> None: """Cache resolved root directory and validate template at initialization. 
@@ -154,6 +165,24 @@ def __post_init__(self) -> None: static_prefix = template_parts[0].rstrip("/\\") resolved = Path(static_prefix).resolve() if static_prefix else Path.cwd().resolve() object.__setattr__(self, "_resolved_root", resolved) + object.__setattr__( + self, + "_effective_max_source_bytes", + resolve_limit_arg( + self.max_source_bytes, + field_name="max_source_bytes", + default=MAX_SOURCE_SIZE, + ), + ) + object.__setattr__( + self, + "_effective_max_source_chars", + resolve_limit_arg( + self.max_source_chars, + field_name="max_source_chars", + default=MAX_SOURCE_SIZE, + ), + ) @staticmethod def _validate_locale(locale: LocaleCode) -> LocaleCode: @@ -258,19 +287,161 @@ def load(self, locale: LocaleCode, resource_id: ResourceId) -> FTLSource: self._validate_resource_id(resource_id) # Use replace() instead of format() to avoid KeyError if template - # contains other braces like "{version}" for future extensibility + # contains other braces like "{version}" for future extensibility. locale_path = self.base_path.replace("{locale}", normalized_locale) - base_dir = Path(locale_path).resolve() - full_path = (base_dir / resource_id).resolve() + lexical_base_dir = Path(locale_path).resolve(strict=False) + lexical_full_path = (lexical_base_dir / resource_id).resolve(strict=False) - if not self._is_safe_path(self._resolved_root, full_path): + try: + lexical_base_dir.relative_to(self._resolved_root) + lexical_full_path.relative_to(self._resolved_root) + except ValueError as error: msg = ( - f"Path traversal detected: resolved path escapes root directory. " + "Path traversal detected: lexical path escapes root directory. 
" f"locale='{locale}', resource_id='{resource_id}'" ) + raise ValueError(msg) from error + + file_fd = self._open_secure_file_fd(lexical_full_path) + try: + return self._read_text_bounded(file_fd) + finally: + os.close(file_fd) + + def _open_secure_file_fd(self, full_path: Path) -> int: + """Open a resource file without trusting symlinks or TOCTOU windows.""" + relative_parts = full_path.relative_to(self._resolved_root).parts + if len(relative_parts) == 0: + msg = f"Resource path {full_path!s} does not identify a file" raise ValueError(msg) - return full_path.read_text(encoding="utf-8") + if os.open in os.supports_dir_fd and hasattr(os, "O_NOFOLLOW"): + return self._open_secure_file_fd_posix(relative_parts) + return self._open_secure_file_fd_fallback(full_path) + + def _open_secure_file_fd_posix(self, relative_parts: tuple[str, ...]) -> int: + """Open one resource via root-relative file descriptors on POSIX. + + Premise: + Validation and open must happen in the same ownership domain. + + Reason: + Walking the path one component at a time with ``dir_fd`` and + ``O_NOFOLLOW`` closes the race between "path looked safe" and + "path was opened". 
+ """ + root_flags = os.O_RDONLY | getattr(os, "O_DIRECTORY", 0) | getattr(os, "O_CLOEXEC", 0) + nofollow = getattr(os, "O_NOFOLLOW", 0) + root_fd = os.open(self._resolved_root, root_flags) + current_fd = root_fd + file_fd: int | None = None + try: + for part in relative_parts[:-1]: + next_fd = os.open( + part, + root_flags | nofollow, + dir_fd=current_fd, + ) + if current_fd != root_fd: + os.close(current_fd) + current_fd = next_fd + + file_fd = os.open( + relative_parts[-1], + os.O_RDONLY | getattr(os, "O_CLOEXEC", 0) | nofollow, + dir_fd=current_fd, + ) + file_stat = os.fstat(file_fd) + self._require_regular_file(file_stat.st_mode, relative_parts[-1]) + return file_fd + except Exception: + if file_fd is not None: + os.close(file_fd) + raise + finally: + if current_fd != root_fd: + os.close(current_fd) + os.close(root_fd) + + @staticmethod + def _open_secure_file_fd_fallback(full_path: Path) -> int: + """Fallback open path when POSIX dir-fd walking is unavailable.""" + pre_stat = os.lstat(full_path) + if stat_module.S_ISLNK(pre_stat.st_mode): + msg = f"Symlink resources are not allowed: {full_path!s}" + raise OSError(msg) + file_fd = os.open(full_path, os.O_RDONLY | getattr(os, "O_CLOEXEC", 0)) + post_stat = os.fstat(file_fd) + if not stat_module.S_ISREG(post_stat.st_mode): + os.close(file_fd) + msg = f"Resource path must point to a regular file: {full_path!s}" + raise OSError(msg) + if ( + hasattr(pre_stat, "st_dev") + and hasattr(pre_stat, "st_ino") + and (pre_stat.st_dev, pre_stat.st_ino) != (post_stat.st_dev, post_stat.st_ino) + ): + os.close(file_fd) + msg = f"Resource path changed while opening: {full_path!s}" + raise OSError(msg) + return file_fd + + @staticmethod + def _require_regular_file(mode: int, display_path: str) -> None: + """Reject non-regular filesystem objects at the ownership seam.""" + if stat_module.S_ISREG(mode): + return + msg = f"Resource path must point to a regular file: {display_path!r}" + raise OSError(msg) + + def 
_read_text_bounded(self, file_fd: int) -> str: + """Read UTF-8 text with byte and decoded-character budgets.""" + decoder = codecs.getincrementaldecoder("utf-8")("strict") + byte_total = 0 + char_total = 0 + parts: list[str] = [] + + while True: + chunk = os.read(file_fd, 65_536) + if not chunk: + break + byte_total += len(chunk) + if ( + self._effective_max_source_bytes is not None + and byte_total > self._effective_max_source_bytes + ): + msg = ( + f"Resource byte length ({byte_total:,}) exceeds maximum " + f"({self._effective_max_source_bytes:,})." + ) + raise ValueError(msg) + + decoded = decoder.decode(chunk, final=False) + char_total += len(decoded) + if ( + self._effective_max_source_chars is not None + and char_total > self._effective_max_source_chars + ): + msg = ( + f"Resource text length ({char_total:,} characters) exceeds maximum " + f"({self._effective_max_source_chars:,})." + ) + raise ValueError(msg) + parts.append(decoded) + + tail = decoder.decode(b"", final=True) + char_total += len(tail) + if ( + self._effective_max_source_chars is not None + and char_total > self._effective_max_source_chars + ): + msg = ( + f"Resource text length ({char_total:,} characters) exceeds maximum " + f"({self._effective_max_source_chars:,})." 
+ ) + raise ValueError(msg) + parts.append(tail) + return "".join(parts) @dataclass(frozen=True, slots=True) diff --git a/src/ftllexengine/localization/orchestrator.py b/src/ftllexengine/localization/orchestrator.py index d5e2148e..8fd27328 100644 --- a/src/ftllexengine/localization/orchestrator.py +++ b/src/ftllexengine/localization/orchestrator.py @@ -35,11 +35,13 @@ from typing import TYPE_CHECKING +from ftllexengine.constants import DEFAULT_MAX_EXPANSION_SIZE, MAX_SOURCE_SIZE +from ftllexengine.core._limits import LimitArg, resolve_limit_arg from ftllexengine.core.locale_utils import require_locale_code +from ftllexengine.localization.cache_stats import LocalizationCacheStats from ftllexengine.localization.orchestrator_formatting import _LocalizationFormattingMixin from ftllexengine.localization.orchestrator_loading import _LocalizationLoadingMixin from ftllexengine.localization.orchestrator_queries import _LocalizationQueryMixin -from ftllexengine.runtime.cache import CacheStats from ftllexengine.runtime.locale_context import LocaleContext from ftllexengine.runtime.rwlock import RWLock @@ -55,17 +57,6 @@ __all__ = ["FluentLocalization", "LocalizationCacheStats"] -class LocalizationCacheStats(CacheStats, total=True): - """Aggregate cache statistics across all bundles in a FluentLocalization. - - Extends CacheStats with an additional field tracking the number of - bundles contributing to the aggregated metrics. 
- """ - - bundle_count: int - """Number of initialized bundles contributing to these statistics.""" - - class FluentLocalization( _LocalizationQueryMixin, _LocalizationFormattingMixin, @@ -110,6 +101,10 @@ class FluentLocalization( "_load_results", "_locales", "_lock", + "_max_expansion_size", + "_max_parse_errors", + "_max_source_size", + "_max_stream_line_length", "_on_fallback", "_pending_functions", "_primary_locale", @@ -128,6 +123,10 @@ def __init__( use_isolating: bool = True, cache: CacheConfig | None = None, on_fallback: Callable[[FallbackInfo], None] | None = None, + max_source_size: LimitArg = None, + max_parse_errors: LimitArg = None, + max_stream_line_length: LimitArg = None, + max_expansion_size: LimitArg = None, strict: bool = True, ) -> None: """Initialize multi-locale localization. @@ -145,6 +144,10 @@ def __init__( debugging and monitoring which messages are missing translations. The callback receives a FallbackInfo with requested_locale, resolved_locale, and message_id. + max_source_size: Maximum decoded FTL source size accepted per bundle. + max_parse_errors: Maximum Junk entries accepted before parse abort. + max_stream_line_length: Maximum line length accepted by stream parsing. + max_expansion_size: Maximum resolved output characters per format call. strict: Fail-fast on formatting errors (default: True). When True, syntax errors in resources raise SyntaxIntegrityError and formatting errors raise FormattingIntegrityError. 
@@ -183,6 +186,29 @@ def __init__( self._use_isolating = use_isolating self._cache_config: CacheConfig | None = cache self._on_fallback = on_fallback + self._max_source_size = resolve_limit_arg( + max_source_size, + field_name="max_source_size", + default=MAX_SOURCE_SIZE, + ) + self._max_parse_errors = resolve_limit_arg( + max_parse_errors, + field_name="max_parse_errors", + default=100, + ) + stream_default = ( + self._max_source_size if self._max_source_size is not None else MAX_SOURCE_SIZE + ) + self._max_stream_line_length = resolve_limit_arg( + max_stream_line_length, + field_name="max_stream_line_length", + default=stream_default, + ) + self._max_expansion_size = resolve_limit_arg( + max_expansion_size, + field_name="max_expansion_size", + default=DEFAULT_MAX_EXPANSION_SIZE, + ) self._strict = strict # Bundle storage: only contains initialized bundles (no None markers). diff --git a/src/ftllexengine/localization/orchestrator_formatting.py b/src/ftllexengine/localization/orchestrator_formatting.py index 832e044c..2521a84b 100644 --- a/src/ftllexengine/localization/orchestrator_formatting.py +++ b/src/ftllexengine/localization/orchestrator_formatting.py @@ -25,7 +25,11 @@ class _LocalizationFormattingMixin: """Formatting and mutation behavior for FluentLocalization.""" def add_resource( - self: LocalizationStateProtocol, locale: LocaleCode, ftl_source: FTLSource + self: LocalizationStateProtocol, + locale: LocaleCode, + ftl_source: FTLSource, + *, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: """Add FTL resource to a specific locale bundle.""" normalized_locale = require_locale_code(locale, "locale") @@ -37,7 +41,10 @@ def add_resource( if normalized_locale not in self._bundles: self._create_bundle(normalized_locale) - return self._bundles[normalized_locale].add_resource(ftl_source) + return self._bundles[normalized_locale].add_resource( + ftl_source, + allow_overwrite=allow_overwrite, + ) def add_resource_stream( self: LocalizationStateProtocol, @@ 
-45,6 +52,7 @@ def add_resource_stream( lines: Iterable[str], *, source_path: str | None = None, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: """Add FTL resource to a locale bundle from a line-oriented stream.""" normalized_locale = require_locale_code(locale, "locale") @@ -57,7 +65,9 @@ def add_resource_stream( if normalized_locale not in self._bundles: self._create_bundle(normalized_locale) return self._bundles[normalized_locale].add_resource_stream( - lines, source_path=source_path + lines, + source_path=source_path, + allow_overwrite=allow_overwrite, ) def _handle_message_not_found( diff --git a/src/ftllexengine/localization/orchestrator_loading.py b/src/ftllexengine/localization/orchestrator_loading.py index e16a230b..b64397ff 100644 --- a/src/ftllexengine/localization/orchestrator_loading.py +++ b/src/ftllexengine/localization/orchestrator_loading.py @@ -34,6 +34,10 @@ def _create_bundle( locale, use_isolating=self._use_isolating, cache=self._cache_config, + max_source_size=self._max_source_size, + max_parse_errors=self._max_parse_errors, + max_stream_line_length=self._max_stream_line_length, + max_expansion_size=self._max_expansion_size, strict=self._strict, ) for name, func in self._pending_functions.items(): diff --git a/src/ftllexengine/localization/orchestrator_protocols.py b/src/ftllexengine/localization/orchestrator_protocols.py index 49e83d5b..93b38a00 100644 --- a/src/ftllexengine/localization/orchestrator_protocols.py +++ b/src/ftllexengine/localization/orchestrator_protocols.py @@ -30,6 +30,10 @@ class LocalizationStateProtocol(Protocol): _load_results: list[ResourceLoadResult] _locales: tuple[LocaleCode, ...] 
_lock: RWLock + _max_expansion_size: int | None + _max_parse_errors: int | None + _max_source_size: int | None + _max_stream_line_length: int | None _on_fallback: Callable[[FallbackInfo], None] | None _pending_functions: dict[str, Callable[..., FluentValue]] _primary_locale: LocaleCode diff --git a/src/ftllexengine/localization/orchestrator_queries.py b/src/ftllexengine/localization/orchestrator_queries.py index f282bf6b..966f79cd 100644 --- a/src/ftllexengine/localization/orchestrator_queries.py +++ b/src/ftllexengine/localization/orchestrator_queries.py @@ -2,7 +2,10 @@ from __future__ import annotations -from typing import TYPE_CHECKING, cast +from dataclasses import dataclass +from typing import TYPE_CHECKING + +from ftllexengine.localization.cache_stats import LocalizationCacheStats if TYPE_CHECKING: from collections.abc import Iterator @@ -10,13 +13,106 @@ from ftllexengine.core.semantic_types import FTLSource, LocaleCode, MessageId from ftllexengine.diagnostics import ValidationResult from ftllexengine.introspection import MessageIntrospection - from ftllexengine.localization.orchestrator import LocalizationCacheStats from ftllexengine.localization.orchestrator_protocols import LocalizationStateProtocol from ftllexengine.runtime.bundle import FluentBundle - from ftllexengine.runtime.cache import CacheAuditLogEntry + from ftllexengine.runtime.cache import CacheDebugLogEntry, CacheStats from ftllexengine.syntax import Message, Term +@dataclass(slots=True) +class _LocalizationCacheAccumulator: + """Aggregate cache stats across initialized locale bundles. + + Premise: + Multi-locale cache reporting is one public contract even though each + locale bundle owns its own cache. + + Reason: + Keeping the accumulation state in one focused helper avoids a long + monolithic query method and makes the aggregation rules explicit. 
+ """ + + total_size: int = 0 + total_maxsize: int = 0 + total_hits: int = 0 + total_misses: int = 0 + total_unhashable: int = 0 + total_oversize: int = 0 + total_error_bloat: int = 0 + total_combined_payload: int = 0 + total_corruption: int = 0 + total_integrity_events: int = 0 + total_idempotent: int = 0 + total_write_once_conflicts: int = 0 + total_uncacheable_function_skips: int = 0 + total_sequence: int = 0 + total_debug_log_entries: int = 0 + max_cache_generation: int = 0 + first_write_once: bool = False + first_debug_log_enabled: bool = False + first_max_entry_payload_bytes: int = 0 + first_max_errors: int = 0 + saw_stats: bool = False + + def include(self, stats: CacheStats) -> None: + """Merge one bundle cache snapshot into the aggregate.""" + self.total_size += stats.size + self.total_maxsize += stats.maxsize + self.total_hits += stats.hits + self.total_misses += stats.misses + self.total_unhashable += stats.unhashable_skips + self.total_oversize += stats.oversize_skips + self.total_error_bloat += stats.error_bloat_skips + self.total_combined_payload += stats.combined_payload_skips + self.total_corruption += stats.corruption_detected + self.total_integrity_events += stats.integrity_events_emitted + self.total_idempotent += stats.idempotent_writes + self.total_write_once_conflicts += stats.write_once_conflicts + self.total_uncacheable_function_skips += stats.uncacheable_function_skips + self.total_sequence += stats.sequence + self.total_debug_log_entries += stats.debug_log_entries + self.max_cache_generation = max(self.max_cache_generation, stats.cache_generation) + if not self.saw_stats: + self.first_write_once = stats.write_once + self.first_debug_log_enabled = stats.debug_log_enabled + self.first_max_entry_payload_bytes = stats.max_entry_payload_bytes + self.first_max_errors = stats.max_errors_per_entry + self.saw_stats = True + + def build(self, *, bundle_count: int) -> LocalizationCacheStats: + """Materialize the public aggregate cache snapshot.""" + 
total_requests = self.total_hits + self.total_misses + hit_rate = ( + self.total_hits / total_requests * 100 + if total_requests > 0 + else 0.0 + ) + return LocalizationCacheStats( + size=self.total_size, + maxsize=self.total_maxsize, + max_entry_payload_bytes=self.first_max_entry_payload_bytes, + max_errors_per_entry=self.first_max_errors, + hits=self.total_hits, + misses=self.total_misses, + hit_rate=round(hit_rate, 2), + unhashable_skips=self.total_unhashable, + oversize_skips=self.total_oversize, + error_bloat_skips=self.total_error_bloat, + combined_payload_skips=self.total_combined_payload, + corruption_detected=self.total_corruption, + integrity_events_emitted=self.total_integrity_events, + idempotent_writes=self.total_idempotent, + write_once_conflicts=self.total_write_once_conflicts, + uncacheable_function_skips=self.total_uncacheable_function_skips, + sequence=self.total_sequence, + cache_generation=self.max_cache_generation, + write_once=self.first_write_once, + debug_log_enabled=self.first_debug_log_enabled, + debug_log_entries=self.total_debug_log_entries, + bundle_count=bundle_count, + ) + + class _LocalizationQueryMixin: """Read-only query behavior for FluentLocalization.""" @@ -140,100 +236,35 @@ def get_cache_stats( return None with self._lock.read(): - total_size = 0 - total_maxsize = 0 - total_hits = 0 - total_misses = 0 - total_unhashable = 0 - total_oversize = 0 - total_error_bloat = 0 - total_combined_weight = 0 - total_corruption = 0 - total_idempotent = 0 - total_write_once_conflicts = 0 - total_sequence = 0 - total_audit_entries = 0 - first_write_once = False - first_strict = False - first_audit_enabled = False - first_max_entry_weight = 0 - first_max_errors = 0 - is_first = True + accumulator = _LocalizationCacheAccumulator() for bundle in self._bundles.values(): stats = bundle.get_cache_stats() if stats is None: continue + accumulator.include(stats) + + return accumulator.build(bundle_count=len(self._bundles)) - total_size += 
stats["size"] - total_maxsize += stats["maxsize"] - total_hits += stats["hits"] - total_misses += stats["misses"] - total_unhashable += stats["unhashable_skips"] - total_oversize += stats["oversize_skips"] - total_error_bloat += stats["error_bloat_skips"] - total_combined_weight += stats["combined_weight_skips"] - total_corruption += stats["corruption_detected"] - total_idempotent += stats["idempotent_writes"] - total_write_once_conflicts += stats["write_once_conflicts"] - total_sequence += stats["sequence"] - total_audit_entries += stats["audit_entries"] - if is_first: - first_write_once = stats["write_once"] - first_strict = stats["strict"] - first_audit_enabled = stats["audit_enabled"] - first_max_entry_weight = stats["max_entry_weight"] - first_max_errors = stats["max_errors_per_entry"] - is_first = False - - total_requests = total_hits + total_misses - hit_rate = (total_hits / total_requests * 100) if total_requests > 0 else 0.0 - - return cast( - "LocalizationCacheStats", - { - "size": total_size, - "maxsize": total_maxsize, - "max_entry_weight": first_max_entry_weight, - "max_errors_per_entry": first_max_errors, - "hits": total_hits, - "misses": total_misses, - "hit_rate": round(hit_rate, 2), - "unhashable_skips": total_unhashable, - "oversize_skips": total_oversize, - "error_bloat_skips": total_error_bloat, - "combined_weight_skips": total_combined_weight, - "corruption_detected": total_corruption, - "idempotent_writes": total_idempotent, - "write_once_conflicts": total_write_once_conflicts, - "sequence": total_sequence, - "write_once": first_write_once, - "strict": first_strict, - "audit_enabled": first_audit_enabled, - "audit_entries": total_audit_entries, - "bundle_count": len(self._bundles), - }, - ) - - def get_cache_audit_log( + def get_cache_debug_log( self: LocalizationStateProtocol, - ) -> dict[LocaleCode, tuple[CacheAuditLogEntry, ...]] | None: - """Return per-locale audit logs for initialized bundles.""" + ) -> dict[LocaleCode, 
tuple[CacheDebugLogEntry, ...]] | None: + """Return per-locale debug logs for initialized bundles.""" if self._cache_config is None: return None with self._lock.read(): - audit_logs: dict[LocaleCode, tuple[CacheAuditLogEntry, ...]] = {} + debug_logs: dict[LocaleCode, tuple[CacheDebugLogEntry, ...]] = {} for locale in self._locales: bundle = self._bundles.get(locale) if bundle is None: continue - audit_log = bundle.get_cache_audit_log() - if audit_log is not None: - audit_logs[locale] = audit_log + debug_log = bundle.get_cache_debug_log() + if debug_log is not None: + debug_logs[locale] = debug_log - return audit_logs + return debug_logs def get_bundles(self: LocalizationStateProtocol) -> Iterator[FluentBundle]: """Yield bundles in fallback order, creating them lazily as needed.""" diff --git a/src/ftllexengine/parsing/currency.py b/src/ftllexengine/parsing/currency.py index cadf19e9..7a655e3c 100644 --- a/src/ftllexengine/parsing/currency.py +++ b/src/ftllexengine/parsing/currency.py @@ -51,6 +51,7 @@ normalize_locale, ) from ftllexengine.diagnostics import ErrorCategory, FrozenErrorContext, FrozenFluentError +from ftllexengine.diagnostics._redaction import redacted_parse_failure from ftllexengine.diagnostics.templates import ErrorTemplate from ftllexengine.parsing.currency_maps import ( _FAST_TIER_UNAMBIGUOUS_SYMBOLS, @@ -124,6 +125,7 @@ def _resolve_currency_code( Tuple of (currency_code, error) - one will be None """ is_iso_code = _is_valid_iso_4217_format(currency_str) + value_summary = redacted_parse_failure(value, parse_type="currency") symbol_map, ambiguous_symbols, locale_to_currency, valid_iso_codes = _get_currency_maps() @@ -132,7 +134,9 @@ def _resolve_currency_code( if currency_str not in valid_iso_codes: diagnostic = ErrorTemplate.parse_currency_code_invalid(currency_str, value) context = FrozenErrorContext( - input_value=str(value), locale_code=locale_code, parse_type="currency" + input_value=value_summary, + locale_code=locale_code, + 
parse_type="currency", ) error = FrozenFluentError( str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context @@ -147,7 +151,9 @@ def _resolve_currency_code( if not _is_valid_iso_4217_format(default_currency): diagnostic = ErrorTemplate.parse_currency_code_invalid(default_currency, value) context = FrozenErrorContext( - input_value=str(value), locale_code=locale_code, parse_type="currency" + input_value=value_summary, + locale_code=locale_code, + parse_type="currency", ) error = FrozenFluentError( str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context @@ -166,7 +172,9 @@ def _resolve_currency_code( # No resolution available diagnostic = ErrorTemplate.parse_currency_ambiguous(currency_str, value) context = FrozenErrorContext( - input_value=str(value), locale_code=locale_code, parse_type="currency" + input_value=value_summary, + locale_code=locale_code, + parse_type="currency", ) error = FrozenFluentError( str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context @@ -178,7 +186,9 @@ def _resolve_currency_code( if mapped is None: diagnostic = ErrorTemplate.parse_currency_symbol_unknown(currency_str, value) context = FrozenErrorContext( - input_value=str(value), locale_code=locale_code, parse_type="currency" + input_value=value_summary, + locale_code=locale_code, + parse_type="currency", ) error = FrozenFluentError( str(diagnostic), ErrorCategory.PARSE, diagnostic=diagnostic, context=context @@ -259,7 +269,7 @@ def _detect_currency_symbol( value, locale_code, "No currency symbol or code found", ) context = FrozenErrorContext( - input_value=str(value), + input_value=redacted_parse_failure(value, parse_type="currency"), locale_code=locale_code, parse_type="currency", ) @@ -316,7 +326,7 @@ def _parse_currency_amount( number_str, value, str(failure_reason), ) context = FrozenErrorContext( - input_value=str(value), + input_value=redacted_parse_failure(value, parse_type="currency"), locale_code=locale_code, 
parse_type="currency", ) @@ -332,7 +342,7 @@ def _parse_currency_amount( def parse_currency( - value: str, + value: object, locale_code: str, *, default_currency: str | None = None, @@ -418,15 +428,16 @@ def parse_currency( unknown_locale_error_class = get_unknown_locale_error_class() number_format_error_class = get_number_format_error_class() parse_decimal = get_parse_decimal_func() + value_summary = redacted_parse_failure(value, parse_type="currency") if not isinstance(value, str): - diagnostic = ErrorTemplate.parse_currency_failed( # type: ignore[unreachable] - str(value), + diagnostic = ErrorTemplate.parse_currency_failed( + value, locale_code, f"Expected string, got {type(value).__name__}", ) context = FrozenErrorContext( - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) @@ -445,7 +456,7 @@ def parse_currency( if not is_structurally_valid_locale_code(locale_code): diagnostic = ErrorTemplate.parse_locale_unknown(locale_code) context = FrozenErrorContext( - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) @@ -461,7 +472,7 @@ def parse_currency( except (unknown_locale_error_class, ValueError): diagnostic = ErrorTemplate.parse_locale_unknown(locale_code) context = FrozenErrorContext( - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) @@ -483,7 +494,7 @@ def parse_currency( value, locale_code, "No currency symbol or code found", ) context = FrozenErrorContext( # pragma: no cover - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) @@ -513,7 +524,7 @@ def parse_currency( value, locale_code, "Currency resolution failed", ) context = FrozenErrorContext( # pragma: no cover - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) @@ -543,7 +554,7 @@ def parse_currency( value, locale_code, "Amount parsing 
failed", ) context = FrozenErrorContext( # pragma: no cover - input_value=str(value), + input_value=value_summary, locale_code=locale_code, parse_type="currency", ) diff --git a/src/ftllexengine/parsing/date_patterns.py b/src/ftllexengine/parsing/date_patterns.py index 2e8c2bfa..0719158d 100644 --- a/src/ftllexengine/parsing/date_patterns.py +++ b/src/ftllexengine/parsing/date_patterns.py @@ -348,15 +348,25 @@ def _tokenize_babel_pattern(pattern: str) -> list[str]: return tokens -def _babel_to_strptime(babel_pattern: str) -> tuple[str, bool]: - """Convert one CLDR date/time pattern to a Python strptime pattern.""" +def _babel_to_strptime( + babel_pattern: str, + *, + token_map: dict[str, str | None] | None = None, +) -> tuple[str, bool]: + """Convert one CLDR date/time pattern to a Python strptime pattern. + + ``token_map`` is an explicit dependency so callers can customize token + handling without mutating shared module state. That keeps cache creation and + conversion free-thread-safe by construction. 
+    """
     tokens = _tokenize_babel_pattern(babel_pattern)
     result_parts: list[str] = []
     has_era = False
+    active_token_map = _BABEL_TOKEN_MAP if token_map is None else token_map

     for token in tokens:
-        if token in _BABEL_TOKEN_MAP:
-            mapped = _BABEL_TOKEN_MAP[token]
+        if token in active_token_map:
+            mapped = active_token_map[token]
             if mapped is None:
                 if token.startswith("G"):
                     has_era = True
diff --git a/src/ftllexengine/parsing/dates.py b/src/ftllexengine/parsing/dates.py
index e2c4b327..8e06e23e 100644
--- a/src/ftllexengine/parsing/dates.py
+++ b/src/ftllexengine/parsing/dates.py
@@ -38,10 +38,9 @@
 """

 from datetime import date, datetime, timezone
-from importlib import import_module
-from typing import TYPE_CHECKING, cast

 from ftllexengine.diagnostics import ErrorCategory, FrozenErrorContext, FrozenFluentError
+from ftllexengine.diagnostics._redaction import redacted_parse_failure
 from ftllexengine.diagnostics.templates import ErrorTemplate
 from ftllexengine.parsing.text_normalization import strip_bidi_format_chars

@@ -59,9 +58,6 @@
     clear_date_caches,
 )

-if TYPE_CHECKING:
-    from collections.abc import Callable
-
 __all__ = [
     "_BABEL_TOKEN_MAP",
     "_babel_to_strptime",
@@ -79,7 +75,6 @@
     "parse_datetime",
 ]

-_DATE_PATTERNS_MODULE = import_module("ftllexengine.parsing.date_patterns")
 _PRIVATE_DATE_EXPORTS = (
     _BABEL_TOKEN_MAP,
     _extract_datetime_separator,
@@ -92,19 +87,16 @@


 def _babel_to_strptime(babel_pattern: str) -> tuple[str, bool]:
-    """Convert one CLDR pattern using the patchable module-level token map."""
-    module_vars = vars(_DATE_PATTERNS_MODULE)
-    original_map = cast("dict[str, str | None]", module_vars["_BABEL_TOKEN_MAP"])
-    module_vars["_BABEL_TOKEN_MAP"] = _BABEL_TOKEN_MAP
-    try:
-        converter = cast("Callable[[str], tuple[str, bool]]", module_vars["_babel_to_strptime"])
-        return converter(babel_pattern)
-    finally:
-        module_vars["_BABEL_TOKEN_MAP"] = original_map
+    """Convert one CLDR pattern using the local token map without global mutation."""
+    from .date_patterns import (  # noqa: PLC0415 - local import avoids an unnecessary parser bootstrap dependency at module import time
+        _babel_to_strptime as convert_babel_pattern,
+    )
+
+    return convert_babel_pattern(babel_pattern, token_map=_BABEL_TOKEN_MAP)


 def parse_date(
-    value: str,
+    value: object,
     locale_code: str,
 ) -> tuple[date | None, tuple[FrozenFluentError, ...]]:
     """Parse locale-aware date string to date object.
@@ -151,14 +143,15 @@ def parse_date(
         Thread-safe. Uses Babel + stdlib (no global state).
     """
     errors: list[FrozenFluentError] = []
+    value_summary = redacted_parse_failure(value, parse_type="date")

     # Type check: value must be string (runtime defense for untyped callers)
     if not isinstance(value, str):
-        diagnostic = ErrorTemplate.parse_date_failed(  # type: ignore[unreachable]
-            str(value), locale_code, f"Expected string, got {type(value).__name__}"
+        diagnostic = ErrorTemplate.parse_date_failed(
+            value, locale_code, f"Expected string, got {type(value).__name__}"
         )
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="date",
         )
@@ -182,7 +175,7 @@ def parse_date(
         # Unknown locale
         diagnostic = ErrorTemplate.parse_locale_unknown(locale_code)
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="date",
         )
@@ -207,7 +200,7 @@ def parse_date(
         value, locale_code, "No matching date pattern found"
     )
     context = FrozenErrorContext(
-        input_value=str(value),
+        input_value=value_summary,
         locale_code=locale_code,
         parse_type="date",
     )
@@ -219,7 +212,7 @@ def parse_date(


 def parse_datetime(
-    value: str,
+    value: object,
     locale_code: str,
     *,
     tzinfo: timezone | None = None,
@@ -272,14 +265,15 @@ def parse_datetime(
         Thread-safe. Uses Babel + stdlib (no global state).
     """
     errors: list[FrozenFluentError] = []
+    value_summary = redacted_parse_failure(value, parse_type="datetime")

     # Type check: value must be string (runtime defense for untyped callers)
     if not isinstance(value, str):
-        diagnostic = ErrorTemplate.parse_datetime_failed(  # type: ignore[unreachable]
-            str(value), locale_code, f"Expected string, got {type(value).__name__}"
+        diagnostic = ErrorTemplate.parse_datetime_failed(
+            value, locale_code, f"Expected string, got {type(value).__name__}"
         )
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="datetime",
         )
@@ -306,7 +300,7 @@ def parse_datetime(
         # Unknown locale
         diagnostic = ErrorTemplate.parse_locale_unknown(locale_code)
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="datetime",
         )
@@ -334,7 +328,7 @@ def parse_datetime(
         value, locale_code, "No matching datetime pattern found"
     )
     context = FrozenErrorContext(
-        input_value=str(value),
+        input_value=value_summary,
         locale_code=locale_code,
         parse_type="datetime",
     )
diff --git a/src/ftllexengine/parsing/numbers.py b/src/ftllexengine/parsing/numbers.py
index e3277adc..b8aeb4c6 100644
--- a/src/ftllexengine/parsing/numbers.py
+++ b/src/ftllexengine/parsing/numbers.py
@@ -36,6 +36,7 @@
     FrozenFluentError,
     ParseResult,
 )
+from ftllexengine.diagnostics._redaction import redacted_parse_failure
 from ftllexengine.diagnostics.templates import ErrorTemplate
 from ftllexengine.parsing.text_normalization import strip_bidi_format_chars

@@ -173,7 +174,7 @@ def _parse_decimal_localized(


 def parse_decimal(
-    value: str,
+    value: object,
     locale_code: str,
 ) -> ParseResult[Decimal]:
     """Parse locale-aware number string to Decimal (financial precision).
@@ -229,13 +230,14 @@ def parse_decimal(
     unknown_locale_error_class = get_unknown_locale_error_class()
     number_format_error_class = get_number_format_error_class()
     babel_parse_decimal = get_parse_decimal_func()
+    value_summary = redacted_parse_failure(value, parse_type="decimal")

     if not isinstance(value, str):
-        diagnostic = ErrorTemplate.parse_decimal_failed(  # type: ignore[unreachable]
-            str(value), locale_code, f"Expected string, got {type(value).__name__}"
+        diagnostic = ErrorTemplate.parse_decimal_failed(
+            value, locale_code, f"Expected string, got {type(value).__name__}"
         )
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="decimal",
         )
@@ -253,7 +255,7 @@ def parse_decimal(
     if not is_structurally_valid_locale_code(locale_code):
         diagnostic = ErrorTemplate.parse_locale_unknown(locale_code)
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="decimal",
         )
@@ -267,7 +269,7 @@ def parse_decimal(
     except (unknown_locale_error_class, ValueError):
         diagnostic = ErrorTemplate.parse_locale_unknown(locale_code)
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="decimal",
         )
@@ -288,7 +290,7 @@ def parse_decimal(

         diagnostic = ErrorTemplate.parse_decimal_failed(value, locale_code, str(failure_reason))
         context = FrozenErrorContext(
-            input_value=str(value),
+            input_value=value_summary,
             locale_code=locale_code,
             parse_type="decimal",
         )
diff --git a/src/ftllexengine/runtime/__init__.py b/src/ftllexengine/runtime/__init__.py
index fd6b9b98..465f6088 100644
--- a/src/ftllexengine/runtime/__init__.py
+++ b/src/ftllexengine/runtime/__init__.py
@@ -19,7 +19,7 @@
 from ftllexengine.core.babel_compat import is_babel_available
 from ftllexengine.diagnostics import ValidationResult

-from .cache import CacheAuditLogEntry, WriteLogEntry
+from .cache import CacheDebugLogEntry, CacheIntegrityEvent, CacheIntegrityEventKind
 from .cache_config import CacheConfig
 from .function_bridge import FluentNumber, FunctionRegistry, fluent_function
 from .value_types import make_fluent_number
@@ -73,15 +73,15 @@ def __getattr__(name: str) -> object:
         optional_attrs=_BABEL_OPTIONAL_ATTRS,
     )

-
-
+# ruff: noqa: RUF022 - grouped runtime exports mirror the reader-facing facade
 __all__: list[str] = [
-    "CacheAuditLogEntry",
+    "CacheDebugLogEntry",
     "CacheConfig",
+    "CacheIntegrityEvent",
+    "CacheIntegrityEventKind",
     "FluentNumber",
     "FunctionRegistry",
     "ValidationResult",
-    "WriteLogEntry",
     "fluent_function",
     "make_fluent_number",
 ]
diff --git a/src/ftllexengine/runtime/__init__.pyi b/src/ftllexengine/runtime/__init__.pyi
index 4ebd72c6..264a81f0 100644
--- a/src/ftllexengine/runtime/__init__.pyi
+++ b/src/ftllexengine/runtime/__init__.pyi
@@ -2,8 +2,9 @@ from ftllexengine.diagnostics import ValidationResult as ValidationResult

 from .async_bundle import AsyncFluentBundle as AsyncFluentBundle
 from .bundle import FluentBundle as FluentBundle
-from .cache import CacheAuditLogEntry as CacheAuditLogEntry
-from .cache import WriteLogEntry as WriteLogEntry
+from .cache import CacheDebugLogEntry as CacheDebugLogEntry
+from .cache import CacheIntegrityEvent as CacheIntegrityEvent
+from .cache import CacheIntegrityEventKind as CacheIntegrityEventKind
 from .cache_config import CacheConfig as CacheConfig
 from .function_bridge import FluentNumber as FluentNumber
 from .function_bridge import FunctionRegistry as FunctionRegistry
@@ -16,15 +17,17 @@ from .functions import number_format as number_format
 from .plural_rules import select_plural_category as select_plural_category
 from .value_types import make_fluent_number as make_fluent_number

+# ruff: noqa: RUF022 - grouped runtime exports mirror the reader-facing facade
 __all__: list[str] = [
     "AsyncFluentBundle",
-    "CacheAuditLogEntry",
+    "CacheDebugLogEntry",
     "CacheConfig",
+    "CacheIntegrityEvent",
+    "CacheIntegrityEventKind",
     "FluentBundle",
     "FluentNumber",
     "FunctionRegistry",
     "ValidationResult",
-    "WriteLogEntry",
     "create_default_registry",
     "currency_format",
     "datetime_format",
diff --git a/src/ftllexengine/runtime/_resolution_gate.py b/src/ftllexengine/runtime/_resolution_gate.py
new file mode 100644
index 00000000..7bb9cd54
--- /dev/null
+++ b/src/ftllexengine/runtime/_resolution_gate.py
@@ -0,0 +1,80 @@
+"""Internal gate for bundle formatting re-entry ownership.
+
+This module closes the cross-thread seam left by ContextVar-only depth
+tracking. The gate allows ordinary concurrent top-level formatting, but when a
+bundle is executing opaque custom-function code it rejects fresh external
+formatting entry into that same bundle unless the call already belongs to the
+current resolution session.
+"""
+
+from __future__ import annotations
+
+from contextlib import contextmanager
+from contextvars import ContextVar, Token
+from dataclasses import dataclass, field
+from threading import Lock
+from typing import TYPE_CHECKING
+
+from ftllexengine.diagnostics import ErrorCategory, ErrorTemplate, FrozenFluentError
+
+__all__ = ["ResolutionReentryGate"]
+
+if TYPE_CHECKING:
+    from collections.abc import Iterator
+
+_current_resolution_session: ContextVar[object | None] = ContextVar(
+    "ftllexengine_resolution_session",
+    default=None,
+)
+
+
+@dataclass(slots=True)
+class ResolutionReentryGate:
+    """Own re-entry admission for one bundle's formatting surface."""
+
+    _lock: Lock = field(default_factory=Lock)
+    _custom_function_depth: int = 0
+
+    @contextmanager
+    def format_call(self) -> Iterator[None]:
+        """Enter one bundle.format_pattern() call under the current resolution session.
+
+        Premise:
+            New threads spawned inside custom functions do not inherit ContextVars.
+
+        Reason:
+            We admit same-session nested formatting, but block fresh entry while
+            a custom function is active so a new thread cannot reset the depth
+            budget simply by calling back into the bundle from outside the
+            original session.
+        """
+        existing_session = _current_resolution_session.get()
+        token: Token[object | None] | None = None
+
+        if existing_session is None:
+            with self._lock:
+                if self._custom_function_depth > 0:
+                    diag = ErrorTemplate.reentrant_formatting_blocked()
+                    raise FrozenFluentError(
+                        str(diag),
+                        ErrorCategory.RESOLUTION,
+                        diagnostic=diag,
+                    )
+            token = _current_resolution_session.set(object())
+
+        try:
+            yield
+        finally:
+            if token is not None:
+                _current_resolution_session.reset(token)
+
+    @contextmanager
+    def custom_function_scope(self) -> Iterator[None]:
+        """Mark the period where bundle-owned custom user code is executing."""
+        with self._lock:
+            self._custom_function_depth += 1
+        try:
+            yield
+        finally:
+            with self._lock:
+                self._custom_function_depth -= 1
diff --git a/src/ftllexengine/runtime/async_bundle.py b/src/ftllexengine/runtime/async_bundle.py
index 39ce6b3d..731a033b 100644
--- a/src/ftllexengine/runtime/async_bundle.py
+++ b/src/ftllexengine/runtime/async_bundle.py
@@ -1,9 +1,9 @@
 """Async-native FluentBundle wrapper for asyncio applications.

-AsyncFluentBundle wraps FluentBundle and offloads all CPU-bound operations
-to a thread pool via asyncio.to_thread(), keeping the event loop unblocked.
-The underlying FluentBundle handles all concurrency via its internal RWLock;
-this module is purely an asyncio adapter layer.
+AsyncFluentBundle wraps FluentBundle and owns its executor, admission control,
+and shutdown semantics explicitly. The underlying FluentBundle still owns the
+actual formatting and mutation logic; this module owns how that blocking work
+is scheduled from asyncio.

 Python 3.13+.
 """
@@ -11,9 +11,13 @@
 from __future__ import annotations

 import asyncio
-from typing import TYPE_CHECKING, Self
+from concurrent.futures import ThreadPoolExecutor
+from functools import partial
+from threading import Thread
+from typing import TYPE_CHECKING, Self, TypeVar

 from ftllexengine.core.locale_utils import get_system_locale
+from ftllexengine.core.validators import require_positive_int

 from .bundle import FluentBundle

@@ -21,29 +25,26 @@
     from collections.abc import Callable, Iterable, Mapping
     from types import TracebackType

+    from ftllexengine.core._limits import LimitArg
     from ftllexengine.core.semantic_types import LocaleCode
     from ftllexengine.core.value_types import FluentValue
     from ftllexengine.diagnostics import FrozenFluentError
     from ftllexengine.introspection import MessageIntrospection
-    from ftllexengine.runtime.cache import CacheAuditLogEntry, CacheStats
+    from ftllexengine.runtime.cache import CacheDebugLogEntry, CacheStats
     from ftllexengine.syntax.ast import Junk, Message, Term

     from .cache_config import CacheConfig
     from .function_bridge import FunctionRegistry

+T = TypeVar("T")
+

 class AsyncFluentBundle:
     """Async-native wrapper around FluentBundle for asyncio applications.

-    All mutation and formatting operations are offloaded to a thread pool via
-    asyncio.to_thread(), preventing event-loop blocking. The underlying
-    FluentBundle handles all thread safety via its internal RWLock. This class
-    is purely an asyncio adapter — no additional locking is introduced.
-
-    Fast read lookups (has_message, get_message, etc.) are exposed as
-    synchronous methods because the underlying dict operations are O(1) and
-    hold the read lock for nanoseconds, not long enough to meaningfully block
-    an event loop iteration.
+    All methods that may touch bundle locks or perform CPU-bound work route
+    through one owned executor plus a bounded async admission gate. This keeps
+    event-loop blocking behavior explicit and gives the bundle a shutdown owner.

     Supports the async context manager protocol:

@@ -59,7 +60,7 @@ class AsyncFluentBundle:
         >>> asyncio.run(example())  # doctest: +SKIP
     """

-    __slots__ = ("_bundle",)
+    __slots__ = ("_bundle", "_executor", "_max_pending_operations", "_pending_gate")

     def __init__(
         self,
@@ -69,9 +70,13 @@ def __init__(
         use_isolating: bool = True,
         cache: CacheConfig | None = None,
         functions: FunctionRegistry | None = None,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_expansion_size: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
+        max_expansion_size: LimitArg = None,
+        max_workers: int = 4,
+        max_pending_operations: int = 16,
         strict: bool = True,
     ) -> None:
         """Initialize async bundle for locale.
@@ -86,9 +91,15 @@ def __init__(
                 later mutations to the original have no effect.
             max_source_size: Maximum FTL source length in characters.
             max_nesting_depth: Maximum placeable nesting depth.
+            max_parse_errors: Maximum Junk entries accepted before parse abort.
+            max_stream_line_length: Maximum line length accepted by stream parsing.
             max_expansion_size: Maximum formatted output length in characters.
+            max_workers: Worker threads owned by this async wrapper.
+            max_pending_operations: Maximum in-flight or queued async bundle calls.
             strict: Raise on formatting or syntax errors (default: True).
         """
+        require_positive_int(max_workers, "max_workers")
+        require_positive_int(max_pending_operations, "max_pending_operations")
         self._bundle = FluentBundle(
             locale,
             use_isolating=use_isolating,
@@ -96,9 +107,17 @@ def __init__(
             functions=functions,
             max_source_size=max_source_size,
             max_nesting_depth=max_nesting_depth,
+            max_parse_errors=max_parse_errors,
+            max_stream_line_length=max_stream_line_length,
             max_expansion_size=max_expansion_size,
             strict=strict,
         )
+        self._executor = ThreadPoolExecutor(
+            max_workers=max_workers,
+            thread_name_prefix=f"ftllexengine-{self._bundle.locale}",
+        )
+        self._max_pending_operations = max_pending_operations
+        self._pending_gate = asyncio.Semaphore(max_pending_operations)

     @classmethod
     def for_system_locale(
@@ -107,9 +126,13 @@ def for_system_locale(
         use_isolating: bool = True,
         cache: CacheConfig | None = None,
         functions: FunctionRegistry | None = None,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_expansion_size: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
+        max_expansion_size: LimitArg = None,
+        max_workers: int = 4,
+        max_pending_operations: int = 16,
         strict: bool = True,
     ) -> AsyncFluentBundle:
         """Create AsyncFluentBundle for the current system locale.
@@ -122,7 +145,11 @@ def for_system_locale(
             functions: Custom FunctionRegistry (default: standard registry).
             max_source_size: Maximum FTL source size in characters.
             max_nesting_depth: Maximum placeable nesting depth.
+            max_parse_errors: Maximum Junk entries accepted before parse abort.
+            max_stream_line_length: Maximum line length accepted by stream parsing.
             max_expansion_size: Maximum formatted output length in characters.
+            max_workers: Worker threads owned by this async wrapper.
+            max_pending_operations: Maximum in-flight or queued async bundle calls.
             strict: Fail-fast mode (default True).

         Returns:
@@ -139,7 +166,11 @@ def for_system_locale(
             functions=functions,
             max_source_size=max_source_size,
             max_nesting_depth=max_nesting_depth,
+            max_parse_errors=max_parse_errors,
+            max_stream_line_length=max_stream_line_length,
             max_expansion_size=max_expansion_size,
+            max_workers=max_workers,
+            max_pending_operations=max_pending_operations,
             strict=strict,
         )

@@ -153,13 +184,30 @@ async def __aexit__(
         exc_val: BaseException | None,
         exc_tb: TracebackType | None,
     ) -> None:
-        """Exit async context manager. No cleanup required."""
+        """Exit async context manager and shut down the owned executor."""
+        loop = asyncio.get_running_loop()
+        done = loop.create_future()
+
+        def shutdown_executor() -> None:
+            try:
+                self._executor.shutdown(wait=True, cancel_futures=False)
+            except Exception as error:  # noqa: BLE001  # pragma: no cover
+                # Premise: shutdown must resolve the awaiting coroutine exactly once.
+                # Reason: a narrow list here would risk hanging __aexit__ if the
+                # executor raises an unexpected failure during interpreter teardown.
+                loop.call_soon_threadsafe(done.set_exception, error)
+            else:
+                loop.call_soon_threadsafe(done.set_result, None)
+
+        Thread(target=shutdown_executor, name="ftllexengine-async-shutdown", daemon=True).start()
+        await done

     def __repr__(self) -> str:
         """Return string representation for debugging."""
         return (
             f"AsyncFluentBundle(locale={self._bundle.locale!r}, "
-            f"strict={self._bundle.strict!r})"
+            f"strict={self._bundle.strict!r}, "
+            f"max_pending_operations={self._max_pending_operations!r})"
         )

     # ------------------------------------------------------------------
@@ -191,18 +239,62 @@ def cache_config(self) -> CacheConfig | None:
         """Active cache configuration, or None if caching is disabled."""
         return self._bundle.cache_config

+    async def _run_blocking(
+        self,
+        func: Callable[..., T],
+        /,
+        *args: object,
+        **kwargs: object,
+    ) -> T:
+        """Run bundle work through the owned executor with bounded admission.
+
+        Cancellation is explicit: once work is submitted to the thread pool it
+        keeps running to completion, but the semaphore permit is released only
+        when the underlying thread actually finishes.
+        """
+        await self._pending_gate.acquire()
+        released = False
+
+        def release_permit(_completed: object | None = None) -> None:
+            nonlocal released
+            if not released:
+                released = True
+                self._pending_gate.release()
+
+        loop = asyncio.get_running_loop()
+        future = loop.run_in_executor(
+            self._executor,
+            partial(func, *args, **kwargs),
+        )
+        try:
+            result = await asyncio.wrap_future(future)
+        except asyncio.CancelledError:
+            future.add_done_callback(release_permit)
+            raise
+        except Exception:
+            release_permit()
+            raise
+        else:
+            release_permit()
+            return result
+
     # ------------------------------------------------------------------
     # Async mutation and formatting operations (offloaded to thread pool)
     # ------------------------------------------------------------------

     async def add_resource(
-        self, source: str, /, *, source_path: str | None = None
+        self,
+        source: str,
+        /,
+        *,
+        source_path: str | None = None,
+        allow_overwrite: bool = False,
     ) -> tuple[Junk, ...]:
-        """Add FTL resource from a string. Offloads parsing to a thread pool.
+        """Add FTL resource from a string.

         Semantically identical to FluentBundle.add_resource() in all respects:
         strict-mode behavior, two-phase commit atomicity, thread safety, and
-        overwrite warnings.
+        overwrite admission.

         Args:
             source: FTL file content [positional-only]
@@ -216,18 +308,22 @@ async def add_resource(
             TypeError: If source is not a string.
             SyntaxIntegrityError: In strict mode, if any Junk entries are parsed.
         """
-        return await asyncio.to_thread(
-            self._bundle.add_resource, source, source_path=source_path
+        return await self._run_blocking(
+            self._bundle.add_resource,
+            source,
+            source_path=source_path,
+            allow_overwrite=allow_overwrite,
         )

     async def add_resource_stream(
-        self, lines: Iterable[str], /, *, source_path: str | None = None
+        self,
+        lines: Iterable[str],
+        /,
+        *,
+        source_path: str | None = None,
+        allow_overwrite: bool = False,
     ) -> tuple[Junk, ...]:
-        """Add FTL resource from a line iterator. Offloads parsing to a thread pool.
-
-        Memory usage is proportional to the largest single FTL entry, not the
-        total resource size. Semantically identical to add_resource() in all
-        other respects.
+        """Add FTL resource from a line iterator.

         Args:
             lines: Iterable of FTL source lines [positional-only].
@@ -244,8 +340,11 @@ async def add_resource_stream(
             ...     with open("locales/en/ui.ftl") as f:
             ...         await bundle.add_resource_stream(f, source_path="locales/en/ui.ftl")
         """
-        return await asyncio.to_thread(
-            self._bundle.add_resource_stream, lines, source_path=source_path
+        return await self._run_blocking(
+            self._bundle.add_resource_stream,
+            lines,
+            source_path=source_path,
+            allow_overwrite=allow_overwrite,
         )

     async def format_pattern(
@@ -256,7 +355,7 @@ async def format_pattern(
         *,
         attribute: str | None = None,
     ) -> tuple[str, tuple[FrozenFluentError, ...]]:
-        """Format message to string. Offloads resolution to a thread pool.
+        """Format message to string.

         Semantically identical to FluentBundle.format_pattern() in all respects:
         strict/soft-error behavior, fallback semantics, and error reporting.
@@ -274,12 +373,19 @@ async def format_pattern(
             FormattingIntegrityError: In strict mode, if any error occurs
                 during formatting.
         """
-        return await asyncio.to_thread(
-            self._bundle.format_pattern, message_id, args, attribute=attribute
+        return await self._run_blocking(
+            self._bundle.format_pattern,
+            message_id,
+            args,
+            attribute=attribute,
         )

     async def add_function(
-        self, name: str, func: Callable[..., FluentValue]
+        self,
+        name: str,
+        func: Callable[..., FluentValue],
+        *,
+        cacheable: bool = False,
     ) -> None:
         """Register a custom Fluent function. Offloads registration to a thread pool.

@@ -288,100 +394,52 @@ async def add_function(
                 Uppercase by convention.
             func: Callable implementing the function. See fluent_function
                 decorator for locale-injection support.
+            cacheable: Whether formatted outputs depending on this function may
+                enter the cache. Defaults to ``False`` for safety.
         """
-        await asyncio.to_thread(self._bundle.add_function, name, func)
+        await self._run_blocking(
+            self._bundle.add_function,
+            name,
+            func,
+            cacheable=cacheable,
+        )

     # ------------------------------------------------------------------
-    # Synchronous read operations (fast dict lookups — O(1), non-blocking)
+    # Async read operations (all lock-taking bundle access stays off the loop)
     # ------------------------------------------------------------------

-    def has_message(self, message_id: str) -> bool:
-        """Return True if the bundle contains a message with the given ID.
-
-        Synchronous. The underlying lookup is O(1) and holds the read lock
-        for nanoseconds — not long enough to block an event loop iteration.
-
-        Args:
-            message_id: Message identifier to check.
-        """
-        return self._bundle.has_message(message_id)
-
-    def has_attribute(self, message_id: str, attribute: str) -> bool:
-        """Return True if the message exists and has the named attribute.
-
-        Synchronous. The underlying lookup is O(1) and holds the read lock
-        for nanoseconds — not long enough to block an event loop iteration.
-
-        Args:
-            message_id: Message identifier.
-            attribute: Attribute name.
-        """
-        return self._bundle.has_attribute(message_id, attribute)
+    async def has_message(self, message_id: str) -> bool:
+        """Return True if the bundle contains a message with the given ID."""
+        return await self._run_blocking(self._bundle.has_message, message_id)

-    def get_message_ids(self) -> list[str]:
-        """Return a list of all message IDs registered in this bundle.
+    async def has_attribute(self, message_id: str, attribute: str) -> bool:
+        """Return True if the message exists and has the named attribute."""
+        return await self._run_blocking(self._bundle.has_attribute, message_id, attribute)

-        Synchronous. Returns a snapshot; concurrent mutations are not visible
-        in the returned list.
-        """
-        return self._bundle.get_message_ids()
+    async def get_message_ids(self) -> list[str]:
+        """Return a snapshot list of all message IDs registered in this bundle."""
+        return await self._run_blocking(self._bundle.get_message_ids)

-    def get_message(self, message_id: str) -> Message | None:
-        """Return the parsed AST node for a message, or None if not found.
+    async def get_message(self, message_id: str) -> Message | None:
+        """Return the parsed AST node for a message, or None if not found."""
+        return await self._run_blocking(self._bundle.get_message, message_id)

-        Synchronous. The underlying lookup is O(1).
+    async def get_term(self, term_id: str) -> Term | None:
+        """Return the parsed AST node for a term, or None if not found."""
+        return await self._run_blocking(self._bundle.get_term, term_id)

-        Args:
-            message_id: Message identifier.
-        """
-        return self._bundle.get_message(message_id)
+    async def introspect_message(self, message_id: str) -> MessageIntrospection:
+        """Return complete introspection data for a message."""
+        return await self._run_blocking(self._bundle.introspect_message, message_id)

-    def get_term(self, term_id: str) -> Term | None:
-        """Return the parsed AST node for a term, or None if not found.
+    async def clear_cache(self) -> None:
+        """Clear the format result cache, if caching is enabled."""
+        await self._run_blocking(self._bundle.clear_cache)

-        The term_id should be supplied without the leading dash (e.g., ``"brand"``
-        for ``-brand``). Synchronous. The underlying lookup is O(1).
+    async def get_cache_stats(self) -> CacheStats | None:
+        """Return cache statistics, or None if caching is disabled."""
+        return await self._run_blocking(self._bundle.get_cache_stats)

-        Args:
-            term_id: Term identifier without leading dash.
-        """
-        return self._bundle.get_term(term_id)
-
-    def introspect_message(self, message_id: str) -> MessageIntrospection:
-        """Return complete introspection data for a message.
-
-        Provides variable names, function names, reference graph, and selector
-        presence. Synchronous; introspection is CPU-bound but fast.
-
-        Args:
-            message_id: Message identifier.
-
-        Returns:
-            MessageIntrospection with complete metadata.
-
-        Raises:
-            KeyError: If the message does not exist.
-        """
-        return self._bundle.introspect_message(message_id)
-
-    def clear_cache(self) -> None:
-        """Clear the format result cache, if caching is enabled.
-
-        Synchronous. Safe to call from async code; the cache clear is O(1).
-        """
-        self._bundle.clear_cache()
-
-    def get_cache_stats(self) -> CacheStats | None:
-        """Return cache statistics, or None if caching is disabled.
-
-        Synchronous. Returns a snapshot of current hit/miss counts.
-        """
-        return self._bundle.get_cache_stats()
-
-    def get_cache_audit_log(self) -> tuple[CacheAuditLogEntry, ...] | None:
-        """Return the immutable cache audit log, or None if caching is disabled.
-
-        Synchronous. Each entry records a cache write event with dual timestamps
-        (monotonic + wall-clock) for compliance audit trails.
-        """
-        return self._bundle.get_cache_audit_log()
+    async def get_cache_debug_log(self) -> tuple[CacheDebugLogEntry, ...] | None:
+        """Return the immutable cache debug log, or None if caching is disabled."""
+        return await self._run_blocking(self._bundle.get_cache_debug_log)
diff --git a/src/ftllexengine/runtime/bundle.py b/src/ftllexengine/runtime/bundle.py
index 100d0c01..8296def1 100644
--- a/src/ftllexengine/runtime/bundle.py
+++ b/src/ftllexengine/runtime/bundle.py
@@ -21,6 +21,7 @@
     from ftllexengine.syntax import Message, Term
     from ftllexengine.syntax.parser import FluentParserV1

+    from ._resolution_gate import ResolutionReentryGate
     from .bundle_protocols import BundleStateProtocol
     from .cache import IntegrityCache
     from .cache_config import CacheConfig
@@ -42,13 +43,14 @@ class FluentBundle(
     _cache_config: CacheConfig | None
     _function_registry: FunctionRegistry
     _locale: LocaleCode
-    _max_expansion_size: int
+    _max_expansion_size: int | None
     _max_nesting_depth: int
-    _max_source_size: int
+    _max_source_size: int | None
     _messages: dict[str, Message]
     _msg_deps: dict[str, frozenset[str]]
     _owns_registry: bool
     _parser: FluentParserV1
+    _resolution_gate: ResolutionReentryGate
     _resolver: FluentResolver
     _rwlock: RWLock
     _strict: bool
@@ -68,6 +70,7 @@ class FluentBundle(
         "_msg_deps",
         "_owns_registry",
         "_parser",
+        "_resolution_gate",
         "_resolver",
         "_rwlock",
         "_strict",
@@ -85,5 +88,5 @@ def format_pattern(
         attribute: str | None = None,
     ) -> tuple[str, tuple[FrozenFluentError, ...]]:
         """Format one message or attribute to a string."""
-        with self._rwlock.read():
+        with self._resolution_gate.format_call(), self._rwlock.read():
             return self._format_pattern_impl(message_id, args, attribute)
diff --git a/src/ftllexengine/runtime/bundle_formatting.py b/src/ftllexengine/runtime/bundle_formatting.py
index 7be10248..6b13d1af 100644
--- a/src/ftllexengine/runtime/bundle_formatting.py
+++ b/src/ftllexengine/runtime/bundle_formatting.py
@@ -4,10 +4,11 @@

 import logging
 import time
-from collections.abc import Mapping
+from collections.abc import Mapping, Sequence
 from typing import TYPE_CHECKING, NoReturn

 from ftllexengine.constants import FALLBACK_INVALID, FALLBACK_MISSING_MESSAGE
+from ftllexengine.core.identifier_validation import is_valid_identifier
 from ftllexengine.diagnostics import (
     Diagnostic,
     DiagnosticCode,
@@ -16,6 +17,7 @@
     FrozenFluentError,
 )
 from ftllexengine.integrity import FormattingIntegrityError, IntegrityContext
+from ftllexengine.runtime.resolution_context import ResolutionContext
 from ftllexengine.runtime.resolver import FluentResolver

 if TYPE_CHECKING:
@@ -28,6 +30,50 @@
 class _BundleFormattingMixin:
     """Formatting behavior for FluentBundle."""

+    @staticmethod
+    def _validate_nested_mapping_keys(
+        value: object,
+        *,
+        path: str,
+        seen: set[int],
+    ) -> str | None:
+        """Reject non-string nested mapping keys before cache shaping sees them."""
+        object_id = id(value)
+        if object_id in seen:
+            return None
+        seen.add(object_id)
+        try:
+            if isinstance(value, Mapping):
+                for nested_key, nested_value in value.items():
+                    if not isinstance(nested_key, str):
+                        return (
+                            f"Invalid nested mapping key at {path}: expected str, got "
+                            f"{type(nested_key).__name__}"
+                        )
+                    nested_error = _BundleFormattingMixin._validate_nested_mapping_keys(
+                        nested_value,
+                        path=f"{path}[{nested_key!r}]",
+                        seen=seen,
+                    )
+                    if nested_error is not None:
+                        return nested_error
+                return None
+
+            if isinstance(value, Sequence) and not isinstance(
+                value, (str, bytes, bytearray)
+            ):
+                for index, item in enumerate(value):
+                    nested_error = _BundleFormattingMixin._validate_nested_mapping_keys(
+                        item,
+                        path=f"{path}[{index}]",
+                        seen=seen,
+                    )
+                    if nested_error is not None:
+                        return nested_error
+            return None
+        finally:
+            seen.discard(object_id)
+
     def _invalid_request_result(
         self: BundleStateProtocol,
         message_id: str,
@@ -51,16 +97,34 @@ def _validate_format_request(
         attribute: str | None,
     ) -> tuple[str, tuple[FrozenFluentError, ...]] | None:
         """Validate top-level format_pattern inputs."""
-        if not message_id or not isinstance(message_id, str):
-            logger.warning("Invalid message ID: empty or non-string")
-            return self._invalid_request_result(
-                "",
-                FALLBACK_INVALID,
-                category=ErrorCategory.REFERENCE,
-                code=DiagnosticCode.MESSAGE_NOT_FOUND,
-                message="Invalid message ID: empty or non-string",
-            )
+        if invalid_result := _BundleFormattingMixin._validate_message_id(self, message_id):
+            return invalid_result
+        if invalid_result := _BundleFormattingMixin._validate_args(self, message_id, args):
+            return invalid_result
+        return _BundleFormattingMixin._validate_attribute(self, message_id, attribute)

+    def _validate_message_id(
+        self: BundleStateProtocol,
+        message_id: str,
+    ) -> tuple[str, tuple[FrozenFluentError, ...]] | None:
+        """Validate the message identifier before cache or resolver work."""
+        if message_id and isinstance(message_id, str):
+            return None
+        logger.warning("Invalid message ID: empty or non-string")
+        return self._invalid_request_result(
+            "",
+            FALLBACK_INVALID,
+            category=ErrorCategory.REFERENCE,
+            code=DiagnosticCode.MESSAGE_NOT_FOUND,
+            message="Invalid message ID: empty or non-string",
+        )
+
+    def _validate_args(
+        self: BundleStateProtocol,
+        message_id: str,
+        args: Mapping[str, FluentValue] | None,
+    ) -> tuple[str, tuple[FrozenFluentError, ...]] | None:
+        """Validate formatting arguments before key shaping or resolution."""
         raw_args: object = args
         if raw_args is not None and not isinstance(raw_args, Mapping):
             arg_type = type(raw_args).__name__
@@ -73,24 +137,80 @@ def _validate_format_request(
                 message=f"Invalid args type: expected Mapping or None, got {arg_type}",
             )

-        raw_attribute: object = attribute
-        if raw_attribute is not None and not isinstance(raw_attribute, str):
-            attribute_type = type(raw_attribute).__name__
-            logger.warning(
-                "Invalid attribute type: expected str or None, got %s",
-                attribute_type,
+        if raw_args is None:
+            return None
+
+        for arg_key, arg_value in raw_args.items():
+            invalid_result = _BundleFormattingMixin._validate_arg_key(self, message_id, arg_key)
+            if invalid_result is not None:
+                return invalid_result
+            nested_mapping_error = _BundleFormattingMixin._validate_nested_mapping_keys(
+                arg_value,
+                path=f"args[{arg_key!r}]",
+                seen=set(),
             )
+            if nested_mapping_error is not None:
+                logger.warning(nested_mapping_error)
+                return self._invalid_request_result(
+                    message_id,
+                    FALLBACK_INVALID,
+                    category=ErrorCategory.RESOLUTION,
+                    code=DiagnosticCode.INVALID_ARGUMENT,
+                    message=nested_mapping_error,
+                )
+        return None
+
+    def _validate_arg_key(
+        self: BundleStateProtocol,
+        message_id: str,
+        arg_key: object,
+    ) -> tuple[str, tuple[FrozenFluentError, ...]] | None:
+        """Validate one top-level formatting argument key."""
+        if not isinstance(arg_key, str):
+            key_type = type(arg_key).__name__
+            logger.warning("Invalid args key type: expected str, got %s", key_type)
             return self._invalid_request_result(
                 message_id,
                 FALLBACK_INVALID,
                 category=ErrorCategory.RESOLUTION,
                 code=DiagnosticCode.INVALID_ARGUMENT,
-                message=(
-                    f"Invalid attribute type: expected str or None, got {attribute_type}"
-                ),
+                message=f"Invalid args key type: expected str, got {key_type}",
             )
+        if is_valid_identifier(arg_key):
+            return None
+        logger.warning("Invalid args key name: %s", arg_key)
+        return self._invalid_request_result(
+            message_id,
+            FALLBACK_INVALID,
+            category=ErrorCategory.RESOLUTION,
+            code=DiagnosticCode.INVALID_ARGUMENT,
+            message=(
+                f"Invalid args key name: {arg_key!r}. "
+                "Keys must be valid Fluent identifiers."
+ ), + ) - return None + def _validate_attribute( + self: BundleStateProtocol, + message_id: str, + attribute: str | None, + ) -> tuple[str, tuple[FrozenFluentError, ...]] | None: + """Validate the optional attribute selector.""" + raw_attribute: object = attribute + if raw_attribute is None or isinstance(raw_attribute, str): + return None + attribute_type = type(raw_attribute).__name__ + logger.warning( + "Invalid attribute type: expected str or None, got %s", + attribute_type, + ) + return self._invalid_request_result( + message_id, + FALLBACK_INVALID, + category=ErrorCategory.RESOLUTION, + code=DiagnosticCode.INVALID_ARGUMENT, + message=f"Invalid attribute type: expected str or None, got {attribute_type}", + ) def _lookup_cached_pattern( self: BundleStateProtocol, @@ -108,6 +228,7 @@ def _lookup_cached_pattern( attribute, self._locale, use_isolating=self._use_isolating, + function_generation=self._function_registry.cache_generation, ) if cached_entry is None: return None @@ -156,6 +277,7 @@ def _create_resolver(self: BundleStateProtocol) -> FluentResolver: messages=self._messages, terms=self._terms, function_registry=self._function_registry, + reentry_gate=self._resolution_gate, use_isolating=self._use_isolating, max_nesting_depth=self._max_nesting_depth, max_expansion_size=self._max_expansion_size, @@ -190,7 +312,17 @@ def _format_pattern_impl( message = self._messages[message_id] resolver = self._resolver - result, errors_tuple = resolver.resolve_message(message, args, attribute) + context = ResolutionContext( + max_depth=self._max_nesting_depth, + max_expression_depth=self._max_nesting_depth, + max_expansion_size=self._max_expansion_size, + ) + result, errors_tuple = resolver.resolve_message( + message, + args, + attribute, + context=context, + ) if errors_tuple: log_fn = logger.warning if self._strict else logger.debug @@ -205,15 +337,26 @@ def _format_pattern_impl( logger.debug("Resolved message '%s' successfully", message_id) if self._cache is not None: - 
self._cache.put( - message_id, - args, - attribute, - self._locale, - use_isolating=self._use_isolating, - formatted=result, - errors=errors_tuple, - ) + if context.cacheable_output: + self._cache.put( + message_id, + args, + attribute, + self._locale, + use_isolating=self._use_isolating, + function_generation=self._function_registry.cache_generation, + formatted=result, + errors=errors_tuple, + ) + else: + self._cache.note_uncacheable_result( + message_id, + args, + attribute, + self._locale, + use_isolating=self._use_isolating, + function_generation=self._function_registry.cache_generation, + ) if errors_tuple and self._strict: self._raise_strict_error(message_id, result, errors_tuple) diff --git a/src/ftllexengine/runtime/bundle_lifecycle.py b/src/ftllexengine/runtime/bundle_lifecycle.py index 28aa75c8..ab7ae754 100644 --- a/src/ftllexengine/runtime/bundle_lifecycle.py +++ b/src/ftllexengine/runtime/bundle_lifecycle.py @@ -3,17 +3,20 @@ from __future__ import annotations import logging -from typing import TYPE_CHECKING, cast +import sys +from typing import TYPE_CHECKING, Self from ftllexengine.constants import ( DEFAULT_MAX_EXPANSION_SIZE, MAX_DEPTH, MAX_SOURCE_SIZE, ) +from ftllexengine.core._limits import resolve_limit_arg from ftllexengine.core.depth_guard import depth_clamp from ftllexengine.core.locale_utils import get_system_locale, require_locale_code from ftllexengine.syntax.parser import FluentParserV1 +from ._resolution_gate import ResolutionReentryGate from .cache import IntegrityCache from .function_bridge import FunctionRegistry from .functions import get_shared_registry @@ -21,10 +24,10 @@ from .rwlock import RWLock if TYPE_CHECKING: + from ftllexengine.core._limits import LimitArg from ftllexengine.core.semantic_types import LocaleCode from ftllexengine.syntax import Message, Term - from .bundle import FluentBundle from .bundle_protocols import BundleStateProtocol from .cache_config import CacheConfig @@ -42,9 +45,11 @@ def __init__( 
use_isolating: bool = True, cache: CacheConfig | None = None, functions: FunctionRegistry | None = None, - max_source_size: int | None = None, + max_source_size: LimitArg = None, max_nesting_depth: int | None = None, - max_expansion_size: int | None = None, + max_parse_errors: LimitArg = None, + max_stream_line_length: LimitArg = None, + max_expansion_size: LimitArg = None, strict: bool = True, ) -> None: """Initialize bundle state for one locale.""" @@ -58,16 +63,25 @@ def __init__( self._msg_deps: dict[str, frozenset[str]] = {} self._term_deps: dict[str, frozenset[str]] = {} - self._max_source_size = max_source_size if max_source_size is not None else MAX_SOURCE_SIZE + self._max_source_size = resolve_limit_arg( + max_source_size, + field_name="max_source_size", + default=MAX_SOURCE_SIZE, + ) requested_depth = max_nesting_depth if max_nesting_depth is not None else MAX_DEPTH self._max_nesting_depth = depth_clamp(requested_depth) - self._max_expansion_size = ( - max_expansion_size if max_expansion_size is not None else DEFAULT_MAX_EXPANSION_SIZE + self._max_expansion_size = resolve_limit_arg( + max_expansion_size, + field_name="max_expansion_size", + default=DEFAULT_MAX_EXPANSION_SIZE, ) self._parser = FluentParserV1( max_source_size=self._max_source_size, max_nesting_depth=self._max_nesting_depth, + max_parse_errors=max_parse_errors, + max_stream_line_length=max_stream_line_length, ) + self._resolution_gate = ResolutionReentryGate() self._rwlock = RWLock() provided_functions: object = functions @@ -89,12 +103,13 @@ def __init__( if cache is not None: self._cache = IntegrityCache( maxsize=cache.size, - max_entry_weight=cache.max_entry_weight, + max_entry_payload_bytes=cache.max_entry_payload_bytes, max_errors_per_entry=cache.max_errors_per_entry, write_once=cache.write_once, - strict=cache.integrity_strict and strict, - enable_audit=cache.enable_audit, - max_audit_entries=cache.max_audit_entries, + enable_debug_log=cache.enable_debug_log, + 
max_debug_entries=cache.max_debug_entries, + integrity_event_sink=cache.integrity_event_sink, + debug_fingerprint_key=cache.debug_fingerprint_key, ) self._resolver = self._create_resolver() @@ -141,7 +156,7 @@ def cache_usage(self: BundleStateProtocol) -> int: @property def max_source_size(self: BundleStateProtocol) -> int: """Maximum FTL source size in characters.""" - return self._max_source_size + return self._max_source_size if self._max_source_size is not None else sys.maxsize @property def max_nesting_depth(self: BundleStateProtocol) -> int: @@ -151,7 +166,7 @@ def max_nesting_depth(self: BundleStateProtocol) -> int: @property def max_expansion_size(self: BundleStateProtocol) -> int: """Maximum total characters produced during resolution.""" - return self._max_expansion_size + return self._max_expansion_size if self._max_expansion_size is not None else sys.maxsize @property def function_registry(self: BundleStateProtocol) -> FunctionRegistry: @@ -160,30 +175,31 @@ def function_registry(self: BundleStateProtocol) -> FunctionRegistry: @classmethod def for_system_locale( - cls, + cls: type[Self], *, use_isolating: bool = True, cache: CacheConfig | None = None, functions: FunctionRegistry | None = None, - max_source_size: int | None = None, + max_source_size: LimitArg = None, max_nesting_depth: int | None = None, - max_expansion_size: int | None = None, + max_parse_errors: LimitArg = None, + max_stream_line_length: LimitArg = None, + max_expansion_size: LimitArg = None, strict: bool = True, - ) -> FluentBundle: + ) -> Self: """Factory method to create a FluentBundle using the system locale.""" system_locale = get_system_locale(raise_on_failure=True) - return cast( - "FluentBundle", - cls( - system_locale, - use_isolating=use_isolating, - cache=cache, - functions=functions, - max_source_size=max_source_size, - max_nesting_depth=max_nesting_depth, - max_expansion_size=max_expansion_size, - strict=strict, - ), + return cls( + system_locale, + 
use_isolating=use_isolating, + cache=cache, + functions=functions, + max_source_size=max_source_size, + max_nesting_depth=max_nesting_depth, + max_parse_errors=max_parse_errors, + max_stream_line_length=max_stream_line_length, + max_expansion_size=max_expansion_size, + strict=strict, ) def __repr__(self: BundleStateProtocol) -> str: diff --git a/src/ftllexengine/runtime/bundle_mutation.py b/src/ftllexengine/runtime/bundle_mutation.py index 1e7f8df5..d5896fc6 100644 --- a/src/ftllexengine/runtime/bundle_mutation.py +++ b/src/ftllexengine/runtime/bundle_mutation.py @@ -5,7 +5,6 @@ import logging from typing import TYPE_CHECKING -from ftllexengine.syntax import Resource from ftllexengine.validation import validate_resource as _validate_resource_impl if TYPE_CHECKING: @@ -13,10 +12,10 @@ from ftllexengine.core.value_types import FluentValue from ftllexengine.diagnostics import ValidationResult - from ftllexengine.syntax import Entry, Junk + from ftllexengine.syntax import Junk from .bundle_protocols import BundleStateProtocol - from .cache import CacheAuditLogEntry, CacheStats + from .cache import CacheDebugLogEntry, CacheStats logger = logging.getLogger("ftllexengine.runtime.bundle") @@ -30,8 +29,13 @@ def add_resource( /, *, source_path: str | None = None, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: - """Add FTL resource to bundle.""" + """Add FTL resource to bundle. + + ``allow_overwrite`` exists because replacing canonical message or term IDs + is a state mutation decision, not an incidental consequence of load order. 
+ """ raw_source: object = source if not isinstance(raw_source, str): msg = ( @@ -42,7 +46,11 @@ def add_resource( resource = self._parser.parse(raw_source) with self._rwlock.write(): - return self._register_resource(resource, source_path) + return self._register_resource( + resource, + source_path, + allow_overwrite=allow_overwrite, + ) def add_resource_stream( self: BundleStateProtocol, @@ -50,13 +58,30 @@ def add_resource_stream( /, *, source_path: str | None = None, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: - """Add FTL resource to bundle from a line-oriented source stream.""" - collected: list[Entry] = list(self._parser.parse_stream(lines)) - resource = Resource(entries=tuple(collected)) + """Add FTL resource to bundle from a line-oriented source stream. + + Premise: + Streaming should avoid a second full-entry materialization pass. + Reason: + The parser already yields incremental entries; collecting them into a + list before registration doubles memory pressure for no integrity gain. + """ with self._rwlock.write(): - return self._register_resource(resource, source_path) + from ftllexengine.runtime.bundle_registration import ( # noqa: PLC0415 - local import breaks an optional cycle on the streaming path + _PendingRegistration, + ) + + pending = _PendingRegistration() + for entry in self._parser.parse_stream(lines): + self._collect_pending_entry(pending, entry) + return self._register_pending_entries( + pending, + source_path, + allow_overwrite=allow_overwrite, + ) def validate_resource(self: BundleStateProtocol, source: str) -> ValidationResult: """Validate FTL resource without adding to bundle.""" @@ -83,15 +108,27 @@ def add_function( self: BundleStateProtocol, name: str, func: Callable[..., FluentValue], + *, + cacheable: bool = False, ) -> None: - """Add custom function to bundle.""" + """Add a custom function to the bundle. 
+ + Premise: + Arbitrary callables may depend on time, I/O, or ambient process + state that the cache key cannot represent. + + Reason: + Custom functions therefore default to ``cacheable=False``. Callers + must opt into caching explicitly when the function is pure with + respect to the formatting inputs. + """ with self._rwlock.write(): if not self._owns_registry: self._function_registry = self._function_registry.copy() self._owns_registry = True logger.debug("Registry copied on first add_function") - self._function_registry.register(func, ftl_name=name) + self._function_registry.register(func, ftl_name=name, cacheable=cacheable) logger.debug("Added custom function: %s", name) self._resolver = self._create_resolver() @@ -112,10 +149,10 @@ def get_cache_stats(self: BundleStateProtocol) -> CacheStats | None: return self._cache.get_stats() return None - def get_cache_audit_log( + def get_cache_debug_log( self: BundleStateProtocol, - ) -> tuple[CacheAuditLogEntry, ...] | None: - """Get immutable cache audit log entries.""" + ) -> tuple[CacheDebugLogEntry, ...] 
| None: + """Get immutable cache debug-log entries.""" if self._cache is not None: - return self._cache.get_audit_log() + return self._cache.get_debug_log() return None diff --git a/src/ftllexengine/runtime/bundle_protocols.py b/src/ftllexengine/runtime/bundle_protocols.py index dc73b331..c336d5a1 100644 --- a/src/ftllexengine/runtime/bundle_protocols.py +++ b/src/ftllexengine/runtime/bundle_protocols.py @@ -11,13 +11,14 @@ from ftllexengine.core.value_types import FluentValue from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError from ftllexengine.diagnostics.codes import DiagnosticCode + from ftllexengine.runtime._resolution_gate import ResolutionReentryGate from ftllexengine.runtime.bundle_registration import _PendingRegistration from ftllexengine.runtime.cache import IntegrityCache from ftllexengine.runtime.cache_config import CacheConfig from ftllexengine.runtime.function_bridge import FunctionRegistry from ftllexengine.runtime.resolver import FluentResolver from ftllexengine.runtime.rwlock import RWLock - from ftllexengine.syntax import Junk, Message, Resource, Term + from ftllexengine.syntax import Comment, Junk, Message, Resource, Term from ftllexengine.syntax.parser import FluentParserV1 @@ -28,13 +29,14 @@ class BundleStateProtocol(Protocol): _cache_config: CacheConfig | None _function_registry: FunctionRegistry _locale: LocaleCode - _max_expansion_size: int + _max_expansion_size: int | None _max_nesting_depth: int - _max_source_size: int + _max_source_size: int | None _messages: dict[str, Message] _msg_deps: dict[str, frozenset[str]] _owns_registry: bool _parser: FluentParserV1 + _resolution_gate: ResolutionReentryGate _resolver: FluentResolver _rwlock: RWLock _strict: bool @@ -45,8 +47,28 @@ class BundleStateProtocol(Protocol): def _collect_pending_entries(self, resource: Resource) -> _PendingRegistration: ... 
# pragma: no cover - typing-only protocol declaration + def _collect_pending_entry( + self, + pending: _PendingRegistration, + entry: Message | Term | Junk | Comment, + ) -> None: + ... # pragma: no cover - typing-only protocol declaration + def _register_resource( - self, resource: Resource, source_path: str | None + self, + resource: Resource, + source_path: str | None, + *, + allow_overwrite: bool = False, + ) -> tuple[Junk, ...]: + ... # pragma: no cover - typing-only protocol declaration + + def _register_pending_entries( + self, + pending: _PendingRegistration, + source_path: str | None, + *, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: ... # pragma: no cover - typing-only protocol declaration diff --git a/src/ftllexengine/runtime/bundle_registration.py b/src/ftllexengine/runtime/bundle_registration.py index 315f9782..02826af9 100644 --- a/src/ftllexengine/runtime/bundle_registration.py +++ b/src/ftllexengine/runtime/bundle_registration.py @@ -8,7 +8,12 @@ from typing import TYPE_CHECKING, Literal, assert_never from ftllexengine.core.reference_graph import entry_dependency_set -from ftllexengine.integrity import IntegrityContext, SyntaxIntegrityError +from ftllexengine.diagnostics._redaction import redacted_loader_snippet +from ftllexengine.integrity import ( + IntegrityContext, + ResourceConflictIntegrityError, + SyntaxIntegrityError, +) from ftllexengine.introspection import extract_references from ftllexengine.syntax import Comment, Junk, Message, Resource, Term @@ -17,8 +22,6 @@ logger = logging.getLogger("ftllexengine.runtime.bundle") -_LOG_TRUNCATE_WARNING: int = 100 - @dataclass(slots=True) class _PendingRegistration: @@ -29,12 +32,66 @@ class _PendingRegistration: msg_deps: dict[str, frozenset[str]] = field(default_factory=dict) term_deps: dict[str, frozenset[str]] = field(default_factory=dict) junk: list[Junk] = field(default_factory=list) - overwrite_warnings: list[tuple[Literal["message", "term"], str]] = field(default_factory=list) + 
duplicate_ids: list[str] = field(default_factory=list) + shadowed_ids: list[str] = field(default_factory=list) class _BundleRegistrationMixin: """Resource registration behavior for FluentBundle.""" + @staticmethod + def _conflict_label(entry_type: Literal["message", "term"], entry_id: str) -> str: + """Render the public-facing identifier used in conflict diagnostics.""" + return entry_id if entry_type == "message" else f"-{entry_id}" + + @staticmethod + def _append_unique(target: list[str], value: str) -> None: + """Keep conflict lists stable without duplicating identical IDs.""" + if value not in target: + target.append(value) + + def _collect_pending_entry( + self: BundleStateProtocol, + pending: _PendingRegistration, + entry: Message | Term | Junk | Comment, + ) -> None: + """Merge one parsed entry into the pending registration accumulator.""" + match entry: + case Message(): + msg_id = entry.id.name + if msg_id in pending.messages: + _BundleRegistrationMixin._append_unique( + pending.duplicate_ids, + _BundleRegistrationMixin._conflict_label("message", msg_id), + ) + elif msg_id in self._messages: + _BundleRegistrationMixin._append_unique( + pending.shadowed_ids, + _BundleRegistrationMixin._conflict_label("message", msg_id), + ) + pending.messages[msg_id] = entry + pending.msg_deps[msg_id] = entry_dependency_set(*extract_references(entry)) + case Term(): + term_id = entry.id.name + if term_id in pending.terms: + _BundleRegistrationMixin._append_unique( + pending.duplicate_ids, + _BundleRegistrationMixin._conflict_label("term", term_id), + ) + elif term_id in self._terms: + _BundleRegistrationMixin._append_unique( + pending.shadowed_ids, + _BundleRegistrationMixin._conflict_label("term", term_id), + ) + pending.terms[term_id] = entry + pending.term_deps[term_id] = entry_dependency_set(*extract_references(entry)) + case Junk(): + pending.junk.append(entry) + case Comment(): + pass + case _ as unreachable: # pragma: no cover + assert_never(unreachable) + def 
_collect_pending_entries( self: BundleStateProtocol, resource: Resource ) -> _PendingRegistration: @@ -42,72 +99,54 @@ def _collect_pending_entries( pending = _PendingRegistration() for entry in resource.entries: - match entry: - case Message(): - msg_id = entry.id.name - if msg_id in self._messages or msg_id in pending.messages: - pending.overwrite_warnings.append(("message", msg_id)) - pending.messages[msg_id] = entry - pending.msg_deps[msg_id] = entry_dependency_set(*extract_references(entry)) - case Term(): - term_id = entry.id.name - if term_id in self._terms or term_id in pending.terms: - pending.overwrite_warnings.append(("term", term_id)) - pending.terms[term_id] = entry - pending.term_deps[term_id] = entry_dependency_set(*extract_references(entry)) - case Junk(): - pending.junk.append(entry) - case Comment(): - pass - case _ as unreachable: # pragma: no cover - assert_never(unreachable) + self._collect_pending_entry(pending, entry) return pending def _register_resource( - self: BundleStateProtocol, resource: Resource, source_path: str | None + self: BundleStateProtocol, + resource: Resource, + source_path: str | None, + *, + allow_overwrite: bool = False, ) -> tuple[Junk, ...]: """Register parsed resource entries via a two-phase commit.""" pending = self._collect_pending_entries(resource) + return self._register_pending_entries( + pending, + source_path, + allow_overwrite=allow_overwrite, + ) + + def _register_pending_entries( + self: BundleStateProtocol, + pending: _PendingRegistration, + source_path: str | None, + *, + allow_overwrite: bool = False, + ) -> tuple[Junk, ...]: + """Commit a pre-collected pending registration into the bundle state.""" junk_tuple = tuple(pending.junk) + duplicate_ids = tuple(pending.duplicate_ids) + shadowed_ids = tuple(pending.shadowed_ids) + source_desc = source_path or "" if self._strict and junk_tuple: - source_desc = source_path or "" - error_summary = "; ".join(repr(junk.content[:50]) for junk in junk_tuple[:3]) - if 
len(junk_tuple) > 3: - error_summary += f" (and {len(junk_tuple) - 3} more)" - - context = IntegrityContext( - component="bundle", - operation="add_resource", - key=source_desc, - expected="", - actual=f"<{len(junk_tuple)} syntax error(s)>", - timestamp=time.monotonic(), - wall_time_unix=time.time(), + _BundleRegistrationMixin._raise_strict_junk_error( + junk_tuple, source_desc, source_path ) - msg = ( - f"Strict mode: {len(junk_tuple)} syntax error(s) in " - f"{source_desc}: {error_summary}" + if duplicate_ids: + _BundleRegistrationMixin._raise_duplicate_error( + duplicate_ids, source_desc, source_path ) - raise SyntaxIntegrityError( - msg, - context=context, - junk_entries=junk_tuple, - source_path=source_path, + + if shadowed_ids and not allow_overwrite: + _BundleRegistrationMixin._raise_shadow_error( + shadowed_ids, source_desc, source_path ) - for entry_type, entry_id in pending.overwrite_warnings: - if entry_type == "message": - logger.warning( - "Overwriting existing message '%s' with new definition", - entry_id, - ) - else: - logger.warning( - "Overwriting existing term '-%s' with new definition", - entry_id, - ) + for entry_id in shadowed_ids: + logger.warning("Replacing existing bundle entry %s from %s", entry_id, source_desc) self._messages.update(pending.messages) self._terms.update(pending.terms) @@ -119,12 +158,11 @@ def _register_resource( for term_id in pending.terms: logger.debug("Registered term: %s", term_id) - source_desc = source_path or "" for junk in pending.junk: logger.warning( "Syntax error in %s: %s", source_desc, - repr(junk.content[:_LOG_TRUNCATE_WARNING]), + redacted_loader_snippet(junk.content[:100]), ) if source_path: @@ -148,3 +186,95 @@ def _register_resource( logger.debug("Cache cleared after add_resource") return junk_tuple + + @staticmethod + def _conflict_summary(conflict_ids: tuple[str, ...]) -> str: + """Summarize conflict identifiers without overlong diagnostics.""" + summary = ", ".join(conflict_ids[:5]) + if 
len(conflict_ids) > 5: + summary += f" (and {len(conflict_ids) - 5} more)" + return summary + + @staticmethod + def _raise_strict_junk_error( + junk_entries: tuple[Junk, ...], + source_desc: str, + source_path: str | None, + ) -> None: + """Fail closed when strict ingestion encounters parser junk.""" + error_summary = "; ".join( + redacted_loader_snippet(junk.content[:50]) for junk in junk_entries[:3] + ) + if len(junk_entries) > 3: + error_summary += f" (and {len(junk_entries) - 3} more)" + + context = IntegrityContext( + component="bundle", + operation="add_resource", + key=source_desc, + expected="", + actual=f"<{len(junk_entries)} syntax error(s)>", + timestamp=time.monotonic(), + wall_time_unix=time.time(), + ) + msg = f"Strict mode: {len(junk_entries)} syntax error(s) in {source_desc}: {error_summary}" + raise SyntaxIntegrityError( + msg, + context=context, + junk_entries=junk_entries, + source_path=source_path, + ) + + @staticmethod + def _raise_duplicate_error( + duplicate_ids: tuple[str, ...], + source_desc: str, + source_path: str | None, + ) -> None: + """Reject duplicate IDs inside one resource before mutating bundle state.""" + context = IntegrityContext( + component="bundle", + operation="add_resource", + key=source_desc, + expected="unique resource IDs", + actual=", ".join(duplicate_ids), + timestamp=time.monotonic(), + wall_time_unix=time.time(), + ) + duplicate_summary = _BundleRegistrationMixin._conflict_summary(duplicate_ids) + msg = f"Resource defines duplicate message/term IDs in {source_desc}: {duplicate_summary}" + raise ResourceConflictIntegrityError( + msg, + context=context, + duplicate_ids=duplicate_ids, + source_path=source_path, + ) + + @staticmethod + def _raise_shadow_error( + shadowed_ids: tuple[str, ...], + source_desc: str, + source_path: str | None, + ) -> None: + """Reject implicit replacement of existing canonical bundle entries.""" + context = IntegrityContext( + component="bundle", + operation="add_resource", + 
key=source_desc, + expected="no replacement of existing IDs", + actual=", ".join(shadowed_ids), + timestamp=time.monotonic(), + wall_time_unix=time.time(), + ) + shadow_summary = _BundleRegistrationMixin._conflict_summary(shadowed_ids) + msg = ( + f"Resource attempts to replace existing IDs in {source_desc}: " + f"{shadow_summary}. Pass allow_overwrite=True only when replacement " + "is intentional and audited." + ) + raise ResourceConflictIntegrityError( + msg, + context=context, + shadowed_ids=shadowed_ids, + source_path=source_path, + ) diff --git a/src/ftllexengine/runtime/cache.py b/src/ftllexengine/runtime/cache.py index cf353173..bebcfa8f 100644 --- a/src/ftllexengine/runtime/cache.py +++ b/src/ftllexengine/runtime/cache.py @@ -1,31 +1,15 @@ -"""Thread-safe LRU cache with integrity verification for message formatting. - -Provides financial-grade caching of format_pattern() calls with: -- BLAKE2b-128 checksum verification on every get/put -- Write-once semantics (optional) for data race prevention -- Audit logging (optional) for post-mortem analysis -- Immutable cache entries (frozen dataclasses) -- Automatic invalidation on resource/function changes - -Architecture: - - Thread-safe using threading.Lock - - LRU eviction via OrderedDict - - Immutable cache keys (tuples of hashable types) - - Content-addressed entries with BLAKE2b-128 checksums - - Fail-fast on corruption (strict mode) or silent eviction (non-strict) - -Cache Key Structure: - (message_id, args_tuple, attribute, locale_code, use_isolating) - - message_id: str - - args_tuple: tuple[tuple[str, Any], ...] (sorted, frozen) - - attribute: str | None - - locale_code: str (for multi-bundle scenarios) - - use_isolating: bool - -Thread Safety: - All operations protected by Lock. Safe for concurrent reads and writes. - -Python 3.13+. Zero external dependencies. +"""Thread-safe LRU cache with fail-closed integrity verification. 
+ +Provides format caching for ``format_pattern()`` calls with: +- accidental-corruption detection on every lookup; +- write-once semantics for race detection; +- a bounded debug ring for routine cache traffic; +- a structured critical integrity-event sink for incident evidence; +- immutable cache entries and canonical versioned cache keys. + +The cache contract is intentionally separate from formatting strictness. +Formatting fallback behavior is user-facing; cache corruption and key-contract +failures are system-integrity events. """ from __future__ import annotations @@ -33,26 +17,40 @@ import hmac import time from collections import OrderedDict, deque +from secrets import token_bytes from threading import Lock -from typing import TYPE_CHECKING, final +from typing import TYPE_CHECKING, NoReturn, final -from ftllexengine.constants import DEFAULT_CACHE_SIZE, DEFAULT_MAX_ENTRY_WEIGHT +from ftllexengine.constants import DEFAULT_CACHE_SIZE, DEFAULT_MAX_ENTRY_PAYLOAD_BYTES +from ftllexengine.core.validators import require_bool, require_positive_int from ftllexengine.integrity import ( CacheCorruptionError, + CacheKeySerializationError, + IntegrityCheckFailedError, IntegrityContext, WriteConflictError, ) from ftllexengine.runtime.cache_audit import _CacheAuditMixin +from ftllexengine.runtime.cache_events import ( + CacheDebugLogEntry, + CacheIntegrityEvent, + CacheIntegrityEventKind, + IntegrityEventSink, + MemoryIntegrityEventSink, +) +from ftllexengine.runtime.cache_integrity_eventing import _CacheIntegrityEventMixin from ftllexengine.runtime.cache_introspection import _CacheKeyMixin, _CacheStatsMixin from ftllexengine.runtime.cache_types import ( _DEFAULT_MAX_ERRORS_PER_ENTRY, - CacheAuditLogEntry, CacheStats, HashableValue, IntegrityCacheEntry, - WriteLogEntry, _CacheKey, - _estimate_error_weight, + _estimate_error_payload_bytes, +) +from ftllexengine.runtime.cache_validation import ( + validate_optional_debug_fingerprint_key, + 
validate_optional_integrity_event_sink,
 )
 
 if TYPE_CHECKING:
@@ -62,78 +60,65 @@
     from ftllexengine.diagnostics import FrozenFluentError
 
 __all__ = [
-    "CacheAuditLogEntry",
+    "CacheDebugLogEntry",
+    "CacheIntegrityEvent",
+    "CacheIntegrityEventKind",
     "CacheStats",
     "HashableValue",
     "IntegrityCache",
     "IntegrityCacheEntry",
-    "WriteLogEntry",
+    "IntegrityEventSink",
+    "MemoryIntegrityEventSink",
 ]
 
 
 @final
-class IntegrityCache(_CacheStatsMixin, _CacheAuditMixin, _CacheKeyMixin):
-    """Financial-grade format cache with integrity verification.
+class IntegrityCache(
+    _CacheStatsMixin,
+    _CacheAuditMixin,
+    _CacheIntegrityEventMixin,
+    _CacheKeyMixin,
+):
+    """Fail-closed format cache with explicit integrity ownership.
 
     Thread-safe LRU cache that provides:
-    - BLAKE2b-128 checksum verification on every get()
-    - Write-once semantics (optional) to prevent data races
-    - Audit logging (optional) for compliance and debugging
-    - Fail-fast on corruption (strict mode) or silent eviction
-
-    This is the recommended cache for financial applications where
-    silent data corruption is unacceptable.
+    - accidental-corruption detection on every ``get()``;
+    - write-once semantics (optional) to detect conflicting writes;
+    - a bounded debug ring for recent routine cache traffic;
+    - a structured integrity-event sink for critical incidents.
 
     Thread Safety:
-        All operations are protected by Lock. Safe for concurrent access.
-
-    Memory Protection:
-        The max_entry_weight parameter prevents unbounded memory usage.
-        Weight is calculated as: len(formatted_str) + sum(_estimate_error_weight(e)
-        for e in errors), where _estimate_error_weight measures actual error content
-        (message text, diagnostic fields, resolution path strings, context fields).
-
-    Integrity Guarantees:
-        - Checksums computed on put(), verified on get()
-        - Corruption detected via BLAKE2b-128 mismatch
-        - Write-once mode prevents overwrites (data race protection)
-        - Audit log provides complete operation history
-
-    Example:
-        >>> cache = IntegrityCache(maxsize=1000, strict=True)  # doctest: +SKIP
-        >>> cache.put(  # doctest: +SKIP
-        ...     "msg",
-        ...     None,
-        ...     None,
-        ...     "en_US",
-        ...     use_isolating=False,
-        ...     formatted="Hello",
-        ...     errors=(),
-        ... )
-        >>> entry = cache.get("msg", None, None, "en_US", use_isolating=False)  # doctest: +SKIP
-        >>> assert entry is not None  # doctest: +SKIP
-        >>> assert entry.verify()  # Integrity check  # doctest: +SKIP
-        >>> result, errors = entry.as_result()  # doctest: +SKIP
+        All operations are protected by ``threading.Lock``.
+
+    Payload Budget:
+        ``max_entry_payload_bytes`` bounds the retained UTF-8 payload of the
+        formatted string plus the serialized diagnostic content cached with it.
+        This is a deterministic retained-payload contract, not a claim about
+        Python allocator overhead.
+    """
 
     __slots__ = (
-        "_audit_log",
-        "_audit_sequence",
         "_cache",
-        "_combined_weight_skips",
+        "_cache_generation",
+        "_combined_payload_skips",
         "_corruption_detected",
+        "_debug_fingerprint_key",
+        "_debug_log",
+        "_debug_sequence",
         "_error_bloat_skips",
         "_hits",
         "_idempotent_writes",
+        "_integrity_event_sink",
+        "_integrity_events_emitted",
         "_lock",
-        "_max_audit_entries",
-        "_max_entry_weight",
+        "_max_debug_entries",
+        "_max_entry_payload_bytes",
         "_max_errors_per_entry",
         "_maxsize",
         "_misses",
         "_oversize_skips",
         "_sequence",
-        "_strict",
+        "_uncacheable_function_skips",
         "_unhashable_skips",
         "_write_once",
         "_write_once_conflicts",
@@ -142,69 +127,101 @@
     def __init__(
         self,
         maxsize: int = DEFAULT_CACHE_SIZE,
-        max_entry_weight: int = DEFAULT_MAX_ENTRY_WEIGHT,
+        max_entry_payload_bytes: int = DEFAULT_MAX_ENTRY_PAYLOAD_BYTES,
         max_errors_per_entry: int = _DEFAULT_MAX_ERRORS_PER_ENTRY,
         *,
         write_once: bool = False,
-        strict: bool = True,
-        enable_audit: bool = False,
-        max_audit_entries: int = 10000,
+        enable_debug_log: bool = False,
+        max_debug_entries: int = 10000,
+        integrity_event_sink: IntegrityEventSink | None = None,
+        debug_fingerprint_key: bytes | None = None,
     ) -> None:
-        """Initialize integrity cache.
-
-        Args:
-            maxsize: Maximum number of entries (default: DEFAULT_CACHE_SIZE from constants)
-            max_entry_weight: Maximum memory weight for cached results (default: 10_000).
-                Weight is calculated as: len(formatted_str) + sum(error_weight(e) for e in errors),
-                where error_weight computes actual content-based weight per error.
-            max_errors_per_entry: Maximum number of errors per cache entry (default: 50).
-            write_once: If True, reject updates to existing keys (default: False).
-                Enables data race prevention for financial applications.
-            strict: If True, raise CacheCorruptionError on checksum mismatch (default: True).
-                If False, silently evict corrupted entries and return cache miss.
-            enable_audit: If True, maintain audit log of all operations (default: False).
-            max_audit_entries: Maximum audit log entries before oldest are evicted (default: 10000).
-
-        Raises:
-            ValueError: If maxsize, max_entry_weight, or max_errors_per_entry is not positive
+        """Initialize the integrity cache.
+
+        Premise:
+            The cache owns both its integrity posture and its content-based
+            retained-payload budget.
+
+        Reason:
+            Constructor validation keeps those boundaries explicit: callers get
+            one fail-closed cache whose debug ring, integrity-event sink, and
+            payload-byte limits are all checked before any entries can exist.
         """
-        if maxsize <= 0:
-            msg = "maxsize must be positive"
-            raise ValueError(msg)
-        if max_entry_weight <= 0:
-            msg = "max_entry_weight must be positive"
-            raise ValueError(msg)
-        if max_errors_per_entry <= 0:
-            msg = "max_errors_per_entry must be positive"
-            raise ValueError(msg)
+        require_positive_int(maxsize, "maxsize")
+        require_positive_int(max_entry_payload_bytes, "max_entry_payload_bytes")
+        require_positive_int(max_errors_per_entry, "max_errors_per_entry")
+        require_bool(write_once, "write_once")
+        require_bool(enable_debug_log, "enable_debug_log")
+        require_positive_int(max_debug_entries, "max_debug_entries")
+        validated_sink = validate_optional_integrity_event_sink(integrity_event_sink)
+        validated_debug_key = validate_optional_debug_fingerprint_key(debug_fingerprint_key)
 
         self._cache: OrderedDict[_CacheKey, IntegrityCacheEntry] = OrderedDict()
         self._maxsize = maxsize
-        self._max_entry_weight = max_entry_weight
+        self._max_entry_payload_bytes = max_entry_payload_bytes
         self._max_errors_per_entry = max_errors_per_entry
         self._lock = Lock()
         self._write_once = write_once
-        self._strict = strict
+        self._cache_generation = 0
 
-        # Audit logging with O(1) eviction via deque maxlen
-        self._audit_log: deque[WriteLogEntry] | None = (
-            deque(maxlen=max_audit_entries) if enable_audit else None
+        self._debug_log: deque[CacheDebugLogEntry] | None = (
+            deque(maxlen=max_debug_entries) if enable_debug_log else None
         )
-        self._audit_sequence = 0
-        self._max_audit_entries = max_audit_entries
+        self._debug_sequence = 0
+        self._max_debug_entries = max_debug_entries
+        self._debug_fingerprint_key = (
+            validated_debug_key if validated_debug_key is not None else token_bytes(32)
+        )
+        self._integrity_event_sink = validated_sink
 
-        # Statistics
         self._hits = 0
         self._misses = 0
         self._unhashable_skips = 0
         self._oversize_skips = 0
         self._error_bloat_skips = 0
-        self._combined_weight_skips = 0
+        self._combined_payload_skips = 0
         self._corruption_detected = 0
+        self._integrity_events_emitted = 0
         self._idempotent_writes = 0
         self._write_once_conflicts = 0
+        self._uncacheable_function_skips = 0
         self._sequence = 0
 
+    def _raise_key_contract_error(
+        self,
+        *,
+        operation: str,
+        message_id: str,
+        attribute: str | None,
+        locale_code: str,
+        use_isolating: bool,
+        detail: str,
+    ) -> NoReturn:
+        """Raise a typed key-contract failure and emit structured evidence."""
+        with self._lock:
+            self._unhashable_skips += 1
+            self._emit_integrity_event(
+                kind=CacheIntegrityEventKind.KEY_SERIALIZATION_FAILED,
+                key=None,
+                message_id=message_id,
+                locale_code=locale_code,
+                attribute=attribute,
+                use_isolating=use_isolating,
+                cache_sequence=self._sequence,
+                detail=detail,
+            )
+        context = IntegrityContext(
+            component="cache",
+            operation=operation,
+            key=message_id,
+            expected="cache-key contract",
+            actual=detail,
+            timestamp=time.monotonic(),
+            wall_time_unix=time.time(),
+        )
+        msg = f"Cache key contract failed for '{message_id}': {detail}"
+        raise CacheKeySerializationError(msg, context=context)
+
     def get(
         self,
         message_id: str,
@@ -213,97 +230,90 @@ def get(
         locale_code: str,
         *,
         use_isolating: bool,
+        function_generation: int = 0,
     ) -> IntegrityCacheEntry | None:
-        """Get cached entry with integrity verification.
-
-        Thread-safe. Verifies checksum before returning entry.
-
-        Args:
-            message_id: Message identifier
-            args: Message arguments (may contain unhashable values like lists)
-            attribute: Attribute name
-            locale_code: Locale code
-            use_isolating: Whether Unicode isolation marks are used
-
-        Returns:
-            IntegrityCacheEntry if found and valid, None on miss or corruption
-
-        Raises:
-            CacheCorruptionError: If strict=True and checksum mismatch detected
-        """
-        key = self._make_key(message_id, args, attribute, locale_code, use_isolating=use_isolating)
-
+        """Get a cached entry with integrity verification."""
+        key = self._make_key(
+            message_id,
+            args,
+            attribute,
+            locale_code,
+            use_isolating=use_isolating,
+            function_generation=function_generation,
+        )
         if key is None:
-            with self._lock:
-                self._unhashable_skips += 1
-            # Unhashable args bypass the cache entirely: no key exists, no
-            # lookup occurs. This is not a cache miss — it is a cache bypass.
-            # Counting it as a miss would deflate hit_rate and mislead operators
-            # into diagnosing insufficient cache size when the real issue is
-            # an unhashable argument type. unhashable_skips is the correct counter.
-            return None
+            self._raise_key_contract_error(
+                operation="get",
+                message_id=message_id,
+                attribute=attribute,
+                locale_code=locale_code,
+                use_isolating=use_isolating,
+                detail="arguments could not be encoded into the canonical cache key",
+            )
 
         with self._lock:
             entry = self._cache.get(key)
             if entry is None:
                 self._misses += 1
-                self._audit("MISS", key, None)
+                self._record_debug_operation("MISS", key, None)
                 return None
 
-            # INTEGRITY CHECK: Verify checksum before returning
             if not entry.verify():
                 self._corruption_detected += 1
-                self._audit("CORRUPTION", key, entry)
-
-                if self._strict:
-                    # Fail-fast: raise immediately
-                    context = IntegrityContext(
-                        component="cache",
-                        operation="get",
-                        key=message_id,
-                        expected=entry.checksum.hex(),
-                        actual="",
-                        timestamp=time.monotonic(),
-                        wall_time_unix=time.time(),
-                    )
-                    msg = f"Cache entry corruption detected for '{message_id}'"
-                    raise CacheCorruptionError(msg, context=context)
-                # Non-strict: evict corrupted entry, return miss
+                self._record_debug_operation("CORRUPTION", key, entry)
+                self._emit_integrity_event(
+                    kind=CacheIntegrityEventKind.ENTRY_CORRUPTION,
+                    key=key,
+                    message_id=message_id,
+                    locale_code=locale_code,
+                    attribute=attribute,
+                    use_isolating=use_isolating,
+                    cache_sequence=entry.sequence,
+                    detail="cached entry failed checksum verification",
+                )
                 del self._cache[key]
-                self._misses += 1
-                return None
+                context = IntegrityContext(
+                    component="cache",
+                    operation="get",
+                    key=message_id,
+                    expected=entry.checksum.hex(),
+                    actual="",
+                    timestamp=time.monotonic(),
+                    wall_time_unix=time.time(),
+                )
+                msg = f"Cache entry corruption detected for '{message_id}'"
+                raise CacheCorruptionError(msg, context=context)
 
-            # KEY BINDING CHECK: Verify entry is stored under the correct key.
-            # Detects key confusion where an entry is moved to a different cache slot
-            # while its checksum remains internally consistent (verify() above only
-            # checks that the stored key_hash matches the checksum, not that the
-            # stored key_hash matches the CURRENT lookup key).
-            expected_key_hash = IntegrityCache._compute_key_hash(key)
+            expected_key_hash = IntegrityCache._compute_key_binding_digest(key)
             if not hmac.compare_digest(entry.key_hash, expected_key_hash):
                 self._corruption_detected += 1
-                self._audit("CORRUPTION", key, entry)
-
-                if self._strict:
-                    context = IntegrityContext(
-                        component="cache",
-                        operation="get",
-                        key=message_id,
-                        expected=expected_key_hash.hex(),
-                        actual=entry.key_hash.hex(),
-                        timestamp=time.monotonic(),
-                        wall_time_unix=time.time(),
-                    )
-                    msg = f"Cache key confusion detected for '{message_id}'"
-                    raise CacheCorruptionError(msg, context=context)
-                # Non-strict: evict entry with wrong key binding, return miss
+                self._record_debug_operation("KEY_CONFUSION", key, entry)
+                self._emit_integrity_event(
+                    kind=CacheIntegrityEventKind.KEY_CONFUSION,
+                    key=key,
+                    message_id=message_id,
+                    locale_code=locale_code,
+                    attribute=attribute,
+                    use_isolating=use_isolating,
+                    cache_sequence=entry.sequence,
+                    detail="cached entry key binding did not match lookup slot",
+                )
                 del self._cache[key]
-                self._misses += 1
-                return None
+                context = IntegrityContext(
+                    component="cache",
+                    operation="get",
+                    key=message_id,
+                    expected=expected_key_hash.hex(),
+                    actual=entry.key_hash.hex(),
+                    timestamp=time.monotonic(),
+                    wall_time_unix=time.time(),
+                )
+                msg = f"Cache key confusion detected for '{message_id}'"
+                raise CacheCorruptionError(msg, context=context)
 
-            # Move to end (mark as recently used) and record hit
             self._cache.move_to_end(key)
             self._hits += 1
-            self._audit("HIT", key, entry)
+            self._record_debug_operation("HIT", key, entry)
             return entry
 
     def put(
@@ -314,134 +324,167 @@ def put(
         locale_code: str,
         *,
         use_isolating: bool,
+        function_generation: int = 0,
         formatted: str,
         errors: tuple[FrozenFluentError, ...],
     ) -> None:
-        """Store entry with integrity metadata.
-
-        Thread-safe. Computes checksum and stores immutable entry.
-
-        Args:
-            message_id: Message identifier
-            args: Message arguments (may contain unhashable values like lists)
-            attribute: Attribute name
-            locale_code: Locale code
-            use_isolating: Whether Unicode isolation marks are used
-            formatted: Formatted message string
-            errors: Tuple of FrozenFluentError instances
-
-        Raises:
-            WriteConflictError: If write_once=True and key already exists (strict mode)
-        """
-        # Check entry weight before caching
-        if len(formatted) > self._max_entry_weight:
+        """Store a cache entry with integrity metadata."""
+        retained_errors = tuple(error.sanitized_for_cache() for error in errors)
+        formatted_payload_bytes = len(formatted.encode("utf-8", errors="surrogatepass"))
+        if formatted_payload_bytes > self._max_entry_payload_bytes:
             with self._lock:
                 self._oversize_skips += 1
             return
 
-        if len(errors) > self._max_errors_per_entry:
+        if len(retained_errors) > self._max_errors_per_entry:
             with self._lock:
                 self._error_bloat_skips += 1
             return
 
-        # Dynamic weight calculation based on actual error content.
-        # Formatted string already passed the per-string check above; this fires when
-        # the combined total (formatted + error payload) exceeds the limit.
-        # Counted separately from error_bloat_skips so operators can distinguish
-        # "too many errors" (error_bloat) from "combined content too heavy" (combined_weight).
-        total_weight = len(formatted) + sum(_estimate_error_weight(e) for e in errors)
-        if total_weight > self._max_entry_weight:
+        total_payload_bytes = formatted_payload_bytes + sum(
+            _estimate_error_payload_bytes(error) for error in retained_errors
+        )
+        if total_payload_bytes > self._max_entry_payload_bytes:
             with self._lock:
-                self._combined_weight_skips += 1
+                self._combined_payload_skips += 1
             return
 
-        key = self._make_key(message_id, args, attribute, locale_code, use_isolating=use_isolating)
-
+        key = self._make_key(
+            message_id,
+            args,
+            attribute,
+            locale_code,
+            use_isolating=use_isolating,
+            function_generation=function_generation,
+        )
         if key is None:
-            with self._lock:
-                self._unhashable_skips += 1
-            return
+            self._raise_key_contract_error(
+                operation="put",
+                message_id=message_id,
+                attribute=attribute,
+                locale_code=locale_code,
+                use_isolating=use_isolating,
+                detail="arguments could not be encoded into the canonical cache key",
+            )
 
         with self._lock:
-            # WRITE-ONCE: Check for duplicate keys with idempotent write detection
             if self._write_once and key in self._cache:
                 existing = self._cache[key]
-
-                # IDEMPOTENT CHECK: Compare content hashes (excludes metadata).
-                # Thundering herd scenario: Multiple threads resolve same message
-                # simultaneously, all compute identical results. First thread wins,
-                # subsequent threads should succeed silently (idempotent write).
-                # Cross-class call within the same module: IntegrityCacheEntry._compute_content_hash
-                # is a static pure function needed here to compare hashes without a full entry.
-                # Both classes are permanently co-located in cache.py.
-                new_content_hash = (
-                    IntegrityCacheEntry._compute_content_hash(  # noqa: SLF001 - co-module
-                        formatted, errors
-                    )
+                new_content_hash = IntegrityCacheEntry._compute_content_hash(  # noqa: SLF001 - co-module pure helper
+                    formatted,
+                    retained_errors,
                 )
                 if hmac.compare_digest(existing.content_hash, new_content_hash):
-                    # Benign race: identical content already cached
                     self._idempotent_writes += 1
-                    self._audit("WRITE_ONCE_IDEMPOTENT", key, existing)
+                    self._record_debug_operation("WRITE_ONCE_IDEMPOTENT", key, existing)
                     return
 
-                # TRUE CONFLICT: Different content for same key
-                self._audit("WRITE_ONCE_CONFLICT", key, existing)
                 self._write_once_conflicts += 1
+                self._record_debug_operation("WRITE_ONCE_CONFLICT", key, existing)
+                self._emit_integrity_event(
+                    kind=CacheIntegrityEventKind.WRITE_CONFLICT,
+                    key=key,
+                    message_id=message_id,
+                    locale_code=locale_code,
+                    attribute=attribute,
+                    use_isolating=use_isolating,
+                    cache_sequence=existing.sequence,
+                    detail="write-once conflict detected for an existing cache key",
+                )
+                context = IntegrityContext(
+                    component="cache",
+                    operation="put",
+                    key=message_id,
+                    expected="",
+                    actual=f"",
+                    timestamp=time.monotonic(),
+                    wall_time_unix=time.time(),
+                )
+                msg = f"Write-once violation: '{message_id}' already cached"
+                raise WriteConflictError(
+                    msg,
+                    context=context,
+                    existing_seq=existing.sequence,
+                    new_seq=self._sequence + 1,
+                )
 
-                if self._strict:
-                    context = IntegrityContext(
-                        component="cache",
-                        operation="put",
-                        key=message_id,
-                        expected="",
-                        actual=f"",
-                        timestamp=time.monotonic(),
-                        wall_time_unix=time.time(),
-                    )
-                    msg = f"Write-once violation: '{message_id}' already cached"
-                    raise WriteConflictError(
-                        msg,
-                        context=context,
-                        existing_seq=existing.sequence,
-                        new_seq=self._sequence + 1,
-                    )
-                return
-
-            # Increment sequence for new entry
             self._sequence += 1
             entry = IntegrityCacheEntry.create(
-                formatted, errors, self._sequence, IntegrityCache._compute_key_hash(key)
+                formatted,
+                retained_errors,
+                self._sequence,
+                IntegrityCache._compute_key_binding_digest(key),
             )
+            if not entry.verify():
+                self._record_debug_operation("ENTRY_VERIFICATION_FAILED", key, entry)
+                self._emit_integrity_event(
+                    kind=CacheIntegrityEventKind.ENTRY_VERIFICATION_FAILED,
+                    key=key,
+                    message_id=message_id,
+                    locale_code=locale_code,
+                    attribute=attribute,
+                    use_isolating=use_isolating,
+                    cache_sequence=entry.sequence,
+                    detail="new cache entry failed immediate verification",
+                )
+                context = IntegrityContext(
+                    component="cache",
+                    operation="put",
+                    key=message_id,
+                    expected="freshly constructed entry passes verify()",
+                    actual="verify() returned False",
+                    timestamp=time.monotonic(),
+                    wall_time_unix=time.time(),
+                )
+                msg = f"New cache entry failed immediate verification for '{message_id}'"
+                raise IntegrityCheckFailedError(msg, context=context)
 
-            # LRU eviction only when adding a new key (not updating an existing one).
-            # Without this guard, updating an existing key in a full cache would
-            # evict an unrelated LRU entry AND keep the existing key, shrinking
-            # the cache by one slot per thundering-herd write to the same key.
             is_update = key in self._cache
             if not is_update and len(self._cache) >= self._maxsize:
                 evicted_key, evicted_entry = self._cache.popitem(last=False)
-                self._audit("EVICT", evicted_key, evicted_entry)
+                self._record_debug_operation("EVICT", evicted_key, evicted_entry)
 
-            # Update existing (promote to MRU end) or insert new
             if is_update:
                 self._cache.move_to_end(key)
             self._cache[key] = entry
-            self._audit("PUT", key, entry)
+            self._record_debug_operation("PUT", key, entry)
 
-    def clear(self) -> None:
-        """Clear all cached entries.
+    def note_uncacheable_result(
+        self,
+        message_id: str,
+        args: Mapping[str, FluentValue] | None,
+        attribute: str | None,
+        locale_code: str,
+        *,
+        use_isolating: bool,
+        function_generation: int = 0,
+    ) -> None:
+        """Record that one resolution result was intentionally not cached.
 
-        Thread-safe. Call when bundle is mutated (add_resource, add_function).
+        Premise:
+            Non-cacheable custom functions are a correctness choice, not a
+            cache-size miss.
 
-        Metrics are cumulative and NOT reset on clear. They reflect the total
-        operational history of this cache instance. Resetting on clear would
-        destroy production observability (hit-rate trends, corruption counts)
-        and make auditing impossible after routine cache invalidation.
+        Reason:
+            Operators need a distinct counter for “cache disabled by purity
+            contract” so they do not misdiagnose the bypass as insufficient
+            capacity or unhashable input.
         """
+        key = self._make_key(
+            message_id,
+            args,
+            attribute,
+            locale_code,
+            use_isolating=use_isolating,
+            function_generation=function_generation,
+        )
+        with self._lock:
+            self._uncacheable_function_skips += 1
+            if key is not None:
+                self._record_debug_operation("BYPASS_NONCACHEABLE_FUNCTION", key, None)
+
+    def clear(self) -> None:
+        """Clear all cached entries and advance the cache generation."""
         with self._lock:
             self._cache.clear()
-            # Note: hits/misses/skips/corruption/idempotent_writes NOT reset
-            # — cumulative counters for production observability and audit.
-            # Note: sequence NOT reset (monotonic for audit trail)
-            # Note: audit log NOT cleared (historical record)
+            self._cache_generation += 1
diff --git a/src/ftllexengine/runtime/cache_audit.py b/src/ftllexengine/runtime/cache_audit.py
index 893ba980..a9d2aaa1 100644
--- a/src/ftllexengine/runtime/cache_audit.py
+++ b/src/ftllexengine/runtime/cache_audit.py
@@ -1,50 +1,70 @@
-"""Audit helpers for IntegrityCache."""
+"""Bounded cache debug-log helpers for ``IntegrityCache``."""
+
+# ruff: noqa: SLF001 - co-module mixins are the owning implementation surface
 
 from __future__ import annotations
 
-import hashlib
-import time
-from typing import TYPE_CHECKING
+from time import monotonic, time
+from typing import TYPE_CHECKING, cast
 
-from .cache_types import IntegrityCacheEntry, WriteLogEntry, _CacheKey
+from .cache_events import CacheDebugLogEntry
 
 if TYPE_CHECKING:
     from .cache_protocols import CacheStateProtocol
+    from .cache_types import IntegrityCacheEntry, _CacheKey
+
+
+def _as_cache_state(value: object) -> CacheStateProtocol:
+    """Cast one mixin receiver to the structural cache contract.
+
+    Premise:
+        The mixins are reused by the concrete cache class, not instantiated on
+        their own.
+
+    Reason:
+        Mypy cannot infer that relationship from mixin inheritance alone, so
+        the cast lives in one helper rather than leaking repeated type noise
+        through every method body.
+    """
     return cast("CacheStateProtocol", value)
 
 
 class _CacheAuditMixin:
-    """Audit-log behavior for IntegrityCache."""
+    """Bounded debug-log behavior."""
 
-    def get_audit_log(self: CacheStateProtocol) -> tuple[WriteLogEntry, ...]:
-        """Get audit log entries."""
-        with self._lock:
-            if self._audit_log is None:
+    def get_debug_log(self: object) -> tuple[CacheDebugLogEntry, ...]:
+        """Return recent cache activity from the bounded debug ring."""
+        state = _as_cache_state(self)
+        with state._lock:
+            if state._debug_log is None:
                 return ()
-            return tuple(self._audit_log)
+            return tuple(state._debug_log)
 
-    def _audit(
-        self: CacheStateProtocol,
+    def _record_debug_operation(
+        self: object,
         operation: str,
         key: _CacheKey,
         entry: IntegrityCacheEntry | None,
     ) -> None:
-        """Record audit log entry (internal, assumes lock held)."""
-        if self._audit_log is None:
+        """Record one recent-operation debug entry (lock already held)."""
+        state = _as_cache_state(self)
+        if state._debug_log is None:
             return
 
-        self._audit_sequence += 1
-        key_hash = hashlib.blake2b(
-            str(key).encode("utf-8", errors="surrogatepass"),
-            digest_size=8,
-        ).hexdigest()
+        state._debug_sequence += 1
+        key_fingerprint = state._compute_debug_key_fingerprint(
+            key,
+            secret=state._debug_fingerprint_key,
+        )
 
-        log_entry = WriteLogEntry(
+        log_entry = CacheDebugLogEntry(
             operation=operation,
-            key_hash=key_hash,
-            timestamp=time.monotonic(),
-            sequence=self._audit_sequence,
-            cache_sequence=entry.sequence if entry is not None else self._sequence,
+            key_fingerprint=key_fingerprint,
+            timestamp_monotonic=monotonic(),
+            wall_time_unix=time(),
+            debug_sequence=state._debug_sequence,
+            cache_sequence=entry.sequence if entry is not None else state._sequence,
+            cache_generation=state._cache_generation,
             checksum_hex=entry.checksum.hex() if entry is not None else "",
-            wall_time_unix=time.time(),
         )
-        self._audit_log.append(log_entry)
+        state._debug_log.append(log_entry)
diff --git a/src/ftllexengine/runtime/cache_config.py b/src/ftllexengine/runtime/cache_config.py
index 79b0f381..cc3778ab 100644
--- a/src/ftllexengine/runtime/cache_config.py
+++ b/src/ftllexengine/runtime/cache_config.py
@@ -1,87 +1,113 @@
-"""Cache configuration for FluentBundle.
+"""Cache configuration for ``FluentBundle``.
 
-Provides a single frozen dataclass that encapsulates all cache-related
-parameters. Replaces seven individual constructor parameters with one
-typed object, reducing API surface and eliminating parameter duplication
-between FluentBundle and IntegrityCache.
-
-Python 3.13+. Zero external dependencies.
+Provides one frozen dataclass that encapsulates all cache-related parameters.
+The cache contract is intentionally separate from formatting strictness: cache
+integrity failures are system failures regardless of whether callers choose
+strict or fallback-oriented formatting behavior.
 """
 
 from __future__ import annotations
 
 from dataclasses import dataclass
+from typing import TYPE_CHECKING
+
+from ftllexengine.constants import (
+    DEFAULT_CACHE_SIZE,
+    DEFAULT_MAX_ENTRY_PAYLOAD_BYTES,
+)
+from ftllexengine.core.validators import require_bool, require_positive_int
 
-from ftllexengine.constants import DEFAULT_CACHE_SIZE, DEFAULT_MAX_ENTRY_WEIGHT
-from ftllexengine.core.validators import require_positive_int
+if TYPE_CHECKING:
+    from .cache_events import IntegrityEventSink
 
 __all__ = ["CacheConfig"]
 
 
+def _validate_optional_fingerprint_key(value: object) -> bytes | None:
+    """Validate the optional keyed-fingerprint secret.
+
+    Premise:
+        Debug fingerprints are privacy controls, not cosmetic formatting.
+
+    Reason:
+        An empty string, text value, or short byte sequence weakens the contract
+        silently. The configuration boundary therefore validates the shape up
+        front instead of letting one cache instance limp along with weak input.
+    """
+    if value is None:
+        return None
+    if not isinstance(value, bytes):
+        msg = f"debug_fingerprint_key must be bytes or None, got {type(value).__name__}"
+        raise TypeError(msg)
+    if len(value) < 16:
+        msg = "debug_fingerprint_key must contain at least 16 bytes"
+        raise ValueError(msg)
+    return value
+
+
+def _validate_optional_integrity_event_sink(value: object) -> None:
+    """Validate the optional structured integrity-event sink."""
+    if value is None:
+        return
+    record = getattr(value, "record", None)
+    if not callable(record):
+        msg = (
+            "integrity_event_sink must implement a callable record(event) method, "
+            f"got {type(value).__name__}"
+        )
+        raise TypeError(msg)
+
+
 @dataclass(frozen=True, slots=True)
 class CacheConfig:
-    """Immutable configuration for FluentBundle format caching.
+    """Immutable configuration for ``FluentBundle`` format caching.
 
-    All fields have sensible defaults; constructing ``CacheConfig()`` with
-    no arguments produces a usable configuration. Pass an instance to
+    All fields have sensible defaults; constructing ``CacheConfig()`` with no
+    arguments produces a usable cache configuration. Pass an instance to
     ``FluentBundle(cache=CacheConfig(...))`` to enable caching.
 
     Attributes:
         size: Maximum cache entries (default: 1000).
         write_once: Reject updates to existing cache keys (default: False).
             Enables data-race detection in concurrent environments.
-        integrity_strict: If True, raise CacheCorruptionError on checksum
-            mismatch and WriteConflictError on write-once violations
-            (default: True). If False, silently evict corrupted entries
-            and ignore write conflicts. Acts as an upper bound: when
-            FluentBundle ``strict=False``, the cache is always lenient
-            regardless of this setting (AND-gate with bundle strict mode).
-        enable_audit: Maintain audit log of all cache operations (default: False).
-        max_audit_entries: Maximum audit log entries before oldest eviction
-            (default: 10000). Only relevant when ``enable_audit=True``.
-        max_entry_weight: Maximum memory weight for a single cached result
-            (default: 10000). Results exceeding this are computed but not cached.
+        enable_debug_log: Maintain a bounded recent-operation ring buffer
+            (default: False). This is a debug surface, not a compliance ledger.
+        max_debug_entries: Maximum debug-log entries before oldest eviction
+            (default: 10000). Only relevant when ``enable_debug_log=True``.
+        max_entry_payload_bytes: Maximum retained UTF-8 payload bytes for one
+            cached result (default: 10000). Results exceeding this are computed
+            but not cached.
         max_errors_per_entry: Maximum errors per cache entry (default: 50).
-            Prevents memory exhaustion from pathological cases.
-
-    Example:
-        >>> from ftllexengine import FluentBundle  # doctest: +SKIP
-        >>> from ftllexengine.runtime.cache_config import CacheConfig  # doctest: +SKIP
-        >>> config = CacheConfig(size=500, write_once=True)  # doctest: +SKIP
-        >>> bundle = FluentBundle("en", cache=config)  # doctest: +SKIP
-        >>> bundle.cache_enabled  # doctest: +SKIP
-        True
-        >>> assert bundle.cache_config is not None  # doctest: +SKIP
-        >>> bundle.cache_config.size  # doctest: +SKIP
-        500
-
-    Example - Financial application:
-        >>> config = CacheConfig(  # doctest: +SKIP
-        ...     write_once=True,
-        ...     integrity_strict=True,
-        ...     enable_audit=True,
-        ...     max_audit_entries=50000,
-        ... )
-        >>> bundle = FluentBundle("en", cache=config, strict=True)  # doctest: +SKIP
+            Prevents payload blow-up from pathological failure sets.
+        integrity_event_sink: Optional structured sink for critical integrity
+            events such as corruption or write conflicts.
+        debug_fingerprint_key: Optional keyed-fingerprint secret used for debug
+            log key fingerprints. When omitted, each cache instance generates a
+            private process-local secret automatically.
+    """
 
     size: int = DEFAULT_CACHE_SIZE
     write_once: bool = False
-    integrity_strict: bool = True
-    enable_audit: bool = False
-    max_audit_entries: int = 10000
-    max_entry_weight: int = DEFAULT_MAX_ENTRY_WEIGHT
+    enable_debug_log: bool = False
+    max_debug_entries: int = 10000
+    max_entry_payload_bytes: int = DEFAULT_MAX_ENTRY_PAYLOAD_BYTES
     max_errors_per_entry: int = 50
+    integrity_event_sink: IntegrityEventSink | None = None
+    debug_fingerprint_key: bytes | None = None
 
     def __post_init__(self) -> None:
         """Validate configuration values at construction time.
 
         Raises:
-            TypeError: If any integer field receives a non-int value.
-            ValueError: If size, max_entry_weight, max_errors_per_entry,
-                or max_audit_entries is zero or negative.
+            TypeError: If any field receives the wrong type.
+            ValueError: If any positive integer field is zero or negative, or
+                if ``debug_fingerprint_key`` is too short.
         """
         require_positive_int(self.size, "size")
-        require_positive_int(self.max_entry_weight, "max_entry_weight")
+        require_bool(self.write_once, "write_once")
+        require_bool(self.enable_debug_log, "enable_debug_log")
+        require_positive_int(self.max_debug_entries, "max_debug_entries")
+        require_positive_int(self.max_entry_payload_bytes, "max_entry_payload_bytes")
         require_positive_int(self.max_errors_per_entry, "max_errors_per_entry")
-        require_positive_int(self.max_audit_entries, "max_audit_entries")
+        _validate_optional_integrity_event_sink(self.integrity_event_sink)
+        _validate_optional_fingerprint_key(self.debug_fingerprint_key)
diff --git a/src/ftllexengine/runtime/cache_events.py b/src/ftllexengine/runtime/cache_events.py
new file mode 100644
index 00000000..f15f2a5b
--- /dev/null
+++ b/src/ftllexengine/runtime/cache_events.py
@@ -0,0 +1,174 @@
+"""Structured cache debug and integrity event contracts.
+
+The cache has two different evidence surfaces:
+
+1. a bounded debug ring for routine cache traffic such as hits and misses;
+2. a critical integrity-event channel for corruption, write conflicts, and
+   contract failures that operators may need to retain durably.
+
+Keeping those surfaces separate prevents normal volume from overwriting the
+small set of events that actually matter during incident response.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import threading
+from contextvars import ContextVar, Token
+from dataclasses import dataclass, field
+from enum import StrEnum
+from time import monotonic, time
+from typing import Protocol, Self, final
+
+__all__ = [
+    "CacheDebugLogEntry",
+    "CacheIntegrityCorrelationScope",
+    "CacheIntegrityEvent",
+    "CacheIntegrityEventKind",
+    "IntegrityEventSink",
+    "MemoryIntegrityEventSink",
+]
+
+
+_cache_integrity_correlation_id: ContextVar[str | None] = ContextVar(
+    "ftllexengine_cache_integrity_correlation_id",
+    default=None,
+)
+
+
+class CacheIntegrityEventKind(StrEnum):
+    """Critical cache-integrity event kinds."""
+
+    ENTRY_CORRUPTION = "entry_corruption"
+    KEY_CONFUSION = "key_confusion"
+    WRITE_CONFLICT = "write_conflict"
+    KEY_SERIALIZATION_FAILED = "key_serialization_failed"
+    ENTRY_VERIFICATION_FAILED = "entry_verification_failed"
+
+
+@dataclass(frozen=True, slots=True)
+class CacheDebugLogEntry:
+    """One bounded debug-log record for routine cache traffic.
+
+    Premise:
+        Debug history is useful for local cache tuning, but it is not the same
+        artifact as incident-grade integrity evidence.
+
+    Reason:
+        The entry stores keyed fingerprints and cache sequencing data so callers
+        can inspect recent cache behavior without treating the ring buffer as an
+        append-only audit ledger.
+    """
+
+    operation: str
+    key_fingerprint: str
+    timestamp_monotonic: float
+    wall_time_unix: float
+    debug_sequence: int
+    cache_sequence: int
+    cache_generation: int
+    checksum_hex: str
+
+
+@dataclass(frozen=True, slots=True)
+class CacheIntegrityEvent:
+    """Structured critical integrity evidence emitted by the cache."""
+
+    kind: CacheIntegrityEventKind
+    message_id: str
+    locale_code: str
+    attribute: str | None
+    use_isolating: bool
+    key_fingerprint: str | None
+    event_sequence: int
+    cache_sequence: int
+    cache_generation: int
+    correlation_id: str | None
+    thread_id: int
+    task_name: str | None
+    detail: str
+    timestamp_monotonic: float = field(default_factory=monotonic)
+    wall_time_unix: float = field(default_factory=time)
+
+
+class IntegrityEventSink(Protocol):
+    """Consumer of structured critical cache-integrity events."""
+
+    def record(self, event: CacheIntegrityEvent, /) -> None:
+        """Persist or forward one critical integrity event."""
+
+
+@final
+class MemoryIntegrityEventSink:
+    """Thread-safe in-memory sink for tests and embedded diagnostics.
+
+    Premise:
+        The library cannot assume every application wants file or network I/O
+        for integrity events.
+
+    Reason:
+        A small in-memory sink gives callers and tests a concrete implementation
+        while leaving durable retention to explicit application wiring.
+    """
+
+    __slots__ = ("_events", "_lock")
+
+    def __init__(self) -> None:
+        self._events: list[CacheIntegrityEvent] = []
+        self._lock = threading.Lock()
+
+    def record(self, event: CacheIntegrityEvent, /) -> None:
+        """Append one event to the in-memory list."""
+        with self._lock:
+            self._events.append(event)
+
+    def snapshot(self) -> tuple[CacheIntegrityEvent, ...]:
+        """Return an immutable view of recorded events."""
+        with self._lock:
+            return tuple(self._events)
+
+
+@final
+class CacheIntegrityCorrelationScope:
+    """Context manager that binds one correlation ID to emitted integrity events.
+ + Premise: + Request correlation belongs to the call context, not to the cache key. + + Reason: + A context-local scope lets services attach request or job identifiers to + critical cache events without widening every formatting method signature. + """ + + __slots__ = ("_correlation_id", "_token") + + def __init__(self, correlation_id: str) -> None: + self._correlation_id = correlation_id + self._token: Token[str | None] | None = None + + def __enter__(self) -> Self: + self._token = _cache_integrity_correlation_id.set(self._correlation_id) + return self + + def __exit__( + self, + exc_type: type[BaseException] | None, + exc_val: BaseException | None, + exc_tb: object, + ) -> None: + if self._token is not None: + _cache_integrity_correlation_id.reset(self._token) + + +def current_cache_integrity_correlation_id() -> str | None: + """Return the correlation ID bound to the current logical execution flow.""" + return _cache_integrity_correlation_id.get() + + +def current_cache_integrity_task_name() -> str | None: + """Return the current asyncio task name when cache work runs inside a task.""" + try: + task = asyncio.current_task() + except RuntimeError: + return None + return task.get_name() if task is not None else None diff --git a/src/ftllexengine/runtime/cache_integrity_eventing.py b/src/ftllexengine/runtime/cache_integrity_eventing.py new file mode 100644 index 00000000..be33bc4b --- /dev/null +++ b/src/ftllexengine/runtime/cache_integrity_eventing.py @@ -0,0 +1,94 @@ +"""Structured cache-integrity event emission helpers.""" + +# ruff: noqa: SLF001 - co-module mixins are the owning implementation surface + +from __future__ import annotations + +import threading +from typing import TYPE_CHECKING, cast + +from .cache_events import ( + CacheIntegrityEvent, + CacheIntegrityEventKind, + current_cache_integrity_correlation_id, + current_cache_integrity_task_name, +) + +if TYPE_CHECKING: + from .cache_protocols import CacheStateProtocol + from .cache_types import 
_CacheKey + + +def _as_cache_state(value: object) -> CacheStateProtocol: + """Cast one mixin receiver to the structural cache contract.""" + return cast("CacheStateProtocol", value) + + +class _CacheIntegrityEventMixin: + """Critical integrity-event construction and emission.""" + + def _build_integrity_event( + self: object, + *, + kind: CacheIntegrityEventKind, + key: _CacheKey | None, + message_id: str, + locale_code: str, + attribute: str | None, + use_isolating: bool, + cache_sequence: int, + detail: str, + ) -> CacheIntegrityEvent: + """Construct one structured critical integrity event.""" + state = _as_cache_state(self) + state._integrity_events_emitted += 1 + return CacheIntegrityEvent( + kind=kind, + message_id=message_id, + locale_code=locale_code, + attribute=attribute, + use_isolating=use_isolating, + key_fingerprint=( + state._compute_debug_key_fingerprint( + key, + secret=state._debug_fingerprint_key, + ) + if key is not None + else None + ), + event_sequence=state._integrity_events_emitted, + cache_sequence=cache_sequence, + cache_generation=state._cache_generation, + correlation_id=current_cache_integrity_correlation_id(), + thread_id=threading.get_ident(), + task_name=current_cache_integrity_task_name(), + detail=detail, + ) + + def _emit_integrity_event( + self: object, + *, + kind: CacheIntegrityEventKind, + key: _CacheKey | None, + message_id: str, + locale_code: str, + attribute: str | None, + use_isolating: bool, + cache_sequence: int, + detail: str, + ) -> CacheIntegrityEvent: + """Emit one structured critical integrity event.""" + state = _as_cache_state(self) + event = state._build_integrity_event( + kind=kind, + key=key, + message_id=message_id, + locale_code=locale_code, + attribute=attribute, + use_isolating=use_isolating, + cache_sequence=cache_sequence, + detail=detail, + ) + if state._integrity_event_sink is not None: + state._integrity_event_sink.record(event) + return event diff --git 
a/src/ftllexengine/runtime/cache_introspection.py b/src/ftllexengine/runtime/cache_introspection.py index 99dfeab1..95afb79b 100644 --- a/src/ftllexengine/runtime/cache_introspection.py +++ b/src/ftllexengine/runtime/cache_introspection.py @@ -1,12 +1,16 @@ -"""Stats and key-shaping helpers for IntegrityCache.""" +"""Stats and key-shaping helpers for ``IntegrityCache``.""" + +# ruff: noqa: SLF001 - co-module mixins are the owning implementation surface from __future__ import annotations -from typing import TYPE_CHECKING +from typing import TYPE_CHECKING, cast from ftllexengine.constants import MAX_DEPTH -from .cache_keys import HASHABLE_NODE_BUDGET, compute_key_hash, make_hashable, make_key +from .cache_key_codec import compute_debug_key_fingerprint, compute_key_binding_digest +from .cache_keys import HASHABLE_NODE_BUDGET, make_hashable, make_key +from .cache_types import CacheStats if TYPE_CHECKING: from collections.abc import Mapping @@ -14,23 +18,33 @@ from ftllexengine.core.value_types import FluentValue from .cache_protocols import CacheStateProtocol - from .cache_types import CacheStats, HashableValue, _CacheKey + from .cache_types import HashableValue, _CacheKey + + +def _as_cache_state(value: object) -> CacheStateProtocol: + """Cast one mixin receiver to the structural cache contract.""" + return cast("CacheStateProtocol", value) class _CacheKeyMixin: - """Static key-shaping helpers preserved on IntegrityCache.""" + """Static key-shaping helpers preserved on ``IntegrityCache``.""" _MAX_HASHABLE_NODES: int = HASHABLE_NODE_BUDGET @staticmethod def _make_hashable(value: object, depth: int = MAX_DEPTH) -> HashableValue: - """Convert potentially unhashable cache arguments into a stable hashable form.""" + """Convert potentially unhashable cache arguments into a stable form.""" return make_hashable(value, depth=depth) @staticmethod - def _compute_key_hash(key: _CacheKey) -> bytes: - """Compute the 8-byte key binding used to detect cache slot confusion.""" - 
return compute_key_hash(key) + def _compute_key_binding_digest(key: _CacheKey) -> bytes: + """Compute the internal key-binding digest for cache entries.""" + return compute_key_binding_digest(key) + + @staticmethod + def _compute_debug_key_fingerprint(key: _CacheKey, *, secret: bytes) -> str: + """Compute the keyed fingerprint exposed through debug/event surfaces.""" + return compute_debug_key_fingerprint(key, secret=secret) @staticmethod def _make_key( @@ -40,6 +54,7 @@ def _make_key( locale_code: str, *, use_isolating: bool, + function_generation: int = 0, ) -> _CacheKey | None: """Create the immutable lookup key for a formatting request.""" return make_key( @@ -48,121 +63,148 @@ def _make_key( attribute, locale_code, use_isolating=use_isolating, + function_generation=function_generation, ) class _CacheStatsMixin: - """Stats and property accessors for IntegrityCache.""" + """Stats and property accessors for ``IntegrityCache``.""" - def get_stats(self: CacheStateProtocol) -> CacheStats: + def get_stats(self: object) -> CacheStats: """Get cache statistics.""" - with self._lock: - total = self._hits + self._misses - hit_rate = (self._hits / total * 100) if total > 0 else 0.0 - - return { - "size": len(self._cache), - "maxsize": self._maxsize, - "max_entry_weight": self._max_entry_weight, - "max_errors_per_entry": self._max_errors_per_entry, - "hits": self._hits, - "misses": self._misses, - "hit_rate": round(hit_rate, 2), - "unhashable_skips": self._unhashable_skips, - "oversize_skips": self._oversize_skips, - "error_bloat_skips": self._error_bloat_skips, - "combined_weight_skips": self._combined_weight_skips, - "corruption_detected": self._corruption_detected, - "idempotent_writes": self._idempotent_writes, - "write_once_conflicts": self._write_once_conflicts, - "sequence": self._sequence, - "write_once": self._write_once, - "strict": self._strict, - "audit_enabled": self._audit_log is not None, - "audit_entries": len(self._audit_log) if self._audit_log is not None 
else 0, - } - - def __len__(self: CacheStateProtocol) -> int: + state = _as_cache_state(self) + with state._lock: + total = state._hits + state._misses + hit_rate = (state._hits / total * 100) if total > 0 else 0.0 + + return CacheStats( + size=len(state._cache), + maxsize=state._maxsize, + max_entry_payload_bytes=state._max_entry_payload_bytes, + max_errors_per_entry=state._max_errors_per_entry, + hits=state._hits, + misses=state._misses, + hit_rate=round(hit_rate, 2), + unhashable_skips=state._unhashable_skips, + oversize_skips=state._oversize_skips, + error_bloat_skips=state._error_bloat_skips, + combined_payload_skips=state._combined_payload_skips, + corruption_detected=state._corruption_detected, + integrity_events_emitted=state._integrity_events_emitted, + idempotent_writes=state._idempotent_writes, + write_once_conflicts=state._write_once_conflicts, + uncacheable_function_skips=state._uncacheable_function_skips, + sequence=state._sequence, + cache_generation=state._cache_generation, + write_once=state._write_once, + debug_log_enabled=state._debug_log is not None, + debug_log_entries=len(state._debug_log) if state._debug_log is not None else 0, + ) + + def __len__(self: object) -> int: """Get current cache size. Thread-safe.""" - with self._lock: - return len(self._cache) + state = _as_cache_state(self) + with state._lock: + return len(state._cache) @property - def size(self: CacheStateProtocol) -> int: + def size(self: object) -> int: """Current number of cached entries. Thread-safe.""" - with self._lock: - return len(self._cache) + state = _as_cache_state(self) + with state._lock: + return len(state._cache) @property - def maxsize(self: CacheStateProtocol) -> int: + def maxsize(self: object) -> int: """Maximum cache size.""" - return self._maxsize + state = _as_cache_state(self) + return state._maxsize @property - def hits(self: CacheStateProtocol) -> int: + def hits(self: object) -> int: """Number of cache hits. 
Thread-safe.""" - with self._lock: - return self._hits + state = _as_cache_state(self) + with state._lock: + return state._hits @property - def misses(self: CacheStateProtocol) -> int: + def misses(self: object) -> int: """Number of cache misses. Thread-safe.""" - with self._lock: - return self._misses + state = _as_cache_state(self) + with state._lock: + return state._misses @property - def unhashable_skips(self: CacheStateProtocol) -> int: - """Number of operations skipped due to unhashable args. Thread-safe.""" - with self._lock: - return self._unhashable_skips + def unhashable_skips(self: object) -> int: + """Number of operations rejected due to unsupported key input.""" + state = _as_cache_state(self) + with state._lock: + return state._unhashable_skips @property - def oversize_skips(self: CacheStateProtocol) -> int: - """Number of operations skipped due to result weight. Thread-safe.""" - with self._lock: - return self._oversize_skips + def oversize_skips(self: object) -> int: + """Number of operations skipped due to payload budget overrun.""" + state = _as_cache_state(self) + with state._lock: + return state._oversize_skips @property - def max_entry_weight(self: CacheStateProtocol) -> int: - """Maximum memory weight for cached results.""" - return self._max_entry_weight + def max_entry_payload_bytes(self: object) -> int: + """Maximum retained payload bytes for cached results.""" + state = _as_cache_state(self) + return state._max_entry_payload_bytes @property - def corruption_detected(self: CacheStateProtocol) -> int: + def corruption_detected(self: object) -> int: """Number of checksum mismatches detected. Thread-safe.""" - with self._lock: - return self._corruption_detected + state = _as_cache_state(self) + with state._lock: + return state._corruption_detected @property - def idempotent_writes(self: CacheStateProtocol) -> int: + def integrity_events_emitted(self: object) -> int: + """Number of critical integrity events emitted. 
Thread-safe.""" + state = _as_cache_state(self) + with state._lock: + return state._integrity_events_emitted + + @property + def idempotent_writes(self: object) -> int: """Number of benign concurrent writes with identical content. Thread-safe.""" - with self._lock: - return self._idempotent_writes + state = _as_cache_state(self) + with state._lock: + return state._idempotent_writes @property - def error_bloat_skips(self: CacheStateProtocol) -> int: + def error_bloat_skips(self: object) -> int: """Number of puts skipped due to excess error count. Thread-safe.""" - with self._lock: - return self._error_bloat_skips + state = _as_cache_state(self) + with state._lock: + return state._error_bloat_skips @property - def combined_weight_skips(self: CacheStateProtocol) -> int: - """Number of puts skipped due to combined formatted+error weight. Thread-safe.""" - with self._lock: - return self._combined_weight_skips + def combined_payload_skips(self: object) -> int: + """Number of puts skipped due to combined payload limit. Thread-safe.""" + state = _as_cache_state(self) + with state._lock: + return state._combined_payload_skips @property - def write_once_conflicts(self: CacheStateProtocol) -> int: - """Number of true write-once conflicts (different content, same key). Thread-safe.""" - with self._lock: - return self._write_once_conflicts + def write_once_conflicts(self: object) -> int: + """Number of true write-once conflicts. 
Thread-safe.""" + state = _as_cache_state(self) + with state._lock: + return state._write_once_conflicts @property - def write_once(self: CacheStateProtocol) -> bool: - """Whether write-once mode is enabled.""" - return self._write_once + def uncacheable_function_skips(self: object) -> int: + """Number of results not cached due to non-cacheable functions.""" + state = _as_cache_state(self) + with state._lock: + return state._uncacheable_function_skips @property - def strict(self: CacheStateProtocol) -> bool: - """Whether strict mode is enabled.""" - return self._strict + def write_once(self: object) -> bool: + """Whether write-once mode is enabled.""" + state = _as_cache_state(self) + return state._write_once diff --git a/src/ftllexengine/runtime/cache_key_codec.py b/src/ftllexengine/runtime/cache_key_codec.py new file mode 100644 index 00000000..f1d99964 --- /dev/null +++ b/src/ftllexengine/runtime/cache_key_codec.py @@ -0,0 +1,192 @@ +"""Canonical serialization for cache keys and keyed fingerprints. + +The cache owns one versioned binary encoding for lookup keys so that: + +- entry key binding and debug fingerprints cannot drift independently; +- hash inputs are explicit data structures rather than ``str(key)`` display + strings; +- future format changes can bump one codec version consciously. +""" + +from __future__ import annotations + +import hashlib +from datetime import date, datetime +from decimal import Decimal +from typing import TYPE_CHECKING, cast + +from ftllexengine.core.value_types import FluentNumber + +if TYPE_CHECKING: + from .cache_types import HashableValue, _CacheKey + +__all__ = [ + "compute_debug_key_fingerprint", + "compute_key_binding_digest", + "encode_cache_key", +] + +_CACHE_KEY_CODEC_VERSION: bytes = b"FTLLexEngineCacheKey\x01" + + +def _encode_int(value: int) -> bytes: + """Encode one arbitrary-precision Python integer canonically. + + Premise: + ``FluentValue`` accepts Python ``int``, whose precision is unbounded. 
+ + Reason: + The cache key codec must preserve that contract instead of silently + truncating to 64-bit integers or crashing on large values. A sign byte + plus a length-prefixed magnitude gives one canonical binary encoding for + every integer representable by Python. + """ + magnitude = abs(value) + magnitude_bytes = magnitude.to_bytes( + max(1, (magnitude.bit_length() + 7) // 8), + "big", + ) + sign_byte = b"\x01" if value < 0 else b"\x00" + return sign_byte + len(magnitude_bytes).to_bytes(4, "big") + magnitude_bytes + + +def _encode_bool(*, value: bool) -> bytes: + return b"\x01" if value else b"\x00" + + +def _encode_text(value: str) -> bytes: + encoded = value.encode("utf-8", errors="surrogatepass") + return len(encoded).to_bytes(4, "big") + encoded + + +def _encode_decimal(value: Decimal) -> bytes: + if value.is_nan(): + return b"D" + _encode_text("NaN") + return b"D" + _encode_text(str(value)) + + +def _encode_datetime(value: datetime) -> bytes: + tz_key = str(value.tzinfo) if value.tzinfo is not None else "__naive__" + return b"T" + _encode_text(value.isoformat()) + _encode_text(tz_key) + + +def _encode_fluent_number(value: FluentNumber) -> bytes: + return ( + b"F" + + _encode_text(type(value.value).__name__) + + _encode_hashable_value(cast("HashableValue", value.value)) + + _encode_text(value.formatted) + + _encode_hashable_value(cast("HashableValue", value.precision)) + ) + + +def _encode_tuple(value: tuple[HashableValue, ...]) -> bytes: + return ( + b"Q" + + len(value).to_bytes(4, "big") + + b"".join(_encode_hashable_value(item) for item in value) + ) + + +def _encode_frozenset(value: frozenset[HashableValue]) -> bytes: + encoded_items = sorted(_encode_hashable_value(item) for item in value) + return b"R" + len(encoded_items).to_bytes(4, "big") + b"".join(encoded_items) + + +def _encode_basic_scalar_value(value: HashableValue) -> bytes | None: + if value is None: + return b"N" + if isinstance(value, str): + return b"S" + _encode_text(value) + if 
isinstance(value, bool): + return b"B" + _encode_bool(value=value) + if isinstance(value, int): + return b"I" + _encode_int(value) + return None + + +def _encode_extended_scalar_value(value: HashableValue) -> bytes | None: + if isinstance(value, Decimal): + return _encode_decimal(value) + if isinstance(value, datetime): + return _encode_datetime(value) + if isinstance(value, date): + return b"d" + _encode_text(value.isoformat()) + if isinstance(value, FluentNumber): + return _encode_fluent_number(value) + return None + + +def _encode_scalar_value(value: HashableValue) -> bytes | None: + encoded_basic = _encode_basic_scalar_value(value) + if encoded_basic is not None: + return encoded_basic + return _encode_extended_scalar_value(value) + + +def _encode_collection_value(value: HashableValue) -> bytes | None: + if isinstance(value, tuple): + return _encode_tuple(value) + if isinstance(value, frozenset): + return _encode_frozenset(value) + return None + + +def _encode_hashable_value(value: HashableValue) -> bytes: + encoded_scalar = _encode_scalar_value(value) + if encoded_scalar is not None: + return encoded_scalar + + encoded_collection = _encode_collection_value(value) + if encoded_collection is not None: + return encoded_collection + + msg = f"Unsupported cache key value type: {type(value).__name__}" + raise TypeError(msg) + + +def encode_cache_key(key: _CacheKey) -> bytes: + """Return the one canonical binary encoding for one cache key. + + Premise: + Cache-key hashing is a contract fact shared across integrity checks, + debug logs, and any external correlation tooling. + + Reason: + One versioned encoder prevents accidental drift between the key binding + digest stored inside entries and the keyed fingerprints exposed through + observability surfaces. 
+ """ + message_id, args_tuple, attribute, locale_code, use_isolating, function_generation = key + + encoded_args = bytearray() + encoded_args.extend(len(args_tuple).to_bytes(4, "big")) + for arg_name, arg_value in args_tuple: + encoded_args.extend(_encode_text(arg_name)) + encoded_args.extend(_encode_hashable_value(arg_value)) + + return b"".join( + ( + _CACHE_KEY_CODEC_VERSION, + _encode_text(message_id), + bytes(encoded_args), + b"\x01" + _encode_text(attribute) if attribute is not None else b"\x00", + _encode_text(locale_code), + _encode_bool(value=use_isolating), + _encode_int(function_generation), + ) + ) + + +def compute_key_binding_digest(key: _CacheKey) -> bytes: + """Compute the internal key-binding digest stored in cache entries.""" + return hashlib.blake2b(encode_cache_key(key), digest_size=16).digest() + + +def compute_debug_key_fingerprint(key: _CacheKey, *, secret: bytes) -> str: + """Compute the keyed fingerprint exposed to debug and event surfaces.""" + return hashlib.blake2b( + encode_cache_key(key), + key=secret, + digest_size=12, + ).hexdigest() diff --git a/src/ftllexengine/runtime/cache_keys.py b/src/ftllexengine/runtime/cache_keys.py index 6fe29975..4f99d5fa 100644 --- a/src/ftllexengine/runtime/cache_keys.py +++ b/src/ftllexengine/runtime/cache_keys.py @@ -1,8 +1,7 @@ -"""Hashable-key conversion helpers for IntegrityCache.""" +"""Hashable-key conversion helpers for ``IntegrityCache``.""" from __future__ import annotations -import hashlib from collections.abc import Mapping, Sequence from datetime import date, datetime from decimal import Decimal @@ -16,11 +15,32 @@ from ftllexengine.runtime.cache_types import HashableValue, _CacheKey -__all__ = ["HASHABLE_NODE_BUDGET", "compute_key_hash", "make_hashable", "make_key"] +__all__ = ["HASHABLE_NODE_BUDGET", "make_hashable", "make_key"] HASHABLE_NODE_BUDGET: int = 10_000 +def _validated_mapping_items( + value: Mapping[object, object], +) -> tuple[tuple[str, object], ...]: + """Normalize mapping 
keys before any sorting or hashing happens. + + Premise: + Cache-key shaping must not execute arbitrary key comparison behavior. + + Reason: + Restricting keys to plain strings keeps the cache boundary a pure data + transformation instead of letting comparison methods participate. + """ + items: list[tuple[str, object]] = [] + for key, item in value.items(): + if not isinstance(key, str): + msg = f"Mapping keys in cache arguments must be str, got {type(key).__name__}" + raise TypeError(msg) + items.append((key, item)) + return tuple(sorted(items, key=lambda pair: pair[0])) + + def _hashable_decimal(value: Decimal) -> HashableValue: if value.is_nan(): return ("__decimal__", "__NaN__") @@ -39,7 +59,10 @@ def _hashable_mapping( ) -> HashableValue: return cast( "HashableValue", - (tag, tuple(sorted((key, recurse(item)) for key, item in value.items()))), + ( + tag, + tuple((key, recurse(item)) for key, item in _validated_mapping_items(value)), + ), ) @@ -136,7 +159,9 @@ def recurse(item: object) -> HashableValue: return known_value if isinstance(current, Mapping): return _hashable_mapping("__mapping__", current, recurse) - if isinstance(current, Sequence): + if isinstance(current, Sequence) and not isinstance( + current, (str, bytes, bytearray) + ): return _hashable_sequence("__seq__", current, recurse) msg = f"Unknown type in cache key: {type(current).__name__}" @@ -145,14 +170,6 @@ def recurse(item: object) -> HashableValue: return go(value, depth) -def compute_key_hash(key: _CacheKey) -> bytes: - """Compute the 8-byte BLAKE2b key binding used by cache entries.""" - return hashlib.blake2b( - str(key).encode("utf-8", errors="surrogatepass"), - digest_size=8, - ).digest() - - def make_key( message_id: str, args: Mapping[str, FluentValue] | None, @@ -160,6 +177,7 @@ def make_key( locale_code: str, *, use_isolating: bool, + function_generation: int = 0, ) -> _CacheKey | None: """Create an immutable cache key tuple from formatting arguments.""" if args is None: @@ -167,11 
+185,11 @@ def make_key( else: try: items: list[tuple[str, HashableValue]] = [] - for key, value in args.items(): + for key, value in _validated_mapping_items(cast("Mapping[object, object]", args)): items.append((key, make_hashable(value))) - args_tuple = tuple(sorted(items)) + args_tuple = tuple(items) hash(args_tuple) except (TypeError, RecursionError): return None - return (message_id, args_tuple, attribute, locale_code, use_isolating) + return (message_id, args_tuple, attribute, locale_code, use_isolating, function_generation) diff --git a/src/ftllexengine/runtime/cache_protocols.py b/src/ftllexengine/runtime/cache_protocols.py index 2a4bc2f4..dc8615eb 100644 --- a/src/ftllexengine/runtime/cache_protocols.py +++ b/src/ftllexengine/runtime/cache_protocols.py @@ -1,4 +1,4 @@ -"""Typing protocols for IntegrityCache mixins.""" +"""Typing protocols for cache mixins.""" from __future__ import annotations @@ -8,31 +8,55 @@ from collections import OrderedDict, deque from threading import Lock - from .cache_types import CacheStats, IntegrityCacheEntry, WriteLogEntry, _CacheKey + from .cache_events import CacheDebugLogEntry, CacheIntegrityEvent, IntegrityEventSink + from .cache_types import CacheStats, IntegrityCacheEntry, _CacheKey class CacheStateProtocol(Protocol): - """Structural contract implemented by IntegrityCache.""" + """Structural contract implemented by ``IntegrityCache``.""" - _audit_log: deque[WriteLogEntry] | None - _audit_sequence: int _cache: OrderedDict[_CacheKey, IntegrityCacheEntry] - _combined_weight_skips: int + _cache_generation: int + _combined_payload_skips: int _corruption_detected: int + _debug_log: deque[CacheDebugLogEntry] | None + _debug_sequence: int + _debug_fingerprint_key: bytes _error_bloat_skips: int _hits: int _idempotent_writes: int + _integrity_event_sink: IntegrityEventSink | None + _integrity_events_emitted: int _lock: Lock - _max_entry_weight: int + _max_debug_entries: int + _max_entry_payload_bytes: int _max_errors_per_entry: 
int _maxsize: int _misses: int _oversize_skips: int _sequence: int - _strict: bool + _uncacheable_function_skips: int _unhashable_skips: int _write_once: bool _write_once_conflicts: int def get_stats(self) -> CacheStats: ... # pragma: no cover - typing-only protocol declaration + + @staticmethod + def _compute_debug_key_fingerprint(key: _CacheKey, *, secret: bytes) -> str: + ... # pragma: no cover - typing-only protocol declaration + + def _build_integrity_event( + self, + *, + kind: object, + key: _CacheKey | None, + message_id: str, + locale_code: str, + attribute: str | None, + use_isolating: bool, + cache_sequence: int, + detail: str, + ) -> CacheIntegrityEvent: + ... # pragma: no cover - typing-only protocol declaration diff --git a/src/ftllexengine/runtime/cache_types.py b/src/ftllexengine/runtime/cache_types.py index 01aeb316..a464d4c7 100644 --- a/src/ftllexengine/runtime/cache_types.py +++ b/src/ftllexengine/runtime/cache_types.py @@ -6,33 +6,43 @@ import hmac import struct import time +from collections.abc import Iterator, Mapping from dataclasses import dataclass, field from datetime import date, datetime from decimal import Decimal -from typing import TypedDict +from typing import cast from ftllexengine.core.value_types import FluentNumber from ftllexengine.diagnostics import FrozenFluentError __all__ = [ "_DEFAULT_MAX_ERRORS_PER_ENTRY", - "CacheAuditLogEntry", "CacheStats", "HashableValue", "IntegrityCacheEntry", - "WriteLogEntry", "_CacheKey", "_CacheValue", - "_estimate_error_weight", + "_estimate_error_payload_bytes", ] -class CacheStats(TypedDict): - """Typed statistics snapshot returned by IntegrityCache.get_stats().""" +@dataclass(frozen=True, slots=True) +class CacheStats(Mapping[str, int | float | bool]): + """Immutable cache statistics snapshot returned by ``IntegrityCache``. + + Premise: + Operational evidence is part of the cache contract, not an incidental + debugging convenience. 
+ + Reason: + Returning a mutable ``dict`` weakens the public surface by suggesting + callers can edit cache state. This snapshot behaves like a read-only + mapping for ergonomics while keeping the contract immutable. + """ size: int maxsize: int - max_entry_weight: int + max_entry_payload_bytes: int max_errors_per_entry: int hits: int misses: int @@ -40,51 +50,90 @@ class CacheStats(TypedDict): unhashable_skips: int oversize_skips: int error_bloat_skips: int + combined_payload_skips: int corruption_detected: int + integrity_events_emitted: int idempotent_writes: int write_once_conflicts: int - combined_weight_skips: int + uncacheable_function_skips: int sequence: int + cache_generation: int write_once: bool - strict: bool - audit_enabled: bool - audit_entries: int + debug_log_enabled: bool + debug_log_entries: int + + def __getitem__(self, key: str) -> int | float | bool: + """Provide mapping-style access for existing operational call sites.""" + if key not in self.__dataclass_fields__: + raise KeyError(key) + return cast("int | float | bool", getattr(self, key)) + + def __iter__(self) -> Iterator[str]: + """Iterate over public statistic field names in declaration order.""" + return iter(self.__dataclass_fields__) + + def __len__(self) -> int: + """Return the number of exposed cache statistic fields.""" + return len(self.__dataclass_fields__) + + def as_dict(self) -> dict[str, int | float | bool]: + """Materialize an ordinary ``dict`` when a concrete mapping is required.""" + return {key: self[key] for key in self} -_ERROR_BASE_OVERHEAD: int = 100 _DEFAULT_MAX_ERRORS_PER_ENTRY: int = 50 +_PAYLOAD_BASE_BYTES: int = 8 + + +def _encoded_length(value: str) -> int: + """Return the stored UTF-8 byte length for one text field.""" + return len(value.encode("utf-8", errors="surrogatepass")) -def _estimate_error_weight(error: FrozenFluentError) -> int: - """Estimate the memory weight of one FrozenFluentError.""" - weight = _ERROR_BASE_OVERHEAD + len(error.message) +def 
_estimate_error_payload_bytes(error: FrozenFluentError) -> int: + """Estimate the serialized payload bytes retained for one cached error. + + Premise: + The cache budget must describe what the cache actually retains, not a + vague approximation of process memory. + + Reason: + A payload-byte estimate is deterministic and portable across Python + builds, unlike object-allocator overhead. The cache therefore limits + retained error payload, while overall entry count stays bounded by + ``maxsize``. + """ + payload = _PAYLOAD_BASE_BYTES + _encoded_length(error.message) if error.diagnostic is not None: - diag = error.diagnostic - weight += len(diag.message) + diagnostic = error.diagnostic + payload += _encoded_length(diagnostic.code.name) + payload += _encoded_length(diagnostic.message) for attr in ( - diag.hint, - diag.help_url, - diag.function_name, - diag.argument_name, - diag.expected_type, - diag.received_type, - diag.ftl_location, + diagnostic.hint, + diagnostic.help_url, + diagnostic.function_name, + diagnostic.argument_name, + diagnostic.expected_type, + diagnostic.received_type, + diagnostic.ftl_location, ): if attr is not None: - weight += len(attr) - if diag.resolution_path is not None: - for path_element in diag.resolution_path: - weight += len(path_element) + payload += _encoded_length(attr) + if diagnostic.span is not None: + payload += 16 + if diagnostic.resolution_path is not None: + for path_element in diagnostic.resolution_path: + payload += _encoded_length(path_element) if error.context is not None: - ctx = error.context - weight += len(ctx.input_value) - weight += len(ctx.locale_code) - weight += len(ctx.parse_type) - weight += len(ctx.fallback_value) + context = error.context + payload += _encoded_length(context.input_value) + payload += _encoded_length(context.locale_code) + payload += _encoded_length(context.parse_type) + payload += _encoded_length(context.fallback_value) - return weight + return payload type HashableValue = ( @@ -100,13 +149,25 
@@ def _estimate_error_weight(error: FrozenFluentError) -> int: | frozenset["HashableValue"] ) -type _CacheKey = tuple[str, tuple[tuple[str, HashableValue], ...], str | None, str, bool] +type _CacheKey = ( + tuple[str, tuple[tuple[str, HashableValue], ...], str | None, str, bool, int] +) type _CacheValue = tuple[str, tuple[FrozenFluentError, ...]] @dataclass(frozen=True, slots=True) class IntegrityCacheEntry: - """Immutable cache entry with integrity metadata.""" + """Immutable cache entry with accidental-corruption detection metadata. + + Premise: + Cache entries can outlive the request that produced them. + + Reason: + The entry stores the sanitized error snapshot returned by + ``FrozenFluentError.sanitized_for_cache()`` rather than the live error + object, so retention follows the cache privacy contract instead of the + transient runtime contract. + """ formatted: str errors: tuple[FrozenFluentError, ...] @@ -117,7 +178,7 @@ class IntegrityCacheEntry: content_hash: bytes = field(init=False, repr=False, compare=False, hash=False) def __post_init__(self) -> None: - """Compute and store content_hash after field initialization.""" + """Compute and store ``content_hash`` after field initialization.""" object.__setattr__( self, "content_hash", self._compute_content_hash(self.formatted, self.errors) ) @@ -130,7 +191,7 @@ def create( sequence: int, key_hash: bytes, ) -> IntegrityCacheEntry: - """Create entry with computed checksum.""" + """Create an entry with computed accidental-corruption digest.""" created_at = time.monotonic() checksum = cls._compute_checksum(formatted, errors, created_at, sequence, key_hash) return cls( @@ -144,7 +205,7 @@ def create( @staticmethod def _feed_errors(h: hashlib.blake2b, errors: tuple[FrozenFluentError, ...]) -> None: - """Feed error sequence into an active hasher.""" + """Feed the error sequence into an active hasher.""" h.update(len(errors).to_bytes(4, "big")) for error in errors: h.update(b"\x01") @@ -158,7 +219,17 @@ def 
_compute_checksum( sequence: int, key_hash: bytes, ) -> bytes: - """Compute a BLAKE2b-128 checksum for content plus metadata.""" + """Compute a BLAKE2b-128 digest for content plus metadata. + + Premise: + Cache entries need a cheap detector for accidental mutation and key + confusion inside the current process. + + Reason: + This digest is not advertised as tamper evidence against code that + can rewrite both payload and digest; it is a fail-closed accidental + corruption detector. + """ h = hashlib.blake2b(digest_size=16) encoded = formatted.encode("utf-8", errors="surrogatepass") h.update(len(encoded).to_bytes(4, "big")) @@ -184,7 +255,7 @@ def verify(self) -> bool: return all(error.verify_integrity() for error in self.errors) def as_result(self) -> _CacheValue: - """Extract formatted result and errors as a tuple.""" + """Extract the formatted result and cached error tuple.""" return (self.formatted, self.errors) @staticmethod @@ -192,26 +263,10 @@ def _compute_content_hash( formatted: str, errors: tuple[FrozenFluentError, ...], ) -> bytes: - """Compute a BLAKE2b-128 hash of content only.""" + """Compute a BLAKE2b-128 digest of content only.""" h = hashlib.blake2b(digest_size=16) encoded = formatted.encode("utf-8", errors="surrogatepass") h.update(len(encoded).to_bytes(4, "big")) h.update(encoded) IntegrityCacheEntry._feed_errors(h, errors) return h.digest() - - -@dataclass(frozen=True, slots=True) -class WriteLogEntry: - """Immutable audit log entry for cache operations.""" - - operation: str - key_hash: str - timestamp: float - sequence: int - cache_sequence: int - checksum_hex: str - wall_time_unix: float - - -CacheAuditLogEntry = WriteLogEntry diff --git a/src/ftllexengine/runtime/cache_validation.py b/src/ftllexengine/runtime/cache_validation.py new file mode 100644 index 00000000..5c6dab02 --- /dev/null +++ b/src/ftllexengine/runtime/cache_validation.py @@ -0,0 +1,40 @@ +"""Constructor-boundary validation for cache configuration primitives.""" + +from 
__future__ import annotations + +from typing import TYPE_CHECKING, cast + +if TYPE_CHECKING: + from .cache_events import IntegrityEventSink + +__all__ = [ + "validate_optional_debug_fingerprint_key", + "validate_optional_integrity_event_sink", +] + + +def validate_optional_integrity_event_sink(value: object) -> IntegrityEventSink | None: + """Validate the optional structured integrity-event sink boundary.""" + if value is None: + return None + record = getattr(value, "record", None) + if not callable(record): + msg = ( + "integrity_event_sink must implement a callable record(event) method, " + f"got {type(value).__name__}" + ) + raise TypeError(msg) + return cast("IntegrityEventSink", value) + + +def validate_optional_debug_fingerprint_key(value: object) -> bytes | None: + """Validate the optional keyed fingerprint secret boundary.""" + if value is None: + return None + if not isinstance(value, bytes): + msg = f"debug_fingerprint_key must be bytes or None, got {type(value).__name__}" + raise TypeError(msg) + if len(value) < 16: + msg = "debug_fingerprint_key must contain at least 16 bytes" + raise ValueError(msg) + return value diff --git a/src/ftllexengine/runtime/function_bridge.py b/src/ftllexengine/runtime/function_bridge.py index b75d2a24..fd09619a 100644 --- a/src/ftllexengine/runtime/function_bridge.py +++ b/src/ftllexengine/runtime/function_bridge.py @@ -75,12 +75,13 @@ class FunctionRegistry(_FunctionRegistryIntrospectionMixin): CUSTOM """ - __slots__ = ("_frozen", "_functions") + __slots__ = ("_frozen", "_functions", "_generation") def __init__(self) -> None: """Initialize empty function registry.""" self._functions: dict[str, FunctionSignature] = {} self._frozen: bool = False + self._generation: int = 0 def register( self, @@ -88,6 +89,7 @@ def register( *, ftl_name: str | None = None, param_map: dict[str, str] | None = None, + cacheable: bool = False, ) -> None: """Register Python function for FTL use. 
@@ -95,6 +97,9 @@ def register( func: Python function to register ftl_name: Function name in FTL (default: func.__name__.upper()) param_map: Custom parameter mappings (overrides auto-generation) + cacheable: Whether formatting results that depend on this function + may be cached. Defaults to ``False`` for safety; built-in pure + formatting helpers opt in explicitly during registry creation. Raises: TypeError: If registry is frozen (via freeze() method). @@ -121,8 +126,10 @@ def register( func, ftl_name=ftl_name, param_map=param_map, + cacheable=cacheable, ) self._functions[signature_metadata.ftl_name] = signature_metadata + self._generation += 1 def call( self, @@ -184,6 +191,7 @@ def copy(self) -> FunctionRegistry: """Create an unfrozen copy of this registry.""" new_registry = FunctionRegistry() new_registry._functions = self._functions.copy() + new_registry._generation = self._generation return new_registry @staticmethod diff --git a/src/ftllexengine/runtime/function_metadata.py b/src/ftllexengine/runtime/function_metadata.py index 7cfbcf38..138ac9cd 100644 --- a/src/ftllexengine/runtime/function_metadata.py +++ b/src/ftllexengine/runtime/function_metadata.py @@ -50,6 +50,7 @@ class FunctionMetadata: requires_locale: Whether function needs bundle locale injected expected_positional_args: Expected number of positional args from FTL (before locale) category: Function category for documentation + cacheable: Whether formatted outputs using this function may be cached Example: >>> NUMBER_META = FunctionMetadata( # doctest: +SKIP @@ -66,6 +67,7 @@ class FunctionMetadata: requires_locale: bool expected_positional_args: int = 1 category: FunctionCategory = FunctionCategory.FORMATTING + cacheable: bool = True # Centralized metadata registry for built-in functions diff --git a/src/ftllexengine/runtime/function_registry_helpers.py b/src/ftllexengine/runtime/function_registry_helpers.py index f6e94570..d87c59bd 100644 --- 
a/src/ftllexengine/runtime/function_registry_helpers.py +++ b/src/ftllexengine/runtime/function_registry_helpers.py @@ -6,6 +6,7 @@ from typing import TYPE_CHECKING from ftllexengine.diagnostics import ErrorCategory, ErrorTemplate, FrozenFluentError +from ftllexengine.diagnostics._redaction import redacted_custom_function_failure from .function_decorator import _FTL_REQUIRES_LOCALE_ATTR from .value_types import FunctionSignature @@ -29,6 +30,7 @@ def build_function_signature( *, ftl_name: str | None = None, param_map: dict[str, str] | None = None, + cacheable: bool, ) -> FunctionSignature: """Build immutable registration metadata for one callable.""" if ftl_name is None: @@ -87,6 +89,7 @@ def build_function_signature( ftl_name=ftl_name, param_mapping=immutable_mapping, callable=func, + cacheable=cacheable, ) @@ -106,5 +109,5 @@ def call_registered_function( try: return func_sig.callable(*positional, **python_kwargs) except (TypeError, ValueError) as e: - diag = ErrorTemplate.function_failed(ftl_name, str(e)) + diag = ErrorTemplate.function_failed(ftl_name, redacted_custom_function_failure(e)) raise FrozenFluentError(str(diag), ErrorCategory.RESOLUTION, diagnostic=diag) from e diff --git a/src/ftllexengine/runtime/function_registry_introspection.py b/src/ftllexengine/runtime/function_registry_introspection.py index cc6da725..1d4665bb 100644 --- a/src/ftllexengine/runtime/function_registry_introspection.py +++ b/src/ftllexengine/runtime/function_registry_introspection.py @@ -19,6 +19,7 @@ class _FunctionRegistryState(Protocol): """Structural contract implemented by FunctionRegistry.""" _functions: dict[str, FunctionSignature] + _generation: int class _FunctionRegistryIntrospectionMixin: @@ -50,6 +51,16 @@ def get_callable( sig = self._functions.get(ftl_name) return sig.callable if sig else None + def is_cacheable(self: _FunctionRegistryState, ftl_name: str) -> bool: + """Return whether the registered function explicitly allows caching.""" + sig = 
self._functions.get(ftl_name) + return sig.cacheable if sig else False + + @property + def cache_generation(self: _FunctionRegistryState) -> int: + """Monotonic registry generation used in cache contracts.""" + return self._generation + def __iter__(self: _FunctionRegistryState) -> Iterator[str]: """Iterate over registered FTL function names.""" return iter(self._functions) diff --git a/src/ftllexengine/runtime/functions.py b/src/ftllexengine/runtime/functions.py index 74283254..f205000a 100644 --- a/src/ftllexengine/runtime/functions.py +++ b/src/ftllexengine/runtime/functions.py @@ -458,13 +458,13 @@ def create_default_registry() -> FunctionRegistry: registry = FunctionRegistry() # Register NUMBER function with camelCase parameter mapping - registry.register(number_format, ftl_name="NUMBER") + registry.register(number_format, ftl_name="NUMBER", cacheable=True) # Register DATETIME function with camelCase parameter mapping - registry.register(datetime_format, ftl_name="DATETIME") + registry.register(datetime_format, ftl_name="DATETIME", cacheable=True) # Register CURRENCY function with camelCase parameter mapping - registry.register(currency_format, ftl_name="CURRENCY") + registry.register(currency_format, ftl_name="CURRENCY", cacheable=True) return registry diff --git a/src/ftllexengine/runtime/locale_formatting.py b/src/ftllexengine/runtime/locale_formatting.py index 3bc9d4c8..a0559f21 100644 --- a/src/ftllexengine/runtime/locale_formatting.py +++ b/src/ftllexengine/runtime/locale_formatting.py @@ -15,6 +15,7 @@ from ftllexengine.constants import FALLBACK_FUNCTION_ERROR, MAX_FORMAT_DIGITS from ftllexengine.core.babel_compat import get_babel_dates, get_babel_numbers from ftllexengine.diagnostics import ErrorCategory, FrozenErrorContext, FrozenFluentError +from ftllexengine.diagnostics._redaction import fingerprint_text from ftllexengine.diagnostics.templates import ErrorTemplate if TYPE_CHECKING: @@ -32,6 +33,50 @@ ] +def _safe_datetime_parse_reason() -> str: + 
"""Return the public reason for rejected datetime strings. + + Premise: + ``datetime.fromisoformat()`` exposes raw parser details that may echo + caller-supplied content into the exception message. + + Reason: + The runtime keeps the diagnostic actionable by naming the accepted + contract, while the redacted fingerprint carries correlation value + without disclosing the original string. + """ + return "input is not ISO 8601 format" + + +def _formatting_context( + *, + locale_code: LocaleCode, + value: object, + parse_type: Literal["currency", "datetime", "number"], + fallback_value: str, +) -> FrozenErrorContext: + """Build a safe formatting-error context. + + Premise: + Formatting helpers receive caller-supplied values and downstream + library exceptions can be data-bearing. + + Reason: + The context keeps the locale and live fallback string that the runtime + needs for the current call, while the original value is stored as a + stable fingerprint instead of raw content that may escape into logs or + strict-mode exception surfaces. Long-lived retention surfaces such as + the format cache sanitize the fallback separately when they snapshot the + error. 
+ """ + return FrozenErrorContext( + input_value=fingerprint_text(value, label="format_value"), + locale_code=locale_code, + parse_type=parse_type, + fallback_value=fallback_value, + ) + + def format_number_for_locale( *, locale_code: LocaleCode, @@ -94,10 +139,10 @@ def format_number_for_locale( except (ValueError, TypeError, InvalidOperation, AttributeError, KeyError) as e: fallback = str(value) - diagnostic = ErrorTemplate.formatting_failed("NUMBER", str(value), str(e)) - context = FrozenErrorContext( - input_value=str(value), + diagnostic = ErrorTemplate.formatting_failed("NUMBER", value, e) + context = _formatting_context( locale_code=locale_code, + value=value, parse_type="number", fallback_value=fallback, ) @@ -128,11 +173,14 @@ def format_datetime_for_locale( except ValueError as e: fallback = FALLBACK_FUNCTION_ERROR.format(name="DATETIME") diagnostic = ErrorTemplate.formatting_failed( - "DATETIME", value, "not ISO 8601 format" + "DATETIME", + value, + e, + safe_reason=_safe_datetime_parse_reason(), ) - context = FrozenErrorContext( - input_value=value, + context = _formatting_context( locale_code=locale_code, + value=value, parse_type="datetime", fallback_value=fallback, ) @@ -197,10 +245,10 @@ def format_datetime_for_locale( except (ValueError, OverflowError, AttributeError, KeyError) as e: fallback = dt_value.isoformat() - diagnostic = ErrorTemplate.formatting_failed("DATETIME", str(dt_value), str(e)) - context = FrozenErrorContext( - input_value=str(dt_value), + diagnostic = ErrorTemplate.formatting_failed("DATETIME", dt_value, e) + context = _formatting_context( locale_code=locale_code, + value=dt_value, parse_type="datetime", fallback_value=fallback, ) @@ -290,11 +338,11 @@ def format_currency_for_locale( except (ValueError, TypeError, InvalidOperation, AttributeError, KeyError) as e: fallback = f"{currency} {value}" diagnostic = ErrorTemplate.formatting_failed( - "CURRENCY", f"{currency} {value}", str(e) + "CURRENCY", f"{currency} {value}", e ) - 
context = FrozenErrorContext( - input_value=f"{currency} {value}", + context = _formatting_context( locale_code=locale_code, + value=f"{currency} {value}", parse_type="currency", fallback_value=fallback, ) diff --git a/src/ftllexengine/runtime/resolution_context.py b/src/ftllexengine/runtime/resolution_context.py index 9bdc998c..75f8f38d 100644 --- a/src/ftllexengine/runtime/resolution_context.py +++ b/src/ftllexengine/runtime/resolution_context.py @@ -5,12 +5,12 @@ via custom function re-entry. Architecture: - - GlobalDepthGuard: Uses contextvars for async-safe global depth tracking + - GlobalDepthGuard: Uses contextvars for async-safe same-session depth tracking - ResolutionContext: Explicit per-resolution state (stack, depth, expansion) Thread Safety: ResolutionContext is created per-resolution for full isolation. - GlobalDepthGuard uses contextvars for thread/async-safe state. + GlobalDepthGuard uses contextvars for thread/async-safe same-session state. Python 3.13+. """ @@ -35,19 +35,10 @@ __all__ = ["GlobalDepthGuard", "ResolutionContext"] # ContextVar State (Architectural Decision): -# Global resolution depth tracking via contextvars prevents custom functions from -# bypassing depth limits by calling back into bundle.format_pattern(). -# -# Trade-off: -# - Explicit parameter threading would require signature changes across resolver, -# function bridge, and all custom function implementations (~10+ signatures). -# - ContextVar provides thread/async-safe implicit state with minimal API impact. -# - Security requirement (DoS prevention via stack overflow) takes precedence over -# the explicit control flow principle. -# -# This is a permanent architectural pattern; the security mechanism cannot be -# implemented without cross-context state tracking. Each async task/thread -# maintains independent state via contextvars semantics. +# Global resolution depth tracking is still the right owner for same-session +# nested formatting. 
Cross-thread entry is owned separately by +# ResolutionReentryGate at the bundle boundary because spawned threads do not +# inherit ContextVars. _global_resolution_depth: ContextVar[int] = ContextVar( "fluent_resolution_depth", default=0 ) @@ -73,13 +64,11 @@ class GlobalDepthGuard: GlobalDepthGuard prevents this by tracking depth across all contexts. - Thread Spawning Limitation: - Custom functions that spawn NEW threads bypass this guard: each new - thread starts with the ContextVar default (0). The guard prevents - re-entry within a single thread/async task; it does not prevent - cross-thread recursive invocation. Custom functions that may spawn - threads and call back into bundle.format_pattern() from those threads - must apply independent rate limiting at the custom function level. + Thread Spawning: + Cross-thread entry is rejected by the bundle-owned ResolutionReentryGate + while custom-function code is executing. This guard therefore remains + responsible only for same-session depth tracking, which is exactly what + ContextVar propagation can represent reliably. 
""" __slots__ = ("_max_depth", "_token") @@ -143,9 +132,12 @@ class ResolutionContext: _seen: set[str] = field(init=False, default_factory=set) max_depth: int = MAX_DEPTH max_expression_depth: int = MAX_DEPTH - max_expansion_size: int = DEFAULT_MAX_EXPANSION_SIZE + max_expansion_size: int | None = DEFAULT_MAX_EXPANSION_SIZE _total_chars: int = field(init=False, default=0) _expression_guard: DepthGuard = field(init=False) + _output_budget_exhausted: bool = field(init=False, default=False) + _cacheable_output: bool = field(init=False, default=True) + _noncacheable_functions: set[str] = field(init=False, default_factory=set) def __post_init__(self) -> None: """Initialize the expression depth guard with configured max depth.""" @@ -230,20 +222,72 @@ def resolution_path(self) -> tuple[str, ...]: """ return tuple(self._stack) - def track_expansion(self, char_count: int) -> None: - """Add to running expansion total. + def reserve_output(self, text: str) -> None: + """Reserve budget for the exact string about to be appended. - Does not raise on budget exceeded — callers must check - ``total_chars > max_expansion_size`` after calling this method and - generate the appropriate error. Keeping error generation in the caller - (resolver) preserves separation of concerns: this object tracks state; - the resolver decides what to do when limits are breached. + Premise: + Budgeting after partial formatting creates undercount gaps. - This prevents Billion Laughs attacks where small FTL input expands to - gigabytes via nested message references (e.g., m0={m1}{m1}, - m1={m2}{m2}, ...). + Reason: + The owner of the output budget must see the final string fragment + including isolation marks and fallbacks before it becomes visible. 
""" - self._total_chars += char_count + next_total = self._total_chars + len(text) + if self.max_expansion_size is not None and next_total > self.max_expansion_size: + diag = ErrorTemplate.expansion_budget_exceeded( + next_total, + self.max_expansion_size, + ) + raise FrozenFluentError( + str(diag), + ErrorCategory.RESOLUTION, + diagnostic=diag, + ) + self._total_chars = next_total + + def mark_output_budget_exhausted(self) -> None: + """Remember that a later append crossed the output budget. + + Premise: + Nested pattern resolution may convert an append failure into an + error tuple instead of re-raising immediately. + + Reason: + The enclosing pattern loop must still stop at the first quota + breach so no later literal or fallback output leaks past the + configured maximum. + """ + self._output_budget_exhausted = True + + def mark_noncacheable_function(self, function_name: str) -> None: + """Mark the current resolution as unsafe to cache. + + Premise: + Custom functions may depend on time, I/O, process state, or other + external inputs outside the cache key. + + Reason: + Resolution must carry cacheability evidence forward explicitly so + the bundle can skip caching results that depended on non-pure + callables. 
+ """ + self._cacheable_output = False + self._noncacheable_functions.add(function_name) + + @property + def output_budget_exhausted(self) -> bool: + """Report whether output generation must stop after a budget breach.""" + return self._output_budget_exhausted + + @property + def cacheable_output(self) -> bool: + """Report whether the resolved output may safely enter the cache.""" + return self._cacheable_output + + @property + def noncacheable_functions(self) -> frozenset[str]: + """Return the non-cacheable functions observed during this resolution.""" + return frozenset(self._noncacheable_functions) @property def expression_guard(self) -> DepthGuard: diff --git a/src/ftllexengine/runtime/resolver.py b/src/ftllexengine/runtime/resolver.py index c43e1db7..1f436caa 100644 --- a/src/ftllexengine/runtime/resolver.py +++ b/src/ftllexengine/runtime/resolver.py @@ -8,9 +8,9 @@ resolver fully reentrant and compatible with async frameworks. Each resolution operation creates its own isolated context. - Global depth tracking uses contextvars for async-safe per-task state, - preventing custom functions from bypassing depth limits by calling - back into bundle.format_pattern(). + Same-session depth tracking uses contextvars, while the bundle-owned + ResolutionReentryGate rejects cross-thread re-entry during custom-function + execution. 
""" from __future__ import annotations @@ -26,6 +26,7 @@ ) from ftllexengine.core import depth_clamp from ftllexengine.diagnostics import ( + DiagnosticCode, ErrorCategory, ErrorTemplate, FrozenFluentError, @@ -40,7 +41,6 @@ from ftllexengine.runtime.resolver_runtime import _ResolverRuntimeMixin from ftllexengine.runtime.resolver_selection import _ResolverSelectionMixin from ftllexengine.syntax import ( - Expression, FunctionReference, Message, MessageReference, @@ -59,6 +59,7 @@ from collections.abc import Mapping from ftllexengine.core.value_types import FluentValue + from ftllexengine.runtime._resolution_gate import ResolutionReentryGate from ftllexengine.runtime.function_bridge import FunctionRegistry __all__ = ["FluentResolver", "GlobalDepthGuard", "ResolutionContext"] @@ -73,6 +74,21 @@ UNICODE_PDI: str = "\u2069" # U+2069 POP DIRECTIONAL ISOLATE +def _is_output_budget_error(error: FrozenFluentError) -> bool: + """Recognize the resolver's fail-closed output-budget breach. + + Premise: + Expansion quota failures are structural guardrails, not recoverable + formatting glitches. + + Reason: + The pattern loop must stop at the first budget breach instead of + rendering a fallback and continuing with later output. + """ + diagnostic = error.diagnostic + return diagnostic is not None and diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED + + class FluentResolver(_ResolverRuntimeMixin, _ResolverSelectionMixin): """Resolves Fluent messages to strings. 
@@ -92,6 +108,7 @@ class FluentResolver(_ResolverRuntimeMixin, _ResolverSelectionMixin): "_max_expansion_size", "_max_nesting_depth", "_messages", + "_resolution_gate", "_terms", "_use_isolating", ) @@ -103,9 +120,10 @@ def __init__( terms: dict[str, Term], *, function_registry: FunctionRegistry, + reentry_gate: ResolutionReentryGate | None = None, use_isolating: bool = True, max_nesting_depth: int = MAX_DEPTH, - max_expansion_size: int = DEFAULT_MAX_EXPANSION_SIZE, + max_expansion_size: int | None = DEFAULT_MAX_EXPANSION_SIZE, ) -> None: """Initialize resolver. @@ -114,6 +132,9 @@ def __init__( messages: Message registry terms: Term registry function_registry: Function registry with camelCase conversion (keyword-only) + reentry_gate: Bundle-owned admission control for custom-function + re-entry. Direct ``FluentResolver`` callers may omit this to + get a resolver-owned gate with the same protection. use_isolating: Wrap interpolated values in Unicode bidi marks (keyword-only) max_nesting_depth: Maximum resolution depth limit (keyword-only) max_expansion_size: Maximum total characters in resolved output (keyword-only) @@ -123,6 +144,17 @@ def __init__( self._messages = messages self._terms = terms self._function_registry = function_registry + if reentry_gate is None: + # Premise: direct resolver users do not always have a bundle owner. + # Reason: a resolver-owned gate preserves the cross-thread + # re-entry invariant without forcing callers to construct an + # internal coordination object for one-off resolution work. 
+ from ftllexengine.runtime._resolution_gate import ( # noqa: PLC0415 - runtime-only default path + ResolutionReentryGate, + ) + + reentry_gate = ResolutionReentryGate() + self._resolution_gate = reentry_gate self._max_nesting_depth = depth_clamp(max_nesting_depth) self._max_expansion_size = max_expansion_size @@ -221,9 +253,10 @@ def resolve_message( fallback = FALLBACK_MISSING_MESSAGE.format(id=msg_key) return (fallback, tuple(errors)) - # Use GlobalDepthGuard to track depth across separate format_pattern() calls. - # This prevents custom functions from bypassing depth limits by calling - # back into bundle.format_pattern() which creates a fresh ResolutionContext. + # Same-session nested format_pattern() calls share a depth budget here. + # Cross-thread fresh entry is handled earlier by the bundle-owned + # ResolutionReentryGate because ContextVar depth does not propagate to + # spawned threads. try: with GlobalDepthGuard(max_depth=context.max_depth): context.push(msg_key) @@ -233,9 +266,10 @@ def resolve_message( finally: context.pop() except FrozenFluentError as e: - # Resolution limit exceeded (global depth, expression depth, or - # expansion budget). Collect error and return fallback — prevents - # partial output from reaching the caller. + # Resolution limits that escape the pattern loop, such as depth + # guards, fail closed at the message boundary. Output-budget + # breaches are handled inside _resolve_pattern so the caller keeps + # the safe prefix accumulated before the quota was crossed. errors.append(e) fallback = FALLBACK_MISSING_MESSAGE.format(id=msg_key) return (fallback, tuple(errors)) @@ -254,36 +288,21 @@ def _resolve_pattern( """ parts: list[str] = [] - # Fast-path: budget already exceeded before any element is processed. - # Covers externally-provided ResolutionContext instances (e.g., test fixtures, - # callers that pass a pre-populated context) where no element in this pattern - # has yet contributed to the error list. 
- if context.total_chars > context.max_expansion_size: - diag = ErrorTemplate.expansion_budget_exceeded( - context.total_chars, context.max_expansion_size - ) - errors.append( - FrozenFluentError(str(diag), ErrorCategory.RESOLUTION, diagnostic=diag) - ) - return "".join(parts) - for element in pattern.elements: + if context.output_budget_exhausted: + break match element: case TextElement(): - context.track_expansion(len(element.value)) - if context.total_chars > context.max_expansion_size: - diag = ErrorTemplate.expansion_budget_exceeded( - context.total_chars, context.max_expansion_size - ) - errors.append( - FrozenFluentError( - str(diag), ErrorCategory.RESOLUTION, diagnostic=diag - ) - ) + try: + context.reserve_output(element.value) + except FrozenFluentError as error: + context.mark_output_budget_exhausted() + errors.append(error) break parts.append(element.value) case Placeable(): try: + error_count_before = len(errors) # Track expression depth to prevent stack overflow from deeply # nested SelectExpressions. The guard must be applied HERE at # the Pattern->Placeable entry point, not just in _resolve_expression @@ -294,48 +313,58 @@ def _resolve_pattern( value = self._resolve_expression( element.expression, args, errors, context ) - formatted = self._format_value(value) - pre_track = context.total_chars - context.track_expansion(len(formatted)) - if context.total_chars > context.max_expansion_size: - if pre_track <= context.max_expansion_size: - # This placeable's formatted output caused the overflow. - # When pre_track was already over the limit, the overflow - # was reported inside the nested resolution (term, message, - # or select variant), and we must not duplicate that error. 
- diag = ErrorTemplate.expansion_budget_exceeded( - context.total_chars, context.max_expansion_size - ) - errors.append( - FrozenFluentError( - str(diag), ErrorCategory.RESOLUTION, diagnostic=diag - ) - ) + # Premise: nested variant/message resolution may convert a + # budget breach into an appended error instead of a direct + # exception on this stack frame. + # Reason: the outer pattern must still stop before adding + # isolation marks or later suffix text. + if any( + _is_output_budget_error(error) + for error in errors[error_count_before:] + ): + context.mark_output_budget_exhausted() break - - # Wrap in Unicode bidi isolation marks (FSI/PDI) - # Per Unicode TR9, prevents RTL/LTR text interference - if self._use_isolating: - parts.append(f"{UNICODE_FSI}{formatted}{UNICODE_PDI}") - else: - parts.append(formatted) + formatted = self._format_value(value) + rendered = ( + f"{UNICODE_FSI}{formatted}{UNICODE_PDI}" + if self._use_isolating + else formatted + ) + context.reserve_output(rendered) + parts.append(rendered) except FrozenFluentError as e: # Mozilla-aligned error handling: # Collect error, show readable fallback (not {ERROR: ...}) errors.append(e) + if _is_output_budget_error(e): + context.mark_output_budget_exhausted() + break # Check category for type-safe fallback extraction if e.category == ErrorCategory.FORMATTING and e.fallback_value: # Formatting errors carry the original value as fallback + try: + context.reserve_output(e.fallback_value) + except FrozenFluentError as budget_error: + context.mark_output_budget_exhausted() + errors.append(budget_error) + break parts.append(e.fallback_value) else: - parts.append(self._get_fallback_for_placeable(element.expression)) + fallback = self._get_fallback_for_placeable(element.expression) + try: + context.reserve_output(fallback) + except FrozenFluentError as budget_error: + context.mark_output_budget_exhausted() + errors.append(budget_error) + break + parts.append(fallback) return "".join(parts) def 
_resolve_expression( self, - expr: Expression, + expr: object, args: Mapping[str, FluentValue], errors: list[FrozenFluentError], context: ResolutionContext, @@ -369,7 +398,7 @@ def _resolve_expression( return self._resolve_expression(expr.expression, args, errors, context) case _: # Defensive: catch unknown expression types from programmatic AST construction - diag = ErrorTemplate.unknown_expression(type(expr).__name__) # type: ignore[unreachable] + diag = ErrorTemplate.unknown_expression(type(expr).__name__) raise FrozenFluentError(str(diag), ErrorCategory.RESOLUTION, diagnostic=diag) def _resolve_variable_reference( diff --git a/src/ftllexengine/runtime/resolver_runtime.py b/src/ftllexengine/runtime/resolver_runtime.py index 9868c13b..691f760d 100644 --- a/src/ftllexengine/runtime/resolver_runtime.py +++ b/src/ftllexengine/runtime/resolver_runtime.py @@ -15,6 +15,7 @@ FALLBACK_MISSING_VARIABLE, ) from ftllexengine.diagnostics import ErrorCategory, ErrorTemplate, FrozenFluentError +from ftllexengine.diagnostics._redaction import redacted_custom_function_failure from ftllexengine.syntax import ( Expression, FunctionReference, @@ -29,6 +30,7 @@ if TYPE_CHECKING: from ftllexengine.core.value_types import FluentValue + from ftllexengine.runtime._resolution_gate import ResolutionReentryGate from ftllexengine.runtime.function_bridge import FunctionRegistry from ftllexengine.runtime.resolution_context import ResolutionContext @@ -42,6 +44,7 @@ class _ResolverRuntimeMixin: _function_registry: FunctionRegistry _locale: str + _resolution_gate: ResolutionReentryGate if TYPE_CHECKING: @@ -81,6 +84,9 @@ def _resolve_function_call( ) raise FrozenFluentError(str(diag), ErrorCategory.RESOLUTION, diagnostic=diag) + if not self._function_registry.is_cacheable(func_name): + context.mark_noncacheable_function(func_name) + return self._call_function_safe( func_name, [*positional_values, self._locale], @@ -88,6 +94,9 @@ def _resolve_function_call( errors, ) + if not 
self._function_registry.is_cacheable(func_name): + context.mark_noncacheable_function(func_name) + return self._call_function_safe( func_name, positional_values, @@ -104,21 +113,20 @@ def _call_function_safe( ) -> FluentValue: """Call a registered function and normalize unexpected exceptions.""" try: - return self._function_registry.call(func_name, positional, named) + with self._resolution_gate.custom_function_scope(): + return self._function_registry.call(func_name, positional, named) except FrozenFluentError: raise except asyncio.CancelledError: raise except Exception as error: # noqa: BLE001 - function adapters may raise arbitrary user exceptions + failure_detail = redacted_custom_function_failure(error) logger.warning( - "Custom function %s raised %s: %s", + "Custom function %s failed: %s", func_name, - type(error).__name__, - str(error), - ) - diag = ErrorTemplate.function_failed( - func_name, f"Uncaught exception: {type(error).__name__}: {error}" + failure_detail, ) + diag = ErrorTemplate.function_failed(func_name, failure_detail) errors.append( FrozenFluentError(str(diag), ErrorCategory.RESOLUTION, diagnostic=diag) ) diff --git a/src/ftllexengine/runtime/value_types.py b/src/ftllexengine/runtime/value_types.py index 400a4eda..65da6d49 100644 --- a/src/ftllexengine/runtime/value_types.py +++ b/src/ftllexengine/runtime/value_types.py @@ -70,6 +70,10 @@ class FunctionSignature: params. Stored as sorted tuple of (ftl_param, python_param) pairs for full immutability. callable: The actual Python function + cacheable: Whether cached formatting results may safely reuse outputs + produced by this function. Custom functions default to ``False`` so + time-, I/O-, and environment-dependent callables do not acquire + accidental cache semantics. param_dict: Read-only dict view of param_mapping for O(1) lookup. Computed once at construction, exposed as MappingProxyType to prevent mutation while avoiding per-call dict reconstruction. 
@@ -85,6 +89,7 @@ class FunctionSignature:
     ftl_name: str
     param_mapping: tuple[tuple[str, str], ...]
     callable: Callable[..., FluentValue]
+    cacheable: bool = False
     param_dict: MappingProxyType[str, str] = field(
         init=False, repr=False, compare=False
     )
diff --git a/src/ftllexengine/syntax/ast.py b/src/ftllexengine/syntax/ast.py
index da7630f2..06460c28 100644
--- a/src/ftllexengine/syntax/ast.py
+++ b/src/ftllexengine/syntax/ast.py
@@ -218,8 +218,9 @@ def __post_init__(self) -> None:
         The type annotation (Pattern, not Pattern | None) enforces this at the
         type level; this validates at runtime for programmatic construction.
         """
-        if self.value is None:  # pragma: no branch - runtime guard for programmatic construction
-            msg = "Term must have a value pattern (cannot be None)"  # type: ignore[unreachable]
+        value_obj: object = self.value
+        if value_obj is None:  # pragma: no branch - runtime guard for programmatic construction
+            msg = "Term must have a value pattern (cannot be None)"
             raise ValueError(msg)
 
     @staticmethod
diff --git a/src/ftllexengine/syntax/parser/core.py b/src/ftllexengine/syntax/parser/core.py
index e77ca382..6b0afe59 100644
--- a/src/ftllexengine/syntax/parser/core.py
+++ b/src/ftllexengine/syntax/parser/core.py
@@ -37,9 +37,11 @@
 import logging
 import re
 import sys
+from copy import replace
 from typing import TYPE_CHECKING
 
 from ftllexengine.constants import MAX_DEPTH, MAX_SOURCE_SIZE
+from ftllexengine.core._limits import LimitArg, resolve_limit_arg
 from ftllexengine.diagnostics import DiagnosticCode
 from ftllexengine.enums import CommentType
 from ftllexengine.syntax.ast import (
@@ -198,21 +200,26 @@ class FluentParserV1:
         max_nesting_depth: Maximum allowed placeable nesting depth (default: 100)
     """
 
-    __slots__ = ("_max_nesting_depth", "_max_parse_errors", "_max_source_size")
+    __slots__ = (
+        "_max_nesting_depth",
+        "_max_parse_errors",
+        "_max_source_size",
+        "_max_stream_line_length",
+    )
 
     def __init__(
         self,
         *,
-        max_source_size: int | None = None,
+        max_source_size: LimitArg = None,
         max_nesting_depth: int | None = None,
-        max_parse_errors: int | None = None,
+        max_parse_errors: LimitArg = None,
+        max_stream_line_length: LimitArg = None,
     ) -> None:
         """Initialize parser with optional size, nesting depth, and error limits.
 
         Args:
             max_source_size: Maximum source length in characters (default: 10M).
-                Set to None or 0 to disable size limit (not recommended).
-                Must be non-negative if specified.
+                Use UNLIMITED only for an intentional opt-out.
             max_nesting_depth: Maximum placeable nesting depth (default: 100).
                 Prevents DoS via deeply nested { { { ... } } }.
                 Must be positive (> 0) if specified.
@@ -221,21 +228,37 @@ def __init__(
             max_parse_errors: Maximum number of Junk (error) entries before aborting
                 (default: 100). Prevents memory exhaustion from malformed input
                 generating excessive errors. Real FTL files rarely exceed 10 errors.
+            max_stream_line_length: Maximum line length accepted by parse_stream().
+                Defaults to the same bound as max_source_size so a
+                single hostile line cannot allocate an unbounded chunk.
 
         Raises:
-            ValueError: If max_nesting_depth is specified and <= 0.
+            ValueError: If max_nesting_depth is specified and <= 0, or if
+                any configurable security limit is non-positive.
""" # Validate max_nesting_depth if max_nesting_depth is not None and max_nesting_depth <= 0: msg = f"max_nesting_depth must be positive (got {max_nesting_depth})" raise ValueError(msg) - self._max_source_size = ( - max_source_size if max_source_size is not None else MAX_SOURCE_SIZE + self._max_source_size = resolve_limit_arg( + max_source_size, + field_name="max_source_size", + default=MAX_SOURCE_SIZE, ) - self._max_parse_errors = ( - max_parse_errors if max_parse_errors is not None else _MAX_PARSE_ERRORS + self._max_parse_errors = resolve_limit_arg( + max_parse_errors, + field_name="max_parse_errors", + default=_MAX_PARSE_ERRORS, + ) + stream_line_default = ( + self._max_source_size if self._max_source_size is not None else MAX_SOURCE_SIZE + ) + self._max_stream_line_length = resolve_limit_arg( + max_stream_line_length, + field_name="max_stream_line_length", + default=stream_line_default, ) # Calculate desired depth @@ -262,7 +285,7 @@ def __init__( @property def max_source_size(self) -> int: """Maximum allowed source length in characters.""" - return self._max_source_size + return self._max_source_size if self._max_source_size is not None else sys.maxsize @property def max_nesting_depth(self) -> int: @@ -308,7 +331,7 @@ def parse(self, source: str) -> Resource: # noqa: PLR0915 - main parser loop - :func:`~ftllexengine.syntax.parser.rules.parse_comment` - Comment parsing """ # Validate input size (DoS prevention) - if self._max_source_size > 0 and len(source) > self._max_source_size: + if self._max_source_size is not None and len(source) > self._max_source_size: msg = ( f"Source length ({len(source):,} characters) exceeds maximum " f"({self._max_source_size:,} characters). 
" @@ -478,13 +501,7 @@ def parse(self, source: str) -> Resource: # noqa: PLR0915 - main parser loop term = term_parse.value # Attach comment if available if attach_comment is not None: - term = Term( - id=term.id, - value=term.value, - attributes=term.attributes, - comment=attach_comment, - span=term.span, - ) + term = replace(term, comment=attach_comment) entries.append(term) cursor = term_parse.cursor continue @@ -497,13 +514,7 @@ def parse(self, source: str) -> Resource: # noqa: PLR0915 - main parser loop message = message_parse.value # Attach comment if available if attach_comment is not None: - message = Message( - id=message.id, - value=message.value, - attributes=message.attributes, - comment=attach_comment, - span=message.span, - ) + message = replace(message, comment=attach_comment) entries.append(message) cursor = message_parse.cursor else: @@ -550,7 +561,7 @@ def parse(self, source: str) -> Resource: # noqa: PLR0915 - main parser loop return Resource(entries=tuple(entries)) - def parse_stream(self, lines: Iterable[str]) -> Iterator[Entry]: + def parse_stream(self, lines: Iterable[object]) -> Iterator[Entry]: """Parse FTL entries incrementally from a line-oriented source stream. Splits the stream at blank-line boundaries, which delimit top-level FTL @@ -586,13 +597,57 @@ def parse_stream(self, lines: Iterable[str]) -> Iterator[Entry]: 'greeting' """ chunk: list[str] = [] + chunk_chars = 0 + total_chars = 0 for line in lines: + if not isinstance(line, str): + msg = ( + f"parse_stream() lines must yield str, got {type(line).__name__}. " + "Decode bytes to str before streaming FTL input." + ) + raise TypeError(msg) + + line_length = len(line) + if ( + self._max_stream_line_length is not None + and line_length > self._max_stream_line_length + ): + msg = ( + f"Stream line length ({line_length:,} characters) exceeds maximum " + f"({self._max_stream_line_length:,} characters). 
" + "Configure max_stream_line_length only when you can safely bound " + "hostile or generated input." + ) + raise ValueError(msg) + + total_chars += line_length + if self._max_source_size is not None and total_chars > self._max_source_size: + msg = ( + f"Stream length ({total_chars:,} characters) exceeds maximum " + f"({self._max_source_size:,} characters). " + "Configure max_source_size only when you can safely accept larger input." + ) + raise ValueError(msg) + stripped = line.rstrip("\n\r") - if stripped: + # FTL blank blocks may contain spaces, so whitespace-only space lines + # must delimit entries just like empty lines. Tabs remain non-blank. + is_blank_block = stripped.strip(" ") == "" + if not is_blank_block: + next_chunk_chars = chunk_chars + len(stripped) + (1 if chunk else 0) + if self._max_source_size is not None and next_chunk_chars > self._max_source_size: + msg = ( + f"Entry chunk length ({next_chunk_chars:,} characters) exceeds maximum " + f"({self._max_source_size:,} characters). " + "Configure max_source_size only when the producer is trusted." + ) + raise ValueError(msg) chunk.append(stripped) + chunk_chars = next_chunk_chars elif chunk: yield from self.parse("\n".join(chunk)).entries chunk = [] + chunk_chars = 0 if chunk: yield from self.parse("\n".join(chunk)).entries @@ -609,7 +664,7 @@ def _junk_limit_exceeded(self, junk_count: int) -> bool: Returns: True if the limit is active and has been reached, False otherwise """ - if self._max_parse_errors > 0 and junk_count >= self._max_parse_errors: + if self._max_parse_errors is not None and junk_count >= self._max_parse_errors: logger.warning( "Parse aborted: exceeded maximum of %d Junk entries. " "This usually indicates severely malformed FTL input. 
" diff --git a/src/ftllexengine/validation/resource.py b/src/ftllexengine/validation/resource.py index 4c14153e..41f6f00d 100644 --- a/src/ftllexengine/validation/resource.py +++ b/src/ftllexengine/validation/resource.py @@ -39,7 +39,12 @@ from ftllexengine.syntax.parser import FluentParserV1 -__all__ = ["validate_resource"] +__all__ = [ + "_check_undefined_references", + "_collect_entries", + "_detect_circular_references", + "validate_resource", +] logger = logging.getLogger(__name__) @@ -54,7 +59,7 @@ def _detect_circular_references(graph: dict[str, set[str]]) -> list[ValidationWa def validate_resource( - source: str, + source: object, *, parser: FluentParserV1 | None = None, known_messages: frozenset[str] | None = None, @@ -110,7 +115,7 @@ def validate_resource( # Type validation at API boundary - type hints are not enforced at runtime. # Defensive check: users may pass bytes despite str annotation. if not isinstance(source, str): - msg = ( # type: ignore[unreachable] + msg = ( f"source must be str, not {type(source).__name__}. " "Decode bytes to str (e.g., source.decode('utf-8')) before calling validate_resource()." 
         )
diff --git a/src/ftllexengine/validation/resource_entries.py b/src/ftllexengine/validation/resource_entries.py
index ca284a00..7f21f00b 100644
--- a/src/ftllexengine/validation/resource_entries.py
+++ b/src/ftllexengine/validation/resource_entries.py
@@ -34,13 +34,11 @@ def _check_entry(
         warnings.append(
             ValidationWarning(
                 code=DiagnosticCode.VALIDATION_DUPLICATE_ID,
-                message=(
-                    f"Duplicate {kind} ID '{entry_name}' (later definition will overwrite earlier)"
-                ),
+                message=f"Duplicate {kind} ID '{entry_name}' is invalid",
                 context=entry_name,
                 line=line,
                 column=column,
-                severity=WarningSeverity.WARNING,
+                severity=WarningSeverity.CRITICAL,
             )
         )
         seen_ids.add(entry_name)
@@ -51,13 +49,13 @@ def _check_entry(
             ValidationWarning(
                 code=DiagnosticCode.VALIDATION_SHADOW_WARNING,
                 message=(
-                    f"{kind.capitalize()} '{entry_name}' shadows existing {kind} "
-                    "(this definition will override the earlier one)"
+                    f"{kind.capitalize()} '{entry_name}' redefines an existing "
+                    f"{kind} and is invalid"
                 ),
                 context=entry_name,
                 line=line,
                 column=column,
-                severity=WarningSeverity.WARNING,
+                severity=WarningSeverity.CRITICAL,
             )
         )
 
@@ -71,12 +69,12 @@ def _check_entry(
                     code=DiagnosticCode.VALIDATION_DUPLICATE_ATTRIBUTE,
                     message=(
                         f"{kind.capitalize()} '{entry_name}' has duplicate attribute "
-                        f"'{attr_name}' (later will override earlier)"
+                        f"'{attr_name}' and is invalid"
                     ),
                     context=f"{entry_name}.{attr_name}",
                     line=line,
                     column=column,
-                    severity=WarningSeverity.WARNING,
+                    severity=WarningSeverity.CRITICAL,
                 )
             )
             seen_attr_ids.add(attr_name)
diff --git a/tests/fuzz/test_runtime_async_bundle_property.py b/tests/fuzz/test_runtime_async_bundle_property.py
index 5c71f361..2ff706da 100644
--- a/tests/fuzz/test_runtime_async_bundle_property.py
+++ b/tests/fuzz/test_runtime_async_bundle_property.py
@@ -6,7 +6,7 @@
 - Concurrent: multiple concurrent format_pattern calls produce consistent results.
 - Context manager: async with always exits cleanly regardless of operations.
- Stability: format_pattern on unknown message ID behaves predictably (non-strict mode). -- Immutability: sync read operations (has_message, get_message, etc.) are consistent. +- Immutability: async read operations (has_message, get_message, etc.) are consistent. """ from __future__ import annotations @@ -61,7 +61,7 @@ def test_message_ids_match_sync_bundle(self, locale: str, source: str) -> None: async def run_async() -> set[str]: async_bundle = AsyncFluentBundle(locale, use_isolating=False, strict=False) await async_bundle.add_resource(source) - return set(async_bundle.get_message_ids()) + return set(await async_bundle.get_message_ids()) async_ids = asyncio.run(run_async()) event( @@ -128,7 +128,9 @@ async def run_async() -> tuple[set[str], set[str]]: b_str = AsyncFluentBundle(locale, use_isolating=False, strict=False) await b_buf.add_resource(source) await b_str.add_resource_stream(source.splitlines(keepends=True)) - return set(b_buf.get_message_ids()), set(b_str.get_message_ids()) + buffered_ids = set(await b_buf.get_message_ids()) + streamed_ids = set(await b_str.get_message_ids()) + return buffered_ids, streamed_ids ids_buf, ids_stream = asyncio.run(run_async()) event( @@ -218,7 +220,7 @@ async def run_async() -> str: locale, use_isolating=False, strict=False ) as bundle: await bundle.add_resource(source) - ids = list(bundle.get_message_ids()) + ids = list(await bundle.get_message_ids()) if ids: r, _ = await bundle.format_pattern(ids[0]) return r @@ -241,12 +243,12 @@ async def run_async() -> None: # --------------------------------------------------------------------------- -# Sync read operations consistency +# Async read operations consistency # --------------------------------------------------------------------------- -class TestSyncReadOperationsConsistency: - """Property: sync read operations reflect state set by async mutation ops.""" +class TestAsyncReadOperationsConsistency: + """Property: async read operations reflect state set by async mutation 
ops.""" @given( locale=_locale_strategy, @@ -262,8 +264,8 @@ def test_has_message_consistent_with_get_message_ids( async def run_async() -> tuple[list[str], list[bool]]: bundle = AsyncFluentBundle(locale, use_isolating=False, strict=False) await bundle.add_resource(source) - ids = list(bundle.get_message_ids()) - has_flags = [bundle.has_message(mid) for mid in ids] + ids = list(await bundle.get_message_ids()) + has_flags = [await bundle.has_message(mid) for mid in ids] return ids, has_flags registered_ids, has_flags = asyncio.run(run_async()) @@ -286,8 +288,12 @@ def test_get_message_returns_message_node_for_known_ids( async def run_async() -> tuple[bool, int]: bundle = AsyncFluentBundle(locale, use_isolating=False, strict=False) await bundle.add_resource(source) - ids = list(bundle.get_message_ids()) - all_found = all(bundle.get_message(mid) is not None for mid in ids) + ids = list(await bundle.get_message_ids()) + all_found = True + for mid in ids: + if await bundle.get_message(mid) is None: + all_found = False + break return all_found, len(ids) result, count = asyncio.run(run_async()) @@ -332,7 +338,7 @@ def test_has_attribute_false_for_unknown_message( """ async def run_async() -> bool: bundle = AsyncFluentBundle(locale, use_isolating=False, strict=False) - return bundle.has_attribute(unknown_id, "label") + return await bundle.has_attribute(unknown_id, "label") result = asyncio.run(run_async()) assert result is False diff --git a/tests/fuzz/test_validation_resource_property.py b/tests/fuzz/test_validation_resource_property.py index 658d06e3..b42663d7 100644 --- a/tests/fuzz/test_validation_resource_property.py +++ b/tests/fuzz/test_validation_resource_property.py @@ -185,15 +185,18 @@ def test_always_returns_validation_result(self, source: str) -> None: @given(source=validation_resource_sources()) @settings(deadline=None) def test_is_valid_iff_no_errors_or_annotations(self, source: str) -> None: - """Property: is_valid is True iff errors and annotations are 
both empty.""" + """Property: is_valid is True iff no blocking diagnostics exist.""" result = validate_resource(source) has_errors = len(result.errors) > 0 has_annotations = len(result.annotations) > 0 + has_critical_warnings = result.critical_warning_count > 0 event(f"has_errors={has_errors}") - expected = not has_errors and not has_annotations + event(f"has_critical_warnings={has_critical_warnings}") + expected = not has_errors and not has_annotations and not has_critical_warnings assert result.is_valid == expected, ( f"is_valid={result.is_valid} but errors={result.errors!r}, " - f"annotations={result.annotations!r}" + f"annotations={result.annotations!r}, " + f"critical_warnings={result.critical_warning_count}" ) @given(source=validation_resource_sources()) diff --git a/tests/localization_orchestration_cases/ast_and_cleanup.py b/tests/localization_orchestration_cases/ast_and_cleanup.py index c29be61f..44b1b33c 100644 --- a/tests/localization_orchestration_cases/ast_and_cleanup.py +++ b/tests/localization_orchestration_cases/ast_and_cleanup.py @@ -367,29 +367,29 @@ def test_five_mismatches_pluralises_noun(self) -> None: err_str = str(exc_info.value) assert "more issues" in err_str -class TestGetCacheAuditLogBundleWithoutCache: - """Tests for get_cache_audit_log when a bundle in _bundles has no cache. +class TestGetCacheDebugLogBundleWithoutCache: + """Tests for get_cache_debug_log when a bundle in _bundles has no cache. - When bundle.get_cache_audit_log() returns None (bundle has no cache - configured), that bundle's locale is excluded from the audit_logs dict. - This exercises the ``if audit_log is not None:`` False branch. + When bundle.get_cache_debug_log() returns None (bundle has no cache + configured), that bundle's locale is excluded from the debug_logs dict. + This exercises the ``if debug_log is not None:`` False branch. 
""" - def test_bundle_without_cache_excluded_from_audit_log(self) -> None: - """Locale with a no-cache bundle is absent from the audit log mapping.""" + def test_bundle_without_cache_excluded_from_debug_log(self) -> None: + """Locale with a no-cache bundle is absent from the debug-log mapping.""" l10n = FluentLocalization( - ["en", "de"], cache=CacheConfig(enable_audit=True), + ["en", "de"], cache=CacheConfig(enable_debug_log=True), ) l10n.add_resource("en", "msg = Hello\n") l10n.format_value("msg") - # Inject a bundle with no cache for "de"; get_cache_audit_log() returns None + # Inject a bundle with no cache for "de"; get_cache_debug_log() returns None no_cache_bundle = FluentBundle("de") no_cache_bundle.add_resource("msg = Hallo\n") l10n._bundles["de"] = no_cache_bundle - audit_logs = l10n.get_cache_audit_log() + debug_logs = l10n.get_cache_debug_log() - assert audit_logs is not None - assert "en" in audit_logs - assert "de" not in audit_logs + assert debug_logs is not None + assert "en" in debug_logs + assert "de" not in debug_logs diff --git a/tests/localization_orchestration_cases/cache_and_properties.py b/tests/localization_orchestration_cases/cache_and_properties.py index b68da2ec..5b1283d0 100644 --- a/tests/localization_orchestration_cases/cache_and_properties.py +++ b/tests/localization_orchestration_cases/cache_and_properties.py @@ -9,7 +9,7 @@ from ftllexengine.core.locale_utils import normalize_locale from ftllexengine.localization import ( - CacheAuditLogEntry, + CacheDebugLogEntry, FluentLocalization, PathResourceLoader, ) @@ -75,67 +75,67 @@ def test_skips_bundle_with_no_cache(self) -> None: assert stats["bundle_count"] == 2 assert stats["maxsize"] == 100 # Only en's maxsize -class TestCacheAuditLogBranch: - """Tests for get_cache_audit_log per-locale audit access.""" +class TestCacheDebugLogBranch: + """Tests for get_cache_debug_log per-locale access.""" def test_returns_none_when_caching_disabled(self) -> None: - """get_cache_audit_log() returns 
None when localization caching is disabled.""" + """get_cache_debug_log() returns None when localization caching is disabled.""" l10n = FluentLocalization(["en"]) l10n.add_resource("en", "msg = Hello\n") - assert l10n.get_cache_audit_log() is None + assert l10n.get_cache_debug_log() is None def test_returns_empty_mapping_when_no_bundles_initialized(self) -> None: - """get_cache_audit_log() does not create bundles during inspection.""" - l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_audit=True)) + """get_cache_debug_log() does not create bundles during inspection.""" + l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_debug_log=True)) - audit_logs = l10n.get_cache_audit_log() - assert audit_logs == {} + debug_logs = l10n.get_cache_debug_log() + assert debug_logs == {} - def test_returns_per_locale_write_log_entries(self) -> None: - """get_cache_audit_log() returns immutable CacheAuditLogEntry tuples per locale.""" - l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_audit=True)) + def test_returns_per_locale_debug_log_entries(self) -> None: + """get_cache_debug_log() returns immutable CacheDebugLogEntry tuples per locale.""" + l10n = FluentLocalization(["en", "de"], cache=CacheConfig(enable_debug_log=True)) l10n.add_resource("en", "msg = Hello\n") l10n.add_resource("de", "msg = Hallo\n") l10n.format_value("msg") l10n.format_value("msg") - audit_logs = l10n.get_cache_audit_log() - assert audit_logs is not None - assert list(audit_logs) == ["en", "de"] - assert [entry.operation for entry in audit_logs["en"]] == ["MISS", "PUT", "HIT"] - assert audit_logs["de"] == () - assert all(isinstance(entry, CacheAuditLogEntry) for entry in audit_logs["en"]) + debug_logs = l10n.get_cache_debug_log() + assert debug_logs is not None + assert list(debug_logs) == ["en", "de"] + assert [entry.operation for entry in debug_logs["en"]] == ["MISS", "PUT", "HIT"] + assert debug_logs["de"] == () + assert all(isinstance(entry, 
CacheDebugLogEntry) for entry in debug_logs["en"]) - @given(enable_audit=st.booleans(), locales=locale_chains(min_size=1, max_size=3)) + @given(enable_debug_log=st.booleans(), locales=locale_chains(min_size=1, max_size=3)) @settings(max_examples=20, suppress_health_check=[HealthCheck.function_scoped_fixture]) - def test_property_audit_log_tracks_initialized_locales( - self, enable_audit: bool, locales: list[str] + def test_property_debug_log_tracks_initialized_locales( + self, enable_debug_log: bool, locales: list[str] ) -> None: - """PROPERTY: get_cache_audit_log() uses canonical locale keys.""" - l10n = FluentLocalization(locales, cache=CacheConfig(enable_audit=enable_audit)) + """PROPERTY: get_cache_debug_log() uses canonical locale keys.""" + l10n = FluentLocalization(locales, cache=CacheConfig(enable_debug_log=enable_debug_log)) for locale in locales: l10n.add_resource(locale, "msg = Hello\n") l10n.format_value("msg") - audit_logs = l10n.get_cache_audit_log() - assert audit_logs is not None + debug_logs = l10n.get_cache_debug_log() + assert debug_logs is not None normalized_locales = [normalize_locale(locale) for locale in locales] - assert list(audit_logs) == normalized_locales + assert list(debug_logs) == normalized_locales - event(f"audit={'enabled' if enable_audit else 'disabled'}") + event(f"debug_log={'enabled' if enable_debug_log else 'disabled'}") event(f"locale_count={len(locales)}") - if enable_audit: - assert len(audit_logs[normalized_locales[0]]) >= 2 + if enable_debug_log: + assert len(debug_logs[normalized_locales[0]]) >= 2 assert all( - isinstance(entry, CacheAuditLogEntry) - for entry in audit_logs[normalized_locales[0]] + isinstance(entry, CacheDebugLogEntry) + for entry in debug_logs[normalized_locales[0]] ) else: - assert all(log == () for log in audit_logs.values()) + assert all(log == () for log in debug_logs.values()) class TestFormatPattern: """Tests for format_pattern fallback chain edge cases.""" @@ -356,15 +356,17 @@ def 
test_add_resource_twice_uses_latest( value1: str, value2: str, ) -> None: - """Adding resource twice uses latest value (override property).""" - event("outcome=override") + """Explicit overwrite admission replaces the earlier localized value.""" + event("outcome=explicit_override") locale = locales[0] l10n = FluentLocalization([locale]) l10n.add_resource(locale, f"{message_id} = {value1}") result1, _ = l10n.format_value(message_id) - l10n.add_resource(locale, f"{message_id} = {value2}") + # Premise: replacing a canonical message ID is a deliberate mutation. + # Reason: tests must opt in explicitly instead of relying on load order. + l10n.add_resource(locale, f"{message_id} = {value2}", allow_overwrite=True) result2, _ = l10n.format_value(message_id) assert value1 in result1 or value2 in result1 diff --git a/tests/parsing_dates_cases/parse_date_cases.py b/tests/parsing_dates_cases/parse_date_cases.py index 0c6b607e..0071f770 100644 --- a/tests/parsing_dates_cases/parse_date_cases.py +++ b/tests/parsing_dates_cases/parse_date_cases.py @@ -1,6 +1,7 @@ # mypy: ignore-errors """Split test cases from tests/test_parsing_dates.py.""" +from ftllexengine.diagnostics._redaction import redacted_parse_failure from tests.parsing_dates_cases import * # noqa: F403 - shared split test support # --------------------------------------------------------------------------- @@ -45,7 +46,7 @@ def test_parse_date_invalid_returns_error(self) -> None: assert len(errors) > 0 assert result is None assert errors[0].parse_type == "date" - assert errors[0].input_value == "invalid" + assert errors[0].input_value == redacted_parse_failure("invalid", parse_type="date") def test_parse_date_empty_returns_error(self) -> None: """Empty input returns error in list.""" diff --git a/tests/runtime_bundle_cases/__init__.py b/tests/runtime_bundle_cases/__init__.py index 81d4e037..d3f8347f 100644 --- a/tests/runtime_bundle_cases/__init__.py +++ b/tests/runtime_bundle_cases/__init__.py @@ -13,7 +13,11 @@ from 
ftllexengine.constants import MAX_LOCALE_LENGTH_HARD_LIMIT, MAX_SOURCE_SIZE from ftllexengine.core.locale_utils import normalize_locale from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError, ValidationError -from ftllexengine.integrity import FormattingIntegrityError, SyntaxIntegrityError +from ftllexengine.integrity import ( + FormattingIntegrityError, + ResourceConflictIntegrityError, + SyntaxIntegrityError, +) from ftllexengine.runtime import FluentBundle from ftllexengine.runtime.cache_config import CacheConfig from ftllexengine.runtime.function_bridge import FunctionRegistry @@ -23,7 +27,8 @@ __all__ = [ "MAX_LOCALE_LENGTH_HARD_LIMIT", "MAX_SOURCE_SIZE", "Any", "CacheConfig", "ErrorCategory", "FluentBundle", "FormattingIntegrityError", "FrozenFluentError", - "FunctionRegistry", "Mock", "SyntaxIntegrityError", "ValidationError", + "FunctionRegistry", "Mock", "ResourceConflictIntegrityError", "SyntaxIntegrityError", + "ValidationError", "assume", "create_default_registry", "event", "example", "given", "logging", "normalize_locale", "patch", "pytest", "st", "validate_resource", ] diff --git a/tests/runtime_bundle_cases/basic.py b/tests/runtime_bundle_cases/basic.py index 3b0a31a2..18bd223d 100644 --- a/tests/runtime_bundle_cases/basic.py +++ b/tests/runtime_bundle_cases/basic.py @@ -6,6 +6,7 @@ FluentBundle, FrozenFluentError, Mock, + ResourceConflictIntegrityError, patch, pytest, ) @@ -366,8 +367,8 @@ def test_multiple_locales_independent(self) -> None: assert result_en == "Hello!" 
assert errors_en == () - def test_overwrite_message_with_new_resource(self) -> None: - """Adding resource with same message ID overwrites.""" + def test_overwrite_message_requires_explicit_admission(self) -> None: + """Adding resource with same message ID needs allow_overwrite.""" bundle = FluentBundle("en_US") bundle.add_resource("msg = Original") @@ -375,7 +376,9 @@ def test_overwrite_message_with_new_resource(self) -> None: assert result1 == "Original" assert errors1 == () - bundle.add_resource("msg = Updated") + with pytest.raises(ResourceConflictIntegrityError, match="msg"): + bundle.add_resource("msg = Updated") + bundle.add_resource("msg = Updated", allow_overwrite=True) result2, errors2 = bundle.format_pattern("msg") assert result2 == "Updated" assert errors2 == () diff --git a/tests/runtime_bundle_cases/properties.py b/tests/runtime_bundle_cases/properties.py index 9ad05634..3cf79b29 100644 --- a/tests/runtime_bundle_cases/properties.py +++ b/tests/runtime_bundle_cases/properties.py @@ -5,6 +5,7 @@ FluentBundle, FormattingIntegrityError, FunctionRegistry, + ResourceConflictIntegrityError, SyntaxIntegrityError, assume, create_default_registry, @@ -433,38 +434,36 @@ def test_ascii_alphanumeric_input_is_canonicalized_or_rejected(self, locale: str class TestBundleOverwriteWarning: - """Overwriting an existing message or term in add_resource logs a WARNING.""" + """Replacing existing IDs requires explicit overwrite admission.""" - def test_message_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None: - """Overwriting a message logs a warning with the message ID.""" + def test_message_overwrite_requires_allow_overwrite( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Implicit message replacement is rejected.""" bundle = FluentBundle("en") with caplog.at_level(logging.WARNING): bundle.add_resource("greeting = Hello") - bundle.add_resource("greeting = Goodbye") + with pytest.raises(ResourceConflictIntegrityError, match="greeting"): + 
bundle.add_resource("greeting = Goodbye") - warning_messages = [ - record.message for record in caplog.records - if record.levelno == logging.WARNING - ] - assert any("Overwriting existing message 'greeting'" in msg for msg in warning_messages) + assert caplog.records == [] - def test_term_overwrite_logs_warning(self, caplog: pytest.LogCaptureFixture) -> None: - """Overwriting a term logs a warning with the term ID.""" + def test_term_overwrite_requires_allow_overwrite( + self, caplog: pytest.LogCaptureFixture + ) -> None: + """Implicit term replacement is rejected.""" bundle = FluentBundle("en") with caplog.at_level(logging.WARNING): bundle.add_resource("-brand = Acme") - bundle.add_resource("-brand = NewCorp") + with pytest.raises(ResourceConflictIntegrityError, match="-brand"): + bundle.add_resource("-brand = NewCorp") - warning_messages = [ - record.message for record in caplog.records - if record.levelno == logging.WARNING - ] - assert any("Overwriting existing term '-brand'" in msg for msg in warning_messages) + assert caplog.records == [] def test_no_warning_for_new_entries(self, caplog: pytest.LogCaptureFixture) -> None: - """No overwrite warning when adding distinct entries.""" + """Distinct entries register without overwrite diagnostics.""" bundle = FluentBundle("en") with caplog.at_level(logging.WARNING): @@ -477,12 +476,12 @@ def test_no_warning_for_new_entries(self, caplog: pytest.LogCaptureFixture) -> N ] assert len(overwrite_warnings) == 0 - def test_last_write_wins_behavior_preserved(self) -> None: - """Last Write Wins behavior: last added resource wins on repeated key.""" + def test_explicit_overwrite_replaces_previous_value(self) -> None: + """Intentional replacement works when the caller opts in.""" bundle = FluentBundle("en") bundle.add_resource("greeting = First") - bundle.add_resource("greeting = Second") - bundle.add_resource("greeting = Third") + bundle.add_resource("greeting = Second", allow_overwrite=True) + bundle.add_resource("greeting = 
Third", allow_overwrite=True) result, _ = bundle.format_pattern("greeting") assert result == "Third" diff --git a/tests/runtime_bundle_cases/state.py b/tests/runtime_bundle_cases/state.py index 4b7e9db9..4cd9c787 100644 --- a/tests/runtime_bundle_cases/state.py +++ b/tests/runtime_bundle_cases/state.py @@ -6,6 +6,7 @@ CacheConfig, FluentBundle, FormattingIntegrityError, + ResourceConflictIntegrityError, SyntaxIntegrityError, ValidationError, logging, @@ -79,30 +80,30 @@ def test_cache_write_once_config(self) -> None: assert off.cache_config is not None assert off.cache_config.write_once is False - def test_cache_enable_audit_config(self) -> None: - """cache_config.enable_audit reflects configured boolean.""" - on = FluentBundle("en", cache=CacheConfig(enable_audit=True)) + def test_cache_enable_debug_log_config(self) -> None: + """cache_config.enable_debug_log reflects configured boolean.""" + on = FluentBundle("en", cache=CacheConfig(enable_debug_log=True)) assert on.cache_config is not None - assert on.cache_config.enable_audit is True - off = FluentBundle("en", cache=CacheConfig(enable_audit=False)) + assert on.cache_config.enable_debug_log is True + off = FluentBundle("en", cache=CacheConfig(enable_debug_log=False)) assert off.cache_config is not None - assert off.cache_config.enable_audit is False + assert off.cache_config.enable_debug_log is False - def test_cache_max_audit_entries_config(self) -> None: - """cache_config.max_audit_entries reflects configured maximum.""" + def test_cache_max_debug_entries_config(self) -> None: + """cache_config.max_debug_entries reflects configured maximum.""" bundle = FluentBundle( - "en", cache=CacheConfig(max_audit_entries=5000) + "en", cache=CacheConfig(max_debug_entries=5000) ) assert bundle.cache_config is not None - assert bundle.cache_config.max_audit_entries == 5000 + assert bundle.cache_config.max_debug_entries == 5000 - def test_cache_max_entry_weight_config(self) -> None: - """cache_config.max_entry_weight reflects 
configured maximum.""" + def test_cache_max_entry_payload_bytes_config(self) -> None: + """cache_config.max_entry_payload_bytes reflects configured maximum.""" bundle = FluentBundle( - "en", cache=CacheConfig(max_entry_weight=8000) + "en", cache=CacheConfig(max_entry_payload_bytes=8000) ) assert bundle.cache_config is not None - assert bundle.cache_config.max_entry_weight == 8000 + assert bundle.cache_config.max_entry_payload_bytes == 8000 def test_cache_max_errors_per_entry_config(self) -> None: """cache_config.max_errors_per_entry reflects configured maximum.""" @@ -355,27 +356,22 @@ def test_add_resource_clears_cache(self) -> None: bundle.add_resource("second = Second") assert bundle.get_cache_stats()["size"] == 0 # type: ignore[index] - def test_duplicate_terms_overwrite(self, caplog: Any) -> None: - """Duplicate term definitions produce overwrite warning.""" + def test_duplicate_terms_are_rejected(self, caplog: Any) -> None: + """Duplicate term definitions fail closed inside one resource.""" bundle = FluentBundle("en") - bundle.add_resource("-brand = Firefox\n-brand = Chrome\n") - assert any( - "Overwriting existing term '-brand'" in r.message - for r in caplog.records - ) + with pytest.raises(ResourceConflictIntegrityError, match="-brand"): + bundle.add_resource("-brand = Firefox\n-brand = Chrome\n") + assert caplog.records == [] - def test_multiple_duplicate_terms(self, caplog: Any) -> None: - """Multiple duplicate terms each produce warnings.""" + def test_multiple_duplicate_terms_are_rejected(self, caplog: Any) -> None: + """Multiple duplicate terms fail as one audited conflict set.""" bundle = FluentBundle("en") - bundle.add_resource( - "-brand = First\n-version = First\n" - "-brand = Second\n-version = Second\n" - ) - warnings = [ - r for r in caplog.records - if "Overwriting existing term" in r.message - ] - assert len(warnings) == 2 + with pytest.raises(ResourceConflictIntegrityError, match="-brand, -version"): + bundle.add_resource( + "-brand = 
First\n-version = First\n" + "-brand = Second\n-version = Second\n" + ) + assert caplog.records == [] def test_comments_with_debug_logging(self, caplog: Any) -> None: """Comments are processed at debug level without errors.""" @@ -769,4 +765,3 @@ def test_format_pattern_caches_result(self) -> None: # -- Introspection (variables, introspect_message/term, has_attribute) ------- - diff --git a/tests/runtime_cache_hashable_cases/__init__.py b/tests/runtime_cache_hashable_cases/__init__.py index fe5177c2..d14c87c5 100644 --- a/tests/runtime_cache_hashable_cases/__init__.py +++ b/tests/runtime_cache_hashable_cases/__init__.py @@ -1,5 +1,5 @@ """Tests for IntegrityCache hashable key construction, NaN normalization, and -unhashable argument handling. +fail-closed key-contract handling. Covers: - __init__ parameter validation @@ -9,8 +9,8 @@ - _make_key integration and error recovery (RecursionError, TypeError) - NaN normalization (Decimal) to prevent cache pollution DoS vectors - Hashable conversion of list/dict/set/tuple args for full cache coverage -- Unhashable argument graceful bypass (skips caching, increments counter) -- Error bloat protection (max_entry_weight, max_errors_per_entry) +- Unhashable argument fail-closed rejection with integrity evidence +- Error bloat protection (max_entry_payload_bytes, max_errors_per_entry) - LRU eviction and move-to-end behavior - Property accessors (size, hits, misses, unhashable_skips, oversize_skips) """ @@ -27,6 +27,7 @@ from ftllexengine.constants import MAX_DEPTH from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.integrity import CacheKeySerializationError from ftllexengine.runtime.cache import IntegrityCache from ftllexengine.runtime.function_bridge import FluentNumber, FluentValue @@ -34,6 +35,7 @@ "MAX_DEPTH", "UTC", "Any", + "CacheKeySerializationError", "Decimal", "ErrorCategory", "FluentNumber", diff --git a/tests/runtime_cache_hashable_cases/section_10_property_accessors.py 
b/tests/runtime_cache_hashable_cases/section_10_property_accessors.py index 2764649d..0e8f9a6b 100644 --- a/tests/runtime_cache_hashable_cases/section_10_property_accessors.py +++ b/tests/runtime_cache_hashable_cases/section_10_property_accessors.py @@ -1,6 +1,9 @@ # mypy: ignore-errors """Split test cases from tests/test_runtime_cache_hashable.py.""" +from dataclasses import replace + +from ftllexengine.integrity import CacheCorruptionError, WriteConflictError from tests.runtime_cache_hashable_cases import * # noqa: F403 - shared split test support # ============================================================================ @@ -13,7 +16,7 @@ class TestIntegrityCacheProperties: def test_len_and_size_consistent(self) -> None: """len(cache) and cache.size return the same current entry count.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() assert len(cache) == 0 cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) assert len(cache) == 1 @@ -24,17 +27,17 @@ def test_len_and_size_consistent(self) -> None: def test_maxsize_property(self) -> None: """maxsize property returns the configured maximum size.""" - cache = IntegrityCache(strict=False, maxsize=500) + cache = IntegrityCache(maxsize=500) assert cache.maxsize == 500 - def test_max_entry_weight_property(self) -> None: - """max_entry_weight property returns the configured weight limit.""" - cache = IntegrityCache(strict=False, max_entry_weight=5000) - assert cache.max_entry_weight == 5000 + def test_max_entry_payload_bytes_property(self) -> None: + """max_entry_payload_bytes property returns the configured payload byte limit.""" + cache = IntegrityCache(max_entry_payload_bytes=5000) + assert cache.max_entry_payload_bytes == 5000 def test_hits_increments_on_cache_hit(self) -> None: """hits property increments each time get() finds an entry.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.put("msg", None, None, "en", use_isolating=True, 
formatted="result", errors=()) cache.get("msg", None, None, "en", use_isolating=True) assert cache.hits == 1 @@ -43,43 +46,44 @@ def test_hits_increments_on_cache_hit(self) -> None: def test_misses_increments_on_cache_miss(self) -> None: """misses increments only for true cache misses, not unhashable bypasses.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.get("msg1", None, None, "en", use_isolating=True) assert cache.misses == 1 cache.get("msg2", None, None, "en", use_isolating=True) assert cache.misses == 2 - def test_misses_not_incremented_for_unhashable_bypass(self) -> None: - """Unhashable args bypass the cache entirely; misses is not incremented. + def test_misses_not_incremented_for_unhashable_rejection(self) -> None: + """Invalid key input is rejected and does not count as a cache miss. - An unhashable bypass is not a cache miss: no key was constructed or - looked up. Only unhashable_skips reflects the event. Conflating them - would deflate hit_rate and mislead operators about cache efficiency. + A key-contract failure is not an ordinary miss: the cache refuses the + operation before any lookup slot is consulted. """ - cache = IntegrityCache(strict=False) + cache = IntegrityCache() class UnknownType: pass - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] assert cache.unhashable_skips == 1 assert cache.misses == 0 - def test_hit_rate_excludes_unhashable_bypasses(self) -> None: + def test_hit_rate_excludes_unhashable_rejections(self) -> None: """hit_rate is computed over hashable interactions only: hits / (hits + misses). - Unhashable bypasses do not count as misses, so they do not dilute the - rate. 
A cache with one hashable hit and one unhashable bypass reports + Key-contract rejections do not count as misses, so they do not dilute the + rate. A cache with one hashable hit and one rejected lookup reports hit_rate=100.0, not 50.0. """ - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) cache.get("msg", None, None, "en", use_isolating=True) # hit class UnknownType: pass - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] stats = cache.get_stats() assert stats["hits"] == 1 @@ -89,7 +93,7 @@ class UnknownType: def test_hit_rate_zero_on_all_true_misses(self) -> None: """hit_rate is 0.0 when all interactions are true misses (no unhashable).""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.get("absent", None, None, "en", use_isolating=True) stats = cache.get_stats() assert stats["hits"] == 0 @@ -97,8 +101,8 @@ def test_hit_rate_zero_on_all_true_misses(self) -> None: assert stats["hit_rate"] == 0.0 def test_hit_rate_correct_mixed_hits_and_misses(self) -> None: - """hit_rate is accurate across a mix of hits, misses, and unhashable bypasses.""" - cache = IntegrityCache(strict=False) + """hit_rate is accurate across hits, misses, and rejected key input.""" + cache = IntegrityCache() cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) cache.get("msg", None, None, "en", use_isolating=True) # hit cache.get("msg", None, None, "en", use_isolating=True) # hit @@ -107,7 +111,8 @@ def test_hit_rate_correct_mixed_hits_and_misses(self) -> None: class UnknownType: pass - cache.get("msg", {"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", 
{"x": UnknownType()}, None, "en", use_isolating=True) # type: ignore[dict-item] stats = cache.get_stats() assert stats["hits"] == 2 @@ -116,28 +121,118 @@ class UnknownType: # hit_rate = 2 / (2 + 1) * 100 = 66.67% assert stats["hit_rate"] == round(2 / 3 * 100, 2) - def test_unhashable_skips_increments_on_skip(self) -> None: - """unhashable_skips increments for both get() and put() skips.""" - cache = IntegrityCache(strict=False) + def test_unhashable_skips_increments_on_rejection(self) -> None: + """unhashable_skips increments for both get() and put() rejections.""" + cache = IntegrityCache() class UnknownType: pass get_args: dict[str, object] = {"data": UnknownType()} - cache.get("msg", get_args, None, "en", use_isolating=True) # type: ignore[arg-type] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", get_args, None, "en", use_isolating=True) # type: ignore[arg-type] assert cache.unhashable_skips == 1 put_args: dict[str, object] = {"data": UnknownType()} - cache.put("msg", put_args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] + with pytest.raises(CacheKeySerializationError): + cache.put( + "msg", + put_args, + None, + "en", + use_isolating=True, + formatted="result", + errors=(), + ) # type: ignore[arg-type] assert cache.unhashable_skips == 2 def test_oversize_skips_increments_on_oversize_entry(self) -> None: - """oversize_skips increments when formatted string exceeds max_entry_weight.""" - cache = IntegrityCache(strict=False, max_entry_weight=10) + """oversize_skips increments when formatted string exceeds max_entry_payload_bytes.""" + cache = IntegrityCache(max_entry_payload_bytes=10) cache.put("msg1", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) assert cache.oversize_skips == 1 cache.put("msg2", None, None, "en", use_isolating=True, formatted="y" * 50, errors=()) assert cache.oversize_skips == 2 + def test_corruption_and_integrity_event_properties(self) -> None: + 
"""Integrity counters reflect detected corruption events.""" + cache = IntegrityCache() + cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=()) + + key = next(iter(cache._cache)) + entry = cache._cache[key] + cache._cache[key] = replace(entry, formatted="corrupted") + + with pytest.raises(CacheCorruptionError): + cache.get("msg", None, None, "en", use_isolating=True) + + assert cache.corruption_detected == 1 + assert cache.integrity_events_emitted == 1 + + def test_write_once_and_idempotent_properties(self) -> None: + """Write-once counters distinguish benign and conflicting rewrites.""" + cache = IntegrityCache(write_once=True) + cache.put("msg", None, None, "en", use_isolating=True, formatted="value", errors=()) + cache.put("msg", None, None, "en", use_isolating=True, formatted="value", errors=()) + + with pytest.raises(WriteConflictError): + cache.put( + "msg", + None, + None, + "en", + use_isolating=True, + formatted="other", + errors=(), + ) + + assert cache.idempotent_writes == 1 + assert cache.write_once_conflicts == 1 + assert cache.write_once is True + + def test_error_bloat_and_combined_payload_properties(self) -> None: + """Skip counters expose count- and payload-based rejections separately.""" + cache = IntegrityCache(max_entry_payload_bytes=200, max_errors_per_entry=1) + count_errors = ( + FrozenFluentError("first", ErrorCategory.REFERENCE), + FrozenFluentError("second", ErrorCategory.REFERENCE), + ) + cache.put( + "count", + None, + None, + "en", + use_isolating=True, + formatted="value", + errors=count_errors, + ) + payload_error = FrozenFluentError("x" * 180, ErrorCategory.REFERENCE) + cache.put( + "payload", + None, + None, + "en", + use_isolating=True, + formatted="x" * 80, + errors=(payload_error,), + ) + + assert cache.error_bloat_skips == 1 + assert cache.combined_payload_skips == 1 + + def test_uncacheable_function_skip_property(self) -> None: + """The cache exposes intentional non-cacheable bypass counts.""" + 
cache = IntegrityCache(enable_debug_log=True) + cache.note_uncacheable_result( + "msg", + {"value": "x"}, + None, + "en", + use_isolating=True, + ) + + assert cache.uncacheable_function_skips == 1 + assert cache.get_debug_log()[0].operation == "BYPASS_NONCACHEABLE_FUNCTION" + @given( st.integers(min_value=1, max_value=1000), st.integers(min_value=1, max_value=10000), @@ -147,18 +242,17 @@ def test_oversize_skips_increments_on_oversize_entry(self) -> None: def test_property_constructor_parameters_stored_correctly( self, maxsize: int, - max_entry_weight: int, + max_entry_payload_bytes: int, max_errors_per_entry: int, ) -> None: """PROPERTY: Constructor parameters are stored and reflected by properties.""" cache = IntegrityCache( - strict=False, maxsize=maxsize, - max_entry_weight=max_entry_weight, + max_entry_payload_bytes=max_entry_payload_bytes, max_errors_per_entry=max_errors_per_entry, ) assert cache.maxsize == maxsize - assert cache.max_entry_weight == max_entry_weight + assert cache.max_entry_payload_bytes == max_entry_payload_bytes assert cache.size == 0 assert cache.hits == 0 assert cache.misses == 0 @@ -168,7 +262,7 @@ def test_property_constructor_parameters_stored_correctly( @settings(max_examples=50) def test_property_primitive_args_always_cacheable(self, text: str) -> None: """PROPERTY: All primitive FluentValue types produce valid, retrievable entries.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() args_list: list[dict[str, FluentValue]] = [ {"text": text}, diff --git a/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py b/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py index 603eb9c1..928e4fe9 100644 --- a/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py +++ b/tests/runtime_cache_hashable_cases/section_1_initialization_validation.py @@ -21,15 +21,15 @@ def test_maxsize_negative_rejected(self) -> None: with pytest.raises(ValueError, match="maxsize must be 
positive"): IntegrityCache(maxsize=-1) - def test_max_entry_weight_zero_rejected(self) -> None: - """IntegrityCache rejects max_entry_weight=0.""" - with pytest.raises(ValueError, match="max_entry_weight must be positive"): - IntegrityCache(max_entry_weight=0) - - def test_max_entry_weight_negative_rejected(self) -> None: - """IntegrityCache rejects negative max_entry_weight.""" - with pytest.raises(ValueError, match="max_entry_weight must be positive"): - IntegrityCache(max_entry_weight=-1) + def test_max_entry_payload_bytes_zero_rejected(self) -> None: + """IntegrityCache rejects max_entry_payload_bytes=0.""" + with pytest.raises(ValueError, match="max_entry_payload_bytes must be positive"): + IntegrityCache(max_entry_payload_bytes=0) + + def test_max_entry_payload_bytes_negative_rejected(self) -> None: + """IntegrityCache rejects negative max_entry_payload_bytes.""" + with pytest.raises(ValueError, match="max_entry_payload_bytes must be positive"): + IntegrityCache(max_entry_payload_bytes=-1) def test_max_errors_per_entry_zero_rejected(self) -> None: """IntegrityCache rejects max_errors_per_entry=0.""" diff --git a/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py b/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py index 342fb053..d9927dce 100644 --- a/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py +++ b/tests/runtime_cache_hashable_cases/section_4_make_key_integration.py @@ -12,15 +12,16 @@ class TestMakeKey: """Test _make_key integration with _make_hashable. _make_key builds a cache key tuple from (message_id, args, attribute, - locale_code, use_isolating). Returns None on any hashing failure, - allowing cache bypass without raising to the caller. + locale_code, use_isolating, function_generation). Returns None on hashing + failure so the cache layer can raise one typed boundary error instead of + performing a partial lookup. 
""" def test_make_key_with_none_args(self) -> None: """_make_key with None args returns key with empty tuple for args component.""" key = IntegrityCache._make_key("msg-id", None, None, "en-US", use_isolating=True) assert key is not None - assert key == ("msg-id", (), None, "en-US", True) + assert key == ("msg-id", (), None, "en-US", True, 0) def test_make_key_with_simple_args(self) -> None: """_make_key handles simple string/int arguments.""" diff --git a/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py b/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py index 96db8094..d787d876 100644 --- a/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py +++ b/tests/runtime_cache_hashable_cases/section_5_na_n_normalization.py @@ -13,7 +13,7 @@ class TestNaNDecimalNormalization: def test_decimal_nan_cache_key_consistency(self) -> None: """Decimal NaN produces consistent cache key across independent instances.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="Decimal Result", errors=()) entry = cache.get("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True) assert entry is not None @@ -21,7 +21,7 @@ def test_decimal_nan_cache_key_consistency(self) -> None: def test_decimal_nan_does_not_pollute_cache(self) -> None: """Multiple puts with Decimal NaN update the same entry.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) for i in range(10): cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) stats = cache.get_stats() @@ -32,7 +32,7 @@ def test_decimal_nan_does_not_pollute_cache(self) -> None: def test_decimal_snan_normalized_same_as_qnan(self) -> None: """Signaling NaN and quiet NaN both normalize to the same canonical key.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.put("msg", {"val": 
Decimal("NaN")}, None, "en", use_isolating=True, formatted="QNaN", errors=()) # sNaN should resolve to same cache key as qNaN entry = cache.get("msg", {"val": Decimal("sNaN")}, None, "en", use_isolating=True) @@ -40,7 +40,7 @@ def test_decimal_snan_normalized_same_as_qnan(self) -> None: def test_decimal_nan_different_from_regular_decimal(self) -> None: """Decimal NaN has different cache key from regular Decimal values.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() cache.put("msg", {"val": Decimal("NaN")}, None, "en", use_isolating=True, formatted="NaN Result", errors=()) cache.put("msg", {"val": Decimal("1.0")}, None, "en", use_isolating=True, formatted="Regular Result", errors=()) @@ -59,7 +59,7 @@ class TestNaNInNestedStructures: def test_nan_in_list_normalized(self) -> None: """NaN values within lists are normalized for cache key consistency.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() items = [Decimal(1), Decimal("NaN"), Decimal(3)] cache.put("msg", {"items": items}, None, "en", use_isolating=True, formatted="List Result", errors=()) entry = cache.get("msg", {"items": items}, None, "en", use_isolating=True) @@ -68,7 +68,7 @@ def test_nan_in_list_normalized(self) -> None: def test_nan_in_dict_normalized(self) -> None: """NaN values within dicts are normalized for cache key consistency.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() args: dict[str, FluentValue] = {"data": {"a": Decimal(1), "b": Decimal("NaN")}} cache.put("msg", args, None, "en", use_isolating=True, formatted="Dict Result", errors=()) data = {"a": Decimal(1), "b": Decimal("NaN")} @@ -78,7 +78,7 @@ def test_nan_in_dict_normalized(self) -> None: def test_deeply_nested_nan_normalized(self) -> None: """NaN values in deeply nested structures are normalized consistently.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() deep_args: dict[str, FluentValue] = { "outer": { "inner": [ @@ -111,7 +111,7 @@ def 
test_nan_cache_pollution_prevented(self) -> None: create 100 unique, unretrievable entries, evicting all legitimate entries. With normalization all NaN entries collapse to a single key. """ - cache = IntegrityCache(strict=False, maxsize=10) + cache = IntegrityCache(maxsize=10) for i in range(5): cache.put(f"legit{i}", None, None, "en", use_isolating=True, formatted=f"Legit {i}", errors=()) for i in range(100): @@ -133,7 +133,7 @@ def test_all_decimal_special_values_produce_retrievable_keys( self, value: Decimal ) -> None: """PROPERTY: For any Decimal value, put followed by get returns the entry.""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() args = {"val": value} cache.put("msg", args, None, "en", use_isolating=True, formatted=f"Value: {value}", errors=()) entry = cache.get("msg", args, None, "en", use_isolating=True) diff --git a/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py b/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py index 5a2e3fa7..475d8ebb 100644 --- a/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py +++ b/tests/runtime_cache_hashable_cases/section_6_hashable_conversion_cache_roundtrip_tests.py @@ -18,7 +18,7 @@ class TestCacheHashableConversion: # pylint: disable=too-many-public-methods def test_get_with_list_value_now_cacheable(self) -> None: """get() with list args succeeds: lists are converted to type-tagged tuples.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"key": [1, 2, 3]} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) @@ -29,7 +29,7 @@ def test_get_with_list_value_now_cacheable(self) -> None: def test_get_with_dict_value_now_cacheable(self) -> None: """get() with nested dict args succeeds: dicts are converted to sorted 
tuples.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"key": {"nested": "value"}} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) @@ -40,7 +40,7 @@ def test_get_with_dict_value_now_cacheable(self) -> None: def test_get_with_set_value_now_cacheable(self) -> None: """get() with set args succeeds: sets are converted to type-tagged frozensets.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, object] = {"key": {1, 2, 3}} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] @@ -51,14 +51,14 @@ def test_get_with_set_value_now_cacheable(self) -> None: def test_put_with_list_value_now_caches(self) -> None: """put() with list args stores entry: lists are converted at key build time.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) cache.put("msg-id", {"items": [1, 2, 3]}, None, "en-US", use_isolating=True, formatted="formatted", errors=()) assert len(cache) == 1 assert cache.unhashable_skips == 0 def test_put_with_dict_value_now_caches(self) -> None: """put() with nested dict args stores entry: dicts are converted at key build.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) cache.put("msg-id", {"config": {"option": "value"}}, None, "en-US", use_isolating=True, formatted="fmt", errors=()) assert len(cache) == 1 assert cache.unhashable_skips == 0 @@ -81,7 +81,7 @@ def test_make_key_converts_nested_structures_to_valid_key(self) -> None: def test_get_with_tuple_value_cacheable(self) -> None: """get() caches tuple-valued args correctly via type-tagged conversion.""" - cache = IntegrityCache(strict=False, 
maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"coords": (10, 20, 30)} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) @@ -92,7 +92,7 @@ def test_get_with_tuple_value_cacheable(self) -> None: def test_get_with_tuple_containing_list_cacheable(self) -> None: """get() caches tuple-with-nested-list args: nested list is converted.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, object] = {"data": (1, [2, 3], 4)} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] @@ -106,7 +106,7 @@ def test_get_with_various_tuples_cacheable( self, tuple_value: tuple[int, int, int] ) -> None: """PROPERTY: Tuple-valued args cache and retrieve correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"tuple_arg": tuple_value} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) @@ -118,7 +118,7 @@ def test_get_with_various_tuples_cacheable( @given(st.lists(st.integers(), min_size=1, max_size=10)) def test_get_with_various_lists_cacheable(self, list_value: list[int]) -> None: """PROPERTY: List-valued args cache and retrieve correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"list_arg": list_value} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) @@ -134,7 +134,7 @@ def test_get_with_various_lists_cacheable(self, list_value: list[int]) -> None: ) def test_put_with_various_dicts_cacheable(self, dict_value: 
dict[str, int]) -> None: """PROPERTY: Dict-valued args cache correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args = {"dict_arg": dict_value} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) assert len(cache) == 1 @@ -143,7 +143,7 @@ def test_put_with_various_dicts_cacheable(self, dict_value: dict[str, int]) -> N def test_mixed_hashable_and_convertible_args(self) -> None: """Cache handles mixed hashable/convertible args in the same call.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, object] = { "str_arg": "value", "int_arg": 42, @@ -157,7 +157,7 @@ def test_mixed_hashable_and_convertible_args(self) -> None: def test_empty_list_cacheable(self) -> None: """Empty lists are converted and cached correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, list[object]] = {"empty_list": []} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] @@ -167,7 +167,7 @@ def test_empty_list_cacheable(self) -> None: def test_empty_dict_cacheable(self) -> None: """Empty dicts are converted and cached correctly.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, dict[object, object]] = {"empty_dict": {}} cache.put("msg-id", args, None, "en-US", use_isolating=True, formatted="formatted", errors=()) # type: ignore[arg-type] cached = cache.get("msg-id", args, None, "en-US", use_isolating=True) # type: ignore[arg-type] diff --git a/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py b/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py index 9576a26d..59ed9620 100644 --- 
a/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py +++ b/tests/runtime_cache_hashable_cases/section_7_unhashable_argument_handling.py @@ -9,38 +9,38 @@ class TestUnhashableHandling: - """Test graceful bypass for arguments that cannot be hashed. + """Test fail-closed handling for arguments that cannot be keyed. - Covers three bypass mechanisms: + Covers three rejection mechanisms: 1. Unknown type in _make_hashable (case _ branch) 2. Python's hash() raising TypeError 3. RecursionError from circular references - In all cases: entry is not cached, unhashable_skips increments. + In all cases: the cache rejects the operation and records one + ``unhashable_skips`` integrity counter increment. """ - def test_get_with_unknown_type_skips_cache(self) -> None: - """get() with unknown type arg bypasses cache and increments unhashable_skips. + def test_get_with_unknown_type_raises_key_contract_error(self) -> None: + """get() rejects unknown-type args and increments unhashable_skips. - UnknownType is not recognized by _make_hashable's match/case dispatch, - triggering TypeError("Unknown type in cache key") → _make_key returns None. - An unhashable bypass is not a cache miss: no key was looked up, so misses - is not incremented. Only unhashable_skips reflects the event. + UnknownType is not recognized by the canonical key encoder, so the cache + raises ``CacheKeySerializationError`` instead of pretending the lookup + was an ordinary miss. 
""" - cache = IntegrityCache(strict=False) + cache = IntegrityCache() class UnknownType: pass args: dict[str, object] = {"data": UnknownType()} - result = cache.get("msg", args, None, "en", use_isolating=True) # type: ignore[arg-type] - assert result is None + with pytest.raises(CacheKeySerializationError): + cache.get("msg", args, None, "en", use_isolating=True) # type: ignore[arg-type] assert cache.unhashable_skips == 1 assert cache.misses == 0 assert cache.hits == 0 - def test_put_with_unhashable_hash_raises_skips_cache(self) -> None: - """put() with arg whose __hash__ raises TypeError skips caching.""" - cache = IntegrityCache(strict=False) + def test_put_with_unhashable_hash_raises_key_contract_error(self) -> None: + """put() rejects arg values that cannot be encoded into the cache key.""" + cache = IntegrityCache() class CustomObject: def __hash__(self) -> int: # pylint: disable=invalid-hash-returned @@ -48,13 +48,22 @@ def __hash__(self) -> int: # pylint: disable=invalid-hash-returned raise TypeError(msg) args: dict[str, object] = {"obj": CustomObject()} - cache.put("msg", args, None, "en", use_isolating=True, formatted="result", errors=()) # type: ignore[arg-type] + with pytest.raises(CacheKeySerializationError): + cache.put( + "msg", + args, + None, + "en", + use_isolating=True, + formatted="result", + errors=(), + ) # type: ignore[arg-type] assert cache.size == 0 assert cache.unhashable_skips == 1 - def test_unhashable_custom_object_in_get_skipped(self) -> None: - """Custom unhashable objects in get() args bypass caching gracefully.""" - cache = IntegrityCache(strict=False, maxsize=100) + def test_unhashable_custom_object_in_get_raises(self) -> None: + """Custom unhashable objects in get() args are rejected explicitly.""" + cache = IntegrityCache(maxsize=100) class UnhashableClass: def __init__(self) -> None: @@ -65,13 +74,13 @@ def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned raise TypeError(msg) custom_args: dict[str, object] = 
{"custom": UnhashableClass()} - result = cache.get("msg-id", custom_args, None, "en-US", use_isolating=True) # type: ignore[arg-type] - assert result is None + with pytest.raises(CacheKeySerializationError): + cache.get("msg-id", custom_args, None, "en-US", use_isolating=True) # type: ignore[arg-type] assert cache.unhashable_skips == 1 def test_unhashable_skips_not_incremented_for_convertible_types(self) -> None: """unhashable_skips only counts truly unhashable objects; lists/dicts do not.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) assert cache.unhashable_skips == 0 cache.get("msg1", {"list": [1]}, None, "en-US", use_isolating=True) @@ -82,14 +91,15 @@ def test_unhashable_skips_not_incremented_for_convertible_types(self) -> None: def test_unhashable_skips_preserved_on_clear(self) -> None: """clear() does not reset unhashable_skips; counter is cumulative.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) class UnhashableClass: def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned msg = "unhashable type" raise TypeError(msg) - cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] assert cache.unhashable_skips == 1 # clear() removes entries but preserves cumulative observability metrics. cache.clear() @@ -101,14 +111,15 @@ def test_get_stats_includes_unhashable_skips(self) -> None: Unhashable args bypass the cache entirely; no key lookup occurs. misses counts only true cache misses (key looked up, not found). 
""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) class UnhashableClass: def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned msg = "unhashable type" raise TypeError(msg) - cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] + with pytest.raises(CacheKeySerializationError): + cache.get("msg", {"obj": UnhashableClass()}, None, "en-US", use_isolating=True) # type: ignore[dict-item] stats = cache.get_stats() assert "unhashable_skips" in stats assert stats["unhashable_skips"] == 1 @@ -116,65 +127,68 @@ def __hash__(self) -> NoReturn: # pylint: disable=invalid-hash-returned def test_hashable_args_do_not_increment_unhashable_skips(self) -> None: """Fully hashable primitive args never increment unhashable_skips.""" - cache = IntegrityCache(strict=False, maxsize=100) + cache = IntegrityCache(maxsize=100) args: dict[str, FluentValue] = {"str": "value", "int": 42, "decimal": Decimal("3.14")} cache.get("msg1", args, None, "en-US", use_isolating=True) cache.put("msg2", args, None, "en-US", use_isolating=True, formatted="result", errors=()) assert cache.unhashable_skips == 0 - def test_put_with_circular_reference_increments_skip_counter(self) -> None: - """Circular reference in args increments unhashable_skips and skips storage.""" - cache = IntegrityCache(strict=False, maxsize=100) + def test_put_with_circular_reference_raises(self) -> None: + """Circular reference in args raises and increments unhashable_skips.""" + cache = IntegrityCache(maxsize=100) circular: dict[str, object] = {} circular["self"] = circular # Circular reference assert cache.unhashable_skips == 0 - cache.put( - message_id="test", - args=circular, # type: ignore[arg-type] - attribute=None, - locale_code="en", - use_isolating=True, - formatted="output", - errors=(), - ) + with pytest.raises(CacheKeySerializationError): + cache.put( + message_id="test", + args=circular, # type: ignore[arg-type] 
+ attribute=None, + locale_code="en", + use_isolating=True, + formatted="output", + errors=(), + ) assert cache.unhashable_skips == 1 assert len(cache) == 0 - def test_put_with_nested_circular_reference_increments_skip(self) -> None: - """Nested circular reference also triggers unhashable_skips increment.""" - cache = IntegrityCache(strict=False, maxsize=50) + def test_put_with_nested_circular_reference_raises(self) -> None: + """Nested circular reference also triggers explicit rejection.""" + cache = IntegrityCache(maxsize=50) nested: dict[str, object] = {"level1": {}} nested["level1"]["back"] = nested # type: ignore[index] initial_skips = cache.unhashable_skips - cache.put( - message_id="nested_test", - args=nested, # type: ignore[arg-type] - attribute=None, - locale_code="lv", - use_isolating=True, - formatted="result", - errors=(), - ) + with pytest.raises(CacheKeySerializationError): + cache.put( + message_id="nested_test", + args=nested, # type: ignore[arg-type] + attribute=None, + locale_code="lv", + use_isolating=True, + formatted="result", + errors=(), + ) assert cache.unhashable_skips == initial_skips + 1 assert len(cache) == 0 - def test_put_with_custom_unhashable_in_args_dict(self) -> None: - """Custom unhashable object as a dict value triggers skip.""" - cache = IntegrityCache(strict=False, maxsize=100) + def test_put_with_custom_unhashable_in_args_dict_raises(self) -> None: + """Custom unhashable object as a dict value raises fail-closed.""" + cache = IntegrityCache(maxsize=100) class UnhashableObject: __hash__ = None # type: ignore[assignment] unhashable_args = {"obj": UnhashableObject()} initial_skips = cache.unhashable_skips - cache.put( - message_id="custom_obj", - args=unhashable_args, # type: ignore[arg-type] - attribute="attr", - locale_code="en_US", - use_isolating=True, - formatted="value", - errors=(), - ) + with pytest.raises(CacheKeySerializationError): + cache.put( + message_id="custom_obj", + args=unhashable_args, # type: 
ignore[arg-type] + attribute="attr", + locale_code="en_US", + use_isolating=True, + formatted="value", + errors=(), + ) assert cache.unhashable_skips == initial_skips + 1 assert len(cache) == 0 diff --git a/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py b/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py index 6d8051e0..a7fa8a76 100644 --- a/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py +++ b/tests/runtime_cache_hashable_cases/section_8_error_bloat_protection.py @@ -12,12 +12,12 @@ class TestIntegrityCacheErrorBloatProtection: """Test IntegrityCache error collection memory bounding. Prevents unbounded memory use when a single message generates many errors. - Two limits: max_errors_per_entry (count) and max_entry_weight (bytes). + Two limits: max_errors_per_entry (count) and max_entry_payload_bytes (bytes). """ def test_put_rejects_excessive_error_count(self) -> None: """put() skips caching when error count exceeds max_errors_per_entry.""" - cache = IntegrityCache(strict=False, max_errors_per_entry=10) + cache = IntegrityCache(max_errors_per_entry=10) errors = tuple( FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(15) ) @@ -26,26 +26,22 @@ def test_put_rejects_excessive_error_count(self) -> None: assert cache.get_stats()["error_bloat_skips"] == 1 assert cache.get("msg", None, None, "en", use_isolating=True) is None - def test_put_rejects_excessive_error_weight(self) -> None: - """put() skips caching when total weight exceeds max_entry_weight. - - Dynamic weight: base (100) + string len + per-error weights. - 10 errors with 100-char messages + 100-char formatted string exceeds 2000. 
- """ - cache = IntegrityCache(strict=False, max_entry_weight=2000, max_errors_per_entry=50) + def test_put_rejects_excessive_error_payload(self) -> None: + """put() skips caching when retained payload bytes exceed the budget.""" + cache = IntegrityCache(max_entry_payload_bytes=1000, max_errors_per_entry=50) errors = tuple( FrozenFluentError("E" * 100, ErrorCategory.REFERENCE) for _ in range(10) ) cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=errors) assert cache.size == 0 - # 10 errors pass the count check (10 <= 50), but combined weight - # (100 formatted + 10 * 200 per error = 2100) exceeds max_entry_weight=2000. - assert cache.get_stats()["combined_weight_skips"] == 1 + # 10 errors pass the count check (10 <= 50), but the retained payload + # (100 formatted bytes + 10 * 108 error bytes) exceeds the 1000-byte budget. + assert cache.get_stats()["combined_payload_skips"] == 1 assert cache.get_stats()["error_bloat_skips"] == 0 def test_put_accepts_reasonable_error_collections(self) -> None: """put() caches results with error counts and weights within limits.""" - cache = IntegrityCache(strict=False, max_entry_weight=15000, max_errors_per_entry=50) + cache = IntegrityCache(max_entry_payload_bytes=15000, max_errors_per_entry=50) errors = tuple( FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE) for i in range(10) ) diff --git a/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py b/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py index dfc30ec7..e3a87d9b 100644 --- a/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py +++ b/tests/runtime_cache_hashable_cases/section_9_lru_eviction_behavior.py @@ -13,7 +13,7 @@ class TestIntegrityCacheLRUBehavior: def test_put_moves_existing_key_to_end_of_lru(self) -> None: """put() on existing key marks it as recently used (moves to LRU tail).""" - cache = IntegrityCache(strict=False, maxsize=3) + cache = IntegrityCache(maxsize=3) 
cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) cache.put("msg3", None, None, "en", use_isolating=True, formatted="result3", errors=()) @@ -35,7 +35,7 @@ def test_put_moves_existing_key_to_end_of_lru(self) -> None: def test_put_evicts_lru_entry_when_cache_full(self) -> None: """put() evicts the least recently used entry when capacity is reached.""" - cache = IntegrityCache(strict=False, maxsize=2) + cache = IntegrityCache(maxsize=2) cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=()) cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=()) assert cache.size == 2 diff --git a/tests/runtime_cache_integrity_cases/checksums.py b/tests/runtime_cache_integrity_cases/checksums.py index f711602b..0f795773 100644 --- a/tests/runtime_cache_integrity_cases/checksums.py +++ b/tests/runtime_cache_integrity_cases/checksums.py @@ -1,274 +1,94 @@ # mypy: ignore-errors from __future__ import annotations -import contextlib +from dataclasses import replace +from unittest.mock import patch import pytest -from hypothesis import event, given, settings -from hypothesis import strategies as st -from ftllexengine.diagnostics import ( - ErrorCategory, - FrozenFluentError, -) -from ftllexengine.integrity import CacheCorruptionError -from ftllexengine.runtime.cache import ( - IntegrityCache, - IntegrityCacheEntry, -) +from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError +from ftllexengine.integrity import CacheCorruptionError, IntegrityCheckFailedError +from ftllexengine.runtime.cache import IntegrityCache, IntegrityCacheEntry -# Sentinel key_hash for unit tests that verify checksum mechanics but do not -# need meaningful key binding (all-zeros = "unbound test entry"). 
-_NO_KEY_HASH: bytes = b"\x00" * 8 +_FG = 0 +_NO_KEY_HASH: bytes = b"\x00" * 16 -# ============================================================================ -# CHECKSUM VERIFICATION TESTS -# ============================================================================ +def _put(cache: IntegrityCache, message_id: str, formatted: str) -> None: + cache.put( + message_id, + None, + None, + "en", + use_isolating=True, + function_generation=_FG, + formatted=formatted, + errors=(), + ) -class TestChecksumComputation: - """Test BLAKE2b-128 checksum computation.""" +def _get(cache: IntegrityCache, message_id: str) -> IntegrityCacheEntry | None: + return cache.get( + message_id, + None, + None, + "en", + use_isolating=True, + function_generation=_FG, + ) - def test_checksum_computed_on_create(self) -> None: - """IntegrityCacheEntry.create() computes checksum.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.checksum is not None - assert len(entry.checksum) == 16 # BLAKE2b-128 = 16 bytes - def test_different_metadata_different_checksum(self) -> None: - """Different metadata (sequence, timestamp) produces different checksums. +class TestEntryChecksumContract: + """Entry digests detect accidental mutation and key confusion.""" - Checksums now include created_at and sequence for complete audit trail integrity. - Identical content with different metadata produces different checksums. 
- """ - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) - # Checksums differ because sequence is different (and created_at likely differs) - assert entry1.checksum != entry2.checksum + def test_entry_create_self_verifies(self) -> None: + error = FrozenFluentError("problem", ErrorCategory.REFERENCE) + entry = IntegrityCacheEntry.create("value", (error,), sequence=1, key_hash=_NO_KEY_HASH) - def test_different_content_different_checksum(self) -> None: - """Different content produces different checksums.""" - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry1.checksum != entry2.checksum - - def test_errors_affect_checksum(self) -> None: - """Errors are included in checksum computation.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry_with_errors = IntegrityCacheEntry.create( - "Hello", (error,), sequence=1, key_hash=_NO_KEY_HASH - ) - assert entry_no_errors.checksum != entry_with_errors.checksum - - def test_verify_returns_true_for_valid_entry(self) -> None: - """verify() returns True for uncorrupted entry.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - assert entry.verify() is True - - def test_entry_as_result_preserves_content(self) -> None: - """as_result() returns correct (formatted, errors) pair.""" - errors = (FrozenFluentError("Test", ErrorCategory.REFERENCE),) - entry = IntegrityCacheEntry.create("Hello", errors, sequence=1, key_hash=_NO_KEY_HASH) - assert entry.as_result() == ("Hello", errors) - - @given(st.text(min_size=0, max_size=1000)) - @settings(max_examples=50) - def test_checksum_validates_correctly(self, text: str) -> None: - 
"""PROPERTY: Checksum validation is deterministic for same entry. - - Checksums now include metadata (created_at, sequence) for complete audit - trail integrity. Different entries with same content will have different - checksums due to different timestamps. We verify that each entry's - checksum validates correctly. - """ - entry = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) - # Each entry should validate its own checksum correctly + assert len(entry.checksum) == 16 assert entry.verify() is True - event(f"text_len={len(text)}") -class TestCorruptionDetectionStrictMode: - """Test corruption detection in strict mode (fail-fast).""" + def test_same_content_with_different_metadata_has_different_checksum(self) -> None: + first = IntegrityCacheEntry.create("value", (), sequence=1, key_hash=_NO_KEY_HASH) + second = IntegrityCacheEntry.create("value", (), sequence=2, key_hash=_NO_KEY_HASH) - def test_strict_mode_raises_on_corruption(self) -> None: - """strict=True raises CacheCorruptionError on checksum mismatch.""" - cache = IntegrityCache(strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + assert first.checksum != second.checksum - # Simulate corruption by directly modifying internal state - key = next(iter(cache._cache.keys())) - original_entry = cache._cache[key] + def test_immediate_verification_failure_raises_on_put(self) -> None: + cache = IntegrityCache() + with patch.object(IntegrityCacheEntry, "verify", return_value=False), pytest.raises( + IntegrityCheckFailedError, + match="immediate verification", + ): + _put(cache, "msg", "Hello") - # Create corrupted entry with wrong checksum - corrupted = IntegrityCacheEntry( - formatted="Corrupted!", - errors=original_entry.errors, - checksum=original_entry.checksum, # Wrong checksum for new content - created_at=original_entry.created_at, - sequence=original_entry.sequence, - key_hash=original_entry.key_hash, - ) - cache._cache[key] = corrupted 
- with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg", None, None, "en", use_isolating=True) +class TestCacheCorruptionDetection: + """Cache lookups must raise on detected integrity failures.""" - assert "corruption detected" in str(exc_info.value).lower() - assert exc_info.value.context is not None - assert exc_info.value.context.component == "cache" + def test_corrupted_entry_raises_cache_corruption_error(self) -> None: + cache = IntegrityCache() + _put(cache, "msg", "Hello") - def test_strict_mode_corruption_counter_incremented(self) -> None: - """Corruption detection increments corruption_detected counter.""" - cache = IntegrityCache(strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Corrupt entry - key = next(iter(cache._cache.keys())) + key = next(iter(cache._cache)) entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - with contextlib.suppress(CacheCorruptionError): - cache.get("msg", None, None, "en", use_isolating=True) - - stats = cache.get_stats() - assert stats["corruption_detected"] == 1 + cache._cache[key] = replace(entry, formatted="Corrupted") -class TestCorruptionDetectionNonStrictMode: - """Test corruption detection in non-strict mode (silent eviction).""" + with pytest.raises(CacheCorruptionError, match="corruption detected"): + _get(cache, "msg") - def test_non_strict_evicts_corrupted_entry(self) -> None: - """strict=False silently evicts corrupted entry.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + def test_key_confusion_raises_cache_corruption_error(self) -> None: + cache = IntegrityCache() + _put(cache, "msg", "Hello") - # Verify entry exists - assert cache.get("msg", None, None, 
"en", use_isolating=True) is not None - - # Corrupt entry - key = next(iter(cache._cache.keys())) + key = next(iter(cache._cache)) entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, + cache._cache[key] = IntegrityCacheEntry.create( + entry.formatted, + entry.errors, sequence=entry.sequence, - key_hash=entry.key_hash, + key_hash=b"\x01" * 16, ) - cache._cache[key] = corrupted - - # Get returns None (not an exception) - result = cache.get("msg", None, None, "en", use_isolating=True) - assert result is None - - # Entry was evicted - stats = cache.get_stats() - assert stats["size"] == 0 - assert stats["corruption_detected"] == 1 - - def test_non_strict_records_miss_on_corruption(self) -> None: - """Corrupted entry results in cache miss.""" - cache = IntegrityCache(strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # First get is a hit - cache.get("msg", None, None, "en", use_isolating=True) - stats = cache.get_stats() - assert stats["hits"] == 1 - assert stats["misses"] == 0 - - # Corrupt entry - key = next(iter(cache._cache.keys())) - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Second get is a miss (corruption detected, entry evicted) - cache.get("msg", None, None, "en", use_isolating=True) - stats = cache.get_stats() - assert stats["misses"] == 1 # Corruption triggers miss - -class TestKeyBindingConfusion: - """Cover the key-binding confusion check (lines 652-670). - - The key-binding check fires when an entry's stored key_hash doesn't match - the hash of the lookup key. 
This is distinct from a checksum mismatch: - the entry is internally consistent (verify() passes) but is stored under - the wrong key slot — a sign of active tampering or memory corruption. - - Strategy: put an entry under key B, inject it into the slot for key A, - then call get(key A). verify() passes (entry_b is internally valid) but - the key_hash bound to key B != _compute_key_hash(key A). - """ - - @staticmethod - def _inject_key_confused_entry(cache: IntegrityCache) -> None: - """Put msg-b, then move its entry into the msg-a slot.""" - cache.put("msg-b", None, None, "en", use_isolating=True, formatted="Hello B", errors=()) - key_b: tuple = ("msg-b", (), None, "en", True) - key_a: tuple = ("msg-a", (), None, "en", True) - # Inject entry_b under key_a — checksum is valid but key_hash is wrong - cache._cache[key_a] = cache._cache[key_b] - - def test_key_confusion_strict_raises(self) -> None: - """strict=True raises CacheCorruptionError on key-binding mismatch.""" - cache = IntegrityCache(strict=True) - self._inject_key_confused_entry(cache) - - with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg-a", None, None, "en", use_isolating=True) - - assert "key confusion" in str(exc_info.value).lower() - assert exc_info.value.context is not None - assert exc_info.value.context.component == "cache" - assert exc_info.value.context.operation == "get" - - def test_key_confusion_strict_increments_counter(self) -> None: - """Key-binding confusion increments corruption_detected counter.""" - cache = IntegrityCache(strict=True) - self._inject_key_confused_entry(cache) - - with contextlib.suppress(CacheCorruptionError): - cache.get("msg-a", None, None, "en", use_isolating=True) - - assert cache.get_stats()["corruption_detected"] == 1 - - def test_key_confusion_non_strict_returns_none(self) -> None: - """strict=False evicts the confused entry and returns None.""" - cache = IntegrityCache(strict=False) - self._inject_key_confused_entry(cache) - - result = 
cache.get("msg-a", None, None, "en", use_isolating=True) - - assert result is None - stats = cache.get_stats() - assert stats["corruption_detected"] == 1 - assert stats["misses"] == 1 - - def test_key_confusion_non_strict_evicts_entry(self) -> None: - """Non-strict key confusion removes the confused entry from the cache.""" - cache = IntegrityCache(strict=False) - self._inject_key_confused_entry(cache) - - key_a: tuple = ("msg-a", (), None, "en", True) - assert key_a in cache._cache # Injected entry is present - - cache.get("msg-a", None, None, "en", use_isolating=True) - assert key_a not in cache._cache + with pytest.raises(CacheCorruptionError, match="key confusion detected"): + _get(cache, "msg") diff --git a/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py b/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py index 2e45bc6b..8837d6b6 100644 --- a/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py +++ b/tests/runtime_cache_integrity_cases/idempotence_and_hashes.py @@ -1,383 +1,99 @@ # mypy: ignore-errors -# mypy: ignore-errors from __future__ import annotations -import threading -from datetime import UTC from decimal import Decimal import pytest -from hypothesis import event, given, settings -from hypothesis import strategies as st -from ftllexengine.diagnostics import ( - ErrorCategory, - FrozenFluentError, -) +from ftllexengine.core.value_types import FluentNumber from ftllexengine.integrity import WriteConflictError -from ftllexengine.runtime.cache import ( - IntegrityCache, - IntegrityCacheEntry, -) - -# Sentinel key_hash for unit tests that verify checksum mechanics but do not -# need meaningful key binding (all-zeros = "unbound test entry"). 
-_NO_KEY_HASH: bytes = b"\x00" * 8 - -# ============================================================================ -# CHECKSUM VERIFICATION TESTS -# ============================================================================ - - - -class TestContentHash: - """Test content-only hash computation for idempotent write detection.""" - - def test_content_hash_computed(self) -> None: - """IntegrityCacheEntry has content_hash property.""" - entry = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - content_hash = entry.content_hash - assert content_hash is not None - assert len(content_hash) == 16 # BLAKE2b-128 - - def test_identical_content_same_hash(self) -> None: - """Entries with identical content have identical content hashes. - - This is critical for idempotent write detection: concurrent threads - computing the same formatted result should produce matching content hashes. - """ - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("Hello", (), sequence=2, key_hash=_NO_KEY_HASH) - - # Full checksums differ (include metadata) - assert entry1.checksum != entry2.checksum - - # Content hashes are identical - assert entry1.content_hash == entry2.content_hash - - def test_different_content_different_hash(self) -> None: - """Entries with different content have different content hashes.""" - entry1 = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create("World", (), sequence=1, key_hash=_NO_KEY_HASH) - - assert entry1.content_hash != entry2.content_hash - - def test_errors_affect_content_hash(self) -> None: - """Errors are included in content hash computation.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - entry_no_errors = IntegrityCacheEntry.create("Hello", (), sequence=1, key_hash=_NO_KEY_HASH) - entry_with_errors = IntegrityCacheEntry.create( - "Hello", (error,), sequence=1, 
key_hash=_NO_KEY_HASH - ) - - assert entry_no_errors.content_hash != entry_with_errors.content_hash - - @given(st.text(min_size=0, max_size=500)) - @settings(max_examples=30) - def test_content_hash_deterministic(self, text: str) -> None: - """PROPERTY: Content hash is deterministic for same content.""" - entry1 = IntegrityCacheEntry.create(text, (), sequence=1, key_hash=_NO_KEY_HASH) - entry2 = IntegrityCacheEntry.create(text, (), sequence=999, key_hash=_NO_KEY_HASH) - - assert entry1.content_hash == entry2.content_hash - event(f"text_len={len(text)}") - -class TestIdempotentWrites: - """Test idempotent write detection for thundering herd scenarios. - - In write_once mode, concurrent writes with identical content (formatted + errors) - are treated as idempotent operations, not conflicts. This prevents false-positive - WriteConflictError during thundering herds where multiple threads resolve the - same message simultaneously. - """ - - def test_idempotent_write_succeeds_in_strict_mode(self) -> None: - """Identical content is allowed in write_once + strict mode. - - Thundering herd scenario: Multiple threads resolve same message, - all compute identical results. Second thread should succeed silently. 
- """ - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Second put with IDENTICAL content should succeed (idempotent) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Verify entry unchanged - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" - assert entry.sequence == 1 # Original sequence preserved - - def test_different_content_raises_conflict(self) -> None: - """Different content raises WriteConflictError in strict mode.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - with pytest.raises(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - def test_idempotent_write_counter_incremented(self) -> None: - """Idempotent writes increment the idempotent_writes counter.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Perform idempotent writes - for _ in range(5): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) +from ftllexengine.runtime.cache import IntegrityCache + +_FG = 0 + + +def _put( + cache: IntegrityCache, + message_id: str, + formatted: str, + *, + args: dict[str, object] | None = None, +) -> None: + cache.put( + message_id, + args, + None, + "en", + use_isolating=True, + function_generation=_FG, + formatted=formatted, + errors=(), + ) + + +def _get(cache: IntegrityCache, message_id: str, *, args: dict[str, object] | None = None): + return cache.get( + message_id, + args, + None, + "en", + use_isolating=True, + function_generation=_FG, + ) + + +class TestWriteOnceIdempotence: + """Write-once mode should distinguish benign duplicate writes from 
conflicts.""" + + def test_idempotent_duplicate_write_is_accepted(self) -> None: + cache = IntegrityCache(write_once=True) + + _put(cache, "msg", "Hello") + _put(cache, "msg", "Hello") stats = cache.get_stats() - assert stats["idempotent_writes"] == 5 - - def test_idempotent_writes_property(self) -> None: - """idempotent_writes property returns correct count.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - assert cache.idempotent_writes == 0 + assert stats["idempotent_writes"] == 1 + assert stats["write_once_conflicts"] == 0 + assert _get(cache, "msg").formatted == "Hello" - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - assert cache.idempotent_writes == 1 + def test_conflicting_duplicate_write_raises(self) -> None: + cache = IntegrityCache(write_once=True) - def test_idempotent_with_errors(self) -> None: - """Idempotent detection includes errors in comparison.""" - error = FrozenFluentError("Test error", ErrorCategory.REFERENCE) - cache = IntegrityCache(write_once=True, strict=True) + _put(cache, "msg", "Hello") - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) - - # Same content WITH same error = idempotent - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=(error,)) - assert cache.idempotent_writes == 1 - - # Same text but WITHOUT error = conflict - with pytest.raises(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - def test_idempotent_non_strict_mode(self) -> None: - """Idempotent writes also work in non-strict mode.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Idempotent write - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Different 
content silently ignored (non-strict)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=())
+        with pytest.raises(WriteConflictError, match="already cached"):
+            _put(cache, "msg", "World")
         stats = cache.get_stats()
-        assert stats["idempotent_writes"] == 1  # Only one idempotent
-
-        # Original value preserved
-        entry = cache.get("msg", None, None, "en", use_isolating=True)
-        assert entry is not None
-        assert entry.formatted == "Hello"
-
-    def test_idempotent_counter_preserved_on_clear(self) -> None:
-        """Idempotent counter is cumulative across clear() calls."""
-        cache = IntegrityCache(write_once=True, strict=True)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())  # Idempotent
-
-        assert cache.idempotent_writes == 1
-
-        # clear() removes entries but does NOT reset cumulative metrics.
-        cache.clear()
-
-        assert cache.idempotent_writes == 1
-
-    def test_audit_records_idempotent_writes(self) -> None:
-        """Audit log records WRITE_ONCE_IDEMPOTENT operations."""
-        cache = IntegrityCache(write_once=True, strict=True, enable_audit=True)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())  # Idempotent
-
-        audit_log = cache._audit_log
-        assert audit_log is not None
-
-        # pylint: disable=not-an-iterable
-        operations = [entry.operation for entry in audit_log]
-        assert "WRITE_ONCE_IDEMPOTENT" in operations
-
-    def test_audit_records_conflict(self) -> None:
-        """Audit log records WRITE_ONCE_CONFLICT for different content."""
-        cache = IntegrityCache(write_once=True, strict=False, enable_audit=True)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=())  # Conflict (non-strict)
-
-        audit_log = cache._audit_log
-        assert audit_log is not None
-
-        # pylint: disable=not-an-iterable
-        operations = [entry.operation for entry in audit_log]
-        assert "WRITE_ONCE_CONFLICT" in operations
-
-class TestIdempotentWritesConcurrency:
-    """Test idempotent writes under concurrent access (thundering herd)."""
-
-    def test_concurrent_identical_writes_no_exceptions(self) -> None:
-        """Concurrent writes with identical content all succeed (no exceptions).
-
-        This is the thundering herd scenario: multiple threads resolve same
-        message simultaneously, all compute identical results. Without idempotent
-        detection, N-1 threads would crash with WriteConflictError.
-        """
-        cache = IntegrityCache(write_once=True, strict=True)
-        errors: list[Exception] = []
-
-        def put_identical() -> None:
-            try:
-                cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-            except Exception as e:  # pylint: disable=broad-exception-caught
-                errors.append(e)
-
-        # 20 threads all trying to cache same value
-        threads = [threading.Thread(target=put_identical) for _ in range(20)]
-        for thread in threads:
-            thread.start()
-        for thread in threads:
-            thread.join()
-
-        # NO exceptions should occur (all are idempotent or first write)
-        assert len(errors) == 0, f"Got {len(errors)} exceptions: {errors}"
-
-        # Only one entry should exist
-        stats = cache.get_stats()
-        assert stats["size"] == 1
-
-        # Idempotent counter should reflect concurrent writes minus first
-        assert stats["idempotent_writes"] == 19  # 20 threads - 1 first write
-
-    def test_concurrent_different_writes_raises_conflicts(self) -> None:
-        """Concurrent writes with DIFFERENT content raise conflicts."""
-        cache = IntegrityCache(write_once=True, strict=True)
-        conflict_count = 0
-        lock = threading.Lock()
-
-        def put_different(i: int) -> None:
-            nonlocal conflict_count
-            try:
-                cache.put("msg", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=())
-            except WriteConflictError:
-                with lock:
-                    conflict_count += 1
-
-        # 10 threads all trying to cache DIFFERENT values
-        threads = [threading.Thread(target=put_different, args=(i,)) for i in range(10)]
-        for thread in threads:
-            thread.start()
-        for thread in threads:
-            thread.join()
-
-        # Most writes should fail (conflict)
-        assert conflict_count >= 9  # At least 9 conflicts (1 succeeds)
-
-        # Only one entry should exist
-        stats = cache.get_stats()
-        assert stats["size"] == 1
-
-class TestDatetimeTimezoneCollisionPrevention:
-    """Test that datetime objects with different timezones produce distinct cache keys.
-
-    Two datetime objects can represent the same UTC instant but have different tzinfo.
-    Python's datetime equality considers them equal, but they format to different
-    local time strings. The cache must distinguish them.
-    """
-
-    def test_same_utc_instant_different_timezone_distinct_keys(self) -> None:
-        """Datetimes with same UTC instant but different tzinfo produce distinct keys."""
-        from datetime import datetime, timedelta, timezone
-
-        # 12:00 UTC
-        dt_utc = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC)
-        # 07:00 EST (UTC-5) = 12:00 UTC - SAME INSTANT
-        dt_est = datetime(2024, 1, 1, 7, 0, 0, tzinfo=timezone(timedelta(hours=-5)))
-
-        # Verify they represent the same instant (Python equality)
-        assert dt_utc == dt_est
-
-        # But they should produce DIFFERENT cache keys
-        key_utc = IntegrityCache._make_hashable(dt_utc)
-        key_est = IntegrityCache._make_hashable(dt_est)
-        assert key_utc != key_est
-
-    def test_naive_datetime_distinguished_from_aware(self) -> None:
-        """Naive datetime is distinguished from aware datetime."""
-        from datetime import datetime
-
-        dt_naive = datetime(2024, 1, 1, 12, 0, 0)  # noqa: DTZ001 - naive datetime by design
-        dt_aware = datetime(2024, 1, 1, 12, 0, 0, tzinfo=UTC)
-
-        key_naive = IntegrityCache._make_hashable(dt_naive)
-        key_aware = IntegrityCache._make_hashable(dt_aware)
-
-        # Different tz_key means different cache keys
-        assert key_naive != key_aware
-        assert isinstance(key_naive, tuple)
-        assert isinstance(key_aware, tuple)
-        assert key_naive[2] == "__naive__"
-        assert key_aware[2] == "UTC"
-
-class TestDecimalNegativeZeroCollisionPrevention:
-    """Test that Decimal("0") and Decimal("-0") produce distinct cache keys.
-
-    Python's Decimal("0") == Decimal("-0"), but locale-aware formatting may
-    distinguish them (e.g., "-0" vs "0"). The cache must treat them as distinct.
-    """
-
-    def test_zero_and_negative_zero_distinct_keys(self) -> None:
-        """Decimal("0") and Decimal("-0") produce distinct cache keys."""
-        key_pos = IntegrityCache._make_hashable(Decimal(0))
-        key_neg = IntegrityCache._make_hashable(Decimal("-0"))
-
-        # They're equal in Python
-        assert Decimal(0) == Decimal("-0")
-
-        # But distinct in cache keys (via str representation)
-        assert key_pos != key_neg
-        assert key_pos == ("__decimal__", "0")
-        assert key_neg == ("__decimal__", "-0")
+        assert stats["write_once_conflicts"] == 1

-class TestSequenceMappingABCSupport:
-    """Test that Sequence and Mapping ABCs are supported, not just list/tuple/dict."""

-    def test_userlist_accepted(self) -> None:
-        """UserList (Sequence ABC) is accepted and type-tagged."""
-        from collections import UserList
+class TestHashableValueNormalization:
+    """Canonical hashable conversion should preserve semantic distinctions."""

-        values = UserList([1, 2, 3])
-        result = IntegrityCache._make_hashable(values)
+    def test_bool_and_int_do_not_collide(self) -> None:
+        cache = IntegrityCache(maxsize=10)
+        _put(cache, "msg", "bool", args={"value": True})
+        _put(cache, "msg", "int", args={"value": 1})

-        # Should be tagged as __seq__ (generic Sequence)
-        assert isinstance(result, tuple)
-        assert result[0] == "__seq__"
-        # Inner values are type-tagged
-        assert result[1] == (("__int__", 1), ("__int__", 2), ("__int__", 3))
+        assert _get(cache, "msg", args={"value": True}).formatted == "bool"
+        assert _get(cache, "msg", args={"value": 1}).formatted == "int"

-    def test_chainmap_accepted(self) -> None:
-        """ChainMap (Mapping ABC) is accepted with __mapping__ tag."""
-        from collections import ChainMap
+    def test_nan_decimal_values_share_one_stable_key(self) -> None:
+        cache = IntegrityCache(maxsize=10)
+        first = Decimal("NaN")
+        second = Decimal("NaN")

-        values: ChainMap[str, int] = ChainMap({"a": 1}, {"b": 2})
-        result = IntegrityCache._make_hashable(values)
+        _put(cache, "msg", "nan", args={"value": first})
+        assert _get(cache, "msg", args={"value": second}).formatted == "nan"

-        # Should be tagged tuple with __mapping__ prefix
-        assert isinstance(result, tuple)
-        assert result[0] == "__mapping__"
-        # ChainMap flattens to view of first-found keys
-        inner = result[1]
-        assert isinstance(inner, tuple)
-        assert ("a", ("__int__", 1)) in inner
-        assert ("b", ("__int__", 2)) in inner
+    def test_fluent_number_is_keyed_by_value_and_formatting_metadata(self) -> None:
+        cache = IntegrityCache(maxsize=10)
+        left = FluentNumber(Decimal("1.20"), "1.20", precision=2)
+        right = FluentNumber(Decimal("1.2"), "1.2", precision=1)

-    def test_list_still_tagged_as_list(self) -> None:
-        """Regular list still uses __list__ tag, not __seq__."""
-        result = IntegrityCache._make_hashable([1, 2])
-        assert isinstance(result, tuple)
-        assert result[0] == "__list__"
+        _put(cache, "msg", "left", args={"value": left})
+        _put(cache, "msg", "right", args={"value": right})

-    def test_tuple_still_tagged_as_tuple(self) -> None:
-        """Regular tuple still uses __tuple__ tag, not __seq__."""
-        result = IntegrityCache._make_hashable((1, 2))
-        assert isinstance(result, tuple)
-        assert result[0] == "__tuple__"
+        assert _get(cache, "msg", args={"value": left}).formatted == "left"
+        assert _get(cache, "msg", args={"value": right}).formatted == "right"
diff --git a/tests/runtime_cache_integrity_cases/integrity_edges.py b/tests/runtime_cache_integrity_cases/integrity_edges.py
index 93301a90..4e19608d 100644
--- a/tests/runtime_cache_integrity_cases/integrity_edges.py
+++ b/tests/runtime_cache_integrity_cases/integrity_edges.py
@@ -1,561 +1,117 @@
 # mypy: ignore-errors
 from __future__ import annotations

-from hypothesis import event, given, settings
-from hypothesis import strategies as st
+import pytest

-from ftllexengine.diagnostics import (
-    Diagnostic,
-    DiagnosticCode,
-    ErrorCategory,
-    FrozenErrorContext,
-    FrozenFluentError,
-)
+from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError
+from ftllexengine.integrity import CacheKeySerializationError, WriteConflictError
 from ftllexengine.runtime.cache import (
+    CacheDebugLogEntry,
+    CacheIntegrityEventKind,
     IntegrityCache,
-    IntegrityCacheEntry,
-    _estimate_error_weight,
+    MemoryIntegrityEventSink,
 )

-# Sentinel key_hash for unit tests that verify checksum mechanics but do not
-# need meaningful key binding (all-zeros = "unbound test entry").
-_NO_KEY_HASH: bytes = b"\x00" * 8
-
-# ============================================================================
-# CHECKSUM VERIFICATION TESTS
-# ============================================================================
-
-
-
-class TestIntegrityCacheEntryContentHash:
-    """Test IntegrityCacheEntry checksum computation with error.content_hash."""
-
-    def test_compute_checksum_uses_error_content_hash(self) -> None:
-        """_compute_checksum uses error.content_hash when available."""
-        error = FrozenFluentError("Test error", ErrorCategory.REFERENCE)
-        entry = IntegrityCacheEntry.create(
-            "formatted text", (error,), sequence=1, key_hash=_NO_KEY_HASH
-        )
-        assert entry.checksum is not None
-        assert len(entry.checksum) == 16  # BLAKE2b-128
-        assert entry.verify() is True
-
-    def test_compute_checksum_with_multiple_errors_content_hash(self) -> None:
-        """_compute_checksum uses content_hash for multiple errors."""
-        errors = (
-            FrozenFluentError("Error 1", ErrorCategory.REFERENCE),
-            FrozenFluentError("Error 2", ErrorCategory.RESOLUTION),
-            FrozenFluentError("Error 3", ErrorCategory.CYCLIC),
-        )
-        entry = IntegrityCacheEntry.create(
-            "formatted text", errors, sequence=1, key_hash=_NO_KEY_HASH
-        )
-        assert entry.checksum is not None
-        assert entry.verify() is True
-
-    @given(st.integers(min_value=1, max_value=10))
-    @settings(max_examples=50)
-    def test_property_checksum_deterministic_with_errors(self, error_count: int) -> None:
-        """PROPERTY: Checksum is deterministic; each entry validates against itself.
-
-        Checksums include metadata (created_at, sequence) for complete audit trail
-        integrity, so two independently created entries with the same content will
-        have different checksums. Each entry does self-validate correctly.
-        """
-        errors = tuple(
-            FrozenFluentError(f"Error {i}", ErrorCategory.REFERENCE)
-            for i in range(error_count)
-        )
-        entry = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH)
-        assert entry.verify() is True
-        entry2 = IntegrityCacheEntry.create("formatted", errors, sequence=1, key_hash=_NO_KEY_HASH)
-        assert entry2.verify() is True
-        event(f"error_count={error_count}")
-
-    def test_cache_put_get_with_frozen_errors(self) -> None:
-        """Cache operations work correctly with FrozenFluentError.content_hash."""
-        cache = IntegrityCache(strict=False)
-        errors = (
-            FrozenFluentError("Reference error", ErrorCategory.REFERENCE),
-            FrozenFluentError("Resolution error", ErrorCategory.RESOLUTION),
-        )
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="formatted text", errors=errors)
-        entry = cache.get("msg", None, None, "en", use_isolating=True)
-        assert entry is not None
-        assert entry.formatted == "formatted text"
-        assert entry.errors == errors
-        assert entry.verify() is True
-
-class TestIntegrityCacheAuditLogDisabled:
-    """Test get_audit_log() returns empty tuple when audit logging is disabled."""
-
-    def test_get_audit_log_returns_empty_when_disabled_by_default(self) -> None:
-        """get_audit_log() returns empty tuple when audit disabled (default)."""
-        cache = IntegrityCache(strict=False)
-        cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=())
-        cache.get("msg1", None, None, "en", use_isolating=True)
-        cache.put("msg2", None, None, "en", use_isolating=True, formatted="result2", errors=())
-        audit_log = cache.get_audit_log()
-        assert audit_log == ()
-        assert isinstance(audit_log, tuple)
+_FG = 0

-    def test_get_audit_log_returns_empty_when_disabled_explicit(self) -> None:
-        """get_audit_log() returns empty tuple when enable_audit=False explicitly."""
-        cache = IntegrityCache(enable_audit=False, strict=False)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="result", errors=())
-        cache.get("msg", None, None, "en", use_isolating=True)
-        assert cache.get_audit_log() == ()

-    @given(
-        st.integers(min_value=1, max_value=20),
-        st.integers(min_value=1, max_value=10),
+def _put(cache: IntegrityCache, message_id: str, *, args: dict[str, object] | None = None) -> None:
+    cache.put(
+        message_id,
+        args,
+        None,
+        "en",
+        use_isolating=True,
+        function_generation=_FG,
+        formatted="value",
+        errors=(),
     )
-    @settings(max_examples=30)
-    def test_property_audit_log_always_empty_when_disabled(
-        self, put_count: int, get_count: int
-    ) -> None:
-        """PROPERTY: get_audit_log() always returns empty tuple when disabled."""
-        cache = IntegrityCache(enable_audit=False, strict=False)
-        for i in range(put_count):
-            cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=())
-        for i in range(get_count):
-            cache.get(f"msg{i % put_count}", None, None, "en", use_isolating=True)
-        audit_log = cache.get_audit_log()
-        assert audit_log == ()
-        assert len(audit_log) == 0
-        event(f"put_count={put_count}")
-
-class TestIntegrityCacheAuditLogEnabled:
-    """Test get_audit_log() returns tuple of entries when audit logging is enabled."""
-
-    def test_get_audit_log_returns_tuple_when_enabled(self) -> None:
-        """get_audit_log() returns tuple with entries when enable_audit=True."""
-        cache = IntegrityCache(enable_audit=True, strict=False)
-        cache.put("msg1", None, None, "en", use_isolating=True, formatted="result1", errors=())
-        cache.get("msg1", None, None, "en", use_isolating=True)
-        cache.get("msg2", None, None, "en", use_isolating=True)  # Miss
-        audit_log = cache.get_audit_log()
-        assert isinstance(audit_log, tuple)
-        assert len(audit_log) >= 3  # PUT + HIT + MISS
-
-    @given(st.integers(min_value=1, max_value=10))
-    @settings(max_examples=20)
-    def test_property_audit_log_returns_tuple_when_enabled(self, op_count: int) -> None:
-        """PROPERTY: get_audit_log() returns tuple of at least op_count entries."""
-        cache = IntegrityCache(enable_audit=True, strict=False)
-        for i in range(op_count):
-            cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=())
-        audit_log = cache.get_audit_log()
-        assert isinstance(audit_log, tuple)
-        assert len(audit_log) >= op_count
-        event(f"op_count={op_count}")

-class TestIntegrityCachePropertyGetters:
-    """Test property getters for complete coverage."""

-    def test_corruption_detected_property(self) -> None:
-        """corruption_detected property reflects detected corruption count."""
-        cache = IntegrityCache(strict=False)
-        assert cache.corruption_detected == 0
-
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-        key = next(iter(cache._cache.keys()))
-        original_entry = cache._cache[key]
-        corrupted = IntegrityCacheEntry(
-            formatted="Corrupted!",
-            errors=original_entry.errors,
-            checksum=original_entry.checksum,
-            created_at=original_entry.created_at,
-            sequence=original_entry.sequence,
-            key_hash=original_entry.key_hash,
-        )
-        cache._cache[key] = corrupted
-        cache.get("msg", None, None, "en", use_isolating=True)
-        assert cache.corruption_detected == 1
-
-    def test_write_once_property(self) -> None:
-        """write_once property reflects constructor argument."""
-        assert IntegrityCache(write_once=False, strict=False).write_once is False
-        assert IntegrityCache(write_once=True, strict=False).write_once is True
-
-    def test_strict_property(self) -> None:
-        """strict property reflects constructor argument."""
-        assert IntegrityCache(strict=False).strict is False
-        assert IntegrityCache(strict=True).strict is True
-
-    @given(st.booleans(), st.booleans())
-    @settings(max_examples=4)
-    def test_property_write_once_strict_reflect_constructor(
-        self, write_once: bool, strict: bool
-    ) -> None:
-        """PROPERTY: write_once and strict properties reflect constructor args."""
-        cache = IntegrityCache(write_once=write_once, strict=strict)
-        assert cache.write_once == write_once
-        assert cache.strict == strict
-        wo = "write_once" if write_once else "normal"
-        event(f"mode={wo}")
-
-    def test_corruption_detected_accumulates_across_multiple(self) -> None:
-        """corruption_detected accumulates across multiple corruption events."""
-        cache = IntegrityCache(strict=False)
-        cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=())
-        cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=())
-        cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=())
-        for key in list(cache._cache.keys()):
-            entry = cache._cache[key]
-            cache._cache[key] = IntegrityCacheEntry(
-                formatted="Corrupted",
-                errors=entry.errors,
-                checksum=entry.checksum,
-                created_at=entry.created_at,
-                sequence=entry.sequence,
-                key_hash=entry.key_hash,
+class TestDebugLogSurface:
+    """The bounded debug log is distinct from integrity events."""
+
+    def test_disabled_debug_log_returns_empty_tuple(self) -> None:
+        cache = IntegrityCache()
+        _put(cache, "msg")
+        assert cache.get_debug_log() == ()
+
+    def test_enabled_debug_log_records_recent_operations(self) -> None:
+        cache = IntegrityCache(enable_debug_log=True)
+        _put(cache, "msg")
+        cache.get("msg", None, None, "en", use_isolating=True, function_generation=_FG)
+        cache.get("missing", None, None, "en", use_isolating=True, function_generation=_FG)
+
+        log = cache.get_debug_log()
+        assert [entry.operation for entry in log] == ["PUT", "HIT", "MISS"]
+        assert all(isinstance(entry, CacheDebugLogEntry) for entry in log)
+
+    def test_debug_log_is_bounded(self) -> None:
+        cache = IntegrityCache(enable_debug_log=True, max_debug_entries=2)
+        _put(cache, "a")
+        _put(cache, "b")
+        _put(cache, "c")
+
+        log = cache.get_debug_log()
+        assert len(log) == 2
+        assert [entry.operation for entry in log] == ["PUT", "PUT"]
+
+
+class TestKeyContractFailures:
+    """Unsupported cache-key values must fail closed."""
+
+    def test_unencodable_args_raise_typed_integrity_error_on_put(self) -> None:
+        cache = IntegrityCache()
+        with pytest.raises(CacheKeySerializationError, match="Cache key contract failed"):
+            _put(cache, "msg", args={"value": object()})
+
+    def test_unencodable_args_raise_typed_integrity_error_on_get(self) -> None:
+        cache = IntegrityCache()
+        with pytest.raises(CacheKeySerializationError, match="Cache key contract failed"):
+            cache.get(
+                "msg",
+                {"value": object()},
+                None,
+                "en",
+                use_isolating=True,
+                function_generation=_FG,
             )
-        cache.get("msg1", None, None, "en", use_isolating=True)
-        assert cache.corruption_detected == 1
-        cache.get("msg2", None, None, "en", use_isolating=True)
-        assert cache.corruption_detected == 2
-        cache.get("msg3", None, None, "en", use_isolating=True)
-        assert cache.corruption_detected == 3
-
-    def test_error_bloat_skips_property(self) -> None:
-        """error_bloat_skips property reflects excess-error-count skip count."""
-        cache = IntegrityCache(strict=False, max_errors_per_entry=2)
-        errors = tuple(
-            FrozenFluentError(f"err-{i}", ErrorCategory.REFERENCE) for i in range(3)
-        )
-        assert cache.error_bloat_skips == 0
-
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=errors)
-        assert cache.error_bloat_skips == 1
-
-    def test_combined_weight_skips_property_initial_zero(self) -> None:
-        """combined_weight_skips property starts at zero."""
-        cache = IntegrityCache(strict=False)
-        assert cache.combined_weight_skips == 0
-
-    def test_combined_weight_skips_property_incremented(self) -> None:
-        """combined_weight_skips property reflects combined-weight skip count."""
-        # max_entry_weight=200: formatted (100 chars) passes check 1,
-        # but combined with error overhead (100 base + 150 msg = 250), total=350 fails.
-        cache = IntegrityCache(strict=False, max_entry_weight=200)
-        error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE)
-        assert cache.combined_weight_skips == 0
-
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,))
-        assert cache.combined_weight_skips == 1
-
-    def test_write_once_conflicts_property_initial_zero(self) -> None:
-        """write_once_conflicts property starts at zero."""
-        cache = IntegrityCache(write_once=True, strict=False)
-        assert cache.write_once_conflicts == 0
-
-    def test_write_once_conflicts_property_incremented(self) -> None:
-        """write_once_conflicts property reflects true conflict count."""
-        cache = IntegrityCache(write_once=True, strict=False)
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=())
-        assert cache.write_once_conflicts == 0
-
-        cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=())
-        assert cache.write_once_conflicts == 1
-
-class TestIntegrityCacheEdgeCases:
-    """Additional edge cases for complete coverage."""
-
-    def test_entry_with_empty_errors_differs_from_entry_with_error(self) -> None:
-        """Entries with empty vs non-empty errors tuples have distinct checksums."""
-        error = FrozenFluentError("Test", ErrorCategory.REFERENCE)
-        entry1 = IntegrityCacheEntry.create("text", (), sequence=1, key_hash=_NO_KEY_HASH)
-        entry2 = IntegrityCacheEntry.create("text", (error,), sequence=2, key_hash=_NO_KEY_HASH)
-        assert entry1.checksum != entry2.checksum
-
-    def test_cache_stats_includes_all_integrity_fields(self) -> None:
-        """get_stats() includes corruption_detected, write_once, strict, audit_enabled."""
-        cache = IntegrityCache(write_once=True, strict=True, enable_audit=False)
-        stats = cache.get_stats()
-        assert "corruption_detected" in stats
-        assert "write_once" in stats
-        assert "strict" in stats
-        assert "audit_enabled" in stats
-        assert stats["corruption_detected"] == 0
-        assert stats["write_once"] is True
-        assert stats["strict"] is True
-        assert stats["audit_enabled"] is False
-
-    def test_multiple_operations_exercise_all_properties(self) -> None:
-        """Exercise all properties through multiple cache operations."""
-        cache = IntegrityCache(
-            maxsize=10, write_once=False, strict=False, enable_audit=False
-        )
-        for i in range(5):
-            cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"result{i}", errors=())
-        assert cache.size == 5
-        assert cache.maxsize == 10
-        assert cache.hits == 0
-        assert cache.misses == 0
-        assert cache.corruption_detected == 0
-        assert cache.write_once is False
-        assert cache.strict is False
-        for i in range(5):
-            entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True)
-            assert entry is not None
-        assert cache.hits == 5
-        assert cache.get_audit_log() == ()

-class TestEstimateErrorWeightWithContext:
-    """Test _estimate_error_weight with errors containing FrozenErrorContext.
-
-    Covers the branch where error.context fields are processed.
-    """
-
-    def test_error_weight_with_context(self) -> None:
-        """Error with context includes all context field lengths in weight."""
-        context = FrozenErrorContext(
-            input_value="test_input_value",
-            locale_code="en_US",
-            parse_type="number",
-            fallback_value="{!NUMBER}",
-        )
-        error = FrozenFluentError(
-            "Parse error", ErrorCategory.FORMATTING, context=context
-        )
-        weight = _estimate_error_weight(error)
-        expected_weight = (
-            100  # _ERROR_BASE_OVERHEAD
-            + len("Parse error")
-            + len("test_input_value")
-            + len("en_US")
-            + len("number")
-            + len("{!NUMBER}")
-        )
-        assert weight == expected_weight
-
-    def test_error_weight_without_context(self) -> None:
-        """Error without context only includes base overhead plus message length."""
-        error = FrozenFluentError("Simple error", ErrorCategory.REFERENCE)
-        weight = _estimate_error_weight(error)
-        assert weight == 100 + len("Simple error")
-
-    @given(
-        input_val=st.text(min_size=0, max_size=100),
-        locale=st.text(min_size=0, max_size=20),
-        parse_type=st.sampled_from(
-            ["", "currency", "date", "datetime", "decimal", "number"]
-        ),
-        fallback=st.text(min_size=0, max_size=50),
-    )
-    @settings(max_examples=50)
-    def test_property_error_weight_accounts_for_all_context_fields(
-        self,
-        input_val: str,
-        locale: str,
-        parse_type: str,
-        fallback: str,
-    ) -> None:
-        """PROPERTY: Error weight correctly accounts for all context field lengths."""
-        context = FrozenErrorContext(
-            input_value=input_val,
-            locale_code=locale,
-            parse_type=parse_type,
-            fallback_value=fallback,
-        )
-        error = FrozenFluentError("Test", ErrorCategory.FORMATTING, context=context)
-        weight = _estimate_error_weight(error)
-        expected = (
-            100
-            + len("Test")
-            + len(input_val)
-            + len(locale)
-            + len(parse_type)
-            + len(fallback)
-        )
-        assert weight == expected
-        event(f"context_len={len(input_val) + len(locale)}")
-
-class TestEstimateErrorWeightDiagnosticBranches:
-    """Test _estimate_error_weight with diagnostic fields including resolution_path."""
-
-    def test_error_weight_diagnostic_without_resolution_path(self) -> None:
-        """Error with diagnostic but no resolution_path skips path length processing."""
-        diagnostic = Diagnostic(
-            code=DiagnosticCode.MESSAGE_NOT_FOUND,
-            message="Reference error",
-        )
-        error = FrozenFluentError(
-            "Message not found", ErrorCategory.REFERENCE, diagnostic=diagnostic
-        )
-        weight = _estimate_error_weight(error)
-        expected = 100 + len("Message not found") + len("Reference error")
-        assert weight == expected
-
-    def test_error_weight_diagnostic_with_resolution_path(self) -> None:
-        """Error with diagnostic and resolution_path includes path element lengths."""
-        diagnostic = Diagnostic(
-            code=DiagnosticCode.CYCLIC_REFERENCE,
-            message="Reference error",
-            resolution_path=("message1", "term1", "message2"),
-        )
-        error = FrozenFluentError(
-            "Circular reference", ErrorCategory.CYCLIC, diagnostic=diagnostic
-        )
-        weight = _estimate_error_weight(error)
-        expected = (
-            100
-            + len("Circular reference")
-            + len("Reference error")
-            + len("message1")
-            + len("term1")
-            + len("message2")
-        )
-        assert weight == expected
-
-    def test_error_weight_diagnostic_with_all_optional_fields(self) -> None:
-        """Error with diagnostic containing all optional fields includes them in weight."""
-        diagnostic = Diagnostic(
-            code=DiagnosticCode.INVALID_ARGUMENT,
-            message="Invalid argument",
-            hint="Use NUMBER() function",
-            help_url="https://example.com/help",
-            function_name="CURRENCY",
-            argument_name="minimumFractionDigits",
-            expected_type="int",
-            received_type="str",
-            ftl_location="message.ftl:42",
-        )
-        error = FrozenFluentError(
-            "Function call error", ErrorCategory.FORMATTING, diagnostic=diagnostic
-        )
-        weight = _estimate_error_weight(error)
-        expected = (
-            100
-            + len("Function call error")
-            + len("Invalid argument")
-            + len("Use NUMBER() function")
-            + len("https://example.com/help")
-            + len("CURRENCY")
-            + len("minimumFractionDigits")
-            + len("int")
-            + len("str")
-            + len("message.ftl:42")
-        )
-        assert weight == expected
-
-class TestCacheEntryVerifyWithCorruptedError:
-    """Test IntegrityCacheEntry.verify() when error.verify_integrity() returns False.
-
-    Exercises the defense-in-depth check where entry verification recurses into
-    each contained error's own verify_integrity() method.
-    """
-
-    def test_verify_returns_false_when_error_message_corrupted(self) -> None:
-        """IntegrityCacheEntry.verify() returns False when error is memory-corrupted.
-
-        Simulates memory corruption: error._message is changed without updating
-        the stored _content_hash, causing verify_integrity() to return False.
-        """
-        error = FrozenFluentError("Test error 2", ErrorCategory.REFERENCE)
-        entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH)
-        object.__setattr__(error, "_frozen", False)
-        object.__setattr__(error, "_message", "corrupted message")
-        object.__setattr__(error, "_frozen", True)
-        assert error.verify_integrity() is False
-        assert entry.verify() is False
-
-    def test_verify_detects_corruption_defense_in_depth(self) -> None:
-        """IntegrityCacheEntry.verify() provides defense-in-depth error verification."""
-        error = FrozenFluentError("Original message", ErrorCategory.REFERENCE)
-        entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH)
-        assert entry.verify() is True
-        object.__setattr__(error, "_frozen", False)
-        object.__setattr__(error, "_message", "Corrupted by memory error")
-        object.__setattr__(error, "_frozen", True)
-        assert error.verify_integrity() is False
-        assert entry.verify() is False
-
-    def test_verify_returns_true_when_all_errors_valid(self) -> None:
-        """IntegrityCacheEntry.verify() returns True when all errors pass integrity."""
-        errors = (
-            FrozenFluentError("Error 1", ErrorCategory.REFERENCE),
-            FrozenFluentError("Error 2", ErrorCategory.FORMATTING),
-            FrozenFluentError("Error 3", ErrorCategory.CYCLIC),
-        )
-        entry = IntegrityCacheEntry.create("Result", errors, sequence=1, key_hash=_NO_KEY_HASH)
-        assert entry.verify() is True
-
-    def test_verify_returns_false_if_any_error_corrupted(self) -> None:
-        """IntegrityCacheEntry.verify() returns False if any single error is corrupted."""
-        error1 = FrozenFluentError("Error 1", ErrorCategory.REFERENCE)
-        error2 = FrozenFluentError("Error 2", ErrorCategory.FORMATTING)
-        error3 = FrozenFluentError("Error 3", ErrorCategory.CYCLIC)
-        entry = IntegrityCacheEntry.create(
-            "Result", (error1, error2, error3), sequence=1, key_hash=_NO_KEY_HASH
-        )
-        object.__setattr__(error2, "_frozen", False)
-        object.__setattr__(error2, "_content_hash", b"bad_hash_xxxxxxx")
-        object.__setattr__(error2, "_frozen", True)
-        assert entry.verify() is False
-
-class TestErrorWeightAndVerifyIntegration:
-    """Integration tests combining error weight estimation and verification."""
-
-    def test_large_error_with_context_and_diagnostic(self) -> None:
-        """Error with both context and diagnostic computes correct weight."""
-        context = FrozenErrorContext(
-            input_value="very long input value that would increase weight significantly",
-            locale_code="en_US",
-            parse_type="currency",
-            fallback_value="{!CURRENCY}",
-        )
-        diagnostic = Diagnostic(
-            code=DiagnosticCode.PARSE_DECIMAL_FAILED,
-            message="Failed to parse number",
-            hint="Check number format",
-            resolution_path=("step1", "step2", "step3"),
-        )
-        error = FrozenFluentError(
-            "Complex error message",
-            ErrorCategory.FORMATTING,
-            diagnostic=diagnostic,
-            context=context,
-        )
-        weight = _estimate_error_weight(error)
-        expected = (
-            100
-            + len("Complex error message")
-            + len("Failed to parse number")
-            + len("Check number format")
-            + len("step1") + len("step2") + len("step3")
-            + len("very long input value that would increase weight significantly")
-            + len("en_US")
-            + len("currency")
-            + len("{!CURRENCY}")
-        )
-        assert weight == expected
-        assert error.verify_integrity() is True
-        entry = IntegrityCacheEntry.create("Result", (error,), sequence=1, key_hash=_NO_KEY_HASH)
-        assert entry.verify() is True
+    def test_uncacheable_result_with_unencodable_args_counts_without_debug_log_entry(self) -> None:
+        cache = IntegrityCache(enable_debug_log=True)
+        cache.note_uncacheable_result(
+            "msg",
+            {"value": object()},
+            None,
+            "en",
+            use_isolating=True,
+            function_generation=_FG,
+        )
+
+        assert cache.uncacheable_function_skips == 1
+        assert cache.get_debug_log() == ()
+
+
+class TestIntegrityEventSink:
+    """Critical events should emit structured evidence."""
+
+    def test_write_conflict_emits_event(self) -> None:
+        sink = MemoryIntegrityEventSink()
+        cache = IntegrityCache(write_once=True, integrity_event_sink=sink)
+        _put(cache, "msg")
+
+        with pytest.raises(WriteConflictError):
+            cache.put(
+                "msg",
+                None,
+                None,
+                "en",
+                use_isolating=True,
+                function_generation=_FG,
+                formatted="other",
+                errors=(FrozenFluentError("err", ErrorCategory.REFERENCE),),
+            )

-    @given(
-        message=st.text(min_size=1, max_size=100),
-        input_val=st.text(min_size=0, max_size=50),
-        locale=st.text(min_size=0, max_size=10),
-    )
-    @settings(max_examples=50)
-    def test_property_weight_estimation_deterministic(
-        self, message: str, input_val: str, locale: str
-    ) -> None:
-        """PROPERTY: Weight estimation is deterministic and positive."""
-        context = FrozenErrorContext(
-            input_value=input_val,
-            locale_code=locale,
-            parse_type="number",
-            fallback_value="fallback",
-        )
-        error = FrozenFluentError(message, ErrorCategory.FORMATTING, context=context)
-        weight1 = _estimate_error_weight(error)
-        weight2 = _estimate_error_weight(error)
-        assert weight1 == weight2
-        assert weight1 > 0
-        min_weight = len(message) + len(input_val) + len(locale) + len("number") + len("fallback")
-        assert weight1 >= min_weight
-        event(f"weight={weight1}")
+        events = sink.snapshot()
+        assert len(events) == 1
+        assert events[0].kind is CacheIntegrityEventKind.WRITE_CONFLICT
diff --git a/tests/runtime_cache_integrity_cases/limits_and_timing.py b/tests/runtime_cache_integrity_cases/limits_and_timing.py
index c093f796..aaa7fd39 100644
--- a/tests/runtime_cache_integrity_cases/limits_and_timing.py
+++ b/tests/runtime_cache_integrity_cases/limits_and_timing.py
@@ -1,303 +1,83 @@
 # mypy: ignore-errors
 from __future__ import annotations

-import time
-
 import pytest
-from hypothesis import event, given
-from hypothesis import strategies as st

-from ftllexengine.constants import DEFAULT_MAX_ENTRY_WEIGHT
-from ftllexengine.diagnostics import (
-    ErrorCategory,
-    FrozenFluentError,
-)
-from ftllexengine.integrity import CacheCorruptionError, IntegrityContext
+from ftllexengine.constants import DEFAULT_MAX_ENTRY_PAYLOAD_BYTES
+from ftllexengine.diagnostics import ErrorCategory, FrozenFluentError
 from ftllexengine.runtime import FluentBundle
-from ftllexengine.runtime.cache import (
-    IntegrityCache,
-)
+from ftllexengine.runtime.cache import IntegrityCache
 from ftllexengine.runtime.cache_config import CacheConfig

-# Sentinel key_hash for unit tests that verify checksum mechanics but do not
-# need meaningful key binding (all-zeros = "unbound test entry").
-_NO_KEY_HASH: bytes = b"\x00" * 8
-
-# ============================================================================
-# CHECKSUM VERIFICATION TESTS
-# ============================================================================
-
-
-
-class TestCacheEntrySizeLimit:
-    """IntegrityCache max_entry_weight prevents caching of oversized results."""
-
-    def test_default_max_entry_weight(self) -> None:
-        """Default max_entry_weight is DEFAULT_MAX_ENTRY_WEIGHT (10,000 characters)."""
-        cache = IntegrityCache(strict=False)
-        assert cache.max_entry_weight == DEFAULT_MAX_ENTRY_WEIGHT
-        assert cache.max_entry_weight == 10_000
-
-    def test_custom_max_entry_weight(self) -> None:
-        """Custom max_entry_weight is stored and returned correctly."""
-        cache = IntegrityCache(strict=False, max_entry_weight=1000)
-        assert cache.max_entry_weight == 1000
-
-    def test_invalid_max_entry_weight_rejected(self) -> None:
-        """Zero and negative max_entry_weight raise ValueError."""
-        with pytest.raises(ValueError, match="max_entry_weight must be positive"):
-            IntegrityCache(strict=False, max_entry_weight=0)
+_FG = 0

-        with pytest.raises(ValueError, match="max_entry_weight must be positive"):
-            IntegrityCache(strict=False, max_entry_weight=-1)

-    def test_small_entries_cached(self) -> None:
-        """Entries below max_entry_weight are stored and retrievable."""
-        cache = IntegrityCache(strict=False, max_entry_weight=1000)
+def _put(cache: IntegrityCache, message_id: str, *, formatted: str, errors=()) -> None:
+    cache.put(
+        message_id,
+        None,
+        None,
+        "en",
+        use_isolating=True,
+        function_generation=_FG,
+        formatted=formatted,
+        errors=errors,
+    )

-        cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=())

-        assert cache.size == 1
-        assert cache.oversize_skips == 0
+class TestPayloadByteLimit:
+    """The cache payload budget should be explicit and deterministic."""

-        cached = cache.get("msg", None, None, "en", use_isolating=True)
-        assert cached is not None
-        assert 
cached.as_result() == ("x" * 100, ()) + def test_default_payload_limit_matches_constant(self) -> None: + cache = IntegrityCache() + assert cache.max_entry_payload_bytes == DEFAULT_MAX_ENTRY_PAYLOAD_BYTES - def test_large_entries_not_cached(self) -> None: - """Entries exceeding max_entry_weight are skipped and counted.""" - cache = IntegrityCache(strict=False, max_entry_weight=100) + def test_invalid_payload_limit_is_rejected(self) -> None: + with pytest.raises(TypeError, match="max_entry_payload_bytes must be int"): + IntegrityCache(max_entry_payload_bytes=True) + with pytest.raises(ValueError, match="max_entry_payload_bytes must be positive"): + IntegrityCache(max_entry_payload_bytes=0) - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 200, errors=()) + def test_invalid_integrity_event_sink_is_rejected(self) -> None: + with pytest.raises(TypeError, match="integrity_event_sink must implement"): + IntegrityCache(integrity_event_sink=object()) # type: ignore[arg-type] - assert cache.size == 0 - assert cache.oversize_skips == 1 + def test_invalid_debug_fingerprint_key_contract_is_rejected(self) -> None: + with pytest.raises(TypeError, match="debug_fingerprint_key must be bytes or None"): + IntegrityCache(debug_fingerprint_key="not-bytes") # type: ignore[arg-type] + with pytest.raises(ValueError, match="debug_fingerprint_key must contain at least 16 bytes"): + IntegrityCache(debug_fingerprint_key=b"short") - cached = cache.get("msg", None, None, "en", use_isolating=True) - assert cached is None - - def test_boundary_entry_size(self) -> None: - """Entry exactly at max_entry_weight is cached (inclusive boundary).""" - cache = IntegrityCache(strict=False, max_entry_weight=100) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - - assert cache.size == 1 - assert cache.oversize_skips == 0 - - def test_get_stats_includes_oversize_skips(self) -> None: - """get_stats() reports oversize_skips and 
max_entry_weight.""" - cache = IntegrityCache(strict=False, max_entry_weight=50) - - for i in range(5): - cache.put(f"msg-{i}", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) + def test_oversize_formatted_result_is_not_cached(self) -> None: + cache = IntegrityCache(max_entry_payload_bytes=10) + _put(cache, "msg", formatted="x" * 20) stats = cache.get_stats() - assert stats["oversize_skips"] == 5 - assert stats["max_entry_weight"] == 50 + assert stats["oversize_skips"] == 1 assert stats["size"] == 0 - def test_clear_preserves_oversize_skips(self) -> None: - """clear() removes entries but preserves cumulative oversize_skips counter.""" - cache = IntegrityCache(strict=False, max_entry_weight=50) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=()) - assert cache.oversize_skips == 1 - - cache.clear() - assert cache.oversize_skips == 1 - - def test_bundle_cache_uses_default_max_entry_weight(self) -> None: - """FluentBundle's internal cache uses default max_entry_weight.""" - bundle = FluentBundle("en", cache=CacheConfig()) - bundle.add_resource("msg = { $data }") - - small_data = "x" * 100 - bundle.format_pattern("msg", {"data": small_data}) - - stats = bundle.get_cache_stats() - assert stats is not None - assert stats["size"] == 1 - - @given(st.integers(min_value=1, max_value=1000)) - def test_max_entry_weight_property(self, size: int) -> None: - """PROPERTY: max_entry_weight is correctly stored and returned.""" - event(f"weight_size={size}") - cache = IntegrityCache(strict=False, max_entry_weight=size) - assert cache.max_entry_weight == size - - def test_combined_weight_skips_counter_incremented(self) -> None: - """Entries skipped due to combined weight increment combined_weight_skips. - - Scenario: formatted string (100 chars) passes check 1 (len <= max_entry_weight=200). - Error overhead = 100 (base) + 150 (message) = 250. Total = 350 > 200 fails check 3. 
- """ - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) + def test_combined_payload_limit_is_tracked_separately(self) -> None: + cache = IntegrityCache(max_entry_payload_bytes=200) + error = FrozenFluentError("x" * 180, ErrorCategory.REFERENCE) + _put(cache, "msg", formatted="x" * 80, errors=(error,)) stats = cache.get_stats() - assert stats["combined_weight_skips"] == 1 + assert stats["combined_payload_skips"] == 1 assert stats["oversize_skips"] == 0 - assert stats["error_bloat_skips"] == 0 - assert stats["size"] == 0 - - def test_combined_weight_skips_distinct_from_oversize_skips(self) -> None: - """oversize_skips and combined_weight_skips are separate, distinct counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - # Check 1 (oversize): formatted string alone exceeds max_entry_weight - cache.put("over-msg", None, None, "en", use_isolating=True, formatted="x" * 201, errors=()) - - # Check 3 (combined_weight): formatted OK, but combined total exceeds limit - cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) - - stats = cache.get_stats() - assert stats["oversize_skips"] == 1 - assert stats["combined_weight_skips"] == 1 - - def test_combined_weight_skips_distinct_from_error_bloat_skips(self) -> None: - """error_bloat_skips and combined_weight_skips are separate, distinct counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200, max_errors_per_entry=2) - heavy_error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - # Check 2 (error_bloat): too many errors by count - many_errors = tuple( - FrozenFluentError(f"e-{i}", ErrorCategory.REFERENCE) for i in range(3) - ) - cache.put("bloat-msg", None, None, "en", use_isolating=True, 
formatted="Hello", errors=many_errors) - - # Check 3 (combined_weight): error count OK (1 <= 2), combined weight fails - cache.put("combined-msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(heavy_error,)) - - stats = cache.get_stats() - assert stats["error_bloat_skips"] == 1 - assert stats["combined_weight_skips"] == 1 - - def test_combined_weight_skips_preserved_on_clear(self) -> None: - """clear() preserves cumulative combined_weight_skips counter.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - assert cache.combined_weight_skips == 1 - - cache.clear() - assert cache.combined_weight_skips == 1 - - def test_get_stats_includes_combined_weight_skips(self) -> None: - """get_stats() reports combined_weight_skips alongside related skip counters.""" - cache = IntegrityCache(strict=False, max_entry_weight=200) - error = FrozenFluentError("x" * 150, ErrorCategory.REFERENCE) - - cache.put("msg", None, None, "en", use_isolating=True, formatted="x" * 100, errors=(error,)) - - stats = cache.get_stats() - assert "combined_weight_skips" in stats - assert stats["combined_weight_skips"] == 1 - -class TestWriteLogEntryWallTime: - """WriteLogEntry carries both monotonic timestamp and wall_time_unix.""" - def test_write_log_entry_has_wall_time_unix_field(self) -> None: - """WriteLogEntry.wall_time_unix field exists and is a float.""" - before = time.time() - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) - after = time.time() - log = cache.get_audit_log() - assert len(log) >= 1 - entry = log[0] - assert isinstance(entry.wall_time_unix, float) - assert isinstance(entry.cache_sequence, int) - # Wall time should be bracketed between the before/after calls - assert before <= 
entry.wall_time_unix <= after +class TestBundleUsesPayloadBudget: + """Bundles should propagate payload-byte cache limits into their internal cache.""" - def test_write_log_entry_timestamp_is_monotonic(self) -> None: - """WriteLogEntry.timestamp (monotonic) is distinct from wall_time_unix.""" + def test_bundle_cache_uses_payload_limit(self) -> None: + bundle = FluentBundle("en", cache=CacheConfig(max_entry_payload_bytes=50)) + long_text = "x" * 100 + bundle.add_resource(f"msg = {long_text}") - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) + result, errors = bundle.format_pattern("msg") - log = cache.get_audit_log() - entry = log[0] - # Monotonic and wall clock are different clocks — values may differ - assert isinstance(entry.timestamp, float) - assert isinstance(entry.wall_time_unix, float) - # Both should be positive - assert entry.timestamp > 0 - assert entry.wall_time_unix > 0 - - def test_audit_log_multiple_entries_wall_time_non_decreasing(self) -> None: - """wall_time_unix values across audit entries are non-decreasing.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("a", None, None, "en", use_isolating=True, formatted="A", errors=()) - cache.put("b", None, None, "en", use_isolating=True, formatted="B", errors=()) - cache.put("c", None, None, "en", use_isolating=True, formatted="C", errors=()) - - log = cache.get_audit_log() - wall_times = [e.wall_time_unix for e in log] - for i in range(len(wall_times) - 1): - assert wall_times[i] <= wall_times[i + 1], ( - f"wall_time_unix not non-decreasing at index {i}: " - f"{wall_times[i]} > {wall_times[i + 1]}" - ) - - def test_audit_log_sequence_is_monotonic_even_with_misses(self) -> None: - """Audit-event sequence increases monotonically across misses and hits.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.get("missing", None, None, "en", use_isolating=True) - 
cache.put("msg", None, None, "en", use_isolating=True, formatted="hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - - log = cache.get_audit_log() - sequences = [entry.sequence for entry in log] - assert sequences == sorted(sequences) - assert [entry.operation for entry in log] == ["MISS", "PUT", "HIT"] - assert [entry.cache_sequence for entry in log] == [0, 1, 1] - -class TestIntegrityContextWallTime: - """IntegrityContext.wall_time_unix is populated at integrity error sites.""" - - def test_integrity_context_wall_time_unix_field_exists(self) -> None: - """IntegrityContext accepts wall_time_unix and stores it correctly.""" - t = time.time() - ctx = IntegrityContext( - component="test", - operation="check", - timestamp=time.monotonic(), - wall_time_unix=t, - ) - assert ctx.wall_time_unix == t - - def test_integrity_context_wall_time_unix_defaults_to_none(self) -> None: - """IntegrityContext.wall_time_unix defaults to None for backwards compat.""" - ctx = IntegrityContext(component="test", operation="check") - assert ctx.wall_time_unix is None - - def test_cache_corruption_error_context_has_wall_time(self) -> None: - """CacheCorruptionError raised by strict cache carries wall_time_unix.""" - cache = IntegrityCache(enable_audit=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="ok", errors=()) - - # Corrupt the checksum by manipulating the stored entry directly - key = next(iter(cache._cache)) - entry = cache._cache[key] - - # Corrupt the checksum in-place via object.__setattr__ (frozen dataclass). - # content_hash is field(init=False), so we cannot pass it to __init__. 
- object.__setattr__(entry, "checksum", b"\x00" * 16) # deliberately invalid - cache._cache[key] = entry - - before = time.time() - with pytest.raises(CacheCorruptionError) as exc_info: - cache.get("msg", None, None, "en", use_isolating=True) - after = time.time() - - ctx = exc_info.value.context - assert ctx is not None - assert ctx.wall_time_unix is not None - assert before <= ctx.wall_time_unix <= after + assert result == long_text + assert errors == () + stats = bundle.get_cache_stats() + assert stats is not None + assert stats["oversize_skips"] == 1 diff --git a/tests/runtime_cache_integrity_cases/write_once_audit.py b/tests/runtime_cache_integrity_cases/write_once_audit.py index 01bcf620..4614695d 100644 --- a/tests/runtime_cache_integrity_cases/write_once_audit.py +++ b/tests/runtime_cache_integrity_cases/write_once_audit.py @@ -1,439 +1,91 @@ # mypy: ignore-errors from __future__ import annotations -import contextlib -import threading -from concurrent.futures import ThreadPoolExecutor, as_completed +from concurrent.futures import ThreadPoolExecutor import pytest from ftllexengine.integrity import WriteConflictError from ftllexengine.runtime.cache import ( + CacheIntegrityEventKind, IntegrityCache, - IntegrityCacheEntry, - WriteLogEntry, + MemoryIntegrityEventSink, ) -# Sentinel key_hash for unit tests that verify checksum mechanics but do not -# need meaningful key binding (all-zeros = "unbound test entry"). 
-_NO_KEY_HASH: bytes = b"\x00" * 8 +_FG = 0 -# ============================================================================ -# CHECKSUM VERIFICATION TESTS -# ============================================================================ +def _put(cache: IntegrityCache, message_id: str, formatted: str = "value") -> None: + cache.put( + message_id, + None, + None, + "en", + use_isolating=True, + function_generation=_FG, + formatted=formatted, + errors=(), + ) -class TestWriteOnceStrictMode: - """Test write-once semantics in strict mode.""" +class TestWriteOnceAndDebugEvidence: + """Write-once conflicts and debug retention should be explicit.""" - def test_write_once_allows_first_write(self) -> None: - """First write to a key succeeds.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + def test_write_once_conflict_raises_and_increments_counter(self) -> None: + cache = IntegrityCache(write_once=True) + _put(cache, "msg", "one") - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" + with pytest.raises(WriteConflictError): + _put(cache, "msg", "two") - def test_write_once_strict_raises_on_second_write(self) -> None: - """Second write to same key raises WriteConflictError in strict mode.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) + assert cache.get_stats()["write_once_conflicts"] == 1 - with pytest.raises(WriteConflictError) as exc_info: - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + def test_idempotent_duplicate_write_does_not_raise(self) -> None: + cache = IntegrityCache(write_once=True) + _put(cache, "msg", "one") + _put(cache, "msg", "one") - assert "write-once violation" in str(exc_info.value).lower() - assert exc_info.value.existing_seq == 1 - assert 
exc_info.value.new_seq == 2 # Would-be sequence of rejected entry + assert cache.get_stats()["idempotent_writes"] == 1 - def test_write_once_preserves_original_value(self) -> None: - """Write-once rejection preserves original cached value.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Original", errors=()) + def test_debug_log_retention_is_bounded(self) -> None: + cache = IntegrityCache(enable_debug_log=True, max_debug_entries=3) + for idx in range(5): + _put(cache, f"msg{idx}") - with contextlib.suppress(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="Updated", errors=()) + debug_log = cache.get_debug_log() + assert len(debug_log) == 3 + assert all(entry.operation == "PUT" for entry in debug_log) - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Original" - def test_write_once_conflict_counter_incremented_before_raise(self) -> None: - """write_once_conflicts is incremented before WriteConflictError is raised.""" - cache = IntegrityCache(write_once=True, strict=True) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) +class TestIntegrityEventEmission: + """Critical evidence should go to the event sink, not the debug ring.""" - with contextlib.suppress(WriteConflictError): - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) + def test_write_conflict_emits_structured_event(self) -> None: + sink = MemoryIntegrityEventSink() + cache = IntegrityCache(write_once=True, integrity_event_sink=sink) + _put(cache, "msg", "one") - # Counter must be observable even after an exception was raised - assert cache.write_once_conflicts == 1 + with pytest.raises(WriteConflictError): + _put(cache, "msg", "two") -class TestWriteOnceNonStrictMode: - """Test write-once semantics in non-strict mode.""" + events = sink.snapshot() + 
assert len(events) == 1 + assert events[0].kind is CacheIntegrityEventKind.WRITE_CONFLICT + assert events[0].message_id == "msg" - def test_write_once_non_strict_silently_skips(self) -> None: - """Second write silently skipped in non-strict mode.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - # No exception raised - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) +class TestConcurrentWriteOnce: + """Concurrent identical writes should converge without false conflicts.""" - # Original value preserved - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "Hello" + def test_identical_concurrent_writes_are_idempotent(self) -> None: + cache = IntegrityCache(write_once=True) - def test_write_once_allows_different_keys(self) -> None: - """Write-once allows writes to different keys.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="First", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Second", errors=()) + def worker() -> None: + _put(cache, "msg", "value") - entry1 = cache.get("msg1", None, None, "en", use_isolating=True) - entry2 = cache.get("msg2", None, None, "en", use_isolating=True) - assert entry1 is not None - assert entry1.formatted == "First" - assert entry2 is not None - assert entry2.formatted == "Second" - - def test_write_once_conflict_counter_incremented(self) -> None: - """True write-once conflicts increment write_once_conflicts counter.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Different content for same key = true conflict - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - stats = cache.get_stats() 
- assert stats["write_once_conflicts"] == 1 - - def test_write_once_conflict_counter_multiple(self) -> None: - """write_once_conflicts accumulates across repeated true conflicts.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - for i in range(5): - cache.put("msg", None, None, "en", use_isolating=True, formatted=f"World-{i}", errors=()) - - assert cache.write_once_conflicts == 5 - - def test_write_once_conflict_not_incremented_for_idempotent(self) -> None: - """Idempotent writes do NOT increment write_once_conflicts.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) # Idempotent - - assert cache.write_once_conflicts == 0 - assert cache.idempotent_writes == 1 - - def test_write_once_conflict_counter_preserved_on_clear(self) -> None: - """clear() preserves cumulative write_once_conflicts counter.""" - cache = IntegrityCache(write_once=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) # Conflict - - assert cache.write_once_conflicts == 1 - cache.clear() - assert cache.write_once_conflicts == 1 - -class TestWriteOnceDisabled: - """Test behavior when write-once is disabled (default).""" - - def test_default_allows_overwrites(self) -> None: - """Default cache allows overwriting entries.""" - cache = IntegrityCache(write_once=False, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="World", errors=()) - - entry = cache.get("msg", None, None, "en", use_isolating=True) - assert entry is not None - assert entry.formatted == "World" - 
-class TestAuditLogging: - """Test audit logging functionality.""" - - def test_audit_disabled_by_default(self) -> None: - """Audit logging is disabled by default.""" - cache = IntegrityCache() - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - - stats = cache.get_stats() - assert stats["audit_enabled"] is False - assert stats["audit_entries"] == 0 - - def test_audit_enabled_records_operations(self) -> None: - """Audit logging records operations when enabled.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - cache.get("msg", None, None, "en", use_isolating=True) - cache.get("msg2", None, None, "en", use_isolating=True) # Miss + with ThreadPoolExecutor(max_workers=4) as pool: + list(pool.map(lambda _: worker(), range(4))) stats = cache.get_stats() - assert stats["audit_enabled"] is True - assert stats["audit_entries"] >= 3 # PUT + HIT + MISS - - def test_audit_log_entry_structure(self) -> None: - """Audit log entries have correct structure.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Access internal audit log for verification - audit_log = cache._audit_log - assert audit_log is not None - assert len(audit_log) >= 1 - - entry = audit_log[0] # pylint: disable=unsubscriptable-object - assert isinstance(entry, WriteLogEntry) - assert entry.operation == "PUT" - assert isinstance(entry.key_hash, str) - assert isinstance(entry.timestamp, float) - assert entry.sequence >= 0 - assert isinstance(entry.checksum_hex, str) - - def test_audit_log_records_all_operation_types(self) -> None: - """Audit log records HIT, MISS, PUT, EVICT operations.""" - cache = IntegrityCache(maxsize=2, enable_audit=True, strict=False) - - # PUT 3 entries to trigger eviction - cache.put("msg1", None, 
None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) # Evicts msg1 - - # HIT - cache.get("msg2", None, None, "en", use_isolating=True) - - # MISS - cache.get("nonexistent", None, None, "en", use_isolating=True) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = {entry.operation for entry in audit_log} - assert "PUT" in operations - assert "EVICT" in operations - assert "HIT" in operations - assert "MISS" in operations - - def test_audit_log_max_entries_enforced(self) -> None: - """Audit log respects max_audit_entries limit.""" - cache = IntegrityCache(enable_audit=True, max_audit_entries=5, strict=False) - - # Generate more operations than max_audit_entries - for i in range(10): - cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=()) - - audit_log = cache._audit_log - assert audit_log is not None - assert len(audit_log) <= 5 - - def test_audit_log_not_cleared_on_cache_clear(self) -> None: - """Audit log preserved when cache is cleared (historical record).""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - audit_log_before = len(cache._audit_log or []) - cache.clear() - audit_log_after = len(cache._audit_log or []) - - assert audit_log_after >= audit_log_before - - def test_audit_records_write_once_rejection(self) -> None: - """Audit log records WRITE_ONCE_CONFLICT for different content writes.""" - cache = IntegrityCache(write_once=True, enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="First", errors=()) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Second", errors=()) # Conflict (different content) - - 
audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "WRITE_ONCE_CONFLICT" in operations - -class TestAuditLoggingCorruption: - """Test audit logging of corruption events.""" - - def test_audit_records_corruption(self) -> None: - """Audit log records CORRUPTION operations.""" - cache = IntegrityCache(enable_audit=True, strict=False) - cache.put("msg", None, None, "en", use_isolating=True, formatted="Hello", errors=()) - - # Corrupt entry - key = next(iter(cache._cache.keys())) - entry = cache._cache[key] - corrupted = IntegrityCacheEntry( - formatted="Corrupted", - errors=entry.errors, - checksum=entry.checksum, - created_at=entry.created_at, - sequence=entry.sequence, - key_hash=entry.key_hash, - ) - cache._cache[key] = corrupted - - # Trigger corruption detection - cache.get("msg", None, None, "en", use_isolating=True) - - audit_log = cache._audit_log - assert audit_log is not None - - # pylint: disable=not-an-iterable - operations = [entry.operation for entry in audit_log] - assert "CORRUPTION" in operations - -class TestSequenceNumbers: - """Test monotonically increasing sequence numbers.""" - - def test_sequence_increments_on_put(self) -> None: - """Sequence number increments with each put.""" - cache = IntegrityCache(strict=False) - cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=()) - cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=()) - cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=()) - - entry1 = cache.get("msg1", None, None, "en", use_isolating=True) - entry2 = cache.get("msg2", None, None, "en", use_isolating=True) - entry3 = cache.get("msg3", None, None, "en", use_isolating=True) - - assert entry1 is not None - assert entry1.sequence == 1 - assert entry2 is not None - assert entry2.sequence == 2 - assert entry3 is not None - assert 
entry3.sequence == 3
-
-    def test_sequence_not_reset_on_clear(self) -> None:
-        """Sequence number continues after cache clear (audit trail integrity)."""
-        cache = IntegrityCache(strict=False)
-        cache.put("msg1", None, None, "en", use_isolating=True, formatted="One", errors=())
-        cache.put("msg2", None, None, "en", use_isolating=True, formatted="Two", errors=())
-
-        stats_before = cache.get_stats()
-        assert stats_before["sequence"] == 2
-
-        cache.clear()
-
-        cache.put("msg3", None, None, "en", use_isolating=True, formatted="Three", errors=())
-
-        entry = cache.get("msg3", None, None, "en", use_isolating=True)
-        assert entry is not None
-        assert entry.sequence == 3
-
-class TestConcurrentIntegrity:
-    """Test integrity under concurrent access."""
-
-    def test_concurrent_puts_maintain_integrity(self) -> None:
-        """Concurrent puts produce valid checksums."""
-        cache = IntegrityCache(maxsize=100, strict=False)
-
-        def put_entry(i: int) -> None:
-            cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=())
-
-        with ThreadPoolExecutor(max_workers=10) as executor:
-            futures = [executor.submit(put_entry, i) for i in range(100)]
-            for future in as_completed(futures):
-                future.result()
-
-        # All entries should have valid checksums
-        for i in range(100):
-            entry = cache.get(f"msg{i}", None, None, "en", use_isolating=True)
-            if entry is not None:
-                assert entry.verify(), f"Entry msg{i} failed checksum verification"
-
-    def test_write_once_thread_safety(self) -> None:
-        """Write-once semantics are thread-safe."""
-        cache = IntegrityCache(write_once=True, strict=False)
-        success_count = 0
-        lock = threading.Lock()
-
-        def try_put() -> None:
-            nonlocal success_count
-            try:
-                cache.put("msg", None, None, "en", use_isolating=True, formatted="Value", errors=())
-                with lock:
-                    success_count += 1
-            except WriteConflictError:
-                pass  # Expected for some threads
-
-        threads = [threading.Thread(target=try_put) for _ in range(20)]
-        for thread in threads:
-            thread.start()
-        for thread in threads:
-            thread.join()
-
-        # Only one entry should exist
-        stats = cache.get_stats()
-        assert stats["size"] == 1
-
-class TestIntegrityStats:
-    """Test integrity-related statistics."""
-
-    def test_stats_includes_integrity_fields(self) -> None:
-        """get_stats() includes all integrity-related fields."""
-        cache = IntegrityCache(
-            write_once=True,
-            strict=True,
-            enable_audit=True,
-        )
-
-        stats = cache.get_stats()
-
-        # Verify integrity-specific fields exist
-        assert "corruption_detected" in stats
-        assert "sequence" in stats
-        assert "write_once" in stats
-        assert "strict" in stats
-        assert "audit_enabled" in stats
-        assert "audit_entries" in stats
-        assert "write_once_conflicts" in stats
-        assert "combined_weight_skips" in stats
-
-        # Verify types
-        assert isinstance(stats["corruption_detected"], int)
-        assert isinstance(stats["sequence"], int)
-        assert isinstance(stats["write_once"], bool)
-        assert isinstance(stats["strict"], bool)
-        assert isinstance(stats["audit_enabled"], bool)
-        assert isinstance(stats["audit_entries"], int)
-        assert isinstance(stats["write_once_conflicts"], int)
-        assert isinstance(stats["combined_weight_skips"], int)
-
-        # Verify values reflect configuration
-        assert stats["write_once"] is True
-        assert stats["strict"] is True
-        assert stats["audit_enabled"] is True
         assert stats["write_once_conflicts"] == 0
-        assert stats["combined_weight_skips"] == 0
-
-    def test_corruption_counter_accumulates(self) -> None:
-        """corruption_detected counter accumulates across multiple corruptions."""
-        cache = IntegrityCache(strict=False)
-
-        for i in range(3):
-            cache.put(f"msg{i}", None, None, "en", use_isolating=True, formatted=f"Value {i}", errors=())
-
-        # Corrupt all entries
-        for key in list(cache._cache.keys()):
-            entry = cache._cache[key]
-            corrupted = IntegrityCacheEntry(
-                formatted="Corrupted",
-                errors=entry.errors,
-                checksum=entry.checksum,
-                created_at=entry.created_at,
-                sequence=entry.sequence,
-                key_hash=entry.key_hash,
-            )
-            cache._cache[key] = corrupted
-
-        # Trigger corruption detection for each
-        for i in range(3):
-            cache.get(f"msg{i}", None, None, "en", use_isolating=True)
-
-        stats = cache.get_stats()
-        assert stats["corruption_detected"] == 3
+        assert stats["idempotent_writes"] >= 1
diff --git a/tests/runtime_cache_property_cases/property_tests_basic_invariants.py b/tests/runtime_cache_property_cases/property_tests_basic_invariants.py
index 785391c0..63ce95d5 100644
--- a/tests/runtime_cache_property_cases/property_tests_basic_invariants.py
+++ b/tests/runtime_cache_property_cases/property_tests_basic_invariants.py
@@ -17,7 +17,7 @@ class TestCacheInvariants:
     @settings(max_examples=100)
     def test_cache_maxsize_enforced(self, maxsize: int) -> None:
         """INVARIANT: Cache never exceeds maxsize."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Add more than maxsize entries
         for i in range(maxsize + 10):
@@ -52,7 +52,7 @@ def test_get_after_put_returns_value(
         value: tuple[str, tuple[()]],
     ) -> None:
         """PROPERTY: get(k) after put(k, v) returns v."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         cache.put(msg_id, args, attr, locale, use_isolating=True, formatted=formatted, errors=errors)
@@ -74,7 +74,7 @@ def test_get_without_put_returns_none(
         locale: str,
     ) -> None:
         """PROPERTY: get(k) without put(k) returns None."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )

         result = cache.get(msg_id, None, None, locale, use_isolating=True)

@@ -85,7 +85,7 @@ def test_get_without_put_returns_none(
     @settings(max_examples=50)
     def test_clear_resets_cache_to_empty(self, maxsize: int) -> None:
         """PROPERTY: clear() empties cache and resets counters."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Add some entries
         for i in range(min(10, maxsize)):
@@ -114,7 +114,7 @@ def test_hit_counter_increments_on_cache_hit(
         value: tuple[str, tuple[()]],
     ) -> None:
         """PROPERTY: Cache hits increment hit counter."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         cache.put(msg_id, None, None, locale, use_isolating=True, formatted=formatted, errors=errors)
@@ -138,7 +138,7 @@ def test_miss_counter_increments_on_cache_miss(
         locale: str,
     ) -> None:
         """PROPERTY: Cache misses increment miss counter."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )

         initial_stats = cache.get_stats()
         cache.get(msg_id, None, None, locale, use_isolating=True)  # Cache miss
diff --git a/tests/runtime_cache_property_cases/property_tests_init_parameters.py b/tests/runtime_cache_property_cases/property_tests_init_parameters.py
index edbdd874..e0534538 100644
--- a/tests/runtime_cache_property_cases/property_tests_init_parameters.py
+++ b/tests/runtime_cache_property_cases/property_tests_init_parameters.py
@@ -21,19 +21,18 @@ class TestIntegrityCacheHypothesisProperties:
     def test_property_init_parameters_stored_correctly(
         self,
         maxsize: int,
-        max_entry_weight: int,
+        max_entry_payload_bytes: int,
         max_errors_per_entry: int,
     ) -> None:
         """PROPERTY: Constructor parameters are stored correctly."""
         cache = IntegrityCache(
-            strict=False,
             maxsize=maxsize,
-            max_entry_weight=max_entry_weight,
+            max_entry_payload_bytes=max_entry_payload_bytes,
             max_errors_per_entry=max_errors_per_entry,
         )

         assert cache.maxsize == maxsize
-        assert cache.max_entry_weight == max_entry_weight
+        assert cache.max_entry_payload_bytes == max_entry_payload_bytes
         assert cache.size == 0
         assert cache.hits == 0
         assert cache.misses == 0
@@ -43,7 +42,7 @@ def test_property_init_parameters_stored_correctly(
     @settings(max_examples=50)
     def test_property_primitives_hashable(self, text: str) -> None:
         """PROPERTY: All primitive types produce valid cache keys."""
-        cache = IntegrityCache(strict=False)
+        cache = IntegrityCache()

         # String
         cache.put("msg", {"text": text}, None, "en", use_isolating=True, formatted="result", errors=())
diff --git a/tests/runtime_cache_property_cases/property_tests_key_handling.py b/tests/runtime_cache_property_cases/property_tests_key_handling.py
index 50211192..d494be48 100644
--- a/tests/runtime_cache_property_cases/property_tests_key_handling.py
+++ b/tests/runtime_cache_property_cases/property_tests_key_handling.py
@@ -25,7 +25,7 @@ def test_same_key_retrieves_same_value(
         value: tuple[str, tuple[()]],
     ) -> None:
         """PROPERTY: Same key components retrieve same cached value."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         # Put with specific key
@@ -55,7 +55,7 @@ def test_different_locale_creates_different_key(
         """PROPERTY: Different locales create different cache keys."""
         assume(locale1 != locale2)

-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         # Put with locale1
@@ -86,7 +86,7 @@ def test_different_attribute_creates_different_key(
         """PROPERTY: Different attributes create different cache keys."""
         assume(attr1 != attr2)

-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         # Put with attr1
@@ -112,7 +112,7 @@ def test_args_dict_key_stability(
         value: tuple[str, tuple[()]],
     ) -> None:
         """PROPERTY: Equivalent args dicts produce same cache key."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )
         formatted, errors = value

         # Put with args dict
diff --git a/tests/runtime_cache_property_cases/property_tests_lru_eviction.py b/tests/runtime_cache_property_cases/property_tests_lru_eviction.py
index a85236c3..7415b94b 100644
--- a/tests/runtime_cache_property_cases/property_tests_lru_eviction.py
+++ b/tests/runtime_cache_property_cases/property_tests_lru_eviction.py
@@ -16,7 +16,7 @@ class TestLRUEviction:
     @settings(max_examples=50)
     def test_lru_evicts_least_recently_used(self, maxsize: int) -> None:
         """PROPERTY: LRU eviction removes oldest entry."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Fill cache to capacity
         for i in range(maxsize):
@@ -50,7 +50,7 @@ def test_lru_access_pattern_eviction(
         access_pattern: list[int],
     ) -> None:
         """PROPERTY: LRU eviction respects access patterns."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Fill cache
         for i in range(maxsize):
diff --git a/tests/runtime_cache_property_cases/property_tests_robustness.py b/tests/runtime_cache_property_cases/property_tests_robustness.py
index 587ba4dc..ff3d410d 100644
--- a/tests/runtime_cache_property_cases/property_tests_robustness.py
+++ b/tests/runtime_cache_property_cases/property_tests_robustness.py
@@ -30,7 +30,7 @@ def test_cache_handles_various_arg_types(
         self, args: dict[str, int | Decimal | str | bool | None]
     ) -> None:
         """ROBUSTNESS: Cache handles various argument types."""
-        cache = IntegrityCache(maxsize=100, strict=False)
+        cache = IntegrityCache(maxsize=100 )

         # Should not crash with various arg types
         try:
@@ -55,7 +55,7 @@ def test_cache_handles_duplicate_puts(
         maxsize: int,
     ) -> None:
         """ROBUSTNESS: Cache handles duplicate puts gracefully."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Put same message multiple times
         for msg_id in msg_ids:
@@ -69,7 +69,7 @@ def test_cache_handles_duplicate_puts(
     @settings(max_examples=50)
     def test_cache_stats_never_negative(self, maxsize: int) -> None:
         """ROBUSTNESS: Cache stats are never negative."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Perform various operations
         cache.put("msg", None, None, "en_US", use_isolating=True, formatted="result", errors=())
diff --git a/tests/runtime_cache_property_cases/property_tests_statistics.py b/tests/runtime_cache_property_cases/property_tests_statistics.py
index fc757205..1031193d 100644
--- a/tests/runtime_cache_property_cases/property_tests_statistics.py
+++ b/tests/runtime_cache_property_cases/property_tests_statistics.py
@@ -28,7 +28,7 @@ def test_hit_rate_consistency(
         operations: list[tuple[str, str]],
     ) -> None:
         """PROPERTY: hit_rate = hits / (hits + misses)."""
-        cache = IntegrityCache(maxsize=20, strict=False)
+        cache = IntegrityCache(maxsize=20 )

         for op, msg_id in operations:
             if op == "put":
@@ -59,7 +59,7 @@ def test_size_equals_entry_count(
         maxsize: int,
     ) -> None:
         """PROPERTY: size stat equals actual number of cached entries."""
-        cache = IntegrityCache(maxsize=maxsize, strict=False)
+        cache = IntegrityCache(maxsize=maxsize )

         # Add entries
         for i in range(num_entries):
diff --git a/tests/runtime_function_bridge_cases/function_signature_tests.py b/tests/runtime_function_bridge_cases/function_signature_tests.py
index e3f8eac7..6c629620 100644
--- a/tests/runtime_function_bridge_cases/function_signature_tests.py
+++ b/tests/runtime_function_bridge_cases/function_signature_tests.py
@@ -18,6 +18,7 @@ def test_create_function_signature(self) -> None:
             ftl_name="TEST",
             param_mapping=(("minimumValue", "minimum_value"),),
             callable=str,
+            cacheable=False,
         )

         assert sig.python_name == "test_func"
@@ -31,6 +32,7 @@ def test_function_signature_immutable(self) -> None:
             ftl_name="TEST",
             param_mapping=(),
             callable=lambda: "test",
+            cacheable=False,
         )

         with pytest.raises(AttributeError):
diff --git a/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py b/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py
index ba3fe64c..66bd33c1 100644
--- a/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py
+++ b/tests/runtime_resolver_depth_cycles_cases/resolution_context_tests.py
@@ -76,69 +76,81 @@ def test_expression_depth_property_after_increment(self) -> None:
         assert context.expression_depth == 0

+    def test_noncacheable_functions_property_returns_frozen_snapshot(self) -> None:
+        """noncacheable_functions exposes the observed function names immutably."""
+        context = ResolutionContext()
+
+        context.mark_noncacheable_function("NOW")
+
+        assert context.cacheable_output is False
+        assert context.noncacheable_functions == frozenset({"NOW"})

-class TestResolutionContextTrackExpansion:
-    """Direct tests for ResolutionContext.track_expansion() accumulation.

     Targets the expansion budget DoS protection: track_expansion() accumulates
-    character counts without raising. Callers check
-    ``total_chars > max_expansion_size`` after each call and generate
-    FrozenFluentError themselves (separation of state tracking from error policy).
+class TestResolutionContextOutputBudget:
+    """Direct tests for ResolutionContext.reserve_output().
+
+    Premise:
+        The output-budget owner must see the exact rendered fragment before it
+        becomes visible to the caller.
+
+    Reason:
+        A fail-closed reserve step prevents undercount gaps for isolation marks,
+        fallbacks, and nested pattern output.
     """

-    def test_track_expansion_accumulates_correctly(self) -> None:
-        """track_expansion() accumulates total_chars without raising."""
+    def test_reserve_output_accumulates_within_budget(self) -> None:
+        """reserve_output() updates total_chars for admitted fragments."""
         context = ResolutionContext(max_expansion_size=100)
-        context.track_expansion(99)
+        context.reserve_output("x" * 99)
         assert context.total_chars == 99
         assert context.total_chars <= context.max_expansion_size
-        # Exceeding budget is detectable by caller; no exception raised here
-        context.track_expansion(2)
-        assert context.total_chars == 101
-        assert context.total_chars > context.max_expansion_size
+        context.reserve_output("y")
+        assert context.total_chars == 100
+        assert context.total_chars == context.max_expansion_size

-    def test_track_expansion_exact_budget_limit_detectable(self) -> None:
-        """Exact budget limit is detectable by caller after track_expansion."""
+    def test_reserve_output_rejects_fragment_that_crosses_budget(self) -> None:
+        """reserve_output() raises before admitting an over-budget fragment."""
         context = ResolutionContext(max_expansion_size=100)
-        context.track_expansion(100)
+        context.reserve_output("x" * 100)
         assert context.total_chars == 100
-        # At exactly the budget: caller may allow or deny based on policy
-        assert context.total_chars <= context.max_expansion_size
-        # One more char pushes over the limit — caller detects via comparison
-        context.track_expansion(1)
-        assert context.total_chars == 101
-        assert context.total_chars > context.max_expansion_size
+        with pytest.raises(FrozenFluentError) as exc_info:
+            context.reserve_output("y")
+
+        assert context.total_chars == 100
+        assert exc_info.value.diagnostic is not None
+        assert exc_info.value.diagnostic.code == DiagnosticCode.EXPANSION_BUDGET_EXCEEDED

     @given(
         budget=st.integers(min_value=1, max_value=1000),
         first_chunk=st.integers(min_value=0, max_value=500),
     )
     @settings(max_examples=50)
-    def test_track_expansion_accumulates_accurately(
+    def test_reserve_output_preserves_exact_running_total(
         self, budget: int, first_chunk: int
     ) -> None:
-        """Property: track_expansion() always accumulates total_chars precisely.
-
-        For any budget and chunk sizes, total_chars must equal the exact sum of
-        all chunk arguments passed. The caller detects budget exhaustion via
-        ``total_chars > max_expansion_size``.
-        """
+        """Property: admitted output updates the running total exactly."""
         context = ResolutionContext(max_expansion_size=budget)
-        context.track_expansion(first_chunk)
+        if first_chunk > budget:
+            with pytest.raises(FrozenFluentError):
+                context.reserve_output("a" * first_chunk)
+            assert context.total_chars == 0
+            event("boundary=initial_reject")
+            return
+
+        context.reserve_output("a" * first_chunk)
         assert context.total_chars == first_chunk
-        over_budget = first_chunk > budget
-        event("boundary=at_or_over_budget" if over_budget else "boundary=under_budget")
+        hits_boundary = first_chunk == budget
+        event("boundary=exact_budget" if hits_boundary else "boundary=under_budget")
-        # Add one more chunk that guarantees budget is exceeded
         second_chunk = budget - first_chunk + 1
         if second_chunk > 0:
-            context.track_expansion(second_chunk)
-            assert context.total_chars == first_chunk + second_chunk
-            assert context.total_chars > context.max_expansion_size
+            with pytest.raises(FrozenFluentError):
+                context.reserve_output("b" * second_chunk)
+            assert context.total_chars == first_chunk
             event("error_path=budget_exceeded")
diff --git a/tests/syntax_parser_core_cases/do_slimits_and_validation.py b/tests/syntax_parser_core_cases/do_slimits_and_validation.py
index d71a661f..eaa61f37 100644
--- a/tests/syntax_parser_core_cases/do_slimits_and_validation.py
+++ b/tests/syntax_parser_core_cases/do_slimits_and_validation.py
@@ -1,6 +1,7 @@
 # mypy: ignore-errors
 """Split test cases from tests/test_syntax_parser_core.py."""

+from ftllexengine import UNLIMITED
 from tests.syntax_parser_core_cases import *  # noqa: F403 - shared split test support

 # ============================================================================
@@ -82,9 +83,9 @@ def test_max_source_size_custom(self) -> None:
         assert parser.max_source_size == 5000

     def test_max_source_size_disabled(self) -> None:
-        """max_source_size=0 disables the limit."""
-        parser = FluentParserV1(max_source_size=0)
-        assert parser.max_source_size == 0
+        """UNLIMITED disables the source-size guard explicitly."""
+        parser = FluentParserV1(max_source_size=UNLIMITED)
+        assert parser.max_source_size == sys.maxsize

     def test_oversized_source_raises_value_error(self) -> None:
         """parse() raises ValueError when source exceeds limit."""
@@ -115,8 +116,8 @@ def test_source_at_exact_limit(self) -> None:
         assert result is not None

     def test_disabled_limit_accepts_large_source(self) -> None:
-        """max_source_size=0 accepts arbitrarily large source."""
-        parser = FluentParserV1(max_source_size=0)
+        """UNLIMITED accepts intentionally unbounded source input."""
+        parser = FluentParserV1(max_source_size=UNLIMITED)
         result = parser.parse("msg = " + ("x" * 100000))
         assert result is not None

diff --git a/tests/syntax_parser_core_cases/do_sprotection.py b/tests/syntax_parser_core_cases/do_sprotection.py
index 8f356fbf..8d2b30fa 100644
--- a/tests/syntax_parser_core_cases/do_sprotection.py
+++ b/tests/syntax_parser_core_cases/do_sprotection.py
@@ -1,6 +1,7 @@
 # mypy: ignore-errors
 """Split test cases from tests/test_syntax_parser_core.py."""

+from ftllexengine import UNLIMITED
 from tests.syntax_parser_core_cases import *  # noqa: F403 - shared split test support

 # ============================================================================
@@ -154,8 +155,8 @@ def test_depth_exceeded_counts_toward_limit(
     # -- max_parse_errors: boundary conditions -----------------------------

     def test_disabled_max_parse_errors_never_aborts(self) -> None:
-        """Parser with max_parse_errors=0 never aborts."""
-        parser = FluentParserV1(max_parse_errors=0)
+        """UNLIMITED keeps parse-error collection intentionally unbounded."""
+        parser = FluentParserV1(max_parse_errors=UNLIMITED)
         source = "####\n" * 200
         result = parser.parse(source)
         junk = [e for e in result.entries if isinstance(e, Junk)]
diff --git a/tests/syntax_parser_core_cases/parse_stream_cases.py b/tests/syntax_parser_core_cases/parse_stream_cases.py
index 63fffa01..50d10889 100644
--- a/tests/syntax_parser_core_cases/parse_stream_cases.py
+++ b/tests/syntax_parser_core_cases/parse_stream_cases.py
@@ -95,6 +95,13 @@ def line_gen() -> object:
         assert len(entries) == 1
         assert isinstance(entries[0], Message)

+    def test_non_string_line_is_rejected(self) -> None:
+        """parse_stream should reject undecoded byte streams explicitly."""
+        parser = FluentParserV1()
+
+        with pytest.raises(TypeError, match="must yield str, got bytes"):
+            list(parser.parse_stream([b"msg = bytes\n"]))  # type: ignore[list-item]
+
     def test_lines_without_trailing_newlines(self) -> None:
         """Lines without trailing newlines are handled correctly."""
         parser = FluentParserV1()
@@ -103,6 +110,27 @@ def test_lines_without_trailing_newlines(self) -> None:
         msg_entries = [e for e in entries if isinstance(e, Message)]
         assert len(msg_entries) == 2

+    def test_stream_line_length_limit_fails_closed(self) -> None:
+        """Per-line budgets should reject oversized lines before buffering them."""
+        parser = FluentParserV1(max_stream_line_length=5)
+
+        with pytest.raises(ValueError, match="Stream line length"):
+            list(parser.parse_stream(["123456"]))
+
+    def test_stream_total_length_limit_fails_closed(self) -> None:
+        """Total stream budgets should reject overlong streams before parsing chunks."""
+        parser = FluentParserV1(max_source_size=5)
+
+        with pytest.raises(ValueError, match="Stream length"):
+            list(parser.parse_stream(["1234", "56"]))
+
+    def test_entry_chunk_length_limit_fails_closed(self) -> None:
+        """One blank-line-delimited entry cannot exceed the configured chunk budget."""
+        parser = FluentParserV1(max_source_size=12, max_stream_line_length=100)
+
+        with pytest.raises(ValueError, match="Entry chunk length"):
+            list(parser.parse_stream(["ab=1", "cd=2", "ef=3"]))
+
     def test_leading_blank_line_is_skipped(self) -> None:
         """Blank line before any content is silently skipped.

diff --git a/tests/test_api_boundary.py b/tests/test_api_boundary.py
index 924ffc16..ea1fc548 100644
--- a/tests/test_api_boundary.py
+++ b/tests/test_api_boundary.py
@@ -257,7 +257,7 @@ def test_bundle_validate_resource_accepts_string(self) -> None:
     def test_standalone_validate_resource_rejects_bytes(self) -> None:
         """Standalone validate_resource raises TypeError for bytes."""
         with pytest.raises(TypeError) as exc_info:
-            validate_resource(b"msg = Hello")  # type: ignore[arg-type]
+            validate_resource(b"msg = Hello")

         assert "source must be str" in str(exc_info.value)
         assert "bytes" in str(exc_info.value)
@@ -265,7 +265,7 @@ def test_standalone_validate_resource_rejects_bytes(self) -> None:
     def test_standalone_validate_resource_rejects_list(self) -> None:
         """Standalone validate_resource raises TypeError for list."""
         with pytest.raises(TypeError) as exc_info:
-            validate_resource(["msg = Hello"])  # type: ignore[arg-type]
+            validate_resource(["msg = Hello"])

         assert "source must be str" in str(exc_info.value)
         assert "list" in str(exc_info.value)
diff --git a/tests/test_core_limits.py b/tests/test_core_limits.py
new file mode 100644
index 00000000..afc011e4
--- /dev/null
+++ b/tests/test_core_limits.py
@@ -0,0 +1,53 @@
+"""Boundary-contract tests for explicit security limit helpers.
+
+These tests cover the fail-closed limit normalization contract shared by the
+parser, loaders, and runtime surfaces.
+"""
+
+from __future__ import annotations
+
+import pytest
+
+from ftllexengine.core._limits import UNLIMITED, UnlimitedLimit, resolve_limit_arg
+
+
+class TestUnlimitedLimit:
+    """The unlimited sentinel should be explicit and self-describing."""
+
+    def test_repr_uses_semantic_name(self) -> None:
+        """The sentinel repr should surface policy intent in logs and docs."""
+        assert repr(UNLIMITED) == "UNLIMITED"
+        assert isinstance(UNLIMITED, UnlimitedLimit)
+
+
+class TestResolveLimitArg:
+    """resolve_limit_arg() should reject ambiguous or unsafe inputs."""
+
+    def test_rejects_bool_even_though_bool_is_an_int_subclass(self) -> None:
+        """Security limits must not accept booleans as accidental integers."""
+        with pytest.raises(TypeError, match="max_source_size must be int, got bool"):
+            resolve_limit_arg(True, field_name="max_source_size", default=10)
+
+    def test_rejects_non_int_boundary_values(self) -> None:
+        """Arbitrary objects at the limit boundary must fail fast."""
+        with pytest.raises(TypeError, match="max_source_size must be int, got str"):
+            resolve_limit_arg("10", field_name="max_source_size", default=10)  # type: ignore[arg-type]
+
+    def test_rejects_unlimited_when_owner_disallows_it(self) -> None:
+        """Owners can opt out of unlimited mode explicitly."""
+        with pytest.raises(
+            ValueError,
+            match="max_pending_operations does not support unlimited mode",
+        ):
+            resolve_limit_arg(
+                UNLIMITED,
+                field_name="max_pending_operations",
+                default=16,
+                allow_unlimited=False,
+            )
+
+    @pytest.mark.parametrize("candidate", [0, -1, -99])
+    def test_rejects_non_positive_values(self, candidate: int) -> None:
+        """Zero and negatives are ambiguous magic values and are never accepted."""
+        with pytest.raises(ValueError, match="must be positive"):
+            resolve_limit_arg(candidate, field_name="max_source_size", default=10)
diff --git a/tests/test_diagnostics_location.py b/tests/test_diagnostics_location.py
index bb6595b6..f4f4f517 100644
--- a/tests/test_diagnostics_location.py
+++ b/tests/test_diagnostics_location.py
@@ -231,10 +231,10 @@ def test_introspect_term_with_attributes(self) -> None:
     def test_introspect_rejects_invalid_type(self) -> None:
         """introspect_message raises TypeError for non-Message/Term."""
         with pytest.raises(TypeError, match="Expected Message or Term"):
-            introspect_message("not a message")  # type: ignore[arg-type]
+            introspect_message("not a message")

         with pytest.raises(TypeError, match="Expected Message or Term"):
-            introspect_message(123)  # type: ignore[arg-type]
+            introspect_message(123)


 class TestFluentLocalizationFormatValueMapping:
diff --git a/tests/test_diagnostics_templates.py b/tests/test_diagnostics_templates.py
index 7a17237b..9363d0bb 100644
--- a/tests/test_diagnostics_templates.py
+++ b/tests/test_diagnostics_templates.py
@@ -18,6 +18,11 @@
 from hypothesis import event, given
 from hypothesis import strategies as st

+from ftllexengine.diagnostics._redaction import (
+    fingerprint_text,
+    redacted_custom_function_failure,
+    redacted_parse_failure,
+)
 from ftllexengine.diagnostics.codes import Diagnostic, DiagnosticCode
 from ftllexengine.diagnostics.templates import ErrorTemplate

@@ -221,15 +226,36 @@ def test_function_failed(self, fn: str, reason: str) -> None:

     @given(fn=_function_names, value=_short_text, reason=_short_text)
     def test_formatting_failed(self, fn: str, value: str, reason: str) -> None:
-        """formatting_failed embeds function name, value, and reason."""
+        """formatting_failed embeds function name plus redacted value and detail."""
         d = ErrorTemplate.formatting_failed(fn, value, reason)
         assert d.code == DiagnosticCode.FORMATTING_FAILED
         assert fn in d.message
-        assert value in d.message
-        assert reason in d.message
+        assert fingerprint_text(value, label="format_value") in d.message
+        assert fingerprint_text(reason, label="detail") in d.message
         assert d.function_name == fn
         event(f"fn={fn}")

+    def test_formatting_failed_accepts_safe_reason(self) -> None:
+        """formatting_failed can preserve a safe high-level reason alongside redaction."""
+        d = ErrorTemplate.formatting_failed(
+            "DATETIME",
+            "not-a-date",
+            ValueError("raw parser detail"),
+            safe_reason="input is not ISO 8601 format",
+        )
+        assert "input is not ISO 8601 format" in d.message
+        assert fingerprint_text("not-a-date", label="format_value") in d.message
+        assert fingerprint_text(ValueError("raw parser detail"), label="detail") in d.message
+
+
+class TestRedactionHelpers:
+    """Direct coverage for redaction helpers with edge-case exception shapes."""
+
+    def test_redacted_custom_function_failure_without_args_uses_type_only(self) -> None:
+        """Exceptions with empty args should not invent redacted detail text."""
+        error = RuntimeError()
+        assert redacted_custom_function_failure(error) == "uncaught RuntimeError"
+
     @given(
         fn=_function_names,
         expected=_positive_ints,
@@ -350,10 +376,10 @@ class TestParsingTemplates:
     def test_parse_decimal_failed(
         self, val: str, locale: str, reason: str
     ) -> None:
-        """parse_decimal_failed embeds value, locale, and reason."""
+        """parse_decimal_failed embeds redacted value summary, locale, and reason."""
         d = ErrorTemplate.parse_decimal_failed(val, locale, reason)
         assert d.code == DiagnosticCode.PARSE_DECIMAL_FAILED
-        assert val in d.message
+        assert redacted_parse_failure(val, parse_type="decimal") in d.message
         assert locale in d.message
         assert reason in d.message
         event("template=parse_decimal_failed")
@@ -362,10 +388,10 @@ def test_parse_decimal_failed(
     def test_parse_date_failed(
         self, val: str, locale: str, reason: str
     ) -> None:
-        """parse_date_failed embeds value, locale, and reason."""
+        """parse_date_failed embeds redacted value summary, locale, and reason."""
         d = ErrorTemplate.parse_date_failed(val, locale, reason)
         assert d.code == DiagnosticCode.PARSE_DATE_FAILED
-        assert val in d.message
+        assert redacted_parse_failure(val, parse_type="date") in d.message
         assert locale in d.message
         assert reason in d.message
         assert "ISO 8601" in d.hint  # type: ignore[operator]
@@ -375,10 +401,10 @@ def test_parse_date_failed(
     def test_parse_datetime_failed(
         self, val: str, locale: str, reason: str
     ) -> None:
-        """parse_datetime_failed embeds value, locale, and reason."""
+        """parse_datetime_failed embeds redacted value summary, locale, and reason."""
         d = ErrorTemplate.parse_datetime_failed(val, locale, reason)
         assert d.code == DiagnosticCode.PARSE_DATETIME_FAILED
-        assert val in d.message
+        assert redacted_parse_failure(val, parse_type="datetime") in d.message
         assert locale in d.message
         assert reason in d.message
         event("template=parse_datetime_failed")
@@ -387,10 +413,10 @@ def test_parse_datetime_failed(
     def test_parse_currency_failed(
         self, val: str, locale: str, reason: str
     ) -> None:
-        """parse_currency_failed embeds value, locale, and reason."""
+        """parse_currency_failed embeds redacted value summary, locale, and reason."""
         d = ErrorTemplate.parse_currency_failed(val, locale, reason)
         assert d.code == DiagnosticCode.PARSE_CURRENCY_FAILED
-        assert val in d.message
+        assert redacted_parse_failure(val, parse_type="currency") in d.message
         assert locale in d.message
         assert reason in d.message
         event("template=parse_currency_failed")
@@ -408,10 +434,11 @@ def test_parse_locale_unknown(self, locale: str) -> None:
     def test_parse_currency_ambiguous(
         self, symbol: str, val: str
     ) -> None:
-        """parse_currency_ambiguous embeds symbol and full value."""
+        """parse_currency_ambiguous embeds symbol and redacted value summary."""
         d = ErrorTemplate.parse_currency_ambiguous(symbol, val)
         assert d.code == DiagnosticCode.PARSE_CURRENCY_AMBIGUOUS
-        assert symbol in d.message
+        assert fingerprint_text(symbol, label="currency_symbol") in d.message
+        assert redacted_parse_failure(val, parse_type="currency") in d.message
         assert d.hint is not None
         event("template=parse_currency_ambiguous")

@@ -419,10 +446,10 @@ def test_parse_currency_ambiguous(
     def test_parse_currency_symbol_unknown(
         self, symbol: str, val: str
     ) -> None:
-        """parse_currency_symbol_unknown embeds symbol."""
+        """parse_currency_symbol_unknown embeds redacted symbol summary."""
         d = ErrorTemplate.parse_currency_symbol_unknown(symbol, val)
         assert d.code == DiagnosticCode.PARSE_CURRENCY_SYMBOL_UNKNOWN
-        assert symbol in d.message
+        assert fingerprint_text(symbol, label="currency_symbol") in d.message
         assert d.hint is not None
         event("template=parse_currency_symbol_unknown")

@@ -430,10 +457,10 @@ def test_parse_currency_symbol_unknown(
     def test_parse_currency_code_invalid(
         self, code: str, val: str
     ) -> None:
-        """parse_currency_code_invalid embeds the 3-letter ISO code."""
+        """parse_currency_code_invalid embeds redacted code summary."""
         d = ErrorTemplate.parse_currency_code_invalid(code, val)
         assert d.code == DiagnosticCode.PARSE_CURRENCY_CODE_INVALID
-        assert code in d.message
+        assert fingerprint_text(code, label="currency_code") in d.message
         assert d.hint is not None
         event("template=parse_currency_code_invalid")

@@ -441,10 +468,11 @@ def test_parse_currency_code_invalid(
     def test_parse_amount_invalid(
         self, amount: str, val: str, reason: str
     ) -> None:
-        """parse_amount_invalid embeds amount, value, and reason."""
+        """parse_amount_invalid embeds redacted amount/value summaries and reason."""
         d = ErrorTemplate.parse_amount_invalid(amount, val, reason)
         assert d.code == DiagnosticCode.PARSE_AMOUNT_INVALID
-        assert amount in d.message
+        assert fingerprint_text(amount, label="amount_fragment") in d.message
+        assert redacted_parse_failure(val, parse_type="currency") in d.message
         assert reason in d.message
         assert d.hint is not None
         event("template=parse_amount_invalid")
@@ -532,5 +560,5 @@ def test_parse_currency_symbol_unknown_template(self) -> None:

         assert diagnostic.code == DiagnosticCode.PARSE_CURRENCY_SYMBOL_UNKNOWN
         assert "Unknown currency symbol" in diagnostic.message
-        assert "XYZ" in diagnostic.message
+        assert fingerprint_text("XYZ", label="currency_symbol") in diagnostic.message
         assert diagnostic.hint is not None
diff --git a/tests/test_diagnostics_validation.py b/tests/test_diagnostics_validation.py
index 0b578630..aae885c5 100644
--- a/tests/test_diagnostics_validation.py
+++ b/tests/test_diagnostics_validation.py
@@ -9,7 +9,7 @@
 - Format idempotence: Multiple format() calls produce same output
 - Sanitization bounds: Sanitized content length is bounded
 - Count properties: error_count and warning_count match tuple lengths
-- Validity invariant: is_valid == (no errors and no annotations)
+- Validity invariant: is_valid fails closed on errors, annotations, and critical warnings

 Python 3.13+.
 """
@@ -231,9 +231,11 @@ def test_property_warning_count_matches_warnings_length(
     def test_property_is_valid_iff_no_errors_or_annotations(
         self, result: ValidationResult
     ) -> None:
-        """PROPERTY: is_valid == (no errors AND no annotations)."""
+        """PROPERTY: is_valid fails closed on errors, annotations, and critical warnings."""
         expected_valid = (
-            len(result.errors) == 0 and len(result.annotations) == 0
+            len(result.errors) == 0
+            and len(result.annotations) == 0
+            and all(w.severity != WarningSeverity.CRITICAL for w in result.warnings)
         )
         assert result.is_valid == expected_valid
         event(f"is_valid={result.is_valid}")
@@ -242,9 +244,13 @@ def test_property_is_valid_iff_no_errors_or_annotations(
     def test_property_warnings_do_not_affect_validity(
         self, result: ValidationResult
     ) -> None:
-        """PROPERTY: Warnings alone do not make result invalid."""
+        """PROPERTY: Non-critical warnings alone do not make result invalid."""
         if len(result.errors) == 0 and len(result.annotations) == 0:
-            assert result.is_valid, "Result with only warnings should be valid"
+            expected_valid = all(
+                warning.severity != WarningSeverity.CRITICAL
+                for warning in result.warnings
+            )
+            assert result.is_valid == expected_valid

         has_warnings = len(result.warnings) > 0
         event(f"has_warnings={has_warnings}")
diff --git a/tests/test_documentation_tooling.py b/tests/test_documentation_tooling.py
index 3d111e4f..c5777764
100644 --- a/tests/test_documentation_tooling.py +++ b/tests/test_documentation_tooling.py @@ -78,6 +78,29 @@ def _index_routes() -> dict[str, tuple[Path, str]]: return routes +def _documentation_index_targets() -> set[str]: + """Return the Markdown files listed in the human docs map. + + Premise: + The root docs index should be a complete inventory of `docs/*.md`, not + just an API symbol router. + + Reason: + A dedicated parser here lets tests fail the moment a new guide lands in + `docs/` without being added to the published navigation map. + """ + index_path = REPO_ROOT / "docs" / "DOC_00_Index.md" + text = index_path.read_text(encoding="utf-8") + start = text.index("## Documentation Map") + end = text.index("## Routing Table") + section = text[start:end] + + return { + Path(target).name + for target in re.findall(r"\[[^\]]+\]\(([^)#]+\.md)\)", section) + } + + def _symbol_headings(md_path: Path) -> set[str]: """Return the set of second-level symbol headings in a markdown file.""" text = md_path.read_text(encoding="utf-8") @@ -233,7 +256,7 @@ def test_run_examples_registers_contracts_for_all_shipped_examples() -> None: assert set(run_examples.EXAMPLE_CONTRACTS) == shipped_examples assert ( run_examples.EXAMPLE_CONTRACTS["parser_only.py"]( - "[PASS] Warning-only validation semantics verified\n" + "[PASS] Critical warning validation semantics verified\n" "[PASS] Invalid syntax semantics verified\n" "All examples completed successfully!\n" ) @@ -242,8 +265,8 @@ def test_run_examples_registers_contracts_for_all_shipped_examples() -> None: assert run_examples.EXAMPLE_CONTRACTS["parser_only.py"]("incomplete output") is not None -def test_validate_version_uses_afad_frontmatter_version_contract() -> None: - """validate_version should enforce the AFAD v4.0 `version:` frontmatter key.""" +def test_validate_version_uses_configured_frontmatter_version_contract() -> None: + """validate_version should enforce the configured `version:` frontmatter key.""" pyproject = 
tomllib.loads((REPO_ROOT / "pyproject.toml").read_text(encoding="utf-8")) validate_version = _load_script_module( @@ -650,6 +673,15 @@ def test_sdist_includes_root_frontmatter_docs_and_readme() -> None: assert missing == [] +def test_root_readme_remains_plain_storefront_markdown() -> None: + """The root README should stay human-first and avoid AFAD-style wrapper markup.""" + readme = (REPO_ROOT / "README.md").read_text(encoding="utf-8") + + assert not readme.startswith("---\n") + assert not readme.lstrip().startswith("