Skip to content

Add CRA/EN 18031 threat model: asset register, trust boundaries, security controls#1149

Open
spoorcc wants to merge 28 commits into
mainfrom
claude/threat-model-en18031-Sfmqp
Open

Add CRA/EN 18031 threat model: asset register, trust boundaries, security controls#1149
spoorcc wants to merge 28 commits into
mainfrom
claude/threat-model-en18031-Sfmqp

Conversation

@spoorcc
Copy link
Copy Markdown
Contributor

@spoorcc spoorcc commented Apr 19, 2026

  • security/threat_model.py: pytm-based threat model covering the full SDLC
    (runtime, GitHub Actions CI/CD, PyPI distribution). Defines 6 trust
    boundaries, 5 primary assets (PA-01..PA-05), 10 supporting assets
    (SA-01..SA-10), 8 environmental assets (EA-01..EA-08), and 15 annotated
    data flows with existing-control flags.

  • doc/explanation/security.rst: Sphinx RST page documenting the asset
    register, trust boundaries, data flows, implemented controls table, and
    known gaps/residual risks — aligned with CRA Article 13 and EN 18031.

  • doc/index.rst: wire security.rst into the Explanation toctree.

  • pyproject.toml: add pytm==1.3.1 as a new [security] optional dependency.

  • CHANGELOG.rst: record the new security documentation in 0.14.0 (unreleased).

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive Security Model page covering threat analysis, asset register, trust boundaries, data flows, controls, gaps, operating assumptions and disclosure guidance.
    • Included the security page in the docs index and added a docs directive to embed generated diagrams/tables from the machine-readable model.
    • Updated docs styling for improved table header rendering and CSS consistency.
    • Added a maintainers' guidance entry to keep the model and docs synchronized.
  • Chores

    • Added an executable, machine-readable threat-model source, a structured threats catalog, and an optional "security" docs dependency; devcontainer now installs the security extras.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 19, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

Adds an executable pytm-based threat model and Sphinx tooling to render it, introduces comprehensive security documentation (CRA/EN 18031 alignment, STRIDE), publishes structured threats JSON, updates docs/toctree and changelog, and adds a security optional dependency and devcontainer install of that extra.

Changes

Security model + docs + build

Layer / File(s) Summary
Data / Catalog
security/threats.json
Adds 10 structured threat entries (DFT-01DFT-10) with metadata, conditions, mitigations, examples, and references.
Core Model
security/threat_model.py
New pytm-based threat-model module: configures TM, assumptions, boundaries, actors, processes, ~15 dataflows (DF-01DF-15), asset inventory, Control dataclass, CONTROLS list, ASSET_CONTROLS index, and __main__ entry calling tm.process().
Sphinx integration
doc/_ext/pytm_directive.py, doc/conf.py
Adds .. pytm:: directive and caching/diagram generation logic; registers pytm_model config and pytm_directive extension; includes sphinx.ext.graphviz.
Documentation pages
doc/explanation/security.rst, doc/index.rst, CHANGELOG.rst, AGENTS.md
New detailed security page (scope, assumptions, asset register, dataflows, controls, gaps) and TOC/changelog entries; AGENTS guidance to keep model/docs aligned.
Docs styling
doc/dfetch_preamble.inc, doc/static/css/custom.css
Adjusts Sphinx table header styling and CSS to style table header stubs consistently.
Packaging & Devcontainer
pyproject.toml, .devcontainer/Dockerfile
Adds optional dependency pytm @ git+https://github.com/OWASP/pytm.git@master to project.optional-dependencies.docs and includes security extra in devcontainer editable install.

Sequence Diagram(s)

sequenceDiagram
  participant Contributor as Contributor
  participant Repo as GitHub Repository
  participant CI as GitHub Actions Runner
  participant Df as dfetch CLI
  participant VCS as Remote VCS / Archive
  participant PyPI as PyPI/TestPyPI
  participant Consumer as Consumer Build System

  Contributor->>Repo: Push/PR (manifest, source, workflows)
  Repo->>CI: Trigger workflow (OIDC, pinned Actions)
  CI->>Df: Run dfetch (parse manifest, fetch, validate)
  Df->>VCS: Clone / fetch archives
  VCS-->>Df: Serve archives / source
  Df->>Df: Validate archives (paths, symlinks, decompression, integrity optional)
  CI->>PyPI: Publish package (OIDC trusted publish)
  Consumer->>PyPI: Install package
  Consumer->>Df: Execute fetched sources during build
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • Fix #891 - Update docs #892: Also modifies the Sphinx extensions configuration in doc/conf.py; likely related to doc build/extension changes.

Suggested labels

development, documentation

Suggested reviewers

  • ben-edna
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.13% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change—adding a CRA/EN 18031-aligned threat model with asset register, trust boundaries, and security controls as the primary objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/threat-model-en18031-Sfmqp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/explanation/security.rst`:
- Line 205: Replace the repo path string `/.github/workflows/*.yml` with the
repo-relative form `.github/workflows/*.yml` in the documentation so it does not
imply an absolute filesystem path; search for the exact token
`/.github/workflows/*.yml` in doc/explanation/security.rst and update it to
`.github/workflows/*.yml`.
- Around line 166-172: Add the missing asset declaration for SA-01 in the threat
model so the executable model matches the RST: create an Asset (or call the
model's register_asset / add_asset helper) with id "SA-01", name "dfetch
Process", type/category matching other assets, and a description that mirrors
the RST text (including "Restricted" and the subprocess injection/dependency
confusion concerns); ensure the new Asset is included in the model collection
used by the report generator (e.g., ThreatModel.assets or the register/export
path) so python -m security.threat_model --report emits the same SA-01 entry as
the documentation.
- Around line 393-492: Replace the hardcoded count "17 supply-chain health
checks" with a non-static description so it won't go stale: update the doc text
that mentions "17 supply-chain health checks" to say something like "the full
set of OpenSSF Scorecard checks" (or "the current set of OpenSSF Scorecard
checks") and optionally add a parenthetical note pointing to the Scorecard
project for the authoritative list; also update or remove the corresponding
enumeration in threat_model.py (the list of check names) so the docs and
threat_model.py remain consistent when the Scorecard checks change.
- Line 17: The reference link for `EN 18031` points to an incorrect ETSI URL and
should be replaced with a correct source; update the link target used in the
reference label `EN 18031` to point to the CEN-CENELEC information page or an
authorized distributor (for example the CEN-CENELEC standards overview or an
iTeh/national standards body preview) and adjust the link text if needed so the
doc accurately identifies EN 18031 as a CEN/CENELEC standard rather than linking
to ETSI.
- Around line 408-410: Replace the hardcoded line-number reference in the
security.rst text with the function name only: update the fragment that
currently reads "dfetch/util/util.py:265–285" to reference
"check_no_path_traversal() in dfetch/util/util.py" (or simply
"check_no_path_traversal()") so the docs cite the function by name and remove
the specific line numbers; locate the mention in doc/explanation/security.rst
and change the referenced text accordingly.

In `@pyproject.toml`:
- Line 110: The security.rst "Regenerate analysis output" section currently
shows the diagram generation commands (--seq and --dfd) but omits that PlantUML
and Graphviz are external system dependencies; update that section to add a
short note stating that running the diagram commands requires PlantUML and
Graphviz to be installed on the system (they are not installed via pip and pip
install .[security] alone will not produce diagrams), and mention alternatives
or links if desired so maintainers know the extra prerequisites before running
those commands.

In `@security/threat_model.py`:
- Around line 376-384: The EA-06..EA-08 assets are missing from the
environmental-assets block; add mapping comments so the module matches
doc/explanation/security.rst by annotating EA-06 → developer and/or
gh_actions_runner (Developer Workstation / CI Runner), EA-07 → consumer_build
(Consumer Build System), and EA-08 → network_transport (Network Transport).
Insert these concise comment lines next to the existing EA-01..EA-05 block using
the exact symbol names developer, gh_actions_runner, consumer_build, and
network_transport so the executable model and RST asset register stay aligned.
- Around line 242-374: The supporting-asset numbering is inconsistent: SA-01
("dfetch Process") is documented but missing from the model; either add SA-01 in
this file mapping to the existing dfetch_cli Process or remove SA-01 from the
RST. Fix by adding a Data/Process entry named "SA-01: dfetch Process" (matching
fields used for other assets) and set its identifier/attributes to reflect the
existing dfetch_cli Process (or alternatively remove the SA-01 row from
doc/explanation/security.rst); ensure the symbol dfetch_cli is referenced/used
so the model and documentation remain consistent.
- Around line 211-213: The hasWriteAccess flag for integrity_hash_record is
incorrect: update integrity_hash_record.hasWriteAccess to True (since
developers/processes write the dfetch.yaml field) and similarly review and set
pypi_package.hasWriteAccess = True where the publish pipeline writes to it;
locate the Datastore objects named integrity_hash_record and pypi_package in
threat_model.py and set their hasWriteAccess to reflect actual write capability
by processes/pipelines while keeping integrity_hash_record.classification =
Classification.SENSITIVE unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0131831c-88e8-40e8-800c-522279fc431b

📥 Commits

Reviewing files that changed from the base of the PR and between ac7dd04 and e81dc49.

📒 Files selected for processing (6)
  • CHANGELOG.rst
  • doc/explanation/security.rst
  • doc/index.rst
  • pyproject.toml
  • security/__init__.py
  • security/threat_model.py

Comment thread doc/explanation/security.rst Outdated
Comment thread doc/explanation/security.rst Outdated
Comment thread doc/explanation/security.rst Outdated
Comment thread doc/explanation/security.rst Outdated
Comment thread doc/explanation/security.rst Outdated
Comment thread pyproject.toml Outdated
Comment thread security/threat_model.py Outdated
Comment thread security/threat_model.py Outdated
Comment thread security/threat_model.py Outdated
@spoorcc spoorcc force-pushed the claude/threat-model-en18031-Sfmqp branch from 8948021 to c5d1217 Compare April 19, 2026 12:44
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
doc/explanation/security.rst (1)

17-17: ⚠️ Potential issue | 🟡 Minor

EN 18031 link still generic.

https://www.cenelec.eu/ is the CEN-CENELEC homepage, not a page about EN 18031. Consider linking to the CEN-CENELEC standards portal entry or an authorised distributor preview (e.g. iTeh) so readers can reach the actual standard in one click.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/explanation/security.rst` at line 17, The EN 18031 anchor target
currently points to the generic CEN-CENELEC homepage; update the reST reference
for "`EN 18031`" in security.rst so it links directly to the standard’s specific
entry (for example the CEN/CENELEC standards portal page or an authorised
distributor preview such as iTeh) instead of https://www.cenelec.eu/, ensuring
the anchor text remains "`EN 18031`" and the URL points to the specific standard
resource.
security/threat_model.py (1)

256-259: 🧹 Nitpick | 🔵 Trivial

Nit: consider declaring SA-01 as a first-class model element instead of comment-only.

The comment here maps SA-01 → dfetch_cli, but the pytm --report output won't emit an "SA-01" label next to the dfetch CLI process. If you want the executable model to surface the same SA-XX identifiers as the RST asset register, consider renaming dfetch_cli = Process("SA-01: dfetch CLI") (mirroring the naming convention used for PA-0x/SA-02..SA-10 datastores). Same applies to EA-01..EA-08 mapping comments at Lines 394–401. Non-blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@security/threat_model.py` around lines 256 - 259, The SA-01 comment should be
represented as a first-class pytm model element: rename the existing dfetch_cli
Process creation to include the SA identifier (e.g. dfetch_cli = Process("SA-01:
dfetch CLI")) so the --report output shows the SA-01 label; similarly update the
model elements referenced by the EA-01..EA-08 comments so their Process/Entity
constructors include the EA-0x prefix (e.g. Process("EA-01: ...") or
DataStore("EA-02: ...")) to mirror the RST asset register and surface those
identifiers in reports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/explanation/security.rst`:
- Around line 303-307: Replace the empty classification marker '—' for the EA-08
/ Network Transport table row with a clear label (e.g., "Public" or "N/A") and,
if choosing "N/A", add a brief footnote explaining why the transport channel is
not confidential; specifically update the EA-08 row that currently contains '—'
(the Network Transport entry listing "HTTPS, SSH, SVN, and plain HTTP carrying
all remote fetch traffic") so the classification column matches the ISO/IEC
27005 taxonomy used elsewhere.

---

Duplicate comments:
In `@doc/explanation/security.rst`:
- Line 17: The EN 18031 anchor target currently points to the generic
CEN-CENELEC homepage; update the reST reference for "`EN 18031`" in security.rst
so it links directly to the standard’s specific entry (for example the
CEN/CENELEC standards portal page or an authorised distributor preview such as
iTeh) instead of https://www.cenelec.eu/, ensuring the anchor text remains "`EN
18031`" and the URL points to the specific standard resource.

In `@security/threat_model.py`:
- Around line 256-259: The SA-01 comment should be represented as a first-class
pytm model element: rename the existing dfetch_cli Process creation to include
the SA identifier (e.g. dfetch_cli = Process("SA-01: dfetch CLI")) so the
--report output shows the SA-01 label; similarly update the model elements
referenced by the EA-01..EA-08 comments so their Process/Entity constructors
include the EA-0x prefix (e.g. Process("EA-01: ...") or DataStore("EA-02: ..."))
to mirror the RST asset register and surface those identifiers in reports.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: acb6cfa3-58a6-4dd6-91aa-2838010a0ecf

📥 Commits

Reviewing files that changed from the base of the PR and between e81dc49 and 8948021.

📒 Files selected for processing (3)
  • .devcontainer/Dockerfile
  • doc/explanation/security.rst
  • security/threat_model.py

Comment thread doc/explanation/security.rst Outdated
@spoorcc spoorcc force-pushed the claude/threat-model-en18031-Sfmqp branch from c5d1217 to 7b284be Compare April 19, 2026 12:48
@spoorcc
Copy link
Copy Markdown
Contributor Author

spoorcc commented Apr 19, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pyproject.toml (1)

110-116: ⚠️ Potential issue | 🟠 Major

Include the security package in setuptools discovery or document source-checkout-only usage.

The new security extra installs pytm, but security/threat_model.py is excluded from the configured package include list (dfetch, dfetch.*). The python -m security.threat_model --report workflow will only work from a repository checkout, not from a standard wheel install of the project with the security extra.

Suggested fix
 [tool.setuptools.packages.find]
-include = ["dfetch", "dfetch.*"]
+include = ["dfetch", "dfetch.*", "security", "security.*"]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 110 - 116, The security extra currently installs
pytm but the security package (security/threat_model.py) is not included by
setuptools discovery because the include list only has "dfetch" and "dfetch.*";
update the package discovery in pyproject.toml to also include "security" and
"security.*" so the security module is bundled into wheels (or alternatively add
a short note in your packaging documentation that the python -m
security.threat_model --report workflow only works from a source checkout and
not from a wheel), and ensure references to the extra name "security" and the
module path "security.threat_model" are consistent with the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/explanation/security.rst`:
- Around line 417-421: Update the documentation text describing the
path-traversal control to match the actual implementation: replace the reference
to os.path.realpath with either the exact API used (pathlib.Path.resolve()) or a
generic phrase like “resolves paths” and mention check_no_path_traversal() from
dfetch/util/util.py so the doc accurately reflects the implementation.
- Around line 265-270: The current EA-01 "Remote VCS Servers" entry wrongly
implies non-TLS MITM is mitigated by SSH controls; update the EA-01 text to
explicitly state that "BatchMode=yes" and SSH host-key verification only affect
SSH remotes and do not protect plain http:// or svn:// transports, call out
residual MITM risk for non-TLS endpoints, and recommend using TLS-protected
transports (HTTPS, svn+https) or other TLS/SSH tunnels for those hosts instead;
keep the existing mention of GitHub/GitLab/Gitea but reword the mitigation
sentence to separate SSH-specific controls (BatchMode, host-key verification)
from general TLS requirements.

In `@security/threat_model.py`:
- Around line 409-428: The df03 and df04 Dataflow objects currently mark
encryption/integrity controls as unconditional even though they accept
unencrypted svn:// and http:// paths; update the model by removing or setting to
False the unconditional flags on df03.controls.isEncrypted and
df04.controls.providesIntegrity, and instead create separate Dataflow entries
(e.g., df03_https_ssh, df03_plain and df04_https_ssh, df04_plain) that
explicitly represent HTTPS/SSH (isEncrypted=True, providesIntegrity=True) versus
svn/http (isEncrypted=False, providesIntegrity=False) paths so the pytm report
reflects conditional guarantees; also apply the same split for df07-related
flows and ensure corresponding changes are mirrored in
doc/explanation/security.rst to keep docs and model aligned.
- Around line 312-325: The model declares gh_workflows = Datastore("SA-06:
GitHub Actions Workflows") but contradictorily sets gh_workflows.hasWriteAccess
= False while the description explicitly states PRs can modify
.github/workflows/*.yml; update the executable model by setting
gh_workflows.hasWriteAccess = True to reflect writable workflow definitions (and
re-evaluate gh_workflows.storesSensitiveData/classification if needed) and
ensure the change in security/threat_model.py (references: gh_workflows,
Datastore, boundary_github, hasWriteAccess) is kept in sync with
doc/explanation/security.rst.

---

Outside diff comments:
In `@pyproject.toml`:
- Around line 110-116: The security extra currently installs pytm but the
security package (security/threat_model.py) is not included by setuptools
discovery because the include list only has "dfetch" and "dfetch.*"; update the
package discovery in pyproject.toml to also include "security" and "security.*"
so the security module is bundled into wheels (or alternatively add a short note
in your packaging documentation that the python -m security.threat_model
--report workflow only works from a source checkout and not from a wheel), and
ensure references to the extra name "security" and the module path
"security.threat_model" are consistent with the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c4c0b47e-f52e-4a55-b262-8ef2a7e19fa1

📥 Commits

Reviewing files that changed from the base of the PR and between 8948021 and 7b284be.

📒 Files selected for processing (8)
  • .devcontainer/Dockerfile
  • AGENTS.md
  • CHANGELOG.rst
  • doc/explanation/security.rst
  • doc/index.rst
  • pyproject.toml
  • security/__init__.py
  • security/threat_model.py

Comment thread doc/explanation/security.rst Outdated
Comment thread doc/explanation/security.rst Outdated
Comment thread security/threat_model.py Outdated
Comment thread security/threat_model.py Outdated
@spoorcc
Copy link
Copy Markdown
Contributor Author

spoorcc commented Apr 19, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 19, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
doc/explanation/security.rst (1)

442-442: 🧹 Nitpick | 🔵 Trivial

Hard-coded line number (line 277) will drift.

This reference pins to a specific line in dfetch/util/util.py that will silently become incorrect on any refactor in that file. Prefer referencing the function by name only (e.g., check_no_path_traversal() in dfetch/util/util.py) so the docs stay correct without coupling to line numbers.

Proposed fix
-       ``dfetch/util/util.py`` — ``check_no_path_traversal`` at line 277
+       ``check_no_path_traversal()`` in ``dfetch/util/util.py``
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/explanation/security.rst` at line 442, Remove the hard-coded line number
and reference the function by name and file only: replace the text
"``dfetch/util/util.py`` — ``check_no_path_traversal`` at line 277" with
something like "``check_no_path_traversal()`` in ``dfetch/util/util.py``";
ensure you reference the function symbol check_no_path_traversal() (no line
numbers) so the documentation remains correct after refactors.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Line 116: The pyproject.toml currently includes a top-level "security" package
via the include entry (include = ["dfetch", "dfetch.*", "security",
"security.*"]), which exposes a global security module; either vendor/relocate
the code under your project namespace (move the package into dfetch/security —
e.g., dfetch/security/threat_model.py and update any module imports and python
-m entrypoints to use dfetch.security.threat_model) or remove "security" and
"security.*" from the include list so the package is not distributed in the
wheel; update the include array accordingly and fix any import paths referencing
the old top-level "security" symbol.

In `@security/threat_model.py`:
- Around line 94-108: The EA IDs and titles are out of order and EA-04's title
doesn't match the docs; update the declarations so EA-03 (gh_repository) is
declared before EA-04 and rename gh_actions_runner to use the exact title
"EA-04: GitHub Actions Infrastructure" instead of "GitHub Actions Runner";
ensure gh_repository and gh_actions_runner (the two ExternalEntity objects)
retain their current inBoundary and description content but appear in the
correct order and with the exact EA-03/EA-04 titles to match
doc/explanation/security.rst.
- Line 127: The Process instance dfetch_cli is labeled "SA-01: dfetch CLI" but
the RST uses "dfetch Process"; update the Process(...) call for dfetch_cli to
use the exact RST title ("SA-01: dfetch Process") so the threat ID text matches
across surfaces, and scan for any other occurrences of "SA-01" to ensure
consistent naming.

---

Duplicate comments:
In `@doc/explanation/security.rst`:
- Line 442: Remove the hard-coded line number and reference the function by name
and file only: replace the text "``dfetch/util/util.py`` —
``check_no_path_traversal`` at line 277" with something like
"``check_no_path_traversal()`` in ``dfetch/util/util.py``"; ensure you reference
the function symbol check_no_path_traversal() (no line numbers) so the
documentation remains correct after refactors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4758c3d8-3d8e-4b75-bf40-ef9b25926bb8

📥 Commits

Reviewing files that changed from the base of the PR and between 7b284be and 80b9f9b.

📒 Files selected for processing (3)
  • doc/explanation/security.rst
  • pyproject.toml
  • security/threat_model.py

Comment thread pyproject.toml Outdated
Comment thread security/threat_model.py Outdated
Comment thread security/threat_model.py Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
security/threat_model.py (1)

469-475: ⚠️ Potential issue | 🟠 Major

Do not mark DF-07 integrity as unconditional.

Line 475 still reports integrity for every vendored-file write, but DF-07 can write content from Git/SVN, optional-hash archives, or patch application. This makes the generated STRIDE report overstate the guarantee; keep this False unless the flow is split by integrity-verified vs unverified sources.

🛠️ Proposed fix
 df07.controls.sanitizesInput = True
-df07.controls.providesIntegrity = True
+df07.controls.providesIntegrity = False  # not guaranteed for VCS, patch, or optional-hash archive paths

Based on learnings, after security-relevant changes, review and update security/threat_model.py and doc/explanation/security.rst to keep them aligned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@security/threat_model.py` around lines 469 - 475, The Dataflow df07 is
incorrectly marked as always providing integrity; modify
df07.controls.providesIntegrity to False (or only set True on a new, separate
Dataflow representing integrity-verified sources) so DF-07 does not overstate
guarantees; update the DF-07 description if needed and ensure any related
documentation (e.g., security/threat_model.py references and
doc/explanation/security.rst) is updated to reflect the conditional integrity
status.
doc/explanation/security.rst (1)

435-442: ⚠️ Potential issue | 🟡 Minor

Align the path-traversal control description with the implementation.

This section now says check_no_path_traversal() uses os.path.realpath and not pathlib.Path.resolve; please verify the function and keep the API reference exact. If the implementation still uses Path.resolve(), this doc should be reverted to that wording.

🔎 Read-only verification
#!/bin/bash
# Description: Inspect the actual path traversal helper implementation.
# Expected: The documentation should name the same path-resolution API used here.

rg -n -A30 -B5 'def\s+check_no_path_traversal\b|check_no_path_traversal\s*=' --type py
📝 Proposed fix if the helper uses pathlib
-     - ``check_no_path_traversal()`` resolves both the candidate path and the
-       destination root via ``os.path.realpath`` (symlink-aware, not
-       ``pathlib.Path.resolve``), then rejects any path whose resolved prefix
-       does not start with the resolved root.  Applied to every file copy and
-       post-extraction symlink.
+     - ``check_no_path_traversal()`` resolves both the candidate path and the
+       destination root with ``pathlib.Path.resolve()``, then rejects any path
+       that escapes the resolved root.  Applied to every file copy and
+       post-extraction symlink.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/explanation/security.rst` around lines 435 - 442, The documentation
claims check_no_path_traversal() uses os.path.realpath but the implementation
may use pathlib.Path.resolve; inspect the actual implementation of
check_no_path_traversal in dfetch/util/util.py and update this doc block to
match: if the function uses pathlib.Path.resolve, revert the wording to mention
pathlib.Path.resolve (including any resolve(strict=...) behavior), otherwise
keep the os.path.realpath wording; ensure the doc references the exact API used
and preserves mention that it compares resolved candidate path prefix against
the resolved destination root.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@doc/explanation/security.rst`:
- Around line 19-26: Update the security.rst snippet to clarify that the listed
python -m security.threat_model commands must be run from a repository source
checkout with the repository root on PYTHONPATH (or run from the repo root)
because the security/ package is intentionally excluded from built wheels; note
that pip install dfetch[security] will not install security/threat_model. Edit
the paragraph that references security/threat_model.py and the example commands
to add a short sentence stating this checkout/PYTHONPATH requirement and the
exclusion from packaged wheels so users don't expect pip-only availability.

In `@security/threat_model.py`:
- Around line 501-508: The DF-12 dataflow is currently modeled in the wrong
direction; update the Dataflow construction for df12 so it represents
gh_repository -> gh_actions_runner (repository to runner) instead of
gh_actions_runner to gh_repository, and ensure the descriptive text still
explains that GitHub Actions checks out source from the repository; also update
the DF-12 entry in doc/explanation/security.rst so the diagram/text reflects the
repository → actions runner direction. Locate the Dataflow instance named df12
and change its source/target ordering (gh_repository and gh_actions_runner) and
then edit the DF-12 row in security.rst to match the same direction and wording.

---

Duplicate comments:
In `@doc/explanation/security.rst`:
- Around line 435-442: The documentation claims check_no_path_traversal() uses
os.path.realpath but the implementation may use pathlib.Path.resolve; inspect
the actual implementation of check_no_path_traversal in dfetch/util/util.py and
update this doc block to match: if the function uses pathlib.Path.resolve,
revert the wording to mention pathlib.Path.resolve (including any
resolve(strict=...) behavior), otherwise keep the os.path.realpath wording;
ensure the doc references the exact API used and preserves mention that it
compares resolved candidate path prefix against the resolved destination root.

In `@security/threat_model.py`:
- Around line 469-475: The Dataflow df07 is incorrectly marked as always
providing integrity; modify df07.controls.providesIntegrity to False (or only
set True on a new, separate Dataflow representing integrity-verified sources) so
DF-07 does not overstate guarantees; update the DF-07 description if needed and
ensure any related documentation (e.g., security/threat_model.py references and
doc/explanation/security.rst) is updated to reflect the conditional integrity
status.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: df1ae2e4-0318-4df3-b7e9-26316ad11a9a

📥 Commits

Reviewing files that changed from the base of the PR and between 80b9f9b and 5686b14.

📒 Files selected for processing (3)
  • doc/explanation/security.rst
  • pyproject.toml
  • security/threat_model.py

Comment thread doc/explanation/security.rst Outdated
Comment thread security/threat_model.py Outdated
@spoorcc spoorcc force-pushed the claude/threat-model-en18031-Sfmqp branch from 5686b14 to e34d3ff Compare April 19, 2026 19:50
Comment thread security/threat_model.py Fixed
Comment thread security/threat_model.py Fixed
Comment thread security/threat_model.py Fixed
Comment thread security/threat_model.py Fixed
claude and others added 6 commits April 30, 2026 21:11
…rity controls

- security/threat_model.py: pytm-based threat model covering the full SDLC
  (runtime, GitHub Actions CI/CD, PyPI distribution). Defines 6 trust
  boundaries, 5 primary assets (PA-01..PA-05), 10 supporting assets
  (SA-01..SA-10), 8 environmental assets (EA-01..EA-08), and 15 annotated
  data flows with existing-control flags.

- doc/explanation/security.rst: Sphinx RST page documenting the asset
  register, trust boundaries, data flows, implemented controls table, and
  known gaps/residual risks — aligned with CRA Article 13 and EN 18031.

- doc/index.rst: wire security.rst into the Explanation toctree.

- pyproject.toml: add pytm==1.3.1 as a new [security] optional dependency.

- CHANGELOG.rst: record the new security documentation in 0.14.0 (unreleased).

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
security/threat_model.py:
- Split DF-03/DF-04 into DF-03a/b and DF-04a/b (HTTPS+SSH vs svn://+http://)
  so pytm STRIDE analysis reflects conditional encryption guarantees rather
  than unconditionally marking unencrypted transports as encrypted.
- Set gh_workflows.hasWriteAccess = True (was False but description stated
  PRs can modify .github/workflows/ definitions — now consistent).
- Remove threatsFile=None (pytm rejects None; omitting uses built-in library).

doc/explanation/security.rst:
- EA-01: Separate SSH-specific controls (BatchMode=yes, host-key verification)
  from non-TLS transports (http://, svn://), which remain unmitigated; add
  explicit MITM residual risk and HTTPS/svn+https:// recommendation.
- Data Flows table: replace single DF-03/DF-04 rows with DF-03a/b and DF-04a/b
  to match the split in the pytm model.
- Path-traversal control: clarify that check_no_path_traversal() uses
  os.path.realpath (not pathlib.Path.resolve); fix line reference to 277.

pyproject.toml:
- Add "security" and "security.*" to setuptools package discovery so that
  `pip install .[security]` bundles the threat model module and
  `python -m security.threat_model` works from an installed wheel.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
…security package

pyproject.toml:
- Remove "security" and "security.*" from setuptools include list; the
  threat model is a source-checkout-only compliance tool and must not be
  bundled as a top-level namespace package in the distributed wheel.

security/threat_model.py:
- Declare gh_repository (EA-03) before gh_actions_runner (EA-04) to match
  numeric order in the RST asset register.
- Rename gh_actions_runner label from "EA-04: GitHub Actions Runner" to
  "EA-04: GitHub Actions Infrastructure" to match doc/explanation/security.rst.
- Rename dfetch_cli label from "SA-01: dfetch CLI" to "SA-01: dfetch Process"
  to match the Supporting Assets table in security.rst.

doc/explanation/security.rst:
- Remove hard-coded line number from path-traversal control entry; reference
  the function symbol check_no_path_traversal() and file only so the doc
  stays correct after refactors.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
Inserts the mandatory eight-section Product Security Context note before
the Scope and Assumptions section in doc/explanation/security.rst.

Covers: product/manufacturer identification, IPFRU, user roles (Developer /
CI Runner / Security-operator), operating environment, architecture and
connectivity, external dependencies, security assumptions, and support
period / data handling.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
@spoorcc spoorcc force-pushed the claude/threat-model-en18031-Sfmqp branch from b25e9da to 092f230 Compare April 30, 2026 21:21
Implements a custom Sphinx extension (doc/_ext/pytm_directive.py) that
renders sections of the pytm threat model directly at build time — no
pre-generated intermediate files, no duplication between code and docs.

Extension highlights
- .. pytm:: directive with :assets:, :dataflows:, :controls:, :gaps:,
  and :threats: flags; any subset may be combined
- Model path configured via pytm_model in conf.py (absolute path);
  directive also accepts an explicit relative-path argument
- Incremental builds: env.note_dependency() marks the document for
  rebuild whenever security/threat_model.py changes
- Single-build caching: model is loaded once in the builder-inited hook
  (single-threaded) and stored in app._pytm_cache; directive runs only
  read from cache, so parallel_read_safe = True is correct
- Thread-safe lazy fallback: _load_lock guards the rare case of an
  explicit path not covered by the preload

Threat model additions (security/threat_model.py)
- CONTROLS list (17 entries): each control has name, assets, implementation
  text (RST markup supported), and reference path
- GAPS list (8 entries): each gap has name and description (RST markup)
- classification added to all ExternalEntity and Process elements so the
  asset table renders complete Classification column values
- Data objects fetched from TM._data (separate from TM._elements in pytm)

Documentation (doc/explanation/security.rst)
- Four static list-table blocks replaced with four .. pytm:: directives
  (:assets:, :dataflows:, :controls:, :gaps:) — ~445 lines removed
- Introductory paragraphs updated to note live generation from the model
- Product Security Context note (added previously) unchanged

Packaging
- pytm==1.3.1 added to the docs extras group so pip install .[docs]
  installs the dependency for documentation builds

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
spoorcc added 2 commits May 1, 2026 15:43
…plantweb silently overwrites it at the docutils level (which is exactly what its own source comment says it does to avoid generating warnings). The [app.add_directive] warning disappears, and .. graphviz:: continues to work for the DFD rendering.
- _generate_diagrams: pre-renders PlantUML sequence diagram to PNG via
  plantweb Python API (confdir/_static/pytm/threat_model_seq.png) so
  the LaTeX builder embeds it with ``.. image::`` rather than calling
  the network-dependent ``.. uml::`` directive at compile time.
  Falls back silently to ``.. only:: html`` wrapper when plantweb is
  unavailable or the render server is unreachable.

- _render_dfd: removes the ``.. only:: html`` guard so sphinx.ext.graphviz
  produces the DFD via local ``dot`` binary in both HTML and PDF builds.
  Previously the guard excluded the diagram from PDF entirely.

- _list_table: prepends ``.. tabularcolumns::`` with proportional
  ``p{\linewidth}`` column specs so Sphinx uses the LaTeX ``longtable``
  environment, allowing tall tables (controls, gaps, threats) to break
  vertically across page boundaries.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
The previous commit emitted bare ``.. graphviz::`` without an ``.. only::
html`` guard. On CI runners where the ``dot`` binary is not installed
sphinx.ext.graphviz fails silently and emits ``\includegraphics{None}``,
which crashes pdflatex.

Fix: _generate_diagrams now calls ``dot -Tpng`` via subprocess to pre-render
the DFD to _static/pytm/threat_model_dfd.png (same pattern as the plantweb
pre-render for the sequence diagram). _render_dfd emits ``.. image::`` when
the PNG is available. If ``dot`` is absent the exception is suppressed and the
output falls back to ``.. only:: html`` (diagram visible in HTML, absent from
PDF but no crash).

Both diagrams therefore appear in PDF when the necessary tools are available
on the build host, and neither causes a build failure when they are not.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
…ols/gaps

Each ID cell in the threats, controls, gaps, assets, and dataflows tables
now carries a ``.. _label:`` anchor so Sphinx registers it as a named
cross-reference target.

Cross-reference columns (Controls column in the threats table; Asset(s) and
Threat(s) columns in the controls and gaps tables) now emit ``:ref:`` links
instead of plain text, allowing one-click navigation between related items
in both the HTML and PDF renders.

A new _ref_list() helper builds comma-separated ``:ref:`ID <id>``` strings
and returns an em-dash for empty lists. _list_table() gains an
anchor_first_col flag that injects ``.. _label:`` + blank line before each
ID cell, which Sphinx picks up as a valid explicit hyperlink target.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
The pre-render-to-PNG approach was unreliable: subprocess dot and plantweb
API calls failed silently, falling back to ``.. only:: html`` and hiding
both diagrams from the PDF.

Root cause: graphviz (dot) is not installed in the CI doc/PDF build jobs,
and sphinx.ext.graphviz unconditionally emits \sphinxincludegraphics{None}
when dot is absent — crashing pdflatex.

Fix: use ``.. uml::`` via plantweb.directive for both diagrams, matching
exactly how the existing architecture diagrams are rendered:

- Sequence diagram: pass pytm's @startuml/@enduml output directly to
  ``.. uml::`` — plantweb sends it to plantuml.com.
- Data-flow diagram: wrap pytm's GraphViz DOT output in PlantUML
  @startdot/@enddot tags and pass it to ``.. uml::`` — plantuml.com
  renders GraphViz natively, no local dot binary required.

plantuml.com is in the egress allowlist for both the HTML and PDF CI jobs
and is already used successfully for all other UML diagrams in the docs.

This also removes the complex pre-render infrastructure (_generate_diagrams
confdir parameter, seq_img_path/dfd_img_path in the data dict, subprocess
and plantweb-API pre-render blocks) which is no longer needed.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
…eX PDF

Root cause: plantweb saves rendered PNGs to builder.outdir (``_build/latex/
plantweb/hash.png`` for the LaTeX builder, because ``builder.imagedir = ''``).
Sphinx's copy_image_files copies images FROM srcdir TO outdir, so it copies
``srcdir/_build/latex/plantweb/hash.png → outdir/hash.png``. But srcdir is the
RST source tree and ``_build/`` is outside it, so the source path doesn't
resolve and the copy silently fails. pdflatex then crashes with:
  ! Unable to load picture or PDF file 'hash.png'

Fix: separate HTML and LaTeX rendering:

HTML (``.. only:: html``): ``.. uml::`` via plantweb.directive renders at Sphinx
read time, calling plantuml.com. This correctly generates images in
``builder.outdir/_images/plantweb/`` for the HTML builder.

LaTeX (``.. only:: latex``): ``_generate_diagrams`` pre-renders both diagrams
to PNG using the plantweb Python API during ``builder-inited``, saving PNGs to
``confdir/_static/pytm/`` (the SOURCE TREE). ``.. image:: /_static/pytm/...``
then references these files. Sphinx resolves the /-path relative to srcdir,
finds the PNG in the source tree, and copies it to ``_build/latex/`` where
pdflatex can load it with ``\includegraphics{threat_model_dfd.png}``.

If pre-rendering fails (plantweb unavailable, network blocked), the LaTeX block
is simply omitted; diagrams are HTML-only but the PDF build does not crash.

DFD: the GraphViz DOT output from pytm is wrapped in @startdot/@enddot so
plantweb renders it via PlantUML's bundled GraphViz — no local dot binary needed.

https://claude.ai/code/session_01Rc28JtpAPWhJtA3YvS5kcr
@spoorcc
Copy link
Copy Markdown
Contributor Author

spoorcc commented May 2, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.devcontainer/Dockerfile:
- Line 56: The pip install extras list in the devcontainer Dockerfile includes a
non-existent "security" extra which will break the build; either remove
"security" from the pip install -e .[development,docs,test,casts,build,security]
invocation in the Dockerfile, or add a proper
[project.optional-dependencies].security section to pyproject.toml that lists
the intended packages (e.g., pytm) so the extra exists; update the Dockerfile's
pip extras or pyproject.toml's optional-dependencies to match the actual extras
(development, docs, test, casts, build, sbom, wheel).

In `@doc/_ext/pytm_directive.py`:
- Around line 134-138: The directive currently only marks the Python model file
as a dependency which leaves non-Python inputs like security/threats.json
untracked; update the dependency registration in run() (and where
_get_model_data() is called) to also call env.note_dependency() for the threats
JSON (and any other non-Python inputs used to render the threats/controls
section) so Sphinx will rebuild on JSON edits and avoid stale in-memory cache
content.
- Around line 369-371: The code currently uses inline "# type: ignore" on the
importlib spec and on a docutils container children usage; instead explicitly
handle nullable and type-mismatch cases: after calling
importlib.util.spec_from_file_location("_pytm_model_tmp", model_path) check that
the returned spec is not None and that spec.loader is present (raise a clear
ImportError or ValueError with context if absent) before calling
spec.loader.exec_module(mod), and call importlib.util.module_from_spec(spec)
only after that check; for the docutils container, replace direct uses of
container.children with an explicit list conversion (e.g., children =
list(container.children)) or appropriate typing cast so the code no longer needs
"# type: ignore" and types are explicit.

In `@doc/explanation/security.rst`:
- Around line 75-77: Update the text to narrow the claim: change the sentence to
state that dfetch validates the destination path via check_no_path_traversal()
before extraction (as performed in Update._check_destination()) and that archive
symlinks are validated post-extraction (see archive symlink checks in
dfetch/vcs/archive.py), rather than implying the check runs on every individual
file copy.

In `@security/threat_model.py`:
- Line 12: The file contains an inline pylint suppression comment "# pylint:
disable=too-many-lines" in security/threat_model.py which violates guidelines;
remove that suppression and resolve the root cause by refactoring the large
module: split the threat_model module into smaller modules (e.g., create
assets.py, dataflows.py, controls.py or move large model definitions into a
separate data/config file) and update imports/exports used by any functions or
classes referenced in threat_model (search for top-level symbols like the main
threat model classes or data structures) so the resulting files are under the
pylint line limits and the inline disable can be deleted.
- Around line 946-950: Update the C-021 description string in
security/threat_model.py to fix the grammar: replace "does generates" with
either "does generate" or simply "generates" in the description=(...) text so
the sentence reads correctly (e.g., "The release pipeline generates SLSA
provenance attestations..." or "The release pipeline does generate SLSA
provenance attestations...").
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3fd610c8-b2a9-41e1-aa61-88d4165eab32

📥 Commits

Reviewing files that changed from the base of the PR and between 80b9f9b and f7a3ed6.

📒 Files selected for processing (13)
  • .devcontainer/Dockerfile
  • AGENTS.md
  • CHANGELOG.rst
  • doc/_ext/pytm_directive.py
  • doc/conf.py
  • doc/dfetch_preamble.inc
  • doc/explanation/security.rst
  • doc/index.rst
  • doc/static/css/custom.css
  • pyproject.toml
  • security/__init__.py
  • security/threat_model.py
  • security/threats.json

Comment thread .devcontainer/Dockerfile

RUN pip install --no-cache-dir --root-user-action=ignore --upgrade pip==26.0.1 \
&& pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build] \
&& pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build,security] \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the security extra is not defined in pyproject.toml
grep -n "^\[project.optional-dependencies\]" pyproject.toml
grep -n "^security\s*=" pyproject.toml || echo "No 'security' extra found in pyproject.toml"

Repository: dfetch-org/dfetch

Length of output: 140


🏁 Script executed:

# Get the exact content of line 56 in Dockerfile
sed -n '56p' .devcontainer/Dockerfile

Repository: dfetch-org/dfetch

Length of output: 172


🏁 Script executed:

# Get the full optional-dependencies section from pyproject.toml to see all defined extras
sed -n '/^\[project.optional-dependencies\]/,/^\[/p' pyproject.toml | head -30

Repository: dfetch-org/dfetch

Length of output: 931


🏁 Script executed:

# Read the complete optional-dependencies section from pyproject.toml
sed -n '/^\[project.optional-dependencies\]/,/^$/p' pyproject.toml

Repository: dfetch-org/dfetch

Length of output: 1625


🏁 Script executed:

# Verify that pip would fail with this unknown extra by checking pip behavior
# Also search Dockerfile for any other references to 'security'
grep -n "security" .devcontainer/Dockerfile

Repository: dfetch-org/dfetch

Length of output: 175


The security extra does not exist — this will break the devcontainer build.

The pyproject.toml defines these extras: development, docs, test, casts, build, sbom, wheel. There is no security extra. The pytm dependency was added to the docs extra, not a separate security extra.

Either remove security from this line, or add a [project.optional-dependencies].security section to pyproject.toml.

🐛 Proposed fix (remove non-existent extra)
-    && pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build,security] \
+    && pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build] \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
&& pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build,security] \
&& pip install --no-cache-dir --root-user-action=ignore -e .[development,docs,test,casts,build] \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.devcontainer/Dockerfile at line 56, The pip install extras list in the
devcontainer Dockerfile includes a non-existent "security" extra which will
break the build; either remove "security" from the pip install -e
.[development,docs,test,casts,build,security] invocation in the Dockerfile, or
add a proper [project.optional-dependencies].security section to pyproject.toml
that lists the intended packages (e.g., pytm) so the extra exists; update the
Dockerfile's pip extras or pyproject.toml's optional-dependencies to match the
actual extras (development, docs, test, casts, build, sbom, wheel).

Comment thread doc/_ext/pytm_directive.py Outdated
Comment on lines +134 to +138
# Mark this document as depending on the model file so Sphinx
# rebuilds it whenever the model changes.
env.note_dependency(model_path)

data = _get_model_data(model_path)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Track non-Python threat-model inputs in the dependency key.

run() and _get_model_data() only invalidate on model_path, but the rendered threats section is also sourced from security/threats.json. A JSON-only edit will keep incremental Sphinx builds and the in-memory cache serving stale threat/control content until a clean build.

Also applies to: 183-193

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/_ext/pytm_directive.py` around lines 134 - 138, The directive currently
only marks the Python model file as a dependency which leaves non-Python inputs
like security/threats.json untracked; update the dependency registration in
run() (and where _get_model_data() is called) to also call env.note_dependency()
for the threats JSON (and any other non-Python inputs used to render the
threats/controls section) so Sphinx will rebuild on JSON edits and avoid stale
in-memory cache content.

Comment on lines +369 to +371
spec = importlib.util.spec_from_file_location("_pytm_model_tmp", model_path)
mod = importlib.util.module_from_spec(spec) # type: ignore[arg-type]
spec.loader.exec_module(mod) # type: ignore[union-attr]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

Python stdlib docs: can importlib.util.spec_from_file_location()returnNone, and can ModuleSpec.loaderbeNonebeforeexec_module() is called?

💡 Result:

Yes, importlib.util.spec_from_file_location can return None. No, ModuleSpec.loader cannot be None before exec_module is called when using this function. spec_from_file_location(name, location, *, loader=None, submodule_search_locations=None) returns None if no suitable loader is found for the file location (i.e., the file suffix does not match any known loader suffixes like .py, .pyc, etc.). From the source code in Lib/importlib/_bootstrap_external.py: if loader is None: for loader_class, suffixes in _get_supported_file_loaders: if location.endswith(tuple(suffixes)): loader = loader_class(name, location) spec.loader = loader break else: return None If a loader is explicitly provided, or one is found automatically, it creates a ModuleSpec with that loader set, so spec.loader is not None. The official documentation describes it as a factory function that creates a ModuleSpec, with loader as an optional parameter (default None, which triggers auto-detection), and does not mention returning None except through the implementation for unknown suffixes. Secondary sources sometimes check "if spec is None" defensively, but the primary sources confirm it can happen for invalid file types. For ModuleSpec in general (from importlib.machinery.ModuleSpec), the docs state "The loader used to load the module. The finder should always set this attribute." When created by spec_from_file_location with a valid file, it follows this. Namespace packages have loader=None, but spec_from_file_location is for file-based modules, not namespaces. Before exec_module, the spec.loader is already set during spec creation, and exec_module uses it.

Citations:


🏁 Script executed:

# First, check file size and read the specific sections
wc -l doc/_ext/pytm_directive.py

Repository: dfetch-org/dfetch

Length of output: 92


🏁 Script executed:

# Read lines around 369-371
sed -n '360,380p' doc/_ext/pytm_directive.py

Repository: dfetch-org/dfetch

Length of output: 862


🏁 Script executed:

# Read lines around 667-677
sed -n '660,685p' doc/_ext/pytm_directive.py

Repository: dfetch-org/dfetch

Length of output: 978


🏁 Script executed:

# Check the imports and function signatures at the top of the file
head -50 doc/_ext/pytm_directive.py

Repository: dfetch-org/dfetch

Length of output: 1503


🏁 Script executed:

# Look for importlib usage and type hints
rg "importlib|module_from_spec" doc/_ext/pytm_directive.py -A 2 -B 2

Repository: dfetch-org/dfetch

Length of output: 410


🏁 Script executed:

# Check container/StringList imports and what nodes.section() returns
rg "container|StringList|nodes.section" doc/_ext/pytm_directive.py -B 2 -A 2 | head -60

Repository: dfetch-org/dfetch

Length of output: 666


🏁 Script executed:

# Let me check the typing module imports and see if there are any type stubs hints
rg "from typing|import typing" doc/_ext/pytm_directive.py

Repository: dfetch-org/dfetch

Length of output: 99


🏁 Script executed:

# Check what container.children actually is - look at docutils source or type hints
# Also check if there's a py.typed marker or stub files
find . -name "docutils" -o -name "py.typed" 2>/dev/null | head -10

Repository: dfetch-org/dfetch

Length of output: 43


🏁 Script executed:

# Let's look more broadly at the actual return/assignment patterns
# to understand what type container.children actually is
rg "children" doc/_ext/pytm_directive.py -B 2 -A 2

Repository: dfetch-org/dfetch

Length of output: 231


Remove the inline # type: ignore suppressions by handling the nullable import path and type mismatches explicitly.

The suppressions avoid real nullable edges and type mismatches:

  • spec_from_file_location() can return None for unsupported file types, requiring an explicit check before accessing spec.loader
  • container.children from docutils nodes is a list-like object that requires explicit conversion to list for type compatibility
Suggested fixes
     spec = importlib.util.spec_from_file_location("_pytm_model_tmp", model_path)
-    mod = importlib.util.module_from_spec(spec)  # type: ignore[arg-type]
-    spec.loader.exec_module(mod)  # type: ignore[union-attr]
+    if spec is None or spec.loader is None:
+        raise RuntimeError(f"Unable to load threat model module from {model_path}")
+    mod = importlib.util.module_from_spec(spec)
+    spec.loader.exec_module(mod)
-    return container.children  # type: ignore[return-value]
+    return list(container.children)

Applies to lines 369–371 and 667–677.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
spec = importlib.util.spec_from_file_location("_pytm_model_tmp", model_path)
mod = importlib.util.module_from_spec(spec) # type: ignore[arg-type]
spec.loader.exec_module(mod) # type: ignore[union-attr]
spec = importlib.util.spec_from_file_location("_pytm_model_tmp", model_path)
if spec is None or spec.loader is None:
raise RuntimeError(f"Unable to load threat model module from {model_path}")
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/_ext/pytm_directive.py` around lines 369 - 371, The code currently uses
inline "# type: ignore" on the importlib spec and on a docutils container
children usage; instead explicitly handle nullable and type-mismatch cases:
after calling importlib.util.spec_from_file_location("_pytm_model_tmp",
model_path) check that the returned spec is not None and that spec.loader is
present (raise a clear ImportError or ValueError with context if absent) before
calling spec.loader.exec_module(mod), and call
importlib.util.module_from_spec(spec) only after that check; for the docutils
container, replace direct uses of container.children with an explicit list
conversion (e.g., children = list(container.children)) or appropriate typing
cast so the code no longer needs "# type: ignore" and types are explicit.

Comment on lines +75 to +77
- A manifest crafted with a malicious ``dst:`` path could attempt to write
files outside the project root (path-traversal). dfetch mitigates this with
``check_no_path_traversal()`` applied to every file copy.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Definition and call sites for check_no_path_traversal():"
rg -n -C2 '\bdef\s+check_no_path_traversal\b|\bcheck_no_path_traversal\s*\(' --type=py

echo
echo "Potential destination-path validation paths:"
rg -n -C2 '\bdst\b|destination|path_traversal' --type=py

Repository: dfetch-org/dfetch

Length of output: 50373


🏁 Script executed:

# Check Manifest.validate_destination() implementation
rg -n -A10 'def validate_destination' --type=py

Repository: dfetch-org/dfetch

Length of output: 883


🏁 Script executed:

# Check how _check_destination is called in the update flow
rg -n -B5 -A15 '_check_destination\(' dfetch/commands/update.py --type=py

Repository: dfetch-org/dfetch

Length of output: 2013


🏁 Script executed:

# Check where manifest destinations are actually used during copy/extract
rg -n -A5 'def _fetch_impl|def update' dfetch/project/gitsubproject.py dfetch/project/archivesubproject.py --type=py | head -80

Repository: dfetch-org/dfetch

Length of output: 1253


Narrow the claim about path-traversal validation scope.

The implementation shows check_no_path_traversal() is called once per manifest entry via Update._check_destination() (lines 89, 117–128 in dfetch/commands/update.py) to validate the destination path before extraction starts. This is a single pre-check, not a check "applied to every file copy." Additionally, archive symlinks are validated separately post-extraction in dfetch/vcs/archive.py:430.

The current phrasing overstates coverage. Rewrite lines 75–77 to reflect that the destination path itself is validated against traversal before extraction, and symlinks are validated after:

- A manifest crafted with a malicious ``dst:`` path could attempt to write
  files outside the project root (path-traversal). dfetch mitigates this by
  validating the destination path via ``check_no_path_traversal()`` before
  extraction and post-extraction symlink walks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@doc/explanation/security.rst` around lines 75 - 77, Update the text to narrow
the claim: change the sentence to state that dfetch validates the destination
path via check_no_path_traversal() before extraction (as performed in
Update._check_destination()) and that archive symlinks are validated
post-extraction (see archive symlink checks in dfetch/vcs/archive.py), rather
than implying the check runs on every individual file copy.

Comment thread security/threat_model.py Outdated
Comment thread security/threat_model.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants