feat(artifacts): define artifact storage types, config, and persistence layer#2801
Conversation
…ce layer Add new `artifacts` domain with ArtifactMeta struct, ArtifactKind/ArtifactStatus enums, filesystem-backed persistence under <workspace>/artifacts/<uuid>/, and three RPC endpoints (ai_list_artifacts, ai_get_artifact, ai_delete_artifact). Includes 38 unit tests covering CRUD operations, pagination, path-sandboxing enforcement, and corrupt metadata handling. Closes tinyhumansai#2776
📝 WalkthroughWalkthroughThis PR implements the artifact storage domain for OpenHuman, introducing typed artifact metadata (ArtifactMeta), filesystem-backed persistence with path validation, and three RPC endpoints (ai_list_artifacts, ai_get_artifact, ai_delete_artifact) that enable workspace artifact management with pagination, sorting, corruption resilience, and comprehensive test coverage. ChangesArtifact Storage Domain
Sequence Diagram(s)sequenceDiagram
participant Client
participant ops
participant store
participant filesystem
Client->>ops: ai_list_artifacts(offset, limit)
ops->>store: list_artifacts(workspace_dir, offset, limit)
store->>filesystem: read artifacts/ directory entries
store->>filesystem: read each meta.json
store->>store: sort by created_at descending
store-->>ops: (Vec<ArtifactMeta>, total)
ops-->>Client: JSON with artifacts, total, offset, limit
Client->>ops: ai_get_artifact(artifact_id)
ops->>store: get_artifact(workspace_dir, artifact_id)
store->>store: validate_artifact_id
store->>store: assert_within_root
store->>filesystem: read artifact/meta.json
store-->>ops: ArtifactMeta
ops->>ops: compute absolute_path
ops-->>Client: JSON with meta + absolute_path
Client->>ops: ai_delete_artifact(artifact_id)
ops->>store: delete_artifact(workspace_dir, artifact_id)
store->>store: validate and assert containment
store->>filesystem: remove artifact directory recursively
store-->>ops: success
ops-->>Client: JSON with artifact_id, deleted: true
sequenceDiagram
participant Client
participant schemas
participant ops
Client->>schemas: RPC request with function and params JSON
schemas->>schemas: load config with timeout
schemas->>schemas: parse/validate parameters
schemas->>ops: call ai_* operation
ops-->>schemas: RpcOutcome result
schemas->>schemas: convert to JSON
schemas-->>Client: JSON response
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.claude/memory.md:
- Line 217: Update the note to correct the behavior and point to the canonical
policy: clarify that assert_within_root does not canonicalize but only checks
prefix containment (starts_with), remove the instruction to “replicate this
pattern” for new domains, and instead direct teams to use the centralized file
I/O validation in src/openhuman/security/policy.rs (i.e., call validate_path /
validate_parent_path) while keeping validate_artifact_id's two-layer checks as
an example for artifact IDs only.
In `@src/openhuman/artifacts/ops.rs`:
- Around line 63-69: The current derivation of absolute_path uses meta.path
directly which allows absolute paths or parent-traversal to escape the artifacts
root; update the logic that builds absolute_path (the code touching
absolute_path, config.workspace_dir, and meta.path) to first validate/sanitize
meta.path: reject or canonicalize if Path::new(&meta.path).is_absolute() and
reject or disallow any Path::new(&meta.path).components() containing
Component::ParentDir, or alternatively rebuild a safe relative PathBuf by
filtering components to only Normal components before joining; if validation
fails return an error instead of joining, then join the sanitized relative path
to config.workspace_dir.join("artifacts") and convert to string as before.
In `@src/openhuman/artifacts/schemas.rs`:
- Around line 87-92: The output schema for get_artifact currently uses
TypeSchema::Ref("ArtifactMeta") but the comment claims an extra absolute_path
field; update the contract to match by replacing the output FieldSchema type
with an explicit schema that includes ArtifactMeta plus absolute_path (either by
changing the type to Ref("ArtifactMetaWithPath") and adding that new type
definition, or by using an inline TypeSchema::Object that spreads ArtifactMeta
fields and adds absolute_path: TypeSchema::String required: true); ensure the
FieldSchema name stays "artifact" and adjust any schema registry entries that
reference ArtifactMeta if you introduce ArtifactMetaWithPath.
In `@src/openhuman/artifacts/store.rs`:
- Around line 26-57: The validator validate_artifact_id currently allows the
single-dot "." which leads to delete_artifact calling remove_dir_all on the
artifacts root; update validate_artifact_id to reject "." (and also reject "./"
and ".\\" to be safe) by returning an Err with a clear message (same format as
other errors), so delete_artifact and any other callers cannot use "." as an
artifact_id and accidentally operate on the whole artifacts tree.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 37ca5f18-69fc-4ea8-ad37-543ac0921a5a
📒 Files selected for processing (10)
.claude/memory.mdsrc/core/all.rssrc/openhuman/artifacts/mod.rssrc/openhuman/artifacts/ops.rssrc/openhuman/artifacts/ops_tests.rssrc/openhuman/artifacts/schemas.rssrc/openhuman/artifacts/store.rssrc/openhuman/artifacts/store_tests.rssrc/openhuman/artifacts/types.rssrc/openhuman/mod.rs
|
|
||
| - **Filesystem-backed persistence, no SQLite** — `src/openhuman/artifacts/` stores JSON metadata (`meta.json`) + binary blobs under `<workspace_dir>/artifacts/<uuid>/`. Pattern mirrors `memory/ops/files.rs` but simpler. | ||
| - **`"ai"` namespace in controller registry** — RPC methods are `openhuman.ai_list_artifacts`, `openhuman.ai_get_artifact`, `openhuman.ai_delete_artifact`. Future `ai_*` methods should use this same namespace. | ||
| - **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain. |
There was a problem hiding this comment.
Correct the security guidance and assert_within_root description.
This line is currently misleading in two ways: assert_within_root here does not canonicalize paths (it checks starts_with), and advising teams to “replicate this pattern” conflicts with the documented policy that file I/O validation must go through src/openhuman/security/policy.rs (validate_path / validate_parent_path). Please update this note to avoid steering future domains toward a non-policy pattern.
Suggested doc fix
-- **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain.
+- **Artifacts path safety (domain-specific)** — this domain currently uses (1) `validate_artifact_id` input checks and (2) `assert_within_root` containment checks before filesystem operations. Note: `assert_within_root` is a containment check (not canonicalization). For new filesystem-backed domains, follow the security policy entry points in `src/openhuman/security/policy.rs` (`validate_path` / `validate_parent_path`).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - **Two-layer path validation required** — (1) `validate_artifact_id` rejects empty strings, `/`, `\`, `..`, absolute Unix paths, Windows `C:` and UNC `\\` paths; (2) `assert_within_root` canonicalizes and checks containment. Replicate this pattern for any new filesystem-backed domain. | |
| - **Artifacts path safety (domain-specific)** — this domain currently uses (1) `validate_artifact_id` input checks and (2) `assert_within_root` containment checks before filesystem operations. Note: `assert_within_root` is a containment check (not canonicalization). For new filesystem-backed domains, follow the security policy entry points in `src/openhuman/security/policy.rs` (`validate_path` / `validate_parent_path`). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.claude/memory.md at line 217, Update the note to correct the behavior and
point to the canonical policy: clarify that assert_within_root does not
canonicalize but only checks prefix containment (starts_with), remove the
instruction to “replicate this pattern” for new domains, and instead direct
teams to use the centralized file I/O validation in
src/openhuman/security/policy.rs (i.e., call validate_path /
validate_parent_path) while keeping validate_artifact_id's two-layer checks as
an example for artifact IDs only.
| // Compute absolute path for the caller's convenience. | ||
| let absolute_path = config | ||
| .workspace_dir | ||
| .join("artifacts") | ||
| .join(&meta.path) | ||
| .to_string_lossy() | ||
| .into_owned(); |
There was a problem hiding this comment.
Harden absolute_path derivation against unsafe meta.path.
meta.path is used directly in join(...). If it is absolute (or contains parent traversal), the computed absolute_path can escape the artifacts root.
Suggested fix
+ use std::path::{Component, Path};
+
+ let rel = Path::new(&meta.path);
+ if rel.is_absolute()
+ || rel
+ .components()
+ .any(|c| matches!(c, Component::ParentDir | Component::RootDir | Component::Prefix(_)))
+ {
+ return Err(format!(
+ "[artifacts] invalid artifact path for id={artifact_id}: {:?}",
+ meta.path
+ ));
+ }
+
+ let artifacts_root = config.workspace_dir.join("artifacts");
let absolute_path = config
- .workspace_dir
- .join("artifacts")
+ .workspace_dir
+ .join("artifacts")
.join(&meta.path)
.to_string_lossy()
.into_owned();📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Compute absolute path for the caller's convenience. | |
| let absolute_path = config | |
| .workspace_dir | |
| .join("artifacts") | |
| .join(&meta.path) | |
| .to_string_lossy() | |
| .into_owned(); | |
| // Compute absolute path for the caller's convenience. | |
| use std::path::{Component, Path}; | |
| let rel = Path::new(&meta.path); | |
| if rel.is_absolute() | |
| || rel | |
| .components() | |
| .any(|c| matches!(c, Component::ParentDir | Component::RootDir | Component::Prefix(_))) | |
| { | |
| return Err(format!( | |
| "[artifacts] invalid artifact path for id={artifact_id}: {:?}", | |
| meta.path | |
| )); | |
| } | |
| let artifacts_root = config.workspace_dir.join("artifacts"); | |
| let absolute_path = config | |
| .workspace_dir | |
| .join("artifacts") | |
| .join(&meta.path) | |
| .to_string_lossy() | |
| .into_owned(); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/ops.rs` around lines 63 - 69, The current derivation
of absolute_path uses meta.path directly which allows absolute paths or
parent-traversal to escape the artifacts root; update the logic that builds
absolute_path (the code touching absolute_path, config.workspace_dir, and
meta.path) to first validate/sanitize meta.path: reject or canonicalize if
Path::new(&meta.path).is_absolute() and reject or disallow any
Path::new(&meta.path).components() containing Component::ParentDir, or
alternatively rebuild a safe relative PathBuf by filtering components to only
Normal components before joining; if validation fails return an error instead of
joining, then join the sanitized relative path to
config.workspace_dir.join("artifacts") and convert to string as before.
| outputs: vec![FieldSchema { | ||
| name: "artifact", | ||
| ty: TypeSchema::Ref("ArtifactMeta"), | ||
| comment: "Artifact metadata plus absolute_path field.", | ||
| required: true, | ||
| }], |
There was a problem hiding this comment.
get_artifact output schema does not encode the documented absolute_path field.
The schema advertises artifact as ArtifactMeta, but the comment says it includes an extra absolute_path. That makes discovery/docs contract ambiguous for clients. Please model the extra field explicitly (or remove that claim if not returned).
Suggested schema shape
- outputs: vec![FieldSchema {
- name: "artifact",
- ty: TypeSchema::Ref("ArtifactMeta"),
- comment: "Artifact metadata plus absolute_path field.",
- required: true,
- }],
+ outputs: vec![FieldSchema {
+ name: "artifact",
+ ty: TypeSchema::Object {
+ fields: vec![
+ FieldSchema {
+ name: "meta",
+ ty: TypeSchema::Ref("ArtifactMeta"),
+ comment: "Artifact metadata.",
+ required: true,
+ },
+ FieldSchema {
+ name: "absolute_path",
+ ty: TypeSchema::String,
+ comment: "Resolved absolute file path for the artifact.",
+ required: true,
+ },
+ ],
+ },
+ comment: "Artifact metadata plus resolved absolute path.",
+ required: true,
+ }],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| outputs: vec![FieldSchema { | |
| name: "artifact", | |
| ty: TypeSchema::Ref("ArtifactMeta"), | |
| comment: "Artifact metadata plus absolute_path field.", | |
| required: true, | |
| }], | |
| outputs: vec![FieldSchema { | |
| name: "artifact", | |
| ty: TypeSchema::Object { | |
| fields: vec![ | |
| FieldSchema { | |
| name: "meta", | |
| ty: TypeSchema::Ref("ArtifactMeta"), | |
| comment: "Artifact metadata.", | |
| required: true, | |
| }, | |
| FieldSchema { | |
| name: "absolute_path", | |
| ty: TypeSchema::String, | |
| comment: "Resolved absolute file path for the artifact.", | |
| required: true, | |
| }, | |
| ], | |
| }, | |
| comment: "Artifact metadata plus resolved absolute path.", | |
| required: true, | |
| }], |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/schemas.rs` around lines 87 - 92, The output schema
for get_artifact currently uses TypeSchema::Ref("ArtifactMeta") but the comment
claims an extra absolute_path field; update the contract to match by replacing
the output FieldSchema type with an explicit schema that includes ArtifactMeta
plus absolute_path (either by changing the type to Ref("ArtifactMetaWithPath")
and adding that new type definition, or by using an inline TypeSchema::Object
that spreads ArtifactMeta fields and adds absolute_path: TypeSchema::String
required: true); ensure the FieldSchema name stays "artifact" and adjust any
schema registry entries that reference ArtifactMeta if you introduce
ArtifactMetaWithPath.
| fn validate_artifact_id(id: &str) -> Result<(), String> { | ||
| if id.is_empty() { | ||
| return Err("[artifacts] artifact_id must not be empty".to_string()); | ||
| } | ||
| if id.contains('/') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not contain '/': {id:?}" | ||
| )); | ||
| } | ||
| if id.contains('\\') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not contain '\\': {id:?}" | ||
| )); | ||
| } | ||
| if id == ".." || id.starts_with("../") || id.starts_with("..\\") { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be a path traversal: {id:?}" | ||
| )); | ||
| } | ||
| // Reject absolute paths (Unix /foo or Windows C:\foo / \\server\share) | ||
| if id.starts_with('/') || id.starts_with('\\') { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be an absolute path: {id:?}" | ||
| )); | ||
| } | ||
| // Reject Windows drive-letter paths like C: | ||
| if id.len() >= 2 && id.as_bytes()[1] == b':' { | ||
| return Err(format!( | ||
| "[artifacts] artifact_id must not be an absolute path: {id:?}" | ||
| )); | ||
| } | ||
| Ok(()) |
There was a problem hiding this comment.
Reject . as an artifact ID to prevent full-root deletion.
validate_artifact_id allows ".". That value flows into delete_artifact, where remove_dir_all(<workspace>/artifacts/.) can delete the entire artifacts tree.
Suggested fix
fn validate_artifact_id(id: &str) -> Result<(), String> {
if id.is_empty() {
return Err("[artifacts] artifact_id must not be empty".to_string());
}
+ if id == "." || id == ".." {
+ return Err(format!(
+ "[artifacts] artifact_id must not be a reserved path component: {id:?}"
+ ));
+ }
if id.contains('/') {
return Err(format!(
"[artifacts] artifact_id must not contain '/': {id:?}"
));
}
@@
- if id == ".." || id.starts_with("../") || id.starts_with("..\\") {
+ if id.starts_with("../") || id.starts_with("..\\") {
return Err(format!(
"[artifacts] artifact_id must not be a path traversal: {id:?}"
));
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/openhuman/artifacts/store.rs` around lines 26 - 57, The validator
validate_artifact_id currently allows the single-dot "." which leads to
delete_artifact calling remove_dir_all on the artifacts root; update
validate_artifact_id to reject "." (and also reject "./" and ".\\" to be safe)
by returning an Err with a clear message (same format as other errors), so
delete_artifact and any other callers cannot use "." as an artifact_id and
accidentally operate on the whole artifacts tree.
oxoxDev
left a comment
There was a problem hiding this comment.
Author: @graycyrus (
MEMBER— core team)
Substantial new subsystem — artifact persistence layer with proper structure, tests, and schema definitions. Core architecture is sound but one critical security issue + three majors need addressing before merge. coderabbit caught all four; surfacing here to gate the CHANGES_REQUESTED.
Blocker — verified against current code
🔴 src/openhuman/artifacts/store.rs:26-57 — validate_artifact_id allows "." → full artifacts-root deletion
Coderabbit's finding is correct and I traced it end-to-end:
validate_artifact_idonly rejects..,../,..\\—.passesdelete_artifact(workspace, ".")→root.join(".")=rootitselfassert_within_root(root, root.join("."))→starts_withcheck passes (PathBuf normalizes"./"to justroot)tokio::fs::remove_dir_all(&artifact_dir)wipes the entire artifacts tree
Reproducer: ai_delete_artifact(&config, ".") would delete every artifact in the workspace.
Required fix — reject . explicitly + simplify the .. check (the standalone-equal .. case is already covered by the new arm):
fn validate_artifact_id(id: &str) -> Result<(), String> {
if id.is_empty() {
return Err("[artifacts] artifact_id must not be empty".to_string());
}
if id == "." || id == ".." {
return Err(format!(
"[artifacts] artifact_id must not be a reserved path component: {id:?}"
));
}
if id.contains('/') { /* unchanged */ }
if id.contains('\\') { /* unchanged */ }
if id.starts_with("../") || id.starts_with("..\\") {
return Err(format!(
"[artifacts] artifact_id must not be a path traversal: {id:?}"
));
}
// rest unchanged
}Also add a regression test:
#[test]
fn validate_artifact_id_rejects_dot() {
assert!(validate_artifact_id(".").is_err());
}
#[tokio::test]
async fn delete_artifact_rejects_dot_does_not_wipe_root() {
let tmp = tempfile::tempdir().unwrap();
save_artifact_meta(tmp.path(), &dummy_meta("real-id")).await.unwrap();
assert!(delete_artifact(tmp.path(), ".").await.is_err());
// root + sibling artifact still exist
assert!(tmp.path().join("artifacts/real-id/meta.json").exists());
}Major — verified
🟠 src/openhuman/artifacts/ops.rs:63-69 — absolute_path derivation allows escape via meta.path
meta.path is joined directly without validation:
let absolute_path = config.workspace_dir.join("artifacts").join(&meta.path)If a stored ArtifactMeta.path is absolute or contains .. (a corrupted/malicious meta.json that survived disk inspection), absolute_path escapes the artifacts root and is returned to the caller. The caller may then pass it to file-read tools.
Coderabbit's suggested fix (reject is_absolute() + Component::ParentDir / RootDir / Prefix) is correct. Apply it.
🟠 src/openhuman/artifacts/schemas.rs:87-92 — get_artifact schema contract mismatch
Output schema advertises artifact as Ref("ArtifactMeta") but the doc-comment + code adds an absolute_path field. Discovery clients won't know about the extra field. Coderabbit's TypeSchema::Object shape fix is right.
🟠 .claude/memory.md:217 — security doc steers future domains away from security/policy.rs
The memory note advises future authors to "replicate this pattern" for new filesystem-backed domains. That conflicts with the documented policy that file I/O validation should route through src/openhuman/security/policy.rs::validate_path_within_root (verified at line 2030). The artifacts module reimplements containment checks locally instead of delegating — that's a defensible per-module choice, but the memory note shouldn't propagate it as the recommended pattern.
Coderabbit's doc rewrite (clarify the local pattern is domain-specific; new domains should use policy.rs entry points) is correct.
Stronger suggestion — consider whether this PR should also be migrated to use validate_path_within_root:
// replace assert_within_root with the policy-canonical entry point
let resolved = crate::openhuman::security::policy::validate_path_within_root(
&artifact_dir, &root
)?;validate_path_within_root canonicalizes both paths before starts_with, which closes the symlink-escape gap that assert_within_root (pure starts_with) doesn't catch. If <workspace>/artifacts/<id> is a symlink pointing outside the workspace, this PR's check passes; the canonical check would catch it. Worth at minimum a comment explaining why symlink resolution was deemed unnecessary if the local check stays.
Minor / nitpicks
src/openhuman/artifacts/store.rs:233—_assert_status_useddead-code helper feels like a smell; ifArtifactStatusis only referenced in tests, mark the type#[cfg_attr(not(test), allow(dead_code))]or move the test-only usage into a#[cfg(test)]block that uses it directly.src/openhuman/artifacts/store.rs:18—artifacts_rootcallscreate_dir_allon every invocation including read paths (list_artifacts,get_artifact). Mildly wasteful + races on first concurrent call. Splitting intoartifacts_root_create()(write paths) vsartifacts_root_existing()(read paths) would be cleaner.
Verified / looks good
- Module structure (
types/store/ops/schemas/mod) is clean and matches openhuman conventions - Pagination (
offset/limitwithMAX_LIMIT=200) follows established RPC patterns - Corrupt-meta skipping with
log::warn!is the right call (vs propagating) ArtifactKind::parse/ArtifactStatus::parsecorrectly fall back toOther/Pendingon unknown values- Test scaffolding present (
store_tests.rs,ops_tests.rs) - 1214 lines all-new with no deletions — clean greenfield
- coderabbit CHANGES_REQUESTED captures all the security issues
Recommendation
CHANGES_REQUESTED — primarily on the critical . deletion vector. Once that + the two security majors (ops absolute_path, memory doc) land, this is approve-ready.
Summary
artifactsdomain (src/openhuman/artifacts/) withArtifactMetastruct,ArtifactKindenum (Presentation, Document, Image, Other), andArtifactStatusenum (Pending, Ready, Failed)<workspace>/artifacts/<uuid>/meta.json+ binary blobs with lazy directory creationai_list_artifacts(paginated),ai_get_artifact(metadata + absolute path),ai_delete_artifact(removes metadata + blob)Test plan
cargo checkpassescargo test -p openhuman --lib -- "artifacts"— 38/38 tests passcargo fmt --checkcleanChecklist
[artifacts]prefix on all operationsCloses #2776
Summary by CodeRabbit
New Features