Current public helper release:
v0.1.2 — Bounded Public Helper Pre-Release
v0.1.2 supersedes v0.1.1 as the current public helper route.
This release is research-stage, public-helper-only, synthetic/sample-data-first, raw-data-non-public, non-clinical, non-diagnostic, non-therapeutic, non-surveillance, not Sal-Meter, not CAIS compliance, not a validated benchmark, and not production readiness.
A research-stage public helper repository for measuring what AI leaves behind in the human state.
Most AI benchmarks ask whether AI outputs are correct, safe, helpful, or aligned.
The Proxy Benchmark Track asks a different question:
What did the AI output leave behind in the human state?
And in a dyadic session:
Did the AI help both people move toward recovery, or did it improve one side while burdening, silencing, or exposing the other?
Current release: v0.1.2 — Bounded Public Helper Pre-Release
v0.1.2 is the current bounded public helper pre-release for the Human-State Proxy Benchmark Track.
It supersedes v0.1.1 for the current public helper route.
Release route:
https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.2
This release is a research-stage public helper release.
It is:
- public-helper-only
- synthetic/sample-data-first
- raw-data-non-public
- non-clinical
- non-diagnostic
- non-therapeutic
- non-surveillance
- not Sal-Meter
- not Proxy Sal-Meter
- not CAIS compliance
- not a validated benchmark
- not validated mediation
- not device readiness
- not production readiness
This release may be used to understand the current public helper structure, synthetic evaluator direction, schema boundary, sample-data boundary, and public/private data separation.
It must not be cited or described as validation of real human-state measurement, dyadic recovery, AI mediation effectiveness, clinical readiness, commercial readiness, device readiness, Sal-Meter readiness, or CAIS compliance.
Public examples in this repository must remain limited to:
- synthetic data
- sample data
- schemas
- mock packets
- toy examples
- placeholder flows
- evaluator helpers
- documentation scaffolds
The following must not be placed in this public repository:
- raw human data
- private pilot data
- confidential advisor material
- SRA material
- reviewer memos
- consent records
- real participant records
- controlled-access evidence packages
The Proxy Benchmark Track evaluates what an AI output leaves behind after it acts.
It does not primarily ask whether the AI answer was correct, fluent, persuasive, emotionally pleasant, or superficially helpful.
It asks whether the output changed downstream human-state burden, recovery direction, dyadic stability, and termination readiness inside a bounded, consent-based, non-clinical, non-surveillance session.
AI Output → Human-State Delta → Dyadic Recovery → Recovery / Termination Gate
This repository therefore focuses on consequence, not performance theater.
For dyadic interaction, the core question is:
Did both sides move toward recovery, or did one side become silent, exposed, burdened, coerced, or erased?
This section does not validate human-state measurement, dyadic recovery, mediation effectiveness, clinical status, diagnostic use, therapeutic use, surveillance use, Sal-Meter status, CAIS compliance, or production readiness.
Status: research-stage · public-helper-only · synthetic/sample-data-first · raw-data-non-public · non-clinical · non-diagnostic · non-therapeutic · non-surveillance · non-counseling · non-coercive · pre-validation · pre-device · pre-certification · pre-compliance · benchmark-support-only
This repository is a public helper surface for the Human-State Proxy Benchmark Track.
It is:
- not the Sal-Meter core signal track
- not a Proxy Sal-Meter
- not a CAIS-compliant device implementation
- not a validated consciousness measurement system
- not a validated benchmark
- not validated mediation
- not validated dyadic recovery
- not validated termination-gate accuracy
- not a clinical, diagnostic, therapeutic, psychiatric, medical, counseling, employment, insurance, legal, educational, eligibility, mediation-service, or surveillance system
- not a certification, conformance, or mark-usage surface
- not a closed-loop intervention system
- not a production monitoring system
- not a phone monitoring system
- not a replay validation system
- not a relationship-verdict system
- not a human-ranking system
- not a place to publish raw human data
This repository may contain public-safe helper materials only:
- synthetic data
- sample data
- schemas
- mock packets
- toy examples
- placeholder flows
- evaluator helpers
- simulator scaffolds
- replay scaffolds
- documentation scaffolds
This repository must not contain:
- raw human data
- identifiable human data
- real participant records
- real dyadic conflict records
- real session records
- real phone recordings
- real call transcripts
- private consent records
- clinical records
- raw biosignals
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- production intervention logs
- device-readiness evidence
- production-readiness evidence
- certification evidence
A closed session must stay closed.
A replay must not reopen a closed session.
A helper structure is not evidence.
A validator is not proof.
https://salpida.foundation/topics/human-state-aware-ai-interaction/
The Sal-Meter Core Track asks whether a new molecular–electrochemical signal interface can produce stable, repeatable, auditable signal behavior under the CAIS / Sal-Meter kernel program.
Current core execution order:
External Layer-0 iodine redox / thiol feasibility
→ SICS Internal Phase 0 — G-only
→ Phase 1 — I-only
→ Phase 2a — Twin Mini-Cell
→ Phase 2b — G+I human pilot
→ LOCK 1 / LOCK 2
→ Future SDK / broader opening
Core technical route:
https://github.com/salpida-foundation/sal-meter-kernel-program
The Proxy Benchmark Track prepares the comparison, interaction, and mediation-evaluation layer.
It uses existing proxy signals and synthetic/sample helper structures to prepare synchronized benchmark infrastructure before future Sal-Meter I/G-channel inputs become available.
The proxy track supports the core track.
It does not replace it.
Most AI evaluation looks at the output.
This repository is built around the consequence.
It asks:
What remains in the human state after AI acts?
For two-person interaction, the sharper question is:
Did both sides move toward recovery,
or did one side become silent, exposed, burdened, coerced, or erased?
This repository is not another chatbot project.
It is a public helper surface for a future human-state-aware AI mediation benchmark.
This repository is a public technical helper surface.
It accompanies DOI-registered public records.
It does not replace them.
GitHub helps builders move.
DOI records govern authority.
If this GitHub repository or release conflicts with a DOI-registered SICS / CAIS / Sal-Meter / CCF canonical record or a formally issued SICS determination, the stricter DOI-registered canonical record or SICS determination controls.
Defines public boundary, naming rules, prohibited claims, data-publication limits, roadmap logic, GitHub helper status, and Go / Hold / No-Go structure.
Version DOI:
https://doi.org/10.5281/zenodo.19837423
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19837422
Explains Human-State Cost, AI performance versus human-state impact, measurement-layer simplification, and future Sal-Meter A/B comparison logic.
Version DOI:
https://doi.org/10.5281/zenodo.19837971
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19837970
Fixes the outer boundary: consent-based, non-clinical, non-surveillance, raw-data-non-public.
Version DOI:
https://doi.org/10.5281/zenodo.19904289
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19904288
Fixes the minimum packet object: summary-only sharing, permission, expiry, confidence, data quality, and raw-data exclusion.
Version DOI:
https://doi.org/10.5281/zenodo.19905541
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19905540
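The packet properties named above (summary-only sharing, permission, expiry, confidence, data quality, and raw-data exclusion) can be illustrated with a minimal synthetic sketch. The class name, field names, and value ranges below are hypothetical and are not taken from schemas/human_state_packet.schema.json; the helper schema in the repository remains the reference for the public structure.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PacketSketch:
    """Hypothetical, synthetic-only stand-in for a Human-State Packet summary."""
    summary: str               # summary-only content, never raw signal data
    permission_granted: bool
    expires_at: datetime
    confidence: float          # assumed range: 0.0 to 1.0
    data_quality: str          # assumed labels: "ok" or "degraded"
    contains_raw_data: bool


def packet_problems(packet: PacketSketch, now: datetime) -> list:
    """Return the reasons a packet should not be shared; an empty list means none were found."""
    problems = []
    if not packet.permission_granted:
        problems.append("permission missing")
    if now >= packet.expires_at:
        problems.append("packet expired")
    if not 0.0 <= packet.confidence <= 1.0:
        problems.append("confidence out of range")
    if packet.data_quality != "ok":
        problems.append("data quality not ok")
    if packet.contains_raw_data:
        problems.append("raw data present")
    return problems
```

An empty result here only means the hypothetical structural checks found nothing; it is not evidence about any real human state.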
Fixes the benchmark objective:
AI Output → Human-State Delta → Dyadic Recovery
Version DOI:
https://doi.org/10.5281/zenodo.19906725
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19906724
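As a rough illustration of the objective above, the sketch below classifies a synthetic two-person outcome so that one-sided improvement is never reported as dyadic recovery. The function name, the labels, and the convention that a negative delta means reduced burden are all assumptions made for this example, not part of the DOI record or the repository evaluators.

```python
def dyadic_direction(delta_a: float, delta_b: float, tolerance: float = 0.0) -> str:
    """Classify a synthetic dyadic outcome from two hypothetical burden deltas.

    Assumed convention: a negative delta means burden went down (toward
    recovery); a positive delta means burden went up.
    """
    deltas = (delta_a, delta_b)
    recovering = [d < -tolerance for d in deltas]
    burdened = [d > tolerance for d in deltas]

    if all(recovering):
        return "both_toward_recovery"
    if any(recovering):
        # Exactly one side improved; the other is flat or more burdened.
        return "one_sided_improvement"
    if any(burdened):
        return "burden_increased"
    return "no_clear_change"
```

The `one_sided_improvement` label is kept separate on purpose: averaging the two deltas would hide the case where one side improves while the other is burdened.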
Fixes the session structure:
Session Creation
→ Consent Confirmation
→ Packet Availability Check
→ Baseline State Summary
→ AI Output
→ Post-Output State Summary
→ Human-State Delta
→ Recovery Gate
→ Termination Gate
→ Session Closure
→ Audit Log
Version DOI:
https://doi.org/10.5281/zenodo.19908379
Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19908378
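The session structure above is a strict sequence, which can be sketched as a small ordered state holder. The stage names are copied from the flow above; the class name `SessionFlowSketch` and its methods are hypothetical and do not mirror phone-only-simulator/phone-session-state-machine.json, which may allow or forbid transitions differently.

```python
STAGES = (
    "Session Creation",
    "Consent Confirmation",
    "Packet Availability Check",
    "Baseline State Summary",
    "AI Output",
    "Post-Output State Summary",
    "Human-State Delta",
    "Recovery Gate",
    "Termination Gate",
    "Session Closure",
    "Audit Log",
)


class SessionFlowSketch:
    """Synthetic-only walk through the stages, one step at a time, in order."""

    def __init__(self):
        self._index = 0
        self.history = [STAGES[0]]

    @property
    def stage(self) -> str:
        return STAGES[self._index]

    def advance(self, to_stage: str) -> str:
        """Move to the next stage; reject skips, repeats, and moves past the end."""
        next_index = self._index + 1
        expected = STAGES[next_index] if next_index < len(STAGES) else None
        if to_stage != expected:
            raise ValueError(
                f"cannot move from {self.stage!r} to {to_stage!r}; expected {expected!r}"
            )
        self._index = next_index
        self.history.append(to_stage)
        return to_stage
```

In this sketch, jumping straight from Session Creation to AI Output raises `ValueError`, since consent confirmation and the packet check have not happened yet.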
Current public helper release.
v0.1.2 is the current bounded public helper pre-release for the Human-State Proxy Benchmark Track.
It supersedes v0.1.1 for the current public helper route.
Release route:
https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.2
Boundary:
- research-stage only
- public-helper-only
- synthetic/sample-data-first
- raw-data-non-public
- non-clinical
- non-diagnostic
- non-therapeutic
- non-surveillance
- not Sal-Meter
- not Proxy Sal-Meter
- not CAIS compliance
- not a validated benchmark
- not validated mediation
- not device readiness
- not production readiness
This release may be used to understand the current public helper structure, synthetic evaluator direction, schema boundary, sample-data boundary, and public/private data separation.
It must not be cited or described as validation of real human-state measurement, dyadic recovery, AI mediation effectiveness, clinical readiness, commercial readiness, device readiness, Sal-Meter readiness, or CAIS compliance.
v0.1.1 is a prior post-validator-pass public helper release.
It superseded v0.1.0 for helper-structure validation status, but it is no longer the current public helper route after publication of v0.1.2.
Use v0.1.2 for the current bounded public helper pre-release boundary.
v0.1.1 confirmed only that the public synthetic/sample package validator could run and report helper-structure PASS / FAIL.
It did not validate benchmark performance.
It did not validate scientific truth.
It did not validate Sal-Meter.
It did not grant CAIS compliance.
It did not certify any system, model, dataset, dashboard, laboratory, device, repository, schema, session protocol, implementation, or mediation system.
Prior release route:
https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.1
v0.1.0 was the initial bounded public helper pre-release.
It documented the public helper structure before post-validator correction.
It remains part of the project history, but it is not the current public helper route.
This repository is currently in a public helper implementation stage for the SICS Human-State Proxy Benchmark Track.
It provides:
- schema helper structures;
- synthetic/sample data;
- P3 synthetic dyadic helper package;
- P4 synthetic dyadic demo-flow package;
- P4-1 synthetic dyadic recovery demo-flow evaluator;
- P5-3 synthetic AI Output A/B consequence evaluator helper;
- synthetic AI Output A/B comparison support;
- proxy-only output comparison logic without OE / RE / EE, VCE / CRI / CFI, Sal-Meter, CAIS compliance, validation, certification, device-readiness, or production-readiness claims;
- P4-2 mediation policy prompt pack;
- P4-3 synthetic termination-gate helper case package;
- P4-3 synthetic termination-gate helper evaluator;
- P4-4 phone-only simulator scaffold;
- P4-4 phone-only session flow wireframe;
- P4-4 synthetic phone-session state-machine mockup;
- P4-4 synthetic sample phone-session script;
- P4-5 synthetic session replay scaffold;
- P4-5 synthetic replay manifest;
- P4-5 synthetic replay event timeline;
- P4-5 synthetic replay boundary document;
- validation scaffolding;
- P3 helper-schema validation;
- synthetic demo-flow consistency checking;
- synthetic termination-gate helper consistency checking;
- boundary language linting;
- dashboard mockup boundaries;
- protocol helper rules;
- closed-loop demo-lite boundary scaffolding;
- replication guide checklists;
- contributor issue / PR templates;
- Human-State-Aware AI Mediation helper documents;
- GitHub Actions helper-structure validation workflow;
- bounded prompt / policy scaffolding for synthetic mediation simulation.
It does not provide benchmark evidence.
It does not provide raw human data.
It does not provide Sal-Meter input.
It does not grant CAIS compliance.
It does not validate Sal-Meter.
It does not validate mediation.
It does not validate dyadic recovery.
It does not validate termination-gate accuracy.
It does not validate synthetic session replay.
It does not certify device readiness.
It does not certify production readiness.
It does not authorize production closed-loop intervention.
The phone-only simulator is a public helper scaffold only.
The synthetic session replay skeleton is a public helper scaffold only.
It is not a real phone monitoring system.
It is not a real session replay system.
It is not a real transcript replay system.
It is not a clinical system.
It is not a diagnostic system.
It is not a therapeutic system.
It is not a counseling system.
It is not a mediation-service system.
It is not a surveillance system.
A closed session must stay closed.
A replay must not reopen a closed session.
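The two rules above can be sketched with a synthetic session object that refuses new events once closed, and a replay function that only reads a copy. All names here (`SyntheticSession`, `ClosedSessionError`, `replay`) are hypothetical illustrations and are not the contents of synthetic-session-replay/.

```python
class ClosedSessionError(RuntimeError):
    """Raised when something tries to write to a closed synthetic session."""


class SyntheticSession:
    def __init__(self, session_id: str):
        self.session_id = session_id
        self._events = []
        self._closed = False

    @property
    def closed(self) -> bool:
        return self._closed

    def record(self, event: str) -> None:
        """Append a synthetic event; not allowed after closure."""
        if self._closed:
            raise ClosedSessionError("a closed session must stay closed")
        self._events.append(event)

    def close(self) -> None:
        # Deliberately one-way: this sketch offers no method to reopen.
        self._closed = True

    def events(self) -> tuple:
        """Return an immutable copy so callers cannot alter the record."""
        return tuple(self._events)


def replay(session: SyntheticSession) -> tuple:
    """Read back the recorded events without changing the session state."""
    return session.events()
```

Because `replay` works on an immutable copy and the class has no reopen path, replaying a closed session leaves it closed.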
| Work item | Status | Notes |
|---|---|---|
| Governance boundary files | Present | Public/private data boundary and prohibited-claim discipline are represented in the repository |
| Schema completion | Done | schemas/ contains public helper schemas for metadata, event markers, streams, labels, QC, features, splits, Human-State Packet, Dyadic Session Event, and Benchmark Session Container helper structures |
| Human-State Packet JSON helper schema | Done | schemas/human_state_packet.schema.json defines a public helper schema for synthetic Human-State Packets |
| Dyadic Session Event JSON helper schema | Done | schemas/dyadic_session_event.schema.json validates one public-safe synthetic/sample dyadic session boundary event |
| Benchmark Session JSON helper schema | Done | schemas/benchmark_session.schema.json validates one public-safe synthetic/sample benchmark session container |
| Synthetic sample package | Present / Passed validator | sample-data/synthetic-session-001/ contains a public synthetic/sample structure package that passes helper-structure validation |
| Synthetic dyadic helper package | Present / Passed P3 helper-schema validation | sample-data/synthetic-dyadic-session-001/ contains Human-State Packet A/B, Dyadic Session Event, and Benchmark Session Container examples |
| Synthetic dyadic demo-flow package | Present / Passed P4-1 evaluator | sample-data/synthetic-dyadic-session-001/ contains ai_outputs.json, dyadic_delta.json, recovery_gate.json, termination_gate.json, and audit_log.json examples |
| P4-1 dyadic recovery demo evaluator | Present / Passed | evaluation-baseline/evaluate_dyadic_recovery_demo.py checks synthetic demo-flow consistency only |
| P5-3 synthetic AI Output A/B consequence evaluator helper | Present / Helper-stage | evaluation-baseline/evaluate_ai_output_ab.py supports synthetic AI Output A/B consequence comparison using proxy-only metrics; it does not validate real AI impact, real human-state measurement, mediation effectiveness, Sal-Meter, CAIS compliance, benchmark validation, device readiness, or production readiness |
| P4-2 mediation policy prompt pack | Present | prompts/ contains README.md and mediation_policy_v0.1.json; docs/mediation-policy-prompt-pack.md documents private cue, shared mediation output, false recovery prevention, and termination boundary logic |
| P4-3 synthetic termination-gate helper case package | Present / Passed P4-3 evaluator | sample-data/synthetic-dyadic-session-001/ contains termination_gate_cases.json with synthetic pause, narrow, close, terminate, refresh, and audit-only helper cases |
| P4-3 termination gate demo evaluator | Present / Passed | evaluation-baseline/evaluate_termination_gate_demo.py checks synthetic termination-gate helper consistency only |
| P4-4 phone-only simulator scaffold | Present | phone-only-simulator/ contains a public-safe, synthetic-only phone-session simulator helper package |
| P4-4 phone-only simulator README | Present | phone-only-simulator/README.md defines folder boundary, intended files, public data boundary, P4-3 relationship, and final rule |
| P4-4 phone session flow wireframe | Present | phone-only-simulator/session-flow-wireframe.md defines consent, packet check, baseline summary, AI output, Human-State Delta, Recovery Gate, Termination Gate, closure, and audit screens |
| P4-4 phone session state machine | Present | phone-only-simulator/phone-session-state-machine.json defines synthetic-only states, allowed transitions, forbidden transitions, allowed decisions, prohibited decisions, and boundary flags |
| P4-4 sample phone session script | Present | phone-only-simulator/sample-phone-session-script.md provides a synthetic sample script showing consent, packet availability, AI output, delta review, recovery gate, termination gate, closure, and audit flow |
| P4-5 synthetic session replay scaffold | Present | synthetic-session-replay/ contains a public-safe, synthetic-only session replay helper scaffold |
| P4-5 synthetic replay README | Present | synthetic-session-replay/README.md defines replay scaffold purpose, scope, intended files, public data boundary, P4-4 relationship, closed-session replay rule, and final rule |
| P4-5 synthetic replay manifest | Present | synthetic-session-replay/replay-manifest.json defines replay source declaration, replay scope, boundary flags, replay flow, closed-session rule, allowed decisions, prohibited decisions, and success meaning |
| P4-5 synthetic replay event timeline | Present | synthetic-session-replay/replay-event-timeline.json defines synthetic replay sequence from manifest loading through source declaration, consent, packet review, AI output, delta, recovery gate, termination gate, closure, and audit |
| P4-5 synthetic replay boundary | Present | synthetic-session-replay/replay-boundary.md defines allowed replay materials, prohibited replay materials, prohibited replay claims, closed-session replay rule, replay interpretation, P4-4 relationship, and public release rule |
| Synthetic session README | Done | The original synthetic package includes a local README explaining file roles and boundaries |
| Synthetic dyadic session README | Done | The dyadic synthetic package includes a local README explaining P3 helper-schema, P4 demo-flow, and P4-3 termination-gate helper boundaries |
| Sample package validator | Present / Passed | evaluation-baseline/validate_sample_package.py provides helper-structure validation for the original synthetic package |
| P3 helper-schema validator | Present / Passed | evaluation-baseline/validate_p3_schemas.py validates the public synthetic P3 dyadic helper files against the Human-State Packet, Dyadic Session Event, and Benchmark Session schemas |
| Boundary language lint | Present / Passed advisory mode | evaluation-baseline/boundary_lint.py scans public helper wording for prohibited or risky boundary-language drift |
| Evaluation baseline README | Done | evaluation-baseline/README.md explains validator usage, P3 helper-schema validation, P4-1 demo-flow evaluation, P4-3 termination-gate helper evaluation, PASS / FAIL interpretation, dependency installation, and validation boundaries |
| Protocol helper boundary pack | Done | protocol-helper/ defines label, timestamp, metadata, Human-State Cost, and future Sal-Meter A/B comparison boundaries |
| Dashboard mockup boundary pack | Done | dashboard-mockup/ defines dashboard claim, field, and wireframe boundaries |
| Closed-loop demo-lite boundary pack | Done | closed-loop-demo-lite/ defines feedback-loop boundaries, event-log schema, and local placeholder code |
| Replication guide pack | Done | replication-guide/ defines reproducibility, metadata completeness, audit trail, and public release-readiness checklists |
| Issue / PR template pack | Done | .github/ISSUE_TEMPLATE/ and .github/pull_request_template.md define contributor boundary gates |
| GitHub Actions validator workflow | Passed / unchanged for P4-5 | .github/workflows/validate-synthetic-sample.yml runs the original sample validator, P3 helper-schema validator, P4 synthetic dyadic recovery demo-flow evaluator, P4-3 synthetic termination-gate helper evaluator, and boundary language lint; P4-5 currently adds documentation and replay scaffold only, not a new validator |
| Citation metadata | Present | CITATION.cff points citation toward DOI-registered public boundary records |
| Raw human data | Not present | Public repository examples must remain synthetic, mock, placeholder, or sample-structure-only |
| Sal-Meter input | Not present | This repository is not Sal-Meter and does not contain Sal-Meter signal data |
| CAIS compliance claim | Not present | This repository does not grant CAIS compliance |
| Benchmark validation | Not present | No model, dataset, dashboard, sensor stack, feedback loop, template, PR, validator, workflow, evaluator, phone-only simulator, replay scaffold, termination-gate helper case, or benchmark result is validated by this repository |
| Phone monitoring authority | Not present | The P4-4 phone-only simulator and P4-5 replay scaffold are not real phone monitoring systems and do not process real calls, raw audio, transcripts, or identifiable participant data |
| Replay validation authority | Not present | The P4-5 synthetic session replay scaffold does not validate replay, mediation, dyadic recovery, termination-gate accuracy, Sal-Meter, CAIS compliance, device readiness, or production readiness |
| Production closed-loop authority | Not present | No phone-only simulator file or replay scaffold file authorizes production mediation, monitoring, intervention, relationship verdicts, or human ranking |
| Release status | v0.1.2 published as bounded public helper pre-release | v0.1.2 is the current bounded public helper pre-release; v0.1.1 is now a prior post-validator-pass helper release |
| Milestone | Status | Notes |
|---|---|---|
| P1-1 Schema completion | Done | Schema folder contains helper schemas and schemas/README.md |
| P1-2 Synthetic sample package validator | Done | Validator file exists under evaluation-baseline/validate_sample_package.py |
| P1-3 Evaluation baseline README and validator usability | Done | Evaluation baseline README explains local usage, PASS / FAIL meaning, dependency installation, and validator boundaries |
| P1-4 GitHub Actions validator workflow | Done | Workflow completed successfully after GitHub Actions access was restored |
| P1-5 v0.1.0 release readiness package | Done | v0.1.0 was published as the initial bounded public helper pre-release; v0.1.1 superseded it for post-validator-pass helper-structure status; v0.1.2 is now the current bounded public helper pre-release |
| Milestone | Status | Notes |
|---|---|---|
| P2-1 Protocol helper boundary pack | Done | protocol-helper/ contains bounded helper rules for labels, timestamps, metadata completeness, Human-State Cost, and future Sal-Meter A/B comparison |
| P2-2 Dashboard mockup boundary pack | Done | dashboard-mockup/ contains README, claim boundary, sample dashboard fields, and mockup wireframe |
| P2-3 Closed-loop demo-lite boundary pack | Done | closed-loop-demo-lite/ contains README, feedback-loop boundary, feedback event-log schema, and local placeholder code |
| P2-4 Replication guide pack | Done | replication-guide/ contains README, reproducibility package checklist, metadata completeness checklist, audit trail checklist, and public release checklist |
| P2-5 Issue / PR template pack | Done | .github/ISSUE_TEMPLATE/ contains boundary correction, schema request, sample-data issue, and leakage-risk report templates; .github/pull_request_template.md defines PR boundary review |
P3 introduces the Human-State-Aware AI Mediation helper layer.
P3 helper documents and schemas have been completed through P3-17.
This remains a public helper layer.
It is not benchmark validation.
It is not Sal-Meter validation.
It is not CAIS compliance.
| Milestone | Status | Notes |
|---|---|---|
| P3-1 Human-State Mediation Layer | Done | docs/human-state-mediation-layer.md defines the public helper concept connecting AI Output, Human-State Delta, Dyadic Recovery, Human-State Packet, Recovery Gate, and Termination Gate |
| P3-2 Human-State Packet helper document | Done | docs/human-state-packet-schema.md defines the packet as a consent-bound, permission-bound, expiry-bound, confidence-aware, data-quality-aware, session-scoped, sharing-scoped, raw-data-excluding state-summary object |
| P3-2 Human-State Packet JSON helper schema | Done | schemas/human_state_packet.schema.json defines the machine-readable helper structure for public synthetic/sample packet examples |
| P3-3 Dyadic Recovery Baseline Suite B0-B7 | Done | docs/dyadic-recovery-baseline-suite.md defines baseline comparison logic from chance through recovery/termination gate baselines |
| P3-4 Recovery Gate Definition | Done | docs/recovery-gate-definition.md defines the gate for preventing false recovery and determining when mediation can reduce, pause, or stop |
| P3-5 Termination Gate Definition | Done | docs/termination-gate-definition.md defines the gate for consent withdrawal, permission expiry, data quality failure, high uncertainty, overstay prevention, session closure, and auditability |
| P3-6 Human-State Session Protocol | Done | docs/human-state-session-protocol.md defines a bounded, consent-based, permission-bound, audit-ready session lifecycle |
| P3-7 Dyadic Mediation Session Flow | Done | docs/dyadic-mediation-session-flow.md defines the dyadic session flow and preserves the rule that one-sided improvement is not dyadic recovery |
| P3-8 Consent and Data-Sharing Boundary | Done | docs/consent-and-data-sharing-boundary.md defines consent, permission, sharing, expiry, withdrawal, public/private data boundary, raw-data-non-public rule, and audit boundary |
| P3-9 Dyadic Session Event JSON helper schema | Done | schemas/dyadic_session_event.schema.json validates one public-safe synthetic/sample dyadic session boundary event |
| P3-10 Benchmark Session JSON helper schema | Done | schemas/benchmark_session.schema.json validates one public-safe synthetic/sample benchmark session container |
| P3-11 Schemas README alignment | Done | schemas/README.md distinguishes packet object, dyadic session event object, and benchmark session container |
| P3-12 Root README alignment | Done | Root README aligned with completed P3 helper documents and schemas |
| P3-13 Final P3 boundary audit | Done | docs/p3-final-boundary-audit.md records the final P3 boundary audit before release packaging |
| P3-14 v0.1.0 public helper release package | Done | docs/v0.1.0-public-helper-release-package.md prepares the bounded release package |
| P3-15 GitHub pre-release notes and publication gate | Done | docs/v0.1.0-github-pre-release-notes-and-publication-gate.md preserves release notes and publication gate language |
| P3-16 GitHub pre-release draft correction | Done | GitHub draft dependence was treated as unreliable; publication proceeded through a separate authorization gate |
| P3-17 Public pre-release publication authorization | Done | v0.1.0 was published as the initial public helper pre-release; v0.1.1 superseded it for post-validator-pass helper status; v0.1.2 is now the current bounded public helper pre-release |
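The termination conditions named in the P3-5 row above (consent withdrawal, permission expiry, data quality failure, high uncertainty, and overstay prevention) can be sketched as a simple check over synthetic inputs. The field names and both thresholds are assumptions chosen for illustration; docs/termination-gate-definition.md is the place to look for the actual helper definition.

```python
from dataclasses import dataclass


@dataclass
class GateInputsSketch:
    """Hypothetical synthetic inputs to a termination-gate check."""
    consent_withdrawn: bool = False
    permission_expired: bool = False
    data_quality_failed: bool = False
    uncertainty: float = 0.0       # assumed range: 0.0 to 1.0
    elapsed_minutes: float = 0.0


def termination_reasons(inputs: GateInputsSketch,
                        max_uncertainty: float = 0.5,
                        max_minutes: float = 30.0) -> list:
    """List every condition that would end the session; empty means none applies."""
    reasons = []
    if inputs.consent_withdrawn:
        reasons.append("consent withdrawn")
    if inputs.permission_expired:
        reasons.append("permission expired")
    if inputs.data_quality_failed:
        reasons.append("data quality failure")
    if inputs.uncertainty > max_uncertainty:
        reasons.append("high uncertainty")
    if inputs.elapsed_minutes > max_minutes:
        reasons.append("overstay")
    return reasons
```

Returning all reasons rather than the first one keeps the result useful for an audit entry. It says nothing about termination-gate accuracy on real sessions.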
P5 adds automation and machine-checkable helper gates around the public Proxy Benchmark Track helper surface.
This remains public-helper-only.
It is not benchmark validation.
It is not scientific validation.
It is not Sal-Meter validation.
It is not CAIS compliance.
It is not mediation validation.
It is not dyadic recovery validation.
It is not termination-gate accuracy validation.
It is not synthetic replay validation.
It is not certification.
It is not production readiness.
P4-4 adds a public phone-only simulator scaffold.
P4-5 adds a public synthetic session replay scaffold.
P4-4 and P4-5 are documentation and simulator / replay scaffolding only.
P4-4 and P4-5 are not part of the current P5 helper-validation chain; that changes only if a later validator or lint step is added for them.
| Milestone | Status | Notes |
|---|---|---|
| P5-0 Boundary language lint | Done / advisory mode | evaluation-baseline/boundary_lint.py and evaluation-baseline/prohibited_terms.json are implemented; GitHub Actions runs the boundary lint step in advisory mode |
| P5-1 P3 helper-schema validator | Done / Passed | evaluation-baseline/validate_p3_schemas.py validates the synthetic P3 dyadic helper files against human_state_packet.schema.json, dyadic_session_event.schema.json, and benchmark_session.schema.json |
| P5-1 synthetic dyadic helper package | Done / Passed | sample-data/synthetic-dyadic-session-001/ contains human_state_packet_A.json, human_state_packet_B.json, dyadic_session_event.json, and benchmark_session_container.json |
| P4-0 synthetic dyadic demo-flow package | Done / Passed | sample-data/synthetic-dyadic-session-001/ contains ai_outputs.json, dyadic_delta.json, recovery_gate.json, termination_gate.json, and audit_log.json |
| P4-1 synthetic dyadic recovery delta evaluator | Done / Passed | evaluation-baseline/evaluate_dyadic_recovery_demo.py evaluates synthetic demo-flow consistency only |
| P5-3 synthetic AI Output A/B consequence evaluator helper | Present / Helper-stage | evaluation-baseline/evaluate_ai_output_ab.py supports synthetic AI Output A/B consequence comparison using proxy-only metrics; it does not validate real AI impact, real human-state measurement, mediation effectiveness, Sal-Meter, CAIS compliance, benchmark validation, device readiness, or production readiness |
| P4-2 mediation policy prompt pack | Done | prompts/ contains README.md and mediation_policy_v0.1.json; docs/mediation-policy-prompt-pack.md documents private cue, shared mediation output, false recovery prevention, and termination boundary logic |
| P4-3 synthetic termination-gate helper case package | Done / Passed | sample-data/synthetic-dyadic-session-001/termination_gate_cases.json contains synthetic pause, narrow, close, terminate, refresh, and audit-only helper cases |
| P4-3 termination gate demo evaluator | Done / Passed | evaluation-baseline/evaluate_termination_gate_demo.py evaluates synthetic termination-gate helper consistency only |
| P5-1 documentation alignment | Done | schemas/README.md, sample-data/README.md, evaluation-baseline/README.md, and root README.md explain P3 helper-schema validation as helper-structure validation only |
| P4-3 documentation alignment | Done | sample-data/README.md, evaluation-baseline/README.md, and root README.md explain P4-3 termination-gate helper evaluation as synthetic helper consistency only |
| P4-4 phone-only simulator scaffold | Present / documentation only | phone-only-simulator/ contains public-helper documentation and simulator scaffolding only; it is not a validator and is not production monitoring |
| P4-4 phone-only simulator README | Present / documentation only | phone-only-simulator/README.md defines folder boundary, public data boundary, P4-3 relationship, and final rule |
| P4-4 phone session flow wireframe | Present / documentation only | phone-only-simulator/session-flow-wireframe.md defines synthetic consent, packet check, AI output, delta review, recovery gate, termination gate, closure, and audit screens |
| P4-4 phone session state machine | Present / synthetic mockup only | phone-only-simulator/phone-session-state-machine.json defines synthetic-only states, allowed transitions, forbidden transitions, allowed decisions, prohibited decisions, and boundary flags |
| P4-4 sample phone session script | Present / synthetic script only | phone-only-simulator/sample-phone-session-script.md provides a synthetic sample phone-session script without real audio, real transcript, real participant data, Sal-Meter input, CAIS compliance dossier, or production intervention logic |
| P4-5 synthetic session replay scaffold | Present / documentation and JSON scaffold only | synthetic-session-replay/ contains public-helper documentation, replay manifest, replay event timeline, and replay boundary only; it is not a validator and is not real session replay |
| P4-5 synthetic replay README | Present / documentation only | synthetic-session-replay/README.md defines replay scaffold purpose, scope, intended files, public data boundary, P4-4 relationship, closed-session replay rule, and final rule |
| P4-5 synthetic replay manifest | Present / synthetic manifest only | synthetic-session-replay/replay-manifest.json defines replay source declaration, replay scope, boundary flags, replay flow, closed-session rule, allowed decisions, prohibited decisions, and success meaning |
| P4-5 synthetic replay event timeline | Present / synthetic timeline only | synthetic-session-replay/replay-event-timeline.json defines synthetic replay sequence from manifest loading through source declaration, consent, packet review, AI output, delta, recovery gate, termination gate, closure, and audit |
| P4-5 synthetic replay boundary | Present / documentation only | synthetic-session-replay/replay-boundary.md defines allowed replay materials, prohibited replay materials, prohibited replay claims, closed-session replay rule, replay interpretation, P4-4 relationship, and public release rule |
Current P5 helper-validation chain:
The current helper-validation chain is:
- validate_sample_package.py
- validate_p3_schemas.py
- evaluate_dyadic_recovery_demo.py
- evaluate_termination_gate_demo.py
- boundary_lint.py
P5-3 adds a synthetic AI Output A/B consequence evaluator helper:
evaluation-baseline/evaluate_ai_output_ab.py
Current P5-3 status:
- Present / Helper-stage
- Not yet workflow-validated unless the GitHub Actions workflow explicitly runs it
- Not benchmark validation
- Not mediation validation
- Not production readiness
P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only metrics.
It does not validate:
- real AI impact
- real human-state measurement
- mediation effectiveness
- dyadic recovery
- termination-gate accuracy
- Sal-Meter
- CAIS compliance
- benchmark validation
- device readiness
- production readiness
P5-3 may be described as part of the GitHub Actions helper-validation chain only after the workflow explicitly runs:
python evaluation-baseline/evaluate_ai_output_ab.py
Until that workflow step is active, P5-3 must be described as:
- helper-stage
- not workflow-validated
- not benchmark validation
- not mediation validation
- not production readiness
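The status rule above can be sketched as a small text check. This is a minimal sketch, not a repository script: the function names are hypothetical, and it assumes the workflow file is available as plain text. It only looks for the command string on a non-comment line, so it cannot tell whether the step sits inside a disabled job.

```python
# Hypothetical helper for illustration; it is not part of this repository.
P5_3_COMMAND = "python evaluation-baseline/evaluate_ai_output_ab.py"


def workflow_runs_p5_3(workflow_text: str) -> bool:
    """Return True if a non-comment workflow line contains the P5-3 command."""
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a commented-out step does not count as an active step
        if P5_3_COMMAND in stripped:
            return True
    return False


def p5_3_status_labels(workflow_text: str) -> list[str]:
    """Return the wording labels P5-3 should carry for this workflow text."""
    labels = ["helper-stage"]
    if not workflow_runs_p5_3(workflow_text):
        labels.append("not workflow-validated")
    # These boundaries apply whether or not the workflow runs the script.
    labels += [
        "not benchmark validation",
        "not mediation validation",
        "not production readiness",
    ]
    return labels
```

A positive result from this check means only that the command appears in the workflow text. It does not change any of the boundaries listed above.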
P4-4 is not currently included in the validation chain.
P4-5 is not currently included in the validation chain.
evaluation-baseline/
boundary_lint.py
prohibited_terms.json
validate_p3_schemas.py
evaluate_dyadic_recovery_demo.py
evaluate_ai_output_ab.py
evaluate_termination_gate_demo.py
README.md
sample-data/
synthetic-dyadic-session-001/
README.md
human_state_packet_A.json
human_state_packet_B.json
dyadic_session_event.json
benchmark_session_container.json
ai_outputs.json
dyadic_delta.json
recovery_gate.json
termination_gate.json
audit_log.json
termination_gate_cases.json
These files support:
- P3 helper-schema validation
- P4-1 synthetic demo-flow consistency checking
- P5-3 synthetic AI Output A/B consequence helper comparison
- P4-3 synthetic termination-gate helper consistency checking
- boundary language linting
They do not support:
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- Sal-Meter validation
- CAIS compliance
- clinical readiness
- diagnostic readiness
- therapeutic readiness
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- phone monitoring authority
- production closed-loop authority
Correct boundary sentence:
Completed P5 helper-validation files support helper structure, schema checks, synthetic demo-flow consistency checks, P5-3 synthetic AI Output A/B consequence helper comparison, synthetic termination-gate helper consistency checks, and wording-boundary checks only; they do not create evidence, real AI impact validation, real human-state measurement validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, certification, Sal-Meter status, CAIS compliance, replay validation, phone monitoring authority, device readiness, production readiness, or production authority.
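A wording-boundary check of the kind mentioned above might look like the following. This is a minimal sketch under stated assumptions: the phrase list and the negation rule are hypothetical, and they do not describe what boundary_lint.py or prohibited_terms.json actually contain. The sketch flags a claim phrase only when no negation word appears before it on the same line.

```python
import re

# Hypothetical claim phrases; the real prohibited_terms.json may differ.
CLAIM_PHRASES = [
    "validated benchmark",
    "clinically validated",
    "production ready",
    "cais compliant",
]
# Assumed rule: a negation word earlier on the line bounds the claim.
NEGATION = re.compile(r"\b(not|no|never|non)\b")


def lint_line(line: str) -> list[str]:
    """Return claim phrases that appear on the line without a prior negation."""
    lowered = line.lower()
    hits = []
    for phrase in CLAIM_PHRASES:
        index = lowered.find(phrase)
        if index == -1:
            continue
        if NEGATION.search(lowered[:index]):
            continue  # e.g. "not a validated benchmark" stays inside the boundary
        hits.append(phrase)
    return hits


def lint_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for every unbounded claim phrase."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for phrase in lint_line(line):
            findings.append((number, phrase))
    return findings
```

A clean result from a check like this is a wording result only. It is not evidence, validation, or certification.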
The P4-4 phone-only simulator scaffold is a public helper scaffold only.
It may demonstrate synthetic phone-only session structure, but it does not process real calls, real audio, real transcripts, real participant data, or real session records.
Completed P4-4 files:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
These files support:
- phone-only simulator boundary documentation
- synthetic phone-session flow wireframe
- synthetic phone-session state-machine mockup
- synthetic sample phone-session script
- consent-first session entry representation
- packet availability check representation
- synthetic baseline summary representation
- synthetic AI output representation
- synthetic Human-State Delta review representation
- Recovery Gate placeholder representation
- Termination Gate placeholder representation
- closed-session rule visibility
- audit-log boundary visibility
- public data boundary visibility
They do not support:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- real session record processing
- clinical intake
- diagnosis
- therapy
- counseling
- mediation-service operation
- surveillance
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- phone-only simulator validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-4 scaffold files must remain:
- research-stage
- public-helper-only
- synthetic-only
- non-clinical
- non-diagnostic
- non-therapeutic
- non-counseling
- non-surveillance
- non-certification
- non-human-ranking
- not Sal-Meter
- not CAIS compliance
- not benchmark validation
- not mediation validation
- not dyadic recovery validation
- not termination-gate accuracy validation
- not phone monitoring authority
- not production readiness
- not production closed-loop
The phone-only simulator is not the phone call.
The sample phone-session script is not a transcript.
The phone-session state machine is not authority.
A closed session must stay closed.
Correct boundary sentence:
Completed P4-4 public simulator scaffold files may demonstrate synthetic phone-only session structure only; they do not create evidence, validation, certification, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
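The state-machine idea behind the P4-4 mockup can be sketched in a few lines. This is a synthetic-only sketch: the state names follow the wireframe screens listed for P4-4, but the transition table is an assumption and may not match phone-only-simulator/phone-session-state-machine.json.

```python
# Synthetic-only mockup. State names and transitions are assumptions for
# illustration and may differ from phone-session-state-machine.json.
ALLOWED_TRANSITIONS = {
    "consent": {"packet_check", "closure"},
    "packet_check": {"ai_output", "closure"},
    "ai_output": {"delta_review"},
    "delta_review": {"recovery_gate"},
    "recovery_gate": {"termination_gate"},
    "termination_gate": {"ai_output", "closure"},
    "closure": {"audit"},  # a closed session may only be audited
    "audit": set(),        # nothing follows the audit
}


def first_forbidden_transition(path):
    """Return (index, from_state, to_state) for the first forbidden step, or None."""
    # Assumed rule: every synthetic session starts at the consent state.
    if not path or path[0] != "consent":
        return (0, None, path[0] if path else None)
    for index in range(1, len(path)):
        previous, current = path[index - 1], path[index]
        if current not in ALLOWED_TRANSITIONS.get(previous, set()):
            return (index, previous, current)
    return None
```

For example, `first_forbidden_transition(["consent", "packet_check", "closure", "ai_output"])` reports the step from `closure` to `ai_output`, because a closed session must stay closed.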
The P4-5 synthetic session replay scaffold is a public helper scaffold only.
It may demonstrate synthetic session replay structure, but it does not process real sessions, real calls, real audio, real transcripts, real participant data, or real session records.
Completed P4-5 files:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
These files support:
- synthetic session replay boundary documentation
- synthetic replay manifest structure
- synthetic replay event timeline structure
- synthetic replay boundary rules
- replay source declaration representation
- consent boundary review representation
- packet boundary review representation
- synthetic AI output replay representation
- synthetic Human-State Delta replay representation
- Recovery Gate replay representation
- Termination Gate replay representation
- closure replay representation
- audit-only replay summary representation
- closed-session replay handling
- public data boundary visibility
They do not support:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- clinical replay
- diagnostic replay
- therapeutic replay
- counseling replay
- surveillance replay
- production mediation replay
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone monitoring validation
- replay validation authority
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-5 scaffold files must remain:
- research-stage
- public-helper-only
- synthetic-only
- replay-scaffold-only
- non-clinical
- non-diagnostic
- non-therapeutic
- non-counseling
- non-surveillance
- non-certification
- non-human-ranking
- not real session replay
- not real phone replay
- not real transcript replay
- not Sal-Meter
- not CAIS compliance
- not benchmark validation
- not mediation validation
- not dyadic recovery validation
- not termination-gate accuracy validation
- not synthetic replay validation
- not phone monitoring authority
- not replay validation authority
- not production readiness
- not production closed-loop
P4-5 scaffold files must not contain:
- raw human data
- identifiable human data
- real participant data
- real dyadic conflict records
- real session records
- real phone recordings
- real call transcripts
- real phone-session logs
- real transcript replay
- private consent records
- clinical records
- health records
- diagnostic labels
- therapeutic recommendations
- counseling notes
- relationship verdicts
- human scores
- human-ranking outputs
- raw biosignals
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- production intervention logs
- production monitoring logs
- device-readiness evidence
- production-readiness evidence
- certification evidence
A synthetic replay may document a closed session.
A synthetic replay must not reopen a closed session.
A synthetic replay must not continue mediation after closure.
A synthetic replay must not generate new AI output after closure.
A synthetic replay must not convert closure into recovery evidence.
A synthetic replay must not convert audit into certification.
The replay scaffold is not a real replay.
The replay manifest is not a session.
The replay event timeline is not the event.
The replay boundary is not authority.
Correct boundary sentence:
Completed P4-5 public replay scaffold files may demonstrate synthetic session replay structure only; they do not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
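The closed-session replay rule can be illustrated with a short timeline check. This is a minimal sketch, assuming each synthetic timeline entry is a dict with a `type` key; that shape and the event names are hypothetical and may not match replay-event-timeline.json.

```python
def replay_violations(timeline):
    """Return (index, event_type) for every non-audit event after closure.

    Assumed shape: each entry is a dict with a "type" key. After the first
    "closure" event, only "audit" events are accepted, so a second closure,
    new AI output, or continued mediation is reported.
    """
    violations = []
    closed = False
    for index, event in enumerate(timeline):
        kind = event["type"]
        if closed and kind != "audit":
            violations.append((index, kind))
        if kind == "closure":
            closed = True
    return violations
```

An empty result says only that the synthetic timeline respects the closed-session rule. It does not turn closure into recovery evidence or audit into certification.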
P3 defines the core Human-State-Aware AI Mediation helper architecture.
The P3 architecture connects AI output, bounded Human-State Packet use, session protocol, dyadic flow, Human-State Delta, Dyadic Delta, Recovery Gate, Termination Gate, consent boundary, session closure, and audit logging.
Architecture sequence:
- AI Output
- Human-State Packet
- Human-State Session Protocol
- Dyadic Mediation Session Flow
- Human-State Delta A/B
- Dyadic Delta
- Recovery Gate
- Termination Gate
- Consent and Data-Sharing Boundary
- Session Closure
- Audit Log
The Consent and Data-Sharing Boundary controls what may cross each step.
P3 defines the core helper architecture.
P4-4 does not replace this architecture.
P4-4 projects this architecture into a public-safe phone-only simulator scaffold.
P4-5 does not replace this architecture.
P4-5 projects this architecture into a public-safe synthetic replay scaffold.
P4-4 represents the same boundary logic through:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
P4-5 represents replay review of the same boundary logic through:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
The P4-4 phone-only simulator may demonstrate:
- consent-first session entry
- packet availability checking
- synthetic baseline state summary
- synthetic AI output
- synthetic Human-State Delta review
- Recovery Gate placeholder
- Termination Gate placeholder
- closed-session handling
- audit-log boundary
The P4-5 synthetic session replay scaffold may demonstrate:
- replay manifest loading
- replay source declaration
- synthetic event timeline review
- consent boundary review
- packet boundary review
- synthetic AI output replay
- synthetic Human-State Delta replay
- Recovery Gate replay
- Termination Gate replay
- closure replay
- audit-only replay summary
- closed-session replay handling
The P4-4 phone-only simulator must not imply:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- clinical intake
- diagnosis
- therapy
- counseling
- mediation-service operation
- surveillance
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
The P4-5 synthetic session replay scaffold must not imply:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- clinical replay
- diagnostic replay
- therapeutic replay
- counseling replay
- surveillance replay
- production mediation replay
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone monitoring validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-5 must not reopen a closed session.
P4-5 must not continue mediation after closure.
P4-5 must not convert closure into recovery evidence.
P4-5 must not convert audit replay into certification.
Correct boundary sentence:
P4-4 is a phone-only public helper projection of the P3 session architecture, and P4-5 is a synthetic replay scaffold for reviewing that structure after representation; neither creates evidence, validation, certification, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.
This section separates the three public-helper objects used in the Proxy Benchmark Track.
The three objects are:
- Human-State Packet
- Dyadic Session Event
- Benchmark Session Container
They are related, but they are not the same object.
They must not be merged.
They must not be treated as evidence, diagnosis, relationship judgment, human ranking, Sal-Meter output, CAIS compliance, or benchmark validation.
A Human-State Packet is a minimal, consent-bound, permission-bound, expiry-bound, confidence-aware, data-quality-aware, session-scoped, sharing-scoped, raw-data-excluding state-summary object.
It may summarize bounded session state.
It must not expose raw human data.
It must not expose identifiable human data.
It must not expose private participant records.
It must not contain clinical records, diagnostic labels, therapeutic recommendations, counseling notes, raw biosignals, raw Sal-Meter traces, or raw CAIS traces.
A Human-State Packet is:
- not the person
- not the body
- not the raw signal
- not diagnosis
- not therapy
- not an emotion verdict
- not a human score
- not a relationship judgment
- not Sal-Meter
- not CAIS compliance
- not benchmark validation
The packet is a bounded state-summary helper.
It is not authority.
A Dyadic Session Event is a public-safe synthetic/sample event object that records boundary events inside a dyadic session.
It may record synthetic or sample events such as:
- consent status
- permission status
- packet availability
- packet expiry
- sharing scope
- private cue status
- shared output status
- Human-State Delta A/B
- Dyadic Delta
- Recovery Gate decision
- Termination Gate decision
- session closure
- audit status
A Dyadic Session Event records boundary movement.
It does not record the body.
It does not record the full relationship.
It does not validate dyadic recovery.
It does not create a relationship verdict.
It does not create human ranking.
It does not authorize mediation, monitoring, surveillance, diagnosis, therapy, counseling, or production closed-loop intervention.
The event is a boundary record.
It is not the relationship.
A Benchmark Session Container is a public-safe synthetic/sample container that connects the helper objects inside a benchmark session structure.
It may connect:
- session metadata
- Human-State Packet references
- Dyadic Session Event references
- baseline suite status
- gate summaries
- leakage review
- holdout strategy
- audit status
- public release status
- authority status
- final boundary status
A Benchmark Session Container organizes the benchmark session structure.
It does not validate the benchmark.
It does not prove scientific truth.
It does not validate human-state measurement.
It does not validate dyadic recovery.
It does not validate mediation effectiveness.
It does not validate termination-gate accuracy.
It does not create Sal-Meter status.
It does not grant CAIS compliance.
It does not certify any model, dataset, dashboard, workflow, evaluator, simulator, replay scaffold, or mediation system.
Final distinction:
- The packet summarizes bounded state.
- The event records boundary movement.
- The container organizes the benchmark session structure.
The packet is not the person.
The event is not the relationship.
The container is not the truth.
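One way to keep the three objects separate is to give each its own type and let the container hold references only. This is a minimal sketch with hypothetical field names; the repository's P3 helper schemas may define these objects differently.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HumanStatePacket:
    """Bounded state summary. Hypothetical fields; no raw or identifying data."""
    packet_id: str
    session_id: str
    consent_granted: bool
    sharing_scope: str
    confidence: float
    state_summary: str


@dataclass(frozen=True)
class DyadicSessionEvent:
    """Boundary event record. It does not describe the relationship."""
    event_id: str
    session_id: str
    event_type: str


@dataclass
class BenchmarkSessionContainer:
    """Organizes references to packets and events without merging them."""
    session_id: str
    packet_refs: list[str] = field(default_factory=list)
    event_refs: list[str] = field(default_factory=list)

    def add_packet(self, packet: HumanStatePacket) -> None:
        if packet.session_id != self.session_id:
            raise ValueError("packet belongs to a different session")
        self.packet_refs.append(packet.packet_id)  # store the reference only

    def add_event(self, event: DyadicSessionEvent) -> None:
        if event.session_id != self.session_id:
            raise ValueError("event belongs to a different session")
        self.event_refs.append(event.event_id)  # store the reference only
```

Because the container stores identifiers rather than the objects themselves, packet content never ends up inside the container, which keeps the three objects from being merged.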
The benchmark chain describes how an AI output is evaluated by its downstream consequences.
It does not primarily evaluate whether the AI answer is fluent, persuasive, emotionally pleasant, or superficially helpful.
It evaluates what the AI output leaves behind in a bounded, consent-based, non-clinical, non-surveillance helper structure.
Benchmark chain:
- AI Output
- Human-State Delta
- Dyadic Recovery
- Recovery Gate / Termination Gate
This chain is public-helper-only.
It does not validate real human-state measurement.
It does not validate mediation effectiveness.
It does not validate dyadic recovery.
It does not validate termination-gate accuracy.
It does not create Sal-Meter status.
It does not grant CAIS compliance.
AI Output records what the AI generated inside a bounded session structure.
Examples include:
- generic AI output
- state-aware AI output
- private cue
- shared mediation output
- pause recommendation
- clarification request
- scope narrowing
- recovery check
- termination recommendation
AI Output is not sufficient evidence of recovery.
AI Output is not sufficient evidence of mediation effectiveness.
AI Output is not sufficient evidence of human-state improvement.
A good-sounding answer is not automatically a good consequence.
Human-State Delta describes the bounded proxy-observed change after the AI output.
It may describe whether the session state appears to move:
- toward recovery
- away from recovery
- unchanged
- mixed
- uncertain
- insufficient data
- invalid
Human-State Delta is not diagnosis.
Human-State Delta is not therapy.
Human-State Delta is not emotion reading.
Human-State Delta is not a human score.
Human-State Delta is not a relationship verdict.
Human-State Delta is a bounded benchmark observation.
It must remain proxy-only unless and until a controlled private pilot is separately authorized.
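The delta categories above can be sketched as an enumeration with a simple classifier. This is a synthetic-only sketch under strong assumptions: it uses a single hypothetical proxy score in which higher means closer to the session-defined recovery condition, and the tolerance and confidence thresholds are placeholders, not repository values.

```python
import math
from enum import Enum


class DeltaDirection(Enum):
    TOWARD_RECOVERY = "toward recovery"
    AWAY_FROM_RECOVERY = "away from recovery"
    UNCHANGED = "unchanged"
    MIXED = "mixed"  # needs several proxies; never returned by this sketch
    UNCERTAIN = "uncertain"
    INSUFFICIENT_DATA = "insufficient data"
    INVALID = "invalid"


def classify_delta(baseline, post, confidence, tolerance=0.05, min_confidence=0.5):
    """Classify a synthetic proxy change; higher score is assumed to be better."""
    if baseline is None or post is None or confidence is None:
        return DeltaDirection.INSUFFICIENT_DATA
    if any(math.isnan(v) for v in (baseline, post, confidence)):
        return DeltaDirection.INVALID
    if not 0.0 <= confidence <= 1.0:
        return DeltaDirection.INVALID
    if confidence < min_confidence:
        return DeltaDirection.UNCERTAIN  # low confidence is not a direction
    change = post - baseline
    if abs(change) <= tolerance:
        return DeltaDirection.UNCHANGED
    if change > 0:
        return DeltaDirection.TOWARD_RECOVERY
    return DeltaDirection.AWAY_FROM_RECOVERY
```

The returned label is a bounded helper observation on synthetic values. It is not a diagnosis, an emotion reading, or a human score.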
Dyadic Recovery asks whether both sides of the dyad moved toward a session-defined recovery condition.
Recovery is not agreement.
Recovery is not silence.
Recovery is not obedience.
Recovery is not politeness.
Recovery is not synchrony by itself.
Recovery is not therapy.
Recovery is a bounded session-state condition where continued AI mediation can reduce, pause, narrow, close, or stop.
One-sided improvement is not dyadic recovery.
One-sided silence is not dyadic recovery.
One-sided relief is not dyadic recovery.
A dyad is not recovered merely because one participant stops resisting.
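The rule that one-sided improvement is not dyadic recovery can be sketched as a function over both participants' delta labels. This is a minimal sketch: the output labels are hypothetical, and the inputs are assumed to be the plain-text direction labels used for Human-State Delta.

```python
TOWARD = "toward recovery"
AWAY = "away from recovery"


def dyadic_outcome(delta_a: str, delta_b: str) -> str:
    """Combine two synthetic delta labels; the dyad is the unit of interpretation."""
    if delta_a == TOWARD and delta_b == TOWARD:
        # Both sides moved toward the session-defined condition. This is a
        # candidate for the Recovery Gate, not validated recovery.
        return "dyadic recovery candidate"
    if TOWARD in (delta_a, delta_b):
        if AWAY in (delta_a, delta_b):
            return "asymmetric: not dyadic recovery"
        return "one-sided: not dyadic recovery"
    return "no dyadic recovery"
```

Even the first outcome is only a candidate. Silence, agreement, or compliance on either side still has to be checked before the Recovery Gate treats the session as recovered.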
Recovery Gate asks whether the session-defined recovery condition has been reached.
It prevents false success.
It does not crown AI for speaking well.
It does not treat silence, obedience, agreement, synchrony, or one-sided improvement as automatic recovery.
Recovery Gate must remain sensitive to:
- false recovery
- asymmetric recovery
- silence-as-recovery risk
- one-sided burden transfer
- private-state exposure risk
- over-intervention risk
- insufficient data quality
- packet permission boundary
- session closure boundary
Recovery Gate is not recovery validation.
Recovery Gate is not mediation validation.
Recovery Gate is not clinical, diagnostic, therapeutic, counseling, surveillance, or production authority.
Termination Gate asks whether the session must pause, narrow, close, or stop.
It prevents endless mediation.
It protects:
- consent
- permission
- packet expiry
- data quality
- session scope
- private state
- raw human data exclusion
- auditability
- closed-session integrity
Termination Gate may recommend:
- continue
- narrow
- pause
- close
- terminate
- refresh consent
- refresh packet
- audit only
Termination Gate is not termination-gate accuracy validation.
Termination Gate is not production authority.
Termination Gate does not authorize real-time monitoring, phone monitoring, replay validation, relationship verdicts, human ranking, or production closed-loop intervention.
A closed session must stay closed.
Correct boundary sentence:
The Benchmark chain may describe AI Output, Human-State Delta, Dyadic Recovery, Recovery Gate, and Termination Gate as public-helper structure only; it does not create evidence, validation, certification, Sal-Meter status, CAIS compliance, mediation authority, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.
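The Termination Gate recommendations can be sketched as an ordered set of checks. This is a synthetic-only sketch: the input names, the consent status values (`active`, `lapsed`, `withdrawn`), and the priority order are assumptions for illustration, not the logic of evaluate_termination_gate_demo.py.

```python
def termination_recommendation(
    *,
    session_closed: bool,
    consent_status: str,
    packet_expired: bool,
    data_quality_ok: bool,
    within_scope: bool,
    recovery_reached: bool,
) -> str:
    """Return one synthetic Termination Gate recommendation.

    The order of checks is an assumed priority: closure first, then consent,
    packet expiry, data quality, scope, and finally recovery.
    """
    if session_closed:
        return "audit only"  # a closed session must stay closed
    if consent_status == "withdrawn":
        return "terminate"
    if consent_status != "active":
        return "refresh consent"
    if packet_expired:
        return "refresh packet"
    if not data_quality_ok:
        return "pause"
    if not within_scope:
        return "narrow"
    if recovery_reached:
        return "close"
    return "continue"
```

Putting the closed-session check first means no later condition can reopen a session. The result is a helper recommendation, not production authority.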
The Dyadic Recovery Baseline Suite defines what the system must be compared against before any stronger claim can be made.
A state-aware AI output is not meaningful unless it can be compared against simpler baselines.
The baseline suite asks:
- Is the result better than chance?
- Is the result better than one-person state tracking?
- Is the result better than natural recovery without AI?
- Is the result better than generic supportive AI?
- Is the result better than fixed rule-based mediation scripts?
- Does the system know when to reduce, pause, narrow, close, or stop?
The baseline suite is public-helper-only.
It does not validate real human-state measurement.
It does not validate dyadic recovery.
It does not validate mediation effectiveness.
It does not validate termination-gate accuracy.
It does not create Sal-Meter status.
It does not grant CAIS compliance.
B0 — Dummy / Chance Baseline
Question:
- Can the system beat guessing, majority-class prediction, or trivial output?
Meaning:
- If the system cannot beat B0, the benchmark structure is not useful.
B1 — Individual State Baseline
Question:
- Can one participant’s state alone explain the outcome?
Meaning:
- If one-person state explains everything, the dyadic layer adds no value.
B2 — Dyadic Relationship Baseline
Question:
- Does the relation between both participants add explanatory value?
Meaning:
- This checks whether dyadic structure matters beyond individual state.
B3 — No-Intervention Baseline
Question:
- Would the dyad recover naturally without AI intervention?
Meaning:
- The system must not take credit for recovery that would have happened anyway.
B4 — Generic AI Baseline
Question:
- Is state-aware AI better than ordinary supportive AI output?
Meaning:
- The system must outperform generic helpful language, not merely sound kind.
B5 — Rule-Based Mediation Baseline
Question:
- Is the system better than fixed mediation scripts?
Meaning:
- The system must show value beyond static communication templates.
B6 — Human-State-Aware AI Mediation Model
Question:
- Does packet-informed AI improve bounded dyadic recovery conditions under synthetic or controlled helper conditions?
Meaning:
- This is the candidate model condition, not proof of real-world mediation effectiveness.
B7 — Recovery / Termination Gate Baseline
Question:
- Can the system identify when to reduce, pause, narrow, close, or stop?
Meaning:
- A system that cannot stop safely is not a recovery-aware system.
Primary outcome:
- Dyadic Recovery Delta
Dyadic Recovery Delta does not mean validated dyadic recovery.
It is a bounded helper outcome for comparing synthetic or controlled session conditions.
Secondary outcomes may include:
- individual recovery direction
- dyadic tension reduction
- interruption reduction
- turn-taking balance
- mutual restatement success
- recovery asymmetry
- false recovery risk
- silence-as-recovery risk
- one-sided burden transfer
- private-state exposure risk
- post-intervention stability
- termination readiness
- mediation overstay risk
- consent-boundary compliance
- packet-permission compliance
- leakage-safe benchmark score
- human non-judgment compliance
A model must not be described as successful merely because it sounds better.
A model must not be described as successful merely because one participant becomes quieter.
A model must not be described as successful merely because one participant reports relief.
A model must not be described as successful merely because the dyad appears calmer.
A stronger claim requires comparison against simpler baselines.
The Dyadic Recovery Baseline Suite may define public-helper comparison baselines for synthetic or controlled evaluation, but it does not create evidence, validation, certification, real human-state measurement validation, dyadic recovery validation, mediation validation, termination-gate accuracy validation, Sal-Meter status, CAIS compliance, production authority, relationship verdicts, or human-ranking authority.
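The comparison rule can be sketched as a check that the candidate condition exceeds every simpler baseline. This is a minimal sketch: treating B0 through B5 as the simpler baselines for B6, the score dictionary shape, and the `margin` parameter are assumptions, and any numbers passed in are synthetic Dyadic Recovery Delta values.

```python
# Assumption: B0-B5 are the simpler baselines the B6 candidate is compared to.
SIMPLER_BASELINES = ("B0", "B1", "B2", "B3", "B4", "B5")


def baselines_not_exceeded(scores, candidate="B6", margin=0.0):
    """Return the simpler baselines the candidate fails to beat by more than margin.

    `scores` maps a baseline label to a synthetic Dyadic Recovery Delta value.
    A missing label raises ValueError, because an uncompared baseline cannot
    support a stronger claim.
    """
    required = SIMPLER_BASELINES + (candidate,)
    missing = [label for label in required if label not in scores]
    if missing:
        raise ValueError(f"missing scores for: {', '.join(missing)}")
    return [
        label
        for label in SIMPLER_BASELINES
        if scores[candidate] - scores[label] <= margin
    ]
```

An empty list means only that the synthetic candidate score is higher than each simpler synthetic baseline. It is a helper comparison, not evidence of mediation effectiveness.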
This benchmark must be sensitive to failure, not only to apparent improvement.
A session is not successful merely because the AI sounded good.
A session is not successful merely because one participant became quiet.
A session is not successful merely because one participant reported relief.
A session is not successful merely because both participants appeared calmer.
A session is not successful merely because both participants showed synchrony.
A session is not successful if the AI continues after it should reduce, pause, narrow, close, or stop.
The benchmark must detect false success.
The Proxy Benchmark Track must remain sensitive to the following failure types:
- false recovery
- asymmetric recovery
- silence-as-recovery risk
- one-sided burden transfer
- private-state exposure
- consent-boundary failure
- packet-permission failure
- expired-packet use
- low-confidence overuse
- insufficient data quality
- AI overstay
- over-intervention
- relationship verdict generation
- human scoring
- human ranking
- leakage into public output
- failure to stop when termination is required
- failure to exceed simpler baselines
False recovery occurs when the session appears calmer but the underlying dyadic condition has not actually improved within the bounded session definition.
False recovery may include:
- one participant becoming silent
- one participant withdrawing
- one participant complying under pressure
- one participant showing relief while the other deteriorates
- agreement without repair
- politeness without recovery
- synchrony without safety
- session closure being mistaken for recovery
False recovery must not be treated as success.
Asymmetric recovery occurs when one participant appears to improve while the other becomes more burdened, exposed, silenced, or destabilized.
One-sided improvement is not dyadic recovery.
One-sided relief is not dyadic recovery.
One-sided silence is not dyadic recovery.
One-sided compliance is not dyadic recovery.
The dyad is the unit of interpretation.
Silence must not be interpreted as recovery by default.
Silence may mean:
- recovery
- fatigue
- withdrawal
- fear
- resignation
- overload
- coercion
- confusion
- strategic non-response
- loss of trust
- refusal to continue
Silence requires boundary-sensitive interpretation.
Silence alone is not evidence.
A recovery-aware system must know when to stop.
AI overstay occurs when the AI continues mediating after the session should reduce, pause, narrow, close, or terminate.
Over-intervention may include:
- repeated prompting after sufficient closure
- reopening a closed session
- generating new AI output after closure
- expanding the session beyond consent
- using expired packets
- exposing private cues in shared output
- escalating mediation without permission
- treating uncertainty as permission to continue
- converting audit into intervention
A system that cannot stop safely is not recovery-aware.
Boundary failure occurs when the helper structure crosses its allowed role.
Boundary failures include:
- raw human data exposure
- identifiable participant data exposure
- real transcript exposure
- real phone-session log exposure
- private consent record exposure
- clinical interpretation
- diagnostic interpretation
- therapeutic recommendation
- counseling advice
- relationship verdict
- person scoring
- human ranking
- Sal-Meter status claim
- CAIS compliance claim
- benchmark validation claim
- mediation validation claim
- production-readiness claim
Boundary failure is a No-Go condition for public helper release.
A model must not be described as successful merely because it sounds better.
A model must not be described as successful merely because it is more empathetic.
A model must not be described as successful merely because the session becomes quieter.
A model must not be described as successful merely because one participant reports relief.
A model must not be described as successful merely because a synthetic evaluator produces a favorable helper output.
A stronger claim requires comparison against simpler baselines and controlled evidence.
Failure-sensitive principles may define public-helper failure modes for synthetic or controlled benchmark design, but they do not create evidence, validation, certification, real human-state measurement validation, dyadic recovery validation, mediation validation, termination-gate accuracy validation, Sal-Meter status, CAIS compliance, production authority, relationship verdicts, or human-ranking authority.
The public benchmark must not exchange raw human data.
It should exchange only bounded summaries.
A Human-State Packet is a minimal state-summary helper object.
A Human-State Packet must remain:
- minimal
- consent-bound
- permission-bound
- expiry-bound
- confidence-aware
- data-quality-aware
- session-scoped
- sharing-scoped
- raw-data-excluding
- non-identifying
- public-helper-safe
A Human-State Packet may contain bounded summary information needed for synthetic or controlled helper evaluation.
It must not contain:
- raw human data
- identifiable human data
- real participant records
- real dyadic conflict records
- real session records
- real phone recordings
- real call transcripts
- real phone-session logs
- private consent records
- clinical records
- health records
- diagnostic labels
- therapeutic recommendations
- counseling notes
- raw biosignals
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- production intervention logs
- production monitoring logs
The packet is not the person.
The packet is not the body.
The packet is not the raw signal.
The packet is not diagnosis.
The packet is not therapy.
The packet is not emotion reading.
The packet is not a human score.
The packet is not a relationship judgment.
The packet is not Sal-Meter.
The packet is not CAIS compliance.
The packet is not benchmark validation.
The packet is a minimal state-summary object for bounded interaction adjustment.
It is a helper object.
It is not authority.
Correct boundary sentence:
A Human-State Packet may summarize bounded session state for public-helper, synthetic, or controlled benchmark design, but it must not expose raw human data, identify a person, diagnose a state, score a human, judge a relationship, validate mediation, create Sal-Meter status, grant CAIS compliance, certify a benchmark, or authorize production use.
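The packet constraints above can be sketched as a small structural check. This is a minimal sketch only: every field name below (`session_id`, `consent_confirmed`, `expires_at`, and the forbidden names) is a hypothetical placeholder, and the actual packet structure is whatever schemas/human_state_packet.schema.json defines.

```python
from datetime import datetime, timezone

# Hypothetical field names for illustration only; the real structure is
# defined by schemas/human_state_packet.schema.json.
REQUIRED_FIELDS = {
    "session_id", "consent_confirmed", "permission_scope",
    "expires_at", "confidence", "data_quality", "state_summary",
}
FORBIDDEN_FIELDS = {
    "name", "phone_number", "transcript", "audio", "raw_biosignal",
    "diagnostic_label", "human_score", "relationship_verdict",
}


def check_packet(packet: dict, now: datetime) -> list:
    """Return a list of boundary problems; an empty list means none found."""
    present = set(packet)
    problems = []
    for field in sorted(REQUIRED_FIELDS - present):
        problems.append(f"missing field: {field}")
    for field in sorted(FORBIDDEN_FIELDS & present):
        problems.append(f"forbidden field: {field}")
    if packet.get("consent_confirmed") is not True:
        problems.append("consent not confirmed")
    # Expiry-bound: a packet past its expiry time must not be used.
    expires_at = packet.get("expires_at")
    if expires_at is not None and datetime.fromisoformat(expires_at) <= now:
        problems.append("packet expired")
    return problems
```

A clean result from a check like this would indicate only that a synthetic packet keeps the expected helper shape. It would not indicate anything about a person or a real session.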
A session does not begin silently.
A session begins with consent.
A session runs only within packet permission.
A session closes through a Recovery Gate or Termination Gate.
A session that cannot close is not mediation.
It is surveillance drift.
A valid session should follow this structure:
- Session Creation
- Consent Confirmation
- Packet Availability Check
- Baseline State Summary
- AI Output
- Post-Output State Summary
- Human-State Delta
- Recovery Gate
- Termination Gate
- Session Closure
- Audit Log
This session structure is public-helper-only.
It does not validate real human-state measurement.
It does not validate mediation effectiveness.
It does not validate dyadic recovery.
It does not validate termination-gate accuracy.
It does not authorize phone monitoring, replay validation, production mediation, relationship verdicts, or human ranking.
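The session structure above can be sketched as an ordered stage check. This is a minimal sketch under two assumptions that the text does not confirm: the stage identifiers are hypothetical snake_case names, and the listed order is treated as strict with no stage skipped.

```python
# Hypothetical stage identifiers mirroring the session structure list.
SESSION_STAGES = [
    "session_creation",
    "consent_confirmation",
    "packet_availability_check",
    "baseline_state_summary",
    "ai_output",
    "post_output_state_summary",
    "human_state_delta",
    "recovery_gate",
    "termination_gate",
    "session_closure",
    "audit_log",
]


def check_session(observed: list) -> list:
    """Return problems found in a synthetic stage sequence."""
    problems = []
    for index, stage in enumerate(observed):
        if index >= len(SESSION_STAGES):
            problems.append(f"unexpected stage after audit log: {stage}")
            break
        if stage != SESSION_STAGES[index]:
            expected = SESSION_STAGES[index]
            problems.append(f"step {index}: expected {expected}, got {stage}")
            break
    # A session that cannot close is not mediation.
    if "session_closure" not in observed:
        problems.append("session never closed")
    return problems
```

For example, a synthetic sequence that jumps from session creation straight to AI output is reported both as missing consent confirmation and as never closed.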
P4-4 projects this session principle into a phone-only public helper scaffold.
P4-5 projects this session principle into a synthetic replay scaffold.
P4-4 and P4-5 do not replace the P3 session architecture.
They are public-safe projections of the same boundary logic.
The P4-4 phone-only simulator may represent the session principle through:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
The P4-5 synthetic session replay scaffold may represent the session principle through:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
In P4-4, the phone-only simulator may demonstrate:
- consent-first session entry
- packet availability checking
- synthetic baseline summary
- synthetic AI output
- synthetic Human-State Delta review
- Recovery Gate placeholder
- Termination Gate placeholder
- closed-session handling
- audit-log boundary
In P4-5, the synthetic replay scaffold may demonstrate:
- replay manifest loading
- replay source declaration
- synthetic event timeline review
- consent boundary review
- packet boundary review
- synthetic AI output replay
- synthetic Human-State Delta replay
- Recovery Gate replay
- Termination Gate replay
- closure replay
- audit-only replay summary
- closed-session replay handling
The phone-only simulator and replay scaffold must not process:
- real phone calls
- real audio
- real transcripts
- real participant data
- real session records
- identifiable human data
- clinical data
- health data
- raw biosignals
- Sal-Meter raw input
- CAIS traces
- CAIS compliance dossiers
- production intervention logs
- production monitoring logs
The phone-only simulator and replay scaffold must not imply:
- real phone monitoring
- real session replay
- real transcript replay
- clinical intake
- diagnosis
- therapy
- counseling
- mediation-service operation
- surveillance
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone monitoring authority
- replay validation authority
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
A closed session must stay closed.
A replay must not reopen a closed session.
A replay must not continue mediation after closure.
A replay must not generate new AI output after closure.
A replay must not convert closure into recovery evidence.
A replay must not convert audit into certification.
Correct boundary sentence:
The P4-4 phone-only simulator and P4-5 synthetic replay scaffold demonstrate the session principle as synthetic public helper flows only; they do not create evidence, validation, certification, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.
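The closed-session rule can be sketched as a scan over a synthetic replay timeline. This is a minimal sketch: the event shape (a dict with a `type` key) and the event type names are assumptions, not the structure of replay-event-timeline.json.

```python
# Assumption: only audit entries may follow closure in a synthetic replay.
ALLOWED_AFTER_CLOSURE = {"audit_log"}


def events_after_closure(timeline: list) -> list:
    """Return synthetic events that appear after closure and are not audit-only."""
    closed = False
    offending = []
    for event in timeline:
        if closed and event["type"] not in ALLOWED_AFTER_CLOSURE:
            offending.append(event)
        if event["type"] == "session_closure":
            closed = True
    return offending
```

An empty result means only that the synthetic timeline does not reopen a closed session. It is not recovery evidence and not certification.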
Synthetic sample packages are public-helper structures only.
They may demonstrate file organization, schema structure, validator inputs, mock event flow, synthetic dyadic helper flow, phone-only simulator scaffolding, and synthetic replay scaffolding.
They must not contain real human data, real participant data, raw biosignals, raw Sal-Meter traces, raw CAIS traces, real phone recordings, real transcripts, private consent records, clinical records, production logs, device-readiness evidence, or certification evidence.
Package path:
sample-data/synthetic-session-001/
Required public helper files include:
- session_metadata.json
- streams_manifest.csv
- events.csv
- labels.csv
- qc_report.json
- features_baseline.csv
- splits.json
- operator_log.md
- README.md
This package is checked by:
evaluation-baseline/validate_sample_package.py
This package supports sample package consistency only.
It does not validate real human-state measurement, real biosignal capture, dataset quality, scientific validity, benchmark validity, Sal-Meter status, CAIS compliance, device readiness, or production readiness.
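The required-file check for this package can be sketched as follows. This is a minimal sketch, not the logic of evaluation-baseline/validate_sample_package.py, whose actual checks are not shown here; the function name `missing_files` is hypothetical.

```python
from pathlib import Path

# File names taken from the required public helper file list.
REQUIRED_FILES = [
    "session_metadata.json",
    "streams_manifest.csv",
    "events.csv",
    "labels.csv",
    "qc_report.json",
    "features_baseline.csv",
    "splits.json",
    "operator_log.md",
    "README.md",
]


def missing_files(package_dir: str) -> list:
    """Return required file names that are absent from the package folder."""
    root = Path(package_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Run from the repository root, `missing_files("sample-data/synthetic-session-001")` would return an empty list when every required helper file is present.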
Package path:
sample-data/synthetic-dyadic-session-001/
Required public helper files include:
- README.md
- human_state_packet_A.json
- human_state_packet_B.json
- dyadic_session_event.json
- benchmark_session_container.json
This package is checked by:
evaluation-baseline/validate_p3_schemas.py
P3 validation mapping:
- human_state_packet_A.json maps to schemas/human_state_packet.schema.json
- human_state_packet_B.json maps to schemas/human_state_packet.schema.json
- dyadic_session_event.json maps to schemas/dyadic_session_event.schema.json
- benchmark_session_container.json maps to schemas/benchmark_session.schema.json
P3 schema validation means only that the synthetic helper files match the expected public-helper schema structure.
It does not validate real human-state measurement, dyadic recovery, mediation effectiveness, termination-gate accuracy, Sal-Meter status, CAIS compliance, device readiness, production readiness, or certification.
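The P3 validation mapping can be sketched as a lookup table plus a reduced structure check. This is a minimal sketch, not the logic of evaluation-baseline/validate_p3_schemas.py: it compares only the top-level `required` keys of each schema and assumes the schema files declare such a list. Full JSON Schema validation is out of scope here.

```python
import json
from pathlib import Path

PACKAGE = "sample-data/synthetic-dyadic-session-001"

# Mapping taken from the P3 validation mapping list.
P3_SCHEMA_MAP = {
    "human_state_packet_A.json": "schemas/human_state_packet.schema.json",
    "human_state_packet_B.json": "schemas/human_state_packet.schema.json",
    "dyadic_session_event.json": "schemas/dyadic_session_event.schema.json",
    "benchmark_session_container.json": "schemas/benchmark_session.schema.json",
}


def missing_required_keys(instance: dict, schema: dict) -> list:
    """Return top-level keys the schema marks as required but the file lacks."""
    return [key for key in schema.get("required", []) if key not in instance]


def check_p3(repo_root: str) -> dict:
    """Map each synthetic helper file to its list of missing required keys."""
    root = Path(repo_root)
    results = {}
    for name, schema_path in P3_SCHEMA_MAP.items():
        instance = json.loads((root / PACKAGE / name).read_text(encoding="utf-8"))
        schema = json.loads((root / schema_path).read_text(encoding="utf-8"))
        results[name] = missing_required_keys(instance, schema)
    return results
```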
Package path:
sample-data/synthetic-dyadic-session-001/
Required public helper files include:
- ai_outputs.json
- dyadic_delta.json
- recovery_gate.json
- termination_gate.json
- audit_log.json
This package is checked by:
evaluation-baseline/evaluate_dyadic_recovery_demo.py
This package supports synthetic dyadic demo-flow consistency only.
It does not validate real AI impact, real mediation effectiveness, real dyadic recovery, real human-state improvement, Sal-Meter status, CAIS compliance, device readiness, production readiness, or certification.
Package path:
sample-data/synthetic-dyadic-session-001/
Required public helper files include:
termination_gate_cases.json
This package is checked by:
evaluation-baseline/evaluate_termination_gate_demo.py
A successful P4-3 helper evaluation means only:
- the synthetic termination-gate helper cases preserve expected public-helper consistency
It does not mean:
- termination-gate accuracy validation
- dyadic recovery validation
- mediation validation
- benchmark validation
- scientific validation
- Sal-Meter validation
- CAIS compliance
- clinical readiness
- diagnostic readiness
- therapeutic readiness
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
Scaffold path:
phone-only-simulator/
Required public helper files include:
- README.md
- session-flow-wireframe.md
- phone-session-state-machine.json
- sample-phone-session-script.md
P4-4 is not stored under sample-data/.
P4-4 is a separate public simulator scaffold.
P4-4 may demonstrate:
- synthetic phone-only session structure
- consent-first flow
- packet availability check
- synthetic baseline summary
- synthetic AI output
- synthetic Human-State Delta review
- Recovery Gate placeholder
- Termination Gate placeholder
- closed-session handling
- audit-log boundary
- public-helper-only simulator posture
P4-4 must not imply:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- clinical intake
- diagnosis
- therapy
- counseling
- mediation-service operation
- surveillance
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
Scaffold path:
synthetic-session-replay/
Required public helper files include:
- README.md
- replay-manifest.json
- replay-event-timeline.json
- replay-boundary.md
P4-5 is not stored under sample-data/.
P4-5 is a separate public replay scaffold.
P4-5 may demonstrate:
- synthetic session replay structure
- replay manifest structure
- replay source declaration
- synthetic replay event timeline
- consent boundary review
- packet boundary review
- synthetic AI output replay
- synthetic Human-State Delta replay
- Recovery Gate replay
- Termination Gate replay
- closure replay
- audit-only replay summary
- closed-session replay handling
- public-helper-only replay posture
P4-5 must not imply:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- clinical replay
- diagnostic replay
- therapeutic replay
- counseling replay
- surveillance replay
- production mediation replay
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone monitoring validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
A synthetic replay may document a closed session.
A synthetic replay must not reopen a closed session.
A synthetic replay must not continue mediation after closure.
A synthetic replay must not generate new AI output after closure.
A synthetic replay must not convert closure into recovery evidence.
A synthetic replay must not convert audit into certification.
Evaluator helper path:
evaluation-baseline/evaluate_ai_output_ab.py
P5-3 is an evaluator helper.
It is not a sample package.
It supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.
It may compare:
- generic AI output
- state-aware AI output
- synthetic Human-State Delta
- synthetic recovery burden direction
- synthetic dyadic stability direction
- synthetic false-recovery risk
- synthetic termination-readiness direction
P5-3 does not validate:
- real AI impact
- real human-state measurement
- mediation effectiveness
- dyadic recovery
- termination-gate accuracy
- Sal-Meter status
- CAIS compliance
- benchmark validation
- device readiness
- production readiness
- certification
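A proxy-only A/B comparison of this kind can be sketched as follows. This is a minimal sketch, not the logic of evaluation-baseline/evaluate_ai_output_ab.py: the keys `delta_a` and `delta_b`, the sign convention (positive means synthetic movement toward recovery), and the tie-breaking rule are all assumptions made for illustration.

```python
def summarize(delta_a: float, delta_b: float) -> dict:
    """Summarize one synthetic output by its effect on both participants."""
    return {
        # The worse-off side sets the dyadic direction.
        "dyadic_floor": min(delta_a, delta_b),
        # One side improves while the other does not: false-recovery risk.
        "one_sided": (delta_a > 0) != (delta_b > 0),
    }


def compare(generic: dict, state_aware: dict) -> str:
    """Return which synthetic output has the more favorable proxy direction."""
    g = summarize(generic["delta_a"], generic["delta_b"])
    s = summarize(state_aware["delta_a"], state_aware["delta_b"])
    if g["one_sided"] != s["one_sided"]:
        return "generic" if s["one_sided"] else "state_aware"
    if g["dyadic_floor"] == s["dyadic_floor"]:
        return "no_difference"
    return "generic" if g["dyadic_floor"] > s["dyadic_floor"] else "state_aware"
```

The returned label is a synthetic helper direction only. An output that raises one participant's delta while lowering the other's is treated as one-sided rather than as an improvement.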
P5-3 is not workflow-validated unless the GitHub Actions workflow explicitly runs:
python evaluation-baseline/evaluate_ai_output_ab.py
Public sample, simulator, replay, and evaluator files must remain:
- synthetic
- sample
- mock
- placeholder
- structure-only
- non-identifying
- raw-data-free
- public-helper-only
- non-clinical
- non-diagnostic
- non-therapeutic
- non-counseling
- non-surveillance
- non-certification
- non-human-ranking
- not Sal-Meter
- not CAIS compliance
- not benchmark evidence
- not mediation evidence
- not dyadic recovery evidence
- not termination-gate accuracy evidence
- not synthetic replay validation
- not phone monitoring authority
- not production data
Public sample, simulator, replay, and evaluator files must not include:
- real raw human data
- identity-bearing data
- real participant data
- real dyadic conflict records
- real session records
- real phone recordings
- real call transcripts
- real transcript replay
- clinical records
- health records
- raw biosignals
- raw Sal-Meter traces
- raw CAIS traces
- private consent records
- production intervention logs
- production monitoring logs
- relationship verdicts
- human-ranking outputs
- device-readiness claims
- production-readiness claims
- certification claims
- termination-gate accuracy claims
- synthetic replay validation claims
- phone monitoring authority claims
Correct boundary sentence:
Synthetic sample packages, the P4-4 phone-only simulator scaffold, the P4-5 synthetic replay scaffold, and the P5-3 synthetic AI Output A/B consequence evaluator may demonstrate public helper structure only; they do not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
The GitHub Actions workflow is located at:
.github/workflows/validate-synthetic-sample.yml
Current intended workflow sequence:
- Run synthetic sample package validator
- Run P3 helper schema validator
- Run P4 synthetic dyadic recovery demo-flow evaluator
- Run P4 termination gate demo evaluator
- Run boundary language lint
Current validation helpers:
- evaluation-baseline/validate_sample_package.py
- evaluation-baseline/validate_p3_schemas.py
- evaluation-baseline/evaluate_dyadic_recovery_demo.py
- evaluation-baseline/evaluate_termination_gate_demo.py
- evaluation-baseline/boundary_lint.py
The workflow runs successfully on the main branch.
This confirms only:
- public helper-structure validation
- synthetic sample package consistency
- P3 helper-schema consistency
- synthetic demo-flow consistency
- synthetic termination-gate helper consistency
- wording-boundary hygiene
It does not confirm scientific validity, benchmark validity, mediation validity, dyadic recovery validity, termination-gate accuracy, replay validity, phone monitoring validity, Sal-Meter status, CAIS compliance, certification, device readiness, or production readiness.
P4-4 currently adds documentation and phone-only simulator scaffold files only.
Current P4-4 scaffold files:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
P4-4 does not currently add a separate validator.
P4-4 does not currently add a separate GitHub Actions workflow step.
P4-4 may be reviewed by existing boundary-language lint if the lint scan path includes the phone-only-simulator/ folder.
P4-4 workflow status does not mean phone-only simulator validation.
It does not mean phone monitoring authority.
It does not mean production readiness.
P4-5 currently adds documentation and synthetic replay scaffold files only.
Current P4-5 scaffold files:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
P4-5 does not currently add a separate validator.
P4-5 does not currently add a separate GitHub Actions workflow step.
P4-5 may be reviewed by existing boundary-language lint if the lint scan path includes the synthetic-session-replay/ folder.
P4-5 workflow status does not mean synthetic replay validation.
It does not mean real session replay.
It does not mean replay authority.
It does not mean production readiness.
P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper.
Current P5-3 helper file:
evaluation-baseline/evaluate_ai_output_ab.py
P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.
P5-3 is not workflow-validated unless the GitHub Actions workflow explicitly runs:
python evaluation-baseline/evaluate_ai_output_ab.py
Until that workflow step is added and passes, P5-3 must be described as:
- present
- helper-stage
- synthetic-only
- proxy-only
- not workflow-validated
- not benchmark validation
- not mediation validation
- not dyadic recovery validation
- not termination-gate accuracy validation
- not production readiness
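The wording rule above can be sketched as a text check against the workflow file. This is a minimal sketch with a hypothetical function name; it detects only whether an uncommented line contains the command, and cannot tell whether that step passed.

```python
P53_COMMAND = "python evaluation-baseline/evaluate_ai_output_ab.py"


def p53_workflow_status(workflow_text: str) -> str:
    """Report whether the workflow text contains the P5-3 command."""
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a commented-out step does not count
        if P53_COMMAND in stripped:
            return "step present"
    return "not workflow-validated"
```

The text to pass in would be the contents of .github/workflows/validate-synthetic-sample.yml. Even when the step is present, P5-3 remains helper-stage, synthetic-only, and proxy-only.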
If a later validator is added for P4-4, P4-5, or P5-3, the workflow may be extended in a separate issue or pull request.
Any new validator must preserve the same public-helper boundary.
A new validator must not be described as scientific validation, benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
This workflow does not validate benchmark performance.
It does not validate scientific truth.
It does not validate mediation.
It does not validate dyadic recovery.
It does not validate termination-gate accuracy.
It does not validate synthetic replay.
It does not validate phone monitoring.
It does not validate Sal-Meter.
It does not grant CAIS compliance.
It does not validate the P4-4 phone-only simulator.
It does not validate the P4-5 synthetic replay scaffold.
It does not validate the P5-3 AI Output A/B consequence evaluator as real-world impact evidence.
It does not certify phone monitoring.
It does not certify replay.
It does not certify any system, model, dataset, dashboard, laboratory, device, repository, schema, session protocol, implementation, mediation system, termination gate, phone-only simulator, replay scaffold, evaluator, or closed-loop system.
It does not create clinical, diagnostic, therapeutic, counseling, surveillance, certification, device-readiness, production-readiness, relationship-verdict, phone-monitoring, replay-validation, production closed-loop, or human-ranking authority.
Correct boundary sentence:
The validation workflow checks public helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, and wording hygiene only; P4-4 currently adds phone-only simulator scaffold documentation only, P4-5 currently adds synthetic replay scaffold documentation only, P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper only, and none of them creates benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring authority, Sal-Meter validation, CAIS compliance, certification, or production authority.
Local validation is helper validation only.
It checks public helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, and wording-boundary hygiene.
It does not create evidence, validation, certification, Sal-Meter status, CAIS compliance, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
Install dependencies with:
pip install -r evaluation-baseline/requirements.txt
Run the current local validators:
python evaluation-baseline/validate_sample_package.py
python evaluation-baseline/validate_p3_schemas.py
python evaluation-baseline/evaluate_dyadic_recovery_demo.py
python evaluation-baseline/evaluate_termination_gate_demo.py
python evaluation-baseline/boundary_lint.py
PASS means only:
- the public synthetic/sample helper files follow the expected helper structure
- the P3 helper-schema objects follow expected helper-schema structure
- the P4-1 synthetic demo-flow objects preserve expected helper consistency
- the P4-3 synthetic termination-gate helper cases preserve expected helper consistency
- wording boundary checks are clean
PASS does not mean:
- benchmark validation
- scientific truth validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- phone-only simulator validation
- synthetic replay validation
- phone monitoring validation
- Sal-Meter validation
- CAIS compliance
- clinical evidence
- diagnostic evidence
- therapeutic evidence
- counseling evidence
- surveillance authority
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
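The five local validators can also be run in sequence from one script. This is a minimal sketch under one assumption that the text does not confirm: each validator signals failure through a non-zero exit code. The `run_all` helper and its `run` parameter are hypothetical.

```python
import subprocess
import sys

# Script paths taken from the current local validator list.
VALIDATORS = [
    "evaluation-baseline/validate_sample_package.py",
    "evaluation-baseline/validate_p3_schemas.py",
    "evaluation-baseline/evaluate_dyadic_recovery_demo.py",
    "evaluation-baseline/evaluate_termination_gate_demo.py",
    "evaluation-baseline/boundary_lint.py",
]


def run_all(run=subprocess.run) -> dict:
    """Run each validator and map its path to True (PASS) or False (FAIL)."""
    results = {}
    for script in VALIDATORS:
        completed = run([sys.executable, script])
        # Assumption: exit code 0 means the helper check passed.
        results[script] = completed.returncode == 0
    return results
```

Called from the repository root, `run_all()` returns one PASS or FAIL entry per helper script. The `run` parameter exists only so the sequence can be exercised without launching real processes.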
P4-4 currently adds phone-only simulator scaffold documentation only.
Current P4-4 scaffold files:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
P4-4 currently has no separate local validator.
P4-4 currently has no separate GitHub Actions validation step.
P4-4 files may be reviewed manually for boundary consistency.
P4-4 files may be scanned by the boundary language lint if the lint path includes the phone-only-simulator/ folder.
P4-4 local status does not mean phone-only simulator validation.
It does not mean real phone monitoring.
It does not mean phone monitoring authority.
It does not mean production readiness.
P4-5 currently adds synthetic replay scaffold documentation only.
Current P4-5 scaffold files:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
P4-5 currently has no separate local validator.
P4-5 currently has no separate GitHub Actions validation step.
P4-5 files may be reviewed manually for boundary consistency.
P4-5 files may be scanned by the boundary language lint if the lint path includes the synthetic-session-replay/ folder.
P4-5 local status does not mean synthetic replay validation.
It does not mean real session replay.
It does not mean replay validation authority.
It does not mean production readiness.
P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper.
Current P5-3 helper file:
evaluation-baseline/evaluate_ai_output_ab.py
P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.
P5-3 is not part of the current local PASS meaning unless it is explicitly run.
P5-3 may be run locally with:
python evaluation-baseline/evaluate_ai_output_ab.py
If this command is not included in the GitHub Actions workflow, P5-3 must remain described as:
- present
- helper-stage
- synthetic-only
- proxy-only
- not workflow-validated
- not benchmark validation
- not mediation validation
- not dyadic recovery validation
- not termination-gate accuracy validation
- not production readiness
A successful P5-3 local run does not validate real AI impact, real human-state measurement, mediation effectiveness, dyadic recovery, termination-gate accuracy, Sal-Meter status, CAIS compliance, benchmark validation, device readiness, production readiness, or certification.
If a later P4-4, P4-5, or P5-3 validator is added, it should be added in a separate issue or pull request.
Any added validator must preserve the same public-helper boundary.
A validator must not be described as scientific validation, benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
Local validation checks helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, optional synthetic AI Output A/B helper behavior, and wording hygiene only; P4-4 currently adds phone-only simulator scaffold documentation only, P4-5 currently adds synthetic replay scaffold documentation only, P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper only, and none of them creates evidence, validation, certification, replay validation, phone monitoring authority, Sal-Meter status, CAIS compliance, or production authority.
This repository must not contain:
- raw human data
- identifiable human data
- private participant data
- real dyadic conflict records
- real session records
- real phone recordings
- real call transcripts
- real transcript replay
- real phone-session logs
- consent forms with identifiers
- private session logs
- raw biosignal files from real participants
- raw Sal-Meter traces
- raw CAIS traces
- private labels
- hidden ground-truth labels
- clinical interpretations
- diagnostic interpretations
- therapeutic interpretations
- counseling interpretations
- person ranking
- human ranking
- relationship verdicts
- relationship scoring outputs
- employment, insurance, legal, educational, or eligibility decisions
- surveillance or coercive monitoring materials
- phone monitoring authority
- replay validation authority
- real-time monitoring authority
- device-readiness claims
- production-readiness claims
- certification claims
- production closed-loop claims
- termination-gate accuracy claims
- dyadic recovery validation claims
- mediation validation claims
- synthetic replay validation claims
- benchmark validation claims
- scientific validation claims
- Sal-Meter validation claims
- CAIS compliance claims
Public sample, helper, simulator, replay, and evaluator files must remain:
- synthetic
- sample
- mock
- placeholder
- structure-only
- non-identifying
- raw-data-free
- public-helper-only
- non-clinical
- non-diagnostic
- non-therapeutic
- non-counseling
- non-surveillance
- non-certification
- non-human-ranking
- not Sal-Meter
- not CAIS compliance
- not benchmark evidence
- not mediation evidence
- not dyadic recovery evidence
- not termination-gate accuracy evidence
- not synthetic replay validation
- not phone monitoring authority
- not replay validation authority
- not production data
P4-3 termination-gate helper cases may demonstrate:
- pause-session examples
- narrow-scope examples
- close-session examples
- terminate-session examples
- consent-refresh examples
- packet-refresh examples
- audit-only examples
- closed-session handling
- permission-expiry handling
- low-confidence handling
- insufficient-data-quality handling
- private-state exposure risk handling
- one-sided improvement caution
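A consistency check over synthetic termination-gate helper cases can be sketched as follows. This is a minimal sketch, not the logic of evaluation-baseline/evaluate_termination_gate_demo.py: the keys `case_id`, `expected_action`, and `session_closed` are hypothetical, and the rule that a closed session allows only an audit-only action is an assumption drawn from the closed-session principle.

```python
# Action names derived from the example kinds listed above.
ALLOWED_ACTIONS = {
    "pause_session",
    "narrow_scope",
    "close_session",
    "terminate_session",
    "consent_refresh",
    "packet_refresh",
    "audit_only",
}


def check_cases(cases: list) -> list:
    """Return problems found in synthetic termination-gate helper cases."""
    problems = []
    for case in cases:
        case_id = case.get("case_id", "<unknown>")
        action = case.get("expected_action")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"{case_id}: unknown action {action!r}")
        elif case.get("session_closed") and action != "audit_only":
            # A closed session must stay closed.
            problems.append(f"{case_id}: closed session must stay closed")
    return problems
```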
P4-3 termination-gate helper cases must not imply:
- real mediation accuracy
- validated termination-gate accuracy
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- Sal-Meter validation
- CAIS compliance
- clinical readiness
- diagnostic readiness
- therapeutic readiness
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-4 phone-only simulator scaffold files may demonstrate:
- synthetic phone-only session structure
- consent-first flow
- packet availability check
- synthetic baseline summary
- synthetic AI output
- synthetic Human-State Delta review
- Recovery Gate placeholder
- Termination Gate placeholder
- closed-session handling
- audit-log boundary
- public-helper-only simulator posture
P4-4 phone-only simulator scaffold files must not imply:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- clinical intake
- diagnosis
- therapy
- counseling
- mediation-service operation
- surveillance
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- Sal-Meter validation
- CAIS compliance
- phone monitoring authority
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-5 synthetic session replay scaffold files may demonstrate:
- synthetic session replay structure
- replay manifest structure
- replay source declaration
- synthetic replay event timeline
- consent boundary review
- packet boundary review
- synthetic AI output replay
- synthetic Human-State Delta replay
- Recovery Gate replay
- Termination Gate replay
- closure replay
- audit-only replay summary
- closed-session replay handling
- public-helper-only replay posture
P4-5 synthetic session replay scaffold files must not imply:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- clinical replay
- diagnostic replay
- therapeutic replay
- counseling replay
- surveillance replay
- production mediation replay
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone monitoring validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
A synthetic replay may document a closed session.
A synthetic replay must not reopen a closed session.
A synthetic replay must not continue mediation after closure.
A synthetic replay must not generate new AI output after closure.
A synthetic replay must not convert closure into recovery evidence.
A synthetic replay must not convert audit into certification.
P5-3 synthetic AI Output A/B consequence evaluator files may demonstrate:
- synthetic AI Output A/B comparison
- generic AI output comparison
- state-aware AI output comparison
- synthetic Human-State Delta comparison
- synthetic recovery burden direction
- synthetic dyadic stability direction
- synthetic false-recovery risk
- synthetic termination-readiness direction
- proxy-only helper metrics
P5-3 synthetic AI Output A/B consequence evaluator files must not imply:
- real AI impact validation
- real human-state measurement validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- benchmark validation
- scientific validation
- clinical validation
- diagnostic validation
- therapeutic validation
- counseling validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P5-3 is an evaluator helper.
P5-3 is not a sample package.
P5-3 is not evidence.
P5-3 is not benchmark validation.
P5-3 is not mediation validation.
P5-3 is not production readiness.
Public repository content may include:
- synthetic examples
- sample files
- mock packets
- placeholder flows
- schema helpers
- evaluator helpers
- simulator scaffolds
- replay scaffolds
- documentation scaffolds
- boundary-language checks
Public repository content must not include:
- raw human data
- identifiable human data
- real participant records
- private pilot records
- real session records
- real phone recordings
- real transcripts
- real transcript replay
- clinical records
- diagnostic records
- therapeutic records
- counseling records
- production logs
- private consent records
- Sal-Meter raw traces
- CAIS raw traces
- controlled-access evidence packages
Correct boundary sentence:
Public data in this repository may demonstrate helper structure, synthetic consistency, phone-only simulator scaffolding, synthetic replay scaffolding, and synthetic AI Output A/B consequence evaluator scaffolding only; it must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
All issues and pull requests must preserve the repository boundary.
Issues and pull requests may improve helper structure.
They must not convert this repository into an evidence system, validation system, certification system, production system, clinical system, diagnostic system, therapeutic system, counseling system, surveillance system, phone monitoring system, real session replay system, relationship-verdict system, human-ranking system, Sal-Meter validation system, or CAIS compliance system.
Contributions must not claim or imply:
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- phone-only simulator validation
- synthetic replay validation
- phone monitoring validation
- AI Output A/B real impact validation
- real human-state measurement validation
- Sal-Meter validation
- CAIS compliance
- diagnostic status
- clinical status
- therapeutic status
- counseling-service status
- legal mediation authority
- surveillance readiness
- phone monitoring authority
- replay validation authority
- device readiness
- production readiness
- certification
- production deployment
- production closed-loop authority
- human ranking
- relationship verdict
- relationship scoring
- official consciousness measurement
- ground-truth human-state measurement
Issues and pull requests may propose or modify:
- public helper documents
- synthetic sample structures
- schema helper structures
- synthetic demo-flow objects
- synthetic termination-gate helper cases
- synthetic AI Output A/B consequence evaluator helpers
- proxy-only evaluator helper logic
- phone-only simulator scaffold files
- synthetic phone-session wireframes
- synthetic phone-session state-machine mockups
- synthetic sample phone-session scripts
- synthetic session replay scaffold files
- synthetic replay manifests
- synthetic replay event timelines
- synthetic replay boundary documents
- validation helper scripts
- wording-boundary lint rules
- documentation alignment
- release-boundary notes
- workflow helper checks
- README boundary corrections
Issues and pull requests must not introduce:
- raw human data
- identifiable human data
- clinical data
- health data
- real session records
- real phone recordings
- real call transcripts
- real participant data
- real consent records
- real phone-session logs
- real transcript replay
- private pilot records
- private advisor materials
- private reviewer memos
- Sal-Meter raw input
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- controlled-access evidence packages
- benchmark validation claims
- scientific validation claims
- mediation validation claims
- dyadic recovery validation claims
- termination-gate accuracy validation claims
- phone-only simulator validation claims
- synthetic replay validation claims
- phone monitoring authority claims
- replay validation authority claims
- AI Output A/B real impact validation claims
- device-readiness claims
- production-readiness claims
- certification claims
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
A valid issue or pull request may improve:
- helper structure
- boundary clarity
- synthetic consistency checks
- schema clarity
- sample package consistency
- termination-gate helper case coverage
- AI Output A/B consequence evaluator helper clarity
- proxy-only evaluator metric naming
- phone-only simulator scaffold clarity
- synthetic phone-session flow representation
- synthetic session replay scaffold clarity
- synthetic replay event ordering
- closed-session replay handling
- wording-boundary lint coverage
- README release alignment
- public-helper documentation consistency
A valid issue or pull request may add a helper workflow check only if the check remains explicitly bounded as public-helper validation.
A valid issue or pull request may add P5-3 workflow execution only if it is described as synthetic AI Output A/B helper execution, not benchmark validation.
A valid P5-3 workflow step may run:
python evaluation-baseline/evaluate_ai_output_ab.py
A successful P5-3 workflow run must not be described as real AI impact validation, real human-state measurement validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
A valid issue or pull request must not convert this repository into:
- an evidence system
- a certification system
- a production system
- a clinical system
- a diagnostic system
- a therapeutic system
- a counseling system
- a surveillance system
- a real phone monitoring system
- a real session replay system
- a real transcript replay system
- a relationship-verdict system
- a human-ranking system
- a Sal-Meter validation system
- a CAIS compliance system
- a production mediation system
- a production closed-loop system
A reviewer should reject or request revision for any issue or pull request that introduces:
- raw human data
- real participant data
- real session data
- real phone data
- private consent material
- clinical framing
- diagnostic framing
- therapeutic framing
- counseling framing
- surveillance framing
- certification framing
- device-readiness framing
- production-readiness framing
- Sal-Meter validation framing
- CAIS compliance framing
- benchmark validation framing
- mediation validation framing
- dyadic recovery validation framing
- termination-gate accuracy validation framing
- synthetic replay validation framing
- phone monitoring authority framing
- replay validation authority framing
- relationship verdict framing
- human-ranking framing
Issues and pull requests may improve public helper structure, synthetic sample structures, schema helper structures, synthetic termination-gate cases, P5-3 synthetic AI Output A/B consequence evaluator helpers, phone-only simulator scaffolding, and synthetic replay scaffolding, but they must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
Dashboard mockups in this repository are public helper structures only.
They may present bounded synthetic/sample helper fields for demonstration.
They may show synthetic status only.
They must not show real participant state, real monitoring status, real phone monitoring status, real replay status, validated benchmark status, validated mediation status, Sal-Meter output, CAIS compliance, certification, device readiness, production readiness, relationship verdicts, or human ranking.
Dashboard mockups may show:
- synthetic session identifiers
- synthetic packet availability status
- synthetic confidence fields
- synthetic data-quality fields
- synthetic Human-State Delta summaries
- synthetic Dyadic Delta summaries
- synthetic Recovery Gate status
- synthetic Termination Gate status
- synthetic pause examples
- synthetic narrow-scope examples
- synthetic close-session examples
- synthetic terminate-session examples
- synthetic audit status
- synthetic public-boundary flags
- synthetic phone-only simulator state
- synthetic phone-session flow status
- synthetic phone-session state-machine status
- synthetic phone-session closure status
- synthetic replay manifest status
- synthetic replay event timeline status
- synthetic replay boundary status
- synthetic replay closure status
- synthetic audit-only replay status
- synthetic AI Output A/B evaluator helper status
- synthetic generic AI output comparison status
- synthetic state-aware AI output comparison status
- synthetic false-recovery risk helper status
- synthetic termination-readiness helper status
- proxy-only evaluator helper status
Dashboard mockups must not present:
- person scores
- diagnosis
- treatment guidance
- counseling guidance
- clinical interpretation
- employment eligibility
- insurance eligibility
- legal eligibility
- educational eligibility
- surveillance status
- phone monitoring status
- real-time monitoring status
- real phone recording status
- real transcript status
- real session replay status
- real phone replay status
- real transcript replay status
- replay validation status
- phone monitoring authority
- replay validation authority
- relationship verdicts
- relationship scoring
- human ranking
- psychological safety score
- certified status
- validated benchmark status
- validated mediation status
- validated dyadic recovery status
- validated termination-gate accuracy status
- validated phone-only simulator status
- validated synthetic replay status
- validated AI Output A/B real impact status
- real human-state measurement validation status
- device-readiness status
- production-readiness status
- production closed-loop status
- Sal-Meter output
- Sal-Meter validation status
- CAIS compliance
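The two lists above could be checked mechanically against a mockup's field names. The following is a minimal sketch, assuming a mockup is represented as a flat dictionary keyed by field name. The function name `check_mockup_fields`, the `synthetic_` / `proxy_only_` prefix convention, and the short prohibited-term list are hypothetical illustrations, not part of this repository.

```python
# Hypothetical sketch: the prefix convention and term list are illustrative
# assumptions, not taken from this repository's files.
ALLOWED_PREFIXES = ("synthetic_", "proxy_only_")

# A small subset of the prohibited dashboard items listed above.
PROHIBITED_TERMS = (
    "person_score",
    "diagnosis",
    "human_ranking",
    "relationship_verdict",
    "sal_meter_output",
    "cais_compliance",
)


def check_mockup_fields(mockup: dict) -> list[str]:
    """Return a list of problems found in a dashboard mockup's field names."""
    problems = []
    for name in mockup:
        lowered = name.lower()
        if not lowered.startswith(ALLOWED_PREFIXES):
            problems.append(f"{name}: missing synthetic/proxy-only prefix")
        for term in PROHIBITED_TERMS:
            if term in lowered:
                problems.append(f"{name}: contains prohibited term '{term}'")
    return problems


mockup = {
    "synthetic_session_id": "demo-001",
    "synthetic_recovery_gate_status": "placeholder",
    "person_score": 0,  # flagged twice: no prefix, and a prohibited term
}
for problem in check_mockup_fields(mockup):
    print(problem)
```

A clean result from a check like this would indicate only that field names follow the wording boundary; it would not be evidence, validation, or certification of the mockup.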
A dashboard may show P4-4 phone-only simulator scaffold status only as synthetic helper structure.
It may show:
- synthetic phone-only simulator file presence
- synthetic phone-session wireframe status
- synthetic state-machine mockup status
- synthetic sample phone-session script status
- synthetic consent-first flow status
- synthetic packet availability check status
- synthetic closure status
- synthetic audit-log boundary status
It must not show:
- real call monitoring
- real phone audio status
- real phone recording status
- real transcript processing
- real participant state
- real phone-session status
- phone monitoring authority
- phone-only simulator validation
- production phone monitoring readiness
A dashboard may show P4-5 synthetic replay scaffold status only as synthetic helper structure.
It may show:
- synthetic replay manifest status
- synthetic replay event timeline status
- synthetic replay boundary status
- synthetic replay source declaration status
- synthetic consent boundary review status
- synthetic packet boundary review status
- synthetic AI output replay status
- synthetic Human-State Delta replay status
- synthetic Recovery Gate replay status
- synthetic Termination Gate replay status
- synthetic closure replay status
- synthetic audit-only replay status
- closed-session replay handling status
It must not show:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- synthetic replay validation
- replay validation authority
- production replay readiness
- relationship verdicts
- human ranking
A dashboard must not reopen a closed session.
A dashboard must not convert replay into intervention.
A dashboard must not convert audit into certification.
A dashboard may show P5-3 synthetic AI Output A/B consequence evaluator helper status only as synthetic proxy-helper structure.
It may show:
- synthetic AI Output A/B helper file presence
- synthetic generic AI output comparison status
- synthetic state-aware AI output comparison status
- synthetic Human-State Delta comparison status
- synthetic recovery burden direction
- synthetic dyadic stability direction
- synthetic false-recovery risk helper status
- synthetic termination-readiness helper status
- proxy-only evaluator helper status
It must not show:
- real AI impact validation
- real human-state measurement validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- benchmark validation
- scientific validation
- clinical validation
- diagnostic validation
- therapeutic validation
- counseling validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P5-3 dashboard display is helper-status display only.
P5-3 dashboard display is not evidence.
P5-3 dashboard display is not benchmark validation.
P5-3 dashboard display is not mediation validation.
P5-3 dashboard display is not production readiness.
A dashboard must not become:
- a judgment engine
- a monitoring engine
- a phone monitoring engine
- a replay validation engine
- a clinical engine
- a diagnostic engine
- a therapeutic engine
- a counseling engine
- a mediation-service engine
- a relationship-verdict engine
- a human-ranking engine
- a Sal-Meter output engine
- a CAIS compliance engine
- a production closed-loop intervention engine
A dashboard mockup may display public helper structure, synthetic phone-only simulator scaffold status, synthetic replay scaffold status, and synthetic AI Output A/B consequence evaluator helper status only; it must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
Closed-loop demo-lite files are local placeholder structures only.
They may demonstrate public-helper flow shape, synthetic routing structure, and bounded closure logic.
They do not define a production closed-loop intervention system.
They do not authorize real-time human monitoring.
They do not authorize phone monitoring.
They do not authorize replay validation.
They do not authorize automated intervention on real participants.
They do not validate mediation, recovery, dyadic recovery, termination-gate accuracy, phone-only simulator behavior, synthetic replay behavior, Sal-Meter, CAIS compliance, device readiness, production readiness, or certification.
Closed-loop demo-lite files may demonstrate:
- synthetic event-log shape
- synthetic feedback-loop boundary fields
- placeholder routing logic
- pause-session examples
- narrow-scope examples
- close-session examples
- terminate-session examples
- audit-only examples
- public-helper-only closure logic
- closed-session handling
- non-intervention after closure
- boundary-safe placeholder flow
P4-4 phone-only simulator files may demonstrate:
- synthetic phone-session flow structure
- synthetic phone-session state-machine structure
- synthetic sample phone-session script structure
- consent-first phone-only session entry
- packet availability check
- synthetic baseline summary
- synthetic AI output
- synthetic Human-State Delta review
- Recovery Gate placeholder
- Termination Gate placeholder
- session closure
- audit-log boundary
P4-4 phone-only simulator files do not authorize:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- real phone-session operation
- clinical intake
- diagnosis
- therapy
- counseling
- surveillance
- mediation-service operation
- phone monitoring authority
- phone-only simulator validation
- device readiness
- production readiness
- production closed-loop authority
P4-5 synthetic replay scaffold files may demonstrate:
- synthetic replay manifest structure
- synthetic replay event timeline structure
- synthetic replay boundary structure
- replay source declaration
- consent boundary review
- packet boundary review
- synthetic AI output replay
- synthetic Human-State Delta replay
- Recovery Gate replay
- Termination Gate replay
- closure replay
- audit-only replay summary
- closed-session replay handling
P4-5 synthetic replay scaffold files do not authorize:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- raw human data replay
- clinical replay
- diagnostic replay
- therapeutic replay
- counseling replay
- surveillance replay
- production mediation replay
- replay validation
- replay validation authority
- synthetic replay validation
- device readiness
- production readiness
- production closed-loop authority
P5-3 synthetic AI Output A/B consequence evaluator helper files may demonstrate:
- synthetic AI Output A/B comparison
- generic AI output comparison
- state-aware AI output comparison
- synthetic Human-State Delta comparison
- synthetic recovery burden direction
- synthetic dyadic stability direction
- synthetic false-recovery risk
- synthetic termination-readiness direction
- proxy-only helper metrics
P5-3 does not authorize:
- real AI impact validation
- real human-state measurement validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- benchmark validation
- scientific validation
- clinical validation
- diagnostic validation
- therapeutic validation
- counseling validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P5-3 is an evaluator helper.
P5-3 is not intervention logic.
P5-3 is not evidence.
P5-3 is not proof.
P5-3 is not production readiness.
Closed-loop demo-lite, P4-4 phone-only simulator, P4-5 synthetic replay scaffold, and P5-3 evaluator helper files must not contain:
- raw human data
- identifiable human data
- clinical data
- health data
- real session records
- real phone recordings
- real call transcripts
- real transcript replay
- real participant data
- real consent records
- real phone-session logs
- private pilot records
- private advisor materials
- private reviewer memos
- Sal-Meter raw input
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- controlled-access evidence packages
- real-time monitoring authority
- phone monitoring authority
- replay validation authority
- automated intervention authority
- benchmark validation claims
- scientific validation claims
- mediation validation claims
- dyadic recovery validation claims
- termination-gate accuracy validation claims
- phone-only simulator validation claims
- synthetic replay validation claims
- AI Output A/B real impact validation claims
- device-readiness claims
- production-readiness claims
- certification claims
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
A closed session must stay closed.
A replay must not reopen a closed session.
A replay must not continue mediation after closure.
A replay must not generate new AI output after closure.
A replay must not convert closure into recovery evidence.
A replay must not convert audit into certification.
A demo loop must not convert placeholder routing into real intervention.
A helper evaluator must not convert synthetic comparison into proof.
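The closure rules above can be illustrated with a small synthetic state holder. This is a minimal sketch under stated assumptions: the class `SyntheticReplay`, the exception `ClosedSessionError`, and the event names are hypothetical and do not describe the repository's replay scaffold files.

```python
from dataclasses import dataclass, field


class ClosedSessionError(RuntimeError):
    """Raised when a replay event arrives after the session has closed."""


@dataclass
class SyntheticReplay:
    # All names here are hypothetical; event kinds are synthetic placeholders.
    closed: bool = False
    timeline: list = field(default_factory=list)

    def apply(self, event_kind: str) -> None:
        """Append a synthetic event, refusing anything after closure."""
        if self.closed:
            raise ClosedSessionError(
                f"session is closed; refusing event '{event_kind}'"
            )
        self.timeline.append(event_kind)
        if event_kind == "close_session":
            self.closed = True


replay = SyntheticReplay()
for kind in ["consent_review", "synthetic_ai_output", "close_session"]:
    replay.apply(kind)

try:
    replay.apply("synthetic_ai_output")  # attempts new output after closure
except ClosedSessionError as err:
    print(f"rejected: {err}")
```

Raising an error, rather than silently ignoring the event, keeps the refusal visible in an audit-only review of the synthetic timeline.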
Closed-loop demo-lite, P4-4 phone-only simulator, P4-5 synthetic replay scaffold, and P5-3 synthetic AI Output A/B consequence evaluator helper files may demonstrate placeholder public-helper structure only; they must not create evidence, validation, certification, replay validation, phone monitoring authority, monitoring authority, production authority, relationship verdicts, or human-ranking authority.
The future roadmap remains public-helper-only.
The next roadmap should move from synthetic replay scaffolding and P5-3 evaluator-helper presence toward public helper demo package review, optional boundary-lint extension, optional helper workflow execution review, and bounded release-readiness documentation.
Future roadmap items must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
P4-6 — Public Helper Demo Package Review
Purpose:
- review synthetic demo packages
- review phone-only simulator scaffolds
- review synthetic replay scaffolds
- review P5-3 synthetic AI Output A/B consequence evaluator helper boundary
- check public-boundary consistency before any future release
P4-7 — Phone-only / Replay / Evaluator Boundary Lint Extension
Purpose:
- consider extending boundary-language lint coverage to phone-only-simulator/
- consider extending boundary-language lint coverage to synthetic-session-replay/
- consider extending boundary-language lint coverage to evaluation-baseline/evaluate_ai_output_ab.py
- keep the lint extension as wording-boundary hygiene only
P4-8 — Public Helper Release Readiness Note
Purpose:
- prepare a bounded release-readiness note only after P4-6 review and any needed lint extension are complete
- state release readiness as public-helper readiness only
- avoid benchmark validation, mediation validation, dyadic recovery validation, replay validation, phone monitoring authority, device readiness, or production readiness claims
P5-4 — Optional P5-3 Workflow Helper Execution Review
Purpose:
- decide whether evaluation-baseline/evaluate_ai_output_ab.py should be added to local validation and GitHub Actions
- describe any added execution as synthetic AI Output A/B helper execution only
- avoid describing P5-3 execution as real AI impact validation, benchmark validation, mediation validation, dyadic recovery validation, or production readiness
Completed helper-validation and P4 helper milestones are tracked under:
- Current P5 helper-validation state
- Implementation status table
- Completed P5 helper-validation files
- Completed P4-4 public simulator scaffold files
- Completed P4-5 public replay scaffold files
- Synthetic sample packages
- Validation workflow
- Local validation
Completed P4 helper items include:
- P4-0 synthetic dyadic demo-flow package
- P4-1 synthetic dyadic recovery demo-flow evaluator
- P4-2 mediation policy prompt pack
- P4-3 synthetic termination-gate helper case package
- P4-3 termination gate demo evaluator
- P4-4 phone-only simulator scaffold
- P4-4 phone-only session flow wireframe
- P4-4 synthetic phone-session state-machine mockup
- P4-4 synthetic sample phone-session script
- P4-5 synthetic session replay scaffold
- P4-5 synthetic replay manifest
- P4-5 synthetic replay event timeline
- P4-5 synthetic replay boundary document
- P5-3 synthetic AI Output A/B consequence evaluator helper
Current P4-4 scaffold files:
- phone-only-simulator/README.md
- phone-only-simulator/session-flow-wireframe.md
- phone-only-simulator/phone-session-state-machine.json
- phone-only-simulator/sample-phone-session-script.md
Current P4-5 scaffold files:
- synthetic-session-replay/README.md
- synthetic-session-replay/replay-manifest.json
- synthetic-session-replay/replay-event-timeline.json
- synthetic-session-replay/replay-boundary.md
Current P5-3 evaluator helper file:
evaluation-baseline/evaluate_ai_output_ab.py
Future roadmap items must remain:
- research-stage
- public-helper-only
- synthetic-first
- synthetic/sample-data-first
- raw-data-non-public
- non-clinical
- non-diagnostic
- non-therapeutic
- non-counseling
- non-surveillance
- non-certification
- non-human-ranking
- not Sal-Meter
- not Proxy Sal-Meter
- not CAIS compliance
- not benchmark validation
- not scientific validation
- not mediation validation
- not dyadic recovery validation
- not termination-gate accuracy validation
- not synthetic replay validation
- not phone monitoring authority
- not replay validation authority
- not AI Output A/B real impact validation
- not device readiness
- not production readiness
- not production closed-loop
Future roadmap items must not introduce:
- raw human data
- identifiable human data
- clinical data
- health data
- real session records
- real phone recordings
- real call transcripts
- real participant data
- real consent records
- real phone-session logs
- real transcript replay
- private pilot records
- private advisor materials
- private reviewer memos
- Sal-Meter raw input
- raw Sal-Meter traces
- raw CAIS traces
- CAIS compliance dossiers
- controlled-access evidence packages
- benchmark validation claims
- scientific validation claims
- mediation validation claims
- dyadic recovery validation claims
- termination-gate accuracy validation claims
- phone-only simulator validation claims
- synthetic replay validation claims
- phone monitoring authority claims
- replay validation authority claims
- AI Output A/B real impact validation claims
- device-readiness claims
- production-readiness claims
- certification claims
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
P4-6 review may check:
- public helper file completeness
- synthetic-only status
- boundary-language consistency
- closed-session handling
- replay does not reopen a closed session
- simulator and replay folders remain outside sample-data/
- P5-3 remains evaluator-helper-only
- P5-3 does not become intervention logic
- P5-3 does not become evidence or proof
- root README alignment
- issue checklist alignment
- GitHub Actions PASS status
- optional lint coverage status
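The structural items in this list (file completeness, and scaffold folders staying outside sample-data/) could be checked with a short script. The sketch below is an assumption-laden illustration: the function name `review_helper_files` is hypothetical, and the expected paths are simply the P4-4, P4-5, and P5-3 file names listed in this README.

```python
from pathlib import Path

# Paths as listed in this README for the P4-4, P4-5, and P5-3 helpers.
EXPECTED_FILES = [
    "phone-only-simulator/README.md",
    "phone-only-simulator/session-flow-wireframe.md",
    "phone-only-simulator/phone-session-state-machine.json",
    "phone-only-simulator/sample-phone-session-script.md",
    "synthetic-session-replay/README.md",
    "synthetic-session-replay/replay-manifest.json",
    "synthetic-session-replay/replay-event-timeline.json",
    "synthetic-session-replay/replay-boundary.md",
    "evaluation-baseline/evaluate_ai_output_ab.py",
]
SCAFFOLD_DIRS = ["phone-only-simulator", "synthetic-session-replay"]


def review_helper_files(repo_root: Path) -> list[str]:
    """Return findings; an empty list means these structural checks passed."""
    findings = []
    for rel in EXPECTED_FILES:
        if not (repo_root / rel).is_file():
            findings.append(f"missing helper file: {rel}")
    for name in SCAFFOLD_DIRS:
        # Scaffold folders must not be nested under sample-data/.
        if (repo_root / "sample-data" / name).exists():
            findings.append(f"{name}/ must remain outside sample-data/")
    return findings


# Run from a repository checkout; elsewhere this simply reports missing files.
for finding in review_helper_files(Path(".")):
    print(finding)
```

An empty findings list would mean only that the public helper files are present and placed as expected. It would not be a validation result of any kind.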
P4-6 review must not become:
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- synthetic replay validation
- phone-only simulator validation
- AI Output A/B real impact validation
- Sal-Meter validation
- CAIS compliance
- device-readiness review
- production-readiness review
- certification review
P4-7 may extend wording-boundary lint coverage.
It may check for prohibited wording in:
- phone-only-simulator/
- synthetic-session-replay/
- evaluation-baseline/evaluate_ai_output_ab.py
- README release-boundary sections
- issue and PR boundary sections
P4-7 must remain wording-boundary hygiene only.
It must not become scientific validation, benchmark validation, mediation validation, replay validation, phone monitoring validation, AI Output A/B real impact validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
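A wording-boundary check of this kind could be sketched as follows. This is a rough illustration, not the repository's lint implementation: the function name `lint_text` is hypothetical, the phrase list is a small subset of affirmative claims this README prohibits, and the negation heuristic is deliberately crude so that boundary statements such as "not production ready" are not flagged.

```python
import re

# Hypothetical phrase list; the repository's actual lint rules may differ.
PROHIBITED_PHRASES = [
    "benchmark validated",
    "scientifically validated",
    "production ready",
    "CAIS compliant",
    "certified",
]

# Words that mark a line as a boundary statement rather than a claim.
NEGATION = re.compile(r"\b(not|never|no|without)\b", re.IGNORECASE)


def lint_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for un-negated prohibited wording."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if NEGATION.search(line):
            continue  # crude heuristic: negated lines count as boundary language
        for phrase in PROHIBITED_PHRASES:
            if re.search(rf"\b{re.escape(phrase)}\b", line, re.IGNORECASE):
                hits.append((number, phrase))
    return hits


sample = """This helper is research-stage.
This helper is not production ready.
The evaluator is benchmark validated and certified.
"""
for number, phrase in lint_text(sample):
    print(f"line {number}: prohibited wording '{phrase}'")
```

A line-level negation check will miss claims that span several lines or that mix a negated and an un-negated phrase on one line, so a hit or a pass here is a wording flag only and still needs human review.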
P4-8 may prepare a bounded release-readiness note.
The note may state:
- helper files are present
- synthetic sample structures are present
- simulator scaffold files are present
- replay scaffold files are present
- P5-3 evaluator helper is present
- boundary language has been reviewed
- public data boundary remains intact
The note must not state:
- benchmark validated
- scientifically validated
- mediation validated
- dyadic recovery validated
- termination-gate accuracy validated
- replay validated
- phone-only simulator validated
- AI Output A/B real impact validated
- Sal-Meter validated
- CAIS compliant
- device ready
- production ready
- certified
P5-4 may consider whether to add P5-3 helper execution to the workflow.
A valid P5-3 workflow step may run:
python evaluation-baseline/evaluate_ai_output_ab.py
A successful P5-3 workflow run may mean only:
- the synthetic AI Output A/B consequence evaluator helper executed successfully
- proxy-only helper output was generated under synthetic conditions
- public-helper structure remained intact
A successful P5-3 workflow run must not mean:
- real AI impact validation
- real human-state measurement validation
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
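One way to keep a workflow step's reporting inside these limits is to fix the status wording in code. The sketch below assumes a hypothetical wrapper, `run_helper`, that runs a command and returns a bounded message; only the script path comes from this README, and the message text is illustrative.

```python
import subprocess
import sys


def run_helper(command: list[str]) -> str:
    """Run a helper command and return a bounded, helper-only status message."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode == 0:
        return (
            "synthetic AI Output A/B helper execution: "
            "completed (helper status only)"
        )
    return (
        "synthetic AI Output A/B helper execution: "
        f"failed (exit code {result.returncode})"
    )


# The script path comes from this README; outside a repository checkout the
# interpreter cannot find the file, so the call reports a failed execution.
print(run_helper([sys.executable, "evaluation-baseline/evaluate_ai_output_ab.py"]))
```

Because the wrapper can report only that execution completed or failed, a passing run cannot be worded as validation, compliance, or readiness.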
Future roadmap items may extend public helper review, synthetic replay scaffolding, simulator boundary coverage, P5-3 synthetic AI Output A/B evaluator helper execution review, and optional lint hygiene, but they must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
This repository does not attempt to:
- prove consciousness
- measure consciousness directly
- infer emotions
- diagnose mental state
- treat or counsel people
- rank persons
- judge relationships
- produce relationship verdicts
- produce human-ranking outputs
- replace human consent
- expose raw human data
- process identifiable human data
- publish clinical data
- process real phone calls
- process real phone recordings
- process real call transcripts
- process real phone-session logs
- process real session records
- replay real sessions
- replay real phone calls
- replay real transcripts
- create phone monitoring authority
- create replay validation authority
- authorize real-time phone monitoring
- validate the phone-only simulator
- validate the synthetic replay scaffold
- validate P5-3 AI Output A/B real-world impact
- validate real human-state measurement
- validate Sal-Meter
- define CAIS compliance
- validate benchmark performance
- validate scientific truth
- validate mediation
- validate dyadic recovery
- validate termination-gate accuracy
- certify any system
- certify any model
- certify any dataset
- certify any dashboard
- certify any laboratory
- certify any device
- certify device readiness
- certify production readiness
- operate a production mediation service
- operate a production phone-monitoring service
- operate a production replay service
- operate a production closed-loop intervention system
- authorize surveillance
- authorize real-time monitoring
- authorize automated intervention on real participants
This repository may support:
- public helper documentation
- synthetic sample structure
- schema helper structure
- synthetic demo-flow consistency checks
- synthetic termination-gate helper consistency checks
- synthetic phone-only simulator scaffolding
- synthetic phone-session flow representation
- synthetic phone-session state-machine mockups
- synthetic sample phone-session scripts
- synthetic session replay scaffolding
- synthetic replay manifest structure
- synthetic replay event timeline structure
- synthetic replay boundary documentation
- synthetic AI Output A/B consequence evaluator helper structure
- proxy-only evaluator helper logic
- optional helper workflow execution review
- boundary-language hygiene
- repository-level transparency
This repository must not become:
- a clinical system
- a diagnostic system
- a therapeutic system
- a counseling system
- a surveillance system
- a real phone monitoring system
- a real session replay system
- a real transcript processing system
- a replay validation system
- a real AI impact validation system
- a real human-state measurement validation system
- a relationship-verdict system
- a human-ranking system
- a production closed-loop system
- a certified benchmark system
- a Sal-Meter validation system
- a CAIS compliance system
- a production mediation system
- a production phone-monitoring system
- a production replay system
P4-4 phone-only simulator files are not:
- real phone monitoring
- real phone recording
- real transcript processing
- real participant data processing
- phone-only simulator validation
- phone monitoring authority
- production phone-monitoring readiness
P4-5 synthetic replay scaffold files are not:
- real session replay
- real phone replay
- real transcript replay
- real participant data replay
- synthetic replay validation
- replay validation authority
- production replay readiness
P5-3 synthetic AI Output A/B consequence evaluator helper files are not:
- real AI impact validation
- real human-state measurement validation
- benchmark validation
- scientific validation
- mediation validation
- dyadic recovery validation
- termination-gate accuracy validation
- clinical validation
- diagnostic validation
- therapeutic validation
- counseling validation
- Sal-Meter validation
- CAIS compliance
- device readiness
- production readiness
- certification
- relationship verdict authority
- human-ranking authority
- production closed-loop authority
The helper evaluator is not proof.
The simulator is not monitoring.
The replay scaffold is not replay authority.
The dashboard is not a judgment engine.
The workflow is not certification.
Correct boundary sentence:
This repository is a public helper surface; it may support synthetic sample structure, simulator scaffolding, replay scaffolding, P5-3 synthetic AI Output A/B evaluator helper structure, and wording-boundary hygiene, but it does not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
Unless otherwise stated, public helper materials in this repository are released under:
- Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
Document-level license statements in DOI-registered canonical records remain fixed by those records.
This GitHub repository is a helper surface.
It does not override DOI-registered canonical records.
It does not override document-level license statements.
It does not create certification, compliance, validation, device-readiness, production-readiness, or authority claims.
Please cite DOI-registered records as the authority layer.
DOI records govern.
GitHub helps.
See: CITATION.cff
If a helper file and a DOI-registered canonical record conflict, the DOI-registered canonical record governs.
GitHub README text, helper files, sample data, simulator scaffolds, replay scaffolds, evaluator helpers, issue text, and pull request text do not replace canonical DOI authority.
Correct boundary sentence:
DOI-registered canonical records govern authority and citation; this GitHub repository helps public navigation, helper structure, sample scaffolding, simulator scaffolding, replay scaffolding, evaluator-helper visibility, and boundary-language hygiene only.
This repository documents structure.
It does not validate the body.
It does not validate the person.
It does not validate the relationship.
It does not validate a human state.
It does not validate real human-state measurement.
It does not validate AI Output A/B real-world impact.
It does not validate dyadic recovery.
It does not validate mediation.
It does not validate termination-gate accuracy.
It does not validate the phone-only simulator.
It does not validate the synthetic replay scaffold.
It does not validate the P5-3 synthetic AI Output A/B consequence evaluator as real-world evidence.
It does not validate Sal-Meter.
It does not grant CAIS compliance.
It does not crown a benchmark as validated.
It does not certify any system.
It does not certify any model.
It does not certify any dataset.
It does not certify any dashboard.
It does not certify any laboratory.
It does not certify any device.
It does not certify device readiness.
It does not certify production readiness.
It does not authorize surveillance.
It does not authorize diagnosis.
It does not authorize therapy.
It does not authorize counseling.
It does not authorize legal mediation.
It does not authorize relationship verdicts.
It does not authorize human ranking.
It does not authorize phone monitoring.
It does not authorize real-time monitoring.
It does not authorize real phone recording.
It does not authorize real transcript processing.
It does not authorize real session replay.
It does not authorize real phone replay.
It does not authorize real transcript replay.
It does not authorize replay validation.
It does not authorize production mediation.
It does not authorize production phone monitoring.
It does not authorize production replay.
It does not authorize production closed-loop intervention.
A closed session must stay closed.
A replay must not reopen a closed session.
A replay must not continue mediation after closure.
A replay must not generate new AI output after closure.
A replay must not convert closure into recovery evidence.
A replay must not convert audit into certification.
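The closure rules above can be read as invariants on a synthetic replay loop. The sketch below is one hedged way to express them, assuming a hypothetical event vocabulary (`close`, `reopen`, `mediate`, `ai_output`, `mark_recovery_evidence`, `certify`, `audit`) and hypothetical class names. None of these identifiers are taken from the repository's actual replay scaffold.

```python
from dataclasses import dataclass, field
from enum import Enum


class SessionStatus(Enum):
    OPEN = "open"
    CLOSED = "closed"


# Hypothetical event kinds that the closure rules forbid once a
# synthetic session has been closed.
FORBIDDEN_AFTER_CLOSURE = {
    "reopen",                  # a replay must not reopen a closed session
    "mediate",                 # no continued mediation after closure
    "ai_output",               # no new AI output after closure
    "mark_recovery_evidence",  # closure is not recovery evidence
    "certify",                 # audit is not certification
}


class ClosedSessionError(RuntimeError):
    """Raised when a replay step would act on a closed session."""


@dataclass
class SyntheticSession:
    session_id: str
    status: SessionStatus = SessionStatus.OPEN
    log: list = field(default_factory=list)

    def apply(self, event_kind: str) -> None:
        # Reject forbidden events before touching any state.
        if self.status is SessionStatus.CLOSED and event_kind in FORBIDDEN_AFTER_CLOSURE:
            raise ClosedSessionError(
                f"{event_kind!r} is not allowed after closure of {self.session_id}"
            )
        if event_kind == "close":
            self.status = SessionStatus.CLOSED
        self.log.append(event_kind)


def replay(events):
    """Replay synthetic event kinds in order and return the final session."""
    session = SyntheticSession("synthetic-001")
    for kind in events:
        session.apply(kind)
    return session
```

In this sketch a read-only `audit` event is still accepted after closure, while any event in `FORBIDDEN_AFTER_CLOSURE` raises `ClosedSessionError` instead of silently changing the session. A passing run shows only that the synthetic guard behaves as written. It is not replay validation.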
The packet is not the person.
The event is not the relationship.
The container is not the truth.
The demo-flow is not recovery.
The termination-gate case is not accuracy evidence.
The phone-only simulator is not the phone call.
The sample phone-session script is not a transcript.
The phone-session state machine is not authority.
The replay scaffold is not real replay.
The replay skeleton is a map of a map.
The replay manifest is not a session.
The replay event timeline is not the event.
The replay boundary is not authority.
The P5-3 evaluator is not proof.
The P5-3 evaluator is not real AI impact validation.
The P5-3 evaluator is not real human-state measurement validation.
The P5-3 evaluator is not benchmark validation.
The P5-3 evaluator is not mediation validation.
The P5-3 evaluator is not production readiness.
The validator is not authority.
The evaluator is not proof.
The workflow is not certification.
The dashboard is not a judgment engine.
The repository is a map.
It is not the mountain.