salpida-foundation/proxy-benchmark-track

Proxy Benchmark Track

Current public helper release: v0.1.2 — Bounded Public Helper Pre-Release
v0.1.2 supersedes v0.1.1 as the current public helper route.
This release is research-stage, public-helper-only, synthetic/sample-data-first, raw-data-non-public, non-clinical, non-diagnostic, non-therapeutic, non-surveillance, not Sal-Meter, not CAIS compliance, not a validated benchmark, and not production readiness.

A research-stage public helper repository for measuring what AI leaves behind in the human state.

Most AI benchmarks ask whether AI outputs are correct, safe, helpful, or aligned.

The Proxy Benchmark Track asks a different question:

What did the AI output leave behind in the human state?

And in a dyadic session:

Did the AI help both people move toward recovery, or did it improve one side while burdening, silencing, or exposing the other?


Current public helper release

Current release: v0.1.2 — Bounded Public Helper Pre-Release

v0.1.2 is the current bounded public helper pre-release for the Human-State Proxy Benchmark Track.

It supersedes v0.1.1 for the current public helper route.

Release route:

https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.2

This release is a research-stage public helper release.

It is:

  • public-helper-only
  • synthetic/sample-data-first
  • raw-data-non-public
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-surveillance
  • not Sal-Meter
  • not Proxy Sal-Meter
  • not CAIS compliance
  • not a validated benchmark
  • not validated mediation
  • not device readiness
  • not production readiness

This release may be used to understand the current public helper structure, synthetic evaluator direction, schema boundary, sample-data boundary, and public/private data separation.

It must not be cited or described as validation of real human-state measurement, dyadic recovery, AI mediation effectiveness, clinical readiness, commercial readiness, device readiness, Sal-Meter readiness, or CAIS compliance.

Public examples in this repository must remain limited to:

  • synthetic data
  • sample data
  • schemas
  • mock packets
  • toy examples
  • placeholder flows
  • evaluator helpers
  • documentation scaffolds

The following must not be placed in this public repository:

  • raw human data
  • private pilot data
  • confidential advisor material
  • SRA material
  • reviewer memos
  • consent records
  • real participant records
  • controlled-access evidence packages

One-line thesis

The Proxy Benchmark Track evaluates what an AI output leaves behind after it acts.

It does not primarily ask whether the AI answer was correct, fluent, persuasive, emotionally pleasant, or superficially helpful.

It asks whether the output changed downstream human-state burden, recovery direction, dyadic stability, and termination readiness inside a bounded, consent-based, non-clinical, non-surveillance session.

AI Output → Human-State Delta → Dyadic Recovery → Recovery / Termination Gate

This repository therefore focuses on consequence, not performance theater.

For dyadic interaction, the core question is:

Did both sides move toward recovery, or did one side become silent, exposed, burdened, coerced, or erased?

This section does not validate human-state measurement, dyadic recovery, mediation effectiveness, clinical status, diagnostic use, therapeutic use, surveillance use, Sal-Meter status, CAIS compliance, or production readiness.
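
The dyadic question above can be pictured with a minimal sketch. It assumes each side's change is summarized as one synthetic burden delta, where a negative value means burden went down. The names `SideDelta`, `burden_delta`, `silenced`, and `dyadic_direction` are hypothetical and are not taken from the repository's schemas or evaluators.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SideDelta:
    """Synthetic per-person summary; every field name here is illustrative."""
    burden_delta: float      # post-output burden minus baseline burden
    silenced: bool = False   # the side stopped participating after the output


def dyadic_direction(a: SideDelta, b: SideDelta) -> str:
    """Classify a synthetic dyad; one-sided improvement is not dyadic recovery."""
    def improved(side: SideDelta) -> bool:
        # A silenced side is never counted as improved, whatever its delta.
        return side.burden_delta < 0 and not side.silenced

    if improved(a) and improved(b):
        return "both_toward_recovery"
    if improved(a) or improved(b):
        return "one_sided"
    return "no_recovery"
```

Treating a silenced side as not improved is a design assumption of this sketch: it keeps a quiet participant from being read as a recovered one.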


Current status boundary

Status: research-stage · public-helper-only · synthetic/sample-data-first · raw-data-non-public · non-clinical · non-diagnostic · non-therapeutic · non-surveillance · non-counseling · non-coercive · pre-validation · pre-device · pre-certification · pre-compliance · benchmark-support-only

This repository is a public helper surface for the Human-State Proxy Benchmark Track.

It is:

  • not the Sal-Meter core signal track
  • not a Proxy Sal-Meter
  • not a CAIS-compliant device implementation
  • not a validated consciousness measurement system
  • not a validated benchmark
  • not validated mediation
  • not validated dyadic recovery
  • not validated termination-gate accuracy
  • not a clinical, diagnostic, therapeutic, psychiatric, medical, counseling, employment, insurance, legal, educational, eligibility, mediation-service, or surveillance system
  • not a certification, conformance, or mark-usage surface
  • not a closed-loop intervention system
  • not a production monitoring system
  • not a phone monitoring system
  • not a replay validation system
  • not a relationship-verdict system
  • not a human-ranking system
  • not a place to publish raw human data

This repository may contain public-safe helper materials only:

  • synthetic data
  • sample data
  • schemas
  • mock packets
  • toy examples
  • placeholder flows
  • evaluator helpers
  • simulator scaffolds
  • replay scaffolds
  • documentation scaffolds

This repository must not contain:

  • raw human data
  • identifiable human data
  • real participant records
  • real dyadic conflict records
  • real session records
  • real phone recordings
  • real call transcripts
  • private consent records
  • clinical records
  • raw biosignals
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • production intervention logs
  • device-readiness evidence
  • production-readiness evidence
  • certification evidence

A closed session must stay closed.

A replay must not reopen a closed session.

A helper structure is not evidence.

A validator is not proof.
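
One way to read the closed-session rules is as an invariant on a session object. The sketch below is an assumption-laden toy, not the repository's state machine: `SyntheticSession`, `ClosedSessionError`, `record`, `close`, and `replay` are hypothetical names, and the events are synthetic strings.

```python
class ClosedSessionError(RuntimeError):
    """Raised when something tries to write to a closed synthetic session."""


class SyntheticSession:
    """Toy session holding synthetic event labels only."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id
        self.closed = False
        self.events: list[str] = []

    def record(self, event: str) -> None:
        # A closed session must stay closed: no new events after closure.
        if self.closed:
            raise ClosedSessionError(f"{self.session_id} is closed")
        self.events.append(event)

    def close(self) -> None:
        self.closed = True

    def replay(self) -> tuple[str, ...]:
        # Replay returns an immutable copy and never touches `closed`.
        return tuple(self.events)
```

Replay is modeled as a read-only view so that reviewing a session can never reopen it.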


Public landing page

https://salpida.foundation/topics/human-state-aware-ai-interaction/

Core distinction

Sal-Meter Core Track

The Sal-Meter Core Track asks whether a new molecular–electrochemical signal interface can produce stable, repeatable, auditable signal behavior under the CAIS / Sal-Meter kernel program.

Current core execution order:

External Layer-0 iodine redox / thiol feasibility
→ SICS Internal Phase 0 — G-only
→ Phase 1 — I-only
→ Phase 2a — Twin Mini-Cell
→ Phase 2b — G+I human pilot
→ LOCK 1 / LOCK 2
→ Future SDK / broader opening

Core technical route:

https://github.com/salpida-foundation/sal-meter-kernel-program

Proxy Benchmark Track

The Proxy Benchmark Track prepares the comparison, interaction, and mediation-evaluation layer.

It uses existing proxy signals and synthetic/sample helper structures to prepare synchronized benchmark infrastructure before future Sal-Meter I/G-channel inputs become available.

The proxy track supports the core track.

It does not replace it.


What makes this repository different

Most AI evaluation looks at the output.

This repository is built around the consequence.

It asks:

What remains in the human state after AI acts?

For two-person interaction, the sharper question is:

Did both sides move toward recovery,
or did one side become silent, exposed, burdened, coerced, or erased?

This repository is not another chatbot project.

It is a public helper surface for a future human-state-aware AI mediation benchmark.


Canonical / DOI relationship

This repository is a public technical helper surface.

It accompanies DOI-registered public records.

It does not replace them.

GitHub helps builders move.
DOI records govern authority.

If this GitHub repository or release conflicts with a DOI-registered SICS / CAIS / Sal-Meter / CCF canonical record or a formally issued SICS determination, the stricter DOI-registered canonical record or SICS determination controls.


Core Proxy Benchmark Track records

SICS Human-State Proxy Benchmark Track — Public Boundary and Program Charter v0.1

Defines public boundary, naming rules, prohibited claims, data-publication limits, roadmap logic, GitHub helper status, and Go / Hold / No-Go structure.

Version DOI:
https://doi.org/10.5281/zenodo.19837423

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19837422

SICS Human-State Proxy Benchmark Track — Scientific Rationale and Research Value v0.1

Explains Human-State Cost, AI performance versus human-state impact, measurement-layer simplification, and future Sal-Meter A/B comparison logic.

Version DOI:
https://doi.org/10.5281/zenodo.19837971

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19837970

Human-State-Aware AI Mediation document set

Human-State Mediation Boundary Standard v0.1

Fixes the outer boundary: consent-based, non-clinical, non-surveillance, raw-data-non-public.

Version DOI:
https://doi.org/10.5281/zenodo.19904289

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19904288

Human-State Packet Minimal Data-Sharing Standard v0.1

Fixes the minimum packet object: summary-only sharing, permission, expiry, confidence, data quality, and raw-data exclusion.

Version DOI:
https://doi.org/10.5281/zenodo.19905541

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19905540
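
As a rough illustration of the packet properties named above (summary-only sharing, permission, expiry, confidence, data quality, raw-data exclusion), a sharing check might look like the sketch below. The class `PacketSketch`, its field names, the function `may_share`, and the `min_confidence` threshold are all assumptions made for this example; the actual helper structure is defined by schemas/human_state_packet.schema.json and the DOI record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class PacketSketch:
    """Illustrative stand-in for a synthetic Human-State Packet."""
    state_summary: str            # summary only, never raw signals
    permission_granted: bool
    expires_at: datetime
    confidence: float             # 0.0 to 1.0, hypothetical scale
    data_quality_ok: bool
    contains_raw_data: bool = False


def may_share(packet: PacketSketch, now: datetime, min_confidence: float = 0.5) -> bool:
    """Return True only if every sharing condition holds at time `now`."""
    return (
        packet.permission_granted
        and now < packet.expires_at
        and packet.data_quality_ok
        and packet.confidence >= min_confidence
        and not packet.contains_raw_data
    )
```

A synthetic usage example:

```python
expiry = datetime(2030, 1, 1, tzinfo=timezone.utc)
packet = PacketSketch("synthetic: burden lower than baseline", True, expiry, 0.8, True)
may_share(packet, datetime(2029, 12, 31, tzinfo=timezone.utc))
```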

Dyadic Human-State Mediation Benchmark Charter v0.1

Fixes the benchmark objective:

AI Output → Human-State Delta → Dyadic Recovery

Version DOI:
https://doi.org/10.5281/zenodo.19906725

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19906724

Human-State Session Protocol v0.1 — Structural Declaration

Fixes the session structure:

Session Creation
→ Consent Confirmation
→ Packet Availability Check
→ Baseline State Summary
→ AI Output
→ Post-Output State Summary
→ Human-State Delta
→ Recovery Gate
→ Termination Gate
→ Session Closure
→ Audit Log

Version DOI:
https://doi.org/10.5281/zenodo.19908379

Concept DOI / All Versions DOI:
https://doi.org/10.5281/zenodo.19908378
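
The session structure listed in this record can be sketched as an ordered sequence with a simple order check. This assumes a strictly linear flow with no skipped steps, which may be stricter than the DOI record itself; the identifiers `SESSION_STEPS` and `is_valid_prefix` and the snake_case step labels are hypothetical.

```python
SESSION_STEPS = (
    "session_creation",
    "consent_confirmation",
    "packet_availability_check",
    "baseline_state_summary",
    "ai_output",
    "post_output_state_summary",
    "human_state_delta",
    "recovery_gate",
    "termination_gate",
    "session_closure",
    "audit_log",
)


def is_valid_prefix(events) -> bool:
    """True if `events` follows the declared order from the first step onward."""
    events = tuple(events)
    # Slicing past the end just returns the full tuple, so an over-long
    # event list compares unequal and is rejected.
    return events == SESSION_STEPS[: len(events)]
```

For example, a synthetic log that jumps from session creation straight to AI output fails the check, because consent confirmation was skipped.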

Repository release history

v0.1.2 — Bounded Public Helper Pre-Release

Current public helper release.

v0.1.2 is the current bounded public helper pre-release for the Human-State Proxy Benchmark Track.

It supersedes v0.1.1 for the current public helper route.

Release route:
https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.2

Boundary:

  • research-stage only
  • public-helper-only
  • synthetic/sample-data-first
  • raw-data-non-public
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-surveillance
  • not Sal-Meter
  • not Proxy Sal-Meter
  • not CAIS compliance
  • not a validated benchmark
  • not validated mediation
  • not device readiness
  • not production readiness

This release may be used to understand the current public helper structure, synthetic evaluator direction, schema boundary, sample-data boundary, and public/private data separation.

It must not be cited or described as validation of real human-state measurement, dyadic recovery, AI mediation effectiveness, clinical readiness, commercial readiness, device readiness, Sal-Meter readiness, or CAIS compliance.


v0.1.1 — Prior helper release

v0.1.1 is a prior post-validator-pass public helper release.

It superseded v0.1.0 for helper-structure validation status, but it is no longer the current public helper route after publication of v0.1.2.

Use v0.1.2 for the current bounded public helper pre-release boundary.

v0.1.1 confirmed only that the public synthetic/sample package validator could run and report helper-structure PASS / FAIL.

It did not validate benchmark performance.

It did not validate scientific truth.

It did not validate Sal-Meter.

It did not grant CAIS compliance.

It did not certify any system, model, dataset, dashboard, laboratory, device, repository, schema, session protocol, implementation, or mediation system.

Prior release route:
https://github.com/salpida-foundation/proxy-benchmark-track/releases/tag/v0.1.1


v0.1.0 — Initial public helper pre-release

v0.1.0 was the initial bounded public helper pre-release.

It documented the public helper structure before post-validator correction.

It remains part of the project history, but it is not the current public helper route.


Current implementation status

This repository is currently in a public helper implementation stage for the SICS Human-State Proxy Benchmark Track.

It provides:

  • schema helper structures;
  • synthetic/sample data;
  • P3 synthetic dyadic helper package;
  • P4 synthetic dyadic demo-flow package;
  • P4-1 synthetic dyadic recovery demo-flow evaluator;
  • P5-3 synthetic AI Output A/B consequence evaluator helper;
  • synthetic AI Output A/B comparison support;
  • proxy-only output comparison logic without OE / RE / EE, VCE / CRI / CFI, Sal-Meter, CAIS compliance, validation, certification, device-readiness, or production-readiness claims;
  • P4-2 mediation policy prompt pack;
  • P4-3 synthetic termination-gate helper case package;
  • P4-3 synthetic termination-gate helper evaluator;
  • P4-4 phone-only simulator scaffold;
  • P4-4 phone-only session flow wireframe;
  • P4-4 synthetic phone-session state-machine mockup;
  • P4-4 synthetic sample phone-session script;
  • P4-5 synthetic session replay scaffold;
  • P4-5 synthetic replay manifest;
  • P4-5 synthetic replay event timeline;
  • P4-5 synthetic replay boundary document;
  • validation scaffolding;
  • P3 helper-schema validation;
  • synthetic demo-flow consistency checking;
  • synthetic termination-gate helper consistency checking;
  • boundary language linting;
  • dashboard mockup boundaries;
  • protocol helper rules;
  • closed-loop demo-lite boundary scaffolding;
  • replication guide checklists;
  • contributor issue / PR templates;
  • Human-State-Aware AI Mediation helper documents;
  • GitHub Actions helper-structure validation workflow;
  • bounded prompt / policy scaffolding for synthetic mediation simulation.

It does not provide benchmark evidence.

It does not provide raw human data.

It does not provide Sal-Meter input.

It does not grant CAIS compliance.

It does not validate Sal-Meter.

It does not validate mediation.

It does not validate dyadic recovery.

It does not validate termination-gate accuracy.

It does not validate synthetic session replay.

It does not certify device readiness.

It does not certify production readiness.

It does not authorize production closed-loop intervention.

The phone-only simulator and the synthetic session replay skeleton are public helper scaffolds only.

Neither scaffold is:

  • a real phone monitoring system
  • a real session replay system
  • a real transcript replay system
  • a clinical system
  • a diagnostic system
  • a therapeutic system
  • a counseling system
  • a mediation-service system
  • a surveillance system

A closed session must stay closed.

A replay must not reopen a closed session.


Implementation status table

| Work item | Status | Notes |
| --- | --- | --- |
| Governance boundary files | Present | Public/private data boundary and prohibited-claim discipline are represented in the repository |
| Schema completion | Done | schemas/ contains public helper schemas for metadata, event markers, streams, labels, QC, features, splits, Human-State Packet, Dyadic Session Event, and Benchmark Session Container helper structures |
| Human-State Packet JSON helper schema | Done | schemas/human_state_packet.schema.json defines a public helper schema for synthetic Human-State Packets |
| Dyadic Session Event JSON helper schema | Done | schemas/dyadic_session_event.schema.json validates one public-safe synthetic/sample dyadic session boundary event |
| Benchmark Session JSON helper schema | Done | schemas/benchmark_session.schema.json validates one public-safe synthetic/sample benchmark session container |
| Synthetic sample package | Present / Passed validator | sample-data/synthetic-session-001/ contains a public synthetic/sample structure package that passes helper-structure validation |
| Synthetic dyadic helper package | Present / Passed P3 helper-schema validation | sample-data/synthetic-dyadic-session-001/ contains Human-State Packet A/B, Dyadic Session Event, and Benchmark Session Container examples |
| Synthetic dyadic demo-flow package | Present / Passed P4-1 evaluator | sample-data/synthetic-dyadic-session-001/ contains ai_outputs.json, dyadic_delta.json, recovery_gate.json, termination_gate.json, and audit_log.json examples |
| P4-1 dyadic recovery demo evaluator | Present / Passed | evaluation-baseline/evaluate_dyadic_recovery_demo.py checks synthetic demo-flow consistency only |
| P5-3 synthetic AI Output A/B consequence evaluator helper | Present / Helper-stage | evaluation-baseline/evaluate_ai_output_ab.py supports synthetic AI Output A/B consequence comparison using proxy-only metrics; it does not validate real AI impact, real human-state measurement, mediation effectiveness, Sal-Meter, CAIS compliance, benchmark validation, device readiness, or production readiness |
| P4-2 mediation policy prompt pack | Present | prompts/ contains README.md and mediation_policy_v0.1.json; docs/mediation-policy-prompt-pack.md documents private cue, shared mediation output, false recovery prevention, and termination boundary logic |
| P4-3 synthetic termination-gate helper case package | Present / Passed P4-3 evaluator | sample-data/synthetic-dyadic-session-001/ contains termination_gate_cases.json with synthetic pause, narrow, close, terminate, refresh, and audit-only helper cases |
| P4-3 termination gate demo evaluator | Present / Passed | evaluation-baseline/evaluate_termination_gate_demo.py checks synthetic termination-gate helper consistency only |
| P4-4 phone-only simulator scaffold | Present | phone-only-simulator/ contains a public-safe, synthetic-only phone-session simulator helper package |
| P4-4 phone-only simulator README | Present | phone-only-simulator/README.md defines folder boundary, intended files, public data boundary, P4-3 relationship, and final rule |
| P4-4 phone session flow wireframe | Present | phone-only-simulator/session-flow-wireframe.md defines consent, packet check, baseline summary, AI output, Human-State Delta, Recovery Gate, Termination Gate, closure, and audit screens |
| P4-4 phone session state machine | Present | phone-only-simulator/phone-session-state-machine.json defines synthetic-only states, allowed transitions, forbidden transitions, allowed decisions, prohibited decisions, and boundary flags |
| P4-4 sample phone session script | Present | phone-only-simulator/sample-phone-session-script.md provides a synthetic sample script showing consent, packet availability, AI output, delta review, recovery gate, termination gate, closure, and audit flow |
| P4-5 synthetic session replay scaffold | Present | synthetic-session-replay/ contains a public-safe, synthetic-only session replay helper scaffold |
| P4-5 synthetic replay README | Present | synthetic-session-replay/README.md defines replay scaffold purpose, scope, intended files, public data boundary, P4-4 relationship, closed-session replay rule, and final rule |
| P4-5 synthetic replay manifest | Present | synthetic-session-replay/replay-manifest.json defines replay source declaration, replay scope, boundary flags, replay flow, closed-session rule, allowed decisions, prohibited decisions, and success meaning |
| P4-5 synthetic replay event timeline | Present | synthetic-session-replay/replay-event-timeline.json defines synthetic replay sequence from manifest loading through source declaration, consent, packet review, AI output, delta, recovery gate, termination gate, closure, and audit |
| P4-5 synthetic replay boundary | Present | synthetic-session-replay/replay-boundary.md defines allowed replay materials, prohibited replay materials, prohibited replay claims, closed-session replay rule, replay interpretation, P4-4 relationship, and public release rule |
| Synthetic session README | Done | The original synthetic package includes a local README explaining file roles and boundaries |
| Synthetic dyadic session README | Done | The dyadic synthetic package includes a local README explaining P3 helper-schema, P4 demo-flow, and P4-3 termination-gate helper boundaries |
| Sample package validator | Present / Passed | evaluation-baseline/validate_sample_package.py provides helper-structure validation for the original synthetic package |
| P3 helper-schema validator | Present / Passed | evaluation-baseline/validate_p3_schemas.py validates the public synthetic P3 dyadic helper files against the Human-State Packet, Dyadic Session Event, and Benchmark Session schemas |
| Boundary language lint | Present / Passed advisory mode | evaluation-baseline/boundary_lint.py scans public helper wording for prohibited or risky boundary-language drift |
| Evaluation baseline README | Done | evaluation-baseline/README.md explains validator usage, P3 helper-schema validation, P4-1 demo-flow evaluation, P4-3 termination-gate helper evaluation, PASS / FAIL interpretation, dependency installation, and validation boundaries |
| Protocol helper boundary pack | Done | protocol-helper/ defines label, timestamp, metadata, Human-State Cost, and future Sal-Meter A/B comparison boundaries |
| Dashboard mockup boundary pack | Done | dashboard-mockup/ defines dashboard claim, field, and wireframe boundaries |
| Closed-loop demo-lite boundary pack | Done | closed-loop-demo-lite/ defines feedback-loop boundaries, event-log schema, and local placeholder code |
| Replication guide pack | Done | replication-guide/ defines reproducibility, metadata completeness, audit trail, and public release-readiness checklists |
| Issue / PR template pack | Done | .github/ISSUE_TEMPLATE/ and .github/pull_request_template.md define contributor boundary gates |
| GitHub Actions validator workflow | Passed / unchanged for P4-5 | .github/workflows/validate-synthetic-sample.yml runs the original sample validator, P3 helper-schema validator, P4 synthetic dyadic recovery demo-flow evaluator, P4-3 synthetic termination-gate helper evaluator, and boundary language lint; P4-5 currently adds documentation and replay scaffold only, not a new validator |
| Citation metadata | Present | CITATION.cff points citation toward DOI-registered public boundary records |
| Raw human data | Not present | Public repository examples must remain synthetic, mock, placeholder, or sample-structure-only |
| Sal-Meter input | Not present | This repository is not Sal-Meter and does not contain Sal-Meter signal data |
| CAIS compliance claim | Not present | This repository does not grant CAIS compliance |
| Benchmark validation | Not present | No model, dataset, dashboard, sensor stack, feedback loop, template, PR, validator, workflow, evaluator, phone-only simulator, replay scaffold, termination-gate helper case, or benchmark result is validated by this repository |
| Phone monitoring authority | Not present | The P4-4 phone-only simulator and P4-5 replay scaffold are not real phone monitoring systems and do not process real calls, raw audio, transcripts, or identifiable participant data |
| Replay validation authority | Not present | The P4-5 synthetic session replay scaffold does not validate replay, mediation, dyadic recovery, termination-gate accuracy, Sal-Meter, CAIS compliance, device readiness, or production readiness |
| Production closed-loop authority | Not present | No phone-only simulator file or replay scaffold file authorizes production mediation, monitoring, intervention, relationship verdicts, or human ranking |
| Release status | v0.1.2 published as bounded public helper pre-release | v0.1.2 is the current bounded public helper pre-release; v0.1.1 is now a prior post-validator-pass helper release |
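
The synthetic AI Output A/B consequence comparison listed in the table can be pictured with a toy rule. This is not the logic of evaluation-baseline/evaluate_ai_output_ab.py, which is not shown here. It is a sketch under stated assumptions: each output is summarized as a mapping from a synthetic participant label to a burden delta (negative means lower burden), and the preferred output is the one whose worst-off participant fares better. The functions `worst_case_delta` and `compare_outputs` are hypothetical.

```python
def worst_case_delta(deltas: dict[str, float]) -> float:
    """Largest (least favorable) synthetic burden delta across participants."""
    return max(deltas.values())


def compare_outputs(output_a: dict[str, float], output_b: dict[str, float]) -> str:
    """Compare two synthetic AI outputs by their worst-off participant."""
    worst_a = worst_case_delta(output_a)
    worst_b = worst_case_delta(output_b)
    if worst_a < worst_b:
        return "output_a"
    if worst_b < worst_a:
        return "output_b"
    return "tie"
```

Comparing on the worst-off participant, rather than on the average, is a deliberate choice in this sketch: an output that helps one side a lot while burdening the other should not win on the strength of the first side alone.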

Current P1 milestone state

| Milestone | Status | Notes |
| --- | --- | --- |
| P1-1 Schema completion | Done | Schema folder contains helper schemas and schemas/README.md |
| P1-2 Synthetic sample package validator | Done | Validator file exists under evaluation-baseline/validate_sample_package.py |
| P1-3 Evaluation baseline README and validator usability | Done | Evaluation baseline README explains local usage, PASS / FAIL meaning, dependency installation, and validator boundaries |
| P1-4 GitHub Actions validator workflow | Done | Workflow completed successfully after GitHub Actions access was restored |
| P1-5 v0.1.0 release readiness package | Done | v0.1.0 was published as the initial bounded public helper pre-release; v0.1.1 superseded it for post-validator-pass helper-structure status; v0.1.2 is now the current bounded public helper pre-release |

Current P2 milestone state

| Milestone | Status | Notes |
| --- | --- | --- |
| P2-1 Protocol helper boundary pack | Done | protocol-helper/ contains bounded helper rules for labels, timestamps, metadata completeness, Human-State Cost, and future Sal-Meter A/B comparison |
| P2-2 Dashboard mockup boundary pack | Done | dashboard-mockup/ contains README, claim boundary, sample dashboard fields, and mockup wireframe |
| P2-3 Closed-loop demo-lite boundary pack | Done | closed-loop-demo-lite/ contains README, feedback-loop boundary, feedback event-log schema, and local placeholder code |
| P2-4 Replication guide pack | Done | replication-guide/ contains README, reproducibility package checklist, metadata completeness checklist, audit trail checklist, and public release checklist |
| P2-5 Issue / PR template pack | Done | .github/ISSUE_TEMPLATE/ contains boundary correction, schema request, sample-data issue, and leakage-risk report templates; .github/pull_request_template.md defines PR boundary review |

Current P3 milestone state

P3 introduces the Human-State-Aware AI Mediation helper layer.

P3 helper documents and schemas have been completed through P3-17.

This remains a public helper layer.

It is not benchmark validation.

It is not Sal-Meter validation.

It is not CAIS compliance.

| Milestone | Status | Notes |
| --- | --- | --- |
| P3-1 Human-State Mediation Layer | Done | docs/human-state-mediation-layer.md defines the public helper concept connecting AI Output, Human-State Delta, Dyadic Recovery, Human-State Packet, Recovery Gate, and Termination Gate |
| P3-2 Human-State Packet helper document | Done | docs/human-state-packet-schema.md defines the packet as a consent-bound, permission-bound, expiry-bound, confidence-aware, data-quality-aware, session-scoped, sharing-scoped, raw-data-excluding state-summary object |
| P3-2 Human-State Packet JSON helper schema | Done | schemas/human_state_packet.schema.json defines the machine-readable helper structure for public synthetic/sample packet examples |
| P3-3 Dyadic Recovery Baseline Suite B0-B7 | Done | docs/dyadic-recovery-baseline-suite.md defines baseline comparison logic from chance through recovery/termination gate baselines |
| P3-4 Recovery Gate Definition | Done | docs/recovery-gate-definition.md defines the gate for preventing false recovery and determining when mediation can reduce, pause, or stop |
| P3-5 Termination Gate Definition | Done | docs/termination-gate-definition.md defines the gate for consent withdrawal, permission expiry, data quality failure, high uncertainty, overstay prevention, session closure, and auditability |
| P3-6 Human-State Session Protocol | Done | docs/human-state-session-protocol.md defines a bounded, consent-based, permission-bound, audit-ready session lifecycle |
| P3-7 Dyadic Mediation Session Flow | Done | docs/dyadic-mediation-session-flow.md defines the dyadic session flow and preserves the rule that one-sided improvement is not dyadic recovery |
| P3-8 Consent and Data-Sharing Boundary | Done | docs/consent-and-data-sharing-boundary.md defines consent, permission, sharing, expiry, withdrawal, public/private data boundary, raw-data-non-public rule, and audit boundary |
| P3-9 Dyadic Session Event JSON helper schema | Done | schemas/dyadic_session_event.schema.json validates one public-safe synthetic/sample dyadic session boundary event |
| P3-10 Benchmark Session JSON helper schema | Done | schemas/benchmark_session.schema.json validates one public-safe synthetic/sample benchmark session container |
| P3-11 Schemas README alignment | Done | schemas/README.md distinguishes packet object, dyadic session event object, and benchmark session container |
| P3-12 Root README alignment | Done | Root README aligned with completed P3 helper documents and schemas |
| P3-13 Final P3 boundary audit | Done | docs/p3-final-boundary-audit.md records the final P3 boundary audit before release packaging |
| P3-14 v0.1.0 public helper release package | Done | docs/v0.1.0-public-helper-release-package.md prepares the bounded release package |
| P3-15 GitHub pre-release notes and publication gate | Done | docs/v0.1.0-github-pre-release-notes-and-publication-gate.md preserves release notes and publication gate language |
| P3-16 GitHub pre-release draft correction | Done | GitHub draft dependence was treated as unreliable; publication proceeded through a separate authorization gate |
| P3-17 Public pre-release publication authorization | Done | v0.1.0 was published as the initial public helper pre-release; v0.1.1 superseded it for post-validator-pass helper status; v0.1.2 is now the current bounded public helper pre-release |
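
The Termination Gate triggers named under P3-5 (consent withdrawal, permission expiry, data quality failure, high uncertainty, overstay prevention) could be collected as in the sketch below. It is illustrative only: `GateInputs`, `termination_reasons`, and the `max_uncertainty` and `max_minutes` thresholds are assumptions for this example and are not taken from docs/termination-gate-definition.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    """Synthetic inputs to a toy termination gate; all names are hypothetical."""
    consent_withdrawn: bool
    permission_expired: bool
    data_quality_failed: bool
    uncertainty: float        # 0.0 to 1.0, hypothetical scale
    elapsed_minutes: float


def termination_reasons(
    inputs: GateInputs,
    max_uncertainty: float = 0.7,   # assumed threshold
    max_minutes: float = 30.0,      # assumed overstay limit
) -> list[str]:
    """Return every reason the synthetic session should end; empty means continue."""
    reasons = []
    if inputs.consent_withdrawn:
        reasons.append("consent_withdrawal")
    if inputs.permission_expired:
        reasons.append("permission_expiry")
    if inputs.data_quality_failed:
        reasons.append("data_quality_failure")
    if inputs.uncertainty > max_uncertainty:
        reasons.append("high_uncertainty")
    if inputs.elapsed_minutes > max_minutes:
        reasons.append("overstay_prevention")
    return reasons
```

Returning the full list of reasons, rather than stopping at the first one, keeps the outcome auditable: a later audit log entry can show every condition that applied.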

Current P5 helper-validation state

P5 adds automation and machine-checkable helper gates around the public Proxy Benchmark Track helper surface.

This remains public-helper-only.

It is not benchmark validation.

It is not scientific validation.

It is not Sal-Meter validation.

It is not CAIS compliance.

It is not mediation validation.

It is not dyadic recovery validation.

It is not termination-gate accuracy validation.

It is not synthetic replay validation.

It is not certification.

It is not production readiness.

P4-4 adds a public phone-only simulator scaffold.

P4-5 adds a public synthetic session replay scaffold.

P4-4 and P4-5 are documentation and simulator / replay scaffolding only.

Neither P4-4 nor P4-5 is currently part of the P5 helper-validation chain; that changes only if a later validator or lint step is added.

| Milestone | Status | Notes |
| --- | --- | --- |
| P5-0 Boundary language lint | Done / advisory mode | evaluation-baseline/boundary_lint.py and evaluation-baseline/prohibited_terms.json are implemented; GitHub Actions runs the boundary lint step in advisory mode |
| P5-1 P3 helper-schema validator | Done / Passed | evaluation-baseline/validate_p3_schemas.py validates the synthetic P3 dyadic helper files against human_state_packet.schema.json, dyadic_session_event.schema.json, and benchmark_session.schema.json |
| P5-1 synthetic dyadic helper package | Done / Passed | sample-data/synthetic-dyadic-session-001/ contains human_state_packet_A.json, human_state_packet_B.json, dyadic_session_event.json, and benchmark_session_container.json |
| P4-0 synthetic dyadic demo-flow package | Done / Passed | sample-data/synthetic-dyadic-session-001/ contains ai_outputs.json, dyadic_delta.json, recovery_gate.json, termination_gate.json, and audit_log.json |
| P4-1 synthetic dyadic recovery delta evaluator | Done / Passed | evaluation-baseline/evaluate_dyadic_recovery_demo.py evaluates synthetic demo-flow consistency only |
| P5-3 synthetic AI Output A/B consequence evaluator helper | Present / Helper-stage | evaluation-baseline/evaluate_ai_output_ab.py supports synthetic AI Output A/B consequence comparison using proxy-only metrics; it does not validate real AI impact, real human-state measurement, mediation effectiveness, Sal-Meter, CAIS compliance, benchmark validation, device readiness, or production readiness |
| P4-2 mediation policy prompt pack | Done | prompts/ contains README.md and mediation_policy_v0.1.json; docs/mediation-policy-prompt-pack.md documents private cue, shared mediation output, false recovery prevention, and termination boundary logic |
| P4-3 synthetic termination-gate helper case package | Done / Passed | sample-data/synthetic-dyadic-session-001/termination_gate_cases.json contains synthetic pause, narrow, close, terminate, refresh, and audit-only helper cases |
| P4-3 termination gate demo evaluator | Done / Passed | evaluation-baseline/evaluate_termination_gate_demo.py evaluates synthetic termination-gate helper consistency only |
| P5-1 documentation alignment | Done | schemas/README.md, sample-data/README.md, evaluation-baseline/README.md, and root README.md explain P3 helper-schema validation as helper-structure validation only |
| P4-3 documentation alignment | Done | sample-data/README.md, evaluation-baseline/README.md, and root README.md explain P4-3 termination-gate helper evaluation as synthetic helper consistency only |
| P4-4 phone-only simulator scaffold | Present / documentation only | phone-only-simulator/ contains public-helper documentation and simulator scaffolding only; it is not a validator and is not production monitoring |
| P4-4 phone-only simulator README | Present / documentation only | phone-only-simulator/README.md defines folder boundary, public data boundary, P4-3 relationship, and final rule |
| P4-4 phone session flow wireframe | Present / documentation only | phone-only-simulator/session-flow-wireframe.md defines synthetic consent, packet check, AI output, delta review, recovery gate, termination gate, closure, and audit screens |
| P4-4 phone session state machine | Present / synthetic mockup only | phone-only-simulator/phone-session-state-machine.json defines synthetic-only states, allowed transitions, forbidden transitions, allowed decisions, prohibited decisions, and boundary flags |
| P4-4 sample phone session script | Present / synthetic script only | phone-only-simulator/sample-phone-session-script.md provides a synthetic sample phone-session script without real audio, real transcript, real participant data, Sal-Meter input, CAIS compliance dossier, or production intervention logic |
| P4-5 synthetic session replay scaffold | Present / documentation and JSON scaffold only | synthetic-session-replay/ contains public-helper documentation, replay manifest, replay event timeline, and replay boundary only; it is not a validator and is not real session replay |
| P4-5 synthetic replay README | Present / documentation only | synthetic-session-replay/README.md defines replay scaffold purpose, scope, intended files, public data boundary, P4-4 relationship, closed-session replay rule, and final rule |
| P4-5 synthetic replay manifest | Present / synthetic manifest only | synthetic-session-replay/replay-manifest.json defines replay source declaration, replay scope, boundary flags, replay flow, closed-session rule, allowed decisions, prohibited decisions, and success meaning |
| P4-5 synthetic replay event timeline | Present / synthetic timeline only | synthetic-session-replay/replay-event-timeline.json defines synthetic replay sequence from manifest loading through source declaration, consent, packet review, AI output, delta, recovery gate, termination gate, closure, and audit |
P4-5 synthetic replay boundary Present / documentation only synthetic-session-replay/replay-boundary.md defines allowed replay materials, prohibited replay materials, prohibited replay claims, closed-session replay rule, replay interpretation, P4-4 relationship, and public release rule

The current P5 helper-validation chain is:

  • validate_sample_package.py
  • validate_p3_schemas.py
  • evaluate_dyadic_recovery_demo.py
  • evaluate_termination_gate_demo.py
  • boundary_lint.py
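
The chain above can be pictured as a fail-fast sequence of script runs. The following is a minimal sketch, not the repository's actual workflow. The `run_chain` and `build_commands` helpers are hypothetical, and placing every script under `evaluation-baseline/` is an assumption: the README names the scripts, but this section does not state where `validate_sample_package.py` lives.

```python
import subprocess
import sys
from pathlib import Path

# Script names are taken from the chain listed above.
CHAIN = [
    "validate_sample_package.py",
    "validate_p3_schemas.py",
    "evaluate_dyadic_recovery_demo.py",
    "evaluate_termination_gate_demo.py",
    "boundary_lint.py",
]


def build_commands(script_dir="evaluation-baseline"):
    """Build one `python <script>` command per chain entry.

    The single shared directory is an assumption made for illustration.
    """
    return [[sys.executable, str(Path(script_dir) / name)] for name in CHAIN]


def run_chain(commands):
    """Run each command in order and stop at the first non-zero exit code.

    Returns a list of (command, returncode) pairs for the steps that ran.
    """
    results = []
    for cmd in commands:
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
        if completed.returncode != 0:
            break  # fail fast: later helper checks are not run
    return results
```

Calling `run_chain(build_commands())` from the repository root would run the five steps in the listed order. A fully passing run would still mean helper consistency only, not benchmark validation.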

P5-3 adds a synthetic AI Output A/B consequence evaluator helper:

  • evaluation-baseline/evaluate_ai_output_ab.py

Current P5-3 status:

  • Present / Helper-stage
  • Not yet workflow-validated unless the GitHub Actions workflow explicitly runs it
  • Not benchmark validation
  • Not mediation validation
  • Not production readiness

P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only metrics.

It does not validate:

  • real AI impact
  • real human-state measurement
  • mediation effectiveness
  • dyadic recovery
  • termination-gate accuracy
  • Sal-Meter
  • CAIS compliance
  • benchmark validation
  • device readiness
  • production readiness

P5-3 may be described as part of the GitHub Actions helper-validation chain only after the workflow explicitly runs:

  • python evaluation-baseline/evaluate_ai_output_ab.py

Until that workflow step is active, P5-3 must be described as:

  • helper-stage
  • not workflow-validated
  • not benchmark validation
  • not mediation validation
  • not production readiness
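
One way to keep this status wording tied to the workflow is to derive it from the workflow text. The sketch below is an illustration only: `workflow_runs_p5_3` and `p5_3_status` are hypothetical helpers, the scan is a plain-text check rather than a YAML parse, and the caller is assumed to supply the workflow file's contents.

```python
P5_3_COMMAND = "python evaluation-baseline/evaluate_ai_output_ab.py"


def workflow_runs_p5_3(workflow_text):
    """Return True if a non-comment line of the workflow text contains the P5-3 command."""
    for line in workflow_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # a commented-out step is not an active step
        if P5_3_COMMAND in stripped:
            return True
    return False


def p5_3_status(workflow_text):
    """Return the status labels from this README that apply to P5-3."""
    labels = ["helper-stage"]
    if not workflow_runs_p5_3(workflow_text):
        labels.append("not workflow-validated")
    # These labels apply whether or not the workflow runs the script.
    labels += [
        "not benchmark validation",
        "not mediation validation",
        "not production readiness",
    ]
    return labels
```

A text scan cannot tell whether a step is disabled by a condition or skipped at run time, so it is a wording aid rather than proof that the step ran.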

P4-4 is not currently included in the validation chain.

P4-5 is not currently included in the validation chain.

Completed P5 helper-validation files

evaluation-baseline/
  boundary_lint.py
  prohibited_terms.json
  validate_p3_schemas.py
  evaluate_dyadic_recovery_demo.py
  evaluate_ai_output_ab.py
  evaluate_termination_gate_demo.py
  README.md

sample-data/
  synthetic-dyadic-session-001/
    README.md
    human_state_packet_A.json
    human_state_packet_B.json
    dyadic_session_event.json
    benchmark_session_container.json
    ai_outputs.json
    dyadic_delta.json
    recovery_gate.json
    termination_gate.json
    audit_log.json
    termination_gate_cases.json

These files support:

  • P3 helper-schema validation
  • P4-1 synthetic demo-flow consistency checking
  • P5-3 synthetic AI Output A/B consequence helper comparison
  • P4-3 synthetic termination-gate helper consistency checking
  • boundary language linting

They do not support:

  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • Sal-Meter validation
  • CAIS compliance
  • clinical readiness
  • diagnostic readiness
  • therapeutic readiness
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • phone monitoring authority
  • production closed-loop authority

Correct boundary sentence:

Completed P5 helper-validation files support helper structure, schema checks, synthetic demo-flow consistency checks, P5-3 synthetic AI Output A/B consequence helper comparison, synthetic termination-gate helper consistency checks, and wording-boundary checks only; they do not create evidence, real AI impact validation, real human-state measurement validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, certification, Sal-Meter status, CAIS compliance, replay validation, phone monitoring authority, device readiness, production readiness, or production authority.
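
A wording-boundary check can be pictured as a scan for claim phrases that appear without a negation in the same sentence. The sketch below is not the logic of `boundary_lint.py` and does not use the real format of `prohibited_terms.json`, neither of which is shown here. The `CLAIM_PHRASES` list, the `NEGATION` pattern, and `find_unbounded_claims` are all hypothetical.

```python
import re

# Hypothetical phrases; the real list lives in prohibited_terms.json.
CLAIM_PHRASES = ["benchmark validation", "production readiness", "CAIS compliance"]

# A negation word earlier in the same sentence (no "." or ";" in between).
NEGATION = re.compile(r"\b(not|no|never|without)\b[^.;]*$", re.IGNORECASE)


def find_unbounded_claims(text):
    """Return (line_number, phrase) pairs for phrases used without a preceding negation."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in CLAIM_PHRASES:
            for match in re.finditer(re.escape(phrase), line, flags=re.IGNORECASE):
                preceding = line[: match.start()]
                if not NEGATION.search(preceding):
                    findings.append((lineno, phrase))
    return findings
```

This line-by-line check would flag a bullet such as "benchmark validation" under a "They do not support:" heading, because the negation sits on an earlier line. A real linter would need list-aware context.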

Completed P4-4 public simulator scaffold files

The P4-4 phone-only simulator scaffold is a public helper scaffold only.

It may demonstrate synthetic phone-only session structure, but it does not process real calls, real audio, real transcripts, real participant data, or real session records.

Completed P4-4 files:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

These files support:

  • phone-only simulator boundary documentation
  • synthetic phone-session flow wireframe
  • synthetic phone-session state-machine mockup
  • synthetic sample phone-session script
  • consent-first session entry representation
  • packet availability check representation
  • synthetic baseline summary representation
  • synthetic AI output representation
  • synthetic Human-State Delta review representation
  • Recovery Gate placeholder representation
  • Termination Gate placeholder representation
  • closed-session rule visibility
  • audit-log boundary visibility
  • public data boundary visibility

They do not support:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • real session record processing
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • mediation-service operation
  • surveillance
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • phone-only simulator validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-4 scaffold files must remain:

  • research-stage
  • public-helper-only
  • synthetic-only
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-counseling
  • non-surveillance
  • non-certification
  • non-human-ranking
  • not Sal-Meter
  • not CAIS compliance
  • not benchmark validation
  • not mediation validation
  • not dyadic recovery validation
  • not termination-gate accuracy validation
  • not phone monitoring authority
  • not production readiness
  • not production closed-loop

The phone-only simulator is not the phone call.

The sample phone-session script is not a transcript.

The phone-session state machine is not authority.

A closed session must stay closed.

Correct boundary sentence:

Completed P4-4 public simulator scaffold files may demonstrate synthetic phone-only session structure only; they do not create evidence, validation, certification, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
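
The idea of allowed transitions, forbidden transitions, and a closed session that stays closed can be sketched as a small transition table. This is an illustration under stated assumptions, not the contents of `phone-session-state-machine.json`. The state names follow the screens listed for the session flow wireframe, and the specific transitions, `is_allowed`, and `advance` are hypothetical.

```python
# Hypothetical synthetic-only transition table.
ALLOWED = {
    "consent": {"packet_check", "closed"},
    "packet_check": {"ai_output", "closed"},
    "ai_output": {"delta_review", "closed"},
    "delta_review": {"recovery_gate", "closed"},
    "recovery_gate": {"termination_gate"},
    "termination_gate": {"ai_output", "closed"},
    "closed": {"audit"},  # a closed session may only be audited
    "audit": set(),       # audit is terminal and never becomes intervention
}


def is_allowed(state, next_state):
    """Return True if the synthetic table permits moving from state to next_state."""
    return next_state in ALLOWED.get(state, set())


def advance(state, next_state):
    """Return next_state, or raise ValueError for a forbidden transition."""
    if not is_allowed(state, next_state):
        raise ValueError(f"forbidden transition: {state} -> {next_state}")
    return next_state
```

In this sketch there is no path from `closed` back to `ai_output`, so reopening a closed session is rejected by construction rather than by a later check.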


Completed P4-5 public replay scaffold files

The P4-5 synthetic session replay scaffold is a public helper scaffold only.

It may demonstrate synthetic session replay structure, but it does not process real sessions, real calls, real audio, real transcripts, real participant data, or real session records.

Completed P4-5 files:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

These files support:

  • synthetic session replay boundary documentation
  • synthetic replay manifest structure
  • synthetic replay event timeline structure
  • synthetic replay boundary rules
  • replay source declaration representation
  • consent boundary review representation
  • packet boundary review representation
  • synthetic AI output replay representation
  • synthetic Human-State Delta replay representation
  • Recovery Gate replay representation
  • Termination Gate replay representation
  • closure replay representation
  • audit-only replay summary representation
  • closed-session replay handling
  • public data boundary visibility

They do not support:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • clinical replay
  • diagnostic replay
  • therapeutic replay
  • counseling replay
  • surveillance replay
  • production mediation replay
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone monitoring validation
  • replay validation authority
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-5 scaffold files must remain:

  • research-stage
  • public-helper-only
  • synthetic-only
  • replay-scaffold-only
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-counseling
  • non-surveillance
  • non-certification
  • non-human-ranking
  • not real session replay
  • not real phone replay
  • not real transcript replay
  • not Sal-Meter
  • not CAIS compliance
  • not benchmark validation
  • not mediation validation
  • not dyadic recovery validation
  • not termination-gate accuracy validation
  • not synthetic replay validation
  • not phone monitoring authority
  • not replay validation authority
  • not production readiness
  • not production closed-loop

P4-5 scaffold files must not contain:

  • raw human data
  • identifiable human data
  • real participant data
  • real dyadic conflict records
  • real session records
  • real phone recordings
  • real call transcripts
  • real phone-session logs
  • real transcript replay
  • private consent records
  • clinical records
  • health records
  • diagnostic labels
  • therapeutic recommendations
  • counseling notes
  • relationship verdicts
  • human scores
  • human-ranking outputs
  • raw biosignals
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • production intervention logs
  • production monitoring logs
  • device-readiness evidence
  • production-readiness evidence
  • certification evidence
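
A content rule like this can be partly checked on JSON scaffold files with a recursive key scan. The following is a minimal sketch under stated assumptions: the key names in `PROHIBITED_KEYS` are hypothetical examples, not the repository's schema, and `find_prohibited_keys` is an illustrative helper.

```python
import json

# Hypothetical field names that would signal prohibited material.
PROHIBITED_KEYS = {
    "raw_audio",
    "transcript",
    "participant_name",
    "diagnostic_label",
    "raw_biosignal",
}


def find_prohibited_keys(node, path="$"):
    """Return the paths of all keys in a parsed JSON value that match a prohibited name."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key.lower() in PROHIBITED_KEYS:
                hits.append(child)
            hits.extend(find_prohibited_keys(value, child))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_prohibited_keys(item, f"{path}[{index}]"))
    return hits
```

For example, `find_prohibited_keys(json.loads(text))` could be applied to the text of a scaffold file. The scan inspects key names only, so it cannot detect prohibited material stored under an innocuous key. It complements manual review and does not replace it.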

A synthetic replay may document a closed session.

A synthetic replay must not reopen a closed session.

A synthetic replay must not continue mediation after closure.

A synthetic replay must not generate new AI output after closure.

A synthetic replay must not convert closure into recovery evidence.

A synthetic replay must not convert audit into certification.

The replay scaffold is not a real replay.

The replay manifest is not a session.

The replay event timeline is not the event.

The replay boundary is not authority.

Correct boundary sentence:

Completed P4-5 public replay scaffold files may demonstrate synthetic session replay structure only; they do not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.
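
The closed-session replay rule can be sketched as a check over an ordered list of event types. This is an illustration only: the event-type strings and the `closure_violations` helper are hypothetical and do not reflect the actual structure of `replay-event-timeline.json`.

```python
# After closure, only audit events are permitted in this sketch.
ALLOWED_AFTER_CLOSURE = {"audit"}


def closure_violations(event_types):
    """Return (index, event_type) pairs for events that occur after closure
    and are not audit events."""
    violations = []
    closed = False
    for index, event_type in enumerate(event_types):
        if closed and event_type not in ALLOWED_AFTER_CLOSURE:
            violations.append((index, event_type))
        if event_type == "closure":
            closed = True
    return violations
```

A timeline such as `["consent", "closure", "ai_output", "audit"]` would yield one violation for the `ai_output` event, because new AI output after closure is not allowed. An empty result means only that the synthetic timeline respects the rule. It is not recovery evidence or certification.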


P3 helper architecture

P3 defines the core Human-State-Aware AI Mediation helper architecture.

The P3 architecture connects AI output, bounded Human-State Packet use, session protocol, dyadic flow, Human-State Delta, Dyadic Delta, Recovery Gate, Termination Gate, consent boundary, session closure, and audit logging.

Architecture sequence:

  • AI Output
  • Human-State Packet
  • Human-State Session Protocol
  • Dyadic Mediation Session Flow
  • Human-State Delta A/B
  • Dyadic Delta
  • Recovery Gate
  • Termination Gate
  • Consent and Data-Sharing Boundary
  • Session Closure
  • Audit Log

The Consent and Data-Sharing Boundary controls what may cross each step.

P3 defines the core helper architecture.

P4-4 does not replace this architecture.

P4-4 projects this architecture into a public-safe phone-only simulator scaffold.

P4-5 does not replace this architecture.

P4-5 projects this architecture into a public-safe synthetic replay scaffold.

P4-4 represents the same boundary logic through:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

P4-5 represents replay review of the same boundary logic through:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

The P4-4 phone-only simulator may demonstrate:

  • consent-first session entry
  • packet availability checking
  • synthetic baseline state summary
  • synthetic AI output
  • synthetic Human-State Delta review
  • Recovery Gate placeholder
  • Termination Gate placeholder
  • closed-session handling
  • audit-log boundary

The P4-5 synthetic session replay scaffold may demonstrate:

  • replay manifest loading
  • replay source declaration
  • synthetic event timeline review
  • consent boundary review
  • packet boundary review
  • synthetic AI output replay
  • synthetic Human-State Delta replay
  • Recovery Gate replay
  • Termination Gate replay
  • closure replay
  • audit-only replay summary
  • closed-session replay handling

The P4-4 phone-only simulator must not imply:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • mediation-service operation
  • surveillance
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

The P4-5 synthetic session replay scaffold must not imply:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • clinical replay
  • diagnostic replay
  • therapeutic replay
  • counseling replay
  • surveillance replay
  • production mediation replay
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone monitoring validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-5 must not reopen a closed session.

P4-5 must not continue mediation after closure.

P4-5 must not convert closure into recovery evidence.

P4-5 must not convert audit replay into certification.

Correct boundary sentence:

P4-4 is a phone-only public helper projection of the P3 session architecture, and P4-5 is a synthetic replay scaffold for reviewing that structure after it has been represented; neither creates evidence, validation, certification, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.


Object distinction

This section separates the three public-helper objects used in the Proxy Benchmark Track.

The three objects are:

  • Human-State Packet
  • Dyadic Session Event
  • Benchmark Session Container

They are related, but they are not the same object.

They must not be merged.

They must not be treated as evidence, diagnosis, relationship judgment, human ranking, Sal-Meter output, CAIS compliance, or benchmark validation.

Human-State Packet

A Human-State Packet is a minimal, consent-bound, permission-bound, expiry-bound, confidence-aware, data-quality-aware, session-scoped, sharing-scoped, raw-data-excluding state-summary object.

It may summarize bounded session state.

It must not expose raw human data.

It must not expose identifiable human data.

It must not expose private participant records.

It must not contain clinical records, diagnostic labels, therapeutic recommendations, counseling notes, raw biosignals, raw Sal-Meter traces, or raw CAIS traces.

A Human-State Packet is:

  • not the person
  • not the body
  • not the raw signal
  • not diagnosis
  • not therapy
  • not an emotion verdict
  • not a human score
  • not a relationship judgment
  • not Sal-Meter
  • not CAIS compliance
  • not benchmark validation

The packet is a bounded state-summary helper.

It is not authority.
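
The bounds named above (consent, permission, expiry, confidence, data quality, session scope, sharing scope) can be pictured as fields on a small immutable object with a usability check. This is a sketch under stated assumptions: every field name, the `packet_usable` helper, and the default confidence threshold are hypothetical and are not taken from `human_state_packet_A.json`.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class HumanStatePacket:
    """Hypothetical bounded state-summary object; it holds no raw human data."""
    session_id: str            # session scope
    consent_granted: bool      # consent bound
    permitted_uses: frozenset  # permission bound
    expires_at: datetime       # expiry bound
    confidence: float          # 0.0 to 1.0
    data_quality: str          # e.g. "sufficient" or "insufficient"
    sharing_scope: str         # who may see the summary
    state_summary: str         # bounded summary text only


def packet_usable(packet, use, now, min_confidence=0.5):
    """Return (usable, reason) for one intended use at time `now`."""
    if not packet.consent_granted:
        return False, "consent missing"
    if use not in packet.permitted_uses:
        return False, "use not permitted"
    if now >= packet.expires_at:
        return False, "packet expired"
    if packet.data_quality == "insufficient":
        return False, "insufficient data quality"
    if packet.confidence < min_confidence:
        return False, "low confidence"
    return True, "ok"
```

A result of `(True, "ok")` means only that the packet is within its declared bounds for that use. It says nothing about the person the packet summarizes.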

Dyadic Session Event

A Dyadic Session Event is a public-safe synthetic/sample event object that records boundary events inside a dyadic session.

It may record synthetic or sample events such as:

  • consent status
  • permission status
  • packet availability
  • packet expiry
  • sharing scope
  • private cue status
  • shared output status
  • Human-State Delta A/B
  • Dyadic Delta
  • Recovery Gate decision
  • Termination Gate decision
  • session closure
  • audit status

A Dyadic Session Event records boundary movement.

It does not record the body.

It does not record the full relationship.

It does not validate dyadic recovery.

It does not create a relationship verdict.

It does not create human ranking.

It does not authorize mediation, monitoring, surveillance, diagnosis, therapy, counseling, or production closed-loop intervention.

The event is a boundary record.

It is not the relationship.

Benchmark Session Container

A Benchmark Session Container is a public-safe synthetic/sample container that connects the helper objects inside a benchmark session structure.

It may connect:

  • session metadata
  • Human-State Packet references
  • Dyadic Session Event references
  • baseline suite status
  • gate summaries
  • leakage review
  • holdout strategy
  • audit status
  • public release status
  • authority status
  • final boundary status

A Benchmark Session Container records the structure of a benchmark session.

It does not validate the benchmark.

It does not prove scientific truth.

It does not validate human-state measurement.

It does not validate dyadic recovery.

It does not validate mediation effectiveness.

It does not validate termination-gate accuracy.

It does not create Sal-Meter status.

It does not grant CAIS compliance.

It does not certify any model, dataset, dashboard, workflow, evaluator, simulator, replay scaffold, or mediation system.

Final distinction:

  • The packet summarizes bounded state.
  • The event records boundary movement.
  • The container organizes the benchmark session structure.

The packet is not the person.

The event is not the relationship.

The container is not the truth.


Benchmark chain

The benchmark chain describes how an AI output is evaluated by its downstream consequences.

It does not primarily evaluate whether the AI answer is fluent, persuasive, emotionally pleasant, or superficially helpful.

It evaluates what the AI output leaves behind in a bounded, consent-based, non-clinical, non-surveillance helper structure.

Benchmark chain:

  • AI Output
  • Human-State Delta
  • Dyadic Recovery
  • Recovery Gate / Termination Gate

This chain is public-helper-only.

It does not validate real human-state measurement.

It does not validate mediation effectiveness.

It does not validate dyadic recovery.

It does not validate termination-gate accuracy.

It does not create Sal-Meter status.

It does not grant CAIS compliance.

AI Output

AI Output records what the AI generated inside a bounded session structure.

Examples include:

  • generic AI output
  • state-aware AI output
  • private cue
  • shared mediation output
  • pause recommendation
  • clarification request
  • scope narrowing
  • recovery check
  • termination recommendation

AI Output is not sufficient evidence of recovery.

AI Output is not sufficient evidence of mediation effectiveness.

AI Output is not sufficient evidence of human-state improvement.

A good-sounding answer is not automatically a good consequence.

Human-State Delta

Human-State Delta describes the bounded proxy-observed change after the AI output.

It may describe whether the session state appears to move:

  • toward recovery
  • away from recovery
  • unchanged
  • mixed
  • uncertain
  • insufficient data
  • invalid

Human-State Delta is not diagnosis.

Human-State Delta is not therapy.

Human-State Delta is not emotion reading.

Human-State Delta is not a human score.

Human-State Delta is not a relationship verdict.

Human-State Delta is a bounded benchmark observation.

It must remain proxy-only unless and until a controlled private pilot is separately authorized.

Dyadic Recovery

Dyadic Recovery asks whether both sides of the dyad moved toward a session-defined recovery condition.

Recovery is not agreement.

Recovery is not silence.

Recovery is not obedience.

Recovery is not politeness.

Recovery is not synchrony by itself.

Recovery is not therapy.

Recovery is a bounded session-state condition in which AI mediation can be reduced, paused, narrowed, closed, or stopped.

One-sided improvement is not dyadic recovery.

One-sided silence is not dyadic recovery.

One-sided relief is not dyadic recovery.

A dyad is not recovered merely because one participant stops resisting.
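
The rule that one-sided improvement is not dyadic recovery can be sketched as a combination of the two per-participant delta directions. This is a minimal illustration: the `Direction` values mirror the Human-State Delta directions listed in this README, while the `dyadic_direction` helper and its four result labels are hypothetical.

```python
from enum import Enum


class Direction(Enum):
    TOWARD_RECOVERY = "toward recovery"
    AWAY_FROM_RECOVERY = "away from recovery"
    UNCHANGED = "unchanged"
    MIXED = "mixed"
    UNCERTAIN = "uncertain"
    INSUFFICIENT_DATA = "insufficient data"
    INVALID = "invalid"


# Directions that cannot support any dyadic reading in this sketch.
UNUSABLE = {Direction.UNCERTAIN, Direction.INSUFFICIENT_DATA, Direction.INVALID}


def dyadic_direction(delta_a, delta_b):
    """Combine two proxy-only delta directions into one hypothetical dyadic label."""
    if delta_a in UNUSABLE or delta_b in UNUSABLE:
        return "undetermined"
    if delta_a is Direction.TOWARD_RECOVERY and delta_b is Direction.TOWARD_RECOVERY:
        return "both toward recovery"
    if Direction.TOWARD_RECOVERY in (delta_a, delta_b):
        return "one-sided"  # never reported as dyadic recovery
    return "no recovery movement"
```

Even the "both toward recovery" label is a proxy observation for a synthetic session. It is an input to the Recovery Gate, not validated dyadic recovery.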

Recovery Gate

Recovery Gate asks whether the session-defined recovery condition has been reached.

It prevents false success.

It does not crown AI for speaking well.

It does not treat silence, obedience, agreement, synchrony, or one-sided improvement as automatic recovery.

Recovery Gate must remain sensitive to:

  • false recovery
  • asymmetric recovery
  • silence-as-recovery risk
  • one-sided burden transfer
  • private-state exposure risk
  • over-intervention risk
  • insufficient data quality
  • packet permission boundary
  • session closure boundary

Recovery Gate is not recovery validation.

Recovery Gate is not mediation validation.

Recovery Gate is not clinical, diagnostic, therapeutic, counseling, surveillance, or production authority.

Termination Gate

Termination Gate asks whether the session must pause, narrow, close, or stop.

It prevents endless mediation.

It protects:

  • consent
  • permission
  • packet expiry
  • data quality
  • session scope
  • private state
  • raw human data exclusion
  • auditability
  • closed-session integrity

Termination Gate may recommend:

  • continue
  • narrow
  • pause
  • close
  • terminate
  • refresh consent
  • refresh packet
  • audit only

Termination Gate is not termination-gate accuracy validation.

Termination Gate is not production authority.

Termination Gate does not authorize real-time monitoring, phone monitoring, replay validation, relationship verdicts, human ranking, or production closed-loop intervention.

A closed session must stay closed.
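
The recommendation list above can be sketched as an ordered set of checks over the protected conditions. This is a hypothetical illustration, not the logic of `evaluate_termination_gate_demo.py`: the input names, the `consent_status` values, and the priority order of the checks are all assumptions.

```python
def termination_recommendation(
    session_closed,
    consent_status,      # assumed values: "active", "stale", "withdrawn"
    packet_expired,
    data_quality_ok,
    within_scope,
    recovery_reached,
):
    """Return one recommendation string from the Termination Gate list.

    Checks run in a fixed, assumed priority order; the first match wins.
    """
    if session_closed:
        return "audit only"       # a closed session must stay closed
    if consent_status == "withdrawn":
        return "terminate"
    if consent_status == "stale":
        return "refresh consent"
    if packet_expired:
        return "refresh packet"
    if not data_quality_ok:
        return "pause"
    if not within_scope:
        return "narrow"
    if recovery_reached:
        return "close"
    return "continue"
```

Putting the closed-session check first means no later condition can produce a recommendation that reopens the session. The output is a recommendation inside a synthetic helper structure, not production authority.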

Correct boundary sentence:

The Benchmark chain may describe AI Output, Human-State Delta, Dyadic Recovery, Recovery Gate, and Termination Gate as public-helper structure only; it does not create evidence, validation, certification, Sal-Meter status, CAIS compliance, mediation authority, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.


Dyadic Recovery Baseline Suite

The Dyadic Recovery Baseline Suite defines what the system must be compared against before any stronger claim can be made.

A state-aware AI output is not meaningful unless it can be compared against simpler baselines.

The baseline suite asks:

  • Is the result better than chance?
  • Is the result better than one-person state tracking?
  • Is the result better than natural recovery without AI?
  • Is the result better than generic supportive AI?
  • Is the result better than fixed rule-based mediation scripts?
  • Does the system know when to reduce, pause, narrow, close, or stop?

The baseline suite is public-helper-only.

It does not validate real human-state measurement.

It does not validate dyadic recovery.

It does not validate mediation effectiveness.

It does not validate termination-gate accuracy.

It does not create Sal-Meter status.

It does not grant CAIS compliance.

Baseline ladder

B0 — Dummy / Chance Baseline

Question:

  • Can the system beat guessing, majority-class prediction, or trivial output?

Meaning:

  • If the system cannot beat B0, the benchmark structure is not useful.

B1 — Individual State Baseline

Question:

  • Can one participant’s state alone explain the outcome?

Meaning:

  • If one-person state explains everything, the dyadic layer adds no value.

B2 — Dyadic Relationship Baseline

Question:

  • Does the relation between both participants add explanatory value?

Meaning:

  • This checks whether dyadic structure matters beyond individual state.

B3 — No-Intervention Baseline

Question:

  • Would the dyad recover naturally without AI intervention?

Meaning:

  • The system must not take credit for recovery that would have happened anyway.

B4 — Generic AI Baseline

Question:

  • Is state-aware AI better than ordinary supportive AI output?

Meaning:

  • The system must outperform generic helpful language, not merely sound kind.

B5 — Rule-Based Mediation Baseline

Question:

  • Is the system better than fixed mediation scripts?

Meaning:

  • The system must show value beyond static communication templates.

B6 — Human-State-Aware AI Mediation Model

Question:

  • Does packet-informed AI improve bounded dyadic recovery conditions under synthetic or controlled helper conditions?

Meaning:

  • This is the candidate model condition, not proof of real-world mediation effectiveness.

B7 — Recovery / Termination Gate Baseline

Question:

  • Can the system identify when to reduce, pause, narrow, close, or stop?

Meaning:

  • A system that cannot stop safely is not a recovery-aware system.

Primary outcome

Primary outcome:

  • Dyadic Recovery Delta

Dyadic Recovery Delta does not mean validated dyadic recovery.

It is a bounded helper outcome for comparing synthetic or controlled session conditions.

Secondary outcomes

Secondary outcomes may include:

  • individual recovery direction
  • dyadic tension reduction
  • interruption reduction
  • turn-taking balance
  • mutual restatement success
  • recovery asymmetry
  • false recovery risk
  • silence-as-recovery risk
  • one-sided burden transfer
  • private-state exposure risk
  • post-intervention stability
  • termination readiness
  • mediation overstay risk
  • consent-boundary compliance
  • packet-permission compliance
  • leakage-safe benchmark score
  • human non-judgment compliance

Baseline rule

A model must not be described as successful merely because it sounds better.

A model must not be described as successful merely because one participant becomes quieter.

A model must not be described as successful merely because one participant reports relief.

A model must not be described as successful merely because the dyad appears calmer.

A stronger claim requires comparison against simpler baselines.
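
The comparison requirement can be sketched as a check that the candidate condition (B6) exceeds every simpler baseline (B0 through B5) on one helper score. This is an illustration under stated assumptions: the single score per condition, the "higher is better" convention, the `margin` parameter, and `baselines_not_exceeded` are all hypothetical.

```python
SIMPLER_BASELINES = ["B0", "B1", "B2", "B3", "B4", "B5"]


def baselines_not_exceeded(scores, candidate="B6", margin=0.0):
    """Return the simpler baselines the candidate fails to exceed by more than `margin`.

    A baseline with no recorded score counts as not exceeded, so a missing
    comparison can never support a stronger claim.
    """
    candidate_score = scores[candidate]
    failed = []
    for name in SIMPLER_BASELINES:
        if name not in scores or candidate_score <= scores[name] + margin:
            failed.append(name)
    return failed
```

An empty result would mean only that the candidate scored higher than each baseline under the chosen synthetic or controlled conditions. B7 is left out of this comparison because it asks a different question: whether the system can stop.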

Correct boundary sentence

The Dyadic Recovery Baseline Suite may define public-helper comparison baselines for synthetic or controlled evaluation, but it does not create evidence, validation, certification, real human-state measurement validation, dyadic recovery validation, mediation validation, termination-gate accuracy validation, Sal-Meter status, CAIS compliance, production authority, relationship verdicts, or human-ranking authority.


Failure-sensitive principles

This benchmark must be sensitive to failure, not only to apparent improvement.

A session is not successful merely because the AI sounded good.

A session is not successful merely because one participant became quiet.

A session is not successful merely because one participant reported relief.

A session is not successful merely because both participants appeared calmer.

A session is not successful merely because both participants showed synchrony.

A session is not successful if the AI continues after it should reduce, pause, narrow, close, or stop.

The benchmark must detect false success.

Core failure types

The Proxy Benchmark Track must remain sensitive to the following failure types:

  • false recovery
  • asymmetric recovery
  • silence-as-recovery risk
  • one-sided burden transfer
  • private-state exposure
  • consent-boundary failure
  • packet-permission failure
  • expired-packet use
  • low-confidence overuse
  • insufficient data quality
  • AI overstay
  • over-intervention
  • relationship verdict generation
  • human scoring
  • human ranking
  • leakage into public output
  • failure to stop when termination is required
  • failure to exceed simpler baselines

False recovery

False recovery occurs when the session appears calmer but the underlying dyadic condition has not actually improved within the bounded session definition.

False recovery may include:

  • one participant becoming silent
  • one participant withdrawing
  • one participant complying under pressure
  • one participant showing relief while the other deteriorates
  • agreement without repair
  • politeness without recovery
  • synchrony without safety
  • session closure being mistaken for recovery

False recovery must not be treated as success.

Asymmetric recovery

Asymmetric recovery occurs when one participant appears to improve while the other becomes more burdened, exposed, silenced, or destabilized.

One-sided improvement is not dyadic recovery.

One-sided relief is not dyadic recovery.

One-sided silence is not dyadic recovery.

One-sided compliance is not dyadic recovery.

The dyad is the unit of interpretation.

Silence-as-recovery risk

Silence must not be interpreted as recovery by default.

Silence may mean:

  • recovery
  • fatigue
  • withdrawal
  • fear
  • resignation
  • overload
  • coercion
  • confusion
  • strategic non-response
  • loss of trust
  • refusal to continue

Silence requires boundary-sensitive interpretation.

Silence alone is not evidence.

AI overstay and over-intervention

A recovery-aware system must know when to stop.

AI overstay occurs when the AI continues mediating after the session should reduce, pause, narrow, close, or terminate.

Over-intervention may include:

  • repeated prompting after sufficient closure
  • reopening a closed session
  • generating new AI output after closure
  • expanding the session beyond consent
  • using expired packets
  • exposing private cues in shared output
  • escalating mediation without permission
  • treating uncertainty as permission to continue
  • converting audit into intervention

A system that cannot stop safely is not recovery-aware.
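
One way to picture "knowing when to stop" is a session object that refuses new output once it is closed. The sketch below is illustrative only; `SyntheticSession`, `SessionClosedError`, and the audit entry strings are hypothetical names, not repository code.

```python
class SessionClosedError(RuntimeError):
    """Raised when output is requested after a synthetic session has closed."""


class SyntheticSession:
    """Hypothetical synthetic session that refuses to emit output once closed."""

    def __init__(self) -> None:
        self.closed = False
        self.audit_log = []

    def emit_ai_output(self, text: str) -> str:
        if self.closed:
            # Record the refused attempt; auditing must not turn into intervention.
            self.audit_log.append("refused_output_after_closure")
            raise SessionClosedError("session is closed; no new AI output")
        self.audit_log.append("ai_output")
        return text

    def close(self) -> None:
        self.closed = True
        self.audit_log.append("session_closed")
```

The refusal is logged and raised rather than silently ignored, so an overstay attempt stays visible in the audit trail.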

Boundary failure

Boundary failure occurs when the helper structure crosses its allowed role.

Boundary failures include:

  • raw human data exposure
  • identifiable participant data exposure
  • real transcript exposure
  • real phone-session log exposure
  • private consent record exposure
  • clinical interpretation
  • diagnostic interpretation
  • therapeutic recommendation
  • counseling advice
  • relationship verdict
  • person scoring
  • human ranking
  • Sal-Meter status claim
  • CAIS compliance claim
  • benchmark validation claim
  • mediation validation claim
  • production-readiness claim

Boundary failure is a No-Go condition for public helper release.

Evaluation rule

A model must not be described as successful merely because it sounds better.

A model must not be described as successful merely because it is more empathetic.

A model must not be described as successful merely because the session becomes quieter.

A model must not be described as successful merely because one participant reports relief.

A model must not be described as successful merely because a synthetic evaluator produces a favorable helper output.

A stronger claim requires comparison against simpler baselines and controlled evidence.

Correct boundary sentence:

Failure-sensitive principles may define public-helper failure modes for synthetic or controlled benchmark design, but they do not create evidence, validation, certification, real human-state measurement validation, dyadic recovery validation, mediation validation, termination-gate accuracy validation, Sal-Meter status, CAIS compliance, production authority, relationship verdicts, or human-ranking authority.


Human-State Packet principle

The public benchmark must not exchange raw human data.

It should exchange only bounded summaries.

A Human-State Packet is a minimal state-summary helper object.

A Human-State Packet must remain:

  • minimal
  • consent-bound
  • permission-bound
  • expiry-bound
  • confidence-aware
  • data-quality-aware
  • session-scoped
  • sharing-scoped
  • raw-data-excluding
  • non-identifying
  • public-helper-safe

A Human-State Packet may contain bounded summary information needed for synthetic or controlled helper evaluation.

It must not contain:

  • raw human data
  • identifiable human data
  • real participant records
  • real dyadic conflict records
  • real session records
  • real phone recordings
  • real call transcripts
  • real phone-session logs
  • private consent records
  • clinical records
  • health records
  • diagnostic labels
  • therapeutic recommendations
  • counseling notes
  • raw biosignals
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • production intervention logs
  • production monitoring logs

The packet is not the person.

The packet is not the body.

The packet is not the raw signal.

The packet is not diagnosis.

The packet is not therapy.

The packet is not emotion reading.

The packet is not a human score.

The packet is not a relationship judgment.

The packet is not Sal-Meter.

The packet is not CAIS compliance.

The packet is not benchmark validation.

The packet is a minimal state-summary object for bounded interaction adjustment.

It is a helper object.

It is not authority.

Correct boundary sentence:

A Human-State Packet may summarize bounded session state for public-helper, synthetic, or controlled benchmark design, but it must not expose raw human data, identify a person, diagnose a state, score a human, judge a relationship, validate mediation, create Sal-Meter status, grant CAIS compliance, certify a benchmark, or authorize production use.
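
The packet constraints (consent-bound, expiry-bound, confidence-aware, raw-data-excluding) can be sketched as a simple pre-use check. This is a hedged illustration: the field names (`consent_confirmed`, `expires_at`, `confidence`, `min_confidence`) and the forbidden-key list are assumptions for this example; the actual structure is defined by schemas/human_state_packet.schema.json and may differ.

```python
from datetime import datetime, timezone
from typing import Tuple

# Hypothetical field names that would indicate raw or identifying content.
FORBIDDEN_KEYS = {"raw_signal", "transcript", "name", "diagnosis", "human_score"}


def packet_is_usable(packet: dict, now: datetime) -> Tuple[bool, str]:
    """Return (usable, reason) for a synthetic packet dictionary."""
    leaked = FORBIDDEN_KEYS.intersection(packet)
    if leaked:
        return False, f"forbidden fields present: {sorted(leaked)}"
    if not packet.get("consent_confirmed", False):
        return False, "consent not confirmed"
    expires_raw = packet.get("expires_at")
    if expires_raw is None:
        return False, "no expiry set"
    if now >= datetime.fromisoformat(expires_raw):
        return False, "packet expired"
    if packet.get("confidence", 0.0) < packet.get("min_confidence", 0.5):
        return False, "confidence too low"
    return True, "ok"
```

A packet that fails any check is simply not used; the check itself says nothing about the person the packet summarizes.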


Human-State Session principle

A session does not begin silently.

A session begins with consent.

A session runs only within packet permission.

A session closes through a Recovery Gate or Termination Gate.

A session that cannot close is not mediation.

It is surveillance drift.

A valid session should follow this structure:

  • Session Creation
  • Consent Confirmation
  • Packet Availability Check
  • Baseline State Summary
  • AI Output
  • Post-Output State Summary
  • Human-State Delta
  • Recovery Gate
  • Termination Gate
  • Session Closure
  • Audit Log
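
The stage order listed above can be checked with a small sketch. The stage names come from the list; the function name `check_stage_order` and the specific rules (no repeats, consent before AI output, the session must end in closure or audit) are assumptions chosen for illustration, not the repository's state machine.

```python
SESSION_STAGES = [
    "Session Creation",
    "Consent Confirmation",
    "Packet Availability Check",
    "Baseline State Summary",
    "AI Output",
    "Post-Output State Summary",
    "Human-State Delta",
    "Recovery Gate",
    "Termination Gate",
    "Session Closure",
    "Audit Log",
]


def check_stage_order(observed):
    """Return a list of problems; an empty list means the order is consistent."""
    unknown = [stage for stage in observed if stage not in SESSION_STAGES]
    if unknown:
        return [f"unknown stages: {unknown}"]
    problems = []
    positions = [SESSION_STAGES.index(stage) for stage in observed]
    if positions != sorted(positions) or len(set(positions)) != len(positions):
        problems.append("stages out of order or repeated")
    if "AI Output" in observed and "Consent Confirmation" not in observed:
        problems.append("AI Output without Consent Confirmation")
    if observed and observed[-1] not in ("Session Closure", "Audit Log"):
        problems.append("session did not close")
    return problems
```

Stages may be skipped in this sketch (for example, an early close after a failed packet check), but they may not be reordered or repeated.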

This session structure is public-helper-only.

It does not validate real human-state measurement.

It does not validate mediation effectiveness.

It does not validate dyadic recovery.

It does not validate termination-gate accuracy.

It does not authorize phone monitoring, replay validation, production mediation, relationship verdicts, or human ranking.

P4-4 projects this session principle into a phone-only public helper scaffold.

P4-5 projects this session principle into a synthetic replay scaffold.

P4-4 and P4-5 do not replace the P3 session architecture.

They are public-safe projections of the same boundary logic.

The P4-4 phone-only simulator may represent the session principle through:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

The P4-5 synthetic session replay scaffold may represent the session principle through:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

In P4-4, the phone-only simulator may demonstrate:

  • consent-first session entry
  • packet availability checking
  • synthetic baseline summary
  • synthetic AI output
  • synthetic Human-State Delta review
  • Recovery Gate placeholder
  • Termination Gate placeholder
  • closed-session handling
  • audit-log boundary

In P4-5, the synthetic replay scaffold may demonstrate:

  • replay manifest loading
  • replay source declaration
  • synthetic event timeline review
  • consent boundary review
  • packet boundary review
  • synthetic AI output replay
  • synthetic Human-State Delta replay
  • Recovery Gate replay
  • Termination Gate replay
  • closure replay
  • audit-only replay summary
  • closed-session replay handling

The phone-only simulator and replay scaffold must not process:

  • real phone calls
  • real audio
  • real transcripts
  • real participant data
  • real session records
  • identifiable human data
  • clinical data
  • health data
  • raw biosignals
  • Sal-Meter raw input
  • CAIS traces
  • CAIS compliance dossiers
  • production intervention logs
  • production monitoring logs

The phone-only simulator and replay scaffold must not imply:

  • real phone monitoring
  • real session replay
  • real transcript replay
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • mediation-service operation
  • surveillance
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone monitoring authority
  • replay validation authority
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

A closed session must stay closed.

A replay must not reopen a closed session.

A replay must not continue mediation after closure.

A replay must not generate new AI output after closure.

A replay must not convert closure into recovery evidence.

A replay must not convert audit into certification.
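
The closed-session rules can be expressed as a check over a synthetic event timeline. This is a sketch under assumptions: the event shape (`{"type": ...}`) and the type names `session_closed` and `audit_log` are hypothetical and may not match synthetic-session-replay/replay-event-timeline.json.

```python
# Hypothetical event types that may still appear once a session is closed.
ALLOWED_AFTER_CLOSURE = {"audit_log"}


def violations_after_closure(timeline):
    """Return (index, type) for every non-audit event that follows closure."""
    closed = False
    violations = []
    for index, event in enumerate(timeline):
        kind = event["type"]
        if closed and kind not in ALLOWED_AFTER_CLOSURE:
            violations.append((index, kind))
        if kind == "session_closed":
            closed = True
    return violations
```

An empty result means only that the synthetic timeline respects closure; it is not recovery evidence and not certification.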

Correct boundary sentence:

The P4-4 phone-only simulator and P4-5 synthetic replay scaffold demonstrate the session principle as synthetic public helper flows only; they do not create evidence, validation, certification, phone monitoring authority, replay validation authority, production authority, relationship verdicts, or human-ranking authority.


Synthetic sample packages

Synthetic sample packages are public-helper structures only.

They may demonstrate file organization, schema structure, validator inputs, mock event flow, synthetic dyadic helper flow, phone-only simulator scaffolding, and synthetic replay scaffolding.

They must not contain real human data, real participant data, raw biosignals, raw Sal-Meter traces, raw CAIS traces, real phone recordings, real transcripts, private consent records, clinical records, production logs, device-readiness evidence, or certification evidence.

Original synthetic sample package

Package path:

  • sample-data/synthetic-session-001/

Required public helper files include:

  • session_metadata.json
  • streams_manifest.csv
  • events.csv
  • labels.csv
  • qc_report.json
  • features_baseline.csv
  • splits.json
  • operator_log.md
  • README.md
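
A presence check for the files listed above could look like the following. It is a minimal sketch and not the implementation of evaluation-baseline/validate_sample_package.py, which may check more than file presence; `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

REQUIRED_FILES = [
    "session_metadata.json",
    "streams_manifest.csv",
    "events.csv",
    "labels.csv",
    "qc_report.json",
    "features_baseline.csv",
    "splits.json",
    "operator_log.md",
    "README.md",
]


def missing_files(package_dir):
    """Return the required file names that are absent from package_dir."""
    root = Path(package_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_files("sample-data/synthetic-session-001")` returns an empty list when every listed file is present.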

This package is checked by:

  • evaluation-baseline/validate_sample_package.py

This package supports sample package consistency only.

It does not validate real human-state measurement, real biosignal capture, dataset quality, scientific validity, benchmark validity, Sal-Meter status, CAIS compliance, device readiness, or production readiness.

P3 synthetic dyadic helper package

Package path:

  • sample-data/synthetic-dyadic-session-001/

Required public helper files include:

  • README.md
  • human_state_packet_A.json
  • human_state_packet_B.json
  • dyadic_session_event.json
  • benchmark_session_container.json

This package is checked by:

  • evaluation-baseline/validate_p3_schemas.py

P3 validation mapping:

  • human_state_packet_A.json maps to schemas/human_state_packet.schema.json
  • human_state_packet_B.json maps to schemas/human_state_packet.schema.json
  • dyadic_session_event.json maps to schemas/dyadic_session_event.schema.json
  • benchmark_session_container.json maps to schemas/benchmark_session.schema.json
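
The mapping above can be written down as data. The sketch below pairs each sample file with its schema and runs a deliberately shallow check of top-level `required` keys; it is not full JSON Schema validation and is not the logic of evaluation-baseline/validate_p3_schemas.py. The helper names are hypothetical.

```python
from pathlib import Path

# File-to-schema mapping taken from the list above.
P3_SCHEMA_MAP = {
    "human_state_packet_A.json": "schemas/human_state_packet.schema.json",
    "human_state_packet_B.json": "schemas/human_state_packet.schema.json",
    "dyadic_session_event.json": "schemas/dyadic_session_event.schema.json",
    "benchmark_session_container.json": "schemas/benchmark_session.schema.json",
}


def resolve_pairs(repo_root):
    """Return (instance_path, schema_path) pairs relative to repo_root."""
    root = Path(repo_root)
    package = root / "sample-data" / "synthetic-dyadic-session-001"
    return [(package / name, root / schema) for name, schema in P3_SCHEMA_MAP.items()]


def missing_required_keys(instance: dict, schema: dict) -> list:
    """Shallow structural check: top-level 'required' keys only."""
    return [key for key in schema.get("required", []) if key not in instance]
```

A real validator would likely delegate to a JSON Schema library; this sketch only shows how the pairing is organized.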

P3 schema validation means only that the synthetic helper files match the expected public-helper schema structure.

It does not validate real human-state measurement, dyadic recovery, mediation effectiveness, termination-gate accuracy, Sal-Meter status, CAIS compliance, device readiness, production readiness, or certification.

P4-0 / P4-1 synthetic dyadic demo-flow package

Package path:

  • sample-data/synthetic-dyadic-session-001/

Required public helper files include:

  • ai_outputs.json
  • dyadic_delta.json
  • recovery_gate.json
  • termination_gate.json
  • audit_log.json

This package is checked by:

  • evaluation-baseline/evaluate_dyadic_recovery_demo.py

This package supports synthetic dyadic demo-flow consistency only.

It does not validate real AI impact, real mediation effectiveness, real dyadic recovery, real human-state improvement, Sal-Meter status, CAIS compliance, device readiness, production readiness, or certification.

P4-3 synthetic termination-gate helper package

Package path:

  • sample-data/synthetic-dyadic-session-001/

Required public helper files include:

  • termination_gate_cases.json

This package is checked by:

  • evaluation-baseline/evaluate_termination_gate_demo.py

A successful P4-3 helper evaluation means only:

  • the synthetic termination-gate helper cases preserve expected public-helper consistency

It does not mean:

  • termination-gate accuracy validation
  • dyadic recovery validation
  • mediation validation
  • benchmark validation
  • scientific validation
  • Sal-Meter validation
  • CAIS compliance
  • clinical readiness
  • diagnostic readiness
  • therapeutic readiness
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-4 phone-only simulator scaffold

Scaffold path:

  • phone-only-simulator/

Required public helper files include:

  • README.md
  • session-flow-wireframe.md
  • phone-session-state-machine.json
  • sample-phone-session-script.md

P4-4 is not stored under sample-data/.

P4-4 is a separate public simulator scaffold.

P4-4 may demonstrate:

  • synthetic phone-only session structure
  • consent-first flow
  • packet availability check
  • synthetic baseline summary
  • synthetic AI output
  • synthetic Human-State Delta review
  • Recovery Gate placeholder
  • Termination Gate placeholder
  • closed-session handling
  • audit-log boundary
  • public-helper-only simulator posture

P4-4 must not imply:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • mediation-service operation
  • surveillance
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-5 synthetic session replay scaffold

Scaffold path:

  • synthetic-session-replay/

Required public helper files include:

  • README.md
  • replay-manifest.json
  • replay-event-timeline.json
  • replay-boundary.md

P4-5 is not stored under sample-data/.

P4-5 is a separate public replay scaffold.

P4-5 may demonstrate:

  • synthetic session replay structure
  • replay manifest structure
  • replay source declaration
  • synthetic replay event timeline
  • consent boundary review
  • packet boundary review
  • synthetic AI output replay
  • synthetic Human-State Delta replay
  • Recovery Gate replay
  • Termination Gate replay
  • closure replay
  • audit-only replay summary
  • closed-session replay handling
  • public-helper-only replay posture

P4-5 must not imply:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • clinical replay
  • diagnostic replay
  • therapeutic replay
  • counseling replay
  • surveillance replay
  • production mediation replay
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone monitoring validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

A synthetic replay may document a closed session.

A synthetic replay must not reopen a closed session.

A synthetic replay must not continue mediation after closure.

A synthetic replay must not generate new AI output after closure.

A synthetic replay must not convert closure into recovery evidence.

A synthetic replay must not convert audit into certification.

P5-3 synthetic AI Output A/B consequence evaluator helper

Evaluator helper path:

  • evaluation-baseline/evaluate_ai_output_ab.py

P5-3 is an evaluator helper.

It is not a sample package.

It supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.

It may compare:

  • generic AI output
  • state-aware AI output
  • synthetic Human-State Delta
  • synthetic recovery burden direction
  • synthetic dyadic stability direction
  • synthetic false-recovery risk
  • synthetic termination-readiness direction
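
A direction-only comparison of two synthetic arms might be sketched as follows. The metric names, the input shape, and `compare_directions` are hypothetical and chosen for illustration; they are not taken from evaluation-baseline/evaluate_ai_output_ab.py.

```python
# Hypothetical proxy metric names; lower synthetic values are treated as lower burden or risk.
PROXY_METRICS = ("recovery_burden", "false_recovery_risk")


def compare_directions(generic: dict, state_aware: dict, metrics=PROXY_METRICS) -> dict:
    """Report, per metric, whether the state-aware arm is lower, higher, or equal."""
    directions = {}
    for metric in metrics:
        diff = state_aware[metric] - generic[metric]
        if diff < 0:
            directions[metric] = "lower_for_state_aware"
        elif diff > 0:
            directions[metric] = "higher_for_state_aware"
        else:
            directions[metric] = "no_difference"
    return directions
```

The output is a set of direction labels over synthetic inputs, not a measurement of real AI impact.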

P5-3 does not validate:

  • real AI impact
  • real human-state measurement
  • mediation effectiveness
  • dyadic recovery
  • termination-gate accuracy
  • Sal-Meter status
  • CAIS compliance
  • benchmark validity
  • device readiness
  • production readiness
  • certification

P5-3 is not workflow-validated unless the GitHub Actions workflow explicitly runs:

  • python evaluation-baseline/evaluate_ai_output_ab.py

Public sample, simulator, replay, and evaluator boundaries

Public sample, simulator, replay, and evaluator files must remain:

  • synthetic
  • sample
  • mock
  • placeholder
  • structure-only
  • non-identifying
  • raw-data-free
  • public-helper-only
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-counseling
  • non-surveillance
  • non-certification
  • non-human-ranking
  • not Sal-Meter
  • not CAIS compliance
  • not benchmark evidence
  • not mediation evidence
  • not dyadic recovery evidence
  • not termination-gate accuracy evidence
  • not synthetic replay validation
  • not phone monitoring authority
  • not production data

Public sample, simulator, replay, and evaluator files must not include:

  • real raw human data
  • identity-bearing data
  • real participant data
  • real dyadic conflict records
  • real session records
  • real phone recordings
  • real call transcripts
  • real transcript replay
  • clinical records
  • health records
  • raw biosignals
  • raw Sal-Meter traces
  • raw CAIS traces
  • private consent records
  • production intervention logs
  • production monitoring logs
  • relationship verdicts
  • human-ranking outputs
  • device-readiness claims
  • production-readiness claims
  • certification claims
  • termination-gate accuracy claims
  • synthetic replay validation claims
  • phone monitoring authority claims

Correct boundary sentence:

Synthetic sample packages, the P4-4 phone-only simulator scaffold, the P4-5 synthetic replay scaffold, and the P5-3 synthetic AI Output A/B consequence evaluator may demonstrate public helper structure only; they do not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Validation workflow

The GitHub Actions workflow is located at:

  • .github/workflows/validate-synthetic-sample.yml

Current intended workflow sequence:

  • Run synthetic sample package validator
  • Run P3 helper schema validator
  • Run P4 synthetic dyadic recovery demo-flow evaluator
  • Run P4 termination gate demo evaluator
  • Run boundary language lint

Current validation helpers:

  • evaluation-baseline/validate_sample_package.py
  • evaluation-baseline/validate_p3_schemas.py
  • evaluation-baseline/evaluate_dyadic_recovery_demo.py
  • evaluation-baseline/evaluate_termination_gate_demo.py
  • evaluation-baseline/boundary_lint.py

The workflow runs successfully on the main branch.

This confirms only:

  • public helper-structure validation
  • synthetic sample package consistency
  • P3 helper-schema consistency
  • synthetic demo-flow consistency
  • synthetic termination-gate helper consistency
  • wording-boundary hygiene

It does not confirm scientific validity, benchmark validity, mediation validity, dyadic recovery validity, termination-gate accuracy, replay validity, phone monitoring validity, Sal-Meter status, CAIS compliance, certification, device readiness, or production readiness.

P4-4 workflow status

P4-4 currently adds documentation and phone-only simulator scaffold files only.

Current P4-4 scaffold files:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

P4-4 does not currently add a separate validator.

P4-4 does not currently add a separate GitHub Actions workflow step.

P4-4 may be reviewed by existing boundary-language lint if the lint scan path includes the phone-only-simulator/ folder.

P4-4 workflow status does not mean phone-only simulator validation.

It does not mean phone monitoring authority.

It does not mean production readiness.

P4-5 workflow status

P4-5 currently adds documentation and synthetic replay scaffold files only.

Current P4-5 scaffold files:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

P4-5 does not currently add a separate validator.

P4-5 does not currently add a separate GitHub Actions workflow step.

P4-5 may be reviewed by existing boundary-language lint if the lint scan path includes the synthetic-session-replay/ folder.

P4-5 workflow status does not mean synthetic replay validation.

It does not mean real session replay.

It does not mean replay authority.

It does not mean production readiness.
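
Whether the lint scan path includes a scaffold folder such as phone-only-simulator/ or synthetic-session-replay/ can be checked with a sketch like the one below. It assumes the scan paths are available as a list of repository-relative strings; how evaluation-baseline/boundary_lint.py actually configures its scan paths is not stated here, and `is_covered` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def is_covered(folder: str, scan_paths: list) -> bool:
    """True if folder equals, or sits beneath, any configured scan path."""
    target = PurePosixPath(folder)
    for raw in scan_paths:
        base = PurePosixPath(raw)
        if target == base or base in target.parents:
            return True
    return False
```

Coverage by the lint means only that wording is scanned in that folder; it does not make the scaffold validated.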

P5-3 workflow status

P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper.

Current P5-3 helper file:

  • evaluation-baseline/evaluate_ai_output_ab.py

P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.

P5-3 is not workflow-validated unless the GitHub Actions workflow explicitly runs:

  • python evaluation-baseline/evaluate_ai_output_ab.py

Until that workflow step is added and passes, P5-3 must be described as:

  • present
  • helper-stage
  • synthetic-only
  • proxy-only
  • not workflow-validated
  • not benchmark validation
  • not mediation validation
  • not dyadic recovery validation
  • not termination-gate accuracy validation
  • not production readiness
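
The rule above (the step must be present in the workflow and must pass) can be sketched as a text check. This is an illustration only: it searches the workflow file as plain text rather than parsing YAML, and the function name and returned labels are assumptions.

```python
P5_3_COMMAND = "python evaluation-baseline/evaluate_ai_output_ab.py"


def p5_3_status(workflow_text: str, workflow_passed: bool) -> str:
    """Describe P5-3 based on whether the workflow runs its command and passes."""
    runs_command = any(
        P5_3_COMMAND in line and not line.lstrip().startswith("#")
        for line in workflow_text.splitlines()
    )
    if runs_command and workflow_passed:
        return "workflow step present and passing"
    return "not workflow-validated"
```

Commented-out lines do not count as running the command, and a present but failing step still yields "not workflow-validated".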

If later validators are added

If a later validator is added for P4-4, P4-5, or P5-3, the workflow may be extended in a separate issue or pull request.

Any new validator must preserve the same public-helper boundary.

A new validator must not be described as scientific validation, benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.

Workflow non-claims

This workflow does not validate benchmark performance.

It does not validate scientific truth.

It does not validate mediation.

It does not validate dyadic recovery.

It does not validate termination-gate accuracy.

It does not validate synthetic replay.

It does not validate phone monitoring.

It does not validate Sal-Meter.

It does not grant CAIS compliance.

It does not validate the P4-4 phone-only simulator.

It does not validate the P4-5 synthetic replay scaffold.

It does not validate the P5-3 AI Output A/B consequence evaluator as real-world impact evidence.

It does not certify phone monitoring.

It does not certify replay.

It does not certify any system, model, dataset, dashboard, laboratory, device, repository, schema, session protocol, implementation, mediation system, termination gate, phone-only simulator, replay scaffold, evaluator, or closed-loop system.

It does not create clinical, diagnostic, therapeutic, counseling, surveillance, certification, device-readiness, production-readiness, relationship-verdict, phone-monitoring, replay-validation, production closed-loop, or human-ranking authority.

Correct boundary sentence:

The validation workflow checks public helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, and wording hygiene only; P4-4 currently adds phone-only simulator scaffold documentation only, P4-5 currently adds synthetic replay scaffold documentation only, P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper only, and none of them creates benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring authority, Sal-Meter validation, CAIS compliance, certification, or production authority.


Local validation

Local validation is helper validation only.

It checks public helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, and wording-boundary hygiene.

It does not create evidence, validation, certification, Sal-Meter status, CAIS compliance, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.

Install dependencies

Install dependencies with:

  • pip install -r evaluation-baseline/requirements.txt

Run current local validators

Run the current local validators:

  • python evaluation-baseline/validate_sample_package.py
  • python evaluation-baseline/validate_p3_schemas.py
  • python evaluation-baseline/evaluate_dyadic_recovery_demo.py
  • python evaluation-baseline/evaluate_termination_gate_demo.py
  • python evaluation-baseline/boundary_lint.py
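
The five commands can also be run in sequence from one small script. The sketch below assumes that each validator signals failure with a non-zero exit status, which is not confirmed here; `run_all` is a hypothetical helper and is not part of the repository.

```python
import subprocess
import sys

VALIDATORS = [
    "evaluation-baseline/validate_sample_package.py",
    "evaluation-baseline/validate_p3_schemas.py",
    "evaluation-baseline/evaluate_dyadic_recovery_demo.py",
    "evaluation-baseline/evaluate_termination_gate_demo.py",
    "evaluation-baseline/boundary_lint.py",
]


def run_all(scripts=VALIDATORS, python=sys.executable):
    """Run each script in order; stop and return the first non-zero exit code."""
    for script in scripts:
        result = subprocess.run([python, script])
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all()` from the repository root runs the validators in the order listed and returns 0 only if every script exits cleanly.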

Expected meaning of PASS

PASS means only:

  • the public synthetic/sample helper files follow the expected helper structure
  • the P3 helper-schema objects follow expected helper-schema structure
  • the P4-1 synthetic demo-flow objects preserve expected helper consistency
  • the P4-3 synthetic termination-gate helper cases preserve expected helper consistency
  • wording boundary checks are clean

PASS does not mean:

  • benchmark validation
  • scientific truth validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • phone-only simulator validation
  • synthetic replay validation
  • phone monitoring validation
  • Sal-Meter validation
  • CAIS compliance
  • clinical evidence
  • diagnostic evidence
  • therapeutic evidence
  • counseling evidence
  • surveillance authority
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-4 local status

P4-4 currently adds phone-only simulator scaffold documentation only.

Current P4-4 scaffold files:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

P4-4 currently has no separate local validator.

P4-4 currently has no separate GitHub Actions validation step.

P4-4 files may be reviewed manually for boundary consistency.

P4-4 files may be scanned by the boundary language lint if the lint path includes the phone-only-simulator/ folder.

P4-4 local status does not mean phone-only simulator validation.

It does not mean real phone monitoring.

It does not mean phone monitoring authority.

It does not mean production readiness.

P4-5 local status

P4-5 currently adds synthetic replay scaffold documentation only.

Current P4-5 scaffold files:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

P4-5 currently has no separate local validator.

P4-5 currently has no separate GitHub Actions validation step.

P4-5 files may be reviewed manually for boundary consistency.

P4-5 files may be scanned by the boundary language lint if the lint path includes the synthetic-session-replay/ folder.

P4-5 local status does not mean synthetic replay validation.

It does not mean real session replay.

It does not mean replay validation authority.

It does not mean production readiness.

P5-3 local status

P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper.

Current P5-3 helper file:

  • evaluation-baseline/evaluate_ai_output_ab.py

P5-3 supports synthetic AI Output A/B consequence comparison using proxy-only helper metrics.

P5-3 is not part of the current local PASS meaning unless it is explicitly run.

P5-3 may be run locally with:

  • python evaluation-baseline/evaluate_ai_output_ab.py

If this command is not included in the GitHub Actions workflow, P5-3 must remain described as:

  • present
  • helper-stage
  • synthetic-only
  • proxy-only
  • not workflow-validated
  • not benchmark validation
  • not mediation validation
  • not dyadic recovery validation
  • not termination-gate accuracy validation
  • not production readiness

A successful P5-3 local run does not validate real AI impact, real human-state measurement, mediation effectiveness, dyadic recovery, termination-gate accuracy, Sal-Meter status, CAIS compliance, benchmark validity, device readiness, production readiness, or certification.

If later validators are added

If a later P4-4, P4-5, or P5-3 validator is added, it should be added in a separate issue or pull request.

Any added validator must preserve the same public-helper boundary.

A validator must not be described as scientific validation, benchmark validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, replay validation, phone monitoring validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.

Correct boundary sentence:

Local validation checks helper structure, synthetic sample consistency, synthetic demo-flow consistency, synthetic termination-gate helper consistency, optional synthetic AI Output A/B helper behavior, and wording hygiene only; P4-4 currently adds phone-only simulator scaffold documentation only, P4-5 currently adds synthetic replay scaffold documentation only, P5-3 currently adds a synthetic AI Output A/B consequence evaluator helper only, and none of them creates evidence, validation, certification, replay validation, phone monitoring authority, Sal-Meter status, CAIS compliance, or production authority.


Public data boundary

This repository must not contain:

  • raw human data
  • identifiable human data
  • private participant data
  • real dyadic conflict records
  • real session records
  • real phone recordings
  • real call transcripts
  • real transcript replay
  • real phone-session logs
  • consent forms with identifiers
  • private session logs
  • raw biosignal files from real participants
  • raw Sal-Meter traces
  • raw CAIS traces
  • private labels
  • hidden ground-truth labels
  • clinical interpretations
  • diagnostic interpretations
  • therapeutic interpretations
  • counseling interpretations
  • person ranking
  • human ranking
  • relationship verdicts
  • relationship scoring outputs
  • employment, insurance, legal, educational, or eligibility decisions
  • surveillance or coercive monitoring materials
  • phone monitoring authority
  • replay validation authority
  • real-time monitoring authority
  • device-readiness claims
  • production-readiness claims
  • certification claims
  • production closed-loop claims
  • termination-gate accuracy claims
  • dyadic recovery validation claims
  • mediation validation claims
  • synthetic replay validation claims
  • benchmark validation claims
  • scientific validation claims
  • Sal-Meter validation claims
  • CAIS compliance claims

Public sample, helper, simulator, replay, and evaluator files must remain:

  • synthetic
  • sample
  • mock
  • placeholder
  • structure-only
  • non-identifying
  • raw-data-free
  • public-helper-only
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-counseling
  • non-surveillance
  • non-certification
  • non-human-ranking
  • not Sal-Meter
  • not CAIS compliance
  • not benchmark evidence
  • not mediation evidence
  • not dyadic recovery evidence
  • not termination-gate accuracy evidence
  • not synthetic replay validation
  • not phone monitoring authority
  • not replay validation authority
  • not production data
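
A wording check for claim language could be sketched as below. This is a naive, hypothetical heuristic: the phrase list and the negation rule are assumptions for illustration and do not describe how evaluation-baseline/boundary_lint.py works.

```python
import re

# Hypothetical affirmative claim phrases to flag.
CLAIM_PHRASES = ["validated benchmark", "production ready", "clinically validated"]
# Naive negation detector: a line containing any of these words is not flagged.
NEGATIONS = re.compile(r"\b(not|no|never|non)\b", re.IGNORECASE)


def flag_lines(text: str):
    """Return (line_number, phrase) for claim phrases on lines without a negation."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        for phrase in CLAIM_PHRASES:
            if phrase in lowered and not NEGATIONS.search(line):
                findings.append((number, phrase))
    return findings
```

A line-level negation test like this misses many cases (for example, a claim and an unrelated negation on the same line), so manual boundary review remains necessary.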

P4-3 termination-gate helper cases

P4-3 termination-gate helper cases may demonstrate:

  • pause-session examples
  • narrow-scope examples
  • close-session examples
  • terminate-session examples
  • consent-refresh examples
  • packet-refresh examples
  • audit-only examples
  • closed-session handling
  • permission-expiry handling
  • low-confidence handling
  • insufficient-data-quality handling
  • private-state exposure risk handling
  • one-sided improvement caution

P4-3 termination-gate helper cases must not imply:

  • real mediation accuracy
  • validated termination-gate accuracy
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • Sal-Meter validation
  • CAIS compliance
  • clinical readiness
  • diagnostic readiness
  • therapeutic readiness
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-4 phone-only simulator scaffold

P4-4 phone-only simulator scaffold files may demonstrate:

  • synthetic phone-only session structure
  • consent-first flow
  • packet availability check
  • synthetic baseline summary
  • synthetic AI output
  • synthetic Human-State Delta review
  • Recovery Gate placeholder
  • Termination Gate placeholder
  • closed-session handling
  • audit-log boundary
  • public-helper-only simulator posture

P4-4 phone-only simulator scaffold files must not imply:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • mediation-service operation
  • surveillance
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • Sal-Meter validation
  • CAIS compliance
  • phone monitoring authority
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-5 synthetic session replay scaffold

P4-5 synthetic session replay scaffold files may demonstrate:

  • synthetic session replay structure
  • replay manifest structure
  • replay source declaration
  • synthetic replay event timeline
  • consent boundary review
  • packet boundary review
  • synthetic AI output replay
  • synthetic Human-State Delta replay
  • Recovery Gate replay
  • Termination Gate replay
  • closure replay
  • audit-only replay summary
  • closed-session replay handling
  • public-helper-only replay posture

P4-5 synthetic session replay scaffold files must not imply:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • clinical replay
  • diagnostic replay
  • therapeutic replay
  • counseling replay
  • surveillance replay
  • production mediation replay
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone monitoring validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

A synthetic replay may document a closed session.

A synthetic replay must not reopen a closed session.

A synthetic replay must not continue mediation after closure.

A synthetic replay must not generate new AI output after closure.

A synthetic replay must not convert closure into recovery evidence.

A synthetic replay must not convert audit into certification.
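
The closure rules above could be checked mechanically against a synthetic replay event timeline. The following is a minimal sketch, not the repository's implementation: the event `type` names (`session_closed`, `audit_summary`, `closure_replay`) and the list-of-dicts timeline shape are assumptions made for illustration and are not taken from `replay-event-timeline.json`.

```python
# Hypothetical event types; the real replay-event-timeline.json may differ.
CLOSURE_EVENT = "session_closed"
ALLOWED_AFTER_CLOSURE = {"audit_summary", "closure_replay"}


def post_closure_violations(timeline):
    """Return events that follow closure but are not audit or closure documentation."""
    violations = []
    closed = False
    for event in timeline:
        if closed and event["type"] not in ALLOWED_AFTER_CLOSURE:
            violations.append(event)
        if event["type"] == CLOSURE_EVENT:
            closed = True
    return violations


synthetic_timeline = [
    {"seq": 1, "type": "consent_boundary_review"},
    {"seq": 2, "type": "synthetic_ai_output"},
    {"seq": 3, "type": "session_closed"},
    {"seq": 4, "type": "audit_summary"},
    {"seq": 5, "type": "synthetic_ai_output"},  # new AI output after closure
]

for event in post_closure_violations(synthetic_timeline):
    print(f"post-closure violation at seq {event['seq']}: {event['type']}")
```

A check like this only reports ordering problems in synthetic data. Passing it says nothing about recovery, mediation, or replay validity.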

P5-3 synthetic AI Output A/B consequence evaluator helper

P5-3 synthetic AI Output A/B consequence evaluator files may demonstrate:

  • synthetic AI Output A/B comparison
  • generic AI output comparison
  • state-aware AI output comparison
  • synthetic Human-State Delta comparison
  • synthetic recovery burden direction
  • synthetic dyadic stability direction
  • synthetic false-recovery risk
  • synthetic termination-readiness direction
  • proxy-only helper metrics

P5-3 synthetic AI Output A/B consequence evaluator files must not imply:

  • real AI impact validation
  • real human-state measurement validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • benchmark validation
  • scientific validation
  • clinical validation
  • diagnostic validation
  • therapeutic validation
  • counseling validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P5-3 is an evaluator helper.

P5-3 is not a sample package.

P5-3 is not evidence.

P5-3 is not benchmark validation.

P5-3 is not mediation validation.

P5-3 is not production readiness.
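
To make the shape of a P5-3 comparison concrete, here is a minimal sketch under stated assumptions. The `SyntheticDelta` type, the sign convention (positive means movement toward recovery), and the direction labels are all hypothetical and are not taken from `evaluation-baseline/evaluate_ai_output_ab.py`. The sketch only labels synthetic numbers and flags the case where one side improves while the other does not.

```python
from dataclasses import dataclass


@dataclass
class SyntheticDelta:
    """Synthetic Human-State Delta for one dyad (hypothetical structure).

    Assumed convention: a positive value means movement toward recovery.
    """
    person_a: float
    person_b: float


def direction(delta):
    """Label a synthetic dyadic direction from two per-person deltas."""
    if delta.person_a > 0 and delta.person_b > 0:
        return "both-improving"
    if delta.person_a > 0 or delta.person_b > 0:
        # One side improves while the other is flat or worsening.
        return "one-sided-improvement"
    return "no-improvement"


def compare_outputs(generic, state_aware):
    """Compare synthetic deltas for a generic output (A) and a state-aware output (B)."""
    return {
        "output_a_generic": direction(generic),
        "output_b_state_aware": direction(state_aware),
    }


print(compare_outputs(SyntheticDelta(0.4, -0.2), SyntheticDelta(0.2, 0.1)))
```

The returned labels are proxy-only helper values computed from made-up inputs, in line with the statements above that P5-3 is an evaluator helper and not evidence.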

Public release rule

Public repository content may include:

  • synthetic examples
  • sample files
  • mock packets
  • placeholder flows
  • schema helpers
  • evaluator helpers
  • simulator scaffolds
  • replay scaffolds
  • documentation scaffolds
  • boundary-language checks

Public repository content must not include:

  • raw human data
  • identifiable human data
  • real participant records
  • private pilot records
  • real session records
  • real phone recordings
  • real transcripts
  • real transcript replay
  • clinical records
  • diagnostic records
  • therapeutic records
  • counseling records
  • production logs
  • private consent records
  • Sal-Meter raw traces
  • CAIS raw traces
  • controlled-access evidence packages

Correct boundary sentence

Public data in this repository may demonstrate helper structure, synthetic consistency, phone-only simulator scaffolding, synthetic replay scaffolding, and synthetic AI Output A/B consequence evaluator scaffolding only; it must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Issue and PR boundary

All issues and pull requests must preserve the repository boundary.

Issues and pull requests may improve helper structure.

They must not convert this repository into an evidence system, validation system, certification system, production system, clinical system, diagnostic system, therapeutic system, counseling system, surveillance system, phone monitoring system, real session replay system, relationship-verdict system, human-ranking system, Sal-Meter validation system, or CAIS compliance system.

Claims that issues and pull requests must not make

Contributions must not claim or imply:

  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • phone-only simulator validation
  • synthetic replay validation
  • phone monitoring validation
  • AI Output A/B real impact validation
  • real human-state measurement validation
  • Sal-Meter validation
  • CAIS compliance
  • diagnostic status
  • clinical status
  • therapeutic status
  • counseling-service status
  • legal mediation authority
  • surveillance readiness
  • phone monitoring authority
  • replay validation authority
  • device readiness
  • production readiness
  • certification
  • production deployment
  • production closed-loop authority
  • human ranking
  • relationship verdict
  • relationship scoring
  • official consciousness measurement
  • ground-truth human-state measurement

Allowed issue and PR scope

Issues and pull requests may propose or modify:

  • public helper documents
  • synthetic sample structures
  • schema helper structures
  • synthetic demo-flow objects
  • synthetic termination-gate helper cases
  • synthetic AI Output A/B consequence evaluator helpers
  • proxy-only evaluator helper logic
  • phone-only simulator scaffold files
  • synthetic phone-session wireframes
  • synthetic phone-session state-machine mockups
  • synthetic sample phone-session scripts
  • synthetic session replay scaffold files
  • synthetic replay manifests
  • synthetic replay event timelines
  • synthetic replay boundary documents
  • validation helper scripts
  • wording-boundary lint rules
  • documentation alignment
  • release-boundary notes
  • workflow helper checks
  • README boundary corrections

Prohibited issue and PR content

Issues and pull requests must not introduce:

  • raw human data
  • identifiable human data
  • clinical data
  • health data
  • real session records
  • real phone recordings
  • real call transcripts
  • real participant data
  • real consent records
  • real phone-session logs
  • real transcript replay
  • private pilot records
  • private advisor materials
  • private reviewer memos
  • Sal-Meter raw input
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • controlled-access evidence packages
  • benchmark validation claims
  • scientific validation claims
  • mediation validation claims
  • dyadic recovery validation claims
  • termination-gate accuracy validation claims
  • phone-only simulator validation claims
  • synthetic replay validation claims
  • phone monitoring authority claims
  • replay validation authority claims
  • AI Output A/B real impact validation claims
  • device-readiness claims
  • production-readiness claims
  • certification claims
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

Valid issue and PR examples

A valid issue or pull request may improve:

  • helper structure
  • boundary clarity
  • synthetic consistency checks
  • schema clarity
  • sample package consistency
  • termination-gate helper case coverage
  • AI Output A/B consequence evaluator helper clarity
  • proxy-only evaluator metric naming
  • phone-only simulator scaffold clarity
  • synthetic phone-session flow representation
  • synthetic session replay scaffold clarity
  • synthetic replay event ordering
  • closed-session replay handling
  • wording-boundary lint coverage
  • README release alignment
  • public-helper documentation consistency

A valid issue or pull request may add a helper workflow check only if the check remains explicitly bounded as public-helper validation.

A valid issue or pull request may add P5-3 workflow execution only if it is described as synthetic AI Output A/B helper execution, not benchmark validation.

A valid P5-3 workflow step may run:

  • python evaluation-baseline/evaluate_ai_output_ab.py

A successful P5-3 workflow run must not be described as real AI impact validation, real human-state measurement validation, mediation validation, dyadic recovery validation, termination-gate accuracy validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
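
One way to keep the wording of a workflow result bounded is to generate the status line in code rather than writing it by hand. The sketch below is an illustration only: `run_helper` and `HELPER_LABEL` are hypothetical names, and nothing here is taken from the repository's actual workflow files.

```python
import subprocess
import sys

# Bounded label, using the wording required for P5-3 workflow execution.
HELPER_LABEL = "synthetic AI Output A/B helper execution"


def run_helper(command):
    """Run a helper command and report its exit status in bounded wording only."""
    completed = subprocess.run(command, capture_output=True, text=True)
    status = "completed" if completed.returncode == 0 else "failed"
    return f"{HELPER_LABEL}: {status} (exit code {completed.returncode}); not validation"
```

From the repository root, the step above could then be invoked as `run_helper([sys.executable, "evaluation-baseline/evaluate_ai_output_ab.py"])`. The returned string reports only whether the helper ran, which is all a successful run is allowed to mean.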

Repository conversion prohibitions

A valid issue or pull request must not convert this repository into:

  • an evidence system
  • a certification system
  • a production system
  • a clinical system
  • a diagnostic system
  • a therapeutic system
  • a counseling system
  • a surveillance system
  • a real phone monitoring system
  • a real session replay system
  • a real transcript replay system
  • a relationship-verdict system
  • a human-ranking system
  • a Sal-Meter validation system
  • a CAIS compliance system
  • a production mediation system
  • a production closed-loop system

Reviewer rule

A reviewer should reject or request revision for any issue or pull request that introduces:

  • raw human data
  • real participant data
  • real session data
  • real phone data
  • private consent material
  • clinical framing
  • diagnostic framing
  • therapeutic framing
  • counseling framing
  • surveillance framing
  • certification framing
  • device-readiness framing
  • production-readiness framing
  • Sal-Meter validation framing
  • CAIS compliance framing
  • benchmark validation framing
  • mediation validation framing
  • dyadic recovery validation framing
  • termination-gate accuracy validation framing
  • synthetic replay validation framing
  • phone monitoring authority framing
  • replay validation authority framing
  • relationship verdict framing
  • human-ranking framing

Correct boundary sentence

Issues and pull requests may improve public helper structure, synthetic sample structures, schema helper structures, synthetic termination-gate cases, P5-3 synthetic AI Output A/B consequence evaluator helpers, phone-only simulator scaffolding, and synthetic replay scaffolding, but they must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Dashboard boundary

Dashboard mockups in this repository are public helper structures only.

They may present bounded synthetic/sample helper fields for demonstration.

They may show synthetic status only.

They must not show real participant state, real monitoring status, real phone monitoring status, real replay status, validated benchmark status, validated mediation status, Sal-Meter output, CAIS compliance, certification, device readiness, production readiness, relationship verdicts, or human ranking.

Dashboard mockups may show

Dashboard mockups may show:

  • synthetic session identifiers
  • synthetic packet availability status
  • synthetic confidence fields
  • synthetic data-quality fields
  • synthetic Human-State Delta summaries
  • synthetic Dyadic Delta summaries
  • synthetic Recovery Gate status
  • synthetic Termination Gate status
  • synthetic pause examples
  • synthetic narrow-scope examples
  • synthetic close-session examples
  • synthetic terminate-session examples
  • synthetic audit status
  • synthetic public-boundary flags
  • synthetic phone-only simulator state
  • synthetic phone-session flow status
  • synthetic phone-session state-machine status
  • synthetic phone-session closure status
  • synthetic replay manifest status
  • synthetic replay event timeline status
  • synthetic replay boundary status
  • synthetic replay closure status
  • synthetic audit-only replay status
  • synthetic AI Output A/B evaluator helper status
  • synthetic generic AI output comparison status
  • synthetic state-aware AI output comparison status
  • synthetic false-recovery risk helper status
  • synthetic termination-readiness helper status
  • proxy-only evaluator helper status

Dashboard mockups must not present

Dashboard mockups must not present:

  • person scores
  • diagnosis
  • treatment guidance
  • counseling guidance
  • clinical interpretation
  • employment eligibility
  • insurance eligibility
  • legal eligibility
  • educational eligibility
  • surveillance status
  • phone monitoring status
  • real-time monitoring status
  • real phone recording status
  • real transcript status
  • real session replay status
  • real phone replay status
  • real transcript replay status
  • replay validation status
  • phone monitoring authority
  • replay validation authority
  • relationship verdicts
  • relationship scoring
  • human ranking
  • psychological safety score
  • certified status
  • validated benchmark status
  • validated mediation status
  • validated dyadic recovery status
  • validated termination-gate accuracy status
  • validated phone-only simulator status
  • validated synthetic replay status
  • validated AI Output A/B real impact status
  • real human-state measurement validation status
  • device-readiness status
  • production-readiness status
  • production closed-loop status
  • Sal-Meter output
  • Sal-Meter validation status
  • CAIS compliance
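
The two lists above suggest a simple structural check for dashboard mockups: accept only fields that are explicitly marked synthetic or proxy-only. The sketch below assumes a hypothetical naming convention in which every allowed field starts with `synthetic_` or `proxy_only_`. The repository does not state such a convention, so treat the prefixes and field names as illustrative.

```python
# Hypothetical convention: allowed dashboard fields carry an explicit prefix.
ALLOWED_PREFIXES = ("synthetic_", "proxy_only_")


def disallowed_fields(dashboard_fields):
    """Return field names that are not explicitly marked synthetic or proxy-only."""
    return sorted(
        name for name in dashboard_fields if not name.startswith(ALLOWED_PREFIXES)
    )


mock_dashboard = {
    "synthetic_session_id": "demo-001",
    "synthetic_termination_gate_status": "close-session",
    "proxy_only_evaluator_helper_status": "present",
    "person_score": 0.8,  # not marked synthetic, so it is reported
}

print(disallowed_fields(mock_dashboard))
```

A prefix check is coarse. A field named `synthetic_person_score` would pass it, so the prohibited list still needs a wording review alongside any check of this kind.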

P4-4 dashboard boundary

A dashboard may show P4-4 phone-only simulator scaffold status only as synthetic helper structure.

It may show:

  • synthetic phone-only simulator file presence
  • synthetic phone-session wireframe status
  • synthetic state-machine mockup status
  • synthetic sample phone-session script status
  • synthetic consent-first flow status
  • synthetic packet availability check status
  • synthetic closure status
  • synthetic audit-log boundary status

It must not show:

  • real call monitoring
  • real phone audio status
  • real phone recording status
  • real transcript processing
  • real participant state
  • real phone-session status
  • phone monitoring authority
  • phone-only simulator validation
  • production phone monitoring readiness

P4-5 dashboard boundary

A dashboard may show P4-5 synthetic replay scaffold status only as synthetic helper structure.

It may show:

  • synthetic replay manifest status
  • synthetic replay event timeline status
  • synthetic replay boundary status
  • synthetic replay source declaration status
  • synthetic consent boundary review status
  • synthetic packet boundary review status
  • synthetic AI output replay status
  • synthetic Human-State Delta replay status
  • synthetic Recovery Gate replay status
  • synthetic Termination Gate replay status
  • synthetic closure replay status
  • synthetic audit-only replay status
  • closed-session replay handling status

It must not show:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • synthetic replay validation
  • replay validation authority
  • production replay readiness
  • relationship verdicts
  • human ranking

A dashboard must not reopen a closed session.

A dashboard must not convert replay into intervention.

A dashboard must not convert audit into certification.

P5-3 dashboard boundary

A dashboard may show P5-3 synthetic AI Output A/B consequence evaluator helper status only as synthetic proxy-helper structure.

It may show:

  • synthetic AI Output A/B helper file presence
  • synthetic generic AI output comparison status
  • synthetic state-aware AI output comparison status
  • synthetic Human-State Delta comparison status
  • synthetic recovery burden direction
  • synthetic dyadic stability direction
  • synthetic false-recovery risk helper status
  • synthetic termination-readiness helper status
  • proxy-only evaluator helper status

It must not show:

  • real AI impact validation
  • real human-state measurement validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • benchmark validation
  • scientific validation
  • clinical validation
  • diagnostic validation
  • therapeutic validation
  • counseling validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P5-3 dashboard display is helper-status display only.

P5-3 dashboard display is not evidence.

P5-3 dashboard display is not benchmark validation.

P5-3 dashboard display is not mediation validation.

P5-3 dashboard display is not production readiness.

Dashboard conversion prohibitions

A dashboard must not become:

  • a judgment engine
  • a monitoring engine
  • a phone monitoring engine
  • a replay validation engine
  • a clinical engine
  • a diagnostic engine
  • a therapeutic engine
  • a counseling engine
  • a mediation-service engine
  • a relationship-verdict engine
  • a human-ranking engine
  • a Sal-Meter output engine
  • a CAIS compliance engine
  • a production closed-loop intervention engine

Correct boundary sentence

A dashboard mockup may display public helper structure, synthetic phone-only simulator scaffold status, synthetic replay scaffold status, and synthetic AI Output A/B consequence evaluator helper status only; it must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Closed-loop demo-lite boundary

Closed-loop demo-lite files are local placeholder structures only.

They may demonstrate public-helper flow shape, synthetic routing structure, and bounded closure logic.

They do not define a production closed-loop intervention system.

They do not authorize real-time human monitoring.

They do not authorize phone monitoring.

They do not authorize replay validation.

They do not authorize automated intervention on real participants.

They do not validate mediation, recovery, dyadic recovery, termination-gate accuracy, phone-only simulator behavior, synthetic replay behavior, Sal-Meter, CAIS compliance, device readiness, production readiness, or certification.

Closed-loop demo-lite files may demonstrate

Closed-loop demo-lite files may demonstrate:

  • synthetic event-log shape
  • synthetic feedback-loop boundary fields
  • placeholder routing logic
  • pause-session examples
  • narrow-scope examples
  • close-session examples
  • terminate-session examples
  • audit-only examples
  • public-helper-only closure logic
  • closed-session handling
  • non-intervention after closure
  • boundary-safe placeholder flow

P4-4 phone-only simulator boundary

P4-4 phone-only simulator files may demonstrate:

  • synthetic phone-session flow structure
  • synthetic phone-session state-machine structure
  • synthetic sample phone-session script structure
  • consent-first phone-only session entry
  • packet availability check
  • synthetic baseline summary
  • synthetic AI output
  • synthetic Human-State Delta review
  • Recovery Gate placeholder
  • Termination Gate placeholder
  • session closure
  • audit-log boundary

P4-4 phone-only simulator files do not authorize:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • real phone-session operation
  • clinical intake
  • diagnosis
  • therapy
  • counseling
  • surveillance
  • mediation-service operation
  • phone monitoring authority
  • phone-only simulator validation
  • device readiness
  • production readiness
  • production closed-loop authority

P4-5 synthetic replay scaffold boundary

P4-5 synthetic replay scaffold files may demonstrate:

  • synthetic replay manifest structure
  • synthetic replay event timeline structure
  • synthetic replay boundary structure
  • replay source declaration
  • consent boundary review
  • packet boundary review
  • synthetic AI output replay
  • synthetic Human-State Delta replay
  • Recovery Gate replay
  • Termination Gate replay
  • closure replay
  • audit-only replay summary
  • closed-session replay handling

P4-5 synthetic replay scaffold files do not authorize:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • raw human data replay
  • clinical replay
  • diagnostic replay
  • therapeutic replay
  • counseling replay
  • surveillance replay
  • production mediation replay
  • replay validation
  • replay validation authority
  • synthetic replay validation
  • device readiness
  • production readiness
  • production closed-loop authority

P5-3 synthetic AI Output A/B consequence evaluator boundary

P5-3 synthetic AI Output A/B consequence evaluator helper files may demonstrate:

  • synthetic AI Output A/B comparison
  • generic AI output comparison
  • state-aware AI output comparison
  • synthetic Human-State Delta comparison
  • synthetic recovery burden direction
  • synthetic dyadic stability direction
  • synthetic false-recovery risk
  • synthetic termination-readiness direction
  • proxy-only helper metrics

P5-3 does not authorize:

  • real AI impact validation
  • real human-state measurement validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • benchmark validation
  • scientific validation
  • clinical validation
  • diagnostic validation
  • therapeutic validation
  • counseling validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P5-3 is an evaluator helper.

P5-3 is not intervention logic.

P5-3 is not evidence.

P5-3 is not proof.

P5-3 is not production readiness.

Prohibited content

Closed-loop demo-lite, P4-4 phone-only simulator, P4-5 synthetic replay scaffold, and P5-3 evaluator helper files must not contain:

  • raw human data
  • identifiable human data
  • clinical data
  • health data
  • real session records
  • real phone recordings
  • real call transcripts
  • real transcript replay
  • real participant data
  • real consent records
  • real phone-session logs
  • private pilot records
  • private advisor materials
  • private reviewer memos
  • Sal-Meter raw input
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • controlled-access evidence packages
  • real-time monitoring authority
  • phone monitoring authority
  • replay validation authority
  • automated intervention authority
  • benchmark validation claims
  • scientific validation claims
  • mediation validation claims
  • dyadic recovery validation claims
  • termination-gate accuracy validation claims
  • phone-only simulator validation claims
  • synthetic replay validation claims
  • AI Output A/B real impact validation claims
  • device-readiness claims
  • production-readiness claims
  • certification claims
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

Closure rule

A closed session must stay closed.

A replay must not reopen a closed session.

A replay must not continue mediation after closure.

A replay must not generate new AI output after closure.

A replay must not convert closure into recovery evidence.

A replay must not convert audit into certification.

A demo loop must not convert placeholder routing into real intervention.

A helper evaluator must not convert synthetic comparison into proof.
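
The first rule, that a closed session must stay closed, can be expressed by making the closed state terminal in a placeholder session object. This is a minimal sketch with hypothetical names (`SyntheticSession`, `SessionClosedError`). It does not reflect the contents of `phone-session-state-machine.json`.

```python
class SessionClosedError(RuntimeError):
    """Raised when something tries to act on a closed synthetic session."""


class SyntheticSession:
    """Placeholder session whose closed state is terminal (hypothetical structure)."""

    def __init__(self):
        self.state = "open"
        self.audit_log = []

    def add_synthetic_ai_output(self, text):
        # No new AI output is accepted once the session is closed.
        if self.state == "closed":
            raise SessionClosedError("synthetic AI output refused: session is closed")
        self.audit_log.append(("synthetic_ai_output", text))

    def close(self):
        # Closing is idempotent, and there is deliberately no reopen method.
        if self.state != "closed":
            self.state = "closed"
            self.audit_log.append(("session_closed", None))

    def add_audit_note(self, note):
        # Audit-only documentation remains possible after closure.
        self.audit_log.append(("audit_note", note))
```

The design choice is that the class offers no transition out of `closed`, so reopening is impossible by construction rather than merely discouraged.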

Correct boundary sentence

Closed-loop demo-lite, P4-4 phone-only simulator, P4-5 synthetic replay scaffold, and P5-3 synthetic AI Output A/B consequence evaluator helper files may demonstrate placeholder public-helper structure only; they must not create evidence, validation, certification, replay validation, phone monitoring authority, monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Future roadmap

The future roadmap remains public-helper-only.

The next roadmap steps move from synthetic replay scaffolding and the presence of the P5-3 evaluator helper toward public helper demo package review, optional boundary-lint extension, optional review of helper workflow execution, and bounded release-readiness documentation.

Future roadmap items must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.

Recommended next milestones

P4-6 — Public Helper Demo Package Review

Purpose:

  • review synthetic demo packages
  • review phone-only simulator scaffolds
  • review synthetic replay scaffolds
  • review P5-3 synthetic AI Output A/B consequence evaluator helper boundary
  • check public-boundary consistency before any future release

P4-7 — Phone-only / Replay / Evaluator Boundary Lint Extension

Purpose:

  • consider extending boundary-language lint coverage to phone-only-simulator/
  • consider extending boundary-language lint coverage to synthetic-session-replay/
  • consider extending boundary-language lint coverage to evaluation-baseline/evaluate_ai_output_ab.py
  • keep the lint extension as wording-boundary hygiene only

P4-8 — Public Helper Release Readiness Note

Purpose:

  • prepare a bounded release-readiness note only after P4-6 review and any needed lint extension are complete
  • state release readiness as public-helper readiness only
  • avoid benchmark validation, mediation validation, dyadic recovery validation, replay validation, phone monitoring authority, device readiness, or production readiness claims

P5-4 — Optional P5-3 Workflow Helper Execution Review

Purpose:

  • decide whether evaluation-baseline/evaluate_ai_output_ab.py should be added to local validation and GitHub Actions
  • describe any added execution as synthetic AI Output A/B helper execution only
  • avoid describing P5-3 execution as real AI impact validation, benchmark validation, mediation validation, dyadic recovery validation, or production readiness

Completed helper-validation and P4 helper references

Completed helper-validation and P4 helper milestones are tracked under:

  • Current P5 helper-validation state
  • Implementation status table
  • Completed P5 helper-validation files
  • Completed P4-4 public simulator scaffold files
  • Completed P4-5 public replay scaffold files
  • Synthetic sample packages
  • Validation workflow
  • Local validation

Completed P4 helper items include:

  • P4-0 synthetic dyadic demo-flow package
  • P4-1 synthetic dyadic recovery demo-flow evaluator
  • P4-2 mediation policy prompt pack
  • P4-3 synthetic termination-gate helper case package
  • P4-3 termination gate demo evaluator
  • P4-4 phone-only simulator scaffold
  • P4-4 phone-only session flow wireframe
  • P4-4 synthetic phone-session state-machine mockup
  • P4-4 synthetic sample phone-session script
  • P4-5 synthetic session replay scaffold
  • P4-5 synthetic replay manifest
  • P4-5 synthetic replay event timeline
  • P4-5 synthetic replay boundary document
  • P5-3 synthetic AI Output A/B consequence evaluator helper

Current P4-4 scaffold files:

  • phone-only-simulator/README.md
  • phone-only-simulator/session-flow-wireframe.md
  • phone-only-simulator/phone-session-state-machine.json
  • phone-only-simulator/sample-phone-session-script.md

Current P4-5 scaffold files:

  • synthetic-session-replay/README.md
  • synthetic-session-replay/replay-manifest.json
  • synthetic-session-replay/replay-event-timeline.json
  • synthetic-session-replay/replay-boundary.md

Current P5-3 evaluator helper file:

  • evaluation-baseline/evaluate_ai_output_ab.py
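
A file-presence check over the paths listed above is one bounded way to support a statement such as "helper files are present". The sketch below copies the nine paths from the lists in this section. The function name `missing_helper_files` is hypothetical and is not part of the repository's validation scripts.

```python
from pathlib import Path

# Paths copied from the P4-4, P4-5, and P5-3 file lists above.
EXPECTED_HELPER_FILES = [
    "phone-only-simulator/README.md",
    "phone-only-simulator/session-flow-wireframe.md",
    "phone-only-simulator/phone-session-state-machine.json",
    "phone-only-simulator/sample-phone-session-script.md",
    "synthetic-session-replay/README.md",
    "synthetic-session-replay/replay-manifest.json",
    "synthetic-session-replay/replay-event-timeline.json",
    "synthetic-session-replay/replay-boundary.md",
    "evaluation-baseline/evaluate_ai_output_ab.py",
]


def missing_helper_files(repo_root="."):
    """Return the expected helper files that are absent under repo_root."""
    root = Path(repo_root)
    return [path for path in EXPECTED_HELPER_FILES if not (root / path).is_file()]
```

An empty result means only that the files exist. It makes no statement about their content.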

Future roadmap items must remain

Future roadmap items must remain:

  • research-stage
  • public-helper-only
  • synthetic-first
  • synthetic/sample-data-first
  • raw-data-non-public
  • non-clinical
  • non-diagnostic
  • non-therapeutic
  • non-counseling
  • non-surveillance
  • non-certification
  • non-human-ranking
  • not Sal-Meter
  • not Proxy Sal-Meter
  • not CAIS compliance
  • not benchmark validation
  • not scientific validation
  • not mediation validation
  • not dyadic recovery validation
  • not termination-gate accuracy validation
  • not synthetic replay validation
  • not phone monitoring authority
  • not replay validation authority
  • not AI Output A/B real impact validation
  • not device readiness
  • not production readiness
  • not production closed-loop

Future roadmap items must not introduce

Future roadmap items must not introduce:

  • raw human data
  • identifiable human data
  • clinical data
  • health data
  • real session records
  • real phone recordings
  • real call transcripts
  • real participant data
  • real consent records
  • real phone-session logs
  • real transcript replay
  • private pilot records
  • private advisor materials
  • private reviewer memos
  • Sal-Meter raw input
  • raw Sal-Meter traces
  • raw CAIS traces
  • CAIS compliance dossiers
  • controlled-access evidence packages
  • benchmark validation claims
  • scientific validation claims
  • mediation validation claims
  • dyadic recovery validation claims
  • termination-gate accuracy validation claims
  • phone-only simulator validation claims
  • synthetic replay validation claims
  • phone monitoring authority claims
  • replay validation authority claims
  • AI Output A/B real impact validation claims
  • device-readiness claims
  • production-readiness claims
  • certification claims
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

P4-6 review may check

P4-6 review may check:

  • public helper file completeness
  • synthetic-only status
  • boundary-language consistency
  • closed-session handling
  • replay does not reopen a closed session
  • simulator and replay folders remain outside sample-data/
  • P5-3 remains evaluator-helper-only
  • P5-3 does not become intervention logic
  • P5-3 does not become evidence or proof
  • root README alignment
  • issue checklist alignment
  • Actions PASS status
  • optional lint coverage status

P4-6 review must not become:

  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • synthetic replay validation
  • phone-only simulator validation
  • AI Output A/B real impact validation
  • Sal-Meter validation
  • CAIS compliance
  • device-readiness review
  • production-readiness review
  • certification review
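The folder-placement item in the P4-6 list above (simulator and replay folders remain outside sample-data/) could be checked with a small script. The following is a minimal sketch, not the repository's actual review tooling. The folder names come from the P4-7 list in this README; the function name `misplaced_scaffolds` and the assumption that these folders sit at the repository root are hypothetical.

```python
from pathlib import Path

# Folder names as they appear in this README. Their location at the
# repository root is an assumption made for illustration.
SCAFFOLD_FOLDERS = ("phone-only-simulator", "synthetic-session-replay")
SAMPLE_DATA = "sample-data"


def misplaced_scaffolds(repo_root: Path) -> list[str]:
    """Return scaffold folders found anywhere inside sample-data/.

    An empty list means the placement check passed. It says nothing
    about validation, evidence, or readiness.
    """
    sample_root = repo_root / SAMPLE_DATA
    if not sample_root.is_dir():
        return []
    found = []
    for path in sample_root.rglob("*"):
        if path.is_dir() and path.name in SCAFFOLD_FOLDERS:
            found.append(path.relative_to(repo_root).as_posix())
    return sorted(found)
```

A passing result here is a structural observation only; it stays within what P4-6 review may check.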

P4-7 lint extension boundary

P4-7 may extend wording-boundary lint coverage.

It may check for prohibited wording in:

  • phone-only-simulator/
  • synthetic-session-replay/
  • evaluation-baseline/evaluate_ai_output_ab.py
  • README release-boundary sections
  • issue and PR boundary sections

P4-7 must remain wording-boundary hygiene only.

It must not become scientific validation, benchmark validation, mediation validation, replay validation, phone monitoring validation, AI Output A/B real impact validation, Sal-Meter validation, CAIS compliance, certification, device readiness, or production readiness.
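A wording-boundary lint of this kind might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's lint: the phrase list, the negation markers, and the function name `lint_text` are all hypothetical, and the negation handling is deliberately crude.

```python
# Hypothetical phrase list, loosely based on the "must not state" wording
# in this README. The real lint may use a different list.
PROHIBITED_PHRASES = (
    "benchmark validated",
    "scientifically validated",
    "cais compliant",
    "device ready",
    "production ready",
)

# Lines containing one of these markers are treated as boundary language
# (for example "not production ready") and skipped. This is an assumption.
NEGATION_MARKERS = ("not ", "non-", "never ")


def lint_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, phrase) pairs for un-negated prohibited wording."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        lowered = line.lower()
        if any(marker in lowered for marker in NEGATION_MARKERS):
            continue
        for phrase in PROHIBITED_PHRASES:
            if phrase in lowered:
                findings.append((number, phrase))
    return findings
```

A line-level check like this would still flag bullet items that sit under a "must not state" heading, because the negation lives on a different line; a real lint would need some allow-list or section awareness. A clean result means only that no listed phrase was found.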

P4-8 release-readiness note boundary

P4-8 may prepare a bounded release-readiness note.

The note may state:

  • helper files are present
  • synthetic sample structures are present
  • simulator scaffold files are present
  • replay scaffold files are present
  • P5-3 evaluator helper is present
  • boundary language has been reviewed
  • public data boundary remains intact

The note must not state:

  • benchmark validated
  • scientifically validated
  • mediation validated
  • dyadic recovery validated
  • termination-gate accuracy validated
  • replay validated
  • phone-only simulator validated
  • AI Output A/B real impact validated
  • Sal-Meter validated
  • CAIS compliant
  • device ready
  • production ready
  • certified

P5-4 optional workflow review boundary

P5-4 may consider whether to add P5-3 helper execution to the workflow.

A valid P5-3 workflow step may run:

  • python evaluation-baseline/evaluate_ai_output_ab.py

A successful P5-3 workflow run may mean only:

  • the synthetic AI Output A/B consequence evaluator helper executed successfully
  • proxy-only helper output was generated under synthetic conditions
  • public-helper structure remained intact

A successful P5-3 workflow run must not mean:

  • real AI impact validation
  • real human-state measurement validation
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority
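One way to keep a workflow step inside these limits is to wrap the helper call so that the reported status can never say more than "executed" or "failed". The sketch below assumes the helper path named above and uses only the Python standard library; the function names `bounded_status` and `run_helper` and the status wording are hypothetical.

```python
import subprocess
import sys

# Helper path as named in this README.
HELPER = "evaluation-baseline/evaluate_ai_output_ab.py"


def bounded_status(returncode: int) -> str:
    """Map an exit code to a status line that claims execution only."""
    if returncode == 0:
        return (
            "synthetic helper executed; proxy-only output generated; "
            "no validation implied"
        )
    return (
        f"synthetic helper failed with exit code {returncode}; "
        "no validation implied"
    )


def run_helper(helper_path: str = HELPER) -> str:
    """Run the helper with the current interpreter and report bounded status."""
    result = subprocess.run(
        [sys.executable, helper_path],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is reported, not raised
    )
    return bounded_status(result.returncode)
```

Calling `run_helper()` from the repository root would return one of the two status strings. Neither string can be read as evidence about real AI impact or real human-state measurement.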

Correct boundary sentence

Future roadmap items may extend public helper review, synthetic replay scaffolding, simulator boundary coverage, P5-3 synthetic AI Output A/B evaluator helper execution review, and optional lint hygiene, but they must not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


Non-goals

This repository does not attempt to:

  • prove consciousness
  • measure consciousness directly
  • infer emotions
  • diagnose mental state
  • treat or counsel people
  • rank persons
  • judge relationships
  • produce relationship verdicts
  • produce human-ranking outputs
  • replace human consent
  • expose raw human data
  • process identifiable human data
  • publish clinical data
  • process real phone calls
  • process real phone recordings
  • process real call transcripts
  • process real phone-session logs
  • process real session records
  • replay real sessions
  • replay real phone calls
  • replay real transcripts
  • create phone monitoring authority
  • create replay validation authority
  • authorize real-time phone monitoring
  • validate the phone-only simulator
  • validate the synthetic replay scaffold
  • validate P5-3 AI Output A/B real-world impact
  • validate real human-state measurement
  • validate Sal-Meter
  • define CAIS compliance
  • validate benchmark performance
  • validate scientific truth
  • validate mediation
  • validate dyadic recovery
  • validate termination-gate accuracy
  • certify any system
  • certify any model
  • certify any dataset
  • certify any dashboard
  • certify any laboratory
  • certify any device
  • certify device readiness
  • certify production readiness
  • operate a production mediation service
  • operate a production phone-monitoring service
  • operate a production replay service
  • operate a production closed-loop intervention system
  • authorize surveillance
  • authorize real-time monitoring
  • authorize automated intervention on real participants

This repository may support:

  • public helper documentation
  • synthetic sample structure
  • schema helper structure
  • synthetic demo-flow consistency checks
  • synthetic termination-gate helper consistency checks
  • synthetic phone-only simulator scaffolding
  • synthetic phone-session flow representation
  • synthetic phone-session state-machine mockups
  • synthetic sample phone-session scripts
  • synthetic session replay scaffolding
  • synthetic replay manifest structure
  • synthetic replay event timeline structure
  • synthetic replay boundary documentation
  • synthetic AI Output A/B consequence evaluator helper structure
  • proxy-only evaluator helper logic
  • optional helper workflow execution review
  • boundary-language hygiene
  • repository-level transparency

This repository must not become:

  • a clinical system
  • a diagnostic system
  • a therapeutic system
  • a counseling system
  • a surveillance system
  • a real phone monitoring system
  • a real session replay system
  • a real transcript processing system
  • a replay validation system
  • a real AI impact validation system
  • a real human-state measurement validation system
  • a relationship-verdict system
  • a human-ranking system
  • a production closed-loop system
  • a certified benchmark system
  • a Sal-Meter validation system
  • a CAIS compliance system
  • a production mediation system
  • a production phone-monitoring system
  • a production replay system

P4-4 phone-only simulator files are not:

  • real phone monitoring
  • real phone recording
  • real transcript processing
  • real participant data processing
  • phone-only simulator validation
  • phone monitoring authority
  • production phone-monitoring readiness

P4-5 synthetic replay scaffold files are not:

  • real session replay
  • real phone replay
  • real transcript replay
  • real participant data replay
  • synthetic replay validation
  • replay validation authority
  • production replay readiness

P5-3 synthetic AI Output A/B consequence evaluator helper files are not:

  • real AI impact validation
  • real human-state measurement validation
  • benchmark validation
  • scientific validation
  • mediation validation
  • dyadic recovery validation
  • termination-gate accuracy validation
  • clinical validation
  • diagnostic validation
  • therapeutic validation
  • counseling validation
  • Sal-Meter validation
  • CAIS compliance
  • device readiness
  • production readiness
  • certification
  • relationship verdict authority
  • human-ranking authority
  • production closed-loop authority

The helper evaluator is not proof.

The simulator is not monitoring.

The replay scaffold is not replay authority.

The dashboard is not a judgment engine.

The workflow is not certification.

Correct boundary sentence:

This repository is a public helper surface; it may support synthetic sample structure, simulator scaffolding, replay scaffolding, P5-3 synthetic AI Output A/B evaluator helper structure, and wording-boundary hygiene, but it does not create evidence, validation, certification, replay validation, phone monitoring authority, production authority, relationship verdicts, or human-ranking authority.


License

Unless otherwise stated, public helper materials in this repository are released under:

  • Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Document-level license statements in DOI-registered canonical records remain fixed by those records.

This GitHub repository is a helper surface.

It does not override DOI-registered canonical records.

It does not override document-level license statements.

It does not create certification, compliance, validation, device-readiness, production-readiness, or authority claims.


Citation

Please cite DOI-registered records as the authority layer.

This GitHub repository is a helper surface.

DOI records govern.

GitHub helps.

See:

  • CITATION.cff

If a helper file and a DOI-registered canonical record conflict, the DOI-registered canonical record governs.

GitHub README text, helper files, sample data, simulator scaffolds, replay scaffolds, evaluator helpers, issue text, and pull request text do not replace canonical DOI authority.

Correct boundary sentence:

DOI-registered canonical records govern authority and citation; this GitHub repository helps only with public navigation, helper structure, sample scaffolding, simulator scaffolding, replay scaffolding, evaluator-helper visibility, and boundary-language hygiene.


Final boundary

This repository documents structure.

It does not validate the body.

It does not validate the person.

It does not validate the relationship.

It does not validate a human state.

It does not validate real human-state measurement.

It does not validate AI Output A/B real-world impact.

It does not validate dyadic recovery.

It does not validate mediation.

It does not validate termination-gate accuracy.

It does not validate the phone-only simulator.

It does not validate the synthetic replay scaffold.

It does not validate the P5-3 synthetic AI Output A/B consequence evaluator as real-world evidence.

It does not validate Sal-Meter.

It does not grant CAIS compliance.

It does not crown a benchmark as validated.

It does not certify any system.

It does not certify any model.

It does not certify any dataset.

It does not certify any dashboard.

It does not certify any laboratory.

It does not certify any device.

It does not certify device readiness.

It does not certify production readiness.

It does not authorize surveillance.

It does not authorize diagnosis.

It does not authorize therapy.

It does not authorize counseling.

It does not authorize legal mediation.

It does not authorize relationship verdicts.

It does not authorize human ranking.

It does not authorize phone monitoring.

It does not authorize real-time monitoring.

It does not authorize real phone recording.

It does not authorize real transcript processing.

It does not authorize real session replay.

It does not authorize real phone replay.

It does not authorize real transcript replay.

It does not authorize replay validation.

It does not authorize production mediation.

It does not authorize production phone monitoring.

It does not authorize production replay.

It does not authorize production closed-loop intervention.

A closed session must stay closed.

A replay must not reopen a closed session.

A replay must not continue mediation after closure.

A replay must not generate new AI output after closure.

A replay must not convert closure into recovery evidence.

A replay must not convert audit into certification.
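The first four closure rules above can be expressed as a consistency check over a synthetic replay event timeline. The sketch below assumes a timeline is a list of dictionaries with a `type` field; the event type names (`session_closed`, `session_reopened`, `mediation_turn`, `ai_output`) and the function name are hypothetical and are not taken from the repository's schema.

```python
# Hypothetical event type names for a synthetic timeline.
CLOSURE_EVENT = "session_closed"
FORBIDDEN_AFTER_CLOSURE = {"session_reopened", "mediation_turn", "ai_output"}


def closure_violations(timeline: list[dict]) -> list[int]:
    """Return indexes of forbidden events that occur after closure.

    An empty list means the synthetic timeline kept the session closed.
    It is a structural check and not evidence of recovery or accuracy.
    """
    closed = False
    violations = []
    for index, event in enumerate(timeline):
        event_type = event.get("type")
        if closed and event_type in FORBIDDEN_AFTER_CLOSURE:
            violations.append(index)
        elif event_type == CLOSURE_EVENT:
            closed = True
    return violations
```

For example, a timeline of `ai_output`, `session_closed`, `ai_output`, `audit_note` would report index 2, because new AI output appears after closure, while the trailing audit note is allowed.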

The packet is not the person.

The event is not the relationship.

The container is not the truth.

The demo-flow is not recovery.

The termination-gate case is not accuracy evidence.

The phone-only simulator is not the phone call.

The sample phone-session script is not a transcript.

The phone-session state machine is not authority.

The replay scaffold is not real replay.

The replay skeleton is a map of a map.

The replay manifest is not a session.

The replay event timeline is not the event.

The replay boundary is not authority.

The P5-3 evaluator is not proof.

The P5-3 evaluator is not real AI impact validation.

The P5-3 evaluator is not real human-state measurement validation.

The P5-3 evaluator is not benchmark validation.

The P5-3 evaluator is not mediation validation.

The P5-3 evaluator is not production readiness.

The validator is not authority.

The evaluator is not proof.

The workflow is not certification.

The dashboard is not a judgment engine.

The repository is a map.

It is not the mountain.