Ethics guardrails: rejection hard-constraint + unified/extended anti-exhaustion (audit P4)#5

Open
BeamusWayne wants to merge 1 commit into main from fix/ethics-guardrails

@BeamusWayne

Owner

Follow-up to #4 (P1–P3). This is the ethics/behavioral phase of the audit, kept in its own PR for focused review. Prompt/SKILL only — no Python.

Part A — clear rejection is a non-overridable hard constraint

  • SKILL.md run-rule 4: once the other person clearly rejects (不喜欢你 "I don't like you" / 只是朋友 "just friends" / 把你当朋友 "I see you as a friend" / 喜欢别人 "I like someone else"), message/confess/crisis/progress must NOT output any pursuit advice. This overrides every signal, score, stage, and rejection subtype.
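  • crisis_handler.md C-1:
    • "我把你当朋友 / 我们只是朋友" ("I see you as a friend / we're just friends") reclassified as clear rejection (was "温和型 → 可能有余地", i.e. "gentle type → there may still be room"), aligning with SKILL.md's "一句明确的'我们只是朋友'覆盖所有绿灯" ("a single clear 'we're just friends' overrides all green lights").
    • Removed the "不能表现出来" ("must not let it show", i.e. hide-your-intent) line.
    • Default post-rejection action is graduated disengagement.
    • "继续追" ("keep pursuing") downgraded from a default branch to a tightly gated exception: it requires the other person to re-initiate explicit mutual signals and carries mandatory frequency/intensity decay. Removed the "用时间冲淡被拒标签" ("use time to dilute the 'rejected' label") framing.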
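
The PR implements this rule purely in prompt text, so the following is only a Python sketch of the intended precedence, not code from the repository. The function names, the `signal_score` parameter, and the substring matching are assumptions made for illustration; the phrase list and the four gated commands come from the description above.

```python
# Illustrative only: the real rule lives in SKILL.md prompt text, not in Python.
CLEAR_REJECTION_PHRASES = ("不喜欢你", "只是朋友", "把你当朋友", "喜欢别人")
GATED_COMMANDS = {"message", "confess", "crisis", "progress"}


def is_clear_rejection(their_message: str) -> bool:
    """True if the other person's message contains any clear-rejection phrase.

    Naive substring matching (an assumption); it does not handle negation.
    """
    return any(phrase in their_message for phrase in CLEAR_REJECTION_PHRASES)


def may_output_pursuit_advice(command: str, their_message: str,
                              signal_score: float = 0.0) -> bool:
    """Hypothetical gate: rejection is checked first and nothing overrides it."""
    if command in GATED_COMMANDS and is_clear_rejection(their_message):
        # signal_score (and any stage or subtype) is deliberately ignored here.
        return False
    return True
```

For example, `may_output_pursuit_advice("message", "我们只是朋友", signal_score=0.95)` returns `False`: a high score does not reopen the pursuit branch once a clear rejection is present.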

Part B — anti-exhaustion precheck unified and extended

  • New prompts/burnout_precheck.md = single source of truth: unified triggers (scope = all /simp, OR logic), 劝歇 ("advise a rest") output, escalation policy, no-profile session-level fallback.
  • Extended the precheck to /simp message, /simp analyze, /simp crisis — the real anxiety-peak entry points — not just daily/progress.
  • daily_coach Step 0 and progress_tracker risk row now reference the single source instead of carrying divergent thresholds (previously one combined its triggers with OR and the other with AND, and they counted over different scopes).
  • Escalation strengthened: on a double-hit (high frequency AND anxiety words) the tool holds firm and refuses pursuit advice instead of yielding on the second ask.
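
As a rough illustration of how the unified triggers and the double-hit escalation relate, here is a minimal Python sketch. The PR does not state the actual thresholds, counting window, or anxiety word list, so `FREQUENCY_THRESHOLD`, `ANXIETY_WORDS`, `simp_calls_today`, and `PrecheckResult` are all hypothetical; only the OR trigger and the AND escalation are taken from the bullets above.

```python
from dataclasses import dataclass

# Hypothetical values: the PR does not give the real threshold or word list.
FREQUENCY_THRESHOLD = 5
ANXIETY_WORDS = ("焦虑", "睡不着", "崩溃")


@dataclass
class PrecheckResult:
    triggered: bool  # at least one trigger fired (OR logic)
    hold_firm: bool  # both triggers fired (double-hit): refuse pursuit advice


def burnout_precheck(simp_calls_today: int, user_text: str) -> PrecheckResult:
    """Evaluate both triggers once so every /simp entry point shares one rule."""
    high_frequency = simp_calls_today >= FREQUENCY_THRESHOLD
    anxiety = any(word in user_text for word in ANXIETY_WORDS)
    return PrecheckResult(
        triggered=high_frequency or anxiety,
        hold_firm=high_frequency and anxiety,
    )
```

Under these assumed values, a single trigger yields the rest advice (`triggered=True, hold_firm=False`), while both together set `hold_firm=True`, which corresponds to not yielding on the second ask.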

Why separate from #4

#4 was mechanical/low-risk (test reliability, data integrity, naming consistency). This PR changes how the product behaves around rejection and user anxiety — it encodes ethical stances and benefits from focused review.

Test plan

  • python3 -m pytest -q → 225 passed (prompt-only change, no regression)
  • grep clean: no "不能表现出来" / "让时间冲淡…标签" left
  • burnout_precheck wired into SKILL.md + daily/progress/message/analyze/crisis
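
The grep check in the test plan could be made repeatable with something like the sketch below. It is not part of this PR; `find_banned_phrases` and its dict-of-file-contents input are hypothetical. The second phrase is elided with "…" in the test plan, so it is matched with a wildcard rather than filled in.

```python
import re

# Patterns taken from the test plan. The elided middle of the second phrase
# is matched with ".*" instead of being guessed.
BANNED_PATTERNS = (re.compile("不能表现出来"), re.compile("让时间冲淡.*标签"))


def find_banned_phrases(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (filename, line_number, line) for every line matching a banned pattern."""
    hits = []
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            if any(pattern.search(line) for pattern in BANNED_PATTERNS):
                hits.append((name, lineno, line.strip()))
    return hits
```

An empty result corresponds to the "grep clean" state; in a test suite this could be asserted over the contents of the prompt files.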

Still pending (not in this PR)

  • P5 — data-consent product positioning (whether/how to analyze a non-consenting person's private data); needs a product decision.

Part A — clear rejection as a non-overridable hard constraint:
- SKILL.md run-rule 4 upgraded: once the other person clearly rejects
  (不喜欢你 "I don't like you" / 只是朋友 "just friends" / 把你当朋友
  "I see you as a friend" / 喜欢别人 "I like someone else"), message/confess/
  crisis/progress must NOT output any pursuit advice. This overrides every
  signal, score, stage, and rejection subtype.
- crisis_handler C-1: reclassified "我把你当朋友 / 我们只是朋友" ("I see you as
  a friend / we're just friends") as clear rejection (was "温和型 -> 可能有余地",
  "gentle type -> there may still be room"); removed the "不能表现出来"
  ("must not let it show", hide-your-intent) line; default post-rejection
  action is graduated disengagement. "继续追" ("keep pursuing") downgraded
  from a default branch to a tightly gated exception requiring the OTHER
  person to re-initiate explicit mutual signals, with mandatory
  frequency/intensity decay; removed the "让时间冲淡被拒标签" ("let time
  dilute the 'rejected' label") framing.

Part B — anti-exhaustion precheck unified and extended:
- New prompts/burnout_precheck.md is the single source: unified triggers
  (scope = all /simp, OR logic), 劝歇 ("advise a rest") output, escalation
  policy, and a no-profile session-level fallback.
- Extended the precheck to /simp message, /simp analyze, /simp crisis (the
  real anxiety-peak entry points), not just daily/progress.
- daily_coach Step 0 and progress_tracker risk row now reference the single
  source instead of carrying divergent thresholds (was OR vs AND, different
  counting scopes).
- Escalation strengthened: on a double-hit (high frequency AND anxiety
  words) the tool holds firm and refuses pursuit advice instead of yielding
  on the second ask.