8 changes: 4 additions & 4 deletions docs/05-IPC-MAP.md
@@ -104,8 +104,8 @@ résumé, persisted, and indexed as `story` chunks so they ground live answers.
### session
| Channel | Request | Response |
|---|---|---|
-| `session:start` | `{ profileId, interviewType, answerStyle, jobId, answerLength }` | `Session` (`answerStyle` = format/tone; `answerLength` = key_points\|detailed) |
-| `session:resume` | `{ sessionId, answerStyle?, answerLength? }` | `Session` (re-activate an existing session row and continue it; interview type is restored from the row — one session per interview, type is dynamic) |
+| `session:start` | `{ profileId, interviewType, jobId, answerFormat }` | `Session` (`answerFormat` = key_points\|explanation\|detailed — the single answer control) |
+| `session:resume` | `{ sessionId, answerFormat? }` | `Session` (re-activate an existing session row and continue it; interview type is restored from the row — one session per interview, type is dynamic) |
| `session:stop` | `{ sessionId }` | `Session` |
| `session:toggle-pause` | `{ sessionId }` | `{ paused }` |
| `session:toggle-pause-active` | — | `{ paused, active }` (global shortcut target — toggles the live session) |
@@ -120,7 +120,7 @@ résumé, persisted, and indexed as `story` chunks so they ground live answers.
| `session:ask` | `{ sessionId, questionText }` | `{ questionId }` (manual ask; answer streams) |
| `session:ask-active` | `{ questionText }` | `{ ok }` (Cue Card "Ask" box — manual ask for the active session, no id) |
| `session:set-interview-type` | `{ sessionId, interviewType }` | `{ ok }` (set the session-level type — chosen by the user in the save prompt at stop) |
-| `session:set-answer-prefs` | `{ interviewType?, style?, length?, pronunciation? }` | `{ interviewType, style, length, pronunciation }` (live Cue Card controls; acts on the active session. Switching `interviewType` is dynamic — it persists onto the session row + reframes later answers) |
+| `session:set-answer-prefs` | `{ interviewType?, format?, pronunciation? }` | `{ interviewType, format, pronunciation }` (live Cue Card controls; acts on the active session. Switching `interviewType` is dynamic — it persists onto the session row + reframes later answers) |
| `session:set-answering` | `{ enabled }` | `{ enabled, answered }` (coding "listen-only" toggle: when disabled, the interviewer is still transcribed but not auto-answered; enabling it also answers the question they just asked) |
| `session:regenerate` | — | `{ regenerated }` (re-answer the last question for the active session) |
| `session:clear-answer` | — | `{ cleared }` (abort the in-flight answer for the active session) |
@@ -195,7 +195,7 @@ Channel constants live in `EVENTS` (`src/shared/ipc.ts`); payload types are in
| `session:answer-done` | `{ questionId }` | overlay |
| `session:answer-reset` | `{ questionId }` | overlay (regenerate: clear the Cue Card answer but keep the transcript — no new question row/line) |
| `session:client-info` | `ClientInfo \| null` | overlay (active interview: company/title/notes + profileName + grounding flags hasResume/hasJd/hasCompany, for the Cue Card header + session bar + ⓘ panel; `null` clears on stop) |
-| `session:answer-prefs` | `AnswerPrefs` (`{ interviewType, style, length, pronunciation }`) | overlay (seeds the Cue Card answer-control toggles) |
+| `session:answer-prefs` | `AnswerPrefs` (`{ interviewType, format, pronunciation }`) | overlay (seeds the Cue Card answer-control toggles) |
| `session:audio-level` | `{ level }` (0-1 RMS, ~12/sec) | overlay (drives the Cue Card mic meter; computed in `feedRealtimeAudio` since the stream lives in the dashboard renderer) |
| `session:save-prompt` | `SavePrompt` (`{ sessionId, interviewType, jobTitle, questionCount }`) | dashboard (a session just stopped → prompt save-or-discard + pick the type) |
| `session:context` | `{ questionId, question, chunks }` | dashboard (debug: retrieved chunks) |
41 changes: 29 additions & 12 deletions docs/06-OPENAI-SERVICE.md
@@ -94,22 +94,33 @@ Small/fast model. Returns `{ text, type, confidence, strategy }`. Also used as a
cheap "is this actually a question?" gate before answer generation.

### answer.ts — `streamAnswer(input) => AsyncIterable<AnswerEvent>`
-Input: `{ question, contextChunks, profile, style, length, pronunciation, interviewType, signal? }`.
+Input: `{ question, contextChunks, profile, format, pronunciation, interviewType, signal? }`.
Builds a **grounding** prompt:
- System: persona + rules ("ground answers in provided context; never invent
experience; if no relevant experience, give a transferable-skills answer and
-set a risk warning"); LENGTH is a hard constraint.
+set a risk warning"); FORMAT is a hard constraint; plus a **naturalness / anti-AI-tone**
+directive (contractions, varied sentence length, no corporate/AI tells or hedging — must
+read 100% human, never AI-generated).
- **Grounded / proof-linked answers:** `buildContext` numbers the chunks `[1] (resume) …`;
the prompt makes the model cite those numbers inline after each grounded claim
(e.g. `…cut p99 latency 40% [1]`). The Cue Card renders the cited `[i]` as source chips
(the `Citations` component, backed by the `contextSent` chunks). **Fabrication guard:**
for anything the context can't support the model must not invent it — it leads with
`⚠`, says it's not in the candidate's background, and pivots to a cited transferable
framing.
-- User: question + retrieved context + profile summary + the chosen format/length,
-plus optional pronunciation hints for rare/technical terms.
-- `length` (`key_points` | `detailed`) also sets a hard `max_output_tokens` ceiling
-(220 / 800) so "key points" can never drift long regardless of the prompt.
+- User: question + retrieved context + profile summary + the chosen answer format.
+- **Pronunciation guide** (v1.2, ON by default, live-toggleable): the answer stays clean
+(no inline respellings); instead, if any words are genuinely hard, the model appends a
+`[[PRONUNCIATION]]` section with one pipe-delimited line per word
+(`word | part of speech | singular | respelling`). The Cue Card splits this out
+(`overlay/pronunciation.ts` `splitPronunciation`, tolerant of model-output variance) and
+renders a structured "🗣 How to say it" panel below the answer. Adds +160 `max_output_tokens`
+headroom so the guide never eats the answer.
+- `format` — the single answer control (v1.2): `key_points` (terse bullets) | `explanation`
+(a natural, flowing first-person explanation) | `detailed` (thorough, with one example).
+It also sets a hard `max_output_tokens` ceiling (220 / 340 / 800) so "key points" can never
+drift long regardless of the prompt. (The old format/tone × length split — `star`/`technical`/
+`conversational` — was removed.)
Streams tokens (`{type:'delta', token}`), then a `usage` event, then a structured
`meta` event `{ talkingPoints[], resumeMatch, star?, clarifyingQuestion?, riskWarning?,
followupQuestion }`. **Status:** the prose answer + token usage are live; the meta pass
@@ -125,13 +136,19 @@ chunked path and mock-answer audio).
Realtime API session for delta-level STT latency; PCM is streamed one-way via
`session:realtime-audio`. Event parsing lives in `realtimeEvents.ts`.

-### coding.ts — `solveFromOcr(text)`, vision.ts — `solveFromImages(dataUrls[])`
+### coding.ts — `solveFromOcr(text, language)`, vision.ts — `solveFromImages(dataUrls[], language)`
Given a coding problem as text (clipboard/selection) or as one-or-more screenshots,
-streams: approach, edge cases, time/space complexity, solution outline (and code).
-Both paths use the same `coding` model + `reasoningParam('coding')`. A long problem
-spans several viewports, so `solveFromImages` sends all captured screenshots in ONE
-request (instruction-first, scroll order, `detail:'high'`) and the model reconstructs
-them — the buffer + thumbnail strip live in `capture/codingMode.ts` (see `capture:add-region`/`solve-buffer`).
+streams (explanation-first): a natural **Approach** paragraph, complexity, edge cases,
+then the **optimal** solution as commented, runnable code. The shared prompt is
+`codingRules(language)` (`codingPrompt.ts`): mandates the optimal solution + stated
+time/space complexity, writes the code in the chosen `language` (default `javascript`,
+a live Cue Card picker persisted as the `codingLanguage` setting), and requires clear
+inline comments. Deliberately **résumé/JD-free** — a coding problem is unrelated to the
+candidate's profile. Both paths use the same `coding` model + `reasoningParam('coding')`.
+A long problem spans several viewports, so `solveFromImages` sends all captured screenshots
+in ONE request (instruction-first, scroll order, `detail:'high'`) and the model reconstructs
+them — the buffer + thumbnail strip + the `codingLanguage` lookup live in
+`capture/codingMode.ts` (see `capture:add-region`/`solve-buffer`).

### interviewer.ts — `generateQuestion(...)` & tts.ts — `speak(text, voice)`
Power the mock-interview mode: `generateQuestion` produces the next question and
92 changes: 92 additions & 0 deletions docs/sessions/2026-07-01.md
@@ -0,0 +1,92 @@
# 2026-07-01

Started **v1.2** (branch `feat/prompt-overhaul`) — a prompt/answer overhaul planned with the
user in three increments: (1) Answer Format + naturalness, (2) Coding solver, (3) Pronunciation.
(v1.1.0 shipped first: the five trainer features + competency coverage, released via a
`v1.1.0` tag → GitHub Release.)

## Increment 1 — Answer Format + naturalness

Collapsed the two overlapping answer-control axes into ONE. The old **Answer Format** (tone:
`default`/`star`/`technical`/`conversational`) was muddy and underused, so it's gone; the length
control is promoted into the new single **Answer Format**:

- `AnswerFormat = 'key_points' | 'explanation' | 'detailed'` (replaces `AnswerStyle` + `AnswerLength`).
- **key_points** — terse, glanceable bullets (~60w, cap 220 tok).
- **explanation** *(new)* — a natural, flowing first-person explanation (~90–130w, cap 340 tok).
- **detailed** — thorough, with one concrete example (~150–220w, cap 800 tok).
- `answer.ts`: merged `STYLE_INSTRUCTION` + `LENGTH_INSTRUCTION` → `FORMAT_INSTRUCTION`, and
`LENGTH_MAX_TOKENS` → `FORMAT_MAX_TOKENS`. Added a **naturalness / anti-AI directive** to the
system prompt: contractions, varied sentence length, banned corporate/AI tells + hedging,
first-person, confident — but explicitly *not* fake disfluency ("um"/"uh"). Citation +
fabrication-guard rules kept intact.
- Threaded the single `format` field through everything: `AnswerPrefs` (dropped `style`+`length`),
`sessionManager` (`LiveState.answerFormat`; goLive/start/resume/setAnswerPrefs/generateAnswer +
the `answerPrefs` broadcast), `session.ipc` zod, preload, `mockManager`, `useLiveSession.startNew`,
InterviewPage/ProfilesPage/sampleData. Cue Card: the two controls collapse into **one 3-way
Format toggle** (Key points / Explanation / Detailed).
- Removed `Profile.answerStyle`. The `profiles.answer_style` DB column is **kept** (NOT NULL default
`'concise'`) but no longer read/written — **no migration**.
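
The single-format control described in this increment could be sketched roughly as below. This is a minimal illustration and not the real `answer.ts`: `AnswerFormat`, `FORMAT_MAX_TOKENS`, and the caps (220 / 340 / 800) come from the notes above, while `isAnswerFormat`, `resolveFormat`, `maxTokensFor`, and the fallback to `key_points` are assumptions made for the example.

```typescript
// Sketch only. Names marked "hypothetical" are not taken from the codebase.
type AnswerFormat = 'key_points' | 'explanation' | 'detailed';

// Hard max_output_tokens ceiling per format, as listed in the notes above.
const FORMAT_MAX_TOKENS: Record<AnswerFormat, number> = {
  key_points: 220,
  explanation: 340,
  detailed: 800,
};

// Hypothetical type guard for values arriving over IPC or from storage.
function isAnswerFormat(v: unknown): v is AnswerFormat {
  return v === 'key_points' || v === 'explanation' || v === 'detailed';
}

// Hypothetical helper: legacy or unknown values (e.g. 'star') fall back to
// 'key_points'. The fallback choice is an assumption, not documented behavior.
function resolveFormat(v: unknown): AnswerFormat {
  return isAnswerFormat(v) ? v : 'key_points';
}

// Hypothetical helper: look up the token ceiling for whatever value came in.
function maxTokensFor(v: unknown): number {
  return FORMAT_MAX_TOKENS[resolveFormat(v)];
}
```

Keeping the cap in a `Record<AnswerFormat, number>` means the compiler flags a missing entry if a fourth format is ever added to the union.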

**Adversarial review** (3 dimensions → refute-verify) found no app-code bugs but caught a real
**e2e regression**: `error-handling.spec.ts` still called `api.session.start` with the old 5-arg
signature, so after the positional reorder `answerFormat` became `null` and zod rejected it
(`.default` only fills `undefined`, not `null`). Fixed that call + two stale `answerStyle` fields in
e2e specs. (Typecheck couldn't catch it — it lives inside a `page.evaluate` string typed as `any`.)
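
The positional shift behind that regression can be reproduced in isolation. The sketch below is an analogy under stated assumptions: `start` is a hypothetical stand-in for `api.session.start`, and a plain default parameter is used in place of the zod schema because it has the same "fills `undefined`, not `null`" behavior that the note attributes to `.default`.

```typescript
type AnswerFormat = 'key_points' | 'explanation' | 'detailed';

interface StartArgs {
  profileId: string;
  interviewType: string;
  jobId: string | null;
  answerFormat: AnswerFormat | null;
}

// Hypothetical stand-in for the new 4-argument call. The default applies only
// when the argument is `undefined`; an explicit `null` passes straight through.
function start(
  profileId: string,
  interviewType: string,
  jobId: string | null,
  answerFormat: AnswerFormat | null = 'key_points',
): StartArgs {
  return { profileId, interviewType, jobId, answerFormat };
}

// Old 5-argument call shape, invoked through `any` the way an untyped
// `page.evaluate` body would be. 'default' lands in jobId, `null` lands in
// answerFormat, and the trailing 'key_points' is silently ignored.
const legacy = (start as any)('p1', 'general', 'default', null, 'key_points') as StartArgs;

// Corrected call, plus a call that omits the format and so gets the default.
const fixed = start('p1', 'general', null, 'key_points');
const omitted = start('p1', 'general', null);
```

Here `legacy.answerFormat` is `null`, which is the value a strict enum validator would reject, while `omitted.answerFormat` picks up the default.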

**Naturalness validation:** the anti-AI tone is prompt-only and subjective, so it is best judged
live in `npm run dev` with a real key. The prompt was engineered by hand here, with no automated
spike, because the answer path can't run headlessly under vitest.

Verified: `typecheck` · 104 unit (+2) · `build` green.

## Increment 2 — Coding solver

Upgraded the coding SOLVER (screenshot/clipboard mode; live-coding deferred to a later version).
- Shared prompt: `CODING_RULES` const → `codingRules(language)`. Now writes the solution in a
chosen **language** (default `javascript`), REQUIRES clear inline **comments**, delivers
**explanation-first** (a natural Approach paragraph, then the code), and keeps the OPTIMALITY +
explicit time/space complexity mandate. Confirmed **résumé/JD-free**.
- `solveFromOcr(text, language)` / `solveFromImages(dataUrls, language)` build their system prompt
per-call; `codingMode.ts` reads a persisted `codingLanguage` setting and threads it into all
three solve entry points.
- New `AppSettings.codingLanguage` (`SETTINGS_KEYS.codingLanguage`, in `APP_SETTING_KEYS` so a
factory reset clears it), default `'javascript'`, zod `z.string().min(1).max(40)`. A **Language**
dropdown (12 languages) in the Cue Card coding-solver controls, seeded from settings + persisted
via `api.settings.set({ codingLanguage })` — mirrors the existing model/effort pickers.
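
One way the language setting might flow into the prompt is sketched below. Everything here is illustrative: `resolveCodingLanguage` and `codingRulesSketch` are hypothetical names, the length check is a hand-rolled equivalent of the `min(1).max(40)` rule quoted above, and the prompt wording is invented for the example rather than copied from `codingPrompt.ts`.

```typescript
const DEFAULT_CODING_LANGUAGE = 'javascript';

// Hand-rolled equivalent of the length rule quoted above (1 to 40 characters).
function isValidCodingLanguage(v: unknown): v is string {
  return typeof v === 'string' && v.length >= 1 && v.length <= 40;
}

// Hypothetical lookup: use the persisted value when it is valid, otherwise
// fall back to the documented default.
function resolveCodingLanguage(stored: unknown): string {
  return isValidCodingLanguage(stored) ? stored : DEFAULT_CODING_LANGUAGE;
}

// Hypothetical prompt builder. The wording is illustrative only; it mirrors
// the ordering described in the notes (approach first, complexity, then code).
function codingRulesSketch(language: string): string {
  return [
    'Explain the approach first in a short, natural paragraph.',
    'State time and space complexity explicitly.',
    `Write the optimal solution in ${language} with clear inline comments.`,
  ].join('\n');
}

// Example: a missing setting resolves to the default before the prompt is built.
const prompt = codingRulesSketch(resolveCodingLanguage(undefined));
```

Building the prompt per call, rather than from a module-level constant, is what lets a live picker change take effect on the next solve.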

**Adversarial review** (2 dimensions → refute-verify): **no findings** — the increment is clean
(callers are compiler-enforced to pass `language`; the settings round-trip + prompt hold together).

Verified: `typecheck` · 104 unit · `build` green.

## Increment 3 — Pronunciation: default-on + structured guide panel

Pronunciation is now **ON by default** (still live-toggleable). Instead of inline respellings
(which made answers read oddly), the answer stays clean and the model appends a structured guide:
a `[[PRONUNCIATION]]` marker then one pipe-delimited line per hard word
(`word | part of speech | singular | respelling`). The Cue Card splits it out and renders a
teal **"🗣 How to say it"** panel below the answer. `answer.ts` gives +160 `max_output_tokens`
headroom when pronunciation is on so the guide never eats the answer.

- `goLive` defaults `pronunciation: true` (live-state + the answerPrefs broadcast); Overlay's
toggle initial state matches.
- Parsing extracted to `overlay/pronunciation.ts` (`splitPronunciation`) so it's unit-testable.

**Adversarial review** (2 dimensions → refute-verify) found **5 real parser-robustness gaps** —
all model-output-variance issues in `splitPronunciation`:
1. *(med)* a 3-field line (model dropped the optional singular) dropped the whole word.
2. *(low)* case-sensitive `[[PRON` match → a Title-cased marker leaked the raw guide into the answer.
3/5. *(low)* a bare partial marker (`[[`, `[[P`, `[[PR`, `[[PRO`) flickered into the body mid-stream.
4. *(low)* the "no singular" placeholder was only suppressed for the exact em-dash.

**Fix:** rewrote `splitPronunciation` to be tolerant — respelling is the LAST field (2/3/4-field
lines all work), marker match is case/space-insensitive AND strips trailing partial prefixes, and
`—/–/-/n/a/none` placeholders normalize to empty. Locked with `pronunciation.test.ts` (+8) covering
each finding.
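
A tolerant splitter along the lines of that fix might look like the sketch below. It is not the real `splitPronunciation`: the function and type names are hypothetical, and the choices to require both a word and a respelling and to tolerate spaces only in a complete marker are assumptions.

```typescript
interface PronunciationEntry {
  word: string;
  partOfSpeech: string;
  singular: string;
  respelling: string;
}

const MARKER = '[[PRONUNCIATION]]';
// Case-insensitive; tolerates spaces inside the brackets.
const MARKER_RE = /\[\[\s*pronunciation\s*\]\]/i;
// Placeholders that mean "no value": em dash, en dash, hyphen, n/a, none.
const PLACEHOLDERS = ['', '\u2014', '\u2013', '-', 'n/a', 'none'];

function clean(field: string): string {
  const t = field.trim();
  return PLACEHOLDERS.indexOf(t.toLowerCase()) !== -1 ? '' : t;
}

// While streaming, drop a trailing partial marker such as "[[" or "[[PRO".
function stripPartialMarker(text: string): string {
  const upper = text.toUpperCase();
  for (let len = MARKER.length - 1; len >= 2; len--) {
    if (upper.slice(-len) === MARKER.slice(0, len)) {
      return text.slice(0, text.length - len);
    }
  }
  return text;
}

// The respelling is always the LAST field, so 2-, 3-, and 4-field lines all parse.
function parseLine(line: string): PronunciationEntry | null {
  const fields = line.split('|');
  if (fields.length < 2) return null;
  const word = clean(fields[0] || '');
  const respelling = clean(fields[fields.length - 1] || '');
  if (!word || !respelling) return null;
  const middle = fields.slice(1, -1);
  return {
    word,
    partOfSpeech: clean(middle[0] || ''),
    singular: clean(middle[1] || ''),
    respelling,
  };
}

function splitPronunciationSketch(raw: string): { answer: string; entries: PronunciationEntry[] } {
  const m = MARKER_RE.exec(raw);
  if (!m) {
    return { answer: stripPartialMarker(raw).replace(/\s+$/, ''), entries: [] };
  }
  const answer = raw.slice(0, m.index).replace(/\s+$/, '');
  const entries: PronunciationEntry[] = [];
  for (const line of raw.slice(m.index + m[0].length).split('\n')) {
    const entry = parseLine(line);
    if (entry) entries.push(entry);
  }
  return { answer, entries };
}
```

Reading the respelling from the end of the line is what makes a dropped optional field harmless: a missing singular or part of speech leaves those fields empty instead of discarding the word.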

Verified: `typecheck` · 113 unit (+9 across v1.2 #3) · `build` green.

## v1.2 status
All three increments done on `feat/prompt-overhaul` (Answer Format + naturalness · Coding solver ·
Pronunciation). Ready for one PR. Version/changelog bump deferred until the user asks (would be v1.2.0).
1 change: 0 additions & 1 deletion e2e/data-integrity.spec.ts
@@ -11,7 +11,6 @@ const newProfile = {
targetRole: 'SWE',
targetCompany: null,
interviewType: 'general',
-answerStyle: 'default',
language: 'en',
resumeText: null,
jdText: null,
3 changes: 1 addition & 2 deletions e2e/error-handling.spec.ts
@@ -19,12 +19,11 @@ test('a failed answer surfaces an error and clears the streaming state (B1/B2)',
targetRole: 'SWE',
targetCompany: null,
interviewType: 'general',
-answerStyle: 'default',
language: 'en',
resumeText: null,
jdText: null,
});
-const session = await api.session.start(profile.id, 'general', 'default', null, 'key_points');
+const session = await api.session.start(profile.id, 'general', null, 'key_points');

// Listen BEFORE asking. answerDone firing on a failed ask is the core B1 fix
// (the Cue Card card stops spinning); sessionError proves the failure isn't silent.
11 changes: 2 additions & 9 deletions src/main/db/repositories/profiles.repo.ts
@@ -4,20 +4,15 @@ import type { Profile, ProfileInput } from '@shared/types';

type Row = typeof schema.profiles.$inferSelect;

-/** Map legacy answer-style values (length is now a separate axis) to a valid
- * format. Older profiles stored 'concise'/'detailed'. */
-function toAnswerStyle(v: string): Profile['answerStyle'] {
-  return v === 'star' || v === 'technical' || v === 'conversational' ? v : 'default';
-}
-
+// The `answer_style` DB column is retained (NOT NULL default 'concise') but no longer
+// read or written — the answer format is a single live Cue Card control now (v1.2).
function toProfile(r: Row): Profile {
return {
id: r.id,
name: r.name,
targetRole: r.targetRole,
targetCompany: r.targetCompany,
interviewType: r.interviewType as Profile['interviewType'],
-answerStyle: toAnswerStyle(r.answerStyle),
language: r.language,
resumeText: r.resumeText,
jdText: r.jdText,
@@ -53,7 +48,6 @@ export const profilesRepo = {
targetRole: input.targetRole,
targetCompany: input.targetCompany,
interviewType: input.interviewType,
-answerStyle: input.answerStyle,
language: input.language,
resumeText: input.resumeText,
jdText: input.jdText,
@@ -69,7 +63,6 @@ export const profilesRepo = {
targetRole: patch.targetRole,
targetCompany: patch.targetCompany,
interviewType: patch.interviewType,
-answerStyle: patch.answerStyle,
language: patch.language,
resumeText: patch.resumeText,
jdText: patch.jdText,
2 changes: 2 additions & 0 deletions src/main/db/repositories/settings.repo.ts
@@ -56,6 +56,7 @@ export const SETTINGS_KEYS = {
overlayBounds: 'overlay_bounds',
audioPrefs: 'audio_prefs',
hideTaskbarIcon: 'hide_taskbar_icon',
+codingLanguage: 'coding_language',
} as const;

/** Non-secret settings cleared by a factory reset (everything except the API key). */
@@ -71,4 +72,5 @@ const APP_SETTING_KEYS: string[] = [
SETTINGS_KEYS.overlayBounds,
SETTINGS_KEYS.audioPrefs,
SETTINGS_KEYS.hideTaskbarIcon,
+SETTINGS_KEYS.codingLanguage,
];
10 changes: 1 addition & 9 deletions src/main/ipc/profiles.ipc.ts
@@ -12,19 +12,12 @@ const interviewType = z.enum([
'sales',
'general',
]);
-// Accept legacy length values ('concise'/'detailed') on the wire and fold them
-// into the format axis, so old clients/profiles don't fail validation.
-const answerStyle = z
-  .enum(['default', 'star', 'technical', 'conversational', 'concise', 'detailed'])
-  .transform((v) => (v === 'concise' || v === 'detailed' ? 'default' : v));
-
const profileInput = z.object({
name: z.string().min(1),
targetRole: z.string().default(''),
targetCompany: z.string().nullable().default(null),
-// Interview type & answer style are chosen per run now; kept optional/legacy.
+// Interview type is chosen per run now; kept optional/legacy on the profile.
interviewType: interviewType.default('general'),
-answerStyle: answerStyle.default('default'),
language: z.string().default('en'),
resumeText: z.string().nullable().default(null),
jdText: z.string().nullable().default(null),
@@ -60,7 +53,6 @@ export function registerProfilesIpc(): void {
targetRole: src.targetRole,
targetCompany: src.targetCompany,
interviewType: src.interviewType,
-answerStyle: src.answerStyle,
language: src.language,
resumeText: src.resumeText,
jdText: src.jdText,