Add slice-v1 retrieval and benchmark tooling #144
Merged
2 commits
32 changes: 32 additions & 0 deletions
docs/benchmarks/2026-05-11-spi-vs-legacy/REAL_WORKSPACE_REPORT_TEMPLATE.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,32 @@ | ||
| # Real workspace benchmark report template | ||
|
|
||
| This benchmark can be run on private repos locally. | ||
| No private paths or artifacts are committed. | ||
| If GoValidate is unavailable, no GoValidate-specific numbers are claimed. | ||
|
|
||
| ## Workspace matrix | ||
|
|
||
| | Workspace | Variant | Build time (ms) | Graph size (bytes) | Nodes | Edges | | ||
| |---|---|---:|---:|---:|---:| | ||
|
|
||
| ## Strategy / resolution comparisons | ||
|
|
||
| | Workspace | Prompt | Strategy | Resolution | Tokens | Nodes | Quality | Notes | | ||
| |---|---|---|---|---:|---:|---:|---| | ||
|
|
||
| ## Retrieval-level comparisons | ||
|
|
||
| | Workspace | Prompt | Retrieval level | Tokens | Nodes | Gate reason | | ||
| |---|---|---:|---:|---:|---| | ||
|
|
||
| ## Value-per-token calibration | ||
|
|
||
| - Where value-per-token helps: | ||
| - Where it does not change output: | ||
| - Where it hurts or increases tokens: | ||
| - Suggested scoring adjustments: | ||
|
|
||
| ## Qualitative notes | ||
|
|
||
| - Objective metrics are listed separately from qualitative notes. | ||
| - Private workspace paths must be redacted before sharing any report excerpt. |
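The template requires private workspace paths to be redacted before any excerpt is shared, but the PR does not include a helper for that. A minimal sketch of one way to do it follows. `redactWorkspacePaths` and the `<workspace>` placeholder are hypothetical names chosen for illustration, not part of this PR.

```javascript
// Hypothetical helper, not part of this PR.
// Replaces every occurrence of the given private workspace roots with a placeholder.
function redactWorkspacePaths(text, workspaceRoots, placeholder = '<workspace>') {
  // Longest roots first, so a nested root is replaced as a whole
  // before its parent directory is considered.
  const roots = [...workspaceRoots]
    .filter((root) => typeof root === 'string' && root.length > 0)
    .sort((a, b) => b.length - a.length)

  let redacted = text
  for (const root of roots) {
    // split/join performs a literal (non-regex) global replacement.
    redacted = redacted.split(root).join(placeholder)
  }
  return redacted
}

// Usage with a made-up path:
const excerpt = 'graph built from /home/alice/private/backend/src/auth.ts'
console.log(redactWorkspacePaths(excerpt, ['/home/alice/private/backend']))
// prints "graph built from <workspace>/src/auth.ts"
```

Passing the same directories that are exported as `GRAPHIFY_BENCH_BACKEND` and `GRAPHIFY_BENCH_MONOREPO` would cover the two workspaces the runner script knows about. Other private strings (user names, hostnames) would still need manual review.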
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| #!/usr/bin/env node | ||
|
|
||
| import { readFileSync } from 'node:fs' | ||
|
|
||
| const graphPath = process.argv[2] | ||
| if (!graphPath) { | ||
| console.error('usage: graph-stats.mjs <graph.json>') | ||
| process.exit(2) | ||
| } | ||
|
|
||
| const graphJson = readFileSync(graphPath, 'utf8') | ||
| let graph | ||
| try { | ||
| graph = JSON.parse(graphJson) | ||
| } catch (error) { | ||
| const message = error instanceof Error ? error.message : String(error) | ||
| console.error(`failed to parse graph JSON at ${graphPath}: ${message}`) | ||
| process.exit(1) | ||
| } | ||
| const nodeCount = Array.isArray(graph?.nodes) ? graph.nodes.length : 0 | ||
| const edgeCount = Array.isArray(graph?.edges) ? graph.edges.length : 0 | ||
|
|
||
| console.log(JSON.stringify({ | ||
| node_count: nodeCount, | ||
| edge_count: edgeCount, | ||
| })) |
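The counting logic in the script above can also be expressed as a reusable function, which makes it easier to test and to extend with the "Graph size (bytes)" column from the report template. The sketch below is illustrative only: `summarizeGraph` and the `size_bytes` field are hypothetical names, and measuring size as the UTF-8 byte length of the JSON text is an assumption about what that column means.

```javascript
// Sketch only: summarizeGraph and size_bytes are hypothetical, not part of this PR.
function summarizeGraph(graphJson) {
  const graph = JSON.parse(graphJson) // throws SyntaxError on malformed input

  // JSON.parse can return null, a number, or a string; treat those as an empty graph.
  const isObject = graph !== null && typeof graph === 'object'
  const nodes = isObject && Array.isArray(graph.nodes) ? graph.nodes : []
  const edges = isObject && Array.isArray(graph.edges) ? graph.edges : []

  return {
    node_count: nodes.length,
    edge_count: edges.length,
    // UTF-8 byte length of the text, which can differ from graphJson.length.
    size_bytes: Buffer.byteLength(graphJson, 'utf8'),
  }
}

// Usage with a tiny made-up graph:
const sample = '{"nodes":[{"id":"a"},{"id":"b"}],"edges":[{"from":"a","to":"b"}]}'
console.log(JSON.stringify(summarizeGraph(sample)))
```

Keeping the `node_count` and `edge_count` keys identical to the script's output means a downstream summarizer would not need to change if it consumed this shape instead.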
55 changes: 55 additions & 0 deletions
docs/benchmarks/2026-05-11-spi-vs-legacy/prompts.real-workspace.example.json
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,55 @@ | ||
| { | ||
| "schema_version": 1, | ||
| "prompts": [ | ||
| { | ||
| "id": "auth-flow", | ||
| "intent": "explain", | ||
| "text": "Explain auth flow end to end." | ||
| }, | ||
| { | ||
| "id": "report-generation", | ||
| "intent": "explain", | ||
| "text": "Explain validation report generation end to end." | ||
| }, | ||
| { | ||
| "id": "report-generation-slow", | ||
| "intent": "debug", | ||
| "text": "Why is validation report generation slow?" | ||
| }, | ||
| { | ||
| "id": "research-agent-impact", | ||
| "intent": "impact", | ||
| "text": "What can break if the research agent changes?" | ||
| }, | ||
| { | ||
| "id": "report-generation-tests", | ||
| "intent": "explain", | ||
| "text": "Which tests are relevant for report generation?" | ||
| }, | ||
| { | ||
| "id": "controller-to-persistence", | ||
| "intent": "explain", | ||
| "text": "Find the call path from controller to final report persistence." | ||
| }, | ||
| { | ||
| "id": "config-runtime-effect", | ||
| "intent": "debug", | ||
| "text": "Where does this env/config variable affect runtime behavior?" | ||
| }, | ||
| { | ||
| "id": "auth-config-impact", | ||
| "intent": "impact", | ||
| "text": "What can break if session/cookie/auth config changes?" | ||
| }, | ||
| { | ||
| "id": "review-current-diff", | ||
| "intent": "review", | ||
| "text": "Review current backend diff for risky changes." | ||
| }, | ||
| { | ||
| "id": "onboarding-routes", | ||
| "intent": "explain", | ||
| "text": "Which routes/controllers/services are involved in onboarding?" | ||
| } | ||
| ] | ||
| } |
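Anyone pointing `GRAPHIFY_BENCH_REAL_PROMPTS` at their own file may want to check that it has the same shape as the example above before a long run. The following is a minimal sketch of such a check. `validatePrompts` is a hypothetical helper, and the set of allowed intents is inferred only from the four values that appear in the example file; the benchmark itself may accept others.

```javascript
// Sketch only: validatePrompts is hypothetical, not part of this PR.
// The intents below are inferred from the example file, not from a documented schema.
const KNOWN_INTENTS = new Set(['explain', 'debug', 'impact', 'review'])

function validatePrompts(doc) {
  if (doc === null || typeof doc !== 'object') return ['document must be a JSON object']

  const errors = []
  if (doc.schema_version !== 1) errors.push('schema_version must be 1')
  if (!Array.isArray(doc.prompts)) {
    errors.push('prompts must be an array')
    return errors
  }

  const seenIds = new Set()
  doc.prompts.forEach((prompt, index) => {
    const where = `prompts[${index}]`
    if (prompt === null || typeof prompt !== 'object') {
      errors.push(`${where} must be an object`)
      return
    }
    if (typeof prompt.id !== 'string' || prompt.id === '') {
      errors.push(`${where}.id must be a non-empty string`)
    } else if (seenIds.has(prompt.id)) {
      errors.push(`${where}.id duplicates "${prompt.id}"`)
    } else {
      seenIds.add(prompt.id)
    }
    if (!KNOWN_INTENTS.has(prompt.intent)) {
      errors.push(`${where}.intent is not a known intent`)
    }
    if (typeof prompt.text !== 'string' || prompt.text.trim() === '') {
      errors.push(`${where}.text must be a non-empty string`)
    }
  })
  return errors
}

// Usage: an empty array means the document passed every check.
const doc = JSON.parse(
  '{"schema_version":1,"prompts":[{"id":"auth-flow","intent":"explain","text":"Explain auth flow end to end."}]}'
)
console.log(validatePrompts(doc))
```

Returning a list of messages rather than throwing on the first problem lets a caller report every issue in one pass.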
45 changes: 45 additions & 0 deletions
docs/benchmarks/2026-05-11-spi-vs-legacy/run-real-workspace.sh
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,45 @@ | ||
| #!/usr/bin/env bash | ||
|
|
||
| set -euo pipefail | ||
|
|
||
| HERE="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| TS="$(date -u +%Y-%m-%dT%H%M%SZ)" | ||
| BUNDLE_DIR="${GRAPHIFY_BENCH_REAL_RESULTS_DIR:-$HERE/results/real-workspaces/$TS}" | ||
| PROMPTS_FILE="${GRAPHIFY_BENCH_REAL_PROMPTS:-$HERE/prompts.real-workspace.example.json}" | ||
|
|
||
| if [[ ! -f "$PROMPTS_FILE" ]]; then | ||
| echo "GRAPHIFY_BENCH_REAL_PROMPTS must point to an existing prompts JSON file: $PROMPTS_FILE" >&2 | ||
| exit 2 | ||
| fi | ||
|
|
||
| run_workspace() { | ||
| local workspace_name="$1" | ||
| local workspace_path="$2" | ||
| local workspace_var_name="$3" | ||
| if [[ -z "$workspace_path" ]]; then | ||
| return | ||
| fi | ||
| if [[ ! -d "$workspace_path" ]]; then | ||
| echo "$workspace_var_name must point to an existing workspace directory: $workspace_path" >&2 | ||
| exit 2 | ||
| fi | ||
|
|
||
| mkdir -p "$BUNDLE_DIR/$workspace_name" | ||
| echo "[real-workspace] $workspace_name -> $workspace_path" | ||
| GRAPHIFY_BENCH_FIXTURE="$workspace_path" \ | ||
| GRAPHIFY_BENCH_PROMPTS="$PROMPTS_FILE" \ | ||
| GRAPHIFY_BENCH_RESULTS_DIR="$BUNDLE_DIR/$workspace_name" \ | ||
| bash "$HERE/run.sh" | ||
| } | ||
|
|
||
| if [[ -z "${GRAPHIFY_BENCH_BACKEND:-}" && -z "${GRAPHIFY_BENCH_MONOREPO:-}" ]]; then | ||
| echo "Set GRAPHIFY_BENCH_BACKEND and/or GRAPHIFY_BENCH_MONOREPO before running." >&2 | ||
| exit 2 | ||
| fi | ||
|
|
||
| mkdir -p "$BUNDLE_DIR" | ||
| run_workspace "backend" "${GRAPHIFY_BENCH_BACKEND:-}" "GRAPHIFY_BENCH_BACKEND" | ||
| run_workspace "monorepo" "${GRAPHIFY_BENCH_MONOREPO:-}" "GRAPHIFY_BENCH_MONOREPO" | ||
|
|
||
| node "$HERE/summarize-real-workspaces.mjs" "$BUNDLE_DIR" > "$BUNDLE_DIR/real-workspaces.summary.json" | ||
| cat "$BUNDLE_DIR/real-workspaces.summary.json" | ||
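The script's configuration rules (environment overrides, timestamped default output directory, at least one workspace required) can be restated compactly. The sketch below mirrors them in JavaScript for illustration only. `resolveBenchConfig` is a hypothetical name; the environment variable names, default paths, and error message are copied from the script above. It leaves out the file and directory existence checks the script performs.

```javascript
// Sketch only: resolveBenchConfig is hypothetical, not part of this PR.
// Environment variable names and defaults are copied from run-real-workspace.sh.
function resolveBenchConfig(env, here, now = new Date()) {
  // Equivalent of `date -u +%Y-%m-%dT%H%M%SZ`: drop milliseconds and colons.
  const ts = now.toISOString().replace(/\.\d{3}Z$/, 'Z').replace(/:/g, '')

  // `${VAR:-default}` falls back when VAR is unset or empty; `||` behaves the same here.
  const bundleDir =
    env.GRAPHIFY_BENCH_REAL_RESULTS_DIR || `${here}/results/real-workspaces/${ts}`
  const promptsFile =
    env.GRAPHIFY_BENCH_REAL_PROMPTS || `${here}/prompts.real-workspace.example.json`

  // Workspaces with an empty path are skipped, as run_workspace does.
  const workspaces = [
    { name: 'backend', path: env.GRAPHIFY_BENCH_BACKEND || '' },
    { name: 'monorepo', path: env.GRAPHIFY_BENCH_MONOREPO || '' },
  ].filter((workspace) => workspace.path !== '')

  if (workspaces.length === 0) {
    throw new Error('Set GRAPHIFY_BENCH_BACKEND and/or GRAPHIFY_BENCH_MONOREPO before running.')
  }
  return { bundleDir, promptsFile, workspaces }
}

// Usage with made-up paths:
const config = resolveBenchConfig(
  { GRAPHIFY_BENCH_BACKEND: '/tmp/example-backend' },
  '/tmp/bench',
  new Date(Date.UTC(2026, 4, 11, 9, 30, 0))
)
console.log(config.bundleDir) // prints "/tmp/bench/results/real-workspaces/2026-05-11T093000Z"
```

Each resolved workspace corresponds to one `run.sh` invocation in the script, with results written to a subdirectory of the bundle directory named after the workspace.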