Skip to content

feat: support global config fallback and configurable embedding batch size#25

Merged
giancarloerra merged 2 commits intogiancarloerra:mainfrom
sb-giithub:feat/global-config-and-batch-size-env
Apr 14, 2026
Merged

feat: support global config fallback and configurable embedding batch size#25
giancarloerra merged 2 commits intogiancarloerra:mainfrom
sb-giithub:feat/global-config-and-batch-size-env

Conversation

@sb-giithub
Copy link
Copy Markdown

@sb-giithub sb-giithub commented Apr 13, 2026

Summary

Two small, independent improvements that make SocratiCode easier to share knowledge across projects and to tune for different hardware:

  1. Global fallback for .socraticodecontextartifacts.json — when a project root has no config file, fall back to a global location. Configurable via SOCRATICODE_GLOBAL_CONFIG_DIR env var; defaults to ~/.claude/arch/. Lets users maintain a shared knowledge base (architecture docs, SDK conventions, etc.) once and reuse it across all projects without duplicating config files.

  2. EMBEDDING_BATCH_SIZE env var — the hardcoded BATCH_SIZE = 32 in embeddings.ts is now configurable. Default unchanged (32). Users with high-throughput embedding providers (TEI on GPU, OpenAI, etc.) can raise it (e.g. 128) for better throughput; users on constrained hardware can lower it.

Both changes are fully backwards-compatible — without setting either env var, behavior is identical to current main.

Changes

  • src/services/context-artifacts.ts: loadConfig() falls back to global config dir when project-level file is absent
  • src/services/embeddings.ts: BATCH_SIZE reads from EMBEDDING_BATCH_SIZE env var with validation (positive integer, throws on invalid)

Test plan

  • Project-level config still takes priority when present
  • Global config used when project-level absent
  • SOCRATICODE_GLOBAL_CONFIG_DIR env var override works
  • Returns null when neither project nor global config exists (no breaking change)
  • EMBEDDING_BATCH_SIZE=64 loads correctly
  • EMBEDDING_BATCH_SIZE=invalid throws clear error
  • Default 32 preserved when env var unset
  • npm run build passes

Summary by CodeRabbit

  • New Features

    • Embedding batch size is now configurable via environment, enabling performance tuning.
    • Configuration loading supports a global fallback when project-level config is missing.
  • Tests

    • Added and expanded unit tests to validate global-fallback config behavior and embedding batch-size parsing/validation.

- loadConfig: fall back to ~/.claude/arch (or SOCRATICODE_GLOBAL_CONFIG_DIR
  env var) when project root has no .socraticodecontextartifacts.json
- BATCH_SIZE: configurable via EMBEDDING_BATCH_SIZE env var, default 32
  (validates positive integer, throws on invalid input)
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

loadConfig now falls back to a global config file (env: SOCRATICODE_GLOBAL_CONFIG_DIR, default ~/.claude/arch) when a project-level .socraticodecontextartifacts.json is missing and resolves artifact relative paths against the global config directory. Embeddings read EMBEDDING_BATCH_SIZE at module load with integer validation (default 32).

Changes

Cohort / File(s) Summary
Context artifacts service
src/services/context-artifacts.ts
loadConfig(projectPath) now checks project root for .socraticodecontextartifacts.json and, if missing, falls back to a global path from process.env.SOCRATICODE_GLOBAL_CONFIG_DIR (default ~/.claude/arch). Tracks which path was used and, when using global config, resolves relative artifact path entries against the global config directory.
Embeddings service
src/services/embeddings.ts
Replaced fixed BATCH_SIZE with module-load parsing of process.env.EMBEDDING_BATCH_SIZE. Validates that the env var parses to a base-10 positive integer (>0); defaults to 32 if unset; throws an Error with EMBEDDING_BATCH_SIZE in the message for invalid values. Batch loops and slicing use the validated value.
Unit tests
tests/unit/context-artifacts.test.ts, tests/unit/embedding-batch-size.test.ts
Added/extended tests: context-artifacts tests cover global-fallback behavior, precedence, and path-resolution semantics; new embedding-batch-size tests validate default, accepted integers, and rejection of invalid inputs (error messages include EMBEDDING_BATCH_SIZE).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble configs near and far,
From project root to homeward star,
Batches sized by whispered env,
Small hops, big changes, onward I send! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the two main changes: global config fallback support and configurable embedding batch size.
Description check ✅ Passed The description includes all required template sections: Summary, Changes, Type of change (New feature), Testing, and Checklist items are checked. It provides clear context for both features and their backwards compatibility.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/services/context-artifacts.ts`:
- Around line 56-61: The global config fallback currently sets
globalConfigDir/globalConfigPath and leaves artifact.path values as-is, causing
relative artifact.path to later be resolved against the project root; update the
loader in src/services/context-artifacts.ts so that when actualPath equals
globalConfigPath (or when loading from globalConfigDir) any relative
artifact.path values are resolved immediately against globalConfigDir (use
path.isAbsolute to detect relative paths and path.resolve(globalConfigDir,
artifact.path) to normalize), so downstream resolution logic that assumes
project-root resolution (the existing artifact resolution code) receives
already-correct absolute paths.

In `@src/services/embeddings.ts`:
- Around line 12-16: The EMBEDDING_BATCH_SIZE parsing currently uses
Number.parseInt which allows "64abc" and "1.5" and treats empty string as unset;
update the validation so that when raw is undefined/null you still return the
default 32, but otherwise require raw to be a non-empty string of only digits
(e.g. /^\d+$/) before parsing; then parse with Number.parseInt(raw, 10), ensure
parsed > 0, and throw a clear Error referencing EMBEDDING_BATCH_SIZE/raw/parsed
if the value is empty, contains non-digits, is a float-like string, or is
non-positive.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 102f1e2f-780e-4909-b038-199a461d1c3a

📥 Commits

Reviewing files that changed from the base of the PR and between efec8dd and 9d04c44.

📒 Files selected for processing (2)
  • src/services/context-artifacts.ts
  • src/services/embeddings.ts

Comment thread src/services/context-artifacts.ts
Comment thread src/services/embeddings.ts Outdated
Comment on lines +12 to +16
if (!raw) return 32;
const parsed = Number.parseInt(raw, 10);
if (!Number.isFinite(parsed) || parsed <= 0) {
throw new Error(
`Invalid EMBEDDING_BATCH_SIZE: "${raw}". Must be a positive integer.`,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

`In JavaScript, what are the exact results of:

  • Number.parseInt("1.5", 10)
  • Number.parseInt("64abc", 10)
  • Number.parseInt("", 10)
    And does parseInt enforce full-string integer validation?`

💡 Result:

Number.parseInt("1.5", 10) returns 1 Number.parseInt("64abc", 10) returns 64 Number.parseInt("", 10) returns NaN parseInt does not enforce full-string integer validation. It parses the longest prefix of valid integer digits in the specified radix and ignores subsequent invalid characters.

Citations:


🏁 Script executed:

cat -n src/services/embeddings.ts | head -25

Repository: giancarloerra/SocratiCode

Length of output: 1084


Strict integer validation is bypassable for EMBEDDING_BATCH_SIZE.

Current parsing accepts non-integer/partial inputs and treats empty string as unset:

  • Number.parseInt("64abc", 10) returns 64 (ignores trailing chars)
  • Number.parseInt("1.5", 10) returns 1 (truncates decimal)
  • Empty string falls back to default 32 instead of throwing

This breaks the stated "positive integer" contract.

Proposed fix
 const BATCH_SIZE: number = (() => {
   const raw = process.env.EMBEDDING_BATCH_SIZE;
-  if (!raw) return 32;
-  const parsed = Number.parseInt(raw, 10);
-  if (!Number.isFinite(parsed) || parsed <= 0) {
+  if (raw === undefined) return 32;
+  const normalized = raw.trim();
+  if (!/^[1-9]\d*$/.test(normalized)) {
     throw new Error(
       `Invalid EMBEDDING_BATCH_SIZE: "${raw}". Must be a positive integer.`,
     );
   }
+  const parsed = Number(normalized);
   return parsed;
 })();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/embeddings.ts` around lines 12 - 16, The EMBEDDING_BATCH_SIZE
parsing currently uses Number.parseInt which allows "64abc" and "1.5" and treats
empty string as unset; update the validation so that when raw is undefined/null
you still return the default 32, but otherwise require raw to be a non-empty
string of only digits (e.g. /^\d+$/) before parsing; then parse with
Number.parseInt(raw, 10), ensure parsed > 0, and throw a clear Error referencing
EMBEDDING_BATCH_SIZE/raw/parsed if the value is empty, contains non-digits, is a
float-like string, or is non-positive.

@giancarloerra
Copy link
Copy Markdown
Owner

@sb-giithub thanks for the PR, both features are useful additions. A couple of things need addressing before this can be merged (flagged by Code Rabbit):

1. Relative artifact paths break with global config fallback

When loadConfig() loads from the global directory, artifacts with relative paths (e.g. "path": "docs/schema.sql") will resolve against the project root in indexArtifact(), not the global config directory. This means the main use case for this feature, that is shared artifacts across projects, won't actually work unless all paths are absolute.

You'll need to either resolve relative paths against the global config directory at load time, or propagate the config source directory so downstream code resolves correctly.

2. EMBEDDING_BATCH_SIZE validation isn't strict enough

Number.parseInt("64abc", 10) silently returns 64, and "1.5" returns 1, both pass validation despite not being valid positive integers. Consider something like the below?

const num = Number(raw);
if (!Number.isInteger(num) || num <= 0) {
  throw new Error(`Invalid EMBEDDING_BATCH_SIZE: "${raw}". Must be a positive integer.`);
}
return num;

@giancarloerra giancarloerra self-assigned this Apr 13, 2026
…ch size validation

1. Global config fallback: relative artifact paths are now resolved against
   the global config directory (not the project root) when loading from fallback.
   Absolute paths are left unchanged. Project-level config behavior is unaffected.

2. EMBEDDING_BATCH_SIZE: replaced Number.parseInt with Number() + Number.isInteger()
   so that inputs like "64abc", "1.5", and "" are correctly rejected instead of
   silently accepted.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sb-giithub
Copy link
Copy Markdown
Author

Both issues fixed in 49b5b35:

1. Relative artifact paths with global config fallback

Added a usingGlobalFallback flag. After config validation, if loading from the global directory, all relative artifact paths are resolved against path.dirname(actualPath) (the global config dir). Absolute paths are left as-is. Project-level config behavior is unchanged.

2. Stricter EMBEDDING_BATCH_SIZE validation

Replaced Number.parseInt(raw, 10) with Number(raw) + Number.isInteger(num) as suggested. Now:

  • "64abc"NaN → rejected
  • "1.5" → not integer → rejected
  • ""0 → rejected (also changed guard from !raw to raw === undefined so empty string doesn't silently fall back to default)

Added tests covering both fixes, including the existing "returns null when no config" test which now isolates itself from real global configs via env var override.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
tests/unit/embedding-batch-size.test.ts (1)

12-21: Avoid duplicating parsing logic in test code.

parseBatchSize mirrors production logic, so test and implementation can drift together. Prefer sharing one parser implementation (e.g., internal util used by both src/services/embeddings.ts and this test).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/embedding-batch-size.test.ts` around lines 12 - 21, The test
defines a local parseBatchSize function that duplicates production parsing
logic; remove this duplication by importing and using the canonical parser from
the production code (the parser used by src/services/embeddings.ts) instead of
the local parseBatchSize; update the test to import the shared function (or an
exported utility) and assert behavior through that single implementation so
tests and production stay in sync, ensuring any test-specific defaults or error
messages match the exported function's behavior.
tests/unit/context-artifacts.test.ts (1)

45-57: Consider extracting env override/restore into a helper.

The SOCRATICODE_GLOBAL_CONFIG_DIR save/restore block is duplicated in several tests; a small helper would make this suite easier to maintain.

♻️ Optional refactor sketch
+async function withGlobalConfigDir<T>(dir: string, run: () => Promise<T>): Promise<T> {
+  const originalEnv = process.env.SOCRATICODE_GLOBAL_CONFIG_DIR;
+  process.env.SOCRATICODE_GLOBAL_CONFIG_DIR = dir;
+  try {
+    return await run();
+  } finally {
+    if (originalEnv === undefined) {
+      delete process.env.SOCRATICODE_GLOBAL_CONFIG_DIR;
+    } else {
+      process.env.SOCRATICODE_GLOBAL_CONFIG_DIR = originalEnv;
+    }
+  }
+}

Also applies to: 175-188, 205-221, 257-269

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/context-artifacts.test.ts` around lines 45 - 57, Extract the
repeated save/override/restore logic for
process.env.SOCRATICODE_GLOBAL_CONFIG_DIR used around tests that call loadConfig
into a small test helper (e.g., withTemporaryEnv or overrideEnv) that accepts
the env key and temporary value and a callback/async function to run; replace
the duplicated try/finally blocks in tests (the blocks around loadConfig at
lines shown) with calls to this helper so it sets the env var to
path.join(tempDir, "nonexistent-global"), runs the assertion
(expect(config).toBeNull()) and then reliably restores the original environment
whether the callback throws or resolves.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/unit/context-artifacts.test.ts`:
- Around line 45-57: Extract the repeated save/override/restore logic for
process.env.SOCRATICODE_GLOBAL_CONFIG_DIR used around tests that call loadConfig
into a small test helper (e.g., withTemporaryEnv or overrideEnv) that accepts
the env key and temporary value and a callback/async function to run; replace
the duplicated try/finally blocks in tests (the blocks around loadConfig at
lines shown) with calls to this helper so it sets the env var to
path.join(tempDir, "nonexistent-global"), runs the assertion
(expect(config).toBeNull()) and then reliably restores the original environment
whether the callback throws or resolves.

In `@tests/unit/embedding-batch-size.test.ts`:
- Around line 12-21: The test defines a local parseBatchSize function that
duplicates production parsing logic; remove this duplication by importing and
using the canonical parser from the production code (the parser used by
src/services/embeddings.ts) instead of the local parseBatchSize; update the test
to import the shared function (or an exported utility) and assert behavior
through that single implementation so tests and production stay in sync,
ensuring any test-specific defaults or error messages match the exported
function's behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1176d5b4-209a-44f9-892d-64eb5ec20c49

📥 Commits

Reviewing files that changed from the base of the PR and between 9d04c44 and 49b5b35.

📒 Files selected for processing (4)
  • src/services/context-artifacts.ts
  • src/services/embeddings.ts
  • tests/unit/context-artifacts.test.ts
  • tests/unit/embedding-batch-size.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/services/context-artifacts.ts

@giancarloerra giancarloerra merged commit af7d1d4 into giancarloerra:main Apr 14, 2026
5 checks passed
@giancarloerra
Copy link
Copy Markdown
Owner

@sb-giithub thanks, merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants