5 changes: 5 additions & 0 deletions .changeset/add-batch-size.md
@@ -0,0 +1,5 @@
---
"lingo.dev": minor
---

feat: add `--batch-size` parameter to `run` and `i18n` commands to prevent context leaking
7 changes: 7 additions & 0 deletions packages/cli/src/cli/cmd/i18n.ts
@@ -90,6 +90,11 @@ export default new Command()
"--strict",
"Stop immediately on first error instead of continuing to process remaining buckets and locales (fail-fast mode)",
)
.option(
"--batch-size <number>",
"Number of translations to process in a single batch",
parseInt,
)
Comment on lines +93 to +97
Contributor

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

interactive-commander option parser callback signature: does the parser receive (value, previousValue), and is passing JavaScript parseInt directly considered unsafe because of its second-argument radix behavior?

💡 Result:

Partly. interactive-commander's documentation examples show single-argument parsers, such as .argParser(Number) and .argParser((value) => { ... }), but the library extends Commander.js, whose custom option processing calls the parser with (value, previousValue). Passing JavaScript parseInt directly is therefore considered unsafe: parseInt(value, previousValue) treats previousValue as the radix, which can produce NaN (e.g., radix=1) or a number read in the wrong base. GitHub issues report this: defaults become NaN (Issue #523), and wrappers like v => parseInt(v) are recommended (Issue #1007). MDN recommends always passing an explicit radix (10) to parseInt.

Use a dedicated numeric parser and add integer validation for --batch-size.

Line 96 passes parseInt directly as the parser callback. Because interactive-commander extends Commander.js, the callback is invoked with both value and previousValue, so parseInt(value, previousValue) treats the previous value as the radix and can return NaN or a number parsed in the wrong base. Use an explicit wrapper instead: (value) => Number.parseInt(value, 10).

Additionally, since batchSize represents a count and must be a whole number, add .int() validation to the schema at line 671.

🔧 Proposed fix
   .option(
     "--batch-size <number>",
     "Number of translations to process in a single batch",
-    parseInt,
+    (value: string) => Number.parseInt(value, 10),
   )
-    batchSize: Z.number().min(1).max(250).optional(),
+    batchSize: Z.number().int().min(1).max(250).optional(),

Also applies to: 671-672

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/cmd/i18n.ts` around lines 93 - 97, Replace the direct
use of parseInt in the .option("--batch-size <number>", ...) call with a
dedicated wrapper like (v) => Number.parseInt(v, 10) to avoid Commander passing
a previousValue as a radix; also update the validation schema for batchSize to
enforce an integer (add .int() to the batchSize schema entry) so batchSize is
validated as a whole-number count.
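
A minimal sketch of the pitfall described in this comment, assuming (as in Commander.js) that the option parser is invoked with both the raw value and the previous value. `callParser` and `parseBatchSize` are hypothetical names used only for illustration, and the 1 to 250 bounds mirror the `batchSize` schema in this diff.

```typescript
// Hypothetical stand-in for how a Commander-style CLI invokes an option parser:
// it passes the raw string plus the previous (or default) value.
type OptionParser = (value: string, previous?: number) => number;

function callParser(parser: OptionParser, value: string, previous?: number): number {
  return parser(value, previous);
}

// Passing parseInt directly: `previous` lands in the radix slot.
const unsafe: OptionParser = parseInt;

// Explicit wrapper: ignores `previous`, pins the radix to 10, and rejects
// NaN or out-of-range results. Note that parseInt still tolerates trailing
// characters, so "12abc" parses to 12.
function parseBatchSize(value: string): number {
  const parsed = Number.parseInt(value, 10);
  if (!Number.isInteger(parsed) || parsed < 1 || parsed > 250) {
    throw new Error(`--batch-size must be an integer between 1 and 250, got "${value}"`);
  }
  return parsed;
}

console.log(callParser(unsafe, "10", 8)); // 8, because "10" is read in base 8
console.log(callParser(unsafe, "10", 1)); // NaN, because radix 1 is invalid
console.log(callParser(parseBatchSize, "10", 8)); // 10, previous value is ignored
```

Validating inside the parser surfaces a bad flag at argument-parsing time; the schema-level `.int()` check suggested above would still be useful as a second line of defense for values that reach `parseFlags` by other routes.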

.action(async function (options) {
updateGitignore();

@@ -440,6 +445,7 @@ export default new Command()
apiKey: settings.auth.apiKey,
apiUrl: settings.auth.apiUrl,
engineId: i18nConfig!.engineId,
batchSize: flags.batchSize,
});
processPayload = withExponentialBackoff(
processPayload,
@@ -662,6 +668,7 @@ function parseFlags(options: any) {
file: Z.array(Z.string()).optional(),
interactive: Z.boolean().prefault(false),
debug: Z.boolean().prefault(false),
batchSize: Z.number().min(1).max(250).optional(),
}).parse(options);
}

1 change: 1 addition & 0 deletions packages/cli/src/cli/cmd/run/_types.ts
@@ -56,5 +56,6 @@ export const flagsSchema = z.object({
debounce: z.number().positive().prefault(5000), // 5 seconds default
sound: z.boolean().optional(),
pseudo: z.boolean().optional(),
batchSize: z.number().min(1).max(250).optional(),
});
export type CmdRunFlags = z.infer<typeof flagsSchema>;
5 changes: 5 additions & 0 deletions packages/cli/src/cli/cmd/run/index.ts
@@ -123,6 +123,11 @@ export default new Command()
"--pseudo",
"Enable pseudo-localization mode: automatically pseudo-translates all extracted strings with accented characters and visual markers without calling any external API. Useful for testing UI internationalization readiness",
)
.option(
"--batch-size <number>",
"Number of translations to process in a single batch (not applicable when using lingo.dev provider)",
(val: string) => parseInt(val),
)
.action(async (args) => {
let userIdentity: UserIdentity = null;
try {
7 changes: 6 additions & 1 deletion packages/cli/src/cli/cmd/run/setup.ts
@@ -54,7 +54,12 @@ export default async function setup(input: CmdRunContext) {
ctx.flags.pseudo || ctx.config?.dev?.usePseudotranslator;
const provider = isPseudo ? "pseudo" : ctx.config?.provider;
const engineId = ctx.config?.engineId;
ctx.localizer = createLocalizer(provider, engineId, ctx.flags.apiKey);
ctx.localizer = createLocalizer(
Contributor

Please use prettier to match the project formatting.

Author

I looked into this and tried reformatting it. I cross-checked the result (including with AI and Prettier), and the call is being split across lines because of the printWidth: 80 rule in the project config.

provider,
engineId,
ctx.flags.apiKey,
ctx.flags.batchSize,
);
if (!ctx.localizer) {
throw new Error(
"Could not create localization provider. Please check your i18n.json configuration.",
226 changes: 142 additions & 84 deletions packages/cli/src/cli/localizer/explicit.ts
@@ -6,14 +6,16 @@ import { createMistral } from "@ai-sdk/mistral";
import { I18nConfig } from "@lingo.dev/_spec";
import chalk from "chalk";
import dedent from "dedent";
import { ILocalizer, LocalizerData } from "./_types";
import { ILocalizer, LocalizerData, LocalizerProgressFn } from "./_types";
import { LanguageModel, ModelMessage, generateText } from "ai";
import { colors } from "../constants";
import { jsonrepair } from "jsonrepair";
import { createOllama } from "ollama-ai-provider-v2";

import _ from "lodash";
import { extractPayloadChunks } from "../utils/chunk";
export default function createExplicitLocalizer(
provider: NonNullable<I18nConfig["provider"]>,
batchSize?: number,
): ILocalizer {
const settings = provider.settings || {};

@@ -26,10 +28,10 @@ export default function createExplicitLocalizer(
To fix this issue:
1. Switch to one of the supported providers, or
2. Remove the ${chalk.italic(
"provider",
)} node from your i18n.json configuration to switch to ${chalk.hex(
colors.green,
)("Lingo.dev")}
"provider",
)} node from your i18n.json configuration to switch to ${chalk.hex(
colors.green,
)("Lingo.dev")}

${chalk.hex(colors.blue)("Docs: https://lingo.dev/go/docs")}
`,
@@ -42,6 +44,7 @@ export default function createExplicitLocalizer(
apiKeyName: "OPENAI_API_KEY",
baseUrl: provider.baseUrl,
settings,
batchSize,
});
case "anthropic":
return createAiSdkLocalizer({
@@ -52,6 +55,7 @@ export default function createExplicitLocalizer(
apiKeyName: "ANTHROPIC_API_KEY",
baseUrl: provider.baseUrl,
settings,
batchSize,
});
case "google":
return createAiSdkLocalizer({
@@ -62,6 +66,7 @@ export default function createExplicitLocalizer(
apiKeyName: "GOOGLE_API_KEY",
baseUrl: provider.baseUrl,
settings,
batchSize,
});
case "openrouter":
return createAiSdkLocalizer({
@@ -72,6 +77,7 @@ export default function createExplicitLocalizer(
apiKeyName: "OPENROUTER_API_KEY",
baseUrl: provider.baseUrl,
settings,
batchSize,
});
case "ollama":
return createAiSdkLocalizer({
@@ -80,6 +86,7 @@ export default function createExplicitLocalizer(
prompt: provider.prompt,
skipAuth: true,
settings,
batchSize,
});
case "mistral":
return createAiSdkLocalizer({
@@ -90,6 +97,7 @@ export default function createExplicitLocalizer(
apiKeyName: "MISTRAL_API_KEY",
baseUrl: provider.baseUrl,
settings,
batchSize,
});
}
}
@@ -120,26 +128,29 @@ function createAiSdkLocalizer(params: {
baseUrl?: string;
skipAuth?: boolean;
settings?: { temperature?: number };
batchSize?: number;
}): ILocalizer {
const skipAuth = params.skipAuth === true;

const apiKey = process.env[params?.apiKeyName ?? ""];
if (!skipAuth && (!apiKey || !params.apiKeyName)) {
throw new Error(
dedent`
You're trying to use raw ${chalk.dim(params.id)} API for translation. ${params.apiKeyName
? `However, ${chalk.dim(
params.apiKeyName,
)} environment variable is not set.`
: "However, that provider is unavailable."
You're trying to use raw ${chalk.dim(params.id)} API for translation. ${
params.apiKeyName
? `However, ${chalk.dim(
params.apiKeyName,
)} environment variable is not set.`
: "However, that provider is unavailable."
}

To fix this issue:
1. ${params.apiKeyName
? `Set ${chalk.dim(
params.apiKeyName,
)} in your environment variables`
: "Set the environment variable for your provider (if required)"
1. ${
params.apiKeyName
? `Set ${chalk.dim(
params.apiKeyName,
)} in your environment variables`
: "Set the environment variable for your provider (if required)"
}, or
2. Remove the ${chalk.italic(
"provider",
@@ -183,85 +194,132 @@ function createAiSdkLocalizer(params: {
return { valid: false, error: errorMessage };
}
},
localize: async (input: LocalizerData) => {
const systemPrompt = params.prompt
.replaceAll("{source}", input.sourceLocale)
.replaceAll("{target}", input.targetLocale);
const shots = [
[
{
sourceLocale: "en",
targetLocale: "es",
data: {
message: "Hello, world!",
},
},
{
sourceLocale: "en",
targetLocale: "es",
data: {
message: "Hola, mundo!",
localize: async (
input: LocalizerData,
onProgress?: LocalizerProgressFn,
) => {
const chunks = extractPayloadChunks(
input.processableData,
params.batchSize,
);
const subResults: Record<string, any>[] = [];

for (let i = 0; i < chunks.length; i++) {
const chunk = chunks[i];

const systemPrompt = params.prompt
.replaceAll("{source}", input.sourceLocale)
.replaceAll("{target}", input.targetLocale);

const shots = [
[
{
sourceLocale: "en",
targetLocale: "es",
data: {
message: "Hello, world!",
},
},
},
],
[
{
sourceLocale: "en",
targetLocale: "es",
data: {
spring: "Spring",
{
sourceLocale: "en",
targetLocale: "es",
data: {
message: "Hola, mundo!",
},
},
hints: {
spring: ["A source of water"],
],
[
{
sourceLocale: "en",
targetLocale: "es",
data: {
spring: "Spring",
},
hints: {
spring: ["A source of water"],
},
},
},
{
sourceLocale: "en",
targetLocale: "es",
data: {
spring: "Manantial",
{
sourceLocale: "en",
targetLocale: "es",
data: {
spring: "Manantial",
},
},
},
],
];
],
];

const hasHints = input.hints && Object.keys(input.hints).length > 0;
const chunkHints = input.hints
? _.pick(input.hints, Object.keys(chunk))
: undefined;
const hasHints = chunkHints && Object.keys(chunkHints).length > 0;

const payload = {
sourceLocale: input.sourceLocale,
targetLocale: input.targetLocale,
data: input.processableData,
...(hasHints && { hints: input.hints }),
};
const payload = {
sourceLocale: input.sourceLocale,
targetLocale: input.targetLocale,
data: chunk,
...(hasHints && { hints: chunkHints }),
};

const response = await generateText({
model,
...params.settings,
messages: [
{ role: "system", content: systemPrompt },
...shots.flatMap(
([userShot, assistantShot]) =>
[
{ role: "user", content: JSON.stringify(userShot) },
{ role: "assistant", content: JSON.stringify(assistantShot) },
] as ModelMessage[],
),
{ role: "user", content: JSON.stringify(payload) },
],
});
const response = await generateText({
model,
...params.settings,
messages: [
{ role: "system", content: systemPrompt },
...shots.flatMap(
([userShot, assistantShot]) =>
[
{ role: "user", content: JSON.stringify(userShot) },
{ role: "assistant", content: JSON.stringify(assistantShot) },
] as ModelMessage[],
),
{ role: "user", content: JSON.stringify(payload) },
],
});

const result = parseModelResponse(response.text);
let result: any;
try {
result = parseModelResponse(response.text);
} catch (e2) {
const snippet =
response.text.length > 500
? `${response.text.slice(0, 500)}…`
: response.text;
console.error(
`Failed to parse response from ${params.id}. Response snippet: ${snippet}`,
);
throw new Error(
`Failed to parse response from ${params.id}: ${e2} (Snippet: ${snippet})`,
);
}
let finalResult: Record<string, any> = {};

// Handle both object and string responses
if (typeof result.data === "object" && result.data !== null) {
return result.data;
// Handle both object and string responses
if (typeof result?.data === "object" && result.data !== null) {
finalResult = result.data;
} else if (typeof result?.data === "string") {
// Handle string responses where the model double-stringified the JSON
try {
const parsed = parseModelResponse(result.data);
finalResult = parsed.data || parsed || {};
} catch (e) {
console.error(
`Failed to parse nested JSON response. Snippet: ${result.data.slice(0, 100)}...`,
);
throw new Error(
`Failed to parse nested JSON response: ${e} (Snippet: ${result.data.slice(0, 100)}...)`,
);
}
}
Comment on lines +280 to +313
Contributor

⚠️ Potential issue | 🟠 Major

Handle top-level parsed payloads before assuming a data wrapper.

This branch only succeeds when the parsed response is stored under result.data. If parseModelResponse() returns a plain object or a top-level JSON string, finalResult stays {} and the whole chunk is silently dropped. Normalize result itself into rawData first, then run the existing object/string handling on that value.

Suggested change
-        let finalResult: Record<string, any> = {};
-
-        // Handle both object and string responses
-        if (typeof result?.data === "object" && result.data !== null) {
-          finalResult = result.data;
-        } else if (result?.data) {
+        let finalResult: Record<string, any> = {};
+        const rawData =
+          typeof result === "object" && result !== null && "data" in result
+            ? result.data
+            : result;
+
+        if (typeof rawData === "object" && rawData !== null) {
+          finalResult = rawData as Record<string, any>;
+        } else if (typeof rawData === "string" && rawData) {
           // Handle string responses - extract and repair JSON
-          const index = result.data.indexOf("{");
-          const lastIndex = result.data.lastIndexOf("}");
+          const index = rawData.indexOf("{");
+          const lastIndex = rawData.lastIndexOf("}");
           if (index !== -1 && lastIndex !== -1) {
             try {
-              const trimmed = result.data.slice(index, lastIndex + 1);
+              const trimmed = rawData.slice(index, lastIndex + 1);
               const repaired = jsonrepair(trimmed);
               const parsed = JSON.parse(repaired);
               finalResult = parsed.data || parsed || {};
             } catch (e) {
               console.error(
-                `Failed to parse nested JSON response. Snippet: ${result.data.slice(0, 100)}...`,
+                `Failed to parse nested JSON response. Snippet: ${rawData.slice(0, 100)}...`,
               );
               throw new Error(
-                `Failed to parse nested JSON response: ${e} (Snippet: ${result.data.slice(0, 100)}...)`,
+                `Failed to parse nested JSON response: ${e} (Snippet: ${rawData.slice(0, 100)}...)`,
               );
             }
           } else {
             console.error(
-              `Unexpected response format - no JSON object found. Snippet: ${String(result.data).slice(0, 100)}...`,
+              `Unexpected response format - no JSON object found. Snippet: ${String(rawData).slice(0, 100)}...`,
             );
             throw new Error(
               `Unexpected response format from ${params.id} - no JSON object found in response`,
             );
           }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/cli/src/cli/localizer/explicit.ts` around lines 278 - 324, The code
currently only inspects result.data and drops top-level parsed payloads; change
the logic to first normalize the parsed output into a single variable (e.g.,
rawData = result?.data ?? result) and then run the existing object/string
handling against rawData instead of result.data so plain objects or top-level
JSON strings aren't ignored; update all subsequent references (the object
branch, string branch, error messages and places using result.data) to use
rawData, keeping parseModelResponse, finalResult and jsonrepair behavior intact
and preserving the same error/snippet reporting that includes params.id.


subResults.push(finalResult);
if (onProgress) {
onProgress(((i + 1) / chunks.length) * 100, chunk, finalResult);
}
}

// Handle string responses - extract and repair JSON
const index = result.data.indexOf("{");
const lastIndex = result.data.lastIndexOf("}");
const trimmed = result.data.slice(index, lastIndex + 1);
return JSON.parse(jsonrepair(trimmed)).data;
const finalMergedResult = _.merge({}, ...subResults);
return finalMergedResult;
},
};
}
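
For readers skimming the diff above, the batching flow it introduces (split the payload, keep only the hints for the current chunk, translate, merge, report progress) can be sketched in isolation. This is a simplified illustration: `chunkPayload`, `pickHints`, and `localizeInBatches` are hypothetical names, the chunker is not the real `extractPayloadChunks` (whose rules are not shown here), the payload is assumed to be flat, and `Object.assign` replaces the deep `_.merge` used in the diff.

```typescript
type Payload = Record<string, string>;
type Hints = Record<string, string[]>;
type ProgressFn = (percent: number, chunk: Payload, result: Payload) => void;

// Hypothetical chunker: groups of at most `batchSize` keys (assumed to be a
// positive integer); everything goes into one chunk when no size is given.
function chunkPayload(payload: Payload, batchSize?: number): Payload[] {
  const entries = Object.entries(payload);
  const size = batchSize && batchSize > 0 ? batchSize : entries.length || 1;
  const chunks: Payload[] = [];
  for (let i = 0; i < entries.length; i += size) {
    const chunk: Payload = {};
    for (const [key, value] of entries.slice(i, i + size)) chunk[key] = value;
    chunks.push(chunk);
  }
  return chunks;
}

// Keep only the hints whose keys appear in the current chunk, so that context
// from other batches is not sent along with it.
function pickHints(hints: Hints | undefined, chunk: Payload): Hints | undefined {
  if (!hints) return undefined;
  const picked: Hints = {};
  for (const key of Object.keys(chunk)) {
    const hint = hints[key];
    if (hint !== undefined) picked[key] = hint;
  }
  return Object.keys(picked).length > 0 ? picked : undefined;
}

async function localizeInBatches(
  data: Payload,
  hints: Hints | undefined,
  batchSize: number | undefined,
  translate: (chunk: Payload, hints?: Hints) => Promise<Payload>,
  onProgress?: ProgressFn,
): Promise<Payload> {
  const chunks = chunkPayload(data, batchSize);
  const merged: Payload = {};
  let done = 0;
  for (const chunk of chunks) {
    const result = await translate(chunk, pickHints(hints, chunk));
    Object.assign(merged, result); // shallow merge is enough for flat payloads
    done += 1;
    onProgress?.((done / chunks.length) * 100, chunk, result);
  }
  return merged;
}
```

Chunks are processed sequentially here, matching the `for` loop in the diff; running them concurrently would be faster but would make progress reporting and provider rate limits harder to reason about.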