Merged
81 changes: 79 additions & 2 deletions src/endpoints/inference/cloudflare/chat.ts
@@ -6,6 +6,68 @@

import { AIEndpoint } from "../../base";
import type { AppContext, UsageRecord } from "../../../types";
import type { ContentfulStatusCode } from "hono/utils/http-status";

interface CloudflareAIErrorClassification {
  message: string;
  status: ContentfulStatusCode;
  error_code: string;
  retryable: boolean;
  retry_after_seconds?: number;
}

/**
 * Classify a Cloudflare AI error by inspecting its message and name.
 * Maps error patterns to appropriate HTTP status codes and retry guidance.
 */
function classifyCloudflareAIError(error: unknown): CloudflareAIErrorClassification {
  const errorMessage = error instanceof Error ? error.message : String(error);
  const errorName = error instanceof Error ? error.name : "";

  // Timeout: AbortError name, "Request timed out" message, or Cloudflare error code 3046
  if (
    errorName === "AbortError" ||
    errorMessage.includes("Request timed out") ||
    errorMessage.includes("3046")
  ) {
    return {
      message: "Request timed out",
      status: 504,
      error_code: "TIMEOUT",
      retryable: true,
      retry_after_seconds: 30,
    };
  }

  // Rate limit: explicit message or 429 code in message
  if (errorMessage.includes("Rate limit exceeded") || errorMessage.includes("429")) {
    return {
      message: "Rate limit exceeded",
      status: 429,
      error_code: "RATE_LIMIT",
      retryable: true,
      retry_after_seconds: 60,
    };
  }

  // Model not found: explicit message or 404 code in message
  if (errorMessage.includes("Model not found") || errorMessage.includes("404")) {
    return {
      message: "Model not found",
      status: 404,
      error_code: "MODEL_NOT_FOUND",
      retryable: false,
    };
  }
Comment on lines +28 to +61
Copilot AI Mar 3, 2026

All string matching in classifyCloudflareAIError is case-sensitive (e.g., errorMessage.includes("Rate limit exceeded"), errorMessage.includes("Model not found"), errorMessage.includes("Request timed out")). Cloudflare AI could return the same messages with different casing (e.g., "rate limit exceeded", "model not found", "request timed out"). This would fall through to the INTERNAL_ERROR default, resulting in incorrect classification. Using .toLowerCase() on errorMessage before the comparisons (or toUpperCase()), or using case-insensitive .includes() alternatives, would make this more robust.
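
A minimal sketch of the case-insensitive matching suggested above, assuming the three message patterns from the diff are the only text patterns of interest. `includesIgnoreCase` and `classifyByMessage` are hypothetical names used for illustration and are not part of the PR; the sketch deliberately leaves out the `AbortError` and numeric-code checks.

```typescript
// Hypothetical helper (not in the PR): case-insensitive substring check.
function includesIgnoreCase(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().includes(needle.toLowerCase());
}

type SketchErrorCode = "TIMEOUT" | "RATE_LIMIT" | "MODEL_NOT_FOUND" | "INTERNAL_ERROR";

// Hypothetical, simplified classifier covering only the message patterns
// that appear in the diff. Unmatched messages fall through to INTERNAL_ERROR.
function classifyByMessage(error: unknown): SketchErrorCode {
  const message = error instanceof Error ? error.message : String(error);
  if (includesIgnoreCase(message, "Request timed out")) return "TIMEOUT";
  if (includesIgnoreCase(message, "Rate limit exceeded")) return "RATE_LIMIT";
  if (includesIgnoreCase(message, "Model not found")) return "MODEL_NOT_FOUND";
  return "INTERNAL_ERROR";
}
```

With this shape, `new Error("rate limit exceeded")` and `new Error("Rate limit exceeded")` would both classify as `RATE_LIMIT`.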

Comment on lines +43 to +61
Copilot AI Mar 3, 2026

Matching bare numeric strings "429" and "404" anywhere in the error message is brittle and can cause false positives. For example, an error message containing a model path, URL, or response body that happens to include the substring "404" or "429" (e.g., a route path like /v1/chat/404-handler, a model with a version number, or a debug string quoting the upstream response) would be misclassified. Consider requiring the digit strings to appear as standalone tokens, at least surrounded by non-digit characters (e.g., using a regex like /\b429\b/ or /status:?\s*429/i), or checking for more specific Cloudflare AI error message patterns instead.
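
One possible way to tighten the numeric match, sketched under the assumption that upstream messages introduce the code with a keyword such as "status", "code", or "HTTP". That keyword list is a guess for illustration, not documented Cloudflare AI behaviour, and `hasStatusCode` is a hypothetical helper that is not part of the PR. Note that a plain `\b404\b` test would still match a path like `/v1/chat/404-handler`, because `/` and `-` are word boundaries, so this sketch anchors on the preceding keyword instead.

```typescript
// Hypothetical helper (not in the PR): true only when `code` appears as a
// whole number directly after a "status", "code", or "http" keyword,
// optionally separated by whitespace, ":" or "=".
function hasStatusCode(message: string, code: number): boolean {
  const pattern = new RegExp(`\\b(?:status|code|http)\\b[\\s:=]*${code}\\b`, "i");
  return pattern.test(message);
}
```

Under these assumptions, `hasStatusCode("HTTP 429 Too Many Requests", 429)` matches, while a route path or model version that merely contains the digits does not.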


  // Default: internal error from upstream Cloudflare AI
  return {
    message: "Chat completion failed",
    status: 502,
    error_code: "INTERNAL_ERROR",
    retryable: false,
  };
Comment on lines +36 to +69
Copilot AI Mar 3, 2026

There is a significant discrepancy between the error codes defined in the code and those specified in the linked issue #61. Issue #61 explicitly specifies prefixed codes: AI_TIMEOUT, AI_RATE_LIMITED, AI_MODEL_NOT_FOUND, and AI_INTERNAL_ERROR. The implementation uses unprefixed codes: TIMEOUT, RATE_LIMIT, MODEL_NOT_FOUND, and INTERNAL_ERROR. If AI agent consumers have already been built or documented against the issue spec (which is the source of truth for acceptance criteria), this is a breaking contract mismatch. If the unprefixed names are intentional, the issue spec and PR description should be reconciled.

Comment on lines +64 to +69
Copilot AI Mar 3, 2026

Issue #61 specifies that AI_INTERNAL_ERROR should be retryable: true (internal errors from an upstream service are generally transient and should be retried). The implementation sets retryable: false for INTERNAL_ERROR. This conflicts with the acceptance criteria in the issue. Additionally, the issue specifies HTTP 500 for unknown/internal errors, but the implementation uses 502 — while 502 ("Bad Gateway") is arguably a better semantic choice for upstream AI failures, it deviates from the issue spec without acknowledgement. Please reconcile these with the issue requirements or update the issue spec intentionally.
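
If the issue spec were adopted as written, the reconciliation could look roughly like the sketch below. The prefixed names, the HTTP 500 status, and `retryable: true` for the internal-error case are taken from the two review comments above, since issue #61 itself is not shown here; the pairing of each unprefixed code with its prefixed counterpart is inferred from the order in which the comment lists them. `Classification`, `ISSUE_61_CODES`, and `toIssueSpec` are hypothetical names, and `status` is narrowed to a plain `number` so the sketch has no framework dependency.

```typescript
// Mirrors CloudflareAIErrorClassification from the diff (status as number).
interface Classification {
  message: string;
  status: number;
  error_code: string;
  retryable: boolean;
  retry_after_seconds?: number;
}

// Hypothetical mapping from the PR's unprefixed codes to the prefixed codes
// the review comment attributes to issue #61.
const ISSUE_61_CODES: Record<string, string> = {
  TIMEOUT: "AI_TIMEOUT",
  RATE_LIMIT: "AI_RATE_LIMITED",
  MODEL_NOT_FOUND: "AI_MODEL_NOT_FOUND",
  INTERNAL_ERROR: "AI_INTERNAL_ERROR",
};

// Hypothetical adapter: renames the code and, for the internal-error
// fallback only, applies the status and retryability the comment describes.
function toIssueSpec(c: Classification): Classification {
  const renamed = { ...c, error_code: ISSUE_61_CODES[c.error_code] ?? c.error_code };
  if (c.error_code === "INTERNAL_ERROR") {
    return { ...renamed, status: 500, retryable: true };
  }
  return renamed;
}
```

Whether to change the code or update the issue is a decision for the PR author; the sketch only shows that the two contracts can be bridged in one place.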

}

interface CloudflareMessage {
role: "system" | "user" | "assistant";
@@ -95,7 +157,10 @@ export class CloudflareChat extends AIEndpoint {
},
"400": { description: "Invalid request" },
"402": { description: "Payment required" },
"500": { description: "Server error" },
"404": { description: "Model not found (error_code: MODEL_NOT_FOUND, retryable: false)" },
"429": { description: "Rate limit exceeded (error_code: RATE_LIMIT, retryable: true)" },
"502": { description: "Upstream AI error (error_code: INTERNAL_ERROR, retryable: false)" },
"504": { description: "Request timed out (error_code: TIMEOUT, retryable: true)" },
},
};

@@ -224,12 +289,24 @@ export class CloudflareChat extends AIEndpoint {
});
}
} catch (error) {
const classified = classifyCloudflareAIError(error);

log.error("Cloudflare AI chat error", {
model,
error: error instanceof Error ? error.message : String(error),
error_code: classified.error_code,
status: classified.status,
});

return this.errorResponse(c, "Chat completion failed", 500);
const extra: Record<string, unknown> = {
error_code: classified.error_code,
retryable: classified.retryable,
};
if (classified.retry_after_seconds !== undefined) {
extra.retry_after_seconds = classified.retry_after_seconds;
}

Copilot AI Mar 3, 2026

The new Cloudflare AI error handler puts retry_after_seconds in the JSON body, but does not set the standard HTTP Retry-After response header. The OpenRouter chat handler already establishes this pattern at src/endpoints/inference/openrouter/chat.ts:242: c.header("Retry-After", "5"). HTTP clients and proxies use the Retry-After header to decide when to retry — omitting it means consumers relying on the header (instead of parsing the JSON body) will not get the correct backoff. The Retry-After header should be set when classified.retryable is true, using the value from classified.retry_after_seconds.

Suggested change
if (classified.retryable && classified.retry_after_seconds !== undefined) {
c.header("Retry-After", String(classified.retry_after_seconds));
}

return this.errorResponse(c, classified.message, classified.status, extra);
}
}
}