Add Models list/get, CountTokens, and legacy completions routes #9
Merged
Closes the most-used SDK surface gaps for client init and tokenizer
checks. Six new routes, all backed by existing llama.cpp endpoints:
- POST /v1/completions (OpenAI legacy passthrough)
- GET /v1/models (OpenAI passthrough)
- GET /v1/models/{id} (OpenAI passthrough)
- GET /v1beta/models (Gemini, translated from /v1/models)
- GET /v1beta/models/{name} (Gemini, translated)
- POST /v1beta/models/{m}:countTokens (Gemini, calls llama.cpp /tokenize)
Adds SDK contract tests exercising genai's Models.List / Models.Get /
Models.CountTokens and openai-go's Models.List / Models.Get /
Completions.New against the new routes. Also includes an unrelated
gofmt drive-by on examples/go/gemini-structured/main.go which was
breaking make lint on main.
ComputeTokens is intentionally not implemented: the genai SDK gates it
behind BackendVertexAI and never reaches a Gemini-Developer-API proxy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes five test gaps identified in review:
- Upstream-error → Gemini-error translation for /v1beta/models and :countTokens (TestServer*UpstreamError)
- Method confusion: POST /v1/models, PUT /v1beta/models, and GET on POST routes all return 404 instead of being silently misrouted
- OpenAI model-get path-traversal guard (was Gemini-only)
- Gemini action-verb collision: GET /v1beta/models/foo:bar must 404, not route to handleGeminiModelGet
- Auth header (Authorization, X-Goog-Api-Key) stripping on the new passthrough routes (legacy completions, models list, model get)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
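The path-traversal, action-verb and method-confusion cases above all come down to how a model name and verb are parsed out of the URL. A minimal sketch; `parseGeminiModelPath` and `route` are hypothetical (the PR's own helper is `GeminiModelNameFromPath`, whose behaviour may differ). Empty names, `.`/`..`, anything containing `/`, unknown verbs and method/verb mismatches all fall through to 404. It needs Go 1.20+ for `strings.CutPrefix`.

```go
package main

import (
	"fmt"
	"strings"
)

// geminiModelsPrefix is the per-model route prefix; the bare /v1beta/models
// list route is assumed to be registered separately.
const geminiModelsPrefix = "/v1beta/models/"

// parseGeminiModelPath (hypothetical) splits "/v1beta/models/{name}[:{verb}]".
// It rejects empty names, "." and "..", anything containing "/" (traversal
// attempts such as "../x") and a second ":".
func parseGeminiModelPath(path string) (name, verb string, ok bool) {
	rest, found := strings.CutPrefix(path, geminiModelsPrefix)
	if !found {
		return "", "", false
	}
	name, verb, _ = strings.Cut(rest, ":")
	if name == "" || name == "." || name == ".." ||
		strings.Contains(name, "/") || strings.Contains(verb, ":") {
		return "", "", false
	}
	return name, verb, true
}

// route reports which (hypothetical) handler would serve the request.
// Unknown verbs and method mismatches fall through to 404 rather than
// being misrouted to the model-get handler.
func route(method, path string) string {
	name, verb, ok := parseGeminiModelPath(path)
	switch {
	case !ok:
		return "404"
	case verb == "" && method == "GET":
		return "modelGet(" + name + ")"
	case verb == "countTokens" && method == "POST":
		return "countTokens(" + name + ")"
	default:
		return "404"
	}
}

func main() {
	for _, c := range []struct{ method, path string }{
		{"GET", "/v1beta/models/gemma"},
		{"POST", "/v1beta/models/gemma:countTokens"},
		{"GET", "/v1beta/models/foo:bar"},
		{"GET", "/v1beta/models/gemma:countTokens"},
		{"GET", "/v1beta/models/../secret"},
	} {
		fmt.Printf("%s %s -> %s\n", c.method, c.path, route(c.method, c.path))
	}
}
```

Only the first two requests reach a handler; the verb collision, the GET on the POST-only verb and the traversal attempt all print `404`.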
- Move closure vars (seenAuth, seenGoogKey, upstream) inside the
TestServerPassthroughStripsAuthHeaders sub-test so each case is
fully isolated and the test is safe under future t.Parallel()
- Use nil body on GET sub-cases (was bytes.NewBufferString("{}")) to
match the convention used elsewhere in the file
- Add POST /v1beta/models to TestServerMethodConfusion — covers the
prefix-route regression target alongside the existing PUT case
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Adds the six most-used SDK call sites that previously hit the proxy's 404 fallback, all backed by endpoints llama.cpp already exposes:
| Route | SDK call | Upstream |
|---|---|---|
| `POST /v1/completions` | `Completions.New` | |
| `GET /v1/models` | `Models.List` | |
| `GET /v1/models/{id}` | `Models.Retrieve` | |
| `GET /v1beta/models` | `Models.List` | `/v1/models` |
| `GET /v1beta/models/{name}` | `Models.Get` | |
| `POST /v1beta/models/{m}:countTokens` | `Models.CountTokens` | `/tokenize`, returns `{totalTokens}` |

The OpenAI handler was refactored from `handleOpenAIChatCompletions` into a generic `handleOpenAIPassthrough(w, r, upstreamURL)` that uses the incoming `r.Method`. A new `fetchUpstreamJSON` helper handles GET-and-decode for the Gemini-side translations.

What's intentionally not here
`ComputeTokens`: the genai SDK gates this behind `BackendVertexAI`; calls on `BackendGeminiAPI` (what localaik targets) error inside the SDK before ever issuing an HTTP request. Documenting in README.

Drive-by
`examples/go/gemini-structured/main.go` had pre-existing gofmt drift that broke `make lint` on `main`. Auto-formatted to get CI green; one-line whitespace fix.

Test plan
- `make lint` clean
- `make test-unit` green
- `go test ./integration` green: new SDK contract tests cover `Models.List`, `Models.Get`, `Models.CountTokens` (genai) and `Models.List`, `Models.Get`, `Completions.New` (openai-go)
- … `GeminiModelNameFromPath` helper, improved `fetchUpstreamJSON` docstring)
- `make test-integration` against the live docker image

🤖 Generated with Claude Code