diff --git a/samples/README.md b/samples/README.md
index bcac6bf3a..f61f31c45 100644
--- a/samples/README.md
+++ b/samples/README.md
@@ -9,6 +9,6 @@ Explore complete working examples that demonstrate how to use Foundry Local —
 | Language | Samples | Description |
 |----------|---------|-------------|
 | [**C#**](cs/) | 13 | .NET SDK samples including native chat, embeddings, audio transcription, tool calling, model management, web server, and tutorials. Uses WinML on Windows for hardware acceleration. |
-| [**JavaScript**](js/) | 13 | Node.js SDK samples including native chat, embeddings, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, and tutorials. |
+| [**JavaScript**](js/) | 14 | Node.js samples including native chat, embeddings, audio transcription, Electron desktop app, Copilot SDK integration, LangChain, tool calling, web server, Responses API, and tutorials. |
 | [**Python**](python/) | 10 | Python samples using the OpenAI-compatible API, including chat, embeddings, audio transcription, LangChain integration, tool calling, web server, and tutorials. |
 | [**Rust**](rust/) | 9 | Rust SDK samples including native chat, embeddings, audio transcription, tool calling, web server, and tutorials. |
diff --git a/samples/js/README.md b/samples/js/README.md
index d334555c3..8f450c9f3 100644
--- a/samples/js/README.md
+++ b/samples/js/README.md
@@ -19,6 +19,7 @@ These samples demonstrate how to use the Foundry Local JavaScript SDK (`foundry-
 | [langchain-integration-example](langchain-integration-example/) | LangChain.js integration for building text generation chains. |
 | [tool-calling-foundry-local](tool-calling-foundry-local/) | Tool calling with custom function definitions and streaming responses. |
 | [web-server-example](web-server-example/) | Start a local OpenAI-compatible web server and call it with the OpenAI SDK. |
+| [web-server-responses](web-server-responses/) | Call a running local OpenAI-compatible web server with the Responses API, including streaming and tool calling. |
 | [tutorial-chat-assistant](tutorial-chat-assistant/) | Build an interactive multi-turn chat assistant (tutorial). |
 | [tutorial-document-summarizer](tutorial-document-summarizer/) | Summarize documents with AI (tutorial). |
 | [tutorial-tool-calling](tutorial-tool-calling/) | Create a tool-calling assistant (tutorial). |
diff --git a/samples/js/web-server-responses/README.md b/samples/js/web-server-responses/README.md
new file mode 100644
index 000000000..c8382004e
--- /dev/null
+++ b/samples/js/web-server-responses/README.md
@@ -0,0 +1,111 @@
+# Foundry Local Responses web service sample
+
+This sample starts the Foundry Local OpenAI-compatible web service, then uses the official OpenAI JavaScript SDK to call the Responses API.
+
+The important pattern is:
+
+1. `FoundryLocalManager` handles Foundry Local setup, model download/load, web service startup, and cleanup.
+1. `openai` handles the actual `/v1/responses` calls.
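+
+In code, the split looks roughly like this (a condensed sketch of `app.js` with the progress callbacks stubbed out; see that file for the runnable version):
+
+```js
+import { FoundryLocalManager } from 'foundry-local-sdk';
+import { OpenAI } from 'openai';
+
+// foundry-local-sdk side: runtime setup, model download/load, web service.
+const manager = FoundryLocalManager.create({
+  appName: 'foundry_local_samples',
+  logLevel: 'info',
+  webServiceUrls: 'http://localhost:5764',
+});
+await manager.downloadAndRegisterEps(() => {});
+const model = await manager.catalog.getModel('qwen2.5-0.5b');
+await model.download(() => {});
+await model.load();
+manager.startWebService();
+
+// openai side: ordinary Responses API calls against the local endpoint.
+const openai = new OpenAI({ baseURL: 'http://localhost:5764/v1', apiKey: 'notneeded' });
+const response = await openai.responses.create({ model: model.id, input: 'Hello!' });
+console.log(response.output_text);
+```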
+
+## Prerequisites
+
+- Node.js 18 or later
+- Internet access on first run to install npm packages, download execution providers, and download the sample model
+
+## What gets installed
+
+Running `npm install` in this folder installs:
+
+| Package | Why it is used |
+|---------|----------------|
+| `foundry-local-sdk` | Starts Foundry Local, downloads/loads the model, and runs the local OpenAI-compatible web service. |
+| `openai` | Sends Responses API requests to the local web service at `http://localhost:5764/v1`. |
+| `foundry-local-sdk-winml` | Optional Windows acceleration package. npm installs it when supported and ignores it otherwise. |
+
+The Foundry Local SDK install also provisions the native runtime files it needs, including Foundry Local Core, ONNX Runtime, and ONNX Runtime GenAI.
+
+When you run the sample, it also downloads and loads the `qwen2.5-0.5b` model if it is not already cached.
+
+## Run the sample
+
+From the repository root:
+
+```powershell
+cd samples\js\web-server-responses
+npm install
+npm start
+```
+
+## What the sample does
+
+The sample:
+
+1. Initializes `FoundryLocalManager`.
+1. Downloads and registers execution providers.
+1. Downloads and loads `qwen2.5-0.5b`.
+1. Starts the local web service at `http://localhost:5764`.
+1. Uses the OpenAI JavaScript SDK with `baseURL: "http://localhost:5764/v1"`.
+1. Runs a non-streaming Responses call.
+1. Runs a streaming Responses call.
+1. Runs a Responses function-calling flow with a sample `get_weather` tool.
+1. Stops the web service and unloads the model.
+
+## Expected output
+
+You should see setup logs, then output similar to:
+
+```text
+Testing a non-streaming Responses call...
+[ASSISTANT]: ...
+
+Testing a streaming Responses call...
+[ASSISTANT STREAM]: ...
+
+Testing Responses tool calling...
+[TOOL CALL]: get_weather(...)
+[ASSISTANT FINAL]: ...
+```
+
+The exact model text can vary.
+
+## Troubleshooting
+
+If the sample fails while creating `FoundryLocalManager` with a native symbol error such as `Failed to resolve 'execute_command_with_binary' symbol`, the installed Foundry Local Core runtime is older than the JavaScript native addon expects. Reinstall the sample dependencies so npm can fetch the latest SDK/runtime packages:
+
+```powershell
+Remove-Item -Recurse -Force node_modules, package-lock.json
+npm install
+```
+
+If port `5764` is already in use, stop the other process or update `endpointUrl` in `app.js` to an available local URL.
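+
+To check that the web service is reachable, you can post a minimal Responses request yourself; the endpoint and request shape match what `app.js` sends. Replace `<model-id>` with the model id the sample prints after loading:
+
+```powershell
+$body = '{ "model": "<model-id>", "input": "Say hi." }'
+Invoke-RestMethod -Uri http://localhost:5764/v1/responses -Method Post -ContentType 'application/json' -Body $body
+```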
diff --git a/samples/js/web-server-responses/app.js b/samples/js/web-server-responses/app.js
new file mode 100644
index 000000000..f51e656d9
--- /dev/null
+++ b/samples/js/web-server-responses/app.js
@@ -0,0 +1,140 @@
+// Foundry Local Responses web service sample:
+// start the local OpenAI-compatible web service, then call it with the OpenAI SDK.
+import { FoundryLocalManager } from 'foundry-local-sdk';
+import { OpenAI } from 'openai';
+
+// Extract the plain text from a Responses API response object.
+function getResponseText(response) {
+  if (typeof response.output_text === 'string') {
+    return response.output_text;
+  }
+  return (response.output ?? [])
+    .flatMap((item) => Array.isArray(item.content) ? item.content : [])
+    .filter((part) => part.type === 'output_text' && typeof part.text === 'string')
+    .map((part) => part.text)
+    .join('');
+}
+
+// Set up Foundry Local and download the execution providers.
+const endpointUrl = 'http://localhost:5764';
+
+console.log('Initializing Foundry Local SDK...');
+const manager = FoundryLocalManager.create({
+  appName: 'foundry_local_samples',
+  logLevel: 'info',
+  webServiceUrls: endpointUrl,
+});
+console.log('SDK initialized successfully');
+
+let currentEp = '';
+await manager.downloadAndRegisterEps((epName, percent) => {
+  if (epName !== currentEp) {
+    if (currentEp !== '') process.stdout.write('\n');
+    currentEp = epName;
+  }
+  process.stdout.write(`\r ${epName.padEnd(30)} ${percent.toFixed(1).padStart(5)}%`);
+});
+if (currentEp !== '') process.stdout.write('\n');
+
+// Download and load the sample model.
+const modelAlias = 'qwen2.5-0.5b';
+const model = await manager.catalog.getModel(modelAlias);
+
+console.log(`\nDownloading model ${modelAlias}...`);
+await model.download((progress) => {
+  process.stdout.write(`\rDownloading... ${progress.toFixed(2)}%`);
+});
+console.log('\nModel downloaded');
+
+console.log('\nLoading model...');
+await model.load();
+console.log('Model loaded');
+
+// Start the local OpenAI-compatible web service.
+console.log('\nStarting web service...');
+manager.startWebService();
+console.log('Web service started');
+
+// <<<<<< OPENAI SDK USAGE >>>>>>
+// Use the OpenAI SDK to call the local Foundry web service Responses API
+const openai = new OpenAI({
+  baseURL: endpointUrl + '/v1',
+  apiKey: 'notneeded',
+});
+
+try {
+  console.log('\nTesting a non-streaming Responses call...');
+  const response = await openai.responses.create({
+    model: model.id,
+    input: 'Reply with one short sentence about local AI.',
+  });
+  console.log(`[ASSISTANT]: ${getResponseText(response)}`);
+
+  console.log('\nTesting a streaming Responses call...');
+  const stream = await openai.responses.create({
+    model: model.id,
+    input: 'Count from one to three.',
+    stream: true,
+  });
+
+  process.stdout.write('[ASSISTANT STREAM]: ');
+  for await (const event of stream) {
+    if (event.type === 'response.output_text.delta') {
+      process.stdout.write(event.delta);
+    }
+  }
+  process.stdout.write('\n');
+
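+  // Tool-calling round trip: the first call uses tool_choice 'required' and
+  // store: true, so the model emits a function_call item the server remembers.
+  // The second call then links back via previous_response_id and supplies the
+  // tool result as a function_call_output item matched by call_id.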
+  console.log('\nTesting Responses tool calling...');
+  const tools = [
+    {
+      type: 'function',
+      name: 'get_weather',
+      description: 'Get the current weather. This sample always returns Seattle weather.',
+      parameters: {
+        type: 'object',
+        properties: {},
+        additionalProperties: false,
+      },
+    },
+  ];
+
+  const toolResponse = await openai.responses.create({
+    model: model.id,
+    input: 'Use the get_weather tool and then answer with the weather.',
+    tools,
+    tool_choice: 'required',
+    store: true,
+  });
+
+  const functionCall = toolResponse.output?.find((item) => item.type === 'function_call');
+  if (!functionCall) {
+    throw new Error('Expected the model to call get_weather.');
+  }
+
+  console.log(`[TOOL CALL]: ${functionCall.name}(${functionCall.arguments})`);
+
+  const finalResponse = await openai.responses.create({
+    model: model.id,
+    previous_response_id: toolResponse.id,
+    input: [
+      {
+        type: 'function_call_output',
+        call_id: functionCall.call_id,
+        output: JSON.stringify({ location: 'Seattle', weather: '72 degrees F and sunny' }),
+      },
+    ],
+    tools,
+  });
+
+  console.log(`[ASSISTANT FINAL]: ${getResponseText(finalResponse)}`);
+  // <<<<<< END OPENAI SDK USAGE >>>>>>
+} finally {
+  // Tidy up: stop the web service and unload the model.
+  manager.stopWebService();
+  await model.unload();
+}
diff --git a/samples/js/web-server-responses/package.json b/samples/js/web-server-responses/package.json
new file mode 100644
index 000000000..6c8f2ff51
--- /dev/null
+++ b/samples/js/web-server-responses/package.json
@@ -0,0 +1,16 @@
+{
+  "name": "web-server-responses",
+  "version": "1.0.0",
+  "type": "module",
+  "main": "app.js",
+  "scripts": {
+    "start": "node app.js"
+  },
+  "dependencies": {
+    "foundry-local-sdk": "latest",
+    "openai": "latest"
+  },
+  "optionalDependencies": {
+    "foundry-local-sdk-winml": "latest"
+  }
+}
diff --git a/sdk/js/test/openai/responsesWebService.test.ts b/sdk/js/test/openai/responsesWebService.test.ts
new file mode 100644
index 000000000..2ea6e0197
--- /dev/null
+++ b/sdk/js/test/openai/responsesWebService.test.ts
@@ -0,0 +1,189 @@
+import { describe, it, before, after } from 'mocha';
+import { expect } from 'chai';
+import { getTestManager, TEST_MODEL_ALIAS, IS_RUNNING_IN_CI } from '../testUtils.js';
+import { FoundryLocalManager } from '../../src/foundryLocalManager.js';
+import type { IModel } from '../../src/imodel.js';
+
+function getOutputText(response: any): string {
+  if (typeof response.output_text === 'string') {
+    return response.output_text;
+  }
+  return (response.output ?? [])
+    .flatMap((item: any) => Array.isArray(item.content) ? item.content : [])
+    .filter((part: any) => part.type === 'output_text' && typeof part.text === 'string')
+    .map((part: any) => part.text)
+    .join('');
+}
+
+async function postResponse(baseUrl: string, body: Record<string, unknown>): Promise<any> {
+  const res = await fetch(`${baseUrl}/v1/responses`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify(body),
+  });
+  const text = await res.text();
+  expect(res.ok, text).to.equal(true);
+  return JSON.parse(text);
+}
+
+async function postStreamingResponse(baseUrl: string, body: Record<string, unknown>): Promise<any[]> {
+  const res = await fetch(`${baseUrl}/v1/responses`, {
+    method: 'POST',
+    headers: {
+      'Content-Type': 'application/json',
+      'Accept': 'text/event-stream',
+    },
+    body: JSON.stringify({ ...body, stream: true }),
+  });
+  if (!res.ok) {
+    const errorText = await res.text().catch(() => res.statusText);
+    expect.fail(errorText);
+  }
+  expect(res.body).to.not.equal(null);
+
+  const reader = res.body!.getReader();
+  const decoder = new TextDecoder();
+  const events: any[] = [];
+  let buffer = '';
+
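+  // Minimal SSE parsing: events arrive in blank-line-separated blocks, each
+  // payload sits on 'data: ' lines, and a 'data: [DONE]' sentinel (when the
+  // server sends one) ends the stream early.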
+  try {
+    while (true) {
+      const { value, done } = await reader.read();
+      if (done) break;
+      buffer += decoder.decode(value, { stream: true });
+      const blocks = buffer.split('\n\n');
+      buffer = blocks.pop() ?? '';
+      for (const block of blocks) {
+        const data = block
+          .split('\n')
+          .filter((line) => line.startsWith('data: '))
+          .map((line) => line.slice(6))
+          .join('\n')
+          .trim();
+        if (!data) continue;
+        if (data === '[DONE]') return events;
+        events.push(JSON.parse(data));
+      }
+    }
+  } finally {
+    reader.releaseLock();
+  }
+  return events;
+}
+
+describe('Responses web service Integration', function() {
+  let manager: FoundryLocalManager;
+  let model: IModel;
+  let modelId: string;
+  let baseUrl: string;
+  let skipped = false;
+
+  before(async function() {
+    this.timeout(30000);
+    if (IS_RUNNING_IN_CI) {
+      skipped = true;
+      this.skip();
+      return;
+    }
+
+    manager = getTestManager();
+    const cachedModels = await manager.catalog.getCachedModels();
+    const cachedVariant = cachedModels.find((m) => m.alias === TEST_MODEL_ALIAS);
+    if (!cachedVariant) {
+      skipped = true;
+      this.skip();
+      return;
+    }
+
+    model = await manager.catalog.getModel(TEST_MODEL_ALIAS);
+    model.selectVariant(cachedVariant);
+    modelId = cachedVariant.id;
+
+    await model.load();
+    manager.startWebService();
+    baseUrl = manager.urls[0];
+  });
+
+  after(async function() {
+    if (skipped) return;
+    try { manager.stopWebService(); } catch { /* ignore */ }
+    try { await model.unload(); } catch { /* ignore */ }
+  });
+
+  it('should create a response through the OpenAI-compatible web service', async function() {
+    this.timeout(30000);
+    const response = await postResponse(baseUrl, {
+      model: modelId,
+      input: 'What is 2 + 2? Answer with just the number.',
+      temperature: 0,
+      max_output_tokens: 64,
+      store: false,
+    });
+    expect(response.object).to.equal('response');
+    expect(response.status).to.equal('completed');
+    expect(getOutputText(response).length).to.be.greaterThan(0);
+  });
+
+  it('should stream response events through the OpenAI-compatible web service', async function() {
+    this.timeout(30000);
+    const events = await postStreamingResponse(baseUrl, {
+      model: modelId,
+      input: 'Count from 1 to 3.',
+      temperature: 0,
+      max_output_tokens: 64,
+      store: false,
+    });
+    expect(events.some((event) => event.type === 'response.created')).to.equal(true);
+    expect(events.some((event) => event.type === 'response.output_text.delta')).to.equal(true);
+    expect(events.some((event) => event.type === 'response.completed')).to.equal(true);
+  });
+
+  it('should support Responses function calling through the web service', async function() {
+    this.timeout(30000);
+    const tools = [{
+      type: 'function',
+      name: 'get_weather',
+      description: 'Get the current weather. This test always returns Seattle weather.',
+      parameters: {
+        type: 'object',
+        properties: {},
+        additionalProperties: false,
+      },
+    }];
+
+    const toolResponse = await postResponse(baseUrl, {
+      model: modelId,
+      input: 'Use the get_weather tool and then answer with the weather.',
+      tools,
+      tool_choice: 'required',
+      temperature: 0,
+      max_output_tokens: 64,
+      store: true,
+    });
+
+    const functionCall = toolResponse.output?.find((item: any) => item.type === 'function_call');
+    expect(functionCall, JSON.stringify(toolResponse.output)).to.not.equal(undefined);
+    expect(functionCall.name).to.equal('get_weather');
+    expect(functionCall.call_id).to.be.a('string');
+
+    const finalResponse = await postResponse(baseUrl, {
+      model: modelId,
+      previous_response_id: toolResponse.id,
+      input: [{
+        type: 'function_call_output',
+        call_id: functionCall.call_id,
+        output: JSON.stringify({ location: 'Seattle', weather: '72 degrees F and sunny' }),
+      }],
+      tools,
+      temperature: 0,
+      max_output_tokens: 64,
+      store: false,
+    });
+
+    expect(finalResponse.status).to.equal('completed');
+    expect(getOutputText(finalResponse).length).to.be.greaterThan(0);
+  });
+});