[bot] Add IBM watsonx.ai Python SDK integration for ModelInference generate, chat, and embedding instrumentation

## Summary

The IBM watsonx.ai Python SDK (`ibm-watsonx-ai`) is IBM's official client for the watsonx.ai platform, which hosts foundation models (Granite, Llama, Mistral, and others) on IBM Cloud and on-premises. The SDK provides a unique, non-OpenAI-compatible execution surface through `ModelInference.generate()`, `ModelInference.chat()`, streaming variants, and `TextEmbeddings.embed()`. This repository has zero instrumentation for any watsonx.ai SDK surface — no integration directory, no wrapper, no patcher, no `auto_instrument()` support.

Users who call `ibm-watsonx-ai` directly cannot use `wrap_openai()` or any other existing wrapper because `ModelInference` is a distinct client class with its own request/response schema. The IBM watsonx.ai API is not accessible through the Braintrust AI Proxy (which covers OpenAI-compatible endpoints). Enterprise users running watsonx.ai workloads get zero Braintrust spans today.

The SDK is actively maintained with frequent weekly releases (v1.5.12, December 2024). Comparable provider SDKs with dedicated native integrations in this repo: `anthropic`, `cohere`, `mistralai`, `google-genai`, `huggingface-hub`.

## What needs to be instrumented

The `ibm-watsonx-ai` package exposes these execution surfaces via `ModelInference`, none of which are instrumented:

### Text generation (highest priority)

| SDK Method | Description | Streaming |
|---|---|---|
| `ModelInference.generate(prompt, ...)` | Single-prompt text generation | No |
| `ModelInference.generate_stream(prompt, ...)` | Streaming text generation | `Generator` of dicts |

**Response shape:** `generate()` returns a dict with `results[0].generated_text`, `results[0].generated_token_count`, `results[0].input_token_count`, `results[0].stop_reason`, and `results[0].seed`. Token counts are directly available for span metrics.

### Chat completions

| SDK Method | Description | Streaming |
|---|---|---|
| `ModelInference.chat(messages, ...)` | Chat completions (OpenAI-message-format input) | No |
| `ModelInference.chat_stream(messages, ...)` | Streaming chat completions | `Generator` of dicts |

**Response shape:** `chat()` returns a dict with `choices[0].message.content`, `choices[0].finish_reason`, `usage.prompt_tokens`, `usage.completion_tokens`, `usage.total_tokens`. This mirrors an OpenAI-like response but comes from a `ModelInference` instance, not an OpenAI client.

### Embeddings

| SDK Method | Description |
|---|---|
| `TextEmbeddings.embed(inputs, ...)` | Generate embeddings for a list of texts |

**Return type:** dict with `results[0].embedding` (list of floats) and `results[0].input_token_count`.

## Implementation notes

**Client instantiation:** `ModelInference` takes a `model_id` string, `credentials` (API key + URL), and `project_id` or `space_id`. The `model_id` captures the foundation model used (e.g. `"ibm/granite-13b-chat-v2"`, `"meta-llama/llama-3-70b-instruct"`).

**Auth:** Uses IBM Cloud IAM tokens or API keys (not SigV4). VCR cassettes will need IBM IAM auth header sanitization.

**No async client:** The standard `ibm-watsonx-ai` library is synchronous. Async support may be added in a follow-up.

**Parameters relevant for span metadata:** `model_id`, `params` (contains `max_new_tokens`, `temperature`, `top_p`, `top_k`, `repetition_penalty`, `stop_sequences`, `decoding_method`).

## Proposed span shape

### `generate()` / `generate_stream()`

| Span field | Content |
|---|---|
| **input** | `prompt` |
| **output** | `generated_text` from first result |
| **metadata** | `provider: "ibm_watsonx"`, `model` (from `model_id`), generation params |
| **metrics** | `tokens`, `prompt_tokens`, `completion_tokens` |

### `chat()` / `chat_stream()`

| Span field | Content |
|---|---|
| **input** | `messages` |
| **output** | `choices[0].message.content` |
| **metadata** | `provider: "ibm_watsonx"`, `model` (from `model_id`), generation params |
| **metrics** | `tokens`, `prompt_tokens`, `completion_tokens` |

## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/watsonx/`)
- No wrapper function (e.g. `wrap_watsonx()`)
- No patcher in any existing integration
- No nox test session (`test_watsonx`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`

A grep for `watsonx`, `ibm_watsonx`, or `ibm-watsonx` across `py/src/braintrust/` returns zero matches.

## Braintrust docs status

`not_found` — IBM watsonx.ai is not listed on the [Braintrust AI providers page](https://www.braintrust.dev/docs/integrations/ai-providers) or the [tracing guide](https://www.braintrust.dev/docs/guides/tracing). A direct docs page (`/docs/integrations/ai-providers/watsonx`) returns 404. There is no proxy path documented for watsonx.ai (which requires IBM Cloud IAM auth, not an OpenAI-compatible endpoint).

## Upstream references

- ibm-watsonx-ai on PyPI: https://pypi.org/project/ibm-watsonx-ai/ (v1.5.12, December 2024)
- IBM watsonx.ai Python SDK docs: https://ibm.github.io/watsonx-ai-python-sdk/
- ModelInference API reference: https://ibm.github.io/watsonx-ai-python-sdk/fm_model_inference.html
- TextEmbeddings API reference: https://ibm.github.io/watsonx-ai-python-sdk/fm_embeddings.html
- watsonx.ai foundation models: https://www.ibm.com/products/watsonx-ai/foundation-models

## Local repo files inspected

- `py/src/braintrust/integrations/` — no `watsonx/` directory on `main`
- `py/src/braintrust/wrappers/` — no watsonx wrapper
- `py/noxfile.py` — no `test_watsonx` session
- `py/pyproject.toml` `[tool.braintrust.matrix]` — no watsonx entry
- `py/src/braintrust/integrations/__init__.py` — watsonx not listed
- `py/src/braintrust/integrations/versioning.py` — no watsonx version matrix
- Full repo grep for `watsonx`, `ibm_watsonx`, `ibm-watsonx` — zero matches in SDK source

Span field	Content
input	`prompt`
output	`generated_text` from first result
metadata	`provider: "ibm_watsonx"`, `model` (from `model_id`), generation params
metrics	`tokens`, `prompt_tokens`, `completion_tokens`

Span field	Content
input	`messages`
output	`choices[0].message.content`
metadata	`provider: "ibm_watsonx"`, `model` (from `model_id`), generation params
metrics	`tokens`, `prompt_tokens`, `completion_tokens`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[bot] Add IBM watsonx.ai Python SDK integration for ModelInference generate, chat, and embedding instrumentation #480

Summary

What needs to be instrumented

Text generation (highest priority)

Chat completions

Embeddings

Implementation notes

Proposed span shape

`generate()` / `generate_stream()`

`chat()` / `chat_stream()`

No coverage in any instrumentation layer

Braintrust docs status

Upstream references

Local repo files inspected

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

SDK Method	Description	Streaming
`ModelInference.generate(prompt, ...)`	Single-prompt text generation	No
`ModelInference.generate_stream(prompt, ...)`	Streaming text generation	`Generator` of dicts

SDK Method	Description	Streaming
`ModelInference.chat(messages, ...)`	Chat completions (OpenAI-message-format input)	No
`ModelInference.chat_stream(messages, ...)`	Streaming chat completions	`Generator` of dicts

[bot] Add IBM watsonx.ai Python SDK integration for ModelInference generate, chat, and embedding instrumentation #480

Description

Summary

What needs to be instrumented

Text generation (highest priority)

Chat completions

Embeddings

Implementation notes

Proposed span shape

generate() / generate_stream()

chat() / chat_stream()

No coverage in any instrumentation layer

Braintrust docs status

Upstream references

Local repo files inspected

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`generate()` / `generate_stream()`

`chat()` / `chat_stream()`