[ISSUE] requests.post() and requests.get() calls in oauth.py have no timeout, can hang indefinitely

## Description

`requests.post()` and `requests.get()` calls in `databricks/sdk/oauth.py` do not pass a `timeout=` parameter. When the OAuth endpoint is unreachable or slow at the moment of token refresh, these calls block indefinitely. The SDK's per-request timeout (`_BaseClient._http_timeout_seconds`) does not protect against this because the token refresh runs inside `session.auth` (the `header_factory` callback), which executes *before* the request timeout takes effect.

Three call sites are affected:

| Location | Line | Call |
|---|---|---|
| `retrieve_token()` | 208 | `requests.post(token_url, params, auth=auth, headers=headers)` |
| `get_azure_entra_id_workspace_endpoints()` | 521 | `requests.get(f"{host}/oidc/oauth2/v2.0/authorize", allow_redirects=False)` |
| `PATOAuthTokenExchange.refresh()` | 889 | `requests.post(token_exchange_url, params)` |

This is related to but distinct from #1046, which is about the `_BaseClient.do()` retry timeout for OIDC endpoint discovery. The calls listed above bypass `_BaseClient` entirely and use the `requests` library directly with no timeout.

## Actual behavior

`requests.post()` and `requests.get()` in `oauth.py` block indefinitely when the remote endpoint is unreachable or slow, because no `timeout=` parameter is passed. The SDK's per-request timeout (`session.request(timeout=60)`) does not help because the token refresh runs inside `session.auth`, before the timeout takes effect.

## Expected behavior

`requests.post()` and `requests.get()` calls in `oauth.py` should include a `timeout=` parameter so that unreachable endpoints cause a timeout exception rather than an indefinite hang.

### Impact

We run an MLflow evaluation pipeline on Databricks that makes hundreds of API calls over ~60 minutes. MLflow's `get_workspace_client()` caches the `WorkspaceClient` via `@lru_cache`, so the same credential provider persists for the process lifetime. The M2M OAuth token (TTL = 3600s, server-dictated) expires at ~59m20s (accounting for the SDK's 40s early-expiry buffer in `Token.expired`). When the next API call triggers a synchronous token refresh and the OAuth endpoint is slow at that moment, `requests.post()` in `retrieve_token()` blocks indefinitely, hanging the entire CI pipeline until the job-level timeout kills it.
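The ~59m20s figure follows from the 3600s TTL minus the 40s early-expiry buffer. The arithmetic is sketched below; `is_expired` is a hypothetical stand-in for the check in `Token.expired`, not the SDK's code, and the issue timestamp is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Values taken from the issue text: server-dictated TTL and the SDK's buffer.
TOKEN_TTL = timedelta(seconds=3600)
EARLY_EXPIRY_BUFFER = timedelta(seconds=40)


def is_expired(expiry: datetime, now: datetime) -> bool:
    """Hypothetical stand-in for Token.expired: expired once inside the buffer."""
    return now >= expiry - EARLY_EXPIRY_BUFFER


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)  # arbitrary illustrative time
expiry = issued + TOKEN_TTL
refresh_due = expiry - EARLY_EXPIRY_BUFFER

print(refresh_due - issued)  # 0:59:20
```

Any API call made after that point triggers the synchronous refresh, which is where the missing `timeout=` matters.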

### Network failure modes that block forever without `timeout=`

| Failure mode | Phase that blocks | Why it blocks forever |
|---|---|---|
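| Firewall DROP (SYN, no reply) | `connect()` | TCP retransmits SYN until the OS gives up (typically minutes, depending on the OS and settings such as Linux `tcp_syn_retries`), so the call stalls far longer than any reasonable request budget |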
| Server stall (connected, no response) | `recv()` | Connection established, request sent, server never sends response data |
| Proxy / load balancer stall | `recv()` | Backend unavailable but frontend holds connection open |
| TLS negotiation stall | `ssl.do_handshake()` | TCP connected but peer never completes TLS handshake |

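All four are bounded by adding `timeout=` to `requests.post()`/`requests.get()`. Note that `requests` applies the value to the connect phase and to each wait for response data, not as a deadline on the whole exchange.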
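The "server stall" row can be illustrated at the socket level with the standard library alone. This is a minimal sketch, not the attached reproduction script and not SDK code: the helper names `start_stall_server` and `read_with_timeout` and the `/token` path are made up for illustration. A local server accepts the connection and never replies; with a socket timeout set, the blocked `recv()` raises instead of waiting indefinitely.

```python
import socket
import threading
import time


def start_stall_server():
    """Listen on an ephemeral localhost port, accept connections, never reply."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    held = []  # keep accepted sockets referenced so the client sees no FIN/RST

    def accept_loop():
        while True:
            conn, _ = srv.accept()
            held.append(conn)

    threading.Thread(target=accept_loop, daemon=True).start()
    return srv, held


def read_with_timeout(port, timeout):
    """Connect, send a request, then wait for a reply for at most `timeout` seconds."""
    start = time.monotonic()
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as sock:
        sock.sendall(b"POST /token HTTP/1.1\r\nHost: localhost\r\nContent-Length: 0\r\n\r\n")
        try:
            sock.recv(1024)
            outcome = "reply-or-eof"
        except socket.timeout:
            # The server is connected but silent; the timeout turns the stall into an error.
            outcome = "timeout"
    return outcome, time.monotonic() - start


srv, _held = start_stall_server()
outcome, elapsed = read_with_timeout(srv.getsockname()[1], timeout=0.5)
print(outcome, f"{elapsed:.1f}s")
```

Without the `timeout=` argument to `socket.create_connection()`, the same `recv()` would block until the process is killed, which is the behavior `requests` inherits when no timeout is passed.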

### Network failure modes that fail fast (with or without `timeout=`)

| Failure mode | Why it fails fast |
|---|---|
| DNS failure | `getaddrinfo()` returns error immediately |
| Host unreachable (ICMP) | OS receives ICMP unreachable, `connect()` returns error |
| Port closed (RST) | Server sends TCP RST, `connect()` returns `ConnectionRefused` |
| Server crash after accept | OS sends RST or FIN, `recv()` returns error or EOF |

## Reproduction

The script below reproduces the "server stall" row from the first table above: a local TCP server accepts the connection but never responds, causing `requests.post()` to block on `recv()` indefinitely. The `timeout=` parameter protects against all four blocking failure modes in that table, since it covers both the connect and read phases.

No Databricks credentials are needed: the script demonstrates the defect in `requests.post()` as called by the SDK, not a specific production failure. See the attached `reproduce_hang.py` and run it with `pip install requests && python reproduce_hang.py` (~30 seconds).

[reproduce_hang.py](https://github.com/user-attachments/files/26104236/reproduce_hang.py)

Expected output:
```
CONFIRMED: requests.post() with no timeout blocked for >10s (killed)
CONFIRMED: session.get(timeout=60) does NOT protect auth callback
CONFIRMED: requests.post(timeout=5) raises ReadTimeout after 5.1s

All three tests pass. Fix: add timeout= to requests.post()/get() in oauth.py.
```

Code path of the hang:

```
client.current_user.me()
→ _BaseClient._perform() → session.request(timeout=60)
→ session.auth → Config.authenticate() → credential_provider.token()
→ Refreshable._blocking_token() [token is EXPIRED]
→ ClientCredentials.refresh() → retrieve_token()
→ requests.post(token_url, params, auth=auth, headers=headers)
   ↑ NO timeout= PARAMETER — BLOCKS FOREVER
```

## Suggested fix

Pass an explicit `timeout=` at each of the three call sites:

```python
# retrieve_token(), line 208:
resp = requests.post(token_url, params, auth=auth, headers=headers, timeout=30)

# get_azure_entra_id_workspace_endpoints(), line 521:
res = requests.get(f"{host}/oidc/oauth2/v2.0/authorize", allow_redirects=False, timeout=30)

# PATOAuthTokenExchange.refresh(), line 889:
resp = requests.post(token_exchange_url, params, timeout=30)
```

Ideally the timeout value would come from `Config.http_timeout_seconds` (defaulting to the SDK's standard 60s) rather than a hardcoded value.
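One possible shape for the configurable variant is sketched below. It assumes the value is read from `Config.http_timeout_seconds` and falls back to 60s as described above; the helper name `oauth_timeout` and the 10s connect cap are illustrative assumptions, not existing SDK API. It returns the `(connect, read)` tuple form that `requests` accepts for `timeout=`.

```python
from typing import Optional, Tuple

DEFAULT_HTTP_TIMEOUT_SECONDS = 60.0  # SDK default cited in this issue
MAX_CONNECT_TIMEOUT_SECONDS = 10.0   # assumption: illustrative cap, not an SDK setting


def oauth_timeout(http_timeout_seconds: Optional[float] = None) -> Tuple[float, float]:
    """Hypothetical helper: build a (connect, read) timeout tuple for requests."""
    if http_timeout_seconds is None:
        read = DEFAULT_HTTP_TIMEOUT_SECONDS
    else:
        read = float(http_timeout_seconds)
    if read <= 0:
        raise ValueError("timeout must be positive")
    # Fail fast on unreachable hosts while still allowing a slower response.
    connect = min(read, MAX_CONNECT_TIMEOUT_SECONDS)
    return (connect, read)


# Intended use at a call site (cfg is a hypothetical Config instance):
#   requests.post(token_url, params, auth=auth, headers=headers,
#                 timeout=oauth_timeout(cfg.http_timeout_seconds))
print(oauth_timeout())   # (10.0, 60.0)
print(oauth_timeout(5))  # (5.0, 5.0)
```

A separate connect cap is a design choice rather than a requirement; passing a single number to `timeout=` applies the same limit to both phases.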


## Is it a regression?

No — these calls have never had a `timeout` parameter.

## Debug Logs

Not applicable — the process hangs with no output. A `faulthandler` stack dump shows the thread blocked in `ssl.read` → `urllib3` → `requests.post` inside `retrieve_token()`.

## Other Information
 - OS: macOS (Darwin 24.6.0)
 - Version: 0.96.0
 - Python: 3.11.13

## Additional context

Related: #1046 (non-configurable timeout for `_BaseClient.do()` in OIDC endpoint discovery — same family of issue, different code path). PR #1085 addresses #1046 but does not cover the `requests.post()`/`requests.get()` calls listed above.
