[pull] master from ray-project:master#3975

Merged
pull[bot] merged 6 commits into miqdigital:master from ray-project:master
Mar 16, 2026
Conversation

@pull pull bot commented Mar 16, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


elliot-barn and others added 6 commits March 16, 2026 09:46
updating lock file for ci py3.10 deps

Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
…61708)

The compile_pip_requirements rule used the autodetecting Python toolchain, which resolved to the system python.

Fix by inlining the compile_pip_requirements logic as a py_binary + py_test pair with exec_compatible_with = ["//bazel:py310"], which forces Bazel to select the hermetic Python 3.10 toolchain already registered in WORKSPACE.

Topic: fix-requirements-update
Signed-off-by: andrew <andrew@anyscale.com>

Signed-off-by: andrew <andrew@anyscale.com>
## Summary
- Add client IP:port to Ray Serve HTTP access logs
- Thread client address from the proxy through the request context and
metadata to the replica
- Handle both proxy-routed and direct ingress HTTP paths

For services behind a load balancer, uvicorn's `ProxyHeadersMiddleware`
(enabled by default) resolves `X-Forwarded-For` into `scope["client"]`
automatically, so the logged IP reflects the original client when
`FORWARDED_ALLOW_IPS` is configured.

## How It Works

The client IP is available at the entry point (proxy or direct ingress
replica) but needs to reach the replica's access log, which runs in a
separate process. The data flows through existing infrastructure:

```
External Client (10.0.91.46:54321)
       |
   [ Proxy ]
       |  1. Reads scope["client"] via proxy_request.client
       |  2. format_client_address() formats the raw tuple into "host:port"
       |  3. Logs it in the proxy access log
       |  4. Passes it into _RequestContext._client
       |
   [ DeploymentHandle ]
       |  5. default_impl.py copies _RequestContext._client → RequestMetadata._client
       |
   [ Replica ]
       |  6. Reads request_metadata._client and logs it in the replica access log
```

For **direct ingress HTTP** (replica serves HTTP directly, no proxy),
the replica reads `scope["client"]` itself and formats it with the same
`format_client_address()`.
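Both paths rely on the same formatting helper. A minimal sketch of what `format_client_address()` might do, assuming it receives the ASGI `scope["client"]` tuple (the function name is from the PR; the body is an assumption):

```python
from typing import Optional, Tuple


def format_client_address(client: Optional[Tuple[str, int]]) -> str:
    """Format an ASGI client tuple like ("10.0.91.46", 54321) as "host:port".

    The ASGI server sets scope["client"] to a (host, port) tuple, or None
    when the transport has no peer address.
    """
    if client is None:
        return "unknown"
    host, port = client
    return f"{host}:{port}"
```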

---

## Update: Feature flag gating

Per review feedback, the client IP logging is now gated behind a feature
flag that is **off by default**:

```
RAY_SERVE_LOG_CLIENT_ADDRESS=1
```

The gate is centralized in `access_log_msg()` in `logging_utils.py` —
when the flag is off, the `client` parameter is ignored and the log
format is unchanged from before this PR. The client address data still
flows through the request context, but is simply not rendered in logs
unless the flag is enabled.
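The gating described above can be sketched as follows. The function name `access_log_msg` and the flag name come from the PR; the log format and signature here are invented for illustration:

```python
import os

# Feature flag named in the PR; off by default.
_FLAG = "RAY_SERVE_LOG_CLIENT_ADDRESS"


def access_log_msg(method: str, route: str, status: str,
                   latency_ms: float, client: str = None) -> str:
    """Build an access-log line; render `client` only when the flag is on."""
    if client and os.environ.get(_FLAG, "0") == "1":
        return f"{client} {method} {route} {status} {latency_ms:.1f}ms"
    # Flag off: the format is unchanged from before the change,
    # even though the client address still flowed through the context.
    return f"{method} {route} {status} {latency_ms:.1f}ms"
```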

**Tests:** Added a parametrized integration test
(`test_http_access_log_client_address`) that verifies both flag states —
client IP present when on, absent when off.

---------

Signed-off-by: harshit <harshit@anyscale.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Bazel 7 removed the exec_tools attribute from genrule; patch protobuf's
BUILD files to use tools instead.

Signed-off-by: andrew <andrew@anyscale.com>
…gs for Bazel 7 (#61695)

gRPC's grpc_deps() pulls in rules_apple 1.1.3 which uses
apple_common.multi_arch_split, removed in Bazel 7. Override to 3.2.1
(compatible with Bazel 6/7/8) before grpc_deps() runs so the maybe()
call is a no-op.

rules_apple 3.2.1 requires apple_support >= 1.11.1. Patch
is_xcode_at_least_version to return False instead of fail() on
CLT-only CI machines where xcode_config.xcode_version() is None.

Set BAZEL_NO_APPLE_CPP_TOOLCHAIN=1 to skip apple_cc_toolchain on
CLT-only machines where it would fail with "Xcode version must be
specified". Override -mmacosx-version-min to 12.0 to satisfy
std::filesystem and std::variant requirements (generic toolchain
defaults to 10.11).

Signed-off-by: andrew <andrew@anyscale.com>
…Killer (#60330)

## Description
In recent investigations on memory usage issues, we found that:
* After a worker becomes IDLE when done with a task, it can still occupy a comparatively large amount of memory (~1GB)
* The current Ray OOM killer only kills worker processes that have a task scheduled on them

Ideally we should investigate the root cause of why IDLE workers still take up so much memory, then fix the memory usage issue and/or update the OOM killer's logic based on the findings.

While that investigation is ongoing, a short-term mitigation is for the OOM killer to prioritize killing IDLE workers that occupy large amounts of memory.

This PR implements the short term mitigation by:
1. Add a ray config `idle_worker_killing_memory_threshold_bytes` as the threshold above which the OOM killer should consider killing an IDLE worker. The default is 1GB. A threshold is needed to avoid killing freshly created IDLE workers from worker pre-start, since killing those workers won't help much with memory usage.
2. Update the OOM killer logic to check for and pick an IDLE worker to kill, if possible, before applying the current memory-killing logic.
3. Update the `ray_memory_manager_worker_eviction_total` metric to include a `MemoryManager.IdleWorkerEviction.Total` type to track the number of idle worker terminations
4. Add the corresponding test cases
5. Some code cleanup along the way
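The prioritization in steps 1–2 can be sketched in Python (the actual implementation lives in the raylet's C++ code; the config name is from the PR, everything else is illustrative):

```python
# Default threshold from the PR: IDLE workers below this size are not
# worth killing (e.g. freshly pre-started workers).
IDLE_WORKER_KILLING_MEMORY_THRESHOLD_BYTES = 1 * 1024**3  # 1GB


def pick_worker_to_kill(workers):
    """workers: list of dicts with keys 'idle' (bool) and 'memory_bytes' (int).

    Prefer the largest IDLE worker above the threshold; otherwise fall
    back to the pre-existing policy, represented here simply as picking
    the largest worker.
    """
    idle_big = [
        w for w in workers
        if w["idle"] and w["memory_bytes"] >= IDLE_WORKER_KILLING_MEMORY_THRESHOLD_BYTES
    ]
    if idle_big:
        return max(idle_big, key=lambda w: w["memory_bytes"])
    if workers:
        return max(workers, key=lambda w: w["memory_bytes"])
    return None
```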

## Related issues
N/A

## Additional information
Log line changes: when killing an idle worker, we output the following
log line:

```
[2026-02-27 23:33:10,289 I 3779325 3779325] (raylet) node_manager.cc:3078: Killing 1 worker(s), kill details: Memory on the node (IP: 172.31.14.189, ID: 60e4ea8d3b8fc0f4e99ed19c87bf5f9282797707af6e5babca343c7d) was 31.01GB / 62.01GB (0.500018), which exceeds the memory usage threshold of 0.500000; Object store memory usage: [- objects spillable: 0; - bytes spillable: 0; - objects unsealed: 0; - bytes unsealed: 0; - objects in use: 0; - bytes in use: 0; - objects evictable: 0; - bytes evictable: 0; ; - objects created by worker: 0; - bytes created by worker: 0; - objects restored: 0; - bytes restored: 0; - objects received: 0; - bytes received: 0; - objects errored: 0; - bytes errored: 0; ; Eviction Stats:; (global lru) capacity: 104857600; (global lru) used: 0%; (global lru) num objects: 0; (global lru) num evictions: 0; (global lru) bytes evicted: 0]; Ray killed 1 worker(s) based on the killing policy:[Worker with no lease granted: job ID=01000000, pid=3779512, required resources={CPU: 1}, actual memory used=1.18GB, worker ID=23139757f947661f6be1db7c25ee7b7ce449c21e927bdc2134d9b08e)]; To see more information about memory usage on this node, use `ray logs raylet.out -ip 172.31.14.189`; Top 10 memory users: PID     MEM(GB) COMMAND, 3779511        18.92   ray::allocate_memory, 3779512   1.18    ray::IDLE, 3779514    1.18    ray::IDLE, 3779513      1.17    ray::IDLE, 3779519      1.17    ray::IDLE, 3752505   0.95     bazel(core-1792) --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.confi..., 3753337      0.86    /home/ubuntu/.cursor-server/cli/servers/Stable-7b98dcb824ea96c9c62362a5e80dbf0d1aae4770/server/node ..., 3754326      0.82    /home/ubuntu/.cursor-server/cli/servers/Stable-7b98dcb824ea96c9c62362a5e80dbf0d1aae4770/server/node ..., 3760346      0.76    /home/ubuntu/.cursor-server/extensions/ms-vscode.cpptools-1.23.6-linux-x64/bin/cpptools-srv 3753753 ..., 3753753      0.65    
/home/ubuntu/.cursor-server/extensions/ms-vscode.cpptools-1.23.6-linux-x64/bin/cpptools, suggestions: Refer to the documentation on how to address the out of memory issue: https://docs.ray.io/en/latest/ray-core/scheduling/ray-oom-prevention.html. Consider provisioning more memory on this node or reducing task parallelism by requesting more CPUs per task. To adjust the kill threshold, set the environment variable `RAY_memory_usage_threshold` when starting Ray. To disable worker killing, set the environment variable `RAY_memory_monitor_refresh_ms` to zero.
```
Followup action items:
* Investigate and fix why workers still take up large amounts of memory
after becoming IDLE
* Based on that investigation, improve the memory killer with better
heuristics for choosing which worker processes to kill

---------

Signed-off-by: myan <myan@anyscale.com>
Signed-off-by: Mengjin Yan <mengjinyan3@gmail.com>
Co-authored-by: Ibrahim Rabbani <israbbani@gmail.com>
Co-authored-by: Kunchen (David) Dai <54918178+Kunchd@users.noreply.github.com>
@pull pull bot locked and limited conversation to collaborators Mar 16, 2026
@pull pull bot added the ⤵️ pull label Mar 16, 2026
@pull pull bot merged commit 4902739 into miqdigital:master Mar 16, 2026
