[Core] Publish platform events via Ray Event Recorder#63329
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the PlatformEventBuilder class to support infrastructure platform events, such as those from Kubernetes, and integrates it into the Ray dashboard's observability module. The changes include initializing the EventRecorder in the dashboard head and emitting events during processing callbacks. Feedback suggests allowing the EventRecorder to generate unique IDs for event updates to avoid deduplication issues, refactoring environment variable checks for efficiency, and moving imports out of the event processing hot path.
| if os.environ.get("RAY_ENABLE_PYTHON_RAY_EVENT", "False").lower() in ( | ||
| "true", | ||
| "1", | ||
| ): | ||
| try: | ||
| from ray._common.observability.platform_events import ( | ||
| PlatformEventBuilder, | ||
| ) | ||
| from ray._raylet import EventRecorder |
There was a problem hiding this comment.
8718785 to
b35d418
Compare
b35d418 to
bac7c35
Compare
bac7c35 to
271db30
Compare
a8a923c to
1e59f09
Compare
|
@sampan-s-nayak could you please help take a pass at this PR whenever possible for you? Thanks! |
| @@ -69,32 +103,56 @@ def thread_safe_callback(ray_event: RayEvent): | |||
| ) | |||
|
|
|||
| def _process_event_callback(self, ray_event: RayEvent): | |||
| """Callback running in the main asyncio loop to cache events.""" | |||
| """Thread-safe entry point that dispatches event caching to the main asyncio loop.""" | |||
|
@sampan-s-nayak PTAL |
64e7752 to
07e69ed
Compare
8a3ecd9 to
61c73d8
Compare
38f99f7 to
7286bd3
Compare
|
@sampan-s-nayak I think all comments are now addressed, any other concerns or feedback you have for this? |
sampan-s-nayak
left a comment
There was a problem hiding this comment.
thanks for working on this, LGTM
|
@richabanker looks like there are a couple of CI failures |
ccd1cc8 to
961765a
Compare
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit 961765aa39c29585ca1a20a9ca786f02368da2bb. Configure here.
| ) | ||
| EventRecorder.emit(cython_event) | ||
| except Exception as e: | ||
| logger.warning(f"Failed to emit platform event via EventRecorder: {e}") |
There was a problem hiding this comment.
Missing initialization guard causes repeated emit failures per event
Low Severity
If EventRecorder.initialize() is never called (because head_node_id_hex is None or the agent address lookup returns None), _process_event_internal still attempts EventRecorder.emit() on every single incoming event. Each call hits the except branch and logs a warning. In a busy Kubernetes cluster this could produce hundreds of warning-level log messages per minute with no mechanism to stop retrying.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 961765aa39c29585ca1a20a9ca786f02368da2bb. Configure here.
Signed-off-by: Richa Banker <richabanker@google.com>
Signed-off-by: Richa Banker <richabanker@google.com>
Signed-off-by: Richa Banker <richabanker@google.com>
e9beaf2 to
5936b11
Compare
Fixed them now, thanks! |


Description
Add support for publishing Platform events via the python ray event exporter framework