Skip to content

Run plot comms on the R thread#1100

Open
lionel- wants to merge 16 commits intomainfrom
task/sync-plot-comm
Open

Run plot comms on the R thread#1100
lionel- wants to merge 16 commits intomainfrom
task/sync-plot-comm

Conversation

@lionel-
Copy link
Copy Markdown
Contributor

@lionel- lionel- commented Mar 10, 2026

Progress towards #689
Branched from #1099
Addresses posit-dev/positron#12825 (planning to close)

I did not foresee all the simplifications falling out of this one!

  • Plots no longer live in their own threads, they live on the R thread. That frees up memory since each background thread takes about 2mb of stack space (depending on platform), so 20 open plots would take 40mb of memory.

  • DeviceContext moves from a standalone thread_local! into a regular field of Console. All access goes through Console::get().device_context(). The CommHandlerContext provides access to Console via ctx.console(), so plot handlers reach the device context through a controlled context chain.

  • The old idle-time polling loop (process_rpc_requests with crossbeam::Select) is removed. Plot RPCs are now dispatched synchronously through the shell handler on the R thread. This improves plot latency when pre-renderings require adjustments because we no longer timeout-poll for plot updates (related to Performance for plotting positron#5184).

  • The GraphicsDeviceNotification async channel is removed. Since the UI comm now runs on the R thread (from Run UI comm on the R thread #1099), prerender settings are updated synchronously via Console::get().device_context().set_prerender_settings(). This removes quite a bit of plumbing.

  • Fixes a dev.hold() regression I introduced with pre-renderings (Send pre-renderings of new plots to the frontend #775). We were previously sending pre-renderings unconditionally, now they are held until dev.flush(). This is tested by new integration tests.

@lionel- lionel- force-pushed the task/sync-plot-comm branch 2 times, most recently from 34baa70 to 845b4a7 Compare March 11, 2026 07:08
@lionel- lionel- requested a review from DavisVaughan March 11, 2026 13:47
Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have paused on review until we can get these regressions resolved

Comment thread crates/ark/tests/plots.rs Outdated
Comment thread crates/ark/src/console/console_repl.rs Outdated
Comment thread crates/ark/src/console/console_repl.rs Outdated
dap: Arc<Mutex<Dap>>,
session_mode: SessionMode,
) -> Self {
let device_context = DeviceContext::new(iopub_tx.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me to remove iopub_tx from DeviceContext now and only pull iopub_tx directly from Console

Over in graphics_device.rs, you can just Console::get().iopub_tx() anytime you need it now!

It feels like this better scopes responsibilities to me

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel the opposite. It hides that the device context directly communicates with the frontend via IOPub.

Also ideally we should avoid Console::get() calls as these bypass Rust's ownership model. But then the issue is that the device belongs to console and not the other way around. Cloning the IOPub channel is the right ownership structure from that point of view.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hides that the device context directly communicates with the frontend via IOPub

Hmm I see what you mean

Like the DeviceContext struct should have an "include what you use" kind of policy in the struct fields, and by not having iopub_tx in there, we mask what it uses

Comment on lines +128 to +130
Console::get()
.device_context()
.set_prerender_settings(params.settings);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we could now make DeviceContext have non-Cell internals?

I am disturbed by being able to call get() to then perform a mutable update, rather than being forced to go through get_mut()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cell is actually more reassuring, especially regarding the possibility of reentrancy (flushing calls process_changes which can recurse into graphics device callbacks).

In general get() is not a signal that the state is not mutable, it's only a signal that the state is shared, and can still be mutable e.g. through Cells or other mechanisms.

It's all a bit hand wavy considering that our callbacks might produce multiple active get_mut() at the same time, but adding to the unsoundness would be... unsound ;)

Comment on lines -503 to -513
// Don't try to render a plot if we're currently drawing.
if self.is_drawing.get() {
log::trace!("Refusing to render due to `is_drawing`");
return;
}

// Don't try to render a plot if someone is asking us not to, i.e. `dev.hold()`
if !self.should_render.get() {
log::trace!("Refusing to render due to `should_render`");
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure these are important

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we now process the requests when R is idle, is_drawing is structurally false, we're never mid-draw.

Regarding should_render (matching dev.hold()), it's true that holds can span across prompts, I can see that happening when developing a plot function interactively. This would require:

  • Creating a plot that Positron knows about (opened comm)
  • Calling dev.hold() in the console
  • Resizing the plot in the frontend UI

So that's a very narrow case. We show the uncommitted visual changes in that case, which seems like a reasonable compromise. The alternative would be to refuse to render, producing a distorted plot in the frontend after resize.

Comment thread crates/ark/src/plots/graphics_device.rs
Comment thread crates/ark/src/plots/graphics_device.rs
Comment thread crates/ark/src/plots/graphics_device.rs
Comment thread crates/ark/src/plots/graphics_device.rs
@lionel- lionel- force-pushed the task/sync-ui-comm branch 3 times, most recently from bd54059 to be02717 Compare March 20, 2026 06:19
Base automatically changed from task/sync-ui-comm to main March 20, 2026 06:50
@lionel- lionel- force-pushed the task/sync-plot-comm branch 2 times, most recently from 7386726 to be3d85c Compare March 31, 2026 13:24
@lionel- lionel- force-pushed the task/sync-plot-comm branch from be3d85c to 74abf66 Compare April 3, 2026 09:00
@lionel- lionel- requested a review from DavisVaughan April 3, 2026 12:13
@lionel-
Copy link
Copy Markdown
Contributor Author

lionel- commented Apr 3, 2026

I also fixed a race between comm_open and subsequent comm_msg, the tests revealed it was possible to get the latter before the former. That's because the Shell thread is in charge of sending comm_open on IOPub, so we had a triangular configuration where the R thread and the Shell thread were racing to send their messages on IOPub.

To fix this, I made comm opening blocking. There is a new CommEvent::Barrier variant to support this. On the Shell side, the shell thread now uses a select loop that drains comm events while a request is in flight.

A nice consequence of this is that the frontend now receives new plots as soon as they are created, rather than at the end of a loop:

for (i in 1:3) {
  plot(i)
  Sys.sleep(1)
}

I also noticed that posit-dev/positron#6736 is broken again.
I've opened posit-dev/positron#12825 for this.

And of course this is all now under test coverage.

Copy link
Copy Markdown
Contributor

@DavisVaughan DavisVaughan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two main comments to discuss, other things are minor:

  • Not sure about passing console through handle_msg and friends
  • Not sure about CommEvent::Barrier vs passing done through CommEvent::Opened

See below for more on both

My test file seems to be working fine now

Comment on lines +400 to +408
let ready = sel.ready();

while let Ok(event) = self.comm_event_rx.try_recv() {
self.process_comm_event(event);
}

if ready == resp_idx {
break response_rx.recv().unwrap();
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even if sel.ready() returns the execute response as the first "ready" thing, you want to go off and process the comm_events first?

I think I was expecting to see

if ready == resp_idx {
    break response_rx.recv().unwrap();
}

right after let ready =

Comment on lines +74 to +75
// Block until Shell has processed the Opened event, ensuring the
// `comm_open` message is on IOPub before we return. Any updates
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a gut check, but is this really what it ensures? In CommEvent::Opened we do an iopub_tx.send() call, but that doesn't necessarily mean that IOPub has received it by the end of CommEvent::Opened and the following CommEvent::Barrier event is processed.

I'm not sure if that matters.

dap: Arc<Mutex<Dap>>,
session_mode: SessionMode,
) -> Self {
let device_context = DeviceContext::new(iopub_tx.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It hides that the device context directly communicates with the frontend via IOPub

Hmm I see what you mean

Like the DeviceContext struct should have an "include what you use" kind of policy in the struct fields, and by not having iopub_tx in there, we mask what it uses

Comment on lines +686 to +687
#[cfg(feature = "testing")]
pub fn test_init() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put this up with start() and call it test_start()?

}

fn handle_msg(&mut self, msg: CommMsg, ctx: &CommHandlerContext) {
fn handle_msg(&mut self, msg: CommMsg, ctx: &CommHandlerContext, _console: &Console) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I agree with passing console through all these handlers

It seems like the only current use that I can see is in impl CommHandler for PlotComm for handle_msg() and handle_close().

But those could just do Console::get().

It doesn't seem like it helps clarify ownership very much, because so many other things in graphics_device.rs also just do Console::get() already. And it seems likely that in handle_rpc() (called by handle_msg()) we could very easily just do another Console::get() rather than passing through the &Console we got from handle_msg(). It seems complex to me to know when to use which pattern.

It also seems like this _console: &Console argument is the reason for all of the new test_init() infra, and overall this just seems like it adds quite a bit of complexity for (to me) not a lot of obvious benefit 😢

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example, should on_did_execute_request() gain a console: &Console argument and thread it through process_changes -> process_new_plot -> should_use_dynamic_plots to the Console::get call there?

I don't think so, I feel like it makes sense that we just invoke Console::get() from there as required like we do everywhere else.

But now introducing this new pattern of handle_msg() and friends having a console passed through confuses me and makes me question how we should do things in all the other cases, and I'm not sure that's worth it

/// but we can still render this intermediate state and show it to the user until
/// they add more plots or advance to another new page.
#[tracing::instrument(level = "trace", skip_all)]
pub(crate) fn on_did_execute_request() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on_did_execute_request() should perhaps be a &self method

It is only called in handle_active_request(), but we have Console there as self, hence we also have DeviceContext, so we can just call self.device_context().on_did_execute_request() from there?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I am also a bit weirded out by the ownership implications of all this (both the current state of things and my proposal)

self.device_context().on_did_execute_request() would call DeviceContext::process_changes(), but that goes through process_new_plot() and should_use_dynamic_plots() and you end up accessing the "parent" owner of DeviceContext from there via Console::get(), which feels very mind boggling!

// Save our new socket.
// Refcell Safety: Short borrows in the file.
self.sockets.borrow_mut().insert(id.clone(), socket);
match Console::get_mut().comm_open_backend(PLOT_COMM_NAME, Box::new(plot_comm)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is another place where it feeeels weird that DeviceContext is a field of Console, yet refers to Console as well

Comment thread crates/ark/src/console.rs
}

pub(crate) struct Console {
pub struct Console {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Comment thread crates/ark/tests/plots.rs
@@ -699,3 +699,481 @@ fn test_plot_default_size_without_metadata() {
frontend.recv_iopub_idle();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random note that a cargo build --release is throwing this for me right now

warning: unused import: `libr::SEXP`
  --> crates/ark/src/r_task.rs:19:5
   |
19 | use libr::SEXP;
   |     ^^^^^^^^^^

I think this type is only used in debug builds?

Comment thread crates/ark/tests/plots.rs
frontend.recv_shell_execute_reply();
}

/// Test that `dev.hold()` suppresses intermediate plot output.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like plot(lm(disp ~ drat, data = mtcars)) is working again, but I dont see a backend test for it?

@lionel-
Copy link
Copy Markdown
Contributor Author

lionel- commented Apr 3, 2026

Ok let's discuss Console access by comms in our next sync

@lionel- lionel- force-pushed the task/sync-plot-comm branch from 49a6715 to 9a0725f Compare April 17, 2026 12:16
@lionel-
Copy link
Copy Markdown
Contributor Author

lionel- commented Apr 17, 2026

Edit: Moved to #1161

Updates following our sync (progress towards #1145).

Console::get() pushed to the FFI boundary

The graphics device callbacks are plain C function pointers - there's no way to pass Rust state through them. Console::get() is unavoidable there. But everything past that boundary now receives &Console explicitly. The C callbacks are thin shims: capture Console::get() once, pass it inward.

DeviceContext no longer reaches back into Console

Previously DeviceContext (a field of Console) called Console::get() internally for should_use_dynamic_plots, capture_execution_context, and comm_open_backend. Now:

  • session_mode is stored on DeviceContext directly (it's static, set once at startup)
  • capture_execution_context returns a default when no context was pushed, removing the Console fallback entirely
  • comm_open_backend takes &self (via RefCell on comms), so DeviceContext calls it through the &Console parameter it already has

PlotComm owns what it needs

PlotComm is the only CommHandler that used the console: &Console parameter (for device_context()). Instead of threading Console through the entire CommHandler trait, PlotComm holds an Rc<DeviceContext>. The console parameter is gone from the trait entirely. Other handlers that need Console access (RDataExplorer opening sub-explorers) call Console::get() themselves - they're at a boundary where that's appropriate.

Comm dispatch is &self throughout

All methods in console_comm.rs take &self. Handler dispatch uses a take/remove pattern: the comm is temporarily removed from its RefCell storage before calling the handler, so no borrow guard is held during the call. This prevents panics if the handler reenters the RefCell (e.g. RDataExplorer calling comm_open_backend). The pattern is encapsulated in helpers (with_ui_comm_mut, with_comm_mut, etc.).

Shell deadlock fix

comm_open_backend blocks on a barrier in CommEvent::Opened to guarantee IOPub ordering. Previously, if a handler called comm_open_backend during comm_msg dispatch, Shell was blocked waiting for the handler to finish and couldn't process the barrier. Now handle_comm_msg, handle_comm_close, and handle_comm_open return a completion receiver instead of blocking. Shell select-loops on it, draining comm events while the handler runs - same pattern that handle_execute_request already used. handle_comm_open is also sync now (was async behind block_on).

Changelog

  • Explicit &Console on DeviceContext methods**: process_changes, hooks, and their callees receive console: &Console as a parameter instead of calling Console::get() internally. C callbacks and #[harp::register] functions are the only place Console::get() appears.

  • on_execute_request / on_did_execute_request moved to Console methods in a new console_graphics.rs file, replacing free functions that called Console::get().

  • DeviceContext wrapped in Rc: Console holds Rc<DeviceContext>. PlotComm gets its own Rc<DeviceContext> at construction, giving it direct access without needing &Console.

  • console: &Console removed from CommHandler trait: No handler receives &Console anymore. PlotComm uses its Rc<DeviceContext>. Other handlers that need Console access call Console::get() themselves.

  • session_mode stored on DeviceContext: Passed at construction time instead of queried from Console on every should_use_dynamic_plots call.

  • capture_execution_context simplified: The Console fallback (console.get_execution_context()) removed. Returns ExecutionContext::default() when no context was pushed. get_execution_context method on Console removed as dead code.

  • comms field wrapped in RefCell: Enables comm_open_backend to take &self instead of &mut self, so DeviceContext can call it through its &Console parameter.

  • UI comm stored in its own RefCell field: Separated from the comms HashMap so ui_comm() can borrow independently.

  • comm_id added to ConsoleComm: The UI comm (stored outside the HashMap) carries its own ID for message routing. Eliminated the UiCommEntry wrapper type.

  • All console_comm.rs dispatch methods take &self: Use take/remove pattern via helpers (with_ui_comm_mut, with_comm_mut, take_ui_comm, take_comm, etc.) to avoid holding RefCell borrows during handler dispatch.

  • Shell deadlock fix: handle_comm_msg, handle_comm_close, and handle_comm_open on the ShellHandler trait return a completion Receiver<()> instead of blocking. Amalthea's Shell select-loops on it via drain_comm_events_until to process CommEvents while the handler runs.

  • handle_comm_open made sync: Was async (always called via block_on). Now sync with completion receiver, consistent with handle_comm_msg / handle_comm_close.

  • CommEvent::Barrier folded into CommEvent::Opened: The barrier Sender<()> is now an optional third field on Opened, eliminating a separate event variant.

  • drain_comm_events_until extracted as generic helper: Used by handle_execute_request, handle_comm_msg_request, handle_comm_close_request, and handle_comm_open_request via handle_comm_notification.

@lionel- lionel- force-pushed the task/sync-plot-comm branch 2 times, most recently from a247319 to 06b0cc8 Compare April 22, 2026 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants