New file: `rfcs/server_mode.md` (216 additions)
# Persistent compiler processes

Build systems typically perform each compiler invocation as a separate process.
However, starting the compiler has a cost: spawning a process is not free, and
the compiler repeats some of the same work in each invocation (eg reading and
unmarshalling `.cmi` files). That is why some compilers offer a **persistent**
or **server** mode (see eg
https://per.bothner.com/papers/GccSummit03/gcc-server.pdf for GCC). Keeping a
single process alive for longer and passing multiple individual requests to the
same server can significantly reduce the amount of duplicated work, and hence
compilation times.

This RFC proposes the addition of such a "persistent" mode to the compiler tools
(`ocamlc`, `ocamlopt`, `ocamldep`, etc).

Each of these tools is extended with a new `-server` flag. When this flag is
passed, the tool, upon launch, waits for requests on `stdin`. Whenever a
request arrives, the tool services it, replies with a response on `stdout`,
and waits for the next request.
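
As a sketch, this loop can be modeled as follows (a Python stand-in purely for illustration; `read_request`, `service`, and `write_response` are hypothetical placeholders, not compiler APIs):

```python
def server_loop(read_request, service, write_response):
    """Abstract model of the -server loop: strictly sequential, each
    request is fully serviced and answered before the next is read."""
    while True:
        request = read_request()
        if request is None:           # client closed the stream: shut down
            return
        write_response(service(request))
```

The key property is the ordering: the tool never starts a second request before replying to the first, so clients can correlate responses with requests even when they pipeline.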

Each request encodes a command-line invocation of the tool and consists of:

1. a request id
2. a directory (the current working directory for the invocation)
3. an array of arguments (ie the `argv` of the invocation)

> **Contributor:** If that is your aim you should add the environment.
>
> **Author:** Indeed, you are right.

Each response consists of:

1. a request id (that of the corresponding request)
2. an exit code
3. two strings, containing the stdout and stderr of the invocation

The objective of this RFC is to gather feedback, decide whether this is a
direction we want to take, identify any blockers, etc.

## Prototype

I implemented a prototype in order to do some preliminary benchmarking:

https://github.com/ocaml/ocaml/compare/ocaml:ocaml:5.4...nojb:ocaml:server_mode_540?expand=1

Request:
```
REQ request-id number-of-arguments
current-directory
argument-1
...
argument-N
```
Response:
```
RES request-id exit-code out-length err-length
out-blob err-blob
```

> **Contributor:** As mentioned below I would rather like to have a streaming response here and the exit code at the end.
>
> **Contributor (@dbuenzli, Jan 20, 2026):** In fact perhaps the request should just pass a reference to three files for stdin, stderr and stdout and the response just writes back the exit code when it's done.
>
> **Author:** This is a good suggestion, thanks. The client side will need to be more complex to read the files as they are being written to, but it may be cleaner overall.
>
> **Author (@nojb, Jan 21, 2026):** Incidentally, this is how Buck2's protocol works (they also pass the arguments using a file): https://buck2.build/docs/prelude/rules/core/worker_tool/#examples
>
> **Author:** Bazel, on the other hand, uses "captured" stdout and stderr (it does not separate them), a bit like the current prototype: https://bazel.build/remote/creating#work-responses
>
> **Contributor:**
>
> > In fact perhaps the request should just pass a reference to three files for stdin, stderr and stdout and the response just writes back the exit code when it's done.
>
> That's a lot of extra temporary files. I'm concerned this wouldn't be cheap, esp. on Windows.
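
To make the framing concrete, here is a sketch of an encoder/decoder for these messages in Python (the language of the prototype's shim). One detail is an assumption on my part: the `out-blob err-blob` payload is taken to be the two blobs concatenated and sliced by the lengths in the header.

```python
def encode_request(request_id, cwd, args):
    """Frame one tool invocation as a REQ message."""
    return "\n".join([f"REQ {request_id} {len(args)}", cwd, *args]) + "\n"

def parse_response(data):
    """Parse 'RES id exit-code out-length err-length' followed by the
    concatenated stdout/stderr blobs, sliced using the header lengths."""
    header, _, blobs = data.partition("\n")
    tag, request_id, exit_code, out_len, err_len = header.split()
    assert tag == "RES"
    out = blobs[: int(out_len)]
    err = blobs[int(out_len) : int(out_len) + int(err_len)]
    return request_id, int(exit_code), out, err
```

Carrying explicit lengths in the header means the blobs can contain arbitrary bytes (including newlines) without any escaping.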

Note that the prototype branch includes a Python script that can be used as a
shim to simulate the integration with a build system. Roughly, one invokes the
shim as one would the compiler today, but the shim starts a background process
that maintains a pool of workers to service incoming requests. I am using this
shim to test builds of existing codebases that do not know anything about the
`-server` mode.

## Benchmark

TL;DR: the `-server` version is around 10-20% faster (depending on the size of
the file being compiled). The speedup is similar on Windows and Linux (I had
expected the speedup to be more pronounced on Windows, but in my tests that was
not the case).

I did the measurements using the following script `bench.sh` which repeats the
same compilation command (with flags given on the command line) some number of
times, once using separate process invocations and once using a single server
process.

```sh
#!/bin/bash

# Usage: bench.sh <numiter> <tool> [args...]

numiter=$1
shift

tool=$1
shift

# Baseline: one compiler process per invocation.
time {
  for i in $(seq 1 "$numiter"); do
    "$tool" "$@"
  done
}

# Server mode: frame the same invocations as REQ messages and pipe them
# all into a single server process.
time {
  for i in $(seq 1 "$numiter"); do
    echo "REQ t$i $#"
    echo "$PWD"
    for arg in "$@"; do
      echo "$arg"
    done
  done
} | "$tool" -server >/dev/null
```

**Linux**

- `typecore.cmx` (30 times)
```
$ ./bench.sh 30 [...] -c typing/typecore.ml

real 0m31.926s
user 0m28.119s
sys 0m3.797s

real 0m29.009s
user 0m27.264s
sys 0m1.731s

# => 10% faster
```
- `clflags.cmx` (100 times)
```
$ ./bench.sh 100 [...] -c utils/clflags.ml

real 0m11.399s
user 0m8.854s
sys 0m2.616s

real 0m9.026s
user 0m8.010s
sys 0m0.894s

# => 20% faster
```

**Windows**

- `typecore.cmx` (30 times)
```
$ ./bench.sh 30 local/bin/ocamlopt.opt [...] -c typing/typecore.ml

real 0m35.535s
user 0m0.304s
sys 0m0.319s

real 0m32.102s
user 0m0.094s
sys 0m0.046s

# => 10% faster
```
- `clflags.cmx` (100 times)
```
$ ./bench.sh 100 local/bin/ocamlopt.opt [...] -c utils/clflags.ml

real 0m26.032s
user 0m0.940s
sys 0m0.990s

real 0m20.317s
user 0m0.154s
sys 0m0.124s

# => 20% faster
```

## Some technical details

- To avoid depending on `unix`, the simplest approach is to use `stdin` and
  `stdout` to communicate with clients. In particular, clients must guarantee
  not to interleave requests (pipelining requests, ie having more than one
  in-flight request at a time, presents no problem, however). The compiler
  handles each incoming request in a strictly sequential manner.

- One needs to reset all compiler state between requests. Luckily, we already
have some infrastructure to help with this: `Local_store`. For example, in the
prototype above, all top-level references (notably in `Clflags`) have been
switched from using `ref` to using `Local_store.s_ref`.

- All output (mostly error messages and diagnostic information) needs to be
saved to a buffer when in server-mode. In the prototype this is achieved by
replacing calls to `Stdlib.print_string` by a dedicated function which
captures the output so that it can be sent back to the client when the request
is complete.
> **Contributor:** Why? This looks bad for usability (lag). You should output errors as soon as you hit them.
>
> Also, are there modes in the compilers which output on both stdout and stderr? This could be a problem.
>
> **Author:** This sounds like a reasonable argument. On the other hand, I suspect most build systems (Dune certainly does this) buffer the output of commands until they finish before displaying it, to avoid interleaving output from different commands when executing in parallel. In that case, it makes no difference whether we return the output in one go or in a streaming manner. (It is true that one can disable the buffering by passing `-j1`, though.)

- Similarly, bare calls to `Stdlib.exit` must be replaced by an exception or a
  similar mechanism, so that instead of terminating the process the server just
  returns a response to the client. Luckily, we already have an exception for
  this purpose: `Compenv.Exit_with_status`.
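
The last two points can be illustrated together with a small Python model (the names below, such as `ExitWithStatus`, mirror `Compenv.Exit_with_status` but are otherwise hypothetical): output printed while servicing a request is captured into a buffer, and an exception stands in for `exit` so the process survives a failing request.

```python
import contextlib
import io

class ExitWithStatus(Exception):
    """Stand-in for Compenv.Exit_with_status: raised instead of exiting."""
    def __init__(self, code):
        self.code = code

def run_request(handler):
    """Run one request handler; capture its stdout, and turn
    ExitWithStatus into an exit code instead of killing the process."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            handler()
        code = 0
    except ExitWithStatus as e:
        code = e.code
    return code, buf.getvalue()
```

In the OCaml prototype the capture is done at the `print_string` level rather than by redirecting a file descriptor, but the effect is the same: the captured text becomes the response's out-blob.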

## Integration with Dune and other build systems

Some build systems define generic protocols for persistent worker processes
(see eg [Bazel](https://bazel.build/remote/persistent) and
[Buck2](https://buck2.build/docs/prelude/rules/core/worker_tool/)). Dune may
want to define its own generic protocol, which we would then support on the
compiler side. Alternatively, we could define an ad-hoc protocol just for the
compiler's use (as I did in my prototype).

Preliminary discussion with @rgrinberg confirms that if this feature existed in
the compiler, there is appetite for supporting it in Dune.

Technically, to integrate this feature, Dune would have to maintain a pool of
server processes and dispatch each compilation command with an RPC call
(instead of spawning a new process, as it does today).
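
A minimal sketch of such a pool with round-robin dispatch (purely illustrative; Dune's actual scheduling would be more sophisticated, and the workers here are plain callables rather than real server processes):

```python
import itertools

class WorkerPool:
    """Dispatch each request to the next worker, round-robin."""
    def __init__(self, workers):
        self._next_worker = itertools.cycle(workers).__next__

    def dispatch(self, request):
        # Pick the next worker in rotation and hand it the request.
        return self._next_worker()(request)
```

A real implementation would also track in-flight requests per worker and prefer idle workers, but round-robin is enough to convey the shape of the integration.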

## Future directions

Of course, saving on the process startup cost as in this proposal is only the
beginning. Once that is done, it opens the door to caching certain data between
requests, for example unmarshalled `.cmi` files, which is likely to further
reduce compilation times.
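
A sketch of what such a cache could look like, keyed by path and modification time so that a rebuilt `.cmi` is re-read rather than served stale (`load_interface` is a hypothetical stand-in for the unmarshalling step):

```python
import os

class InterfaceCache:
    """Cache expensive loads across requests; invalidate on mtime change."""
    def __init__(self, load_interface):
        self._load = load_interface
        self._cache = {}

    def get(self, path):
        # Key on (path, mtime): an unchanged file hits the cache, while
        # a rebuilt file gets a fresh key and is loaded again.
        key = (path, os.stat(path).st_mtime_ns)
        if key not in self._cache:
            self._cache[key] = self._load(path)
        return self._cache[key]
```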

## Some references

- Bazel persistent worker protocol: https://blog.bazel.build/2015/12/10/java-workers.html (see also https://bazel.build/remote/persistent)
- Buck2 persistent worker protocol: https://buck2.build/docs/prelude/rules/core/worker_tool/
- GHC persistent worker plugin: https://github.com/MercuryTechnologies/ghc-persistent-worker (see also https://www.tweag.io/blog/2019-09-25-bazel-ghc-persistent-worker-internship/)