Agent Deployment Modes

skobeltsyn edited this page May 4, 2026 · 2 revisions

Same agent, three ways to ship it. Each mode is one line of glue away from the next — you can start small and eject the agent into autonomy when it earns its own deployment.

                             ┌─────────────────────────┐
   library mode              │  agent { skills { } }   │  in your JVM
                             └────────────┬────────────┘
                                          │  +1 line
                                          ▼
                             ┌─────────────────────────┐
   hosted mode               │  McpServer.from(agent)  │  in your JVM, but addressable
                             └────────────┬────────────┘
                                          │  +1 line
                                          ▼
                             ┌─────────────────────────┐
   autonomous mode           │  McpRunner.serve(...)   │  its own process / JAR / image
                             └─────────────────────────┘

Three modes at a glance

| Mode | Glue | Where it runs | Who can call it |
|---|---|---|---|
| Library | agent<IN, OUT>("...") { skills { ... } } | Inside your existing JVM, in-process | Your Kotlin code, with full type safety |
| Hosted | + McpServer.from(agent) { expose("...") }.start() | Inside your existing JVM, plus an MCP endpoint | Your Kotlin code (typed) AND any MCP client (over HTTP) |
| Autonomous | fun main(args) = exitProcess(McpRunner.serve(agent, args)) | Its own process / JAR / Docker image / native binary | Any MCP client, anywhere on the network |

1. Library mode — agent as a function

The default. An agent is just a Kotlin function with type-safe inputs and outputs:

val classifier = agent<String, Category>("classifier") {
    skills {
        skill<String, Category>("classify", "Classifies free-text input") {
            implementedBy { text -> Category.of(text) }
        }
    }
}

val result: Category = classifier("Order status request")

  • Zero overhead. No serialization, no wire, no protocol. Direct Kotlin call.
  • Type-safe at compile time. classifier(42) won't compile.
  • Deploy boundary is the host app. The agent ships when your app ships.

Best for: embedding agent logic in an existing Kotlin service, batch jobs, scripts, libraries you publish to Maven Central.
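
To make "agent as a function" concrete, here is a minimal sketch of how a typed, callable wrapper can be built in plain Kotlin. TypedAgent, typedAgent, and the Category enum below are hypothetical stand-ins written for this illustration. They are not the library's classes, and the real agent { skills { } } DSL presumably does considerably more.

```kotlin
// Hypothetical stand-in types for illustration only; not the library's API.
enum class Category {
    ORDER, BILLING, OTHER;

    companion object {
        // Naive keyword match, assumed here just to have something to call.
        fun of(text: String): Category = when {
            "order" in text.lowercase() -> ORDER
            "invoice" in text.lowercase() -> BILLING
            else -> OTHER
        }
    }
}

// A named wrapper around a typed function. `operator fun invoke` is what
// lets callers write `classifier("...")` as if it were a plain function.
class TypedAgent<IN, OUT>(val name: String, private val run: (IN) -> OUT) {
    operator fun invoke(input: IN): OUT = run(input)
}

fun <IN, OUT> typedAgent(name: String, run: (IN) -> OUT) = TypedAgent(name, run)

val classifier = typedAgent<String, Category>("classifier") { text -> Category.of(text) }

fun main() {
    val result: Category = classifier("Order status request")
    println(result) // prints ORDER
    // classifier(42) would be rejected by the compiler: Int is not String.
}
```

Because the input and output types are part of the wrapper's signature, a mismatched call fails at compile time rather than at runtime, which is the property the bullets above describe.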


2. Hosted mode — same agent, plus an MCP endpoint

Add McpServer.from(agent).start() inside your service. The agent stays callable internally (typed, zero overhead), and external MCP clients can also reach it over HTTP:

fun main() {
    val classifier = agent<String, Category>("classifier") { /* same as above */ }

    // Existing internal callers — unchanged, still typed:
    val result: Category = classifier("Order status request")

    // New: the same agent, also addressable over MCP
    val server = McpServer.from(classifier) {
        port = 8080
        expose("classify")
    }.start()

    // Stop the server on JVM shutdown, then block so main does not return.
    Runtime.getRuntime().addShutdownHook(Thread { server.stop() })
    Thread.currentThread().join()
}

  • Same agent instance serves both internal and external callers.
  • Shared lifecycle with the host process — the agent dies when the host dies.
  • Backward compatible — your existing internal call sites don't change.

Best for: existing services that want to also expose agent capabilities to Claude Code / Cursor / other MCP-aware tools, sidecars, internal platforms wanting to standardize on MCP for cross-team consumption.
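
For a sense of what an external caller sends, the sketch below builds the JSON-RPC 2.0 body of an MCP tools/call request and wraps it in a JDK HttpRequest. Several details are assumptions made for illustration, not facts from this page: the /mcp endpoint path, the argument key "input", and that the server speaks MCP's Streamable HTTP transport. A real client also performs the MCP initialize handshake before calling a tool, so this shows only the shape of one request.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Minimal JSON string quoting: enough for this sketch, not a full encoder.
fun quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

// Builds the JSON-RPC 2.0 body of an MCP `tools/call` request.
// Assumption: the exposed skill takes one string argument named "input".
fun toolsCallBody(id: Int, tool: String, input: String): String = """
    {
      "jsonrpc": "2.0",
      "id": $id,
      "method": "tools/call",
      "params": { "name": ${quote(tool)}, "arguments": { "input": ${quote(input)} } }
    }
""".trimIndent()

// Wraps the body in an HTTP POST. The endpoint URL is supplied by the caller.
fun toolsCallRequest(endpoint: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(endpoint))
        .header("Content-Type", "application/json")
        .header("Accept", "application/json, text/event-stream")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()

fun main() {
    val body = toolsCallBody(id = 1, tool = "classify", input = "Order status request")
    // Hypothetical endpoint path; check the server's actual route.
    val request = toolsCallRequest("http://localhost:8080/mcp", body)
    println("${request.method()} ${request.uri()}")
    println(body)
    // Sending would use java.net.http.HttpClient, e.g.
    // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
}
```

In practice an MCP-aware tool or SDK generates this traffic for you. The point is that external callers see a schema-described tool name and arguments, not the Kotlin types.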


3. Autonomous mode — agent ejected into its own process

The agent is its own deployable unit. A single main function is all you write: McpRunner parses the arguments, starts the server, installs a shutdown hook, and blocks until the process is told to stop:

import kotlin.system.exitProcess

val coder = agent<Spec, CodeBundle>("coder") {
    model { ollama("gpt-oss:120b-cloud") }
    skills {
        skill<Spec, CodeBundle>("write-code", "Generate Kotlin from a spec") {
            tools(/* ... */)
        }
    }
}

// McpRunner.serve blocks until shutdown and returns an exit code,
// which exitProcess passes to the OS.
fun main(args: Array<String>) {
    exitProcess(McpRunner.serve(coder, args) {
        port = 8080
        expose("write-code")
    })
}

Running ./gradlew shadowJar produces a fat JAR, and java -jar coder.jar --port 9000 --expose write-code runs it. Wrap that in a Dockerfile (one FROM eclipse-temurin:21-jre line) and you have a container. Or go --no-jre via a GraalVM native image (Phase 2) to get a single binary.

  • Independent deploy/scale/restart. The agent is a service.
  • Language-neutral consumers. Any MCP client speaks to it — Python LLM frameworks, IDEs, JS tools, other agents.
  • Operational footprint. It's a process you run, monitor, version, and roll back.

Best for: production agent fleets, agent-as-a-service offerings, multi-tenant deployments, anywhere the deploy boundary should match the agent boundary.
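
The command line above passes --port and --expose on top of values already set in code. A minimal sketch of how such flags could be merged with in-code defaults follows. ServeConfig and parseArgs are hypothetical names invented for this example, and the rule that command-line values win over code defaults is an assumption; this is not McpRunner's actual implementation.

```kotlin
// Hypothetical config holder; not the library's type.
data class ServeConfig(val port: Int, val exposed: List<String>)

// Applies `--port N` and `--expose NAME` on top of the given defaults.
// Assumption: command-line values win over defaults set in code.
fun parseArgs(args: Array<String>, defaults: ServeConfig): ServeConfig {
    var port = defaults.port
    val exposedFromCli = mutableListOf<String>()
    var i = 0
    while (i < args.size) {
        val flag = args[i]
        val value = args.getOrNull(i + 1)
            ?: throw IllegalArgumentException("Missing value for $flag")
        when (flag) {
            "--port" -> port = value.toIntOrNull()
                ?: throw IllegalArgumentException("Invalid port: $value")
            "--expose" -> exposedFromCli += value
            else -> throw IllegalArgumentException("Unknown flag: $flag")
        }
        i += 2 // every flag in this sketch takes exactly one value
    }
    // Fall back to the in-code skill list when no --expose flag was given.
    return ServeConfig(port, exposedFromCli.ifEmpty { defaults.exposed })
}

fun main() {
    val defaults = ServeConfig(port = 8080, exposed = listOf("write-code"))
    val config = parseArgs(arrayOf("--port", "9000", "--expose", "write-code"), defaults)
    println(config) // prints ServeConfig(port=9000, exposed=[write-code])
}
```

Keeping defaults in code and overrides on the command line lets the same JAR run unchanged across environments that differ only in port or exposed skills.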


Tradeoffs

| Concern | Library | Hosted | Autonomous |
|---|---|---|---|
| Per-call overhead | None (direct call) | None internally; HTTP for external | HTTP / serialization always |
| Compile-time type safety | ✅ everywhere | ✅ internal callers; schema-validated externally | Schema-validated only |
| Deploy unit | Host app | Host app | The agent itself |
| Independent scaling | Tied to host | Tied to host | Yes |
| Failure isolation | None (in-process) | None (in-process) | Yes (separate process) |
| Cross-language consumers | Kotlin / JVM only | Any MCP client | Any MCP client |
| Operational cost | Zero (rides host) | Zero (rides host) | Process to run/monitor |

Beyond a single agent: the Swarm topology. When you want multiple agents to live in the same JVM but each carry its own Agent<IN, OUT> surface (prompt, knowledge, memory, hooks), ship them as separate JARs and let a captain ServiceLoader-discover and absorb() the rest. Each sibling becomes a tool the captain can call; personality is preserved end-to-end. See Swarm for the mechanism.
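
The discovery half of that mechanism rests on the JDK's java.util.ServiceLoader. The sketch below shows only that standard piece. AgentProvider is a hypothetical interface made up for this example; the library's real service interface and its absorb() call are documented on the Swarm page and are not reproduced here.

```kotlin
import java.util.ServiceLoader

// Hypothetical service interface. Each sibling JAR would implement it and
// list the implementation in META-INF/services/<fully-qualified interface name>.
interface AgentProvider {
    val name: String
    fun handle(input: String): String
}

// Returns every provider that ServiceLoader can see on the classpath.
fun discoverProviders(): List<AgentProvider> =
    ServiceLoader.load(AgentProvider::class.java).toList()

fun main() {
    val siblings = discoverProviders()
    // With no provider JARs on the classpath, this list is empty.
    println("Discovered ${siblings.size} sibling agent(s)")
    siblings.forEach { println(" - ${it.name}") }
}
```

Dropping another JAR with a matching META-INF/services entry onto the classpath is enough for it to appear in the list, with no change to the discovering code.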


The progression

You don't have to pick once. The three modes are a path, not a partition:

  1. Start library. Wire the agent into the code that needs it. Iterate fast — no infra to think about, type errors caught at ./gradlew compileKotlin.
  2. Add hosted when external callers appear. Someone wants to call your agent from Claude Code, or another team wants programmatic access. One McpServer.from(agent).start() and your existing internal call sites are unchanged.
  3. Eject to autonomous when the deploy unit needs to be the agent itself. Independent scaling, separate ops budget, language-neutral fleet. One main function, one ./gradlew shadowJar, done.

The agent definition itself is the same Kotlin code in all three modes. Only the wiring around it changes.


See also: MCP Integration | Architecture Overview | Roadmap
