Agent Deployment Modes

Same agent, three ways to ship it. Each mode is one line of glue away from the next — you can start small and eject the agent into autonomy when it earns its own deployment.

                             ┌─────────────────────────┐
   library mode              │  agent { skills { } }   │  in your JVM
                             └────────────┬────────────┘
                                          │  +1 line
                                          ▼
                             ┌─────────────────────────┐
   hosted mode               │  McpServer.from(agent)  │  in your JVM, but addressable
                             └────────────┬────────────┘
                                          │  +1 line
                                          ▼
                             ┌─────────────────────────┐
   autonomous mode           │  McpRunner.serve(...)   │  its own process / JAR / image
                             └─────────────────────────┘

Three modes at a glance

Mode	Glue	Where it runs	Who can call it
Library	`agent<IN, OUT>("...") { skills { ... } }`	Inside your existing JVM, in-process	Your Kotlin code, with full type safety
Hosted	+ `McpServer.from(agent) { expose("...") }.start()`	Inside your existing JVM, plus an MCP endpoint	Your Kotlin code (typed) AND any MCP client (over HTTP)
Autonomous	`fun main(args: Array<String>): Unit = exitProcess(McpRunner.serve(agent, args))`	Its own process / JAR / Docker image / native binary	Any MCP client, anywhere on the network

1. Library mode — agent as a function

The default. An agent is just a Kotlin function with type-safe inputs and outputs:

val classifier = agent<String, Category>("classifier") {
    skills {
        skill<String, Category>("classify", "Classifies free-text input") {
            implementedBy { text -> Category.of(text) }
        }
    }
}

val result: Category = classifier("Order status request")

Zero overhead. No serialization, no wire, no protocol. Direct Kotlin call.
Type-safe at compile time. classifier(42) won't compile.
Deploy boundary is the host app. The agent ships when your app ships.

Best for: embedding agent logic in an existing Kotlin service, batch jobs, scripts, libraries you publish to Maven Central.
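To make the "agent as a function" idea concrete, here is a minimal sketch of how a typed, directly callable agent can be modelled in plain Kotlin. `TypedAgent`, this `Category` enum, and the keyword rule are hypothetical stand-ins written for illustration, not Agents.KT's actual classes; only the call shape `classifier("...")` is taken from the text above.

```kotlin
// Hypothetical stand-ins for illustration; not the Agents.KT implementation.
enum class Category {
    ORDER, BILLING, OTHER;

    companion object {
        // Toy keyword rule, assumed for this sketch only.
        fun of(text: String): Category = when {
            "order" in text.lowercase() -> ORDER
            "invoice" in text.lowercase() -> BILLING
            else -> OTHER
        }
    }
}

// A named wrapper around a function. `operator fun invoke` lets callers
// write agent(input) as if the agent were a plain function.
class TypedAgent<IN, OUT>(val name: String, private val body: (IN) -> OUT) {
    operator fun invoke(input: IN): OUT = body(input)
}

val classifier = TypedAgent<String, Category>("classifier") { text -> Category.of(text) }

fun main() {
    // A direct in-process call: no serialization, no protocol.
    val result: Category = classifier("Order status request")
    println(result)
    // classifier(42) // rejected by the compiler: 42 is an Int, not a String
}
```

Because the input and output types are ordinary generic parameters, a mismatched call is rejected by the compiler rather than failing at runtime.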

2. Hosted mode — same agent, plus an MCP endpoint

Add McpServer.from(agent).start() inside your service. The agent stays callable internally (typed, zero overhead), and external MCP clients can also reach it over HTTP:

fun main() {
    val classifier = agent<String, Category>("classifier") { /* same as above */ }

    // Existing internal callers — unchanged, still typed:
    val result: Category = classifier("Order status request")

    // New: the same agent, also addressable over MCP
    val server = McpServer.from(classifier) {
        port = 8080
        expose("classify")
    }.start()

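    // Stop the MCP server when the JVM shuts down.
    Runtime.getRuntime().addShutdownHook(Thread { server.stop() })
    // Block the main thread so the process keeps serving.
    Thread.currentThread().join()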
}

Same agent instance serves both internal and external callers.
Shared lifecycle with the host process — the agent dies when the host dies.
Backward compatible — your existing internal call sites don't change.

Best for: existing services that want to also expose agent capabilities to Claude Code / Cursor / other MCP-aware tools, sidecars, internal platforms wanting to standardize on MCP for cross-team consumption.
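The "one instance, two kinds of caller" arrangement can be sketched with nothing but the JDK. The example below is an illustration under stated assumptions: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and a plain-text HTTP endpoint, not the MCP protocol, and `shout`, `startEndpoint`, `callOverHttp`, and the `/call` path are hypothetical names that do not come from Agents.KT.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical stand-in for an agent: one shared function instance.
val shout: (String) -> String = { text -> text.uppercase() }

// Expose the same function over plain HTTP (NOT the MCP protocol).
// Port 0 asks the OS for any free port.
fun startEndpoint(handler: (String) -> String, port: Int = 0): HttpServer {
    val server = HttpServer.create(InetSocketAddress("127.0.0.1", port), 0)
    server.createContext("/call") { exchange ->
        val input = exchange.requestBody.readBytes().decodeToString()
        val output = handler(input).encodeToByteArray()
        exchange.sendResponseHeaders(200, output.size.toLong())
        exchange.responseBody.use { it.write(output) }
    }
    server.start()
    return server
}

// External caller: pays for HTTP and string conversion, and sees only text.
fun callOverHttp(port: Int, input: String): String {
    val client = HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build()
    val request = HttpRequest.newBuilder(URI.create("http://127.0.0.1:$port/call"))
        .POST(HttpRequest.BodyPublishers.ofString(input))
        .build()
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body()
}

fun main() {
    val server = startEndpoint(shout)
    try {
        println(shout("internal call"))                             // direct, in-process
        println(callOverHttp(server.address.port, "external call")) // same instance, over HTTP
    } finally {
        server.stop(0)
    }
}
```

The internal call site never touches the network, while the external path goes through the same function behind an adapter. Both share one lifecycle, which is also why a crash in the host takes the endpoint down with it.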

3. Autonomous mode — agent ejected into its own process

The agent IS its own deployable unit. You write a single-expression main, and McpRunner handles argument parsing, server startup, the shutdown hook, and blocking the main thread:

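import kotlin.system.exitProcess
// Agents.KT imports omitted, as in the other snippets on this page.

val coder = agent<Spec, CodeBundle>("coder") {
    model { ollama("gpt-oss:120b-cloud") }
    skills {
        skill<Spec, CodeBundle>("write-code", "Generate Kotlin from a spec") {
            tools(/* ... */)
        }
    }
}

// The explicit `: Unit` matters: exitProcess returns Nothing, so without it the
// inferred return type would be Nothing rather than the Unit an entry point needs.
fun main(args: Array<String>): Unit = exitProcess(McpRunner.serve(coder, args) {
    port = 8080
    expose("write-code")
})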

./gradlew shadowJar produces a fat JAR, and java -jar coder.jar --port 9000 --expose write-code runs it. Wrap that in a Dockerfile (one FROM eclipse-temurin:21-jre line) and you have a container. Or skip the JRE entirely: a GraalVM native image (Phase 2) yields a single binary.
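McpRunner's actual parsing rules are not shown on this page. As a rough sketch of what the "parse" step could look like, the stand-alone Kotlin below reads the two flags used above, `--port` and `--expose`, and layers them over defaults. `RunnerConfig`, `parseArgs`, and the rule that command-line values override the defaults are assumptions made for this illustration.

```kotlin
// Hypothetical config holder; not an Agents.KT type.
data class RunnerConfig(val port: Int = 8080, val exposed: List<String> = emptyList())

// Walks the arguments pairwise: each flag must be followed by a value.
fun parseArgs(args: Array<String>, defaults: RunnerConfig = RunnerConfig()): RunnerConfig {
    var port = defaults.port
    val exposedFromCli = mutableListOf<String>()
    var i = 0
    while (i < args.size) {
        val flag = args[i]
        val value = args.getOrNull(i + 1)
            ?: throw IllegalArgumentException("Missing value for $flag")
        when (flag) {
            "--port" -> port = value.toIntOrNull()
                ?: throw IllegalArgumentException("Not a port number: $value")
            "--expose" -> exposedFromCli += value
            else -> throw IllegalArgumentException("Unknown flag: $flag")
        }
        i += 2
    }
    // Assumption: skills named on the command line replace the defaults.
    val exposed = if (exposedFromCli.isEmpty()) defaults.exposed else exposedFromCli
    return RunnerConfig(port, exposed)
}

fun main() {
    val defaults = RunnerConfig(port = 8080, exposed = listOf("write-code"))
    println(parseArgs(arrayOf("--port", "9000", "--expose", "write-code"), defaults))
    println(parseArgs(emptyArray(), defaults))
}
```

Under these assumptions, values set in code act as defaults and the command line wins, so one JAR can be started on different ports without a rebuild.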

Independent deploy/scale/restart. The agent is a service.
Language-neutral consumers. Any MCP client speaks to it — Python LLM frameworks, IDEs, JS tools, other agents.
Operational footprint. It's a process you run, monitor, version, and roll back.

Best for: production agent fleets, agent-as-a-service offerings, multi-tenant deployments, anywhere the deploy boundary should match the agent boundary.

Tradeoffs

Concern	Library	Hosted	Autonomous
Per-call overhead	None (direct call)	None internally; HTTP for external	HTTP / serialization always
Compile-time type safety	✅ everywhere	✅ internal callers; schema-validated externally	Schema-validated only
Deploy unit	Host app	Host app	The agent itself
Independent scaling	Tied to host	Tied to host	Yes
Failure isolation	None (in-process)	None (in-process)	Yes (separate process)
Cross-language consumers	Kotlin / JVM only	Any MCP client	Any MCP client
Operational cost	Zero (rides host)	Zero (rides host)	Process to run/monitor

Beyond a single agent: the Swarm topology. When you want multiple agents to live in the same JVM but each carry its own Agent<IN, OUT> surface (prompt, knowledge, memory, hooks), ship them as separate JARs and let a captain ServiceLoader-discover and absorb() the rest. Each sibling becomes a tool the captain can call; personality is preserved end-to-end. See Swarm for the mechanism.
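The discovery half of that topology rests on the JDK's `java.util.ServiceLoader`. The sketch below shows the general pattern only: `SwarmMember` and `Captain` are hypothetical types invented for this illustration, and the real `absorb()` contract is described on the Swarm page, not here.

```kotlin
import java.util.ServiceLoader

// Hypothetical service interface; the real Swarm contract may look different.
interface SwarmMember {
    val name: String
    fun handle(input: String): String
}

// ServiceLoader scans the classpath for provider-configuration files named
// META-INF/services/<fully qualified interface name>. Each JAR that ships one
// contributes its implementations; with no such file, the result is empty.
fun discoverMembers(): List<SwarmMember> =
    ServiceLoader.load(SwarmMember::class.java).toList()

// Hypothetical captain: keeps each discovered member addressable by name.
class Captain(members: List<SwarmMember>) {
    private val tools: Map<String, SwarmMember> = members.associateBy { it.name }

    fun toolNames(): Set<String> = tools.keys

    fun call(tool: String, input: String): String =
        tools[tool]?.handle(input) ?: error("No such tool: $tool")
}

fun main() {
    val captain = Captain(discoverMembers())
    println("Discovered tools: ${captain.toolNames()}")
}
```

Adding a sibling then means dropping another JAR with its own provider file onto the classpath, with no change to the captain's code.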

The progression

You don't have to pick once. The three modes are a path, not a partition:

Start library. Wire the agent into the code that needs it. Iterate fast — no infra to think about, type errors caught at ./gradlew compileKotlin.
Add hosted when external callers appear. Someone wants to call your agent from Claude Code, or another team wants programmatic access. One McpServer.from(agent).start() and your existing internal call sites are unchanged.
Eject to autonomous when the deploy unit needs to be the agent itself. Independent scaling, separate ops budget, language-neutral fleet. One main line, one ./gradlew shadowJar, done.

The agent definition itself is the same Kotlin code in all three modes. Only the wiring around it changes.

See also: MCP Integration | Architecture Overview | Roadmap
