DT-3187: Java MCP Proof of Concept#2897

Draft
rushtong wants to merge 9 commits into develop from gr-mcp-java-poc

@rushtong rushtong commented May 6, 2026

Addresses

https://broadworkbench.atlassian.net/browse/DT-3187

Phase 1: MCP Server — Dataset Search Proof of Concept

Summary

Adds a stateless Model Context Protocol (MCP) server endpoint to
the Consent application, exposing DUOS dataset and study data to AI clients (Claude, Cursor, etc.)
that speak the MCP protocol.

This is a proof-of-concept phase. A single tool — dataset_search — is implemented by placing a
@McpTool annotation directly on DatasetResource.findAllDatasetStudySummaries. A new scanner
(McpToolScanner) discovers annotated methods at startup and auto-generates both the MCP input
schema (from JAX-RS @PathParam / @QueryParam annotations) and the invocation handler (auth
resolution → argument binding → reflective call → structured result). The core service and
authentication infrastructure are unchanged; no existing REST endpoints or database schema are
modified.

Risks

  • The MCP SDK dependency added in pom.xml has two known security vulnerabilities.
  • The new /mcp path and technology will require AppSec approval and a pen-testing plan.

What changed

New dependency

io.modelcontextprotocol.sdk:mcp (0.14.1, stateless HTTP transport) is added to pom.xml.
mcp-json-jackson2 is excluded because its bundled JacksonJsonSchemaValidatorSupplier
ServiceLoader registration requires com.networknt:json-schema-validator 1.5.7, which conflicts
with the 3.0.2 version already pinned by this project. A thin custom shim
(ConsentMcpJsonMapper + ConsentJsonSchemaValidator) is registered via META-INF/services
instead, satisfying the SDK's ServiceLoader contracts without the conflicting transitive dependency.

New endpoint

POST /mcp — stateless MCP transport servlet. Each MCP protocol message (initialize, tool call,
etc.) is a self-contained HTTP POST; the server responds in the HTTP response body. There is no
SSE streaming.

New files (src/main/java/.../mcp/)

| File | Purpose |
| --- | --- |
| McpTool | Method-level annotation that marks a JAX-RS resource method as an MCP tool (name, description, outputType, params). |
| McpToolParam | Nested annotation (used inside @McpTool#params) that overrides or supplements auto-generated schema properties with a custom name, JSON type, description, and required flag. |
| McpToolScanner | Scans resource instances for @McpTool-annotated methods at startup. For each method it builds a SyncToolSpecification containing an auto-derived input schema (required @PathParam properties, optional @QueryParam properties, @McpToolParam overrides/additions) and a reflective handler (resolves DuosUser, coerces MCP args to Java types, invokes the method, checks HTTP status, returns structuredContent). |
| McpClaimsFilter | Servlet filter on /mcp that mirrors RequestHeaderCacheFilter for Jersey. Reads OAUTH2_CLAIM_* headers set by Apache mod_oauth2 and populates ClaimsCache, keyed by Bearer token. |
| McpAuthHelper | Reads the Bearer token from McpTransportContext, resolves an AuthUser via ClaimsCache, fetches the fully-populated User (with roles), and returns a DuosUser, the same type resource methods receive via @Auth. |
| McpToolResults | Factory for CallToolResult values. Serialises domain objects via Gson (consistent with the REST layer), then re-parses the JSON as a generic Java structure placed in structuredContent. All three result paths (success, plain-text, error) use structuredContent so the SDK accepts them even when the tool declares an outputSchema. |
| ConsentMcpToolProvider | Assembles all tool specifications by calling McpToolScanner.scan(datasetResource). Reduced from ~140 lines to ~50 lines compared to the hardcoded handler approach. |
| ConsentMcpManaged | Dropwizard Managed wrapper that calls server.closeGracefully() on shutdown. |
| ConsentMcpJsonMapper | Custom McpJsonMapper backed by Jackson ObjectMapper. |
| ConsentMcpJsonMapperSupplier / ConsentJsonSchemaValidator / ConsentJsonSchemaValidatorSupplier | ServiceLoader shims to avoid the networknt version conflict. |
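
To make the McpClaimsFilter / ClaimsCache row more concrete, here is a minimal sketch of how a filter might copy OAUTH2_CLAIM_* headers into a cache keyed by Bearer token. It is not the PR's code: ClaimsCacheSketch, bearerToken and claimsFor are hypothetical names, and a plain ConcurrentHashMap stands in for whatever cache the real ClaimsCache uses. Only the loadCache name and the OAUTH2_CLAIM_ prefix come from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for ClaimsCache; not the PR's actual class. */
final class ClaimsCacheSketch {
  private static final String CLAIM_PREFIX = "OAUTH2_CLAIM_";
  private static final String BEARER_PREFIX = "Bearer ";

  // Assumption: an unbounded map. A production cache would need expiry.
  private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

  /** Extracts the raw token from an Authorization header value, if it is a Bearer token. */
  static Optional<String> bearerToken(String authorizationHeader) {
    if (authorizationHeader == null || !authorizationHeader.startsWith(BEARER_PREFIX)) {
      return Optional.empty();
    }
    String token = authorizationHeader.substring(BEARER_PREFIX.length()).trim();
    return token.isEmpty() ? Optional.empty() : Optional.of(token);
  }

  /** Stores only the OAUTH2_CLAIM_* headers for the given token. */
  void loadCache(String token, Map<String, String> requestHeaders) {
    Map<String, String> claims = new HashMap<>();
    requestHeaders.forEach((name, value) -> {
      if (name.startsWith(CLAIM_PREFIX)) {
        claims.put(name, value);
      }
    });
    if (!claims.isEmpty()) {
      cache.put(token, Map.copyOf(claims));
    }
  }

  /** Returns the cached claims, or empty when the token was never seen. */
  Optional<Map<String, String>> claimsFor(String token) {
    return Optional.ofNullable(cache.get(token));
  }
}
```

In this sketch an unknown token yields an empty Optional instead of an exception, which leaves the caller to decide how much detail to surface.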

Modified files

  • DatasetResource.findAllDatasetStudySummaries is annotated with @McpTool and given a
    @QueryParam("query") parameter. Query filtering (case-insensitive substring match on dataset
    name and study name) now runs inside the resource method, improving both the MCP tool and the
    underlying REST endpoint simultaneously.
  • ConsentModule — Guice bindings for HttpServletStatelessServerTransport,
    McpStatelessSyncServer, and ConsentMcpToolProvider. The transport's contextExtractor
    captures the raw Bearer token from each HTTP request and stores it in McpTransportContext
    under key "bearer", solving the cross-thread auth propagation problem (the SDK may dispatch
    tool handlers on a different thread from the original HTTP request).
  • ConsentApplication — registers McpClaimsFilter on /mcp, mounts the transport servlet,
    and wraps the server in ConsentMcpManaged.
  • AuthorizationHelper — adds public resolveAuthUser(String bearer) so McpAuthHelper can
    look up a cached AuthUser without duplicating the resolution logic.
  • datasetV3.yaml: documents the new optional "query" query parameter on the GET operation.
  • 5 OpenAPI path YAML files — fills previously missing summary or description fields:
    approveCloseoutBySigningOfficial.yaml, dacRuleToggle.yaml, userCreate.yaml,
    cleanupEmptyCertificationAndAlternativeSharingFiles.yaml, libraryCardHistoryByUserId.yaml.
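
The cross-thread point in the ConsentModule bullet can be illustrated with a small sketch: the token is captured on the request thread into an immutable context and passed explicitly to the handler, so nothing depends on thread-local state. BearerPropagationSketch, extractContext and dispatch are hypothetical names; only the "bearer" key comes from the description, and the single-thread executor merely simulates the SDK dispatching on another thread.

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical illustration; the real extractor is wired in ConsentModule. */
final class BearerPropagationSketch {
  /** Builds an immutable context from the Authorization header on the request thread. */
  static Map<String, Object> extractContext(String authorizationHeader) {
    String prefix = "Bearer ";
    if (authorizationHeader != null && authorizationHeader.startsWith(prefix)) {
      return Map.of("bearer", authorizationHeader.substring(prefix.length()));
    }
    return Map.of();
  }

  /** Runs the handler on a worker thread, handing it the context explicitly. */
  static <T> T dispatch(Map<String, Object> context, Function<Map<String, Object>, T> handler) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = worker.submit(() -> handler.apply(context));
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for handler", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("handler failed", e.getCause());
    } finally {
      worker.shutdown();
    }
  }
}
```

A handler such as `ctx -> (String) ctx.get("bearer")` sees the same token on the worker thread that was extracted on the request thread.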

Annotation-driven tool registration

Adding a new MCP tool now requires only a single annotation on the target resource method:

@McpTool(
    name = "dataset_search",
    description = "Search DUOS datasets and studies visible to the caller. ...",
    outputType = "array",
    params = {
      @McpToolParam(
          name = "query",
          type = "string",
          description = "Case-insensitive text matched against dataset name and study name. ...",
          required = false)
    })
@GET @Produces("application/json") @PermitAll @Path("/v3")
public Response findAllDatasetStudySummaries(
    @Auth DuosUser duosUser, @QueryParam("query") String query) { ... }

McpToolScanner handles the rest:

  • Input schema: @PathParam → required string/integer property; @QueryParam → optional
    property; @Auth → not exposed. @McpToolParam entries override descriptions or add
    MCP-only fields with no JAX-RS counterpart.
  • Handler: resolves DuosUser from the transport context, coerces MCP arguments to the Java
    types declared on the method, invokes via reflection, checks the HTTP response status (≥ 400 →
    McpToolResults.error()), and serialises the entity with McpToolResults.of().
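
The input-schema rule above can be sketched with plain reflection. This is an illustration under stated assumptions, not the McpToolScanner implementation: SchemaSketch is a hypothetical class, the nested PathParam and QueryParam annotations are local stand-ins for the JAX-RS ones so the example compiles on its own, and @McpToolParam overrides are left out.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SchemaSketch {
  // Local stand-ins for the JAX-RS annotations (assumption: same value() shape).
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface PathParam { String value(); }

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface QueryParam { String value(); }

  /** Maps a Java parameter type to "integer" or "string" (assumed mapping). */
  static String jsonType(Class<?> type) {
    boolean integral = type == int.class || type == Integer.class
        || type == long.class || type == Long.class;
    return integral ? "integer" : "string";
  }

  /** Builds a JSON-Schema-like map: path params are required, query params optional. */
  static Map<String, Object> inputSchema(Method method) {
    Map<String, Object> properties = new LinkedHashMap<>();
    List<String> required = new ArrayList<>();
    for (Parameter p : method.getParameters()) {
      PathParam path = p.getAnnotation(PathParam.class);
      QueryParam query = p.getAnnotation(QueryParam.class);
      if (path != null) {
        properties.put(path.value(), Map.of("type", jsonType(p.getType())));
        required.add(path.value());
      } else if (query != null) {
        properties.put(query.value(), Map.of("type", jsonType(p.getType())));
      }
      // Parameters with neither annotation (e.g. the authenticated user) are not exposed.
    }
    Map<String, Object> schema = new LinkedHashMap<>();
    schema.put("type", "object");
    schema.put("properties", properties);
    schema.put("required", required);
    return schema;
  }

  // Example target, loosely shaped like a resource method.
  static String example(Object user, @PathParam("id") Integer id, @QueryParam("query") String query) {
    return id + ":" + query;
  }

  /** Derives the schema for the example method above. */
  static Map<String, Object> exampleSchema() {
    for (Method m : SchemaSketch.class.getDeclaredMethods()) {
      if (m.getName().equals("example")) {
        return inputSchema(m);
      }
    }
    throw new IllegalStateException("example method not found");
  }
}
```

For the example method, the derived schema has two properties, an integer "id" listed as required and an optional string "query". The first parameter does not appear at all.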

Tool: dataset_search

Input (all fields optional)

| Field | Type | Description |
| --- | --- | --- |
| query | string | Case-insensitive substring matched against dataset name and study name. Omit to return all datasets visible to the caller. |
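
The matching rule for query can be sketched as follows. QueryFilterSketch and its Summary record are hypothetical stand-ins for the resource method and DatasetStudySummary, and treating a blank query like an omitted one is an assumption that the description does not confirm.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class QueryFilterSketch {
  /** Minimal stand-in for DatasetStudySummary with only the two searched fields. */
  record Summary(String datasetName, String studyName) {}

  /** Null-safe, case-insensitive substring test; needleLower must already be lower-case. */
  static boolean containsIgnoreCase(String haystack, String needleLower) {
    return haystack != null && haystack.toLowerCase(Locale.ROOT).contains(needleLower);
  }

  /** Returns all summaries when no query is given, otherwise those matching either name. */
  static List<Summary> filter(List<Summary> all, String query) {
    if (query == null || query.isBlank()) {
      return all; // assumption: blank behaves like an omitted query
    }
    String needle = query.toLowerCase(Locale.ROOT);
    return all.stream()
        .filter(s -> containsIgnoreCase(s.datasetName(), needle)
            || containsIgnoreCase(s.studyName(), needle))
        .collect(Collectors.toList());
  }
}
```

Lower-casing with Locale.ROOT keeps the comparison independent of the server's default locale.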

Output — array of DatasetStudySummary objects placed in result.structuredContent:

[
  {
    "dataset_id": 415,
    "dataset_name": "1000 Genomes",
    "dataset_identifier": "DUOS-000415",
    "study_name": "1000 Genomes Project",
    "public": true
  }
]

Authorization is enforced identically to the REST layer: DatasetService.findAllDatasetStudySummaries(caller) is called with the resolved User, so callers only see datasets they are permitted to access. Non-2xx responses from the resource method (e.g. 403 Forbidden) are surfaced as MCP error results with a human-readable message.
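
A minimal sketch of the status-to-result mapping described above, assuming reduced stand-in types: HttpOutcome and ToolResult are hypothetical records, not the JAX-RS Response or the SDK's CallToolResult, and the "result" / "error" keys are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ToolResultSketch {
  /** Hypothetical reduced view of a resource response: status code plus entity. */
  record HttpOutcome(int status, Object entity) {}

  /** Hypothetical reduced view of an MCP tool result. */
  record ToolResult(boolean isError, Map<String, Object> structuredContent) {}

  /** Statuses of 400 and above become error results; everything else carries the entity. */
  static ToolResult fromOutcome(HttpOutcome outcome) {
    Map<String, Object> content = new LinkedHashMap<>();
    if (outcome.status() >= 400) {
      content.put("error", "Request failed with HTTP status " + outcome.status());
      return new ToolResult(true, content);
    }
    content.put("result", outcome.entity());
    return new ToolResult(false, content);
  }
}
```

Both branches populate structuredContent, mirroring the note that every result path uses it.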


Interaction diagram

sequenceDiagram
    actor Client as MCP Client<br/>(Claude / Cursor)
    participant Apache as Apache httpd<br/>mod_oauth2
    participant Filter as McpClaimsFilter
    participant Transport as HttpServletStateless<br/>ServerTransport
    participant Scanner as McpToolScanner
    participant Auth as McpAuthHelper
    participant Cache as ClaimsCache
    participant Resource as DatasetResource<br/>findAllDatasetStudySummaries
    participant Svc as DatasetService

    Note over Client,Apache: Step 1 — initialize (required once per logical session)
    Client->>Apache: POST /mcp  {"method":"initialize"}
    Apache->>Apache: Validate Bearer token,<br/>set OAUTH2_CLAIM_* headers
    Apache->>Filter: forward request + headers
    Filter->>Cache: loadCache(bearer, OAUTH2_CLAIM_* headers)
    Filter->>Transport: chain.doFilter()
    Transport-->>Client: HTTP 200  {"result":{"protocolVersion":"2025-03-26","capabilities":{"tools":{}},...}}

    Note over Client,Apache: Step 2 — notifications/initialized (required)
    Client->>Apache: POST /mcp  {"method":"notifications/initialized"}
    Apache->>Filter: forward
    Filter->>Transport: chain.doFilter()
    Transport-->>Client: HTTP 200  (empty body — notification ack)

    Note over Client,Apache: Step 3 — tools/call  dataset_search
    Client->>Apache: POST /mcp  {"method":"tools/call","params":{"name":"dataset_search","arguments":{"query":"ANVIL"}}}
    Apache->>Apache: Validate Bearer token,<br/>set OAUTH2_CLAIM_* headers
    Apache->>Filter: forward request + headers
    Filter->>Cache: loadCache(bearer, headers)
    Filter->>Transport: chain.doFilter()
    Transport->>Transport: contextExtractor captures bearer<br/>→ McpTransportContext{"bearer":"<token>"}
    Transport->>Scanner: invoke(context, request) [auto-generated handler]
    Scanner->>Auth: resolveDuosUser(context, authHelper, userService)
    Auth->>Cache: resolveAuthUser(bearer)
    Cache-->>Auth: AuthUser{email}
    Auth-->>Scanner: DuosUser{authUser, user{roles,...}}
    Scanner->>Scanner: coerce MCP args → Java types
    Scanner->>Resource: findAllDatasetStudySummaries(duosUser, "ANVIL")
    Resource->>Svc: findAllDatasetStudySummaries(user)
    Svc-->>Resource: List<DatasetStudySummary>
    Resource->>Resource: filter by query "ANVIL"
    Resource-->>Scanner: Response 200 [List<DatasetStudySummary>]
    Scanner->>Scanner: status < 400 → McpToolResults.of(entity)
    Scanner-->>Transport: CallToolResult{structuredContent: [...]}
    Transport-->>Client: HTTP 200  {"result":{"structuredContent":[{"dataset_id":415,...}]}}

Testing

The endpoint can be exercised manually with sequential curl calls, using a Bearer token from gcloud auth print-access-token:

TOKEN=$(gcloud auth print-access-token)
BASE=https://local.dsde-dev.broadinstitute.org:27443

# 1. initialize
curl -s -X POST $BASE/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"curl","version":"0"}}}' | jq .

# 2. notifications/initialized
curl -s -X POST $BASE/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"notifications/initialized"}'

# 3. dataset_search
curl -s -X POST $BASE/mcp \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"dataset_search","arguments":{"query":"ANVIL"}}}' | jq '.result.structuredContent'

# 4. dataset_approved_users
curl -s -X POST "https://local.dsde-dev.broadinstitute.org:27443/mcp" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"dataset_approved_users","arguments":{"identifier":"DUOS-000123"}}}' | jq .
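
For callers who would rather script the dataset_search call in Java than curl, the request could be assembled roughly as below with the JDK's java.net.http API. McpRequestSketch and its methods are hypothetical, the sketch only builds the request and does not send it, and it assumes the query contains no characters that need JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class McpRequestSketch {
  /** JSON-RPC body for a dataset_search tools/call (assumes query needs no escaping). */
  static String toolCallBody(String query) {
    return "{\"jsonrpc\":\"2.0\",\"id\":2,\"method\":\"tools/call\","
        + "\"params\":{\"name\":\"dataset_search\",\"arguments\":{\"query\":\"" + query + "\"}}}";
  }

  /** Builds, but does not send, the POST to baseUrl + "/mcp". */
  static HttpRequest datasetSearch(String baseUrl, String token, String query) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/mcp"))
        .header("Authorization", "Bearer " + token)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(toolCallBody(query)))
        .build();
  }
}
```

Sending it would be a matter of passing the request to an HttpClient after the initialize and notifications/initialized steps shown above.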

What this is not

  • No new roles, permissions, or database tables.
  • No changes to existing REST endpoint behaviour (the ?query= parameter is a backwards-compatible addition).
  • No SSE / streaming transport (stateless HTTP only).
  • Tool output is read-only; no mutations.

Next phases (not in this PR)

  • Additional tools across other resource classes — each one is now a single @McpTool annotation.
  • tools/list integration so clients can discover available tools dynamically.
  • Integration tests via the existing synthetic test framework.

Have you read CONTRIBUTING.md lately? If not, do that first.

  • Label PR with a Jira ticket number and include a link to the ticket
  • Label PR with a security risk modifier [no, low, medium, high]
  • PR describes scope of changes
  • Get a minimum of one thumbs worth of review, preferably two if enough team members are available
  • Get PO sign-off for all non-trivial UI or workflow changes
  • Verify all tests go green
  • Test this change deployed correctly and works on dev environment after deployment

Map<String, String> headers = getCache().getIfPresent(bearerToken);
if (headers == null) {
throw new NotAuthorizedException(
"Token not recognized — ensure the /mcp path has AuthType oauth2 configured in Apache");
Contributor

I'm not sure we want to return the internals to the user. A better error might be "Your browser sent something we couldn't understand" as the exception to the user, but maybe log that the condition was reached and decode the bearerToken that was provided so we know the user having the issue?

Contributor Author

Agreed, this is not intended to be production ready yet. I fully intend to throw this code away if we do go down this path and instead implement this in much smaller phases.

* not-found exceptions propagate their messages to the caller.
*/
@Singleton
public class ConsentMcpToolProvider implements ConsentLogger {
Contributor

@fboulnois - didn't you front our services with something? I'm wondering if we should maybe do the same thing here. The code below seems like it's going to be:

  1. Duplicative of existing RESTful endpoints and OpenAPI/Swagger documentation
  2. Expensive to maintain

I'm wondering if there's a code equivalent to what @fboulnois did in the demo branch, but written in Java. Do you think it makes sense to explore something like https://github.com/JavaAIDev/openapi-mcp-server (maybe not this exact project, but others like it) to see if we can get something that can ride on top of the OpenAPI documentation we've already built?

Contributor Author

That is also my understanding ... i.e. that there was something that treated the openapi docs as the full suite of tools available. AFAIK, that will always be an option since anyone can read that and run their own server shim.

Contributor Author

I have a new annotation-based approach for this and we can review in mobbing.
