DT-3187: Java MCP Proof of Concept #2897
| Map<String, String> headers = getCache().getIfPresent(bearerToken); | ||
| if (headers == null) { | ||
| throw new NotAuthorizedException( | ||
| "Token not recognized — ensure the /mcp path has AuthType oauth2 configured in Apache"); |
I'm not sure we want to return the internals to the user. A better error for the user might be "Your browser sent something we couldn't understand", but maybe log that the condition was reached and decode the bearerToken that was provided so we know which user is having the issue?
Agreed, this is not intended to be production ready yet. I fully intend to throw this code away if we do go down this path and instead implement this in much smaller phases.
| * not-found exceptions propagate their messages to the caller. | ||
| */ | ||
| @Singleton | ||
| public class ConsentMcpToolProvider implements ConsentLogger { |
@fboulnois - didn't you front our services with something? I'm wondering if we should maybe do the same thing here. The code below seems like it's going to be:
- Duplicative of existing RESTful endpoints and OpenAPI/Swagger documentation
- Expensive to maintain
I'm wondering if there's a code equivalent to what @fboulnois did in the demo branch, but written in Java. Do you think it makes sense to explore something like https://github.com/JavaAIDev/openapi-mcp-server (maybe not this exact project, but others like it) to see if we can get something that can ride on top of the OpenAPI documentation we've already built?
That is also my understanding ... i.e. that there was something that treated the openapi docs as the full suite of tools available. AFAIK, that will always be an option since anyone can read that and run their own server shim.
I have a new annotation-based approach for this and we can review in mobbing.



Addresses
https://broadworkbench.atlassian.net/browse/DT-3187
Phase 1: MCP Server — Dataset Search Proof of Concept
Summary
Adds a stateless Model Context Protocol (MCP) server endpoint to the Consent application, exposing DUOS dataset and study data to AI clients (Claude, Cursor, etc.) that speak MCP.
This is a proof-of-concept phase. A single tool, dataset_search, is implemented by placing a @McpTool annotation directly on DatasetResource.findAllDatasetStudySummaries. A new scanner (McpToolScanner) discovers annotated methods at startup and auto-generates both the MCP input schema (from JAX-RS @PathParam/@QueryParam annotations) and the invocation handler (auth resolution → argument binding → reflective call → structured result). The core service and authentication infrastructure are unchanged; no existing REST endpoints or database schema are modified.
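The handler pipeline named in the summary (auth resolution → argument binding → reflective call → structured result) can be sketched roughly as below. This is not the PR's McpToolScanner code: HandlerTarget, Reply, and ReflectiveHandlerSketch are hypothetical stand-ins, the caller is a plain String instead of a DuosUser, and the result is a plain Map instead of the SDK's CallToolResult.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical resource: returns an HTTP-like status plus an entity.
final class HandlerTarget {
  static final class Reply {
    final int status;
    final Object entity;

    Reply(int status, Object entity) {
      this.status = status;
      this.entity = entity;
    }
  }

  public Reply find(String caller, Integer limit, String query) {
    if (caller == null) {
      return new Reply(403, "Forbidden");
    }
    return new Reply(200, query + " x" + limit);
  }
}

final class ReflectiveHandlerSketch {
  /** Coerces a loosely typed argument to the Java type declared on the method. */
  static Object coerce(Object raw, Class<?> target) {
    if (raw == null) {
      return null;
    }
    if (target == Integer.class) {
      return raw instanceof Number
          ? Integer.valueOf(((Number) raw).intValue())
          : Integer.valueOf(raw.toString());
    }
    if (target == String.class) {
      return raw.toString();
    }
    throw new IllegalArgumentException("Unsupported parameter type: " + target);
  }

  /** Binds arguments by name, invokes the method reflectively, and wraps the reply. */
  static Map<String, Object> invoke(Object resource, String methodName, String caller,
      String[] argNames, Map<String, Object> mcpArgs) {
    try {
      Method method = findMethod(resource.getClass(), methodName);
      Class<?>[] types = method.getParameterTypes();
      Object[] javaArgs = new Object[types.length];
      javaArgs[0] = caller; // the resolved caller goes first, like an @Auth parameter
      for (int i = 1; i < types.length; i++) {
        javaArgs[i] = coerce(mcpArgs.get(argNames[i - 1]), types[i]);
      }
      HandlerTarget.Reply reply = (HandlerTarget.Reply) method.invoke(resource, javaArgs);
      Map<String, Object> result = new LinkedHashMap<>();
      result.put("isError", reply.status >= 400); // status >= 400 becomes an error result
      result.put("structuredContent", reply.entity);
      return result;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(e);
    }
  }

  private static Method findMethod(Class<?> type, String name) {
    for (Method candidate : type.getDeclaredMethods()) {
      if (candidate.getName().equals(name)) {
        return candidate;
      }
    }
    throw new IllegalArgumentException("No method named " + name);
  }
}
```

Argument names are passed in explicitly here because Java parameter names are not reliably available at runtime; in the PR they come from the JAX-RS annotations.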
Risks
The new /mcp path and technology will require AppSec approval and a pen-testing plan.
What changed
New dependency
io.modelcontextprotocol.sdk:mcp (0.14.1, stateless HTTP transport) is added to pom.xml. mcp-json-jackson2 is excluded because its bundled JacksonJsonSchemaValidatorSupplier ServiceLoader registration requires com.networknt:json-schema-validator 1.5.7, which conflicts with the 3.0.2 version already pinned by this project. A thin custom shim (ConsentMcpJsonMapper + ConsentJsonSchemaValidator) is registered via META-INF/services instead, satisfying the SDK's ServiceLoader contracts without the conflicting transitive dependency.
New endpoint
POST /mcp: stateless MCP transport servlet. Each MCP message (initialize, tool call, etc.) is a self-contained HTTP POST; the server responds in the HTTP response body. There is no SSE streaming.
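As a rough illustration of one self-contained POST, the sketch below builds (but does not send) a JSON-RPC 2.0 tools/call request with the JDK's java.net.http API. The base URL and token are placeholders, McpPostSketch is a hypothetical name, and the Accept header is an assumption based on the MCP Streamable HTTP transport rather than something this PR states.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class McpPostSketch {
  /** JSON-RPC 2.0 body for a dataset_search call; assumes query needs no JSON escaping. */
  static String toolsCallBody(String query) {
    return "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\","
        + "\"params\":{\"name\":\"dataset_search\","
        + "\"arguments\":{\"query\":\"" + query + "\"}}}";
  }

  /** Builds the request only; sending it would need an HttpClient and a live server. */
  static HttpRequest toolsCall(String baseUrl, String bearerToken, String query) {
    return HttpRequest.newBuilder(URI.create(baseUrl + "/mcp"))
        .header("Authorization", "Bearer " + bearerToken)
        .header("Content-Type", "application/json")
        // Assumption: the transport expects both media types to be accepted.
        .header("Accept", "application/json, text/event-stream")
        .POST(HttpRequest.BodyPublishers.ofString(toolsCallBody(query)))
        .build();
  }
}
```

Because every message carries its own Bearer token and body, the same builder shape would apply to initialize and notifications/initialized with a different method value.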
New files (src/main/java/.../mcp/)
McpTool: annotation for resource methods (name, description, outputType, params).
McpToolParam: annotation (used in @McpTool#params) that overrides or supplements auto-generated schema properties with a custom name, JSON type, description, and required flag.
McpToolScanner: discovers @McpTool-annotated methods at startup. For each method it builds a SyncToolSpecification containing an auto-derived input schema (required @PathParam properties, optional @QueryParam properties, @McpToolParam overrides/additions) and a reflective handler (resolves DuosUser, coerces MCP args to Java types, invokes the method, checks HTTP status, returns structuredContent).
McpClaimsFilter: filter on /mcp that mirrors RequestHeaderCacheFilter for Jersey. Reads OAUTH2_CLAIM_* headers set by Apache mod_oauth2 and populates ClaimsCache, keyed by Bearer token.
McpAuthHelper: reads the Bearer token from McpTransportContext, resolves an AuthUser via ClaimsCache, fetches the fully-populated User (with roles), and returns a DuosUser, the same type resource methods receive via @Auth.
McpToolResults: builds CallToolResult values. Serialises domain objects via Gson (consistent with the REST layer), then re-parses the JSON as a generic Java structure placed in structuredContent. All three result paths (success, plain-text, error) use structuredContent so the SDK accepts them even when the tool declares an outputSchema.
ConsentMcpToolProvider: calls McpToolScanner.scan(datasetResource). Reduced from ~140 lines to ~50 lines compared to the hardcoded handler approach.
ConsentMcpManaged: Managed wrapper that calls server.closeGracefully() on shutdown.
ConsentMcpJsonMapper: McpJsonMapper backed by Jackson ObjectMapper.
ConsentMcpJsonMapperSupplier / ConsentJsonSchemaValidator / ConsentJsonSchemaValidatorSupplier
Modified files
DatasetResource: findAllDatasetStudySummaries is annotated with @McpTool and given a @QueryParam("query") parameter. Query filtering (case-insensitive substring match on dataset name and study name) now runs inside the resource method, so the MCP tool and the underlying REST endpoint both benefit.
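The filtering rule above (case-insensitive substring match on dataset name or study name) might look roughly like this. SummarySketch is a hypothetical, trimmed-down stand-in for DatasetStudySummary, and treating a null or blank query as "no filter" is an assumption drawn from the parameter being optional.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical stand-in carrying only the two fields the filter looks at.
final class SummarySketch {
  final String datasetName;
  final String studyName;

  SummarySketch(String datasetName, String studyName) {
    this.datasetName = datasetName;
    this.studyName = studyName;
  }
}

final class QueryFilterSketch {
  /** Keeps summaries whose dataset or study name contains the query, ignoring case. */
  static List<SummarySketch> filter(List<SummarySketch> summaries, String query) {
    if (query == null || query.isBlank()) {
      return summaries; // assumption: an absent query returns everything
    }
    String needle = query.toLowerCase(Locale.ROOT);
    return summaries.stream()
        .filter(s -> contains(s.datasetName, needle) || contains(s.studyName, needle))
        .collect(Collectors.toList());
  }

  private static boolean contains(String value, String needle) {
    return value != null && value.toLowerCase(Locale.ROOT).contains(needle);
  }
}
```

Lower-casing with Locale.ROOT keeps the comparison independent of the server's default locale.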
ConsentModule: Guice bindings for HttpServletStatelessServerTransport, McpStatelessSyncServer, and ConsentMcpToolProvider. The transport's contextExtractor captures the raw Bearer token from each HTTP request and stores it in McpTransportContext under key "bearer", solving the cross-thread auth propagation problem (the SDK may dispatch tool handlers on a different thread from the original HTTP request).
ConsentApplication: registers McpClaimsFilter on /mcp, mounts the transport servlet, and wraps the server in ConsentMcpManaged.
AuthorizationHelper: adds public resolveAuthUser(String bearer) so McpAuthHelper can look up a cached AuthUser without duplicating the resolution logic.
datasetV3.yaml: documents the new optional query parameter, query, on the GET operation.
[…] summary or description fields: approveCloseoutBySigningOfficial.yaml, dacRuleToggle.yaml, userCreate.yaml, cleanupEmptyCertificationAndAlternativeSharingFiles.yaml, libraryCardHistoryByUserId.yaml.
Annotation-driven tool registration
Adding a new MCP tool now requires only a single annotation on the target resource method:
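A hypothetical illustration of the idea follows, not the PR's code. ToolSketch, PathParamSketch, and QueryParamSketch are stand-ins for @McpTool and the JAX-RS annotations so the sketch compiles on its own, and SchemaSketch shows one plausible way to derive a JSON-Schema-like input description from an annotated method via reflection.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in annotations (not the real @McpTool or JAX-RS types).
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface ToolSketch { String name(); String description() default ""; }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface PathParamSketch { String value(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
@interface QueryParamSketch { String value(); }

// Hypothetical resource: one annotation is all the method needs to become a tool.
class SketchResource {
  @ToolSketch(name = "dataset_lookup", description = "Illustrative only")
  public String lookup(Object authUser,
      @PathParamSketch("datasetId") Integer datasetId,
      @QueryParamSketch("query") String query) {
    return datasetId + ":" + query;
  }
}

final class SchemaSketch {
  /** Builds a JSON-Schema-like map for every method carrying @ToolSketch. */
  static Map<String, Map<String, Object>> scan(Class<?> resourceType) {
    Map<String, Map<String, Object>> schemasByTool = new LinkedHashMap<>();
    for (Method method : resourceType.getDeclaredMethods()) {
      ToolSketch tool = method.getAnnotation(ToolSketch.class);
      if (tool == null) {
        continue;
      }
      Map<String, Object> properties = new LinkedHashMap<>();
      List<String> required = new ArrayList<>();
      for (Parameter parameter : method.getParameters()) {
        PathParamSketch path = parameter.getAnnotation(PathParamSketch.class);
        QueryParamSketch query = parameter.getAnnotation(QueryParamSketch.class);
        if (path != null) { // path parameters become required properties
          properties.put(path.value(), Map.of("type", jsonType(parameter.getType())));
          required.add(path.value());
        } else if (query != null) { // query parameters stay optional
          properties.put(query.value(), Map.of("type", jsonType(parameter.getType())));
        }
        // Parameters with neither annotation (such as the auth user) are not exposed.
      }
      Map<String, Object> schema = new LinkedHashMap<>();
      schema.put("type", "object");
      schema.put("properties", properties);
      schema.put("required", required);
      schemasByTool.put(tool.name(), schema);
    }
    return schemasByTool;
  }

  private static String jsonType(Class<?> javaType) {
    boolean integral = javaType == Integer.class || javaType == int.class
        || javaType == Long.class || javaType == long.class;
    return integral ? "integer" : "string";
  }
}
```

Calling SchemaSketch.scan(SketchResource.class) yields one entry keyed by the tool name, with datasetId required and query optional.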
McpToolScanner handles the rest:
@PathParam → required string/integer property; @QueryParam → optional property; @Auth → not exposed. @McpToolParam entries override descriptions or add MCP-only fields with no JAX-RS counterpart.
Resolves DuosUser from the transport context, coerces MCP arguments to the Java types declared on the method, invokes via reflection, checks the HTTP response status (≥ 400 → McpToolResults.error()), and serialises the entity with McpToolResults.of().
Tool: dataset_search
Input (all fields optional)
query
Output: array of DatasetStudySummary objects placed in result.structuredContent:

    [
      {
        "dataset_id": 415,
        "dataset_name": "1000 Genomes",
        "dataset_identifier": "DUOS-000415",
        "study_name": "1000 Genomes Project",
        "public": true
      }
    ]

Authorization is enforced identically to the REST layer: DatasetService.findAllDatasetStudySummaries(caller) is called with the resolved User, so callers only see datasets they are permitted to access. Responses from the resource method with status ≥ 400 (e.g. 403 Forbidden) are surfaced as MCP error results with a human-readable message.
Interaction diagram
    sequenceDiagram
        actor Client as MCP Client<br/>(Claude / Cursor)
        participant Apache as Apache httpd<br/>mod_oauth2
        participant Filter as McpClaimsFilter
        participant Transport as HttpServletStateless<br/>ServerTransport
        participant Scanner as McpToolScanner
        participant Auth as McpAuthHelper
        participant Cache as ClaimsCache
        participant Resource as DatasetResource<br/>findAllDatasetStudySummaries
        participant Svc as DatasetService

        Note over Client,Apache: Step 1 - initialize (required once per logical session)
        Client->>Apache: POST /mcp {"method":"initialize"}
        Apache->>Apache: Validate Bearer token,<br/>set OAUTH2_CLAIM_* headers
        Apache->>Filter: forward request + headers
        Filter->>Cache: loadCache(bearer, OAUTH2_CLAIM_* headers)
        Filter->>Transport: chain.doFilter()
        Transport-->>Client: HTTP 200 {"result":{"protocolVersion":"2025-03-26","capabilities":{"tools":{}},...}}

        Note over Client,Apache: Step 2 - notifications/initialized (required)
        Client->>Apache: POST /mcp {"method":"notifications/initialized"}
        Apache->>Filter: forward
        Filter->>Transport: chain.doFilter()
        Transport-->>Client: HTTP 200 (empty body, notification ack)

        Note over Client,Apache: Step 3 - tools/call dataset_search
        Client->>Apache: POST /mcp {"method":"tools/call","params":{"name":"dataset_search","arguments":{"query":"ANVIL"}}}
        Apache->>Apache: Validate Bearer token,<br/>set OAUTH2_CLAIM_* headers
        Apache->>Filter: forward request + headers
        Filter->>Cache: loadCache(bearer, headers)
        Filter->>Transport: chain.doFilter()
        Transport->>Transport: contextExtractor captures bearer<br/>→ McpTransportContext{"bearer":"<token>"}
        Transport->>Scanner: invoke(context, request) [auto-generated handler]
        Scanner->>Auth: resolveDuosUser(context, authHelper, userService)
        Auth->>Cache: resolveAuthUser(bearer)
        Cache-->>Auth: AuthUser{email}
        Auth-->>Scanner: DuosUser{authUser, user{roles,...}}
        Scanner->>Scanner: coerce MCP args → Java types
        Scanner->>Resource: findAllDatasetStudySummaries(duosUser, "ANVIL")
        Resource->>Svc: findAllDatasetStudySummaries(user)
        Svc-->>Resource: List<DatasetStudySummary>
        Resource->>Resource: filter by query "ANVIL"
        Resource-->>Scanner: Response 200 [List<DatasetStudySummary>]
        Scanner->>Scanner: status < 400 → McpToolResults.of(entity)
        Scanner-->>Transport: CallToolResult{structuredContent: [...]}
        Transport-->>Client: HTTP 200 {"result":{"structuredContent":[{"dataset_id":415,...}]}}

Testing
The endpoint can be exercised manually with three sequential curl calls (initialize, notifications/initialized, tools/call), using a Bearer token from gcloud auth print-access-token.
What this is not
The ?query= parameter is a backwards-compatible addition.
Next phases (not in this PR)
[…] @McpTool annotation.
tools/list integration so clients can discover available tools dynamically.