What problem does this solve?
There is no API to inspect what the engine knows about a deployed model at runtime: its framework, expected input format, routing strategy, executor, or whether it is currently loaded in the cache. This information is spread across multiple internal components and is not exposed by any endpoint.
Proposed solution
New endpoint: GET /models/{name}/{version}/metadata
Response:
{
  "name": "sentiment",
  "version": "v1",
  "framework": "sklearn",
  "load_format": "joblib",
  "device": "cpu",
  "input_hint": "raw text string",
  "output_hint": "integer class label",
  "sample_input": "great movie",
  "executor": "cpu",
  "routing_strategy": "static",
  "loaded": true,
  "artifact_size_mb": 2.1
}
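For illustration, a client could parse this response into a typed object along the following lines. This is only a sketch: `ModelMetadata` and `parse_metadata` are hypothetical names, and treating every field other than `name`, `version`, and `loaded` as nullable is an assumption that goes slightly beyond what the proposal states.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ModelMetadata:
    """Hypothetical client-side view of the metadata response."""
    name: str
    version: str
    loaded: bool
    # Assumed nullable: the server may not know these values.
    framework: Optional[str] = None
    load_format: Optional[str] = None
    device: Optional[str] = None
    input_hint: Optional[str] = None
    output_hint: Optional[str] = None
    sample_input: Optional[str] = None
    executor: Optional[str] = None
    routing_strategy: Optional[str] = None
    artifact_size_mb: Optional[float] = None


def parse_metadata(body: str) -> ModelMetadata:
    """Parse a JSON response body, ignoring any keys this client does not know."""
    data = json.loads(body)
    known = {f.name for f in fields(ModelMetadata)}
    return ModelMetadata(**{k: v for k, v in data.items() if k in known})
```

Dropping unknown keys means the client would keep working if the endpoint later gains fields.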
Data sources, in priority order:
- deployment.json in the model directory (written by the inference-engine package): provides framework, load_format, device, input_hint, output_hint, sample_input, artifact_size_mb
- ExecutionPolicy config: provides executor
- RoutingService config: provides routing_strategy
- Registry cache state: provides loaded
Fields absent from deployment.json are returned as null. The endpoint does not fail if deployment.json is absent — it returns what it can from the other sources.
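The null-fallback behaviour described above could be sketched roughly as follows. `build_metadata` is a hypothetical helper, not existing engine code, and it assumes the caller has already looked up `executor`, `routing_strategy`, and `loaded` from ExecutionPolicy, RoutingService, and the registry cache. Handling of a malformed deployment.json is left out.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional

# Fields the proposal attributes to deployment.json.
DEPLOYMENT_FIELDS = (
    "framework", "load_format", "device", "input_hint",
    "output_hint", "sample_input", "artifact_size_mb",
)


def build_metadata(
    model_dir: Path,
    name: str,
    version: str,
    executor: Optional[str],
    routing_strategy: Optional[str],
    loaded: bool,
) -> Dict[str, Any]:
    """Assemble the response body (hypothetical helper, for illustration)."""
    deployment: Dict[str, Any] = {}
    path = model_dir / "deployment.json"
    # A missing file is not an error: every deployment field falls back to None.
    if path.is_file():
        deployment = json.loads(path.read_text(encoding="utf-8"))

    metadata: Dict[str, Any] = {"name": name, "version": version}
    for field in DEPLOYMENT_FIELDS:
        # dict.get returns None for absent keys, which serializes to JSON null.
        metadata[field] = deployment.get(field)
    metadata["executor"] = executor
    metadata["routing_strategy"] = routing_strategy
    metadata["loaded"] = loaded
    return metadata
```

Passing the runtime values in as arguments keeps the sketch independent of how the real ExecutionPolicy, RoutingService, and registry objects are accessed.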
Alternatives considered
Reading deployment.json directly from disk. This doesn't include runtime state (loaded, routing_strategy) and requires filesystem access from the client.
Area
Model loading / registry