What problem does this solve?
After inference-engine deploy writes a new model definition or updates routing config, the running server must be restarted to pick up the changes. In production this means downtime or a rolling restart for a change that should be instantaneous.
Proposed solution
New admin endpoint: POST /admin/reload (required scope: admin)
Behaviour:
- Calls registry.clear_cache(), which evicts all loaded pipelines from the LRU cache
- Re-runs registry.warm_up(), which reloads all registered definitions from disk
- Re-reads routing config via importlib.reload, which picks up any routing changes written by inference-engine deploy
- Returns:
{
  "reloaded_models": ["sentiment:v1", "classifier:v2"],
  "routing_updated": true,
  "duration_ms": 142
}
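The sequence above could be wired together roughly as follows. This is a framework-agnostic sketch, not the actual handler: reload_handler, the Registry and Routing protocols, and the load_routes callable are hypothetical names, and it assumes warm_up() returns the keys of the models it loaded, which the issue does not state. The admin scope check is left to whatever auth layer the server already uses.

```python
import time
from typing import Callable, List, Mapping, Protocol


class Registry(Protocol):
    def clear_cache(self) -> None: ...
    def warm_up(self) -> List[str]: ...  # assumed to return loaded model keys


class Routing(Protocol):
    def update_routes(self, routes: Mapping[str, str]) -> None: ...


def reload_handler(
    registry: Registry,
    routing: Routing,
    load_routes: Callable[[], Mapping[str, str]],
) -> dict:
    """Body of a hypothetical POST /admin/reload handler.

    load_routes is an assumed hook; in the real server it might wrap
    importlib.reload(<routing config module>) and return its route table.
    """
    started = time.perf_counter()

    registry.clear_cache()                  # evict every loaded pipeline
    reloaded = registry.warm_up()           # reload definitions from disk
    routing.update_routes(load_routes())    # swap in the re-read routes

    return {
        "reloaded_models": list(reloaded),
        "routing_updated": True,
        "duration_ms": round((time.perf_counter() - started) * 1000),
    }
```

Passing load_routes in as a callable keeps the importlib.reload call out of the handler, so the handler can be exercised in tests without touching module state.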
ModelRegistry.clear_cache() evicts all entries from self._cache and resets LRU order. warm_up() is already idempotent.
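To make the intended semantics concrete, here is a minimal sketch of a registry whose cache is an OrderedDict used as an LRU. Only clear_cache(), warm_up() and self._cache come from the issue; the loader callable, _definitions, register(), get() and max_size are assumptions made for illustration, and the real ModelRegistry may be structured differently.

```python
from collections import OrderedDict
from typing import Callable, List


class ModelRegistry:
    """Illustrative registry: LRU cache of pipelines keyed by 'name:version'."""

    def __init__(self, loader: Callable[[str], object], max_size: int = 8) -> None:
        self._loader = loader                  # assumed: loads one pipeline from disk
        self._definitions: List[str] = []      # assumed: registered model keys
        self._cache: "OrderedDict[str, object]" = OrderedDict()
        self._max_size = max_size

    def register(self, key: str) -> None:
        if key not in self._definitions:
            self._definitions.append(key)

    def get(self, key: str) -> object:
        if key in self._cache:
            self._cache.move_to_end(key)       # mark as most recently used
            return self._cache[key]
        pipeline = self._loader(key)
        self._cache[key] = pipeline
        while len(self._cache) > self._max_size:
            self._cache.popitem(last=False)    # drop least recently used
        return pipeline

    def clear_cache(self) -> None:
        # Emptying the OrderedDict removes every entry and, with them,
        # the recency order, so the next load starts from a clean slate.
        self._cache.clear()

    def warm_up(self) -> List[str]:
        # Idempotent: keys already cached are only touched, not reloaded.
        for key in self._definitions:
            self.get(key)
        return list(self._cache)
```

Because warm_up() goes through get(), calling it twice in a row invokes the loader only for keys that are missing from the cache, which is what makes clear_cache() followed by warm_up() a full reload.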
RoutingService needs a new update_routes() method that replaces self.routes atomically after the config module is reloaded.
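One way update_routes() could achieve the atomic replacement is to build a complete new table first and then rebind self.routes in a single assignment, so readers see either the old table or the new one and never a half-updated mix. The sketch below assumes routes is a mapping from path to model key and adds a hypothetical resolve() method and write lock; the real RoutingService may differ.

```python
import threading
from types import MappingProxyType
from typing import Mapping


class RoutingService:
    """Illustrative routing table with copy-then-swap updates."""

    def __init__(self, routes: Mapping[str, str]) -> None:
        self._write_lock = threading.Lock()    # serialises concurrent reloads
        self.routes: Mapping[str, str] = MappingProxyType(dict(routes))

    def resolve(self, path: str) -> str:
        # Take one reference to the current table and use only that,
        # so a concurrent swap cannot change it mid-lookup.
        routes = self.routes
        return routes[path]

    def update_routes(self, new_routes: Mapping[str, str]) -> None:
        # Build the full read-only snapshot before publishing it.
        snapshot = MappingProxyType(dict(new_routes))
        with self._write_lock:
            self.routes = snapshot             # single reference rebinding
```

The old table is never mutated, so a request that picked it up before the swap finishes against a consistent view.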
Alternatives considered
Server restart. Causes downtime and loses in-flight requests. Not acceptable for production use.
Area
Inference / prediction