Add inference-engine export command with sagemaker, bentoml, ray, and docker targets #27

@AK11105

What problem does this solve?

After packaging a model (#26), there is no way to export it to a specific deployment platform. Users targeting SageMaker, BentoML, Ray Serve, or standalone Docker must manually write platform-specific boilerplate (inference.py, service.py, deployment.py, or serve.py, respectively), each with different calling conventions.

Proposed solution

New command: inference-engine export models/sentiment/v1/ --target sagemaker|bentoml|ray|docker
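
A minimal sketch of how the proposed command line could be parsed. It assumes argparse; the project's actual CLI framework is not stated in this issue, and `build_parser`, `TARGETS`, and the `model_dir` argument name are hypothetical.

```python
import argparse

# The four targets named in this proposal.
TARGETS = ("sagemaker", "bentoml", "ray", "docker")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `inference-engine export <model_dir> --target <target>`."""
    parser = argparse.ArgumentParser(prog="inference-engine")
    subcommands = parser.add_subparsers(dest="command", required=True)

    export = subcommands.add_parser(
        "export", help="Export a packaged model to a deployment target"
    )
    # Positional path to the packaged model, e.g. models/sentiment/v1/
    export.add_argument("model_dir")
    # argparse rejects any value outside TARGETS with a usage error.
    export.add_argument("--target", required=True, choices=TARGETS)
    return parser


args = build_parser().parse_args(
    ["export", "models/sentiment/v1/", "--target", "sagemaker"]
)
print(args.command, args.model_dir, args.target)
```

Restricting `--target` with `choices` means an unsupported target such as `replicate` fails at parse time rather than inside an exporter.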

The command reads deployment.json for metadata. If deployment.json is absent, it fails with a clear message telling the user to run inference-engine package first.
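
The metadata check could look roughly like the following. This is a sketch under assumptions: the function name, the exception class, and the exact message wording are hypothetical, and the schema of deployment.json is not specified here, so the parsed JSON is returned as-is.

```python
import json
from pathlib import Path


class MissingDeploymentMetadata(Exception):
    """Raised when the model directory has not been packaged yet (hypothetical name)."""


def load_deployment_metadata(model_dir):
    """Return the parsed deployment.json, or fail with an actionable message."""
    metadata_path = Path(model_dir) / "deployment.json"
    if not metadata_path.is_file():
        raise MissingDeploymentMetadata(
            f"{metadata_path} not found. Run `inference-engine package` first."
        )
    with metadata_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Raising a dedicated exception lets the CLI layer print the message and exit non-zero without a traceback.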

sagemaker

export/sentiment-v1-sagemaker/
├── model.tar.gz    (artifact + inference.py, tarred for S3 upload)
└── inference.py    (model_fn loads via definition.py; predict_fn calls pipeline.run())
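
For illustration, a generated inference.py might resemble the sketch below. `model_fn(model_dir)` and `predict_fn(input_data, model)` are the hook names used by SageMaker's framework inference containers; how definition.py exposes the pipeline is not described in this issue, so the `build_pipeline(model_dir)` factory is an assumption.

```python
import importlib.util
from pathlib import Path


def model_fn(model_dir):
    """Called once at container start with the directory of the unpacked model.tar.gz."""
    definition_path = Path(model_dir) / "definition.py"
    spec = importlib.util.spec_from_file_location("definition", definition_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Assumption: definition.py exposes build_pipeline(model_dir), which returns
    # an object with a run() method. The real entry point may differ.
    return module.build_pipeline(model_dir)


def predict_fn(input_data, model):
    """Called per request; `model` is whatever model_fn returned."""
    return model.run(input_data)
```

Loading definition.py by file path avoids depending on the container's import path.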

bentoml

export/sentiment-v1-bentoml/
├── service.py      (BentoML Service wrapping InferencePipeline.run())
├── bentofile.yaml
└── requirements.txt

ray

export/sentiment-v1-ray/
├── deployment.py   (Ray Serve Deployment; handle() calls pipeline.run())
└── requirements.txt

docker (standalone, no platform SDK)

export/sentiment-v1-docker/
├── Dockerfile
├── requirements.txt
└── serve.py        (~30-line FastAPI app: POST /predict → pipeline.run())

All exporters implement a common BaseExporter ABC. The LLM is used only when the target format requires a calling convention that differs from pipeline.run(x). In practice this is rare, because all templates call pipeline.run() directly.
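
One possible shape for the shared base class, as a sketch only. Apart from the name BaseExporter, everything here is hypothetical: the method names, the `export_root` parameter, and the placeholder file contents. The output directory naming follows the export/sentiment-v1-<target>/ layout shown above.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict


class BaseExporter(ABC):
    """Common interface: each target only decides which files to render."""

    target: str  # set by subclasses, e.g. "ray"

    def __init__(self, model_dir, export_root="export"):
        self.model_dir = Path(model_dir)
        self.export_root = Path(export_root)

    def output_dir(self) -> Path:
        # models/sentiment/v1/ -> export/sentiment-v1-<target>/
        name, version = self.model_dir.parts[-2:]
        return self.export_root / f"{name}-{version}-{self.target}"

    @abstractmethod
    def render_files(self) -> Dict[str, str]:
        """Return a mapping of filename to file content for this target."""

    def export(self) -> Path:
        out = self.output_dir()
        out.mkdir(parents=True, exist_ok=True)
        for filename, content in self.render_files().items():
            (out / filename).write_text(content, encoding="utf-8")
        return out


class RayExporter(BaseExporter):
    target = "ray"

    def render_files(self) -> Dict[str, str]:
        # Placeholder content; a real template would define the Ray Serve deployment.
        return {
            "deployment.py": "# Ray Serve deployment calling pipeline.run()\n",
            "requirements.txt": "ray[serve]\n",
        }
```

With this split, adding a target means writing one subclass with a `target` name and a `render_files()` implementation; directory naming and file writing stay in the base class.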

Alternatives considered

Replicate export: deferred. Replicate's format changes frequently, and its target audience doesn't align with the platform's production focus.

Area

CLI (deploy / fix)

Labels

enhancement: New feature or request
