Add inference-compiler extension #2186

Open
PawelPeczek-Roboflow wants to merge 21 commits into main from feature/inference-compiler-tool
PawelPeczek-Roboflow commented Mar 31, 2026

What does this PR do?

Adds a CLI command that compiles TensorRT (TRT) model packages in one of two modes:

  • in-process, using the inference-models package when it is found in the current environment
  • inside an inference server container, with the inference-cli command running in the container, when the client calls the command without such an environment

CLI command:

inference enterprise inference-compiler compile-model --model-id yolov8n-640 --api-key XXX
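
For scripted use, the same invocation can be assembled programmatically. The snippet below is a minimal sketch, not part of this PR: `build_compile_command` is a hypothetical helper, and the only CLI flags it emits are ones listed in the help output in this description.

```python
from typing import List, Optional


def build_compile_command(
    model_id: str,
    api_key: Optional[str] = None,
    compilation_mode: str = "auto",
    trt_forward_compatible: bool = False,
) -> List[str]:
    """Hypothetical helper: build the argv list for `compile-model`."""
    if compilation_mode not in ("auto", "container", "python"):
        raise ValueError(f"Unknown compilation mode: {compilation_mode}")
    command = [
        "inference", "enterprise", "inference-compiler", "compile-model",
        "--model-id", model_id,
        "--compilation-mode", compilation_mode,
    ]
    # When no key is passed, the CLI falls back to the ROBOFLOW_API_KEY
    # environment variable, so the flag is simply omitted here.
    if api_key is not None:
        command += ["--api-key", api_key]
    if trt_forward_compatible:
        command.append("--trt-forward-compatible")
    return command


if __name__ == "__main__":
    # Print the command instead of running it; to execute it, pass the list
    # to subprocess.run(command, check=True).
    print(" ".join(build_compile_command("yolov8n-640")))
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later handed to `subprocess.run`.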

Usage:

 Usage: python -m inference_cli.main enterprise inference-compiler compile-model
            [OPTIONS]

╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ *  --model-id                -m                                 TEXT                     Model ID in format project/version. [required]                                                                                     │
│    --api-key                 -a                                 TEXT                     Roboflow API key for your workspace. If not given - env variable `ROBOFLOW_API_KEY` will be used                                   │
│    --debug-mode                  --no-debug-mode                                         Flag enabling errors stack traces to be displayed (helpful for debugging) [default: no-debug-mode]                                 │
│    --trt-forward-compatible      --no-trt-forward-compatible                             Flag to decide if forward-compatibility mode in TRT compilation should be enabled [default: no-trt-forward-compatible]             │
│    --trt-same-cc-compatible      --no-trt-same-cc-compatible                             Flag to decide if engine should be compiled to be compatible with devices sharing the same CUDA CC to the one running compilation  │
│                                                                                          procedure                                                                                                                          │
│                                                                                          [default: no-trt-same-cc-compatible]                                                                                               │
│    --compilation-mode                                           [auto|container|python]  Selection of compilation mode - `container` runs the procedure inside `inference` server, `python` runs in-process. `auto`         │
│                                                                                          (default) inspect environment dependencies to verify if the procedure can be run in-process, if not - offloading to the server.    │
│                                                                                          [default: auto]                                                                                                                    │
│    --image                                                      TEXT                     Point specific docker image you would like to run with command (useful for development of custom builds of inference server)       │
│    --use-local-images            --not-use-local-images                                  Flag to allow using local images (if set False image is always attempted to be pulled) [default: not-use-local-images]             │
│    --env-file-path                                              TEXT                     Path to key-value .env file to inject into compilation container (if you run in Python package, just export variables to env)      │
│    --help                                                                                Show this message and exit.                                                                                                        │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
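
The `auto` behaviour of `--compilation-mode` described in the help text can be pictured roughly as follows. This is a sketch under stated assumptions, not the code from this PR: `resolve_compilation_mode` is a hypothetical name, the importable module name `inference_models` is a guess derived from the `inference-models` package name, and the actual check may inspect more dependencies than a single import.

```python
import importlib.util

ALLOWED_MODES = ("auto", "container", "python")


def resolve_compilation_mode(
    requested: str = "auto",
    package_module: str = "inference_models",  # assumed module name
) -> str:
    """Hypothetical sketch: map the requested mode to `python` or `container`."""
    if requested not in ALLOWED_MODES:
        raise ValueError(f"Unknown compilation mode: {requested}")
    if requested != "auto":
        # An explicit choice is respected as-is.
        return requested
    # `auto`: run in-process only if the package can be located in this
    # environment; otherwise offload to the inference server container.
    in_process_possible = importlib.util.find_spec(package_module) is not None
    return "python" if in_process_possible else "container"


if __name__ == "__main__":
    print(resolve_compilation_mode("auto"))
```

`importlib.util.find_spec` only locates the module without importing it, which keeps the check cheap and free of import side effects.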

Type of Change

  • New feature (non-breaking change that adds functionality)

Testing

  • e2e testing (the behaviour is complex and hard to test without a purpose-built environment and introspection of the platform state)
  • I have tested this change locally
  • I have added/updated tests for this change

Test details:

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code where necessary, particularly in hard-to-understand areas
  • My changes generate no new warnings or errors
  • I have updated the documentation accordingly (if applicable)

Additional Context

@PawelPeczek-Roboflow PawelPeczek-Roboflow marked this pull request as ready for review April 13, 2026 20:52
@roboflow roboflow deleted a comment from cursor bot Apr 14, 2026