
NVIDIA CUDA GPU Installation

The following installation methods are available when your environment meets these requirements:

  • GPU Driver >= 535
  • CUDA >= 12.3
  • CUDNN >= 9.5
  • Python >= 3.10
  • Linux X86_64
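
Before choosing a method, the Python and OS requirements can be sanity-checked with a short standard-library script. This is a minimal sketch (the function name is illustrative); it does not check the GPU driver, CUDA, or cuDNN versions, which nvidia-smi and nvcc --version report.

```python
import platform
import sys

def check_basic_requirements(py_version=None, system=None, machine=None):
    """Return a list of unmet basic requirements (empty list means OK)."""
    py_version = py_version or sys.version_info[:2]
    system = system or platform.system()
    machine = machine or platform.machine()
    problems = []
    if tuple(py_version[:2]) < (3, 10):
        problems.append("Python >= 3.10 required, found %d.%d" % (py_version[0], py_version[1]))
    if (system, machine) != ("Linux", "x86_64"):
        problems.append("Linux x86_64 required, found %s %s" % (system, machine))
    return problems

print(check_basic_requirements() or "basic Python/OS requirements met")
```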

1. Pre-built Docker Installation (Recommended)

Notice: The pre-built image supports SM 80/86/89/90 architecture GPUs (e.g. A800/H800/L20/L40/4090).

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.5.0
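
After pulling the image, a container can be started with GPU access. This is a sketch assuming the NVIDIA Container Toolkit is installed on the host (the --gpus flag requires it); the image tag matches the pull command above.

```shell
# Start an interactive container with all GPUs visible
# (assumes the NVIDIA Container Toolkit is installed for the --gpus flag)
docker run --gpus all -it --rm \
    ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-cuda-12.6:2.5.0 /bin/bash
```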

2. Pre-built Pip Installation

First, install paddlepaddle-gpu. For detailed instructions, refer to the PaddlePaddle Installation guide.

# Install stable release
# CUDA 12.6
python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
# CUDA 12.9
python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/

# Install latest Nightly build
# CUDA 12.6
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/
# CUDA 12.9
python -m pip install --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu129/

Then install fastdeploy. Do not install from PyPI. Use the following methods instead (supports SM80/86/89/90 GPU architectures).

Note: Stable FastDeploy release pairs with stable PaddlePaddle; Nightly Build FastDeploy pairs with Nightly Build PaddlePaddle. The --extra-index-url is only used for downloading fastdeploy-gpu's dependencies; fastdeploy-gpu itself must be installed from the Paddle source specified by -i.

# Install stable release FastDeploy
# CUDA 12.6
python -m pip install fastdeploy-gpu==2.5.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# CUDA 12.9
python -m pip install fastdeploy-gpu==2.5.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu129/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

# Install Nightly Build FastDeploy
# CUDA 12.6
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu126/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
# CUDA 12.9
python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu129/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

3. Build from Source Using Docker

  • Note: dockerfiles/Dockerfile.gpu supports SM 80/90 architectures by default. To target other architectures, modify the bash build.sh 1 python false [80,90] line in the Dockerfile; specifying no more than two architectures is recommended.
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

docker build -f dockerfiles/Dockerfile.gpu -t fastdeploy:gpu .

4. Build Wheel from Source

First, install paddlepaddle-gpu. For detailed instructions, refer to the PaddlePaddle Installation guide.

python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

# Argument 1: Whether to build wheel package (1 for yes, 0 for compile only)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators
# Argument 4: Target GPU architectures
bash build.sh 1 python false [80,90]

The built packages will be in the FastDeploy/dist directory.
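
Once the build finishes, the wheel can be installed directly from that directory. The exact filename varies with the FastDeploy version and Python ABI, hence the glob in this sketch.

```shell
# Install the freshly built wheel from FastDeploy/dist
# (run from the FastDeploy repository root)
python -m pip install dist/fastdeploy_gpu-*.whl
```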

5. Precompiled Operator Wheel Packages

FastDeploy provides precompiled GPU operator wheel packages for quick setup without building the entire source code. This method currently supports SM80/90 architecture (e.g., A100/H100) and CUDA 12.6 environments only.

By default, build.sh compiles all custom operators from source. To use the precompiled package, enable it with the FD_USE_PRECOMPILED parameter. If the precompiled package cannot be downloaded or does not match the current environment, the build automatically falls back to 4. Build Wheel from Source.

First, install paddlepaddle-gpu. For detailed instructions, please refer to the PaddlePaddle Installation Guide.

python -m pip install paddlepaddle-gpu==3.3.1 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/

Then, clone the FastDeploy repository and build using the precompiled operator wheels:

git clone https://github.com/PaddlePaddle/FastDeploy
cd FastDeploy

# Argument 1: Whether to build wheel package (1 for yes)
# Argument 2: Python interpreter path
# Argument 3: Whether to compile CPU inference operators (false for GPU only)
# Argument 4: Target GPU architectures (currently supports 80/90)
# Argument 5: Whether to use precompiled operators (1 for enable)
# Argument 6 (optional): Commit ID of the precompiled operators (defaults to the current commit ID)

# Use precompiled operators for accelerated build
bash build.sh 1 python false [90] 1

# Use precompiled wheel from a specific commit
bash build.sh 1 python false [90] 1 8a9e7b53af4a98583cab65e4b44e3265a93e56d2

The downloaded wheel packages will be stored in the FastDeploy/pre_wheel directory. After the build completes, the operator binaries can be found in FastDeploy/fastdeploy/model_executor/ops/gpu.

Notes:

  • This mode prioritizes downloading precompiled GPU operator wheels to reduce build time.
  • Currently supports GPU, SM80/90, CUDA 12.6 only.
  • For custom architectures or modified operator logic, please use source compilation (Section 4).
  • You can check whether the precompiled wheel for a specific commit has been successfully built on the FastDeploy CI Build Status Page.

Environment Verification

After installation, verify the environment with this Python code:

import paddle
# Importing this marker checks that the installed PaddlePaddle build exposes the APIs FastDeploy relies on
from paddle.jit.marker import unified
# Verify GPU availability
paddle.utils.run_check()

If the above code executes successfully, the environment is ready.