Add GGML CUDA backend support across inference and native bridge#3

Open
zhongkaifu wants to merge 1 commit into main from codex/update-inferenceengine-for-ggml_cuda-support
Conversation

@zhongkaifu (Owner)

Motivation

  • Enable selection and use of the GGML CUDA backend end-to-end so models can run on CUDA-capable GPUs through the existing GGML bridge and inference stack.
  • Provide a compile-time toggle to build the native GGML bridge with CUDA to avoid changing default builds when CUDA is not available.

Description

  • Added BackendType.GgmlCuda and wired it to GgmlBackendType.Cuda so the inference layer can select CUDA-accelerated GGML execution. (files: InferenceEngine/ModelBase.cs, InferenceConsole/Program.cs, InferenceWeb/ModelService.cs, InferenceWeb/wwwroot/index.html)
  • Extended the managed GGML bindings with GgmlBackendType.Cuda and improved error messaging to report ggml-cuda initialization failures. (file: TensorSharp.GGML/GgmlNative.cs)
  • Mapped the GGML allocator to report BlasEnum.CUDA for CUDA backends and generalized the Float32-only error text so it is no longer Metal-specific. (files: TensorSharp.GGML/GgmlAllocator.cs, TensorSharp.GGML/GgmlStorage.cs, TensorSharp.GGML/GgmlBasicOps.cs)
  • Added conditional CUDA support to the native bridge: a ggml-cuda.h include, BACKEND_TYPE_CUDA handling with a runtime guard via ggml_backend_cuda_init(0), and a -DTENSORSHARP_ENABLE_CUDA=ON CMake option that drives GGML_CUDA. (files: TensorSharp.GGML.Native/ggml_ops.cpp, TensorSharp.GGML.Native/CMakeLists.txt)
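The compile-time toggle described above might be wired roughly like this in the bridge's CMakeLists.txt. This is a minimal sketch, not the PR's actual diff: the option name, the GGML_CUDA variable, and the GGML_USE_CUDA guard come from the PR text and review comment, while the exact forwarding mechanism is assumed.

```cmake
# Sketch only: OFF by default so builds without CUDA are unchanged.
option(TENSORSHARP_ENABLE_CUDA "Build the GGML native bridge with CUDA support" OFF)

if(TENSORSHARP_ENABLE_CUDA)
  # Forward the toggle to ggml's own CUDA switch before ggml is configured.
  set(GGML_CUDA ON CACHE BOOL "Enable ggml CUDA backend" FORCE)
  # Preprocessor guard assumed to gate the #ifdef blocks in ggml_ops.cpp.
  add_compile_definitions(GGML_USE_CUDA)
endif()
```

Configured with `cmake -S TensorSharp.GGML.Native -B build -DTENSORSHARP_ENABLE_CUDA=ON`, a pattern like this keeps default builds untouched while letting CUDA be opted in explicitly.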

Testing

  • Ran dotnet build TensorSharp.slnx -c Release; it failed because dotnet is not installed in this environment. (result: failed)
  • Ran a CMake configure with CUDA enabled, cmake -S TensorSharp.GGML.Native -B /tmp/tensorsharp-ggml-native-build -DTENSORSHARP_ENABLE_CUDA=ON; it failed early, before any CUDA-specific checks could run, due to a missing Objective-C compiler component (cc1obj). (result: failed)
  • No unit or integration test runs completed in this environment due to the missing toolchain components, but runtime guards and compile-time gating are in place to prevent accidental CUDA usage when the native bridge is built without CUDA support.

Codex Task


@chatgpt-codex-connector (bot) left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2baff2d5e


Comment on lines +305 to +306
set_last_error("ggml-cuda backend requested, but this native bridge was built without CUDA support.");
return;


P1: Reset backend state when CUDA support is unavailable

When ggml_cuda is requested on a build compiled without GGML_USE_CUDA, this branch returns after setting an error, but ensure_backend has already latched g_backend_type to CUDA, and std::call_once prevents re-initialization. That leaves the process stuck: later attempts to load ggml_cpu or ggml_metal fail with "A different GGML backend was already initialized," so a single failed CUDA attempt permanently breaks backend selection until restart.

