Git commit
every commit since 6990e2f (b8829) is failing
Operating systems
Windows
GGML backends
CPU, CUDA
Problem description & steps to reproduce
Description:
Since the library rename in commit 6990e2f (b8829), builds with Interprocedural Optimization (LTO) enabled fail on Windows. This affects both clang-cl (with lld-link) and standard MSVC.
Observations:
- The issue started with the rename of libcommon to libllama-common.
- Disabling LTO (-DGGML_LTO=OFF) fixes the build:
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=OFF ^
-DCMAKE_CXX_FLAGS="/EHsc /O2 /D_CRT_SECURE_NO_WARNINGS /wd4996 /D_AMD64_" ^
-DCMAKE_C_FLAGS="/EHsc /O2 /D_CRT_SECURE_NO_WARNINGS /wd4996 /D_AMD64_" ^
-DGGML_LTO=OFF ^
- It seems the symbol common_debug_cb_eval is either not properly exported or is optimized away when the linker merges the new llama-common library into the executables under LTO.
Maybe I should provide a llama-bench comparison of builds with and without LTO enabled. However, that would mean compiling two builds of the last commit before b8829 to compare them.
But in my experience, LTO support is critical for local LLM users:
Many users, including myself, run large models (e.g., Qwen 35B) in hybrid mode where a significant portion of the layers reside in system RAM. In these scenarios, CPU inference performance and efficient KV-cache management are paramount.
My benchmarks with previous versions showed that an optimized build using clang-cl and LTO provides a measurable performance boost (approx. 5%+ in token generation and prompt processing) compared to standard builds. For self-compiling users aiming for the most efficient inference setup, losing LTO support is a significant regression in execution speed.
First Bad Commit
6990e2f (b8829)
Compile command (using LLVM / clang on Windows)
cmake -B build-vcpkg -G Ninja ^
-DCMAKE_TOOLCHAIN_FILE="%VCPKG_PATH%" ^
-DVCPKG_TARGET_TRIPLET=x64-windows ^
-DBUILD_SHARED_LIBS=ON ^
-DCMAKE_BUILD_TYPE=Release ^
-DLLAMA_BUILD_TESTS=OFF ^
-DLLAMA_TESTS_INSTALL=OFF ^
-DLLAMA_BUILD_EXAMPLES=OFF ^
-DLLAMA_BUILD_SERVER=ON ^
-DGGML_CUDA=ON ^
-DGGML_NATIVE=ON ^
-DGGML_OPENMP=ON ^
-DCMAKE_CUDA_ARCHITECTURES=86 ^
-DGGML_CUDA_F16=ON ^
-DGGML_CUDA_GRAPHS=ON ^
-DGGML_CUDA_FORCE_CUBLAS=ON ^
-DGGML_CUDA_FA=ON ^
-DGGML_CUDA_FA_ALL_QUANTS=ON ^
-DGGML_CUDA_K_QUANTS=ON ^
-DGGML_CUDA_compression_mode=speed ^
-DCMAKE_C_COMPILER=clang-cl ^
-DCMAKE_CXX_COMPILER=clang-cl ^
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON ^
-DCMAKE_CXX_FLAGS="/EHsc /O2 /D_CRT_SECURE_NO_WARNINGS /wd4996 /D_AMD64_ -flto=thin" ^
-DCMAKE_C_FLAGS="/EHsc /O2 /D_CRT_SECURE_NO_WARNINGS /wd4996 /D_AMD64_ -flto=thin" ^
-DGGML_LTO=ON ^
-DCMAKE_CUDA_HOST_COMPILER="%MSVC_BIN%\cl.exe" ^
-DCMAKE_CUDA_FLAGS="-allow-unsupported-compiler" ^
-DGGML_OPENSSL=ON
cmake --build build-vcpkg --config Release -j
Relevant log output
The last few lines of the build output from the clang-cl (LLVM) run:
fattn-vec-instance-q8_0-q5_1.cu
tmpxft_000068b4_00000000-10_fattn-vec-instance-q8_0-q5_1.cudafe1.cpp
[320/473] Building CUDA object ggml\src\ggml-cuda\CMakeFiles\g...cuda.dir\template-instances\fattn-vec-instance-q8_0-q5_0.cu.ob
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(585): warning #221-D: floating-point value does not fit in required floating-point type
return -((float)(1e+300)) ;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(587): warning #221-D: floating-point value does not fit in required floating-point type
return make_half2(-((float)(1e+300)) , -((float)(1e+300)) );
^
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(587): warning #221-D: floating-point value does not fit in required floating-point type
return make_half2(-((float)(1e+300)) , -((float)(1e+300)) );
^
fattn-vec-instance-q8_0-q5_0.cu
tmpxft_0000d990_00000000-10_fattn-vec-instance-q8_0-q5_0.cudafe1.cpp
[347/473] Building CUDA object ggml\src\ggml-cuda\CMakeFiles\g...cuda.dir\template-instances\fattn-vec-instance-q8_0-q8_0.cu.ob
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(585): warning #221-D: floating-point value does not fit in required floating-point type
return -((float)(1e+300)) ;
^
Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(587): warning #221-D: floating-point value does not fit in required floating-point type
return make_half2(-((float)(1e+300)) , -((float)(1e+300)) );
^
G:\llama.cpp\ggml\src\ggml-cuda\common.cuh(587): warning #221-D: floating-point value does not fit in required floating-point type
return make_half2(-((float)(1e+300)) , -((float)(1e+300)) );
^
fattn-vec-instance-q8_0-q8_0.cu
tmpxft_00004dd8_00000000-10_fattn-vec-instance-q8_0-q8_0.cudafe1.cpp
[450/473] Linking CXX executable bin\llama-mtmd-debug.exe
FAILED: [code=4294967295] bin/llama-mtmd-debug.exe
C:\WINDOWS\system32\cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --msvc-ver=1950 --intdir=tools\mtmd\CMakeFiles\llama-mtmd-debug.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\rc.exe --mt=C:\msys64\ucrt64\bin\llvm-mt.exe --manifests -- C:\msys64\ucrt64\bin\lld-link.exe /nologo tools\mtmd\CMakeFiles\llama-mtmd-debug.dir\debug\mtmd-debug.cpp.obj /out:bin\llama-mtmd-debug.exe /implib:tools\mtmd\llama-mtmd-debug.lib /pdb:bin\llama-mtmd-debug.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console common\llama-common.lib tools\mtmd\mtmd.lib common\llama-common-base.lib src\llama.lib ggml\src\ggml.lib ggml\src\ggml-cpu.lib ggml\src\ggml-cuda\ggml-cuda.lib ggml\src\ggml-base.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && C:\WINDOWS\system32\cmd.exe /C "cd /D G:\llama.cpp\build-vcpkg\tools\mtmd && "C:\Program Files\PowerShell\7\pwsh.exe" -noprofile -executionpolicy Bypass -file C:/Users/Mathias/scoop/apps/vcpkg/current/scripts/buildsystems/msbuild/applocal.ps1 -targetBinary G:/llama.cpp/build-vcpkg/bin/llama-mtmd-debug.exe -installedDir C:/Users/Mathias/scoop/apps/vcpkg/current/installed/x64-windows-static/bin -OutVariable out""
LINK: command "C:\msys64\ucrt64\bin\lld-link.exe /nologo tools\mtmd\CMakeFiles\llama-mtmd-debug.dir\debug\mtmd-debug.cpp.obj /out:bin\llama-mtmd-debug.exe /implib:tools\mtmd\llama-mtmd-debug.lib /pdb:bin\llama-mtmd-debug.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console common\llama-common.lib tools\mtmd\mtmd.lib common\llama-common-base.lib src\llama.lib ggml\src\ggml.lib ggml\src\ggml-cpu.lib ggml\src\ggml-cuda\ggml-cuda.lib ggml\src\ggml-base.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST:EMBED,ID=1" failed (exit code 1) with the following output:
lld-link: error: undefined symbol: bool __cdecl common_debug_cb_eval<0>(struct ggml_tensor *, bool, void *)
>>> referenced by G:\llama.cpp\tools\mtmd\debug\mtmd-debug.cpp
>>> tools\mtmd\CMakeFiles\llama-mtmd-debug.dir\debug\mtmd-debug.cpp.obj
[451/473] Linking CXX executable bin\llama-mtmd-cli.exe
FAILED: [code=4294967295] bin/llama-mtmd-cli.exe
C:\WINDOWS\system32\cmd.exe /C "cd . && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_exe --msvc-ver=1950 --intdir=tools\mtmd\CMakeFiles\llama-mtmd-cli.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\rc.exe --mt=C:\msys64\ucrt64\bin\llvm-mt.exe --manifests -- C:\msys64\ucrt64\bin\lld-link.exe /nologo tools\mtmd\CMakeFiles\llama-mtmd-cli.dir\mtmd-cli.cpp.obj /out:bin\llama-mtmd-cli.exe /implib:tools\mtmd\llama-mtmd-cli.lib /pdb:bin\llama-mtmd-cli.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console common\llama-common.lib tools\mtmd\mtmd.lib common\llama-common-base.lib src\llama.lib ggml\src\ggml.lib ggml\src\ggml-cpu.lib ggml\src\ggml-cuda\ggml-cuda.lib ggml\src\ggml-base.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && C:\WINDOWS\system32\cmd.exe /C "cd /D G:\llama.cpp\build-vcpkg\tools\mtmd && "C:\Program Files\PowerShell\7\pwsh.exe" -noprofile -executionpolicy Bypass -file C:/Users/Mathias/scoop/apps/vcpkg/current/scripts/buildsystems/msbuild/applocal.ps1 -targetBinary G:/llama.cpp/build-vcpkg/bin/llama-mtmd-cli.exe -installedDir C:/Users/Mathias/scoop/apps/vcpkg/current/installed/x64-windows-static/bin -OutVariable out""
LINK: command "C:\msys64\ucrt64\bin\lld-link.exe /nologo tools\mtmd\CMakeFiles\llama-mtmd-cli.dir\mtmd-cli.cpp.obj /out:bin\llama-mtmd-cli.exe /implib:tools\mtmd\llama-mtmd-cli.lib /pdb:bin\llama-mtmd-cli.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console common\llama-common.lib tools\mtmd\mtmd.lib common\llama-common-base.lib src\llama.lib ggml\src\ggml.lib ggml\src\ggml-cpu.lib ggml\src\ggml-cuda\ggml-cuda.lib ggml\src\ggml-base.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib /MANIFEST:EMBED,ID=1" failed (exit code 1) with the following output:
lld-link: error: undefined symbol: bool __cdecl common_debug_cb_eval<0>(struct ggml_tensor *, bool, void *)
>>> referenced by G:\llama.cpp\tools\mtmd\mtmd-cli.cpp
>>> tools\mtmd\CMakeFiles\llama-mtmd-cli.dir\mtmd-cli.cpp.obj
[463/473] Generating bundle.js.hpp
ninja: build stopped: subcommand failed.
Compile command (using MSVC cl compiler)
cmake -B build-msvc -G Ninja ^
-DCMAKE_TOOLCHAIN_FILE="%VCPKG_PATH%" ^
-DVCPKG_TARGET_TRIPLET=x64-windows-static ^
-DCMAKE_BUILD_TYPE=Release ^
-DLLAMA_BUILD_TESTS=OFF ^
-DLLAMA_TESTS_INSTALL=OFF ^
-DLLAMA_BUILD_EXAMPLES=OFF ^
-DLLAMA_BUILD_SERVER=ON ^
-DGGML_CUDA=ON ^
-DGGML_NATIVE=ON ^
-DGGML_OPENMP=ON ^
-DCMAKE_CUDA_ARCHITECTURES=86 ^
-DGGML_CUDA_F16=ON ^
-DGGML_CUDA_GRAPHS=ON ^
-DGGML_CUDA_FORCE_CUBLAS=ON ^
-DGGML_CUDA_FA=ON ^
-DGGML_CUDA_FA_ALL_QUANTS=ON ^
-DGGML_CUDA_K_QUANTS=ON ^
-DGGML_CUDA_compression_mode=speed ^
-DCMAKE_INTERPROCEDURAL_OPTIMIZATION=ON ^
-DCMAKE_CXX_FLAGS="/EHsc /O2 /Gy /Zc:inline /MT /GL" ^
-DCMAKE_C_FLAGS="/EHsc /O2 /Gy /Zc:inline /MT /GL" ^
-DCMAKE_EXE_LINKER_FLAGS="/LTCG" ^
-DCMAKE_CUDA_HOST_COMPILER="%MSVC_BIN%\cl.exe" ^
-DCMAKE_CUDA_FLAGS="-allow-unsupported-compiler" ^
-DGGML_LTO=ON ^
-DGGML_OPENSSL=ON
cmake --build build-msvc --config Release -j
Relevant log output
The last few lines of the build output from the run using the MSVC compiler:
(Sorry for the German messages in this log; Microsoft localizes even compiler messages. Warning D9025 below translates to: "'/MT' is overridden by '/MD'".)
[433/473] Building CXX object tools\parser\CMakeFiles\llama-template-analysis.dir\template-analysis.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[436/473] Building CXX object tools\tts\CMakeFiles\llama-tts.dir\tts.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[440/473] Building CXX object tools\server\CMakeFiles\server-context.dir\server-common.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[441/473] Building CXX object tools\server\CMakeFiles\server-context.dir\server-tools.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[442/473] Building CXX object tools\cli\CMakeFiles\llama-cli.dir\cli.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[443/473] Building CXX object tools\server\CMakeFiles\server-context.dir\server-context.cpp.obj
cl : Befehlszeile warning D9025 : "/MT" wird durch "/MD" überschrieben
[446/473] Linking CXX shared library bin\llama-common.dll
FAILED: [code=1] bin/llama-common.dll common/llama-common.lib
C:\WINDOWS\system32\cmd.exe /C ""C:\Program Files\CMake\bin\cmake.exe" -E __create_def G:\llama.cpp\build-msvc\common\CMakeFiles\llama-common.dir\.\exports.def G:\llama.cpp\build-msvc\common\CMakeFiles\llama-common.dir\.\exports.def.objs && "C:\Program Files\CMake\bin\cmake.exe" -E vs_link_dll --msvc-ver=1950 --intdir=common\CMakeFiles\llama-common.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100261~1.0\x64\mt.exe --manifests -- C:\PROGRA~1\MICROS~4\18\COMMUN~1\VC\Tools\MSVC\1450~1.357\bin\Hostx64\x64\link.exe /nologo common\CMakeFiles\llama-common.dir\arg.cpp.obj common\CMakeFiles\llama-common.dir\chat-auto-parser-generator.cpp.obj common\CMakeFiles\llama-common.dir\chat-auto-parser-helpers.cpp.obj common\CMakeFiles\llama-common.dir\chat-diff-analyzer.cpp.obj common\CMakeFiles\llama-common.dir\chat-peg-parser.cpp.obj common\CMakeFiles\llama-common.dir\chat.cpp.obj common\CMakeFiles\llama-common.dir\common.cpp.obj common\CMakeFiles\llama-common.dir\console.cpp.obj common\CMakeFiles\llama-common.dir\debug.cpp.obj common\CMakeFiles\llama-common.dir\download.cpp.obj common\CMakeFiles\llama-common.dir\hf-cache.cpp.obj common\CMakeFiles\llama-common.dir\json-partial.cpp.obj common\CMakeFiles\llama-common.dir\json-schema-to-grammar.cpp.obj common\CMakeFiles\llama-common.dir\llguidance.cpp.obj common\CMakeFiles\llama-common.dir\log.cpp.obj common\CMakeFiles\llama-common.dir\ngram-cache.cpp.obj common\CMakeFiles\llama-common.dir\ngram-map.cpp.obj common\CMakeFiles\llama-common.dir\ngram-mod.cpp.obj common\CMakeFiles\llama-common.dir\peg-parser.cpp.obj common\CMakeFiles\llama-common.dir\preset.cpp.obj common\CMakeFiles\llama-common.dir\regex-partial.cpp.obj common\CMakeFiles\llama-common.dir\reasoning-budget.cpp.obj common\CMakeFiles\llama-common.dir\sampling.cpp.obj common\CMakeFiles\llama-common.dir\speculative.cpp.obj common\CMakeFiles\llama-common.dir\unicode.cpp.obj common\CMakeFiles\llama-common.dir\jinja\lexer.cpp.obj 
common\CMakeFiles\llama-common.dir\jinja\parser.cpp.obj common\CMakeFiles\llama-common.dir\jinja\runtime.cpp.obj common\CMakeFiles\llama-common.dir\jinja\value.cpp.obj common\CMakeFiles\llama-common.dir\jinja\string.cpp.obj common\CMakeFiles\llama-common.dir\jinja\caps.cpp.obj common\CMakeFiles\llama-common.dir\__\license.cpp.obj /out:bin\llama-common.dll /implib:common\llama-common.lib /pdb:bin\llama-common.pdb /dll /version:0.0 /machine:x64 /INCREMENTAL:NO /INCREMENTAL:NO /LTCG /DEF:common\CMakeFiles\llama-common.dir\.\exports.def common\llama-common-base.lib vendor\cpp-httplib\cpp-httplib.lib src\llama.lib C:\Users\Mathias\scoop\apps\vcpkg\current\installed\x64-windows-static\lib\libssl.lib C:\Users\Mathias\scoop\apps\vcpkg\current\installed\x64-windows-static\lib\libcrypto.lib crypt32.lib ws2_32.lib ggml\src\ggml.lib ggml\src\ggml-cpu.lib ggml\src\ggml-cuda\ggml-cuda.lib ggml\src\ggml-base.lib kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && C:\WINDOWS\system32\cmd.exe /C "cd /D G:\llama.cpp\build-msvc\common && "C:\Program Files\PowerShell\7\pwsh.exe" -noprofile -executionpolicy Bypass -file C:/Users/Mathias/scoop/apps/vcpkg/current/scripts/buildsystems/msbuild/applocal.ps1 -targetBinary G:/llama.cpp/build-msvc/bin/llama-common.dll -installedDir C:/Users/Mathias/scoop/apps/vcpkg/current/installed/x64-windows-static/bin -OutVariable out""
unrecognized file format in 'common\CMakeFiles\llama-common.dir\arg.cpp.obj, 0'
[447/473] Linking CXX shared library bin\mtmd.dll
ninja: build stopped: subcommand failed.
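A possible explanation for the MSVC-side failure (an assumption on my part, not verified against the llama.cpp build files): the link command above shows a CMake-generated exports.def produced by `cmake -E __create_def`, which parses the target's .obj files to collect symbols. Objects compiled with /GL contain MSVC's intermediate LTCG representation rather than plain COFF, which would match the "unrecognized file format" message. If that is the cause, a sketch of a per-target workaround (target name llama-common taken from the log) could look like:

```cmake
# Sketch only: if llama-common relies on CMake's WINDOWS_EXPORT_ALL_SYMBOLS
# (the generated exports.def suggests it does), the .def generator cannot
# read /GL (LTCG) objects. Disabling IPO for just this target keeps its
# objects in plain COFF while the rest of the build stays LTO-enabled.
set_target_properties(llama-common PROPERTIES
    INTERPROCEDURAL_OPTIMIZATION OFF)
```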