From 9d3ae22b2893d04b9c86f1485259b4938b77b668 Mon Sep 17 00:00:00 2001
From: Martin Kroeker
Date: Sun, 22 Mar 2026 23:44:32 +0100
Subject: [PATCH] Add section on calling convention, from issue 5325

---
 docs/faq.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/docs/faq.md b/docs/faq.md
index d68be9ed12..294204fd88 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -345,7 +345,12 @@ Multithreading support in OpenBLAS requires the use of internal buffers for shar

 If you get a message "error while loading shared libraries: libopenblas.so.0: ELF load command address/offset not properly aligned" when starting a program that is (dynamically) linked to OpenBLAS, this is very likely due to a bug in the GNU linker (ld) that is part of the GNU binutils package. This error was specifically observed on older versions of Ubuntu Linux updated with the (at the time) most recent binutils version 2.38, but an internet search turned up sporadic reports involving various other libraries dating back several years. A bugfix was created by the binutils developers and should be available in later versions of binutils.(See issue 3708 for details)

-#### Using OpenBLAS with OpenMP
+### The tests work fine, but calling any complex function from my code produces wrong or no results
+
+This is almost certainly a problem with the calling convention, specifically with how the computed result is passed back to the caller. By default, OpenBLAS follows the gfortran convention of returning a complex result by value as the function's return value, rather than the F2C convention of storing it through an extra first argument. So if your code has a prototype like "void cdotu (complex *res, int n,...)", change it to "complex cdotu (int n,...)". Better yet,
+use the CBLAS interface rather than the Fortran one.
+
+### Using OpenBLAS with OpenMP

 OpenMP provides its own locking mechanisms, so when your code makes BLAS/LAPACK calls from inside OpenMP parallel regions it is imperative that you use an OpenBLAS that is built with USE_OPENMP=1, as otherwise deadlocks might occur. Furthermore, OpenBLAS will automatically restrict itself to using only a single thread when called from an OpenMP parallel region. When it is certain that calls will only occur