ALP-Tutorial/ALP_Transition_Path_Tutorial.tex at master · Algebraic-Programming/ALP-Tutorial · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
\section{ALP Transition Paths}\label{sec:intro}

\textbf{Algebraic Programming (ALP)} is a C++ framework for high-performance linear algebra that can auto-parallelize and auto-optimize your code. A key feature of ALP is its transition path APIs, which let you use ALP through standard interfaces without changing your existing code. In practice, ALP generates drop-in replacements for established linear algebra APIs. You simply re-link your program against ALP's libraries to get optimized performance (no code modifications needed). ALP v0.8 provides transition-path libraries for several standards, including the NIST Sparse BLAS and a CRS-based iterative solver interface (ALP/Solver). This means you can take an existing C/C++ program that uses a supported API and benefit from ALP's optimizations (such as vectorization and parallelism) just by linking with ALP's libraries [1].

One of these transition paths, and the focus of this tutorial, is ALP's \textbf{sparse Conjugate Gradient (CG) solver}. This CG solver accepts matrices in Compressed Row Storage (CRS) format (also known as CSR) and solves $Ax=b$ using an iterative method. Under the hood it leverages ALP's non-blocking execution model, which overlaps computations and memory operations for efficiency. From the user's perspective, however, the solver is accessed via a simple C-style API that feels synchronous. In this workshop, we'll learn how to use this CG solver interface step by step: from setting up ALP, to coding a solution for a small linear system, to building and running the program.

\subsection{Setup: Installing ALP and Preparing to Use the Solver}\label{sec:setup}

This section explains how to install ALP on a Linux system and compile a simple example.
ALP (Algebraic Programming) provides a C++17 library implementing the GraphBLAS interface
for linear-algebra-based computations.

\subsubsection*{Installation on Linux}

\begin{enumerate}
\item Install prerequisites: Ensure you have a C++11 compatible compiler (e.g. \texttt{g++} 4.8.2 or later) with OpenMP support, CMake (>= 3.13) and GNU Make, plus development headers for libNUMA and POSIX threads.
\end{enumerate}

For example, on Debian/Ubuntu:
\begin{verbatim}
sudo apt-get install build-essential libnuma-dev libpthread-stubs0-dev cmake
\end{verbatim}

\textbf{Direct linking option}: If you prefer to compile with your usual compiler,
you need to include ALP's headers and link against the ALP libraries manually.
For the CG solver transition path, that typically means linking against the
sparse solver library (e.g. libspsolver\_shmem\_parallel for the parallel version)
and any core ALP libraries it depends on. For example, if ALP is installed in /opt/alp,
you might compile with:

\begin{lstlisting}[language=bash, basicstyle=\ttfamily\small, showstringspaces=false]
gcc -I/opt/alp/include -L/opt/alp/lib \
-lspsolver_shmem_parallel -lalp_cspblas_shmem_parallel \
my_program.c -o my_program
\end{lstlisting}


(ALP's documentation provides details on which libraries to link for each backend [3].)
Using grbcxx is recommended for simplicity, but it's good to know what happens under the hood.
Now that our environment is set up, let's look at the CG solver API.


\subsection{Overview of ALP's Non-Blocking Sparse CG API}\label{sec:api}

The ALP/Solver transition path provides a C-style interface for initializing and running a Conjugate Gradient solver. All functions are exposed via a header (e.g. solver.h in ALP's include directory) and use simple types like pointers and handles. The main functions in this API are:

\begin{itemize}
\item \textbf{sparse\_cg\_init(\&handle, n, vals, cols, offs):} Initializes a CG solver instance. It allocates/assigns a solver context to the user-provided sparse\_cg\_handle\_t (an opaque handle type defined by ALP). The matrix $A$ is provided in CRS format by three arrays: vals (the nonzero values), cols (the column indices for each value), and offs (offsets in the vals array where each row begins). The dimension n (number of rows, which should equal number of columns for $A$) is also given. After this call, the handle represents an internal solver state with matrix $A$ stored. Return: Typically returns 0 on success (and a non-zero error code on failure) [4].

\item \textbf{sparse\_cg\_set\_preconditioner(handle, func, data):} (Optional) Sets a preconditioner for the iterative solve. The user provides a function func that applies the preconditioner $M^{-1}$ to a vector (i.e. solves $Mz = r$ for a given residual $r$), along with a user data pointer. ALP will call this func(z, r, data) in each CG iteration to precondition the residual. If you don't call this, the solver will default to no preconditioning (i.e. $M = I$). You can use this to plug in simple preconditioners (like Jacobi, with data holding the diagonal of $A$) or even advanced ones, without modifying the solver code. Return: 0 on success, or error code if the handle is invalid, etc.

\item \textbf{sparse\_cg\_solve(handle, x, b):} Runs the CG iteration to solve $Ax = b$. Here b is the right-hand side vector (input), and x is the solution vector (output). You should allocate both of these arrays of length n beforehand. The solver will iterate until it converges to a solution within some default tolerance or until it reaches an iteration limit. On input, you may put an initial guess in x. If not, it's safe to initialize x to zero (the solver will start from $x_0 = 0$ by default in that case). Upon return, x will contain the approximate solution. Return: 0 if the solution converged (or still 0 if it ran the maximum iterations – specific error codes might indicate divergence or other issues in future versions).

\item \textbf{sparse\_cg\_destroy(handle):} Destroys the solver instance and releases any resources associated with the given handle. After this, the handle becomes invalid. Always call this when you are done solving to avoid memory leaks. Return: 0 on success (and the handle pointer may be set to NULL or invalid after). This API is non-blocking in the sense that internally ALP may overlap operations (like sparse matrix-vector multiplications and vector updates) and use asynchronous execution for performance. However, the above functions themselves appear synchronous. For example, sparse\_cg\_solve will only return after the solve is complete (there’s no separate “wait” call exposed in this C interface). The benefit of ALP’s approach is that you, the developer, don’t need to manage threads or message passing at all ALP’s GraphBLAS engine handles parallelism behind the scenes. You just call these routines as you would any standard library. Now, let’s put these functions into practice with a concrete example.
\end{itemize}

This API is non-blocking in the sense that internally ALP may overlap operations (like sparse matrix-vector multiplications and vector updates) and use asynchronous execution for performance. However, the above functions themselves appear synchronous. For example, sparse\_cg\_solve will only return after the solve is complete (there’s no separate “wait” call exposed in this C interface). The benefit of ALP’s approach is that you, the developer, don’t need to manage threads or message passing at all. ALP’s GraphBLAS engine handles parallelism behind the scenes. You just call these routines as you would any standard library. Now, let’s put these functions into practice with a concrete example.


\subsection{Example: Solving a Linear System with ALP’s CG Solver}

Suppose we want to solve a small system $Ax = b$ to familiarize ourselves with the CG interface. We will use the following $3 \times 3$ symmetric positive-definite matrix $A$: $$ A = \begin{pmatrix} 4 & 1 & 0\\
        1 & 3 & -1\\
        0 & -1 & 2 \end{pmatrix}, $$
        and we choose a right-hand side vector $b$ such that the true solution is easy to verify. If we take the solution to be $x = (1,\;2,\;3)$, then $b = A x$ can be calculated as: $$ b = \begin{pmatrix}6 \ 4 \ 4 \end{pmatrix}, $$ since $4\cdot1 + 1\cdot2 + 0\cdot3 = 6$, $1\cdot1 + 3\cdot2 + (-1)\cdot3 = 4$, and $0\cdot1 + (-1)\cdot2 + 2\cdot3 = 4$. Our goal is to see if the CG solver recovers $x = (1,2,3)$ from $A$ and $b$.

We will hard-code $A$ in CRS format (also called CSR: Compressed Sparse Row) for the solver. In CRS, the matrix is stored by rows, using parallel arrays for values and column indices, plus an offset index for where each row starts. For matrix $A$ above:


\begin{itemize}
\item Nonzero values in row 0 are [4, 1] (at columns 0 and 1), in row 1 are [1, 3, -1] (at cols 0,1,2), and in row 2 are [-1, 2] (at cols 1,2). So the vals array will be \{4, 1, 1, 3, -1, -1, 2\} (concatenated by row).
\item Corresponding column indices cols would be \{0, 1, 0, 1, 2, 1, 2\} aligned with each value.
\item The offsets offs marking the start of each row’s data in the vals array would be: row0 starts at index 0, row1 at index 2, row2 at index 5, and an extra final entry = 7 (total number of nonzeros). So offs = \{0, 2, 5, 7\}.
\end{itemize}


Below is a complete program using ALP’s CG solver to solve for $x$. We include the necessary ALP header
for the solver API, set up the matrix and vectors, call the API functions in order, and then print the result.

\begin{lstlisting}[language=C++, label=lst:example, showstringspaces=false, basicstyle=\ttfamily\small, caption={Example program using ALP's CG solver API}]s

#include <stdio.h>
#include <stdlib.h>

// Include ALPs solver API header
#include <transition/solver.h> // (path may vary based on installation)

int main(){
    // Define the 3x3 test matrix in CRS format
    const size_t n = 3;

    double A_vals[] = {
        4.0, 1.0, // row 0 values
        1.0, 3.0, -1.0, // row 1 values
        -1.0, 2.0 // row 2 values
    };

    int A_cols[] = {
        0, 1, // row 0 column indices
        0, 1, 2, // row 1 column indices
        1, 2 // row 2 column indices
    };

    int A_offs[] = { 0, 2, 5, 7 }; // row start offsets: 0,2,5 and total nnz=7

    // Right-hand side b and solution vector x
    double b[] = { 6.0, 4.0, 4.0 }; // b = A * [1,2,3]^T
    double x[] = { 0.0, 0.0, 0.0 }; // initial guess x=0 (will hold the solution)

    // Solver handle
    sparse_cg_handle_t handle;

    int err = sparse_cg_init_dii(&handle, n, A_vals, A_cols, A_offs);
    if (err != 0) {
        fprintf(stderr, "CG init failed with error %d\n", err);
        return EXIT_FAILURE;
    }

    // (Optional) set a preconditioner here if needed, e.g. Jacobi or others.
    // We skip this, so no preconditioner (effectively M = Identity).
    err = sparse_cg_solve_dii(handle, x, b);
    if (err != 0) {
        fprintf(stderr, "CG solve failed with error %d\n", err);
        // Destroy handle before exiting
        sparse_cg_destroy_dii(handle);
        return EXIT_FAILURE;
    }

    // Print the solution vector x
    printf("Solution x = [%.2f, %.2f, %.2f]\n", x[0], x[1], x[2]);

    // Clean up
    sparse_cg_destroy_dii(handle);

    return 0;
}


\end{lstlisting}

Let’s break down what happens here:

\begin{itemize}

    \item We included <graphblas/solver.h> (the exact include path might be alp/solver.h or similar depending on ALP’s install, but typically it resides in the GraphBLAS include directory of ALP). This header defines the sparse\_cg\_* functions and the \textbf{sparse\_cg\_handle\_t} type.

    \item We set up the matrix data in CRS format. For clarity, the values and indices are grouped by row in the code. The offsets array \{0,2,5,7\} indicates: row0 uses vals[0..1], row1 uses vals[2..4] , row2 uses vals[5..6]. The matrix dimension n is 3.

    \item We prepare the vectors b and x. b is initialized to \{6,4,4\} as computed above. We initialize x to all zeros (as a starting guess). In a real scenario, you could start from a different guess, but zero is a common default.

    \item We create a \textbf{sparse\_cg\_handle\_t} and call {sparse\_cg\_init}. This hands the matrix to ALP’s solver. Under the hood, ALP will likely copy or reference this data and possibly analyze $A$ for the CG algorithm. We check the return code err, if non-zero, we print an error and exit. (For example, an error might occur if n or the offsets are inconsistent. In our case, it should succeed with err == 0.)

    \item We do not call \textbf{sparse\_cg\_set\_preconditioner} in this example, which means the CG will run unpreconditioned. If we wanted to, we could implement a simple preconditioner. For instance, a Jacobi preconditioner would use the diagonal of $A$: we’d create an array with $\text{diag}(A) = [4,3,2]$ and a function to divide the residual by this diagonal. We would then call \textbf{sparse\_cg\_set\_preconditioner(handle, my\_prec\_func, diag\_data)}. For brevity, we skip this. ALP will just use the identity preconditioner by default (no acceleration).

    \item Next, we call \textbf{sparse\_cg\_solve(handle, x, b)}. ALP will iterate internally to solve $Ax=b$. When this function returns, x should contain the solution. We again check err. A non-zero code could indicate the solver failed to converge (though typically it would still return 0 and one would check convergence via a status or residual, ALP’s API may evolve to provide more info). In our small case, it should converge in at most 3 iterations since $A$ is $3\times3$.

    \item We print the resulting x. We expect to see something close to [1.00, 2.00, 3.00]. Because our matrix and $b$ were consistent with an exact solution of $(1,2,3)$, the CG method should find that exactly (within floating-point rounding). You can compare this output with the known true solution to verify the solver worked correctly.

    \item Finally, we call \textbf{sparse\_cg\_destroy(handle)} to free ALP’s resources for the solver. This is important especially for larger problems to avoid memory leaks. After this, we return from main.

\end{itemize}


\section*{Building and Running the Example}
To compile the above code with ALP, we will use the direct linking option as discussed.
\begin{lstlisting}[language=bash, basicstyle=\ttfamily\small, showstringspaces=false]
g++ example.cpp -o cg_demo \
-I$ALP_INSTALL_DIR/include \
-L$ALP_INSTALL_DIR/lib/ -L$ALP_INSTALL_DIR/lib/sequential/ \
-lspsolver_shmem_parallel -lgraphblas -fopenmp -lnuma
\end{lstlisting}
After a successful compile, you can run the program:
    \begin{lstlisting}[language=bash,basicstyle=\ttfamily\small, showstringspaces=false]
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ALP_INSTALL_DIR/lib:$ALP_INSTALL_DIR/lib/sequential/
    ./cg_demo
    \end{lstlisting}
It should output something like:
    \begin{lstlisting}[language=bash,basicstyle=\ttfamily\small, showstringspaces=false]
    Solution x = [1.00, 2.00, 3.00]
    \end{lstlisting}


Congratulations! You’ve now written and executed a Conjugate Gradient solver using ALP’s transition path
API! By using this C-style interface, we got the benefits of ALP’s high-performance GraphBLAS engine
without having to dive into template programming or parallelization details. From here, you can explore
other parts of ALP’s API: for instance, ALP/GraphBLAS provides a full GraphBLAS-compatible API for linear
algebra operations on vectors and matrices, and ALP also offers a Pregel-like graph processing interface. All
of these can be integrated with existing code in a similar fashion (just link against ALP’s libraries) [2][3].


In summary, ALP’s transition paths enable a smooth adoption of advanced HPC algorithms – you write
standard code, and ALP handles the rest. We focused on the CG solver, but the same idea applies to other
supported interfaces. Feel free to experiment with different matrices, add a custom preconditioner function
to see its effect, or try other solvers if ALP introduces them in future releases. Happy coding!


\bibliographystyle{plain}
\begin{thebibliography}{9}

\bibitem{alp-transition}
ALP v0.8 Transition Path Overview. \\
\url{http://albert-jan.yzelman.net/alp/v0.8-preview/group__TRANS.html}

\bibitem{alp-docs}
ALP Documentation and Examples. GitHub. \\
\url{https://github.com/Algebraic-Programming/ALP}

\bibitem{alp-graphblas-use}
Use ALP GraphBLAS in Your Project. GitHub Docs. \\
\url{https://github.com/Algebraic-Programming/ALP/blob/develop/docs/Use_ALPGraphBLAS_in_your_own_project.md}

\bibitem{alp-transition-use}
Using ALP Transition Interfaces. GitHub Docs. \\
\url{https://github.com/Algebraic-Programming/ALP/blob/develop/docs/Transition_use.md}

\end{thebibliography}