\section{Installation on Linux}\label{sec:installation}
This section explains how to install ALP on a Linux system and compile a simple example. To get started:
\begin{enumerate}
\item \textbf{Install prerequisites}. Ensure you have a C++11-compatible compiler (e.g.\ \texttt{g++} 4.8.2 or later) with OpenMP support, CMake ($\geq 3.13$) and GNU Make, plus development headers for libNUMA and POSIX threads.
For example, on Debian/Ubuntu:
\begin{verbatim}
sudo apt-get install build-essential libnuma-dev libpthread-stubs0-dev cmake
\end{verbatim}
or, on Red Hat systems (as root):
\begin{verbatim}
dnf group install "Development Tools"
dnf install numactl-devel cmake
\end{verbatim}
\item \textbf{Obtain ALP}. Download or clone the ALP repository, e.g. from its official GitHub location:
\begin{verbatim}
git clone https://github.com/Algebraic-Programming/ALP.git
\end{verbatim}
\item \textbf{Build ALP}. Create a build directory and invoke the provided bootstrap script to configure the project with CMake, then compile and install:
\begin{lstlisting}[language=bash]
$ cd ALP && mkdir build && cd build
$ ../bootstrap.sh --prefix=../install # configure the build
$ make -j # compile the ALP library
$ make install # install to ../install
\end{lstlisting}
(You may choose a different installation prefix as desired.)
\item \textbf{Set up environment}. After installation, activate the ALP environment by sourcing the \texttt{setenv} script in the install directory:
\begin{lstlisting}[language=bash]
$ source ../install/bin/setenv
\end{lstlisting}
This script updates the shell \texttt{PATH} so that the ALP compiler wrapper becomes accessible, and adds the ALP libraries to the applicable library search paths.
\item \textbf{Compile an example}. ALP provides a compiler wrapper \texttt{grbcxx} to compile programs that use the ALP/GraphBLAS API. This wrapper automatically adds the correct include paths and links against the ALP library and its dependencies. For example, to compile the provided \texttt{sp.cpp} sample:
\begin{lstlisting}[language=bash]
$ grbcxx ../examples/sp.cpp -o sp_example
\end{lstlisting}
By default this produces a sequential program; you can add the option \texttt{-b reference\_omp} to use the OpenMP parallel backend for shared-memory auto-parallelisation. The wrapper \texttt{grbcxx} accepts other backends as well (e.g.\ \texttt{-b nonblocking} for nonblocking execution on shared-memory parallel systems, or \texttt{-b hybrid} for hybrid shared- and distributed-memory execution\footnote{The \texttt{hybrid} backend requires that the Lightweight Parallel Foundations (LPF) is installed as an additional dependency; see \url{https://github.com/Algebraic-Programming/LPF/} and/or the main ALP \texttt{README.md}.}).
\item \textbf{Run the program}. Use the provided runner \texttt{grbrun} to execute the compiled binary. For a simple shared-memory program, running with \texttt{grbrun} is similar to using \texttt{./program} directly. For example:
\begin{lstlisting}[language=bash]
$ grbrun ./sp_example
\end{lstlisting}
(The \texttt{grbrun} tool is more relevant when using distributed backends or controlling the execution environment; for basic usage, the program can also be run directly.)
\end{enumerate}
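Combining the compile and run steps above, building the same example for several backends may, for instance, look as follows (a sketch; the output binary names are illustrative):
\begin{lstlisting}[language=bash]
$ grbcxx ../examples/sp.cpp -o sp_seq                   # default sequential backend
$ grbcxx -b reference_omp ../examples/sp.cpp -o sp_omp  # OpenMP backend
$ grbcxx -b nonblocking ../examples/sp.cpp -o sp_nb     # nonblocking backend
$ grbrun ./sp_omp                                       # run the OpenMP binary
\end{lstlisting}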
After these steps, you have installed ALP and verified that its basic functionality is operational. In the next sections we introduce core ALP/GraphBLAS concepts and walk through a simple example program.
\section{ALP/GraphBLAS}\label{sec:alp_concepts}
ALP exposes a GraphBLAS interface that separates into three categories: 1) algebraic containers (vectors, matrices, etc.); 2) algebraic structures (binary operators, semirings, etc.); and 3) algebraic operations that take containers and algebraic structures as arguments. This interface was developed in tandem with what became the GraphBLAS C specification, but is pure C++. All containers, primitives, and algebraic structures are defined in the \texttt{grb} namespace. The ALP user documentation may be useful in the course of the exercises; it may be found at \url{http://albert-jan.yzelman.net/alp/user/}.
Let us first bootstrap our tutorial with a simple \emph{Hello World} example:
\begin{lstlisting}[language=c++,morekeywords=constexpr,morekeywords=size_t]
#include <cstddef>
#include <cstring>
#include <iostream>

#include <graphblas.hpp>

#include <assert.h>

constexpr size_t max_fn_size = 255;
typedef char Filename[ max_fn_size ];

void hello_world( const Filename &in, int &out ) {
    std::cout << "Hello from " << in << std::endl;
    out = 0;
}

int main( int argc, char ** argv ) {
    // get input
    Filename fn;
    (void) std::strncpy( fn, argv[ 0 ], max_fn_size );
    // set up output field
    int error_code = 100;
    // launch hello world program
    grb::Launcher< grb::AUTOMATIC > launcher;
    assert( launcher.exec( &hello_world, fn, error_code, true )
        == grb::SUCCESS );
    // return with the hello_world error code
    return error_code;
}
\end{lstlisting}
In this code, we have a very simple \texttt{hello\_world} function that takes its own filename as an input argument, prints a hello statement to \texttt{stdout}, and then returns a zero error code.
ALP uses the concept of a \emph{Launcher} to start ALP programs such as \texttt{hello\_world}, as exemplified in the main function above. This mechanism encapsulates ALP programs and allows launching sequences of them, potentially adaptively based on run-time conditions. The signature of an ALP program always consists of two arguments: the first being the program input and the second the program output. The types of both input and output may be any POD type.
Assuming the above is saved as \texttt{alp\_hw.cpp}, it may be compiled and run as follows:
\begin{lstlisting}[language=bash]
$ grbcxx alp_hw.cpp
$ grbrun ./a.out
Info: grb::init (reference) called.
Hello from ./a.out
Info: grb::finalize (reference) called.
$
\end{lstlisting}
\noindent \textbf{Exercise 1.} Double-check that you have the expected output from this example, as we will use its framework in the following exercises.
\noindent \textbf{Question.} Why is \texttt{argv[0]} not directly passed as input to \texttt{hello\_world}?
\noindent \textbf{Bonus Question.} Consider the \href{http://albert-jan.yzelman.net/alp/user/classgrb_1_1Launcher.html#af33a2d0ff876594143988613ebaebae7}{programmer reference documentation for the \texttt{grb::Launcher}}, and consider distributed-memory parallel execution in particular. Why is the last argument to \texttt{launcher.exec} \texttt{true}?
\subsection{ALP/GraphBLAS Containers}
The primary ALP/GraphBLAS container types are \texttt{grb::Vector<T>} and \texttt{grb::Matrix<T>}. These are templated on a value type \texttt{T}, the type of elements stored. The type \texttt{T} can be any plain-old-data (POD) type, including \texttt{std::pair} or \texttt{std::complex<T>}. Both vectors and matrices can be sparse, meaning they efficiently represent mostly-zero data by storing only nonzero elements. For example, one can declare a vector of length $100\ 000$ and a $150\ 000\times100\ 000$ matrix as:
\begin{lstlisting}
grb::Vector<double> x(100000), y(150000);
grb::Matrix<void> A(150000, 100000);
\end{lstlisting}
In this snippet, \texttt{x} and \texttt{y} are vectors of type \texttt{double}. The matrix \texttt{A} is declared with type \texttt{void}, which signifies it only holds the pattern of nonzero positions and no numeric values. Perhaps more commonly, one would use a numeric type (e.g. \texttt{double}) for holding matrix nonzeroes. A \texttt{void} matrix as in the above example is useful for cases where existence of an entry is all that matters, as e.g.\ for storing Boolean matrices or unweighted graphs.
By default, newly instantiated vectors or matrices are empty, meaning they store no elements. You can query properties like length or dimensions via \texttt{grb::size(vector)} for vector length or \texttt{grb::nrows(matrix)} and \texttt{grb::ncols(matrix)} for matrix dimensions. The number of elements present within a container may be retrieved via \texttt{grb::nnz(container)}. Containers have a maximum capacity on the number of elements they may store; the capacity may be retrieved via \texttt{grb::capacity(container)} and on construction of a container is set to the maximum of its dimensions. For example, the initial capacity of \texttt{x} in the above is $100\ 000$, while that of \texttt{A} is $150\ 000$. The size of a container once initialised is fixed, while the capacity may increase during the lifetime of a container.
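As a brief sketch, and assuming the declarations of \texttt{x} and \texttt{A} from the snippet above, these query functions may be used as follows:
\begin{lstlisting}
std::cout << "length of x: " << grb::size( x ) << "\n";       // 100000
std::cout << "dimensions of A: " << grb::nrows( A )
    << " x " << grb::ncols( A ) << "\n";                      // 150000 x 100000
std::cout << "elements in x: " << grb::nnz( x ) << "\n";      // 0 (initially empty)
std::cout << "capacity of x: " << grb::capacity( x ) << "\n"; // 100000
\end{lstlisting}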
\noindent \textbf{Exercise 2.} Allocate vectors and matrices in ALP as follows:
\begin{itemize}
\item a \texttt{grb::Vector<double>} \texttt{x} of length 100, with initial capacity 100;
\item a \texttt{grb::Vector<double>} \texttt{y} of length 1\ 000, with initial capacity 1\ 000;
\item a \texttt{grb::Matrix<double>} \texttt{A} of size $(100 \times 1\ 000)$, with initial capacity 1\ 000; and
\item a \texttt{grb::Matrix<double>} \texttt{B} of size $(100 \times 1\ 000)$, with initial capacity 5\ 000.
\end{itemize}
You may start from a copy of \texttt{alp\_hw.cpp}. Employ \texttt{grb::capacity} to print out the capacities of each of the containers. \textbf{Hint:} refer to the user documentation on how to override the default capacities.
If done correctly, you should observe output similar to:
\begin{lstlisting} [language=bash, basicstyle=\ttfamily\small, showstringspaces=false]
Info: grb::init (reference) called.
Capacity of x: 100
Capacity of y: 1000
Capacity of A: 1000
Capacity of B: 5000
Info: grb::finalize (reference) called.
\end{lstlisting}
\noindent \textbf{Question.} Is overriding the default capacity necessary for all of \texttt{x, y, A} and \texttt{B} in the above exercise?
\subsection{Basic Container I/O}
The containers in the C++ Standard Template Library (STL) employ the concept of iterators to ingest and extract data, and also provide primitives for manipulating container contents.
ALP/GraphBLAS is no different, e.g., providing \texttt{grb::clear(container)} to remove all elements from a container, similar to the clear function defined by STL vectors, sets, et cetera.
Similarly, \texttt{grb::set(vector,scalar)} sets all elements of a given vector equal to the given scalar, resulting in a full (dense) vector.
By contrast, \texttt{grb::setElement(vector,scalar,index)} sets only a given element at a given index equal to a given scalar.
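As a minimal sketch of these three primitives, using an illustrative vector \texttt{v} of length $10$:
\begin{lstlisting}
grb::Vector< double > v( 10 );
grb::RC rc = grb::set( v, 3.14 );       // v becomes dense: all ten entries 3.14
if( rc == grb::SUCCESS ) {
    rc = grb::setElement( v, 2.71, 5 ); // only v[5] is overwritten
}
if( rc == grb::SUCCESS ) {
    rc = grb::clear( v );               // v is empty again: nnz(v) == 0
}
\end{lstlisting}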
\noindent \textbf{Exercise 3.} Start from a copy of \texttt{alp\_hw.cpp} and modify the \texttt{hello\_world} function to allocate two vectors and a matrix as follows:
\begin{itemize}
\item a \texttt{grb::Vector<bool>} \texttt{x} and \texttt{y} both of length $497$ with capacities $497$ and $1$, respectively;
\item a \texttt{grb::Matrix<void>} \texttt{A} of size $497\times497$ and capacity $1\ 727$.
\end{itemize}
Then, initialise $y$ with a single value \texttt{true} at index $200$, and initialise $x$ with \texttt{false} everywhere. Print the number of nonzeroes in $x$ and $y$. Once done, after compilation and execution, the output should look like:
\begin{lstlisting}
...
nonzeroes in x: 497
nonzeroes in y: 1
...
\end{lstlisting}
\noindent \textbf{Bonus question.} Print the capacity of $y$. Should the value returned be unexpected, considering the specification in the user documentation, is this a bug in ALP?
ALP/GraphBLAS containers are compatible with standard STL output iterators. For example, the following for-loop prints all entries of $y$:
\begin{lstlisting}
for( const auto &pair : y ) {
std::cout << "y[ " << pair.first << " ] = " << pair.second << "\n";
}
\end{lstlisting}
\noindent \textbf{Exercise 4.} Use output iterators to double-check that $x$ has $497$ values and that all those values equal \texttt{false}.
Matrices are commonly available in standard file exchange formats, such as MatrixMarket \texttt{.mtx}. To facilitate working with such files, ALP contains utilities for reading standard formats. The utilities are not included with \texttt{graphblas.hpp} and must instead be included explicitly:
\begin{lstlisting}
#include <graphblas/utils/parser.hpp>
\end{lstlisting}
Including the above parser utility defines the \texttt{MatrixFileReader} class. Its constructor takes a filename plus a Boolean that describes whether vertices are numbered consecutively (as required in the case of MatrixMarket files); some graph repositories, e.g.\ SNAP, have non-consecutively numbered vertices, which could be an artifact of how the data is constructed or due to post-processing. In this case, passing \texttt{false} as the second argument to the parser will map the non-consecutive vertex IDs to a consecutive range instead, thus packing the graph structure into a minimally-sized sparse matrix. In this tutorial, however, we stick to MatrixMarket files and therefore always pass \texttt{true}:
\begin{lstlisting}
std::string in( "matrix_file.mtx" );
grb::utils::MatrixFileReader< double > parser( in, true );
\end{lstlisting}
After instantiation, the parser defines STL-compatible iterators that are enriched for use with sparse matrices; e.g., one may issue
\begin{lstlisting}
const auto iterator = parser.begin();
std::cout << "First parsed entry: ( " << iterator.i() << ", " << iterator.j() << " ) = " << iterator.v() << "\n";
\end{lstlisting}
which should print, on execution,
\begin{lstlisting}
First parsed entry: ( 495, 496 ) = 0.897354
\end{lstlisting}
Note that the template argument to \texttt{MatrixFileReader} defines the value type of the sparse matrix nonzero values. The start-end iterator pair from this parser is compatible with the \texttt{grb::buildMatrixUnique} ALP/GraphBLAS primitive, where the suffix -unique indicates that the iterator pair should never iterate over a nonzero at the same matrix position $(i,j)$ more than once. Hence reading in the matrix into the ALP/GraphBLAS container $A$ proceeds simply as
\begin{lstlisting}
grb::RC rc = grb::buildMatrixUnique(
A,
parser.begin( grb::SEQUENTIAL ), parser.end( grb::SEQUENTIAL ),
grb::SEQUENTIAL
);
assert( rc == grb::SUCCESS );
\end{lstlisting}
The type \texttt{grb::RC} is the standard return type; ALP primitives\footnote{that are not simple `getters' like \texttt{grb::nnz}} always return an error code, and, if no error is encountered, return \texttt{grb::SUCCESS}. Iterators in ALP may be either \emph{sequential} or \emph{parallel}. Start-end iterator pairs that are sequential, such as retrieved from the parser in the above snippet, iterate over all elements of the underlying container (in this case, all nonzeroes in the sparse matrix file). A parallel iterator, by contrast, only retrieves some subset of elements $V_s$, where $s$ is the process ID. It assumes that there are a total of $p$ subsets $V_i$, where $p$ is the total number of processes. These subsets are pairwise disjoint (i.e., $V_i\cap V_j=\emptyset$ for all $i\neq j, 0\leq i,j<p$), while $\cup V_i$ corresponds to all elements in the underlying container. Parallel iterators are useful when launching an ALP/GraphBLAS program with multiple processes to benefit from distributed-memory parallelisation; in such cases, it would be wasteful if every process iterates over all data elements on data ingestion-- instead, parallel I/O is preferred. ALP primitives that take iterator pairs as input must be aware of the I/O type, which is passed as the last argument to \texttt{grb::buildMatrixUnique} in the above code snippet.
\noindent \textbf{Exercise 5.} Use the \texttt{MatrixFileReader} and its iterators to build $A$ from \texttt{west0497.mtx}. Have it print the number of nonzeroes in $A$ after \texttt{grb::buildMatrixUnique}. Then modify the \texttt{main} function to take as its first program argument a path to a \texttt{.mtx} file, and pass that path to the ALP/GraphBLAS program. Find and download the west0497 matrix from the SuiteSparse matrix collection, and run the application with the path to the downloaded matrix. If all went well, its output should be something like:
\begin{lstlisting}[keywordstyle=\ttfamily]
Info: grb::init (reference) called.
elements in x: 497
elements in y: 1
y[ 200 ] = 1
Info: MatrixMarket file detected. Header line: ``%%MatrixMarket matrix coordinate real general''
Info: MatrixFileReader constructed for /home/yzelman/Documents/datasets/graphs-and-matrices/west0497.mtx: an 497 times 497 matrix holding 1727 entries. Type is MatrixMarket and the input is general.
First parsed entry: ( 495, 496 ) = 0.897354
nonzeroes in A: 1727
Info: grb::finalize (reference) called.
\end{lstlisting}
\noindent \textbf{Bonus question.} Why is there no \texttt{grb::set(matrix,scalar)} primitive?
\subsection{Copying, Masking, and Standard Matrices}
Continuing from the last exercise, the following code would store a copy of $y$ in $x$ and a copy of $A$ in $B$:
\begin{lstlisting}
grb::Matrix< double > B( 497, 497 );
assert( grb::capacity( B ) == 497 );
grb::RC rc = grb::set( x, y );
rc = rc ? rc : grb::set( B, A, grb::RESIZE );
rc = rc ? rc : grb::set( B, A, grb::EXECUTE );
\end{lstlisting}
\noindent \textbf{Question.} What does the code pattern \texttt{rc = rc ? rc : <function call>;} achieve?
Note that after instantiation of $B$ (line 1 in the above) it will be allocated with a default capacity of $497$ values maximum (line 2). However, $A$ from the preceding exercise has $1\ 727$ values; therefore, simply executing \texttt{grb::set( B, A )} would return \texttt{grb::ILLEGAL}. Rather than manually having to call \texttt{grb::resize( B, 1727 )} to make the copy work, the above snippet instead first requests ALP/GraphBLAS to figure out the required capacity of $B$ and resize it if necessary (line 4), before then executing the copy (line 5). The execute phase is the default -- i.e., the last line could equivalently have read \texttt{rc = rc ? rc : grb::set( B, A );}. Similarly, the vector copy (line 3) could have been broken up into resize and execute phases; however, from the previous exercises we already learned that the default vector capacity guarantees are sufficient to store $y$, and so we call the execute phase immediately.
\noindent \textbf{Question.} $A$ contains $1727$ double-precision elements. Are these $1727$ nonzeroes?
The \texttt{grb::set} primitive may also take a mask argument. The mask determines which outputs are computed and which outputs are discarded. For example, recall that $y$ from the previous exercise is a vector of size $497$ that contains only one value $y_{200}=$\texttt{true}. Then
\begin{lstlisting}
grb::RC rc = grb::set( x, y, false );
\end{lstlisting}
results in a vector $x$ that has one entry only: $x_{200}=\texttt{false}$. This is because $y$ has no elements anywhere except at position $200$, and hence the mask evaluates \texttt{false} for any $0\leq i<497, i\neq200$, and no entries are generated by the primitive at those positions. At position $200$, however, the mask $y$ does contain an element whose value is \texttt{true}, and hence the \texttt{grb::set} primitive will generate an output entry there. The value of the entry $x_{200}$ is set to the value given to the primitive, which is \texttt{false}. All GraphBLAS primitives with an output container can take mask arguments.
\noindent \textbf{Question.} What would \texttt{grb::set( y, x, true )} return for $y$, assuming it is computed immediately after the preceding code snippet?
We have shown how matrices may be built from input iterators; similarly, vectors may be built from standard iterators through \texttt{grb::buildVectorUnique}. Iterator-based ingestion also allows for the construction of vectors and matrices that have regular structures. ALP/GraphBLAS comes with some standard recipes that exploit this, for example to build an $n\times n$ identity matrix:
\begin{lstlisting}
...
#include <graphblas/algorithms/matrix_factory.hpp>
...
const grb::Matrix< double > identity = grb::algorithms::matrices< double >::identity( n );
...
\end{lstlisting}
\href{http://albert-jan.yzelman.net/alp/v0.8-preview/classgrb_1_1algorithms_1_1matrices.html#a1336accbaf6a61ebd890bef9da4116fc}{Other regular patterns supported} are \emph{eye} (similar to identity but not required to be a square matrix) and \emph{diag} (which takes an optional parameter $k$ to offset the diagonal, e.g., to generate a superdiagonal matrix). The class includes a set of constructors that result in dense matrices as well, including \emph{dense}, \emph{full}, \emph{zeros}, and \emph{ones}; note, however, that ALP/GraphBLAS is not optimised to handle dense matrices efficiently and so their use is discouraged\footnote{for similar reasons, actually, there is no primitive \texttt{grb::set(matrix,scalar)} in the GraphBLAS API}. While constructing matrices from standard file formats and through general iterators hence are key features for usability, it is also possible to derive matrices from existing ones via \texttt{grb::set(matrix,mask,value)}.
\noindent \textbf{Exercise 6.} Copy the code from the previous exercise, and modify it to determine whether $A$ holds explicit zeroes; i.e., entries in $A$ that have numerical value zero. \textbf{Hint:} it is possible to complete this exercise using only masking and copying. You may also want to change the element type of $A$ to \texttt{double} (or convince yourself that this is not required).
\noindent \textbf{Exercise 7.} Determine how many explicit zeroes exist on the diagonal, superdiagonal, and subdiagonal of $A$; i.e., compute $|\{A_{ij}\in A\ |\ A_{ij}=0, |i-j|\leq1, 0\leq i,j<497\}|$.
\noindent \textbf{Bonus question.} How much memory beyond that which is required to store the $n\times n$ identity matrix will a call to \texttt{matrices< double >::identity( n )} consume? \textbf{Hint:} consider that the iterators passed to \texttt{buildMatrixUnique} iterate over regular index sequences that can easily be systematically enumerated.
\subsection{Numerical Linear Algebra}
GraphBLAS, as the name implies, provides canonical BLAS-like functionalities on sparse matrices, sparse vectors, dense vectors and scalars.
These include \texttt{grb::foldl}, \texttt{grb::foldr}, \texttt{grb::dot}, \texttt{grb::eWiseAdd},
\texttt{grb::eWiseMul}, \texttt{grb::eWiseLambda}, \texttt{grb::eWiseApply}, \texttt{grb::mxv},
\texttt{grb::vxm}, and \texttt{grb::mxm}.
The loose analogy with BLAS is intentional, as these primitives cover the same ground as the Level 1, Level 2, and Level 3 BLAS operations.
The former three scalar/vector primitives are dubbed \emph{level 1}, the following two matrix--vector primitives \emph{level 2}, and the latter matrix--matrix primitive \emph{level 3}. Their functionalities are summarised as follows:\newline
\textbf{grb::foldl, grb::foldr} -- These primitives operate in-place on the input/output data structure by applying a binary operator from the left or right, respectively.
They take two data structures and two operators, an accumulation operator and a binary operator, typically taken from a semiring.
For example, \texttt{grb::foldl( alpha, u, binary\_op, accum\_op )} computes $\alpha$\textit{=}$\alpha$ \textit{accum\_op} $(u_0$ \textit{binary\_op} $(u_1$ \textit{binary\_op} $(\ldots (u_{n-2}$ \textit{binary\_op} $u_{n-1})\ldots)))$, where $u$ is a vector of length $n$.\newline
\textbf{grb::dot}%( grb::Vector<T> u, grb::Vector<T> v )}
-- Computes the dot product of two vectors, $\alpha$\textit{+=}$u^Tv$ or $\alpha$\textit{+=}$\sum_i (u_i \times v_i)$, in essence combining element-wise multiplication with a reduction. The output $\alpha$ is a scalar, usually of a primitive type such as \texttt{double}. Unlike the out-of-place \texttt{grb::set}, \texttt{grb::dot} updates the output scalar in-place.\newline
\textbf{grb::eWiseMul, grb::eWiseAdd} -- These primitives combine two containers element-by-element, the former using element-wise multiplication and the latter using element-wise addition. Different from \texttt{grb::set}, these primitives are in-place, meaning the results of the element-wise operations are added to any elements already in the output container; i.e., \texttt{grb::eWiseMul} computes $z$\textit{+=}$x\odot y$, where $\odot$ denotes element-wise multiplication.
In the case of sparse vectors and an initially-empty output container, the primitives differ in the structure of the output vector, which is composed of either the intersection or the union of the input structures.
Since \texttt{grb} primitives are unaware of conventional multiplication and addition, these operations are provided as the corresponding monoids of the semiring argument: the multiplicative and additive monoids, respectively.
\begin{itemize}
\item \textbf{intersection (eWiseMul):} The primitive will compute only an element-wise multiplication for those positions where \emph{both} input containers have entries. This is since any missing entries are assumed to have value zero, and are therefore ignored under multiplication.
\item \textbf{union (eWiseAdd):} The primitive will compute element-wise addition for those positions where \emph{any} of the input containers have entries. This is again because a missing entry in one of the containers is assumed to have a zero value, meaning the result of the addition simply equals the value of the entry present in the other container.
\end{itemize}
\textbf{grb::eWiseLambda} -- This primitive applies a user-defined function to each element of an input container, storing the result in an output container.
Similar to \texttt{grb::set}, this primitive is out-of-place; i.e., the output container is potentially overwritten with the results of applying the function to each element of the input container.
The user-defined function must be a callable object (e.g., a lambda function or a functor), and the primitive comes in two versions: one for vector arguments and one for matrix arguments.
For vector arguments, the function must take a single argument of the vector's value type and return no value.
For matrix arguments, the function must take three arguments -- the row index, the column index, and a value of the matrix's value type -- and likewise return no value.
Note that in the nonblocking backend all containers modified by the lambda must be listed in the argument list, while only the first one generates the iterations over which the lambda is applied.\newline
\textbf{grb::eWiseApply} - This primitive applies a user-defined binary function to corresponding elements of two input containers, storing the result in an output container.
It aims to handle all operators that do not fit into the semiring abstraction, such as minimum, maximum, logical AND, logical OR, etc.\newline
\textbf{grb::mxv}, \textbf{grb::vxm} -- These perform right- and left-handed matrix--vector multiplication; i.e., $u$\textit{+=}$Av$ and $u$\textit{+=}$vA$, respectively. More precisely, e.g., \texttt{grb::mxv} computes the standard linear algebraic operation $u_i = u_i + \sum_j A_{ij} v_j$.
Like almost all GraphBLAS primitives, \texttt{grb::mxv} is an in-place operation.
If the intent is to compute $u=Av$ while $u$ is not empty, there are two solutions: 1) $u$ may be cleared first (\texttt{grb::clear(u)}), or 2) $u$ may have all its values set to zero first (\texttt{grb::set(u, 0)}).\newline
\textbf{grb::mxm} -- Performs matrix--matrix multiplication; i.e., $C$\textit{+=}$AB$, or $C_{ij}=C_{ij}+\sum_{k}A_{ik}B_{kj}$. If, for a given $i,j$, the $i$th row of $A$ is empty or the $j$th column of $B$ is empty, no output will be appended to $C$ -- that is, if $C_{ij}\notin C$, then after executing the primitive no such entry will have been added to $C$, meaning that the sparsity of $C$ is retained and the only fill-in to $C$ is due to nonzero contributions of $AB$. If in-place behaviour is not desired, $C$ must be cleared prior to calling this primitive\footnote{note that while setting $C$ to a dense matrix of all zeroes semantically also results in an out-of-place matrix--matrix multiplication, typically GraphBLAS applications should shy away from forming dense matrices due to their quadratic storage requirements}.\newline
\textbf{grb::set} - This primitive initializes values of a container (vector or matrix) to a specified scalar value or values from another container.
It does not allocate nor free any dynamic memory, but simply sets values in the existing container.\newline
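As a sketch of the level-1 primitives under a plus-times semiring (assuming vectors \texttt{x}, \texttt{y}, and \texttt{z} of equal size were initialised elsewhere; the exact overloads are documented in the user documentation):
\begin{lstlisting}
auto plusTimes = grb::semirings::plusTimes< double >();
double alpha = 0.0, d = 0.0;
grb::RC rc = grb::foldl( alpha, x,
    plusTimes.getAdditiveMonoid() );               // alpha += sum_i x_i
rc = rc ? rc : grb::dot( d, x, y, plusTimes );     // d += x^T y
rc = rc ? rc : grb::eWiseMul( z, x, y, plusTimes ); // z += x .* y
\end{lstlisting}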
The interface is more flexible than what is described above, since combinations of vectors, scalars, and sometimes matrices are supported in the argument lists of these primitives.
For example, \texttt{grb::foldr} supports both vector-scalar and scalar-vector combinations.
With all the above operations, the same-type containers must have matching dimensions in the linear algebraic sense -- e.g.,
for $u=Av$, $u$ must have size equal to the number of rows in $A$ while $v$ must have size equal to the number of columns.
If the sizes do not match, the related primitives will return \texttt{grb::MISMATCH}.
Similarly, if the result cannot be stored in an output container due to insufficient capacity,
\texttt{grb::ILLEGAL} will be returned. Furthermore, as with \texttt{grb::set},
all of the above primitives may optionally take a mask, as well as a phase (resize or execute)
as their last argument.
There is one final caveat. In order to support graph algorithms, GraphBLAS provides \emph{generalised}
linear algebraic primitives -- not just numerical linear algebraic ones.
This is explained further in the next subsection of this tutorial.
To perform standard numerical linear algebra, the standard semirings are predefined in ALP/GraphBLAS.
For instance, \texttt{grb::semirings::plusTimes< T >}, where $T$ is the domain over which we wish to perform numerical
linear algebra (usually \texttt{double} or \texttt{std::complex< double >}).
For example, to perform a sparse matrix--vector multiplication:
\begin{lstlisting}
auto plusTimes = grb::semirings::plusTimes< double >();
grb::RC rc = grb::clear( y );
rc = rc ? rc : grb::mxv(y, A, x, plusTimes);
\end{lstlisting}
\noindent \textbf{Exercise 8.} Copy the older \texttt{alp\_hw.cpp} to start with this exercise. Modify it to perform the following steps:
\begin{enumerate}
\item initialise a small matrix $A$ and vector $x$;
\item use \texttt{grb::set} and \texttt{grb::setElement} to assign values (see below for a suggestion);
\item perform a matrix--vector multiplication $y = Ax$ (using \texttt{grb::mxv} with the plus-times semiring);
\item compute a dot product $d = x^Tx$ (using \texttt{grb::dot} with the same semiring);
\item perform an element-wise multiplication $z = x \odot y$ (using \texttt{grb::eWiseMul} with the same semiring); and
\item print the results.
\end{enumerate}
One example $A, x$ could be:
\[
A =
\begin{bmatrix}
0 & 1 & 2 \\
0 & 3 & 4 \\
5 & 6 & 0
\end{bmatrix},\quad
\mathbf{x} = \begin{bmatrix}1\\2\\3\end{bmatrix},
\]
for which, if the exercise is implemented correctly, the output would be something like:
\begin{lstlisting} [language=bash, basicstyle=\ttfamily\small, showstringspaces=false]
Step 1: Constructing a 3x3 sparse matrix A.
Step 2: Creating vector x = [1, 2, 3]^T.
Step 3: Computing y = A·x under plus‐times semiring.
Step 4: Computing z = x ⊙ y (element‐wise multiply).
Step 5: Computing dot_val = xᵀ·x under plus‐times semiring.
x = [ 1, 2, 3 ]
y = A·x = [ 8, 18, 17 ]
z = x ⊙ y = [ 8, 36, 51 ]
dot(x,x) = 14
\end{lstlisting}
\noindent \textbf{Question.} If setting all $n$ elements of a size-$n$ vector via \texttt{grb::setElement}, what is its big-Omega (lower-bound asymptotic) complexity? Is the resulting complexity reasonable for a shared-memory parallel program?
\noindent \textbf{Bonus question.} If instead those $n$ elements are in some STL-type container for which we retrieve random access iterators, and ingest those elements using \texttt{grb::buildVectorUnique} and those random access iterators -- what would a good shared-memory parallel ALP/GraphBLAS implementation achieve in terms of complexity\footnote{if derived correctly, this bound is indeed achieved for the shared-memory parallel ALP/GraphBLAS backends that may be invoked and tested by passing \texttt{-b reference\_omp} and \texttt{-b nonblocking} to \texttt{grbcxx}}?
From these questions it follows that \texttt{grb::setElement} should only ever be used to set $\Theta(1)$ elements in some container and never asymptotically more than that. If a larger data set needs to be ingested into ALP containers, then in order to guarantee scalability, always rely on (parallel) iterators and the \texttt{grb::build*} primitives!
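The shape of such a scalable ingest can be sketched in plain standard C++. This is not ALP code: \texttt{parallel\_ingest} is an illustrative stand-in for what a parallel backend of \texttt{grb::buildVectorUnique} might do internally when handed random access iterators -- split the input into $T$ contiguous chunks of $\Theta(n/T)$ elements each and copy them concurrently:
\begin{lstlisting}[language=C++, basicstyle=\ttfamily\small, showstringspaces=false]
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// Sketch (not ALP code): ingest n elements from a random-access range
// using T >= 1 threads. Each thread copies one contiguous chunk, so
// each performs Theta(n/T) work -- the per-thread bound a good
// shared-memory parallel implementation can achieve.
void parallel_ingest( std::vector< double > &out,
	const std::vector< double > &in, const unsigned T
) {
	const std::size_t n = in.size();
	out.resize( n );
	std::vector< std::thread > workers;
	for( unsigned t = 0; t < T; ++t ) {
		workers.emplace_back( [ &out, &in, n, t, T ]() {
			// chunk [lo, hi) assigned to thread t; chunks are disjoint
			const std::size_t lo = t * n / T;
			const std::size_t hi = ( t + 1 ) * n / T;
			for( std::size_t i = lo; i < hi; ++i ) {
				out[ i ] = in[ i ];
			}
		} );
	}
	for( auto &worker : workers ) { worker.join(); }
}

int main() {
	const std::vector< double > in = { 3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0 };
	std::vector< double > out;
	parallel_ingest( out, in, 4 );
	std::cout << "copied " << out.size() << " elements\n"; // copied 8 elements
	return 0;
}
\end{lstlisting}
Contrast this with $n$ sequential calls to \texttt{grb::setElement}, which cannot do better than $\Omega(n)$ total work on the calling thread.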
\subsection{Semirings and Algebraic Operations}
A key feature of GraphBLAS and ALP is that operations are defined over generic algebraic structures rather than over the conventional arithmetic operations only. An example of an algebraic structure is a \emph{semiring}. Loosely speaking, a semiring formalises what we understand as standard linear algebra. Intuitively, a semiring consists of a pair of operations, an ``addition'' and a ``multiplication'', along with their identity elements. The additive and multiplicative operations may differ from standard arithmetic ($+$ and $\times$). The multiplicative operation together with its identity -- which we call $\textbf{1}$ -- forms a monoid. The additive operation together with its identity -- $\mathbf{0}$ -- forms a commutative monoid, meaning that the order of addition shall not change the result ($a+b=b+a$). Formally speaking there are more requirements to monoids and semirings that ALP is aware of and exploits -- however, for the purposes of this tutorial, the preceding intuitive description suffices.
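This intuition can be made concrete in a few lines of plain standard C++. The sketch below is independent of ALP and all names in it are illustrative: a matrix--vector product parameterised over the two operations and the additive identity turns into one step of shortest-path relaxation when instantiated with the min-plus pair.
\begin{lstlisting}[language=C++, basicstyle=\ttfamily\small, showstringspaces=false]
#include <cstddef>
#include <iostream>
#include <limits>
#include <vector>

// Dense matrix--vector product over an arbitrary "semiring":
// u_i = add-reduce over j of mul( A_ij, v_j ), starting from the
// additive identity zero.
template< typename T, typename Add, typename Mul >
std::vector< T > semiring_mxv(
	const std::vector< std::vector< T > > &A,
	const std::vector< T > &v, Add add, Mul mul, const T zero
) {
	std::vector< T > u( A.size(), zero );
	for( std::size_t i = 0; i < A.size(); ++i ) {
		for( std::size_t j = 0; j < v.size(); ++j ) {
			u[ i ] = add( u[ i ], mul( A[ i ][ j ], v[ j ] ) );
		}
	}
	return u;
}

int main() {
	const double inf = std::numeric_limits< double >::infinity();
	// G[ i ][ j ] holds the weight of the edge from j to i; inf = no edge
	const std::vector< std::vector< double > > G = {
		{ inf, inf, inf },
		{ 1.0, inf, inf },
		{ 4.0, 2.0, inf }
	};
	const std::vector< double > s = { 0.0, inf, inf }; // source: vertex 0
	// "addition" = min with identity infinity, "multiplication" = plus:
	// the result holds one-hop shortest distances from the source
	const auto d = semiring_mxv< double >( G, s,
		[]( double a, double b ) { return a < b ? a : b; },
		[]( double a, double b ) { return a + b; },
		inf );
	std::cout << d[ 0 ] << " " << d[ 1 ] << " " << d[ 2 ] << "\n"; // inf 1 4
	return 0;
}
\end{lstlisting}
Swapping in standard plus and times (with additive identity $0$) over a numerical matrix recovers the ordinary product -- this is precisely the substitution that GraphBLAS primitives perform when handed a different semiring.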
GraphBLAS allows using different semirings to, for example, perform computations like shortest paths or logical operations by substituting the plus or times operations with min, max, logical OR/AND, and so on. This is usually done structurally, by replacing the plus-times semiring with another -- e.g., a min-plus semiring to compute shortest paths. This is why, as you saw in the previous subsection, GraphBLAS primitives explicitly take algebraic structures as a mandatory argument. ALP additionally allows expressing algebraic structures as C++ types, and introduces an algebraic type trait system:
\begin{itemize}
\item \textbf{commutative monoids:} an associative and commutative operation with an identity element. Examples:
\begin{itemize}
\item \texttt{(+, 0)} — the usual addition over numbers:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Monoid< grb::operators::add< T >, grb::identities::zero >
\end{lstlisting}
or, more simply,
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::monoids::plus< T >
\end{lstlisting}
\item \texttt{(min, $\infty$)} — useful for computing minima:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Monoid< grb::operators::min< T >, grb::identities::infinity > // or simply
grb::monoids::min< T >
\end{lstlisting}
\item \texttt{($\land$, true)} — logical semiring for Boolean algebra:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Monoid< grb::operators::logical_and< T >, grb::identities::logical_true > // or
grb::monoids::land< T >
\end{lstlisting}
\end{itemize}
For all the above examples, the following \emph{type traits} are \texttt{true}:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::is_monoid< grb::monoids::plus< T > >::value
grb::is_commutative< grb::monoids::min< T > >::value
// and so on
\end{lstlisting}
\item \textbf{general binary operators}: binary operators without any additional structure. Examples:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::operators::mul< double > // f(x, y) = x * y
grb::operators::subtract< int > // f(i, j) = i - j
grb::operators::zip< char, int > // f(c, i) = {c, i}, an std::pair< char, int >
\end{lstlisting}
These too define algebraic type traits:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::is_operator< OP >::value // true for all above operators
grb::is_monoid< OP >::value // false for all above operators
grb::is_associative< grb::operators::mul< T > >::value // true
grb::is_commutative< grb::operators::subtract< T > >::value // false
// ...
\end{lstlisting}
\item \textbf{semirings}: simplified, a commutative ``additive'' monoid combined with a ``multiplicative'' monoid. Examples:
\begin{itemize}
\item the plus-times semiring: standard linear algebra
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Semiring<
grb::operators::add< T >, grb::operators::mul< T >,
grb::identities::zero, grb::identities::one
> // or
grb::semirings::plusTimes< T >
\end{lstlisting}
\item the Boolean semiring: perhaps the second most common semiring
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Semiring<
grb::operators::logical_or< bool >, grb::operators::logical_and< bool >,
grb::identities::logical_false, grb::identities::logical_true
> // or
grb::semirings::boolean
\end{lstlisting}
\item the min-plus semiring: useful for, e.g., shortest paths
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::Semiring<
grb::operators::min< T >, grb::operators::add< T >,
grb::identities::infinity, grb::identities::zero
> // or
grb::semirings::minPlus< T >
\end{lstlisting}
\end{itemize}
For all of the above semirings, we have the following:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::is_semiring< S >::value // true
grb::is_monoid< S >::value // false
grb::is_operator< S >::value // false
\end{lstlisting}
We may furthermore extract the additive and multiplicative monoids and operators from semirings:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false ]
grb::semirings::boolean mySemiring;
auto myAddM = mySemiring.getAdditiveMonoid();
auto myMulM = mySemiring.getMultiplicativeMonoid();
auto myAddOp = mySemiring.getAdditiveOperator();
auto myMulOp = myMulM.getOperator();
\end{lstlisting}
Finally, we may also extract and instantiate identities from semirings:
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false, morekeywords=constexpr ]
grb::semirings::plusTimes< int > mySemiring;
constexpr int myZero = mySemiring.template getZero< int >();
double castFromZero = mySemiring.template getZero< double >();
constexpr int myOne = mySemiring.getMultiplicativeMonoid().template getIdentity< int >();
\end{lstlisting}
\end{itemize}
The design of ALP allows the definition of custom operators and identities, and therefore allows a virtually unlimited variety of algebraic structures. The most useful operators, monoids, and semirings are predefined in the namespaces \texttt{grb::operators}, \texttt{grb::monoids}, and \texttt{grb::semirings}, respectively. Let us now employ these algebraic structures to showcase how they amplify the expressiveness of standard linear algebraic primitives.\vspace{.5\baselineskip}
\noindent \textbf{Exercise 9} (warm-up): given a \texttt{grb::Vector< double > x}, write an ALP/GraphBLAS function that computes the squared 2-norm of $x$ (i.e., compute $\sum_i x_i^2$) using \texttt{grb::dot}. \textbf{Hint:} consider starting from the code of exercise 8. \textbf{Question}: which semiring applies here?\vspace{.5\baselineskip}
\noindent \textbf{Exercise 10}: ALP and GraphBLAS allow for the use of \emph{improper} semirings -- semirings that are mathematically not true semirings but are still useful in practice. In ALP, improper semirings are made up of 1) a commutative ``additive'' monoid and 2) \emph{any} binary ``multiplicative'' operator. All primitives that take a semiring may also take a pair of such an additive monoid and multiplicative operator -- for example, \texttt{grb::dot( alpha, x, y, plusTimes )} is semantically equivalent to
\begin{lstlisting} [language=C++, basicstyle=\ttfamily\small, showstringspaces=false, morekeywords=constexpr ]
grb::dot( alpha, x, y, plusTimes.getAdditiveMonoid(), plusTimes.getMultiplicativeOperator() );
\end{lstlisting}
Take note of \texttt{grb::operators::abs\_diff< double >}. What does this binary operator compute? Use this operator and the notion of improper semirings to compute the $1$-norm difference between two vectors $x$ and $y$ using a single call to \texttt{grb::dot}. \textbf{Hint:} start off from the code from the previous exercise.\vspace{.5\baselineskip}
\noindent \textbf{Exercise 11}: consider a square matrix \texttt{grb::Matrix< double > G}, the matrix representation of an edge-weighted graph $G$, and a vector \texttt{grb::Vector< double > s} (the source) of matching dimension with a single value zero ($0$) at an index of your choosing. \textbf{Question}: normally, under a plusTimes semiring, $Gs$ would return a zero vector. However, what interpretation would $Gs$ have under the minPlus semiring? Use this interpretation to compute the shortest path to any other vertex reachable from your chosen source. Then, extend the approach to compute shortest paths up to two hops away from your chosen source.\vspace{.25\baselineskip}
\textbf{Bonus question}: what interpretation could $G^ks$ have under the maxTimes semiring?
\section{Solution to Exercise 8}\label{sec:simple_example}
Below is a sample solution for this exercise, with commentary:\footnote{Maybe better to move this to the solutions directory on the tutorial repository? (TODO)}
\begin{lstlisting}[ language=C++, basicstyle=\ttfamily\small, caption={Example program using ALP/GraphBLAS primitives in C++}, label=lst:example, showstringspaces=false]
/*
 * example.cpp - A minimal ALP (GraphBLAS) example.
*
* To compile (using the reference OpenMP backend):
* grbcxx -b reference_omp example.cpp -o example
*
* To run:
* grbrun ./example
*/
#include <cstdio>
#include <iostream>
#include <vector>
#include <utility> // for std::pair
#include <array>
#include <graphblas.hpp>
using namespace grb;
// Indices and values for our sparse 3x3 matrix A:
//
// A = [ 1 0 2 ]
// [ 0 3 4 ]
// [ 5 6 0 ]
//
// We store the nonzero entries via buildMatrixUnique.
static const size_t Iidx[6] = { 0, 0, 1, 1, 2, 2 }; // row indices
static const size_t Jidx[6] = { 0, 2, 1, 2, 0, 1 }; // column indices
static const double Avalues[6] = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0 };
int main( int argc, char **argv ) {
(void)argc;
(void)argv;
std::printf("example (ALP/GraphBLAS) API usage\n\n");
//------------------------------
// 1) Create a 3x3 sparse matrix A
//------------------------------
std::printf("Step 1: Constructing a 3x3 sparse matrix A.\n");
Matrix<double> A(3, 3);
// Reserve space for 6 nonzeros
resize(A, 6);
// Populate A from (Iidx,Jidx,Avalues)
buildMatrixUnique(
A,
&(Iidx[0]),
&(Jidx[0]),
Avalues,
/* nvals = */ 6,
SEQUENTIAL
);
//------------------------------
// 2) Create a 3-element vector x and initialize x = [1, 2, 3]^T
//------------------------------
std::printf("Step 2: Creating vector x = [1, 2, 3]^T.\n");
Vector<double> x(3);
set<descriptors::no_operation>(x, 0.0); // zero-out
setElement<descriptors::no_operation>(x, 1.0, 0); // x(0) = 1.0
setElement<descriptors::no_operation>(x, 2.0, 1); // x(1) = 2.0
setElement<descriptors::no_operation>(x, 3.0, 2); // x(2) = 3.0
// note: setElement() supports vectors only
// set() support for matrix is work in progress
//------------------------------
// 3) Create two result vectors y and z (dimension 3)
//------------------------------
Vector<double> y(3), z(3);
set<descriptors::no_operation>(y, 0.0);
set<descriptors::no_operation>(z, 0.0);
//------------------------------
// 4) Use the built-in “plusTimes” semiring alias
// (add = plus, multiply = times, id‐add = 0.0, id-mul = 1.0)
//------------------------------
auto plusTimes = grb::semirings::plusTimes<double>();
//------------------------------
// 5) Compute y = A·x (matrix‐vector multiply under plus‐times)
//------------------------------
std::printf("Step 3: Computing y = A·x under plus‐times semiring.\n");
{
RC rc = mxv<descriptors::no_operation>(y, A, x, plusTimes);
if(rc != SUCCESS) {
std::cerr << "Error: mxv(y,A,x) failed with code " << toString(rc) << "\n";
return (int)rc;
}
}
//------------------------------
// 6) Compute z = x ⊙ y (element‐wise multiply) via eWiseMul with semiring
//------------------------------
std::printf("Step 4: Computing z = x ⊙ y (element‐wise multiply).\n");
{
RC rc = eWiseMul<descriptors::no_operation>(
z, x, y, plusTimes
);
if(rc != SUCCESS) {
std::cerr << "Error: eWiseMul(z,x,y,plusTimes) failed with code " << toString(rc) << "\n";
return (int)rc;
}
}
//------------------------------
// 7) Compute dot_val = xᵀ·x (dot‐product under plus‐times semiring)
//------------------------------
std::printf("Step 5: Computing dot_val = xᵀ·x under plus‐times semiring.\n");
double dot_val = 0.0;
{
RC rc = dot<descriptors::no_operation>(dot_val, x, x, plusTimes);
if(rc != SUCCESS) {
std::cerr << "Error: dot(x,x) failed with code " << toString(rc) << "\n";
return (int)rc;
}
}
//------------------------------
// 8) Print x, y, z, and dot_val
// We reconstruct each full 3-vector by filling an std::array<double,3>
//------------------------------
auto printVector = [&](const Vector<double> &v, const std::string &name) {
// Initialize all entries to zero
std::array<double,3> arr = { 0.0, 0.0, 0.0 };
// Overwrite stored (nonzero) entries
for(const auto &pair : v) {
// pair.first = index, pair.second = value
arr[pair.first] = pair.second;
}
// Print
std::printf("%s = [ ", name.c_str());
for(size_t i = 0; i < 3; ++i) {
std::printf("%g", arr[i]);
if(i + 1 < 3) std::printf(", ");
}
std::printf(" ]\n");
};
std::printf("\n-- Results --\n");
printVector(x, "x");
printVector(y, "y = A·x");
printVector(z, "z = x ⊙ y");
std::printf("dot(x,x) = %g\n\n", dot_val);
return EXIT_SUCCESS;
}
\end{lstlisting}
The expected output is (bash)
\begin{lstlisting}[language=bash]
example (ALP/GraphBLAS) API usage
Step 1: Constructing a 3x3 sparse matrix A.
Step 2: Creating vector x = [1, 2, 3]^T.
Step 3: Computing y = A·x under plus‐times semiring.
Step 4: Computing z = x ⊙ y (element‐wise multiply).
Step 5: Computing dot_val = xᵀ·x under plus‐times semiring.
-- Results --
x = [ 1, 2, 3 ]
y = A·x = [ 7, 18, 17 ]
z = x ⊙ y = [ 7, 36, 51 ]
dot(x,x) = 14
$
\end{lstlisting}
\subsection{Makefile and CMake Instructions}\label{sec:build_instructions}
Finally, we provide guidance on compiling and running the above example in your own development environment. If you followed the installation steps and used \texttt{grbcxx}, compilation is straightforward. Here we outline two approaches: using the ALP wrapper scripts, and integrating ALP manually via a build system.
\subsubsection*{Using the ALP compiler wrapper}
The simplest way to compile your ALP-based program is to use the provided wrapper. After sourcing the ALP environment (setenv script), the commands \texttt{grbcxx} and \texttt{grbrun} are available in your PATH. You can compile the example above by saving it (e.g. as \texttt{example.cpp}) and running:
\begin{lstlisting}[language=bash]
$ grbcxx example.cpp -o example
\end{lstlisting}
This will automatically invoke \texttt{g++} with the correct include directories and link against the ALP library and its required libraries (NUMA, pthread, etc.). By default, it uses the sequential backend. To enable parallel execution with OpenMP, add the flag \texttt{-b reference\_omp} (for shared-memory parallelism). For instance:
\begin{lstlisting}[language=bash]
$ grbcxx -b reference_omp example.cpp -o example
\end{lstlisting}
After compilation, run the program with:
\begin{lstlisting}[language=bash]
$ grbrun ./example
\end{lstlisting}
(You can also run \texttt{./example} directly for a non-distributed run; \texttt{grbrun} is mainly needed for orchestrating distributed runs or setting up the execution environment.)
\subsubsection*{Using a custom build (Make/CMake)}
If you prefer to compile without the wrapper (for integration into an existing project or custom build system), you need to instruct your compiler to include ALP's headers and link against the ALP library and dependencies. The ALP installation (at the chosen \texttt{--prefix}) provides an include directory and a library directory.
For example, if ALP is installed in \texttt{../install} as above, you could compile the program manually with:
\begin{lstlisting}[language=bash]
$ g++ -std=c++17 example.cpp \
    -I../install/include -D_GRB_WITH_REFERENCE -D_GRB_WITH_OMP \
    -L../install/lib/sequential \
    -lgraphblas -lnuma -lpthread -lm -fopenmp -o example
\end{lstlisting}
Here we specify the include path for ALP headers and link with the ALP GraphBLAS library (assumed to be named \texttt{libgraphblas}) as well as libnuma, libpthread, libm (math), and OpenMP (the \texttt{-fopenmp} flag). These additional libraries are required by ALP (as noted in the install documentation). Using this command (or a corresponding Makefile rule) will produce the executable \texttt{example}.
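The same invocation can be captured in a minimal Makefile rule. The following is a sketch under the same assumptions as the manual command above; adjust \texttt{ALP\_PREFIX} and the library names to match your installation:
\begin{lstlisting}[language=make]
# assumes ALP was installed under ../install, as in the text above
ALP_PREFIX ?= ../install
CXX ?= g++
CXXFLAGS := -std=c++17 -O3 -I$(ALP_PREFIX)/include \
            -D_GRB_WITH_REFERENCE -D_GRB_WITH_OMP -fopenmp
LDFLAGS := -L$(ALP_PREFIX)/lib/sequential
LDLIBS := -lgraphblas -lnuma -lpthread -lm

example: example.cpp
	$(CXX) $(CXXFLAGS) $< $(LDFLAGS) $(LDLIBS) -o $@
\end{lstlisting}
Note that the recipe line must be indented with a tab, per Make's syntax.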
If you are using CMake for your own project, you can integrate ALP as follows. There may not be an official CMake package for ALP, but you can use \texttt{find\_library} or hard-code the path. For instance, in your \texttt{CMakeLists.txt}:
\begin{lstlisting}[caption={Example CMakeLists.txt for an ALP project}]
cmake_minimum_required(VERSION 3.13)
project(ALPExample CXX)
find_package(OpenMP REQUIRED) # find OpenMP for -fopenmp
set(ALP_PREFIX "../install") # Adjust ../install to your ALP installation prefix
include_directories("${ALP_PREFIX}/include")
link_directories("${ALP_PREFIX}/lib" "${ALP_PREFIX}/lib/sequential")
add_executable(example example.cpp)
target_compile_options(example PRIVATE -O3 -march=native -std=c++17 -D_GRB_WITH_REFERENCE -D_GRB_WITH_OMP)
target_link_libraries(example graphblas numa pthread m OpenMP::OpenMP_CXX)
\end{lstlisting}
This will ensure the compiler knows where to find \texttt{graphblas.hpp} and links the required libraries. After configuring with CMake and building (via \texttt{make}), you can run the program as usual.
\vspace{1ex}
Note: If ALP provides a CMake package file, you could use \texttt{find\_package} for ALP, but at the time of writing, linking manually as above is the general approach. Always make sure the library names and paths correspond to your installation. The ALP documentation mentions that programs should link with \texttt{-lnuma}, \texttt{-lpthread}, \texttt{-lm}, and OpenMP flags in addition to the ALP library.
\bigskip
This tutorial has introduced the fundamentals of using ALP/GraphBLAS in C++ on Linux, from installation to running a basic example. With ALP set up, you can explore more complex graph algorithms and linear algebra operations, confident that the library will handle parallelization and optimization under the hood. Happy coding!