
Parallelizing Daxpy and Initialization

In this notebook you will parallelize the initialize and daxpy functions to compute the results in parallel using CPUs or GPUs.

/*
#include <algorithm>
#include <cassert>
#include <execution>
#include <ranges>
#include <vector>

// Initialize vectors x and y: parallel algorithm version
void initialize(std::vector<double> &x, std::vector<double> &y) {
  assert(x.size() == y.size());

  // Parallelize initialization of x using iota view + for_each_n.
  // A pointer to the data is captured by value so the lambda also works with -stdpar=gpu.
  auto ints = std::views::iota(0);
  std::for_each_n(std::execution::par, ints.begin(), x.size(),
                  [x = x.data()](int i) { x[i] = static_cast<double>(i); });

  // Parallelize initialization of y
  std::fill_n(std::execution::par, y.begin(), y.size(), 2.0);
}

// DAXPY: AX + Y: parallel algorithm version
void daxpy(double a, const std::vector<double> &x, std::vector<double> &y) {
  assert(x.size() == y.size());

  std::transform(std::execution::par,
                 x.begin(), x.end(),  // first input range
                 y.begin(),           // second input range
                 y.begin(),           // output range
                 [a](double xi, double yi) { return a * xi + yi; });
}
*/

Compile and Run

Compiling with support for the parallel algorithms requires:

g++ and clang++: link against Intel TBB with -ltbb
nvc++: compile and link with the -stdpar flag:
    -stdpar=multicore runs parallel algorithms on CPUs
    -stdpar=gpu runs parallel algorithms on GPUs, further -gpu= flags control the GPU target
    See the Parallel Algorithms Documentation.

The example compiles, runs, and produces correct results as provided. Parallelize it using the C++ standard library parallel algorithms and ensure that the results are still correct. You should see a drastic performance increase when running the program on the GPU (see the solution below if necessary).
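One way to confirm that the results are still correct after parallelizing is to compare against a sequential reference. The sketch below is not part of the exercise file: `initialize_serial`, `daxpy_serial`, and `matches_expected` are hypothetical names, and the check assumes the initialization used in this notebook (x[i] = i and y[i] = 2), so that a single daxpy call should leave y[i] = a * i + 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Sequential reference initialization: x[i] = i, y[i] = 2.
void initialize_serial(std::vector<double> &x, std::vector<double> &y) {
  assert(x.size() == y.size());
  std::iota(x.begin(), x.end(), 0.0);
  std::fill(y.begin(), y.end(), 2.0);
}

// Sequential reference DAXPY: y = a * x + y.
void daxpy_serial(double a, const std::vector<double> &x, std::vector<double> &y) {
  assert(x.size() == y.size());
  std::transform(x.begin(), x.end(), y.begin(), y.begin(),
                 [a](double xi, double yi) { return a * xi + yi; });
}

// Returns true if y[i] is within a relative tolerance of a * i + 2 for every i.
// This is only valid after exactly one daxpy call on freshly initialized vectors.
bool matches_expected(double a, const std::vector<double> &y, double tol = 1e-12) {
  for (std::size_t i = 0; i < y.size(); ++i) {
    const double expected = a * static_cast<double>(i) + 2.0;
    const double scale = std::max(1.0, std::abs(expected));
    if (std::abs(y[i] - expected) > tol * scale) return false;
  }
  return true;
}
```

Calling `matches_expected(a, y)` on the output of the parallel `daxpy` gives a quick pass/fail signal that does not depend on which compiler or backend produced the result.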

The first 3 of the following blocks compile and run the program using different compilers on the CPU.

The last block compiles and runs the program on the GPU. If you get an error, make sure that the lambdas capture scalars by value and that, when a lambda needs to access a vector's elements, it captures a pointer to the vector's data by value, for example with [x = x.data()].
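The capture rule above can be illustrated with a small sketch. `fill_scaled` is a hypothetical helper, not a function from the exercise. To keep the sketch buildable as plain C++17 without TBB, it uses an explicit index vector in place of `std::views::iota(0)` and omits the execution policy. In the exercise, the same lambda would be passed to an algorithm whose first argument is `std::execution::par`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper (not in the exercise): sets x[i] = s * i.
void fill_scaled(std::vector<double> &x, double s) {
  // Indices are int here, so the vector must not exceed the int range.
  assert(x.size() <= static_cast<std::size_t>(std::numeric_limits<int>::max()));

  // Index list standing in for std::views::iota(0).
  std::vector<int> idx(x.size());
  std::iota(idx.begin(), idx.end(), 0);

  // The scalar s is captured by value. The vector itself is not captured:
  // only a raw pointer to its elements is, and that pointer is copied into
  // the lambda. Nothing in the lambda refers back to the caller's stack.
  std::for_each(idx.begin(), idx.end(),
                [x = x.data(), s](int i) { x[i] = s * static_cast<double>(i); });
}
```

Inside the lambda, `x` names the captured pointer rather than the vector, so `x[i]` indexes the underlying array directly. A `[&]` capture would instead hold references to the `std::vector` object and to `s`, both of which live on the host stack.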

!g++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise3.cpp -ltbb
!./daxpy 1000000

!clang++ -std=c++20 -Ofast -march=native -DNDEBUG -o daxpy exercise3.cpp -ltbb
!./daxpy 1000000

!nvc++ -stdpar=multicore -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise3.cpp
!./daxpy 1000000

!nvc++ -stdpar=gpu -std=c++20 -O4 -fast -march=native -Mllvm-fast -DNDEBUG -o daxpy exercise3.cpp
!./daxpy 1000000
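To compare the four builds on more than a visual impression, one option is to time the daxpy call and convert the result to memory bandwidth. The sketch below is an assumption about how this could be done, not the measurement code from `exercise3.cpp`. `seconds_per_call` and `daxpy_gigabytes_per_second` are hypothetical helpers, and the traffic estimate assumes each element of x is read once and each element of y is read once and written once.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper: calls f() once as a warm-up, then `reps` more times,
// and returns the mean wall-clock time of the timed calls in seconds.
template <typename F>
double seconds_per_call(F &&f, int reps = 10) {
  using clock = std::chrono::steady_clock;
  f();  // warm-up call, not timed
  const auto start = clock::now();
  for (int r = 0; r < reps; ++r) f();
  const std::chrono::duration<double> elapsed = clock::now() - start;
  return elapsed.count() / reps;
}

// Estimated DAXPY memory traffic in GB/s for vectors of n doubles:
// read x, read y, write y = 3 * n * sizeof(double) bytes per call.
double daxpy_gigabytes_per_second(std::size_t n, double seconds) {
  return 3.0 * static_cast<double>(n) * sizeof(double) * 1e-9 / seconds;
}
```

A call such as `seconds_per_call([&] { daxpy(a, x, y); })` keeps updating y on every repetition, so the correctness check should be run on a separate, single daxpy call rather than on the timed vectors.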
