This project demonstrates optimization techniques for computationally intensive tasks using SIMD (Single Instruction Multiple Data) and MPI (Message Passing Interface) parallelization. The project consists of two main components:
- SIMD Optimization - Matrix multiplication optimization using SIMD instructions
- MPI Optimization - Progressive sequence alignment algorithm using MPI for distributed computing
Requirements:
- C++ compiler with C++17 support
- SIMD support (ARM NEON or Apple-specific optimizations)
- MPI library (for MPI component)
- Python 3.x with pandas, matplotlib, and seaborn (for visualization)
- FFTW library (for FFT computations)
This component implements and benchmarks different matrix multiplication algorithms:
- scalar1D - Basic single-threaded implementation with 1D memory layout
- scalar2D - Basic single-threaded implementation with 2D memory layout
- neon - SIMD-optimized implementation using ARM NEON instructions
- apple - Apple-specific optimized implementation
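To make the difference between the scalar and NEON variants concrete, here is a minimal sketch, not the project's actual code. The function names `matmul_scalar1d` and `matmul_neon` are hypothetical. It assumes square, row-major `float` matrices and that `n` is a multiple of 4. On targets without NEON it falls back to the scalar loop so the file still compiles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Scalar reference: C = A * B for n x n row-major matrices in flat (1D) arrays.
void matmul_scalar1d(const std::vector<float>& A, const std::vector<float>& B,
                     std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k) {
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}

// Vectorized variant (hypothetical): computes four adjacent columns of one
// row of C per inner loop, using 128-bit NEON registers holding 4 floats.
void matmul_neon(const std::vector<float>& A, const std::vector<float>& B,
                 std::vector<float>& C, std::size_t n) {
#if defined(__ARM_NEON)
    assert(n % 4 == 0);  // assumption: no remainder handling in this sketch
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; j += 4) {
            float32x4_t acc = vdupq_n_f32(0.0f);
            for (std::size_t k = 0; k < n; ++k) {
                // Load B[k][j..j+3] and accumulate A[i][k] * B[k][j..j+3].
                const float32x4_t b = vld1q_f32(&B[k * n + j]);
                acc = vmlaq_n_f32(acc, b, A[i * n + k]);
            }
            vst1q_f32(&C[i * n + j], acc);
        }
    }
#else
    // No NEON on this target: fall back to the scalar loop.
    matmul_scalar1d(A, B, C, n);
#endif
}
```

Both functions take the same arguments, so a benchmark can call them interchangeably on the same inputs and compare the outputs element by element.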
cd SIMD_Optimization/scripts
./run.sh
This script will:
- Compile the code
- Run benchmarks for various matrix sizes (4 to 16384)
- Generate raw and average results in CSV format
- Clean up temporary files
This component implements a progressive sequence alignment algorithm used in bioinformatics:
- Reads sequences from FASTA files
- Computes a distance matrix using FFT-based correlation
- Builds a guide tree for alignment
- Performs progressive alignment
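The distance-matrix step can be illustrated with a small sketch of FFT-based correlation. This is an assumption about how such a step might look, not the project's implementation. It uses a hand-written radix-2 FFT in place of FFTW so the example is self-contained, encodes each sequence as one indicator vector per letter, and defines a hypothetical distance as one minus the best match fraction over all relative shifts. The names `match_counts` and `fft_distance` and the default `"ACGT"` alphabet are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// In-place iterative radix-2 FFT (stand-in for FFTW in this sketch).
void fft(std::vector<cd>& a, bool inverse) {
    const std::size_t n = a.size();
    assert((n & (n - 1)) == 0);  // length must be a power of two
    for (std::size_t i = 1, j = 0; i < n; ++i) {  // bit-reversal permutation
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(a[i], a[j]);
    }
    const double pi = std::acos(-1.0);
    for (std::size_t len = 2; len <= n; len <<= 1) {
        const double ang = (inverse ? 2.0 : -2.0) * pi / static_cast<double>(len);
        const cd wlen(std::cos(ang), std::sin(ang));
        for (std::size_t i = 0; i < n; i += len) {
            cd w(1.0, 0.0);
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd u = a[i + k];
                const cd v = a[i + k + len / 2] * w;
                a[i + k] = u + v;
                a[i + k + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
    if (inverse) {
        for (cd& x : a) x /= static_cast<double>(n);
    }
}

// Number of matching characters for every relative shift of b against a.
// Result index m corresponds to shift k = m - (a.size() - 1), i.e. a[j] is
// compared with b[j + k].
std::vector<double> match_counts(const std::string& a, const std::string& b,
                                 const std::string& alphabet) {
    if (a.empty() || b.empty()) return {};
    const std::size_t out_len = a.size() + b.size() - 1;
    std::size_t n = 1;
    while (n < out_len) n <<= 1;  // zero-pad to avoid circular wrap-around
    std::vector<cd> acc(n, cd(0.0, 0.0));
    for (char letter : alphabet) {
        std::vector<cd> x(n, cd(0.0, 0.0)), y(n, cd(0.0, 0.0));
        for (std::size_t i = 0; i < a.size(); ++i) {
            if (a[i] == letter) x[a.size() - 1 - i] = 1.0;  // reversed: correlation
        }
        for (std::size_t i = 0; i < b.size(); ++i) {
            if (b[i] == letter) y[i] = 1.0;
        }
        fft(x, false);
        fft(y, false);
        for (std::size_t i = 0; i < n; ++i) acc[i] += x[i] * y[i];
    }
    fft(acc, true);
    std::vector<double> counts(out_len);
    for (std::size_t m = 0; m < out_len; ++m) counts[m] = std::round(acc[m].real());
    return counts;
}

// Hypothetical distance: 1 - (best match count / length of shorter sequence).
double fft_distance(const std::string& a, const std::string& b,
                    const std::string& alphabet = "ACGT") {
    const std::vector<double> counts = match_counts(a, b, alphabet);
    if (counts.empty()) return 1.0;
    const double best = *std::max_element(counts.begin(), counts.end());
    const double shorter = static_cast<double>(std::min(a.size(), b.size()));
    return 1.0 - best / shorter;
}
```

Calling `fft_distance` for every pair of sequences would fill a symmetric distance matrix, which a guide-tree builder could then consume. The FFT brings the cost of scanning all shifts from quadratic to roughly `n log n` per pair.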
cd MPI_Optimization/scripts
./serial_run.sh # Serial version
cd MPI_Optimization/scripts
./mpi_run.sh # MPI version
The project includes benchmarking scripts that:
- Measure execution time across different implementations
- Test with varying problem sizes (matrix dimensions or sequence lengths)
- Compare performance across different numbers of processes (for MPI)
- Save results to CSV files for analysis
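The measurement loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's harness. The helper names `time_runs`, `mean_ms`, and `csv_row`, and the `implementation,size,avg_ms` column layout, are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <numeric>
#include <sstream>
#include <string>
#include <vector>

// Runs `work` `repeats` times and returns each wall-clock duration in ms.
std::vector<double> time_runs(const std::function<void()>& work, int repeats) {
    assert(repeats > 0);
    std::vector<double> ms;
    ms.reserve(static_cast<std::size_t>(repeats));
    for (int r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        ms.push_back(std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return ms;
}

// Arithmetic mean of the raw timings (the "average" result).
double mean_ms(const std::vector<double>& samples) {
    assert(!samples.empty());
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Formats one CSV row; the column layout here is an assumption.
std::string csv_row(const std::string& implementation, std::size_t size,
                    double avg_ms) {
    std::ostringstream out;
    out << implementation << ',' << size << ',' << avg_ms;
    return out.str();
}
```

`std::chrono::steady_clock` is used because it is monotonic, so a timing cannot come out negative if the system clock is adjusted mid-run. Keeping the raw samples alongside the mean makes it possible to spot outliers later.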
Python scripts in each component's scripts directory generate visualizations:
cd SIMD_Optimization/scripts
python plot.py
Generates:
- Performance comparison plots with execution time vs. matrix size
- Speedup comparison relative to baseline implementation
cd MPI_Optimization/scripts
python serial_plot.py # For serial performance
python mpi_plot.py # For MPI scaling performance
Generates:
- Serial execution time per dataset
- MPI scaling performance across different numbers of processes
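The quantities behind speedup and scaling plots are simple ratios. The sketch below shows one common way to derive them from measured times: speedup as baseline time divided by measured time, and parallel efficiency as speedup divided by process count. The `ScalingPoint` struct and `scaling_table` function are hypothetical, and whether the plotting scripts use exactly these definitions is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ScalingPoint {
    int processes;      // number of MPI processes used for this run
    double speedup;     // baseline_seconds / seconds
    double efficiency;  // speedup / processes (1.0 means ideal scaling)
};

// Derives speedup and efficiency for each run relative to a baseline time
// (for example, the serial run). Inputs are parallel arrays.
std::vector<ScalingPoint> scaling_table(double baseline_seconds,
                                        const std::vector<int>& process_counts,
                                        const std::vector<double>& seconds) {
    assert(baseline_seconds > 0.0);
    assert(process_counts.size() == seconds.size());
    std::vector<ScalingPoint> table;
    table.reserve(seconds.size());
    for (std::size_t i = 0; i < seconds.size(); ++i) {
        assert(process_counts[i] > 0 && seconds[i] > 0.0);
        const double speedup = baseline_seconds / seconds[i];
        table.push_back({process_counts[i], speedup,
                         speedup / static_cast<double>(process_counts[i])});
    }
    return table;
}
```

The same speedup ratio applies to the SIMD comparison, with the baseline implementation's time in the numerator and a process count of 1.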
Implementation notes:
- The SIMD component uses ARM NEON and Apple SIMD intrinsics for vectorized operations
- The MPI component distributes computation across multiple processes
- Both components include scripts for measuring execution time and plotting the results
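As an illustration of distributing computation across processes, the sketch below splits the pairwise distance computations into contiguous blocks, one per rank. How the project actually partitions its work is not stated here, so treat this as one plausible scheme. The functions `block_range` and `pair_from_index` are hypothetical. In a real MPI program `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`; they are plain parameters here so the example compiles without an MPI installation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Half-open range [first, last) of work items owned by `rank` out of `size`
// ranks. The first (items % size) ranks each receive one extra item.
std::pair<std::size_t, std::size_t> block_range(std::size_t items, int rank,
                                                int size) {
    assert(size > 0 && rank >= 0 && rank < size);
    const std::size_t p = static_cast<std::size_t>(size);
    const std::size_t r = static_cast<std::size_t>(rank);
    const std::size_t base = items / p;
    const std::size_t extra = items % p;
    const std::size_t first = r * base + std::min(r, extra);
    const std::size_t count = base + (r < extra ? 1 : 0);
    return {first, first + count};
}

// Maps a flat index in [0, n*(n-1)/2) to the sequence pair (i, j) with i < j,
// enumerating the upper triangle of the distance matrix row by row.
std::pair<std::size_t, std::size_t> pair_from_index(std::size_t index,
                                                    std::size_t n) {
    assert(n >= 2 && index < n * (n - 1) / 2);
    std::size_t i = 0;
    std::size_t row_len = n - 1;  // number of pairs (i, j) with j > i in row i
    while (index >= row_len) {
        index -= row_len;
        ++i;
        --row_len;
    }
    return {i, i + 1 + index};
}
```

Each rank would loop over its own range, convert every index to a pair with `pair_from_index`, compute that distance, and then combine results with a collective such as `MPI_Gatherv`. Block ranges differ by at most one item, which balances the load when all pairs cost about the same.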