Sorting Algorithms Benchmarking(DAA Practical Assignment)

Summary

This application benchmarks seven different sorting algorithms and three QuickSort pivot variants to evaluate their performance across various input sizes and data distributions.

1. Project Structure

Sorting-algorithms-benchmark/

scripts/
• benchmark.py – Main benchmarking script

sorting_algorithms/
• C implementations of all sorting algorithms

test_data/
• Generated datasets for benchmarking

executables/
• Compiled C programs (generated automatically)

graphs/
• Generated plots and CSV results

graphs/7_sorting_algo_comparisons/
• Graphs comparing seven sorting algorithms

graphs/quick_sort_analysis/
• Graphs comparing QuickSort pivot strategies

graphs/csv_data/
• CSV tables containing benchmark results

2. Algorithms Benchmarked

The benchmarking process includes the following sorting algorithms:

Bubble Sort – Simple comparison-based sorting algorithm.
Selection Sort – Selects the minimum element repeatedly.
Insertion Sort – Efficient for small or nearly sorted datasets.
Merge Sort – Divide-and-conquer sorting algorithm with O(n log n) complexity.
Heap Sort – Comparison-based algorithm using a binary heap.
Quick Sort (Median-of-Three Pivot) – Optimized QuickSort variant.
Radix Sort – Non-comparative integer sorting algorithm.

Additionally, three QuickSort pivot strategies are analyzed separately:

Quick Sort (First Element Pivot)
Quick Sort (Random Pivot)
Quick Sort (Median-of-Three Pivot)

3. Benchmarking Methodology

The experimental setup is designed to provide stable and comparable performance measurements for the selected sorting algorithms.

3.1. Machine Specifications

Operating System: Windows 11
Model: DELL G15 5530
Processor: Intel Core i7-13650H
Graphics Card: NVIDIA RTX 4050 Studio

3.2. Timing Mechanism

The execution time for each sorting algorithm is measured inside the C programs using the clock() function from <time.h>.

This approach ensures that the timing reflects only the execution time of the sorting algorithm itself, avoiding overhead from Python subprocess execution. The Python script is responsible only for compiling the C programs, running the experiments, collecting results, and generating graphs.

3.3. Number of Experiment Repetitions

Each sorting experiment (a specific algorithm run on a particular input size and type) is repeated 7 times. The purpose of multiple repetitions is to minimize the impact of transient system loads, background processes, and other environmental factors that could introduce variability in single measurements.

3.4. Reported Times

For each experiment, the average execution time across the 7 repetitions is reported. Times are presented in seconds (s), formatted to six decimal places for precision. In cases where an algorithm fails to complete (e.g., due to stack overflow for certain Quick Sort variants on large, pathological inputs), "Crashed/Timeout" is reported.

3.5. Input Selection

The test data used for benchmarking is pre-generated by the generate_test_data.py script. The inputs are selected to cover a range of sizes and initial orderings to evaluate best-case, worst-case, and average-case performance:

Input Sizes (N): The following input sizes are tested: 100, 500, 1,000, 5,000, 10,000, 25,000, 50,000, 75,000, and 100,000 elements.
Input Types: For each input size, three distinct types of data are generated:
- Random: A randomly ordered array of integers. This serves to evaluate the average-case performance of the algorithms.
- Sorted: An array of integers already sorted in ascending order. This is used to evaluate best-case performance for some algorithms (e.g., Insertion Sort, Merge Sort) and worst-case performance for others (e.g., Quick Sort with naive pivot selection).
- Reverse Sorted: An array of integers sorted in descending order. This typically represents a worst-case scenario for many comparison-based sorting algorithms.

3.6. Consistency of Inputs

The exact same set of input data files is used for all sorting algorithms. This ensures a fair and direct comparison of their performance characteristics under identical conditions. Each C sorting program is modified to read its input from standard input (stdin), allowing the Python benchmarking script to feed the exact same data to each algorithm.

4. Sorting Algorithms Details

Bubble Sort

Bubble Sort is a simple comparison-based sorting algorithm. It repeatedly steps through the list, compares adjacent elements and swaps them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted.

Properties

Data structure: Array

Performance	Comparison
Worst-case performance	O(n^2) comparisons, O(n^2) swaps
Best-case performance	O(n) comparisons, O(1) swaps (already sorted)
Average performance	O(n^2) comparisons, O(n^2) swaps
Worst-case space complexity	O(1) auxiliary

Selection Sort

The algorithm divides the input list into two parts: a sorted sublist of items which is built up from left to right at the front (left) of the list and a sublist of the remaining unsorted items that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.

Properties

Data structure: Array

Performance	Comparison
Worst-case performance	O(n^2) comparisons, O(n) swaps
Best-case performance	O(n^2) comparisons, O(n) swaps
Average performance	O(n^2) comparisons, O(n) swaps
Worst-case space complexity	O(1) auxiliary

Insertion Sort

Insertion Sort is a simple sorting algorithm that builds the final sorted array (or list) one item at a time. It is much less efficient on large lists than more advanced algorithms such as quicksort, heapsort, or merge sort. However, it has several advantages: simple implementation, efficient for small data sets, and efficient for data sets that are already substantially sorted.

Properties

Data structure: Array

Performance	Comparison
Worst-case performance	O(n^2) comparisons, O(n^2) shifts
Best-case performance	O(n) comparisons, O(1) shifts (already sorted)
Average performance	O(n^2) comparisons, O(n^2) shifts
Worst-case space complexity	O(1) auxiliary

Merge Sort

Merge Sort is an efficient, comparison-based, divide and conquer sorting algorithm. It works by dividing an unsorted list into n sublists, each containing one element (a list of one element is considered sorted). Then, it repeatedly merges sublists to produce new sorted sublists until there is only one sorted sublist remaining.

Properties

Data structure: Array (typically implemented with arrays, but can be linked lists)

Performance	Comparison
Worst-case performance	O(n log n)
Best-case performance	O(n log n)
Average performance	O(n log n)
Worst-case space complexity	O(n) auxiliary (for array implementations)

Quicksort

Quicksort is a divide-and-conquer algorithm. It works by selecting a 'pivot' element from the array and partitioning the other elements into two sub-arrays, according to whether they are less than or greater than the pivot. The sub-arrays are then sorted recursively. This particular implementation uses the last element as the pivot. This can be done in-place, requiring small additional amounts of memory to perform the sorting.

Properties

Data structure: Array

Performance	Comparison
Worst-case performance	O(n^2) (e.g., with sorted/reverse-sorted input and naive pivot)
Best-case performance	O(n log n)
Average performance	O(n log n)
Worst-case space complexity	O(log n) auxiliary (for recursion stack)

Radix Sort

Radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix (base). For elements with more than one significant digit, this bucketing process is repeated for each digit, while preserving the ordering of the prior step, until all digits have been considered. For this reason, radix sort has a time complexity of O(nk), where n is the number of elements and k is the number of digits (or passes).

Properties

Data structure: Array

Performance	Comparison
Worst-case performance	O(nk)
Best-case performance	O(nk)
Average performance	O(nk)
Worst-case space complexity	O(n + k) auxiliary (depending on bucket implementation)

5. Launch and Usage

Requirements

Python 3.x (preferably 3.9 or higher)
matplotlib Python library (pip install matplotlib)
pandas Python library (pip install pandas)
GCC Compiler (or compatible C compiler) for compiling the C sorting algorithms.

How to Run the Benchmark

Clone the repository:

git clone https://github.com/your-repo/Sorting-algorithms-benchmark.git # Replace with actual repo URL
cd Sorting-algorithms-benchmark

Install Python dependencies:
```
pip install -r requirements.txt
```
Generate Test Data (if not already present):
```
python generate_test_data.py
```
This will create the test_data/ directory and populate it with input files.
Run the Benchmarking Script:
```
python benchmark.py
```
This script will:
- Compile the C sorting algorithms into executables in the executables/ directory.
- Run each algorithm against the generated test data multiple times.
- Print the benchmarking progress and results to the console.
- Generate performance plots in the graphs/ directory.
- Generate a detailed results table (CSV) in the graphs/ directory.

Benchmark Outputs

After running the benchmark, the following outputs are generated:

• Execution time graphs for all sorting algorithms
• Comparison count graphs
• QuickSort pivot comparison graphs
• Correlation analysis between comparisons and execution time
• CSV tables containing all benchmark results

These outputs are stored inside the graphs/ directory.

6. Contact Information

For any inquiries, please contact: kowndinyamarepalli2007@gmail.com

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
graphs		graphs
sorting_algorithms		sorting_algorithms
test_data		test_data
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
SUBMISSION_README.md		SUBMISSION_README.md
benchmark.py		benchmark.py
generate_test_data.py		generate_test_data.py
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Sorting Algorithms Benchmarking(DAA Practical Assignment)

Summary

Table of Contents

1. Project Structure

2. Algorithms Benchmarked

3. Benchmarking Methodology

3.1. Machine Specifications

3.2. Timing Mechanism

3.3. Number of Experiment Repetitions

3.4. Reported Times

3.5. Input Selection

3.6. Consistency of Inputs

4. Sorting Algorithms Details

Bubble Sort

Properties

Selection Sort

Properties

Insertion Sort

Properties

Merge Sort

Properties

Quicksort

Properties

Radix Sort

Properties

5. Launch and Usage

Requirements

How to Run the Benchmark

Benchmark Outputs

6. Contact Information

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages