Analyzing sorting algorithms using Big-O Notation.
This mini project selects and analyzes 5 sorting algorithms, using graphs generated with Python libraries and test coverage reports. The idea is to identify which is the best sorting algorithm and to understand why.
We will also look at the different components of the code, such as the algorithms, constants, data generators, and execution-time gathering.
You will find a src folder that contains the main modules, including the chosen sorting algorithms (Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, and Quick Sort), the random data generator, and more.
In order to compare all the sorting techniques at the same time, a single graph is displayed showing the running times of all the algorithms.
The src folder contains the following modules:
- run.py: Entry point of the project. It coordinates the execution of the experiments by selecting the input type, defining the size range, and calling the performance analysis functions. It also triggers the visualization of the results.
- sorting.py: Contains the implementations of the sorting algorithms analyzed in this project: Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, and Quick Sort. Each function returns a sorted copy of the input array.
- data_generator.py: Provides functions to generate different types of input arrays, including random, sorted, reverse sorted, and nearly sorted lists. These are used to evaluate algorithm behavior under different data distributions.
- execution_time_gathering.py: Responsible for benchmarking the sorting algorithms. It measures execution times across multiple input sizes and samples, returning the median execution time for each algorithm.
- plot_results.py: Handles visualization of the experimental results. It converts collected timing data into graphs that compare the performance of the sorting algorithms.
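As an illustration of the median-based timing that execution_time_gathering.py performs, here is a minimal sketch; the function name measure_median_time and its signature are assumptions for this example, not the module's actual API.

```python
import random
import statistics
import time

def measure_median_time(sort_fn, data, samples=5):
    """Time sort_fn on copies of data several times; return the median runtime in seconds."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        sort_fn(list(data))  # fresh copy each run so earlier runs don't pre-sort the input
        times.append(time.perf_counter() - start)
    return statistics.median(times)

# Hypothetical usage, with the built-in sorted() standing in for one of the algorithms:
data = [random.randint(0, 1000) for _ in range(1000)]
median_time = measure_median_time(sorted, data)
```

The median is preferred over the mean here because a single slow run (caused, say, by the OS scheduler) would skew an average.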
The docs folder contains the images used in this document.
The tests folder contains the testing modules for the different cases.
Bubble Sort
This algorithm has the following characteristics:
Best Case: O(n)
Average Case: O(n^2)
Worst Case: O(n^2)
Space: O(1)
Stable: Yes
To understand how it works and why it has that name, imagine that the numbers we are sorting "bubble up" toward the end of the array. The algorithm compares adjacent pairs and swaps them whenever the element on the left is bigger than the one on the right, and so on until no swaps are needed.
Initial array: [5, 3, 8, 2]
Exchange 1 (5 and 3): [3, 5, 8, 2]
Exchange 2 (8 and 2): [3, 5, 2, 8]
Exchange 3 (5 and 2): [3, 2, 5, 8]
Exchange 4 (3 and 2): [2, 3, 5, 8]
Sorted array: [2, 3, 5, 8]
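The project's implementation lives in src/sorting.py; the exchanges above correspond to a sketch like the following (the function name bubble_sort is assumed to match the module's):

```python
def bubble_sort(items):
    """Return a sorted copy by repeatedly swapping adjacent out-of-order pairs."""
    arr = list(items)  # work on a copy, as the project's functions do
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i elements are already in place
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:  # no swaps on a full pass means already sorted
            break
    return arr
```

The early-exit flag is what gives the O(n) best case on already-sorted input.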
Selection Sort
This algorithm has the following characteristics:
Best Case: O(n^2)
Average Case: O(n^2)
Worst Case: O(n^2)
Space: O(1)
Stable: No
This algorithm goes through the entire array, searches for the smallest number, and moves it to the beginning of the unsorted portion. This process repeats itself until the whole array is sorted.
Initial list:
[5, 3, 8, 2]
Pass 1:
Minimum found: 2
Swap 5 and 2
[2, 3, 8, 5]
Pass 2:
Minimum found: 3
No swap needed
[2, 3, 8, 5]
Pass 3:
Minimum found: 5
Swap 8 and 5
[2, 3, 5, 8]
Pass 4:
Minimum found: 8
No swap needed
[2, 3, 5, 8]
Sorted list:
[2, 3, 5, 8]
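The passes above can be sketched as follows (assuming the project's function is named selection_sort):

```python
def selection_sort(items):
    """Return a sorted copy by repeatedly moving the smallest remaining element forward."""
    arr = list(items)
    n = len(arr)
    for i in range(n - 1):
        min_idx = i
        for j in range(i + 1, n):  # scan the unsorted tail for the minimum
            if arr[j] < arr[min_idx]:
                min_idx = j
        arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
```

The long-distance swap is also why the algorithm is not stable: it can jump an element past an equal one.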
Insertion Sort
This algorithm has the following characteristics:
Best Case: O(n)
Average Case: O(n^2)
Worst Case: O(n^2)
Space: O(1)
Stable: Yes
This algorithm mirrors the way you sort a hand of cards: you pick up each card and place it in its correct position. The same happens here, as the numbers are placed one by one in their correct positions, starting from the second element of the list and treating the first element as already sorted.
At each step, the algorithm selects the current element (called the key) and compares it with the elements before it. If the previous elements are greater than the key, they are shifted one position to the right to make space. Once the correct position is found, the key is inserted there.
This process continues until all elements have been examined and inserted into their proper position, resulting in a fully sorted list.
Initial list: [5, 3, 8, 2]
Pass 1:
Key = 3
Compare 3 with 5
Shift 5 to the right
[5, 5, 8, 2]
Insert 3
[3, 5, 8, 2]
Pass 2:
Key = 8
Compare 8 with 5
No shift needed
[3, 5, 8, 2]
Pass 3:
Key = 2
Compare 2 with 8
Shift 8 to the right
[3, 5, 8, 8]
Compare 2 with 5
Shift 5 to the right
[3, 5, 5, 8]
Compare 2 with 3
Shift 3 to the right
[3, 3, 5, 8]
Insert 2
[2, 3, 5, 8]
Sorted list:
[2, 3, 5, 8]
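The shift-and-insert passes above can be sketched as follows (the name insertion_sort is assumed to match src/sorting.py):

```python
def insertion_sort(items):
    """Return a sorted copy by inserting each element into the sorted prefix."""
    arr = list(items)
    for i in range(1, len(arr)):  # the one-element prefix arr[:1] starts out sorted
        key = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > key:  # shift larger elements one slot to the right
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key  # drop the key into the gap
    return arr
```

When the input is already sorted the inner while loop never runs, which is where the O(n) best case comes from.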
Merge Sort
This algorithm has the following characteristics:
Best Case: O(n log n)
Average Case: O(n log n)
Worst Case: O(n log n)
Space: O(n)
Stable: Yes
This is one of the most widely used algorithms. It applies the technique called Divide and Conquer: the whole array is divided into 2 equal parts, those 2 parts are divided in half again, and so on and so forth.
The process uses recursion to keep dividing until only blocks of length 1 remain, and it handles larger arrays far better than the previous sorting methods.
Initial list:
[5, 3, 8, 2]
Step 1: Divide
Split into two halves
[5, 3] [8, 2]
Step 2: Divide again
Split each half
[5] [3] [8] [2]
Step 3: Merge sorted pairs
Merge [5] and [3]
Compare 5 and 3
Take 3
Then take 5
Result:
[3, 5]
Merge [8] and [2]
Compare 8 and 2
Take 2
Then take 8
Result:
[2, 8]
Step 4: Final merge
Merge [3, 5] and [2, 8]
Compare 3 and 2
Take 2
Compare 3 and 8
Take 3
Compare 5 and 8
Take 5
Take remaining element 8
Result:
[2, 3, 5, 8]
Sorted list:
[2, 3, 5, 8]
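The divide-and-merge steps above can be sketched as follows (function name merge_sort assumed):

```python
def merge_sort(items):
    """Return a sorted copy: split in half, sort each half recursively, then merge."""
    if len(items) <= 1:  # blocks of length 0 or 1 are already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stability)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])  # at most one side still has leftovers
    merged.extend(right[j:])
    return merged
```

The slicing-based split makes the O(n) extra space explicit: each level of recursion builds new sublists.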
Quick Sort
This algorithm has the following characteristics:
Best Case: O(n log n)
Average Case: O(n log n)
Worst Case: O(n^2) when the pivot choices are consistently poor
Space: O(log n) for the recursion stack
Stable: No
This algorithm chooses a pivot as a reference and partitions the numbers into those smaller and those bigger than the pivot, then sorts each partition recursively. The secret lies in the pivot chosen: it is often chosen randomly, as this reduces the chance of selecting worst-case pivots.
A common method used to select the pivot is called median-of-three:
Take first, middle, last.
Use the median of those three as pivot.
This usually produces good splits.
Initial list:
[5, 3, 8, 2]
Step 1:
Pivot = 5
Elements less than 5:
[3, 2]
Elements greater than 5:
[8]
Step 2: Sort left part [3, 2]
Pivot = 3
Elements less than 3:
[2]
Elements greater than 3:
[]
Sorted left part:
[2, 3]
Step 3: Sort right part [8]
Single element
Already sorted:
[8]
Step 4: Combine
Left part:
[2, 3]
Pivot: 5
Right part:
[8]
Final result:
[2, 3, 5, 8]
Sorted list:
[2, 3, 5, 8]
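A sketch of the process, using the median-of-three pivot described above. Note that this version partitions into new lists for clarity, so it uses O(n) extra space rather than the O(log n) of an in-place partition, and the name quick_sort is an assumption:

```python
def quick_sort(items):
    """Return a sorted copy: pick a pivot, partition, recurse on each side."""
    if len(items) <= 1:
        return list(items)
    first, mid, last = items[0], items[len(items) // 2], items[-1]
    pivot = sorted([first, mid, last])[1]  # median-of-three pivot
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    bigger = [x for x in items if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(bigger)
```

For [5, 3, 8, 2] the median of (5, 8, 2) is 5, so this sketch happens to pick the same pivot as the walkthrough above.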
After adding the different modules to the src folder, the program was run from the command line with:
py -m src.run
The following graphs appeared.
Random data
Sorted data
Reverse sorted data
Nearly Sorted data
It is possible to conclude that the best sorting algorithm was the Quick Sort followed by Merge Sort, with a very low execution time compared to the rest of the sorting methods.
The slowest was Bubble Sort, since it constantly compares the list elements by pairs, which turns out to be very inefficient with large arrays.
The randomly generated arrays had a maximum size of 1000 elements. There were 10 different input sizes, starting at 100 and increasing in steps of 100 up to 1000.
The quadratic algorithms (Bubble, Selection, Insertion in average case) show a visibly accelerating growth as input size increases.
The O(n log n) algorithms (Merge and Quick) grow much more slowly. The difference becomes more significant as n approaches 1000, and the gap between O(n^2) and O(n log n) increases non-linearly.
The experimental results confirm the theoretical time complexities. Quadratic algorithms exhibit rapid growth as input size increases, while divide-and-conquer algorithms remain significantly more efficient. Quick Sort achieved the lowest observed execution time under the tested conditions, validating its practical efficiency for random input distributions.
Also, the following results are shown along with the previous graphs.
For n = 1000, an O(n^2) algorithm performs on the order of one million operations, while an O(n log n) algorithm performs roughly ten thousand operations. This explains the increasing separation between the curves.
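The rough arithmetic behind that comparison can be checked directly:

```python
import math

n = 1000
quadratic = n ** 2                      # 1,000,000 operations
linearithmic = round(n * math.log2(n))  # about 9,966 operations
ratio = quadratic / linearithmic        # roughly a 100x gap at n = 1000
```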
In order to produce professional and effective test coverage, an environment was created and set up following the professor's instructions.
A requirements.txt file listing the dependencies was installed with the command pip install -r requirements.txt, which provided the tools needed for testing.
The tests folder contains 2 tests generated by ChatGPT, and these were run with the command py -m unittest discover.
The following results appeared.
Now, to measure the test coverage, the command used was coverage run -m unittest discover, followed by coverage report. These were the results.
This shows that the tests properly cover the code for the sorting methods and the data generator, giving complete results for both the execution and the testing of the algorithms.