
ALDA Algorithm Analysis 1

Analyzing sorting algorithms using Big-O Notation.

Content

About this repository

This mini project selects and analyzes 5 sorting algorithms using graphs generated with Python libraries and test coverage reports. The goal is to identify the best sorting algorithm and to understand why.

We will also look at the different code components involved: the algorithms themselves, constants, data generators, and execution time gathering.

You will find a src folder that contains the main modules, including the chosen sorting algorithms (bubble sort, selection sort, insertion sort, merge sort, and quick sort), the random data generator, and more.

To compare all the sorting techniques at once, a single graph displays the running times of all the algorithms.

Documents you will find

The src folder contains the following modules:

  • run.py: Entry point of the project. It coordinates the execution of the experiments by selecting the input type, defining the size range, and calling the performance analysis functions. It also triggers the visualization of the results.

  • sorting.py: Contains the implementations of the sorting algorithms analyzed in this project: Bubble Sort, Selection Sort, Insertion Sort, Merge Sort, and Quick Sort. Each function returns a sorted copy of the input array.

  • data_generator.py: Provides functions to generate different types of input arrays, including random, sorted, reverse sorted, and nearly sorted lists. These are used to evaluate algorithm behavior under different data distributions.

  • execution_time_gathering.py: Responsible for benchmarking the sorting algorithms. It measures execution times across multiple input sizes and samples, returning the median execution time for each algorithm (a sketch of this approach appears after this list).

  • plot_results.py: Handles visualization of the experimental results. It converts collected timing data into graphs that compare the performance of the sorting algorithms.
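
For illustration, here is a minimal sketch of the general approach these two modules describe. The function names and details are assumptions made for the example, not the repository's actual API:

```python
import random
import statistics
import time

def nearly_sorted_list(n, swaps=5):
    # Illustrative generator: start from a sorted list, then disturb a few random pairs.
    data = list(range(n))
    for _ in range(swaps):
        i, j = random.randrange(n), random.randrange(n)
        data[i], data[j] = data[j], data[i]
    return data

def median_time(sort_fn, data, samples=5):
    # Time several runs on copies of the same input and keep the median,
    # which is less sensitive to outliers than the mean.
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        sort_fn(data.copy())
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```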

The docs folder also contains the images used in this README.

The tests folder contains the testing modules for the different cases.

Algorithm #1 - Bubble Sort

This problem has the following characteristics:

Best Case: O(n)

Average Case: O(n^2)

Worst Case: O(n^2)

Space: O(1)

Stable: Yes

To understand how it works and why it has that name, imagine the numbers being sorted are "bubbling" toward the end of the array. The algorithm compares adjacent pairs and swaps them whenever the item on the left is bigger than the one on the right, and so on until no more swaps are needed.

Example

Initial array: [5, 3, 8, 2]

Exchange 1 (5 and 3): [3, 5, 8, 2]

Exchange 2 (8 and 2): [3, 5, 2, 8]

Exchange 3 (5 and 2): [3, 2, 5, 8]

Exchange 4 (3 and 2): [2, 3, 5, 8]

Sorted array: [2, 3, 5, 8]
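
For reference, a minimal Python sketch of this idea, written to return a sorted copy as the repository's functions do (the actual sorting.py may differ in details):

```python
def bubble_sort(arr):
    """Repeatedly swap adjacent out-of-order pairs until none remain."""
    result = arr.copy()  # work on a copy so the input is untouched
    n = len(result)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i items are already in place
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        if not swapped:  # early exit gives the O(n) best case on sorted input
            break
    return result

print(bubble_sort([5, 3, 8, 2]))  # [2, 3, 5, 8]
```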

Algorithm #2 - Selection sort

This problem has the following characteristics:

Best Case: O(n^2)

Average Case: O(n^2)

Worst Case: O(n^2)

Space: O(1)

Stable: No

This algorithm goes through the unsorted part of the array, searches for the smallest number, and brings it to the front. This process repeats until the whole array is sorted.

Example

Initial list:

[5, 3, 8, 2]

Pass 1:

Minimum found: 2

Swap 5 and 2

[2, 3, 8, 5]

Pass 2:

Minimum found: 3

No swap needed

[2, 3, 8, 5]

Pass 3:

Minimum found: 5

Swap 8 and 5

[2, 3, 5, 8]

Pass 4:

Minimum found: 8

No swap needed

[2, 3, 5, 8]

Sorted list:

[2, 3, 5, 8]
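
A minimal sketch of the same idea in Python (illustrative; the repository's implementation may differ):

```python
def selection_sort(arr):
    """Repeatedly pick the minimum of the unsorted part and move it to the front."""
    result = arr.copy()
    n = len(result)
    for i in range(n - 1):
        min_index = i
        for j in range(i + 1, n):  # scan the unsorted portion for the smallest value
            if result[j] < result[min_index]:
                min_index = j
        if min_index != i:  # passes with "no swap needed" skip the exchange
            result[i], result[min_index] = result[min_index], result[i]
    return result

print(selection_sort([5, 3, 8, 2]))  # [2, 3, 5, 8]
```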

Algorithm #3 - Insertion sort

This problem has the following characteristics:

Best Case: O(n)

Average Case: O(n^2)

Worst Case: O(n^2)

Space: O(1)

Stable: Yes

This algorithm mirrors the action of sorting a deck of cards, where you take each card and place it in its correct spot. The same happens here: the numbers are placed one by one in their correct positions, starting from the second element of the list, since the first element on its own is already sorted.

At each step, the algorithm selects the current element (called the key) and compares it with the elements before it. If the previous elements are greater than the key, they are shifted one position to the right to make space. Once the correct position is found, the key is inserted there.

This process continues until all elements have been examined and inserted into their proper position, resulting in a fully sorted list.

Example

Initial list: [5, 3, 8, 2]

Pass 1:

Key = 3

Compare 3 with 5

Shift 5 to the right

[5, 5, 8, 2]

Insert 3

[3, 5, 8, 2]

Pass 2:

Key = 8

Compare 8 with 5

No shift needed

[3, 5, 8, 2]

Pass 3:

Key = 2

Compare 2 with 8

Shift 8 to the right

[3, 5, 8, 8]

Compare 2 with 5

Shift 5 to the right

[3, 5, 5, 8]

Compare 2 with 3

Shift 3 to the right

[3, 3, 5, 8]

Insert 2

[2, 3, 5, 8]

Sorted list:

[2, 3, 5, 8]
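
A minimal sketch of the key-shifting logic described above (again, the repository's version may differ):

```python
def insertion_sort(arr):
    """Insert each key into its correct place among the already-sorted prefix."""
    result = arr.copy()
    for i in range(1, len(result)):  # the first element is trivially sorted
        key = result[i]
        j = i - 1
        while j >= 0 and result[j] > key:  # shift larger elements one slot right
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key  # drop the key into the gap
    return result

print(insertion_sort([5, 3, 8, 2]))  # [2, 3, 5, 8]
```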

Algorithm #4 - Merge Sort

This problem has the following characteristics:

Best Case: O(n log n)

Average Case: O(n log n)

Worst Case: O(n log n)

Space: O(n)

Stable: Yes

This is one of the most widely used algorithms. It relies on the divide-and-conquer technique: the whole array is divided into 2 equal parts, each of those parts is divided in half again, and so on.

This process uses recursion, dividing the blocks until there are many blocks of length 1, which are then merged back together in order. It is much better suited to large arrays than the previous sorting methods.

Example

Initial list:

[5, 3, 8, 2]

Step 1: Divide

Split into two halves

[5, 3] [8, 2]

Step 2: Divide again

Split each half

[5] [3] [8] [2]

Step 3: Merge sorted pairs

Merge [5] and [3]

Compare 5 and 3

Take 3

Then take 5

Result:

[3, 5]

Merge [8] and [2]

Compare 8 and 2

Take 2

Then take 8

Result:

[2, 8]

Step 4: Final merge

Merge [3, 5] and [2, 8]

Compare 3 and 2

Take 2

Compare 3 and 8

Take 3

Compare 5 and 8

Take 5

Take remaining element 8

Result:

[2, 3, 5, 8]

Sorted list:

[2, 3, 5, 8]
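
A minimal recursive sketch of the divide-and-merge process (illustrative; not necessarily the repository's exact implementation):

```python
def merge_sort(arr):
    """Split the list in half, sort each half recursively, then merge."""
    if len(arr) <= 1:  # a block of length 0 or 1 is already sorted
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge step: repeatedly take the smaller front element; ties take from
    # the left half first, which is what makes the sort stable.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one of these slices is empty
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 3, 8, 2]))  # [2, 3, 5, 8]
```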

Algorithm #5 - Quick sort

This problem has the following characteristics:

Best Case: O(n log n)

Average Case: O(n log n)

Worst Case: O(n^2) (when the pivot choices are poor)

Space: O(log n) for the recursion stack

Stable: No

This algorithm chooses a pivot as a reference to split the numbers into those smaller and those bigger than the pivot, and the process is repeated recursively on each part. The secret lies in the pivot chosen: it is often picked randomly, since that reduces the chance of selecting worst-case pivots.

A common method for selecting the pivot is median-of-three:

Take the first, middle, and last elements.

Use the median of those three as the pivot.

This usually produces good splits.
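
A minimal sketch of that selection rule (the function name is illustrative, not the repository's API):

```python
def median_of_three(arr, lo, hi):
    """Return the median of arr[lo], the middle element, and arr[hi]."""
    mid = (lo + hi) // 2
    candidates = sorted([arr[lo], arr[mid], arr[hi]])
    return candidates[1]  # the middle value of the three

print(median_of_three([5, 3, 8, 2], 0, 3))  # first=5, middle=3, last=2 -> 3
```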

Example

Initial list

[5, 3, 8, 2]

Step 1:

Pivot = 5

Elements less than 5:

[3, 2]

Elements greater than 5:

[8]

Step 2: Sort left part [3, 2]

Pivot = 3

Elements less than 3:

[2]

Elements greater than 3:

[]

Sorted left part:

[2, 3]

Step 3: Sort right part [8]

Single element

Already sorted:

[8]

Step 4: Combine

Left part:

[2, 3]

Pivot: 5

Right part:

[8]

Final result:

[2, 3, 5, 8]

Sorted list:

[2, 3, 5, 8]
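
A minimal sketch matching the worked example, using the first element as pivot. Note that this list-building version uses O(n) extra space; the O(log n) figure above assumes an in-place partition:

```python
def quick_sort(arr):
    """Partition around a pivot, then recursively sort each side."""
    if len(arr) <= 1:  # base case: nothing left to partition
        return list(arr)
    pivot = arr[0]  # first element, as in the example above
    smaller = [x for x in arr[1:] if x < pivot]
    greater = [x for x in arr[1:] if x >= pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(greater)

print(quick_sort([5, 3, 8, 2]))  # [2, 3, 5, 8]
```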

Results and Graphs - Analysis

After adding the different modules to the src folder, the program was run from the command line with:

py -m src.run

The following graphs appeared.

[Graphs: running times on random data, sorted data, reverse sorted data, and nearly sorted data]

From the graphs it is possible to conclude that the best sorting algorithm was Quick Sort, followed by Merge Sort, both with very low execution times compared to the rest of the sorting methods.

The slowest was Bubble Sort, since it constantly compares the elements of the list in pairs, which is very inefficient for large arrays.

The randomly generated arrays had a maximum size of 1000. There were 10 different input sizes, starting from 100, then 200... up to 1000.

The quadratic algorithms (Bubble, Selection, Insertion in average case) show a visibly accelerating growth as input size increases.

The O(n log n) algorithms (Merge and Quick) grow much more slowly. The difference becomes more significant as n approaches 1000, and the gap between O(n^2) and O(n log n) widens non-linearly.

The experimental results confirm the theoretical time complexities. Quadratic algorithms exhibit rapid growth as input size increases, while divide-and-conquer algorithms remain significantly more efficient. Quick Sort achieved the lowest observed execution time under the tested conditions, validating its practical efficiency for random input distributions.

Also, the following results are shown along with the previous graphs.

[Screenshot: processing data]

For n = 1000, an O(n^2) algorithm performs on the order of one million operations, while an O(n log n) algorithm performs roughly ten thousand operations. This explains the increasing separation between the curves.
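
That arithmetic can be checked directly:

```python
import math

n = 1000
print(n ** 2)                   # 1000000 -> about one million operations
print(round(n * math.log2(n)))  # 9966   -> roughly ten thousand operations
```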

Test coverage

To generate a professional and effective coverage report, a testing environment was installed and set up following the professor's instructions.

The dependencies listed in requirements.txt were installed with the command pip install -r requirements.txt, which set up everything needed for the testing step.

The tests folder contains 2 tests generated by ChatGPT; they were run with the command py -m unittest discover.
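
For reference, a test in that style might look like this (a hedged sketch; the import path and assertions are assumptions, not the repository's exact tests):

```python
import unittest

from src.sorting import bubble_sort  # assumed module path; adjust to the repo layout

class TestSorting(unittest.TestCase):
    def test_sorts_random_input(self):
        self.assertEqual(bubble_sort([5, 3, 8, 2]), [2, 3, 5, 8])

    def test_handles_empty_list(self):
        self.assertEqual(bubble_sort([]), [])

if __name__ == "__main__":
    unittest.main()
```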

The following results appeared.

[Screenshot: test run output]

Then, to measure the test coverage, the command coverage run -m unittest discover was used, followed by coverage report. These were the results.

[Screenshot: coverage report]

This shows that the tests properly cover the sorting methods and the data generator, giving complete results for both the execution and the testing of the algorithms.

