Analyzing searching algorithms using Big-O Notation.
This mini project selects and analyzes five searching algorithms using graphs generated by Python libraries and test coverage reports. The goal is to identify which searching algorithm performs best and to understand why.
We will also look at the different coding components involved, such as the algorithms themselves, constants, data generators, and execution time measurement.
You will find a src folder that contains the main modules, including the chosen searching algorithms, the random data generator, and more.
In order to compare all the searching techniques at the same time, a single graph displays the running times of all the algorithms together.
The src folder contains the following modules:
- run.py: Entry point of the project. It coordinates the execution of the experiments by defining the input sizes, selecting the appropriate data type (sorted or unsorted depending on the algorithm), and calling the performance measurement functions. It also triggers the visualization of the results.
- searching.py: Contains the implementations of the searching algorithms analyzed in this project: Linear Search, Binary Search, Jump Search, Interpolation Search, and Exponential Search. Each function receives an array and a target value, returning the index of the target if found or -1 otherwise.
- data_generation.py: Provides functions to generate input arrays for the experiments. It creates sorted lists of different sizes, as well as target values for best, average, and worst case scenarios. These datasets are used to evaluate algorithm performance under different conditions.
- execution_time.py: Responsible for benchmarking the searching algorithms. It measures the execution time of each algorithm for different input sizes using high-precision timers, returning the recorded execution times for performance comparison (a minimal sketch of this idea appears right after this list).
- plot_results.py: Handles visualization of the experimental results. It converts the collected timing data into graphs that compare the performance of the searching algorithms across different input sizes.
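The exact code of execution_time.py is not reproduced here; as a rough sketch of the idea, a benchmarking helper built on Python's high-precision time.perf_counter timer might look like this (the function name and signature are hypothetical):

```python
import time

def measure_execution_time(search_fn, arr, target, repeats=3):
    """Return the best time (in seconds) over several runs of search_fn.

    Hypothetical helper for illustration; the real execution_time.py
    module may be structured differently.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution timer
        search_fn(arr, target)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best
```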
Also, the docs folder contains the images used in this document.
The tests folder contains the testing modules for the different cases.
Linear Search
Time complexity:
Best case: O(1) (first element)
Average case: O(n)
Worst case: O(n)
Linear search checks each element one by one from the beginning of the list until it finds the target or reaches the end.
It works on both sorted and unsorted data.
If we search for 7 in:
[4, 2, 9, 7, 5]
Steps:
Check 4
Check 2
Check 9
Check 7 -> found
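For reference, a minimal linear search sketch in Python, following the array/target/-1 convention described above (the actual implementation in searching.py may differ in details):

```python
def linear_search(arr, target):
    """Scan elements left to right; return the index of target or -1."""
    for i, value in enumerate(arr):
        if value == target:
            return i
    return -1

print(linear_search([4, 2, 9, 7, 5], 7))  # 3
```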
Binary Search
Time complexity:
Best case: O(1)
Average case: O(log n)
Worst case: O(log n)
Binary search works only on sorted arrays.
It divides the array in half each time:
If the middle value is the target, return it.
If the target is smaller, search the left half.
If larger, search the right half.
Search for 8 in:
[1, 3, 5, 7, 8, 10, 12]
Steps:
Middle is 7
8 is greater than 7, search right half
Middle becomes 10
8 is less than 10, search left
Find 8
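A minimal iterative sketch of the idea (again, the project's own code may differ):

```python
def binary_search(arr, target):
    """Binary search on a sorted array; returns the index of target or -1."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            low = mid + 1   # target is in the right half
        else:
            high = mid - 1  # target is in the left half
    return -1

print(binary_search([1, 3, 5, 7, 8, 10, 12], 8))  # 4
```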
Jump Search
Time complexity:
Best case: O(1)
Average case: O(sqrt(n))
Worst case: O(sqrt(n))
Jump search also requires a sorted array.
Instead of checking every element, it jumps ahead by fixed steps (usually square root of n), then performs a linear search within the correct block.
Search for 16 in:
[1, 4, 7, 10, 13, 16, 19, 22, 25]
If n = 9, step size is 3.
Jump from index 0 to 3 to 6.
16 is between index 3 and 6.
Do linear search inside that block.
Find 16.
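A minimal sketch of jump search using a step of sqrt(n), as described above (illustrative only):

```python
import math

def jump_search(arr, target):
    """Jump ahead in sqrt(n)-sized blocks, then scan the block linearly."""
    n = len(arr)
    step = int(math.sqrt(n)) or 1
    prev = 0
    # Jump forward until the current block's last element is >= target.
    while prev < n and arr[min(prev + step, n) - 1] < target:
        prev += step
    # Linear search inside the identified block.
    for i in range(prev, min(prev + step, n)):
        if arr[i] == target:
            return i
    return -1

print(jump_search([1, 4, 7, 10, 13, 16, 19, 22, 25], 16))  # 5
```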
Interpolation Search
Time complexity:
Best case: O(1)
Average case: O(log log n) for uniform data
Worst case: O(n)
Interpolation search is an improved version of binary search for uniformly distributed sorted data.
Instead of always checking the middle, it estimates the probable position of the target from its value relative to the values at the low and high ends of the current search range.
It works best when values are evenly distributed.
Search for 40 in:
[10, 20, 30, 40, 50, 60, 70]
Since 40 is proportionally near the middle, it calculates a position close to the real index instead of blindly splitting in half.
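A minimal sketch of the idea, using the standard interpolation estimate pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low]) (the project's implementation may differ):

```python
def interpolation_search(arr, target):
    """Estimate the target's position assuming evenly spaced values."""
    low, high = 0, len(arr) - 1
    while low <= high and arr[low] <= target <= arr[high]:
        if arr[low] == arr[high]:            # avoid division by zero
            return low if arr[low] == target else -1
        # Linear interpolation: guess where target sits between low and high.
        pos = low + (target - arr[low]) * (high - low) // (arr[high] - arr[low])
        if arr[pos] == target:
            return pos
        if arr[pos] < target:
            low = pos + 1
        else:
            high = pos - 1
    return -1

print(interpolation_search([10, 20, 30, 40, 50, 60, 70], 40))  # 3
```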
Exponential Search
Time complexity:
Best case: O(1)
Average case: O(log n)
Worst case: O(log n)
Exponential search works on sorted arrays.
It first finds a range where the element may exist by doubling the index (1, 2, 4, 8, 16, ...).
Once it finds a range containing the target, it applies binary search within that range.
Search for 32 in:
[2, 4, 8, 16, 32, 64, 128]
Steps:
Check index 1 (value 4, smaller than 32)
Check index 2 (value 8, still smaller)
Check index 4 (value 16, still smaller)
Doubling again would go past the end of the array, so the target must lie between index 4 and the last index
Apply binary search in that range
Find 32 at index 4
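A minimal sketch combining the doubling phase with a binary search over the bounded range (illustrative only):

```python
def exponential_search(arr, target):
    """Double an index to bound the target, then binary search that range."""
    n = len(arr)
    if n == 0:
        return -1
    if arr[0] == target:
        return 0
    i = 1
    while i < n and arr[i] < target:
        i *= 2  # grow the range: 1, 2, 4, 8, ...
    low, high = i // 2, min(i, n - 1)
    # Standard binary search inside [low, high].
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

print(exponential_search([2, 4, 8, 16, 32, 64, 128], 32))  # 4
```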
Worst case scenarios:
Linear Search: It happens when the element is not in the list, or when it is located in the last position.
Complexity: O(n)
Binary Search: It happens when the element is not present, or when it is found only after dividing the original array many times.
Complexity: O(log n)
Jump Search: It happens when the element is not present, or when it is located in the last block. The search performs jumps of size sqrt(n).
Complexity: O(sqrt(n))
Interpolation Search: It occurs when the target is not in the array, or when the data is not uniformly distributed.
If the data is not uniformly distributed, the search will make poor guesses and may degrade into a linear search, making this method inefficient.
Complexity: O(n)
Exponential Search: It happens when the target is not found, or when the target is located far from the beginning of the array.
This method establishes a range using exponential blocks and then performs binary search over it, so a target located far from the beginning makes the method slower.
Complexity: O(log n)
The following graph, generated by ChatGPT, is based on the complexities established previously, showing how the different methods behave in their worst case scenarios.
After adding the different modules to the src folder, the program was run from cmd with the following command:
py -m src.run
The following graphs appeared.
Sorted data
Since several searching algorithms require the input array to be sorted (Binary Search, Jump Search, Interpolation Search, and Exponential Search), the experiments were conducted using sorted arrays. This ensures that all algorithms operate under valid conditions and can be fairly compared.
After running the experiment, the resulting graph compares the execution time of the five searching algorithms as the input size increases from 100,000,000 to 300,000,000 elements.
The results show clear differences in the performance of the evaluated search algorithms across this range.
The Linear Search algorithm exhibits the highest execution times among all the tested algorithms. This behavior occurs because Linear Search examines the elements sequentially until it finds the target value. As a result, the algorithm may need to inspect a large portion of the dataset before locating the desired element, which corresponds to its theoretical time complexity of O(n). In this experiment, the execution time varies depending on the position of the target within the list. For example, when the target is located closer to the beginning of the array, fewer comparisons are required and the execution time decreases, while targets located further into the dataset require more comparisons and therefore more time.
The other algorithms — Binary Search, Jump Search, Interpolation Search, and Exponential Search — perform significantly faster because they exploit the fact that the dataset is sorted. Instead of examining every element sequentially, these algorithms progressively reduce the search space, allowing them to locate the target with far fewer comparisons.
Binary Search and Exponential Search maintain extremely low execution times because both rely on logarithmic growth of comparisons, corresponding to a time complexity of O(log n). Even when the dataset increases from 100 million to 300 million elements, the number of required comparisons grows very slowly, which explains why their execution times remain almost constant in the graph.
Jump Search performs slightly slower than Binary Search because its complexity is O(sqrt(n)). Although it still reduces the number of comparisons significantly compared to Linear Search, the square root growth means that the number of steps increases faster than logarithmic algorithms as the dataset becomes larger.
Interpolation Search shows very efficient performance in this experiment because the dataset is uniformly distributed. Since the array values increase in a predictable and evenly spaced manner, the algorithm can estimate the likely position of the target very accurately, allowing it to locate the element with very few iterations.
Overall, the experiment confirms the theoretical expectations of these algorithms. Algorithms with logarithmic or sublinear complexity scale much more efficiently as the dataset grows, while algorithms with linear complexity become significantly slower when processing very large datasets.
Also, in the cmd the following output appeared:
In order to generate professional and effective test coverage, an environment was created and configured following the professor's instructions.
A file named requirements.txt containing the names of the dependencies was processed with the command pip install -r requirements.txt, which installed the tools needed for testing.
The tests folder contains 2 tests generated by ChatGPT, and these were run with the command py -m unittest discover.
The following results appeared.
Then, to measure the test coverage, the command used was coverage run -m unittest discover, followed by coverage report. These were the results.
This means that the tests properly cover the code for the searching methods and the data generator, showing complete results for both the execution and testing of the algorithms.
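The generated tests themselves are not reproduced here; as a rough sketch, a unittest module for the search functions might look like the following (the import path src.searching and the exact assertions are assumptions, based on the module layout described above):

```python
import unittest

# Assumed import path; adjust if the project structure differs.
from src.searching import linear_search, binary_search

class TestSearching(unittest.TestCase):
    def test_found(self):
        # 8 sits at index 4 of the sorted array.
        self.assertEqual(binary_search([1, 3, 5, 7, 8, 10, 12], 8), 4)

    def test_not_found(self):
        # A missing target should return -1.
        self.assertEqual(linear_search([4, 2, 9, 7, 5], 42), -1)

if __name__ == "__main__":
    unittest.main()
```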




