Comparing GLib data structures
GLib is a C library developed as part of the GNOME project that provides a wide range of data structure implementations, utility functions, and portability abstractions. Among its generic containers are dynamic arrays (e.g., GArray, GPtrArray), byte arrays (GByteArray), singly linked lists (GSList), doubly linked lists (GList), double-ended queues (GQueue), hash tables (GHashTable), balanced binary trees (GTree), and others. This variety exists because no single container is optimal for all purposes: some favor fast insertion or deletion (especially at specific positions), while others provide efficient random access, better memory locality, sorted traversal, or minimal overhead.
By offering multiple data structures, GLib allows programmers to select the container best suited to their performance, memory, and semantic requirements. This article explores how to use Benchgen to systematically evaluate the dynamic behavior of these data structures.
There are three essential dimensions to vary when evaluating a data structure: the number of insertions, the number of removals, and the number of contains queries. We control these with the following grammar template:
B = LOOP(CALL(B) C);
C = B <ops> B;
Here, <ops> is a placeholder for an arbitrary sequence of operations. By filling it with different sequences we control the relative proportions of insertions, removals, and contains queries. Examples:
- insert contains contains insert contains remove: a contains-heavy program
- insert insert insert: a program that only performs insertions
- insert contains remove: a program that performs each operation the same number of times
For each container we iterate over these parameters (the numbers of insertions, removals, and contains queries), generating one program for each combination.
To avoid programs where the container remains mostly empty, we add an initialization gadget that quickly populates the container by exploiting L-system exponential growth:
A0 = A1 A1 A1 A1 A1 A1 A1 A1;
A1 = A2 A2 A2 A2 A2 A2 A2 A2;
A2 = A3 A3 A3 A3 A3 A3 A3 A3;
A3 = insert insert;
This produces 8 × 8 × 8 × 2 = 1024 insert operations, quickly populating the container before the measured operations begin.
Finally, we provide a seed string to create a container instance, start the initialization gadget, and use the grammar template:
new A0 B
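Putting the pieces together, a complete grammar for a balanced workload would look something like this (the exact syntax is assumed to match the snippets shown above, with insert contains remove substituted for <ops>):

```
new A0 B

A0 = A1 A1 A1 A1 A1 A1 A1 A1;
A1 = A2 A2 A2 A2 A2 A2 A2 A2;
A2 = A3 A3 A3 A3 A3 A3 A3 A3;
A3 = insert insert;

B = LOOP(CALL(B) C);
C = B insert contains remove B;
```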
Here is an example program generated with this methodology.
The generated programs print a string for every operation performed; this allows us to count operations. Printing can be disabled via a compilation flag so we can also measure runtime without I/O overhead. In short, every program is compiled twice: once with prints enabled to count how many times each operation occurred, and once with prints disabled to measure execution time without that overhead.
We use hyperfine to benchmark the generated binaries.
We can assess the impact of each operation type individually. Since the other two dimensions also vary, each plotted data point represents the average execution time of all programs that share the same count of the operation in question.

One curious observation about this first graph is that the average execution time decreases as the number of removals increases. At first glance this seems wrong, but the answer comes from how the data is grouped: programs with many removals tend to leave the container mostly empty, so subsequent operations are faster.
As the number of operations grows, hash tables and balanced trees clearly outperform lists, arrays, and queues. Among the latter containers, GList outperforms GQueue (which is expected, since GQueue is implemented on top of GList). At this scale, GTree is consistently faster than GHashTable. However, when we repeat the experiment using only these two containers and increase the number of operations, GHashTable eventually overtakes GTree:

This behavior follows from the data structures' different properties. Balanced search trees guarantee O(log n) lookups, insertions, and removals, while hash tables offer O(1) average-case operations but pay a higher constant cost per operation (hashing and collision handling), which makes GTree faster in practice at small sizes. As the workload grows, the logarithmic factor becomes dominant and GHashTable's average-constant behavior pays off, allowing it to surpass GTree.