This project has been created as part of the 42 curriculum by lwicket.
- My code is standard-complaint.
- Did you mean compliant?
- No
@vzverovich on Twitter, February 10, 2022
Warning
This project is still a work in progress. Its contents, along with the information in this README file, are subject to change.
The Libft (where ft stands for Forty-Two) is the very first project of the 42 School. It consists of reimplementing various functions from the C Standard Library, along with a few custom ones.
For the first projects of the 42 curriculum, students are not allowed to use any function from the standard library: everything must be reimplemented on top of system calls (write, read, etc.), with the sole exceptions of malloc and free.
Beyond the educational goal of learning how things work under the hood, reimplementing their own library provides students with a set of functions they can reuse in future projects.
I regarded my Libft as an opportunity to dive into the advanced aspects of C, approaching the project with a focus on strict standard compliance and high performance.
In addition to what's normally expected for this project, my Libft features:
- Strict adherence to the ISO C Standard.
- High-performance memory functions using "word-at-a-time" operations.
- Branchless
ctypefunctions (pure flex). - Cross-compilation support for Windows.
- Robust Makefile with proper
.hdependency tracking.
The following functions from the standard C library must be reimplemented:
- atoi
- bzero
- calloc
- isalnum
- isalpha
- isascii
- isdigit
- isprint
- memchr
- memcmp
- memcpy
- memmove
- memset
- strchr
- strdup
- strlcat
- strlcpy
- strlen
- strncmp
- strnstr
- strrchr
- tolower
- toupper
The following functions are not part of the standard C library. Their implementation specifications are defined in the project assignment.
| Function | Prototype | Allowed | Description |
|---|---|---|---|
ft_itoa |
char *ft_itoa(int n) |
malloc |
Allocates and returns a string representing the integer n. |
ft_putchar_fd |
void ft_putchar_fd(char c, int fd) |
write |
Outputs the character c to the given file descriptor. |
ft_putstr_fd |
void ft_putstr_fd(char *s, int fd) |
write |
Outputs the string s to the given file descriptor. |
ft_putendl_fd |
void ft_putendl_fd(char *s, int fd) |
write |
Outputs the string s to the given file descriptor, followed by a newline. |
ft_putnbr_fd |
void ft_putnbr_fd(int n, int fd) |
write |
Outputs the integer n to the given file descriptor. |
ft_substr |
char *ft_substr(const char *s, unsigned int start, size_t len) |
malloc |
Allocates and returns a substring from s. The substring begins at index start and is of maximum size len. |
ft_strjoin |
char *ft_strjoin(const char *s1, const char *s2) |
malloc |
Allocates and returns a new string resulting from the concatenation of s1 and s2. |
ft_strtrim |
char *ft_strtrim(const char *s1, const char *set) |
malloc |
Allocates and returns a copy of s1 with the characters specified in set removed from the beginning and the end. |
ft_split |
char **ft_split(const char *s, char c) |
malloc |
Allocates and returns an array of strings obtained by splitting s using the character c as a delimiter. The array is NULL-terminated. |
ft_striteri |
void ft_striteri(char *s, void (*f)(unsigned int, char *)) |
None | Applies the function f to each character of the string s, passing its index as the first argument. |
ft_strmapi |
char *ft_strmapi(const char *s, char (*f)(unsigned int, char)) |
malloc |
Allocates a new string where each character is the result of applying the function f to the corresponding character of s. The function f is passed the character's index as its first argument. |
ft_lstnew |
t_list *ft_lstnew(void *content) |
malloc |
Creates a new node with the given content; next is set to NULL. |
ft_lstadd_front |
void ft_lstadd_front(t_list **lst, t_list *new) |
None | Adds new node at the beginning of the list. |
ft_lstadd_back |
void ft_lstadd_back(t_list **lst, t_list *new) |
None | Adds new node at the end of the list. |
ft_lstsize |
int ft_lstsize(t_list *lst) |
None | Counts the number of nodes in the list. |
ft_lstlast |
t_list *ft_lstlast(t_list *lst) |
None | Returns the last node of the list. |
ft_lstdelone |
void ft_lstdelone(t_list *lst, void (*del)(void *)) |
free |
Frees a node's content using del, then frees the node itself. |
ft_lstclear |
void ft_lstclear(t_list **lst, void (*del)(void *)) |
free |
Deletes a list and all successors. |
ft_lstiter |
void ft_lstiter(t_list *lst, void (*f)(void *)) |
None | Applies f to the content of every node in the list. |
ft_lstmap |
t_list *ft_lstmap(t_list *lst, void *(*f)(void *), void (*del)(void *)) |
free, malloc |
Creates a new list by applying f to each node. Uses del if allocation fails. |
As per the assignment, the code is compiled using cc. It is not specified which compiler cc refers to, but it is safe to assume that it will be either GCC or Clang.
Tip
At the time I wrote this, on the Brussels campus, whether cc points to GCC or Clang actually varies from one machine to another.
My project relies on features from C99 (compound literals, inline functions, single-line comments and stdbool.h).
To compile the project, simply go to the libft/ directory and run make:
cd libft/
makeThis will produce a libft.a archive containing all the compiled .o object files, which you can then reuse in other projects.
My library supports cross-compilation for Windows via MinGW. First, ensure you have the necessary packages installed:
sudo apt install build-essential binutils-mingw-w64 gcc-mingw-w64To compile the library, run the following command:
make windowsThis will produce a libft.lib file, which you can then link into your projects targeting Windows.
cc <your src/obj files> libft.a -I<path to the directory containing libft.h>Warning
libft.a must come AFTER everything that uses it in the list of files you provide to the compiler. This is because the linker discards all the functions in your archive that aren't currently needed, and its list of necessary functions is based exclusively on the objects it has parsed so far. If libft.a is the very first file provided in your link command, all of its contents will be discarded!
Some 42 projects allow you to reuse your Libft. When that's the case, you aren't allowed to simply turn in the pre-compiled libft.a archive; you have to build it directly within your project's build process.
To do so, just copy/paste the libft/ directory at the root of your project, along with its Makefile, header, and source files.
Then, update the main Makefile of your project to add the following variables:
LIBFT_DIR := libft
LIBFT := $(LIBFT_DIR)/libft.aAppend the following options to your compiler flags:
LDFLAGS += -L$(LIBFT_DIR)
LDLIBS += -lft
CPPFLAGS += -I$(LIBFT_DIR)Warning
Make sure you use $(CPPFLAGS) in your implicit rules to compile objects from sources, otherwise the C preprocessor won't be able to find libft.h.
You can now create a rule to build the Libft and add it as a dependency to your main linking rule:
$(LIBFT): $(LIBFT_DIR)/Makefile
@$(MAKE) -C $(LIBFT_DIR)
$(NAME): $(LIBFT) $(OBJ)
$(CC) $(LDFLAGS) $(OBJ) $(LDLIBS) -o $@(Notice how $(OBJ) comes before $(LDLIBS) to respect the linking order mentioned earlier).
Don't forget to update your clean/fclean rules too:
clean:
$(RM) -r -- $(BUILD_DIR)
@$(MAKE) -C $(LIBFT_DIR) clean
fclean: clean
$(RM) -- $(NAME)
@$(MAKE) -C $(LIBFT_DIR) fcleanMy coding style conforms to the Norm, a set of strict styling guidelines enforced for all C exercises, which all students must follow under penalty of a zero grade.
Beyond purely cosmetic considerations, the Norm also forbids the following language constructs:
forloopsdo β¦ whileloopsswitchstatementsgoto- the ternary operator
- declarations with initialization
- assignments inside controlling expressions
(non-exhaustive list)
Tip
You can find the latest version of the Norm in the Norminette repository β the program used to check conformity with the Norm. The version of the Norm included in this repository is the one that was in use when I completed this project.
I implemented the memory functions to handle one word at a time, rather than byte by byte (char by char). I had to overcome two main obstacles:
- Strict Type Aliasing: Character types (like
unsigned char) are the only types allowed to alias anything in Standard C. There is no standard C type the size of a word that wouldn't violate strict aliasing rules. - Unaligned Memory Access: In order to minimize branching, my code unconditionally jumps to the next aligned memory boundary, then starts accessing memory word by word. However, the bytes located before that aligned address need to be processed too, as do the leftover bytes at the end of the buffer when the size is not a multiple of the alignment. To handle those, I perform an unaligned access, which is not natively supported on some older architectures.
Both problems were solved using GNU type attributes:
__may_alias__: Tells the compiler that this type may alias another, ensuring it won't make assumptions based on strict aliasing rules or accidentally optimize out important instructions.aligned (n): Tells the compiler that this type is aligned to n bytes. Setting an alignment requirement of 1 doesn't change the size of my word-sized type, but it explicitly permits unaligned access. This allows architectures that support unaligned writes to do them in one go, while safely breaking them down into separate byte instructions under the hood on architectures that don't.
I didn't use the C23 alignas specifier or standard attribute syntax because standard alignment specifiers cannot be applied to typedefs (only to variables and struct/union members), and standard attribute support on typedefs remains inconsistent across compilers.
If the GNU extensions aren't available, the memory functions fall back to using an unsigned char, which naturally avoids both of the aforementioned problems. All my memory functions were written to adapt themselves to whatever type is defined, meaning the only part of the code that needs to be conditionally compiled behind an #ifdef is the typedef itself.
If you want to be completely pedantic, memmove is theoretically impossible to implement in a 100% standard-compliant way.
This is because determining if two buffers overlap typically requires comparing their addresses like this:
if (dest < src + n && src < dest + n)
{
// Buffers overlap
}However, the C Standard strictly limits when you can relationally compare pointers:
When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. [...] In all other cases, the behavior is undefined.
This means that any < or > comparison between two pointers pointing to completely different objects is technically Undefined Behavior.
This "undefinedness" can be mitigated by first casting the pointers to a uintptr_t, since the comparison between two integers is always well-defined. This is the approach my implementation uses.
However, even this isn't universally compliant because it assumes a linear (flat) memory model. On older segmented memory architectures, you might get two completely different integer values that actually point to the exact same physical memory address. In such cases, an integer comparison would fail to catch the overlap.
That said, the segmented memory model is effectively dead in modern computing. On today's flat memory models, it is perfectly safe to assume that comparing two addresses (via uintptr_t) will reliably produce the expected behavior.
I already learned the hard way that -O3 could totally jeopardize my projects. That said, it was an interesting learning experience, and so I decided to use that flag again π±
This had the following consequences:
- I always have to do something with the return value of
write. This is because, on Ubuntu,gccis customized to define_FORTIFY_SOURCE=3when compiling with any level of optimization greater than 0. As a side effect, this requires you to use the return value ofwrite; otherwise, it emits a warning, which turns into a fatal error thanks to the mandatory-Werrorflag. Simply casting to(void)isn't enough to silence the warning, so I used the return value in its intended way: to detect errors, and retry if possible. - I had to compile with
-fno-builtin, because the compiler was sometimes smart enough to detect that I was basically doing amemcpy. It would then replace my code with a call to the actual, builtinmemcpyfunction. This would, of course, have been detected as the use of a forbidden function, which is punished by a-42grade. I used the commandnm -u <my_file.o>to ensure all my objects only contained the allowed functions.
My implementation does nothing to detect or handle out-of-bounds overflow. I based this design choice on the following excerpt from the C Standard:
The functions atof, atoi, atol, and atoll need not affect the value of the integer expression errno on an error. If the value of the result cannot be represented, the behavior is undefined.
However, my implementation ensures that all valid, representable values within the range of an int are parsed correctly without triggering internal overflows. This requires special care due to how string-to-integer conversion is typically implemented.
Usually, atoi calculates the result by reading digits into a positive accumulator (multiplying by 10 and adding the new digit), and only converts the final result to a negative number at the very end if a minus sign was detected. In C, signed integer overflow is strict Undefined Behavior (UB).
This standard accumulation strategy creates a subtle problem when parsing INT_MIN. In two's complement arithmetic, the absolute value of INT_MIN is exactly one greater than INT_MAX. Therefore, if the accumulator is simply declared as an int, parsing the string representation of INT_MIN will cause a signed integer overflow on the final digit, triggering UB before the negation step is ever reached.
On many common architectures, this bug goes completely unnoticed. The compiler's underlying behavior for signed overflow often defaults to two's complement wraparound, meaning the positive accumulator wraps completely around to the negative INT_MIN value. Negating INT_MIN typically yields INT_MIN again, making the function output the correct answer by sheer luck. However, relying on this is unsafe and violates the C standard; compiling the code with -fsanitize=undefined then running it flags this as an error.
To solve this, one might be tempted to use a larger signed type, such as long, for the accumulator. However, this is not a portable solution. The C Standard does not guarantee that long (or even a long long for that matter) is larger than int; for example, under the LLP64 data model used by 64-bit Windows, both int and long are exactly 32 bits.
Therefore, the only strictly portable way to safely accumulate digits and support INT_MIN is to use an unsigned int for the accumulator. Unsigned arithmetic is guaranteed by the Standard never to overflow (it operates using modulo arithmetic), and an unsigned int is perfectly capable of holding the absolute value of INT_MIN before the final sign logic is applied.
This is the official document that defines the behavior required of conforming C implementations. I used it as the authoritative reference for the behavior of every libc function.
You can purchase the Standard for the modest sum of 227 CHF on iso.org. Alternatively, you can download a free draft version that is 99% identical to the final version: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf
musl was my main reference when implementing the libc functions. It's a treasure trove of clever C code, full of branchless techniques and SWAR tricks, while always keeping portability and safety in mind.
The definitive reference for bitwise tricks, a must-read for implementing SWAR strategies.
https://graphics.stanford.edu/~seander/bithacks.html
https://gcc.gnu.org/onlinedocs/gcc-3.3/gcc/Type-Attributes.html
- N3519 Understanding Effective Type Aliasing in C by Eskil Steenberg
- The joys and perils of C and C++ aliasing by Martin Sebor
In a famous TED interview, Linus Torvalds demonstrates an approach to implementing linked list functions that he considers "good taste" in coding. This was particularly helpful when implementing functions that modify the head pointer. The following repository was a great help in understanding the concept: https://github.com/mkirchner/linked-list-good-taste
This entire project has been handcrafted by a human (yours truly), but proofread by AI. Many thanks to the free version of ChatGPT for fixing my grammar and providing feedback on my cursed code!