Foundry is a deep learning framework built entirely with NumPy and Python. The project is an exercise in first-principles thinking, looking beneath the high-level abstractions common to modern libraries. By implementing the essential elements of a framework such as PyTorch or TensorFlow, it serves as a clear, readable, and thoroughly documented artefact that demonstrates the fundamental ideas behind automatic differentiation, neural network architecture, and training pipelines.
Modern deep learning frameworks are marvels of software engineering, but their complexity can often obscure the fundamental principles they are built upon. The motivation for Foundry was to deconstruct this complexity and build a working mental model of how these systems operate internally.
This project draws inspiration from the design philosophies of two of the most influential frameworks:
- PyTorch: Foundry's core design heavily emulates PyTorch's dynamic computational graphs and user-centric API. The "define-by-run" approach, where the graph is built on the fly as operations are executed, offers immense flexibility and is more intuitive for debugging and building complex, dynamic architectures. The autograd engine in Foundry is a direct implementation of this philosophy.
- TensorFlow & Keras: The high-level model API in Foundry is inspired by Keras. By providing simple, chainable methods like `.fit()`, `.evaluate()`, and `.compile()`, the framework abstracts away the boilerplate of the training loop, allowing the user to focus on model architecture. This demonstrates the power of layered API design, where low-level control is available but not required for simple tasks.
Ultimately, Foundry is an educational tool. It is an attempt to answer questions like: What is a Tensor beyond a multi-dimensional array? How does .backward() actually work? How is the chain rule implemented in code? The result is a library that is simple enough to be understood in its entirety, yet functional enough to build and train real neural networks.
- Dynamic Computational Graph: A tape-based automatic differentiation engine that dynamically builds a graph of operations. This emulates PyTorch's define-by-run approach, allowing for more flexible model design and easier debugging.
- Rich Tensor Operations: A comprehensive `Tensor` class that wraps `numpy.ndarray` and overloads standard operators (`+`, `*`, `@`, etc.) to automatically track the history of operations for gradient computation.
- Modular Neural Network Primitives: A dedicated `nn` package containing the essential building blocks for neural networks, promoting a clean, object-oriented design.
  - Layers: `Dense`, `BatchNorm1D`, `Dropout`. Each layer encapsulates its parameters and forward-pass logic.
  - Activations: `ReLU`, `Sigmoid`, `Tanh`, `Softmax`, implemented as distinct, reusable operations.
  - Losses: `MSELoss`, `CrossEntropyLoss`, crucial for quantifying model error during training.
- High-Level Model API: A Keras-inspired `Model` and `Sequential` API that abstracts the training loop into intuitive `.fit()`, `.evaluate()`, and `.predict()` methods.
- Modern Optimizers: A suite of standard optimization algorithms, including `SGD` (with momentum), `Adam`, and `RMSprop`. The design decouples the optimization logic from the model parameters, making it easy to swap strategies.
- Gradient Verification Utilities: A `gradient_checking` tool to numerically verify the correctness of the analytical gradients computed by the backward pass, ensuring the autograd engine is bug-free (see the sketch after this list).
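To illustrate the idea behind gradient verification, the sketch below compares an analytical gradient against a central finite-difference estimate using plain NumPy. It is a generic, framework-agnostic illustration, not the actual `gradient_checking` API:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx for a scalar-valued f."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

# Example: f(x) = sum(x ** 2) has the analytical gradient 2 * x.
x = np.random.randn(3, 4)
analytical = 2 * x
numerical = numerical_gradient(lambda v: np.sum(v ** 2), x)
print(np.allclose(analytical, numerical, atol=1e-4))  # True
```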
Foundry is designed with a modular, layered architecture that promotes a clear separation of concerns, a principle borrowed from production-grade frameworks. This structure makes the codebase easier to navigate, maintain, and extend.
This is the foundational layer responsible for all numerical computation and gradient tracking. It is the framework's central nervous system.
- `tensor.py`: Defines the `Tensor` object, the central data abstraction of the framework. It is an advanced wrapper around a NumPy array that contains not only the data but also the necessary metadata for the autograd engine:
  - `data`: The underlying `numpy.ndarray` that holds the numerical values.
  - `requires_grad`: A boolean flag that acts as a signal to the autograd engine. If `True`, the engine will track operations involving this tensor to compute gradients for it later.
  - `grad`: After `.backward()` is called, this attribute accumulates the gradient of the final scalar value with respect to this tensor.
  - `grad_fn`: A reference to the `Function` object that created this tensor. This is the key to building the computational graph, as it provides a backward link from a tensor to its parent operation and, through that, to its input tensors. Tensors created by the user (leaf nodes) have `grad_fn=None`.
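  As a rough illustration of how these attributes fit together, here is a minimal sketch (the attribute names follow the description above; everything else is an assumption, not the exact contents of `tensor.py`):

  ```python
  import numpy as np

  class Tensor:
      """Minimal sketch: a NumPy array plus the autograd metadata."""

      def __init__(self, data, requires_grad=False, grad_fn=None):
          self.data = np.asarray(data, dtype=np.float32)  # the raw numerical values
          self.requires_grad = requires_grad              # signal to the autograd engine
          self.grad = None                                # populated by .backward()
          self.grad_fn = grad_fn                          # Function that created this tensor
                                                          # (None for user-created leaf nodes)

  # A leaf tensor created directly by the user:
  w = Tensor(np.random.randn(3, 2), requires_grad=True)
  print(w.grad_fn)  # None -> w is a leaf node of any graph built from it
  ```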
- `autograd.py`: Implements the automatic differentiation engine. This is the heart of the framework's "learning" capability.
  - `Function`: An abstract base class for every differentiable operation (e.g., `Add`, `Mul`, `MatMul`, `ReLU`). Each operation class inherits from `Function` and implements two critical methods:
    - `forward()`: Takes input tensors and computes the output tensor.
    - `backward()`: Takes the gradient from the next layer in the graph and computes the local gradients with respect to its own inputs, applying the chain rule.
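  A minimal sketch of one such operation, reusing the `Tensor.data` attribute described above (the `Function` skeleton and method signatures here are illustrative assumptions):

  ```python
  class Function:
      """Sketch of the abstract base class: one differentiable operation."""

      def forward(self, *inputs):
          raise NotImplementedError

      def backward(self, grad_output):
          raise NotImplementedError


  class Mul(Function):
      """Element-wise multiply, z = a * b, as a concrete example."""

      def forward(self, a, b):
          # Keep references to the input tensors; backward() needs them.
          self.a, self.b = a, b
          return a.data * b.data

      def backward(self, grad_output):
          # Chain rule: dL/da = dL/dz * b  and  dL/db = dL/dz * a.
          return grad_output * self.b.data, grad_output * self.a.data
  ```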
- The Backward Pass: The learning process is enabled by reverse-mode automatic differentiation. When `.backward()` is called on a final, scalar tensor (like the output of a loss function), the following happens (a code sketch of the traversal follows this list):
  - A topological sort is performed on the computational graph, starting from the final tensor and tracing back through the `grad_fn` attributes of all ancestor tensors. This creates an ordered list of operations to execute.
  - The engine traverses this graph in reverse order.
  - At each node (a `Function` object), it calls the `backward()` method, passing in the gradient computed from the previous step. This method then calculates the gradients for its inputs and passes them down the graph.
  - This process continues until all leaf nodes with `requires_grad=True` have had their `grad` attributes populated.
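A compact sketch of this procedure, assuming each `Function` stores an `inputs` list of the tensors it consumed (that attribute name is an assumption; the real engine may organise the bookkeeping differently):

```python
import numpy as np

def backward(output):
    """Sketch of reverse-mode autodiff starting from a scalar output tensor."""
    # 1. Topological sort: trace grad_fn links back from the output.
    order, seen = [], set()

    def visit(t):
        if id(t) in seen:
            return
        seen.add(id(t))
        if t.grad_fn is not None:
            for parent in t.grad_fn.inputs:
                visit(parent)
        order.append(t)  # parents are appended before their children

    visit(output)

    # 2. Seed d(output)/d(output) = 1 and walk the graph in reverse.
    grads = {id(output): np.ones_like(output.data)}
    for t in reversed(order):
        grad_t = grads.get(id(t))
        if t.grad_fn is None:
            if t.requires_grad:
                t.grad = grad_t          # 4. leaf nodes receive their gradients
            continue
        # 3. Ask the Function for its local input gradients (chain rule)...
        input_grads = t.grad_fn.backward(grad_t)
        # ...and accumulate them onto the parents for later steps.
        for parent, g in zip(t.grad_fn.inputs, input_grads):
            grads[id(parent)] = grads.get(id(parent), 0) + g
```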
This module provides the abstractions and building blocks for creating and training neural networks. It serves as the primary user-facing API for model construction.
- `layers.py`: Contains the network layers. The `Layer` class is an abstract base class that defines a common interface, encapsulating both state (learnable parameters like weights and biases) and computation (the `forward` pass). This object-oriented design makes layers reusable and chainable (a sketch of a minimal layer follows this list).
- `models.py`: Implements a high-level API for model building. The `Model` class provides a Keras-style interface that abstracts away the training loop. This is an important design choice that separates the model's architecture from the logic of how it is trained, making the code cleaner and more user-friendly. The `Sequential` model is a simple container for a linear stack of layers.
- `losses.py` & `activations.py`: These modules provide standard loss functions and activation functions. By implementing them as distinct classes or functions, they can be easily swapped out during model definition or training.
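To make the layer contract concrete, here is a rough sketch of a fully connected layer, reusing the `Tensor` abstraction from the core layer; the names and initialisation scheme are illustrative assumptions rather than the exact code in `layers.py`:

```python
import numpy as np

class Dense:
    """Illustrative fully connected layer: y = x @ W + b."""

    def __init__(self, in_features, out_features):
        # Learnable state: small random weights and zero biases.
        self.W = Tensor(np.random.randn(in_features, out_features) * 0.01,
                        requires_grad=True)
        self.b = Tensor(np.zeros(out_features), requires_grad=True)

    def forward(self, x):
        # Built from overloaded Tensor operations, so the autograd engine
        # records the matmul and addition as they are executed.
        return x @ self.W + self.b

    def parameters(self):
        # Exposes the learnable tensors so an optimizer can update them.
        return [self.W, self.b]
```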
This module contains the optimization algorithms responsible for updating the model's parameters based on the computed gradients.
- `optimizers.py`: The `Optimizer` base class defines the core interface. It is initialized with the list of model parameters it needs to track. Its primary methods are:
  - `.zero_grad()`: Resets the gradients of all parameters before a new backpropagation pass. This is necessary because gradients are accumulated by default.
  - `.step()`: Updates the parameters using their `.grad` attribute according to the specific optimization algorithm's logic (e.g., the update rule for Adam or SGD).

  This design decouples the gradient calculation (done by the autograd engine) from the parameter update logic, making the framework highly modular.
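As an illustration of this interface, a bare-bones momentum SGD might look like the sketch below (argument and attribute names are assumptions, not the exact ones in `optimizers.py`):

```python
import numpy as np

class SGD:
    """Illustrative optimizer implementing the interface described above."""

    def __init__(self, parameters, lr=0.01, momentum=0.0):
        self.parameters = list(parameters)
        self.lr = lr
        self.momentum = momentum
        self.velocities = [np.zeros_like(p.data) for p in self.parameters]

    def zero_grad(self):
        # Gradients accumulate across backward passes, so clear them first.
        for p in self.parameters:
            p.grad = None

    def step(self):
        # Momentum update: v <- mu * v - lr * grad;  p <- p + v.
        for p, v in zip(self.parameters, self.velocities):
            if p.grad is None:
                continue
            v *= self.momentum
            v -= self.lr * p.grad
            p.data += v
```

A typical training iteration would then call `zero_grad()`, run the forward and backward passes, and finish with `step()`.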
It is recommended to use a virtual environment to manage dependencies.
- Clone the repository:

  ```bash
  git clone https://github.com/your-username/foundry.git
  cd foundry
  ```

- Create and activate the Conda environment:

  ```bash
  conda env create -f environment.yml
  conda activate foundry
  ```

- Install the package in editable mode:

  ```bash
  pip install -e .
  ```
Alternatively, using `venv` and pip:

- Clone the repository and navigate to the project directory.
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install the package and its dependencies:

  ```bash
  pip install -e .[dev]
  ```
Here is a simple example of building and training a Multi-Layer Perceptron (MLP) for a regression task.
```python
import numpy as np
from foundry.nn.models import MLP
from foundry.nn.losses import MSELoss
from foundry.optim.optimizers import Adam

# 1. Create a simple dataset
X = np.random.randn(100, 10).astype(np.float32)
y_true = np.random.randn(100, 1).astype(np.float32)

# 2. Define the model architecture
model = MLP(
    input_size=10,
    hidden_sizes=[64, 32],
    output_size=1,
    activation='relu'
)

# 3. Compile the model with a loss function and an optimizer
loss_fn = MSELoss()
optimizer = Adam(model.parameters(), lr=0.01)
model.compile(loss=loss_fn, optimizer=optimizer)

# 4. Train the model
model.fit(X, y_true, epochs=10, batch_size=32, verbose=1)

# 5. Make predictions
predictions = model.predict(X)
print(f"Sample predictions: {predictions[:5].flatten()}")
```

The project has a comprehensive test suite to ensure correctness, especially for the gradient calculations. To run the tests, use pytest:
```bash
pytest
```

To view the test coverage report:
```bash
pytest --cov=src/foundry
```

While this project was built primarily for learning, there are several areas where it could be extended:
- Convolutional & Recurrent Layers: Implementing `Conv2D`, `MaxPool2D`, and `RNN`/`LSTM` layers to handle image and sequence data.
- GPU Support: Integrating a GPU backend (e.g., via CuPy) to accelerate computations.
- More Optimizers and Regularizers: Adding more advanced optimizers and regularization techniques like L1/L2 weight decay.
- Static Graph & JIT Compilation: Exploring the possibility of a static graph mode with a just-in-time (JIT) compiler for performance optimization.