
Foundry

License: MIT

Foundry is a deep learning framework built entirely in Python and NumPy. The project is an exercise in first-principles thinking: rather than relying on the high-level abstractions common to modern libraries, it implements the essential elements of a framework such as PyTorch or TensorFlow from the ground up. The result is a clear, readable, and thoroughly documented codebase that demonstrates the fundamental ideas behind automatic differentiation, neural network architecture, and training pipelines.

Motivation and Educational Philosophy

Modern deep learning frameworks are marvels of software engineering, but their complexity can often obscure the fundamental principles they are built upon. The motivation for Foundry was to deconstruct this complexity and build a working mental model of how these systems operate internally.

This project draws inspiration from the design philosophies of two of the most influential frameworks:

  • PyTorch: Foundry's core design heavily emulates PyTorch's dynamic computational graphs and user-centric API. The "define-by-run" approach, where the graph is built on the fly as operations are executed, offers immense flexibility and is more intuitive for debugging and building complex, dynamic architectures. The autograd engine in Foundry is a direct implementation of this philosophy.
  • TensorFlow & Keras: The high-level model API in Foundry is inspired by Keras. By providing simple, chainable methods like .fit(), .evaluate(), and .compile(), the framework abstracts away the boilerplate of the training loop, allowing the user to focus on model architecture. This demonstrates the power of layered API design, where low-level control is available but not required for simple tasks.

Ultimately, Foundry is an educational tool. It is an attempt to answer questions like: What is a Tensor beyond a multi-dimensional array? How does .backward() actually work? How is the chain rule implemented in code? The result is a library that is simple enough to be understood in its entirety, yet functional enough to build and train real neural networks.

Features

  • Dynamic Computational Graph: A tape-based automatic differentiation engine that dynamically builds a graph of operations. This emulates PyTorch's define-by-run approach, allowing for more flexible model design and easier debugging.
  • Rich Tensor Operations: A comprehensive Tensor class that wraps numpy.ndarray and overloads standard operators (+, *, @, etc.) to automatically track the history of operations for gradient computation.
  • Modular Neural Network Primitives: A dedicated nn package containing the essential building blocks for neural networks, promoting a clean, object-oriented design.
    • Layers: Dense, BatchNorm1D, Dropout. Each layer encapsulates its parameters and forward-pass logic.
    • Activations: ReLU, Sigmoid, Tanh, Softmax, implemented as distinct, reusable operations.
    • Losses: MSELoss, CrossEntropyLoss, crucial for quantifying model error during training.
  • High-Level Model API: A Keras-inspired Model and Sequential API that abstracts the training loop into intuitive .fit(), .evaluate(), and .predict() methods.
  • Modern Optimizers: A suite of standard optimization algorithms, including SGD (with momentum), Adam, and RMSprop. The design decouples the optimization logic from the model parameters, making it easy to swap strategies.
  • Gradient Verification Utilities: A gradient_checking tool that numerically verifies the analytical gradients computed by the backward pass against finite-difference approximations, catching bugs in the autograd engine early (see the sketch below).
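
To illustrate the idea behind gradient checking, here is a short sketch of a central-difference check in plain NumPy. The helper numerical_gradient and the example function are illustrative, not Foundry's actual gradient_checking API:

import numpy as np

def numerical_gradient(f, x, eps=1e-5):
    """Approximate df/dx element-wise with central differences."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        idx = it.multi_index
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * eps)
        it.iternext()
    return grad

# Compare against a known analytical gradient, e.g. f(x) = sum(x**2):
x = np.random.randn(3, 4)
numerical = numerical_gradient(lambda v: np.sum(v ** 2), x)
analytical = 2 * x  # d/dx sum(x^2) = 2x
assert np.allclose(numerical, analytical, atol=1e-4)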

Architecture

Foundry is designed with a modular, layered architecture that promotes a clear separation of concerns, a principle borrowed from production-grade frameworks. This structure makes the codebase easier to navigate, maintain, and extend.

1. The Core Engine: foundry.core

This is the foundational layer responsible for all numerical computation and gradient tracking. It is the framework's central nervous system.

  • tensor.py: Defines the Tensor object, the central data abstraction of the framework. It is a wrapper around a NumPy array that carries not only the data but also the metadata the autograd engine needs:
    • data: The underlying numpy.ndarray that holds the numerical values.
    • requires_grad: A boolean flag that acts as a signal to the autograd engine. If True, the engine will track operations involving this tensor to compute gradients for it later.
    • grad: After .backward() is called, this attribute accumulates the gradient of the final scalar value with respect to this tensor.
    • grad_fn: A reference to the Function object that created this tensor. This is the key to building the computational graph, as it provides a backward link from a tensor to its parent operation and, through that, to its input tensors. Tensors created by the user (leaf nodes) have grad_fn=None.
  • autograd.py: Implements the automatic differentiation engine. This is the heart of the framework's "learning" capability.
    • Function: An abstract base class for every differentiable operation (e.g., Add, Mul, MatMul, ReLU). Each operation class inherits from Function and implements two critical methods:
      • forward(): Takes input tensors and computes the output tensor.
      • backward(): Takes the gradient from the next layer in the graph and computes the local gradients with respect to its own inputs, applying the chain rule.
    • The Backward Pass: Learning is enabled by reverse-mode automatic differentiation. When .backward() is called on a final, scalar tensor (such as the output of a loss function), the following happens (a condensed sketch of the whole mechanism follows this list):
      1. A topological sort is performed on the computational graph, starting from the final tensor and tracing back through the grad_fn attributes of all ancestor tensors. This creates an ordered list of operations to execute.
      2. The engine traverses this graph in reverse order.
      3. At each node (a Function object), it calls the backward() method, passing in the gradient computed from the previous step. This method then calculates the gradients for its inputs and passes them down the graph.
      4. This process continues until all leaf nodes with requires_grad=True have had their grad attributes populated.
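
The interplay between Tensor, Function, and the reverse traversal described above can be condensed into a minimal, self-contained sketch. This is an illustration of the mechanism, not Foundry's actual source; the class and attribute names mirror the descriptions above, but everything is simplified (one operation, no broadcasting):

import numpy as np

class Tensor:
    def __init__(self, data, requires_grad=False, grad_fn=None):
        self.data = np.asarray(data, dtype=np.float32)
        self.requires_grad = requires_grad
        self.grad = None        # filled in by backward()
        self.grad_fn = grad_fn  # the Function that produced this tensor; None for leaves

    def __mul__(self, other):
        return Mul.apply(self, other)

    def backward(self):
        # 1. Topological sort: trace grad_fn links back from this tensor.
        order, visited = [], set()
        def visit(t):
            if t.grad_fn is not None and t.grad_fn not in visited:
                visited.add(t.grad_fn)
                for parent in t.grad_fn.inputs:
                    visit(parent)
                order.append(t)
        visit(self)
        # 2. Seed the output gradient, then walk the graph in reverse.
        self.grad = np.ones_like(self.data)
        for t in reversed(order):
            input_grads = t.grad_fn.backward(t.grad)
            for parent, g in zip(t.grad_fn.inputs, input_grads):
                if parent.requires_grad:
                    # Accumulate: a tensor used twice receives both contributions.
                    parent.grad = g if parent.grad is None else parent.grad + g

class Mul:
    """One differentiable operation; each Function subclass has this shape."""
    def __init__(self, a, b):
        self.inputs = (a, b)

    @classmethod
    def apply(cls, a, b):
        # forward(): compute the output and record the link to its parents.
        fn = cls(a, b)
        needs_grad = a.requires_grad or b.requires_grad
        return Tensor(a.data * b.data, requires_grad=needs_grad,
                      grad_fn=fn if needs_grad else None)

    def backward(self, grad_out):
        # Chain rule: d(a*b)/da = b and d(a*b)/db = a, scaled by grad_out.
        a, b = self.inputs
        return grad_out * b.data, grad_out * a.data

# Usage: gradients of z = x * y with respect to the leaf tensors.
x = Tensor(2.0, requires_grad=True)
y = Tensor(3.0, requires_grad=True)
z = x * y
z.backward()
print(x.grad, y.grad)  # 3.0 and 2.0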

2. The Neural Network Module: foundry.nn

This module provides the abstractions and building blocks for creating and training neural networks. It serves as the primary user-facing API for model construction.

  • layers.py: Contains the network layers. The Layer class is an abstract base class that defines a common interface, encapsulating both state (learnable parameters like weights and biases) and computation (the forward pass). This object-oriented design makes layers reusable and chainable; a minimal sketch of the interface follows this list.
  • models.py: Implements a high-level API for model building. The Model class provides a Keras-style interface that abstracts away the training loop. This is an important design choice that separates the model's architecture from the logic of how it's trained, making the code cleaner and more user-friendly. The Sequential model is a simple container for a linear stack of layers.
  • losses.py & activations.py: These modules provide standard loss functions and activation functions. By implementing them as distinct classes or functions, they can be easily swapped out during model definition or training.
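
To make the Layer contract concrete, here is a minimal sketch of a Dense layer and a Sequential container in plain NumPy. The class bodies are illustrative and simplified (no autograd integration), not Foundry's actual implementation:

import numpy as np

class Layer:
    """Common interface: layers own their parameters and their forward pass."""
    def parameters(self):
        return []
    def forward(self, x):
        raise NotImplementedError
    def __call__(self, x):
        return self.forward(x)

class Dense(Layer):
    def __init__(self, in_features, out_features):
        # State: learnable weights and biases, stored on the layer itself.
        self.W = np.random.randn(in_features, out_features) * 0.01
        self.b = np.zeros(out_features)
    def parameters(self):
        return [self.W, self.b]
    def forward(self, x):
        # Computation: the affine map x @ W + b.
        return x @ self.W + self.b

class Sequential(Layer):
    """A container for a linear stack of layers."""
    def __init__(self, *layers):
        self.layers = layers
    def parameters(self):
        return [p for layer in self.layers for p in layer.parameters()]
    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)
        return x

# Usage: chain layers and run a forward pass.
model = Sequential(Dense(10, 64), Dense(64, 1))
out = model(np.random.randn(32, 10))  # shape (32, 1)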

3. The Optimization Module: foundry.optim

This module contains the optimization algorithms responsible for updating the model's parameters based on the computed gradients.

  • optimizers.py: The Optimizer base class defines the core interface. It is initialized with a list of model parameters it needs to track. Its primary methods are:
    • .zero_grad(): Resets the gradients of all parameters before a new backpropagation pass. This is necessary because gradients are accumulated by default.
    • .step(): Updates the parameters using their .grad attribute according to the specific optimization algorithm's logic (e.g., the update rule for Adam or SGD). This design decouples the gradient calculation (done by the autograd engine) from the parameter update logic, making the framework highly modular. A sketch of this design follows below.
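
The division of labour is easiest to see in a sketch of the base class, a plain SGD subclass, and the canonical training step they enable. Names and signatures here are illustrative, assuming Tensor-like parameters with .data and .grad attributes:

class Optimizer:
    """Tracks a list of parameters; subclasses define the update rule."""
    def __init__(self, params):
        self.params = list(params)

    def zero_grad(self):
        # Gradients accumulate across backward passes, so clear them first.
        for p in self.params:
            p.grad = None

    def step(self):
        raise NotImplementedError

class SGD(Optimizer):
    def __init__(self, params, lr=0.01):
        super().__init__(params)
        self.lr = lr

    def step(self):
        # Vanilla gradient descent: p <- p - lr * dL/dp
        for p in self.params:
            if p.grad is not None:
                p.data -= self.lr * p.grad

# The canonical training step this design enables:
#   optimizer.zero_grad()        # clear stale gradients
#   loss = loss_fn(model(x), y)  # forward pass builds the graph
#   loss.backward()              # autograd fills p.grad for every parameter
#   optimizer.step()             # optimizer consumes p.grad to update p.data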

Installation

It is recommended to use a virtual environment to manage dependencies.

Using Conda (Recommended)

  1. Clone the repository:
    git clone https://github.com/your-username/foundry.git
    cd foundry
  2. Create and activate the Conda environment:
    conda env create -f environment.yml
    conda activate foundry
  3. Install the package in editable mode:
    pip install -e .

Using pip

  1. Clone the repository and navigate to the project directory.
  2. Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the package and its dependencies:
    pip install -e ".[dev]"

Quick Start

Here is a simple example of building and training a Multi-Layer Perceptron (MLP) for a regression task.

import numpy as np
from foundry.nn.models import MLP
from foundry.nn.losses import MSELoss
from foundry.optim.optimizers import Adam

# 1. Create a simple dataset
X = np.random.randn(100, 10).astype(np.float32)
y_true = np.random.randn(100, 1).astype(np.float32)

# 2. Define the model architecture
model = MLP(
    input_size=10,
    hidden_sizes=[64, 32],
    output_size=1,
    activation='relu'
)

# 3. Compile the model with a loss function and an optimizer
loss_fn = MSELoss()
optimizer = Adam(model.parameters(), lr=0.01)
model.compile(loss=loss_fn, optimizer=optimizer)

# 4. Train the model
model.fit(X, y_true, epochs=10, batch_size=32, verbose=1)

# 5. Make predictions
predictions = model.predict(X)
print(f"Sample predictions: {predictions[:5].flatten()}")

Running Tests

The project has a comprehensive test suite to ensure correctness, especially for the gradient calculations. To run the tests, use pytest:

pytest

To view the test coverage report:

pytest --cov=src/foundry

Future Work

While this project was primarily for learning, there are several areas where it could be extended:

  • Convolutional & Recurrent Layers: Implementing Conv2D, MaxPool2D, and RNN/LSTM layers to handle image and sequence data.
  • GPU Support: Integrating a GPU backend (e.g., via CuPy) to accelerate computations.
  • More Optimizers and Regularizers: Adding more advanced optimizers and regularization techniques like L1/L2 weight decay.
  • Static Graph & JIT Compilation: Exploring the possibility of a static graph mode with a just-in-time (JIT) compiler for performance optimization.
