2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -13,7 +13,7 @@ jobs:
contents: write
strategy:
matrix:
-python-version: ["3.9", "3.10", "3.11", "3.12"]
+python-version: ["3.10", "3.11", "3.12"]

steps:
- uses: actions/checkout@v4
12 changes: 12 additions & 0 deletions Agents.md
@@ -33,6 +33,18 @@ You are an agent working on `neutro`, an "intentionally naive" and educational i
- `RegexTokenizer` is preferred for LLM tasks, implementing byte-level BPE with regex splitting.
- Maintain educational clarity: explicitly implement the greedy merge process without obscure optimizations.

## Documentation Sync

Whenever you modify a source file under `neutro/layers/`, `neutro/models/`, or `neutro/engine/`, you MUST update its corresponding documentation file under `docs/`. The doc path mirrors the source path (e.g., `neutro/layers/core/dense.py` ↔ `docs/layers/core/dense.md`).

Required for every doc change:
- Follow the **line-by-line walkthrough** style: explain `__init__`, `build`, `forward`, `backward` in sequence.
- Add 🔍 **"Why" annotations** on every stored/cached value — explain what it's used for in backward.
- Add 📐 **Shape walkthroughs** on every matrix operation — show `(B, D) @ (D, U) → (B, U)`.
- Reference exact file paths and line numbers in the source.
- If creating a new layer, create a new `.md` file in the corresponding `docs/` subdirectory.
- Run `pytest` after doc changes to verify no regressions.

## Testing
- Aim for >90% test coverage.
- Use `pytest`.
77 changes: 77 additions & 0 deletions docs/activations/activations.md
@@ -0,0 +1,77 @@
# Activation Functions

## Theory

Activation functions introduce non-linearity into neural networks. Without them, stacking linear layers would collapse into a single linear transformation.

### ReLU — `neutro/activations/relu.py`

$$\text{ReLU}(x) = \max(0, x)$$

$$\text{ReLU}'(x) = \mathbf{1}_{x > 0}$$

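- **Gradient**: 1 for positive inputs and 0 for negative inputs (and 0 at $x = 0$ under the definition above). A unit whose pre-activation is negative for every input therefore receives no gradient and stops updating; this is the "dying ReLU" problem.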

### Sigmoid — `neutro/activations/sigmoid.py`

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

$$\sigma'(x) = \sigma(x)(1 - \sigma(x))$$

- Output range: (0, 1). Used for binary classification or as a gating mechanism (LSTM, GRU).
- **Vanishing gradient**: $\sigma'(x)$ is at most 0.25 (at $x = 0$) and approaches 0 as $|x|$ grows, so saturated units pass back almost no gradient.
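For reference, here is a numerically stable way to evaluate $\sigma$ and $\sigma'$ in NumPy. The function names are hypothetical and this is not necessarily how `neutro/activations/sigmoid.py` computes them. It also makes saturation concrete: the derivative peaks at $x = 0$ and nearly vanishes by $|x| = 20$.

```python
import numpy as np


def sigmoid(x):
    """Numerically stable logistic sigmoid (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # exp(-|x|) lies in (0, 1], so it can never overflow.
    ex = np.exp(-np.abs(x))
    # x >= 0: 1 / (1 + e^-x);  x < 0: e^x / (1 + e^x), the same value rewritten.
    return np.where(x >= 0, 1.0 / (1.0 + ex), ex / (1.0 + ex))


def sigmoid_grad(x):
    """sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


xs = np.array([-20.0, -2.0, 0.0, 2.0, 20.0])
print(sigmoid_grad(xs))  # largest (0.25) at x = 0, tiny at |x| = 20
```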

### Tanh — `neutro/activations/tanh.py`

$$\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$$

$$\tanh'(x) = 1 - \tanh^2(x)$$

- Output range: (-1, 1). Zero-centered, often preferred over sigmoid in hidden layers.

### Softmax — `neutro/activations/softmax.py`

$$\text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

- Output: probability distribution over classes.
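- **Jacobian-vector product** (`gradient_fast`, line 18): because the softmax Jacobian is $J = \operatorname{diag}(y) - y y^\top$, its product with the upstream gradient $g$ (`grad_output`) simplifies to $J g = y \odot \left(g - \sum_j y_j g_j\right)$, which is computed without building the full $N \times N$ Jacobian.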
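The identity behind `gradient_fast` is easy to check numerically. The helpers below are hypothetical (they do not reproduce the signature of `gradient_fast`). They build the full Jacobian $\operatorname{diag}(y) - y y^\top$ for one vector and compare it with the fused form.

```python
import numpy as np


def softmax(x):
    # Subtracting the max improves numerical stability without changing the result.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def softmax_jacobian(y):
    # Full N x N Jacobian for a single probability vector: diag(y) - y y^T.
    return np.diag(y) - np.outer(y, y)


def softmax_jvp(y, grad_output):
    # Fused form y * (g - sum(y * g)); the sum runs over the class (last) axis.
    dot = np.sum(y * grad_output, axis=-1, keepdims=True)
    return y * (grad_output - dot)


rng = np.random.default_rng(0)
x = rng.normal(size=5)
g = rng.normal(size=5)
y = softmax(x)
full = softmax_jacobian(y) @ g   # O(N^2) memory and time
fast = softmax_jvp(y, g)         # O(N)
print(np.allclose(full, fast))   # True
```

Because the reduction is over the last axis, the fused version also works unchanged on a batch of shape `(B, N)`.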

### SiLU — `neutro/activations/silu.py` (Sigmoid Linear Unit)

$$\text{SiLU}(x) = x \cdot \sigma(x)$$

$$\text{SiLU}'(x) = \sigma(x) + x \cdot \sigma(x) \cdot (1 - \sigma(x))$$

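- Also called Swish (the $\beta = 1$ case of $x \cdot \sigma(\beta x)$). Used in modern architectures, e.g. as the gate activation in the SwiGLU feed-forward blocks of Llama.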
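A finite-difference check of the SiLU derivative above, as a small sketch with hypothetical helper names. The plain sigmoid formula is fine here because the inputs stay within $[-4, 4]$.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def silu(x):
    return x * sigmoid(x)


def silu_grad(x):
    # Product rule: d/dx [x * s(x)] = s(x) + x * s(x) * (1 - s(x))
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


# Compare the analytic derivative with a central finite difference.
x = np.linspace(-4.0, 4.0, 9)
h = 1e-5
numeric = (silu(x + h) - silu(x - h)) / (2 * h)
print(np.max(np.abs(numeric - silu_grad(x))))  # only finite-difference error remains
```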

## Implementation Guide

All activations follow the same pattern:

```python
class ReLU:
    def forward(self, x): ...
    def gradient(self, x): ...             # element-wise gradient
    def gradient_fast(self, x, grad): ...  # fused JVP (optional)
```

- `forward` is used by `Dense` and other layers in the forward pass.
- `gradient` returns the element-wise derivative, which is multiplied by the upstream gradient in `Dense.backward`.
- `gradient_fast` is an optimization used by Softmax to avoid the full Jacobian matrix.
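As an illustration of the pattern, the hypothetical `Tanh` class below implements `forward` and `gradient`, and the lines after it show how a Dense-style backward pass could combine `gradient` with the upstream gradient. The variable names and the choice of loss $L = \sum a$ are illustrative assumptions, not taken from `Dense.backward`.

```python
import numpy as np


class Tanh:
    """Hypothetical activation following the forward / gradient pattern."""

    def forward(self, x):
        return np.tanh(x)

    def gradient(self, x):
        # tanh'(x) = 1 - tanh(x)^2, evaluated element-wise at the pre-activation x
        t = np.tanh(x)
        return 1.0 - t * t


rng = np.random.default_rng(0)
B, D, U = 4, 3, 2
x = rng.normal(size=(B, D))
W = rng.normal(size=(D, U))
z = x @ W                          # (B, D) @ (D, U) -> (B, U), cached for backward
act = Tanh()
a = act.forward(z)                 # (B, U)
grad_a = np.ones_like(a)           # upstream gradient dL/da for L = sum(a)
grad_z = grad_a * act.gradient(z)  # element-wise chain rule, (B, U)
grad_W = x.T @ grad_z              # (D, B) @ (B, U) -> (D, U)
grad_x = grad_z @ W.T              # (B, U) @ (U, D) -> (B, D)
```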

## Usage Example

```python
import numpy as np
from neutro.activations import get_activation

relu = get_activation('relu')
x = np.array([-1, 0, 2])
y = relu.forward(x)    # [0, 0, 2]
dy = relu.gradient(x)  # [0, 0, 1]
```

## References

- Nair, V., & Hinton, G. E. (2010). **Rectified Linear Units Improve Restricted Boltzmann Machines**.
- Hendrycks, D., & Gimpel, K. (2016). **Gaussian Error Linear Units (GELUs)**.
- Elfwing, S., Uchibe, E., & Doya, K. (2018). **Sigmoid-weighted linear units for neural network function approximation in reinforcement learning**.
60 changes: 60 additions & 0 deletions docs/callbacks/callbacks.md
@@ -0,0 +1,60 @@
# Callbacks

## Theory

Callbacks are objects that hook into the training loop at various points. They allow you to monitor training, save checkpoints, adjust learning rates, and stop training early without cluttering the training loop itself.

**Hook points**, from outermost to innermost (each `*_begin` / `*_end` pair brackets its phase):
1. `on_train_begin` / `on_train_end`
2. `on_epoch_begin` / `on_epoch_end`
3. `on_batch_begin` / `on_batch_end`

## Implementation Guide

### File: `neutro/callbacks/base.py`

```python
class Callback:
    def set_model(self, model): ...
    def on_train_begin(self, logs=None): ...
    def on_train_end(self, logs=None): ...
    def on_epoch_begin(self, epoch, logs=None): ...
    def on_epoch_end(self, epoch, logs=None): ...
    def on_batch_begin(self, batch, logs=None): ...
    def on_batch_end(self, batch, logs=None): ...
```

All methods are no-ops by default. Subclasses override the needed hooks.
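The hooks only do something when a training loop calls them. The sketch below uses a hypothetical loop (not neutro's `Model.fit`) and a hypothetical `RecordingCallback` that logs each hook, so the nesting order is visible. A `types.SimpleNamespace` stands in for the model and only carries the `stop_training` flag.

```python
from types import SimpleNamespace


class RecordingCallback:
    """Hypothetical callback that records every hook call it receives."""

    def __init__(self):
        self.events = []

    def set_model(self, model):
        self.model = model

    def on_train_begin(self, logs=None):
        self.events.append("train_begin")

    def on_train_end(self, logs=None):
        self.events.append("train_end")

    def on_epoch_begin(self, epoch, logs=None):
        self.events.append(f"epoch_begin {epoch}")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f"epoch_end {epoch}")

    def on_batch_begin(self, batch, logs=None):
        self.events.append(f"batch_begin {batch}")

    def on_batch_end(self, batch, logs=None):
        self.events.append(f"batch_end {batch}")


def fire(callbacks, hook, *args, **kwargs):
    """Call the named hook on every callback, in list order."""
    for cb in callbacks:
        getattr(cb, hook)(*args, **kwargs)


def toy_fit(model, callbacks, epochs, num_batches):
    """Hypothetical loop showing where each hook fires (no real training)."""
    for cb in callbacks:
        cb.set_model(model)
    fire(callbacks, "on_train_begin")
    for epoch in range(epochs):
        fire(callbacks, "on_epoch_begin", epoch)
        for batch in range(num_batches):
            fire(callbacks, "on_batch_begin", batch)
            # forward pass, backward pass and optimizer step would go here
            fire(callbacks, "on_batch_end", batch, logs={"loss": 0.0})  # placeholder metric
        fire(callbacks, "on_epoch_end", epoch, logs={"loss": 0.0})
        if model.stop_training:  # flag an EarlyStopping-style callback can set
            break
    fire(callbacks, "on_train_end")


model = SimpleNamespace(stop_training=False)
rec = RecordingCallback()
toy_fit(model, [rec], epochs=1, num_batches=2)
print(rec.events)
# ['train_begin', 'epoch_begin 0', 'batch_begin 0', 'batch_end 0',
#  'batch_begin 1', 'batch_end 1', 'epoch_end 0', 'train_end']
```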

### History — `neutro/callbacks/history.py`

Records per-epoch metrics in the `history.history` dict (keys such as `loss`, `val_loss`, `accuracy`).

### EarlyStopping — `neutro/callbacks/early_stopping.py`

Monitors a metric (e.g., `val_loss`) and stops training once it has not improved for `patience` consecutive epochs, by setting `model.stop_training = True`.
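The patience logic can be sketched in a few lines. This is a hypothetical class, not the code in `neutro/callbacks/early_stopping.py`: it assumes lower is better for the monitored metric, reads it from `logs`, and `min_delta` is an assumed option. A `types.SimpleNamespace` stands in for the model.

```python
from types import SimpleNamespace


class EarlyStoppingSketch:
    """Hypothetical patience logic for a metric where lower is better (e.g. val_loss)."""

    def __init__(self, monitor="val_loss", patience=5, min_delta=0.0):
        self.monitor = monitor
        self.patience = patience
        self.min_delta = min_delta  # assumed option: smallest change that counts as improvement
        self.best = float("inf")
        self.wait = 0
        self.stopped_epoch = None

    def set_model(self, model):
        self.model = model

    def on_epoch_end(self, epoch, logs=None):
        current = (logs or {}).get(self.monitor)
        if current is None:
            return  # metric not available this epoch
        if current < self.best - self.min_delta:
            self.best = current  # improvement: remember it and reset the counter
            self.wait = 0
        else:
            self.wait += 1  # one more epoch without improvement
            if self.wait >= self.patience:
                self.stopped_epoch = epoch
                self.model.stop_training = True  # the training loop checks this flag


model = SimpleNamespace(stop_training=False)
es = EarlyStoppingSketch(monitor="val_loss", patience=3)
es.set_model(model)
for epoch, v in enumerate([1.0, 0.8, 0.7, 0.71, 0.72, 0.69, 0.70, 0.70, 0.75]):
    es.on_epoch_end(epoch, logs={"val_loss": v})
    if model.stop_training:
        break
print(es.stopped_epoch, es.best)  # 8 0.69
```

Note that the improvement at epoch 5 resets the counter, so stopping happens three epochs after the last best value rather than three epochs after the first stall.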

### ReduceLROnPlateau / LR Scheduler — `neutro/callbacks/lr_scheduler.py`

Reduces the learning rate when a metric plateaus, or follows a predefined schedule.
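A minimal sketch of the plateau rule, under stated assumptions: the hypothetical class below only tracks a learning-rate value and does not touch any optimizer attribute (the attribute neutro uses is not shown here), and `min_lr` is an assumed option.

```python
class ReduceLROnPlateauSketch:
    """Hypothetical rule: multiply lr by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=0.0):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr  # assumed lower bound on the learning rate
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, value):
        if value < self.best:
            self.best = value
            self.wait = 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0  # start counting again after each reduction
        return self.lr


sched = ReduceLROnPlateauSketch(lr=0.1, factor=0.5, patience=2)
lrs = [sched.on_epoch_end(v) for v in [1.0, 0.9, 0.9, 0.9, 0.9, 0.9]]
print(lrs)  # [0.1, 0.1, 0.1, 0.05, 0.05, 0.025]
```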

### Checkpoint — `neutro/callbacks/checkpoint.py`

Saves the model to disk at the end of each epoch using `joblib.dump`.

## Usage Example

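```python
from neutro.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

# Assumes `model` is an already-built neutro model and `X`, `y` are training arrays.
# Monitoring 'val_loss' only works if validation data is evaluated during `fit`.
callbacks = [
    EarlyStopping(monitor='val_loss', patience=5),
    ModelCheckpoint('best_model.pkl', save_best_only=True),
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3),
]
model.fit(X, y, callbacks=callbacks, epochs=100)
```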

## References

- Keras Callbacks API. [Keras.io](https://keras.io/api/callbacks/)
34 changes: 34 additions & 0 deletions docs/data/data.md
@@ -0,0 +1,34 @@
# Data

## DataLoader — `neutro/data.py`

A simple data loader for batching and shuffling:

```python
import numpy as np


class DataLoader:
    def __init__(self, x, y, batch_size=32, shuffle=True, augmenter=None):
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.augmenter = augmenter  # stored, but not applied in this excerpt
        self.indices = np.arange(len(x))
        self.on_epoch_end()  # shuffle once before the first epoch

    def __len__(self):
        # Number of batches; the last one may be smaller than batch_size.
        return int(np.ceil(len(self.x) / self.batch_size))

    def on_epoch_end(self):
        if self.shuffle:
            np.random.shuffle(self.indices)  # permute indices so x and y stay aligned

    def __getitem__(self, index):
        batch_idx = self.indices[index * self.batch_size:(index + 1) * self.batch_size]
        batch_x, batch_y = self.x[batch_idx], self.y[batch_idx]
        return batch_x, batch_y

    def __iter__(self):
        for i in range(len(self)):
            yield self[i]
        self.on_epoch_end()  # reshuffle after a full pass
```
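The same index-permutation idea can be written as a plain generator. This is a hypothetical alternative, not part of `neutro/data.py`: it uses a seeded `np.random.default_rng` (an assumption, for reproducibility) instead of the global `np.random.shuffle`, and shows that shuffling indices keeps each `x` row paired with its `y` label.

```python
import numpy as np


def iterate_minibatches(x, y, batch_size=32, shuffle=True, seed=None):
    """Hypothetical generator version of the DataLoader batching logic."""
    rng = np.random.default_rng(seed)
    indices = np.arange(len(x))
    if shuffle:
        rng.shuffle(indices)  # permute indices once, so x and y stay aligned
    for start in range(0, len(x), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield x[batch_idx], y[batch_idx]


x = np.arange(10).reshape(10, 1)
y = np.arange(10)
batches = list(iterate_minibatches(x, y, batch_size=4, seed=0))
print([len(bx) for bx, _ in batches])  # [4, 4, 2]: ceil(10 / 4) = 3 batches, the last one partial
```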
113 changes: 113 additions & 0 deletions docs/engine/node.md
@@ -0,0 +1,113 @@
# KerasTensor, Node, and the Functional API Graph Engine

## Theory

The Functional API lets you build models as directed acyclic graphs (DAGs) of layers, rather than as linear stacks. This requires a mechanism to track *symbolic* data flow during model construction, before any real data is seen.

Two core classes enable this:

- **`KerasTensor`**: A symbolic placeholder representing the *future* output of a layer. It carries a `shape` but no actual data.
- **`Node`**: A record of one *call* to a layer. It links input `KerasTensor`s → output `KerasTensor`s and is stored on the layer's `_inbound_nodes` list.

When you write `outputs = Dense(32)(inputs)`, the layer's `__call__` method detects that `inputs` is a `KerasTensor`, builds the layer (if needed), computes the output shape symbolically, wraps it in a new `KerasTensor`, and records a `Node`. No NumPy computation occurs.

Later, `Model._init_graph` traverses the graph backward from the outputs to discover all reachable `Node`s and `Layer`s, producing a topological ordering used for forward and backward execution.

## Implementation Guide

### `KerasTensor` — `neutro/engine/node.py:3-13`

```python
class KerasTensor:
    def __init__(self, shape, node=None, name=None):
        self.shape = shape
        self.node = node  # The Node that produced this tensor
        self.name = name
```

- `shape` is a tuple like `(None, 32)` — the batch dimension is `None` (unknown until runtime).
- `node` is set when a `Node` is created; it points to the producing `Node`, whose `.layer` is the layer that produced this tensor.

### `Node` — `neutro/engine/node.py:15-38`

```python
class Node:
    def __init__(self, layer, input_tensors, output_tensors):
        self.layer = layer
        self.input_tensors = input_tensors
        self.output_tensors = output_tensors
        layer._inbound_nodes.append(self)
        # Link output tensors back to this node
        if isinstance(output_tensors, list):
            for t in output_tensors:
                t.node = self
        else:
            output_tensors.node = self
```

Key behaviors:
- **Registration**: The node registers itself on `layer._inbound_nodes`, enabling multi-parent graph traversal.
- **One layer, many nodes**: A shared layer used 3 times will have 3 entries in `_inbound_nodes`, each with different input/output tensors.
- **List outputs**: Layers like `Add` that take lists of inputs store the lists in `input_tensors`. Multi-output layers store lists in `output_tensors`.

### How `Layer.__call__` triggers Node creation — `neutro/layers/base.py:67-105`

The symbolic path (lines 77-97):

```python
if is_symbolic:
    if not self.built:
        self.build(input_shapes)  # e.g., Dense.build((None, 10))
    output_shape = self.compute_output_shape(input_shapes)
    output_tensors = KerasTensor(shape=output_shape)
    Node(self, input_tensors=inputs, output_tensors=output_tensors)
    return output_tensors
```

This is a **zero-computation** path: no `forward` is called, only shape inference.
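To make the zero-computation claim concrete, here is a stripped-down re-implementation of the symbolic path. `ToyDense` and its `build_calls` counter are hypothetical, only single-tensor inputs are handled, and the eager (NumPy) path is left out, so this sketches the idea rather than reproducing neutro's `Layer.__call__`. Calling the same layer twice also shows the "one layer, many nodes" behavior.

```python
class KerasTensor:
    def __init__(self, shape, node=None):
        self.shape = shape
        self.node = node


class Node:
    def __init__(self, layer, input_tensors, output_tensors):
        self.layer = layer
        self.input_tensors = input_tensors
        self.output_tensors = output_tensors
        layer._inbound_nodes.append(self)
        output_tensors.node = self  # single-output case only


class ToyDense:
    """Hypothetical stand-in for a Dense layer: only the symbolic path is modelled."""

    def __init__(self, units):
        self.units = units
        self.built = False
        self.build_calls = 0
        self._inbound_nodes = []

    def build(self, input_shape):
        self.kernel_shape = (input_shape[-1], self.units)  # a real layer would allocate W here
        self.build_calls += 1
        self.built = True

    def compute_output_shape(self, input_shape):
        return input_shape[:-1] + (self.units,)  # (None, D) -> (None, units)

    def __call__(self, inputs):
        if isinstance(inputs, KerasTensor):  # symbolic path: shapes only, no forward()
            if not self.built:
                self.build(inputs.shape)
            out = KerasTensor(self.compute_output_shape(inputs.shape))
            Node(self, inputs, out)
            return out
        raise NotImplementedError("eager path omitted in this sketch")


inputs = KerasTensor((None, 10))
shared = ToyDense(10)
h1 = shared(inputs)
h2 = shared(h1)  # same layer called twice -> two Nodes, one build
print(h2.shape, len(shared._inbound_nodes), shared.build_calls)  # (None, 10) 2 1
```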

## Graph Discovery (`Model._init_graph`) — `neutro/models/base_model.py:25-62`

```python
# `visited_nodes` (a set) and `nodes_ordered` (a list) are created in the enclosing `_init_graph`
def traverse(tensor):
    if hasattr(tensor, 'node') and tensor.node:
        node = tensor.node
        if node not in visited_nodes:
            visited_nodes.add(node)
            # Recursively visit inputs
            if isinstance(node.input_tensors, list):
                for t in node.input_tensors:
                    traverse(t)
            else:
                traverse(node.input_tensors)
            nodes_ordered.append(node)
```

This produces `_nodes_ordered` in **topological order**: each node is appended only after all of its inputs have been visited, so producers come before consumers (inputs before outputs). The backward pass iterates `reversed(_nodes_ordered)`.
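A self-contained sketch of the same post-order traversal on a small diamond-shaped graph. `Tensor` and `Node` here are simplified stand-ins (hypothetical, not neutro's classes), and graph inputs are modelled as tensors with no producing node. The node shared by both branches is visited only once.

```python
class Tensor:
    def __init__(self, node=None):
        self.node = node


class Node:
    def __init__(self, name, input_tensors):
        self.name = name
        self.input_tensors = input_tensors
        self.output = Tensor(self)  # output tensor links back to its producer


def discover(outputs):
    """Post-order DFS from the outputs, in the style of `traverse` above."""
    visited_nodes, nodes_ordered = set(), []

    def traverse(tensor):
        if getattr(tensor, "node", None) is not None:
            node = tensor.node
            if node not in visited_nodes:
                visited_nodes.add(node)
                inputs = node.input_tensors
                for t in inputs if isinstance(inputs, list) else [inputs]:
                    traverse(t)
                nodes_ordered.append(node)  # appended only after all of its inputs

    for t in outputs:
        traverse(t)
    return nodes_ordered


# Diamond graph: x -> a, x -> b, [a, b] -> add
x = Tensor()  # graph input: no producing node in this sketch
a = Node("a", x)
b = Node("b", x)
add = Node("add", [a.output, b.output])
order = discover([add.output])
print([n.name for n in order])            # ['a', 'b', 'add']  producers first
print([n.name for n in reversed(order)])  # ['add', 'b', 'a']  order used by backward
```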

## Usage Example

```python
from neutro.layers import Input, Dense
from neutro.models import Model
from neutro.engine.node import KerasTensor, Node

# Symbolic construction
inputs = Input(shape=(4,))               # returns a KerasTensor
x = Dense(8, activation='relu')(inputs)  # Layer.__call__ creates a Node
outputs = Dense(1)(x)

# Inspect the graph
print(isinstance(inputs, KerasTensor))   # True
print(inputs.shape)                      # (None, 4)
print(isinstance(outputs.node, Node))    # True
print(outputs.node.layer)                # the Dense(1) layer that produced `outputs`

# Model discovers nodes via traversal
model = Model(inputs=inputs, outputs=outputs)
print(len(model._nodes_ordered))         # number of Nodes discovered
```

## References

- Chollet, F. (2015). **Keras** — the Functional API was introduced in Keras 1.0. [GitHub](https://github.com/keras-team/keras)
- Keras Functional API Guide. [Keras.io](https://keras.io/guides/functional_api/)
52 changes: 52 additions & 0 deletions docs/initializers/initializers.md
@@ -0,0 +1,52 @@
# Initializers

## Theory

Weight initialization is critical for training deep networks. Poor initialization can cause vanishing/exploding gradients. `neutro` implements several strategies.

### Glorot (Xavier) Uniform — `neutro/initializers/glorot.py`

$$W \sim U\left[-\sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}, \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}}\right]$$

Recommended for layers with tanh or sigmoid activation.
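A short worked derivation of where the 6 comes from, using only the variance of a uniform distribution on $[-a, a]$:

$$\operatorname{Var}\left(U[-a, a]\right) = \frac{(2a)^2}{12} = \frac{a^2}{3}$$

$$a = \sqrt{\frac{6}{n_{\text{in}} + n_{\text{out}}}} \;\Rightarrow\; \operatorname{Var}(W) = \frac{2}{n_{\text{in}} + n_{\text{out}}}$$

This is the compromise Glorot and Bengio propose between $\operatorname{Var}(W) = 1/n_{\text{in}}$ (preserving activation variance in the forward pass) and $\operatorname{Var}(W) = 1/n_{\text{out}}$ (preserving gradient variance in the backward pass).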

### He Initialization — `neutro/initializers/he.py`

$$W \sim N\left(0, \sqrt{\frac{2}{n_{\text{in}}}}\right)$$

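Recommended for layers with ReLU activation. Here the second argument of $N$ is the standard deviation, so $\operatorname{Var}(W) = 2/n_{\text{in}}$; the factor of 2 compensates for ReLU zeroing roughly half of its inputs, which keeps the variance of activations roughly constant across layers.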
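A quick numerical check of that claim: the hypothetical helper below pushes Gaussian inputs through a stack of bias-free ReLU layers and reports the mean squared activation at the end. Depth, width, batch size and the comparison standard deviation `0.01` are arbitrary choices for illustration.

```python
import numpy as np


def mean_square_after_relu_stack(weight_std, depth=10, width=512, batch=256, seed=0):
    """Hypothetical helper: mean squared activation after `depth` ReLU layers
    whose weights are drawn from N(0, weight_std^2)."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(batch, width))  # inputs with mean square 1
    for _ in range(depth):
        W = rng.normal(0.0, weight_std, size=(width, width))
        h = np.maximum(0.0, h @ W)  # ReLU(h W), no bias
    return float(np.mean(h ** 2))


width = 512
he_std = np.sqrt(2.0 / width)
print(mean_square_after_relu_stack(he_std))  # stays on the order of 1
print(mean_square_after_relu_stack(0.01))    # shrinks by about 512 * 0.01**2 / 2 per layer
```

With He initialization the pre-activation variance is $n_{\text{in}} \cdot \frac{2}{n_{\text{in}}} \cdot \mathbb{E}[h^2] = 2\,\mathbb{E}[h^2]$, and ReLU keeps half of it, so the mean square is preserved in expectation.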

### Constant — `neutro/initializers/constant.py`

$W = c$ for a constant $c$. Used for bias initialization (typically $c=0$).

### Random — `neutro/initializers/random.py`

$$W \sim N(\text{mean}, \text{stddev})$$

## Implementation Guide

All initializers are callable objects:

```python
import numpy as np


class GlorotUniform:
    def __call__(self, shape):
        # shape = (fan_in, fan_out) for a Dense kernel
        limit = np.sqrt(6 / (shape[0] + shape[1]))
        return np.random.uniform(-limit, limit, size=shape)
```

They are instantiated in layer `__init__` and called in `build`:

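```python
class Dense(Layer):
    # Excerpt: other constructor arguments and base-class setup are omitted.
    def __init__(self, units, kernel_initializer='glorot_uniform', **kwargs):
        self.units = units
        self.kernel_initializer = get_initializer(kernel_initializer)

    def build(self, input_shape):
        # (D, units) kernel for inputs of shape (B, D)
        self.params['W'] = self.kernel_initializer((input_shape[-1], self.units))
```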

## References

- Glorot, X., & Bengio, Y. (2010). **Understanding the difficulty of training deep feedforward neural networks**. *AISTATS*.
- He, K., et al. (2015). **Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification**. [arXiv:1502.01852](https://arxiv.org/abs/1502.01852)