3 changes: 2 additions & 1 deletion Agents.md
@@ -8,7 +8,8 @@ You are an agent working on `neutro`, an "intentionally naive" and educational i
2. **Keras API Fidelity**: Maintain strict compatibility with Keras/TensorFlow APIs (`compile`, `fit`, `predict`, `evaluate`, `summary`, `Sequential`, `Model`).
3. **Educational Clarity**: Code should be readable and reflect the underlying mathematical algorithms (e.g., FlashAttention, MoE routing, RoPE). Use clear variable names and minimal but impactful comments.
4. **No Magic**: Avoid complex meta-programming or obscure libraries. If a layer needs a backward pass, implement it explicitly.
5. **No Autograd**: `neutro` has no automatic differentiation engine. There is no equivalent of PyTorch's `autograd` or JAX's `grad`. Every layer MUST implement its own `backward(grad_output)` that manually computes gradients using the chain rule. This is the defining educational feature of the library — you *are* the autograd engine.
6. **Nested Training**: Ensure that nested layers (layers within blocks) are discovered and updated by the optimizer. Use `Layer.sublayers` to traverse the hierarchy.
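The nested-training rule can be illustrated with a minimal sketch of how an optimizer might discover nested layers by walking `sublayers` recursively. The `Layer` class and `collect_layers` helper below are hypothetical stand-ins, not `neutro`'s real classes, and the sketch assumes `sublayers` is a plain list of child layers.

```python
class Layer:
    """Hypothetical minimal layer: holds params/grads and optional child layers."""

    def __init__(self, name, sublayers=None):
        self.name = name
        self.params = {}
        self.grads = {}
        self.sublayers = list(sublayers or [])  # assumed: a plain list of children


def collect_layers(layer):
    """Depth-first walk returning the layer itself and every nested sublayer."""
    found = [layer]
    for child in layer.sublayers:
        found.extend(collect_layers(child))
    return found


# A block containing two layers, one of which is itself a block.
inner = Layer("inner_block", [Layer("dense_a"), Layer("dense_b")])
outer = Layer("outer_block", [inner, Layer("dense_c")])

print([layer.name for layer in collect_layers(outer)])
# ['outer_block', 'inner_block', 'dense_a', 'dense_b', 'dense_c']
```

An optimizer that only looked at `outer.sublayers` would see `inner_block` and `dense_c` but miss `dense_a` and `dense_b`; the recursion is what reaches them.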

## Implementation Details

6 changes: 6 additions & 0 deletions README.md
@@ -17,6 +17,12 @@ Let's be honest: modern DL frameworks are black boxes. You pip install 4GB of bi
- **A Toy, not a Tool**: This isn't meant for production. It's a playground for learning advanced algorithms (MHA, GQA, FlashAttention, LSTM) in their purest form.
- **For the Wisdom-Rich**: If you remember when 64MB of RAM was a flex and "vectorization" meant loop unrolling, this is for you. It's a fun way to play with cutting-edge 2024 algorithms using 1990s-era clarity.

## 🚫 No Autograd

Unlike PyTorch or TensorFlow, `neutro` has **zero automatic differentiation**. You will not find an `autograd` engine here. Every gradient is computed by hand — each layer implements its own `backward` method using explicit matrix multiplications and the chain rule.

This is not a bug; it's the feature. Writing `self.grads['W'] = inputs.T @ grad_output` is how you *learn* what backpropagation actually does.
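To see what that one line does, here is a small NumPy check (the shapes and random data are arbitrary choices for illustration): the batched `inputs.T @ grad_output` equals the sum, over the batch, of each sample's outer product of its input with its output gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 3))       # batch of 4 samples, 3 features each
grad_output = rng.normal(size=(4, 2))  # dL/dy for a layer with 2 output units

# Batched form, as written in a Dense layer's backward pass.
grad_W = inputs.T @ grad_output        # shape (3, 2), same as W

# Same result one sample at a time: for a single sample,
# dL/dW[j, k] = x[j] * dL/dy[k], i.e. an outer product.
grad_W_loop = np.zeros((3, 2))
for x_row, g_row in zip(inputs, grad_output):
    grad_W_loop += np.outer(x_row, g_row)

print(np.allclose(grad_W, grad_W_loop))  # True
```

The matrix product is just the loop vectorized: summing per-sample contributions is exactly what the chain rule prescribes when the loss is summed over the batch.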

---

## 🚀 What's Inside?
36 changes: 36 additions & 0 deletions docs/layers/base.md
@@ -247,6 +247,42 @@ y = layer(x)
8. `Dense.forward` computes `np.dot(x, W) + b`, applies ReLU, caches `self.inputs` and `self.z`, returns the output.
9. Later, `layer.backward(grad_output)` uses those cached values to compute weight gradients.

## 🚫 No Autograd — You Write the Gradients

This is the single most important thing to understand about `neutro`:

**There is no automatic differentiation engine.**

In PyTorch, you write:

```python
y = x @ W + b       # W, b created with requires_grad=True; PyTorch records these ops in a graph
y.sum().backward()  # autograd fills W.grad and b.grad (backward() needs a scalar, hence .sum())
```

In `neutro`, you write both `forward` AND `backward`:

```python
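import numpy as np

def forward(self, x):
    self.inputs = x                                # cache x: backward needs it for dL/dW
    return x @ self.params['W'] + self.params['b']

def backward(self, grad_output):
    self.grads['W'] = self.inputs.T @ grad_output  # dL/dW = x^T @ dL/dy
    self.grads['b'] = np.sum(grad_output, axis=0)  # dL/db = dL/dy summed over the batch
    return grad_output @ self.params['W'].T        # dL/dx = dL/dy @ W^T, sent to the previous layer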
```

Why? Because every matrix multiplication you write in `backward` — every `@`, every `np.sum`, every `reshape` — is an explicit application of the **chain rule**. You are not calling `loss.backward()`. You *are* the autograd engine.
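A standard way to check that a hand-written `backward` really is the chain rule is a finite-difference gradient check. The sketch below wraps the two methods shown above in a hypothetical minimal `Dense` class (not `neutro`'s own) and uses a toy loss `L = sum(y * R)`, chosen so that `dL/dy = R` can be passed straight into `backward`.

```python
import numpy as np


class Dense:
    """Minimal layer mirroring the forward/backward above (a sketch, not neutro's Dense)."""

    def __init__(self, n_in, n_out, rng):
        self.params = {'W': rng.normal(size=(n_in, n_out)), 'b': np.zeros(n_out)}
        self.grads = {}

    def forward(self, x):
        self.inputs = x
        return x @ self.params['W'] + self.params['b']

    def backward(self, grad_output):
        self.grads['W'] = self.inputs.T @ grad_output
        self.grads['b'] = np.sum(grad_output, axis=0)
        return grad_output @ self.params['W'].T


def numeric_grad(f, arr, eps=1e-6):
    """Central finite differences of scalar f() w.r.t. arr (perturbed in place, then restored)."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        original = arr[idx]
        arr[idx] = original + eps
        plus = f()
        arr[idx] = original - eps
        minus = f()
        arr[idx] = original
        grad[idx] = (plus - minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
layer = Dense(3, 2, rng)
x = rng.normal(size=(4, 3))
R = rng.normal(size=(4, 2))  # toy loss L = sum(y * R), so dL/dy = R


def loss():
    return np.sum(layer.forward(x) * R)


layer.forward(x)
dx = layer.backward(R)  # hand-written gradients

print(np.allclose(layer.grads['W'], numeric_grad(loss, layer.params['W']), atol=1e-5))  # True
print(np.allclose(layer.grads['b'], numeric_grad(loss, layer.params['b']), atol=1e-5))  # True
print(np.allclose(dx, numeric_grad(loss, x), atol=1e-5))                                # True
```

If you change `forward` and forget to update `backward`, this check is usually the fastest way to find out.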

This means:
- **If you add a new layer**, you must implement `backward` yourself — no framework will do it for you.
- **If you change the forward pass**, you must update `backward` to match. Every new line in `forward` probably needs a corresponding line in `backward`.
- **If `backward` gives wrong shapes**, you'll usually get a NumPy shape-mismatch error, not a cryptic autograd graph error (though broadcasting can occasionally let a wrong shape slip through silently). You'll learn to think in shapes.
- **Every value you cache on `self` in `forward`** (like `self.inputs` or `self.z`) is cached for one reason: `backward` needs it. There is no tape, no graph, no magic — just stored NumPy arrays and chain rule math.
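For a layer with an activation, the cached pre-activation `self.z` is what lets `backward` step back through the activation. A minimal sketch, assuming a ReLU activation and a hypothetical `DenseReLU` class (not `neutro`'s own `Dense`), with tiny hand-picked numbers so the result is easy to verify:

```python
import numpy as np


class DenseReLU:
    """Hypothetical dense layer followed by ReLU, showing why z is cached."""

    def __init__(self, W, b):
        self.params = {'W': W, 'b': b}
        self.grads = {}

    def forward(self, x):
        self.inputs = x                                          # needed for dL/dW
        self.z = np.dot(x, self.params['W']) + self.params['b']  # needed for ReLU'(z)
        return np.maximum(self.z, 0)

    def backward(self, grad_output):
        # Chain rule through ReLU first: d relu(z)/dz is 1 where z > 0, else 0.
        grad_z = grad_output * (self.z > 0)
        self.grads['W'] = self.inputs.T @ grad_z
        self.grads['b'] = np.sum(grad_z, axis=0)
        return grad_z @ self.params['W'].T


layer = DenseReLU(W=np.array([[1.0, -1.0]]), b=np.array([0.0, 0.0]))
x = np.array([[2.0]])                        # z = [[2, -2]]: only unit 0 is active
y = layer.forward(x)                         # [[2., 0.]]
dx = layer.backward(np.array([[1.0, 1.0]]))  # gradient through unit 1 is blocked

print(layer.grads['W'])  # [[2. 0.]]
print(dx)                # [[1.]]
```

Without `self.z`, `backward` would have no way of knowing which units were switched off in the forward pass.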

This is the defining educational feature of the library. You can't hand-wave through gradient descent here. You must understand where gradients come from.

## Try it yourself

Here's how you'd create a custom `MyDense` layer from scratch:
9 changes: 9 additions & 0 deletions docs/layers/core/core_utility_layers.md
@@ -60,6 +60,15 @@ def backward(self, grad_output):

🔍 **Line `if self.mask is None`**: If `forward` was never called, or its most recent call ran with `training=False` (or with `rate == 0`), there's no mask. In that case, the gradient passes through unchanged, just like the forward pass.
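The pass-through branch is only safe if an inference-mode `forward` clears any mask left over from an earlier training call; otherwise `backward` would silently reuse a stale mask. A minimal sketch of that behaviour, using a hypothetical standalone `InvertedDropout` class that follows the logic described in this section (not the library file itself); the fixed `seed` parameter is an illustration choice:

```python
import numpy as np


class InvertedDropout:
    """Hypothetical sketch of inverted dropout with a per-call mask."""

    def __init__(self, rate, seed=0):
        self.rate = rate
        self.mask = None
        self.rng = np.random.default_rng(seed)

    def forward(self, inputs, training=False):
        if not training or self.rate == 0:
            self.mask = None  # clear any mask from a previous training call
            return inputs
        keep = self.rng.binomial(1, 1 - self.rate, size=inputs.shape)
        self.mask = keep / (1 - self.rate)  # rescale survivors so E[output] == inputs
        return inputs * self.mask

    def backward(self, grad_output):
        if self.mask is None:
            return grad_output  # identity, just like the inference forward pass
        return grad_output * self.mask  # same mask as forward: dropped units get 0


layer = InvertedDropout(0.5)
x = np.ones((4, 4))
grad = np.ones((4, 4))

layer.forward(x, training=True)   # sets a random mask
layer.forward(x, training=False)  # inference call resets it to None
print(np.array_equal(layer.backward(grad), grad))  # True
```

Dropping the `self.mask = None` line would make the final `backward` multiply by the mask from the first call, which is the stale-mask case the `test_dropout_backward_inference` test guards against.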

#### `compute_output_shape`

```python
def compute_output_shape(self, input_shape):
    return input_shape
```

Dropout does not change tensor rank or dimensions; during training it only zeroes some values and rescales the rest by `1 / (1 - rate)`. So the output shape is always identical to the input shape.
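Because the mask is built with `size=inputs.shape` and applied elementwise, the output keeps the input's shape for any rank, and the `1 / (1 - rate)` scaling keeps each element's expected value unchanged. A quick NumPy sketch; the `inverted_dropout` helper, the rate, and the shapes are arbitrary illustration choices:

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 0.3  # illustrative drop rate


def inverted_dropout(x, rate, rng):
    """Training-mode dropout: zero each element with probability `rate`, rescale survivors."""
    mask = rng.binomial(1, 1 - rate, size=x.shape) / (1 - rate)
    return x * mask


# Shape is preserved for any rank, which is why compute_output_shape can return input_shape.
for shape in [(20,), (8, 32), (4, 16, 64)]:
    out = inverted_dropout(np.ones(shape), rate, rng)
    print(shape, out.shape)

# The rescaling preserves the expected value: over many elements the mean stays close to 1.0.
big = inverted_dropout(np.ones((1000, 1000)), rate, rng)
print(float(big.mean()))
```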

---

## Flatten — `neutro/layers/core/flatten.py`
4 changes: 4 additions & 0 deletions neutro/layers/core/dropout.py
@@ -9,10 +9,14 @@ def __init__(self, rate, **kwargs):

    def forward(self, inputs, training=False):
        if not training or self.rate == 0:
            self.mask = None
            return inputs
        self.mask = np.random.binomial(1, 1 - self.rate, size=inputs.shape) / (1 - self.rate)
        return inputs * self.mask

    def compute_output_shape(self, input_shape):
        return input_shape

    def backward(self, grad_output):
        if self.mask is None:
            return grad_output
46 changes: 46 additions & 0 deletions tests/engine/test_node.py
@@ -0,0 +1,46 @@
import numpy as np
from neutro.engine.node import KerasTensor, Node


class FakeLayer:
    def __init__(self):
        self.name = "fake_layer"


def test_keras_tensor_repr():
    t = KerasTensor(shape=(None, 32, 32, 3), name="input")
    r = repr(t)
    assert "KerasTensor" in r
    assert "(None, 32, 32, 3)" in r
    assert "input" in r


def test_node_single_output():
    layer = FakeLayer()
    output = KerasTensor(shape=(None, 10), name="output")
    node = Node(layer, input_tensors=[], output_tensors=output)

    assert node.layer is layer
    assert node.output_tensors is output
    assert output.node is node
    assert layer._inbound_nodes == [node]


def test_node_list_output():
    layer = FakeLayer()
    out1 = KerasTensor(shape=(None, 5))
    out2 = KerasTensor(shape=(None, 3))
    node = Node(layer, input_tensors=[], output_tensors=[out1, out2])

    assert node.output_tensors == [out1, out2]
    assert out1.node is node
    assert out2.node is node


def test_node_repr():
    layer = FakeLayer()
    output = KerasTensor(shape=(None, 10))
    node = Node(layer, input_tensors=[], output_tensors=output)
    r = repr(node)
    assert "Node" in r
    assert "fake_layer" in r
136 changes: 128 additions & 8 deletions tests/layers/core/test_dropout.py
@@ -1,17 +1,137 @@
import numpy as np
import pytest
from neutro.layers.core.dropout import Dropout
from neutro.models.base_model import Sequential


def test_dropout_inference():
    layer = Dropout(0.5)
    x = np.random.rand(10, 10)

    out_inf = layer.forward(x, training=False)
    assert np.all(out_inf == x)


def test_dropout_training():
    layer = Dropout(0.5)
    x = np.random.rand(10, 10)

    out_train = layer.forward(x, training=True)
    assert not np.all(out_train == x)

    grad = layer.backward(np.random.rand(10, 10))
    assert grad.shape == (10, 10)


def test_dropout_rate_zero():
    layer = Dropout(0.0)
    x = np.random.rand(10, 10)

    out = layer.forward(x, training=True)
    assert np.all(out == x)

    grad = np.random.rand(10, 10)
    dx = layer.backward(grad)
    assert np.all(dx == grad)


def test_dropout_1d_input():
    layer = Dropout(0.5)
    x = np.random.rand(20)

    out = layer.forward(x, training=True)
    assert out.shape == (20,)
    assert not np.all(out == x)

    grad = np.random.rand(20)
    dx = layer.backward(grad)
    assert dx.shape == (20,)


def test_dropout_3d_input():
    layer = Dropout(0.3)
    x = np.random.rand(4, 16, 64)

    out = layer.forward(x, training=True)
    assert out.shape == (4, 16, 64)

    grad = np.random.rand(4, 16, 64)
    dx = layer.backward(grad)
    assert dx.shape == (4, 16, 64)


def test_dropout_statistics():
    layer = Dropout(0.5)
    x = np.ones((1000, 100))

    out = layer.forward(x, training=True)
    zero_fraction = np.mean(out == 0)
    assert 0.45 < zero_fraction < 0.55


def test_dropout_backward_inference():
    layer = Dropout(0.5)
    x = np.random.rand(10, 10)
    grad = np.random.rand(10, 10)

    layer.forward(x, training=True)
    layer.forward(x, training=False)
    dx = layer.backward(grad)
    assert np.all(dx == grad)


def test_dropout_backward_values():
    layer = Dropout(0.5)
    x = np.ones((10, 10))
    grad = np.ones((10, 10))

    layer.forward(x, training=True)

    dx = layer.backward(grad)
    expected_dx = grad * layer.mask
    np.testing.assert_allclose(dx, expected_dx)


def test_dropout_backward_no_forward():
    layer = Dropout(0.5)

    grad = np.random.rand(10, 10)
    dx = layer.backward(grad)
    assert np.all(dx == grad)


def test_dropout_compute_output_shape():
    layer = Dropout(0.5)

    shape = layer.compute_output_shape((None, 32))
    assert shape == (None, 32)

    shape = layer.compute_output_shape((16, 32))
    assert shape == (16, 32)

    shape = layer.compute_output_shape((None, 16, 64))
    assert shape == (None, 16, 64)


def test_dropout_in_sequential_model():
    model = Sequential([
        Dropout(0.5),
        Dropout(0.3),
        Dropout(0.0),
    ])
    x = np.random.rand(8, 32)

    out = model.forward(x, training=True)
    assert out.shape == (8, 32)

    out_inf = model.forward(x, training=False)
    assert np.all(out_inf == x)


def test_dropout_mask_recreated_each_forward():
    layer = Dropout(0.5)
    x = np.ones((100, 100))

    out1 = layer.forward(x, training=True)
    mask1 = (out1 != 0).astype(float)
    out2 = layer.forward(x, training=True)
    mask2 = (out2 != 0).astype(float)

    assert not np.all(mask1 == mask2)
52 changes: 52 additions & 0 deletions tests/layers/core/test_input_layer.py
@@ -0,0 +1,52 @@
import numpy as np
import pytest
from neutro.layers.core.input_layer import InputLayer, Input
from neutro.engine.node import KerasTensor


def test_input_layer_forward():
    layer = InputLayer(input_shape=(4,))
    out = layer.forward(np.array([1, 2, 3, 4]))
    assert np.array_equal(out, np.array([1, 2, 3, 4]))


def test_input_layer_backward():
    layer = InputLayer(input_shape=(4,))
    grad = layer.backward(np.array([0.1, 0.2, 0.3, 0.4]))
    assert np.array_equal(grad, np.array([0.1, 0.2, 0.3, 0.4]))


def test_input_layer_build_immediate():
    layer = InputLayer(input_shape=(28, 28, 1))
    assert layer.built
    assert layer.input_shape == (28, 28, 1)


def test_input_layer_build_explicit():
    layer = InputLayer()
    layer.build((None, 28, 28, 1))
    assert layer.built
    assert layer.input_shape == (None, 28, 28, 1)


def test_input_no_shape_raises():
    with pytest.raises(ValueError, match="Please provide a shape"):
        Input(shape=None)


def test_input_with_list_shape():
    tensor = Input(shape=[28, 28, 1])
    assert isinstance(tensor, KerasTensor)
    assert tensor.shape == (None, 28, 28, 1)


def test_input_with_tuple_shape():
    tensor = Input(shape=(28, 28, 1))
    assert isinstance(tensor, KerasTensor)
    assert tensor.shape == (None, 28, 28, 1)


def test_input_with_batch_shape():
    tensor = Input(shape=(None, 28, 28, 1))
    assert isinstance(tensor, KerasTensor)
    assert tensor.shape == (None, 28, 28, 1)