sourcepirate · May 31, 2026
diff --git a/Agents.md b/Agents.md
@@ -8,7 +8,8 @@ You are an agent working on `neutro`, an "intentionally naive" and educational i
 2.  **Keras API Fidelity**: Maintain strict compatibility with Keras/TensorFlow APIs (`compile`, `fit`, `predict`, `evaluate`, `summary`, `Sequential`, `Model`).
 3.  **Educational Clarity**: Code should be readable and reflect the underlying mathematical algorithms (e.g., FlashAttention, MoE routing, RoPE). Use clear variable names and minimal but impactful comments.
 4.  **No Magic**: Avoid complex meta-programming or obscure libraries. If a layer needs a backward pass, implement it explicitly.
-5.  **Nested Training**: Ensure that nested layers (layers within blocks) are discovered and updated by the optimizer. Use `Layer.sublayers` to traverse the hierarchy.
+5.  **No Autograd**: `neutro` has no automatic differentiation engine; there is no equivalent of PyTorch's `autograd` or JAX's `grad`. Every layer MUST implement its own `backward(grad_output)` that applies the chain rule by hand: it stores gradients for its own parameters (if any) in `self.grads` and returns the gradient with respect to its inputs. This is the defining educational feature of the library: you *are* the autograd engine.
+6.  **Nested Training**: Ensure that nested layers (layers within blocks) are discovered and updated by the optimizer. Use `Layer.sublayers` to traverse the hierarchy.
 
 ## Implementation Details
 

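The **Nested Training** rule can be illustrated with a small traversal sketch. It assumes, hypothetically, that every layer exposes its children as a plain list named `sublayers` and its weights as a `params` dict. neutro's real `Layer.sublayers` may be a property or method with a different shape, so treat `ToyLayer` and `collect_trainable_layers` as illustrative names only.

```python
import numpy as np


class ToyLayer:
    """Hypothetical stand-in for a neutro layer: a params dict plus child layers."""

    def __init__(self, name, params=None, sublayers=None):
        self.name = name
        self.params = params or {}
        self.sublayers = sublayers or []


def collect_trainable_layers(layer):
    """Depth-first walk returning every layer in the tree that owns parameters."""
    found = []
    if layer.params:
        found.append(layer)
    for child in layer.sublayers:
        found.extend(collect_trainable_layers(child))
    return found


# An outer block with no params of its own wraps a Dense-like layer and an
# inner block, which in turn wraps a second Dense-like layer.
inner = ToyLayer("inner_block", sublayers=[ToyLayer("dense_2", {"W": np.ones((4, 2))})])
block = ToyLayer("outer_block", sublayers=[ToyLayer("dense_1", {"W": np.ones((3, 4))}), inner])

names = [layer.name for layer in collect_trainable_layers(block)]
print(names)  # ['dense_1', 'dense_2']
```

An optimizer that only inspected the top-level block would find no parameters at all; the recursive walk is what lets `dense_2`, two levels down, receive updates.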
diff --git a/README.md b/README.md
@@ -17,6 +17,12 @@ Let's be honest: modern DL frameworks are black boxes. You pip install 4GB of bi
 - **A Toy, not a Tool**: This isn't meant for production. It's a playground for learning advanced algorithms (MHA, GQA, FlashAttention, LSTM) in their purest form.
 - **For the Wisdom-Rich**: If you remember when 64MB of RAM was a flex and "vectorization" meant loop unrolling, this is for you. It's a fun way to play with cutting-edge 2024 algorithms using 1990s-era clarity.
 
+## 🚫 No Autograd
+
+Unlike PyTorch or TensorFlow, `neutro` has **zero automatic differentiation**. You will not find an `autograd` engine here. Every gradient is computed by hand: each layer implements its own `backward` method using explicit NumPy operations (matrix multiplications, sums, reshapes) and the chain rule.
+
+This is not a bug; it's the feature. Writing `self.grads['W'] = self.inputs.T @ grad_output` is how you *learn* what backpropagation actually does.
+
 ---
 
 ## 🚀 What's Inside?

diff --git a/docs/layers/base.md b/docs/layers/base.md
@@ -247,6 +247,42 @@ y = layer(x)
 8. `Dense.forward` computes `np.dot(x, W) + b`, applies ReLU, caches `self.inputs` and `self.z`, returns the output.
 9. Later, `layer.backward(grad_output)` uses those cached values to compute weight gradients.
 
+## 🚫 No Autograd — You Write the Gradients
+
+This is the single most important thing to understand about `neutro`:
+
+**There is no automatic differentiation engine.**
+
+In PyTorch, you write:
+
+```python
+loss = (x @ W + b).sum()  # PyTorch records these ops in a graph (W and b created with requires_grad=True)
+loss.backward()           # PyTorch automatically fills in W.grad and b.grad
+```
+
+In `neutro`, you write both `forward` AND `backward`:
+
+```python
+def forward(self, x):
+    self.inputs = x
+    return x @ self.params['W'] + self.params['b']
+
+def backward(self, grad_output):
+    self.grads['W'] = self.inputs.T @ grad_output
+    self.grads['b'] = np.sum(grad_output, axis=0)
+    return grad_output @ self.params['W'].T
+```
+
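+Why? Because every operation you write in `backward` (every `@`, every `np.sum`, every `reshape`) is one explicit step of the **chain rule**, undoing the matching operation from `forward`. For example, `b` is broadcast across every row of the batch in `forward`, so its gradient is `grad_output` summed over the batch axis. You are not calling `loss.backward()`. You *are* the autograd engine.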
+
+This means:
+- **If you add a new layer**, you must implement `backward` yourself — no framework will do it for you.
+- **If you change the forward pass**, you must update backward to match. Every new line in `forward` probably needs a corresponding line in `backward`.
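+- **If backward gives wrong shapes**, you'll usually get a NumPy shape-mismatch error rather than a cryptic autograd graph error (though NumPy broadcasting can occasionally hide a mistake, so check shapes deliberately). You'll learn to think in shapes.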
+- **Every value you cache on `self` in `forward`** (like `self.inputs` or `self.z`) is cached for one reason: `backward` needs it. There is no tape, no graph, no magic — just stored NumPy arrays and chain rule math.
+
+This is the defining educational feature of the library. You can't hand-wave through gradient descent here. You must understand where gradients come from.
+
 ## Try it yourself
 
 Here's how you'd create a custom `MyDense` layer from scratch:

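Since there is no autograd to trust, a common way to verify a hand-written `backward` is a numerical gradient check: nudge each entry by a small epsilon, measure how a scalar loss changes, and compare with the analytic gradient. The sketch below wraps the simplified `forward`/`backward` from the No Autograd section (no activation) in a hypothetical `TinyDense` class, and uses `loss = sum(y * g)` for a fixed random `g`, so that `dL/dy = g` exactly. `TinyDense` and `numerical_grad` are illustrative names, not part of neutro.

```python
import numpy as np


class TinyDense:
    """Hypothetical layer mirroring the forward/backward example above (no activation)."""

    def __init__(self, n_in, n_out, rng):
        self.params = {"W": rng.standard_normal((n_in, n_out)), "b": rng.standard_normal(n_out)}
        self.grads = {}

    def forward(self, x):
        self.inputs = x  # cached because backward needs it for dL/dW
        return x @ self.params["W"] + self.params["b"]

    def backward(self, grad_output):
        self.grads["W"] = self.inputs.T @ grad_output  # dL/dW = x^T @ dL/dy
        self.grads["b"] = np.sum(grad_output, axis=0)  # b was broadcast over the batch
        return grad_output @ self.params["W"].T  # dL/dx = dL/dy @ W^T


def numerical_grad(f, arr, eps=1e-6):
    """Central-difference estimate of df/darr; perturbs arr in place and restores it."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        original = arr[idx]
        arr[idx] = original + eps
        f_plus = f()
        arr[idx] = original - eps
        f_minus = f()
        arr[idx] = original  # restore before moving to the next entry
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
layer = TinyDense(3, 2, rng)
x = rng.standard_normal((4, 3))
g = rng.standard_normal((4, 2))  # fixed upstream gradient: loss = sum(y * g), so dL/dy = g


def loss():
    return np.sum(layer.forward(x) * g)


layer.forward(x)
dx = layer.backward(g)  # analytic gradients from the hand-written backward

errors = {name: np.max(np.abs(numerical_grad(loss, layer.params[name]) - layer.grads[name]))
          for name in ("W", "b")}
errors["x"] = np.max(np.abs(numerical_grad(loss, x) - dx))
print(errors)  # every entry should be tiny (far below 1e-6)
```

If one of these errors were large, it would point at the specific `self.grads` line (or the returned input gradient) that no longer matches `forward`.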
diff --git a/docs/layers/core/core_utility_layers.md b/docs/layers/core/core_utility_layers.md
@@ -60,6 +60,15 @@ def backward(self, grad_output):
 
 🔍 **Line `if self.mask is None`**: If we never called forward (or called it with `training=False`), there's no mask. In that case, the gradient passes through unchanged — just like the forward pass.
 
+#### `compute_output_shape`
+
+```python
+def compute_output_shape(self, input_shape):
+    return input_shape
+```
+
+Dropout does not change tensor rank or dimensions; it only zeroes out (and rescales) individual values during training. So the output shape is always identical to the input shape.
+
 ---
 
 ## Flatten — `neutro/layers/core/flatten.py`

diff --git a/neutro/layers/core/dropout.py b/neutro/layers/core/dropout.py
@@ -9,10 +9,14 @@ def __init__(self, rate, **kwargs):
 
     def forward(self, inputs, training=False):
         if not training or self.rate == 0:
+            self.mask = None
             return inputs
         self.mask = np.random.binomial(1, 1 - self.rate, size=inputs.shape) / (1 - self.rate)
         return inputs * self.mask
 
+    def compute_output_shape(self, input_shape):
+        return input_shape
+
     def backward(self, grad_output):
         if self.mask is None:
             return grad_output

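The `forward` above implements inverted dropout: the mask is drawn from a Bernoulli distribution with keep probability `1 - rate` and divided by `1 - rate`, so the expected output equals the input and nothing needs rescaling at inference time. The standalone sketch below uses a hypothetical `InvertedDropout` class (not neutro's `Dropout`) with a seeded NumPy `Generator` to check that property empirically and to show that `backward` reuses the same mask.

```python
import numpy as np


class InvertedDropout:
    """Standalone sketch of the same masking logic as neutro's Dropout."""

    def __init__(self, rate, seed=None):
        self.rate = rate
        self.mask = None
        self.rng = np.random.default_rng(seed)

    def forward(self, inputs, training=False):
        if not training or self.rate == 0:
            self.mask = None  # nothing to undo in backward
            return inputs
        keep = 1 - self.rate
        # Bernoulli(keep) mask, pre-scaled so that E[inputs * mask] == inputs
        self.mask = self.rng.binomial(1, keep, size=inputs.shape) / keep
        return inputs * self.mask

    def backward(self, grad_output):
        # y = x * mask, with mask treated as a constant, so dL/dx = dL/dy * mask
        if self.mask is None:
            return grad_output
        return grad_output * self.mask

    def compute_output_shape(self, input_shape):
        return input_shape  # elementwise op: shape is unchanged


layer = InvertedDropout(rate=0.3, seed=0)
x = np.ones((2000, 50))
y = layer.forward(x, training=True)
print(y.mean())  # close to 1.0, because kept units are scaled by 1 / 0.7
grad = layer.backward(np.ones_like(x))
print(np.array_equal(grad, y))  # True: with all-ones input and upstream gradient, both equal the mask
```

Because the scaling happens during training, the inference path can simply return `inputs` untouched, which is exactly what the `training=False` branch does.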
diff --git a/tests/engine/test_node.py b/tests/engine/test_node.py
@@ -0,0 +1,46 @@
+import numpy as np
+from neutro.engine.node import KerasTensor, Node
+
+
+class FakeLayer:
+    def __init__(self):
+        self.name = "fake_layer"
+
+
+def test_keras_tensor_repr():
+    t = KerasTensor(shape=(None, 32, 32, 3), name="input")
+    r = repr(t)
+    assert "KerasTensor" in r
+    assert "(None, 32, 32, 3)" in r
+    assert "input" in r
+
+
+def test_node_single_output():
+    layer = FakeLayer()
+    output = KerasTensor(shape=(None, 10), name="output")
+    node = Node(layer, input_tensors=[], output_tensors=output)
+
+    assert node.layer is layer
+    assert node.output_tensors is output
+    assert output.node is node
+    assert layer._inbound_nodes == [node]
+
+
+def test_node_list_output():
+    layer = FakeLayer()
+    out1 = KerasTensor(shape=(None, 5))
+    out2 = KerasTensor(shape=(None, 3))
+    node = Node(layer, input_tensors=[], output_tensors=[out1, out2])
+
+    assert node.output_tensors == [out1, out2]
+    assert out1.node is node
+    assert out2.node is node
+
+
+def test_node_repr():
+    layer = FakeLayer()
+    output = KerasTensor(shape=(None, 10))
+    node = Node(layer, input_tensors=[], output_tensors=output)
+    r = repr(node)
+    assert "Node" in r
+    assert "fake_layer" in r
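
The tests above pin down how a `Node` links a layer to the tensors it produces. One minimal way to satisfy them is sketched below. It is hypothetical (the real `neutro/engine/node.py` may differ) and reproduces only the behaviour the assertions check: each output tensor gets a back-pointer to its node, and the layer records the node in `_inbound_nodes`, creating that list when the layer lacks it (as `FakeLayer` does). `SketchKerasTensor` and `SketchNode` are illustrative names.

```python
class SketchKerasTensor:
    """Hypothetical symbolic tensor: a shape, an optional name, and the node that made it."""

    def __init__(self, shape, name=None):
        self.shape = tuple(shape)
        self.name = name
        self.node = None  # filled in by the node that produces this tensor

    def __repr__(self):
        return f"KerasTensor(shape={self.shape}, name={self.name})"


class SketchNode:
    """Hypothetical graph edge: records which layer turned which inputs into which outputs."""

    def __init__(self, layer, input_tensors, output_tensors):
        self.layer = layer
        self.input_tensors = input_tensors
        self.output_tensors = output_tensors
        # Outputs may be a single tensor or a list; point each one back at this node.
        outputs = output_tensors if isinstance(output_tensors, list) else [output_tensors]
        for tensor in outputs:
            tensor.node = self
        # Layers like FakeLayer have no _inbound_nodes attribute yet, so create it on first use.
        if not hasattr(layer, "_inbound_nodes"):
            layer._inbound_nodes = []
        layer._inbound_nodes.append(self)

    def __repr__(self):
        return f"Node(layer={self.layer.name})"


class DemoLayer:
    name = "demo"


out = SketchKerasTensor((None, 10), name="output")
node = SketchNode(DemoLayer(), input_tensors=[], output_tensors=out)
print(out.node is node, repr(node))  # True Node(layer=demo)
```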
diff --git a/tests/layers/core/test_dropout.py b/tests/layers/core/test_dropout.py
@@ -1,17 +1,137 @@
 import numpy as np
+import pytest
 from neutro.layers.core.dropout import Dropout
+from neutro.models.base_model import Sequential
 
-def test_dropout():
+
+def test_dropout_inference():
     layer = Dropout(0.5)
     x = np.random.rand(10, 10)
-
-    # Inference
+
     out_inf = layer.forward(x, training=False)
     assert np.all(out_inf == x)
-
-    # Training
+
+
+def test_dropout_training():
+    layer = Dropout(0.5)
+    x = np.random.rand(10, 10)
+
     out_train = layer.forward(x, training=True)
     assert not np.all(out_train == x)
-
-    grad = layer.backward(np.random.rand(10, 10))
-    assert grad.shape == (10, 10)
+
+
+def test_dropout_rate_zero():
+    layer = Dropout(0.0)
+    x = np.random.rand(10, 10)
+
+    out = layer.forward(x, training=True)
+    assert np.all(out == x)
+
+    grad = np.random.rand(10, 10)
+    dx = layer.backward(grad)
+    assert np.all(dx == grad)
+
+
+def test_dropout_1d_input():
+    layer = Dropout(0.5)
+    x = np.random.rand(20)
+
+    out = layer.forward(x, training=True)
+    assert out.shape == (20,)
+    assert not np.all(out == x)
+
+    grad = np.random.rand(20)
+    dx = layer.backward(grad)
+    assert dx.shape == (20,)
+
+
+def test_dropout_3d_input():
+    layer = Dropout(0.3)
+    x = np.random.rand(4, 16, 64)
+
+    out = layer.forward(x, training=True)
+    assert out.shape == (4, 16, 64)
+
+    grad = np.random.rand(4, 16, 64)
+    dx = layer.backward(grad)
+    assert dx.shape == (4, 16, 64)
+
+
+def test_dropout_statistics():
+    layer = Dropout(0.5)
+    x = np.ones((1000, 100))
+
+    out = layer.forward(x, training=True)
+    zero_fraction = np.mean(out == 0)
+    assert 0.45 < zero_fraction < 0.55
+
+
+def test_dropout_backward_inference():
+    layer = Dropout(0.5)
+    x = np.random.rand(10, 10)
+    grad = np.random.rand(10, 10)
+
+    layer.forward(x, training=True)
+    layer.forward(x, training=False)
+    dx = layer.backward(grad)
+    assert np.all(dx == grad)
+
+
+def test_dropout_backward_values():
+    layer = Dropout(0.5)
+    x = np.ones((10, 10))
+    grad = np.ones((10, 10))
+
+    layer.forward(x, training=True)
+
+    dx = layer.backward(grad)
+    expected_dx = np.where(layer.mask > 0, 2.0, 0.0)  # kept units scaled by 1 / (1 - 0.5), dropped units zeroed
+    np.testing.assert_allclose(dx, expected_dx)
+
+
+def test_dropout_backward_no_forward():
+    layer = Dropout(0.5)
+
+    grad = np.random.rand(10, 10)
+    dx = layer.backward(grad)
+    assert np.all(dx == grad)
+
+
+def test_dropout_compute_output_shape():
+    layer = Dropout(0.5)
+
+    shape = layer.compute_output_shape((None, 32))
+    assert shape == (None, 32)
+
+    shape = layer.compute_output_shape((16, 32))
+    assert shape == (16, 32)
+
+    shape = layer.compute_output_shape((None, 16, 64))
+    assert shape == (None, 16, 64)
+
+
+def test_dropout_in_sequential_model():
+    model = Sequential([
+        Dropout(0.5),
+        Dropout(0.3),
+        Dropout(0.0),
+    ])
+    x = np.random.rand(8, 32)
+
+    out = model.forward(x, training=True)
+    assert out.shape == (8, 32)
+
+    out_inf = model.forward(x, training=False)
+    assert np.all(out_inf == x)
+
+
+def test_dropout_mask_recreated_each_forward():
+    layer = Dropout(0.5)
+    x = np.ones((100, 100))
+
+    out1 = layer.forward(x, training=True)
+    mask1 = (out1 != 0).astype(float)
+    out2 = layer.forward(x, training=True)
+    mask2 = (out2 != 0).astype(float)
+
+    assert not np.all(mask1 == mask2)
diff --git a/tests/layers/core/test_input_layer.py b/tests/layers/core/test_input_layer.py
@@ -0,0 +1,52 @@
+import numpy as np
+import pytest
+from neutro.layers.core.input_layer import InputLayer, Input
+from neutro.engine.node import KerasTensor
+
+
+def test_input_layer_forward():
+    layer = InputLayer(input_shape=(4,))
+    out = layer.forward(np.array([1, 2, 3, 4]))
+    assert np.array_equal(out, np.array([1, 2, 3, 4]))
+
+
+def test_input_layer_backward():
+    layer = InputLayer(input_shape=(4,))
+    grad = layer.backward(np.array([0.1, 0.2, 0.3, 0.4]))
+    assert np.array_equal(grad, np.array([0.1, 0.2, 0.3, 0.4]))
+
+
+def test_input_layer_build_immediate():
+    layer = InputLayer(input_shape=(28, 28, 1))
+    assert layer.built
+    assert layer.input_shape == (28, 28, 1)
+
+
+def test_input_layer_build_explicit():
+    layer = InputLayer()
+    layer.build((None, 28, 28, 1))
+    assert layer.built
+    assert layer.input_shape == (None, 28, 28, 1)
+
+
+def test_input_no_shape_raises():
+    with pytest.raises(ValueError, match="Please provide a shape"):
+        Input(shape=None)
+
+
+def test_input_with_list_shape():
+    tensor = Input(shape=[28, 28, 1])
+    assert isinstance(tensor, KerasTensor)
+    assert tensor.shape == (None, 28, 28, 1)
+
+
+def test_input_with_tuple_shape():
+    tensor = Input(shape=(28, 28, 1))
+    assert isinstance(tensor, KerasTensor)
+    assert tensor.shape == (None, 28, 28, 1)
+
+
+def test_input_with_batch_shape():
+    tensor = Input(shape=(None, 28, 28, 1))
+    assert isinstance(tensor, KerasTensor)
+    assert tensor.shape == (None, 28, 28, 1)
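
The `Input` tests imply three shape rules: `None` is rejected, lists are converted to tuples, and a batch dimension of `None` is prepended unless the shape already starts with `None`. The helper below is a hypothetical sketch of those rules (`normalize_input_shape` is not a neutro function). If neutro does follow them, note one consequence: `Input(shape=(None, 10))` would come out 2-D here, whereas `keras.Input` never includes the batch axis in `shape` and would produce a tensor of shape `(None, None, 10)`.

```python
def normalize_input_shape(shape):
    """Hypothetical helper reproducing the shape rules asserted by the tests above."""
    if shape is None:
        raise ValueError("Please provide a shape, e.g. Input(shape=(28, 28, 1)).")
    shape = tuple(shape)  # lists and tuples are both accepted
    if len(shape) > 0 and shape[0] is None:
        return shape  # a leading None is read as an explicit batch dimension
    return (None,) + shape  # otherwise prepend an unknown batch dimension


print(normalize_input_shape([28, 28, 1]))        # (None, 28, 28, 1)
print(normalize_input_shape((None, 28, 28, 1)))  # (None, 28, 28, 1)
```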