test(autogram): Add extra test for batched equivalence #445
ValerianRey wants to merge 3 commits into main from
Conversation
I just investigated, and having a 2nd engine makes the result for
The problem can be reproduced even without having two engines.

```python
engine_none = Engine(model.modules(), batch_dim=None)
inputs = make_tensors(batch_size, input_shapes)
targets = make_tensors(batch_size, output_shapes)
loss_fn = make_mse_loss_fn(targets)

# Call the model's forward pass twice, to make the hook run twice instead of once
torch.random.manual_seed(0)  # Fix randomness for random models
output = model(inputs)
losses = reduce_to_vector(loss_fn(output))

torch.random.manual_seed(0)  # Fix randomness for random models
output = model(inputs)
losses = reduce_to_vector(loss_fn(output))

gramian_none = engine_none.compute_gramian(losses)
```

Having two engines (where one does an extra forward pass first) is just a convoluted way of calling the model's forward pass twice, which results in the same bug.
A solution is to reset the state of the engine between the two forward passes. We don't want the first hook call to have any effect on the state of the engine. Should we add a public
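For illustration, a minimal sketch of that fix applied to the reproduction above — `reset` is a hypothetical method name, not an existing API:

```python
torch.random.manual_seed(0)  # Fix randomness for random models
output = model(inputs)  # First forward pass: the hooks fire once
losses = reduce_to_vector(loss_fn(output))

engine_none.reset()  # Hypothetical: discard state accumulated by the first hook calls

torch.random.manual_seed(0)  # Fix randomness for random models
output = model(inputs)  # Second forward pass now starts from a clean engine state
losses = reduce_to_vector(loss_fn(output))
gramian_none = engine_none.compute_gramian(losses)
```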
Codecov Report: ✅ All modified and coverable lines are covered by tests.
I'm not sure this is such a good test tbh, I'll probably not merge this.
I like this test. I'm not sure I understand exactly what the problem is with the state. Could we have the reset in the hook? Like at the beginning?
We don't wanna reset at every hook: we wanna reset before the model's forward pass. So we'd need a model hook for that (which would change the engine constructor), or we could use a context manager. It would be something like:

```python
engine = ...
with engine.activate_hooks():
    output = ...
    losses = ...
gramian = engine.compute_gramian(losses)
```

The hooks would thus be manually activated, rather than always activated except during the gramian computation phase. I'm not a big fan of that either, but it's not that bad. IMO we shouldn't change anything for now, but we should still think about it a bit.
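As a rough sketch of what such a context manager could look like as an `Engine` method — `_reset_state`, `_enable_hooks`, and `_disable_hooks` are assumed internals, not existing APIs:

```python
from contextlib import contextmanager

@contextmanager
def activate_hooks(self):
    # Hypothetical sketch: the three helpers below are assumptions about
    # the engine's internals, not part of the current codebase.
    self._reset_state()    # Forget state accumulated by earlier forward passes
    self._enable_hooks()   # Module hooks only record inside this block
    try:
        yield self
    finally:
        self._disable_hooks()  # Always deactivate hooks on exit, even on error
```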
This test can be quite practical for checking that both engines give the same results. For instance, when working on Transformer, `autogram` is not yet equivalent to `autograd_gramian`, but both engines (`autogram` with `batch_dim=None` and `autogram` with `batch_dim=0`) are equivalent to each other.
It seems that this test does not pass for FreeParam. This is weird, considering that both engines should be equivalent to `autograd_gramian`. We may have a bug here. It could also be caused by having two engines, but even then, that would be unexpected behavior.
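A minimal sketch of what such an equivalence check could look like, reusing the helpers from the snippets above (`Engine`, `make_tensors`, `make_mse_loss_fn`, `reduce_to_vector`); note that instantiating two engines on the same model is exactly the interaction discussed earlier, so the real test may need to keep the engines more isolated:

```python
import torch

engine_none = Engine(model.modules(), batch_dim=None)
engine_batched = Engine(model.modules(), batch_dim=0)

inputs = make_tensors(batch_size, input_shapes)
targets = make_tensors(batch_size, output_shapes)
loss_fn = make_mse_loss_fn(targets)

torch.random.manual_seed(0)  # Fix randomness for random models
losses_none = reduce_to_vector(loss_fn(model(inputs)))
gramian_none = engine_none.compute_gramian(losses_none)

torch.random.manual_seed(0)  # Same seed, so both runs see the same model
losses_batched = reduce_to_vector(loss_fn(model(inputs)))
gramian_batched = engine_batched.compute_gramian(losses_batched)

# Whichever batch_dim each engine uses, the resulting Gramians should match.
torch.testing.assert_close(gramian_batched, gramian_none)
```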