Using the edges from the retina, compare the current activations with the previous frame's activations inside small kernels. Within each kernel, you should be able to tell whether the average position of the activation has moved up, down, left, or right relative to the previous frame.
Then you can draw a line segment for each of these translations. For a pure rotation, the lines perpendicular to these segments (through their midpoints) should converge at the axis of rotation, and the size of each shift relative to its distance from the axis should give the amount of rotation.
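
A minimal sketch of the two steps above, assuming NumPy, non-negative 2-D activation arrays, and non-overlapping k x k kernels. The names patch_centroid_shifts and estimate_rotation are hypothetical, not from any existing code. It also assumes the lines that meet at the axis are the ones perpendicular to each shift through its midpoint, and finds the axis by least squares.

```python
import numpy as np

def patch_centroid_shifts(prev, curr, k=8):
    """Compare activation centroids per k x k kernel between two frames.

    Returns (points, shifts): the previous centroid of each active kernel
    in image coordinates (x, y), and how far the centroid moved.
    """
    h, w = prev.shape
    ys, xs = np.mgrid[0:k, 0:k]
    points, shifts = [], []
    for y0 in range(0, h - k + 1, k):
        for x0 in range(0, w - k + 1, k):
            a = prev[y0:y0 + k, x0:x0 + k]
            b = curr[y0:y0 + k, x0:x0 + k]
            sa, sb = a.sum(), b.sum()
            if sa <= 0 or sb <= 0:
                continue  # no activation to compare in this kernel
            ca = np.array([(xs * a).sum(), (ys * a).sum()]) / sa
            cb = np.array([(xs * b).sum(), (ys * b).sum()]) / sb
            points.append([x0 + ca[0], y0 + ca[1]])
            shifts.append(cb - ca)
    return np.array(points), np.array(shifts)

def estimate_rotation(points, shifts):
    """Estimate the rotation center and angle from local translations.

    Each shift d at point p gives one constraint on the center c:
    d . (c - (p + d/2)) = 0, i.e. c lies on the perpendicular bisector.
    """
    mid = points + 0.5 * shifts
    rhs = np.einsum("ij,ij->i", shifts, mid)
    center, *_ = np.linalg.lstsq(shifts, rhs, rcond=None)
    r0 = points - center            # radius vectors before the motion
    r1 = points + shifts - center   # radius vectors after the motion
    cross = r0[:, 0] * r1[:, 1] - r0[:, 1] * r1[:, 0]
    dot = np.einsum("ij,ij->i", r0, r1)
    # Positive angle turns x toward y (clockwise on screen if y points down).
    angle = np.arctan2(cross.sum(), dot.sum())
    return center, angle
```

A possible use is `center, angle = estimate_rotation(*patch_centroid_shifts(prev, curr))`. Note that a centroid comparison cannot see motion along an edge, only across it, so the shifts from real edge maps will be noisier than the idealized case.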

Further improvement: use small kernels to detect small changes, and blurred, larger kernels to detect larger changes.
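
One way this could look, as a rough sketch: blur and shrink both frames by averaging f x f blocks, then reuse the same small kernel on the shrunken frames so that it covers a larger area of the original. This assumes NumPy and non-negative activations; block_blur, centroid, and multiscale_shift are hypothetical names, and block averaging stands in for whatever blur is actually intended.

```python
import numpy as np

def block_blur(img, f):
    """Average f x f blocks: a box blur followed by downsampling by f."""
    h, w = img.shape
    h, w = h - h % f, w - w % f  # crop so the frame divides evenly
    return img[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def centroid(a):
    """Activation-weighted mean position (x, y) inside a patch."""
    ys, xs = np.mgrid[0:a.shape[0], 0:a.shape[1]]
    return np.array([(xs * a).sum(), (ys * a).sum()]) / a.sum()

def multiscale_shift(prev, curr, y, x, k=4, factors=(1, 2, 4)):
    """Shift of the kernel containing pixel (y, x), finest scale first.

    Returns (factor, shift in original pixels), or (None, None) if no
    scale has activation in the kernel in both frames.
    """
    for f in factors:
        p, c = block_blur(prev, f), block_blur(curr, f)
        y0 = (y // f) // k * k  # kernel origin on the coarse grid
        x0 = (x // f) // k * k
        a = p[y0:y0 + k, x0:x0 + k]
        b = c[y0:y0 + k, x0:x0 + k]
        if a.sum() > 0 and b.sum() > 0:
            return f, f * (centroid(b) - centroid(a))
    return None, None
```

Small motions are resolved at factor 1 with full precision. Larger motions fall through to a coarser factor, where the estimate is only accurate to about one coarse pixel.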