runame
left a comment
LGTM. One question: are there not some cases where this can be more efficient, i.e. when batch_size * seq_len >> feature_dim?
Yes, you are right that there are different pros and cons. I tried to sum them up in the following note:

Actually, having done the analysis, I think we need to discuss whether we should aim for simplicity by supporting only

For now, I will not merge and will label this as 'needs discussion'.
@yorkerlin please add this to the agenda for our next in-person discussion.
Resolves #37.