Hi, I find your code very useful. However, 'l-bfgs' seems to outperform 'sgd' consistently, which seems counter-intuitive to me. One possible reason I have in mind is that 'sgd' does not include a momentum term to accumulate past gradients. I would like to add that to your code and perhaps submit it to be merged. Is that OK with you?
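
To make the proposal concrete, here is a rough sketch of the classical momentum update I have in mind. It is not taken from your code: the function name `sgd_momentum_step`, the list-based parameters, and the default `lr` and `momentum` values are all placeholders I made up for illustration, and the real change would have to fit however 'sgd' is structured in your code.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One classical-momentum update (hypothetical helper, not from the repo).

    velocity accumulates a decaying sum of past gradients:
        v <- momentum * v - lr * g
        p <- p + v
    Returns the new (params, velocity) as plain lists.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def grad(w):
    # Gradient of the toy objective f(w) = w0^2 + 10 * w1^2 (illustration only).
    return [2.0 * w[0], 20.0 * w[1]]


w = [1.0, 1.0]
v = [0.0, 0.0]  # velocity starts at zero, so the first step is plain SGD
for _ in range(200):
    w, v = sgd_momentum_step(w, grad(w), v, lr=0.01, momentum=0.9)
```

Setting `momentum=0.0` reduces this to plain gradient descent, so the existing behaviour could stay the default if you prefer.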