About the gradient descent #2

@queqichao

Hi, I think your code is very useful. However, 'l-bfgs' seems to outperform 'sgd' consistently, which seems counter-intuitive to me. One possible reason is that the 'sgd' option does not use momentum to accumulate past gradients. I would like to add momentum to your code and submit the change for merging. Would that be OK with you?
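
To make the proposal concrete, here is a minimal sketch of a classical momentum update in plain Python. It is not taken from this repository: the names `sgd_momentum_step` and `minimize_quadratic`, the default `lr` and `momentum` values, and the toy quadratic objective are all assumptions chosen for illustration.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.01, momentum=0.9):
    """One classical-momentum update on flat lists of floats (hypothetical helper).

    The velocity is a decaying sum of past gradients, so steps along a
    consistent descent direction build up over successive iterations.
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def minimize_quadratic(steps=300, lr=0.01, momentum=0.9):
    """Toy example: minimize f(x, y) = x**2 + 10 * y**2 starting from (1, 1)."""
    params = [1.0, 1.0]
    velocity = [0.0, 0.0]
    for _ in range(steps):
        # Analytic gradient of the toy objective; a real 'sgd' option would
        # use a mini-batch gradient of the training loss here instead.
        grads = [2.0 * params[0], 20.0 * params[1]]
        params, velocity = sgd_momentum_step(params, grads, velocity, lr, momentum)
    return params
```

Setting `momentum=0.0` reduces the step to plain gradient descent, so the existing behaviour could stay available as a special case if the update were added as an option.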
