training question

Hello, I have a question about the training setup in this paper. The paper mentions that training runs for 300 epochs in total and is divided into an Adam optimization phase and an SGD phase. In my own training runs, however, it seems difficult to enter the SGD phase because the loss keeps oscillating. Did you encounter this situation when you trained the model? I would greatly appreciate an answer.

training question #9
