Hello, I have a question about the training constraints in this paper. The paper states that training runs for 300 epochs in total, split into an Adam optimization phase and an SGD phase. In my own training, however, it seems difficult to enter SGD mode because the loss oscillates. Did you run into this situation as well? I would greatly appreciate an answer.