My question is about the grid search (Table 3) in your experiment on learning non-robust features from the relabeled non-robust set.
When training on the non-robust set, what validation data did you use to select the best model hyper-parameters?
Did you split the non-robust set into training and validation sets to choose the best hyper-parameters, or did you use part of the original images?
I am confused because I don't understand how using part of the non-robust set as a validation set could lead to good test accuracy.
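
To make the two options concrete, here is a minimal sketch of what I mean. Everything in it is an assumption for illustration: the helper `split_indices`, the set size of 50000, and the 10% validation fraction are hypothetical and not taken from the paper or its code.

```python
import random


def split_indices(n, val_fraction=0.1, seed=0):
    """Shuffle the indices 0..n-1 and split them into (train, validation)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_val = round(n * val_fraction)
    return idx[n_val:], idx[:n_val]


# Assumed size of the relabeled non-robust set (hypothetical value).
n_nonrobust = 50000

# Option A: hold out part of the non-robust set itself for validation.
# Validation labels are then the relabeled (adversarial target) labels.
train_idx, val_idx = split_indices(n_nonrobust, val_fraction=0.1)

# Option B: train on the whole non-robust set and validate on a held-out
# part of the original images with their true labels.
train_idx_b = list(range(n_nonrobust))
```

Under Option A the validation accuracy measures agreement with the relabeled targets, while under Option B it measures accuracy on the original labels, which is why I would like to know which one was used.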