The Transformer structure in your code uses the PostNorm form:
query_feat = self.norm1(self.mixing(sampled_feat, query_feat))
query_feat = self.norm2(self.self_attn(query_points, query_feat))
query_feat = self.norm3(self.ffn(query_feat))
Have you tried the PreNorm form? Would it perform better?
Is PostNorm the more common choice in the 3D vision field?
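
To make the question concrete, here is a minimal, framework-agnostic sketch of the two orderings as I understand them. It assumes the residual connection is written explicitly, which may not match how `mixing`, `self_attn`, and `ffn` handle it in your code. `layer_norm`, `post_norm_step`, `pre_norm_step`, and `toy_sublayer` are hypothetical names used only for illustration, not functions from your repository.

```python
from math import sqrt


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / sqrt(var + eps) for v in x]


def post_norm_step(x, sublayer):
    # PostNorm: normalize after the residual sum, x <- LN(x + f(x))
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def pre_norm_step(x, sublayer):
    # PreNorm: normalize only the sublayer input, x <- x + f(LN(x))
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def toy_sublayer(x):
    # Hypothetical stand-in for mixing / self_attn / ffn (assumption, not the real module)
    return [0.5 * v for v in x]


x = [1.0, 2.0, 4.0]
post = post_norm_step(x, toy_sublayer)  # output is re-normalized at every step
pre = pre_norm_step(x, toy_sublayer)    # residual stream is left un-normalized
```

In the PostNorm step the output is always re-normalized, while in the PreNorm step the residual path passes through untouched, so a PreNorm stack would usually need one extra normalization after the last layer.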