Dear authors,
First of all, I’d like to express my appreciation for your great work on this project. The ideas are clear and the implementation is excellent — it’s been very inspiring.
I have a technical question about the joint training process: how do you ensure that the semantic IDs (SIDs) assigned to items are collision-free? As I understand it, a collision occurs when two different items are mapped to exactly the same SID. That SID then no longer identifies a single item, which conflicts with the requirement in generative retrieval that each generated ID resolves to exactly one item. Is the collision-free property guaranteed by some mechanism during training, or are collisions resolved in a post-processing step after training?
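To make the question concrete, here is a minimal sketch of what I mean by a collision and by post-processing. It is not taken from your code: the function names `find_collisions` and `resolve_with_suffix`, the toy item names, and the SID values are all hypothetical, and appending an extra disambiguating token is only one possible post-hoc fix that I am assuming for illustration.

```python
from collections import defaultdict


def find_collisions(item_to_sid):
    """Group items by SID and return only the SIDs shared by more than one item."""
    groups = defaultdict(list)
    for item, sid in item_to_sid.items():
        groups[tuple(sid)].append(item)
    return {sid: items for sid, items in groups.items() if len(items) > 1}


def resolve_with_suffix(item_to_sid):
    """Hypothetical post-processing: append a per-SID counter as an extra token."""
    counters = defaultdict(int)
    resolved = {}
    for item in sorted(item_to_sid):  # sorted so the assignment is deterministic
        sid = tuple(item_to_sid[item])
        resolved[item] = sid + (counters[sid],)
        counters[sid] += 1
    return resolved


# Toy data (made up): item_a and item_b share the same SID.
item_to_sid = {
    "item_a": (3, 17, 5),
    "item_b": (3, 17, 5),
    "item_c": (8, 2, 40),
}

collisions = find_collisions(item_to_sid)
resolved = resolve_with_suffix(item_to_sid)
print(collisions)
print(resolved)
```

In this sketch every item ends up with a unique ID, but only because of the extra token added after training, so I would like to know whether your method needs a step like this or avoids collisions by construction.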
It would be greatly appreciated if you could briefly share your design thinking on this point.
Looking forward to your reply.