Question about how SID collisions are avoided during joint training #2

Dear authors,

First of all, thank you for your great work on this project. The ideas are clear and the implementation is excellent; it has been very inspiring.

I have a technical question about the joint training process: how do you ensure that the semantic IDs (SIDs) assigned to items are collision-free? As I understand it, if two different items are mapped to exactly the same SID, that SID no longer identifies a single item, which conflicts with the unique item-to-identifier mapping that generative retrieval requires. Is the collision-free property guaranteed by some mechanism during training, or are collisions resolved in a post-processing step after training?
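To make the question concrete, here is a minimal sketch of what I mean by a collision and by a post-processing fix. It is not taken from your code: the `item_to_sid` mapping, the function names, and the toy SIDs are all hypothetical. The fix shown, appending an extra disambiguating token to each colliding SID, is only one option I am aware of and may not be what you do.

```python
from collections import defaultdict


def find_collisions(item_to_sid):
    """Group item ids by SID and return only the SIDs shared by more than one item."""
    groups = defaultdict(list)
    for item, sid in item_to_sid.items():
        groups[tuple(sid)].append(item)
    return {sid: items for sid, items in groups.items() if len(items) > 1}


def resolve_by_suffix(item_to_sid):
    """Append a per-SID counter as an extra token so every resulting SID is unique."""
    counters = defaultdict(int)
    resolved = {}
    for item, sid in item_to_sid.items():
        key = tuple(sid)
        resolved[item] = key + (counters[key],)
        counters[key] += 1
    return resolved


# Hypothetical toy mapping: item_a and item_b share the same SID.
item_to_sid = {
    "item_a": (3, 17, 5),
    "item_b": (3, 17, 5),
    "item_c": (3, 2, 9),
}

collisions = find_collisions(item_to_sid)
unique = resolve_by_suffix(item_to_sid)
```

In this toy case `collisions` contains the single shared SID `(3, 17, 5)`, and after `resolve_by_suffix` each item has a distinct SID that is one token longer. My question is whether your pipeline needs a step like this, or whether training already prevents the shared SID from arising.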

It would be greatly appreciated if you could briefly share your design thinking on this point.

Looking forward to your reply.
