Question about how SID collisions are avoided during joint training #2

@WSM123123

Dear authors,

First of all, I’d like to express my appreciation for your great work on this project. The ideas are clear and the implementation is excellent — it’s been very inspiring.

I have a technical question about the joint training process: how do you ensure that the semantic IDs (SIDs) assigned to items are completely collision-free? As I understand it, a collision occurs when two different items are mapped to exactly the same SID. That SID then no longer identifies a unique item, which contradicts the requirement of generative retrieval. Is the collision-free property guaranteed during training by some mechanism, or are collisions resolved in a post-processing step after training?
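To make the question concrete, here is a minimal sketch of what I mean by a collision and by a post-processing step. It is not taken from this repository: the function names `find_collisions` and `resolve_by_suffix`, the toy SIDs, and the choice of appending an extra disambiguating token are all my own assumptions, shown only as one possible way to resolve collisions after training.

```python
from collections import defaultdict


def find_collisions(item_to_sid):
    """Group items by SID and return only the SIDs shared by two or more items."""
    groups = defaultdict(list)
    for item, sid in item_to_sid.items():
        groups[tuple(sid)].append(item)
    return {sid: items for sid, items in groups.items() if len(items) > 1}


def resolve_by_suffix(item_to_sid):
    """Hypothetical post-processing: append one extra index token per item.

    Items sharing a SID receive suffixes 0, 1, 2, ... in insertion order,
    so the extended SIDs are unique.
    """
    counters = defaultdict(int)
    resolved = {}
    for item, sid in item_to_sid.items():
        key = tuple(sid)
        resolved[item] = key + (counters[key],)
        counters[key] += 1
    return resolved


# Toy example (made-up SIDs): item_a and item_b collide.
sids = {
    "item_a": (3, 17, 5),
    "item_b": (3, 17, 5),
    "item_c": (8, 2, 41),
}
collisions = find_collisions(sids)
unique_sids = resolve_by_suffix(sids)
```

In this toy case `find_collisions` reports that `item_a` and `item_b` share `(3, 17, 5)`, and after `resolve_by_suffix` every item has a distinct, one-token-longer SID. I would like to know whether your method needs something like this, or whether training already rules such collisions out.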

It would be greatly appreciated if you could briefly share your design thinking on this point.

Looking forward to your reply.
