Skip to content

Conversation

@kmontemayor2-sc
Copy link
Collaborator

Scope of work done

Add server-side util so we can remotely fetch the training input.

Since this is kind of a big PR not adding the client-side equivalent in this one :P

Again this is server-side code, and it's really meant to be called by users.

Where is the documentation for this feature?: N/A

Did you add automated tests or write a test plan?

Updated Changelog.md? NO

Ready for code review?: NO

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle! Did a pass here and left some comments/questions



def get_training_input(
split: Union[Literal["train", "val", "test"], str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sufficient? What if we want "all" nodes (i.e. dataset.node_ids)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That wouldn't be "training"input" would it? We can add another function to do that in the future (get_all_nodes?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be for the random negative loader for link prediction training, which would need some dataset.node_ids or equivalent.

We can add another function to do that in the future (get_all_nodes?)

We wouldn't need a whole different function, we could just specify some 'all' split and if its that we use _dataset.node_ids.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, I do think that the tuple[Tensor, Tensor, Tensor | None] is important for the ABLP.

I guess I could rename this to get_ablp_input or something? Would that ameliorate your concerns?

Copy link
Collaborator

@mkolodner-sc mkolodner-sc Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline, renaming will be fine here for this function, and in a follow-up we will refactor the get_node_ids_on_rank utility so that it can be used to split, making it extendable for the SNC use case, and can be called in this function to reduce the duplicity. Can we add a TODO here in the meantime?



def get_training_input(
split: Union[Literal["train", "val", "test"], str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be for the random negative loader for link prediction training, which would need some dataset.node_ids or equivalent.

We can add another function to do that in the future (get_all_nodes?)

We wouldn't need a whole different function, we could just specify some 'all' split and if its that we use _dataset.node_ids.

Copy link
Collaborator

@mkolodner-sc mkolodner-sc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Kyle for the work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants