[FEA]: Add "existing-cluster" run_mode to create_ingestor() #1676

@randerzander

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Significant improvement

Please provide a clear description of problem this feature solves

Today we have "batch" and "inprocess" run modes.

Some users want to connect to an existing Ray cluster, whether it is running on the local machine, under Slurm, or on a Kubernetes cluster.

The experimental "online" mode would probably be better named "existing-cluster", which fits those scenarios more aptly.
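For reference, attaching to a running cluster is mostly a matter of which `address` is passed to `ray.init()`. The sketch below is only an illustration: `ray_init_kwargs` and `connect_to_ray` are hypothetical helper names, not part of `create_ingestor()` or nemo_retriever, and the sketch assumes Ray is installed wherever `connect_to_ray` is actually called.

```python
from typing import Any, Dict, Optional


def ray_init_kwargs(address: Optional[str] = None) -> Dict[str, Any]:
    """Build keyword arguments for ray.init() (hypothetical helper).

    address=None        -> pass no address and let Ray apply its default resolution
    address="auto"      -> attach to an existing cluster; Ray raises if none is found
    address="ray://..." -> connect to a remote head node through Ray Client
    """
    kwargs: Dict[str, Any] = {}
    if address is not None:
        kwargs["address"] = address
    return kwargs


def connect_to_ray(address: Optional[str] = None) -> None:
    """Connect to (or start) a Ray cluster using the arguments built above."""
    import ray  # imported lazily so this module loads without Ray installed

    ray.init(**ray_init_kwargs(address))
```

With this shape, `connect_to_ray("auto")` would cover a cluster already running on the local machine, while a `ray://` address would cover a head node reachable on Slurm or Kubernetes.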

Describe the feature, and optionally a solution or implementation and any alternatives

We are going to slim down the available choices to make things easier for the user. The choices will be:
"ray" - connects to or creates Ray clusters, both local and remote
"inprocess" - runs pipelines locally, in the current process
"remote" - an API sitting in front of the nemo_retriever pipeline, so that you can send documents to a hosted system

ingestor = create_ingestor(run_mode="batch", ...)

If run_mode is left unset, the new default will be "auto". In "auto" mode, the library decides for the user, based on heuristics about the data, whether to use "ray" or "inprocess" to handle the job.
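One way the proposed modes and the "auto" default could be resolved is sketched below. Everything here is an assumption for illustration: `RunMode`, `parse_run_mode`, `resolve_run_mode`, and the two size cutoffs are hypothetical, and the issue does not say which data heuristics "auto" would actually use.

```python
from enum import Enum
from typing import Optional, Sequence


class RunMode(str, Enum):
    """The run modes proposed in this issue."""

    RAY = "ray"
    INPROCESS = "inprocess"
    REMOTE = "remote"
    AUTO = "auto"


# Hypothetical cutoffs; the real heuristics are not specified in the issue.
MAX_INPROCESS_FILES = 16
MAX_INPROCESS_BYTES = 256 * 1024 * 1024


def parse_run_mode(value: Optional[str]) -> RunMode:
    """Map a user-supplied string to a RunMode; an unset value means "auto"."""
    if value is None:
        return RunMode.AUTO
    try:
        return RunMode(value)
    except ValueError:
        valid = ", ".join(mode.value for mode in RunMode)
        raise ValueError(
            f"unknown run_mode {value!r}; expected one of: {valid}"
        ) from None


def resolve_run_mode(value: Optional[str], file_sizes: Sequence[int]) -> RunMode:
    """Return the mode to run with, replacing "auto" by "ray" or "inprocess"."""
    mode = parse_run_mode(value)
    if mode is not RunMode.AUTO:
        return mode  # an explicit choice is never overridden
    small_job = (
        len(file_sizes) <= MAX_INPROCESS_FILES
        and sum(file_sizes) <= MAX_INPROCESS_BYTES
    )
    return RunMode.INPROCESS if small_job else RunMode.RAY
```

In this sketch "auto" never selects "remote", matching the description that it only chooses between "ray" and "inprocess". Rejecting unknown strings with the list of valid modes would also give users of the older mode names a clear error.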

Additional context

No response
