Description
Is this a new feature, an improvement, or a change to existing functionality?
New Feature
How would you describe the priority of this feature request
Significant improvement
Please provide a clear description of problem this feature solves
Today we have the "batch" and "inprocess" run modes.
Some users want to connect to an existing Ray cluster, whether it is running on the local box, on Slurm, or on a Kubernetes cluster.
The experimental "online" mode would probably be better named "existing-cluster", which more aptly fits those scenarios.
Describe the feature, and optionally a solution or implementation and any alternatives
We are going to slim down the available choices to make things easier for the user. The choices will be:
"ray" - connects to or creates Ray clusters, both local and remote
"inprocess" - handles local, in-process pipelines
"remote" - an API sitting in front of the nemo_retriever pipeline, so that you can send documents to a hosted system
ingestor = create_ingestor(run_mode="batch", ...)
If run_mode is left unset, the new default will be "auto". In "auto" mode, the library decides on the user's behalf, based on heuristics about the input data, whether to handle the job with "ray" or "inprocess".
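To make the "auto" behavior concrete, here is a minimal sketch of how the resolution might look. Everything in it is an assumption for discussion: the `RunMode` enum, the `resolve_run_mode` helper, and both thresholds are hypothetical and are not part of the existing API. The actual heuristics are still to be decided.

```python
from enum import Enum
from typing import Optional


class RunMode(str, Enum):
    """Proposed run modes from this issue, plus the "auto" default."""
    RAY = "ray"
    INPROCESS = "inprocess"
    REMOTE = "remote"
    AUTO = "auto"


# Hypothetical thresholds, chosen only for illustration.
MAX_INPROCESS_DOCS = 100
MAX_INPROCESS_BYTES = 256 * 1024 * 1024  # 256 MiB


def resolve_run_mode(run_mode: Optional[str], num_docs: int, total_bytes: int) -> RunMode:
    """Return the concrete run mode for a job.

    An unset run_mode is treated as "auto". An explicit, valid choice is
    always respected; an unknown string raises ValueError.
    """
    mode = RunMode(run_mode) if run_mode is not None else RunMode.AUTO
    if mode is not RunMode.AUTO:
        return mode
    # Assumed heuristic: small jobs stay in process, everything else goes to Ray.
    if num_docs <= MAX_INPROCESS_DOCS and total_bytes <= MAX_INPROCESS_BYTES:
        return RunMode.INPROCESS
    return RunMode.RAY
```

In this sketch "auto" never resolves to "remote", since the description above only mentions choosing between "ray" and "inprocess"; "remote" would have to be requested explicitly.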
Additional context
No response