I don't plan on shelling out money for inference at the moment, so the initial plan is to have users bring their own "inference back-end" with them - likely Colab for now. Some points about this though:
- Who will be responsible for creating the prompt, sending it off to the inference backend and parsing the resulting generation?
- Initial plan is to implement that here: the front-end will simply POST user messages to an endpoint and receive responses (maybe via WebSockets? I'm not sure holding a connection open for 10+ seconds is a good idea). A rough sketch of this endpoint follows the list below.
- Pros:
- We'll have real-world data on inference requests, which we can use to estimate how much it would actually cost to run inference ourselves. (Many users have suggested I open a Patreon to cover hosting expenses; I'm unsure how well that'd pan out, but with real data that decision could be made a little more clearly.)
- We can automatically push new prompting code by just updating the server
- Cons:
- Increased server load, since we'll be acting as a proxy for inference requests.
- How will inference work for group chats? How do we decide which characters should speak and when? (A naive selection heuristic is sketched further below.)
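
To make the proxy idea concrete, here's a minimal sketch of what the server-side endpoint could look like. Everything in it is an assumption rather than a decided design: the framework (FastAPI), the `/chat` route, the `backend_url` field, and the Colab backend's `/generate` contract are all hypothetical placeholders.

```python
# Minimal sketch of the proxy endpoint, assuming FastAPI + httpx.
# All names here (/chat, backend_url, /generate, the JSON shapes) are
# hypothetical -- the real contract with the user's Colab backend is TBD.
import time

import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class ChatRequest(BaseModel):
    backend_url: str  # user-supplied inference backend (e.g. a Colab tunnel URL)
    character: str    # which character the prompt is built around
    message: str      # the user's latest message


def build_prompt(character: str, message: str) -> str:
    # Placeholder for the server-side prompting code; keeping this on the
    # server is what lets us push new prompting logic without a client update.
    return f"{character}: respond to the user.\nUser: {message}\n{character}:"


@app.post("/chat")
async def chat(req: ChatRequest) -> dict:
    prompt = build_prompt(req.character, req.message)
    start = time.monotonic()
    async with httpx.AsyncClient(timeout=60.0) as client:
        # Forward the prompt to the user's own backend and wait for the
        # generation. A plain request/response keeps things simple for now;
        # WebSockets/SSE can come later if latency makes this painful.
        resp = await client.post(f"{req.backend_url}/generate",
                                 json={"prompt": prompt})
    latency = time.monotonic() - start
    text = resp.json().get("text", "")
    # Record the numbers we'd need to estimate self-hosted inference costs.
    print(f"prompt_chars={len(prompt)} reply_chars={len(text)} latency={latency:.1f}s")
    return {"reply": text}
```

The logging line ties into the cost-estimation pro above: per-request prompt/response sizes and latency are exactly the data we'd want before deciding on self-hosted inference (or a Patreon). In practice that would go to a metrics store rather than stdout.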
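
On the group-chat question, one naive option (purely a sketch, not a decision) is a cheap server-side heuristic: favor characters who have been quiet for a while, and strongly favor characters mentioned in the last message. The `Character` shape and the scoring weights below are invented for illustration.

```python
# Naive speaker-selection heuristic for group chats -- purely illustrative.
# The data shape and scoring weights are invented for this sketch.
import random
from dataclasses import dataclass


@dataclass
class Character:
    name: str
    turns_since_spoke: int  # how long this character has been quiet


def pick_speaker(characters: list[Character], last_message: str) -> Character:
    def score(c: Character) -> float:
        s = float(c.turns_since_spoke)            # favor quiet characters
        if c.name.lower() in last_message.lower():
            s += 5.0                              # strongly favor being addressed
        s += random.random()                      # tie-breaker, adds some variety
        return s

    return max(characters, key=score)


chars = [Character("Alice", 3), Character("Bob", 0)]
print(pick_speaker(chars, "What do you think, Bob?").name)  # likely Bob
```

A smarter option would be asking the model itself who should speak next, but that adds an extra inference call per turn; the request data gathered by the proxy would tell us whether that's affordable.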