[proposal] - Allow CORS Headers for intranet / VPN purposes #94
WebReflection wants to merge 2 commits into
|
If anyone is wondering "how can I test this?":

test.js

```js
import Queue from 'https://esm.run/gen-q';

const { parse, stringify } = JSON;
const decoder = new TextDecoder;

const chatOptions = {
  stream: true,
  role: 'user',
};

export default class DS4 {
  #model;
  #url;

  constructor({
    url = 'http://YOUR_MACHINE_IP:8000',
    model = 'deepseek-v4-flash',
    version = 'v1',
  }) {
    this.#model = model;
    this.#url = new URL(`${url}/${version}`);
  }

  async *chat(content, { stream = true, role = 'user' } = chatOptions) {
    const items = new Queue;
    const { body } = await fetch(`${this.#url}/chat/completions`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: stringify({
        model: this.#model,
        messages: [
          { role, content }
        ],
        stream,
      }),
    });
    const reader = body.getReader();
    // Pump the response body and queue the choices of every `data:` chunk.
    new ReadableStream({
      async start(controller) {
        (function next() {
          reader.read().then(({ done, value }) => {
            if (done) {
              items.splice(0);
              controller.close();
              return;
            }
            const text = decoder.decode(value);
            if (/^\s*data:\s*(\{[\s\S]+\})\s*$/.test(text)) {
              const { $1: json } = RegExp;
              items.push(...parse(json.trim()).choices);
            }
            next();
          });
        }());
      },
    });
    // Stop at the first choice that carries a finish_reason.
    for await (const item of items) {
      const { finish_reason, delta } = item;
      if (finish_reason != null) break;
      const { content, reasoning_content } = delta;
      if (content || reasoning_content)
        yield { content, reasoning: reasoning_content };
    }
  }
}
```

index.html

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>DS4</title>
  <style>
    body {
      font-family: sans-serif;
      &.thinking {
        opacity: 0.5;
      }
    }
  </style>
  <script type="module">
    import DS4 from './test.js';

    const ds4 = new DS4({
      url: 'http://YOUR_MACHINE_IP:8000',
      model: 'deepseek-v4-flash',
      version: 'v1',
    });

    let thinking = true;
    document.body.classList.add('thinking');
    for await (const { content, reasoning } of ds4.chat('List three Redis design principles.')) {
      if (thinking && reasoning && !content) {
        document.body.textContent += reasoning;
      }
      else if (thinking && content) {
        thinking = false;
        document.body.classList.remove('thinking');
        document.body.textContent = content;
      }
      else if (content) {
        document.body.textContent += content;
      }
    }
    console.log('done');
  </script>
</head>
<body>
</body>
</html>
```

The testing library is a WIP; it will be able to consume all channels and do more, but with that, and this patch, you'll see results from a |
|
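A side note on the streaming part of test.js: it parses each network chunk on its own, so a `data:` line that is split across two chunks, or two events arriving in one chunk, would be skipped. A minimal sketch of a buffering alternative follows. `createSSEParser` and `feed` are hypothetical names, and the sketch assumes one JSON object per `data:` line plus an OpenAI-style `[DONE]` end marker, neither of which is confirmed in this thread.

```javascript
// Hypothetical helper, not part of test.js: buffers decoded text and
// returns the JSON payload of every complete `data:` line seen so far.
function createSSEParser() {
  let buffer = '';
  return function feed(text) {
    buffer += text;
    const lines = buffer.split(/\r?\n/);
    // The last element is an incomplete line (or ''): keep it for the next call.
    buffer = lines.pop();
    const payloads = [];
    for (const line of lines) {
      if (!line.startsWith('data:')) continue;
      const data = line.slice(5).trim();
      // Assumption: the stream may end with an OpenAI-style `[DONE]` marker.
      if (data === '' || data === '[DONE]') continue;
      payloads.push(JSON.parse(data));
    }
    return payloads;
  };
}

// A payload split across two chunks is only emitted once it is complete.
const feed = createSSEParser();
const first = feed('data: {"choices":[{"delta":{"content":"Hel');
const second = feed('lo"}}]}\n\ndata: [DONE]\n\n');
console.log(first.length, second[0].choices[0].delta.content); // prints: 0 Hello
```

In test.js this would replace the regular expression test: each decoded chunk goes through `feed`, and the `choices` of every returned payload are pushed to the queue.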
this might be a duplicate of #70 which I've just realized was in already ... my thoughts:
thank you! |
|
Alternatively, #44 |
|
@calvinrp answered in here #70 (comment) |
|
Why not add a proxy on top for all the shenanigans? IMHO this should be kept as simple as possible. |
|
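For comparison, this is roughly what "a proxy on top" would involve. The sketch below is a fetch-style handler of the kind accepted by runtimes such as Deno (`Deno.serve(handle)`). The names `corsHeaders`, `withCors`, `handle` and the `UPSTREAM` address are all hypothetical, and the header values are illustrative choices, not what this PR or #70 emit.

```javascript
// Assumption: the inference server address, same placeholder as in test.js.
const UPSTREAM = 'http://YOUR_MACHINE_IP:8000';

// Illustrative header set; a real deployment may want a stricter origin.
function corsHeaders(origin = '*') {
  return {
    'Access-Control-Allow-Origin': origin,
    'Access-Control-Allow-Methods': 'GET, POST, OPTIONS',
    'Access-Control-Allow-Headers': 'Content-Type, Authorization',
  };
}

// Copies a response, adding the CORS headers on top of the upstream ones.
function withCors(response, origin = '*') {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(corsHeaders(origin)))
    headers.set(name, value);
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

// Answers preflights locally and forwards everything else upstream.
async function handle(request, upstreamFetch = fetch) {
  if (request.method === 'OPTIONS')
    return new Response(null, { status: 204, headers: corsHeaders() });
  const { pathname, search } = new URL(request.url);
  const upstream = await upstreamFetch(UPSTREAM + pathname + search, {
    method: request.method,
    headers: {
      'Content-Type': request.headers.get('content-type') ?? 'application/json',
    },
    body: request.method === 'POST' ? await request.text() : undefined,
  });
  return withCors(upstream);
}
```

Even in this reduced form the proxy is one more process to run, configure and keep reachable on the network, which is the trade-off being debated here.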
@d3y4n having CORS options baked in is the "as simple as possible" idea indeed ... anything else is not simple anymore. |
|
@WebReflection not my call, just saying you're hardcoding values and tomorrow someone might need different ones (even you). Cheers |
|
@d3y4n PR #70 is easier and simpler; I’ll try that and amend this one. The issue is that requiring any other tool just to have CORS (3 related PRs already: it’s not something only a few need, it’s something everyone expects as a possibility at some point) means goodbye to ease of portability and to a basic flag in the docs to add such a feature (if we want the flag at all; otherwise it’s about enabling CORS on 0.0.0.0, ‘cause that’s already an explicit intent). Both devices and automated tests run remotely will need that if browsers are used instead of curl, so yeah, everyone could fix it one way or another, but the changes are so tiny/simple that I am not sure why making it any more complicated for users would be desirable |
|
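The reason browsers need this while curl does not: a browser enforces the same-origin policy, and two URLs share an origin only when scheme, host and port all match. A small sketch of that rule, where `isCrossOrigin` is a hypothetical helper and the addresses are made-up examples:

```javascript
// Hypothetical helper: true when a page at pageUrl calling apiUrl
// would be a cross-origin request (scheme, host or port differ).
function isCrossOrigin(pageUrl, apiUrl) {
  return new URL(pageUrl).origin !== new URL(apiUrl).origin;
}

// A page served from localhost calling a machine on the LAN: cross-origin.
console.log(isCrossOrigin('http://localhost:3000/index.html', 'http://192.168.1.50:8000/v1')); // true
// Same host but a different port is still cross-origin.
console.log(isCrossOrigin('http://localhost:3000/', 'http://localhost:8000/v1')); // true
// Same scheme, host and port: same origin, no CORS headers needed.
console.log(isCrossOrigin('http://192.168.1.50:8000/', 'http://192.168.1.50:8000/v1')); // false
```

So even a page opened on the same machine as the server hits the restriction as soon as it is served from another port.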
FYI/related: There is also the idea of creating a separate web proxy that then talks to the inference engine with a generalized protocol. There are already two competing protocols: classic stateless chat completion and the more modern, stateful "Responses". It might make sense to separate the protocol frontend from the engine, as suggested here -> #91 (comment). CORS would then only need to be implemented in this combined frontend proxy, and the engine would not have to worry about all that. Otherwise I am a bit skeptical about making the current |
|
To whom it might concern, the latest edit of this file does the following:
Tested via:

```sh
# example
./ds4-server --cors --host localhost --ctx 100000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192
```

It is working like a charm; glad to hear where/how I could improve this MR, thank you. |
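One way to double-check a run like this from a script is to look at the response headers. The sketch below assumes the server answers with a standard `Access-Control-Allow-Origin` header; the exact headers emitted by this patch are not listed in this thread, and `allowsOrigin` is a hypothetical helper.

```javascript
// Hypothetical helper: decides whether response headers permit a given origin.
function allowsOrigin(headers, origin) {
  const allowed = headers.get('access-control-allow-origin');
  return allowed === '*' || allowed === origin;
}

// Assumed example values; real headers would come from a response of the server.
const sample = new Headers({ 'Access-Control-Allow-Origin': '*' });
console.log(allowsOrigin(sample, 'http://localhost:3000')); // true
console.log(allowsOrigin(new Headers(), 'http://localhost:3000')); // false
```

Outside a browser (for example Node 18+), the `headers` of any `fetch` response from the server can be passed in directly. Inside a browser the check is implicit: without the header, the `fetch` call itself is rejected.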
433b6bd to a0ab701
This variant uses simpler headers approach from antirez#70 and it adds `--cors` flag like it was suggested in antirez#44.
|
FYI much cleaner MR, I can't find anything particularly cumbersome + tests pass but I can test only on CUDA |
This MR has been successfully tested on my local network with a DGX Spark on the WiFi, and I need this variant to be able to query that Spark via my `localhost` or any other connected device, so that we can all benefit from this project within my house.

Thanks for considering this change/update.
To be discussed
Ideally there should be a `--cors` flag when starting the server, but I'd like to start with this implementation that "just works" ™️ and hear from others/the maintainer whether there's anything else I can improve/change. Trust me, it works already, and I am playing around with a tiny library that would let me lurk ds4 from anywhere in my own apartment, as long as the DGX is up and running.

P.S. thanks a lot for this project, I will inevitably try to bring it to ROCm once I have my machine around, but so far with Spark it's working wonderfully!