Harden DSML and JSON parsing in the server #104
I'm not sure this is the right direction. If we want stricter sampling during tool calling (which already happens at T=0 for the metadata), the correct direction would be to enforce the grammar in the sampling itself, not to reject malformed calls.
I understand the concern. What I had in mind with this PR was not to intervene in the tool calling or in the JSON formatting generated by the DS4 model. My reasoning is about the server side that receives the calls: even when they are local, it should be protected from anomalous or malformed input. That said, this is probably a relatively minor aspect. If another agent calls the DS4 server, we can decide between two scenarios:
PS: I wouldn't want to distract you from more important matters; if you feel that a discussion on this topic isn't aligned with the product vision, feel free to close it. I won't take it personally 🙂
This change hardens the server-side parsers used for tool-call decoding and request parsing.
Fixed Cases
1. DSML attribute prefix collisions
The DSML parser previously matched attributes too loosely, so malformed attributes such as xname could be interpreted as name.
Incorrect example now rejected:
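As an illustration of this class of bug rather than the PR's actual code, the sketch below looks up an attribute by comparing whole keys instead of searching for a substring. The function name find_attr and the assumed input shape (a run of key="value" pairs, with no escaping inside values) are hypothetical and chosen only for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the server's real API.
 * Scans `attrs` (assumed form: key="value" key="value" ...) and returns a
 * pointer to the value of the attribute whose key is exactly `name`,
 * storing the value length in *len. Returns NULL if the attribute is
 * absent or the input is malformed. */
static const char *find_attr(const char *attrs, const char *name, size_t *len)
{
    assert(attrs != NULL && name != NULL && len != NULL);
    size_t n = strlen(name);
    assert(n > 0);

    const char *p = attrs;
    for (;;) {
        while (isspace((unsigned char)*p))
            p++;
        if (*p == '\0')
            return NULL;                    /* ran out of attributes */

        /* Read the whole key up to '=' so that xname never matches name. */
        const char *key = p;
        while (*p != '\0' && *p != '=' && !isspace((unsigned char)*p))
            p++;
        size_t klen = (size_t)(p - key);

        if (p[0] != '=' || p[1] != '"')
            return NULL;                    /* malformed: key without ="  */
        const char *val = p + 2;
        const char *end = strchr(val, '"');
        if (end == NULL)
            return NULL;                    /* malformed: unterminated value */

        if (klen == n && memcmp(key, name, n) == 0) {
            *len = (size_t)(end - val);
            return val;
        }
        p = end + 1;                        /* skip to the next attribute */
    }
}
```

With this approach a lookup for name on the input xname="get_weather" returns NULL, because the key is compared in full rather than by suffix or prefix.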
2. Non-JSON numeric literals accepted by the request parser
The JSON parser used strtod, which accepted values that are not valid JSON numbers.
Incorrect examples now rejected:
{"messages":[],"temperature":NaN}
{"messages":[],"temperature":Infinity}
{"messages":[],"top_p":+1}
{"messages":[],"top_k":01}
3. Raw control characters inside JSON strings
The JSON parser previously accepted unescaped control characters inside strings instead of rejecting the request.
Incorrect example now rejected:
{ "messages": [ { "role": "user", "content": "line1
line2" } ] }
The same applies to raw tabs and other control characters that must be escaped in JSON.
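Cases 2 and 3 both come down to checking input bytes against the JSON grammar (RFC 8259) before accepting them. The following is a minimal sketch of that idea, not the code from this PR: the names json_number_len, parse_json_number and has_raw_control_char are hypothetical, and the number check assumes the whole string is the candidate token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool is_digit(char c) { return c >= '0' && c <= '9'; }

/* Length of the JSON number at the start of s, or 0 if there is none.
 * Grammar: -? (0 | [1-9][0-9]*) (. [0-9]+)? ([eE] [+-]? [0-9]+)? */
static size_t json_number_len(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    if (*p == '-')
        p++;
    if (*p == '0') {
        p++;                                /* a leading 0 stands alone */
    } else if (*p >= '1' && *p <= '9') {
        while (is_digit(*p))
            p++;
    } else {
        return 0;                           /* rejects NaN, Infinity, +1 */
    }
    if (*p == '.') {
        p++;
        if (!is_digit(*p))
            return 0;                       /* "1." is not valid JSON */
        while (is_digit(*p))
            p++;
    }
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!is_digit(*p))
            return 0;
        while (is_digit(*p))
            p++;
    }
    return (size_t)(p - s);
}

/* Validate first, then let strtod do the conversion. Trailing bytes such
 * as the 1 in "01" make the token invalid. */
static bool parse_json_number(const char *s, double *out)
{
    assert(out != NULL);
    size_t n = json_number_len(s);
    if (n == 0 || s[n] != '\0')
        return false;
    *out = strtod(s, NULL);
    return true;
}

/* True if the raw bytes of a JSON string body contain a byte below 0x20.
 * Such bytes must appear escaped (\n, \t, \u0000, ...), never raw. */
static bool has_raw_control_char(const char *body, size_t len)
{
    assert(body != NULL);
    for (size_t i = 0; i < len; i++) {
        if ((unsigned char)body[i] < 0x20)
            return true;
    }
    return false;
}
```

Validating before calling strtod keeps the conversion itself unchanged while narrowing what reaches it. Note that strtod honors the current locale's decimal point, so this sketch assumes the default "C" locale.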
4. Malformed Unicode surrogate pairs
The JSON string parser now rejects invalid surrogate-pair sequences instead of accepting malformed Unicode escapes.
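One way such a check can look is sketched below, under the assumption that the parser decodes \uXXXX escapes from a NUL-terminated buffer. The helpers hex4 and decode_u_escape are hypothetical names, not the functions touched by this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Parse exactly four hex digits at p; false on the first non-hex byte
 * (including the terminating NUL, so nothing past it is read). */
static bool hex4(const char *p, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        uint32_t d;
        if (c >= '0' && c <= '9')      d = (uint32_t)(c - '0');
        else if (c >= 'a' && c <= 'f') d = (uint32_t)(c - 'a' + 10);
        else if (c >= 'A' && c <= 'F') d = (uint32_t)(c - 'A' + 10);
        else return false;
        v = (v << 4) | d;
    }
    *out = v;
    return true;
}

/* Decode one \uXXXX escape, or a high/low surrogate pair, starting at the
 * backslash. On success stores the code point and the number of bytes
 * consumed (6 or 12). Lone or mismatched surrogates are rejected. */
static bool decode_u_escape(const char *s, uint32_t *cp, size_t *used)
{
    assert(s != NULL && cp != NULL && used != NULL);
    uint32_t hi, lo;

    if (s[0] != '\\' || s[1] != 'u' || !hex4(s + 2, &hi))
        return false;
    if (hi >= 0xDC00 && hi <= 0xDFFF)
        return false;                       /* low surrogate with no high */
    if (hi < 0xD800 || hi > 0xDBFF) {
        *cp = hi;                           /* ordinary BMP code point */
        *used = 6;
        return true;
    }
    /* High surrogate: a \uDC00..\uDFFF escape must follow immediately. */
    if (s[6] != '\\' || s[7] != 'u' || !hex4(s + 8, &lo))
        return false;
    if (lo < 0xDC00 || lo > 0xDFFF)
        return false;
    *cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
    *used = 12;
    return true;
}
```

For example, \uD83D\uDE00 decodes to U+1F600, while \uD83D alone, \uDE00 alone, and \uD83D\u0041 are all rejected.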
Tests
Validation