Harden DSML and JSON parsing in the server by Chida82 · Pull Request #104 · antirez/ds4

Chida82 · 2026-05-12T16:52:21Z

This change hardens the server-side parsers used for tool-call decoding and request parsing.

Reject DSML attributes that only partially match required names.
Reject non-JSON numeric literals that were previously accepted by the local parser.
Reject raw control characters inside JSON strings.
Reject malformed Unicode surrogate pairs in JSON strings.
Add focused regression tests for the affected parser paths.

Fixed Cases

1. DSML attribute prefix collisions

The DSML parser previously matched attributes too loosely, so malformed attributes such as xname could be interpreted as name.

Incorrect example now rejected:

<｜DSML｜tool_calls>
<｜DSML｜invoke xname="bash">
<｜DSML｜parameter xname="command" string="true">pwd</｜DSML｜parameter>
</｜DSML｜invoke>
</｜DSML｜tool_calls>
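A minimal sketch of exact-name attribute lookup follows. It assumes the caller has already isolated the attribute text of a single DSML tag (for example ` name="bash"`). `dsml_get_attr` is a hypothetical helper, not the function used in ds4, and it copies values verbatim without unescaping. The key point is that a key only matches when it has the same length and bytes as the requested name, so `xname` can never satisfy a lookup for `name`.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the ds4 implementation): look up attribute
 * `name` in the attribute text of one DSML tag, e.g. ` name="bash"`.
 * Keys must match exactly, so `xname="bash"` does not match "name".
 * On success copies the raw value into out and returns 1; returns 0 if
 * the attribute is missing, malformed, or does not fit in out. */
static int dsml_get_attr(const char *attrs, const char *name,
                         char *out, size_t outlen) {
    size_t namelen = strlen(name);
    const char *p = attrs;
    while (*p) {
        /* Skip whitespace between attributes. */
        while (isspace((unsigned char)*p)) p++;
        if (*p == '\0' || *p == '>') break;
        /* Read the key: everything up to '=' (or a terminator). */
        const char *key = p;
        while (*p && *p != '=' && !isspace((unsigned char)*p) && *p != '>') p++;
        size_t keylen = (size_t)(p - key);
        if (*p != '=' || p[1] != '"') return 0; /* malformed attribute */
        p += 2;                                   /* skip =" */
        const char *val = p;
        while (*p && *p != '"') p++;
        if (*p != '"') return 0;                  /* unterminated value */
        size_t vallen = (size_t)(p - val);
        p++;                                      /* skip closing quote */
        /* Exact comparison: same length and same bytes, no prefix/suffix match. */
        if (keylen == namelen && memcmp(key, name, namelen) == 0) {
            if (vallen + 1 > outlen) return 0;
            memcpy(out, val, vallen);
            out[vallen] = '\0';
            return 1;
        }
    }
    return 0;
}
```

With this rule the rejected example above fails at the `invoke` tag, because looking up `name` in ` xname="bash"` returns 0 instead of `bash`.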

2. Non-JSON numeric literals accepted by the request parser

The JSON parser used strtod, which accepted values that are not valid JSON numbers.

Incorrect examples now rejected:

{"messages":[],"temperature":NaN}
{"messages":[],"temperature":Infinity}
{"messages":[],"top_p":+1}
{"messages":[],"top_k":01}
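The root cause is that C's `strtod` implements a superset of the JSON number grammar: it accepts `nan`, `inf` and `infinity` (case-insensitively), a leading `+`, leading zeros and hexadecimal floats. One way to close the gap, sketched below under the assumption that numbers are parsed from a NUL-terminated buffer, is to validate the RFC 8259 grammar first and only then call `strtod`. The function names are hypothetical, and the delimiter check after the number stands in for what a full parser would do when it inspects the next structural character.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: return the length of a valid RFC 8259 number at
 * the start of s, or 0 if s does not start with one. Grammar:
 *   number = [ "-" ] ( "0" / digit1-9 *digit ) [ "." 1*digit ]
 *            [ ( "e" / "E" ) [ "+" / "-" ] 1*digit ]            */
static size_t json_number_len(const char *s) {
    const char *p = s;
    if (*p == '-') p++;                 /* optional minus; '+' is not allowed */
    if (*p == '0') {
        p++;                            /* a lone zero: no leading zeros */
    } else if (*p >= '1' && *p <= '9') {
        while (isdigit((unsigned char)*p)) p++;
    } else {
        return 0;                       /* NaN, Infinity, +1, .5 ... */
    }
    if (*p == '.') {
        p++;
        if (!isdigit((unsigned char)*p)) return 0;   /* "1." is invalid */
        while (isdigit((unsigned char)*p)) p++;
    }
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-') p++;
        if (!isdigit((unsigned char)*p)) return 0;   /* "1e" is invalid */
        while (isdigit((unsigned char)*p)) p++;
    }
    /* The number must end at a delimiter, so "01", "0x10" or "1x" fail. */
    if (*p != '\0' && !strchr(" \t\r\n,]}", *p)) return 0;
    return (size_t)(p - s);
}

/* Parse a JSON number into *out. Returns the bytes consumed, or 0 on
 * error. strtod only ever sees text that already passed the grammar. */
static size_t json_parse_number(const char *s, double *out) {
    size_t len = json_number_len(s);
    if (len == 0) return 0;
    *out = strtod(s, NULL);
    return len;
}
```

This sketch does not deal with out-of-range values such as `1e999`, for which `strtod` returns `HUGE_VAL` and sets `errno` to `ERANGE`.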

3. Raw control characters inside JSON strings

The JSON parser previously accepted unescaped control characters inside strings instead of rejecting the request.

Incorrect example now rejected:

{
  "messages": [
    {
      "role": "user",
      "content": "line1
line2"
    }
  ]
}

The same applies to raw tabs and other control characters that must be escaped in JSON.
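RFC 8259 requires U+0000 through U+001F to be escaped inside strings, so any raw byte below `0x20` can be rejected while scanning. The sketch below is a hypothetical validator, not ds4's string parser: it checks a string literal from its opening quote, rejects raw control bytes, and only verifies the syntax of escapes (a `\u` escape must carry four hex digits; surrogate pairing is a separate check).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: validate a JSON string literal starting at the
 * opening quote. Returns its length including both quotes, or 0 if it
 * is malformed. Raw bytes below 0x20 (newline, tab, ...) are errors. */
static size_t json_string_len(const char *s) {
    const char *p = s;
    if (*p++ != '"') return 0;
    for (;;) {
        unsigned char c = (unsigned char)*p;
        if (c == '\0') return 0;                   /* unterminated string */
        if (c < 0x20) return 0;                    /* raw control character */
        if (c == '"') return (size_t)(p - s) + 1;  /* closing quote */
        if (c == '\\') {
            p++;                                   /* look at the escape letter */
            if (*p == 'u') {
                /* \uXXXX: exactly four hex digits must follow. */
                for (int i = 1; i <= 4; i++)
                    if (!isxdigit((unsigned char)p[i])) return 0;
                p += 5;
                continue;
            }
            if (*p == '\0' || !strchr("\"\\/bfnrt", *p)) return 0;
        }
        p++;
    }
}
```

With this check, the request above is rejected at the raw newline, while the same content written as `"line1\nline2"` is accepted.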

4. Malformed Unicode surrogate pairs

The JSON string parser now rejects invalid surrogate-pair sequences instead of accepting malformed Unicode escapes.
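In JSON, characters outside the Basic Multilingual Plane are written as a UTF-16 surrogate pair: a high surrogate (`D800` to `DBFF`) immediately followed by a low surrogate (`DC00` to `DFFF`). The hypothetical decoder below, not taken from ds4, treats an unpaired high surrogate, a lone low surrogate, or a high surrogate followed by anything other than a low surrogate as an error, and combines a valid pair as `0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)`.

```c
#include <stdint.h>

/* Parse exactly four hex digits; returns -1 on a non-hex character. */
static long hex4(const char *p) {
    long v = 0;
    for (int i = 0; i < 4; i++) {
        char c = p[i];
        int d;
        if (c >= '0' && c <= '9') d = c - '0';
        else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
        else return -1;
        v = v * 16 + d;
    }
    return v;
}

/* Hypothetical sketch: decode a \uXXXX escape starting at the backslash.
 * Stores the code point in *cp and returns the bytes consumed (6 for a
 * single escape, 12 for a surrogate pair), or 0 on error. */
static int json_decode_uescape(const char *s, uint32_t *cp) {
    if (s[0] != '\\' || s[1] != 'u') return 0;
    long hi = hex4(s + 2);
    if (hi < 0) return 0;
    if (hi >= 0xDC00 && hi <= 0xDFFF) return 0;   /* lone low surrogate */
    if (hi < 0xD800 || hi > 0xDBFF) {             /* ordinary BMP code point */
        *cp = (uint32_t)hi;
        return 6;
    }
    if (s[6] != '\\' || s[7] != 'u') return 0;    /* unpaired high surrogate */
    long lo = hex4(s + 8);
    if (lo < 0xDC00 || lo > 0xDFFF) return 0;     /* not a low surrogate (or not hex) */
    *cp = 0x10000 + (((uint32_t)hi - 0xD800) << 10) + ((uint32_t)lo - 0xDC00);
    return 12;
}
```

For example, `\ud83d\ude00` decodes to U+1F600, while `\ud83d` on its own, `\ude00` on its own, or `\ud83d\u0041` are rejected.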

Tests

Add a regression test for invalid DSML attribute names.
Add a regression test for invalid JSON numeric forms.
Add a regression test for raw control characters in JSON strings.
Keep the existing nesting-limit JSON test in place.

Validation

make ds4_test && ./ds4_test --server
make ds4-server && make clean

antirez · 2026-05-13T06:22:44Z

I'm not sure this is the right direction. If we want stricter sampling during tool calling (which already happens at T=0 for the metadata), the correct direction would be to force the grammar in the sampling itself, not to reject malformed calls.
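To make the alternative concrete: with grammar-constrained sampling the model can only emit tokens the tool-call grammar allows at each step, so malformed calls are never produced in the first place. The sketch below shows only the greedy (T=0) selection step. The per-token `allowed` mask is assumed to come from some grammar matcher that is not shown, and `sample_greedy_constrained` is a hypothetical name, not part of ds4's sampler.

```c
#include <stddef.h>

/* Hypothetical sketch of grammar-constrained greedy decoding: pick the
 * highest-logit token among those the grammar allows at this position.
 * `allowed[i]` is nonzero when token i is permitted; it is assumed to be
 * computed elsewhere by a grammar matcher. Returns the chosen token id,
 * or -1 if the grammar permits no token. */
static int sample_greedy_constrained(const float *logits,
                                     const unsigned char *allowed,
                                     size_t n_vocab) {
    int best = -1;
    float best_logit = 0.0f;
    for (size_t i = 0; i < n_vocab; i++) {
        if (!allowed[i]) continue;  /* masked out, as if its logit were -inf */
        if (best < 0 || logits[i] > best_logit) {
            best = (int)i;
            best_logit = logits[i];
        }
    }
    return best;
}
```

Masking before the argmax, rather than validating afterwards, means the same logits can still pick a lower-ranked token when the top one would break the syntax.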

Chida82 · 2026-05-13T10:28:37Z

I understand the concern. What I had in mind with this PR was not to intervene in the tool calling or in the JSON formatting generated by the DS4 model.

My reasoning is more about the server side that receives the calls, even if they are local, to protect it from possible anomalous or malformed inputs.

That said, this is probably a relatively minor aspect. If another agent calls the DS4 server, we can decide between two scenarios:

Protect ourselves: if an LLM calls DS4 with a malformed request, it receives an explicit error. In an agentic loop, I would expect it to understand the error, fix the formatting, and retry. In this case, yes, upstream sampling/grammar would help reduce these cases.
Be less formal: if DS4 can still understand the intent even with inputs that are not perfectly formatted, then this validation can be considered too rigorous.

PS: I wouldn’t want to distract you from more important matters; if you feel that a discussion on this topic isn’t aligned with the product vision, feel free to close it — I won’t take it personally 🙂


Harden DSML and JSON parsing in the server

a0084a1

antirez added http-api tools-calling labels May 13, 2026


Chida82 wants to merge 1 commit into antirez:main from Chida82:hardeningparserjson
