Fix LFM2.5 tool parser inference by blairhudson · Pull Request #1260 · ml-explore/mlx-lm

blairhudson · 2026-05-08T10:33:08Z

LFM2.5 tokenizers expose <|tool_call_start|> and <|tool_call_end|> tokens, but do not provide chat-template metadata that lets mlx-lm infer a tool parser. As a result, server responses can emit raw tool-call markup in message.content instead of OpenAI-compatible tool_calls.

This PR adds tokenizer-vocabulary-based parser inference for that marker pair and maps it to the existing pythonic tool parser.
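The idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `infer_tool_parser`, the `template_parser` argument (standing in for whatever the existing chat-template inference returns), and the returned string `"pythonic"` are all assumptions made for the example.

```python
from typing import Mapping, Optional

# Marker pair named in the PR description.
TOOL_CALL_START = "<|tool_call_start|>"
TOOL_CALL_END = "<|tool_call_end|>"


def infer_tool_parser(
    vocab: Mapping[str, int],
    template_parser: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: pick a tool parser name, or None if unknown.

    `template_parser` is assumed to be the result of the existing
    chat-template based inference (None when the template gives no hint).
    """
    # Prefer whatever the chat template already told us.
    if template_parser is not None:
        return template_parser
    # Fallback: both markers must be present in the tokenizer vocabulary.
    if TOOL_CALL_START in vocab and TOOL_CALL_END in vocab:
        return "pythonic"  # assumed name for the existing pythonic parser
    return None
```

Requiring both markers, rather than either one, keeps the fallback from firing on tokenizers that only happen to share one of the token strings.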

Tests:

uv run --with-editable . --with unittest-xml-reporting python -m unittest tests.test_tool_parsing tests.test_tokenizers.TestTokenizers.test_tool_calling

You can try it yourself with:

uv tool install "git+https://github.com/blairhudson/mlx-lm.git@fix/lfm25-tool-parser-inference"
mlx_lm.server --model LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit

jbuchananr · 2026-05-11T17:18:09Z

Can you add support for this use case:

Reproduce

  mlx_lm.server --model <your-mlx-model> --port 8080

  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "<your-mlx-model>",
      "temperature": 0.0,
      "max_tokens": 2048,
      "tool_choice": "auto",
      "messages": [
        {
          "role": "user",
          "content": "Order the ingredients for a lasagna to be delivered to 845 Willow Lane, Springfield, IL 62704. Include noodles, ground beef, ricotta, mozzarella, parmesan, tomato sauce, onion, garlic, olive oil, basil, oregano, and salt."
        }
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "grocery.orderIngredients",
            "description": "Orders a list of ingredients for delivery to a specified address.",
            "parameters": {
              "type": "object",
              "properties": {
                "ingredientList": {
                  "type": "array",
                  "description": "List of ingredients to order.",
                  "items": {
                    "type": "object",
                    "properties": {
                      "name":   { "type": "string" },
                      "amount": { "type": "number" },
                      "unit":   { "type": "string" }
                    },
                    "required": ["name", "amount", "unit"]
                  }
                },
                "deliveryAddress": { "type": "string" }
              },
              "required": ["ingredientList", "deliveryAddress"],
              "additionalProperties": false
            }
          }
        }
      ]
    }'

  Expected response (OpenAI-shaped)

  {
    "choices": [{
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [{
          "id": "...",
          "type": "function",
          "function": {
            "name": "grocery.orderIngredients",
            "arguments": "{\"ingredientList\":[{\"name\":\"noodles\",\"amount\":500,\"unit\":\"g\"}, ...],\"deliveryAddress\":\"845 Willow Lane, Springfield, IL 62704\"}"
          }
        }]
      }
    }]
  }

  Actual response

  {
    "choices": [{
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "I am placing an order for the following ingredients to be delivered to 845 Willow Lane, Springfield, IL 62704: 500g noodles, 300g ground beef, 200g ricotta, 250g mozzarella, 100g parmesan, 400ml tomato sauce, 100g onion, 50g garlic, 30ml olive oil, 10g basil, 5g oregano, 5g salt."
      }
    }],
    "usage": { "prompt_tokens": 256, "completion_tokens": 482, "total_tokens": 738 }
  }
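The gap between the expected and actual responses above can be checked mechanically. The following is a small sketch of such a check; the function name `find_tool_call_mismatches` is made up for this example, and it only assumes the OpenAI-style response fields shown in this comment.

```python
import json
from typing import Any, Dict, List


def find_tool_call_mismatches(response: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a chat-completions response."""
    problems: List[str] = []
    for i, choice in enumerate(response.get("choices", [])):
        message = choice.get("message", {})
        tool_calls = message.get("tool_calls")
        # finish_reason claims a tool call, so tool_calls must be non-empty.
        if choice.get("finish_reason") == "tool_calls" and not tool_calls:
            problems.append(
                f"choice {i}: finish_reason is tool_calls but message.tool_calls is missing"
            )
        # Each call's arguments field must be a JSON-encoded string.
        for j, call in enumerate(tool_calls or []):
            arguments = call.get("function", {}).get("arguments")
            try:
                json.loads(arguments)
            except (TypeError, ValueError):
                problems.append(f"choice {i}, call {j}: arguments is not valid JSON")
    return problems
```

Run against the actual response above, this would report one problem for choice 0; run against the expected shape with valid arguments, it would report none.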

Issue

Why this looks like a server bug, not a model regression:

- finish_reason is "tool_calls", so the server detected that a tool call happened.
- But message.tool_calls is missing, and the natural-language form of the arguments leaked into message.content.
- A simpler tool call (single string argument) on the same model/server returns a correctly structured tool_calls array; only the nested-array schema trips it.

This is the signature of a tool-call parser that handles flat arguments but doesn't extract the
model's native <tool_call>{...}</tool_call> block when the JSON payload contains a nested array of
objects.
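For comparison, here is a minimal sketch of a Python-style tool-call parser that treats nested arrays of objects the same as flat arguments, using the standard `ast` module. It assumes the model emits a bracketed list of calls with keyword arguments between its tool-call markers, and that the markers have already been stripped. The function names are hypothetical and this is not mlx-lm's parser.

```python
import ast
import json
from typing import Any, Dict, List


def _dotted_name(node: ast.expr) -> str:
    """Rebuild names such as grocery.orderIngredients from the AST."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return f"{_dotted_name(node.value)}.{node.attr}"
    raise ValueError("unsupported function name")


def parse_pythonic_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Parse '[f(a=1), g(b=[...])]' into OpenAI-shaped tool call dicts."""
    body = ast.parse(text.strip(), mode="eval").body
    # Accept either a list of calls or a single bare call.
    calls = body.elts if isinstance(body, ast.List) else [body]
    results: List[Dict[str, Any]] = []
    for call in calls:
        if not isinstance(call, ast.Call):
            raise ValueError("expected a function call")
        arguments: Dict[str, Any] = {}
        for keyword in call.keywords:
            if keyword.arg is None:
                raise ValueError("**kwargs is not supported")
            # literal_eval handles nested lists and dicts of literals.
            arguments[keyword.arg] = ast.literal_eval(keyword.value)
        results.append({
            "type": "function",
            "function": {
                "name": _dotted_name(call.func),
                "arguments": json.dumps(arguments),
            },
        })
    return results
```

Because `ast.literal_eval` evaluates whole literal subtrees, a list of dicts such as `ingredientList` needs no special handling. The sketch omits the `id` field and positional arguments, and it would reject JSON-style `true`, `false`, or `null`, which are not Python literals.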

Env

mlx-lm 0.31.3
macOS 15.5, Apple Silicon
temperature: 0.0, tool_choice: "auto"
LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit

Commit 173e285: Fix LFM2.5 tool parser inference

blairhudson mentioned this pull request on May 8, 2026 in #1246, "add: lfm2/2.5 tool parser" (open, 3 tasks).

blairhudson wants to merge 1 commit into ml-explore:main from blairhudson:fix/lfm25-tool-parser-inference
