Fix LFM2.5 tool parser inference #1260

Open
blairhudson wants to merge 1 commit into ml-explore:main from blairhudson:fix/lfm25-tool-parser-inference
Conversation

@blairhudson blairhudson commented May 8, 2026

LFM2.5 tokenizers expose <|tool_call_start|> and <|tool_call_end|> tokens, but do not provide chat-template metadata that lets mlx-lm infer a tool parser. As a result, server responses can emit raw tool-call markup in message.content instead of OpenAI-compatible tool_calls.

This adds tokenizer-vocab-based parser inference for that marker pair and maps it to the existing pythonic tool parser.
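As a rough illustration of the idea (not the code in this PR), vocab-based inference can be sketched as a membership check on the two marker tokens. The helper name `infer_tool_parser_from_vocab` and the returned string `"pythonic"` are assumptions made for this sketch and are not taken from mlx-lm's internals; the vocabulary is modelled as a plain token-to-id mapping.

```python
from typing import Mapping, Optional

# Marker pair named in the PR description.
TOOL_CALL_START = "<|tool_call_start|>"
TOOL_CALL_END = "<|tool_call_end|>"


def infer_tool_parser_from_vocab(vocab: Mapping[str, int]) -> Optional[str]:
    """Return a parser name if the vocab holds both markers, else None.

    Hypothetical helper: the function name and the "pythonic" label are
    illustrative and do not mirror mlx-lm's actual API.
    """
    # Require both markers so a lone start or end token is not enough.
    if TOOL_CALL_START in vocab and TOOL_CALL_END in vocab:
        return "pythonic"
    return None


if __name__ == "__main__":
    lfm_like_vocab = {"<|tool_call_start|>": 10, "<|tool_call_end|>": 11, "hello": 12}
    print(infer_tool_parser_from_vocab(lfm_like_vocab))
    print(infer_tool_parser_from_vocab({"hello": 12}))
```

Requiring both tokens keeps the check conservative: a tokenizer that happens to define only one of the markers falls through to whatever inference already exists.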

Tests:

  • uv run --with-editable . --with unittest-xml-reporting python -m unittest tests.test_tool_parsing tests.test_tokenizers.TestTokenizers.test_tool_calling

You can try it yourself with:

uv tool install "git+https://github.com/blairhudson/mlx-lm.git@fix/lfm25-tool-parser-inference"
mlx_lm.server --model LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit

@jbuchananr

Can you add support for this use case:

Reproduce

  mlx_lm.server --model <your-mlx-model> --port 8080

  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "<your-mlx-model>",
      "temperature": 0.0,
      "max_tokens": 2048,
      "tool_choice": "auto",
      "messages": [
        {
          "role": "user",
          "content": "Order the ingredients for a lasagna to be delivered to 845 Willow Lane, Springfield, IL 62704. Include noodles, ground beef, ricotta, mozzarella, parmesan, tomato sauce, onion, garlic, olive oil, basil, oregano, and salt."
        }
      ],
      "tools": [
        {
          "type": "function",
          "function": {
            "name": "grocery.orderIngredients",
            "description": "Orders a list of ingredients for delivery to a specified address.",
            "parameters": {
              "type": "object",
              "properties": {
                "ingredientList": {
                  "type": "array",
                  "description": "List of ingredients to order.",
                  "items": {
                    "type": "object",
                    "properties": {
                      "name":   { "type": "string" },
                      "amount": { "type": "number" },
                      "unit":   { "type": "string" }
                    },
                    "required": ["name", "amount", "unit"]
                  }
                },
                "deliveryAddress": { "type": "string" }
              },
              "required": ["ingredientList", "deliveryAddress"],
              "additionalProperties": false
            }
          }
        }
      ]
    }'

  Expected response (OpenAI-shaped)

  {
    "choices": [{
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [{
          "id": "...",
          "type": "function",
          "function": {
            "name": "grocery.orderIngredients",
            "arguments": "{\"ingredientList\":[{\"name\":\"noodles\",\"amount\":500,\"unit\":\"g\"}, ...],\"deliveryAddress\":\"845 Willow Lane, Springfield, IL 62704\"}"
          }
        }]
      }
    }]
  }

  Actual response

  {
    "choices": [{
      "finish_reason": "tool_calls",
      "message": {
        "role": "assistant",
        "content": "I am placing an order for the following ingredients to be delivered to 845 Willow Lane, Springfield, IL 62704: 500g noodles, 300g ground beef, 200g ricotta, 250g mozzarella, 100g parmesan, 400ml tomato sauce, 100g onion, 50g garlic, 30ml olive oil, 10g basil, 5g oregano, 5g salt."
      }
    }],
    "usage": { "prompt_tokens": 256, "completion_tokens": 482, "total_tokens": 738 }
  }
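
  The mismatch in the actual response (a "tool_calls" finish reason with no tool_calls array) can be checked mechanically on the client side. The following is a minimal sketch, assuming the response has already been decoded into a Python dict; the helper name `find_tool_call_mismatches` is hypothetical and exists only for this example.

```python
from typing import Any, Dict, List


def find_tool_call_mismatches(response: Dict[str, Any]) -> List[int]:
    """Return indices of choices that claim a tool call but carry none.

    Hypothetical helper for illustration; it only inspects the
    OpenAI-shaped fields that appear in the responses in this thread.
    """
    mismatched = []
    for index, choice in enumerate(response.get("choices", [])):
        message = choice.get("message", {})
        claims_tool_call = choice.get("finish_reason") == "tool_calls"
        # A missing key, None, or an empty list all count as "no tool calls".
        has_tool_calls = bool(message.get("tool_calls"))
        if claims_tool_call and not has_tool_calls:
            mismatched.append(index)
    return mismatched


if __name__ == "__main__":
    actual = {
        "choices": [{
            "finish_reason": "tool_calls",
            "message": {"role": "assistant", "content": "I am placing an order..."},
        }]
    }
    print(find_tool_call_mismatches(actual))
```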

Issue

  • finish_reason is "tool_calls" — the server detected a tool call happened.
  • But message.tool_calls is missing, and the natural-language form of the arguments leaked into
    message.content.
  • A simpler tool call (single string arg) on the same model/server returns a correctly structured
    tool_calls array — only the nested-array schema trips it.

This is the signature of a tool-call parser that handles flat arguments but doesn't extract the
model's native <tool_call>{...}</tool_call> block when the JSON payload contains a nested array of
objects.
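
For comparison, here is a minimal sketch of how a nested array of objects could be extracted from a Python-style call. It assumes the model wraps a list of calls in the <|tool_call_start|> and <|tool_call_end|> markers named in the PR description, and that arguments are passed as keywords with Python literal values. The function `parse_pythonic_tool_calls` is hypothetical and is not mlx-lm's parser.

```python
import ast
import json
from typing import Any, Dict, List

START = "<|tool_call_start|>"
END = "<|tool_call_end|>"


def parse_pythonic_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract OpenAI-shaped tool calls from a marker-delimited call list.

    Illustrative sketch only: keyword arguments with Python literals are
    supported; positional arguments and JSON-style true/false/null are not.
    """
    begin = text.find(START)
    stop = text.find(END, begin + len(START)) if begin != -1 else -1
    if begin == -1 or stop == -1:
        return []
    body = text[begin + len(START):stop].strip()

    tree = ast.parse(body, mode="eval")
    # Accept either "[f(...), g(...)]" or a single bare call "f(...)".
    nodes = tree.body.elts if isinstance(tree.body, ast.List) else [tree.body]

    calls = []
    for node in nodes:
        if not isinstance(node, ast.Call) or node.args:
            raise ValueError("expected a call with keyword arguments only")
        if any(kw.arg is None for kw in node.keywords):
            raise ValueError("**kwargs unpacking is not supported")
        # ast.unparse keeps dotted names such as "grocery.orderIngredients".
        name = ast.unparse(node.func)
        # literal_eval handles nested lists and dicts without executing code.
        arguments = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({
            "type": "function",
            "function": {"name": name, "arguments": json.dumps(arguments)},
        })
    return calls


if __name__ == "__main__":
    sample = (
        '<|tool_call_start|>[grocery.orderIngredients('
        'ingredientList=[{"name": "noodles", "amount": 500, "unit": "g"}], '
        'deliveryAddress="845 Willow Lane, Springfield, IL 62704")]<|tool_call_end|>'
    )
    print(json.dumps(parse_pythonic_tool_calls(sample), indent=2))
```

Parsing through `ast` rather than a regular expression is what lets the nested list of dicts survive intact, since bracket and quote matching is handled by the Python grammar.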

Environment

  • mlx-lm 0.31.3
  • macOS 15.5, Apple Silicon
  • temperature: 0.0, tool_choice: "auto"
  • LiquidAI/LFM2.5-1.2B-Instruct-MLX-8bit
