DOMDocument::saveHTML() incorrectly escapes IPv6 URLs in element attributes #21390

@andronocean

Description

URLs with an IPv6 address as the host use square brackets [] around the address, per RFC 3986. The saveHTML() method on DOMDocument incorrectly URL-encodes these square brackets in attributes that expect a URL value (like href, src, and action). Other attributes I tested don't seem to be affected.

This example with various permutations of attributes and IPv6 URLs:

<?php
$html = <<<EOD
<html>
<head>
<link rel='stylesheet' href='http://[::1]:5173/app.css'/>
<script src='https://[::1]:5173/app.js'></script>
</head>
<body>
<a href='http://[::1]' data-custom='http://[::1]'>anchor to http://[::1]</a>
<form action='http://[::1]'></form>
<blockquote cite='http://[::1]'></blockquote>
</body>
</html>
EOD;

$document = new DOMDocument();
$document->loadHTML($html);

print $document->saveHTML();

Resulted in this output:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<link rel="stylesheet" href="http://%5B::1%5D:5173/app.css">
<script src="https://%5B::1%5D:5173/app.js"></script>
</head>
<body>
<a href="http://%5B::1%5D" data-custom="http://[::1]">anchor to http://[::1]</a>
<form action="http://%5B::1%5D"></form>
<blockquote cite="http://[::1]"></blockquote>
</body>
</html>

But I expected this output instead:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<link rel="stylesheet" href="http://[::1]:5173/app.css">
<script src="https://[::1]:5173/app.js"></script>
</head>
<body>
<a href="http://[::1]" data-custom="http://[::1]">anchor to http://[::1]</a>
<form action="http://[::1]"></form>
<blockquote cite="http://[::1]"></blockquote>
</body>
</html>

(cite on <blockquote> seems to be unaffected, even though by spec it should be a URL.)

The internal representation of such an attribute within the class is unaffected; the escaping happens only on output with saveHTML().
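
A minimal sketch of how this can be checked, assuming a one-element input rather than the full document above: read the attribute back from the DOM and compare it with the serialized string. The variable names are only illustrative.

```php
<?php
$document = new DOMDocument();
$document->loadHTML("<a href='http://[::1]'>anchor</a>");

$anchor = $document->getElementsByTagName('a')->item(0);

// Value as stored in the DOM tree; this keeps the literal square brackets.
var_dump($anchor->getAttribute('href'));

// Serialized form of the same element; on the affected versions this is
// where the brackets come out as %5B and %5D.
var_dump($document->saveHTML($anchor));
```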

I also checked Dom\HTMLDocument::saveHTML(), and that method returns all attributes correctly without escaping. I know that is the preferred version today, but a great many older codebases still rely on DOMDocument.
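
For comparison, a short sketch of the same check with the newer class (PHP 8.4 or later). The input string is a reduced assumption, not the full example above, and LIBXML_NOERROR is passed only to keep parser warnings out of the output.

```php
<?php
$html = "<!DOCTYPE html><html><body><a href='http://[::1]'>anchor</a></body></html>";

// Dom\HTMLDocument uses an HTML5 parser and serializer.
$modern = Dom\HTMLDocument::createFromString($html, LIBXML_NOERROR);
$out = $modern->saveHtml();

// In my testing the href keeps its square brackets here.
print $out;
```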

Live example comparing both classes: https://3v4l.org/9gXDT#v8.4.18

PHP Version

PHP 8.4.17 (cli) (built: Jan 13 2026 17:17:10) (NTS)
Copyright (c) The PHP Group
Built by Shivam Mathur
Zend Engine v4.4.17, Copyright (c) Zend Technologies
    with Xdebug v3.5.0, Copyright (c) 2002-2025, by Derick Rethans
    with Zend OPcache v8.4.17, Copyright (c), by Zend Technologies

Operating System

macOS 15.7.4
