Add reverse HTML parser for WebExpress.WebCore.WebHtml #14

Merged
ReneSchwarzer merged 7 commits into develop from copilot/implement-reverse-html-renderer
Apr 11, 2026

Copilot AI commented Apr 11, 2026

Implements the inverse of the existing HTML renderer: a parser that takes an arbitrary HTML string and reconstructs the corresponding IHtmlNode object tree from the WebExpress.WebCore.WebHtml namespace.

Core components (WebExpress.WebCore/WebHtml/Parser/)

  • HtmlTokenType – enum of 7 token categories (Doctype, StartTag, EndTag, SelfClosingTag, Text, Comment, EndOfFile)
  • HtmlTokenAttribute – per-attribute model; Value == null signals boolean attributes (disabled, checked, etc.)
  • HtmlToken – immutable token produced by the tokenizer
  • HtmlTokenizer – lenient character-level tokenizer; handles void elements, quoted/unquoted/boolean attributes, comments, DOCTYPE, and recovers stray < as text
  • HtmlElementFactory – case-insensitive registry mapping all known tag names to their specific HtmlElement subclass; unknown tags fall back to new HtmlElement(tagName) for robustness. Accepts both kbd (correct HTML) and kdb (mirrors existing class-name typo)
  • HtmlParser – recursive descent parser; tolerates unclosed tags and returns either a flat IReadOnlyList<IHtmlNode> or, via a single-root variant, one root node
  • HtmlParseException – exception with optional Position property, consistent with SocketHandshakeException style
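
To illustrate the factory behaviour described in the list above (case-insensitive lookup, generic fallback for unknown tags, and the kbd/kdb alias), here is a minimal standalone sketch. It is not the code from this PR: every type name (SketchElement, SketchDiv, SketchKbd, SketchElementFactory) is a hypothetical stand-in for the real WebExpress classes, and the ArgumentNullException on null input is an assumption based only on the "null guard" test mentioned further down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the real WebExpress element classes.
public class SketchElement
{
    public string TagName { get; }
    public SketchElement(string tagName) => TagName = tagName;
}

public sealed class SketchDiv : SketchElement
{
    public SketchDiv() : base("div") { }
}

public sealed class SketchKbd : SketchElement
{
    public SketchKbd() : base("kbd") { }
}

public static class SketchElementFactory
{
    // OrdinalIgnoreCase makes "DIV", "Div" and "div" resolve to the same entry.
    private static readonly Dictionary<string, Func<SketchElement>> Registry =
        new Dictionary<string, Func<SketchElement>>(StringComparer.OrdinalIgnoreCase)
        {
            ["div"] = () => new SketchDiv(),
            ["kbd"] = () => new SketchKbd(),
            ["kdb"] = () => new SketchKbd(), // alias for the misspelled name
        };

    public static SketchElement Create(string tagName)
    {
        // Assumed null guard; the real factory's behaviour may differ.
        if (tagName == null) throw new ArgumentNullException(nameof(tagName));

        return Registry.TryGetValue(tagName, out var create)
            ? create()
            : new SketchElement(tagName); // unknown tag: generic fallback
    }
}
```

With a registry of this shape, adding a new element type is a one-line change, and unknown or custom tags still survive parsing as generic elements instead of raising an error.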

Usage

using System.Linq; // OfType<T>() and Single()

var parser = new HtmlParser();

// Returns IReadOnlyList<IHtmlNode>
var nodes = parser.Parse("<div class=\"container\"><p id=\"intro\">Hello</p></div>");

var div  = nodes.OfType<HtmlElementTextContentDiv>().Single();    // the single root <div>
var p    = div.Elements.OfType<HtmlElementTextContentP>().Single();
var text = p.Elements.OfType<HtmlText>().Single();                // "Hello"

// Round-trip: render → parse → render produces equivalent HTML
var img = new HtmlElementMultimediaImg { Src = "logo.png", Alt = "Logo" };
var restored = parser.Parse(img.ToString().Trim())
                     .OfType<HtmlElementMultimediaImg>().Single();
Assert.Equal(img.ToString(), restored.ToString());
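
The tolerance for unclosed tags mentioned in the component list can be pictured with a small recursive descent sketch over an already tokenized input. This is an illustration under stated assumptions, not the HtmlParser from this PR: SketchToken, SketchNode and SketchParser are hypothetical names, tag names are assumed to be lower-cased by the tokenizer, and void elements, attributes and comments are left out.

```csharp
using System.Collections.Generic;

public enum SketchTokenType { StartTag, EndTag, Text, EndOfFile }

public sealed class SketchToken
{
    public SketchTokenType Type { get; }
    public string Value { get; } // tag name or text content
    public SketchToken(SketchTokenType type, string value = null)
    {
        Type = type;
        Value = value;
    }
}

public sealed class SketchNode
{
    public string Name { get; } // tag name, or null for a text node
    public string Text { get; } // text content, or null for an element
    public List<SketchNode> Children { get; } = new List<SketchNode>();
    public SketchNode(string name, string text)
    {
        Name = name;
        Text = text;
    }
}

public sealed class SketchParser
{
    private readonly IReadOnlyList<SketchToken> _tokens;
    private int _pos;

    public SketchParser(IReadOnlyList<SketchToken> tokens) => _tokens = tokens;

    // Returns a flat list of root nodes.
    public List<SketchNode> Parse()
    {
        var roots = new List<SketchNode>();
        ParseChildren(roots, new List<string>());
        return roots;
    }

    // Fills 'target' with nodes until the current element ends.
    // 'open' holds the names of all currently open elements, innermost last.
    private void ParseChildren(List<SketchNode> target, List<string> open)
    {
        while (_pos < _tokens.Count)
        {
            var token = _tokens[_pos];
            switch (token.Type)
            {
                case SketchTokenType.EndOfFile:
                    return; // anything still open is closed implicitly
                case SketchTokenType.Text:
                    target.Add(new SketchNode(null, token.Value));
                    _pos++;
                    break;
                case SketchTokenType.StartTag:
                    _pos++;
                    var element = new SketchNode(token.Value, null);
                    target.Add(element);
                    open.Add(token.Value);
                    ParseChildren(element.Children, open);
                    open.RemoveAt(open.Count - 1);
                    break;
                case SketchTokenType.EndTag:
                    var index = open.LastIndexOf(token.Value);
                    if (index < 0) { _pos++; break; }    // stray end tag: skip it
                    if (index == open.Count - 1) _pos++; // closes the current element
                    return; // otherwise leave the token for the matching ancestor
            }
        }
    }
}
```

Fed the tokens for `<div><p>Hello</div>`, the sketch closes the open `<p>` when it meets `</div>`, so the result is a single div root containing one p with the text node. Real HTML error recovery has many more rules; this only shows the general idea of letting an ancestor's end tag close its unclosed descendants.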

Tests (WebExpress.WebCore.Test/Html/Parser/)

47 new tests across three classes:

  • UnitTestHtmlTokenizer – tokenizer unit tests (empty input, tag types, attribute variants, comments, DOCTYPE, name normalisation)
  • UnitTestHtmlElementFactory – factory registration, case-insensitivity, unknown tags, null guard
  • UnitTestHtmlParser – element reconstruction, nested/deep structures, text nodes, self-closing/void tags, boolean/data/ARIA attributes, comments, DOCTYPE, unknown tags, malformed HTML, and round-trip correctness

@ReneSchwarzer ReneSchwarzer marked this pull request as ready for review April 11, 2026 18:51
Copilot AI changed the title from "[WIP] Implement reverse HTML renderer for WebExpress framework" to "Add reverse HTML parser for WebExpress.WebCore.WebHtml" Apr 11, 2026
Copilot AI requested a review from ReneSchwarzer April 11, 2026 18:54
Copilot AI and others added 3 commits April 11, 2026 18:59
…rse-html-renderer

Add missing factory mappings and comprehensive parser tests
@ReneSchwarzer ReneSchwarzer merged commit f84327c into develop Apr 11, 2026
1 check passed
@ReneSchwarzer ReneSchwarzer deleted the copilot/implement-reverse-html-renderer branch April 11, 2026 19:49

2 participants