Skip to content

Revamp Processor#112

Draft
ianjosephwilson wants to merge 9 commits intot-strings:mainfrom
ianjosephwilson:ian/revamp_processor
Draft

Revamp Processor#112
ianjosephwilson wants to merge 9 commits intot-strings:mainfrom
ianjosephwilson:ian/revamp_processor

Conversation

@ianjosephwilson
Copy link
Copy Markdown
Contributor

  • Revamp Processor To Be Template Based
    • Enumerate interpolation cases to simplify logic and establish framework to hone correctness, specially around text types.
  • Remove Nodes
  • Move svg support into processor.
  • Add in system context, parent tag context and namespace context.
  • Change Component Signature(s)
def FunctionComponent(children: Template, **kwargs) -> Template:
    return t""
def FactoryComponent(children: Template, **kwargs) -> Callable[[], Template]:
    def ComponentCallable() -> Template:
        return t""
    return ComponentCallable
  • Change Component Children Processing Order
    • Children is a Template so the actual contents are not processed until after the component is invoked.
  • Change Content Interpolation Rules
    • Zero-arg functions no longer have special treatment:
      • to_html(t'<div>{(lambda: "dynamic")}</div>') != '<div>dynamic</div>'
      • A new formatter named callback can be used which then interpolates the return value
        • to_html(t'<div>{(lambda: "dynamic"):callback}</div>') == '<div>dynamic</div>'
    • Boolean values no longer have special treatment:
      • to_html(t'<div>{False}</div>') == "<div>False</div>"
      • None is still ignored: to_html(t'<div>{None}</div>') == "<div></div>"

Copy link
Copy Markdown
Contributor

@davepeck davepeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ianjosephwilson for this updated PR! I hope I've done a good first-level job at explaining my perspective here...

Comment on lines 16 to +23
if not text:
return text
elif allow_markup and isinstance(text, HasHTMLDunder):
return text.__html__()
elif not allow_markup and type(text) is not str:
# text manipulation triggers regular html escapes on Markup
text = str(text)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might write this preamble as:

if not text:
	return text
if allow_markup and isinstance(text, HasHTMLDuner):
	return text.__html__()
text = str(text)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems logically equivalent? I think maybe I have a tendency to come back later and misread the conditions broken like this and it forces reading the condition and body BUT I looked into this function and it is a case study of different smells so I think I need to revisit it anyways then we can reconsider.

Copy link
Copy Markdown
Contributor

@davepeck davepeck Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is logically equivalent, but I personally find it easier to grok. Definitely a minor comment.

wrapper, so that tag and attribute case-fixing applies from the root.
"""
parser = TemplateParser(svg_context=svg_context)
parser = TemplateParser()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious about your instinct here. SVG fixes feel very much like a parser-layer concern to me, not a processor -- by the time we reach our processor, we should have dealt with the annoyingness of the base HTML parser we're using.

(UPDATE ah, I get it. See further comments.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to explain it before but I think it was not concise enough: #102 (comment)

keep the tnodes as normalized as possible, ie. all lower case as parsed. The semantic information that the SVG is in there OR NOT is already captured. Fixing the case OR NOT seems like another stage in the pipeline

put the "fixing" into the processor until we have more information about the MathML and larger tdom changes have settled

attrs: AttributesDict,
system_kwargs: dict[str, object],
):
system_kwargs: AttributesDict,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps let's drop system_kwargs for now and just use children?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an alternative proposal for this? It seems necessary at some point.

Copy link
Copy Markdown
Contributor

@davepeck davepeck Apr 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see below)

Comment on lines +431 to +432
def _fix_svg_attrs(html_attrs: Iterable[HTMLAttribute]) -> Iterable[HTMLAttribute]:
"""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking over your SVG-in-processor implementation, it does seem like the better approach. Namely, it allows us to think of SVG serialization as an output concern, which allows us to dynamically decide output format. And <foreignObject> is... wow, what a "fun" wrinkle. Okay then!

and passed as keyword arguments if the callable accepts them (or has
**kwargs). Attributes that don't match parameters are silently ignored.
"""
type FunctionComponentProto = Callable[..., Template]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with dropping the Proto suffix for all of these; for some reason, feels more in line with other python type names I've seen here.

)
)

def _process_normal_text(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we collapse _process_normal_text() and _process_normal_text_from_value()?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could maybe delegate from one to the other but I think it is important to have the "entrypoint" function that (should have) has a cast in it. I will try to improve this.


def _process_raw_texts(
self,
bf: list[str],
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The triple of bf, template, and last_ctx feel like their own type — RenderState maybe? Although I don't know if it's actually worth making a refactor here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think maybe its too early here, especially since last_ctx itself might get split up.

bf.append(self.escape_html_text(value))


def resolve_text_without_recursion(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another method whose intent I think I understand but whose practice is less clear to me at the moment. Specific details to follow...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a high-level comment, it took me a moment to understand that the "no recursion" guarantee isn't enforced by the function; it's up to callers to invoke it at the right time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function resolves the interpolations within a template into a string but it does not allow things that "recurse" during processing: templates and iterables. We don't allow those for scripts, styles, titles, textareas, etc. because it is kind of a edge case and does not really make sense for them to have children:

  • normal text (all the other tags that can contain text: div, p, etc.) - entities work and children usually make sense
  • raw text (script, style, at least) - entities don't work here and children don't make sense
  • escapable raw text (textarea, title, at least) - entities work here but children don't make sense

elif isinstance(value, HasHTMLDunder):
# @DESIGN: We could also force callers to use `:safe` to trigger
# the interpolation in this special case.
return Markup(value.__html__())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we return trusted Markup in the singleton case, but raise a ValueError in the other case?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revisiting this: Your @DESIGN comment feels important to me here, since this appears to be the only place in resolve_text_without_recursion() where we silently wrap content; where we take on any responsibility for thinking about escaping/safety.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't escape the closing tags in chunks, chunks=["var x <", "/script>"] from t"<script>var x <{'/script>'}</script>" like normal text. So we only allow "plain" values EXCEPT when the interpolation is the entire content, ie. an exact match (no chunks): t"<script>{user_escaped_this_dunder_html_object}</script>".

): # type() check to avoid subclasses, probably something smarter here
if value:
text.append(value)
elif not isinstance(value, str) and isinstance(value, (Template, Iterable)):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume not isinstance(value, str) is because str is Iterable. The singleton path handles this via an explicit isinstance(value, str) earlier in its if-thens.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe that but in combination with also trying to disambiguate between "true" strings made with str() versus subclasses like something from markupsafe. Although this might be an "optimization gone unreadable", I have to go back and review it, and if that is the case I'd probably take it out unless it is related to constraining the type signatures of the interpolations.

@davepeck
Copy link
Copy Markdown
Contributor

davepeck commented Apr 6, 2026

One last comment:

Boolean values no longer have special treatment

I think we need to support this, mostly for familiarity from anyone arriving from Javascript-land. {someCondition && <div>...</div>} is an extremely common idiom.

@ianjosephwilson ianjosephwilson marked this pull request as ready for review April 8, 2026 04:40
@ianjosephwilson ianjosephwilson marked this pull request as draft April 8, 2026 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants