Skip to content

Add possibility to Strip tags if the html flag is true#189

Merged
alexander-schranz merged 1 commit intomassiveart:2.9from
martinlagler:enhancement/strip-html-tags
Feb 11, 2026
Merged

Add possibility to Strip tags if the html flag is true#189
alexander-schranz merged 1 commit intomassiveart:2.9from
martinlagler:enhancement/strip-html-tags

Conversation

@martinlagler
Copy link
Copy Markdown
Member

Strip HTML tags if the html flag is true.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds support for stripping HTML tags from indexed values when a field mapping includes an html flag, aiming to prevent HTML content from being stored in search documents.

Changes:

  • Detects whether title and description should be treated as HTML and strips tags accordingly.
  • Strips HTML tags for any mapped field when mapping['html'] is truthy during document population.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +114 to +117
if ($titleField) {
$titleValue = $this->fieldEvaluator->getValue($object, $titleField);
$document->setTitle($titleHtml ? \strip_tags($titleValue) : $titleValue);
}
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip_tags() will throw a TypeError on PHP 8 when $titleValue is null (and can also mis-handle non-string values). Since $titleValue can legitimately be null/empty, guard the call (e.g., only strip when it’s a string) or cast safely so indexing doesn’t crash when the title is missing.

Copilot uses AI. Check for mistakes.
Comment on lines 119 to 122
if ($descriptionField) {
$description = $this->fieldEvaluator->getValue($object, $descriptionField);
if ($description) {
$document->setDescription($description);
}
$descriptionValue = $this->fieldEvaluator->getValue($object, $descriptionField);
$document->setDescription($descriptionHtml ? \strip_tags($descriptionValue) : $descriptionValue);
}
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strip_tags() will throw a TypeError on PHP 8 when $descriptionValue is null (and can also break for non-string values). Since descriptions are optional and can resolve to null, guard the call (strip only for strings) to avoid runtime failures during indexing.

Copilot uses AI. Check for mistakes.
Comment thread Search/ObjectToDocumentConverter.php Outdated
$value = $this->fieldEvaluator->getValue($object, $mappingField);

if ($mapping['html'] ?? false) {
$value = \strip_tags($value);
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applying strip_tags() directly to $value can crash on PHP 8 when the evaluated value is null, an object (before conversion), or an array (e.g., type=array). Consider moving the HTML stripping after conversion and only applying it to string values (and/or mapping over arrays of strings) so html=true can’t trigger a TypeError.

Suggested change
$value = \strip_tags($value);
if (\is_string($value)) {
$value = \strip_tags($value);
} elseif (\is_array($value)) {
$value = \array_map(
static function ($item) {
return \is_string($item) ? \strip_tags($item) : $item;
},
$value
);
}

Copilot uses AI. Check for mistakes.
Comment on lines +213 to +216
if ($mapping['html'] ?? false) {
$value = \strip_tags($value);
}

Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new html mapping behavior (stripping tags for mapped fields and for title/description) isn’t covered by unit tests. Add/extend tests to assert HTML is stripped when html=true and left intact otherwise, including null/array cases to prevent regressions on PHP 8.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@alexander-schranz alexander-schranz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should add a test or adopt a test in ObjectToDocumentConverterTest to strip_tags.

Comment thread Search/ObjectToDocumentConverter.php Outdated
Comment thread Search/ObjectToDocumentConverter.php Outdated
Comment thread Search/ObjectToDocumentConverter.php Outdated
Comment thread Search/ObjectToDocumentConverter.php Outdated
@martinlagler martinlagler force-pushed the enhancement/strip-html-tags branch 3 times, most recently from 80a8859 to f83c075 Compare February 10, 2026 15:30
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +209 to +212
$titleExpression = 'object.getStructure().testTitle.getValue()';
$descipritonExpression = 'object.getStructure().testDescription.getValue()';
$titleField = new Expression($titleExpression);
$descriptionField = new Expression($descipritonExpression);
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in variable name $descipritonExpression (should be $descriptionExpression). This makes the test harder to read and increases the chance of copy/paste mistakes in future edits.

Copilot uses AI. Check for mistakes.
Comment on lines +215 to +230
$this->indexMetadata->setFieldMapping([
'testTitle' => [
'field' => new Expression($titleExpression),
'type' => 'string',
'indexed' => false,
'aggregate' => true,
'html' => true,
],
'testDescription' => [
'field' => new Expression($descipritonExpression),
'type' => 'string',
'indexed' => false,
'aggregate' => true,
'html' => true,
],
]);
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR also introduces HTML stripping for regular mapped fields in populateDocument() via the html mapping option, but the new tests only cover title/description stripping. Please add a unit test asserting that a mapped field value has tags stripped when 'html' => true (and remains unchanged when false), to prevent regressions in the new mapping behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +91 to 111
$isTitleHtml = false;
$isDescriptionHtml = false;

if ($titleField || $descriptionField) {
foreach ($fieldMapping as $fieldName => $mapping) {
$this->hasRequiredMapping($document, $mapping);
$field = $mapping['field'];
$isHtml = $mapping['html'] ?? false;

if (!$field instanceof Expression) {
continue;
}

if ($titleField instanceof Expression && $field->getExpression() === $titleField->getExpression()) {
$isTitleHtml = $isHtml;
}

if ($descriptionField instanceof Expression && $field->getExpression() === $descriptionField->getExpression()) {
$isDescriptionHtml = $isHtml;
}
}
Copy link

Copilot AI Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The HTML-stripping detection for title/description only works when the metadata title/description fields and the corresponding mapping fields are instances of Expression (it compares getExpression()). If titleField/descriptionField are configured as Field or Property (supported by XmlDriver::getField()), the html flag will never be applied and tags won't be stripped. Consider implementing a generic equality check for FieldInterface types (Expression/Property/Field/Value) or another way to associate the html option with the title/description fields.

Copilot uses AI. Check for mistakes.
@martinlagler martinlagler force-pushed the enhancement/strip-html-tags branch from f83c075 to 509b012 Compare February 11, 2026 07:00
@alexander-schranz alexander-schranz merged commit b528a5b into massiveart:2.9 Feb 11, 2026
8 of 9 checks passed
@alexander-schranz alexander-schranz changed the title Strip tags if the html flag is true Add possibility to Strip tags if the html flag is true Feb 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants