Add possibility to Strip tags if the html flag is true#189
Conversation
ae44709 to
33ed45b
Compare
80f707f to
11e4e25
Compare
There was a problem hiding this comment.
Pull request overview
Adds support for stripping HTML tags from indexed values when a field mapping includes an html flag, aiming to prevent HTML content from being stored in search documents.
Changes:
- Detects whether
titleanddescriptionshould be treated as HTML and strips tags accordingly. - Strips HTML tags for any mapped field when
mapping['html']is truthy during document population.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if ($titleField) { | ||
| $titleValue = $this->fieldEvaluator->getValue($object, $titleField); | ||
| $document->setTitle($titleHtml ? \strip_tags($titleValue) : $titleValue); | ||
| } |
There was a problem hiding this comment.
strip_tags() will throw a TypeError on PHP 8 when $titleValue is null (and can also mis-handle non-string values). Since $titleValue can legitimately be null/empty, guard the call (e.g., only strip when it’s a string) or cast safely so indexing doesn’t crash when the title is missing.
| if ($descriptionField) { | ||
| $description = $this->fieldEvaluator->getValue($object, $descriptionField); | ||
| if ($description) { | ||
| $document->setDescription($description); | ||
| } | ||
| $descriptionValue = $this->fieldEvaluator->getValue($object, $descriptionField); | ||
| $document->setDescription($descriptionHtml ? \strip_tags($descriptionValue) : $descriptionValue); | ||
| } |
There was a problem hiding this comment.
strip_tags() will throw a TypeError on PHP 8 when $descriptionValue is null (and can also break for non-string values). Since descriptions are optional and can resolve to null, guard the call (strip only for strings) to avoid runtime failures during indexing.
| $value = $this->fieldEvaluator->getValue($object, $mappingField); | ||
|
|
||
| if ($mapping['html'] ?? false) { | ||
| $value = \strip_tags($value); |
There was a problem hiding this comment.
Applying strip_tags() directly to $value can crash on PHP 8 when the evaluated value is null, an object (before conversion), or an array (e.g., type=array). Consider moving the HTML stripping after conversion and only applying it to string values (and/or mapping over arrays of strings) so html=true can’t trigger a TypeError.
| $value = \strip_tags($value); | |
| if (\is_string($value)) { | |
| $value = \strip_tags($value); | |
| } elseif (\is_array($value)) { | |
| $value = \array_map( | |
| static function ($item) { | |
| return \is_string($item) ? \strip_tags($item) : $item; | |
| }, | |
| $value | |
| ); | |
| } |
| if ($mapping['html'] ?? false) { | ||
| $value = \strip_tags($value); | ||
| } | ||
|
|
There was a problem hiding this comment.
The new html mapping behavior (stripping tags for mapped fields and for title/description) isn’t covered by unit tests. Add/extend tests to assert HTML is stripped when html=true and left intact otherwise, including null/array cases to prevent regressions on PHP 8.
alexander-schranz
left a comment
There was a problem hiding this comment.
We should add a test or adopt a test in ObjectToDocumentConverterTest to strip_tags.
80a8859 to
f83c075
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| $titleExpression = 'object.getStructure().testTitle.getValue()'; | ||
| $descipritonExpression = 'object.getStructure().testDescription.getValue()'; | ||
| $titleField = new Expression($titleExpression); | ||
| $descriptionField = new Expression($descipritonExpression); |
There was a problem hiding this comment.
Typo in variable name $descipritonExpression (should be $descriptionExpression). This makes the test harder to read and increases the chance of copy/paste mistakes in future edits.
| $this->indexMetadata->setFieldMapping([ | ||
| 'testTitle' => [ | ||
| 'field' => new Expression($titleExpression), | ||
| 'type' => 'string', | ||
| 'indexed' => false, | ||
| 'aggregate' => true, | ||
| 'html' => true, | ||
| ], | ||
| 'testDescription' => [ | ||
| 'field' => new Expression($descipritonExpression), | ||
| 'type' => 'string', | ||
| 'indexed' => false, | ||
| 'aggregate' => true, | ||
| 'html' => true, | ||
| ], | ||
| ]); |
There was a problem hiding this comment.
The PR also introduces HTML stripping for regular mapped fields in populateDocument() via the html mapping option, but the new tests only cover title/description stripping. Please add a unit test asserting that a mapped field value has tags stripped when 'html' => true (and remains unchanged when false), to prevent regressions in the new mapping behavior.
| $isTitleHtml = false; | ||
| $isDescriptionHtml = false; | ||
|
|
||
| if ($titleField || $descriptionField) { | ||
| foreach ($fieldMapping as $fieldName => $mapping) { | ||
| $this->hasRequiredMapping($document, $mapping); | ||
| $field = $mapping['field']; | ||
| $isHtml = $mapping['html'] ?? false; | ||
|
|
||
| if (!$field instanceof Expression) { | ||
| continue; | ||
| } | ||
|
|
||
| if ($titleField instanceof Expression && $field->getExpression() === $titleField->getExpression()) { | ||
| $isTitleHtml = $isHtml; | ||
| } | ||
|
|
||
| if ($descriptionField instanceof Expression && $field->getExpression() === $descriptionField->getExpression()) { | ||
| $isDescriptionHtml = $isHtml; | ||
| } | ||
| } |
There was a problem hiding this comment.
The HTML-stripping detection for title/description only works when the metadata title/description fields and the corresponding mapping fields are instances of Expression (it compares getExpression()). If titleField/descriptionField are configured as Field or Property (supported by XmlDriver::getField()), the html flag will never be applied and tags won't be stripped. Consider implementing a generic equality check for FieldInterface types (Expression/Property/Field/Value) or another way to associate the html option with the title/description fields.
f83c075 to
509b012
Compare
Strip HTML tags if the
htmlflag is true.