Description
Is there an existing issue for this?
- I have searched the existing issues and found no similar reports
Are you using the latest version of this package?
- The issue I'm reporting exists in the latest release
Can other PDF readers read the file?
- The PDF I'm trying to read opens correctly in at least one other PDF reader
When running this snippet:
use PrinsFrank\PdfParser\PdfParser;
$document = (new PdfParser())->parseFile('example_004.pdf');
$text = $document->getText();
I run into the following exception:
PDF used: example_004.pdf, taken from https://tcpdf.org/examples/example_004/
Parsing this file fails during getText() with a ParseFailureException with the message:
Invalid scale operand "83.977440" for scale operator
#0 src/Document/ContentStream/ContentStream.php(55): PrinsFrank\PdfParser\Document\ContentStream\Command\Operator\State\TextStateOperator->applyToTextState()
#1 src/Document/ContentStream/ContentStream.php(73): PrinsFrank\PdfParser\Document\ContentStream\ContentStream->getPositionedTextElements()
#2 src/Document/Object/Decorator/Page.php(28): PrinsFrank\PdfParser\Document\ContentStream\ContentStream->getText()
#3 src/Document/Document.php(160): PrinsFrank\PdfParser\Document\Object\Decorator\Page->getText()
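It seems the trailing 0 causes the validation to fail: casting "83.977440" to float and back to a string gives "83.97744", which no longer matches the original operand. I am unsure whether trailing zeros are acceptable notation in a PDF.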
src/Document/ContentStream/Command/Operator/State/TextStateOperator.php
if (trim($operands) !== (string)($scale = (int) $operands) && trim($operands) !== (string)($scale = (float) $operands)) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}
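To make the failure concrete, the sketch below copies the condition from TextStateOperator into a standalone function. The name isAcceptedByCurrentCheck is hypothetical and not part of the library; it returns true wherever the real code would not throw. It shows that the float round trip turns "83.977440" into "83.97744", so neither comparison matches.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: mirrors the condition in TextStateOperator,
// returning true when the operand would be accepted instead of throwing.
function isAcceptedByCurrentCheck(string $operands): bool
{
    $trimmed = trim($operands);

    // Accepted only if the operand survives an int or a float round trip unchanged.
    return $trimmed === (string) (int) $operands
        || $trimmed === (string) (float) $operands;
}

// The float round trip drops the trailing zero.
var_dump((string) (float) '83.977440');           // string(8) "83.97744"
var_dump(isAcceptedByCurrentCheck('83.977440'));  // bool(false)
var_dump(isAcceptedByCurrentCheck('83.97744'));   // bool(true)
```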
One possible fix is to trim trailing zeros from the $operands string, although I find this a bit ugly. The check itself also looks like it could fail in other cases, for example when the float-to-string conversion loses precision.
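The same round-trip comparison rejects other spellings of a number too. The sketch below uses the same hypothetical copy of the condition (isAcceptedByCurrentCheck is not a library function), and the inputs are illustrative rather than taken from the attached PDF. The last input has more significant digits than PHP keeps when converting a float to a string (governed by the precision ini setting, 14 by default), so it cannot survive the round trip either.

```php
<?php
declare(strict_types=1);

// Hypothetical copy of the condition in TextStateOperator (true = accepted).
function isAcceptedByCurrentCheck(string $operands): bool
{
    $trimmed = trim($operands);

    return $trimmed === (string) (int) $operands
        || $trimmed === (string) (float) $operands;
}

// Illustrative spellings of ordinary numbers that all fail the round trip.
$inputs = [
    '1.0',                  // trailing zero, same problem as 83.977440
    '.5',                   // no leading digit; the float cast gives back "0.5"
    '+1.5',                 // explicit plus sign is dropped by the cast
    '0.123456789012345678', // more digits than the float-to-string conversion keeps
];

foreach ($inputs as $operand) {
    printf("%-22s %s\n", $operand, isAcceptedByCurrentCheck($operand) ? 'accepted' : 'rejected');
}
```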
Example fix:
// Normalise first: strip whitespace, then trailing zeros and a dangling period ("83.977440" -> "83.97744", "5.0" -> "5")
$operands = trim((string) $operands);
if (str_contains($operands, '.')) {
    $operands = rtrim(rtrim($operands, '0'), '.');
}
if ($operands !== (string)($scale = (int) $operands) && $operands !== (string)($scale = (float) $operands)) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}
A regex may be a more robust solution: ctype_digit is too strict and is_numeric too lenient. However, I'm not sure which values are actually allowed (negative values? values without a leading digit, such as .5 instead of 0.5?).
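// Accept an optional minus sign, digits and an optional fractional part, e.g. "83.977440"
$operands = trim($operands);
if (preg_match('/^-?[0-9]+(\.[0-9]+)?$/', $operands) !== 1) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}
$scale = (float) $operands;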
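For reference, ISO 32000-1 (section 7.3.3, Numeric Objects) allows a real number to have an optional + or - sign and a leading, trailing, or embedded period; the examples given there include 4., -.002 and 0.0, and exponent notation is not supported. Assuming the scale operand follows that general numeric syntax, a validation function could look like the sketch below. parseScaleOperand is a hypothetical name, and it throws InvalidArgumentException only to stay self-contained; in the library this would presumably be ParseFailureException.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: validates and converts a scale operand, assuming it
// follows the PDF numeric syntax (optional sign; digits with an optional
// leading, embedded or trailing period; no exponent).
function parseScaleOperand(string $operands): float
{
    $operand = trim($operands);

    // Either digits with an optional period and fraction ("4", "4.", "4.25"),
    // or a period followed by at least one digit (".5").
    // \z is used instead of $ so a trailing "\n" cannot slip through.
    if (preg_match('/^[+-]?(?:\d+\.?\d*|\.\d+)\z/', $operand) !== 1) {
        throw new InvalidArgumentException(
            sprintf('Invalid scale operand "%s" for scale operator', $operands)
        );
    }

    return (float) $operand;
}

var_dump(parseScaleOperand('83.977440')); // float(83.97744)
```

The pattern is stricter than is_numeric, which also accepts exponent notation such as "1e5", and looser than ctype_digit, which rejects signs and periods.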
Do you allow attachment files to be used in tests to prevent regressions?
The linked file is not mine, but I can create a dedicated file for regression testing.
- Yes, I give permission to use this file as a test file to prevent future regressions (And am authorized to give this permission)