[Bug]: scale operand doesn't accept trailing 0 #299

@ahoogerheide

Is there an existing issue for this?

  • I have searched the existing issues and found no similar reports

Are you using the latest version of this package?

  • The issue I'm reporting exists in the latest release

Can other PDF readers read the file?

  • The PDF I'm trying to read opens correctly in at least one other PDF reader

When running this snippet

use PrinsFrank\PdfParser\PdfParser;

$document = (new PdfParser())->parseFile('example_004.pdf');
$text = $document->getText();

I run into the following issue/exception (Please attach the pdf)

PDF used from https://tcpdf.org/examples/example_004/
example_004.pdf

Parsing this file fails with a ParseFailureException with the message:
Invalid scale operand "83.977440" for scale operator

#0 src/Document/ContentStream/ContentStream.php(55): PrinsFrank\PdfParser\Document\ContentStream\Command\Operator\State\TextStateOperator->applyToTextState()
#1 src/Document/ContentStream/ContentStream.php(73): PrinsFrank\PdfParser\Document\ContentStream\ContentStream->getPositionedTextElements()
#2 src/Document/Object/Decorator/Page.php(28): PrinsFrank\PdfParser\Document\ContentStream\ContentStream->getText()
#3 src/Document/Document.php(160): PrinsFrank\PdfParser\Document\Object\Decorator\Page->getText()

The trailing 0 appears to make the validation fail: once "83.977440" is cast to a float and back to a string, it no longer matches the original operand. I am not sure whether a trailing 0 is valid notation in a PDF.

src/Document/ContentStream/Command/Operator/State/TextStateOperator.php

if (trim($operands) !== (string)($scale = (int) $operands) && trim($operands) !== (string)($scale = (float) $operands)) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}
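To illustrate why the check above rejects the operand, here is a minimal sketch that mirrors the same round-trip comparison in a standalone function. The name survivesRoundTrip is hypothetical and not part of the library, and the results assume PHP's default precision setting.

```php
<?php
// Hypothetical helper (not part of the library) mirroring the comparison in the
// check above: the operand is accepted only if casting it to int or float and
// back to a string reproduces the trimmed input exactly.
function survivesRoundTrip(string $operands): bool
{
    $trimmed = trim($operands);

    return $trimmed === (string) (int) $operands
        || $trimmed === (string) (float) $operands;
}

// Print which notations survive the round trip.
foreach (['100', '83.97744', '83.977440', '100.0', '.5', '+5'] as $operand) {
    printf("%-10s %s\n", $operand, survivesRoundTrip($operand) ? 'accepted' : 'rejected');
}
```

Under these assumptions only '100' and '83.97744' are accepted. Any notation that PHP would not print itself, such as a trailing zero, a missing leading zero, or an explicit plus sign, is rejected even though it denotes a perfectly ordinary number.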

A possible fix is to trim trailing zeros from the $operands string, but I find that a bit ugly. The check itself also looks like it could fail in other scenarios, perhaps because of floating-point precision.

Example fix:

if (str_contains((string)$operands, '.')) {
    $operands = rtrim(rtrim((string)$operands, '0'), '.');
}
if (trim($operands) !== (string)($scale = (int) $operands) && trim($operands) !== (string)($scale = (float) $operands)) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}

A regex may actually be the proper solution: ctype_digit is too strict and is_numeric is too lenient. I am not sure which values are actually allowed, though. Negative values? Values without a leading digit, like .5 instead of 0.5?

if (preg_match('/^-?[0-9]+(\.[0-9]+)?$/', $operands, $matches) !== 1) {
    throw new ParseFailureException(sprintf('Invalid scale operand "%s" for scale operator', $operands));
}
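As far as I can tell, the number syntax in the PDF specification (ISO 32000-1, section 7.3.3) allows an optional sign and a leading, trailing, or embedded period, so 34.5, -3.62, +123.6, 4., -.002 and 0.0 would all be valid. Below is a sketch of a slightly wider regex under that assumption. The function name parseNumericOperand is hypothetical, and InvalidArgumentException only stands in for the library's ParseFailureException to keep the example self-contained.

```php
<?php
// Hypothetical helper, not part of the library: validates a numeric operand
// against a PDF-style number pattern and returns it as a float.
function parseNumericOperand(string $operands): float
{
    $trimmed = trim($operands);

    // Optional sign, then either digits with an optional period and fraction
    // ("83.977440", "4.") or a period followed by digits (".5").
    // \A and \z anchor the whole string, so a newline inside the operand is
    // not silently accepted the way it can be with "$".
    if (preg_match('/\A[+-]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)\z/', $trimmed) !== 1) {
        // Stand-in for the library's ParseFailureException.
        throw new InvalidArgumentException(
            sprintf('Invalid scale operand "%s" for scale operator', $operands)
        );
    }

    return (float) $trimmed;
}
```

Because the string is validated before it is cast, the result no longer depends on how PHP prints a float, which avoids both the trailing-zero problem and any precision-related mismatch. Exponent notation such as 1e5 is deliberately rejected, since it is not part of the syntax assumed here.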

Do you allow attachment files to be used in tests to prevent regressions?

I do not own the linked file, but I can create a dedicated file for regression testing.

  • Yes, I give permission to use this file as a test file to prevent future regressions (And am authorized to give this permission)
