Can't parse PDF file generated by FPDF 1.86; works fine with FPDF 1.81 #703

Open
Saulight73 opened this issue Apr 15, 2024 · 8 comments

@Saulight73

The error in our logs occurs when we parse the page data. We are using a PDF generated by the latest version of FPDF, 1.86; the last version with which this error did not occur is 1.81. We would therefore like to know, if possible, what could be causing it:

Undefined array key 0 in /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.5.0/src/Smalot/PdfParser/Page.php on line 284.

The error persists even with version 2.9.0 of your parser. My PHP parsing code is below:

    private static function getXandYofPDFText(string $stringtosearch, string $pdfLink, int $documentID){
        if (!is_string($stringtosearch) || empty($stringtosearch)) {
            throw new Exception(ErrorCodesHelper::get("INVALID_PARAMETERS",["stringtosearch"]));
        }
    
        if (!is_string($pdfLink) || empty($pdfLink)) {
            throw new Exception(ErrorCodesHelper::get("INVALID_PARAMETERS",["pdfLink"]));
        }
    
        if (!is_int($documentID)) {
            throw new Exception(ErrorCodesHelper::get("INVALID_PARAMETERS",["documentID"]));
        }
    
        if ($documentID <= 0) {
            throw new Exception(ErrorCodesHelper::get("INVALID_PARAMETERS",["documentID"]));
        }

        $parser = new \Smalot\PdfParser\Parser();

        $globalArray = array();
        $pdf = $parser->parseContent( @file_get_contents( $pdfLink ) );

        
        if( $pdf === null )
        {
            throw new Exception(ErrorCodesHelper::get("DOCUSIGN_API_CALL_ERROR",["Impossible de parser le document suivant : ".$pdfLink]));
        }

        $compteurpage = 1;
        $pages = $pdf->getPages();

        
        if( $pages === null )
        {
            throw new Exception(ErrorCodesHelper::get("DOCUSIGN_API_CALL_ERROR",["Impossible de parser les pages du document suivant : ".$pdfLink]));
        }
        
        foreach( $pages as $pagenumber )
        
        {

            // print_r($pagenumber);
            
            /**
             * Retrieve the text and its associated information (anchors, text, coordinates of the start of the line measured from the bottom left, etc.)
             */
            $dataTm = $pagenumber->getDataTm(); 
            
            if( $dataTm == null )
            {
                throw new Exception(ErrorCodesHelper::get("DOCUSIGN_API_CALL_ERROR",["Impossible de parser la data des pages pour le document suivant : ".$pdfLink]));
            }

            $compteurindex = 0;
            foreach( $dataTm as $a )
            {
                if ( str_contains( $a[ 1 ], $stringtosearch ) ) 
                {
                    /**
                     * Retrieve the X and Y coordinates, the page number, the signer's order number and the document's order number.
                     */
                    $line = $dataTm[ (string)$compteurindex ];
                    $x = (int)$line[ 0 ][ 4 ];
                    $y = 859 - (int)$line[ 0 ][ 5 ];
                    $array = [ $x, $y, $compteurpage, $documentID ];
                    
                    @array_push( $globalArray, $array );
                }
                $compteurindex++;
            }
            $compteurpage++;
            
        }

        return $globalArray;

    }
    
Thank you for providing us with prompt assistance for our production solution.

Best regards,

GLENAT Group
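
For reference, the inner lookup loop of the function above can be sketched as a standalone helper that works on plain arrays. This is only an illustration, not a fix: the warning and the fatal error reported in this issue are raised inside the library's `Page.php`, before this loop runs. The helper name `findTextCoordinates`, the `$pageHeight` parameter and the sample data are hypothetical; the entry shape `[[a, b, c, d, x, y], 'text']` is assumed from how the reported code reads `$a[ 1 ]`, `$line[ 0 ][ 4 ]` and `$line[ 0 ][ 5 ]`, and the value 859 is taken from that code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: scan one page's getDataTm()-style entries for a needle.
 * Each entry is ASSUMED to look like [[a, b, c, d, x, y], 'text'].
 * Requires PHP 8.0+ because of str_contains().
 */
function findTextCoordinates(
    array $dataTm,
    string $needle,
    int $pageNumber,
    int $documentId,
    int $pageHeight = 859 // value used in the reported code
): array {
    $matches = [];

    foreach ($dataTm as $entry) {
        // Skip entries that do not have the assumed shape instead of
        // triggering "Undefined array key" warnings in our own code.
        if (!isset($entry[0][4], $entry[0][5], $entry[1]) || !is_string($entry[1])) {
            continue;
        }

        if (!str_contains($entry[1], $needle)) {
            continue;
        }

        // X is used as-is; Y is flipped against the page height,
        // as in the reported code.
        $matches[] = [
            (int) $entry[0][4],
            $pageHeight - (int) $entry[0][5],
            $pageNumber,
            $documentId,
        ];
    }

    return $matches;
}

// Usage with hand-written sample data (not taken from a real PDF).
$sample = [
    [['1', '0', '0', '1', '56.7', '785.2'], 'Hello'],
    [['1', '0', '0', '1', '120.0', '700.9'], 'Sign here: #SIG1#'],
    ['malformed'], // wrong shape on purpose; skipped by the guard
];

print_r(findTextCoordinates($sample, '#SIG1#', 1, 42));
```

Iterating over the entries directly avoids the separate index counter, and the `isset()` guard keeps a malformed entry from aborting the whole page.
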
k00ni added the bug label Apr 16, 2024
@Saulight73
Author

@k00ni any news on this issue? We still can't parse these PDFs reliably with version 2.10. The same error occurs:

[Tue May 28 10:46:34.576806 2024] [proxy_fcgi:error] [pid 3767717] [client 10.1.21.27:53967] AH01071: Got error 'PHP message: PHP Warning: Undefined array key 0 in /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php on line 279; PHP message: PHP Fatal error: Uncaught TypeError: Smalot\PdfParser\Page::getPDFObjectForFpdf(): Return value must be of type Smalot\PdfParser\PDFObject, null returned in /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php:279\nStack trace:\n#0 /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php(399): Smalot\PdfParser\Page->getPDFObjectForFpdf()\n#1 /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php(424): Smalot\PdfParser\Page->extractRawData()\n#2 /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php(504): Smalot\PdfParser\Page->extractDecodedRawData()\n#3 /var/www/clients/client1/web10/web/application/library/php/pdfparser-2.10.0/src/Smalot/PdfParser/Page.php(655): Smalot\PdfParser\Page->getDataCommands()\n#4 /var/www/clients/client1/web10/web/application/library/php/Glenat/App/DocuSignApp....', referer: http://core-test.glenat.com/

@k00ni
Collaborator

k00ni commented Jun 5, 2024

Please upload a PDF here which causes this problem.

@Saulight73
Author

PDF Problem.pdf

Here it is.

@k00ni
Collaborator

k00ni commented Jun 5, 2024

I tried your PDF, but PDFParser reported a different error:

PHPUnitTests\Integration\ParserTest::testIssue703
Exception: Invalid object reference for $obj.
/var/www/html/src/Smalot/PdfParser/RawData/RawDataParser.php:536
/var/www/html/src/Smalot/PdfParser/RawData/RawDataParser.php:242
/var/www/html/src/Smalot/PdfParser/RawData/RawDataParser.php:918
/var/www/html/src/Smalot/PdfParser/RawData/RawDataParser.php:952
/var/www/html/src/Smalot/PdfParser/Parser.php:103
/var/www/html/tests/PHPUnit/Integration/ParserTest.php:446
phpvfscomposer:///var/www/html/dev-tools/vendor/phpunit/phpunit/phpunit:106

Here is my test code (separate branch issue/703):

https://github.com/smalot/pdfparser/blob/issue/703/tests/PHPUnit/Integration/ParserTest.php#L442-L464

I may have made a mistake somewhere. Can you have a look, please?

@Saulight73
Author

    $localPdfPath = './tpm.pdf';
    file_put_contents($localPdfPath, file_get_contents($pdfLink));
    $pdf = $parser->parseFile($localPdfPath);

Here is our code. It is the same as yours, so I don't know why you get a different error. Can you try saving the PDF locally to a temporary file, as we do? Maybe that will solve the problem.

@k00ni
Collaborator

k00ni commented Jun 5, 2024

I tried it locally and got the error I mentioned. Once #719 is merged, we can run CI for side branches too and see whether the same error occurs there.

@k00ni
Collaborator

k00ni commented Jun 5, 2024

#719 got merged.

The error I found locally also shows up in CI for your PDF, for instance here for PHP 7.2: https://github.com/smalot/pdfparser/actions/runs/9386160259/job/25846037622#step:6:37 (or the same error here for PHP 8.3)

Exception: Invalid object reference for $obj.
/home/runner/work/pdfparser/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php:536
/home/runner/work/pdfparser/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php:242
/home/runner/work/pdfparser/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php:918
/home/runner/work/pdfparser/pdfparser/src/Smalot/PdfParser/RawData/RawDataParser.php:952
/home/runner/work/pdfparser/pdfparser/src/Smalot/PdfParser/Parser.php:103
/home/runner/work/pdfparser/pdfparser/tests/PHPUnit/Integration/ParserTest.php:453

Also, which PHP version do you use?

By the way, this might be the same error as in #714.

@Saulight73
Author

We use PHP 8.2. Before we try PHP 8.3, I can give you some other PDFs if you want, generated with other parameters and versions:

test3_fpdfSeul.pdf
test4_iso.pdf
test5_serveurapache.pdf
test6_serveurapacheIso.pdf
test7_serveurapacheIsoV1.pdf
test9_V2-1.pdf
test9_V2-2.pdf
test9_V2-3.pdf
test10-V2-bloc-fin.pdf
test11-v2-blocFin.pdf
