
Topic on Extension talk:PdfHandler

Windows fix after TomRamm's and Mwgbell's fix

Checkitthrice8 (talk | contribs)

After applying both of their fixes, I was closer, but not all the way there. When uploading a PDF, I could see the image, but the dimensions still showed as 0x0. One thing I found helpful was to look at the image table in MySQL. If img_width, img_height and img_metadata aren't getting set for your PDFs, you're not going to see thumbnails. SQL for this (blob is the name of my wiki database, so substitute yours):

SELECT * FROM blob.image WHERE img_minor_mime='pdf';

The problem I had was in PdfImage.php, at the point where it creates the metadata. It was faulting, and therefore never reaching the code that writes these fields. For me the fault is in the function convertDumpToArray($metaDump, $infoDump). The arguments are the outputs from the command-line calls to pdfinfo.exe.

The function expects $metaDump to be XML, but the output starts with some key->value pairs. The last of them is the key Metadata:, followed by the start of the XML. Here's an example from one of my PDFs:

Creator:        ARTS Import
Producer:       PDFlib 4.0.1 (Win32)
CreationDate:   Thu Aug  1 11:37:01 2002
ModDate:        Thu Aug  1 11:57:11 2002

...

JavaScript:     no
PDF version:    1.3
Metadata:
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d' bytes='853'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
xmlns:iX='http://ns.adobe.com/iX/1.0/'>

...

This is obviously not going to make an XML parser happy. It's possible that I'm using a different version of pdfinfo.exe than this code was expecting. To see if my fix will help you, grab a PDF file and run something like this from the command line:

C:\bin\xpdf-tools-win-4.05\pdfinfo.exe -enc UTF-8 -meta file.pdf > meta

If the resulting meta file has the key->value pairs at the top like mine, you'll need this fix.
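
If you'd rather check this from PHP than by eye, here's a minimal sketch. The helper name metaDumpHasKeyValuePreamble is made up for illustration and isn't part of PdfHandler, and it assumes that a pure XMP dump would start with '<'.

```php
<?php
// Hypothetical helper, not part of PdfHandler: returns true when the
// `pdfinfo -meta` output begins with "Key: value" lines instead of XML.
function metaDumpHasKeyValuePreamble( string $metaDump ): bool {
	$trimmed = ltrim( $metaDump );
	if ( $trimmed === '' ) {
		// Nothing to inspect, so nothing to fix.
		return false;
	}
	// Assumption: a pure XMP dump starts with '<' (an xpacket or rdf tag).
	return $trimmed[0] !== '<';
}

// Sample shaped like the output quoted above.
$sample = "Creator:        ARTS Import\n"
	. "PDF version:    1.3\n"
	. "Metadata:\n"
	. "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d' bytes='853'?>\n";

var_dump( metaDumpHasKeyValuePreamble( $sample ) ); // bool(true)
```

To run it against the file produced by the command above, pass file_get_contents( 'meta' ) instead of the sample string.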

In extensions/PdfHandler/includes/PdfImage.php, there's a line (in my file at line 182):

$lines = explode( "\n", $infoDump );

change to

$lines = explode( "\n", $metaDump );

and comment out lines 216 to 219. Before:

$metaDump = trim( $metaDump );
if ( $metaDump !== '' ) {
	$data['xmp'] = $metaDump;
}

After:

//$metaDump = trim( $metaDump );
//if ( $metaDump !== '' ) {
//	$data['xmp'] = $metaDump;
//}

Oddly, the code looks like it was built to accept key/value pairs and XML mixed together. I wonder if this got changed to support a slightly different package. This fix ignores the key/value pairs from $infoDump, which seem to duplicate the ones in $metaDump, except that $infoDump also lists the size of every single page. I don't know why that would be important to include, but if you'd like to keep it, you could change the first part of the fix to the following (the array spread needs PHP 7.4 or later):

$lines = explode( "\n", $infoDump );
$lines2 = explode( "\n", $metaDump );
$lines = [...$lines, ...$lines2];
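
For comparison, here's a rough sketch of how the mixed output could be split so that both the key/value pairs and the XML survive. This is not what the fix above does (it simply stops storing 'xmp'), and it is not PdfHandler's own code. The function name splitMixedMetaDump and the returned array keys are made up for illustration, and it assumes the XML begins at the first line that starts with '<'.

```php
<?php
// Hypothetical helper (not PdfHandler code): split pdfinfo's mixed output
// into its leading "Key: value" pairs and the XML that follows "Metadata:".
function splitMixedMetaDump( string $metaDump ): array {
	$pairs = [];
	$xmp = '';
	$lines = explode( "\n", $metaDump );
	foreach ( $lines as $i => $line ) {
		$line = rtrim( $line, "\r" ); // tolerate Windows line endings
		if ( $line !== '' && $line[0] === '<' ) {
			// Assumption: everything from the first '<' line onward is XML.
			$xmp = trim( implode( "\n", array_slice( $lines, $i ) ) );
			break;
		}
		// A limit of 2 keeps colons inside values such as timestamps.
		$parts = explode( ':', $line, 2 );
		if ( count( $parts ) === 2 && trim( $parts[1] ) !== '' ) {
			$pairs[ trim( $parts[0] ) ] = trim( $parts[1] );
		}
	}
	return [ 'pairs' => $pairs, 'xmp' => $xmp ];
}

// Sample shaped like the output quoted earlier in this post.
$sample = "Creator:        ARTS Import\n"
	. "ModDate:        Thu Aug  1 11:57:11 2002\n"
	. "Metadata:\n"
	. "<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d' bytes='853'?>\n";

print_r( splitMixedMetaDump( $sample ) );
```

The "Metadata:" line has no value of its own, so it is skipped rather than stored as an empty pair.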

One last thing: I don't think that TomRamm's fix is thread-safe, but I'm not sure. If multiple calls happen simultaneously, I think they'll end up stomping on each other's meta and page output files. That seemed to be what was happening when I ran the maintenance scripts to create the thumbs I was missing. No worries, though: after a few runs they were all good.
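
If the stomping really does come from every call writing to the same file names, one possible way around it is to give each call its own output files. The sketch below only illustrates that idea and is not taken from TomRamm's fix. The helper name makeUniqueDumpPaths, the file prefixes and the use of the system temp directory are all assumptions.

```php
<?php
// Hypothetical helper: build per-call output paths so that concurrent
// conversions never share a file. Prefixes and directory are assumptions.
function makeUniqueDumpPaths( string $dir ): array {
	// tempnam() creates an empty file with a unique name and returns its path.
	// On Windows only the first three characters of the prefix are used.
	$meta = tempnam( $dir, 'pdfmeta' );
	$pages = tempnam( $dir, 'pdfpage' );
	if ( $meta === false || $pages === false ) {
		throw new RuntimeException( 'Could not create temporary files' );
	}
	return [ 'meta' => $meta, 'pages' => $pages ];
}

// Two calls standing in for two simultaneous uploads get different paths.
$a = makeUniqueDumpPaths( sys_get_temp_dir() );
$b = makeUniqueDumpPaths( sys_get_temp_dir() );
var_dump( $a['meta'] !== $b['meta'] );

// Remove the files once their contents have been read.
foreach ( array_merge( array_values( $a ), array_values( $b ) ) as $path ) {
	unlink( $path );
}
```

Each path would then be used as the redirect target for one pdfinfo.exe call and deleted after its contents are read.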
