Extension talk:Pdf Export/archive 2


Still No Images


Hi there, can someone help us with this problem? The PDF export is great, but the link for the image is rendered as "Image:Image_Name.gif" and not as the direct link to the file itself.

Help please ! th3_gooroo@hotmail.com



Here was my solution:

After adding:

global $wgUploadPath;

I replaced

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);

$bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);

with

$bhtml = str_replace ($wgUploadPath, $wgUploadPath.'/', $bhtml);

Currently running at http://kumu.brocku.ca


Possible Image Fix


This solution almost worked for me, but it led me to the right answer: add these two lines where the other globals are defined:

global $wgUploadPath;
global $IP;

and then change

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);

$bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);

to

$bhtml = str_replace ($wgUploadPath, $IP.$wgUploadPath.'/', $bhtml);

For me this fixed the problem with images, but internal links to wiki pages still didn't work. These links are relative just like the images and need to be made absolute as well. --justinm

Images With Windows


I managed to get this working by inserting the following:

$bhtml = str_replace ('href="/', 'href="', $bhtml);

$bhtml = str_replace ('src="/', 'src="', $bhtml)

I don't really know if this is the best way to do it, but it fixed my problem with the html being generated as:

<a href="/index.php

or

<img src="/30/6/image.jpg

Images When Wiki Is Root of Domain


The string replacement

$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);

doesn't do anything if $wgScriptPath is the empty string. This is the case if the wiki is the root of the domain.

I've fixed this by replacing the above line with

$bhtml = str_replace ("href=\"/index.php", "href=\"" . $wgServer . "/index.php", $bhtml);
$bhtml = str_replace ("src=\"/images", "src=\"" . $wgServer . "/images", $bhtml);

so both the href="..." and the src="..." attributes are fixed up.

It seems to me that the root of these problems is that there's no way to pass a default server prefix to htmldoc.
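
One more generic workaround (a sketch, untested here; whether htmldoc honors the HTML BASE element is an assumption worth verifying) is to emit a base URL in the head of the generated page, so every remaining relative href and src resolves against the wiki server:

// Sketch: where the extension assembles $html, add a <base> element so
// relative links resolve against the server even when $wgScriptPath is "".
$html = "<html><head><base href=\"" . $wgServer . "/\">"
      . "<title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";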

You could try Extension:PdfBook, it can also export single pages now and images work fine on it. --Nad 21:13, 2 November 2008 (UTC)Reply
Thanks Nad. this worked. Using a redirect and virtual host in Apache, server.oursite.org/wiki becomes wiki.oursite.org/. With your edit, images appear again on the PDFs. --Erikvw 11:11, 17 December 2008 (UTC)Reply

More generic fix


Thanks to all for pointing me down the right path to fix this on my wiki. After looking at a few wiki installations, it appears some problems show up when $wgScriptPath is set to an empty string, while others show up when the installation does not use the default path to articles. Between them, however, $wgScript and $wgScriptPath can serve as a generic way to find the absolute path to an article. The following should help images display and fix relative links, while leaving absolute links intact. Please comment/improve as needed:

Replace the following two lines in converters/HtmlDocPdfConverter.php (same as others have pointed out):

  $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
  $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);

with the following:

  global $wgUploadPath, $wgScript;
  $bhtml = str_replace ($wgUploadPath, $wgServer . $wgUploadPath, $bhtml);
  if (strlen($wgScriptPath) > 0)
    $pathToTitle = $wgScriptPath;
  else
    $pathToTitle = $wgScript;
  $bhtml = str_replace ("href=\"$pathToTitle", 'href="' . $wgServer . $pathToTitle, $bhtml);

--Biyer 16:05, 25 January 2010 (UTC)Reply

What files should be changed?

These solutions are very nice, but which files should be edited?

Possible Fix


On UNIX: check the DNS resolution on the web server that hosts the wiki (with nslookup or dig). If there is no resolution for your wiki domain, htmldoc cannot read the images from the server (and reports no errors either), but the generated PDF contains no images.
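
The same check can be done from PHP on the web server (a sketch; note that gethostbyname() returns the name unchanged when the lookup fails):

// Sketch: verify that the wiki's own hostname resolves on this server;
// htmldoc fetches the images over HTTP and fails silently if it does not.
$host = parse_url($wgServer, PHP_URL_HOST);
if (gethostbyname($host) === $host) {
    echo "DNS lookup for $host failed on this web server.\n";
}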

Still Still no Images


Hello, after some problems with htmldoc this extension works on my wiki, but there are no images in the PDFs. I tried all of these solutions, but none of them works on my (Windows) system. Are there other solutions/fixes?

Thanks. Johannes741 14:35, 25 May 2010 (UTC)Reply

No images if you use security


While we're piling hack upon hack, here is mine. Our wiki is secured using mod_auth_ntlm (it's a WAMP stack), but this will work for anyone who has a .htaccess as well.

$bhtml = str_replace("src=\"http://","src=\"http://WINSERVERNAME\\winusername:winpassword@",$bhtml);

Empty Files


Help, I only get empty files. But when I execute this line on the console I get a working PDF...

htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage '$mytemp' > test.pdf

What's wrong? Maybe an Apache issue or something?

I had the same problem. I changed this line to passthru("/usr/local/bin/htmldoc -t ... and it worked. The path before htmldoc is usually the same as $wgImageMagickConvertCommand in LocalSettings.php. /usr/bin/htmldoc is another usual path to try. --Ikiwaner 20:06, 20 December 2006 (UTC)Reply
I had problems with the permissions on /var/tmp which I fixed by running chmod o+wt /var/tmp 147.209.216.245 01:31, 15 June 2007 (UTC)Reply
I had this problem on Windows XP. It took me hours to figure out. I moved C:\Program Files\HTMLDoc to C:\HTMLDoc, added HTMLDoc as a virtual directory, and then in PDFExport.php I use the lines:

$mytemp = "pdftemp/" .time(). "-" .rand() . ".html";
passthru("C:\HTMLDoc\htmldoc.exe --book --toclevels 3 -t pdf13 --numbered --no-strict --book --toclevels 3 --title --firstpage c1 --toctitle \"Table of Contents\" --tocheader .t. --tocfooter ..i --linkstyle underline --size Universal --left 1.00in --right 0.50in --top 0.50in --bottom 0.50in --header .t. --footer h.1 --nup 1 --tocheader .t. --tocfooter ..i --bodyfont Times --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset iso-8859-1 --color --quiet --jpeg --browserwidth 680 $mytemp");

The problem occurred because of the space in "Program Files", and if I remember correctly it WOULD NOT WORK if you do not append the .exe to htmldoc.
You could have also used c:\progra~1 and left HTMLDOC where it was. Kids today...
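
Another option (a sketch, untested; the paths are illustrative) is to quote the executable path so passthru() survives the space in "Program Files":

// Sketch: quote the executable path and escape the temp file name instead
// of moving HTMLDOC; note the .exe suffix mentioned above.
$htmldoc = '"C:\\Program Files\\HTMLDoc\\htmldoc.exe"';
passthru($htmldoc . " -t pdf14 --webpage " . escapeshellarg($mytemp));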

Clean code


Hi there, it would be nice if you could post the clean code with all of the above corrections included. I tried them all with MediaWiki 1.8.2 but I can't get it to work. Cheers, Florian

Tested and working on 1.9

This was tried on Linux/Apache/PHP5 with 1.9, in a software configuration very similar to Wikipedia's, and it works fine.

tom

Working code for Windows with MediaWiki v1.9


The PDF export worked for me on Windows after I fixed the path AND, more importantly, on line 80 I had to change '$mytemp' to $mytemp - i.e. remove the single quotes around $mytemp.

-Vivek Agarwal

Here is the complete source:

<?php

if (!defined('MEDIAWIKI')) die();
require_once ("$IP/includes/SpecialPage.php");

$wgExtensionFunctions[] = 'wfSpecialPdf';
$wgExtensionCredits['specialpage'][] = array(
        'name' => 'Pdf',
        'author' =>' Thomas Hempel',
        'description' => 'prints a page as pdf',
        'url' => 'http://www.netapp.com'
);

$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';

function wfSpecialPdf() {
  global $IP, $wgMessageCache;

        $wgMessageCache->addMessages(
                array(
                        'pdfprint' => 'PdfPrint' ,
                        'pdf_print_link' => 'Print as PDF'));

        class SpecialPdf extends SpecialPage {
                var $title;
                var $article;
                var $html;
                var $parserOptions;
                var $bhtml;

                function SpecialPdf() {
                        SpecialPage::SpecialPage( 'PdfPrint' );
                }

                function execute( $par ) {
                        global $wgRequest;
                        global $wgOut;
                        global $wgUser;
                        global $wgParser;
                        global $wgScriptPath;
                        global $wgServer;

                        $page = isset( $par ) ? $par : $wgRequest->getText( 'page' );
                        $title = Title::newFromText( $page );
                        $article = new Article ($title);
                        $wgOut->setPrintable();
                        $wgOut->disable();
                        $parserOptions = ParserOptions::newFromUser( $wgUser );
                        $parserOptions->setEditSection( false );
                        $parserOptions->setTidy(true);
                        $wgParser->mShowToc = false;
                        $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n",
                                        $title, $parserOptions );

                        $bhtml = $parserOutput->getText();
                        $bhtml = utf8_decode($bhtml);

                        $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
                        $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
                        $bhtml = str_replace ('&lt;img', '<img', $bhtml);
                        $bhtml = str_replace ('/&gt;', '/>', $bhtml);

                        $html = "<html><head><title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";

                        // make a temporary file with a unique name
                        $mytemp = "d:/tmp/f" .time(). "-" .rand() . ".html";
                        $article_f = fopen($mytemp,'w');
                        fwrite($article_f, $html);
                        fclose($article_f);
                        putenv("HTMLDOC_NOCGI=1");


                        # Write the content type to the client...
                        header("Content-Type: application/pdf");
                        header("Content-Disposition: attachment; filename=\"$page.pdf\"");
                        flush();

                        # Run HTMLDOC to provide the PDF file to the user...
                        passthru("htmldoc -t pdf14 --bodyfont Helvetica --no-links --linkstyle plain --footer c.1 --header c.1 --tocheader ... --charset 8859-1 --color --quiet --jpeg --webpage $mytemp");
                        unlink ($mytemp);

                }
        }
        SpecialPage::addPage (new SpecialPdf());
}

function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
                $nav_urls['pdfprint'] = array(
                        'text' => wfMsg( 'pdf_print_link' ),
                        // note: the embedded newline (written "\n" here) is what later appears
                        // as %0D%0A in the link and is stripped in wfSpecialPdfToolbox() below
                        'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=" . wfUrlencode( "{$skintemplate->thispage}\n" ) )
                );

        return true;
}

function wfSpecialPdfToolbox( &$monobook ) {
        if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
                if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
                        ?><li id="t-ispdf"><?php echo $monobook->msg( 'pdf_print_link' ); ?></li><?php
                } else {
                        ?><li id="t-pdf"><?php
                                ?><a href="<?php echo htmlspecialchars( str_replace('%0D%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>">
<?php
                                        echo $monobook->msg( 'pdf_print_link' );
                                ?></a><?php
                        ?></li><?php
                }
        return true;
}
?>
  • I used your code... It gave me a link to Print PDF. But when I attempt to open the PDF it says that it's unable to open, and the size of the PDF is only 5kb. You were talking about fixing the "path" - do we need to set any path? Help me...

Page rendering


Hello, I'd wish for some improvement of this useful extension. While it works technically, the pages actually look better when printed to a PDF printer from your web browser. To be an improvement over web browser PDFs it should look more LaTeX-style. --Ikiwaner 00:00, 16 January 2007 (UTC)Reply

Ever tried Extension:Wiki2LaTeX? --Flominator 10:30, 14 August 2007 (UTC)Reply

Errors in the last version in discussion


Using the very last iteration of the code posted in the discussions, I get the following error when I click the Print as PDF link:

Fatal error: Call to a member function getNamespace() on a non-object in /srv/www/htdocs/wiki/includes/Article.php on line 150

Working with sites starting with http://www...not with sites http://...


Seems to be working only with sites which include www in their address. Is this possible?--87.2.110.219 21:53, 23 January 2007 (UTC)Reply

A problem with htmldoc encoding


htmldoc is very sensitive about the encoding.

In file SpecialPdf.php, line 89:
passthru("/usr/bin/htmldoc -t pdf14 --charset 8859-1 --color --quiet --jpeg --webpage...
For the new version of htmldoc, --charset should be iso-8859-1.
Ivan

What is the purpose of HTMLDOC, if it's a Windows app?

Hi all

I don't quite understand: if HTMLDOC is a Windows application, how will this help me if my web server is a Linux server? - There is also a Linux version!

A problem with PHP passthru


Thanks for the extension! I'm using MediaWiki 1.8.2 on Windows 2003 and it works. I had a problem with the passthru function that I solved by copying cmd.exe into the PHP installation folder.

Where do I download extension


Can't seem to find SpecialPage.php, where do I download the file?

Just cut and paste the code above into a text file with that name and extension. Jschroe

Scaling Images to fit paper width


I found an argument to htmldoc that allows you to specify the 'width' of the page in pixels, which is sort of the opposite of scaling: it sets the viewable resolution for the images. So if I have an image that is 900 pixels wide, I'd want to set my browser width to something greater than 900 to see the whole image at once.

   passthru("htmldoc -t pdf14 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

would become:

   passthru("htmldoc -t pdf14 --browserwidth 950 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

--DavidSeymore 18:48, 14 May 2007 (UTC)Reply

Doesn't work for me - I ended up having to use getimagesize() and adding a max-limited width parameter, which isn't ideal.

Title of Wiki-Article in PDF


Hi,

I was wondering if it is possible to display the title of the wiki article in the generated PDF?! THX

Change $html = "<html><head><title>" . $page . "</title></head><body>" . $bhtml . "</body></html>";
to something like $html = "<html><head></head><body><H1>" . $title . "</H1>" . $bhtml . "</body></html>"; Suspect

Name of the generated file


The extension works fine except for the fact that it outputs a file called index.php; renaming this to something.pdf works, as it is in fact a PDF file. But I'm wondering how I could fix it so it outputs <article_name.pdf>. Any suggestions?

Solution


In file PdfExport_body.php line 95 :

- header(sprintf('Content-Disposition: attachment; filename="%s"', $wgRequest->getText('filename')));
+ header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $pages[0]));

You get <article_name.pdf> instead of <index.php>.

Generated PDF is 1kb and corrupted


I am testing MW 1.10.1 on a LAMP system (upgraded from 1.6.7) and tried the instructions on the main article page. After clicking on the Print to PDF link, though, I get a 1kb PDF. Any ideas as to what could be wrong? How do I go about fixing this? SellFone 07:23, 3 August 2007 (UTC)Reply

Solution


I downloaded the HTMLDOC binary and didn't realize that you had to pay for it. When I ran it, it asked for a license, so I went ahead and installed gcc & gcc-c++ so I could compile from source, and now it's working.

Please provide it as download


Patch to Export Multiple Pages to PDF


This patch creates a Special Page form similar to Special:Export for specifying multiple pages to export to PDF. This patch was created against the 1.1 (19-July-2007) version of PdfExport and tested in MediaWiki 1.11.

Please post the whole files instead of the patches. Thx


Patch for PdfExport.php:

19,20c19,20
<         'version' => '1.1 (19-July-2007) ',
<         'description' => 'renders a page as pdf',
---
>     'version' => '1.1-csn (1-Aug-2007) ',
>     'description' => 'renders one or more pages as pdf',
43,46d42
<  
<  
<  
<  
59c55,59
<                         $page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); 
---
>             $page = isset( $par ) ? $par : $wgRequest->getText( 'pages' );
>             if(strlen($page) > 0) {
>                 $pages = explode( "\n", $page );
>                 $html = "<html><head><title>" . (count($pages) == 1 ? str_replace('_',' ',utf8_decode($page)) : wfMsg( 'pdf_collection_title' )) . "</title></head><body>";
>                 foreach($pages as $page) {
68,69c69
<                         $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n",
<                                         $title, $parserOptions );
---
>                     $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n", $title, $parserOptions );
77,78c77,79
<  
<                         $html = "<html><head><title>" . utf8_decode($page) . "</title></head><body>" . $bhtml . "</body></html>";
---
>                     $html .= "<h1>".str_replace('_',' ',utf8_decode($page))."</h1>".$bhtml;
>                 }
>                 $html .= "</body></html>";
91c92
<                         #header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));
---
>                 header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));
103c104,119
<  
---
>             } else {
>             	$wgOut->setPagetitle( 'PDF Export' );
>                 $self = SpecialPage::getTitleFor( 'PdfPrint' );
>                 $wgOut->addHtml( wfMsgExt( 'pdf_text', 'parse' ) );
>                 
>                 $form = Xml::openElement( 'form', array( 'method' => 'post', 'action' => $self->getLocalUrl( 'action=submit' ) ) );
>                 
>                 $form .= Xml::openElement( 'textarea', array( 'name' => 'pages', 'cols' => 40, 'rows' => 10 ) );
>                 $form .= htmlspecialchars( $page );
>                 $form .= Xml::closeElement( 'textarea' );
>                 $form .= '<br />';
>                 
>                 $form .= Xml::submitButton( wfMsg( 'pdf_submit' ) );
>                 $form .= Xml::closeElement( 'form' );
>                 $wgOut->addHtml( $form );
>             }
112c128
<                         'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "page=" . wfUrlencode( "{$skintemplate->thispage}" )  )
---
>         'href' => $skintemplate->makeSpecialUrl( 'PdfPrint', "pages=" . wfUrlencode( "{$skintemplate->thispage}" )  )

Patch for PdfExport.i18n.php:

12c12,15
<         'pdf_print_link' => 'Print as PDF'
---
>         'pdf_print_link' => 'Print as PDF',
>         'pdf_collection_title' => 'Wiki Pages',
>         'pdf_text' => '<p>You can export the text of a particular page or set of pages to PDF.</p><p>To export pages, enter the titles in the text box below, one title per line.</p><p>If you just want one page, you can also use a link, e.g. [[{{ns:Special}}:PdfPrint/{{MediaWiki:mainpage}}]] for the page "[[{{MediaWiki:mainpage}}]]".</p>',
>         'pdf_submit' => 'Convert'


--Johnp125 18:09, 25 September 2007 (UTC)Reply

Tried out the extra code. I get an htmldoc error when going to PdfPrint. The regular Print as PDF seems to work just fine, just not the additional items.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.
This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage:
  htmldoc [options] filename1.html [ ... filenameN.html ]
  htmldoc filename.book

(htmldoc's full option listing follows)

This shows at the top of the page.

Below that are the options for converting different documents to PDF, but it does not work.

Fedora C 4 fix


I have installed the PDF export extension and added the code to LocalSettings.php, and my wiki just shows a blank screen when this extension is installed. I have installed htmldoc and it works from the command prompt.


Has anyone a solution to get this running under Windows?

Please post here. The export itself already works fine, but the PDF file is empty.

ThX

The code as posted above under Working Code for Windows with MediaWiki v1.9 works... you just need to make sure the temp file path exists and is writable, that the IIS anonymous user has access to cmd, and that anonymous access is possible for the images. Suspect

Fatal Error


Call to undefined method: specialpdf->__construct() in /usr/home/admin/domains/<domain>/public_html/mediawiki-1.6.10/extensions/PdfExport/PdfExport.php on line 51

This occurs when opening any page in MediaWiki. HTMLDoc is installed.


To fix this problem, replace

parent::__construct( 'PdfPrint' );

with

SpecialPage::SpecialPage ('PdfPrint');

Got it working on Win2k3 and MediaWiki 1.10.1 (Finally)


Here's the solution. Copy and paste the code from above, and make the following modifications:

  • Download the latest version of HTMLDoc (which, despite the claim that it has an installer, does not have one)
  • Extract the contents of the HTMLDoc zip file to C:\Program Files\HTMLDoc\
  • Add "C:\Program Files\HTMLDoc\" to the PATH environment variable
  • Set IUSR_<MACHINE-NAME> "Read" and "Read & Execute" permissions on C:\Windows\System32\cmd.exe
  • Set IUSR_<MACHINE-NAME> "Full Control" on C:\Windows\Temp\
  • Copy and paste the new PdfExport.php code from Working Code for Windows with MediaWiki v1.9
  • Change the value of $mytemp to $mytemp = "C:\\Windows\\Temp\\f" .time(). "-" .rand() . ".html";

That was enough to do it for me - hope this helps some of you!

~JT

Got it working with Win2k3 and MediaWiki 1.11.0

  • I used the above instructions for 1.10, except the value of $mytemp indicated by JT was wrong - I have changed it to include double backslashes.
  • If it still doesn't work, try copying your CMD.exe to your \PHP folder. Make sure your PHP folder has IUSR_MACHINENAME read and read & execute permissions.

~Mark E.

To make images work, I had to change line 59 from:
$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
to
$bhtml = str_replace ($wgScriptPath, $wgScriptPath, $bhtml);

because it was doubling the servername; and in line 81, I changed --charset 8859-1 to --charset iso-8859-1. --Maiden taiwan 20:05, 6 March 2008 (UTC)Reply

Could you explain for beginners what we are supposed to do with HTMLDoc? How do I use this extension on my website? Thanks for your help. Marcjb 22:32, 27 August 2007 (UTC)Reply

Working on Debian unstable with Mediawiki 1.7


But I'm not getting any images either.

Datakid 01:55, 5 September 2007 (UTC)Reply

More robust way of ensuring URLs are absolute

I've had to make URLs absolute for a couple of other extensions and found a more robust way than doing a replacement on the parsed text. Instead, just set the following globals before the parser is called and it will make them all absolute for you:

$wgArticlePath = $wgServer.$wgArticlePath;
$wgScriptPath  = $wgServer.$wgScriptPath;
$wgUploadPath  = $wgServer.$wgUploadPath;
$wgScript      = $wgServer.$wgScript;

--Nad 04:00, 5 September 2007 (UTC)Reply

Nad, is the parser that you refer to already in the PHP script that is on the front page? I've added those lines at the top of the file, after $wgHooks and before function wfSpecialPdf(), and I still get no images.

I also tried putting those lines in SpecialPdf.execute() after $wgServer; and before $page = isset( $par ) ? $par : $wgRequest->getText( 'page' ); Still no joy. Datakid 01:31, 6 September 2007 (UTC)Reply
Ideally they should be defined just before $wgParser->parse is called in the execute() function. But I've just grep'd the HTML output of one of my exported PDFs from Extension:Pdf Book, which uses this method of "absoluterising", and it hasn't worked for the images - maybe best to stick with the current text replacement until I sort out that problem. The replacement used to make the URLs absolute is:
$bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
$bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
Just realised that you also need to modify $wgUploadPath to make image src attributes absolute too, but I also have a problem with images showing up even with the src attributes being absolute... --Nad 04:13, 6 September 2007 (UTC)Reply


Holla,
  • Keep in mind that /w/ above is highly installation-dependent and may be quite different between installations; actually, it can be any string from / upwards.
  • Keep in mind that not every image resides in $wgUploadPath or a subdirectory thereof. Images might be in a shared repository, like Wikimedia Commons.
    Newest development is to allow an arbitrary number of such repositories per installation. Since it is likely for them to have different individual algorithms to 'absolutize' their path names, it might be a good idea to rely on an image-related function for the task. Images 'know' which repository they are in, including the local file system, or the upload directory. The philosophy of image related code is to always address Image objects, and usually not deal with repositories directly. So when you find an image in the source text, get a new Image($name) object, and you probably have a function returning an absolute URL right with it.
    If it's not there, add it, or file a bug. If you cannot do it yourself, you could also ask me to add it, and I might even do so (-; I'm dealing with image-related functions anyways atm. ;-)
--Purodha Blissenbach 07:55, 8 September 2007 (UTC)Reply
Having had a glance at the code, I am pretty certain that the structurally correct way is to let the parser object take care of asking image objects for absolute URLs. That means adding a $parserOptions->setImageAbsURLs(true); or similar, which is likely not yet possible, but imho pretty easily added to the parser. Investigate! It may be already there. Else, see above.
--Purodha Blissenbach 08:13, 8 September 2007 (UTC)Reply

Bug: /tmp/ not writeable due to openbasedir restriction.


We did not get a pdf file, but got a series of php output of the type:

Warning: fopen() [<a href='function.fopen'>function.fopen</a>]: open_basedir restriction in effect. File(/tmp/f1189268162-720194324.html) is not within the allowed path(s): (/home/www/...) in /home/www/.../extensions/PdfExport/PdfExport.php on line 85

(paths truncated for brevity)

instead. We suggest not sending these downstream (with wrong HTTP headers, btw.) but rather displaying a decent error message on the special page. Likely, using if (is_writable($file)) {…} before writing to the file would do. --Purodha Blissenbach 16:48, 8 September 2007 (UTC)Reply
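
A sketch of that guard (the placement before $wgOut->disable() and the message text are assumptions, not the extension's actual code):

// Sketch: refuse early with a readable message instead of streaming PHP
// warnings under a PDF content type.
$tmpDir = dirname($mytemp);
if (!is_writable($tmpDir)) {
    $wgOut->addWikiText("'''PdfExport:''' temporary directory <code>$tmpDir</code> " .
        "is not writable (check open_basedir and permissions).");
    return;
}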

Bug and Fix: Pdf Export blocks Semantic MediaWiki


We have several extensions installed, and included Pdf Export before including SemanticMediaWiki (SMW). The outcome was SMW not working:

  • Special:Version looked as expected, but
  • Special:Specialpages did not show any of SMWs special pages,
  • URL-calling the page Special:SMWAdmin, or entering it via the search box, yielded a "nonexisting special page" error.

Thus the installation of SMW could not be completed. It requires the Special:SMWAdmin page to be accessed.

Pdf Export was not working, see above. When we removed it from LocalSettings.php, we could use SMW. When we placed its call after the inclusion and activation of SMW in LocalSettings.php, SMW continued to work. See also bug 11238.
--Purodha Blissenbach 16:48, 8 September 2007 (UTC)Reply

Bug and Fix: Pdf Export forces all special pages to load during init, thus slowing down the wiki


Table and image flow / Image alignment


Tables and images have all of their text forced below, as if they were followed by:

<br clear=all />

Also, images that are right aligned:

[[Image:Bar.jpg|right|300px]]

end up on the left side of the page and all text is forced to appear after the image. Is this a limitation of htmldoc or something that can be fixed in the extension? -- 66.83.143.246 18:53, 13 September 2007 (UTC)Reply

I've temporarily fixed the image problem by wrapping the images in a table, but this does not solve the general problem of why htmldoc forces breaks after floats. 66.83.143.246 13:28, 14 September 2007 (UTC)Reply
It's most likely not a limitation of htmldoc but actually of the extension, probably to do with the parser sanitisation process. Comment out the line that removes the temp file and check the HTML in it... it's not pretty. Suspect

Error page


--Johnp125 03:10, 23 September 2007 (UTC)Reply

When I click on the print to pdf button all I get is this.

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.

This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

I checked the Apache error log and I'm getting errors on lines 55, 57, and 58.

[client 192.168.1.102] PHP Warning: fopen(/var/www/html/wiki/pdfs/header.html) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: No such file or directory in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 55, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fwrite(): supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 57, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Warning: fclose(): supplied argument is not a valid stream resource in /var/www/html/wiki/extensions/PdfExport/PdfExport.php on line 58, referer: http://192.168.1.99/wiki/index.php/Main_Page
[client 192.168.1.102] PHP Fatal error: Call to a member function getNamespace() on a non-object in /var/www/html/wiki/includes/Article.php on line 160, referer: http://192.168.1.99/wiki/index.php/Main_Page

(the three warnings then repeat)



I'm running Fedora Core 4; htmldoc is in the /usr/bin folder.

Blue boxes on images and empty table of contents entry


htmldoc was adding blue borders to images that didn't have the frame attribute (since they were all wrapped in anchor tags), and an empty slot in the PDF table of contents for the MediaWiki-generated table of contents. I removed both with the following regular expressions in the execute() function:

 // Remove the table of contents completely
 $bhtml = preg_replace(
       '/<table id="toc".*?\/script>/ms',
       '',
       $bhtml
 );

 // Remove any links from images to avoid blue boxes in the PDF output.
 $bhtml = preg_replace(
       '/<a href="[^"]*Image:.*?>(<.*?>)<\/a>/',
       "$1",
       $bhtml
 );

The usual caveats about parsing HTML with regular expressions apply - it will fail if the alt text or caption includes a closing >, or if any number of other things change.

66.83.143.246 12:45, 26 September 2007 (UTC)Reply

PdfExport on mediawiki 1.11.0rc1


I'm not quite sure whether it's a bug or just strange behaviour due to the MediaWiki version I'm using, but I'm encountering an error. On the special page overview (list of special pages), the PdfExport link causes an error (obviously because no page is given to render as PDF).

  • Question 1: Is the PdfExport link meant to exist on the list of special pages (Special:SpecialPages)?
  • Question 2: Is the returned error (Fatal error: Call to a member function getNamespace() on a non-object in...) a feature, when the link mentioned above is clicked?
  • Question 3: If not a feature but a bug, is there a neat solution to this?

I have made a quick fix 'suppressing' the error and giving the user an existing page as PDF by

adding the following line

if($page==''){$page='PdfAbout';}

after

$page = isset( $par ) ? $par : $wgRequest->getText( 'page' );

Fix? (request for confirmation)


I found a way to remove the 'PdfExport' link from the list of SpecialPages.

change

parent::__construct( 'PdfPrint' );

to

parent::__construct( 'PdfPrint', '', false );

I assume that the __construct function refers to

	function SpecialPage( $name = '', $restriction = '', $listed = true, $function = false, $file = 'default', $includable = false ) {

(found somewhere near line 555 in includes/SpecialPage.php) and therefore the third parameter sets listed to false.

It seems to work, but I cannot really grasp the __construct concept.

Better fix - an enhancement


An imho better fix is to keep the entry in Special:Specialpages and allow both missing and nonexistent pages to be passed to Special:PdfPrint, where there should be a simple form allowing users to try another page name, like some other special pages do, too.

We have that implemented already and are currently testing it on http://krefeldwiki.de/wiki/Spezial:PdfPrint

--Purodha Blissenbach 15:58, 16 October 2007 (UTC)Reply

Do you mean it's a standard feature, or did you manually hack it to do so?
I would be very interested in the code :)
--Kaspera 13:05, 22 October 2007 (UTC)Reply
He hacked it, and it seems to be working on that Krefeld wiki (warning: the site is t-e-r-r-i-b-l-y s---l---o---w).
I would be interested in the code too.
Lexw 11:33, 20 November 2007 (UTC)Reply
Yes, we had to make a few changes to make it work. We shall publish the modified code as soon as time permits, together with the ?action=pdf enhancement code; see the next level 2 section. -- Purodha Blissenbach 12:46, 1 December 2007 (UTC)Reply

Enhancement: allow call via ?action=pdf


It would be desirable to make the extension callable via an action= parameter; that is, the following calls should produce identical results:

We are currently implementing this. --Purodha Blissenbach 15:58, 16 October 2007 (UTC)Reply

Prince: A better alternative to HTMLDOC?

Is it possible to use Prince instead of HTMLDOC?

Prince seems to work the same way, but it also supports CSS.

Seems to be working great with MW 1.11.0 after modifying PdfExport.php.
  • Download and install Prince
  • Download the PHP5 Prince Accessories
  • Move prince.php into the PdfExport folder
  • In PdfExport.php, add the line:
require_once( 'prince.php' );
  • Remove the following section in PdfExport.php:
// make a temporary directory with an unique name
// NOTE: If no PDF file is created and you get message "ERROR: No HTML files!", 
// try using a temporary directory that is within web server space.
// For example (assuming the web server root directory is /var/www/html):
// $mytemp = "/var/html/www/tmp/f" .time(). "-" .rand() . ".html";
$mytemp = "/tmp/f" .time(). "-" .rand() . ".html";
$article_f = fopen($mytemp,'w');
fwrite($article_f, $html);
fclose($article_f);
putenv("HTMLDOC_NOCGI=1");

# Write the content type to the client...
header("Content-Type: application/pdf");

# uncomment this line if you wish acrobat to launch in a separate window
#header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));

flush();

# if the page is on a HTTPS server and contains images that are on the HTTPS server AND also reachable with HTTP
# uncomment the next line
#system("perl -pi -e 's/img src=\"https:\/\//img src=\"http:\/\//g' '$mytemp'");

# Run HTMLDOC to provide the PDF file to the user...
passthru("htmldoc -t pdf14 --charset iso-8859-1 --color --quiet --jpeg --webpage '$mytemp'");

unlink ($mytemp);
  • Replace with the following (don't forget to put in the actual path to the Prince executable):
# Path to Prince executable
$prince = new Prince('actual path to Prince executable');

# Convert an XML or HTML string to a PDF file, 
# which will be passed through to the output buffer of the current PHP page.
$prince->convert3($html);

# Write the content type to the client...
header("Content-Type: application/pdf");

# uncomment this line if you wish acrobat to launch in a separate window
header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));

  • Then sit back and enjoy the magic :)
  • Images work!!!!!
  • You can even tell Prince to use the main.css from your skin and make the pdf export look just like the actual wiki.
  • It is also easy to use your own css to make the pdf look very professional.  Check out the samples section for some great inspiration.
  • The only major drawback is the license cost for running it on a server. It is only free for personal use.
  • yes... USD 1900 for academic license... too much for me :-) And HTMLDOC isn't HTML4.0 compliant (i.e. no CSS!). Thus, no PDF from wiki pages, at the moment... --GB 22:13, 26 November 2007 (UTC)Reply

PdfExport ERROR if user rights are set

I tried to use my own MW to hand PDF files to another piece of software. An external program passes the appropriate URL to the browser and the PDF is ready.

If the user rights are default there is no problem. After I changed the user rights (see below), there is no content in the generated PDF file except the navigation menu on the left-hand side of the wiki's page.

I set the user rights:
$wgGroupPermissions['*'    ]['read']         = false;
$wgGroupPermissions['user' ]['read']         = true;
$wgWhitelistRead = array("Main Page", 'Special:PdfPrint', 'Special:Userlogin',);

How can I get the whole content of the page? Can I somehow change the USER ID in PdfExport.php? Any other solution?

Anonymous Access for Images to Work


For htmldoc to render the images, it requires anonymous access.

Since the file needs to be saved locally in order to remove the wiki escape characters that would otherwise cause htmldoc to fail, and the file contains absolute URLs to the images, it is in fact the server that is requesting the page, not the user.

This is of particular importance for Windows users who have enabled remote auth or the like, since anonymous access has to be disabled for these to work.

My current workaround is to set up a second website under IIS that allows anonymous access, but only from the server's own IP address, then set up a hosts entry that maps the live site hostname to the anonymous site IP. When the server goes to fetch the URL it gets anonymous access to the images and they are included in the PDF.

An alternative MAY be to set up an application pool that uses a user credential that you have set up on the wiki, but I don't know if this will work, as htmldoc is run at the command line, so most likely the connection will be servername$ Suspect 06:29, 7 December 2007 (UTC)Reply

Seems complicated. What I did on my site is simply put a condition in the LocalSettings file to enable authentication only if the request comes from anything other than localhost:
if ($_SERVER['REMOTE_ADDR']!='127.0.0.1') {
require_once( "includes/LdapAuthentication.php" );
... authentication code continues ...
}

--Guimou 03:58, 12 December 2007 (UTC)Reply

Hmm, I don't think that is going to work in a Windows installation: since the image references are absolute, the PHP code isn't even executed; IIS is blocking anonymous access to the images. BTW, I couldn't get the application pool setup to work -- Suspect

Working code for MW 1.11 on Fedora 8


The code published on the "extension" page works most of the time but does not output images.

Also, on our server (Fedora 8, MW 1.11.0) the URL associated with the Print As PDF command in the toolbox sometimes has %0A tacked on at the end, and not %0D%0A as DavidJameson reported on his server.

I therefore ended up using the code in Extension_talk:Pdf_Export#No_HTML_files_found above as PdfExport.php, corrected line 103 to read

?><a href="<?php echo htmlspecialchars( str_replace('%0A', '', $monobook->data['nav_urls']['pdfprint']['href'] )) ?>"><?php

and changed line 60 to read

$bhtml = str_replace ('/images/',$wgServer . '/images/', $bhtml);

as per dgrant 18:30, 25 October 2006

I can now export PDF files with images. Hvdeynde 11:04, 19 December 2007 (UTC)Reply

No "Print as PDF" link in the sidebar

I searched Google and mediawiki.org, but found no hint. I am using MW 1.9.3 on WAMP.

  • I tried typing wiki/index.php?title=Special:PdfPrint/Accueil manually, and it worked, even though we do not run with rewritten URLs. But the direct link is still not there.

Does someone have a public demo, so I can see what I should have in the sidebar?

What could I be doing wrong so that it does not work?

Any help welcome. I tried hacking the toolbox filling, but it was quite ... complex.

212.157.112.26 15:17, 20 December 2007 (UTC) // MathiasReply

Solution: it comes from the template hook. If your template doesn't use the standard call from Monobook, you must change it accordingly.

Passthru unable to fork


I got a PHP warning saying that passthru was unable to fork. I am running PHP and MediaWiki on Windows and IIS, and the solution seems to be giving read and read & execute access to cmd.exe, which is found in the system32 folder. I also needed to make sure my web user could write to the temp folder location specified in the script.

Hope this helps someone!

Don't like this extension


CGI Error: The specified CGI application misbehaved by not returning a complete set of HTTP headers.

Like others have said, there IS NO INSTALLER despite claims to the contrary. This thing is a MESS to set up. I did not get it working and I'm probably going to give up. Thanks for nothing.

Special:PdfPrint should not crash when clicked


From Special:Specialpages, it's possible to click the "PdfPrint" special page. Since it has been passed no parameters, it crashes:

PHP Fatal error:  Call to a member function getNamespace() on a non-object in D:\\mediawiki\\w\\includes\\Article.php on line 160, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP Stack trace:, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   1. {main}() D:\\mediawiki\\w\\index.php:0, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   2. MediaWiki->initialize() D:\\mediawiki\\w\\index.php:89, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   3. MediaWiki->initializeSpecialCases() D:\\mediawiki\\w\\includes\\Wiki.php:45, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   4. SpecialPage::executePath() D:\\mediawiki\\w\\includes\\Wiki.php:201, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   5. SpecialPdf->execute() D:\\mediawiki\\w\\includes\\SpecialPage.php:459, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages
PHP   6. Article->getContent() D:\\mediawiki\\w\\extensions\\PdfExport\\PdfExport.php:64, referer: http://test-techwiki.vistaprint.net/wiki/Special:Specialpages

With no parameter, PdfPrint should display a form where the user can enter an article name.

--Maiden taiwan 17:17, 28 February 2008 (UTC)Reply

String Replacement Error


So, I found a couple of problems with this plugin. First off, if we have /w in the text of the article, it gets replaced with the website address, which is not proper behavior. I changed the str_replace line to:

$bhtml = str_replace ('src="'.$wgScriptPath, 'src="'.$wgServer        . $wgScriptPath, $bhtml);

which seems like a far more logical thing to do. I also did a $pageFile = str_replace(":", "", $page); and changed the output file in

header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));

to

header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $pageFile));

This causes the filename to have the colons removed, which are illegal on Windows for sure, and I had some trouble on OS X as well.

How to set user permissions to access this extension


Hi there, I was thinking that one possible solution to decrease the amount of bandwidth used by this extension could be to limit access to it to certain users, for example anonymous or other types of users (even custom users). Is there an easy way to do this, just like when you make an extension work only for sysops (merge and delete, for instance), and avoid touching the whitelist? For example, it should be like this -> $wgGroupPermissions['*']['pdfexport'] = false;

Thanks! --Juanan 16:47, 13 March 2008 (UTC)Reply
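
A sketch of how that could look (the 'pdfexport' right and the check below are assumptions; the released extension does not define them):

// In LocalSettings.php: declare the right and deny it to anonymous users.
$wgAvailableRights[] = 'pdfexport';
$wgGroupPermissions['*']['pdfexport'] = false;
$wgGroupPermissions['user']['pdfexport'] = true;

// Near the top of SpecialPdf::execute(): refuse users without the right.
if ( !$wgUser->isAllowed( 'pdfexport' ) ) {
    $wgOut->permissionRequired( 'pdfexport' );
    return;
}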

Fatal error when I click "PDF-Druck" on the special pages list (MW 1.12.0)

When I click the "PDF-Druck" link under special pages, I get the following message:

Fatal error: Argument 1 passed to Article::__construct() must not be null, called in /srv/wiki/techwiki/extensions/PdfExport/PdfExport.php on line 57 and defined in /srv/wiki/techwiki/includes/Article.php on line 44

When I click "Print as PDF" under the tools menu on the main page, it works fine.

What's wrong?

Thanks! --Hampa 09:42, 13 May 2008 (UTC)Reply

parsing page breaks?


HTMLdoc seems to support page breaks by inserting

<!-- PAGE BREAK -->

but PdfExport doesn't seem to parse it or pass it through to HTMLdoc.

Forcing the HTML comment to show in the final rendered page, with code, pre, or nowiki, includes the comment inline but still doesn't render properly in the PDF output.

...

Does anyone else have an update on this? I'm moving some documentation into the wiki, and it would be useful to have some page breaks when dumping PDFs out for users.

Scott.harman 14:00, 28 May 2008 (UTC)Reply

...

Quick and dirty way of getting this to work... After the code

$bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);

add the line

$bhtml = str_replace ('&lt;!-- PAGE BREAK --&gt;', '<!-- PAGE BREAK -->', $bhtml);

07:18, 7 July 2008 (UTC)

"??????" signs instead of russian letters


When exporting Russian-language pages we get "?" signs instead of all the Russian letters!!! :( The pages are in UTF-8 encoding.

Me too. How to fix it?

And me. I tried converting the HTML body to cp1251 and passing different encodings as htmldoc parameters.

-- So, any way to make it work?

Solution

  • Edit file PdfExport_body.php
    • change the two appearances of utf8_decode($...) to iconv("UTF-8", "cp1251", $...) (see the sketch after this list)
    • change
passthru("htmldoc -t pdf14 --charset iso-8859-1 --color ...

to

passthru("htmldoc -t pdf14 --charset cp-1251 --color ...
  • Download Cyrillic fonts from here (direct link) and put them in place of the ones in /usr/share/htmldoc/fonts/
  • PROFIT!
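
For reference, a sketch of the utf8_decode edit described in the list above (variable name as in the extension):

// before:
$bhtml = utf8_decode($bhtml);
// after: transcode to Windows-1251 so it matches htmldoc's --charset cp-1251
$bhtml = iconv("UTF-8", "cp1251", $bhtml);
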
Not solved

Both links to the fonts are broken. Will this solution work for Windows?


That link helped me. But the first word from my page is not exported!

Solved for me

Cyrillic fonts found with Google at upload.com.ua.
Ubuntu server LAMP, UTF-8, htmldoc 1.8.27 < Kod.connect 13:13, 16 April 2010 (UTC) >Reply

One more solution

You can convert TTF fonts yourself with the help of TTF2PT1:

ttf2pt1 -l cyrillic -e -G A DejaVuSansMono-Bold.ttf Sans-Bold

Blank Page


I'm just getting a blank page.

  • RHEL 5
  • HTMLDOC 1.8.27-4.99
  • Linux/Firefox

Check what $wgTmpDirectory is set to; the default is "{$wgUploadDirectory}/tmp". Quick fix: create a tmp folder in the images directory.
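
In LocalSettings.php terms, that amounts to something like this sketch:

// Sketch: point the extension at an existing directory that the web
// server can write to (here the default value, made explicit).
$wgTmpDirectory = "{$wgUploadDirectory}/tmp";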

How can I make the TOC into bookmarks in the PDF?


Is it possible to write simple code that converts article headlines to PDF bookmarks? Any ideas?
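
One untested idea: htmldoc derives PDF bookmarks from the H1-H6 headings when run in book mode rather than with --webpage, so a sketch of the change to the passthru() call (the other options are illustrative) would be:

// Sketch: --book makes htmldoc emit a TOC plus matching PDF bookmarks;
// --toclevels controls the bookmark depth.
passthru("htmldoc -t pdf14 --book --toclevels 3 --charset iso-8859-1 " .
         "--color --quiet --jpeg $mytemp");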

Possible to change extension to use SkinTemplateToolboxEnd hook in 1.13


Has anyone tried rewriting this extension to use the new SkinTemplateToolboxEnd hook in MW 1.13, rather than the MonoBookTemplateToolboxEnd hook that it currently uses?

When using this extension with the Modern skin, the URL never appears, because Modern doesn't use MonoBookTemplateToolboxEnd or anything similar.
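
A minimal sketch of that change (untested; it assumes the existing toolbox callback can be reused as-is for the template object the generic hook passes):

// Register the skin-agnostic hook added in MW 1.13 alongside (or instead
// of) the Monobook-only one.
$wgHooks['SkinTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';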

Italian translation


This is the snippet of code to be added to the localization file for the Italian translation:

$messages['it'] = array(
        'pdfprint' => 'Pdf Export' ,
        'pdf_print_link' => 'Stampa come PDF',
        'pdf_print_text' => 'Inserire un elenco di una o più pagine da esportare, un nome pagina per linea',
        'pdf_submit' => 'Crea PDF',
        'pdf_portrait' => 'Verticale',
        'pdf_landscape' => 'Orizzontale',
        'pdf_size' => 'Formato carta',
        'pdf_size_default' => 'A4',
        'pdf_size_options' => 'Letter
                               A4
                               Universal'
);
I have added your translation to the code. Best regards --Marbot 19:56, 17 August 2009 (UTC)Reply

Exported File Name


For some reason, setting $wgPdfExportAttach to true didn't help me with the download file name. Even then, I was still getting "index.php". So I commented out the if statement, but then I got ".pdf" because $page doesn't seem to be defined inside outputpdf(). So I made the following changes to function outputpdf() inside PdfExport_body.php:

$foundone = false;
 
foreach ($pages as $pg) {
    $f = $this->save1page ($pg);
    if ($f == null) {
        continue;
    }
    $foundone = true;
    if ($iswindows) {

becomes:

$foundone = false;
$title = "";
 
foreach ($pages as $pg) {
    $f = $this->save1page ($pg);
    if ($f == null) {
        continue;
    }
    $foundone = true;
    if ($title == "") $title = Title::newFromText( $pg );
    if ($iswindows) {

Then, a little further down:

if ($wgPdfExportAttach) {
    header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $page));
}

becomes:

//if ($wgPdfExportAttach) {
    header(sprintf('Content-Disposition: attachment; filename="%s.pdf"', $title));
//}

Problems with path definitions


It took me a long while to realize why it wasn't working. I am using Windows Server 2003 + XAMPP, and it appeared that the path for the creation of the temporary file is different from the one needed for HTMLDoc. The one for the file creation needed C:\\xampp\\htdocs\\mediawiki\\images\\, while the one for HTMLDoc needed C:/xampp/htdocs/mediawiki/images/. I hope this works for some of you who experienced the same problem!

Adding Line to LocalSettings.php Results in Blank Page


Crazy problem here... I am missing something.

  • Redhat 4
  • Running MediaWiki 1.7.1
  • Standard template
  • LDAP authentication to Domino
  • HTMLDOC has been installed
  • Source pages have been created as per the instructions

When I add the following to the LocalSettings.php:

require_once("extensions/PdfExport/PdfExport.php");

All I get is a blank Main Page.

As soon as I remove the line, the wiki works again. I am unable to get any part of this extension to work within the wiki; I never see any buttons for exporting to PDF, either within a page or via the special pages.

Any suggestions?

Thanks

Bernard

FIX: It should be:

require_once("$IP/extensions/PdfExport/PdfExport.php");

I had the same problem


It was a cut and paste error

In my error logs I found:

[Mon Oct 19 11:07:06 2009] [error] [client ::1] PHP Parse error:  syntax error, unexpected T_STRING, expecting ')' in /usr/share/mediawiki/extensions/PdfExport/PdfExport.php on line 9

and when I looked at line 8, a ' was missing.

Cut and paste is a terrible way to distribute an extension.

Problems with <pre> Long Line rendering


Has anyone had any luck getting this extension not to cut off long lines of preformatted text? I have some pages with generated output that is longer than fits in the typical 80 characters. If I click on the MediaWiki "Printable" link, these get conveniently resized to fit on standard letter-size paper. However, if I send the page to PdfExport, these lines just get cut off. Any suggestions would be appreciated.

Thanks, Dan
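
A hypothetical workaround (untested; the 100-column width and the helper name are arbitrary choices) is to hard-wrap the contents of <pre> blocks before the HTML is handed to htmldoc:

// Fold long preformatted lines so htmldoc does not clip them at the margin.
function wfPdfWrapPre($m) {
    return '<pre>' . wordwrap($m[1], 100, "\n", true) . '</pre>';
}
$bhtml = preg_replace_callback('#<pre>(.*?)</pre>#s', 'wfPdfWrapPre', $bhtml);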

Adding Logo and current date to pdf output


Hi all,

I want to have a logo in the header of each page that is converted to PDF. I would also like the current date and time to be printed in the footer. It seems to be possible if you start the conversion from the HTMLDOC GUI, but is it also possible when starting HTMLDOC via MediaWiki?

Thankful for every kind of help!

Greetings, Stefan

Re: Adding Logo and current date


You can customize the htmldoc command as follows:

--logoimage PATH_TO_YOUR_IMAGE --header l.d

More info: man htmldoc.

Re: Adding Logo and current date to pdf output


Yes, it would be useful to have a way to customize the PDF header with any content (wiki tags? HTML code? PHP code to request Article object data like owner, category, etc.?)

Fabrice

Problem with generating images


I use PdfExport 2.0 with MW 1.13.0. When generating a PDF, no images appear in the document; in place of the images there are dots.

Any suggestions would be appreciated.

Thanks, Markus


I have fixed the problem, in the same way as in 13.1.4:

http://www.mediawiki.org/wiki/Extension_talk:Pdf_Export#Images_When_Wiki_Is_Root_of_Domain

1k PDF on 1.15.0

[edit]

Installed the extension on 1.15.0 today. I get a 1k file. We downloaded HTMLDoc and compiled it from source, and set the path to it. We verified that HTMLDoc is generating valid PDFs. Any ideas? Chanur 20:10, 24 June 2009 (UTC)

Complex diagrams and Thumb-nails

[edit]

Readers often expect to be able to zoom into complex diagrams in a PDF to see the details. I would like to re-create this effect in the PDF files I generate from MediaWiki. I can achieve the desired effect by manually changing the references in the intermediate HTML that is used as input to HTMLDoc. The change is in the IMG tag:

replace  
/mediawiki/images/thumb/9/99/foo.jpg/680px-foo.jpg width=680 height=477 
with 
/mediawiki/images/9/99/foo.jpg width=100%  

This has the effect that HTMLDoc picks up the original full-size image rather than a thumbnail and then shrinks it to 100% of the page width. It displays correctly and the PDF reader can zoom in for more detail.
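A rough sketch of automating that rewrite inside the converter, assuming thumbnail URLs follow the usual /images/thumb/<hash>/<file>/<width>px-<file> layout (attribute order may vary in practice):

 // point img tags at the original image instead of the generated thumbnail,
 // and let it scale to the full page width
 $bhtml = preg_replace(
     '#src="([^"]*)/thumb(/[0-9a-f]/[0-9a-f]{2}/[^/"]+)/[^/"]+"\s+width="\d+"\s+height="\d+"#',
     'src="$1$2" width="100%"',
     $bhtml
 );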

I would like to see this included in the PdfExport extension, either as a default or as a configurable option.

Niall Teskey 27 July 2009

IIS7 and Mediawiki 1.15.1 does not work

[edit]

I only receive Error 500 when I try to generate a PDF.

Need usage instructions

[edit]

Okay, so we've got installation instructions, but what about usage? Is there content or a link that could be provided on how users (even noobs) could invoke this extension? 140.80.199.91 14:34, 27 August 2009 (UTC)


--

I have put in all four source files and included the file in LocalSettings.php. I can verify that PdfExport.php IS being included, but nothing is happening. I have no "Export a PDF" link in my toolbox, nothing new in the special pages, no changes anywhere. MediaWiki 1.15.

Sections of pages possible?

[edit]

I'd like to be able to export just a portion of a page. Is there any way to adapt the code so that only a specified portion of a page gets written to the tempdir and created as a PDF?

For example, everything under a certain heading...

UTF-8

[edit]

Why does this extension not work entirely in UTF-8? 91.155.182.79 17:03, 21 November 2009 (UTC)

Datei beginnt nicht mit "%PDF-" / File does not start with "%PDF-"

[edit]

Hi, I have just installed this extension. When trying to print a page, Acrobat gives the error message stated above. Is there someone out there who can help? Thank you and cheers --kgh 20:14, 5 December 2009 (UTC)

My mistake. Please do not bother about this. Cheers --kgh 22:08, 5 December 2009 (UTC)

Could you post your solution?

[edit]

Hi, I ran into the same problem as you did. Would you mind sharing your solution since you've already posted the problem? Thanks in advance!

regards --cc (May 24, 2010)

Hi cc, quite some time has passed since I installed this extension, and I cannot remember exactly what I did wrong. However, it must have been an easy solution, since my knowledge is limited. To cut a long story short, I think I forgot to create the tmp subdirectory in the images directory. Actually the recent code should work without it, but one never knows. Hopefully this was the reason? Cheers --kgh 19:51, 26 May 2010 (UTC)
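For anyone hitting the same wall, the directory in question can be created with one line of PHP, assuming the default upload layout under the wiki root:

 // create the images/tmp subdirectory used for intermediate files
 mkdir( "$IP/images/tmp", 0777, true );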

Specify custom CSS?

[edit]

Hi.. I have installed PdfExport 2.2.2 in MediaWiki 1.15.1 running on a Windows 2000 Server, IIS server with PHP 5.2.10 and HTMLDOC 1.8.27. Everything works beautifully!! I just had to add the HTMLDOC folder to the system PATH, and voilà! This plugin looks very nice, but the resulting PDF could be "spiffed up" a bit... Is it possible to specify a custom CSS file that should be used? I'd like to create a separate CSS file for PdfExport, instead of using the existing CSS files in MediaWiki.. Thanks for sharing the code, Thomas! -- Tor Arne

CSS Support in Prince

[edit]

HTMLDOC does not support CSS in the 1.8 versions -- apparently this is planned for 1.9, but that is yet to be seen. Someone above recommended Prince, and I did some substantive rewriting of the extension source to allow me to use Prince. Works like a charm; much better quality output, and you can use CSS files. It's a little less reliable, however, and tends to throw errors on some unexpected HTML characters. I'm troubleshooting right now, but if there's interest, once it's a little more solid I'll post my source. -- Ram

php-errors inside pdf

[edit]

Hi!

I'm running Apache 2.2, PHP 5.3.1 and the newest MediaWiki. When I try to generate PDFs with your extension, I get a 500-byte PDF with the following output:

Notice: Undefined variable: f in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 83

Notice: Undefined variable: process in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 112

Warning: proc_close() expects parameter 1 to be resource, null given in D:\Program Files\Apache Software Foundation\Apache2.2\htdocs\wiki\extensions\PdfExport\PdfExport_body.php on line 112

What can I do?

Hi Suven, I suppose this problem is related to PHP. See Download. Cheers --kgh 15:34, 3 February 2010 (UTC)

I get the same error. I see that the process is opened with $htmldoc_process = proc_open(blah blah) on line 103, but then line 112 says proc_close($process). How can this not be a bug in what I downloaded? I'm going to try giving proc_close() the same variable used with proc_open() and see what happens.

$f on line 83 is not mentioned anywhere else in the file. Where should this have been set?

 foreach ($pages as $pg) {
     $pagestring .= $this->save1page ($pg);
     if ($f == null) {   // note: $f is never assigned anywhere in this file
         continue;
     }
 }

Same here!! Using:

PHP 5.2.0-8+etch16 (cli) (built: Nov 24 2009 11:14:47)
Copyright (c) 1997-2006 The PHP Group
Zend Engine v2.2.0, Copyright (c) 1998-2006 Zend Technologies
and MediaWiki 1.15.1

Getting Errors:

[01-Mar-2010 13:22:56] PHP Notice:  Undefined variable: f in /var/www/help9/extensions/PdfExport/PdfExport_body.php on line 92
[01-Mar-2010 13:29:15] PHP Notice:  Undefined variable: process in /var/www/help9/extensions/PdfExport/PdfExport_body.php on line 126
[01-Mar-2010 13:29:15] PHP Warning:  proc_close() expects parameter 1 to be resource, null given in /var/www/help9/extension/PdfExport/PdfExport_body.php on line 126

It hangs for a very long time (~5 min) on fpassthru(). And when it's finished, no images are in the PDF file. -> Solved: the server is in a DMZ, so it could not resolve the HTML links.


I have the same problem, a file of 191 bytes:

Warning:  proc_close() expects parameter 1 to be resource, null given in /homez.318/forumims/www/wiki/extensions/PdfExport/PdfExport_body.php on line 112

I changed the variable $process to $htmldoc_process, but now the file is 3 bytes and empty...

When I try htmldoc directly on an HTML file, it works... :( Help please


I was getting these same errors in my httpd error log, but the extension was happily producing the file (with images, after I commented out the str_replace on line 57 of PdfExport_body.php, since I have a virtual Apache host). So I don't think these errors are the source of any real problem. To make them go away, I simply surrounded the offending code in PdfExport_body.php with an if stanza using the isset() function. In the else stanza, I executed the same thing that would have run had the variable existed but been null or 0, i.e.:

at line 83

 if (isset($f)) {
   if ($f == null) {
     continue;
   }
 }
 else { continue; }

at line 114

 if (isset($process)) {
   $returnStatus = proc_close($process);
 }
 else { $returnStatus = 0; }

--jcw 21:15, 15 April 2010 (UTC)

I put fixes for these two coding errors on the code talk page. They are fairly basic and demonstrate that the current version has clearly not been tested at all.

For the problem on line 83, I simply changed:

  if ($f == null) {

to be:

  if ($pagestring == null) {

... as this fit the logic of the code. -- Robert Watkins


Getting article HTML

[edit]
  • MediaWiki version: 1.15.2
  • PHP version: 5.3.0
  • MySQL version: 5.1.36
  • URL:

Hi.

I'm writing a new MediaWiki extension based on Extension:Pdf Export. I'm trying to export an article.

The PDF Export extension includes the following function:

public function save1page ( $page ) {
    global $wgUser;
    global $wgParser;
    global $wgScriptPath;
    global $wgServer;                       
    global $wgPdfExportHttpsImages;
 
    $title = Title::newFromText( $page );
    if( is_null( $title ) ) { 
        return null;
    }

    if( !$title->userCanRead() ){
        return null;
    }
    
    $article = new Article ($title);
    $parserOptions = ParserOptions::newFromUser( $wgUser );
    $parserOptions->setEditSection( false );
    $parserOptions->setTidy(true);
    $wgParser->mShowToc = false;
    $parserOutput = $wgParser->parse( $article->preSaveTransform( $article->getContent() ) ."\n\n", $title, $parserOptions );
 
    $bhtml = $parserOutput->getText();
    // XXX Hack to thread the EUR sign correctly
    $bhtml = str_replace(chr(0xE2) . chr(0x82) . chr(0xAC), chr(0xA4), $bhtml);
    $bhtml = utf8_decode($bhtml);
 
    $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
    $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
 
    // removed heights of images
    $bhtml = preg_replace ('/height="\d+"/', '', $bhtml);
    // set upper limit for width
    $bhtml = preg_replace ('/width="(\d+)"/e', '"width=\"".($1> MAX_IMAGE_WIDTH ?  MAX_IMAGE_WIDTH : $1)."\""', $bhtml);
 
    if ($wgPdfExportHttpsImages) {
        // note: "$bhtm" below is a typo in the extension source itself
        // ($bhtml intended), so this replacement is silently discarded
        $bhtm = str_replace('img src=\"https:\/\/','img src=\"http:\/\/', $bhtml);
    }
 
    $html = "<html><head><title>" . utf8_decode($page) . "</title></head><body>" . $bhtml . "</body></html>";
    return $html;
}

At the end of this function, $html is empty. I tried to print out $bhtml (wrote it to a log file) after line 24, but it's empty too.

What could be the problem? Is there another way to get the HTML of an article?

Thanks!

Nahsh 11:10, 11 April 2010 (UTC)
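One alternative worth trying is the parse API, which returns the rendered HTML of a page directly and sidesteps whatever is going wrong in the direct $wgParser->parse() call. A minimal sketch; "Main Page" is a placeholder title, and allow_url_fopen must be enabled for file_get_contents() on a URL:

 // fetch the rendered HTML of an article via api.php?action=parse
 $url  = $wgServer . $wgScriptPath . '/api.php?action=parse&format=json&page=' . urlencode( 'Main Page' );
 $json = file_get_contents( $url );
 $data = json_decode( $json, true );
 $html = $data['parse']['text']['*'];   // the page body as HTML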

Why does PdfExport ignore several style definitions of commonPrint.css ?

[edit]

Netzschrauber: I have installed PdfExport (version 2.3 (2010-01-28)) and HTMLDOC on an Ubuntu 8.04 LTS server, used with MediaWiki 1.15.1. Everything works fine, but the output seems to ignore some (important) CSS style definitions made in commonPrint.css:

  • No borders (tables, headlines, images)
  • Text alignment of td elements in <table class="wikitable">. No vertical-align: top;
  • Alignment of thumbnails [[image.jpg|thumb|right]]. All images are left-aligned.
  • Font-style of <dd> elements
  • Color of <small> elements. For example color: #999;

Thanks!

Documents are only 3 bytes big.

[edit]

Hi,

The documents I create are only 3 bytes. Of course the file is corrupted, but what is my mistake?

Mediawiki 1.15.0 PHP 5.2.10 (isapi)


Hi, did you restart after installing htmldoc? That did the trick on my system.
The code is broken with some silly mistakes - look at the code talk page for fixes.

can't open PDFs

[edit]

Hello,

Sorry for my bad English, but I need help, please.

I have installed this extension on Windows with XAMPP (PHP 5.2.3 (apache2handler), MySQL 5.0.45-community-nt) and MediaWiki 1.15.3. In my wiki I now have a "download as PDF" button. When I click on this button I can download the page as a PDF. But when I try to open it, FoxitReader says "format error: not a PDF or corrupted". Adobe Reader can't open it either.

I don't understand where the manual says to "put htmldoc in Path environment variable". HTMLDOC for Windows was installed under C:\Programme\HTMLDOC\

Is this right?

I don't see any other configuration options.

What's wrong with my installation? I'm very confused.

Johannes741 13:30, 25 May 2010 (UTC)

Restart the system :-)

See above.
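If a full restart is not an option, it may also work to extend the PATH that PHP itself sees, e.g. from LocalSettings.php (a sketch, using the install directory mentioned above):

 // make htmldoc.exe findable by proc_open() without changing the system PATH
 putenv( 'PATH=C:\\Programme\\HTMLDOC;' . getenv( 'PATH' ) );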

PDF File Name

[edit]

I made an edit that fixes the use of the Filename form box on the PdfPrint special page. However, it doesn't play nicely when using index.php?title=Special:PdfPrint&page=WikiPage unless you also add &filename=WikiPage.pdf to the URL. It is fairly easy to reformat links to a PDF version to include the filename variable, but perhaps those two print cases should be split up.

BTOuellette 19:33, 4 June 2010 (UTC)

Latin-Extended-A (latin2) encoding problem

[edit]

First of all, thank you for this extension! I've tried to use it on pages that contain Latin-Extended-A characters (for example "%c5%91") and got question marks instead of the non-Latin-1 characters. I also found that some error output is appended to the PDF after %%EOF:

<br />
<b>Warning</b>:  proc_close() expects parameter 1 to be resource, null given in <b>/var/www/wikitest/extensions/PdfExport/PdfExport_body.php</b> on line <b>114</b><br />

My solution is to use iconv() instead of utf8_decode(), and to repair the misspelling of proc_close($process) to proc_close($htmldoc_process). Patch:

*** PdfExport_body.php.orig     2010-07-09 13:36:27.000000000 +0200
--- PdfExport_body.php.new      2010-07-09 15:07:21.000000000 +0200
*************** $wgPdfExportHttpsImages = false; // set
*** 51,57 ****
                          $bhtml = $parserOutput->getText();
                          // XXX Hack to thread the EUR sign correctly
                          $bhtml = str_replace(chr(0xE2) . chr(0x82) . chr(0xAC), chr(0xA4), $bhtml);
!                         $bhtml = utf8_decode($bhtml);

                          $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
                          $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
--- 51,57 ----
                          $bhtml = $parserOutput->getText();
                          // XXX Hack to thread the EUR sign correctly
                          $bhtml = str_replace(chr(0xE2) . chr(0x82) . chr(0xAC), chr(0xA4), $bhtml);
!                         $bhtml = iconv("UTF-8", "ISO8859-2", $bhtml);

                          $bhtml = str_replace ($wgScriptPath, $wgServer . $wgScriptPath, $bhtml);
                          $bhtml = str_replace ('/w/',$wgServer . '/w/', $bhtml);
*************** $wgPdfExportHttpsImages = false; // set
*** 109,115 ****
                        fpassthru($pipes[1]);
                        fclose($pipes[1]);

!                       $returnStatus = proc_close($process);

                          if($returnStatus == 1)
                          {
--- 109,115 ----
                        fpassthru($pipes[1]);
                        fclose($pipes[1]);

!                       $returnStatus = proc_close($htmldoc_process);

                          if($returnStatus == 1)
                          {
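A small refinement on the iconv() call, in case some characters still come out as question marks: the //TRANSLIT suffix asks iconv to approximate characters that have no ISO-8859-2 equivalent instead of dropping them (this assumes your iconv build supports transliteration):

 // approximate unmappable characters (e.g. "€" -> "EUR") rather than losing them
 $bhtml = iconv( 'UTF-8', 'ISO-8859-2//TRANSLIT', $bhtml );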

Greets from Hungary!

[edit]

The extension "demo" link returns a 1kb corrupted PDF file. This should be removed/fixed.

I've copied the files and included the extension in my own MediaWiki. I've downloaded and installed HTMLDoc, restarted, and verified that the HTMLDoc utility opens. I've added HTMLDoc to my PATH [Windows]. Instead of 1kb files, I get 0kb files, every time.

Not sure what isn't working.


--Fixed my issue: it needed another restart, and it appears to be working--

"Unable to Open Document" Error

[edit]

Hi, I'm running MediaWiki 1.15 on Red Hat. I installed the extension and it seems to have installed correctly. When I click the 'Print as PDF' link in the toolbox, it asks whether I would like to open or save the .pdf file. I choose open. It opens the Evince Document Viewer, but that pops up an error window telling me the document couldn't be opened, giving the following reason:

Unhandled MIME type: “text/plain”

Does anyone know why this is happening and what I can do to fix it? Thanks!

20.132.68.146 19:45, 12 November 2010 (UTC)