Extension talk:PdfBook/Archive 1

From mediawiki.org

See also

Extension:Pdf Export is very similar to this extension, and so the discussion at Extension talk:Pdf Export may have solutions to problems you're having with this extension.

PdfBook images fix for htmldoc

At about line 101 in extensions/PdfBook/PdfBook.hooks.php just before

    // Write the HTML to a tmp file

insert the following:

    $src_str = 'src="' . $wgServer . '/images/';
    // Use this instead if not at the webroot ($wgScriptPath already starts with a slash)
    // $src_str = 'src="' . $wgServer . $wgScriptPath . '/images/';
    $html = str_replace($src_str, 'src="', $html);

The <img src="..."> URLs need to be relative to the images folder for the images to display correctly in the generated PDF.
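For anyone who would rather keep this in one place, here is a minimal sketch of the same replacement wrapped in a function. The function name pdfbook_relative_image_src and its parameters are made up for illustration (they stand in for $wgServer and $wgScriptPath); it is not part of the extension.

```php
<?php
// Hypothetical helper, not part of PdfBook: strips the server and script path
// from image URLs so that src attributes become relative to the images folder.
// Assumes $scriptPath is either empty or starts with "/" (e.g. "/wiki").
function pdfbook_relative_image_src(string $html, string $server, string $scriptPath = ''): string
{
    $prefix = 'src="' . rtrim($server, '/') . rtrim($scriptPath, '/') . '/images/';
    return str_replace($prefix, 'src="', $html);
}

$html = '<img src="http://www.foo.bar/wiki/images/a/ab/Pic.png" alt="Pic">';
echo pdfbook_relative_image_src($html, 'http://www.foo.bar', '/wiki'), "\n";
```

Image URLs that do not start with the given server and path are left untouched, so thumbnails served from another host would still need separate handling.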

The following is a sample of the CLI command that gets generated and executed in the PdfBook extension:

  htmldoc -t pdf --charset iso-8859-1 \
    --left 1cm --top 1cm --bottom 1cm --right 1cm  --header ... --footer .1. \
    --toclevels 3  --headfootsize 8 --quiet --jpeg --color --bodyfont Arial  \
    --fontsize 8  --fontspacing 1  --linkstyle plain --linkcolor 217A28 \
    --no-title --format pdf14 --numbered --firstpage toc \
    -f output1.pdf images/pdf-book577686f7d5ca5

This has been tested in MW v1.19.24 on Debian Jessie and htmldoc v1.18.27.

PdfBook seems not to be working

You can export a single article as a one-page PDF by setting format=single in the query-string. Example:

http://www.foo.bar/wiki/index.php?title=Main_Page&action=pdfbook&format=single

When I do this I get the message :

Main Page&action=pdfbook&format=single There is currently no text in this page. You can search for this page title in other pages, search the related logs, or edit this page.

Whereas Main Page has a lot of text. ;-) What am I overlooking? I installed the latest PdfBook (according to Special:Version it is 1.1.0, 2014-04-01) in MW 1.23.13. But the 1.1.0 comes from a PdfBook download dated Jan. 9th, 2016, and the corresponding file 'version' says:

    [root@node PdfBook]# cat version
    PdfBook: 17d1dfd8475ac21b81a60c3f82afe58fde9d47bb
    2016-01-09T23:07:34
    17d1dfd

htmldoc has been installed.

Missing Images in Https with authentication

--Johnp125 13:52, 11 June 2008 (UTC)Reply

I have a problem when downloading a PDF file: I do not get the pictures when I download via the https location. When I download via http the pictures show up. Any idea why this would be the case?

Our SSL doesn't seem to be functional at the moment, but can you check the URL it's trying to load the images from? Maybe check whether exporting as raw HTML instead of PDF also has problem images? --Nad 07:11, 12 June 2008 (UTC)

--Johnp125 15:25, 12 June 2008 (UTC)Reply

I can get the html file to show the pictures, but it wants to register again with the authentication server, or IE says "allow blocked content", which I click on and then try to sign on again. The server is not allowing me to sign on again, but the pictures are still showing up.

--Johnp125 14:57, 29 July 2008 (UTC)Reply

I think the problem may be a security issue. Is there a way to generate the data without requesting authentication from the web server? I can get the html version to show the pictures just not the pdf version. If I go to the back door and access the site via http then the pictures show up via pdf.

Sdball 17:36, 6 November 2008 (UTC)Reply

I had the same problem, so I tweaked the extension to:

  • use files in /tmp so image references can work
  • search the generated html for images
  • determine the actual path to the image from their url
    • i.e. https://server.com/wiki/images/a/b/image.jpg -> /www/wiki/images/a/b/image.jpg
  • use that path to copy the image file to /tmp
  • modify the generated html to point to the image file, not the absolute url
    • i.e. src=https://server.com/wiki/images/a/b/image.jpg -> src=image.jpg

Feel free to contact me if you'd like the code.
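The steps listed above can be sketched roughly as follows. This is not Sdball's code, just a guess at the approach: the function name, the parameters ($baseUrl for the public wiki URL, $docRoot for its path on disk, $tmpDir for the directory holding the temporary HTML file) and the assumption that src attributes are double-quoted are all mine.

```php
<?php
// Hypothetical sketch of the steps listed above; not the actual patch.
// Finds absolute image URLs, copies the matching files from disk into
// $tmpDir and rewrites each src to the bare file name.
function pdfbook_localize_images(string $html, string $baseUrl, string $docRoot, string $tmpDir): string
{
    $pattern = '/src="' . preg_quote($baseUrl, '/') . '(\/images\/[^"]+)"/';
    $result = preg_replace_callback($pattern, function (array $m) use ($docRoot, $tmpDir) {
        $source = $docRoot . rawurldecode($m[1]);   // URL path -> file path
        $name   = basename($source);
        // Copy the file next to the temporary HTML; keep the URL if that fails.
        if (is_file($source) && copy($source, $tmpDir . '/' . $name)) {
            return 'src="' . $name . '"';
        }
        return $m[0];
    }, $html);
    return $result ?? $html;   // preg_replace_callback returns null on error
}
```

Two things this sketch ignores: images in different subdirectories that share a file name would overwrite each other in $tmpDir, and a URL containing ".." should be rejected before it is mapped to a file path.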

74.143.96.50 20:36, 12 February 2010 (UTC)Reply

Same problem here, different solution. htmldoc uses a unique User-Agent string when hitting the web server. With Apache you can do something like this in your apache config:

SetEnvIf User-Agent ^HTMLDOC let_me_in
.. basic auth stuff ..
require valid-user
Order allow,deny
Allow from env=let_me_in
Satisfy Any

Note that this is a significant security hole, since anyone hitting your server with that User-Agent string can now get in. You may want to combine it with (or possibly replace it by) a filter based on the IP address that htmldoc hits your server from. If the requests always come from 127.0.0.1, for instance, you can use Allow from 127.0.0.1 to let them pass. You could also change the htmldoc binary to use your own special User-Agent string. MediaWiki shouldn't care that the user is anonymous unless you've forced off anonymous access somehow besides the authentication.

  • is it possible to specify the page number to start with? This makes sense when you are going to use the exported PDF as appendix to another doc already with n pages.
  • is it possible to add a link in the toolbox menu section which is only viewable on categories pages?

Sure. Add something like this:

# Get i18 file
require_once( 'PdfBook.i18n.php' );

# Create toolbox link
$wgHooks['MonoBookTemplateToolboxEnd'][] = 'fnPDFBookLink';

function fnPDFBookLink( &$monobook )
{
    global $wgMessageCache, $wgPdfBookMessages;
    foreach ( $wgPdfBookMessages as $lang => $messages ) {
        $wgMessageCache->addMessages( $messages, $lang );
    }
    $thispage = $monobook->data['thispage']; // e.g. "Category:Wiki"
    $nsnumber = $monobook->data['nsnumber']; // NS 14 is category

    if ( $nsnumber == 14 ) {
        echo "\n\t\t\t\t<li><a href=\"./$thispage?action=pdfbook\">";
        $monobook->msg( 'pdf_book_link' );
        echo "</a></li>\n";
    }
    return true;
}

And add a i18n file named PdfBook.i18n.php with the following contents:

<?php
$wgPdfBookMessages = array();

$wgPdfBookMessages['de'] = array(
        'pdfbook' => 'Pdf-Druck' ,
        'pdf_book_link' => 'Kategorie als PDF ausgeben'
);
$wgPdfBookMessages['en'] = array(
        'pdfbook' => 'PdfPrint' ,
        'pdf_book_link' => 'Print category as PDF'
);
?>
  • Does anyone know what the code would be to add the link into the sidebar for the Vector skin?
This worked brilliantly for the Monobook skin, but I want to use it in the toolbar of the Vector skin. If you have the code please let me know, thanks guys. Nali99.
Found the solution: basically use the above code exactly, but replace 'MonoBookTemplateToolboxEnd' with 'SkinTemplateToolboxEnd' and also replace '$monobook' with '$vector'. Your code should look like this:
# Create toolbox link
$wgHooks['SkinTemplateToolboxEnd'][] = 'fnPDFBookLink';

function fnPDFBookLink( &$vector )
{
    global $wgMessageCache, $wgPdfBookMessages;
    foreach ( $wgPdfBookMessages as $lang => $messages ) {
        $wgMessageCache->addMessages( $messages, $lang );
    }
    $thispage = $vector->data['thispage']; // e.g. "Category:Wiki"
    $nsnumber = $vector->data['nsnumber']; // NS 14 is category

    if ( $nsnumber == 14 ) {
        echo "\n\t\t\t\t<li><a href=\"./$thispage?action=pdfbook\">";
        $vector->msg( 'pdf_book_link' );
        echo "</a></li>\n";
    }
    return true;
}

By Nali_99

Mediawiki 1.11.0

Version 0.0.3 didn't work anymore after an upgrade. I made a little fix around line 98 of PdfBook.php and it works again.

   // while ($row = mysql_fetch_row($result)) {
      while ($row = $db->fetchRow($result)) {

Disclaimer. I don't know PHP for real, don't know mediawiki, don't know how to program. Just got it by inserting debug statements into PdfBook.php. Looks like mysql_fetch is censored somewhere now ;)

PS: To insert debug statements:

  • In LocalSettings.php insert:
$wgDebugLogFile = "/tmp/debug.log";  // file should be writable can be anywhere.
  • Anywhere in the code, insert
wfDebug (.....);

- Daniel (edutechwiki.unige.ch)

Thanks a lot for this, it's still not working for me in 1.11 (I've only just done my 1.11 upgrade), but I've made some changes based on your findings which have got it partially there ;-) --Nad 21:36, 21 September 2007 (UTC)Reply
It seems that 1.11 is a bit more memory hungry and my large test books were killing it, after giving PHP 64MB it's working fine now! --Nad 21:41, 21 September 2007 (UTC)Reply

Empty file downloaded

Greetings Nad,

I have been trying to use your PDFBook Mediawiki extension since it may be a great solution to an issue I have.

I have installed HTMLDoc under "c:\program files" and can use it on its own to create PDF books. I have also included "PdfBook.php" in my "LocalSettings.php" file.

The issue I am having is that when I select the link to export my category as a book and select to save or open the pdf file it has 0 bytes. So, the file is created with the correct name but with no data.

Is there something else I must do to ensure HTMLDoc.exe is actually being called by your extension? Is there a required directory that it needs to be in?

Any help would be appreciated!

Thanks!

You have to make sure that htmldoc is in your executable PATH so that it can execute from just typing "htmldoc" without needing to supply the full pathname no matter what current directory you're in. Another thing to check would be to comment out the "@unlink($file)" line and after saving a pdf, check if it's left a tmp file in the root of your images directory, which is the data sent to htmldoc. --Nad 00:35, 6 September 2007 (UTC)Reply
I'm experiencing the exact same problem, my files turn up empty. I run the server on a Windows machine using Apache. I've installed HTMLDoc and I'm able to create PDF files using the GUI. If I comment out "@unlink($file)" and then run the tmp file through the GUI I get my PDF, but all files I download are 0 bytes in size... What can be wrong? /Jesper 15:59, 23 October 2007 (UTC)
With some hacking of Pdf_Book.php I'm now able to create PDFs, but only from categories, not from a single page. After commenting out "putenv("HTMLDOC_NOCGI=1");" on line 152 it now generates category PDFs. /Jesper 08:09, 25 October 2007 (UTC)
I can't even get this far. Did you make any changes other than commenting out that one line? Has anybody else gotten this to work on an Apache Server running on Windows? -Michelle 19:19, 1 May 2008 (UTC)
Works for me with WAMP and MediaWiki 1.3. I had to copy libeay32.dll and ssleay32.dll from C:\wamp\Apache2\bin to C:\Program Files\HTMLDOC in order to get HTMLDoc working. I also had to restart Apache to make it refresh the PATH environment variable. Before restart it couldn't find HTMLDoc.
I also had to copy the 7.1 C dll (msvcr71.dll/msvcp71.dll) to the HTMLDOC folder. You can find it here: http://support.microsoft.com/kb/326922 Antdos (talk) 10:11, 19 July 2012 (UTC)Reply
Make sure your webserver user has write access to /var/tmp. On my setup, htmldoc uses this as a tmp directory. You can diagnose this sort of issue by changing the htmldoc command to something like strace htmldoc > $file.log
For macOS 10.12: HTMLDOC is installed to /usr/local/bin. If you are using the builtin apache server this directory won't be in the PATH, so /usr/local/bin/htmldoc could not be found by the pdf book extension. Follow the steps outlined in https://serverfault.com/a/827046/434690 --Frankhintsch (talk) 10:07, 8 September 2017 (UTC)Reply
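Since several of the replies above come down to "htmldoc is not in the PATH that the web server sees", a quick check run through the web server (not from your own shell) can save some guessing. This is only a sketch; the function name is made up, and the list of Windows suffixes is an assumption.

```php
<?php
// Hypothetical diagnostic helper: looks for an executable in the PATH that
// the PHP process (and therefore the web server) actually sees.
function pdfbook_find_executable(string $name): ?string
{
    $path = getenv('PATH');
    if ($path === false || $path === '') {
        return null;
    }
    // On Windows the binary carries an extension; try the usual ones.
    $suffixes = DIRECTORY_SEPARATOR === '\\' ? ['.exe', '.bat', '.cmd', ''] : [''];
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $suffix;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }
    return null;
}

$found = pdfbook_find_executable('htmldoc');
echo $found ?? 'htmldoc not found in PATH: ' . getenv('PATH'), "\n";
```

Dropping this into a small PHP page on the wiki server shows the PATH as Apache sees it, which is often different from the one in a login shell.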


Greetings to anyone who finds this. I was having the same problem, and I'm a total newbie at wiki. After editing PdfBook.php and debugging the SQL statement that gets run, I found that my problem was that I had not actually added any categories. So I edited a few pages and added [[Category:Name_Of_Category]], and then it worked for me. Hope someone finds this useful. Maybe as a patch sometime in the future there could be code that checks whether the $article[] array is empty before sending the headers for the PDF file, or checks whether the tmp file it writes is 0 bytes, and then echoes an error message instead of the PDF file. Just a thought.
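The check suggested above might look something like this. It is only a sketch of the idea, not a patch against the real extension: the function name, the parameter names and the wording of the messages are all invented.

```php
<?php
// Hypothetical guard: returns an error message if there is nothing worth
// sending to the browser, or null if the generated file looks usable.
function pdfbook_check_output(array $articles, string $file): ?string
{
    if (count($articles) === 0) {
        return 'No articles found: is the page a category with members?';
    }
    clearstatcache(true, $file);   // make sure filesize() is not cached
    if (!is_file($file) || filesize($file) === 0) {
        return 'The generated file is missing or empty: is htmldoc installed and in the PATH?';
    }
    return null;
}
```

The caller would echo the returned message as a normal wiki error page and only send the PDF headers when the result is null.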

Invalid PDF File

Nad,

Thanks for your quick response!

However, I am still having issues. The file is being created and has a non-zero size, but Adobe Reader gives me the following error:

"There is an error opening this document. This file is damaged and cannot be repaired".

HTMLDoc seems to be quitting during the conversion job.

If I add the ".html" extension to the temp file and run HTMLDoc from the command line I can convert the temp html file manually to a PDF file.

I then compared in Notepad the one I generated and the one your script creates, and noticed that the PDF your script creates quits after processing a certain number of lines.

I have your PDFExport Extension working just fine...so I was wondering what else it could be.

Any ideas?

Thanks!

How long is it taking to generate the PDF before quitting? 30 seconds? if that long it could be reaching max execution time? and how large is the PDF before it bails? --Nad 20:29, 6 September 2007 (UTC)Reply



Nad,

It only writes about 18 lines to the .pdf file and takes a couple seconds for the file to generate. It doesn't appear to quit, it saves the file like it normally would however when I edit the file in notepad it is not complete (Stops after ~18 lines with Wordwrap on)

Like I stated before, I'm using your PDFExport Extension and it works great.

Let me know what you think --136.182.158.153 21:29, 6 September 2007 (UTC)Reply

When you run htmldoc manually passing the generated tmp file to it, are you using the exact same command and parameters that the extension uses? --Nad 21:51, 6 September 2007 (UTC)Reply

continued...

If I change this line;

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file";

to

$cmd = "htmldoc -t pdf --charset iso-8859-1 $cmd $file > test.pdf";

Then I get a test.pdf in my mediawiki root folder which works perfectly

You could try changing the htmldoc command to use passthru like Extension:Pdf Export - I had it like that on mine but had problems with the gzip encoding, but it may work better like that for you --Nad 21:55, 6 September 2007 (UTC)Reply
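For what it's worth, the redirect workaround from this thread can be wrapped so that the PDF data still ends up in a PHP variable. A minimal sketch, assuming exec() is allowed on the server; the function name is made up and the command string is whatever the extension already builds.

```php
<?php
// Hypothetical helper: runs a command with stdout redirected to a temporary
// file and returns the file contents, or null on failure.
function pdfbook_capture_to_file(string $cmd): ?string
{
    $out = tempnam(sys_get_temp_dir(), 'pdfbook');
    if ($out === false) {
        return null;
    }
    exec($cmd . ' > ' . escapeshellarg($out), $ignored, $status);
    $content = $status === 0 ? file_get_contents($out) : false;
    @unlink($out);   // clean up the temporary file in every case
    return $content === false ? null : $content;
}

// Usage inside the extension might look like this ($cmd and $file come from
// the existing code):
// $content = pdfbook_capture_to_file("htmldoc -t pdf --charset iso-8859-1 $cmd " . escapeshellarg($file));
```

A null result also catches the case where htmldoc exits with an error, which otherwise shows up only as a truncated download.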

images in the pdf Book?

Is there any possibility of getting images displayed in the PDF book as well? It would be a fantastic improvement. Any workarounds? Martin

I'm working on it, I just can't get them to work currently. I'm checking out some of the solutions at Extension talk:Pdf Export too as that one uses htmldoc as well. --Nad 12:39, 12 September 2007 (UTC)Reply


Nad, thanks for your great work. I made some fixes to your extension and got it to work correctly with images, even with secure server without modifying .htaccess.

The points are:

  • when generating html output only, links to images could stay absolute as currently.
  • when generating pdf output, links to images should be converted to relative links to the temp file (pdf-book-something in $IP/images)
  • --browserwidth could be a workaround when you have only large images, but it would make your small images too small when your image sizes vary a lot. My solution is to rescale large images to fit in the page: pick up the image width and height from the html output and, if they are too big for the paper size, adjust width="x%", with x depending on the ratios width/maxWidth and height/maxHeight.

Hope this helps. Just tell me if you'd want me to send you my codes. Lechau 02:20, 6 June 2008 (UTC)Reply
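The rescaling rule described in the last point above could be computed like this. It is one possible reading of it, not Lechau's code: the function name is invented, and the maximum width and height (in pixels) are placeholders that depend on your paper size and margins.

```php
<?php
// Hypothetical helper: returns the percentage to use in width="x%" so that an
// image fits within $maxWidth x $maxHeight, or null if it already fits.
function pdfbook_image_width_percent(int $width, int $height, int $maxWidth, int $maxHeight): ?int
{
    if ($width <= 0 || $height <= 0 || ($width <= $maxWidth && $height <= $maxHeight)) {
        return null;   // nothing to do
    }
    // Shrink by whichever ratio is more restrictive, keeping the aspect ratio.
    $scale = min($maxWidth / $width, $maxHeight / $height);
    // Express the scaled width as a share of the available page width.
    return max(1, (int) floor($width * $scale / $maxWidth * 100));
}

var_dump(pdfbook_image_width_percent(1360, 400, 680, 900));  // wide image
var_dump(pdfbook_image_width_percent(600, 1800, 680, 900));  // tall image
var_dump(pdfbook_image_width_percent(200, 100, 680, 900));   // small image
```

Small images are left alone, which avoids the side effect of a global --browserwidth setting mentioned above.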

A hack

In file PdfBook.hooks.php around line 101 (I may have inserted other stuff) just before "#write the HTML to a tmp file" insert this:

$ori_string = 'src="';
$repl_string = 'src="' . $wgServer;
$html = str_replace ($ori_string, $repl_string, $html);
# Write the HTML to a tmp file

The problem is that the intermediary output file got stuff like this:

 src="/mediawiki/images/thumb/pict.png

but you want:

 src="http://your.server.org/mediawiki/images/thumb/pict.png

This is not the best solution; a regexp hacker should actually rip away most of the html picture markup and then maybe replace the thumb by the original pic. But the above is at least a minimal job. To see the intermediary file, as someone said, comment out the unlink at the end and then get the file from the images directory.

 	//@unlink($file);

Sorry, I'm not a real programmer and have too much workload to help for real. Just wanted to produce some handouts ;) - Daniel

I did not find the often mentioned ./images folder. Only the images folder in the wiki root. Any ideas?

Same problem as section 2

[edit]

I'm on Ubuntu Linux with Mediawiki 1.10. Htmldoc is in /usr/bin. I commented out the unlink command, and the temp file is empty (0 length).

I checked to be sure that my Apache user can run htmldoc -- it can. Unsure what I should try next.

By the way, your single-page export plugin works perfectly (even for images). So I know that htmldoc is not at fault here.

I didn't write the single page one, but the code seems pretty similar. I'll just have to see what differences there are in the code between this one and the single-page one. --Nad 22:28, 14 September 2007 (UTC)


Upload filetype

What happens when pdf is not a valid file type when uploading? Does the wiki control this with this extension, if so do I need to add pdf file types to the type of files you can upload?

The upload filetype is unrelated to this since exported pdf's are downloaded not uploaded. If you want to add pdf to your allowed upload filetypes, use $wgFileExtensions[] = 'pdf', you may also want to set $wgVerifyMimeType to false if it's giving you hassles when you try and upload exotic types of file. --Nad 04:11, 21 September 2007 (UTC)Reply


More empty file downloads

--Johnp125 02:12, 25 September 2007 (UTC)Reply

Sorry to be such a pain. I have set up a test wiki which is running fedora c 4. Please check out my test wiki and see if you can give me some direction. I have debug for the wiki turned on in LocalSettings.php. If you need admin access please email me at johnp125@yahoo.com and I'll hook you up. --Johnp125 00:23, 27 September 2007 (UTC)

http://wikitest.homelinux.net/wiki2/index.php/Main_Page

The output shows a bug due to 1.11 being more strict about hook return values. Try again now with the latest version, 0.0.4. Also note that even if it works, you will get just an empty document, since the point of this extension is to compose a book from the content of a category; if the page is not placed in a category, or the category contains no members, then the result will be empty. To export the content of a single page you should be using Extension:Pdf Export. --Nad 03:33, 25 September 2007 (UTC)
However, I'm working on version 0.5 now which can be used in non-category pages and will compose the book from the article links found in the page, so that books can then be composed from explicit lists or DPL queries. --Nad 03:33, 25 September 2007 (UTC)Reply

--Johnp125 13:28, 25 September 2007 (UTC)Reply

Hey, that sounds great, I'd love to help you with it.

You mentioned single page. I had 2 types of pdf downloads there.

http://wikitest.homelinux.net/wiki2/index.php?title=Category:test&action=pdfbook this one should be going after the demo page with Category:test and then creating a PDF book from that. Is this not the right way to use the code? I know that if I created more pages and put Category:test under them they would get put into the PDF file as well.

You had a typo in the word "category", link is working now ;-) --Nad 22:21, 25 September 2007 (UTC)Reply

--Johnp125 17:30, 26 September 2007 (UTC)Reply

Thanks a bunch. You're the greatest. Glad to have this working now.

Checked out your info about Images not showing in mediawiki 1.10.2---1.11. Nice work.

I just did another update yesterday which has images working now --Nad 21:06, 26 September 2007 (UTC)Reply


--Johnp125 00:16, 27 September 2007 (UTC)Reply

Is this the update that is going to work with DPL queries? I started to play around with that extension. I know it's working but right now it's too big to try and figure out.


--Johnp125 00:23, 27 September 2007 (UTC)Reply

Hey by the way could you tell me how to make the pdfbook extension just make a big html file, so I could open it in word or openoffice in html format and let the office program convert it from the html file? Or is it easier to say and harder to do?

That feature is very easy to add because it simply requires not sending the file to HTMLDOC, I've added an option in a new version (0.0.7) which allows you to do this by adding format=html to the query-string. --Nad 02:06, 27 September 2007 (UTC)Reply


--Johnp125 22:04, 30 September 2007 (UTC)Reply

Wow, that sounds great, can't wait to try out the html export. I looked for the 0.0.7 version but only saw the 0.0.6 version when I went to the download section. Also, could you give me an example of how format=html is used?

http://www.foo.bar/wiki/index.php?title=Catgeory:Foo&action=pdfbook

Where would it go in this string?

Sorry about that I must have forgotten to update it, it's at 0.0.7 now. To change the URL above to produce html, append &format=html to it. We use a template which has a link for both, see OrganicDesign:Template:Book. --Nad 07:11, 1 October 2007 (UTC)Reply
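If the export links are generated from code rather than typed by hand, the query string can be built like this. A small sketch only; the function name is made up, and the base URL is the placeholder one used in this thread.

```php
<?php
// Hypothetical helper: builds a PdfBook export URL. Pass 'html' or 'single'
// as $format, or null for the default PDF book.
function pdfbook_export_url(string $indexUrl, string $title, ?string $format = null): string
{
    $query = ['title' => $title, 'action' => 'pdfbook'];
    if ($format !== null) {
        $query['format'] = $format;
    }
    // http_build_query percent-encodes the values, so the ":" in a category
    // title comes out as %3A, which MediaWiki decodes again.
    return $indexUrl . '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
}

echo pdfbook_export_url('http://www.foo.bar/wiki/index.php', 'Category:Foo', 'html'), "\n";
```

Using a query builder rather than string concatenation also takes care of titles containing spaces or ampersands.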


--Johnp125 01:55, 2 October 2007 (UTC)Reply

The html export looks really good. I did notice that Microsoft Word gets confused by small html files. Maybe you could put the html header info at the top and bottom of the page to help Microsoft Word out. OpenOffice did not seem to have a problem with it. However, Word is looking for the html tags on small exports. If it's a big export it gets the idea.


--Johnp125 02:08, 2 October 2007 (UTC)Reply

Just tested it again with a small html download. Word tried to format it when opening. Then I added the <html> to the beginning and then added the </html> at the end. Then reopened the file with word and bingo it worked fine. Maybe something to add in 0.0.8? Openoffice worked either way.

Keep up the good work. This is the best extension for wiki out there right now.
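Until something like that is built in, the manual fix described above (adding <html> at the start and </html> at the end) could be automated along these lines. This is a sketch with a made-up function name, and the extra head and body tags go slightly beyond what was tested in Word above.

```php
<?php
// Hypothetical helper: wraps an HTML fragment in a minimal document so that
// word processors recognise it as HTML. Leaves full documents untouched.
function pdfbook_wrap_html(string $body, string $title = 'Book'): string
{
    if (stripos($body, '<html') !== false) {
        return $body;   // already a complete document
    }
    return "<html>\n<head><title>" . htmlspecialchars($title) . "</title></head>\n<body>\n"
        . $body . "\n</body>\n</html>\n";
}

echo pdfbook_wrap_html('<h1>Foo</h1><p>Some exported text.</p>', 'Category:Foo');
```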

If you have larger text, don't forget to change server settings. E.g. for a 2000 page document produced with a low-end 2CPU sparc box I use this in php.ini:

max_execution_time = 600
max_input_time = 600
memory_limit = 100M

and this in httpd.conf:

Timeout 600

Else you just get a blank page without any warning or error message - Daniel K. Schneider 11:00, 20 June 2008 (UTC)Reply

Hacks to change PDF output (v. 0.6)

  • Images: If they don't fit your PDF page, you have to set pixel width of a virtual browser page (that's a "feature" of htmldoc). By default it is 680 pixels only and images larger than that will be rendered larger than your PDF page! Lots of my pictures are...
  • Titlepage: If you want a standardized titlepage before the TOC, create it in HTML and put it somewhere in your file system. I just put it in the images directory.

Then change PdfBook.php like this for example:

$cmdext =  " --browserwidth 1000 --titlefile $wgUploadDirectory/PDFBook.html";
$cmd  = "htmldoc -t pdf --charset iso-8859-1 $cmd $cmdext $file";

Basically, I found it a good idea to read the htmldoc manual. In my Unix system it sits in /usr/local/share/doc/htmldoc/htmldoc.pdf. (see chapter 8). Made other changes too.

Now of course Nad may at some point add some more options, but changing a line in the php file does it too :) - Daniel (edutechwiki.unige.ch)

PdfBook Error Solution....for me at least

Nad,

I ended up creating an additional temp file which I had HTMLDoc redirect the output to. This was the only way I was able to keep it from quitting during the PDF conversion process. I then open the file and read its data back into $content. After doing that I am able to successfully download the complete PDF file.

But I have another question for you.....I have seen a jspwiki which retrieves all the articles for a category and lists them on a page and uses a form to allow you to select which ones you want. It then retrieves the selected articles as one entire book. Is there a way to include a similar form in Mediawiki. Or do you know of a way to use an external html web page to retrieve/send commands like that to Mediawiki?

Thanks,

Dan --136.182.158.145 21:27, 7 September 2007 (UTC)Reply

The PdfBook extension will allow exceptions so that not all items in the category are included. It would be possible to have it add items to the selection in the same way. A form could then be used to generate the list from which the book is made. I'll have a think about that though, because it's an interesting point you make, that books could be generated from queries rather than just categories... --Nad 22:01, 7 September 2007 (UTC)

Just in case the anonymous above re-reads this: I had the same problem of PdfBook not generating any output, but the solution was simple: make sure that the upload directory (usually ./images) is writeable for the web server process. After I changed that, PdfBook worked okay. Cheers, Lexw 15:30, 5 October 2007 (UTC)Reply


Missing Images in new version

I love this extension, I think it is the best thing for wiki right now. However, when I use the new pdfbook version 0.7 I am not getting any pictures. All I get is URL links to the pictures. This is in the PDF format, not the html format. Any ideas? --Johnp125 20:29, 15 October 2007 (UTC)

Do you mean to say that your images were working on the previous version and have stopped working now? I had never had images working until I made some changes in the last version. Do you have a link to an example of a failing image export so I can check out what the problem may be? --Nad 19:32, 17 October 2007 (UTC)Reply


--Johnp125 18:12, 19 October 2007 (UTC)Reply

Sorry for the delayed post. Yes I had images working on the 0.6 version and then on the 0.7 version I am not getting any images in pdf format. I can go ahead and setup my test server real soon and make sure you and I can test both. I think I still have a copy of the 0.6 version I will try it again as well.

--Johnp125 18:23, 19 October 2007 (UTC)Reply

Also, I noticed the links are not working quite right. For example, if I have a document in Category: Testing and it pulls that document, and that document links to another page that is in Category: Testing as well, should the link not take me to the page in the PDF doc? Right now it is referring to the html link, not the PDF link. I would think that it should realize that the linked page was pulled in by the category and then change the reference to the PDF location.

I have a problem with the pictures if the wiki needs HTTP authentication. It seems that the pictures are imported from the web server and not from the file system. Is this the reason for the problems? Proofy 07:54, 29 November 2007 (UTC)

Missing Images and hangs with larger categories

Mistral 13:28, 17 October 2007 (UTC)Reply

We installed on Linux with 128 mb of memory allocated to php. Using the template idea referred to by Organic Design we have tested this and observe the following.

  • Images are not uploaded. They are copied to the PDF as links to the wiki image.
  • HTML and PDF output work fine on small categories (< 10 entries). Output is ready in less than 2 seconds and it looks nice.
  • However, for pages with > 25 entries, when you press submit to get PDF output the browser hangs and never completes the operation. You need to close the browser to terminate the operation.

It should work for large books, our test book on organicdesign is over 250 pages/800KB and only takes a second or two with 64MB allocated. Have you tried saving it as html only then manually running it through htmldoc to see if that's working ok? --Nad 19:41, 17 October 2007 (UTC)Reply

I looked at your book link and the translation to pdf worked great on IE6 with Acrobat. However I do notice that there is not a single image in the book. Is it possible having 2 or 3 images per page in 25 - 30 pages is the problem?


I looked at the translation into html code to see why the images were not showing. I believe this can be fixed easily.

Here is the html output http://wiki.fomportal.comhttp://wiki.fomportal.com/images/9/94/BERalex_Full.jpg

here is what it should read src="http://wiki.fomportal.com/images/9/94/BERalex_Full.jpg" width="262" height="207" /> Do you see the duplication of the site address? ((http://wiki.fomportal.com)) Maybe this a configuration issue?? Mistral 18:03, 19 October 2007 (UTC)Reply

I'll check it out soon, your research into the problem should make it a lot easier for me to fix ;-) --Nad 20:40, 19 October 2007 (UTC)Reply
I found a bug which was trying to make URLs absolute which were already absolute; see if 0.0.8 works any better --Nad 00:18, 29 October 2007 (UTC)
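For anyone patching an older copy by hand, one way to avoid the doubled server name is to prefix only the src attributes that start with a single slash. This is a sketch of that idea with a made-up function name, not the change that went into 0.0.8.

```php
<?php
// Hypothetical helper: makes root-relative image URLs absolute and leaves
// URLs that already carry a scheme or host alone.
function pdfbook_absolute_image_src(string $html, string $server): string
{
    $prefix = 'src="' . rtrim($server, '/') . '/';
    // Match src="/path" but not src="//host/path" or src="http://...".
    $result = preg_replace_callback('/\bsrc="\/(?!\/)/', function () use ($prefix) {
        return $prefix;
    }, $html);
    return $result ?? $html;   // preg_replace_callback returns null on error
}

$html = '<img src="/images/9/94/BERalex_Full.jpg" width="262" height="207" />';
$once = pdfbook_absolute_image_src($html, 'http://wiki.fomportal.com');
// Running it a second time changes nothing, so the host is never duplicated.
var_dump($once === pdfbook_absolute_image_src($once, 'http://wiki.fomportal.com'));
```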


Missing images due to apache .htaccess restriction

I've just encountered the problem that no images were displayed within the PDF - only their borders. In my case this was caused by .htaccess asking for a password in order to access the wiki folder.

The solution was to add "Allow from 127.0.0.1" and "Satisfy Any" to the .htaccess file so that htmldoc could access the images for embedding them into the PDF. --^Rooker 12:14, 05 December 2007 (UTC)

Be aware that the corresponding IP address is not always 127.0.0.1. It didn't work for me. So I spent some hours on debugging until I took a look at the Apache access.log, where I saw that accesses by the local machine were not logged with 127.0.0.1 but with the real IP address of our server. --Fydel 12:45, 9 January 2009 (UTC)

seblac 28-10-2010 : Other solution for the same problem : For me the problem comes with the inclusion of a NTLM device in APACHE 2 using SSPI module. The only solution was to encapsulate security rules for php files only :

<Directory "c:/foo/www/">
Order Allow,Deny
Allow from All

<Files *.php>
AuthName "foo access"
AuthType SSPI
SSPIAuth On
SSPIAuthoritative Off
SSPIOfferBasic Off
SSPIOmitDomain On
require valid-user
</Files>

</Directory>

SubCategories

I made a structure using categories and subcategories. My goal is to make a complete Quality Manual using MediaWiki. When using the PdfBook extension from a category page, no subcategories are included in the resulting PDF.

Is there any manner to use pdfbook extension to make a book covering sub and subsubcategories?

Regards, Antonio Todo Bom --Todobom 22:50, 28 October 2007 (UTC)Reply

Unfortunately not sorry, currently it can only work on a list, deeper levels are only done from heading levels not sub-categories. You may be able to use DPL to create reports of the sub-category and sub-sub-category content which could then be printed as a book. --Nad 00:10, 29 October 2007 (UTC)Reply
I'm facing the same problem with the Quality Manual I'm working on. Please let me know if someone solves this problem, and I'll do my best to find a solution to this myself.
/Jesper 85.89.79.106 12:43, 30 October 2007 (UTC)Reply


Looks like a job for a recursive program call. When we installed this I thought I would be able to have one master category that contained all the other categories and then just go "Save as pdf" but it's not that easy yet. I hope you are able to add this functionality.

Mistral 16:30, 30 October 2007 (UTC)Reply

It's not as simple as that - how do the categories and sub-categories names map to heading level? and then how do the headings and subheadings etc in the document map to pdf headings? --Nad 20:57, 30 October 2007 (UTC)Reply
I understand the problem... Somehow the new category should have its own heading, and if that's the case, all other H1 would become H2 and so on... But let's ignore that factor and say that you only want to make a huge PDF book with all categories, with the same heading levels used today; how to do that? I tried to use DPL to make it print all articles in a couple of categories and then PDF the category that article was in, but it didn't work... //Jesper 85.89.79.106 08:46, 1 November 2007 (UTC)
I doubt I'll be adding the subcategory functionality for some time, if at all, I just have too much other stuff on. There's an example of using DPL to make books from at OrganicDesign:Creating a PDF book from a DPL query. --Nad 20:40, 1 November 2007 (UTC)Reply
Have a look at Extension:Book. The issue is adressed there. It should also be possible to merge both approaches.--Sh4k 12:23, 29 May 2008 (UTC)Reply

Hello there, Nad. Great work you have there. You mean by this last suggestion that we can draw a custom layout? It would be nice that a page describing the layout, like the following, could specify the heading layout:

 * [[Page for Section 1]]
 :* [[Page for Section 1.1]]
 * [[Page for Section 2|Section 2]]

Is it possible? Nuno Tavares 20:45, 12 January 2008 (UTC)

I'd also like to ask you to look at the last line: Section 2. The page is in fact called Page for Section 2, but what should be shown is Section 2, so I think that should also be the section name when building the PDF. This is especially useful if you are using namespaces. Nuno Tavares 21:15, 12 January 2008 (UTC)
OK, I found a way (a hack, actually) to allow this. In onUnknownAction(), just use "$title->getText()" instead of "basename($ttext)". Nuno Tavares 21:49, 12 January 2008 (UTC)
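
A rough sketch of how such a layout list could be read, including the optional display label after the pipe. The function name parseBookLinks and the shape of the returned array are made up for illustration and are not part of PdfBook; nesting depth (the leading ":") is ignored here.

```php
<?php
// Hypothetical helper (not part of PdfBook): turn a layout list into
// page/label pairs. The label falls back to the page name when no
// "|label" part is given.
function parseBookLinks( string $wikitext ): array {
	$entries = [];
	// Optional indentation (spaces, tabs, ':'), a '*' bullet, then [[Page]] or [[Page|Label]]
	$pattern = '/^[ \t:]*\*\s*\[\[\s*([^|\]]+?)\s*(?:\|\s*([^\]]*?)\s*)?\]\]/m';
	if ( preg_match_all( $pattern, $wikitext, $matches, PREG_SET_ORDER ) ) {
		foreach ( $matches as $m ) {
			$page  = $m[1];
			// With PREG_SET_ORDER an unmatched trailing group is simply absent
			$label = ( isset( $m[2] ) && $m[2] !== '' ) ? $m[2] : $page;
			$entries[] = [ 'page' => $page, 'label' => $label ];
		}
	}
	return $entries;
}

$layout = " * [[Page for Section 1]]\n :* [[Page for Section 1.1]]\n * [[Page for Section 2|Section 2]]\n";
foreach ( parseBookLinks( $layout ) as $entry ) {
	echo $entry['page'], ' => ', $entry['label'], "\n";
}
```

The page name would still be used to load the article, while the label could serve as the heading in the PDF.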


Subcategory article shows up in pdfbook export

When creating a subcategory (= assigning a category page to a category), that (sub)category's page also shows up in the export. Either I've overlooked these pages in previous exports, or this behavior was introduced with a more recent version of MediaWiki (I'm currently using v1.12.0, with pdfbook 0.0.9).

I was quite puzzled, so I thought I'd let someone know about this behavior. -- The rooker 09:51, 28 April 2008 (UTC)

Use CSS when exporting to PDF

Hi all. I want to know if there is some way to use CSS when I'm exporting my PDFs. The thing is, I want to make id="toc" invisible instead of having another table of contents in my PDF books. //Jesper 85.89.79.106 12:57, 31 October 2007 (UTC)

I've been looking around for PDF converters which can handle CSS but I can't find any. You'll have to add __NOTOC__ to remove the TOC. --Nad 20:42, 31 October 2007 (UTC)
Hmm... But adding __NOTOC__ removes the table of contents from the page itself, and as the page is pretty long, I think the users need that one... It would be great if I could make the TOC disappear only in the PDF. //Jesper 85.89.79.106 06:35, 1 November 2007 (UTC)
I've done some testing. By adding:
$ori_string = 'id="toc"';
$repl_string = 'id="toc" style="visibility: collapse;"';
$html = str_replace ($ori_string, $repl_string, $html);
after the "# If format=html in query-string, return html content directly" comment, the TOC disappears in the HTML file, but I can't get the same thing to work with the PDF. //Jesper 85.89.79.106 07:00, 1 November 2007 (UTC)
Good point, it's not useful to have a TOC in each article when the book already has a TOC. I've updated it to add a __NOTOC__ before parsing each article. --Nad 07:58, 1 November 2007 (UTC)
Ah, thanks Nad! That was a fast reply and I really appreciate it! //Jesper 85.89.79.106 08:31, 1 November 2007 (UTC)


no index pages

--Johnp125 16:59, 8 November 2007 (UTC)

Is there any way to run the query without creating any autogenerated index pages or putting the index number in the text?

--Johnp125 18:26, 8 November 2007 (UTC)

OK, I just checked out the new HTML version, .9. This does what I would like it to do. Images work and everything.

I was having problems with the images because we have an alias for the wiki (/wiki/index.php); when PdfBook runs in PDF format, I think it cannot find /wiki/picture.jpg instead of /picture.jpg. Anyway, the new HTML version works just fine.


Header info

--Johnp125 18:31, 8 November 2007 (UTC)

I know this question is out on a limb, but is there any way I could exclude certain headline text from being pulled, based on its name, such as Image Header?

Missing end tag in 0.0.9 source code

Just for the record: it seems that the page at Organic Design which lists the v0.0.9 source code is missing a PHP end tag at the bottom of the file. Cheers, Lexw 09:23, 13 November 2007 (UTC)

End delimiters are removed to avoid whitespace being sent to the output. Unfortunately I can't find the link to the official bug report about it. --Nad 19:59, 13 November 2007 (UTC)

Additional functionality in PdfBook

Hi Nad, I have added some additional functionality into PdfBook that you might be interested in for a next version. Seems that you have switched off email (which I can understand), so I couldn't contact you that way. Please contact me by email via 'E-mail this user' if you are interested.

Other users: please don't contact me. I might come back to this topic later, first I want to discuss things with Nad.

Regards, Lexw 13:39, 15 November 2007 (UTC)

Added recursive follow functionality

Hi Nad, I'm using your PdfBook extension and I've added some functionality to recursively follow links to produce a PDF. With the parameter follow=deep or follow=broad, the created PDF will contain all pages that are referenced from the current page, and recursively all further referenced pages, in a depth-first or breadth-first manner. Here are the relevant code snippets:

			if ($title->getNamespace() == NS_CATEGORY) {
				$cat    = $title->getDBkey();
				$db     = &wfGetDB(DB_SLAVE);
				$cl     = $db->tableName('categorylinks');
				$result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = '$cat' ORDER BY cl_sortkey");
				if ($result instanceof ResultWrapper) $result = $result->result;
				while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
				}
			else if (isset($_REQUEST['follow'])) {
				$deep = $_REQUEST['follow'] == 'deep';
				wfDebug("PdfBook: following links - " . ($deep ? "depth first\n" : "breadth first\n"));
				$articles[] = $title;
				wfDebug("PdfBook: adding page '" . $title->getText() . "'\n");
				$this->getLinkedArticles($articles,$article,$opt,$deep);
			} else {
				$text = $article->fetchContent();
				$text = $wgParser->preprocess($text,$title,$opt);
				if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links))
					foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
				}
	function getLinkedArticles(&$articles,$article,$opt,$deep) {
		global $wgParser;
		$text = $article->fetchContent();
		$text = $wgParser->preprocess($text,$article->getTitle(),$opt);
		$linktitles = array();
		wfDebug("PdfBook: ----- processing article '" . $article->getTitle()->getText() . "' ($deep)\n");
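		if (preg_match_all('/\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links)) {
			foreach ($links[1] as $link) {
				// Title::newFromText() returns null for invalid titles; skip those
				$linktitle = Title::newFromText($link);
				if (is_null($linktitle)) continue;
				$linktitles[] = $linktitle;
				wfDebug("PdfBook: found link '" . $link . "'\n");
			}
		}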
		wfDebug("PdfBook: processing " . count($linktitles) . " links...\n");
		if ($deep) {
			foreach ($linktitles as $linktitle) {
				$exists = false;
				foreach ($articles as $el) {
					if ($el->getText() == $linktitle->getText())
						$exists = true;
				}
				if (!$exists) {
					wfDebug("PdfBook: adding '" . $linktitle->getPrefixedText() . "'\n");
					$articles[] = $linktitle;
					wfDebug("PdfBook: adding subpages\n");
					$art = new Article($linktitle);
					$this->getLinkedArticles($articles,$art,$opt,$deep);
					wfDebug("----- <\n");
				}
			}
		} else {
			$newlinktitles = array();
			foreach ($linktitles as $linktitle) {
				$exists = false;
				foreach ($articles as $el) {
					if ($el->getText() == $linktitle->getText())
						$exists = true;
				}
				if (!$exists) {
					wfDebug("PdfBook: adding '" . $linktitle->getText() . "'\n");
					$articles[] = $linktitle;
					$newlinktitles[] = $linktitle;
				}
			}
			foreach ($newlinktitles as $linktitle) {
				wfDebug("PdfBook: adding subpages of '" . $linktitle->getText() . "'\n");
				$art = new Article($linktitle);
				$this->getLinkedArticles($articles,$art,$opt,$deep);
			}
		}
	}

I can also send you the complete file if you want. Tbleier 2008-01-25
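
For readers who want the traversal idea without the MediaWiki plumbing, here is a simplified sketch. It works on a plain array that maps a page name to the names it links to (a stand-in for parsing each article), and it tracks visited pages in a keyed array instead of re-scanning the list for every link. The function name collectPages and the array layout are assumptions for illustration. The breadth-first branch uses a plain queue, which is a textbook variant and may order pages differently from the follow=broad code above.

```php
<?php
// Hypothetical stand-in for "links found on a page": page name => linked page names.
function collectPages( array $linkGraph, string $start, bool $deep ): array {
	$visited = [ $start => true ];   // keys give a constant-time "already added" check
	$order   = [ $start ];

	if ( $deep ) {
		// Depth-first: follow each new link immediately
		$visit = function ( string $page ) use ( &$visit, &$visited, &$order, $linkGraph ) {
			foreach ( $linkGraph[$page] ?? [] as $link ) {
				if ( isset( $visited[$link] ) ) continue;
				$visited[$link] = true;
				$order[] = $link;
				$visit( $link );
			}
		};
		$visit( $start );
	} else {
		// Breadth-first: finish one page's links before descending further
		$queue = [ $start ];
		while ( $queue ) {
			$page = array_shift( $queue );
			foreach ( $linkGraph[$page] ?? [] as $link ) {
				if ( isset( $visited[$link] ) ) continue;
				$visited[$link] = true;
				$order[] = $link;
				$queue[] = $link;
			}
		}
	}
	return $order;
}

$graph = [ 'A' => [ 'B', 'C' ], 'B' => [ 'D' ], 'C' => [ 'D' ], 'D' => [ 'A' ] ];
echo implode( ', ', collectPages( $graph, 'A', true ) ), "\n";   // A, B, D, C
echo implode( ', ', collectPages( $graph, 'A', false ) ), "\n";  // A, B, C, D
```

The link back from D to A shows that cycles are harmless, since a page is only added the first time it is seen.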


Added dynamic title page

In order to have a proper title page on the generated PDF, I've added a few lines of code that read a plain HTML file and replace some placeholders with values like "Category name", etc... and then use that file with htmldoc's otherwise static "--titlefile" option.

Additionally, I've added 2 new variables: $wgPdfBookTitleFile and $wgPdfBookLogoImage so one can easily select a title page and logo image (to display at the bottom of a page).

I'll make a small package and put it on some webserver instead of posting the code here (too messy already). :) The rooker 14:00, 20 February 2008 (UTC)

That is exactly what I have done and wanted to discuss with Nad (see above), but he doesn't seem to react. I've gone a little further and now create the title file dynamically from the PdfBook extension, so an external HTML file is no longer necessary for generating the title page. A logo file was included in my implementation too (only I added it to the header, not the footer, but that's a matter of configuration which can be overruled in the general wiki LocalSettings.php).
Since this implementation is not part of the "official" PdfBook extension, I will have to find a place to store it, if anyone is interested. Rooker, have you already stored your solution somewhere? Lexw 09:27, 8 April 2008 (UTC)
@Lexw: I've provided a quickly cleaned version including my modifications. See the "README.txt" inside for details: PdfBook-0.0.9-DynamicTitle.tar.bz2 The rooker 10:57, 17 April 2008 (UTC)
Thank you for this, I am using this part of the code. Has anyone got any ideas about how I can include headers and footers on every page of the PDF? --194.169.24.100 16:48, 19 June 2009 (UTC)

PHP compilation error

Hello,

I'm trying to install version 0.0.9 on Red Hat Enterprise Linux ES 4, which runs MediaWiki 1.6.8 with PHP 4.3.9. pdf-book.php has been copied into the "extensions" directory and then included via LocalSettings.php:

require_once( "extensions/pdf-book.php" );

and we have this error :

Parse error: parse error, unexpected T_OBJECT_OPERATOR in /var/wwwwikitn/html/mediawiki-1.6.8/extensions/pdf-book.php on line 66

which is

$msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';

Any idea ? Thanks !

Just a wild guess: PHP5 needed? Lexw 07:49, 17 April 2008 (UTC)
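
Some background on that guess: PHP 4 cannot call a method directly on the value returned by another method, which is exactly what the "unexpected T_OBJECT_OPERATOR" error points at. The sketch below shows the usual workaround of splitting the chain into two statements. The stub classes and the function name exportLogMessage are made up so the snippet runs outside MediaWiki (and the stubs themselves use modern PHP syntax). It only addresses this one line; the rest of the extension may still need PHP 5.

```php
<?php
// Stand-in stubs so the example runs outside MediaWiki. In the extension,
// $wgUser and the title object come from MediaWiki itself.
class StubTitle {
	private $text;
	public function __construct( $text ) { $this->text = $text; }
	public function getPrefixedText() { return $this->text; }
}
class StubUser {
	public function getUserPage() { return new StubTitle( 'User:Example' ); }
}

// Hypothetical wrapper around the failing line 66
function exportLogMessage( $user ) {
	// Split the chained call: first keep the returned object in a variable,
	// then call the second method on that variable.
	$userPage = $user->getUserPage();
	return $userPage->getPrefixedText() . ' exported as a PDF book';
}

echo exportLogMessage( new StubUser() ), "\n";
```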

Problem in pdfbook if only current page should be converted

I had a problem when using

[{{fullurl:{{FULLPAGENAMEE}}|action=pdfbook}} download as PDF]

No PDF was produced because the temporary HTML file was empty.

I had to add the following line to the else block of if ($title->getNamespace() == NS_CATEGORY) {

$articles[] = $title;

Now it works. The new code looks like this

			if ($title->getNamespace() == NS_CATEGORY) { 
				$cat    = $title->getDBkey() ;
				$db     = &wfGetDB(DB_SLAVE);
				$cl     = $db->tableName('categorylinks');
				$result = $db->query("SELECT cl_from FROM $cl WHERE cl_to = '$cat' ORDER BY cl_sortkey");
				if ($result instanceof ResultWrapper) $result = $result->result;
				while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
				}
			else {
				$text = $article->fetchContent();
				$text = $wgParser->preprocess($text,$title,$opt);
				if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links))
					foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
				$articles[] = $title;
				}

--Guenterg 11:31, 28 March 2008 (UTC)

I had the same problem, your fix works for me too. Thanks a lot, Guenter!
Now I still have to find a way to keep the PdfBook template itself out of the PDF document if I place that template on an article, not on a category. But that's a different matter... Lexw 09:16, 8 April 2008 (UTC)


-- This modification works pretty well. It produces a funny numbering scheme for the page index. RHEL 5 / PHP 5.1.6 / LAMP / MW 1.12
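
One possible refinement, sketched outside MediaWiki: the fix above appends the current page even when the page lists other articles, which may be related to the extra entries mentioned in the replies. The variant below only falls back to the current page when no bulleted links are found. It works on plain strings rather than Title objects, and the function name selectArticleNames is an assumption for illustration.

```php
<?php
// Hypothetical helper: returns the page names a book should contain.
// $currentPage stands in for the title of the page being exported.
function selectArticleNames( string $text, string $currentPage ): array {
	$names = [];
	// Same bullet-list link pattern as in the snippet above
	if ( preg_match_all( '/^\*\s*\[{2}\s*([^\|\]]+)\s*.*?\]{2}/m', $text, $links ) ) {
		foreach ( $links[1] as $link ) $names[] = trim( $link );
	}
	// Fall back to the page itself only when it lists no other pages
	if ( !$names ) $names[] = $currentPage;
	return $names;
}

print_r( selectArticleNames( "* [[Foo]]\n* [[Bar|baz]]\n", 'Index' ) );
print_r( selectArticleNames( "Just some prose.", 'Main Page' ) );
```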

Whole Namespace Export

The tweak below allows the export of a whole namespace, e.g. "Talk", through the additional action "nspdfbook", for example:

 http://localhost/wiki/index.php?title=Talk:Main_Page&action=nspdfbook

Note:

  • You may have to raise the limits under "; Resource Limits ;" in your php.ini if you use the mod to export all "Articles".
  • You may wish to alter the ORDER BY clause to sort on page name rather than ID.

	public static function onUnknownAction( $action, $article ) {
		global $wgOut, $wgUser, $wgParser, $wgRequest;
		global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

		if( $action == 'pdfbook' || $action == 'nspdfbook' ) {

			$title = $article->getTitle();
			$opt = ParserOptions::newFromUser( $wgUser );

			// Log the export
			$msg = wfMsg( 'pdfbook-log', $wgUser->getUserPage()->getPrefixedText() );
			$log = new LogPage( 'pdf', false );
			$log->addEntry( 'book', $article->getTitle(), $msg );

			// Initialise PDF variables
			$format  = $wgRequest->getText( 'format' );
			$notitle = $wgRequest->getText( 'notitle' );
			$layout  = $format == 'single' ? '--webpage' : '--firstpage toc';
			$charset = self::setProperty( 'Charset',     'iso-8859-1' );
			$left    = self::setProperty( 'LeftMargin',  '1cm' );
			$right   = self::setProperty( 'RightMargin', '1cm' );
			$top     = self::setProperty( 'TopMargin',   '1cm' );
			$bottom  = self::setProperty( 'BottomMargin','1cm' );
			$font    = self::setProperty( 'Font',	     'Arial' );
			$size    = self::setProperty( 'FontSize',    '8' );
			$ls      = self::setProperty( 'LineSpacing', 1 );
			$linkcol = self::setProperty( 'LinkColour',  '217A28' );
			$levels  = self::setProperty( 'TocLevels',   '2' );
			$exclude = self::setProperty( 'Exclude',     array() );
			$width   = self::setProperty( 'Width',       '' );
			$width   = $width ? "--browserwidth $width" : '';
			// split() was removed in PHP 7; preg_split() takes the same pattern with delimiters
			if( !is_array( $exclude ) ) $exclude = preg_split( '/\\s*,\\s*/', $exclude );
 
			// Select articles from members if a category or links in content if not
			if( $format == 'single' ) $articles = array( $title );
			else {
				$articles = array();
				if( $title->getNamespace() == NS_CATEGORY ) {
					$db     = wfGetDB( DB_SLAVE );
					$cat    = $db->addQuotes( $title->getDBkey() );
					$result = $db->select(
						'categorylinks',
						'cl_from',
						"cl_to = $cat",
						'PdfBook',
						array( 'ORDER BY' => 'cl_sortkey' )
					);
					if( $result instanceof ResultWrapper ) $result = $result->result;
					while ( $row = $db->fetchRow( $result ) ) $articles[] = Title::newFromID( $row[0] );
				}
				else {
					if ( $action == 'nspdfbook' ) {
						// Export every page in the namespace of the current title
						$db     = wfGetDB( DB_SLAVE );
						$pl     = $db->tableName( 'page' );
						$ns     = $title->getNamespace();
						$result = $db->query( "SELECT page_id FROM $pl WHERE page_namespace = $ns ORDER BY page_id" );
						if ( $result instanceof ResultWrapper ) $result = $result->result;
						while ( $row = $db->fetchRow( $result ) ) $articles[] = Title::newFromID( $row[0] );
						$book   = "PDFBook_Namespace_Export-" . MWNamespace::getCanonicalName( $ns );
					}
					else {
						// Otherwise collect the bulleted links from the page content
						$text = $article->fetchContent();
						$text = $wgParser->preprocess( $text, $title, $opt );
						if ( preg_match_all( "/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m", $text, $links ) )
							foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
					}
				}
			}

			// Format the article(s) as a single HTML document with absolute URL's
			// Keep the namespace export name if it was set above
			if ( !isset( $book ) ) $book = $title->getText();
			$html = '';

--Andy 13:55, 11 April 2008 (UTC)

I've updated this code to work (maybe) with newer versions. —Emufarmers(T|C) 09:36, 20 February 2013 (UTC)Reply

SpecialVersion Issue and PHP 5.1.4

After installing the PdfBook extension and displaying the version page I get:

 Notice: Object class PdfBook could not be converted to int in ....\SpecialVersion.php on line 275.
   The line in Specialversion.php is "sort ($list);"

I found a general discussion at http://www.webmasterworld.com/php/3586902.htm that talks about 5.1.4 vs 5.2.4. Any thoughts on how to make PdfBook work in PHP 5.1.4?

I am using Mediawiki 1.12.0 and php 5.1.4.

[edit]

Any ideas on how best to link into the Hierarchy extension? I think this would be very useful because the hierarchy is setup perfect for printing a book. I haven't quite figured out how to set this up though. You would have to use the extensions "hierarchy" table to pull information about where you are on the hierarchy, and what subordinate pages you would have to print. I think it would be nice to print where you are down, like you are on chapter 1, so it only prints chapter 1, but if you are at the title page it will print the whole book. It might also be nice to be able to setup a list of pages and then print that list in order. I am going to do what I can, but I am pretty new to PHP, and any advice is welcome.

--Greg 16:04, 1 May 2008 (UTC)

Exclusions

It would also be nice to have exclusion meta tags where you can specify what parts are included in the book and what parts are not (so if you have a header/footer you don't have to include that in the book)

--Greg 16:07, 1 May 2008 (UTC)

I have also run into this problem, wanting to include only one section of a page using the [[PageName#Section]] markup to get just that section as part of the composite print. This would be a great feature.

--Abby621 14:04, 4 June 2008 (UTC)

After looking through the code, I discovered you can accomplish exclusions by placing parts of the article you wish not to include inside of <div> tags (example: <div class= "noprint"> exclude this section </div>)
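
As an illustration of how such an exclusion could work (this is a guess at the general technique, not the extension's actual code), a regular expression can drop every noprint block from the HTML before it is handed to htmldoc. The function name stripNoPrint is hypothetical, and the pattern does not cope with <div> elements nested inside the excluded block.

```php
<?php
// Hypothetical helper: remove <div class="noprint">...</div> blocks.
// Limitation: a nested <div> inside the block ends the match too early.
function stripNoPrint( string $html ): string {
	$pattern = '#<div\s+class\s*=\s*["\']?noprint["\']?\s*>.*?</div>#is';
	// preg_replace() returns null on error; fall back to the input in that case
	return preg_replace( $pattern, '', $html ) ?? $html;
}

echo stripNoPrint( 'Keep <div class= "noprint"> exclude this section </div>this' ), "\n";
```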

latest version gives syntax error

   Parse error: syntax error, unexpected '}' in Pdf_Book.php on line 49

Is this expected?

FWIW, I'm using Ubuntu Hardy Heron with PHP 5.2.4-2ubuntu5.1 with Suhosin-Patch 0.9.6.2

Swaroopch 21:20, 21 June 2008 (UTC)

Sorry about that, fixed --Nad 21:59, 21 June 2008 (UTC)

"??????" instead of russian letters

We get "?" signs instead of Russian letters. The encoding in the browser is UTF-8.

  • Change the default charset from iso-8859-1 to cp-1251: $charset = $this->setProperty('Charset','cp-1251');
  • Replace the PHP function utf8_decode with a function that can convert UTF-8 to cp1251; for example, at line 89 of PdfBook.hooks.php: $html .= iconv("utf-8", "windows-1251", "$h1$text\n"); //utf8_decode( "$h1$text\n" );
  • If no text is displayed in the PDF, replace the fonts used by htmldoc (/usr/share/htmldoc/fonts) with fonts that support Cyrillic.

--Rius 16:15, 19 June 2009 (UTC)
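
A small standalone sketch of the conversion step in the second bullet, assuming the iconv extension is available. The wrapper name toCp1251 is made up; the point is that each Cyrillic letter takes two bytes in UTF-8 but one byte in windows-1251, which is what htmldoc expects once the charset is set to cp-1251.

```php
<?php
// Hypothetical wrapper around iconv(): UTF-8 in, windows-1251 bytes out.
function toCp1251( string $utf8 ): string {
	$converted = iconv( 'UTF-8', 'windows-1251', $utf8 );
	if ( $converted === false ) {
		// iconv() fails on characters that windows-1251 cannot represent
		throw new RuntimeException( 'Text cannot be represented in windows-1251' );
	}
	return $converted;
}

// "Privet" written in Cyrillic, spelled with escapes so the file encoding does not matter
$word = "\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}";
echo strlen( $word ), " bytes as UTF-8, ", strlen( toCp1251( $word ) ), " bytes as windows-1251\n";
```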

How would you modify the script to include the last date and time edited for each article?

I'm not a PHP wiz and am wondering what would be involved to output the last edit date/time for each article? Preferably, I would like to see this info directly under the article title. Any help would be excellent. Great extension! --Paul
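
A possible starting point, with assumptions stated up front: MediaWiki stores revision timestamps as 14-digit strings (YYYYMMDDHHMMSS, in UTC), and the article object is assumed to expose one through something like $article->getTimestamp(); please check that against your MediaWiki version. The helper below only does the formatting part and is not part of PdfBook; the function name formatLastEdited and the HTML it returns are made up. Its result could be appended to the HTML right after the article heading.

```php
<?php
// Hypothetical helper: format a MediaWiki-style timestamp (YYYYMMDDHHMMSS, UTC)
// as a short line that can be placed under an article title.
function formatLastEdited( string $mwTimestamp ): string {
	$utc  = new DateTimeZone( 'UTC' );
	$date = DateTime::createFromFormat( 'YmdHis', $mwTimestamp, $utc );
	if ( $date === false ) return '';   // unparsable input: add nothing
	return '<p><i>Last edited: ' . $date->format( 'Y-m-d H:i' ) . ' UTC</i></p>';
}

echo formatLastEdited( '20090619161500' ), "\n";
```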

No images

  • I still can't get images in. The image is in the PDF file, and links back to my wiki image, but the picture simply doesn't appear. Help?
  • Also, title page is empty. How to fill it?

Here is my Template: Template:Pdf_book

[[Image:15x18-fileicon-pdf.png]][{{fullurl:{{FULLPAGENAMEE}}|action=pdfbook}} Create a PDF Book]

Updated bibtex_fields.php

Here is an updated bibtex_fields.php with complete Bibtex Entries and Fields.


bibtex_fields.php

<?php
//taken from http://en.wikipedia.org/wiki/BibTeX
//this file is only used in the creation of a new reference as a template.

$bibtex_fields["article"][]="author"; //mandatory
$bibtex_fields["article"][]="title";
$bibtex_fields["article"][]="journal";
$bibtex_fields["article"][]="year";
$bibtex_fields["article"][]="volume";
$bibtex_fields["article"][]="number";
$bibtex_fields["article"][]="pages";
$bibtex_fields["article"][]="month";
$bibtex_fields["article"][]="note";
$bibtex_fields["article"][]="key";
$bibtex_fields["article"][]="url";
$bibtex_fields["article"][]="keywords";
$bibtex_fields["article"][]="abstract"; 
$bibtex_fields["book"][]="author"; //mandatory
$bibtex_fields["book"][]="editor";
$bibtex_fields["book"][]="title"; //mandatory
$bibtex_fields["book"][]="publisher";
$bibtex_fields["book"][]="year"; //mandatory
$bibtex_fields["book"][]="volume";
$bibtex_fields["book"][]="number";
$bibtex_fields["book"][]="series";
$bibtex_fields["book"][]="address";
$bibtex_fields["book"][]="edition";
$bibtex_fields["book"][]="month";
$bibtex_fields["book"][]="note";
$bibtex_fields["book"][]="key";
$bibtex_fields["book"][]="url"; 
$bibtex_fields["book"][]="keywords";
$bibtex_fields["book"][]="abstract"; 
$bibtex_fields["conference"][]="author"; 
$bibtex_fields["conference"][]="title"; 
$bibtex_fields["conference"][]="booktitle"; 
$bibtex_fields["conference"][]="year"; 
$bibtex_fields["conference"][]="editor";
$bibtex_fields["conference"][]="pages";
$bibtex_fields["conference"][]="organization";
$bibtex_fields["conference"][]="publisher";
$bibtex_fields["conference"][]="address";
$bibtex_fields["conference"][]="month";
$bibtex_fields["conference"][]="note";
$bibtex_fields["conference"][]="key";
$bibtex_fields["conference"][]="url";
$bibtex_fields["conference"][]="keywords"; 
$bibtex_fields["conference"][]="abstract"; 
$bibtex_fields["inbook"][]="author"; 
$bibtex_fields["inbook"][]="editor"; 
$bibtex_fields["inbook"][]="title"; 
$bibtex_fields["inbook"][]="chapter"; 
$bibtex_fields["inbook"][]="pages"; 
$bibtex_fields["inbook"][]="publisher"; 
$bibtex_fields["inbook"][]="year"; 
$bibtex_fields["inbook"][]="volume"; 
$bibtex_fields["inbook"][]="number"; 
$bibtex_fields["inbook"][]="series"; 
$bibtex_fields["inbook"][]="type"; 
$bibtex_fields["inbook"][]="address"; 
$bibtex_fields["inbook"][]="edition"; 
$bibtex_fields["inbook"][]="month"; 
$bibtex_fields["inbook"][]="note"; 
$bibtex_fields["inbook"][]="key"; 
$bibtex_fields["inbook"][]="url"; 
$bibtex_fields["inbook"][]="keywords"; 
$bibtex_fields["inbook"][]="abstract"; 
$bibtex_fields["incollection"][]="author";
$bibtex_fields["incollection"][]="title";
$bibtex_fields["incollection"][]="booktitle";
$bibtex_fields["incollection"][]="publisher";
$bibtex_fields["incollection"][]="year";
$bibtex_fields["incollection"][]="editor";
$bibtex_fields["incollection"][]="volume";
$bibtex_fields["incollection"][]="number";
$bibtex_fields["incollection"][]="series";
$bibtex_fields["incollection"][]="type";
$bibtex_fields["incollection"][]="chapter";
$bibtex_fields["incollection"][]="pages";
$bibtex_fields["incollection"][]="address";
$bibtex_fields["incollection"][]="edition";
$bibtex_fields["incollection"][]="month";
$bibtex_fields["incollection"][]="note";
$bibtex_fields["incollection"][]="key";
$bibtex_fields["incollection"][]="url";
$bibtex_fields["incollection"][]="keywords";
$bibtex_fields["incollection"][]="abstract";
$bibtex_fields["inproceedings"][]="author";
$bibtex_fields["inproceedings"][]="title";
$bibtex_fields["inproceedings"][]="booktitle";
$bibtex_fields["inproceedings"][]="year";
$bibtex_fields["inproceedings"][]="editor";
$bibtex_fields["inproceedings"][]="volume";
$bibtex_fields["inproceedings"][]="number";
$bibtex_fields["inproceedings"][]="series";
$bibtex_fields["inproceedings"][]="pages";
$bibtex_fields["inproceedings"][]="address";
$bibtex_fields["inproceedings"][]="month";
$bibtex_fields["inproceedings"][]="organization";
$bibtex_fields["inproceedings"][]="publisher";
$bibtex_fields["inproceedings"][]="note";
$bibtex_fields["inproceedings"][]="key";
$bibtex_fields["inproceedings"][]="url";
$bibtex_fields["inproceedings"][]="keywords";
$bibtex_fields["inproceedings"][]="abstract"; 
$bibtex_fields["manual"][]="title";
$bibtex_fields["manual"][]="author";
$bibtex_fields["manual"][]="organization";
$bibtex_fields["manual"][]="address";
$bibtex_fields["manual"][]="edition";
$bibtex_fields["manual"][]="month";
$bibtex_fields["manual"][]="year";
$bibtex_fields["manual"][]="note";
$bibtex_fields["manual"][]="key";
$bibtex_fields["manual"][]="url";
$bibtex_fields["manual"][]="keywords";
$bibtex_fields["manual"][]="abstract";
$bibtex_fields["mastersthesis"][]="author";
$bibtex_fields["mastersthesis"][]="title";
$bibtex_fields["mastersthesis"][]="school";
$bibtex_fields["mastersthesis"][]="year";
$bibtex_fields["mastersthesis"][]="type";
$bibtex_fields["mastersthesis"][]="address";
$bibtex_fields["mastersthesis"][]="month";
$bibtex_fields["mastersthesis"][]="note";
$bibtex_fields["mastersthesis"][]="key";
$bibtex_fields["mastersthesis"][]="url";
$bibtex_fields["mastersthesis"][]="keywords";
$bibtex_fields["mastersthesis"][]="abstract";
$bibtex_fields["misc"][]="author";
$bibtex_fields["misc"][]="title";
$bibtex_fields["misc"][]="howpublished";
$bibtex_fields["misc"][]="month";
$bibtex_fields["misc"][]="year";
$bibtex_fields["misc"][]="note";
$bibtex_fields["misc"][]="key";
$bibtex_fields["misc"][]="url";
$bibtex_fields["misc"][]="keywords";
$bibtex_fields["misc"][]="abstract";
$bibtex_fields["phdthesis"][]="author";
$bibtex_fields["phdthesis"][]="title";
$bibtex_fields["phdthesis"][]="school";
$bibtex_fields["phdthesis"][]="year";
$bibtex_fields["phdthesis"][]="type";
$bibtex_fields["phdthesis"][]="address";
$bibtex_fields["phdthesis"][]="month";
$bibtex_fields["phdthesis"][]="note";
$bibtex_fields["phdthesis"][]="key";
$bibtex_fields["phdthesis"][]="url";
$bibtex_fields["phdthesis"][]="keywords";
$bibtex_fields["phdthesis"][]="abstract";
$bibtex_fields["proceedings"][]="title";
$bibtex_fields["proceedings"][]="year";
$bibtex_fields["proceedings"][]="editor";
$bibtex_fields["proceedings"][]="volume";
$bibtex_fields["proceedings"][]="number";
$bibtex_fields["proceedings"][]="series";
$bibtex_fields["proceedings"][]="address";
$bibtex_fields["proceedings"][]="month";
$bibtex_fields["proceedings"][]="organization";
$bibtex_fields["proceedings"][]="publisher";
$bibtex_fields["proceedings"][]="note";
$bibtex_fields["proceedings"][]="key";
$bibtex_fields["proceedings"][]="url";
$bibtex_fields["proceedings"][]="keywords";
$bibtex_fields["proceedings"][]="abstract";
$bibtex_fields["techreport"][]="author";
$bibtex_fields["techreport"][]="title";
$bibtex_fields["techreport"][]="institution";
$bibtex_fields["techreport"][]="year";
$bibtex_fields["techreport"][]="type";
$bibtex_fields["techreport"][]="number";
$bibtex_fields["techreport"][]="address";
$bibtex_fields["techreport"][]="month";
$bibtex_fields["techreport"][]="note";
$bibtex_fields["techreport"][]="key";
$bibtex_fields["techreport"][]="url";
$bibtex_fields["techreport"][]="keywords";
$bibtex_fields["techreport"][]="abstract";
$bibtex_fields["unpublished"][]="author";
$bibtex_fields["unpublished"][]="title";
$bibtex_fields["unpublished"][]="note";
$bibtex_fields["unpublished"][]="month";
$bibtex_fields["unpublished"][]="year";
$bibtex_fields["unpublished"][]="key";
$bibtex_fields["unpublished"][]="url";
$bibtex_fields["unpublished"][]="keywords";
$bibtex_fields["unpublished"][]="abstract";
//
?>

Bibtex Required/Optional - for your wiki

  • BibTeX defines three types of fields:
    • Required - always displayed
    • Optional - usually not used
    • Ignored - never used, can be arbitrary
@article{citation_key,
author = {},
title = {},
journal = {},
year = {},
volume = {},
number = {},
pages = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@book{citation_key,
author = {},
editor = {}, % author OR editor required
title = {},
publisher = {},
year = {},
volume = {},
number =	{}, % volume OR number
series = {},
address = {},
edition =	{},
month = {},
note = {},
key = {}, 
url = {},
keywords = {},
abstract = {}
}
@conference{citation_key,
author = {},
title = {},
booktitle = {},
year = {},
editor = {},
pages = {},
organization = {},
publisher = {},
address = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@inbook{citation_key,
author = {},
editor = {}, % author OR editor 
title = {},
chapter = {},
pages = {}, % chapter AND/OR pages
publisher = {},
year = {},
volume = {},
number = {}, % volume OR number
series = {},
type = {},
address = {},
edition = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@incollection{citation_key,
author = {},
title = {},
booktitle = {}, % booktitle should be exactly the same as title? Not sure.
publisher = {},
year = {},
editor = {},
volume = {},
number = {}, % volume OR number
series = {},
type = {},
chapter = {},
pages = {},
address = {},
edition = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@inproceedings{citation_key,
author = {},
title = {},
booktitle = {}, % booktitle should be exactly the same as title? Some kind of bug? Not sure.
year =  {},
editor = {},
volume = {},
number = {}, % volume OR number
series = {},
pages = {},
address = {},
month = {},
organization = {},
publisher = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@manual{citation_key,
title = {},
author = {},
organization = {},
address = {},
edition = {},
month = {},
year = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@mastersthesis{citation_key,
author = {},
title = {},
school = {},
year = {},
type = {},
address = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@misc{citation_key,
author = {},
title = {},
howpublished = {},
month = {},
year = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@phdthesis{citation_key,
author = {},
title = {},
school = {},
year = {},
type = {},
address = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@proceedings{citation_key,
title = {},
year = {},
editor = {},
volume = {},
number = {}, % volume OR number
series = {},
address = {},
month = {},
organization = {},
publisher =  {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@techreport{citation_key,
author = {},
title = {},
institution = {},
year = {},
type = {},
number = {},
address = {},
month = {},
note = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
@unpublished{citation_key,
author = {},
title = {},
note = {},
month = {},
year = {},
key = {},
url = {},
keywords = {},
abstract = {}
}
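
The empty templates above follow directly from the field lists in bibtex_fields.php, so they could also be generated. A minimal sketch, assuming an array shaped like $bibtex_fields; the function name bibtexTemplate is made up, and the comments about required fields in the hand-written templates are not reproduced.

```php
<?php
// Hypothetical helper: build an empty BibTeX template for one entry type
// from a list of field names.
function bibtexTemplate( string $type, array $fields ): string {
	$lines = [];
	foreach ( $fields as $field ) {
		$lines[] = $field . ' = {}';
	}
	// Fields are separated by commas; the last one has no trailing comma
	return '@' . $type . '{citation_key,' . "\n" . implode( ",\n", $lines ) . "\n}\n";
}

$bibtex_fields = [
	'unpublished' => [ 'author', 'title', 'note', 'month', 'year', 'key', 'url', 'keywords', 'abstract' ],
];
foreach ( $bibtex_fields as $type => $fields ) {
	echo bibtexTemplate( $type, $fields );
}
```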

Types of Bibtex entries - for your wiki

  • There are 14 available entry types.


Bibtex Entry Descriptions
article An article from a journal or magazine.
book A book with an explicit publisher.
inbook A part of a book, usually untitled; may be a chapter and/or a range of pages. Use with book to reference a set of pages.
incollection A part of a book with its own title.
booklet A work that is printed and bound, but without a named publisher or sponsoring institution.
conference DO NOT USE! Included for compatibility.
manual Technical documentation.
mastersthesis A Master's thesis.
misc Use this type when nothing else seems appropriate.
phdthesis A PhD thesis.
proceedings The proceedings of a conference.
inproceedings An article in the proceedings of a conference. Use with proceedings to reference a sub-paper or sub-section.
techreport A report published by a school or other institution, usually numbered within a series.
unpublished A document with an author and title, but not formally published.

Bibtex - standard fields - for your wiki

  • The available fields depend on which entry type is being used. Each entry type has required and optional arguments.


Bibtex Field Descriptions
address Usually the address of the publisher or institution. For major publishing houses, omit it entirely or just give the city. For small publishers, you can help the reader by giving the complete address.
annote An annotation. It is not used by the standard bibliography styles, but may be used by other styles that produce an annotated bibliography.
author The name(s) of the author(s). Separate the names with 'and' no quotes. If there are many names, list the prominent ones and the last one as 'et al' no quotes. Most names can be entered "Last, First MI" or "First MI Last". Last names with two capitalized words need to be in the "Last1 Last2, First MI" format. If there is a Jr. in the name, use "Last, Jr., First". Accented letters should be enclosed in braces, {}. For example, "Kurt G{\"{o}}del".
booktitle The title of a book, a titled part of which is being cited. It is used only for the Incollection and Inproceedings entry types; use the title field for book entries. How to type titles is explained in titles.
chapter A chapter (or other sectional unit) number.
crossref The database key of the entry being cross-referenced.
edition The edition of a book -- for example, "Second". (The style will convert to lowercase if needed.)
editor The name(s) of editor(s), typed as indicated above. If there is also an author field, then the editor field gives the editor of the book or collection in which the reference appears.
howpublished How something strange was published.
institution The sponsoring institution of a technical report.
journal A journal name. Abbreviations may exist; see the Local Guide.
key Used for alphabetizing and creating a label when the author and editor fields are missing. This field should not be confused with the key that appears in the \cite{} command and at the beginning of the entry.
month The month in which the work was published or, for an unpublished work, in which it was written. Use the standard three-letter abbreviations.
note Any additional information that can help the reader. The first word should be capitalized.
number The number of a journal, magazine, technical paper, or work in a series. An issue of a journal or magazine is usually identified by its volume and number; the organization that issues a technical report usually gives it a number; books in a named series are sometimes numbered.
organization The organization that sponsors a conference or that publishes a manual.
pages One or more page numbers or ranges of numbers, such as 42--111 or 7,41,73--97.
publisher The publisher's name.
school The name of the school where a thesis was written.
series The name of a series or set of books. When citing an entire book, the title field gives its title and the optional series field gives the name of a series or multivolume set in which the book was published.
title The work's title. The bibliography style determines whether or not a title is capitalized; the titles of books usually are, titles of articles usually not. Always type the title as if it were capitalized. Always capitalize the first word of the title, the first word after a colon, and all other words except articles and unstressed conjunctions (and, or, if) and prepositions. BIBTEX will change case as needed. If BIBTEX should not change an uppercase to lowercase, then enclose it in braces {}. Example, "Out of {Africa}" and "Out of {A}frica" are equivalent.
type The type of a technical report - for example, "Research Note". It is also used to specify a type of sectional unit in an inbook or incollection entry and a different type of thesis in a mastersthesis or phdthesis entry.
year The year of publication or, for an unpublished work, the year it was written. It usually consists only of numerals, such as 1984, but could also be something like circa 1066.

Bibtex Nonstandard / Optional Fields - for your wiki

  • The available fields depend on which entry type is being used. Each entry type has required and optional arguments.


Bibtex Field Descriptions
affiliation The author's affiliation.
abstract An abstract of the work.
contents A Table of Contents
copyright Copyright information.
ISBN The International Standard Book Number.
ISSN The International Standard Serial Number. Used to identify a journal.
keywords Key words used for searching or possibly for annotation.
language The language the document is in.
location A location associated with the entry, such as the city in which a conference took place.
LCCN The Library of Congress Call Number.
mrnumber The Mathematical Reviews number.
URL The WWW Universal Resource Locator that points to the item being referenced. This often is used for technical reports to point to the ftp site where the postscript source of the report is located.

Get rid of temporary files

Using proc_open (with read and write pipes connected to the htmldoc process), you can get rid of the temporary files. This also fixes a variable conflict ($link): jhoetzel

<?php
# Extension:PdfBook
# - Licenced under LGPL (http://www.gnu.org/copyleft/lesser.html)
# - Author: [http://www.organicdesign.co.nz/nad User:Nad]
# - Started: 2007-08-08
 
if (!defined('MEDIAWIKI')) die('Not an entry point.');
 
define('PDFBOOK_VERSION','0.0.12, 2008-06-22');
 
$wgPdfBookMagic                = "book";
$wgExtensionFunctions[]        = 'wfSetupPdfBook';
$wgHooks['LanguageGetMagic'][] = 'wfPdfBookLanguageGetMagic';
 
$wgExtensionCredits['parserhook'][] = array(
	'name'	      => 'Pdf Book',
	'author'      => '[http://www.organicdesign.co.nz/nad User:Nad]',
	'description' => 'Composes a book from articles in a category and exports as a PDF book',
	'url'	      => 'http://www.mediawiki.org/wiki/Extension:Pdf_Book',
	'version'     => PDFBOOK_VERSION
	);
 
class PdfBook {
 
	# Constructor
	function PdfBook() {
		global $wgHooks,$wgParser,$wgPdfBookMagic;
		$wgParser->setFunctionHook($wgPdfBookMagic,array($this,'magicBook'));
		$wgHooks['UnknownAction'][] = $this;
 
		# Add a new pdf log type
		global $wgLogTypes,$wgLogNames,$wgLogHeaders,$wgLogActions;
		$wgLogTypes[]             = 'pdf';
		$wgLogNames  ['pdf']      = 'pdflogpage';
		$wgLogHeaders['pdf']      = 'pdflogpagetext';
		$wgLogActions['pdf/book'] = 'pdflogentry';
	}
 
	# Expand the book-magic
	function magicBook(&$parser) {
 
		# Populate $argv with both named and numeric parameters
		$argv = array();
		foreach (func_get_args() as $arg) if (!is_object($arg)) {
			if (preg_match('/^(.+?)\\s*=\\s*(.+)$/',$arg,$match)) $argv[$match[1]] = $match[2]; else $argv[] = $arg;
		}
 
		return ''; # was "return $text;", but $text is never assigned in this method
	}
 
	function onUnknownAction($action,$article) {
		global $wgOut,$wgUser,$wgTitle,$wgParser;
		global $wgServer,$wgArticlePath,$wgScriptPath,$wgUploadPath,$wgUploadDirectory,$wgScript;
 
		if ($action == 'pdfbook') {

			# Log the export
			$msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';
			$log = new LogPage('pdf',false);
			$log->addEntry('book',$wgTitle,$msg);
 
			# Initialise PDF variables
			$layout  = '--firstpage toc';
			$left    = $this->setProperty('LeftMargin',  '1cm');
			$right   = $this->setProperty('RightMargin', '1cm');
			$top     = $this->setProperty('TopMargin',   '1cm');
			$bottom  = $this->setProperty('BottomMargin','1cm');
			$font    = $this->setProperty('Font',	'Arial');
			$size    = $this->setProperty('FontSize',    '8');
			$linkc   = $this->setProperty('LinkColour',  '217A28');
			$levels  = $this->setProperty('TocLevels',   '2');
			$exclude = $this->setProperty('Exclude',     array());
			if (!is_array($exclude)) $exclude = split('\\s*,\\s*',$exclude);
 
			# Select articles from members if a category or links in content if not
			$articles = array();
			$title    = $article->getTitle();
			$opt      = ParserOptions::newFromUser($wgUser);
			if ($title->getNamespace() == NS_CATEGORY) {
				$db     = &wfGetDB(DB_SLAVE);
				$cat    = $db->addQuotes($title->getDBkey());
				$result = $db->select(
					'categorylinks',
					'cl_from',
					"cl_to = $cat",
					'PdfBook',
					array('ORDER BY' => 'cl_sortkey')
				);
				if ($result instanceof ResultWrapper) $result = $result->result;
				while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
			}
			else {
				$text = $article->fetchContent();
				$text = $wgParser->preprocess($text,$title,$opt);
				if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m',$text,$links))
					foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
			}
 
			# Format the article's as a single HTML document with absolute URL's
			$book	  = $title->getText();
			$html	  = '';
			$wgArticlePath = $wgServer.$wgArticlePath;
			$wgScriptPath  = $wgServer.$wgScriptPath;
			$wgUploadPath  = $wgServer.$wgUploadPath;
			$wgScript      = $wgServer.$wgScript;
			foreach ($articles as $title) {
				$ttext = $title->getPrefixedText();
				if (!in_array($ttext,$exclude)) {
					$article = new Article($title);
					$text    = $article->fetchContent();
					$text    = preg_replace('/<!--([^@]+?)-->/s','@@'.'@@$1@@'.'@@',$text); # preserve HTML comments
					$text   .= '__NOTOC__';
					$opt->setEditSection(false);    # remove section-edit links
					$wgOut->setHTMLTitle($ttext);   # use this so DISPLAYTITLE magic works
					$out     = $wgParser->parse($text,$title,$opt,true,true);
					$ttext   = $wgOut->getHTMLTitle();
					$text    = $out->getText();
					$text    = preg_replace('|(<img[^>]+?src=")(/.+?>)|',"$1$wgServer$2",$text);
					$text    = preg_replace('|@{4}([^@]+?)@{4}|s','<!--$1-->',$text); # HTML comments hack
					$text    = preg_replace('|<table|','<table border borderwidth=2 cellpadding=3 cellspacing=0',$text);
					$ttext   = basename($ttext);
					$html   .= utf8_decode("<h1>$ttext</h1>$text\n");
				}
			}
 
			# If format=html in query-string, return html content directly
			if (isset($_REQUEST['format']) && $_REQUEST['format'] == 'html') {
				$wgOut->disable();
				header("Content-Type: text/html");
				header("Content-Disposition: attachment; filename=\"$book.html\"");
				print $html;
			}
			else {
				# Send the file to the client via htmldoc converter
				$wgOut->disable();
				header("Content-Type: application/pdf");
				header("Content-Disposition: attachment; filename=\"$book.pdf\"");
				$cmd  = "--left $left --right $right --top $top --bottom $bottom";
				$cmd .= " --header ... --footer .1. --headfootsize 8 --quiet --jpeg --color";
				$cmd .= " --bodyfont $font --fontsize $size --linkstyle plain --linkcolor $linkc";
				$cmd .= " --toclevels $levels --format pdf14 --numbered $layout";
				$cmd  = "htmldoc -t pdf --charset iso-8859-1 $cmd -";
				putenv("HTMLDOC_NOCGI=1");
				$process = proc_open("$cmd" , array(0 => array("pipe", "r"), 1 => array("pipe", "w")), $pipes);
			
				fwrite($pipes[0], $html);
				fclose($pipes[0]);
				fpassthru($pipes[1]);
				fclose($pipes[1]);
				proc_close($process);
			}
			return false;
		}
 
		return true;
	}
 
	# Return a property for htmldoc using global, request or passed default
	function setProperty($name,$default) {
		if (isset($_REQUEST["pdf$name"]))      return $_REQUEST["pdf$name"];
		if (isset($GLOBALS["wgPdfBook$name"])) return $GLOBALS["wgPdfBook$name"];
		return $default;
	}
 
	# Needed in some versions to prevent Special:Version from breaking
	function __toString() { return 'PdfBook'; }
}
 
# Called from $wgExtensionFunctions array when initialising extensions
function wfSetupPdfBook() {
	global $wgPdfBook;
	$wgPdfBook = new PdfBook();
}
 
# Needed in MediaWiki >1.8.0 for magic word hooks to work properly
function wfPdfBookLanguageGetMagic(&$magicWords,$langCode = 0) {
	global $wgPdfBookMagic;
	$magicWords[$wgPdfBookMagic] = array(0,$wgPdfBookMagic);
	return true;
}
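
The piping idea in the listing above can be tried in isolation. The following is only a minimal sketch: pipeThroughCommand() is a hypothetical helper, not part of PdfBook, and it assumes the command reads all of its standard input before producing much output (as a converter such as htmldoc would).

```php
<?php
// Sketch only: pipeThroughCommand() is a made-up helper name, not PdfBook code.
// It feeds $input to a command's stdin and returns whatever the command
// writes to stdout, so no temporary file is needed.
function pipeThroughCommand(string $cmd, string $input): string {
    $spec = array(
        0 => array('pipe', 'r'), // the child reads its stdin from us
        1 => array('pipe', 'w'), // the child writes its stdout to us
    );
    $process = proc_open($cmd, $spec, $pipes);
    if (!is_resource($process)) {
        throw new RuntimeException("Could not start: $cmd");
    }
    // Caveat: writing everything before reading can block if the child
    // emits a lot of output before it has consumed all of its input.
    fwrite($pipes[0], $input);
    fclose($pipes[0]);
    $output = stream_get_contents($pipes[1]);
    fclose($pipes[1]);
    $status = proc_close($process);
    if ($status !== 0) {
        throw new RuntimeException("Command exited with status $status");
    }
    return $output;
}
```

Unlike the fpassthru() call in the listing, this version returns the output as a string, which makes it possible to check the exit status before sending any headers to the client.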

Unset variables and blank pdfs...

Ok, I know that this is the third post on this subject, but I am still having problems and haven't had much success in debugging the problem. I am on PDFBook version 1.0.0, MediaWiki 1.10.1

We are trying to use PdfBook on our company website. We use Drupal for authentication (I don't know its effects on MediaWiki).

I was getting blank pdf and html documents. I see you pull several global variables on lines 70-71

  global $wgOut, $wgUser, $wgTitle, $wgParser;
  global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

I checked the variables and found that all of them are blank except: wgServer wgArticlePath wgScriptPath

I could not find the others in the entire $GLOBALS variable...

I'm not SUPER familiar with MediaWiki's structure and backend, but I would imagine that many of those (especially $wgUser) should be set.

Any ideas?
--Greg 18:23, 22 October 2008 (UTC)

hiding numbering on headings and article title

great extension! is it possible to hide the numbered headings when printing as a book? i noticed that the extension disregards the user preference and __NONUMBEREDHEADINGS__. we are trying to PDF print a "book" of data entry forms and the heading numbers are not required. thanks --Erikvw 06:23, 18 November 2008 (UTC)

What i have done for now is to remove --numbered from the line

$cmd .= "$toc --format pdf14 --numbered $layout $width";

which seems to work fine.
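
Rather than deleting the flag by hand, the numbering could be made switchable. This is a sketch under assumptions: buildFormatOptions() is a hypothetical helper, and the boolean would have to come from a setting you define yourself (for example a made-up $wgPdfBookNumbered); neither exists in PdfBook.

```php
<?php
// Hypothetical helper, not PdfBook code: builds the tail of the htmldoc
// option string and only adds --numbered when it is wanted.
function buildFormatOptions(bool $numbered, string $layout): string {
    $opts = '--format pdf14';
    if ($numbered) {
        $opts .= ' --numbered'; // heading numbers in the generated PDF
    }
    return trim("$opts $layout");
}
```

With this, passing false gives the same command line as the manual edit described above.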

revision id does not appear on pdf

we are tracking revision information for the printed document using

{{REVISIONID}}-{{REVISIONTIMESTAMP}}

When printing the PDF, the REVISIONTIMESTAMP prints but REVISIONID does not. I noticed the same for Pdf_Export. Any ideas? thanks --Erikvw 04:24, 19 November 2008 (UTC)

some special characters and german umlauts result in empty pdf files

When we try to receive categories with umlauts (e.g. "Übersicht") or special characters like "-" in the category name the generated pdf file is empty. Everything else runs real fine and smooth. Great extension.
Any workaround or help regarding this problem would be appreciated. --Fydel 12:29, 15 December 2008 (UTC)

I found a simple workaround for that issue. I changed the line where htmldoc is called
escapeshellcmd($cmd);

to

passthru(escapeshellcmd($cmd));
--Fydel 09:10, 9 January 2009 (UTC)
Hello, I have the same problem. An umlaut in the middle of the heading is rendered correctly, but an umlaut at the beginning is not visible. There is nothing.
--141.35.213.221 09:24, 19 August 2011 (UTC)

Page Limit in PdfBook

How many pages can be fetched using Extension:PdfBook??? Is there any limit for that??

Download snapshot

The download snapshot for PDFbook doesn't seem to work.

I found a copy that is hosted at sourceforge as part of the install for Flowchartwiki and it has extras like the checkPDFbook stuff.

Is there somewhere to get the latest version of the whole thing, not just the pdfbook.php file?

Cheers.

ASHighlight

The bug with ASHighlight is probably to do with the way that ASHighlight embeds the 'highlight' function's CSS stylesheet output. It's a while since I've done anything with ASHighlight, but I remember this part of it being a bit hacky. Hope this helps. Jdpipe 07:27, 24 March 2009 (UTC)

One way to provide for "compatibility" between the two extensions is to "drop" <style> ... </style> tags before feeding htmldoc command. Could lead to some unexpected results ... but IWFM :

  • In PdfBook.php, there is a main loop for # Format the article(s) as a single HTML document with absolute URL's
  • just add there the following line among the other existing preg_replace control lines
$text    = preg_replace( '|<style(.+?)</style>|s', ' <!-- <nostyle/> -->', $text );                  # Style CSS hack

--Eric Salomé (@ctx.net) 22:49, 30 August 2010 (UTC)
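
To see what that replacement does before patching PdfBook.php, the same pattern can be wrapped in a small function. A minimal sketch: stripStyleBlocks() is a hypothetical name; only the regular expression and the replacement string are taken from the line above.

```php
<?php
// Sketch: the wrapper name is made up; the pattern and replacement are the
// ones quoted above. The "s" modifier lets "." span newlines, so multi-line
// stylesheets are removed as well.
function stripStyleBlocks(string $html): string {
    return preg_replace('|<style(.+?)</style>|s', ' <!-- <nostyle/> -->', $html);
}
```

Note that the pattern is case-sensitive, so an upper-case <STYLE> block would pass through untouched unless the "i" modifier is added.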

No such extension "PdfBook"

Try to download it and get "No such extension "PdfBook" ". How can I get this extension? --Robinson Weijman 10:39, 19 June 2009 (UTC)

You can download it from Subversion --Rius 14:20, 19 June 2009 (UTC)

Thanks for the tip. I cannot find it - do you have a link? --Robinson Weijman 09:45, 22 June 2009 (UTC)
I'm sorry, there is a link on this article page. --Robinson Weijman 09:48, 22 June 2009 (UTC)
[edit]
"First Htmldoc needs to be installed [...].  Windows Binary can be found 
here (v1.8.24) [...]."

This link is dead (404)...

Hi, I updated the link. Cheers --kgh 19:12, 5 December 2009 (UTC)

Italian charset

I have a wiki in Italian and I have tried many charsets, but I just can't find the correct one. The apostrophe gets turned into a question mark any time the PDF gets rendered: Modo D'Uso becomes Modo D?Uso. I also have a wiki in English, and when I type can't it comes out can't. It's not a special character, it's an apostrophe. I know it's something stupid, but I don't know which charset to use and don't know a workaround. I usually use iso8859-1 with no problems. I just can't wrap my head around it. Thank you for your help in advance.

I believe I have resolved the problem. I changed my charset to utf-8 and, wherever the apostrophe appears, I replace it with & acute ; (but with no spacing between the characters). It now comes out as an apostrophe every time.

[edit]

I created a parameter that can be defined in the wiki settings that will disable the printing of links when creating PDF Books. The parameter is $wgPDFBookIgnoreLink. By default it is set to false.

34,35c34,35
< 
< 	function PdfBook() {
---
> 	var $ignoreLinks = false;
> 	function PdfBook($ignoreLinks = false) {
44a45,46
> 		
> 		$this->ignoreLinks=$ignoreLinks;
53c55,56
< 
---
> 		global $wgPDFBookIgnoreLink;
> 		
128a132,135
> 					if($this->ignoreLinks){
> 						$text    = str_ireplace('<a','<span',$text);
> 						$text    = str_ireplace('</a>','</span>',$text);
> 					}
193c200,201
< 	$wgPdfBook = new PdfBook();
---
> 	global $wgPDFBookIgnoreLinks;
> 	$wgPdfBook = new PdfBook($wgPDFBookIgnoreLinks);

J.saterfiel 14:51, 16 September 2009 (UTC)
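
Two things are worth checking when applying this diff. First, it spells the global both as $wgPDFBookIgnoreLink and as $wgPDFBookIgnoreLinks, and only the latter is passed to the constructor. Second, str_ireplace('<a', ...) also matches other tags that begin with "a", such as <abbr>. The following is a sketch of a narrower replacement; linksToSpans() is a hypothetical helper name, not part of the patch.

```php
<?php
// Hypothetical helper: turns anchors into spans like the patch above, but
// only when "<a" is followed by whitespace or ">", so <abbr>, <address>
// and similar tags are left alone.
function linksToSpans(string $html): string {
    $html = preg_replace('/<a(?=[\s>])/i', '<span', $html);
    return preg_replace('/<\/a\s*>/i', '</span>', $html);
}
```

As in the original patch, the href attribute simply stays on the resulting span.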

The links (other than the TOC) in my PDFs refer back to the wiki, not to the location in the book. I want to preserve these internal links for online users. Is there a way to format the links to do that?

User:Dlpetry:DlPetry 05:41, 09 September 2011 (UTC)

Permission denied

I upgraded from MediaWiki 1.12 to 1.15 and now I get invalid PDF files with this error inside when I open them with an editor (the upload directory is set to images):


Warning:  fopen(/home/.../mediawiki/images/pdf-book4aba1672a5b66) [<a href='function.fopen'>function.fopen</a>]: failed to open stream: Permission denied in /home/.../mediawiki/extensions/PdfBook/PdfBook.php on line 146

Warning: fwrite(): supplied argument is not a valid stream resource in /home/.../mediawiki/extensions/PdfBook/PdfBook.php on line 147

Warning: fclose(): supplied argument is not a valid stream resource in /home/.../mediawiki/extensions/PdfBook/PdfBook.php on line 148


I set both the content of images and the PdfBook-Folder as executable with chmod 755 and they both have the same owner (root). Laquestianne, 23. September
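
Mode 755 grants write access to the owner only, so if the directory belongs to root and PHP runs as a different user, the fopen() call above would be expected to fail in exactly this way. A quick way to check is a small diagnostic run through the web server. This is a sketch: describeWritable() is a hypothetical helper, and the directory you pass in is whatever your upload directory is.

```php
<?php
// Hypothetical diagnostic helper, not PdfBook code: reports whether the
// current PHP process may write to a directory, plus its permission bits.
function describeWritable(string $dir): string {
    if (!is_dir($dir)) {
        return "$dir is not a directory";
    }
    $mode = substr(sprintf('%o', fileperms($dir)), -4); // e.g. 0755
    return is_writable($dir)
        ? "$dir is writable (mode $mode)"
        : "$dir is NOT writable by this PHP process (mode $mode)";
}
```

It has to be run from a page served by the web server rather than from a root shell, since the result depends on which user executes it.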

No Images

I have version 1.14 and since I upgraded from 1.12 I have no images anymore in PDFs. I already tried 777 on ./images but it didn't help.

Any help on this ?

Still broken?

I see a lot of people have problems with this extension, and I am one of them. Has anyone gotten a PDF that is longer than 3 bytes using MW 1.15 and PdfBook (Version 1.0.3, 2008-12-09)? I see no PHP errors in my error log.


Problems with a few categories

I have the problem, that the pdfbook can't create pdfs from all categories. For a few categories it works without problems and other categories don't work. For example I have a category Server, if i want to create a pdf of that category only a blank browserpage opens.

I don't know where the problem is, the categories are not very big (about 20 pages), there are no special characters in the category-title,it's not a problem with htmldoc,...

For a test I have created a new category with the name Servers. Then I put the content of Server in the new category Servers. The creation of pdf for Servers works fine. So it seems to me that's a problem with the name of the category and not with the content. Thank you for help!

Empty Pdf, PDFBook Problem

Hi, I try to use the mediawiki and the pdfbook extension. I have put the extension to the extension folder and included in the LocalSettings.php. I have installed the htmldoc as well. I am using IIS 5.1. When I put the &action=pdfbook to the URL it creates only an empty pdf. What can be wrong? I have only installed the htmldoc, should I make more settings with it? Br, Zsolt

--Nwessel 08:57, 19 October 2010 (UTC) Please notice that PDFBook works with &action=pdfbook only on categories. When you want to create a pdf for a single page you need to add "&format=single".

I am also having the same issue. Mediawiki 1.29 on Windows IIS and MySQL. pdf downloads as 0 byte blank file. I have tried adding &format=single but I continue to get the same result. Is there something that has to be configured with HTMLDOC that I am missing?

Valid fonts

Can anyone give me the complete list of fonts supported for the $wgPdfBookFont setting? Is this dependent on htmldoc or on the system fonts?

I checked the fonts that htmldoc supports, and their FAQ said:

HTMLDOC 1.8.20 and higher support embedding of the base Type 1 fonts: Courier, Helvetica, Symbol, and Times. HTMLDOC does not currently allow embedding of arbitrary fonts specified by the HTML FONT element.

But then I see that the default setting for $wgPdfBookFont is Arial so I'm confused. I'm running MediaWiki on Ubuntu and I have installed the ttf-mscorefonts-installer package, however when I set $wgPdfBookFont to "Verdana" I get Times instead :(

Altered version with new features

Some people have complained about not being able to use the diff code I provided in earlier notes. Because this project isn't being updated I've put full docs on my user page (J.saterfiel) for the altered version. Here is the PdfBook.php I use on my mediawiki installation (1.14). It's a little more advanced than the one currently available.

List of new features:

  • Ability to remove links in the documents
  • Ability for a printed Category (collection of articles) to have a cover page with its Category Name and date created printed on it
  • Ability to have a "Download as PDF" link in the tool bar on any page without needing to explicitly place a link on a page you want to create pdfs on.
  • Ability to change the date format used on the header page (http://us3.php.net/manual/en/function.date.php)
  • Ability to change the information printed on each page header and footer(will need to lookup htmldoc http://www.htmldoc.org/ for more info on what the options are and once installed run htmldoc -help as the full options are not displayed on their website.)

--J.saterfiel 15:05, 27 May 2010 (UTC)

Broken with MW 1.16

I recently upgraded from MW 1.11 to 1.16 and I'm having some trouble with this extension. The issue is that pdf's of categories do not work properly. The pdf is created fine, but it uses name of first page in the category for each entry in the pdf unless you put a Heading 1 in the page. If you do put a heading 1 in the page, then it creates a page in between each page with name of the first page in the category.

Expected behavior (worked in MW 1.11), Fruit.pdf (Fruit is the name of the category):

  • Apples
    • (stuff about apples)
  • Bananas
    • (stuff about bananas)
  • Cantaloupe
    • (Stuff about cantaloupe)

Actual behavior, Fruit.pdf:

  • Apples
    • (stuff about apples, first page in the category)
  • Apples
    • (stuff about bananas, since bananas doesn't have a heading 1 saying bananas at the top of the page)
  • Apples
    • (completely blank page, since cantaloupe page below has a heading 1)
  • Cantaloupe
    • (stuff about cantaloupe, has heading 1 saying Cantaloupe)

Can anybody help?

This worked for me

I commented out line 129

//$ttext = $wgOut->getHTMLTitle();

That worked, thanks!

line 124 for me

Math rendering is very ugly

In pdfs, I'm getting some ugly rendering of mathematical expressions. Symbols are abnormally large, 3 times larger than normal text and resolution is poor. Is this normal? Is Pdfbook causing this? In the wiki they are rendered fine, it's just in pdfs that the problem occurs. For example, try:

<math>\langle T,\mu \rangle</math>

Running Pdfbook Version 1.0.4, 2010-01-05. --Pgr94 08:39, 12 September 2010 (UTC)

Nothing happens when action url entered

I am running MediaWiki 1.16.0 and PdfBook 1.0.4, and have htmldoc installed on the server. I installed the extension in the extensions directory and added the require line to LocalSettings.php, but when I enter the url: mywiki.com/wiki/index.php/Category:Software_Documentation&action=pdfbook nothing happens except the usual wiki message "There is currently no text in this page." I tried a different browser, checked the Apache error logs, and made sure the /images directory is writable by the web server. None of this produced an error or a different response. Can someone please give me a push in the right direction here.... Thanks, Melissa

That's not the right URL. Take another look at the instructions and follow the syntax there more closely. —Emufarmers(T|C) 23:23, 3 February 2011 (UTC)

You were right...
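
For anyone hitting the same message: in the URL above, &action=pdfbook is appended to the path without a preceding "?", so the wiki treats it as part of the page title. The documented form uses index.php?title=...&action=pdfbook. Building the link from parts avoids the mistake. This is a sketch: pdfBookUrl() is a hypothetical helper and the host name is just the one from the question.

```php
<?php
// Hypothetical helper: builds a PdfBook export URL in the
// index.php?title=...&action=pdfbook form, optionally with format=single.
function pdfBookUrl(string $scriptUrl, string $title, bool $single = false): string {
    $query = array('title' => $title, 'action' => 'pdfbook');
    if ($single) {
        $query['format'] = 'single'; // export one page instead of a category
    }
    return $scriptUrl . '?' . http_build_query($query, '', '&');
}
```

http_build_query() percent-encodes the colon in a title such as Category:Software_Documentation as %3A; this is ordinary URL encoding of the same title.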

How to export all articles to a single file

I can create PDF's containing a single category using the following url-call >> http://mywikibox/wikis/wiki_a/index.php?title=Category:CATNAME&action=pdfbook That’s fine so far in my MW 1.15.x setup using latest PdfBook trunk.

My question is simple: How to generate a PDF containing all articles of the entire wiki without creating a new category and adding all articles to that new category.

Problem exporting pdfbook: all category titles (chapters) are the same name

See here for a solution. Cheers --[[kgh]] 16:15, 26 February 2011 (UTC)

Patch filed: Error on single article with &action=pdfbook

Ahoy,

you probably have this problem often: You add the &action=pdfbook GET parameter in your web browser, but you get an empty PDF file, because you have not selected a category, nor added the &format=single GET parameter. Annoying.

I don't want the &format=single GET parameter to be mandatory for single files. This extension can figure out, whether the selected page is a category page. So I added two lines and it is not necessary any more.

# svn diff
Index: PdfBook.php
===================================================================
--- PdfBook.php (Revision 82953)
+++ PdfBook.php (Arbeitskopie)
@@ -114,6 +114,8 @@
               $text = $wgParser->preprocess( $text, $title, $opt );
               if ( preg_match_all( "/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m", $text, $links ) ) 
                  foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
+              else
+                 $articles = array( $title );
            }   
         }   

Voilà, here we go. Best regards, --Mquintus 21:00, 28 February 2011 (UTC)
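
The selection logic of the patch can be tried outside MediaWiki. This is a sketch under assumptions: selectTitles() is a hypothetical function that works on plain strings instead of Title objects and covers only the non-category branch; the regular expression is the one from the diff.

```php
<?php
// Hypothetical stand-in for the non-category branch: returns the targets of
// bulleted [[links]] in the wikitext, or the page itself if there are none.
function selectTitles(string $pageTitle, string $wikitext): array {
    // Same pattern as in the diff: a line starting with "*" followed by [[...]]
    $pattern = '/^\*\s*\[{2}\s*([^\|\]]+)\s*.*?\]{2}/m';
    if (preg_match_all($pattern, $wikitext, $links)) {
        return array_map('trim', $links[1]);
    }
    return array($pageTitle); // the fallback that the patch adds
}
```

Links that are not part of a bulleted list are ignored, which is why an ordinary article ends up in the fallback branch.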

[edit]

Does anyone know how I can add the pdfbook link on the navigation/sidebar, on the vector skin same as on the monobook skin?

Has anyone got any ideas?

Found it from above under the section 'Requests'

# Create toolbox link
$wgHooks['SkinTemplateToolboxEnd'][] = 'fnPDFBookLink';

function fnPDFBookLink( &$vector )
{
    global $wgMessageCache, $wgPdfBookMessages;
    foreach( $wgPdfBookMessages as $lang => $messages ) {
    	$wgMessageCache->addMessages( $messages, $lang );
    }
    $thispage = $vector->data['thispage']; // e.g. "Category:Wiki"
    $nsnumber = $vector->data['nsnumber']; // NS 14 is category

    if ( $nsnumber == 14 ){
	echo "\n\t\t\t\t<li><a href=\"./$thispage?action=pdfbook\">";
    	$vector->msg( 'pdf_book_link' );
	echo "</a></li>\n";
    }
    return true;
}
[edit]

I needed to follow links one more level down to complete the product manual. It's working on the product page.

101a102
>                                       $articles[] = $title;
103c104,113
<                                               foreach ( $links[1] as $link ) $articles[] = Title::newFromText( $link );
---
>                                               foreach ( $links[1] as $link ) {
>                                                       $articles[] = Title::newFromText( $link );
>                                                       $subarticle = new Article ( Title::newFromText( $link ) );
>                                                       $text2 = $subarticle->fetchContent();
>                                                       $text2 = $wgParser->preprocess( $text2, $title, $opt );
>                                                       if ( preg_match_all( '/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text2, $links ) )
>                                                       foreach ( $links[1] as $link )  $articles[] = Title::newFromText( $link );
>                                               }
> 
> 

Regards --Edilsonjr

How to make PDF of All Pages?

Tried adding code in MediaWiki v1.16.2 to the file languages/messages/MessagesEn.php for the All pages variable, and the link to download all as PDF appears. Unfortunately the All pages list consists only of tabled hyperlinks rather than bulleted hyperlinks, so no PDF is generated. Is there any workaround to have the All pages list display as bulleted items and then make it into a PDF book?

How to make output landscape instead of portrait?

Any ideas about how to make the output landscape instead of portrait?... there is no setting to change this, so not too sure if it is possible?... any help? Thanks, Alan


Two small changes to PdfBook.hooks.php serve this purpose:
  • add after line 25: $pageorientation = $wgRequest->getText( 'pageorientation');
  • add after line 113: if ($pageorientation == "landscape") { $cmd .= " --landscape "; }
These changes offer a new option for calling PdfBook:
http://.../index.php?title=myTestPage&action=pdfbook&format=single&pageorientation=landscape
Kappa (talk) 14:36, 11 December 2012 (UTC)
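
The second change can also be written as a small function, so that only a fixed string ever reaches the command line. A minimal sketch: orientationOption() is a hypothetical helper; the --landscape flag is the one used in the change above.

```php
<?php
// Hypothetical helper: maps the pageorientation request value to an htmldoc
// option. Anything other than "landscape" yields no option at all, so the
// raw request value is never copied into the shell command.
function orientationOption(string $requested): string {
    return strtolower(trim($requested)) === 'landscape' ? ' --landscape' : '';
}
```

It would be used as $cmd .= orientationOption( $pageorientation ); in place of the if statement.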

CSS for tables

Hello, can I add CSS anywhere for the table layout? The tables in the output have no border, no color, nothing. Is it possible to define a CSS file with the table layouts?

Icect

No. PdfBook uses HtmlDoc. The current stable version of HtmlDoc (1.8) does not support CSS. CSS support will be added to v1.9 (currently under development). Remco de Boer 11:05, 9 August 2011 (UTC)

Hey, thank you very much. I now use HTML attributes and it works. I still miss the CSS "empty cells", though. Now I'm looking forward to the new version. --141.35.213.221 09:57, 17 August 2011 (UTC)

oldid of article is not taken into account

When exporting an article to a PDF, the extension always takes the newest revision instead of a specific one, e.g. an approved one, which is shown by default in my configuration. I've tried to submit the oldid in the URL as well, but it has no effect. Any suggestions?

Select namespaces are not being rendered

I have 4 namespaces that belong in a category, but only 1 of them can be "exported" to a PDF via the tool. When going into the other namespaces I do not get the option across the top to select it into the book. Any ideas?

Extension not available after installation

Hello!

Tried to install the extension. I completed all the steps of the installation guide as follows:

  • htmldoc installed, works properly
  • copied all files of the extension into the specified folder
  • LocalSettings.php edited

But there is no link into the navigation to create a pdf or to add pages to pdf-job.

The extension PdfExport works fine, but pdfBook doesn't though the system-entry mentions it as an installed extension?


I've found that the method to generate a PDF depends on what kind of PDF you're trying to create (single page vs multiple pages). The easiest for me was to enable the "Print as PDF" tab - set $wgPdfBookTab = true; to enable this feature. Otherwise, I've had to add text to pages or create new pages to generate PDFs. Hope this helps! Becky

FlaggedRevs and PDFBook


Hello, I have a problem: I want to print the current flagged version of my article, but PdfBook uses the last edited version (Article.php does not provide any other functions). So I tried to include the FlaggedRevs classes in PdfBook and got an error. I have no idea why it does not work.

Does anyone here know this problem? I want to use the FlaggedRevs classes to get the last flagged version.

--141.35.213.221 09:52, 16 September 2011 (UTC)

Slightly different version


Excuse my lack of knowledge in how to update Wiki correctly, but I'm editing Boldly, so...

Some of the comments above revolve around:

  • Not being able to use this extension on a historical document
  • Other extensions not resolving

I've done a lot of work on this extension to modify it for my needs, under v1.16. This version also incorporates many of the earlier comments and solutions from this page, and of course it resolves the two issues I listed above.

My version has a lot of bespoke coding (strongly formatting documents based on their name, for example), so it's not reasonable to push all that into the mainstream.

So, for those who need it, the code is below.

<?php
/**
 * PdfBook extension
 * - Composes a book from articles in a category and exports as a PDF book
 *
 * See http://www.mediawiki.org/Extension:PdfBook for installation and usage details
 * See http://www.organicdesign.co.nz/Extension_talk:PdfBook for development notes and discussion
 *
 * Started: 2007-08-08
 * 
 * @package MediaWiki
 * @subpackage Extensions
 * @author Aran Dunkley [http://www.organicdesign.co.nz/nad User:Nad]
 * @copyright © 2007 Aran Dunkley
 * @licence GNU General Public Licence 2.0 or later
 *
 */
if (!defined('MEDIAWIKI')) die('Not an entry point.');

define('PDFBOOK_VERSION', '1.0.3, 2008-12-09');

$wgExtensionFunctions[]        = 'wfSetupPdfBook';
$wgHooks['LanguageGetMagic'][] = 'wfPdfBookLanguageGetMagic';

$wgExtensionCredits['parserhook'][] = array(
	'name'	      => 'PdfBook',
	'author'      => '[http://www.organicdesign.co.nz/nad User:Nad]',
	'description' => 'Composes a book from articles in a category and exports as a PDF book',
	'url'	      => 'http://www.mediawiki.org/wiki/Extension:PdfBook',
	'version'     => PDFBOOK_VERSION
	);

class PdfBook {

	function PdfBook() {
		global $wgHooks, $wgParser, $wgPdfBookMagic;
		global $wgLogTypes, $wgLogNames, $wgLogHeaders, $wgLogActions;
		$wgHooks['UnknownAction'][] = $this;

		# Add a new pdf log type
		$wgLogTypes[]             = 'pdf';
		$wgLogNames  ['pdf']      = 'pdflogpage';
		$wgLogHeaders['pdf']      = 'pdflogpagetext';
		$wgLogActions['pdf/book'] = 'pdflogentry';
	}

	/**
	 * Perform the export operation
	 */
	function onUnknownAction($action, $article) {
		global $wgOut, $wgUser, $wgTitle, $wgParser, $wgRequest;
		global $wgServer, $wgArticlePath, $wgScriptPath, $wgUploadPath, $wgUploadDirectory, $wgScript;

		if ($action == 'pdfbook') {

			$title = $article->getTitle();
			$opt = ParserOptions::newFromUser($wgUser);
			$oldpage = $wgRequest->getText('oldid');

			# Log the export
			$msg = $wgUser->getUserPage()->getPrefixedText().' exported as a PDF book';
			$log = new LogPage('pdf', false);
			$log->addEntry('book', $wgTitle, $msg);

			# Initialise PDF variables
			$format  = $wgRequest->getText('format');
            
            # setting the format depending on the document title. 
            if (substr($title->getText(), 0, 2) == "FS") $format = 'singlebook';
            if (substr($title->getText(), 0, 3) == "REQ") $format = 'singlebook';
            if (substr($title, 0, 9) == 'Category:') {
                $format = 'book';
                $oldpage=0;
            }
            # EOC
			$notitle = $wgRequest->getText('notitle');
			$layout  = $format == 'single' ? '--webpage' : '--firstpage c1';
            if ($format == 'singlebook') $layout = ' ';
			//$layout  = $format == 'single' ? ' ' : '--firstpage c1';
			$charset = $this->setProperty('Charset',     'iso-8859-1');
			$left    = $this->setProperty('LeftMargin',  '1cm');
			$right   = $this->setProperty('RightMargin', '1cm');
			$top     = $this->setProperty('TopMargin',   '1cm');
			$bottom  = $this->setProperty('BottomMargin','1cm');
			$font    = $this->setProperty('Font',	     'Arial');
			$size    = $this->setProperty('FontSize',    '8');
			$linkcol = $this->setProperty('LinkColour',  '217A28');
			$levels  = $this->setProperty('TocLevels',   '2');
			$exclude = $this->setProperty('Exclude',     array());
			$width   = $this->setProperty('Width',       '');
			$width   = $width ? "--browserwidth $width" : '';
			# split() was removed in PHP 7; preg_split() is the drop-in replacement
			if (!is_array($exclude)) $exclude = preg_split('/\s*,\s*/', $exclude);
 
			# Select articles from members if a category or links in content if not
			if ($format == 'single' || $format == 'singlebook') $articles = array($title);
			else {
				$articles = array();
				if ($title->getNamespace() == NS_CATEGORY) {
					$db     = wfGetDB(DB_SLAVE);
					$cat    = $db->addQuotes($title->getDBkey());
					$result = $db->select(
						'categorylinks',
						'cl_from',
						"cl_to = $cat",
						'PdfBook',
						array('ORDER BY' => 'cl_sortkey')
					);
					if ($result instanceof ResultWrapper) $result = $result->result;
					while ($row = $db->fetchRow($result)) $articles[] = Title::newFromID($row[0]);
				}
				else {
					$text = $article->fetchContent();
					$text = $wgParser->preprocess($text, $title, $opt);
					if (preg_match_all('/^\\*\\s*\\[{2}\\s*([^\\|\\]]+)\\s*.*?\\]{2}/m', $text, $links))
						foreach ($links[1] as $link) $articles[] = Title::newFromText($link);
				}
			}

			# Format the article(s) as a single HTML document with absolute URL's
			$book = $title->getText();
			$html = '';
			$titlehtml = '';
            $titledone = 0;
			$wgArticlePath = $wgServer.$wgArticlePath;
			$wgScriptPath  = $wgServer.$wgScriptPath;
			$wgUploadPath  = $wgServer.$wgUploadPath;
			$wgScript      = $wgServer.$wgScript;
            # Output some basic metadata for HTMLDOC:
            $html  = "<html><head>";
            #if (substr($book, 0, 3) != "EST" && substr($book, 0, 2) != "FS") {
            #if (substr($book, 0, 3) != "EST") {
                # close the head and open the body that is closed again after the article loop
                $html .= "<title>$book</title></head><body>";
            #}
			foreach ($articles as $title) {
				$ttext = $title->getPrefixedText();
				if (!in_array($ttext, $exclude)) {
                    $article = new Article($title);
                    $text    = $article->fetchContent(strlen($oldpage) == 0 ? 0 : $oldpage);
                    $text    = preg_replace('/<!--([^@]+?)-->/s', '@@'.'@@$1@@'.'@@', $text); # preserve HTML comments
                    if ($format != 'single' && $format != 'singlebook') $text .= '__NOTOC__';
                    $opt->setEditSection(false);    # remove section-edit links
                    $wgOut->setHTMLTitle($ttext);   # use this so DISPLAYTITLE magic works
                    $text    = $wgParser->preprocess($text, $title, $opt, strlen($oldpage) == 0 ? 0 : $oldpage);
                    $out     = $wgParser->parse($text, $title, $opt, true, true);
                    //$ttext   = $wgOut->getHTMLTitle();
                    $text    = $out->getText();
                    $text    = preg_replace('|(<img[^>]+?src=")(/.+?>)|', "$1$wgServer$2", $text);       # make image urls absolute
                    $text    = preg_replace('|<div\s*class=[\'"]?noprint["\']?>.+?</div>|s', '', $text); # non-printable areas
                    $text    = preg_replace('|@{4}([^@]+?)@{4}|s', '<!--$1-->', $text);                  # HTML comments hack
                    #$text    = preg_replace('|<table|', '<table border borderwidth=2 cellpadding=3 cellspacing=0', $text);
                    // Ignore Links code
                    $text    = preg_replace('|<a|','<span',$text);
                    $text    = preg_replace('|</a>|','</span>',$text);
                    // EOC //
                    $ttext   = basename($ttext);
                    $h1      = $notitle ? '' : "<h1>$ttext</h1>";
                    if (strpos($text,'<!-- TOC -->') !== FALSE) {
                        $titlehtml = utf8_decode(substr($text, 0, strpos($text,'<!-- TOC -->')));
                        $text = utf8_decode(substr($text, strpos($text,'<!-- TOC -->') + 12));
                        $h1 = '';
                    }
                    
                    if ($format != 'single' && $format != 'singlebook' && $titledone == 0) {
                        $titlehtml   = utf8_decode("$text\n");
                        $titledone = 1;
                    } else {
                        # compare against false so a title that starts with "Appendix" (position 0) also matches
                        if (stripos($ttext, "appendix") !== false) {
                            $html   .= utf8_decode("$text\n");
                        } else {
                            if (substr($book, 0, 3) == "EST" || substr($book, 0, 2) == "FS" || substr($book, 0, 3) == "REQ") {
                                $html   .= utf8_decode("$text\n");
                            } else {
                                $html   .= utf8_decode("$h1$text\n");
                            }
                        }
                    }
				}
			}
            # Finish off the basic HTML for the production
            $html .= "</body></html>";

			# If format=html in query-string, return html content directly
			if ($format == 'html') {
				$wgOut->disable();
				header("Content-Type: text/html");
				header("Content-Disposition: attachment; filename=\"$book.html\"");
				print $titlehtml.$html;
			}
			else {
				# Write the HTML to a tmp file
				$titlefile = "$wgUploadDirectory/".uniqid('pdf-book');
				$tfh = fopen($titlefile, 'w+');
				fwrite($tfh, $titlehtml);
				fclose($tfh);
				$file = "$wgUploadDirectory/".uniqid('pdf-book');
				$fh = fopen($file, 'w+');
				fwrite($fh, $html);
				fclose($fh);

				$footer = $format == 'single' ? '...' : '../';
                //$footer = '../';
				$header = $format == 'single' ? '...' : '..t';
				$toc    = $format == 'single' ? '' : " --toclevels $levels --toctitle \"Contents\" --tocheader $header --tocfooter ..i";
				//$toc    = " --toclevels $levels --toctitle \"Contents\" --tocheader $header --tocfooter ..i";

                $cmd  = " --book";
                $cmd .= " --links --linkstyle plain --linkcolor $linkcol";
				$cmd .= " --title --titlefile $titlefile";
                $cmd .= " --size A4 --numbered";
                $cmd .= " --left $left --right $right --top $top --bottom $bottom ";
                $cmd .= " --header $header --header1 $header --footer $footer --nup 1";
                $cmd .= "$toc";
                $cmd .= " --portrait --color --no-pscommands --no-xrxcomments --compression=9";
                $cmd .= " --jpeg=75 --fontsize $size --fontspacing 1.1 --headingfont $font --bodyfont $font";
                $cmd .= " --headfootsize $size --headfootfont $font --charset $charset";
                $cmd .= " --no-embedfonts --pagemode document --pagelayout single $layout";
                $cmd .= " --permissions all"; 
                $cmd .= " --browserwidth 680 --no-strict --no-overflow";
				$cmd  = "htmldoc -t pdf14 $cmd $file";
				# Send the file to the client via htmldoc converter
				$wgOut->disable();
				header("Content-Type: application/pdf");
				header("Content-Disposition: attachment; filename=\"$book.pdf\"");
				putenv("HTMLDOC_NOCGI=1");
				passthru($cmd);
				@unlink($file);
				@unlink($titlefile);
			}
			return false;
		}
	
		return true;
	}

	/**
	 * Return a property for htmldoc using global, request or passed default
	 */
	function setProperty($name, $default) {
		global $wgRequest;
		if ($wgRequest->getText("pdf$name"))   return $wgRequest->getText("pdf$name");
		if (isset($GLOBALS["wgPdfBook$name"])) return $GLOBALS["wgPdfBook$name"];
		return $default;
	}

	/**
	 * Needed in some versions to prevent Special:Version from breaking
	 */
	function __toString() { return 'PdfBook'; }
}

/**
 * Called from $wgExtensionFunctions array when initialising extensions
 */
function wfSetupPdfBook() {
	global $wgPdfBook;
	$wgPdfBook = new PdfBook();
}

/**
 * Needed in MediaWiki >1.8.0 for magic word hooks to work properly
 */
function wfPdfBookLanguageGetMagic(&$magicWords, $langCode = 0) {
	global $wgPdfBookMagic;
	$magicWords[$wgPdfBookMagic] = array($langCode, $wgPdfBookMagic);
	return true;
}

//Add on for link to print on the tool bar menu
$wgHooks['SkinTemplateBuildNavUrlsNav_urlsAfterPermalink'][] = 'wfSpecialPdfNav';
$wgHooks['SkinTemplateToolboxEnd'][] = 'wfSpecialPdfToolbox';
 
function wfSpecialPdfNav( &$skintemplate, &$nav_urls, &$oldid, &$revid ) {
        $nav_urls['pdfprint'] = array(
                        'text' => 'Download as PDF',
                        'href' => $nav_urls['href'].'?action=pdfbook&format=single&notitle&oldid='.$oldid
                );
        return true;
}
 
function wfSpecialPdfToolbox( &$monobook ) {
        if ( isset( $monobook->data['nav_urls']['pdfprint'] ) )
                if ( $monobook->data['nav_urls']['pdfprint']['href'] == '' ) {
                        ?><li id="t-ispdf"><?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['text'] ); ?></li><?php
                } else {
                        ?><li id="t-pdf"><?php
                                ?><a href="<?php echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['href'] ) ?>"><?php
                                        echo htmlspecialchars( $monobook->data['nav_urls']['pdfprint']['text'] );
                                ?></a><?php
                        ?></li><?php
                }
        return true;
}

--217.158.90.2 12:56, 31 January 2012 (UTC)

Corrupt PDF file when opened contains error message


The PDF file created from a page actually contains this:

HTMLDOC Version 1.8.27 Copyright 1997-2006 Easy Software Products, All Rights Reserved.
This software is based in part on the work of the Independent JPEG Group.

ERROR: No HTML files!

Usage:
  htmldoc [options] filename1.html [ ... filenameN.html ]

Options:

  --batch filename.book
  --bodycolor color
  --bodyfont {courier,helvetica,monospace,sans,serif,times}
  --bodyimage filename.{bmp,gif,jpg,png}
  --book
…

Any ideas how to proceed with this?

Thanks,

Gareth.

Possible Solutions


Fix variable quoting


I am running MediaWiki on Windows (via a Bitnami installation), and the default path contains a space (the "Program Files" part). PDFBook does not quote the path when executing the htmldoc command, so the command line is not parsed correctly and the above error ends up in the PDF file. The solution is to change line 73 in PDFBook.php from:

	$cmd  = "htmldoc -t pdf --charset $charset $cmd $file";

to:

	$cmd  = "htmldoc -t pdf --charset $charset $cmd \"$file\"";
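
A slightly more defensive variant of the same idea is to let PHP do the quoting with `escapeshellarg()`, which picks the quote style of the platform it runs on. This is only a sketch: the helper name `pdfBookBuildCommand` is an assumption, and `$options` stands for the already assembled htmldoc switches.

```php
<?php
/**
 * Build the htmldoc command line with the path-derived values passed
 * through escapeshellarg(), so spaces in paths survive the shell.
 * $options holds the already assembled htmldoc switches (trusted here).
 */
function pdfBookBuildCommand(string $charset, string $options, string $file): string {
    return 'htmldoc -t pdf --charset ' . escapeshellarg($charset)
        . ' ' . trim($options)
        . ' ' . escapeshellarg($file);
}

echo pdfBookBuildCommand('iso-8859-1', '--quiet --jpeg', 'C:/Program Files/wiki/images/pdf-book123'), "\n";
```

The output differs per platform (single quotes on Unix-like systems, double quotes on Windows), but in both cases the file name reaches htmldoc as one argument.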

Fix SELinux labels


I have just installed PdfBook on a CentOS 7 machine and got the same error. In my case, the cause was SELinux preventing httpd from writing to the images directory. Fixing the labels for /path/to/mediawiki/install/images(/.*)? as described in SELinux fixed the issue for me. --217.253.60.186 23:52, 19 January 2016 (UTC)

$wgPdfBookFormat


I wanted to control from LocalSettings.php whether format=single is used. This is what I came up with (add $wgPdfBookFormat = "single"; to your LocalSettings.php to enable it, or leave it out), but it leaks memory. Maybe someone with actual PHP knowledge has a better idea. Thanks, 67.164.57.135 04:18, 8 June 2012 (UTC)

--- PdfBook-svn/PdfBook.hooks.php.orig  2012-06-07 23:59:31.142185353 +0200
+++ PdfBook-svn/PdfBook.hooks.php       2012-06-08 06:05:41.806193614 +0200
@@ -143,12 +143,14 @@ class PdfBookHooks {
         */
        public static function onSkinTemplateTabs( $skin, &$actions) {
                global $wgPdfBookTab;
+               global $wgPdfBookFormat;
+               if ( $wgPdfBookFormat == "single" ) { $format="&format=single"; } else { $format=""; }
 
                if ( $wgPdfBookTab ) {
                        $actions['pdfbook'] = array(
                                'class' => false,
                                'text' => wfMsg( 'pdfbook-action' ),
-                               'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
+                               'href' => $skin->getTitle()->getLocalURL( "action=pdfbook$format" ),
                        );
                }
                return true;
@@ -160,12 +162,14 @@ class PdfBookHooks {
         */
        public static function onSkinTemplateNavigation( $skin, &$actions ) {
                global $wgPdfBookTab;
+               global $wgPdfBookFormat;
+               if ( $wgPdfBookFormat == "single" ) { $format="&format=single"; } else { $format=""; }
 
                if ( $wgPdfBookTab ) {
                        $actions['views']['pdfbook'] = array(
                                'class' => false,
                                'text' => wfMsg( 'pdfbook-action' ),
-                               'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
+                               'href' => $skin->getTitle()->getLocalURL( "action=pdfbook$format" ),
                        );
                }
                return true;
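
The decision made in the patch above can be isolated into a small pure function, which makes it easier to test outside MediaWiki. This is a hedged sketch: `pdfBookTabQuery` is a hypothetical helper, and its parameter stands in for the `$wgPdfBookFormat` global introduced by the patch. It makes no claim about the memory issue mentioned above.

```php
<?php
/**
 * Return the query string for the PDF tab link.
 * $configuredFormat stands in for the $wgPdfBookFormat global from the
 * patch; null means the setting is absent from LocalSettings.php.
 */
function pdfBookTabQuery(?string $configuredFormat): string {
    $query = ['action' => 'pdfbook'];
    if ($configuredFormat === 'single') {
        $query['format'] = 'single';
    }
    return http_build_query($query);
}

echo pdfBookTabQuery('single'), "\n";
echo pdfBookTabQuery(null), "\n";
```

Inside the hook the value could be read as `$GLOBALS['wgPdfBookFormat'] ?? null`, which avoids an undefined-variable notice when the setting is not configured.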

htmldoc binaries location


I installed htmldoc on an up-to-date Debian 6, but the binaries were in /usr/bin rather than in /usr/local/bin as the setting suggests. It has worked since I fixed the path, but I'm not sure I did the right thing.

Shimegi (talk) 07:03, 3 July 2012 (UTC)

This is correct. Different distributions use different paths. --Pastakhov (talk) 07:26, 3 July 2012 (UTC)
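
Because the location varies between distributions, a setup script can probe a few likely paths instead of hard-coding one. The sketch below is an illustration only: `pdfBookFindBinary` is a hypothetical helper and the candidate list is an assumption to be adjusted per system.

```php
<?php
/**
 * Return the first candidate path that exists and is executable, or null.
 * The candidate list passed in is an assumption; adjust it per system.
 */
function pdfBookFindBinary(array $candidates): ?string {
    foreach ($candidates as $path) {
        if (is_file($path) && is_executable($path)) {
            return $path;
        }
    }
    return null;
}

$htmldoc = pdfBookFindBinary(['/usr/local/bin/htmldoc', '/usr/bin/htmldoc']);
echo $htmldoc === null ? "htmldoc not found\n" : "using $htmldoc\n";
```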

Header level incorrect


When I'm exporting a category, I have a problem:

There are pages A and B in the category.

Pages A and B each have two levels of headers.

What I get in the resulting PDF is:

1) Title of Page A
2) Header Level 1 of Page A
2.1) Header Level 2 of Page A
3) Another Header Level 1 of Page A
4) Title of Page B
5) Header Level 1 of Page B
6) Another Header Level 1 of Page B
6.1) Header Level 2 of Page B

whereas the more correct result IMHO would be:

1) Title of Page A
1.1) Header Level 1 of Page A
1.1.1) Header Level 2 of Page A
1.2) Another Header Level 1 of Page A
2) Title of Page B
2.1) Header Level 1 of Page B
2.2) Another Header Level 1 of Page B
...
2.2.1) Header Level 2 of Page B

You get the idea...

Attempt to fix


This fix increments the level of each header in a document by one (in a plain way).

 foreach ( $articles as $title ) {
        $ttext = $title->getPrefixedText();
        if ( !in_array( $ttext, $exclude ) ) {
                $article = new Article( $title );
                $text    = $article->fetchContent();
                $text    = preg_replace( '/<!--([^@]+?)-->/s', '@@'.'@@$1@@'.'@@', $text ); # preserve HTML comments
                if ( $format != 'single' ) $text .= '__NOTOC__';
                $opt->setEditSection( false );    # remove section-edit links
                $wgOut->setHTMLTitle( $ttext );   # use this so DISPLAYTITLE magic works
                $out     = $wgParser->parse( $text, $title, $opt, true, true );
                $ttext   = $wgOut->getHTMLTitle();
                $text    = $out->getText();
                $text    = preg_replace( '|(<img[^>]+?src=")(/.+?>)|', "$1$wgServer$2", $text );       # make image urls absolute
                $text    = preg_replace( '|<div\s*class=[\'"]?noprint["\']?>.+?</div>|s', '', $text ); # non-printable areas
                $text    = preg_replace( '|@{4}([^@]+?)@{4}|s', '<!--$1-->', $text );                  # HTML comments hack
                $text    = preg_replace('|<table|', '<table border borderwidth=2 cellpadding=3 cellspacing=0', $text);
        # JM 2012-07-26: shift every heading down one level (deepest first, so nothing is shifted twice)
                $text = preg_replace('/<h5/', '<h6', $text);
                $text = preg_replace('/<h4/', '<h5', $text);
                $text = preg_replace('/<h3/', '<h4', $text);
                $text = preg_replace('/<h2/', '<h3', $text);
                $text = preg_replace('/<h1/', '<h2', $text);
                $text = preg_replace('|</h5|', '</h6', $text);
                $text = preg_replace('|</h4|', '</h5', $text);
                $text = preg_replace('|</h3|', '</h4', $text);
                $text = preg_replace('|</h2|', '</h3', $text);
                $text = preg_replace('|</h1|', '</h2', $text);
        # end JM
                $ttext   = basename($ttext);
                $h1      = $notitle ? '' : "<center><h1>$ttext</h1></center>";
                $html   .= utf8_decode("$h1$text\n");
        }
 }

Note the preg_replaces in-between the comments.

BTW if you'd like to keep the page breaks you can replace

 $text = preg_replace('/<h1/', '<h2', $text);

by

 $text = preg_replace('/<h1/', '<!-- NEW PAGE --><h2', $text);
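
The ten replacements can also be written as a single pass with `preg_replace_callback()`, which removes the need to order them from the deepest level upwards. This is a sketch rather than the code used in the fix above; `pdfBookShiftHeadings` is a hypothetical name, and unlike the original it caps the result at h6.

```php
<?php
/**
 * Shift every HTML heading down by $by levels in a single pass.
 * Levels are capped at h6, the deepest heading HTML defines.
 */
function pdfBookShiftHeadings(string $html, int $by = 1): string {
    return preg_replace_callback(
        '#<(/?)h([1-6])\b#i',
        function (array $m) use ($by): string {
            $level = min(6, (int)$m[2] + $by);
            // $m[1] is "/" for a closing tag and empty for an opening tag.
            return '<' . $m[1] . 'h' . $level;
        },
        $html
    );
}

echo pdfBookShiftHeadings('<h1>Page A</h1><h2 id="x">Section</h2>'), "\n";
// <h2>Page A</h2><h3 id="x">Section</h3>
```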

Problems with Htmldoc (1.9.0)


Hello,

With the new version of htmldoc (such as the one shipped by Arch Linux), all my content ended up on one single line (I mean the content of all pages on only one line!). To fix this, I wrapped the HTML content in a <body> tag in PdfBook.hooks.php:

$html = "<body>".$html."</body>";

I hope this helps other users!


Mathieu
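
If the same code has to serve wikis where the HTML may already contain a body tag, the wrapping can be made conditional. A minimal sketch, assuming a hypothetical helper named `pdfBookEnsureBody`:

```php
<?php
/**
 * Wrap an HTML fragment in <body>...</body> unless it already has a body tag.
 */
function pdfBookEnsureBody(string $html): string {
    if (preg_match('/<body\b/i', $html)) {
        return $html;   // already wrapped, do not nest a second body
    }
    return '<body>' . $html . '</body>';
}

echo pdfBookEnsureBody('<h1>Title</h1><p>Text</p>'), "\n";
```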

Is this extension actively maintained?


I'm interested in porting this extension to use the wkhtmltopdf library, which does a superior job of HTML>PDF than HtmlDoc. Has anybody already done any work on this? Is the extension being actively maintained and developed? Andrujhon (talk)

Maintenance

Aran Dunkley seems to be off to Brazil - I just sent him a Facebook message to find out whether he's going to maintain the pdfbook extension. There seem to be a few modified versions out there already e.g.

Please add any other link. I'm volunteering to set up a new maintained version on GitHub if Aran doesn't want to continue to work on the svn version. -- Seppl2013 (talk) 15:54, 22 September 2013 (UTC)


Got it working!


I somehow got it working..

I'm not exactly sure which steps are the ones that helped, so here are some things I did:

  1. I did not use the .php files listed here. Instead, I used the ones User:Seppl20 pointed to at github.
  2. Installed HTMLDOC into the Apache (I'm using XAMPP) cgi-bin folder.
  3. I don't know anything about Apache PATH whatevers, so I tried to follow the instructions here to open Apache's path.
  4. Finally, I tried following HTMLDOC manual instructions for a little bit but I kept getting stuck with an empty .pdf. I went to my host computer and tried running the same change to the url (?title=Category:blah&action=pdfbook) and found that I was getting the missing LIBEAY32.DLL error.
  5. Blindly copied/pasted the htmldoc.exe, LIBEAY32.DLL, MSVCR.dll, SSLEAY32.DLL files into every possible directory from the server down to the wiki. I can't be certain which folder was the one that needed it. I want to say it was in /htdocs/mywiki, but again... blind pasting everywhere.

I read a lot of comments about commenting out lines or adding small amounts of text, but my final product did not entail any of that.

Good luck!

Hollymollybobolly (talk) 22:35, 20 December 2013 (UTC)

Set href format=single if page is not a category (wgPdfBookTab = true)


I modified the code so that format=single is used only if the page is not a category.

+++ PdfBook/PdfBook.hooks.php   2014-03-19 11:19:05.000000000 +0100
@@ -162,11 +162,20 @@
                global $wgPdfBookTab;

                if ( $wgPdfBookTab ) {
-                       $actions['views']['pdfbook'] = array(
-                               'class' => false,
-                               'text' => wfMsg( 'pdfbook-action' ),
-                               'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
-                       );
+                       if ( $skin->getTitle()->isContentPage() ) {
+                               $actions['views']['pdfbook'] = array(
+                                       'class' => false,
+                                       'text' => wfMsg( 'pdfbook-action' ),
+                                       'href' => $skin->getTitle()->getLocalURL( "action=pdfbook&format=single" ),
+                               );
+                       }
+                       else {
+                               $actions['views']['pdfbook'] = array(
+                                       'class' => false,
+                                       'text' => wfMsg( 'pdfbook-action' ),
+                                       'href' => $skin->getTitle()->getLocalURL( "action=pdfbook" ),
+                               );
+                       }
                }
                return true;
        }                                                    
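
Note that the patch above keys on isContentPage(), which checks whether the title is in a content namespace, so pages in other non-content namespaces would presumably get the book link too. If the intent is strictly "not a category", the namespace can be compared directly. The sketch below is standalone and hypothetical: `pdfBookQueryForNamespace` is not part of PdfBook, and NS_CATEGORY (14 in MediaWiki) is redefined only so the snippet runs outside the wiki.

```php
<?php
// MediaWiki defines NS_CATEGORY as 14; it is defined here only so that
// this sketch runs standalone.
if (!defined('NS_CATEGORY')) {
    define('NS_CATEGORY', 14);
}

/**
 * Categories are exported as a book, every other page as a single article.
 */
function pdfBookQueryForNamespace(int $namespace): string {
    return $namespace === NS_CATEGORY
        ? 'action=pdfbook'
        : 'action=pdfbook&format=single';
}

echo pdfBookQueryForNamespace(NS_CATEGORY), "\n";
echo pdfBookQueryForNamespace(0), "\n";
```

In the hook, the argument would come from `$skin->getTitle()->getNamespace()` and the result would be passed to getLocalURL().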

Compatibility with MediaWiki v1.23

I'm currently on branch wmf/1.23wmf19 and I found the headings (h2, h3, etc) wouldn't show up correctly in the PDF. It turns out that the headings now contain some extra span tags that don't go well together with HTMLDOC. I wrote a hack in Perl to remove the unwanted elements. My PHP isn't up to par to implement it well in PdfBook so I hope someone will find this useful and implement it properly. Here are the Perl regexp's I used.

$content =~ s/<span class="mw-headline".*?>(.*?)<\/span>/$1/g;
$content =~ s/<span class="mw-editsection-bracket">(.*?)<\/span>//g;
$content =~ s/<span class="mw-editsection">(.*?)<\/span>//g;

I've added the following to PdfBook.hooks.php:

$text = preg_replace( '/<span class="mw-headline" id="(.*?)">(.*?)<\/span>/', "$2", $text );

after

$text    = preg_replace( "|<div\s*class=['\"]?noprint[\"']?>.+?</div>|s", "", $text );     // non-printable areas
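
For completeness, all three Perl substitutions from this thread can be ported to PHP in the same order. This is a sketch under the assumption that the heading markup looks like the 1.23 output described above; `pdfBookCleanHeadings` is a hypothetical name, and a headline that itself contains a nested span would not be handled correctly by these regular expressions.

```php
<?php
/**
 * Remove the extra spans MediaWiki 1.23 puts into headings.
 * The order matters: the bracket spans are nested inside the
 * mw-editsection span, so they must go before the outer span is matched.
 */
function pdfBookCleanHeadings(string $html): string {
    // Keep the headline text, drop the span around it.
    $html = preg_replace('#<span class="mw-headline"[^>]*>(.*?)</span>#s', '$1', $html);
    // Drop the "[" and "]" spans of the section-edit link.
    $html = preg_replace('#<span class="mw-editsection-bracket">.*?</span>#s', '', $html);
    // Drop what is left of the section-edit link.
    $html = preg_replace('#<span class="mw-editsection">.*?</span>#s', '', $html);
    return $html;
}

$sample = '<h2><span class="mw-headline" id="Intro">Intro</span>'
    . '<span class="mw-editsection"><span class="mw-editsection-bracket">[</span>'
    . '<a href="/e">edit</a><span class="mw-editsection-bracket">]</span></span></h2>';
echo pdfBookCleanHeadings($sample), "\n";
// <h2>Intro</h2>
```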

How to hide some items identified by CSS id or class when exporting to PDF?

I would like the table of contents (of the wiki page) and some tables identified by the class "wikitable" not to appear in the PDF file.

Thanks,

Cheers

Nicolas NALLET (talk) 13:49, 23 April 2014 (UTC)

[edit]

$wgMessageCache has been deprecated since 1.18 (the solution shown in a previous topic no longer works).

So how would I add a link to a PDF version in the sidebar of a current Wiki version?

Preferably the link should be dynamic, so that for single pages it would contain "&format=single", whereas for categories it need not.

Downloadable example PDF for demonstration?


I would like to have a PDF file or two downloadable from the description page for this extension. If one can see a selected category on the one side and the resulting PDF on the other, it is much easier to get an idea of how it works and what one can expect it to achieve. Maybe a simple example with about 10 pages and a bigger one with sub-categories, pictures, tables and about 100 pages? --Manorainjan (talk) 14:46, 2 November 2014 (UTC)

Missing $wgOut = setHTMLtitle-line in pdfbook.php


According to http://www.mediawiki.org/wiki/Extension:PdfBook you should edit line 122 of PdfBook.php, but my PdfBook.php (downloaded from the snapshot for release 1.23) has only 48 lines. Downloading the snapshots for other branches gives the same result.

Also noticed that version 1.1.0 is shown under Special:Version whereas the above mentioned link states that the latest release is 1.2.3. Has there been some hacking done?

At least I got it working now.

How to resize images in MediaWiki 1.24 with PdfBook 1.1.0, 2014-04-01


I have downloaded PdfBook v1.1.0 and created a PDF of a single page with a large image on it. The image is not resized to fit the PDF and is cut off. I tried following these instructions: http://www.mediawiki.org/wiki/Extension_talk:PdfBook#Hacks_to_change_PDF_output_.28v._0.6.29

but updating PdfBook.hooks.php instead of PdfBook.php as this appears to be since version 0.6 of the extension. However, it doesn't work. If I include the full line:

   $cmdext =  " --browserwidth 800 --titlefile $wgUploadDirectory/PDFBook.html";

the PDF cannot be opened and gives an error that it is either not supported or corrupted. If I set the line to:

   $cmdext =  " --browserwidth 800";

the PDF generates, but the image is not resized. Is there a way to have images resized in version 1.1.0?

Fix for the image problem


Generally, all the images look larger than they do on the wiki page, and some images don't even fit on the page. There is a third problem: all the images have an indent of approx. 40px, so they are not in line with the text. Here is my fix, which I've been using for a long time:


Put the following code between these lines:

113:     $text = preg_replace( "|(<img[^>]+?src=\"$imgpath)(/.+?>)|", "<img src=\"$wgUploadDirectory$2", $text );
                        ....fix goes here....
114: }
115: if( $nothumbs == 'true' ) $text = preg_replace( "|images/thumb/(\w+/\w+/[\w\.\-]+).*\"|", "images/$1\"", $text );


And here is the fix:

// Any image wider than this (in pixels) is scaled down to this width
$max_image_size=650;

// By default, every image in the PDF has an indent of about 40px. If this is true,
// the images are put in line with the text
$remove_image_indent=true;

// Generally, the images in the PDF are larger than they were on the wiki page. If this flag is true,
// every image that was not already scaled down to 'max_image_size' is resized
// to the percentage defined in the 'resize_percentage' variable.
$resize_small_images=false;
$resize_percentage=80;


if ($remove_image_indent) {					
   $text    = preg_replace( "|<dl><dd>(<a .+><img.+\/><\/a>)<\/dd><\/dl>|", "$1", $text );
}

if ( preg_match_all("/<img[^>]+?width=\"([0-9]+)\"/", $text, $matches)) {

    for ($i = 0; $i < count($matches[1]); $i++) {
       
	$w=$matches[1][$i];

	if (preg_match("/<img[^>]+?width=\"$w\"\s+height=\"([0-9]+)\"/", $text, $matches2)) {
		$h=$matches2[1];

		if ($w > $max_image_size) {
			$w2=$max_image_size;
			$h2=round($h*($w2/$w));
			$text    = preg_replace( "|width=\"$w\"\s+height=\"$h\"|", "width=\"$w2\" height=\"$h2\"", $text );
		} else if ($resize_small_images) { 
			$w2=round($w*($resize_percentage/100));	// round so the width attribute stays a whole number
			$h2=round($h*($w2/$w));	
			$text    = preg_replace( "|width=\"$w\"\s+height=\"$h\"|", "width=\"$w2\" height=\"$h2\"", $text );
		}
	}
    }
}
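
The downscaling part of this fix can also be expressed with `preg_replace_callback()`, so that each image tag is rewritten exactly once and two images that happen to share a size cannot affect each other. This is a sketch, not the author's code: `pdfBookFitImages` is a hypothetical name, and it assumes, like the fix above, that width="..." is directly followed by height="...".

```php
<?php
/**
 * Scale down every <img> whose width attribute exceeds $maxWidth,
 * keeping the aspect ratio. Assumes width="..." is directly followed
 * by height="...", as in the markup the fix above targets.
 */
function pdfBookFitImages(string $html, int $maxWidth = 650): string {
    return preg_replace_callback(
        '/(<img\b[^>]*?)width="(\d+)"(\s+)height="(\d+)"/i',
        function (array $m) use ($maxWidth): string {
            $w = (int)$m[2];
            $h = (int)$m[4];
            if ($w <= $maxWidth) {
                return $m[0];   // small enough, leave the tag untouched
            }
            $newH = (int)round($h * $maxWidth / $w);
            return $m[1] . 'width="' . $maxWidth . '"' . $m[3] . 'height="' . $newH . '"';
        },
        $html
    );
}

echo pdfBookFitImages('<img src="a.png" width="1300" height="400" />'), "\n";
// <img src="a.png" width="650" height="200" />
```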

Can't Generate PDF From Category (MediaWiki 1.24 with PdfBook 1.1.0)


I tried to generate a PDF using a category as per this example:

   http://www.foo.bar/wiki/index.php?title=Category:foo&action=pdfbook

but the PDF couldn't be opened with an error that the file isn't supported or is corrupted.

This is the log output for the request

   Start request GET /index.php/?title=Category:My%20Category&action=pdfbook
   HTTP HEADERS:
   HOST: xxx
   USER-AGENT: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
   ACCEPT: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
   ACCEPT-LANGUAGE: en-US,en;q=0.5
   ACCEPT-ENCODING: gzip, deflate
   COOKIE: wikiUserName=Admin; base-viewstate=opened; flexiskin-viewstate=opened; wiki_session=ac690952f6cac677a345f2168db15abc; wikiUserID=1
   VIA: 1.1 xxx:8080 (squid/2.6.STABLE21)
   X-FORWARDED-FOR: xxx, xxx
   CACHE-CONTROL: max-age=259200
   CONNECTION: keep-alive
   [caches] main: EmptyBagOStuff, message: SqlBagOStuff, parser: SqlBagOStuff
   [caches] LocalisationCache: using store LCStoreDB
   Fully initialised
   Connected to database 0 at localhost
   Connected to database 0 at localhost
   MessageCache::load: Loading en... got from global cache
   Title::getRestrictionTypes: applicable restrictions to Main Page are {edit,move}
   [ContentHandler] Created handler for wikitext: WikitextContentHandler
   User: cache miss for user 1
   User: loading options for user 1 from database.
   User: logged in from session
   Unstubbing $wgLang on call of $wgLang::_unstub from ParserOptions::__construct
   User: loading options for user 1 from override cache.
   [deprecated] Use of wfMsg was deprecated in MediaWiki 1.21. [Called from PdfBookHooks::onUnknownAction in /var/www/mediawiki-1.24.1/extensions/PdfBook/PdfBook.hooks.php at line 18]
   [deprecated] Use of wfMsgReal was deprecated in MediaWiki 1.21. [Called from wfMsg in /var/www/mediawiki-1.24.1/includes/GlobalFunctions.php at line 1479]
   [deprecated] Use of wfMsgGetKey was deprecated in MediaWiki 1.21. [Called from wfMsgReal in /var/www/mediawiki-1.24.1/includes/GlobalFunctions.php at line 1577]
   DatabaseBase::query: Writes done: INSERT INTO `logging` (log_id,log_type,log_action,log_timestamp,log_user,log_user_text,log_namespace,log_title,log_page,log_comment,log_params) VALUES (NULL,'X')
   Unstubbing $wgParser on call of $wgParser::preprocess from PdfBookHooks::onUnknownAction
   Parser: using preprocessor: Preprocessor_DOM
   LoadBalancer::reuseConnection: this connection was not opened as a foreign connection
   Request ended normally

Fixed Itself

I tried again, and this time the PDF was generated properly. I didn't make any changes to the extension config, so I'm not sure what happened.

How it worked for me

Hi everybody, I'm posting this topic because this extension is the only one of its kind that actually works on my WAMP server. This is how:

I use MediaWiki 1.25.2 on a local server (as a personal wiki) with WampServer on Windows 10.
Installed htmldoc 1.8.28, binary from www.paehl.com, following the setup.txt included in the download.
Note: you must install it in c:\Program Files\HTMLDOC because this path is hardcoded somewhere.
Installed PdfBook 1.1.0.
Set $wgPdfBookFont = "Times New Roman"; otherwise some characters (e.g. single quotes) are not rendered.

And it works well, even for category export. Images are rendered inline regardless of their position in the wiki text. Thanks to the developers. Phcalle (talk) 11:38, 5 April 2016 (UTC)

[edit]

Extension:PdfBook says: "In order to include Fullurl parser function link automatically to every category page, add it to the Mediawiki:Categoryarticlecount page"

I tried this, but it didn't work.

How can I include the parser function link automatically on every category page so that the download link is displayed?

Headings not printed to PDF

I'm running MediaWiki 1.25 on a Linux VM. I installed htmldoc-1.8.28-4.el7.x86_64.rpm, put PdfBook in the extensions folder and updated LocalSettings.php with the line of code specified in the installation instructions. However, when I print a PDF, all the headings are omitted. Any text within <h></h> tags is left out. The heading text does appear in the Table of Contents, but that's the only place it is visible.

Does anyone have any troubleshooting ideas for this problem?

Look

https://www.mediawiki.org/w/index.php?title=Extension_talk:PdfBook&section=92#Compatability_with_MediaWiki_v1.23 --217.247.179.145 17:38, 23 November 2016 (UTC)

Blank page/Error 500 when exporting

I have an issue when trying to export (whether from a category or a single page): the page is totally blank in Mozilla, and IE returns an error 500. I reinstalled the prerequisite HTMLDOC and the extension, and I still have the problem. The MediaWiki version is 1.19, on a Red Hat OS. Has anyone encountered this problem?

Hi, an HTTP 500 error ("internal server error") will usually leave some info in your web server logs. You may find something like a PHP stack trace there that provides more information about the cause of the error. --Remco de Boer 18:52, 8 October 2016 (UTC)

Thanks a lot. We get this Apache error:

[Mon Oct 10 14:11:29 2016] [error] [client <ip>] PHP Fatal error:  Call to undefined method WikiPage::getContent() in <apache_URL>/extensions/PdfBook/PdfBook.hooks.php on line 74, referer: https://<server>/<wiki>/index.php/Accueil

I finally succeeded by using an older version of PdfBook. No more PHP error.

Article limit ?

Hello, I just installed the PdfBook extension on my MediaWiki, and it generally works fine (exporting a single article, exporting articles from a category...). But when I tried to export a larger category (approximately 180 articles or more), I got this error: Fatal error: Maximum execution time of 30 seconds exceeded in <Wiki_directory>/includes/parser/Preprocessor_DOM.php on line XXXX (the line differs each time). Is there a way to solve this, please? Thanks in advance.
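
A possible workaround, not confirmed anywhere in this thread: the 30-second limit in the error message is PHP's max_execution_time setting, so raising it for PdfBook requests may let large categories finish. The fragment below is a minimal sketch for LocalSettings.php. The value of 300 seconds is an arbitrary assumption, and the check on $_GET['action'] assumes the export is requested with action=pdfbook in the query string, as in the URLs quoted on this page.

```php
<?php
// LocalSettings.php (fragment) - illustrative sketch, not part of the extension.
// Raise PHP's execution time limit only for PdfBook export requests.
// 300 is an arbitrary example value; tune it to the size of your categories.
if ( isset( $_GET['action'] ) && $_GET['action'] === 'pdfbook' ) {
    set_time_limit( 300 );
}
```

The same limit can also be raised globally through max_execution_time in php.ini. A web server or proxy in front of PHP may enforce its own, separate timeout.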

Missing PDFBook.php?

I am attempting to install the master branch version on MediaWiki 1.27, running from a Turnkey Linux appliance. I've installed HTMLDOC using apt-get, with no errors. When I go to download PdfBook, there seems to be no PDFBook.php file, so nothing works when I add the lines to LocalSettings.php. Any advice on where to get this file? Alternatively, I tried the 1.24 branch, and while there is a PDFBook.php included in that, it does not work in my MediaWiki. Thanks in advance!
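
One thing worth checking, offered as an assumption because the post does not list the files in the download: since MediaWiki 1.25, extensions can ship an extension.json file in place of a PHP entry point, and such extensions are loaded with wfLoadExtension() rather than require_once. If the master branch directory contains extension.json, the LocalSettings.php lines would look roughly like this sketch.

```php
<?php
// LocalSettings.php (fragment) - sketch that only applies if
// extensions/PdfBook/extension.json exists (an assumption, not verified here).
wfLoadExtension( 'PdfBook' );

// Older releases with a PHP entry point are loaded like this instead
// (file name taken from the post above; check the actual spelling on disk):
// require_once "$IP/extensions/PdfBook/PDFBook.php";
```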

Printing multiple copies of specific pages?

I'm not sure how to accomplish this: say a category has a dozen pages I want to turn into a PDF, but I want 3 copies of one of the pages. I work at a site that uses a fair amount of handwritten paperwork daily, with some pages printed out 6-8 times at once for different stations. I'd like to automate this printing somehow.

Jhollinden (talk) 13:55, 28 March 2017 (UTC)

Images too large (not resized)

I have an issue with the images (whatever the type: JPG, PNG...): they do not fit the page, and the big ones are cut off on the right. I found that when the option "--browserwidth" is set to a high value, large images appear in full in the output PDF, but the smaller images become too small because of the reduction. Does anyone have an idea?

Problem with table borders

The borders of the tables are not visible in the output PDF (only the values in the cells). Is there an option or something to modify? Thanks.

Why move from GitHub to GitLab?

Hi, why did you delete your GitHub source code? Users were using it and not all of us like using GitLab. I understand you may have moved to GitLab due to Microsoft buying GitHub. But why? Microsoft is a very different company under their new CEO than they were before. Please give them a chance before giving up on GitHub. A lot of submodules are broken now (there was little warning).

Also, the GitLab project is private, so you have to log in to view it. Paladox (talk) 17:08, 6 June 2018 (UTC)

I have deleted everything from GitHub as a form of protest about what has happened. I understand that others may not feel the same way as me or not as strongly as me, but Organic Design will not have its code managed by a Microsoft product. Feel free to fork it and maintain a repo on GitHub if you like. --Nad (talk) 19:56, 6 June 2018 (UTC)
P.S. Sorry, it wasn't supposed to be private; it's public now. --Nad (talk) 20:01, 6 June 2018 (UTC)
Still says "You need to sign in or sign up before continuing." Paladox (talk) 21:19, 6 June 2018 (UTC)
I don't know why that would be; it's public now and I can go to the extensions repo and download a zip or clone it with no login required... --Nad (talk) 21:28, 6 June 2018 (UTC)
I am getting the same note. Admittedly I did not try to clone and see if at least this works. Cheers --[[kgh]] (talk) 21:30, 6 June 2018 (UTC)
Does this work when logged out? https://gitlab.com/OrganicDesign/extensions/tree/master/MediaWiki/PdfBook/ Paladox (talk) 21:31, 6 June 2018 (UTC)
Also, per Microsoft, they are keeping GitHub separate, so it will still be the same old GitHub (open) :). Paladox (talk) 21:31, 6 June 2018 (UTC)

Well, the user name was changed from OrganicDesign to Aranad. Fixed now. Cheers --[[kgh]] (talk) 21:37, 6 June 2018 (UTC)
The move to GitLab would have been an ideal chance to separate the extensions you created. When downloading a zip etc. we are still getting all the software you created. Well, it's accessible, which is the main thing in the end. --[[kgh]] (talk) 21:41, 6 June 2018 (UTC)
Yeah, I thought about that, but I still need to do a lengthy process in order to preserve the history. I have made some progress though, as I've got the process documented for how to separate a specific folder out and nuke everything else including its history, so I think I'll be splitting them all out soon :-) --Nad (talk) 21:53, 6 June 2018 (UTC)
Great to read this. I guess this will make things much easier for novice users of your software. Well, for others, too. :) Cheers --[[kgh]] (talk) 21:56, 6 June 2018 (UTC)
I've just tested the process and after a few tweaks got it working. I'll update the links again. Here is PdfBook by itself with its history intact; I'll do the other most used extensions soon. --Nad (talk) 22:37, 6 June 2018 (UTC)
That is really cool. Thanks a lot for doing this! Cheers --[[kgh]] (talk) 06:51, 7 June 2018 (UTC)

Some problems, doesn't seem to work...

Hi guys, I have some problems with this extension. I installed it and installed htmldoc, but if I visit a link such as:

http://172.20.0.107/mediawiki/index.php/page_name&action=pdfbook&format=single

it returns:

Page name&action=pdfbook&format=single There is currently no text in this page. You can search for this page title in other pages, search the related logs, or create this page.

Why?

The link format is wrong: the first & should be a ?. To be more independent of server configuration, you should really use index.php?title=PAGENAME&action.... --Nad (talk) 12:57, 18 June 2018 (UTC)
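
As an illustration of the advice above, here is a minimal sketch that builds an export URL in the index.php?title=... form. The helper name pdfbookUrl() is hypothetical and not part of the extension; the parameter names title, action and format are the ones used in the URLs quoted on this page.

```php
<?php
/**
 * Build a PdfBook export URL in the index.php?title=... form.
 * Hypothetical helper for illustration only; not part of PdfBook.
 *
 * @param string $indexPhp Full URL of the wiki's index.php
 * @param string $title    Page or category title
 * @param string $format   Optional format, e.g. 'single'; omitted when empty
 * @return string
 */
function pdfbookUrl( $indexPhp, $title, $format = '' ) {
    $query = array( 'title' => $title, 'action' => 'pdfbook' );
    if ( $format !== '' ) {
        $query['format'] = $format;
    }
    // Pass '&' explicitly so the result does not depend on arg_separator.output.
    return $indexPhp . '?' . http_build_query( $query, '', '&' );
}

echo pdfbookUrl( 'http://172.20.0.107/mediawiki/index.php', 'page_name', 'single' ), "\n";
// prints http://172.20.0.107/mediawiki/index.php?title=page_name&action=pdfbook&format=single
```

The first separator is always ? and every later one is &, which is exactly the difference between the failing link and a working one.
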
[edit]

Hello,

I am migrating a MediaWiki installation from version 1.27 to 1.31 and so far so good, apart from one small thing regarding this extension.

I have enabled the "Print as PDF" tab, but it only appears if I am logged in. In my company we want everyone to be able to export articles to PDF without having to log in. We, the IT department, are pretty much the only editors of the wiki.

Is it standard behaviour that the link is now only available to logged-in users, or have I misconfigured something? In MediaWiki 1.27 with an older version of this extension, it used to work as desired.

Thanks in advance for your help.


=> To show the "Print as PDF" tab to all users:
Edit the file PdfBookHooks.php.
Comment out the if condition (line 21 only):
// if ( $wgPdfBookTab && $skin->getUser()->isLoggedIn() ) {
and the closing bracket on line 27:
// }

Compatibility with MediaWiki 1.32

The hook UnknownAction, which does pretty much everything for this extension, was removed in 1.32. It should be migrated to use $wgActions. -- The Voidwalker Whispers 19:41, 20 January 2019 (UTC)
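
A rough sketch of what such a migration could look like, under stated assumptions: the class name PdfBookAction and the file name are hypothetical, and the body of show() is left as a placeholder because the extension's export logic is not reproduced on this page. Only $wgActions, $wgAutoloadClasses and the Action base class with its getName() and show() methods are MediaWiki core API.

```php
<?php
// Extension setup (fragment) - hypothetical sketch, not the extension's actual code.
// Map the "pdfbook" action name to a handler class and make the class loadable.
$wgAutoloadClasses['PdfBookAction'] = __DIR__ . '/PdfBookAction.php';
$wgActions['pdfbook'] = 'PdfBookAction';

// PdfBookAction.php (hypothetical file name)
class PdfBookAction extends Action {

    // Name used in the action=... query parameter
    public function getName() {
        return 'pdfbook';
    }

    // Called when a page is requested with action=pdfbook
    public function show() {
        $title  = $this->getTitle();
        $format = $this->getRequest()->getText( 'format' );
        // The work previously done in PdfBookHooks::onUnknownAction
        // (collecting articles, running htmldoc, sending the PDF) would go here.
    }
}
```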

Where is the Extension?

I installed PdfBook and put the line into LocalSettings.php. I can see in the special pages that PdfBook is now installed. But when I open my wiki, I can't find it. I have no links in my menu.

Can you help?

Did you set $wgPdfBookTab to true? See the configuration options. --Remco de Boer 11:42, 1 July 2019 (UTC)

Yes, this is already in the LocalSettings.php file. I still can't find the PdfBook links. Any other ideas?

Are you logged in? (See the issue reported above.) --Remco de Boer 07:05, 2 July 2019 (UTC)
Additionally: which version of MediaWiki are you using? --Remco de Boer 07:06, 2 July 2019 (UTC)
Yes, I'm logged in as Admin. Version 1.29.2.
Not sure why you don't see the tab. Have you tried to manually enter a PDF creation URL? E.g. http://www.foo.bar/wiki/index.php?title=Main_Page&action=pdfbook&format=single. --Remco de Boer 18:54, 2 July 2019 (UTC)
With this link a PDF gets created, so it works. What about HTMLDOC? I installed it on my server (I'm using a Windows web server, IIS). Do I need to add something to the Path in the Control Panel?

Blank PDFs being generated

Hi, I am trying to export a whole Wiki via the categories.

In total there are around 700 pages. When I run PdfBook I get blank PDFs. I briefly had it working on a Docker installation, but have failed to replicate this.

I have tried on a local server and on a local server using Docker, but I just can't get it working.

I have been unable to install it on my production server due to dependency issues with Windows and htmldoc.

My local server runs Fedora. For the Docker setup, Docker runs on Windows 10 with just the standard MediaWiki image/container. I installed htmldoc from the command line in both Fedora and Docker.

Any help is appreciated. I note this has been discussed before, but I can't find an actual answer.

--Squeak24 (talk) 21:27, 23 July 2019 (UTC)