Extension talk:Proofread Page
Please discuss bugs and feature requests with the Wikisource community, at oldwikisource:Wikisource:ProofreadPage
Missing explanations
Part of the explanation is probably missing, because after following all the instructions I get nothing but black images or display errors when I click on the image tab. I tried one installation on the internet and one on my computer, and that is all I get each time. Mode41 16:23, 6 May 2010 (UTC)
How to perform steps 3 and 4?
Forgive the beginner, but how does one execute the SQL file? I tried executing ProofreadPage.sql in the command prompt and got nothing. And what about #4: am I correct in assuming that it is telling me to edit ProofreadPage.sql to use the correct prefix?
I'm assuming that my failure to complete these two steps is what caused me to get the error:
- Database returned error
"1146: Table 'mediawiki.pr_index' doesn't exist (localhost)".
Can anyone help? --Spangineer 04:26, 7 February 2011 (UTC)
- Two ways:
- run update.php, or
- copy/paste the content of the SQL file into phpMyAdmin.
Proofread error: "no such file"
Hi, technical question.
I've tried to install Proofread on a fresh MW site, and it gave the error above every time I tried to make an index page (the source came from archive.org and uploaded fine, even though there is no thumbnail on the File description page). I've tried this several times on id.wikisource too, so I've been stuck with this error for more than a year now. Can anyone help me?
Specs:
- MediaWiki 1.16.0
- PHP 5.2.10 (apache2handler)
- MySQL 5.0.77
- CheckUser (version 2.3), Collection (version 1.4), Cite, ParserFunctions, Poem, PDF Handler, ConfirmEdit, ProofreadPage (version 2009-04-20), SpamBlacklist
- confirmEditSetup, pr_main, wfRssExtension and wfSetupParserFunctions
- <pagelist>, <pagequality>, <pages> and <rss>
- expr, if, ifeq, ifexist, ifexpr, rel2abs, switch, time, timel and titleparts
I've added new namespaces (100-103) and updated the database, but I've got the following error when I saved the index page:
There is a syntax error in the database query. This may indicate a bug in the software. The last attempted database query was:
(SQL query hidden)
from within function "". The database returned error "1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '0,0,0,0,0)' at line 1 (localhost)".
Even though it generates an error, the index page was saved successfully. But it gave the "Error: no such file" message like the one in Index:Federal_Cases,_Volume_19.djvu. While File:Federal_Cases,_Volume_19.djvu was corrupt in that case, mine was okay. So again, I don't know what's wrong with my installation.
Many thanks in advance. Bennylin (talk) 13:15, 2 January 2012 (UTC) (crossposted from English Wikisource's Scriptorium)
No Image
I installed the extension on my wiki, but when I create a Page:xyz.jpg there is no image on the right side. Via Scan I can see the image. What's wrong? --87.146.17.135 13:10, 4 November 2012 (UTC)
- You may not have properly configured support for PDF, TIFF or DjVu files. Tpt (talk) 16:43, 4 November 2012 (UTC)
- These extensions are working fine. --87.146.7.92 18:30, 4 November 2012 (UTC)
- I'm having some trouble with the "Image" tab in the "Page:" namespace. It returns a 404 error. I've added the Apache rewrite rule to the .htaccess in my MediaWiki folder, but this continues. --Inops (talk) 23:59, 4 November 2012 (UTC)
- Are you using InstantCommons? If you give me the error messages you see and the format of the files you use, it would be easier to help. Tpt (talk) 08:25, 5 November 2012 (UTC)
- I am not using InstantCommons (I was thinking of using it though; does it have an adverse effect on the extension?). I am attempting to use the extension with a PDF file. The extension functions perfectly well with PDF otherwise (I don't require the use of .djvu), but this error is a minor niggle of mine. The "Image" tab returns a generic 404 error (the displayed message depends on the browser, obviously).
- The URL of the tab e.g. "/mediawiki/images/thumb/1/13/Example.pdf/page10-1275px-Example.pdf.jpg", this returns a 404 error. However, "/mediawiki/thumb.php?f=Example.pdf&width=1275&page=10" will return the JPG rendering of the specific page of the file, and thereafter "/mediawiki/images/thumb/1/13/Example.pdf/page10-1275px-Example.pdf.jpg" will correctly return that image.
- It seems that "mediawiki/images/thumb/" often produces the error, but "mediawiki/thumb.php?" does not, and does rather fix the former's error in a particular example. I assume this is to do with the creation of the thumbnail in the installation; thumb.php? creating the thumbnail and /thumb/ recounting the thumbnail. If the tab hyperlinked to the thumb.php address, this wouldn't be an issue. Is there a fix to this? Thanks, Jordan. --Inops (talk) 12:21, 5 November 2012 (UTC)
- Maybe it's because you haven't configured the thumbnail system properly. If you add the URL rewriting configuration described there, I hope that will solve your problem. Tpt (talk) 09:05, 6 November 2012 (UTC)
- I've tried the code you suggested, but this didn't seem to make a difference. I've decided to disable the broken (for me) function by disabling the code in ProofreadPage/ProofreadPage.body.php. Thanks for the help. :) --Inops (talk) 09:54, 6 November 2012 (UTC)
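As a stopgap for the behaviour described in this thread, where thumb.php renders a page and the /images/thumb/ path only works afterwards, the thumb.php URLs could be built by a small script and requested once per page. This is only a minimal sketch in Python: the function name and the "/mediawiki" script path are assumptions taken from the example URLs above, not part of the extension.

```python
from urllib.parse import urlencode


def thumb_php_url(script_path, file_name, width, page):
    """Build a thumb.php URL asking MediaWiki to render one page of a file.

    script_path is the wiki's script path (assumed here to be "/mediawiki").
    """
    query = urlencode({"f": file_name, "width": width, "page": page})
    return f"{script_path}/thumb.php?{query}"


# Same parameters as the example URL quoted in this thread.
print(thumb_php_url("/mediawiki", "Example.pdf", 1275, 10))
```

Requesting each such URL once should create the thumbnail that the "Image" tab later links to, if the installation behaves as described above.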
5050 error on save
I'm unable to save any page in the Page namespace. It returns a 5050 error. So, the extension is rendered useless. My error log gives:
- [07-Nov-2012 18:17:02] PHP Fatal error: Undefined class constant 'READ_LATEST' in D:\www\mediawiki\extensions\ProofreadPage\ProofreadPage.body.php on line 1238
Though, I am not sure if that's anything to do with it. Any help with this would be great. 90.220.162.151 18:22, 7 November 2012 (UTC)
- It's a constant that was introduced in MediaWiki 1.20, a version that was released today. So upgrading your MediaWiki installation to MediaWiki 1.20 will fix the problem. If you don't want to upgrade to 1.20, you can remove ", Revision::READ_LATEST" from line 1238 of the file D:\www\mediawiki\extensions\ProofreadPage\ProofreadPage.body.php and it will work (but this can introduce a bug if two people save the page at the same time).
- Thanks! Works perfectly. I had to install a newer version of Extension:Vector though. 90.220.162.151 00:25, 8 November 2012 (UTC)
Localization
How can new translations of the namespace names be added? I don't see them on translatewiki. --Bjarki S (talk) 01:20, 19 January 2013 (UTC)
- I've updated translatewiki:Translating:MediaWiki#Translating namespace names. In short, you can now file a request on bugzilla, MediaWiki extensions>ProofreadPage. Previously all wikis had to configure it locally. --Nemo 06:43, 19 January 2013 (UTC)
Alright. Thanks! --Bjarki S (talk) 20:58, 19 January 2013 (UTC)
no such file at wikilivres
The Wikilivres.ca website has been getting the "no such file" errors reported here at Proofread error: "no such file" for maybe six months. Could anyone here suggest a solution? Please see wikilivres:wikilivres:Community_Portal/en#djvu still reports no such file for examples of files with the problem, together with links to a database error occurring when the pagelist tag is used. Thank you. -84user (talk) 23:04, 19 January 2013 (UTC)
- Are you sure that DjVu support is properly configured on your server? I think that this is the cause of the issue. See DjVu for more information. Tpt (talk) 20:29, 20 January 2013 (UTC)
OCR software
I can't seem to find any information on the OCR software that this extension uses. We have just finished setting this up on the Icelandic Wikisource but there is a marked difference in the quality of the OCR retrieved from an English test document and an Icelandic one with the same font and font size. I am wondering if the support for the Icelandic language is not built into the OCR software or if there are any ways to improve it. Does the software learn by itself to recognize strange new characters like þ and ð if given enough practice or would that be a waste of time and effort? --Bjarki S (talk) 04:59, 20 January 2013 (UTC)
- The extension doesn't include any OCR software, it only extracts the text embedded in the PDF/DjVU files you're using: check what generated them. s:en:Help:DjVu files has a lot of advice on how to get DjVu with decent OCR; for instance, if you use archive.org don't forget to specify the language of the document in your metadata. --Nemo 09:44, 20 January 2013 (UTC)
- Hmm. So what does the "OCR" button do? Seems to work fine with PDFs without embedded text layer. --Bjarki S (talk) 17:15, 20 January 2013 (UTC)
- My information may be outdated, but are you sure it's a PDF without text or is just your PDF reader unable to read/select/copy it? pdfinfo or pdftotext commands would tell you. --Nemo 18:00, 20 January 2013 (UTC)
- The OCR button is managed by a script stored on oldwikisource that calls a script on the toolserver, which uses the free but not very good Tesseract OCR software configured for the English language. So if you use the script on an English text and on an Icelandic text, it's normal that the OCR is better for the English one. Tpt (talk) 20:34, 20 January 2013 (UTC)
- Alright, thanks! Seems like there is no free option available for Icelandic. --Bjarki S (talk) 04:18, 29 January 2013 (UTC)
- Eh, there are options available for Icelandic. There are three online services that have Icelandic OCR support: Archive.org, ocr-extract.com and newocr.com. Plus, there is an Icelandic language file for Tesseract on Google Code, and that is enough to make Tesseract compatible with Icelandic.--Snaevar (talk) 15:50, 29 January 2013 (UTC)
- I installed support for Icelandic on the toolserver. As for all languages with diacritics, the accuracy of the results depends a lot on the quality of the scan. Besides that, I also upgraded Tesseract, so results should be a bit better for all supported languages. Phe (talk) 22:29, 30 January 2013 (UTC)
- Hi, I'm a newbie at this, so forgive me if my questions are too simple. I've uploaded this file: http://commons.wikimedia.org/wiki/File:ChFSA_FD1197205170%281%29.djvu, which is in Spanish. Proofread Page doesn't seem to pick up the OCR. I converted the file to .djvu using Any2DjVu (Medium (300 dpi); Lossless; OCR (only works reliably for English text, locate columns automatically)). At this point I really don't care about the quality of the OCR (because it is in Spanish), only that Proofread Page actually performs the transclusion to the Wikisource page. Am I doing something wrong here? Thanks!--3BRBS (talk) 04:20, 2 February 2013 (UTC)
- Hi! If I've understood you correctly, the issue is that you don't manage to get the text layer of the DjVu (which contains a text layer) when you try to create pages in the Page: namespace? If so, it's very strange, because extraction of the text layer of this file works fine for me on my test wiki. Tpt (talk) 15:57, 2 February 2013 (UTC)
- Yes, you got it right, but after I wrote the message above, I uploaded a new version of the file, which I made sure had a text layer (I ran the Google script pdf2djvu, and quit using the online converter website). I had to install a viewer program to check that, but in the end I gave up, because I couldn't check whether the program (Proofread Page) could actually extract the text. If you say it works, that's great, but how can I check it on my own? Could you run the program on the three different versions of the file I uploaded so I can figure out what the problem was? (I think it is a failure of the website, because it said it ran the OCR, but I couldn't extract anything from it.) Thanks!! :D--3BRBS (talk) 13:45, 5 February 2013 (UTC)
- The first two versions of your file have no text layer according to evince and djView4. So Proofread Page doesn't extract a text layer because there is no text layer in the file, and there is no bug in Proofread Page. Tpt (talk) 17:06, 5 February 2013 (UTC)
- Thanks for taking the time to check. I believe there is a bug with the French website then, since it "OCR"ed the file but added no text layer. The third time, I used the Google script! Best.--3BRBS (talk) 21:59, 9 February 2013 (UTC)
Error
I am using MediaWiki in the "Malayalam" language. I installed this extension, but my Index pages still look similar to this. Please help.--Balasankarc (talk) 19:54, 16 April 2013 (UTC)
- You have to edit Mediawiki:Proofreadpage index template (the template that is output on index pages) and use there the parameters set up in Mediawiki:proofreadpage index attributes. Tpt (talk) 07:01, 25 April 2013 (UTC)
Refactoring of Code
I came across the project idea "Proofread Page extension needs to be refactored". I would like to know what the main problems in the code are that we would like to overcome when we refactor. Is there a set of features that the refactored code has to support in future releases? --Aarti Dwivedi
- The main problems are related to the editing system for Page: pages, which is currently a horrible hack; this part of the code has become too complicated to be modified easily without breaking everything. The goal is to implement it cleanly and make it compatible with the Visual Editor (see bugzilla:46616). Tpt (talk) 06:57, 25 April 2013 (UTC)
Edit the format of text area
Hi,
I would like to use a Semantic Form instead of the default text area on the left side. I intend to do this for digitizing a dictionary, so I need the image on the right and a semantic form on the left. Is there any way I can do this?--Balasankarc (talk) 16:26, 16 May 2013 (UTC)
OCR for Bengali wikisource
Hi, I am from Bengali Wikisource (bn.wikisource.org). There is an open-source OCR available at https://code.google.com/p/banglaocr/. Could you please add it for Bengali Wikisource? Jayantanth (talk) 08:48, 24 October 2013 (UTC)
- Hi, do you know if tesseract-ocr-3.02.ben.tar.gz, the Bengali language data for Tesseract 3.02 at [1], is the same thing and can be installed instead? Phe (talk) 16:29, 25 October 2013 (UTC)
- I installed the file I mentioned above and tried it on [2]. I'm unsure how bad the results are, but the OCR quality seems very poor :/ 18:45, 25 October 2013 (UTC)
- Thank you for installing it. I know that the OCR still needs some development. I have checked it several times offline on the desktop, and it works fine with good 300 dpi images. But here it is not responding. I am not sure about Tesseract 3.02. I am trying to contact the main developers, Md. Abul Hasnat & Murtoza Habib.Jayantanth (talk) 13:28, 26 October 2013 (UTC)
Hi Phe, I contacted the main developer, Md. Abul Hasnat. He replied with the answers below.
"Do you know if tesseract-ocr-3.02.ben.tar.gz Bengali language data for Tesseract 3.02 at [1] is the same thing and can be installed instead ?"
- My comment: In general, if you replace the Tesseract training file in the tessdata sub-directory inside the BanglaOCR software, it should provide results accordingly. The reason is that BanglaOCR uses Tesseract as an external OCR engine. However, several sources confirmed to me that Tesseract 3.02 still works better with the old training data than with tesseract-ocr-3.02.ben.tar.gz. I did not have a chance to validate this myself.
Many people may complain that BanglaOCR with tesseract provides extremely poor results.
- My comment: BanglaOCR is the first complete OCR framework that was released as open source with the aim of continuous development. Unfortunately, after version 0.7 no one worked on it, and hence in the past five years there has not been enough progress. It was giving reasonable results for certain types of documents. However, it was not extended to handle all types of documents.
What would you suggest for the tesseract-ocr-3 version?
Jayantanth (talk) 16:50, 13 April 2014 (UTC)
"Proofreadpage index data config" list of types
Shouldn't the list of types contain "langcode", as it is used by the "Language" field?
Currently, it says:
"Possibles values: string, number, page"
I don't have enough knowledge in this area to feel comfortable editing it myself. Are there other types that aren't listed?
OCR Languages
Is there a list somewhere of the languages that are supported by the toolserver script? It seems that it does not support Hebrew. Who should I ask to add support for Hebrew (it seems that Tesseract-OCR has a data file for Hebrew). Inkbug (talk) 06:47, 29 January 2014 (UTC)
- I installed it. Phe (talk) 13:04, 31 January 2014 (UTC)
- Thank you! Inkbug (talk) 16:14, 1 February 2014 (UTC)
API Documentation & Improvement
The Proofread Page extension adds two API hooks to the Query module: one meta hook for information ("proofreadinfo") and one property hook for the quality status ("proofread") of Proofread pages.
meta = proofreadinfo
Meta information about the configuration of the Proofread extension.
- Parameters
piprop
: Which proofread properties to get. Values (separate with '|'): namespaces, qualitylevels
namespaces
: Information about Page & Index namespaces (Default)
qualitylevels
: List of proofread quality levels (Default)
- Example
Result:
<api>
<query>
<proofreadnamespaces>
<index id="106" />
<page id="104" />
</proofreadnamespaces>
</query>
</api>
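The result above can also be read programmatically. The following Python sketch builds the request URL and extracts the namespace ids from the XML shown. The endpoint is the English Wikisource one used elsewhere on this page, the function names are illustrative only, and format=xml is assumed as the output format.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint: the English Wikisource API used in the examples on this page.
API_URL = "https://en.wikisource.org/w/api.php"


def proofreadinfo_url(props=("namespaces", "qualitylevels")):
    """Build a meta=proofreadinfo query URL; piprop values are joined with '|'."""
    params = {
        "action": "query",
        "meta": "proofreadinfo",
        "piprop": "|".join(props),
        "format": "xml",
    }
    return API_URL + "?" + urlencode(params)


def parse_namespaces(xml_text):
    """Return a dict such as {'index': 106, 'page': 104} from the XML result."""
    root = ET.fromstring(xml_text)
    node = root.find("./query/proofreadnamespaces")
    return {child.tag: int(child.get("id")) for child in node}


# Sample copied from the result shown above.
SAMPLE = """<api><query><proofreadnamespaces>
<index id="106" /><page id="104" />
</proofreadnamespaces></query></api>"""

print(proofreadinfo_url())
print(parse_namespaces(SAMPLE))
```

The "page" id is the value that later goes into gapnamespace; it should be looked up per wiki rather than hard-coded.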
prop = proofread
Proofread status of the given Index/Page. Index means the entire book.
Note: prop=proofread expects a namespace parameter gapnamespace, whose value comes from the previous API call.
gapprefix
: Name of the Index/Page
gapnamespace
: Page/Index namespace returned from meta=proofreadinfo
gapcontinue
: Appended for pagination.
gaplimit
: Limit the number of results per API call
- Example
Result:
<api>
<query-continue>
<allpages gapcontinue="Love_among_the_chickens_(1909).djvu/108" />
</query-continue>
<query>
<pages>
<page pageid="1639748" ns="104" title="Page:Love among the chickens (1909).djvu/1">
<proofread quality="0" quality_text="Without text" />
</page>
<page pageid="1639815" ns="104" title="Page:Love among the chickens (1909).djvu/10">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642429" ns="104" title="Page:Love among the chickens (1909).djvu/100">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642430" ns="104" title="Page:Love among the chickens (1909).djvu/101">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642431" ns="104" title="Page:Love among the chickens (1909).djvu/102">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642432" ns="104" title="Page:Love among the chickens (1909).djvu/103">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642433" ns="104" title="Page:Love among the chickens (1909).djvu/104">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642434" ns="104" title="Page:Love among the chickens (1909).djvu/105">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642435" ns="104" title="Page:Love among the chickens (1909).djvu/106">
<proofread quality="4" quality_text="Validated" />
</page>
<page pageid="1642436" ns="104" title="Page:Love among the chickens (1909).djvu/107">
<proofread quality="4" quality_text="Validated" />
</page>
</pages>
</query>
</api>
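A result of this shape can be reduced to a per-page quality table and a tally per quality level. This is a minimal Python sketch that only parses XML like the sample above; the function name is illustrative, and the sample is shortened to two of the pages shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def page_quality(xml_text):
    """Map each page title to (quality, quality_text) from a prop=proofread result."""
    root = ET.fromstring(xml_text)
    result = {}
    for page in root.iterfind("./query/pages/page"):
        proofread = page.find("proofread")
        if proofread is not None:  # pages without proofread data are skipped
            result[page.get("title")] = (
                int(proofread.get("quality")),
                proofread.get("quality_text"),
            )
    return result


# Two pages copied from the result shown above.
SAMPLE = """<api><query><pages>
<page pageid="1639748" ns="104" title="Page:Love among the chickens (1909).djvu/1">
<proofread quality="0" quality_text="Without text" /></page>
<page pageid="1639815" ns="104" title="Page:Love among the chickens (1909).djvu/10">
<proofread quality="4" quality_text="Validated" /></page>
</pages></query></api>"""

levels = page_quality(SAMPLE)
tally = Counter(text for _, text in levels.values())
print(tally)
```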
Inputs Notes/Sources
In the same context, any thoughts or points that help compile API-related notes are welcome.
Points could include,
- Use-case of API
- Existing components/projects/bots already using proofread API features
- Anything else.
w.r.t IRC chat
TPT suggests we need to specify the serialization format of Page: and Index: pages:
For Page: pages format see:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FProofreadPage.git/1c5685425ba4bc41c174552d5e61b1d4de343043/includes%2Fpage%2FPageContentHandler.php#L35
Some samples of the Page: pages serialization:
https://git.wikimedia.org/blob/mediawiki%2Fextensions%2FProofreadPage.git/1c5685425ba4bc41c174552d5e61b1d4de343043/tests%2Fincludes%2Fpage%2FPageContentHandlerTest.php#L26
Need expansion here
TPT && PHE conversation:
Every book has an index page, which contains all the meta information about the book.
E.g.:
https://en.wikisource.org/wiki/Index:Love_among_the_chickens_(1909).djvu
Thereafter, to get the proofread quality of each page inside the book, we can use the API in the following way. Two important things are needed for the API hook to work:
- GAPNAMESPACE
- GAPPREFIX
Note: All the parameters are pending to be documented
GAPPREFIX contains the index page of the book, and GAPNAMESPACE can be derived by querying meta=proofreadinfo.
http://en.wikisource.org/w/api.php?action=query&meta=proofreadinfo&piprop=namespaces%7Cqualitylevels
Result:
<api>
<query>
<proofreadnamespaces>
<index id="106"/>
<page id="104"/>
</proofreadnamespaces>
</query>
</api>
Note that the namespace number is different for every domain.
Thus an example API call can be:
Result:
<api>
<query-continue>
<allpages gapcontinue="Love_among_the_chickens_(1909).djvu/108"/>
</query-continue>
<query>
<pages>
<page pageid="1639748" ns="104" title="Page:Love among the chickens (1909).djvu/1">
<proofread quality="0" quality_text="Without text"/>
</page>
<page pageid="1639815" ns="104" title="Page:Love among the chickens (1909).djvu/10">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642429" ns="104" title="Page:Love among the chickens (1909).djvu/100">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642430" ns="104" title="Page:Love among the chickens (1909).djvu/101">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642431" ns="104" title="Page:Love among the chickens (1909).djvu/102">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642432" ns="104" title="Page:Love among the chickens (1909).djvu/103">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642433" ns="104" title="Page:Love among the chickens (1909).djvu/104">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642434" ns="104" title="Page:Love among the chickens (1909).djvu/105">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642435" ns="104" title="Page:Love among the chickens (1909).djvu/106">
<proofread quality="4" quality_text="Validated"/>
</page>
<page pageid="1642436" ns="104" title="Page:Love among the chickens (1909).djvu/107">
<proofread quality="4" quality_text="Validated"/>
</page>
</pages>
</query>
</api>
The index page for the same book can be found here:
https://en.wikisource.org/wiki/Index:Love_among_the_chickens_(1909).djvu
Note the GAPCONTINUE parameter in the result. It is used to iterate over further pages of the book.
GAPLIMIT can also be used to limit the number of pages returned in one API call.
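The iteration with GAPCONTINUE and GAPLIMIT could look like the following Python sketch. It assumes the query-continue XML format shown in the results above (other MediaWiki versions may report continuation differently) and takes the HTTP call as a caller-supplied fetch function, so the function name, the fetch interface and the "Book.djvu" data are all hypothetical.

```python
import xml.etree.ElementTree as ET


def iter_proofread_pages(fetch, prefix, namespace, limit=10):
    """Yield (title, quality) for every page of a book, following gapcontinue.

    fetch is any callable that takes a dict of API parameters and returns
    the XML response text (hypothetical interface, supplied by the caller).
    """
    params = {
        "action": "query",
        "generator": "allpages",
        "gapnamespace": str(namespace),
        "gapprefix": prefix,
        "gaplimit": str(limit),
        "prop": "proofread",
        "format": "xml",
    }
    while True:
        root = ET.fromstring(fetch(dict(params)))
        for page in root.iterfind("./query/pages/page"):
            proofread = page.find("proofread")
            quality = int(proofread.get("quality")) if proofread is not None else None
            yield page.get("title"), quality
        cont = root.find("./query-continue/allpages")
        if cont is None:
            break  # no continuation element: last batch reached
        params["gapcontinue"] = cont.get("gapcontinue")


def fake_fetch(params):
    """Stand-in for a real HTTP request: two canned batches for a made-up book."""
    if "gapcontinue" not in params:
        return ('<api><query-continue><allpages gapcontinue="Book.djvu/2" />'
                '</query-continue><query><pages>'
                '<page ns="104" title="Page:Book.djvu/1">'
                '<proofread quality="4" quality_text="Validated" /></page>'
                '</pages></query></api>')
    return ('<api><query><pages><page ns="104" title="Page:Book.djvu/2">'
            '<proofread quality="0" quality_text="Without text" /></page>'
            '</pages></query></api>')


pages = list(iter_proofread_pages(fake_fetch, "Book.djvu", 104, limit=1))
print(pages)
```

Passing the request function in keeps the loop testable without network access; a real caller would replace fake_fetch with a function that performs the HTTP request.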
w.r.t to Maillist replies
Gaurav Vaidya has created a Perl module to download an entire book from Wikisource.
It downloads a hypothetical “Index:Entire book.pdf” by:
- Using prop=imageinfo to get the number of pages for “File:Entire book.djvu".
- Using prop=revisions to download the Wikitext for each individual page from “Page:Entire book.djvu/1” to “Page:Entire book.djvu/9999” (if the image had 9,999 pages).
This will work for Wikisources that redirect “File:”, “Index:” and “Page:” into their local namespaces.
He suggests it might be helpful to have an API query that could return the proofread status for every page in an Index page.
Explanation of the generator parameters (needs to be documented someplace):
The first ‘g’ is for ‘generator’, the next two letters tell you which generator the parameters are intended for (‘ap’ = allpages). As far as I know (I might be wrong!), the way this API call works is:
- Mediawiki starts with ‘generator=allpages’; it takes all parameters that start with ‘g’ (‘gapnamespace’, ‘gapprefix’), strips out the leading ‘g’, and hands them over to the ‘allpages’ generator (https://www.mediawiki.org/wiki/API:Allpages).
- The allpages generator returns all pages starting with (‘apprefix’) “Love_among_the_chickens_(1909).djvu” in the namespace (‘apnamespace') 104 (which I think is the Wikisource namespace identifier for the “Page:” namespace).
- All the remaining parameters are treated as parameters to “action=query” as documented at https://www.mediawiki.org/wiki/API:Query — so for each page found by the allpages generator, Mediawiki will determine the props (in this case, only ‘proofread’ — but try adding other props to that list, such as ‘categories’ or ‘info’, from https://www.mediawiki.org/wiki/API:Properties). These are returned in the results.
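The prefix handling described in these three steps can be illustrated with a few lines of Python. This only mimics the explanation above for the purpose of illustration; it is not MediaWiki's actual code, and the function name is made up.

```python
def split_generator_params(params, generator_prefix="ap"):
    """Split API parameters into those for the generator and all the others.

    Keys starting with 'g' + generator_prefix (e.g. 'gapprefix') lose the
    leading 'g' and go to the generator; everything else stays with
    action=query.
    """
    generator_params, query_params = {}, {}
    for key, value in params.items():
        if key.startswith("g" + generator_prefix):
            generator_params[key[1:]] = value
        else:
            query_params[key] = value
    return generator_params, query_params


request = {
    "action": "query",
    "generator": "allpages",
    "gapnamespace": "104",
    "gapprefix": "Love_among_the_chickens_(1909).djvu",
    "prop": "proofread",
}
for_allpages, for_query = split_generator_params(request)
print(for_allpages)
print(for_query)
```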
TPT suggests describing both hooks properly, which is our end goal anyway. I'm writing a rough draft for this and will post it in this section. For reference I'll go with Thomas's suggestion https://www.mediawiki.org/wiki/API:Properties#imageinfo_.2F_ii.
--Kishanio (talk) 12:35, 17 May 2014 (UTC)
MobileFrontend Support
- Look at this mess: https://phonehomebook.org/de/index.php?title=Index:PRISM.pdf&mobileaction=toggle_view_mobile
- Could you at least please support backward/forward buttons on each single page of a document?
Metadata Encoding and Transmission Standard
See [Wikisource-l] [BEIC] METS and structural map; I think w:METS is very relevant to this extension. --Federico Leva (BEIC) (talk) 15:14, 24 September 2014 (UTC)
Recto/verso numbering through <pagelist/>?
A lot of early books were numbered by leaf instead of by page-side, meaning when you open the book only the right-side page has a number. The page numbering is usually expressed in the same way as manuscripts, i.e. page number and recto or verso (see the image to the right). At the moment, it seems like there's no support for this style of numbering in the <pagelist/> function and the numbering has to be entered by hand for each page, like this:
<pagelist 1="1" 2="1v" 3="2" 4="2v" 5="3" 6="3v" ... />
Is there some way to achieve this that I'm missing? Or can "folio" (and "folioroman") styles be added so we can just mark them like this?
<pagelist from=1 to=50 1to50="folio" />
Cross-posted with old Wikisource because I'm not sure where the best place to request this is.
~ Michael Chidester (Contact) 19:18, 2 October 2014 (UTC)
- The best place for this request is on Bugzilla. Feel free to open a bug on it. Tpt (talk) 08:55, 6 October 2014 (UTC)
- This feature request is tracked on task T73821. Candalua (talk) 15:34, 7 May 2018 (UTC)
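Until such a style exists, the hand-entered attributes from the request above can at least be generated by a script. This is a minimal Python sketch with a made-up function name; it assumes, as in the example above, that the recto carries the bare leaf number and the verso the leaf number followed by "v".

```python
def folio_pagelist(first_scan, last_scan, first_leaf=1):
    """Build a <pagelist/> tag numbering scans by leaf: 1, 1v, 2, 2v, ..."""
    attributes = []
    for offset, scan in enumerate(range(first_scan, last_scan + 1)):
        leaf = first_leaf + offset // 2        # two scans (recto, verso) per leaf
        side = "" if offset % 2 == 0 else "v"  # recto unmarked, verso marked "v"
        attributes.append(f'{scan}="{leaf}{side}"')
    return "<pagelist " + " ".join(attributes) + " />"


print(folio_pagelist(1, 6))
```

For scans 1 to 6 this reproduces the attributes written out by hand in the request above.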
Translation
Where is the page that hosts the variables like {{{year}}} or {{{Publisher}}} used on the Index page? -- Revi 11:57, 19 January 2015 (UTC)
- Have you tried following translatewiki:FAQ#Finding messages? --Nemo 07:59, 20 January 2015 (UTC)
- Of course I already did before posting the question, but I cannot find a value like {{{Illustrator|}}} on MediaWiki:Proofreadpage index template@enwikisource. -- Revi 08:39, 12 February 2015 (UTC)
- I'm confused. s:en:MediaWiki:Proofreadpage index template has the variables you mentioned: you found it! Looks like the index just works like a template, passing the parameters to this MediaWiki namespace page. Local templates are not translated anywhere, you can only port them. --Nemo 08:44, 12 February 2015 (UTC)
- You should be confused: messages, parameters, labels and/or values directly or indirectly related to the Proofread Page extension are spread across multiple places and multiple pages. I wish the entire thing was redesigned from scratch for reasons like that.
I think Revi is also looking for... - ... in addition to the pseudo template already given. oldwikisource:MediaWiki:Base.js & oldwikisource:MediaWiki:Common.js are two other possible places for "message listings". -- George Orwell III (talk) 00:38, 14 February 2015 (UTC)
Proofreadpage_data_config validation
I created a Proofreadpage_data_config and saved it without problems. The first time I tried to create a new index page, I got an error message "invalid JSON" and the index page creation failed. The JSON does indeed have an error, but Proofreadpage_data_config is frozen and cannot be edited, which means having to jump through hoops to correct it. Lesson: Instead of checking the JSON before each use, it should be checked once before it's saved. 2A02:2149:A000:8200:216:76FF:FE91:2064 19:13, 2 August 2015 (UTC)
API example to create the Index page
Could someone please provide an example of using the API to fill in the fields of the Index page while creating it? Just passing form data like wpprpindex-Title=this and wpprpindex-Year=that in the POST returns a warning "Unrecognized parameters: 'wpprpindex-Title" etc and creates the page but does not populate the index fields. Zenonp (talk) 08:14, 15 August 2015 (UTC)
Solution thanks to This, that and the other:
- Go to non-existent index page and hit Edit
- On the edit page, hit "Show changes"
- Pass the template shown on the right as the text body to the API
Wikisource example with mwclient:
text = (
    "{{:MediaWiki:Proofreadpage_index_template"
    + "\n|Type=" + type
    + "\n|Title=" + title
    + "\n|Language=" + lang
    # ... (remaining fields elided in the original post)
    + "\n|Footer=" + foot
    + "\n}}"
)
summary = 'automatic creation'
# page is an mwclient page object for the new Index: page
res = page.save(text, summary)
if res['result'] != 'Success':
    print('Failed!')
Zenonp (talk) 08:15, 2 September 2015 (UTC)
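The string concatenation above can also be written as a small helper that serializes a dictionary of fields. This is a sketch only: `build_index_text` is a hypothetical name, the field values are invented, and the field names have to match whatever the wiki's own index configuration defines. It makes no network calls.

```python
def build_index_text(fields):
    """Serialize index fields into the template call shown by "Show changes"."""
    lines = ["{{:MediaWiki:Proofreadpage_index_template"]
    for name, value in fields.items():
        lines.append(f"|{name}={value}")
    lines.append("}}")
    return "\n".join(lines)


# Hypothetical values for illustration only.
fields = {"Type": "book", "Title": "Example title", "Language": "en", "Footer": ""}
text = build_index_text(fields)
print(text)

# Saving would then follow the post above, with an mwclient page object:
#   page.save(text, 'automatic creation')
```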
Could you remove space characters between pages in Thai, Chinese, Japanese etc ?
Hi. This extension inserts a space character between pages. Many European languages put spaces between words, but some Asian languages, such as Thai, Mandarin, Cantonese and Japanese, do not. Could you add a parameter to turn this space insertion on or off for these languages? Thank you for your maintenance. --Akaniji (talk) 10:02, 15 October 2015 (UTC)
- Akaniji: this has been recently implemented in phab:T60729. Wikis which need to suppress the space between pages will be able to open a site request, asking to set wgProofreadPagePageSeparator = "". zh.source has already been notified, can you spread the news to others? --Candalua (talk) 15:08, 7 May 2018 (UTC)
Categories and radio buttons
By clicking on the radio buttons underneath the page, a category is added to the summary text box. Unfortunately the text selected for those categories in my local Wikisource (fa.wikisource.org) is not correct. We have decided to change it, but we don't know how. Could you please tell us? --Yoosef Pooranvary (talk) 21:36, 13 June 2017 (UTC)
The content model 'proofread-page' is not registered on this wiki.
When I run refreshLinks.php, I constantly get the following error message:
[84df081f98fd28079f36f0b3] [no req] MWUnknownContentModelException from line 306 of /var/www/html/includes/content/ContentHandler.php: The content model 'proofread-page' is not registered on this wiki. See https://www.mediawiki.org/wiki/Content_handlers to find out which extensions handle this content model.
Backtrace:
#0 /var/www/html/includes/content/ContentHandler.php(243): ContentHandler::getForModelID(string)
#1 /var/www/html/includes/Title.php(4746): ContentHandler::getForTitle(Title)
#2 /var/www/html/includes/parser/Parser.php(895): Title->getPageLanguage()
#3 /var/www/html/includes/parser/Parser.php(2129): Parser->getTargetLanguage()
#4 /var/www/html/includes/parser/Parser.php(2094): Parser->replaceInternalLinks2(string)
#5 /var/www/html/includes/parser/Parser.php(1322): Parser->replaceInternalLinks(string)
#6 /var/www/html/includes/parser/Parser.php(451): Parser->internalParse(string)
#7 /var/www/html/includes/content/WikitextContent.php(330): Parser->parse(string, Title, ParserOptions, boolean, boolean, NULL)
#8 /var/www/html/includes/content/AbstractContent.php(497): WikitextContent->fillParserOutput(Title, NULL, ParserOptions, boolean, ParserOutput)
#9 /var/www/html/includes/content/AbstractContent.php(230): AbstractContent->getParserOutput(Title, NULL, ParserOptions, boolean)
#10 /var/www/html/maintenance/refreshLinks.php(278): AbstractContent->getSecondaryDataUpdates(Title, NULL, boolean)
#11 /var/www/html/maintenance/refreshLinks.php(199): RefreshLinks::fixLinksFromArticle(integer, boolean)
#12 /var/www/html/maintenance/refreshLinks.php(82): RefreshLinks->doRefreshLinks(integer, boolean, string, boolean, boolean)
#13 /var/www/html/maintenance/doMaintenance.php(111): RefreshLinks->execute()
#14 /var/www/html/maintenance/refreshLinks.php(495): require_once(string)
#15 {main}
I think I have done everything that the installation guide says. The extension is working as expected, and I have created multiple pages in the Page and Index namespaces. I have run update.php, but it does not solve the problem.
--Magol (talk) 20:20, 7 October 2017 (UTC)
- Did you solve the problem? I am having a similar issue --Loman87 (talk) 08:47, 17 January 2018 (UTC)
- It's an interesting problem. Which version of MediaWiki are you using? Tpt (talk) 20:00, 31 May 2018 (UTC)
This page is not enabled for semantic in-text annotations due to namespace restrictions
I get this error. The created index page is empty. — Preceding unsigned comment added by 83.175.70.68 (talk • contribs) 13:03, 24 September 2018 (UTC)
- What did you do? On which wiki? What is installed on that wiki? Server logs (if available)? Etc., etc. —Tacsipacsi (talk) 17:49, 24 September 2018 (UTC)
- MediaWiki v. 1.27.4; I followed the installation and configuration guide of the Proofread Page extension. All requirements and recommendations (LabeledSectionTransclusion (strongly recommended), Cite (default page footer contains <references />), Poem, PdfHandler (may require additional PHP packages; adds PDF support), PagedTiffHandler, ParserFunctions) are installed.
- I have the same problem with the same message. At first I thought I had to declare the new namespaces (Index, Page) to SMW with the parameter "$smwgNamespacesWithSemanticLinks", but that didn't work either. My wiki is still closed (in development), so I cannot give a link here. How can I activate Proofread with annotations?
Is there any information about how I can use Semantic MediaWiki in combination with Proofread Page? How can I configure the wiki so that it recognizes the SMW annotations? (talk)
How do I set proofread quality level of a Page via the API?
How can I create a Page entry and set the proofread_page_quality_level via the API? I see that I can use the API to read the quality level, but the documentation gives no indication of how to set it.
The use case is that I wish to import a partially proofread work from another source. /Lokal_Profil (talk) 21:20, 5 November 2018 (UTC)
- Or is it only stored in the wikitext of the page? /Lokal_Profil (talk) 21:23, 5 November 2018 (UTC)
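On the follow-up question, here is a minimal sketch of what an import script might build. It rests on an assumption that nobody in this thread confirms: that the quality level is serialized in the Page: wikitext as a `<pagequality>` tag inside the leading `<noinclude>` block, which is how Page: pages look on Wikisource. The helper name `build_page_wikitext` is hypothetical, and whether the wiki accepts an arbitrary level from an arbitrary user is a separate matter.

```python
from html import escape


def build_page_wikitext(body, level, user, header="", footer=""):
    """Assemble Page: wikitext with an embedded quality level (assumed format)."""
    if level not in range(5):
        raise ValueError("quality level must be between 0 and 4")
    # Assumed serialization: quality tag and header share the first <noinclude>.
    quality = f'<pagequality level="{level}" user="{escape(user, quote=True)}" />'
    return f"<noinclude>{quality}{header}</noinclude>{body}<noinclude>{footer}</noinclude>"


print(build_page_wikitext("Transcribed text of the page.", 3, "Example"))
```

The resulting string would be passed as the page text in an ordinary edit request.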
Problem after update to last git version
After the latest update, all pages on my wiki that use the <pages /> tag stopped working (see the index page Index:roll_1907.djvu), and editing an index page shows only the wikitext, without the form, with this notice:
You cannot edit this revision because its content model is wikitext, which differs from the current content model of the page proofread-index.
Where is the problem? Want (talk) 22:28, 29 March 2019 (UTC)
- @Want: I see “No such action” errors everywhere, not even the above-mentioned one. Maybe you should avoid using newer MediaWiki than Wikisource’s (Wikisource uses f51b1f3 from 28th March, while you use a33eae8 from 29th). —Tacsipacsi (talk) 14:38, 30 March 2019 (UTC)
- @Tacsipacsi: That is normal. My wiki uses my AccessControl extension, and full access is allowed only for logged-in users. In my opinion the problem is not the MediaWiki version but PHP, because the upgrade was from PHP 5.6 to PHP 7.3. Besides that, when I explored the code from Wikisource, I found that the template MediaWiki:Proofreadpage header template is redirected to a page that uses a Lua module. My wiki doesn't use Lua modules. After the update:
- The navigation tabs disappeared from the pages of books.
- The wiki now ignores the content of some Proofread templates, and the <pages /> element no longer works.
- I know how to work around these problems. To include pages from books I can use the DPL extension, and the content of index pages can be managed with a template and the Page Forms extension. But I'd like to know what the problem might be. —Want (talk) 16:39, 1 April 2019 (UTC)
- Resolved! The core of my problem was the changes made to the code after release REL_1.30, when verification of the content model of index pages was added to the extension. My upgrade was from MediaWiki 1.29 to 1.33-alpha (now I use 1.34-alpha), and my index pages were of the wikitext type. I wanted to change the type to Book index, but that choice was not offered in the drop-down menu on the special page for changing a page's content model.
- Only by chance did I find the solution: if I first change the content model to plain text, the 'Book index' choice then becomes available in the select. Thanks Tpt for your help. --Want (talk) 21:17, 15 April 2019 (UTC)
- By the way, have you run maintenance/update.php after upgrading MediaWiki? It should call a ProofreadPage script that converts all Index: and Page: pages to the correct content model. Tpt (talk) 07:49, 16 April 2019 (UTC)
- @Tpt: Yes. It is the first thing I do after upgrading the code. Before that there was only a composer update. For this reason I updated all code to the master git branch. In my opinion the problem was a different content type for the index pages. See the message that I copied here at the start. The original type of the index pages was wikitext, not proofread-index, but the extension code assumed that a page in the index namespace must exist as type proofread-index. The update script could not have suspected that the content in the database was created before a content type for these pages was introduced. --Want (talk) 10:08, 16 April 2019 (UTC)
Converting index_attributes and js_attributes to data_config
I'm looking to convert a wikisource from the old config style to the new JSON style. While it is described how index_attributes relates to the JSON, there is no info (or any documentation at all that I could find) on what MediaWiki:Proofreadpage_js_attributes is supposed to do or how it converts to the new JSON format. /Lokal_Profil (talk) 18:52, 5 April 2019 (UTC)
- @Lokal Profil: Listing a field ID there is equivalent to setting "header": true for it in the JSON version, whatever that means… —Tacsipacsi (talk) 19:11, 5 April 2019 (UTC)
- @Tacsipacsi: Many thanks. That was my suspicion but since there wasn't a 1-to-1 correlation between the deleted js_attributes on oldwikisource and its new data_config I became unsure. I'll add a note about this in the config section. /Lokal_Profil (talk) 06:37, 6 April 2019 (UTC)
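The equivalence described in the reply can be sketched as a small conversion step. Two things here are assumptions rather than anything stated in this thread: that the old js_attributes message is a whitespace-separated list of field IDs, and that the JSON config is an object keyed by field ID. The function name `apply_js_attributes` and the sample config are hypothetical.

```python
import json


def apply_js_attributes(data_config, js_attributes):
    """Set "header": true for every field ID listed in js_attributes."""
    # Copy each field's settings so the input config is left untouched.
    updated = {field: dict(settings) for field, settings in data_config.items()}
    for field_id in js_attributes.split():
        # Fields missing from the JSON get a bare entry so nothing is silently dropped.
        updated.setdefault(field_id, {})["header"] = True
    return updated


config = {"Title": {"type": "string", "header": False}, "Year": {"type": "string"}}
converted = apply_js_attributes(config, "Title Author")
print(json.dumps(converted, indent=2))
```

Fields that only appear in js_attributes (here "Author") would still need their other settings filled in by hand.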
Pagequality
I want to make a template in fawikisource to show the percentage of proofread pages of a book in a colored bar. What I need is a set of variables (or whatever they are called) to build the bar from. I wonder if they are available right now? --Yoosef Pooranvary (talk) 14:00, 9 July 2019 (UTC)
- @Yoosef Pooranvary: No, they are not available from wikitext (templates), only through the API (for use in gadgets and off-wiki software like bots). —Tacsipacsi (talk) 18:40, 10 July 2019 (UTC)
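For the bot or gadget route mentioned in the reply, the arithmetic behind such a bar is simple once the per-page quality levels have been fetched. The sketch below leaves the API call out entirely and starts from a plain list of levels; `quality_percentages` is a hypothetical helper name, the sample data is invented, and the level names follow the usual ProofreadPage scale from 0 to 4.

```python
from collections import Counter

QUALITY_NAMES = {
    0: "without text",
    1: "not proofread",
    2: "problematic",
    3: "proofread",
    4: "validated",
}


def quality_percentages(levels):
    """Return the share of pages at each quality level, in percent."""
    counts = Counter(levels)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in QUALITY_NAMES.values()}
    return {
        name: 100.0 * counts.get(level, 0) / total
        for level, name in QUALITY_NAMES.items()
    }


# Invented sample: one quality level per page of a book.
levels = [0, 1, 3, 3, 4, 4, 4, 1]
shares = quality_percentages(levels)
print(shares)
```

Each percentage can then be used as the width of one colored segment of the bar.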
Turn page buttons in the footer
I don't know if this could be helpful to anyone, but I want to share a trick for adding next and previous page buttons to the footer of each page in the Page namespace, using parser functions.
<div style="margin-top:10px"><span style="float:left; font-size:xx-large">{{#ifexist: {{#titleparts: {{FULLPAGENAME}}|1|1}}/{{#expr:{{#titleparts: {{PAGENAME}}|2|2}}-1}}|[[{{#titleparts: {{FULLPAGENAME}}|1|1}}/{{#expr: {{#titleparts: {{PAGENAME}}|2|2}}-1}}|◅]]|}}</span><span style="float:right; font-size:xx-large">{{#ifexist: {{#titleparts: {{FULLPAGENAME}}|1|1}}/{{#expr:{{#titleparts: {{PAGENAME}}|2|2}}+1}}|[[{{#titleparts: {{FULLPAGENAME}}|1|1}}/{{#expr: {{#titleparts: {{PAGENAME}}|2|2}}+1}}|▻]]|}}</span></div>
I know that this solution is pretty trivial, but it works fine; I call it via a template. I needed it because on my wiki I don't transclude pages and users read texts in the Page namespace, so for them it's more comfortable to have a turn page button at the bottom of the page, instead of scrolling up and clicking the one at the top.
Bye!
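For anyone who finds the nested parser functions hard to read, the title arithmetic they perform can be mirrored in a few lines of Python. This is just an illustration of the logic, not something to install on a wiki: `neighbour_titles` is a hypothetical name, and the #ifexist check from the wikitext is not reproduced.

```python
def neighbour_titles(full_page_name):
    """Return (previous, next) page titles for a title like 'Page:Book.djvu/5'."""
    parts = full_page_name.split("/")
    # Like the wikitext, use the first segment as the base and the second as the number.
    if len(parts) < 2 or not parts[1].isdigit():
        return None, None
    base, number = parts[0], int(parts[1])
    return f"{base}/{number - 1}", f"{base}/{number + 1}"


print(neighbour_titles("Page:Book.djvu/5"))
```

As in the wikitext version, a file name that itself contains a slash would confuse the segment counting.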
Is there a way for lua to get page name set in pagelist?
Hi!
I want to get the page name set in the pagelist from Lua when writing this. How can I do that?--維基小霸王 (talk) 14:30, 12 June 2020 (UTC)
Hide the link to the source
Hi! I need to keep the index loading, but hide the top link. How do I hide the link to the source? (Top green state of proofreading). Thanks --Arxivist (talk) 18:39, 4 September 2020 (UTC)
Screenshot?
No picture of a visual tool? How odd!
Problem with DjVu files after upgrade to REL1_39
Hi,
MW and extensions were upgraded from 1.35 to 1.39. Index pages for .djvu files now return no pages ("invalid interval"). Accessing a .djvu page by its page number also renders nothing. There are no errors in the MW debug log. PDF files do work, but the majority of our scanned files are in DjVu format. Is anyone else seeing this problem? Any hints on fixing it?
Thank you. Sugarbravo (talk) 16:16, 9 January 2023 (UTC)
- Changes made to includes/media/DjVuHandler.php from 1.35 to 1.37 cause the Parser to return empty objects for .djvu files. These are used by parser tag hooks defined in Changes made to includes/media/DjVuHandler.php.
- Copying DjVuHandler.php from 1.35 over the one in 1.39 lets Proofread Page work with .djvu files again. Not an ideal solution... Sugarbravo (talk) 17:36, 11 January 2023 (UTC)
- @Sugarbravo, I found that purging the file page on Commons and the index page on Wikisource fixed the issue when I saw problems with recent DjVu uploads. Arjunaraoc (talk) 05:45, 24 November 2024 (UTC)
PHP Fatal error: Cannot declare class ParserOutput, because the name is already in use
Hi, I get this error when I try to use the "Link to the index page" of a PDF file on my wiki, e.g. here. The file, as you can see, is not displayed properly. This is another issue I can't solve; I tried everything indicated in the installation guides for the extension and its dependencies. I think it's something very basic that I am missing: does anyone have any idea? My configuration is here
CodeMirror for headers and footers
(@Tpt) Just wondering, but how outlandish would it be to get that?
It certainly would be useful, for syntax coloring, and for ctrl-z to work after .val()ing. — Alien 3 3 3 17:55, 16 September 2024 (UTC)