Help talk:Extension:Wikisource/Wikimedia OCR

OCR images on Commons in specified categories

2 comments • 23:56, 11 October 2024

Hello, this seems like a very useful tool. I think it has much greater potential and could be used far more widely if it worked on Commons categories or PetScan category intersections. @Samwilson and others, could you please take a look at this proposal? (now here)

Maybe there is already a tool that scans all images in a category; it could be adapted to also accept PetScan results and to add categories based on the text found. This would be useful for many applications.

Edited 13:28, 21 September 2024

Prototyperspective

I exported the PetScan file results and converted them to URLs so they can be opened quickly in new tabs for categorization. I still think a feature to OCR the files in a specified category would be very useful. Instead of adding categories directly, the tool could write the OCR text to the file description page; one could then build a query on Special:Search and bulk-categorize the results with Cat-a-lot. For example, something like ocr:", 2016" deepcategory:"Our World in Data maps" (or insource:"|ocr=, 2016") would find files that belong in c:Category:2016 maps of the world (except for non-world maps, which are easy to spot). This is just an example.
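The PetScan-to-URL conversion described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the export is a plain list of file titles, one per line, with or without the File: prefix (PetScan's actual export columns may differ); it builds standard https://commons.wikimedia.org/wiki/ page URLs. The helper names are hypothetical.

```python
from urllib.parse import quote

# Base URL for Commons pages (standard MediaWiki article path).
COMMONS_WIKI = "https://commons.wikimedia.org/wiki/"


def title_to_url(title: str) -> str:
    """Turn 'Foo bar.png' or 'File:Foo bar.png' into a Commons file page URL.

    Hypothetical helper; assumes one bare file title per input string.
    """
    title = title.strip()
    if not title.startswith("File:"):
        title = "File:" + title
    # MediaWiki page URLs use underscores in place of spaces.
    title = title.replace(" ", "_")
    # Percent-encode non-ASCII and reserved characters, keeping ':' and a few
    # characters that commonly appear unescaped in page paths.
    return COMMONS_WIKI + quote(title, safe=":/_(),")


def titles_to_urls(lines):
    """Convert an iterable of exported titles to URLs, skipping blank lines."""
    return [title_to_url(line) for line in lines if line.strip()]


if __name__ == "__main__":
    export = ["Our World in Data map 2016.png", "File:Another map.svg", ""]
    for url in titles_to_urls(export):
        print(url)
```

Printing one URL per line makes it easy to paste the list into a "open all links" browser extension or to open the pages in batches.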

Adding a feature to OCR all files in a category using the incategory search operator
Adding a feature to write the OCR'd text to the file description
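For the first feature, one way to list the files matching an incategory: (or deepcategory:) query is the MediaWiki Action API's list=search module on Commons, restricted to the File namespace (6). The sketch below only builds the request URL and does not call the API; the helper names, the category, and the example query are illustrative assumptions. The ocr: keyword floated above is only a proposal, so the sketch sticks to incategory:, deepcategory: and insource:.

```python
from urllib.parse import urlencode

# Commons Action API endpoint.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def category_query(category: str, deep: bool = False, extra: str = "") -> str:
    """Compose a CirrusSearch query using incategory: or deepcategory:.

    'extra' can carry further keywords such as insource:"...".
    Hypothetical helper for illustration.
    """
    keyword = "deepcategory" if deep else "incategory"
    query = f'{keyword}:"{category}"'
    # Prepend any extra terms; strip() drops the space when extra is empty.
    return f"{extra} {query}".strip()


def file_search_url(query: str, limit: int = 50) -> str:
    """Build an Action API URL that searches the File: namespace on Commons."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srnamespace": 6,  # File namespace
        "srlimit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


if __name__ == "__main__":
    q = category_query("Our World in Data maps", deep=True, extra='insource:"2016"')
    print(q)
    print(file_search_url(q))
```

A batch OCR feature could page through such results (following the API's continuation parameters) and hand each file to the OCR tool; that part is left out here because it depends on how the tool would expose batch processing.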

@Enhancing999 and Glrx: you may be interested in this since you participated in the discussion. That said, I don't think it's an especially important issue, and having so much OCR'd text on Commons could cause problems if files show up in ordinary searches for terms in the ocr field of the Information template(?), rather than only when searched with something like ocr:"search terms". However, since the OCR tool already exists, implementing this might not take much time and could be worth it; alternatively, it may be better to track this somewhere else.

23:56, 11 October 2024

Translation markup problems

2 comments • 01:37, 11 September 2023

Shooke

@SGill (WMF): This page has a translation problem: the translation markup is not correct, and the translation system does not work. See the Spanish version. Shooke (talk) 15:05, 10 September 2023 (UTC)

15:05, 10 September 2023

Samwilson

@Shooke: Sorry to hear it's not working. I think we need some more detail in order to debug. Can you link to an example on Spanish Wikisource where this problem is occurring? Do you get any error message? Thanks!

01:37, 11 September 2023

Tesseract OCR: How to save an advanced option?

One comment • 16:27, 26 September 2022

Yug

Is there a way to "save" my advanced option psm=4 on wikisource.org itself?

Discussion moved to phab:T318594.

Edited 16:27, 26 September 2022

How do I enable it?

One comment • 01:04, 15 December 2021

2601:280:4200:3B9:5161:F12B:6669:9A47

I included wfLoadExtension( 'Wikisource' ); and $wgWikisourceEnableOcr = true; in my LocalSettings.php, but I don't see the transcribe text button when editing. I'm using wfLoadExtension( 'WikiEditor' ); for the toolbar.

01:04, 15 December 2021