Help talk:Extension:Wikisource/Wikimedia OCR

OCR images on Commons in specified categories

Prototyperspective (talk | contribs)

Hello, this seems like a very useful tool. I think it has much greater potential and could be used far more widely if it could also be run on Commons categories or PetScan category intersections. @Samwilson and others, could you please take a look at this proposal (now here)?

Maybe there is already a tool that scans all images in a category. It could perhaps be adapted to also accept PetScan results and to add categories based on the text it finds. That would be useful for many applications.
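A minimal Python sketch of the first half of this idea is below. It lists the files in a Commons category through the MediaWiki Action API, using a categorymembers generator with imageinfo URLs and following API continuation, and it builds one OCR request per file. The endpoint https://ocr.wmcloud.org/api.php and its engine, langs[] and image parameters are assumptions about the Wikimedia OCR tool's API. The injected fetch function is a hypothetical stand-in for whatever HTTP client is used.

```python
from typing import Callable, Iterator, Tuple
from urllib.parse import urlencode

# Assumed endpoint of the Wikimedia OCR tool's API; verify before use.
OCR_API = "https://ocr.wmcloud.org/api.php"


def category_files(
    category: str, fetch: Callable[[dict], dict]
) -> Iterator[Tuple[str, str]]:
    """Yield (file title, original file URL) for every file in a category.

    `fetch` receives the Action API query parameters and must return the
    decoded JSON response; it is injected so no HTTP client is assumed.
    """
    params = {
        "action": "query",
        "format": "json",
        "formatversion": "2",
        "generator": "categorymembers",
        "gcmtitle": category,
        "gcmtype": "file",
        "gcmlimit": "50",
        "prop": "imageinfo",
        "iiprop": "url",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("pages", []):
            info = page.get("imageinfo")
            if info:
                yield page["title"], info[0]["url"]
        if "continue" not in data:
            break
        # Copy the continuation tokens (e.g. gcmcontinue) into the next request.
        params = {**params, **data["continue"]}


def ocr_request_url(image_url: str, lang: str = "en") -> str:
    """Build a Tesseract OCR request URL for one image (parameter names assumed)."""
    query = urlencode({"engine": "tesseract", "langs[]": lang, "image": image_url})
    return f"{OCR_API}?{query}"
```

With the requests library, fetch could be lambda p: requests.get("https://commons.wikimedia.org/w/api.php", params=p).json(), ideally sent with a descriptive User-Agent header. For a PetScan intersection, the exported file list would replace the category listing step. ocr_request_url only needs each file's original upload URL.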

Prototyperspective (talk | contribs)

I exported the PetScan results and converted them to URLs so that they can be opened quickly in new tabs for categorization. I still think a feature to OCR all files in a specified category would be very useful. Instead of adding categories directly, the tool could write the OCR text into the file information. One could then write a search query and bulk-categorize the results from Special:Search using Cat-a-lot. For example, the files found by something like ocr:", 2016" deepcategory:"Our World in Data maps" (or insource:"|ocr=, 2016") would go into c:Category:2016 maps of the world, except for maps that do not show the whole world, which are easy to spot. This is just one example.
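The categorization step could be sketched as a small rule table that maps patterns in a file's OCR text to suggested categories, plus a helper that builds the search string from the example above. Both the rule table and the |ocr= field in the wikitext are hypothetical. Also note that a quoted insource: search matches words rather than exact characters, so the comma may be ignored. Matching it literally would need an insource:/regex/ search instead.

```python
import re
from typing import List

# Hypothetical rule table: a pattern to look for in a file's OCR text and the
# category to suggest when it matches. The single rule mirrors the ", 2016"
# example; suggestions still need a human check (e.g. to drop maps that do
# not show the whole world).
RULES = [
    (re.compile(r",\s*2016\b"), "Category:2016 maps of the world"),
]


def suggest_categories(ocr_text: str, rules=RULES) -> List[str]:
    """Return the sorted, de-duplicated categories whose pattern occurs in the text."""
    return sorted({category for pattern, category in rules if pattern.search(ocr_text)})


def search_query(phrase: str, deep_category: str) -> str:
    """Build a Special:Search query over a hypothetical |ocr= field in the wikitext."""
    return f'insource:"|ocr={phrase}" deepcategory:"{deep_category}"'
```

Keeping the rules as data rather than code means new year/category pairs can be added without touching the matching logic.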

  • Adding a feature to OCR all files in a category, using the incategory search operator
  • Adding a feature to write the OCR'd text to the file description
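The second feature could look roughly like the following Python sketch. It appends an |ocr= parameter to the {{Information}} template in a file page's wikitext. Whether Information should have such a field is part of the open question here, so the "ocr" name is an assumption. The sketch escapes characters that would otherwise break the template and finds the template's closing braces by counting nested {{ }} pairs. It does not handle triple-brace parameters or an already existing ocr field.

```python
# Characters in OCR output that would break template syntax or create links
# are replaced with HTML entities, which MediaWiki renders as the original text.
_ESCAPES = str.maketrans({
    "&": "&amp;", "<": "&lt;", "|": "&#124;",
    "{": "&#123;", "}": "&#125;", "[": "&#91;", "]": "&#93;",
})


def add_ocr_field(wikitext: str, ocr_text: str, template: str = "Information") -> str:
    """Insert a hypothetical |ocr= parameter just before the template's closing braces."""
    start = wikitext.find("{{" + template)
    if start == -1:
        raise ValueError("no {{%s}} template found" % template)
    # Collapse line breaks and runs of spaces so the value stays on one line.
    value = " ".join(ocr_text.split()).translate(_ESCAPES)
    depth, i = 0, start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth == 0:
                # Matching close of the outer template: put the new field before it.
                return wikitext[:i] + "|ocr=" + value + "\n" + wikitext[i:]
            i += 2
        else:
            i += 1
    raise ValueError("unbalanced braces in template")
```

Counting brace pairs, rather than searching for the first "}}", keeps nested templates such as {{en|1=...}} inside the description from ending the scan early.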

@Enhancing999 and Glrx: you may be interested in this, since you participated in the discussion. That said, I don't think it's an especially important issue. Adding so much OCR'd text to Commons could also cause problems: files might show up in ordinary searches for terms that appear in the ocr field of the Information template(?), unless such matches were limited to an explicit operator like ocr:"search terms". However, since the OCR tool already exists, implementing this might not take much time and could be worth it. Alternatively, it may be better to track this request somewhere else.

Translation marker problems

Shooke (talk | contribs)

@SGill (WMF): This page has a translation problem: the translation markers are not correct, and the translation system does not work. See the Spanish version. Shooke (talk) 15:05, 10 September 2023 (UTC)

Samwilson (talk | contribs)

@Shooke: Sorry to hear it's not working. I think we need some more detail in order to debug this. Can you link to an example on Spanish Wikisource where this problem occurs? Do you get any error message? Thanks!

Tesseract OCR: How to save an advanced option?

Yug (talk | contribs)

Is there a way to "save" my advanced option psm=4 on wikisource.org itself?

Discussion moved to phab:T318594.
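Until saving is supported on the wiki, one possible workaround is to keep the preferred options outside it and build the OCR request from them. Below is a minimal Python sketch. It assumes the Wikimedia OCR API at https://ocr.wmcloud.org/api.php accepts engine, langs[], psm and image parameters; these names are assumptions. SAVED_OPTIONS is a hypothetical stand-in for a stored preference. Tesseract numbers its page segmentation modes 0 to 13, and 4 means "assume a single column of text of variable sizes", so the sketch rejects values outside that range.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint and parameter names for the Wikimedia OCR tool.
OCR_API = "https://ocr.wmcloud.org/api.php"

# Tesseract page segmentation modes are numbered 0 to 13.
VALID_PSM = range(0, 14)

# Hypothetical saved preference: psm=4 assumes a single column of text
# of variable sizes.
SAVED_OPTIONS = {"engine": "tesseract", "langs[]": "en", "psm": 4}


def ocr_request_url(image_url: str, options: Optional[dict] = None) -> str:
    """Build an OCR request URL from the saved options (or explicit ones)."""
    options = SAVED_OPTIONS if options is None else options
    psm = options.get("psm")
    if psm is not None and psm not in VALID_PSM:
        raise ValueError(f"psm must be an integer from 0 to 13, got {psm!r}")
    # The image URL goes last; urlencode percent-encodes every value.
    return f"{OCR_API}?{urlencode({**options, 'image': image_url})}"
```

Validating psm locally gives a clear error before any request is sent, instead of relying on how the remote tool reacts to an unknown mode.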

How do I enable it?

2601:280:4200:3B9:5161:F12B:6669:9A47 (talk | contribs)

I included wfLoadExtension( 'Wikisource' ); and $wgWikisourceEnableOcr = true; in my LocalSettings.php, but I don't see the transcribe text button when editing. I'm using wfLoadExtension( 'WikiEditor' ); for the toolbar.
