
Topic on Extension talk:CodeEditor/Flow

Box width and syntax highlight locations

Aubrey (talkcontribs)

Hi all, we are trying to integrate the CodeEditor within the Proofreadpage Extension. I would like to ask a few questions:

  • It seems that CodeEditor, as soon as it is activated with the toggle button, resizes itself to the full width of the page and so overlaps the page scan that the ProofreadPage extension displays next to the edit box. We have looked inside the setupCodeEditor function in jquery.codeEditor.js, but we haven't found where exactly the new width is set. We would be happy just to stop CodeEditor from resizing :-) We've seen that if we disable (in the browser) the width property on the div containing the Ace editor, everything is fine.
  • A question about syntax highlighting: we are trying to implement a basic syntax highlighter for XML in CodeEditor. We looked at the code for CSS and JS syntax highlighting and saw that it refers to some highlight rules files that we know nothing about and cannot manage to locate. Any hints on how to proceed?
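
For readers unfamiliar with the idea of "highlight rules", here is a minimal sketch of how such rules can work: each state holds a list of token/regex pairs, and a rule may name the next state to switch to. The rule shape is only modelled loosely on Ace-style rules; the names Rule, xmlRules and tokenize are hypothetical and are not part of CodeEditor or Ace, and the XML coverage is deliberately tiny.

```typescript
// Hypothetical rule shape, loosely modelled on Ace-style highlight rules.
interface Rule {
  token: string;  // label assigned to the matched text
  regex: RegExp;  // anchored with ^ and applied to the remaining input
  next?: string;  // optional state to switch to after a match
}

type Rules = Record<string, Rule[]>;

interface Token {
  type: string;
  value: string;
}

// A deliberately small XML rule set: comments, tags, attributes, strings, text.
const xmlRules: Rules = {
  start: [
    { token: "comment", regex: /^<!--[\s\S]*?-->/ },
    { token: "tag.open", regex: /^<\/?[A-Za-z_][\w.:-]*/, next: "tag" },
    { token: "text", regex: /^[^<]+/ },
  ],
  tag: [
    { token: "tag.close", regex: /^\/?>/, next: "start" },
    { token: "attribute", regex: /^[A-Za-z_][\w.:-]*/ },
    { token: "string", regex: /^(?:"[^"]*"|'[^']*')/ },
    { token: "text", regex: /^[\s=]+/ },
  ],
};

// Walk the input, trying the rules of the current state in order.
function tokenize(input: string, rules: Rules): Token[] {
  const tokens: Token[] = [];
  let state = "start";
  let pos = 0;
  while (pos < input.length) {
    const rest = input.slice(pos);
    let matched = false;
    for (const rule of rules[state] ?? []) {
      const m = rule.regex.exec(rest);
      if (m !== null && m[0].length > 0) {
        tokens.push({ type: rule.token, value: m[0] });
        pos += m[0].length;
        if (rule.next !== undefined) {
          state = rule.next;
        }
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No rule applies: emit one character as plain text so we always advance.
      tokens.push({ type: "text", value: rest.charAt(0) });
      pos += 1;
    }
  }
  return tokens;
}

// Usage: list the token types found in a small fragment.
console.log(tokenize('<a href="x">hi</a>', xmlRules).map((t) => t.type));
```

An editor would then map each token type to a CSS class. A real XML mode needs more states (CDATA, processing instructions, entities), so treat this only as an illustration of the rule format.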

Thanks a lot.

TheDJ (talkcontribs)

Can you please link to WHERE you want to use it? That is so much easier for me. Currently it only supports .js and .css files, Lua modules and JSON. XML can easily be added, but I need to know where it is located.

George Orwell III (talkcontribs)

I might be able to offer some background and pointers on this... but I've been wrong before, so take it for what it's worth.

The ProofreadPage extension is used exclusively on the Wikisource projects, specifically for side-by-side transcription of primarily written works that were published only in print and later scanned into document or image files (mostly in the .PDF or .DjVu formats).

I believe the width issue raised has to do with the customized $wgTextarea-to-thumbnail (the side-by-side part) "layout" in the Page: (ns-104) namespace. See an example HERE. Personally, I think that entire approach is probably filled with all sorts of problems and would do better if overhauled from scratch, but that's just my opinion based on all the time I've spent at Wikisource.

The XML issue is far more complicated. In a nutshell, both the PDF and DjVu document formats contain a "hidden" text layer underneath what amounts to a scanned image of a printed page of mostly (rich) text. Currently, this hidden text layer, when present, is automatically "dumped" into the edit box upon article creation in the Page: namespace. Nearly all of the formatting and detail is lost in the process, and what we are left with for transcribing is little more than plain text, usually generated from an OCR of the scanned pages.

Putting PDFs to the side for the moment & focusing on just DjVu files, it is believed there are ways to take that text layer, convert & parse it as XML, and then modify an associated .DTD file, all to produce "dumps" that retain a lot, if not all, of that useful detail/formatting info (potentially saving huge amounts of effort wasted in [re]transcribing content).
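
To illustrate what an XML form of the text layer could keep that a plain-text dump loses, here is a rough sketch that pulls words and their page coordinates out of a fragment. It assumes the XML uses WORD elements with a coords attribute, as the output of DjVuLibre's djvutoxml tool is understood to; the sample fragment is hand-written, and the extractWords helper is hypothetical. It also skips entity decoding and uses a regular expression rather than a real XML parser.

```typescript
interface Word {
  text: string;
  coords: number[]; // bounding-box numbers exactly as listed in the attribute
}

// Assumption: words look like <WORD coords="a,b,c,d">text</WORD>.
function extractWords(xml: string): Word[] {
  const words: Word[] = [];
  const re = /<WORD\s+coords="([^"]*)"\s*>([^<]*)<\/WORD>/g;
  let m: RegExpExecArray | null = re.exec(xml);
  while (m !== null) {
    const rawCoords = m[1] ?? "";
    words.push({
      text: m[2] ?? "",
      coords: rawCoords.split(",").map(Number),
    });
    m = re.exec(xml);
  }
  return words;
}

// Hand-written sample fragment (not taken from a real file).
const sampleLine =
  '<LINE><WORD coords="10,50,60,20">Hello</WORD> ' +
  '<WORD coords="70,50,130,20">world</WORD></LINE>';

for (const w of extractWords(sampleLine)) {
  console.log(w.text, w.coords.join(" "));
}
```

With positions like these, line breaks, indentation and columns could in principle be inferred instead of being retyped by hand.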

I can't speak for Aubrey & co. but the problem starts with ancient coding that skips the creation of the XML variant and goes straight to doing a simple text dump instead. I believe the files in question can be found in the git Core under .../includes/media/DjVu.php & /DjVuImage.php (both calling executables from sourceforge.net's DjVuLibre project & probably woefully out of date to boot).

Drop me a line at my talk page if the above wasn't enough to discourage you so far.
