Extension talk:Translate/Mass migration tools/Alignment
Latest comment: 10 years ago by Nemo bis in topic Non-linguistic text comparison
Non-linguistic text comparison
It seems we agree on the following.
- Those markup examples are just examples, they won't cover all that's needed.
- It's ok to implement them as a first coding step, aka "getting hands dirty experimenting with section alignment", to see what comes out of it.
- Later we'll need something more robust that doesn't require tedious work to hardcode all kinds of checks. It doesn't have to be a perfect solution, just a reasonable initial alignment that humans can work on. Ideas:
- A MediaWiki markup extractor or DOM tree, whatever, to compare paragraphs with (probably hard).
- Use the MediaWiki parser: make it produce the parsed text, subtract the visible text from the wikitext, and calculate e.g. the edit distance over what's left (the markup).
- «CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related».
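A rough sketch of the last two ideas, for illustration only. It does not use the MediaWiki parser: `MARKUP_RE` is a hypothetical, hand-written approximation of "what's left after removing the visible text", and all function names here are made up for the example. The second half compares texts by character n-grams, which is the general idea behind CL-CNG (cross-language character n-grams); the n-gram size of 3 and the cosine measure are assumptions, not something taken from the quoted source.

```python
import re
from collections import Counter
from math import sqrt

# Assumption: a crude stand-in for the real parser. It keeps only common
# wikitext markup tokens and drops everything else (the visible text).
MARKUP_RE = re.compile(
    r"\{\{|\}\}|\[\[|\]\]|'{2,}|={2,}|^[*#:;]+|<[^>]+>|\|",
    re.MULTILINE,
)


def markup_residue(wikitext):
    """Return the list of markup tokens found in a paragraph, in order."""
    return MARKUP_RE.findall(wikitext)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (lists or strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute
        prev = cur
    return prev[-1]


def char_ngrams(text, n=3):
    """Count character n-grams of lowercased, whitespace-normalised text."""
    s = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    en = "== Title ==\n* '''bold''' [[link|text]]"
    it = "== Titolo ==\n* '''grassetto''' [[collegamento|testo]]"
    other = "Plain paragraph with {{template}}."
    # Same markup skeleton, different language: small distance expected.
    print(edit_distance(markup_residue(en), markup_residue(it)))
    # Different markup skeleton: larger distance expected.
    print(edit_distance(markup_residue(en), markup_residue(other)))
    # Character trigram similarity between related words in two languages.
    print(cosine(char_ngrams("information"), char_ngrams("informazione")))
```

The two scores are complementary: the markup distance ignores language entirely, while the n-gram similarity only helps when the two languages share spelling. For an initial alignment one could pair each source paragraph with the target paragraph that minimises the markup distance and leave the rest to human review.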