After enabling this feature on Wiktionary, when I search for "son", instead of landing at son, I get pointed to the page... "són".
If I wanted to look for a page with a diacritic in its title, I would just type it myself, dammit.
I'll have a look at this soon. I've filed it here: https://bugzilla.wikimedia.org/show_bug.cgi?id=59841
Replying to my non-WMF account as my WMF account just to make things more confusing. Anyway, I have information: Wiktionary has 8 pages that all "near match" "son" with the current analysis setup: sơn, Son, són, son, sön, SON, søn, soñ.
If we turn off ASCII folding, it still has three: Son, son, SON.
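To make the two counts above concrete, here is a rough Python sketch of what "near match" means with and without ASCII folding. It is only an illustration, not the actual search analysis chain: the function names, the EXTRA_FOLDS table, and the folding rules (strip combining marks, then lowercase) are assumptions made for the example.

```python
import unicodedata

# Assumption: a tiny illustrative table for letters that have no canonical
# decomposition (e.g. "ø"); a real folding filter covers far more characters.
EXTRA_FOLDS = str.maketrans({"ø": "o", "Ø": "O"})

# The eight titles listed above.
TITLES = ["sơn", "Son", "són", "son", "sön", "SON", "søn", "soñ"]


def fold_accents(text):
    """Strip diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text.translate(EXTRA_FOLDS))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def near_match_key(title, ascii_folding=True):
    """Hypothetical key: two titles 'near match' when their keys are equal."""
    key = fold_accents(title) if ascii_folding else title
    return key.lower()


def near_matches(query, titles, ascii_folding=True):
    wanted = near_match_key(query, ascii_folding)
    return [t for t in titles if near_match_key(t, ascii_folding) == wanted]


print(len(near_matches("son", TITLES)))                 # all eight titles
print(near_matches("son", TITLES, ascii_folding=False)) # only the case variants
```

With folding on, every title collapses to the same key, which is why a plain "son" query cannot pick a winner from the key alone.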
I'm inclined to have the "go" search for things that have multiple options just drop you on the search results page.
Other options are to try to guess what you wanted with various strategies:
1. Assume you want the most linked page that near matches "son". That is "son" on Wiktionary by a wide margin.
2. Assume you want the page that is "closest" to what you typed. I'm not sure how I'd resolve ties other than giving up and showing you the search results page. Still, that would also take you to the "son" page in this case.
Any other ideas?
Obviously going to "són" is wrong. I'm not really sure what is right though.
The following is a must (I think): If there is a page with exactly the same capitalization and accents, go to that page (disregarding capitalization of the first letter on projects that are not case-sensitive).
I would then prefer to have: if there is one page with the exact same accents, but different capitalization, go directly to that page.
Otherwise, show the search results.
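The sequence above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the code that was submitted for review: resolve_go, its parameters, and the first_letter_case_sensitive flag are hypothetical names, and returning None stands in for "show the search results".

```python
import unicodedata


def resolve_go(query, titles, first_letter_case_sensitive=True):
    """Return the title to jump to, or None to show the search results."""
    nfc = lambda s: unicodedata.normalize("NFC", s)

    def first_letter_norm(title):
        # On projects where the first letter is not case-sensitive,
        # ignore its capitalization when comparing.
        if first_letter_case_sensitive:
            return title
        return title[:1].upper() + title[1:]

    wanted = nfc(query)

    # Step 1: exact same capitalization and accents.
    for title in titles:
        if first_letter_norm(nfc(title)) == first_letter_norm(wanted):
            return title

    # Step 2: exactly one page with the same accents but different capitalization.
    same_accents = [t for t in titles if nfc(t).lower() == wanted.lower()]
    if len(same_accents) == 1:
        return same_accents[0]

    # Step 3: anything else is ambiguous (or has no match).
    return None


titles = ["sơn", "Son", "són", "son", "sön", "SON", "søn", "soñ"]
print(resolve_go("son", titles))  # exact match wins
print(resolve_go("SÓN", titles))  # single page with the same accents
print(resolve_go("SoN", titles))  # three case variants, so no direct jump
```

Note that accents are never folded here: a query without diacritics can only land on a page without diacritics, which addresses the original complaint.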
@NEverett: It is undesirable to just go to search results. Skalman's sequence seems right, though there may still be more wrinkles. If you can get this right for English Wiktionary (lots of entries, lots of languages, lots of scripts), you should have many situations covered.
I've submitted for review something pretty close to Skalman's sequence. I'm not sure when we'll next push code to Wiktionary but you should be able to monitor the bug for status.
This is deployed. When you get a chance please give it a shot. I tried it with various changes to "son" and it seemed much better to me.