
Topic on Help talk:Extension:ParserFunctions

How to compare text strings ?

Place Clichy (talkcontribs)

Hello!

I am looking for a way to compare two text strings alphabetically in template code. I want to add a parent bilateral relations category; these are named in the format of e.g. commons:Category:Relations of Bangladesh and Myanmar (note the alphabetical order: Bangladesh < Myanmar). What I would like to do is something like this:

{{#ifexpr: {{{1}}} < {{{2}}} 
   | [[Category:Relations of {{{1}}} and {{{2}}}]]
   | [[Category:Relations of {{{2}}} and {{{1}}}]]
}}

Of course this does not work, because #ifexpr compares numerical expressions, not alphabetical ones. The way I am currently doing it is (simplified):

{{#ifexist: Category:Relations of {{{1}}} and {{{2}}}
   | [[Category:Relations of {{{1}}} and {{{2}}}]]
   | [[Category:Relations of {{{2}}} and {{{1}}}]]
}}

The trouble is that if parameters are in the reverse alphabetical order (e.g. {{{1}}} is Myanmar and {{{2}}} is Bangladesh) and there is a category redirect (e.g. commons:Category:Relations of Myanmar and Bangladesh softly redirects to commons:Category:Relations of Bangladesh and Myanmar), then this code adds the redirected category instead of the target one.

Does anyone have any idea? The template in question is commons:Template:Aircraft of in category.

Verdy p (talkcontribs)

There's no built-in support in "#ifexpr:" or "#if:" for comparing the order of strings. You need another parser function: for example, you can use "#invoke:" via Scribunto to call a function defined in a Lua module. Note however that Lua's "<" operator performs a plain lexicographic comparison of strings: it does not trim them, does not parse any HTML (not even HTML comments that may be present in parameters), does not convert HTML character entities, does not normalize strings, and does not perform any UCA collation (so "é" would sort *after* "f" and not between "e" and "f").
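The byte-wise behaviour described above can be checked in any plain Lua interpreter. This is only a small demonstration, assuming the source is saved as UTF-8 and Lua runs in its default "C" locale; `bytes_of` is a helper name made up for the example.

```lua
-- Plain Lua, no Scribunto needed. Assumes UTF-8 source and the default C locale.

-- Hypothetical helper: show the bytes of a string in hexadecimal.
local function bytes_of(s)
    local out = {}
    for i = 1, #s do
        out[#out + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(out, " ")
end

-- "é" (U+00E9) is the two bytes C3 A9 in UTF-8; "f" is the single byte 66.
print(bytes_of("é"), bytes_of("f"))

print("Bangladesh" < "Myanmar")   -- true
print("e" < "é")                  -- true:  0x65 < 0xC3
print("é" < "f")                  -- false: 0xC3 > 0x66
print(" Myanmar" < "Bangladesh")  -- true:  an untrimmed leading space (0x20) sorts first
```

The last line shows why untrimmed parameters matter: a stray space is enough to reverse the result.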

You may want to call:

{{#ifeq: {{{1|}}} | {{{2|}}}
| <!-- empty -->
| {{#ifexpr: {{#invoke:Modulename|compare|{{{1|}}}|{{{2|}}}}} < 0
  |  {{#ifexist: Category:Relations of {{{1|}}} and {{{2|}}}
     | [[Category:Relations of {{{1|}}} and {{{2|}}}]]
     | [[Category:Relations of {{{2|}}} and {{{1|}}}]]
     }}
  | {{#ifexist: Category:Relations of {{{2|}}} and {{{1|}}}
    | [[Category:Relations of {{{2|}}} and {{{1|}}}]]
    | [[Category:Relations of {{{1|}}} and {{{2|}}}]]
    }}
  }}
}}
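The wikitext above assumes a module exposing a `compare` function that returns a negative number, zero or a positive number. A minimal sketch of such a module could look like the following; "Modulename" is the placeholder used in the wikitext, and the trimming step is an assumption added here because #invoke does not trim positional parameters.

```lua
-- Hypothetical Module:Modulename (sketch only).
local p = {}

-- Trim leading and trailing ASCII whitespace.
local function trim(s)
    return (s:match("^%s*(.-)%s*$"))
end

-- Return -1, 0 or 1 using Lua's own byte-wise "<" operator (no UCA collation).
local function compare(a, b)
    if a < b then return -1 end
    if a > b then return 1 end
    return 0
end

-- Entry point for {{#invoke:Modulename|compare|A|B}}.
function p.compare(frame)
    local a = trim(frame.args[1] or "")
    local b = trim(frame.args[2] or "")
    return compare(a, b)
end

return p
```

Scribunto converts the returned number to text, so the result can be fed directly to "#ifexpr:" as in the wikitext above.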

However this does not resolve the redirects (the #ifexist tests give false hints). For that you need a Lua module that can not only test whether either link effectively exists, but also detect whether one is a redirect and get its target (it has to load the page and parse its beginning, because MediaWiki still does not expose in Lua if a page is a redirect and what is its target; MediaWiki internally parses pages, detects redirects and maintains that in a cache that is used when loading any page name via links, but it does not index that information in an accessible way; loading and parsing the page manually in Lua is a bit costly and error-prone due to the MediaWiki syntax).

So the best you can do is to use your template with parameters 1 and 2 and not perform any test on them. Then update the pages containing transclusions of your template so that they pass the explicit parameter values in the correct order (and swap that order if one is a redirect).

There are other caveats: parameters 1 and 2 may contain disambiguation suffixes that may be removed in the binary relation (e.g. with "Paris, Texas" and "Austin, Texas": would you name your category "Relations of Austin, Texas and Paris, Texas", or "Relations of Austin and Paris (Texas)"?). Beware that naming pages automatically is tricky: there are frequently "aliases" (e.g. "Relations of France with the United States" or the reverse, and there may be other ways to express the combination), and some preferences may change over time (or will need to take into account decisions that are not always the same between countries or languages, and sometimes conflict). You also have to manage the possible insertion of articles (like "the" in English) before some entity names, which may not be present when entity names are used alone in page names (e.g. with "United States": "Relations of France and the United States", "Relations of the United Kingdom and the United States", "Relations of the United States and Vietnam").

Such binary relations with arbitrary combinations should be avoided: they grow combinatorially and are a nightmare to maintain (e.g. for 200 countries you get almost 40,000 ordered pairs, and most of them will be empty; for ternary relations you'd reach about 8,000,000!). They should be created manually and added individually where relevant.
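For reference, the figures quoted above correspond to ordered combinations of 200 distinct entities. The unordered counts, which are what a single "Relations of A and B" category per pair would need, are smaller:

```latex
\begin{align*}
\text{ordered pairs:}\quad & 200 \times 199 = 39\,800 \\
\text{unordered pairs:}\quad & \binom{200}{2} = \frac{200 \times 199}{2} = 19\,900 \\
\text{ordered triples:}\quad & 200 \times 199 \times 198 = 7\,880\,400 \\
\text{unordered triples:}\quad & \binom{200}{3} = \frac{200 \times 199 \times 198}{6} = 1\,313\,400
\end{align*}
```

The growth is therefore polynomial in the number of entities (quadratic for pairs, cubic for triples), which is still far too many categories to maintain by hand.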

Dinoguy1000 (talkcontribs)

MediaWiki still does not expose in Lua if a page is a redirect and what is its target

This is blatantly false; the mw.title library supports finding if a title is a redirect and what title it redirects to, as seen with e.g. w:Template:Target of.

Verdy p (talkcontribs)

Interesting to know, because the last time I checked, there was no such extension in the Scribunto library. So it was added recently, after many years of asking for it (yes, I know it was present in the internal PHP API, but it was not there originally and pages had to be parsed to find whether one was a redirect and to find its target; this was added to accelerate navigation, because of course the MediaWiki parser could store the result when parsing a saved page)!

Also please moderate your terms and avoid such fast unthought reply in your first phrase. For many years we had to use a workaround for that (for example in Commons) because there was no such builtin support. And remember that this question is essentially about categories in Commons, rather than English Wikipedia.

Dinoguy1000 (talkcontribs)

isRedirect has been part of Lua since at the latest March 2013; redirectTarget dates to May 2016 (phab:T68974). Hardly "recent" on either account, when we've only had Scribunto/Lua since 2012 or so.
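As an illustration of the two properties mentioned above, here is a minimal sketch that follows a hard redirect through the Scribunto title object. The function names `resolve` and `_resolve` and the `titles` parameter are hypothetical; the parameter exists only so the function can be tried with a stub outside a wiki. Both properties are documented as expensive, and, as noted elsewhere in this thread, this does not detect soft category redirects made with a template.

```lua
-- Sketch of a Scribunto module; only mw.title.new, isRedirect,
-- redirectTarget and prefixedText are real library names.
local p = {}

-- Return the prefixed name of the page, or of its target if it is a hard redirect.
-- `titles` (hypothetical) defaults to mw.title.
local function resolve(name, titles)
    titles = titles or mw.title
    local title = titles.new(name)
    if not title then
        return nil  -- invalid page name
    end
    if title.isRedirect then
        local target = title.redirectTarget  -- title object, or false
        if target then
            return target.prefixedText
        end
    end
    return title.prefixedText
end
p._resolve = resolve

-- Entry point for {{#invoke:Modulename|resolve|Page name}}.
function p.resolve(frame)
    return resolve(frame.args[1] or "") or ""
end

return p
```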

Also please moderate your terms and avoid such fast unthought reply in your first phrase.

Given basically everything I've seen you say/do, and my own past interactions with you, I think I won't, thanks.

Tacsipacsi (talkcontribs)

Actually, it doesn’t really matter whether Scribunto provides information on what MediaWiki thinks to be a redirect; it won’t catch category redirects using c:Template:Category redirect anyway. Category redirects are rarely if ever real redirects.

Verdy p (talkcontribs)

On Commons there are redirects on categories. Especially those given in the example above.

There's a workaround actually used on Commons that can also detect soft redirects on categories and find their targets (it cannot use "redirectTarget", which is also very costly, just like almost all functions in the "mw.title" module in Lua, and does not work in practice due to its severe limitations). This still requires parsing category pages, because there's still no support in MediaWiki for them (by some extension?). I've tried "redirectTarget" and yes, your suggestion does not work and is not a correct reply to the request made above, so my reply was correct (absolutely not "blatantly false" as you said with your abusive reply).

And if you (Dinoguy1000) don't want to moderate your terms in direct reply to a thread where you were not involved or cited at all, then you are clearly abusing the contributor terms, because you don't provide any help to anyone, and you are here just to cause trouble.

Place Clichy (talkcontribs)

Thanks for the input! I guess that my question is now: is there an available Lua-coded function which compares two text strings alphabetically e.g. is A < B? User:Verdy p mentioned that Lua basically performs a lexicographic comparison of strings with its "<" operator but I'm not sure how I can use this operator, and writing a Lua module entirely for that seems overkill and out of my reach.

The suggestion of putting parameters in the right order in the first place is not feasible, as the template does other things too. Obviously Aircraft of Brazil in France (populated by {{Aircraft of in category|Brazil|France}}) is not the same as Aircraft of France in Brazil (populated by {{Aircraft of in category|France|Brazil}}); however both should be in the same parent Category:Relations of Brazil and France.
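A small module function along these lines might be enough for that use case. This is only a sketch under the assumptions discussed in this thread: parameters are plain country names, the order is Lua's byte-wise order, and article handling (e.g. "the United States") is left out. The names `relationsCategory` and `_relationsCategory` are hypothetical.

```lua
-- Sketch of a Scribunto module that puts the two names in byte-wise order.
local p = {}

-- Trim leading and trailing ASCII whitespace.
local function trim(s)
    return (s:match("^%s*(.-)%s*$"))
end

-- Build the parent category name with the smaller name first.
local function relations_category(a, b)
    if b < a then
        a, b = b, a
    end
    return "Category:Relations of " .. a .. " and " .. b
end
p._relationsCategory = relations_category

-- Entry point: returns a category link, or nothing if a name is missing
-- or both names are identical.
function p.relationsCategory(frame)
    local a = trim(frame.args[1] or "")
    local b = trim(frame.args[2] or "")
    if a == "" or b == "" or a == b then
        return ""
    end
    return "[[" .. relations_category(a, b) .. "]]"
end

return p
```

In the template it could then be called as {{#invoke:Modulename|relationsCategory|{{{1|}}}|{{{2|}}}}}, where "Modulename" stands for wherever the module is saved; both parameter orders give the same category.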

I do not really intend testing for category redirects. Category redirects are always soft redirects (either on Commons or English Wikipedia), so they're hard to track.

The article before the country name is managed by {{CountryPrefixThe}} and that works well.

Re: other suggestions, there is in fact an implicit assumption that this one template will only be used for country names found in commons:Category:Bilateral relations and its subcategories. There may therefore be no need to clean HTML formatting, disambiguators and the like. In case the bilateral relations category does not exist, the template's code catches this with a maintenance category and the missing category can be created manually. Of course, there are some cases that cannot be entirely foreseen, such as the inconsistent use of China vs. People's Republic of China in the bilateral relations category tree, but they can, or have to, be managed manually.

My main concern really is the management of these category redirects related to alphabetical order.

Verdy p (talkcontribs)

Lexicographic comparison means that it only compares the texts byte by byte (they are UTF-8 encoded). Lua strings themselves do not directly handle Unicode or the related UCA collation.

MediaWiki provides an API with the module "mw.ustring", which adds some support for Unicode, but not any comparison operator or UCA collation for now (what it supports is the concept of Unicode "code points": a single code point may be encoded on several UTF-8 bytes, and positions for substrings are counted in code points, taking their variable encoding length into account; it also provides support for normalization, as well as the case conversion needed by MediaWiki for its built-in basic parser functions "LC:", "UC:", "LCFIRST:" and "UCFIRST:"; note that it does not perform any MediaWiki parsing, so it's up to the caller to manage trimming).
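To make the comparison a little less fragile, both strings can be trimmed, normalized and case-folded before using "<". The sketch below uses mw.text.trim, mw.ustring.toNFC and mw.ustring.lower when running under Scribunto; the fallback branch and the names `sort_key`, `_sortKey` and `lessThan` are assumptions made so the example can also be run in a standalone Lua interpreter. The result is still a code-point order, not UCA collation.

```lua
-- Sketch: compare two strings after trimming, NFC normalization and lowercasing.
local p = {}

-- Inside Scribunto the mw.* libraries exist; elsewhere fall back to plain Lua
-- (ASCII-only case folding, no normalization).
local has_mw = type(mw) == "table" and mw.ustring ~= nil

local function sort_key(s)
    if has_mw then
        s = mw.text.trim(s)
        s = mw.ustring.toNFC(s) or s  -- toNFC returns nil for invalid UTF-8
        return mw.ustring.lower(s)
    end
    return (s:match("^%s*(.-)%s*$")):lower()
end
p._sortKey = sort_key

-- Entry point: returns "1" if the first parameter sorts before the second,
-- and an empty string otherwise, so it can be tested with {{#if:}}.
function p.lessThan(frame)
    local a = sort_key(frame.args[1] or "")
    local b = sort_key(frame.args[2] or "")
    return (a < b) and "1" or ""
end

return p
```

Without the lowercasing step, "bangladesh" would sort after "Myanmar", because every ASCII uppercase letter has a smaller byte value than any lowercase one.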

So for now there's no collation in mw.ustring, and so no function you can call from it to compare strings. Some modules have defined a "weak" collation algorithm for sorting. But this still won't be sufficient for your need on Commons, because for now there's actually NO standard fixing the order in category names between "Relations of A and B" and "Relations of B and A". So you'll end up having categories redirecting from one to the other (using {{Category redirect|Target name}}): you need to parse the target page in Lua to detect these templates and find the target that you'd like to link to (that will solve the ordering problem without needing any collation, and also take into account the problem of variable disambiguation suffixes that may be needed in category names).
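The parsing step described above could be sketched as follows. It assumes the soft redirect is written literally as {{Category redirect|Target}} near the top of the category page; template aliases, comments and nowiki sections are not handled. The function names are hypothetical; mw.title.new and getContent are the only Scribunto calls used.

```lua
-- Sketch: find the target of a soft category redirect by reading the wikitext.
local p = {}

-- Extract the first parameter of {{Category redirect|...}} from raw wikitext,
-- or return nil if the template is not found.
local function redirect_target(wikitext)
    local target = wikitext:match("{{%s*[Cc]ategory[ _]redirect%s*|%s*([^|}]-)%s*[|}]")
    if not target or target == "" then
        return nil
    end
    target = target:gsub("^1%s*=%s*", "")            -- allow an explicit "1="
    target = target:gsub("^:?[Cc]ategory:%s*", "")  -- drop a namespace prefix
    return target
end
p._redirectTarget = redirect_target

-- Resolve a category name (given without namespace) through one soft redirect.
local function resolve_category(name)
    local title = mw.title.new(name, "Category")
    local content = title and title:getContent()  -- nil if the page does not exist
    return (content and redirect_target(content)) or name
end

-- Entry point for {{#invoke:Modulename|resolveCategory|Relations of A and B}}.
function p.resolveCategory(frame)
    return resolve_category(frame.args[1] or "")
end

return p
```

Loading page content this way is not free, so a template using it on many pages would need to stay within the usual limits on expensive operations.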

As you see, there's no "simple" instantaneous solution. This requires code and tests by navigating all the categories you'll want to link to and find how they are effectively named.

There's a module in Commons for that: "Module:Redirect", but others can also help you manage category redirections.

One example is "Module:Countries", which performs such detection of redirects (plus handles known "aliases" for category names that don't always need a disambiguation suffix or a leading article), and also provides a basic collation for its listed items (note that they are ordered using their *translated* names, found in Wikidata; that order is "crude", but for now has been sufficient even if technically it's still not fully UCA compliant, and cannot manage collation orders depending on the language used: the order is locale-neutral, similar to the UCA DUCET, except that it sometimes needs tweaks, notably in Chinese where the order of items can be tuned, or in Germanic languages that sort letters with diacritics as primary letters at the end of their alphabet and not as secondary variants: tweaking the order is done in data modules).
