Manual:Performing string operations with parser functions
Various string functions are part of Extension:ParserFunctions. However, they are disabled on Wikimedia wikis, where $wgPFEnableStringFunctions = false. The following examples show how to perform string operations without using those parser functions.
Concatenation
String concatenation is done by juxtaposition (i.e., simply placing the segments adjacent to one another).
The expansion of the concatenation of two balanced segments of wikitext is equal to the concatenation of the two expanded wikitexts.
However, for two unbalanced segments, such as '''bo and ld''', expanding the concatenation does not give the same rendering as expanding each segment separately and then concatenating the results (bold vs. '''bold''').
Trimming
Trimming is the removal of newlines and spaces from the start and the end of a string.
Automatically trimmed are:
- the value of a named template parameter
- the value of a parameter of a parser function (see Help:Magic words#Parser functions and Help:Extension:ParserFunctions), including the one after the colon
Thus, if trimming is desired, this is in many cases conveniently performed together with another operation one wants to perform. To just trim string S, one applies a template or parser function such that this does not affect the string, apart from the trimming, e.g.:
- {{1x|1=S}} (using Template:1x containing {{{1}}})
- {{#if:x|S}}
- {{#switch:|S}}
- {{padleft:S}}
- {{padright:S}}
Equality
For equality there are #ifeq: and #if:. #ifeq: compares numerically when both arguments are valid numbers; to force a string-based comparison, add a non-numerical character to both compared arguments.
Note that this prevents the trimming at the side at which the character is added. If desired, trim the arguments first. If trimming is not desired, put a non-numerical character before and after both strings, e.g. put them in quotes.
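The effect of adding a non-numerical character can be illustrated with a rough Python model. The function name ifeq is hypothetical, and Python's float() parsing is only an approximation of what the parser function accepts as a number, so this is a sketch of the idea rather than a faithful reimplementation.

```python
def ifeq(a: str, b: str) -> bool:
    """Rough model of #ifeq: numeric comparison when both sides parse as numbers."""
    a, b = a.strip(), b.strip()          # parser function arguments are trimmed
    try:
        return float(a) == float(b)      # assumption: float() stands in for the numeric test
    except ValueError:
        return a == b                    # otherwise fall back to a string comparison

print(ifeq("01", "1"))       # both numeric, so compared as numbers
print(ifeq('"01"', '"1"'))   # the quotes make both sides non-numeric
```

In this model the quoted pair is compared character by character, which is the behaviour the quoting trick is meant to force.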
Case
See Help:Magic words#Formatting.
Padleft
The following applies to trimmed strings S and P, and n = 0, 1, 2, ... Note that the result is not trimmed: if S is empty the result can end with spaces and/or newlines.
{{padleft:S|n|P}} pads a given string S on the left with a padding string P (default: "0"), to increase the length (where a newline counts as one character) to min(n,500); if length(S) ≥ min(n,500), it just returns S. The padding string P is repeated zero or more times, possibly followed by a truncated P. If P is empty, S is returned.
Properties:
- length({{padleft:S|n|P}}) = if P is non-empty then max(min(n,500), length(S)) else length(S)
- {{padleft:S|n|P}} is equal to S if and only if length(S) ≥ min(n,500) or P is empty.
- {{padleft:S|n|P}} is equal to P if and only if [length(P) = max(min(n,500), length(S)) or 0] and S is equal to a substring of P ending at the end of P (i.e., P is the concatenation RS of some string R with S).
An important special case is that where S is empty: {{padleft:|n|P}} produces a string of length min(n,500) consisting of zero or more repetitions of P, possibly followed by a truncated P. In particular, if P has a length greater than min(n,500), this is the truncation of P to min(n,500) characters, i.e., the substring of P of min(n,500) characters starting at the start of P.
The above-mentioned properties reduce to:
- length({{padleft:|n|P}}) = if P is non-empty then min(n,500) else 0
- {{padleft:|n|P}} is equal to P if and only if length(P) = min(n,500)
Examples:
- "{{padleft:|0|ab}}" gives ""
- "{{padleft:|1|ab}}" gives "a" (truncation)
- "{{padleft:|2|ab}}" gives "ab"
- "{{padleft:|3|ab}}" gives "aba"
- "{{padleft:|4|ab}}" gives "abab"
- "{{padleft:|5|ab}}" gives "ababa"
- "{{padleft:|3| a b }}" gives "a " (the parameter is trimmed, the result is not)
- "{{padleft:|3|abcdefgh}}" gives "abc" (the parameter is truncated to the desired length)
- "{{padleft:1|0|ab}}" gives "1"
- "{{padleft:1|1|ab}}" gives "1"
- "{{padleft:1|2|ab}}" gives "a1"
- "{{padleft:1|3|ab}}" gives "ab1"
- "{{padleft:1|4|ab}}" gives "aba1"
- "{{padleft:1|5}}" gives "00001"
- "{{padleft:1|5|}}" gives "1"
- "{{padleft:1|5| }}" gives "1"
- "{{padleft:é|5}}" gives "0000é"
- {{padleft:|2|14:38}} gives 14 (hour; not needed for the current time, because there is also the variable {{CURRENTHOUR}}; see also the #time function)
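The rules above can be summarised in a short Python model. This is a sketch written from the description in this section, not the MediaWiki implementation; the function name padleft and the limit parameter are illustrative assumptions.

```python
def padleft(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padleft:S|n|P}} as described in this section."""
    s, p = s.strip(" \n"), p.strip(" \n")   # parser function arguments are trimmed
    target = min(n, limit)                   # requested length, capped at 500
    if not p or len(s) >= target:
        return s                             # nothing to add: S is returned unchanged
    need = target - len(s)
    # Repeat P often enough, then cut the padding to the exact size needed.
    return (p * (need // len(p) + 1))[:need] + s

for args in [("", 3, "ab"), ("1", 4, "ab"), ("1", 5), ("1", 5, ""), ("", 2, "14:38")]:
    print(args, "->", repr(padleft(*args)))
```

Lengths are counted in characters (Python code points), which matches the "é" example above.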
Maximum string length
If characters of P appear in the result (that is, P is not empty and S is shorter than both the required length and 500), the resulting string is at most 500 characters long:
"{{padleft:abc|507|12345678 0}}"
gives "12345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 012345678 01234567abc" [16]"{{padleft:|507|123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123}}"
gives "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 " [17]
However, if no characters of P appear in the result (P is empty, or S is already at least as long as the required length or as 500), the whole string S is returned, even if it is longer than 500 characters:
"{{padleft:123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123|507|}}"
gives "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123" [18]"{{padleft:123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123|507|p}}"
gives "123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123" [19]
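The two cases can be checked against a small Python model of padleft. The model is an assumption built from the description on this page (the names padleft and limit are illustrative), and the 593-character test string is a stand-in for the long strings shown above.

```python
def padleft(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padleft:S|n|P}}; not the MediaWiki source."""
    s, p = s.strip(" \n"), p.strip(" \n")
    target = min(n, limit)
    if not p or len(s) >= target:
        return s                      # S is returned whole, however long it is
    need = target - len(s)
    return (p * (need // len(p) + 1))[:need] + s

padded = padleft("abc", 507, "12345678 0")
print(len(padded))                    # capped at the limit, not at 507

long_s = "x" * 593                    # already longer than 500 characters
print(padleft(long_s, 507, "p") == long_s)
print(padleft(long_s, 507, "") == long_s)
```

In the first call the padding is 497 characters long: 49 full copies of P followed by its first seven characters.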
Applications with respect to a page
Rendering the truncated expanded wikitext of a page:
{{padleft:|500|{{Help:Parser function}}}}
gives "
<table class="template-pd-help-page"><tr>
<td class="icon-cell">[[File:PD-icon.svg|30px|link=|alt=PD]]</td>
<td>'''Note:''' When you edit this page, you agree to release your contribution under the [https://creativecommons.org/publicdomain/zero/1.0/ CC0]. See [[Special:MyLanguage/Project:PD help|Public Domain Help Pages]] for more info.
</td>
<td class="icon-cell">[[File:PD-icon.svg|30px|link=|alt=PD]]</td>
</tr></table>
<div style="clear: right; margin-bottom: .5em; float: right; margin-left:"
Note that when specifying an arbitrary number of characters, links, table structure, XML-style tags etc. may be broken. However, if the braces are balanced in the source page, there are no unbalanced braces in the result either, because expansion is done before truncation.
Displaying the truncated expanded wikitext:
{{#tag:nowiki|{{padleft:|500|{{Help:Parser function}}}}}}
gives "
<table class="template-pd-help-page"><tr>
<td class="icon-cell">[[File:PD-icon.svg|30px|link=|alt=PD]]</td>
<td>'''Note:''' When you edit this page, you agree to release your contribution under the [https://creativecommons.org/publicdomain/zero/1.0/ CC0]. See [[Special:MyLanguage/Project:PD help|Public Domain Help Pages]] for more info.
</td>
<td class="icon-cell">[[File:PD-icon.svg|30px|link=|alt=PD]]</td>
</tr></table>
<div style="clear: right; margin-bottom: .5em; float: right; margin-left:"
Displaying the truncated wikitext:
{{padleft:|500|{{msgnw:Help:Parser function}}}}
gives "<languages />
{{PD Help Page}}
{{TOCRight}}
'''Magic words''' are strings of text that MediaWiki associates with a return value or function, such as time, site details, or page names. This page explains only the standard magic words; for a technical reference, see {{ll|Manual:Magic words}}.
There are three general types of magic words:
* '''[[#Behavior switches|"
{{padleft:|500|{{Help:Template}}}}
gives "{{padleft:|500|{{Help:Template}}}}" (bugged display, example suppressed)
The page has been prepared with <onlyinclude> tags, so that the small maximum of 500 characters is not wasted on a header box or table code. This also reduces the size of what is transcluded before applying padleft, reducing the post-expand include size, for which there is a limit of 2048000 bytes per page. A disadvantage could be that other pages cannot include the rest of the page either.
Padright
The following applies to trimmed strings S and P, and n = 0, 1, 2, ... Note that the result is not trimmed.
{{padright:S|n|P}} pads a given string S on the right with a padding string P (default: "0"), to increase the length to n; if length(S) ≥ n, it just returns S. The padding string P is repeated zero or more times, possibly followed by a truncated P. If P is empty, S is returned.
Properties:
- length({{padright:S|n|P}}) = if P is non-empty then max(n, length(S)) else length(S)
- {{padright:S|n|P}} is equal to S if and only if length(S) ≥ n or P is empty.
The special case where S is empty is the same as for padleft.
Examples:
- "{{padright:|0|ab}}" gives ""
- "{{padright:|1|ab}}" gives "a" (truncation)
- "{{padright:|2|ab}}" gives "ab"
- "{{padright:|3|ab}}" gives "aba"
- "{{padright:|4|ab}}" gives "abab"
- "{{padright:|5|ab}}" gives "ababa"
- "{{padright:1|0|ab}}" gives "1"
- "{{padright:1|1|ab}}" gives "1"
- "{{padright:1|2|ab}}" gives "1a"
- "{{padright:1|3|ab}}" gives "1ab"
- "{{padright:1|4|ab}}" gives "1aba"
- "{{padright:1|5}}" gives "10000"
- "{{padright:1|5|}}" gives "1"
- "{{padright:1|5| }}" gives "1"
- "{{padright:abc|13| d e }}" gives the wikitext "abcd ed ed " rendered as "abcd ed ed "
- "{{padright:abc|14| d e }}" gives the wikitext "abcd ed ed " rendered as "abcd ed ed "
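A Python model of padright mirrors the one for padleft, with the padding appended instead of prepended. The function name and the limit parameter are illustrative; that the same cap of 500 applies is an assumption, since this section does not state it.

```python
def padright(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padright:S|n|P}} as described in this section."""
    s, p = s.strip(" \n"), p.strip(" \n")   # arguments are trimmed, the result is not
    target = min(n, limit)                   # assumption: same cap as for padleft
    if not p or len(s) >= target:
        return s
    need = target - len(s)
    # The padding starts at the beginning of P and is cut off at the target length.
    return s + (p * (need // len(p) + 1))[:need]

for args in [("", 5, "ab"), ("1", 4, "ab"), ("1", 5), ("1", 5, " ")]:
    print(args, "->", repr(padright(*args)))
```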
Length of a string
There is no function for the length of a string, but it follows from the above that {{padleft:S|n}} is equal to S if and only if length(S) ≥ min(n,500).
(Note that {{padleft:S|n}} may end with spaces and/or newlines, so care should be taken that it is not trimmed before the comparison.)
Thus the length can be found with a binary search, except that for a length of 500 or more only that fact can be established, not the actual length. This is done in m:Template:Len with the help of m:Template:Len/digit. They use quotation marks around the string, so the maximum length found for the string itself is 498.
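The binary search can be sketched in Python on top of a model of padleft. This illustrates the principle only and is not how m:Template:Len is written; the names padleft and length_via_padleft are hypothetical, and the input is assumed to be already trimmed.

```python
def padleft(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padleft:S|n|P}}; not the MediaWiki source."""
    s, p = s.strip(" \n"), p.strip(" \n")
    target = min(n, limit)
    if not p or len(s) >= target:
        return s
    need = target - len(s)
    return (p * (need // len(p) + 1))[:need] + s

def length_via_padleft(s: str) -> int:
    """Find the length using only the test 'padleft(s, n) == s'.

    A result of 500 means '500 or more'; s is assumed to be trimmed.
    """
    lo, hi = 0, 500                  # invariant: padleft(s, lo) == s
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if padleft(s, mid) == s:     # true exactly when len(s) >= mid
            lo = mid
        else:
            hi = mid - 1
    return lo

print(length_via_padleft("Wikiproject Chemistry"))
```

The search needs at most nine comparisons, since the range 0 to 500 is halved at each step.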
Extracting a character
As follows from the above, the first character of a string P can be extracted with {{padleft:|1|P}}. For this case, this method is preferable to the method below, for efficiency and because there are fewer limitations.
There is no function for extracting a character from an arbitrary given position of a string. However, for a given character we can compare the truncation of the string up to and including the given position with the truncation of the string until that position, concatenated with the character. Thus we determine whether the character at the given position in the string is equal to the character we tried. (Note that the truncation of the string until the given position may end with spaces and/or newlines, so care should be taken it is not trimmed before the concatenation. Similarly, when trying whether the character is a space or newline, care should be taken that the compared strings are not trimmed before the comparison, because then we would not distinguish between a space and a newline.) Also care should be taken that the result, which may be a space or a newline, is not unintentionally trimmed.
This is done in m:Template:Chr with the help of m:Template:Chr/list. The latter contains a switch with a case for each of the supported characters.
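The trial-and-compare idea can be sketched in Python. This is a model of the principle, not the code of m:Template:Chr; the names padleft, char_at and CANDIDATES are hypothetical, positions are counted from 1, and the position is assumed not to exceed the length of the string (otherwise padleft would start repeating P).

```python
import string

def padleft(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padleft:S|n|P}}; not the MediaWiki source."""
    s, p = s.strip(" \n"), p.strip(" \n")
    target = min(n, limit)
    if not p or len(s) >= target:
        return s
    need = target - len(s)
    return (p * (need // len(p) + 1))[:need] + s

# Plays the role of the switch with one case per supported character.
CANDIDATES = string.ascii_letters + string.digits + " "

def char_at(p: str, k: int):
    """Return the k-th character of p (1-based), or None if it is not supported."""
    upto = padleft("", k, p)          # truncation up to and including position k
    before = padleft("", k - 1, p)    # truncation until position k
    for c in CANDIDATES:
        if before + c == upto:        # compared untrimmed, so a space is detected
            return c
    return None

print(repr(char_at("hello world", 5)), repr(char_at("hello world", 6)))
```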
The automatic newline feature/bug for "{|" does not affect the result, except that it adds the newline if "{|" is at the start of the substring, since Sub calls Chr for every character position separately. However, just "*", "#", ":", and ";" without newline cannot be produced by any template or parser function. Possible remedies:
- put the character in <nowiki> tags
- Postpone adding a resulting character to the output until the next character is checked; output it together with that character if the second character is "*", "#", ":", or ";". This seems rather complex, since there is no feature for variables in the computer programming sense; it may require deep nesting, but then one may be restricted by the system limitations on the number of levels.
- Post-process the resulting string, removing the added newlines.
Extracting a substring
As mentioned above, if n is not greater than the length of a string P, {{padleft:|n|P}} produces the truncation of P to n characters, i.e., the substring of P of n characters starting at the start of P. For this case, this method is preferable to the method below, for efficiency and because there are fewer limitations.
There is no function for extracting a substring that does not start at the start of the string, nor one for looping. However, for every potential position we can determine whether it is within the range of the required substring, and if so, extract the character. The results are simply concatenated.
Each of the required character positions requires two calls of padleft and a call of a switch. Thus system limits are quickly reached.
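The per-position approach can be sketched in Python as follows. The Python loop stands in for the fixed list of positions that a template would have to spell out, and all names (padleft, char_at, substring, CANDIDATES) are hypothetical. The requested range is assumed to lie within the string.

```python
import string

def padleft(s: str, n: int, p: str = "0", limit: int = 500) -> str:
    """Hedged model of {{padleft:S|n|P}}; not the MediaWiki source."""
    s, p = s.strip(" \n"), p.strip(" \n")
    target = min(n, limit)
    if not p or len(s) >= target:
        return s
    need = target - len(s)
    return (p * (need // len(p) + 1))[:need] + s

CANDIDATES = string.ascii_letters + string.digits + " "

def char_at(p: str, k: int) -> str:
    """k-th character (1-based) found by trial; '' if the character is unsupported."""
    upto, before = padleft("", k, p), padleft("", k - 1, p)   # two padleft calls
    return next((c for c in CANDIDATES if before + c == upto), "")

def substring(p: str, start: int, length: int) -> str:
    """Concatenate the characters at positions start .. start+length-1."""
    return "".join(char_at(p, k) for k in range(start, start + length))

print(substring("Wikiproject Chemistry", 13, 9))
```

The cost grows with the number of positions times the number of supported characters, which is why the limits mentioned above are reached quickly.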
Extracting a substring from the right side of a given string
Extracting a substring from some given position to the end of a string requires the specification or determination of the length of the string. This also applies to extracting a substring of a given length from the end of the string. Of course, if the length is known it is more efficient to specify it than to have the system determine it.
XML-style tags
XML-style tags, e.g. <nowiki> tags and <math> tags, together with their content, are temporarily replaced by a so-called strip marker, a unique code with a length of ca. 37 characters plus the length of the tag name (independent of the length of the content), e.g.
foo
This affects string functions. If a strip marker is truncated, the remainder is exposed. Also, padding a string with a strip marker is based on the length of that, not of what it represents.
Examples with <nowiki> tags:
- {{padleft:|38|<nowiki>abc</nowiki>}} gives
- {{padleft:|40|<nowiki>abc</nowiki>}} gives
- {{padleft:|41|<nowiki>abc</nowiki>}} gives
- {{padleft:|42|<nowiki>abc</nowiki>}} gives
- {{padleft:|43|<nowiki>abc</nowiki>}} gives
- {{padleft:|48|<nowiki>abc</nowiki>}} gives
- {{padleft:<nowiki>abc</nowiki>|40|1234567890}} gives 123456abc
- {{padleft:<nowiki>abc</nowiki>|43|1234567890}} gives 123456789abc
- {{padleft:<nowiki>abc</nowiki>|44|1234567890}} gives 1234567890abc
- {{padleft:<nowiki>abc</nowiki>|50|1234567890}} gives 1234567890123456abc
- {{padleft:<nowiki>abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc</nowiki>|50|1234567890}} gives 1234567890123456abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc
Examples with <math> tags:
- {{padleft:|38|<math>abc</math>}} gives for example
- {{padleft:|40|<math>abc</math>}} gives for example
- {{padleft:|43|<math>abc</math>}} gives
- {{padleft:|48|<math>abc</nowiki>}} gives
- {{padleft:<math>abc</math>|43|1234567890}} gives 12345678901
- {{padleft:<math>abc</math>|44|1234567890}} gives 123456789012
- {{padleft:<math>abc</math>|50|1234567890}} gives 123456789012345678
- {{padleft:<math>abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc</math>|50|1234567890}} gives 123456789012345678
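The padding shown in these outputs allows the length of the strip marker to be worked out: the marker length is the requested length minus the number of padding characters. The Python sketch below does this arithmetic for the outputs listed above. The resulting values are inferred from these examples only, may differ between MediaWiki versions, and need not agree with the approximate figure given earlier in this section.

```python
# (requested length n, padding characters visible in the output)
nowiki_cases = [(40, "123456"), (43, "123456789"),
                (44, "1234567890"), (50, "1234567890123456")]
math_cases = [(43, "12345678901"), (44, "123456789012"),
              (50, "123456789012345678")]

def marker_lengths(cases):
    """Strip marker length implied by each example: n minus the padding length."""
    return {n - len(padding) for n, padding in cases}

print("nowiki:", marker_lengths(nowiki_cases))
print("math:  ", marker_lengths(math_cases))
```

Each set contains a single value, consistent with the marker length depending on the tag name but not on the length of the tag content; the two values differ by 2, which is the difference in length between the names "nowiki" and "math".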
String storage
Suppose we have the string ABCD and also need the substring CD. Because finding the substring is inefficient, it is better to store (or pass on as parameter values) AB and CD as separate data items.
Date and time variables are available providing output in the form of numbers, allowing easy processing without expensive string operations.
However, for processing the result of {{PAGENAME}} (on the page itself or passed on to a template as parameter value) string functions can be useful, to extract the text after "Wikiproject ", the text between parentheses, the text before the parentheses, etc.
See also
- Extension:Scribunto/Lua reference manual#String Manipulation - for wikis with the Module namespace
- Help:Magic words#Formatting
- Help:Extension:ParserFunctions##titleparts
- ↑ See the content of the file Parser.php.