User:X-Savitar/Sub-pages/Why d to 0-9
Why is [0-9] better than \d in MediaWiki?
I asked this question on the -tech IRC channel and got responses from Nemo_bis and TimStarling. See below:
10:09 (xSavitar) Hi, apart from readability, what are the advantages of using [0-9] over \d in a regular expression? The former is self explicit as one knows it's values between 0 to 9.
11:20 (Nemo_bis) xSavitar: portability is a concern for some
11:22 (xSavitar) Nemo_bis: Okay. So we need to check that first before making a change right? The portability aspect?
11:26 (xSavitar) How does one actually tell if making a change breaks portability concerns, I'm not sure about that tbh
11:28 (TimStarling) apparently \d can indeed use the libc locale, which is quite alarming
11:28 (TimStarling) nothing in mediawiki uses the locale on purpose
11:30 (TimStarling) probably best to replace all usages of \d with [0-9]
11:30 (TimStarling) with the /u modifier, PHP sets the PCRE2_UCP flag, which means that \d will act like \p{Nd}
11:30 (TimStarling) which is probably never intended
11:31 (TimStarling) better to use \p{Nd} explicitly if you want that
11:32 (TimStarling) I guess that's what Nemo_bis means by portability, he means that the syntax of whatever thing you are parsing may randomly change depending on environment variables in the shell used to start apache
11:35 (xSavitar) TimStarling, Nemo_bis, thanks a lot for the context!
11:35 BRPever_ has left IRC (Quit: Connection closed for inactivity)
11:37 (TimStarling) if would be good to reproduce this, but I can confirm that PHP calls pcre2_maketables(), which enables locale-specific matching
11:37 (TimStarling) and the PCRE manual says "By default, characters whose code points are greater than 127 never match \d, \s, or \w, and always match \D, \S, and \W, although this may be different for characters in the range 128-255 when locale-specific matching is happening."
11:37 (TimStarling) http://pcre.org/current/doc/html/pcre2pattern.html
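To make the difference between \d and [0-9] concrete, here is a small sketch. It is written in Python, not PHP, so it only illustrates the general idea and does not reproduce the PCRE behavior discussed in the log: Python's `re` module treats \d in `str` patterns as Unicode-aware by default, which is roughly comparable to \d acting like \p{Nd}. The sample strings and the helper name `is_all_digits` are made up for this example.

```python
import re

# Hypothetical sample inputs, chosen only for illustration.
ascii_digits = "2024"
arabic_indic_digits = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three (category Nd)

# In Python str patterns, \d matches any Unicode decimal digit by default.
unicode_aware = re.compile(r"\d+")
# The explicit range only ever matches the ASCII characters 0 through 9.
ascii_only = re.compile(r"[0-9]+")
# re.ASCII restricts \d to ASCII digits, making it equivalent to [0-9].
ascii_flagged = re.compile(r"\d+", re.ASCII)


def is_all_digits(pattern, text):
    """Return True if the whole string matches the given digit pattern."""
    return pattern.fullmatch(text) is not None


for label, pattern in [
    (r"\d", unicode_aware),
    ("[0-9]", ascii_only),
    (r"\d with re.ASCII", ascii_flagged),
]:
    print(label, is_all_digits(pattern, ascii_digits),
          is_all_digits(pattern, arabic_indic_digits))
```

Only the plain \d pattern accepts the non-ASCII digits. That is the kind of surprise an explicit [0-9] avoids: what the range matches does not depend on flags or the environment.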