
Topic on Extension talk:AbuseFilter/Rules format/Archive 2

How to detect "br /" ?

Erik Baas

This filter:

added_lines irlike ".*\<(\/br|br\/|br&#32;\/)\>.*"

detects "/br" and "br/", but not "br /"! Replacing "&#32;" with a literal space character doesn't work either. What am I doing wrong?

Daimona Eaytoy

Entities should not be encoded, so replacing "&#32;" with a literal space is the right thing to do. You can test the modified regex on Special:AbuseFilter/tools, and you'll see that it does match "br /". Also, regexps are not anchored in AbuseFilter, meaning you can omit the ".*" at the beginning and end of the expression. You could rewrite the above as added_lines irlike "<(\/br|br *\/)>", and this should work as far as I can see.
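For anyone who wants to check this outside the wiki, here is a rough sketch in Python. It assumes that Python's `re` module behaves like PCRE for these simple patterns and models `irlike` as an unanchored, case-insensitive search; the `irlike` helper below is hypothetical, not AbuseFilter's actual code. It shows the suggested pattern matching "br /", the leading/trailing ".*" making no difference to the result, and "&#32;" inside a pattern being treated as literal text rather than a space.

```python
import re


# Hypothetical stand-in for AbuseFilter's `irlike`: an unanchored,
# case-insensitive regex search (assumed to behave like PCRE here).
def irlike(text: str, pattern: str) -> bool:
    return re.search(pattern, text, re.IGNORECASE) is not None


# The pattern suggested above: "</br>", or "br" + optional spaces + "/".
SUGGESTED = r"<(\/br|br *\/)>"

for sample in ["<br/>", "<br />", "<BR  />", "</br>", "<br>"]:
    # Search is unanchored, so wrapping the pattern in ".*" changes nothing.
    same = irlike(sample, ".*" + SUGGESTED + ".*") == irlike(sample, SUGGESTED)
    print(sample, irlike(sample, SUGGESTED), same)

# An HTML entity inside the pattern is just the characters "&#32;",
# so it never matches an actual space in the text.
print(irlike("<br />", r"<br&#32;\/>"))  # False
```

A plain "<br>" is not matched, because the second alternative requires a "/" after "br".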

BDavis (WMF)

You could try using \s* as the PCRE expression for "zero or more consecutive whitespace characters". That could look something like .*<\/?br\w*\/?>.*. That regex could be explained as "anything, followed by '<', optionally followed by '/', followed by 'br', followed by zero or more consecutive whitespace characters, optionally followed by '/', followed by '>', followed by anything".

Erik Baas

Yes, thank you!

Od1n

Minor mistake in BDavis's code: it should be .*<\/?br\s*\/?>.* (\s* for whitespace, not \w*, which matches word characters). Also, as already noted by Daimona Eaytoy, you may omit the leading and trailing .*: they are useless and only make the regexp slower (because of backtracking).
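To see the difference between the two versions, here is a small sketch in Python. As before, it assumes Python's `re` behaves like PCRE for these patterns and uses a hypothetical `irlike` helper (unanchored, case-insensitive search) as a rough stand-in for AbuseFilter's operator. The leading and trailing ".*" are left out, since they do not change whether a match is found.

```python
import re


# Hypothetical stand-in for AbuseFilter's `irlike` (assumption, not its code):
# unanchored, case-insensitive regex search.
def irlike(text: str, pattern: str) -> bool:
    return re.search(pattern, text, re.IGNORECASE) is not None


WITH_W = r"<\/?br\w*\/?>"  # \w* = zero or more word characters (the typo)
WITH_S = r"<\/?br\s*\/?>"  # \s* = zero or more whitespace characters

for tag in ["<br>", "<br/>", "<br />", "</br>", "<brx/>"]:
    # Columns: input, match with \w*, match with \s*
    print(tag, irlike(tag, WITH_W), irlike(tag, WITH_S))
```

With \w*, "<br />" is missed while "<brx/>" is matched; with \s* it is the other way round. Note that because both slashes are optional, either version also matches a plain "<br>", which Daimona Eaytoy's pattern does not.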
