Topic on Extension talk:CanonURL

Krinkle

As demonstrated in this example, the complete lack of escaping in this extension is causing two problems:

  • Titles that contain special characters are not escaped, so the URL output in the rel="canonical" link does not lead to the real article but to a "Not found" page or a "Bad title" error.
  • Titles that contain HTML special characters open up an arbitrary HTML injection vector.
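A minimal Python sketch of the first problem (the extension itself is PHP), assuming a hypothetical wiki at example.org that uses /wiki/$1 short URLs; raw_url and escaped_url are hypothetical helpers, not code from the extension. A "?" in an unescaped title starts a query string, so everything after it never reaches the title, and a bare "%" is not a valid percent-escape:

```python
from urllib.parse import quote, urlsplit

# Hypothetical wiki using /wiki/$1 short URLs.
BASE = "http://example.org/wiki/"

def raw_url(title: str) -> str:
    """Hypothetical naive builder: spaces to underscores, no escaping."""
    return BASE + title.replace(" ", "_")

def escaped_url(title: str) -> str:
    """Hypothetical fixed builder: percent-encode all but ':' and '/'."""
    return BASE + quote(title.replace(" ", "_"), safe=":/")

for title in ["Why?", "100% Pure?"]:
    # urlsplit shows which part of the URL a server would treat as the path.
    print(title, "->", urlsplit(raw_url(title)).path,
          "vs", urlsplit(escaped_url(title)).path)
```

For "Why?" the raw path is /wiki/Why (a different page, or none), while the escaped path is /wiki/Why%3F, which decodes back to the real title.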

I've filed https://github.com/Abhi-M/CanonURL/issues/1 to track this issue. I'll provide a patch shortly.

Abhi M Balakrishnan

I've released a new version that fixes the security issue. Thanks a lot for pointing it out. I've also seen your other suggestions on the GitHub page (https://github.com/Abhi-M/CanonURL/issues) and will be more than glad to implement them.

Carlb

It looks like characters are now being URL-escaped where they should not be, such as the colon (namespace separator) and the slash (subpage separator).

For example, http://en.uncyclopedia.co/wiki/User:Kaizer_the_Bjorn/Caffeine is mirrored at http://mirror.uncyc.org/wiki/User:Kaizer_the_Bjorn/Caffeine, and the mirror uses Extension:CanonURL to direct search bots to the main site.

The result is <link rel="canonical" href="http://en.uncyclopedia.co/wiki/User%3AKaizer_the_Bjorn%2FCaffeine" />, which is a big fat 404, depending on how the target site's Apache rewrites short URLs.

MediaWiki seems to tolerate the %3A namespace separator. It's the slash encoded as %2F that breaks things, and this seems to depend on Apache's URL rewriting rather than on MediaWiki: http://en.wikipedia.org/wiki/User%3AKaizer_the_Bjorn%2FCaffeine would work if the subpage existed, while http://wikipedia.org/wiki/User%3AKaizer_the_Bjorn%2FCaffeine returns a 404 and http://wikipedia.org/wiki/User%3AKaizer_the_Bjorn/Caffeine would be valid.

Admittedly, the page doesn't exist on Wikipedia, but a 404 at the Apache level indicates that MediaWiki never even saw these requests; perhaps %2F isn't treated as / by mod_rewrite on all MediaWiki sites?
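The difference Carlb describes comes down to which characters are left literal. The output on the mirror is consistent with encoding every reserved character, as PHP's urlencode() does; keeping ":" and "/" literal gives the URL form MediaWiki itself uses. Apache httpd's AllowEncodedSlashes directive also fits the 404 theory: with its default value, Off, URLs whose path contains %2F are refused with a 404 before any rewriting happens. A minimal Python comparison (not the extension's code):

```python
from urllib.parse import quote

title = "User:Kaizer_the_Bjorn/Caffeine"

# Encode every reserved character, including ':' and '/'.
strict = quote(title, safe="")
# Keep the namespace separator ':' and the subpage separator '/' literal.
path_friendly = quote(title, safe=":/")

print(strict)
print(path_friendly)
```

The first line printed matches the broken href above (User%3AKaizer_the_Bjorn%2FCaffeine); the second is the title unchanged.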

Krinkle

Your latest revision partially addresses this by URL-escaping the value; however, it does not apply any HTML escaping. In practice this is much less of an issue, but there are still many small downsides and inconveniences in how the extension operates in general. I'd highly recommend replacing the code with https://github.com/Abhi-M/CanonURL/pull/4.
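A minimal Python sketch of the two layers applied in order, URL-escaping first and HTML-escaping second. This is not the code from the pull request; canonical_link and the example base URLs are hypothetical. Once the title is percent-encoded, quotes and angle brackets in it can no longer break the tag, which is why the remaining risk is small, but the HTML layer still matters for anything else placed in the attribute, such as an "&" in the base URL:

```python
import html
from urllib.parse import quote

def canonical_link(base: str, title: str) -> str:
    """Build a <link rel="canonical"> tag for a wiki page title.

    Hypothetical helper: `base` is assumed to be a URL prefix such as
    "http://example.org/wiki/" and `title` a page title in display form.
    """
    # 1. URL layer: spaces become underscores, then percent-encode
    #    everything except ':' and '/'.
    url = base + quote(title.replace(" ", "_"), safe=":/")
    # 2. HTML layer: escape &, <, >, " and ' so the value cannot break
    #    out of the href attribute, whatever the base URL contains.
    return f'<link rel="canonical" href="{html.escape(url, quote=True)}" />'
```

For example, canonical_link("http://example.org/index.php?action=view&title=", "Main Page") writes the "&" as "&amp;" inside href, while a title such as Say "hi" becomes Say_%22hi%22 in the URL layer.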

Abhi M Balakrishnan

Yes, I agree. Last time I hadn't seen the solution you gave and went with one that came to mind. It is now fixed and I've added your name; I'd be glad to add you as a developer if that's alright with you.

Reply to "Patch: Escape output"