The list contains no such extension: https://www.mediawiki.org/wiki/Special:ExtensionDistributor/CanonURL Where did it go?
Extension talk:CanonURL/Flow
I'm using this extension, and hopefully it's all up and running the way it should. Looking forward to a stable release. Thanks for the great extension!
As demonstrated in this example, the complete lack of escaping in this extension is causing two problems:
- Titles that contain special characters are not escaped, so the URL output in rel="canonical" does not lead to the real article but to a "Not found" page or a "Bad title" error.
- Titles that contain HTML or script characters result in an arbitrary HTML injection vector.
I've filed https://github.com/Abhi-M/CanonURL/issues/1 to track this issue. I'll provide a patch shortly.
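The two problems listed above can be reproduced outside MediaWiki. What follows is a minimal sketch in plain PHP, not the extension's actual code: the base URL and the sample title are made up for illustration, and buildCanonicalLink() is a hypothetical helper.

```php
<?php
// Hypothetical values for illustration; not taken from the extension.
$base  = 'http://example.org/wiki/';
$title = 'A"><script>alert(1)</script>';

// Unsafe: the raw title is concatenated straight into the attribute,
// so the quote closes href and the script tag lands in the page head.
$unsafe = '<link rel="canonical" href="' . $base . $title . '" />';

// Safer sketch: percent-encode the title for the URL, then HTML-escape
// the complete attribute value before it is placed in the markup.
function buildCanonicalLink( string $base, string $title ): string {
    $url = $base . rawurlencode( $title );
    return '<link rel="canonical" href="'
        . htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' ) . '" />';
}

$safe = buildCanonicalLink( $base, $title );
echo $unsafe, "\n", $safe, "\n";
```

The two steps do different jobs: percent-encoding makes the title usable as part of a URL, while HTML escaping protects the attribute no matter what the base URL or title contains.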
I released a new version immediately to fix the security issue. Thanks a lot for pointing it out. I saw some other suggestions from you on the GitHub page (https://github.com/Abhi-M/CanonURL/issues) and will be more than glad to implement them.
It looks like characters are now being URL-escaped where they should not be, such as the colon (namespace separator) and the slash (subpage separator).
For example, http://en.uncyclopedia.co/wiki/User:Kaizer_the_Bjorn/Caffeine is mirrored at http://mirror.uncyc.org/wiki/User:Kaizer_the_Bjorn/Caffeine and the mirror is flagged with extension:CanonURL to direct search bots to the main site.
The result is <link rel="canonical" href="http://en.uncyclopedia.co/wiki/User%3AKaizer_the_Bjorn%2FCaffeine" /> which is a big fat 404 depending on how the target site's Apache rewrites short URLs.
MediaWiki seems to tolerate %3A as the namespace separator. It's the / encoded as %2F that breaks, and that seems to depend on Apache's URL rewriting rather than on MediaWiki: http://en.wikipedia.org/wiki/User%3AKaizer_the_Bjorn%2FCaffeine would work if the subpage existed, while http://wikipedia.org/wiki/User%3AKaizer_the_Bjorn%2FCaffeine returns a 404 and http://wikipedia.org/wiki/User%3AKaizer_the_Bjorn/Caffeine would be valid.
Admittedly, the page doesn't exist on Wikipedia, but a 404 at the Apache level indicates MediaWiki didn't even see these; perhaps the %2F isn't a / to mod_rewrite on all MW sites?
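The over-escaping described above is easy to reproduce in plain PHP. The sketch below assumes the extension percent-encodes the whole title in one call (the thread does not show that code), and encodeTitlePath() is a hypothetical helper that restores the namespace and subpage separators afterwards.

```php
<?php
$title = 'User:Kaizer_the_Bjorn/Caffeine';

// Encoding the whole title also encodes the ':' and '/' separators.
$encoded = rawurlencode( $title );

// Hypothetical helper: percent-encode, then restore ':' and '/' so the
// namespace and subpage separators stay literal in the path.
function encodeTitlePath( string $title ): string {
    return str_replace(
        [ '%3A', '%2F' ],
        [ ':', '/' ],
        rawurlencode( $title )
    );
}

echo $encoded, "\n";
echo encodeTitlePath( $title ), "\n";
```

A literal percent sign in a title is encoded as %25 first, so the replacement step cannot accidentally decode text that merely looks like %3A or %2F. On Apache, a 404 for %2F in the path is consistent with the AllowEncodedSlashes directive being left at its default of Off, although the thread does not confirm how the affected servers are configured.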
Your latest revision addresses it partially by URL-escaping the value; however, it does not apply any HTML escaping. In practice this is much less of an issue, but there are lots of tiny downsides and inconveniences in how the extension operates in general. I'd highly recommend considering replacing the code with https://github.com/Abhi-M/CanonURL/pull/4.
Yes, I agree with you. Last time, I hadn't seen the solution you gave and went with one that came to my mind. It is now fixed and I added your name; I will be glad to add you as a developer if that is all right with you.
if (isset($wgArticle)) {
    $out->addHeadItem( 'canonical',
        '<link rel="canonical" href="'.$wgArticle->getTitle()->getFullURL().'" />'."\n");
} else {
    $out->addHeadItem( 'canonical',
        '<link rel="canonical" href="'.$CanonBaseURL.$pg_title.'" />'."\n");
}
Don't forget to run the attribute value through htmlspecialchars(), and use Title::getCanonicalURL instead of Title::getFullURL so that sites that support both HTTP and HTTPS output the canonical url here instead of the one on the current request.
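As a rough illustration of both suggestions, the sketch below uses plain PHP stand-ins rather than MediaWiki itself. fullUrl() and canonicalUrl() are hypothetical functions that only mimic the difference described above (request-dependent versus fixed server), and $canonicalServer is an assumed configuration value.

```php
<?php
// Assumed configuration value: the one server that should appear in canonical links.
$canonicalServer = 'https://example.org';

// Stand-in for a request-dependent URL: it follows the scheme of the current request.
function fullUrl( string $requestScheme, string $host, string $path ): string {
    return $requestScheme . '://' . $host . $path;
}

// Stand-in for a canonical URL: it ignores the current request entirely.
function canonicalUrl( string $canonicalServer, string $path ): string {
    return $canonicalServer . $path;
}

// Escape the attribute value before emitting the tag.
function canonicalLinkTag( string $url ): string {
    return '<link rel="canonical" href="'
        . htmlspecialchars( $url, ENT_QUOTES, 'UTF-8' ) . '" />';
}

$path = '/wiki/Main_Page?a=1&b=2';
foreach ( [ 'http', 'https' ] as $scheme ) {
    // Changes with the request scheme.
    echo canonicalLinkTag( fullUrl( $scheme, 'example.org', $path ) ), "\n";
    // Stays the same for both requests.
    echo canonicalLinkTag( canonicalUrl( $canonicalServer, $path ) ), "\n";
}
```

With the request-dependent variant, the HTTP and HTTPS versions of a page would each declare themselves canonical, which defeats the purpose of the tag; the fixed-server variant gives search engines a single target.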