Topic on Talk:ResourceLoader/Flow

Is it possible to load resources from a different URL

199.247.190.198 (talkcontribs)

Hi,

I'm trying to load all the static resources (js, css and image files) from a separate URL (a CDN actually), and using $wgStylePath has the images working just fine, but all the css and js loaded through the ResourceLoader still comes from the original domain.
Is it possible (now in 1.17.0 or in the future) to make these files load from a different URL?
Thanks,
-Dan

Krinkle (talkcontribs)

Wikipedia does this as well; it loads resources through http://bits.wikimedia.org/

To achieve this, set $wgLoadScript to where you want requests for load.php to end up.

Note that this means that your CDN needs to have access to your database, MediaWiki and your configuration.

199.247.190.198 (talkcontribs)

Thanks for the answer.
From my testing, setting $wgLoadScript also means the CDN has to be able to execute PHP (load.php), so in other words it has to be a full-on webserver, not just a static file server (as 99% of all CDNs are). The fact that it also needs to access the db and LocalSettings.php means it probably has to be on the same subnet, etc.
I'm starting to wonder about this ResourceLoader - it goes to a lot of trouble to minify a ton of css and js on every single request, using PHP to do so, and hitting the main web server (in this case, memory-hungry Apache).
Wouldn't it be better to minify as many of those files as possible beforehand and just serve them as they are? Then they could also come from a static-file-server CDN somewhere else on the internet (i.e. an entirely different subnet, etc.).
Is there any way to achieve this?
Thanks again,
-Dan

Krinkle (talkcontribs)

Well, depending on your setup, it is obviously not intended to go through all the minification, combination, embedding, localization etc. on every request.

Because of the unique combination of parameters in the URL, such as the version timestamps that ResourceLoader includes in requests to load.php, these requests are highly cacheable!

Because of the wide range of scenarios that MediaWiki has to support to be able to scale to a platform like Wikipedia, it needs access to the filesystem and the database. (Consider localization into over 300 languages including right-to-left and non-Latin languages, support for user preferences that allow users to change the skin on a per-user basis (and thus changing the modules to be loaded), gadgets (enabling custom scripts), and scripts from extensions loaded only under certain conditions.)

The way Wikipedia has this set up is by using a reverse proxy like Squid or Varnish that serves a static cache of all resources and, because of the timestamps, can cache these requests "forever". Whenever a module changes (which won't happen for an average MediaWiki install unless you upgrade MediaWiki, change configuration files or install/uninstall extensions), the new timestamp is used in the request for that module, thus changing the URL to load.php. The bits server then only initializes MediaWiki if there is no static cache for the URL.
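To make the timestamp mechanism concrete, here is a minimal Python sketch of a cache keyed on the full URL. It is a toy model, not how Squid or Varnish are implemented: `UrlKeyedCache`, `fake_origin`, `module_url` and the timestamp values are all made up for illustration, and the URL shape is only an approximation of a load.php request.

```python
from typing import Callable, Dict, List


class UrlKeyedCache:
    """Toy reverse-proxy cache: one stored response per full URL (hypothetical)."""

    def __init__(self, origin: Callable[[str], str]) -> None:
        self.origin = origin
        self.store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        # A known URL is answered from the cache; the origin is never contacted.
        if url in self.store:
            self.hits += 1
            return self.store[url]
        # An unknown URL is fetched from the origin once and then kept.
        self.misses += 1
        body = self.origin(url)
        self.store[url] = body
        return body


origin_calls: List[str] = []


def fake_origin(url: str) -> str:
    # Stand-in for MediaWiki generating a response (assumption for the sketch).
    origin_calls.append(url)
    return f"/* generated for {url} */"


def module_url(module: str, version: str) -> str:
    # Illustrative URL shape only; the point is that the version is part of the URL.
    return f"/load.php?modules={module}&version={version}"


cache = UrlKeyedCache(fake_origin)

old = module_url("site", "20110801T000000Z")
for _ in range(3):
    cache.get(old)  # first request is a miss, the next two are hits

new = module_url("site", "20111018T191400Z")
cache.get(new)  # the module changed, so the URL changed: one new miss

print(cache.hits, cache.misses, len(origin_calls))
```

Because a changed module produces a different URL, the old entry never has to be invalidated; it simply stops being requested.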

According to the stats as of August 2011, bits.wikimedia.org/../load.php has a cache hit ratio of 98.2%. For all those requests MediaWiki was not initialized and no database connection was made.

However, a simple static-file-server CDN does not suffice on its own. It's not impossible to use one, but no implementation for that has been made, as Wikipedia uses Varnish as a reverse proxy for caching. You could contact User:Catrope if you're interested in building in support for a static-file-server CDN (e.g. somehow upload static files through FTP or something to that CDN when new ones need to be generated, and embed or 301-redirect to those URLs directly).

199.85.228.100 (talkcontribs)

Thanks for the great explanation Krinkle, I understand the situation much better now. The timestamp-on-the-resourceLoader-URLs makes a lot of sense for a reverse-proxy cache like Varnish.
I had always intended to put Varnish in front of MediaWiki/Apache at some point, so I'll focus my efforts there instead of trying to offload more stuff into a static file-serving CDN.
Thanks again!
-Dan

Catrope (talkcontribs)

You meant 99.8% ;) . Also, all Wikimedia wikis run off four web servers with a total of nine Varnish servers (in two different data centers) in front of them. And those four Apaches barely get any load; their CPU usage is like 10% so in theory ResourceLoader could run off a single Apache. See also our OSCON presentation, and the slide with these numbers. --Catrope 19:14, 18 October 2011 (UTC)

204.14.239.208 (talkcontribs)

From your description, we could set up anything for load.php to go to the CDN. If the CDN uses our server as the origin (and includes the query params as part of the caching), then the very first request hits our server and what it produces is eminently cacheable by the CDN (forever).

So why couldn't we use a CDN (with no Varnish, no access to the DB etc.) with load.php?

Krinkle (talkcontribs)

Nobody is saying you can't use a CDN. If you have a CDN that works the way you just described (automatically pulling uncached urls from your origin web server, including query parameters), then by all means, go for it :)
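As a rough illustration of why the "including query parameters" part matters, here is a small Python sketch. The `cache_key` function and the cdn.example.org host are hypothetical and do not correspond to any particular CDN's configuration; the sketch only shows what happens when the query string is or is not part of the cache key.

```python
from urllib.parse import urlsplit


def cache_key(url: str, include_query: bool) -> str:
    """Build a cache key from a URL (hypothetical helper for illustration)."""
    parts = urlsplit(url)
    key = parts.netloc + parts.path
    # Only a cache that keeps the query string can tell load.php requests apart.
    if include_query and parts.query:
        key += "?" + parts.query
    return key


# Two different requests that share the same path (example values).
styles = "https://cdn.example.org/w/load.php?modules=site&only=styles"
scripts = "https://cdn.example.org/w/load.php?modules=jquery&only=scripts"

# With the query string in the key, each request gets its own cache entry.
print(cache_key(styles, include_query=True))
print(cache_key(scripts, include_query=True))

# Without it, both collapse onto one entry and the wrong content would be served.
print(cache_key(styles, include_query=False) == cache_key(scripts, include_query=False))
```

A CDN that drops the query string would treat every load.php request as the same file, so that setting is worth checking before pointing $wgLoadScript at it.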

Krinkle (talkcontribs)