Deployment tooling/Notes/l10nupdate dataflow
Two scripts update the l10n cache files: l10nupdate, which is run nightly by a cronjob, and mw-update-l10n, which is run by scap. The former calls extensions/LocalisationUpdate/update.php, rebuildLocalisationCache.php, and extensions/WikimediaMaintenance/refreshMessageBlobs.php; the latter calls mergeMessageFileList.php and rebuildLocalisationCache.php.
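As a quick reference, the call relationships described above can be written down as a small mapping. This is only an illustrative sketch: the script names come from the text, while `CALLS` and `shared_steps` are hypothetical names introduced here.

```python
# Which maintenance scripts each wrapper invokes, as listed in the text above.
CALLS = {
    "l10nupdate": [
        "extensions/LocalisationUpdate/update.php",
        "rebuildLocalisationCache.php",
        "extensions/WikimediaMaintenance/refreshMessageBlobs.php",
    ],
    "mw-update-l10n": [
        "mergeMessageFileList.php",
        "rebuildLocalisationCache.php",
    ],
}


def shared_steps(calls):
    """Return the scripts invoked by every wrapper, in first-seen order."""
    wrappers = list(calls.values())
    first, rest = wrappers[0], wrappers[1:]
    return [script for script in first if all(script in other for other in rest)]


print(shared_steps(CALLS))  # ['rebuildLocalisationCache.php']
```

The only step common to both wrappers is rebuildLocalisationCache.php, which is why its timing matters for both the nightly job and scap.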
Data flow
mergeMessageFileList.php
This generates the ExtensionMessages.php file.
Input:
- Currently deployed tree
- wmf-config/extension-list
- Files listed in $wgExtensionEntryPointListFiles (currently wmf-config/extension-list-$version)
Output:
- File eventually copied to wmf-config/ExtensionMessages-$version.php
Timing:
- Negligible, less than 1s.
extensions/LocalisationUpdate/update.php
This automatically "backports" non-English translations from master, updating the cache files that rebuildLocalisationCache.php later reads.
Input:
- Currently deployed tree
- Checkout of mediawiki/core master
- Checkout of mediawiki/extensions master
- (optional) $wgLocalisationUpdateDirectory/l10nupdate-hashes.cache
  - If present, this allows for skipping reprocessing of i18n files that haven't changed (by md5) since the last run.
- (optional) $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache
  - When the English message changes, non-English messages are no longer backported. But any that were previously backported (and so are already in this file) will be kept.
Output:
- $wgLocalisationUpdateDirectory/l10nupdate-hashes.cache
- $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache
  - These are currently 37 bytes to 1.7M, 109M in total.
Timing:
- Initial run: 370s
- Run with no changes: 18s
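The md5-based skipping mentioned under Input can be illustrated with a minimal sketch. This is not the LocalisationUpdate implementation: the on-disk format of l10nupdate-hashes.cache is not described here, so the JSON file below is an assumption, and `md5_of` and `changed_files` are hypothetical helper names.

```python
import hashlib
import json
from pathlib import Path


def md5_of(path):
    """Hex md5 digest of a file's contents."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def changed_files(paths, hash_cache_path):
    """Return the files whose md5 differs from the previous run.

    The new hashes are written back so the next run can skip unchanged files.
    Storing the hashes as JSON is an assumption made for this sketch.
    """
    cache_file = Path(hash_cache_path)
    old = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    new = {str(p): md5_of(p) for p in paths}
    changed = [p for p in paths if old.get(str(p)) != new[str(p)]]
    cache_file.write_text(json.dumps(new))
    return changed
```

On a first run the hash cache is absent, so every file is reported as changed; a second run over untouched files returns an empty list. That matches the gap between the initial-run and no-changes timings above, although the sketch says nothing about where the remaining 18s goes.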
rebuildLocalisationCache.php
This constructs the l10n cache CDB files, which MediaWiki uses for faster access to messages.
Input:
- Currently deployed tree
- (indirectly) ExtensionMessages-$version.php
- (optional) cdb files and files (by full path) used by the last rebuildLocalisationCache.php run
  - If present, this allows for skipping reprocessing of i18n files that haven't changed (by mtime) since the last run.
- $wgLocalisationUpdateDirectory/l10nupdate-$lang.cache
Output:
- cdb files. Currently 2.0M-2.9M, 779M total.
Timing:
- Initial run: 270s (4.5 minutes)
- Run with no changes: about 3.5s
- Run with one language deleted: about 5.5s.
- Run with 10 languages deleted: about 11.5s.
- Run with 20 languages deleted: about 18-19s.
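The mtime-based skipping can be sketched in the same spirit. This is a simplified illustration, not MediaWiki's actual staleness check: the `l10n_cache-<lang>.cdb` file naming and the `sources_by_lang` mapping are assumptions, and `is_stale` and `languages_to_rebuild` are hypothetical names.

```python
import os


def is_stale(cache_path, source_paths):
    """True if the cache file is missing or any source file is newer than it."""
    if not os.path.exists(cache_path):
        return True
    cache_mtime = os.path.getmtime(cache_path)
    return any(os.path.getmtime(src) > cache_mtime for src in source_paths)


def languages_to_rebuild(cache_dir, sources_by_lang):
    """Return the language codes whose cdb file needs rebuilding, sorted."""
    return sorted(
        lang
        for lang, sources in sources_by_lang.items()
        # File naming is assumed for illustration only.
        if is_stale(os.path.join(cache_dir, f"l10n_cache-{lang}.cdb"), sources)
    )
```

Under this model a deleted cdb file always counts as stale, which is presumably why the "languages deleted" timings above grow with the number of deleted files.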
extensions/WikimediaMaintenance/refreshMessageBlobs.php
This updates the ResourceLoader message cache. We can't just flush the cache, because that results in cache stampedes that can temporarily bring down the site.
Input:
- Cache cdb files (mtime checked).
- Database.
Output:
- Database is updated.
Timing:
- Based on wikitech:Server admin log, somewhere between 7 and 40 minutes.
Analysis
More timing tests could be run. In particular, better statistics on how long rebuildLocalisationCache.php takes with varying degrees of out-of-dateness would be useful. Still, the combined worst case for l10nupdate's update.php and rebuildLocalisationCache.php (initial runs of both, 370s + 270s) is about 11 minutes, while the Server admin logs indicate worst-case times of 20 minutes or more.
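The arithmetic behind these figures, using only the timings listed in the Data flow section, can be checked with a few lines. The per-language figure is a rough derived estimate rather than a measurement, and 18.5s is taken as the midpoint of the reported "18-19s"; the function and constant names are hypothetical.

```python
UPDATE_INITIAL_S = 370    # update.php, initial run
REBUILD_INITIAL_S = 270   # rebuildLocalisationCache.php, initial run
REBUILD_NO_CHANGE_S = 3.5 # rebuildLocalisationCache.php, run with no changes

# Languages deleted -> total run time in seconds (18.5 = midpoint of "18-19s").
DELETED_RUNS = {1: 5.5, 10: 11.5, 20: 18.5}


def worst_case_seconds():
    """Combined initial-run time of update.php and rebuildLocalisationCache.php."""
    return UPDATE_INITIAL_S + REBUILD_INITIAL_S


def per_language_cost(n_deleted, total_seconds):
    """Extra seconds per rebuilt language, relative to a no-change run."""
    return (total_seconds - REBUILD_NO_CHANGE_S) / n_deleted


total = worst_case_seconds()
print(f"{total}s = {total / 60:.1f} minutes")  # 640s = 10.7 minutes
for n, seconds in DELETED_RUNS.items():
    print(n, round(per_language_cost(n, seconds), 2))
```

The marginal cost falls from about 2s for a single language to roughly 0.75-0.8s per language for 10 or 20, which suggests some fixed overhead on top of the no-change baseline; more data points would be needed to say anything firmer.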
If the assumption that a large part of the l10n-related time during scap is due to copying the cdb files to the apaches is correct, we could likely realize a speedup by reworking things as follows:
- l10nupdate (nightly cronjob):
  1. Run extensions/LocalisationUpdate/update.php
  2. Copy cache files (109M) to apaches
  3. Copy deployment tree to apaches
  4. Run rebuildLocalisationCache.php on all apaches (and tin, etc.)
  5. Once #3 is complete on all apaches, run extensions/WikimediaMaintenance/refreshMessageBlobs.php
- mw-update-l10n / scap:
  1. Run mergeMessageFileList.php
  2. Copy ExtensionMessages-$version.php to apaches
  3. Copy deployment tree to apaches (which scap does anyway, currently after the l10n sync)
  4. Run rebuildLocalisationCache.php on all apaches (and tin, etc.)
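The ordering constraint in the proposed nightly flow (refreshMessageBlobs.php must wait until the per-host work has finished everywhere) can be sketched as a series of barriers. This is a hedged illustration only: the callables passed in (`run_update`, `copy_cache`, `copy_tree`, `rebuild`, `refresh_blobs`) are hypothetical stand-ins for the real scripts and copy mechanisms, and running hosts from a thread pool is an assumption, not how scap or l10nupdate actually fan out.

```python
from concurrent.futures import ThreadPoolExecutor


def run_on_all(hosts, step):
    """Run step(host) for every host in parallel and return once all finish.

    Consuming the results re-raises any exception from a host, so a failure
    stops the pipeline before the next step starts.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return list(pool.map(step, hosts))


def nightly_l10nupdate(hosts, run_update, copy_cache, copy_tree, rebuild,
                       refresh_blobs):
    """Proposed nightly flow; each per-host step acts as a barrier."""
    run_update()                   # extensions/LocalisationUpdate/update.php
    run_on_all(hosts, copy_cache)  # copy l10nupdate cache files to each host
    run_on_all(hosts, copy_tree)   # copy deployment tree to each host
    run_on_all(hosts, rebuild)     # rebuildLocalisationCache.php on each host
    refresh_blobs()                # refreshMessageBlobs.php, once, at the end
```

Because every per-host step completes on all hosts before the next begins, refreshMessageBlobs.php starts only after all earlier steps have finished everywhere. A real implementation would also need to decide what to do about hosts that are down or slow, which this sketch ignores.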
Items to investigate related to the above:
- Do the apaches have sufficient space to store the cache files for deployed versions of MediaWiki?
- Do the apaches have sufficient free CPU to run rebuildLocalisationCache.php without unduly impacting site performance?
- Is it ok to have out-of-date or missing messages for the time it takes rebuildLocalisationCache.php to run on the apaches? Could these missing messages be cached inappropriately?