Technical decision making/Decision records/T292402
What are your constraints?
General Assumptions and Requirements | Source |
Deploying configuration for MediaWiki MUST NOT require rebuilding a docker container | RelEng (Dan Duval) |
At least some specific config changes MUST be possible within seconds (for incident response) | Ops (Giuseppe) |
It SHOULD be easy to see the effective configuration for a given site (and data center, and server group). | RelEng (Ahmon), SRE |
Loading configuration MUST be performant (at least comparable to current performance) | Performance Team (Timo) |
Security Requirements | |
It MUST be possible to securely provide secrets (e.g. database passwords) to MediaWiki | Common sense |
MUST provide a way to deploy ad-hoc PHP code as a security measure | Security team (Scott) |
Privacy Requirements | |
n/a | |
Important Questions
Question | Who can answer? | Resolution, answer or action |
What caching strategy should be used for configuration? | ServiceOps, Performance | Three strategies, for different use cases: Pre-generated PHP files are the fastest option, used for things that rarely change. Config read from YAML files can be cached in APCu and invalidated based on mtime. Config coming from etcd can be cached briefly in APCu and must support stale reads, to be resilient against failure of the etcd service. |
How do we de-risk deployment of a new config loading mechanism? | ServiceOps | Feature switch to be toggled in the MultiVersion wrapper, based on host name. Patched in manually. |
How will the new approach relate to the existing functionality in the SiteConfiguration class? | RelEng | SiteConfiguration can in the future be used to generate per-wiki config files which get copied into the deployment image. |
How do we overcome the 1MB limit of k8s config maps? | RelEng | Large parts of the config rarely change and can be deployed as part of the image. Smaller parts of the config that may change more frequently can be deployed via config maps. Highly critical overrides (like the name of the active data center) can be overwritten from etcd. |
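The third caching strategy in the table above (brief caching with stale reads) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the MediaWiki implementation. The class name `StaleTolerantCache`, the `fetch` callback (standing in for a read from etcd), and the injected `clock` are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Optional


class StaleTolerantCache:
    """Caches one value briefly; serves the stale value if a refresh fails."""

    def __init__(
        self,
        fetch: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # hypothetical loader, e.g. a read from etcd
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behavior can be tested
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._fetched_at is not None and now - self._fetched_at < self._ttl:
            return self._value  # still fresh, no backend access
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            # Refresh failed: keep serving the last known value if there is one.
            if self._fetched_at is None:
                raise
        return self._value
```

In this sketch a failed refresh does not update the timestamp, so the next call tries the backend again. A real implementation would probably also want to rate-limit those retries.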
Decision
Selected Option | Option 2: Allow MediaWiki to load configuration from a set of JSON files. |
Rationale | The current situation (option 1) is undesirable because it is hard to determine the effective configuration for each wiki. The current form of configuration also does not fit well with how configuration management is done in kubernetes.
The alternative approach of implementing all needed functionality in the mediawiki-config repo (option 3) is inferior, since it would not allow development environments, CI scenarios, and third parties to benefit from the new configuration mechanism. |
Data | See the stakeholder consultation minutes and the design decision section of this document. |
Informing | PET has been informing the stakeholders in a series of meetings. We will publish a write-up that summarizes the changes that will be in the 1.38 release. |
Who | Daniel Kinzler |
Date | The general approach was decided in December 2021, with some tuning on the details up to March 2022. |
What are your options?
Option 1: Do Nothing | |
Description | Configuration remains as complex executable code, deployed as part of the MediaWiki image. |
Benefits | No development effort, no risk of breaking things. |
Risks | Configuration is hard to maintain and risky to change. Configuration changes require the docker image to be rebuilt. |
Effort | none |
Costs | none |
Testing | Same as always: use a "mostly similar" configuration in Beta Cluster. |
Performance & Scaling | none |
Deployment | The ones we currently have |
Rollback and reversibility | None needed |
Operations & Monitoring | Same as we already have |
Additional References | https://wikitech.wikimedia.org/wiki/MediaWiki_at_WMF#MediaWiki_configuration |
Consultations | |
RelEng | Undesirable. The current structure makes it hard to see the effective configuration of a given wiki. The large number of conditionals in CommonSettings.php makes it hard to reason about. |
SRE | Undesirable. The current structure makes it hard to see the effective configuration of a given wiki. The large number of conditionals in CommonSettings.php makes it hard to reason about. |
Code Health | Undesirable, since it does not give us a good way to control configuration for end-to-end test scenarios. |
Growth | Undesirable, since it does not give us a good way to control configuration for end-to-end test scenarios. |
Performance | The current system is probably the fastest, so would be ok. |
Security | The current system provides an easy way to inject ad hoc security measures. However, since the configuration is hard to reason about, it is not ideal from a security perspective. |
Option 2: Allow MediaWiki to load configuration from a set of JSON files. | |
Description | Allow MediaWiki to load configuration from a set of JSON files. This needs a mechanism to pick the correct file for the requested site (multi-tenancy) as well as a mechanism to merge configuration from different sources.
The concept of "configuration" also needs to be expanded beyond configuration settings, to include the list of extensions and skins to load, and adjustments to be made to the PHP runtime environment. |
Benefits |
|
Risks | Adding the new capability to MediaWiki is virtually risk-free. Transitioning our production environment to using the new system will involve risks. That transition however is outside the scope of this proposal. |
Effort | Four to eight weeks (three senior engineers) to do the changes in core. Transitioning our production environment to using the new system will be an iterative process, and will take more time. That transition however is outside the scope of this proposal. |
Costs | none |
Testing | None for the change to MediaWiki core. When transitioning our production environment to using the new system, we should probably make this change for the deployment-prep (aka "beta") environment first. |
Performance & Scaling | We should create a mock configuration that corresponds to what we would be loading in production, and compare loading time to the time it currently takes to execute CommonSettings.php. |
Deployment | None for the change to MediaWiki core. Transitioning our production environment to using the new system will require careful planning. That transition however is outside the scope of this proposal. |
Rollback and reversibility | When transitioning the loading of config defaults to the new system, a feature switch will need to be introduced into the MultiVersion wrapper so we can easily switch back to the old system if problems arise.
While transitioning our production environment to using the new system, it should easily be possible to go back to the old system of configuration by simply reverting to an old version of the mediawiki-config repository. That transition however is outside the scope of this proposal. |
Operations & Monitoring | When transitioning the loading of config defaults to the new system, the performance impact needs to be monitored carefully.
Transitioning our production environment to using the new system will require careful monitoring, since it affects all configuration, and problems could manifest in various ways in any part of the system. That transition however is outside the scope of this proposal. |
Additional References |
|
Consultations | |
RelEng | Support. Would provide us with flexibility as to how we represent configuration and how we combine different parts and aspects. This would allow us to transition to a system that makes configuration easier to maintain and reason about.
Note: we will need to change how our end of the config system works in any case. But having good support for loading and combining configuration in core will make this much easier. The explorations that have been done around T263166 will likely be useful in improving our end of the config system. |
SRE | Support. Would provide us with flexibility as to how we represent configuration and how we combine different parts and aspects. This would allow us to transition to a system that makes configuration easier to maintain and reason about.
Caution: Care needs to be taken to get the caching characteristics right, with respect to performance but also fault tolerance and the ability to quickly change configuration. |
Code Health | Support. This improves testability of our configuration system, and configurability of our testing system. In addition, it moves us away from relying on global variables for configuration, which should improve the overall testability of the code base. |
Growth | Support. This doesn't quite provide the support for testing scenarios we need, but it is moving in the right direction. Once this is implemented, it should be much easier for us to get what we need. |
Performance | No objections.
Caution: Care must be taken to design the new system in a performant way, especially with respect to caching. Runtime overhead must be measured carefully during deployment. |
Security | No objections. The design of the proposed system doesn't introduce security issues.
We need to retain a way to deploy ad hoc php code as a security measure, though. |
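As a rough illustration of the two mechanisms named in the Option 2 description (picking the file for the requested site, then merging configuration from several sources), here is a minimal Python sketch. It is not the MediaWiki implementation. The file names, the site names, the in-memory `FILES` mapping, and the "later source wins" rule are assumptions made for this example.

```python
import json
from typing import Any, Dict, List

# Hypothetical file contents keyed by file name. A real loader would read
# these from disk; they are inlined to keep the example self-contained.
FILES: Dict[str, str] = {
    "defaults.json": '{"Sitename": "Wiki", "LanguageCode": "en", "ReadOnly": false}',
    "sites/examplewiki.json": '{"Sitename": "Example Wiki"}',
    "sites/otherwiki.json": '{"Sitename": "Other Wiki", "LanguageCode": "de"}',
}


def files_for_site(site: str) -> List[str]:
    """Return the settings files for a site, lowest precedence first."""
    return ["defaults.json", f"sites/{site}.json"]


def load_effective_config(site: str, files: Dict[str, str]) -> Dict[str, Any]:
    """Merge all files for the site; values from later files override earlier ones."""
    effective: Dict[str, Any] = {}
    for name in files_for_site(site):
        effective.update(json.loads(files[name]))
    return effective


if __name__ == "__main__":
    # Print the effective configuration for one site.
    print(json.dumps(load_effective_config("otherwiki", FILES), indent=2))
```

Because the result is a plain data structure, the effective configuration for a given site can be printed or compared directly, which is the property RelEng and SRE ask for in the consultations.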
Option 3: No changes to MediaWiki core, rewrite mediawiki-config repo | |
Description | Implement the logic for loading and merging configuration from static files as described in option 2, but in the mediawiki-config repo, not in MediaWiki itself. This basically means doing the "transition production to the new config system" project that would follow option 2 immediately. |
Benefits | No changes needed to MediaWiki as a software. Avoids generalization, fully customized to WMF's production needs. Only one project needed (transitioning production config) instead of two (option 2 adds capabilities to core first, which are then used to transition the prod system). |
Risks |
|
Effort | A first milestone can probably be reached in four to eight weeks by three senior engineers. Completion of the transition will be an iterative process. |
Costs | None at the moment, leave everything for later. |
Testing | The only way we have for testing configuration is deployment-prep (aka "beta"). |
Performance & Scaling | Similar to option 2: Manual benchmarking will have to be injected into the configuration code. Performance changes will have to be monitored for each change. |
Deployment | Changes would be deployed like regular configuration changes. |
Rollback and reversibility | Changes to the mediawiki-config environment are easy to roll back. |
Operations & Monitoring | Careful monitoring will be required immediately, since every change potentially affects all configuration, and problems could manifest in various ways in any part of the system.
Note that there is no CI testing for the production config system. |
Additional References | There has been some exploration of this idea:
|
Consultations | |
RelEng | Undesirable. It would fall on us to come up with a new system of maintaining configuration, and mapping it directly to the global state of a PHP code base, rather than data structures that can be compared and validated. It would also be much more work to allow the CI system to benefit.
Note: we will need to change how our end of the config system works in any case. But having good support for loading and combining configuration in core will make this much easier. The explorations that have been done around T263166 will likely be useful in improving our end of the config system. |
SRE | Neutral. As long as someone comes up with a better way to manage configuration, we do not care who writes it or where it lives. |
Code Health | Undesirable, since it offers no improvement over the current situation. Development environments and CI systems would not benefit, nor would the quality of the MediaWiki code base itself. |
Growth | Undesirable, since it does not move us closer to the functionality we need, namely control of end-to-end testing scenarios. |
Performance | Neutral. As long as the new system doesn't slow things down, we do not care who writes it or where it lives. |
Security | Neutral. As long as the new system doesn't pose a risk and we can still deploy ad hoc measures, we do not care who writes it or where it lives. |
Design Decisions
Should the new config loading mechanism become part of MediaWiki core? | YES. Having the option to maintain configuration as standalone data files, and especially the ability to easily and safely combine multiple such files, is likely to benefit development environments, CI setup, as well as third party installations. |
Should settings files and extension.json files have the same schema/structure? | NOT RIGHT NOW, but perhaps eventually. Settings files and extension.json files serve essentially the same purpose, and need to be processed in the same way. Having them use the same structure would avoid confusion and duplication of logic. However, we would have to retain backwards compatibility with the old format for the foreseeable future, so there is no immediate benefit.
Converting extension.json to the same structure as settings files internally seems advantageous though, since it will allow us to share code between config loading and extension registration, and avoid inconsistencies. |
Shall we support YAML in addition to JSON? | YES. Configuration files need to be human editable. JSON files are hard to read and edit and don't allow comments. Performance of loading YAML isn't great, but can be improved by using a native PHP extension rather than a YAML parser written in PHP. Also, the performance implication of loading files will be mitigated by a transparent caching layer.
However, YAML is a complex format with surprising edge cases and unintuitive behavior. We should investigate tooling to mitigate these issues. |
Should we use APCu for caching configuration? | YES, for now. But we also need to support loading from generated PHP files to make use of the opcode cache, which is by far the fastest option.
However, there is a desire to disable opcode cache revalidation in production, which means that we can't update config represented as PHP arrays without a pod restart. The solution is to use config loaded from PHP arrays as a baseline, and override it with values coming from config maps or etcd, and cached in APCu. |
Shall we merge together multiple settings files before caching? | NOT RIGHT NOW, but the design needs to allow for us to change direction on this. Batching reduces the amount of work needed to be done while loading configuration from cache. However, because of the large number of possible permutations of config files, we may end up evicting cache entries and degrade performance. Which solution is better depends on a large number of factors, such as how we end up splitting the configuration, the size of each data file, the complexity of the merge operations and the hardware specification of the application servers. |
Do we need to support interpolation or other pre-processing of settings values, such as php constants? | NOT RIGHT NOW, but keep the option to add this feature later. In particular the ability to reference namespace constants would be nice to have in manually maintained YAML files. But the additional complexity does not seem worthwhile at this time; also there is a danger of adding in more complex pre-processing such as expression evaluation, which could defeat the idea of making configuration easier to reason about. |
Should we convert DefaultSettings.php to data files (JSON or YAML or such)? | YES and NO: we need a schema, but it should be defined in PHP, for better integration with phpdoc and so we can use constants.
We need a schema, rather than just default values, so we can determine the merge strategy for each configuration key. After some experimentation, we settled on defining JSON schema structures for each config setting as a constant in a PHP file. This allows us to generate a proper JSON Schema file, while retaining much of the structure and documentation currently present in DefaultSettings.php. We also retain the ability to use PHP constants (especially for namespaces) in default values, which would have been lost when representing the schema as YAML or JSON. Having a schema for configuration will be useful for validating configuration, especially when parts of the configuration are maintained by the community on-wiki. |
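To make concrete why a schema is needed to determine the merge strategy for each key, the following Python sketch applies settings layers on top of schema defaults. It is an illustration only, not the MediaWiki schema format. The strategy names (`replace`, `append`, `merge_dict`), the `SCHEMA` layout, and the example keys and values are assumptions made for this example.

```python
import copy
from typing import Any, Dict

# Hypothetical schema: each key declares a default and how overrides combine.
SCHEMA: Dict[str, Dict[str, Any]] = {
    "Sitename": {"default": "Wiki", "merge": "replace"},
    "ExtraNamespaces": {"default": {}, "merge": "merge_dict"},
    "Skins": {"default": ["vector"], "merge": "append"},
}


def merge_value(strategy: str, current: Any, new: Any) -> Any:
    """Combine the current value with an override according to the strategy."""
    if strategy == "replace":
        return new
    if strategy == "append":
        # Keep existing items and add new ones, skipping duplicates.
        return list(current) + [item for item in new if item not in current]
    if strategy == "merge_dict":
        return {**current, **new}
    raise ValueError(f"unknown merge strategy: {strategy}")


def apply_settings(schema: Dict[str, Dict[str, Any]], *layers: Dict[str, Any]) -> Dict[str, Any]:
    """Start from the schema defaults and apply each layer in order."""
    config = {key: copy.deepcopy(spec["default"]) for key, spec in schema.items()}
    for layer in layers:
        for key, value in layer.items():
            if key not in schema:
                # The schema doubles as validation: unknown keys are rejected.
                raise KeyError(f"unknown setting: {key}")
            config[key] = merge_value(schema[key]["merge"], config[key], value)
    return config
```

With default values alone, a loader could not tell whether an override for a list-valued key should replace the default or extend it. Declaring the strategy per key removes that ambiguity, and the same declaration lets the loader reject keys it does not know.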