This page is obsolete and is retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported; do not rely on the information here being up to date. The Core Platform Team and its initiatives no longer exist as of 2023; see the MediaWiki Engineering Group instead.
Shard Revision Table
Initiative Description
- Summary
The ability to run multiple MySQL read queries in parallel, hitting multiple shards at the same time, would allow the revision table to be sharded without duplicating data and without any client latency impact. However, this comes at the expense of multiplying the read query rate, since every read is sent to every shard. A read query that filters on an indexed column can avoid this extra read traffic if the shards are split by the same index, but routing on that index adds complexity in the application.
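To make the trade-off concrete, here is a minimal routing sketch. It is hypothetical (the `ShardRouter` class and the choice of `rev_page` as the sharding column are assumptions, not decisions from this initiative): a query that filters on the sharding column goes to exactly one shard, everything else is broadcast.

```php
<?php
// Hypothetical sketch: route a read to one shard when it filters on the
// sharding column, otherwise broadcast it to every shard.
class ShardRouter {
	/** @var string[] Names/DSNs of the revision shards */
	private $shards;
	/** @var string Column the table is split by, e.g. rev_page (assumed) */
	private $shardColumn;

	public function __construct( array $shards, $shardColumn ) {
		$this->shards = $shards;
		$this->shardColumn = $shardColumn;
	}

	/**
	 * @param array $conds WHERE conditions as column => value
	 * @return int[] Indexes of the shards that must be queried
	 */
	public function getShardsFor( array $conds ) {
		if ( isset( $conds[$this->shardColumn] ) ) {
			// The query includes the sharding index, so a single shard is
			// enough and read traffic is not multiplied.
			return [ crc32( (string)$conds[$this->shardColumn] ) % count( $this->shards ) ];
		}
		// No usable sharding key: the query has to hit every shard.
		return array_keys( $this->shards );
	}
}
```

With two shards, a lookup by rev_page would go to a single shard, while a lookup by rev_timestamp alone would be broadcast to both, which is where the duplicated read traffic comes from.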
Joins on small tables could be supported by replicating those tables into all shards. Joins on large tables, such as a join of the revision table with itself, would have to be emulated in MediaWiki.
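A hedged sketch of what such an emulation could look like, assuming plain mysqli connections (the function name and column choices are illustrative only): a self-join of revision on rev_parent_id is done as two passes over the shards, with the join itself performed in PHP.

```php
<?php
// Hypothetical sketch: emulate `revision JOIN revision AS parent
// ON rev_parent_id = parent.rev_id` across shards in application code.
// $shardConnections is assumed to be an array of mysqli handles, one per shard.
function fetchRevisionsWithParents( array $shardConnections, $pageId ) {
	$children = [];
	foreach ( $shardConnections as $conn ) {
		// First pass: collect the child revisions from every shard.
		$res = $conn->query(
			'SELECT rev_id, rev_parent_id FROM revision WHERE rev_page = ' . (int)$pageId
		);
		while ( $row = $res->fetch_assoc() ) {
			$children[] = $row;
		}
	}

	// Second pass: the parent rows may live on any shard, so ask them all.
	$parentIds = array_filter( array_column( $children, 'rev_parent_id' ) );
	$parents = [];
	if ( $parentIds ) {
		$idList = implode( ',', array_map( 'intval', $parentIds ) );
		foreach ( $shardConnections as $conn ) {
			$res = $conn->query(
				"SELECT rev_id, rev_timestamp FROM revision WHERE rev_id IN ($idList)"
			);
			while ( $row = $res->fetch_assoc() ) {
				$parents[$row['rev_id']] = $row;
			}
		}
	}

	// The join itself happens in PHP rather than in MySQL.
	foreach ( $children as &$child ) {
		$child['parent'] = $parents[$child['rev_parent_id']] ?? null;
	}
	return $children;
}
```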
The minimum viable product is to distribute rows into shards randomly and to duplicate all read queries asynchronously. If approved, the project plan would then be:
- Evaluate existing asynchronous/parallel MySQL clients in PHP. Choose one.
- Provide an interface within MediaWiki with the following capabilities (a sketch of such an interface follows this plan):
- Distributing SELECT queries to all shards, reassembling and sorting results.
- Distributing DELETE and UPDATE queries to all shards.
- Sending INSERT queries to a single shard based on some simple policy, e.g. fixed or random.
- Remove problematic joins from MediaWiki.
- Provide a maintenance script for moving data between shards.
- Work with DBAs to deploy the system to the Wikimedia cluster.
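A rough sketch of what that interface could look like (the class and method names are hypothetical, plain mysqli handles are assumed, and the per-shard queries are shown serially for brevity; the real implementation would issue them with whichever asynchronous client is chosen in the first step):

```php
<?php
// Hypothetical sketch of the sharding interface described in the plan above.
class ShardedTable {
	/** @var \mysqli[] One connection per shard */
	private $shards;

	public function __construct( array $shardConnections ) {
		$this->shards = $shardConnections;
	}

	/** Broadcast a SELECT to every shard, then reassemble and sort the rows. */
	public function select( $sql, $orderByColumn ) {
		$rows = [];
		foreach ( $this->shards as $conn ) { // serial here; asynchronous in practice
			$res = $conn->query( $sql );
			while ( $row = $res->fetch_assoc() ) {
				$rows[] = $row;
			}
		}
		usort( $rows, static function ( $a, $b ) use ( $orderByColumn ) {
			return $a[$orderByColumn] <=> $b[$orderByColumn];
		} );
		return $rows;
	}

	/** DELETE and UPDATE statements must reach every shard. */
	public function write( $sql ) {
		foreach ( $this->shards as $conn ) {
			$conn->query( $sql );
		}
	}

	/** INSERT goes to a single shard chosen by a simple policy (random here). */
	public function insert( $sql ) {
		$this->shards[ array_rand( $this->shards ) ]->query( $sql );
	}
}
```

The insert policy shown is the random one mentioned above; a fixed policy would simply pin new rows to one shard.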
- Significance and Motivation
Splitting up storage over multiple servers will allow faster schema changes and possibly cheaper hardware. In some cases, this can increase our capacity, and it will enable us to handle additional applications such as JADE.
- Outcomes
- Schema change speed. The schema change process directly slows down any feature that requires database changes; the amount of improvement we get depends on the number of shards we introduce.
- Performance. We should have parity with current MySQL access, with no performance degradation.
- Baseline Metrics
None given
- Target Metrics
None given
- Stakeholders
- Extension developers
- Scoring
- WMDE (Wikibase)
- Known Dependencies/Blockers
Actor and Comment Table Schema migrations
MCR (Slot and Content Table) Schema migrations
Epics, User Stories, and Requirements
- Survey all the code that uses the revision table in order to get the full scope of the work
- Investigate options for asynchronous/parallel MySQL queries
- Write an RFC covering the whole project. This includes:
- Possible implementation options (both options are sketched after this list)
- abstract DB layer
- have business logic be aware of multiple databases
- Migration path
- Depending on the implementation, potentially reach out to extension developers about changes
- Implement
- Migrate
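To make the two implementation options concrete, here is a rough contrast with hypothetical names (`ShardedDatabase` does not exist; the `select( $table, $vars, $conds )` shape imitates MediaWiki's existing IDatabase::select):

```php
<?php
// Option 1 (abstract DB layer): callers keep today's single-database call
// style; a hypothetical sharding-aware wrapper fans the query out internally.
function getRevisionIdsViaWrapper( ShardedDatabase $db, int $pageId ): array {
	$ids = [];
	// Same shape as MediaWiki's IDatabase::select( $table, $vars, $conds ).
	foreach ( $db->select( 'revision', [ 'rev_id' ], [ 'rev_page' => $pageId ] ) as $row ) {
		$ids[] = (int)$row->rev_id;
	}
	return $ids;
}

// Option 2 (business logic aware of multiple databases): every caller has to
// know the table is sharded and iterate over the shard connections itself.
function getRevisionIdsExplicitly( array $shardConnections, int $pageId ): array {
	$ids = [];
	foreach ( $shardConnections as $conn ) { // one mysqli handle per shard
		$res = $conn->query( 'SELECT rev_id FROM revision WHERE rev_page = ' . $pageId );
		while ( $row = $res->fetch_assoc() ) {
			$ids[] = (int)$row['rev_id'];
		}
	}
	return $ids;
}
```

Option 1 keeps the sharding knowledge in one place but requires changes to the database abstraction; option 2 spreads that knowledge across callers, including extensions.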
Time and Resource Estimates
- Estimated Start Date
April 2019
- Actual Start Date
None given
- Estimated Completion Date
None given
- Actual Completion Date
None given
- Resource Estimates
TBD; we will know after the analysis in the first step of the project. It should be comparable to the Actor/Comment table project.
- Collaborators
- SRE
Open Questions
- Do we do parallel or serial MySQL queries? PHP does not support parallel MySQL queries, so we will need to use a library (one candidate is sketched after these questions).
- How do we split up the table, and over which indexes?
- How much of a speed improvement do we want for schema changes? This will determine the number of shards.
- How do we define and measure our metrics?
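For the first question, one candidate the evaluation would likely look at is the mysqli extension's asynchronous mode (available with the mysqlnd driver). A rough, hypothetical timing check, with placeholder connection details and error handling omitted for brevity:

```php
<?php
// Rough check that two MySQL queries can be in flight at the same time from
// PHP, using mysqli's asynchronous mode (requires the mysqlnd driver).
// Host names and credentials below are placeholders.
$shards = [
	new mysqli( 'shard1.example', 'wikiuser', 'secret', 'wikidb' ),
	new mysqli( 'shard2.example', 'wikiuser', 'secret', 'wikidb' ),
];

$start = microtime( true );
foreach ( $shards as $conn ) {
	$conn->query( 'SELECT SLEEP(1)', MYSQLI_ASYNC ); // returns immediately
}

$pending = $shards;
while ( $pending ) {
	$read = $error = $reject = $pending;
	if ( mysqli_poll( $read, $error, $reject, 5 ) < 1 ) {
		continue; // nothing finished yet, poll again
	}
	foreach ( $read as $done ) {
		$done->reap_async_query();
		// Drop the finished connection from the pending set.
		$pending = array_filter( $pending, static function ( $c ) use ( $done ) {
			return $c !== $done;
		} );
	}
}

// If the queries ran in parallel this prints roughly 1 second, not 2.
printf( "elapsed: %.1fs\n", microtime( true ) - $start );
```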