User:Brooke Vibber/Compacting the revision table round 2
Overview
Per ongoing discussion in ArchCom and at WikiDev17 about performance, future requirements, and future-proofing for table size, it's proposed to do a major overhaul of the revision table, combining the following improvements:
- Normalization of frequently duplicated data into separate tables, reducing the duplicated strings to integer keys (see the sketch below)
- Separation of content-specific metadata from general revision metadata, to support:
  - Multi-content revisions, allowing storage of multiple content blobs per revision -- not related to compaction, but of great interest for the structured data additions planned for multimedia and articles
- General reduction in revision table width / on-disk size:
  - will make schema changes easier in future -- trying to avoid inconsistencies in live index deployments
  - ideally all indexes should fit on all servers, making it easier to switch database backends around in production
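As a quick illustration of the normalization idea (a hedged sketch; the table and column names come from the provisional schema later on this page):

-- Before: revision.rev_comment repeats the full edit summary string on every row.
-- After: the string is stored once in comment, and revision keeps an integer key.
SELECT rev_id, comment_text AS rev_comment
FROM revision
JOIN comment ON comment_id = rev_comment_id;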
cf. other & older notes:
- User:Brion VIBBER/Compacting the revision table
- Multi-Content_Revisions/Content_Meta-Data
- Wikimedia Developer Summit/2017/Scaling the Wikimedia database schema
Current
The current revision table:

Column | Type |
---|---|
rev_id | int |
rev_page -> page_id | int |
rev_text_id -> old_id | int |
rev_comment | varbinary(767) |
rev_user -> user_id | int |
rev_user_text | varchar(255) |
rev_timestamp | binary(14) |
rev_minor_edit | tinyint |
rev_deleted | tinyint |
rev_len | int |
rev_parent_id -> rev_id | int |
rev_sha1 | varbinary(32) |
rev_content_model | varbinary(32) |
rev_content_format | varbinary(64) |

The current archive table:

Column | Type |
---|---|
ar_id | int |
ar_namespace | int |
ar_title | varchar(255) |
ar_text | mediumblob |
ar_comment | varbinary(255) |
ar_user | int |
ar_user_text | varbinary(255) |
ar_timestamp | binary(14) |
ar_minor_edit | tinyint |
ar_flags | tinyblob |
ar_rev_id | int |
ar_text_id | int |
ar_deleted | tinyint |
ar_len | int |
ar_page_id | int |
ar_parent_id | int |
ar_sha1 | varbinary(32) |
ar_content_model | varbinary(32) |
ar_content_format | varbinary(64) |
Column | Type |
---|---|
ar_id | int
|
ar_namespace | int
|
ar_title | varchar(255)
|
ar_text | mediumblob
|
ar_comment | varbinary(255)
|
ar_user | int
|
ar_user_text | varbinary(255)
|
ar_timestamp | binary(14)
|
ar_minor_edit | tinyint
|
ar_flags | tinyblob
|
ar_rev_id | int
|
ar_text_id | int
|
ar_deleted | tinyint
|
ar_len | int
|
ar_page_id | int
|
ar_parent_id | int
|
ar_sha1 | varbinary(32)
|
ar_content_model | varbinary(32)
|
ar_content_format | varbinary(64)
|
Provisional
/tables.sql (needs updating)

revision:

Column | Type | Change |
---|---|---|
rev_id | bigint | int -> bigint |
rev_page -> page_id | int | — |
rev_comment_id -> comment_id | bigint | Moved to own table. |
rev_actor -> actor_id | bigint | Separated user/user_text to own table. |
rev_timestamp | binary(14) | — |
rev_minor_edit | tinyint | — |
rev_deleted | tinyint | — |
rev_parent_id -> rev_id | bigint | int -> bigint |

archive:

Column | Type | Change |
---|---|---|
ar_id | bigint | int -> bigint |
ar_namespace | int | — |
ar_title | varchar(255) | — |
ar_page_id | int | — |
ar_rev_id | bigint | int -> bigint |
ar_comment_id -> comment_id | bigint | Moved to own table. |
ar_actor -> actor_id | bigint | Separated user/user_text to own table. |
ar_timestamp | binary(14) | — |
ar_minor_edit | tinyint | — |
ar_deleted | tinyint | — |
ar_parent_id -> rev_id | bigint | int -> bigint |

slots:

Column | Type | Change |
---|---|---|
slot_revision -> rev_id/ar_rev_id | bigint | Link back to revision or archive |
slot_content -> cont_id | bigint | Link to content |
slot_role -> cr_id | smallint | Link to content_role |

content:

Column | Type | Change |
---|---|---|
cont_id | bigint | N/A |
cont_address | int | Replaces rev_text_id. |
cont_len | int | Moved; no other change. |
cont_sha1 | varbinary(32) | Moved; no other change. |
cont_model -> cm_id | smallint | Link to content_model |
cont_format -> cf_id | smallint | Link to content_format |

comment:

Column | Type | Change |
---|---|---|
comment_id | bigint | N/A |
comment_text | mediumblob | varbinary(767) -> mediumblob (for phab:T6714) |

actor:

Column | Type | Change |
---|---|---|
actor_id | bigint | N/A |
actor_user -> user_id | int | Link to user; moved from rev_user |
actor_text | varchar(255) | Moved from rev_user_text; no other change. |

content_model:

Column | Type | Change |
---|---|---|
cm_id | smallint | N/A |
cm_model | varbinary(32) | Moved; no other change. |

content_format:

Column | Type | Change |
---|---|---|
cf_id | smallint | N/A |
cf_format | varbinary(64) | Moved; no other change. |

content_role:

Column | Type | Change |
---|---|---|
cr_id | smallint | N/A |
cr_role | varbinary(32) | New concept for MCR. |
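Putting the pieces together, a rough DDL sketch of the new tables could look like the following. This is a minimal sketch only: the keys, indexes, and AUTO_INCREMENT choices are illustrative assumptions, not the updated tables.sql.

-- Sketch only; constraints and indexes are assumptions.
CREATE TABLE /*_*/slots (
  slot_revision bigint unsigned NOT NULL, -- rev_id or ar_rev_id
  slot_content bigint unsigned NOT NULL,  -- content.cont_id
  slot_role smallint unsigned NOT NULL,   -- content_role.cr_id
  PRIMARY KEY (slot_revision, slot_role)  -- assumed: one content per role per revision
) /*$wgDBTableOptions*/;

CREATE TABLE /*_*/content (
  cont_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  cont_address int unsigned NOT NULL,     -- replaces rev_text_id
  cont_len int unsigned NOT NULL,
  cont_sha1 varbinary(32) NOT NULL,
  cont_model smallint unsigned NOT NULL,  -- content_model.cm_id
  cont_format smallint unsigned NOT NULL  -- content_format.cf_id
) /*$wgDBTableOptions*/;

CREATE TABLE /*_*/comment (
  comment_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  comment_text mediumblob NOT NULL
) /*$wgDBTableOptions*/;

CREATE TABLE /*_*/actor (
  actor_id bigint unsigned NOT NULL PRIMARY KEY AUTO_INCREMENT,
  actor_user int unsigned,                 -- user.user_id; NULL for anonymous edits (assumed)
  actor_text varchar(255) binary NOT NULL  -- username or IP address
) /*$wgDBTableOptions*/;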
TODO: consider further changes to archive table (full revdel?)
TODO: maybe switch the text table around and change the text_id ref to a URL? (still thinking this through)
TODO: plan updates to other tables that have the user_text pattern.
Thoughts
- That seems like a lot of tables!
  - Most of them are the small tables for inlining strings -- content models, content formats, content slot roles for MCR, and user refs/IP addresses for actor. These should save a fair chunk of duplicated space. Additionally, the MCR split between revision & content makes each of the two tables smaller and more malleable.
- What happened to rev_text_id?
  - content.cont_address replaces it.
  - It may be an open question whether we want to make that change immediately, or whether to change the text table as well, etc.
- Why isn't rev_deleted moved to content?
  - rev_deleted is a bitfield, and most of its options apply to things that aren't part of a Content object, such as the edit comment and the username. If separately "rev-deleting" just one content item is needed, a second bitfield or flag will need to be added on the content table too...
- What about rev_len and rev_sha1 -- do they belong in content?
  - Not sure about this. Do we need to keep revision-level fields for summing over multiple content objects?
- How hard will it be to change queries? (See the example after this list.)
  - Queries that WHERE on rev_user/rev_user_text directly, or read fields directly, etc. will need to be updated. :(
  - Things that just use Revision::*FromConds() and the accessor functions will be able to fall back to lazy loading without changes.
  - Stuff that touches rev_text_id directly will need changing.
  - Stuff that wants to pre-join a bunch of data may need changing. We may be able to add abstractions on Revision functions to hide some of that, or build new abstractions that are less freaky.
  - Stuff that inserts manually will need updating; stuff that uses Revision accessors should remain safe, but should update to MCR-specific interfaces in future.
- What would a transition look like? What kind of background processing and what kind of downtime should we expect?
  - See #Transition plan -- we'll need a transitional mode in MediaWiki where a background process runs filling out the new tables. This may take some time -- several weeks sounds likely for the biggest sites. This may increase load on servers and will require additional disk space.
  - Most likely we will first add the new fields to revision, allowing them to be filled out without disturbing ongoing actions, and then remove the now-redundant fields afterwards.
  - In principle, once the background process is complete, it should be possible to switch a wiki to read-only, flip its mode, and then switch back to read-write with little downtime for editors.
  - This could also allow quickly aborting/reverting to the previous state of revisions if things look bad while still read-only... but if a rollback after going read-write is desired, that's a lot freakier to deal with.
- What about manual queries (Tool Labs, etc.) that might not want to update?
  - In principle a view can be created that replicates the classic view of the revision table... except maybe for rev_text_id. See #Compatibility view.
- Space and performance concerns about indexes; possibility of "sharding" different sub-tables with specific indexes?
  - We could consider extra tables to support particular usage patterns, as long as they all fit on standard servers. See #Denormalization.
- How will the archive table be transitioned?
  - In the current model, archive may drop some legacy fields from ancient MediaWiki versions, which may require upgrading existing rows: separating inline text storage out to the text table, setting up content rows, and creating an ar_rev_id for rows that lack one (to be used in content slot role assignment).
  - It's possible we could make a bigger change where deleted revisions stay in the revision table (full "revision deletion"), but this is not yet decided.
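For example, a raw contributions-style query would change roughly like this (a sketch; real code should go through the Revision abstractions rather than raw SQL, and the IP address is just an example value):

-- Old schema: filter directly on the denormalized column.
SELECT rev_id, rev_timestamp, rev_comment
FROM revision
WHERE rev_user_text = '127.0.0.1'
ORDER BY rev_timestamp DESC;

-- New schema: join through actor (and comment) instead.
SELECT rev_id, rev_timestamp, comment_text
FROM revision
JOIN actor ON actor_id = rev_actor
JOIN comment ON comment_id = rev_comment_id
WHERE actor_text = '127.0.0.1'
ORDER BY rev_timestamp DESC;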
Transition plan
Clean up archive table (it's conceivable this has already been done, but expect surprises)
- background process finishes any outstanding migrations on the archive table for deleted revs
- migrate any inline text storage to text table
- assign ar_rev_id if none present
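A hedged sketch of how the background process might identify archive rows still needing cleanup (the NULL checks are assumptions about how legacy rows present themselves):

-- Rows still carrying inline text rather than a text-table reference,
-- and rows missing ar_rev_id; both kinds need backfilling.
SELECT COUNT(*) FROM archive WHERE ar_text_id IS NULL;
SELECT COUNT(*) FROM archive WHERE ar_rev_id IS NULL;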
Intermediate schema
- intermediate schema will have all the old fields *and* all the new fields
- apply intermediate schema on DB replicas while out of rotation, until all servers are ready
Transition mode
- keep reading from the old fields, but write to the new fields too when updating/inserting
- set up the content model, content format, and slot role entries
- background process churns through existing rows (see the sketch below):
  - break out comment and actor entries
  - break out content entries and create slot rows assigning them to the revs
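A minimal sketch of one backfill step for a single revision (batching, deduplication, and the real content model/format/role lookups are omitted; the literal IDs are placeholders):

-- Break out the comment and point the new field at it.
INSERT INTO comment (comment_text)
SELECT rev_comment FROM revision WHERE rev_id = 12345;
UPDATE revision SET rev_comment_id = LAST_INSERT_ID() WHERE rev_id = 12345;

-- Create the content row and its main-slot assignment.
-- (1 stands in for the looked-up cm_id / cf_id / cr_id values.)
INSERT INTO content (cont_address, cont_len, cont_sha1, cont_model, cont_format)
SELECT rev_text_id, rev_len, rev_sha1, 1, 1 FROM revision WHERE rev_id = 12345;
INSERT INTO slots (slot_revision, slot_content, slot_role)
VALUES (12345, LAST_INSERT_ID(), 1);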
Provisional switchover
- (?) activate full usage of the new fields, but write to the old fields too when updating/inserting
- (?) gives us a chance to test without losing data
Full switchover
- activate full usage of the new fields; stop using the old fields
Final schema
- drop the old fields on DB replicas while out of rotation, until applied on all servers
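For example, the final cleanup on each depooled replica might look roughly like this (a sketch; the column list just mirrors the fields that disappear in the provisional schema above):

ALTER TABLE /*_*/revision
  DROP COLUMN rev_text_id,
  DROP COLUMN rev_comment,
  DROP COLUMN rev_user,
  DROP COLUMN rev_user_text,
  DROP COLUMN rev_len,
  DROP COLUMN rev_sha1,
  DROP COLUMN rev_content_model,
  DROP COLUMN rev_content_format;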
Compatibility view
Something like this could work for providing a back-compatible view of revision:
The main pain points are rev_text_id (if we change it to a full address in content, it's harder to just use a reference) and whether to explicitly pull a main content slot or what...
CREATE VIEW revision_old AS
SELECT
    rev_id,
    rev_page,
    0 AS rev_text_id, -- XXXX no exact match?
    comment_text AS rev_comment,
    actor_user AS rev_user,
    actor_text AS rev_user_text,
    rev_timestamp,
    rev_minor_edit,
    rev_deleted,
    cont_len AS rev_len,
    rev_parent_id,
    cont_sha1 AS rev_sha1,
    cm_model AS rev_content_model,
    cf_format AS rev_content_format
FROM revision
LEFT JOIN slots ON slot_revision = rev_id AND slot_role = 1 -- ????
LEFT JOIN content ON cont_id = slot_content
LEFT JOIN content_model ON cm_id = cont_model
LEFT JOIN content_format ON cf_id = cont_format
LEFT JOIN actor ON actor_id = rev_actor
LEFT JOIN comment ON comment_id = rev_comment_id;
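Tools could then keep pointing existing queries at the view unchanged, e.g. (assuming the view is published to the replicas; the page ID is just an example value):

SELECT rev_id, rev_comment, rev_user_text
FROM revision_old
WHERE rev_page = 1234
ORDER BY rev_timestamp DESC;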
Denormalization
Part of the space usage of the revision table is the several indexes on it:
CREATE INDEX /*i*/rev_page_id ON /*_*/revision (rev_page, rev_id); -- used for direct current-ver lookups
CREATE INDEX /*i*/rev_timestamp ON /*_*/revision (rev_timestamp); -- is this used directly? looks bogus
CREATE INDEX /*i*/page_timestamp ON /*_*/revision (rev_page,rev_timestamp); -- used for history, editing, lots of stuff
CREATE INDEX /*i*/user_timestamp ON /*_*/revision (rev_user,rev_timestamp); -- used for contribs
CREATE INDEX /*i*/usertext_timestamp ON /*_*/revision (rev_user_text,rev_timestamp); -- used for contribs
CREATE INDEX /*i*/page_user_timestamp ON /*_*/revision (rev_page,rev_user,rev_timestamp); -- what is this for? seems weird
(Question: are there any extra indexes added in production or tools that might be needed too?)
There might or might not be benefit in creating denormalized summary tables that contain only a few key columns, plus the indexes needed on them, like this:
--
-- Summary table to isolate the usage of the indexes for Special:Contributions
-- and ApiQueryUserContributions.
--
CREATE TABLE /*_*/contribs (
  -- Key to rev_id (copied from revision, not auto-generated)
  contribs_revision bigint unsigned NOT NULL PRIMARY KEY,
  -- Key to actor.actor_id of the user who made this edit.
  -- actor_user or actor_text will have the actual user ID or IP address.
  contribs_actor bigint unsigned NOT NULL default 0,
  -- Copy of rev_timestamp
  contribs_timestamp binary(14) NOT NULL default ''
) /*$wgDBTableOptions*/;
CREATE INDEX /*i*/contribs_actor_timestamp ON /*_*/contribs (contribs_actor, contribs_timestamp);
This can then be joined to the main tables to get the rest of the information. Does this make sense? We're not sure... yet :)
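For instance, a contributions lookup could scan the summary table's narrow index and join back for details (a sketch using the contribs columns above; the actor ID is an example value):

SELECT rev_id, contribs_timestamp, comment_text
FROM contribs
JOIN revision ON rev_id = contribs_revision
JOIN comment ON comment_id = rev_comment_id
WHERE contribs_actor = 42
ORDER BY contribs_timestamp DESC
LIMIT 50;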
Questions and comments
Use the talk page, please.