DRAFT DRAFT DRAFT PROBABLY ALL WRONG, DO NOT USE

Database transactions and transaction rounds in Mediawiki

This is a short overview of how database transactions are viewed and used in the MediaWiki codebase. The reader is presumed to understand the basics of SQL databases, and the document is written for users of MySQL/MariaDB.

Let's start at the beginning.

Database transactions

The standard definition of a database transaction, from Wikipedia:

"a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions."

The MariadDBknowledge base says:

"An SQL-transaction (transaction) is a sequence of executions of SQL-statements that is atomic with respect to recovery. That is to say: either the execution result is completely successful, or it has no effect on any SQL-schemas or SQL-data."

So what does this mean in practice?

We may have one or several statements, which are treated as a logical unit, and all are executed and take effect, or they are reverted so that none of them have effect. These statements are executed on one database server, using one open connection to that database server (one connection to the mariadb server on a specific database host, in WMF production).

Let's unpack that a little, looking at the MariaDB/MySQL case, since that's what WMF uses.

The START TRANSACTION or BEGIN statement begins a new transaction. Any database statements after that issued on that connection are part of that transaction. We can end the transaction by one of the following:

COMMIT commits the current transaction, making its changes permanent
ROLLBACK rolls back the current transaction, canceling its changes

START TRANSACTION will first commit any previous db statements that have not yet been committed.

BEGIN does not do this.

So technically speaking, START TRANSACTION ... COMMIT/ROLLBACK is a single db transaction, but BEGIN ... COMMIT/ROLLBACK is not. However, if you put a COMMIT before that BEGIN, then the BEGIN...COMMIT/ROLLBACK block is a single db transaction.

We use BEGIN one place in our code base, in Database::doBegin() which is called by Database::begin(), see: includes/libs/rdbms/database/Database.php: $query = new Query( 'BEGIN', self::QUERY_CHANGE_TRX, 'BEGIN' );

We do not use START TRANSACTION anywhere in our code base.

Now that we understand what is a transaction and what is not, let's move on to how to group database statements together.

Atomic sections

An atomic section (MediaWiki term) is a group of statements, on one single db connection, that the developer wants to all take effect together or all be reverted (fail and not be applied) together.

How is this different from a transaction? Well, a transaction, you know when it is committed or rolled back because there is an actual COMMIT or ROLLBACK statement at the end. An "atomic section" however is a group of statements that may not have a COMMIT or ROLLBACK at the end; the COMMIT/ROLLBACK might come much later on after a bunch of other database statements on the connection.

Why would you want to do this? Because you might not have a choice. Let's look at this possibility:

A user makes an edit and goes to save it. You know you want the save of the text of the edit and the save of the revision, slot and content rows to all go in together, or all fail together. So you make this an "atomic section", calling the appropriate MW methods to wrap it all.

But at the same time, there is a hook called in the edit save code, onRevisionRecordInserted, which you use to emit an event about that save, once the save is complete. And maybe it records something about the issuance of that event someplace in the database. You may decide that if an event for downstream processors fails to be emitted, you want to revert the edit save too, just roll all of it back.

What do you do? Mark the event emission as an "atomic section" too? That's not enough, because if it fails, it can be rolled back but the edit save will be left in place. No, you need an atomic section that covers both the edit save and the event emission. So now you have nested sections, and it's pretty clear that the outermost one should control whether things get committed or rolled back in the end. Thus, your edit-save atomic section no longer gets committed when you thought, but at some later time.

In general, you may not know what other database-related things need to happen or be reverted depending on where your piece of code is called and in what context. So you should resign yourself to not knowing when a COMMIT/ROLLBACK is going to actually happen. Uncomfortable? Yes. Necessary? Also, yes.

The sole exceptions to this are maintenance scripts and jobrunners, where in the script's classes or the jobrunner's classes, you know what you want committed together or failed together and can enforce this. More on that below.

In your code

MediaWiki lets you mark atomic sections of code by startAtomic()...endAtomic() pairs, or by enclosing the code block in doAtomic().

Now that we know about atomic sections, let's zoom out and look at transactions across multiple databases.

Transaction rounds

Sharp-eyed readers will have noticed that transactions and atomic sections only apply to a single open connection to a single mariadb/mysql/postgres/whatever db server connection. But what happens when you have multiple database servers, not just a primary db for writes but perhaps a secondary db for writes of another sort, as WMF does in production?

We store all text blobs on the "extension1" database server cluster, which has its own primary and replica database servers. Some CheckUser database tables live in databases on extension1 (see https://phabricator.wikimedia.org/T368151). Some Echo Notification databases are stored on extension1, and so it goes.

What happens when your code does things on multiple databases and servers and connections?

In fact, the example from the previous section does do things on multiple databases and servers and connections. The save of an edit includes updates to a database on a server in one of the s1-s11 "section" hosts, storage of the text on a database served by a host in the extension1 cluster, and possble trigger of notifications to users (example: a new message on your talk page) uses databases on the extension1 cluster.

Now, if you want all of that to succeed or fail cleanly together, it's not a transaction, because it's across multiple databases. Instead we call this a "transaction round", a group of transactions that can be distributed across several db server connections, where we require (or at least really really desire) that the entire group of statements either all be committed or all be reverted together.

How to mark the start and end of a transaction round:

From our docs: "Every iteration of beginPrimaryChanges()/commitPrimaryChanges() is called a "transaction round"."

These are the calls, in other words, to start and end a round, such that every database statement in between should succeed or fail together.

Where may these calls be used? In maintenance scripts and jobrunners. Everywhere else, you cannot be sure what class is calling your method or instantiating your object and might have already started a round. And, straight from our docs: "similar to begin() and commit(), transaction rounds cannot be nested."

A new pair of calls is being introduced for use solely in maintenance scripts for this exact purpose, beginTransactionRound()" and commitTransactionRound(), to start and end transaction rounds. See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/1075065 for more.

Before we move on to more nuances around transaction rounds, let's take a step back and look more at transactions, explicit and implicit.

Explicit transactions

From MariaDB's knowledge base:

"Explicit transactions ... start with BEGIN or START TRANSACTION."

The point being, you have explicitly marked the start of a transaction in this manner. This is in contrast to implicit transactions, see below.

Implicit transactions

From MariaDB's knowledge base:

"An implicit transaction is started when no transaction has been started and a SQL statement is issued which modifies a table."

This applies both to connections with autocommit mode on, where the statement will commit right away if successful, and with it off, where the transaction will remain open (not committed, not rolled back) until either a COMMIT/ROLLBACK is issued, or a new (explicit) transaction is started.

For WMF production, implicit transactions are the default for db server clusters s1-s11 for web requests, but not for extension1 databases and not for cli mode (e.g. maintenance scripts).

With those definitions under our belt, we can move on to implicit and explicit transaction rounds.

Implicit transaction rounds

From our docs:

"Note that an implicit transaction round is considered to be active when no there is no explicit transaction round."

So yes, this can mean that there is no starting beginPrimaryChanges() call to mark an explicit transaction round, but there are database statements being issued, and so there is an implicit transaction round in progress.

If the DBO_TRX flag is set for any query activity in a transaction round, as typically is the case during web requests, then it is considered an implicit transaction round. Such rounds are committed by MediaWiki on shutdown via LBFactory::commitPrimaryChanges().

DBO_TRX is a MediaWiki-specific flag, which is intended for setting "transaction round mode", meaning that for every db statement, we automatically start a transaction before running a query, if no transaction is active. This means that autocommit mode (automatic commit after every db statement) must be OFF.

A reminder that DBO_TRX is configured on for s1-s11 database clusters and web requests but off for cli mode and the extension1 cluster. See (https://github.com/wikimedia/operations-mediawiki-config/blob/master/wmf-config/db-production.php ) for this, checking the serverTemplate and externalTemplateOverrides sections for the "flags" value.

In your code

What do you need to do in your code to use/enable implicit transaction rounds? If on clusters s1-s11, nothing special.

Just use doAtomic() or startAtomic()/endAtomic() as usual. If on cluster extension1, you can't use them with any effect, due to server configuration, as mentioned above.

Explicit transaction rounds

If a transaction round is started via LBFactory::beginPrimaryChanges() then it is called an explicit transaction round. Note that the new Maintenance script convenience method beginTransactionRound(), which is intended to be the new way to start an explicit transaction round, just calls beginPrimaryChanges().

So what does this method do, you may ask:

makes sure it's not been called inside another one of these (nested rounds means an error)
calls flushPrimarySnapshots() because there might have been implicit transactions earlier which must be wrapped up
claims the transaction round by setting the method name in a private property of the LBFactory;

(this is how later calls to this method can see if they own the round or if it's a nested call which needs to be rejected as an error)

tries to set the appropriate flags for all open connections in the LBFactory
commit(), but no following BEGIN, just "I own the transactions in this round, wherever you set them up in the code."

In your code

What do you need to do in your code to use explicit transaction rounds?

For maintenance scripts, you will use the new beginTransactionRound() and commitTransactionRound() at the start and end of your code path.

For job runners, you will use beginPrimaryChanges() and commitPrimaryChanges().

Related docs

Database_transactions
Manual:Database_access
IDatabase.php for documentation on the atomic methods
ILBFactory.phpfor documentation on the transaction round methods
Database doc from the MediaWiki docs

Defining transaction scope (MediaWiki term):

Actually, I propose that we just don't use this term or define it. Same for Outer transction scope, Unclear or inner transaction scope.