User:ASarabadani (WMF)/Database for developers toolkit/Concepts/Glossary

From mediawiki.org

Wikimedia-specific

Diagram of sections and how they get read/write
  • Cluster: A group of database hosts in which one primary[1] replicates data to the rest. There are several types of clusters, including "core sections" (s1, s2, ...), "parsercache" (pc1, pc2, ...), "external storage" (es1, es2, ...), "misc" (m1, m2, ...) and "extension" (x1, x2).
  • Section: A type of cluster that contains the main MediaWiki databases. It is sometimes incorrectly called a "shard". "s1" has English Wikipedia, "s2" has several large wikis, "s3" has most of the small wikis, "s4" has Wikimedia Commons, "s5" has German Wikipedia and several small wikis, "s6" has French, Russian and Japanese Wikipedia, "s7" has another set of large wikis (plus the centralauth database) and "s8" has Wikidata.
  • Group: A subset of replicas in a cluster that can additionally receive specific queries. For example, the "api" group gets the queries that the code explicitly directs to it. One useful group is "vslow", which isolates slow queries so that they cannot become a DDoS attack vector (you need to set this group manually in the code for each such query). Currently the only groups in use are "api", "vslow" and "dumps". The rest are deprecated.
  • Abstract schema: RDBMS-agnostic database schemas in MediaWiki. See the related RFC and the help page.
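
The relationship between sections, primaries, replicas and groups can be sketched in a few lines of Python. This is a minimal illustration, not how MediaWiki actually routes queries: the database names used as keys, the host names, the fallback to "s3" for unlisted wikis, and the fallback to all replicas when a group has no dedicated host are all assumptions made for the example.

```python
# Section layout taken from the glossary; only wikis named there are listed.
# The database names used as keys are assumptions.
SECTION_BY_WIKI = {
    "enwiki": "s1",        # English Wikipedia
    "commonswiki": "s4",   # Wikimedia Commons
    "dewiki": "s5",        # German Wikipedia
    "frwiki": "s6",
    "ruwiki": "s6",
    "jawiki": "s6",
    "centralauth": "s7",
    "wikidatawiki": "s8",  # Wikidata
}

# Groups still in use according to the glossary.
ACTIVE_GROUPS = {"api", "vslow", "dumps"}

# Hypothetical host inventory for a single section.
HOSTS = {
    "s1": {
        "primary": "db-primary-s1",
        "replicas": ["db-replica-s1-a", "db-replica-s1-b", "db-replica-s1-c"],
        "groups": {"api": ["db-replica-s1-a"], "vslow": ["db-replica-s1-c"]},
    }
}


def section_for(wiki, default="s3"):
    """Return the section of a wiki; unlisted wikis fall back to `default` (assumption)."""
    return SECTION_BY_WIKI.get(wiki, default)


def pick_hosts(section, write=False, group=None):
    """Return candidate hosts: the primary for writes, replicas for reads."""
    cluster = HOSTS[section]
    if write:
        return [cluster["primary"]]
    if group is not None:
        if group not in ACTIVE_GROUPS:
            raise ValueError(f"group {group!r} is deprecated or unknown")
        # Assumption: with no dedicated host for the group, use any replica.
        return cluster["groups"].get(group) or cluster["replicas"]
    return cluster["replicas"]


if __name__ == "__main__":
    section = section_for("enwiki")
    print(section, pick_hosts(section, write=True))
    print(section, pick_hosts(section, group="vslow"))
```

A slow query tagged with the "vslow" group only ever reaches the hosts reserved for that group, which is what keeps it away from the replicas serving regular traffic.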

Databases in general

  • Primary database or "master": The source of truth. It's the database that should mostly get writes and not reads. It replicates to replicas.
  • Replica or "slave": This is the database that's used for reading.
  • Replication lag: The delay between a change being written on the primary and that change appearing on a replica. Usually it should be below 1 second.
  • Normalization: Mostly means avoiding repeated strings in storage. For example, storing a user ID in revision data instead of repeating the username.
  • Schema: Current database layout of MediaWiki.
  • Schema change: An atomic part of a schema migration that is added in a single commit. For example "Adding table foo", "Dropping column bar from table baz" and so on.
  • Database management system (DBMS): The underlying technology handling the MediaWiki database. The ones supported in MediaWiki core are MySQL, SQLite and PostgreSQL. Extensions can add support for more.
  • Data definition language (DDL): The syntax that defines schemas and schema changes (it can differ between DBMSes), for example "ALTER TABLE" or "DROP COLUMN". DDL statements are saved as ".sql" files.
  • Database Abstraction Layer (DBAL): The bridge between DBMS-independent database schema and schema change definitions and the actual DDLs.
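
To make the relationship between an abstract schema, a DBAL and DDL more concrete, here is a minimal Python sketch. It is not MediaWiki's actual abstraction layer: the shape of the table definition, the `TYPE_MAP` contents and the `to_ddl` function are all hypothetical, chosen only to show one DBMS-independent definition being turned into DBMS-specific DDL.

```python
import sqlite3

# Hypothetical DBMS-agnostic table definition (the "abstract schema").
ABSTRACT_TABLE = {
    "name": "foo",
    "columns": [
        {"name": "foo_id", "type": "integer", "notnull": True},
        {"name": "foo_name", "type": "string", "length": 255, "notnull": True},
    ],
    "pk": ["foo_id"],
}

# Illustrative type mapping per DBMS; real mappings are more involved.
TYPE_MAP = {
    "mysql": {"integer": "INT", "string": "VARCHAR({length})"},
    "sqlite": {"integer": "INTEGER", "string": "TEXT"},
    "postgres": {"integer": "INT", "string": "VARCHAR({length})"},
}


def to_ddl(table, dbms):
    """Render a CREATE TABLE statement for one DBMS from the abstract definition."""
    types = TYPE_MAP[dbms]
    parts = []
    for col in table["columns"]:
        sql_type = types[col["type"]].format(length=col.get("length", 255))
        null = " NOT NULL" if col.get("notnull") else ""
        parts.append(f'{col["name"]} {sql_type}{null}')
    parts.append("PRIMARY KEY (" + ", ".join(table["pk"]) + ")")
    return f'CREATE TABLE {table["name"]} (' + ", ".join(parts) + ")"


if __name__ == "__main__":
    for dbms in TYPE_MAP:
        print(f"{dbms}: {to_ddl(ABSTRACT_TABLE, dbms)}")

    # The SQLite variant can be checked directly against an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute(to_ddl(ABSTRACT_TABLE, "sqlite"))
    conn.close()
```

The point of the design is that a schema change is written once against the abstract definition, and the per-DBMS ".sql" output is generated rather than maintained by hand.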

Footnotes

  1. With the exception of x2, which uses a multi-primary setup.