Manual:searchindex table
â Manual:Contents | MediaWiki database layout | searchindex table |
The searchindex table is used to provide full text searches .
Before MySQL 5.6, full text indexes were only provided by the MyISAM table engine, but not by InnoDB.
The text
table however uses the InnoDB type to improve concurrency, so a copy of the current page text was required in order to make use of these indexes.
This is the reason why this table is there.
If using Postgres, this table does not exist: the full text information is stored directly in the page and pagecontent tables as the titlevector and textvector columns.
This table is populated when edits are saved on pages, unless $wgDisableSearchUpdate
is set to true
, or an extension that implements its own search index (like CirrusSearch extension) is installed.
The rebuildtextindex.php maintenance script can be used to populate it from scratch, or updateSearchIndex.php to update it for recently changed pages.
Fields
[edit]si_page
[edit]Key to page_id .
si_title
[edit]Munged version of page title.
si_text
[edit]Munged version of the current text of the page.
Details
[edit]Indexes for this, using en wikipedia for cardinality:
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | +-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+ | searchindex | 0 | si_page | 1 | si_page | A | 797275 | NULL | NULL | | BTREE | | | searchindex | 1 | si_title | 1 | si_title | A | 265758 | NULL | NULL | | FULLTEXT | | +-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
A common search query is of the form:
SELECT page_id, page_namespace, page_title FROM `page`,`searchindex`
WHERE page_id=si_page
AND MATCH(si_text) AGAINST('+''searchterm''' IN BOOLEAN MODE)
AND page_is_redirect=0
AND page_namespace IN (0)
LIMIT 20
In MediaWiki version 1.4 and earlier:
SELECT cur_id,cur_namespace,cur_title,cur_text FROM cur,searchindex
WHERE cur_id=si_page
AND (
MATCH(si_text) AGAINST('+''searchterm''' IN BOOLEAN MODE)
AND cur_is_redirect=0
)
AND cur_namespace IN (0,9,11)
LIMIT 0, 20;
Explain shows the following:
+-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+ | table | type | possible_keys | key | key_len | ref | rows | Extra | +-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+ | cur | range | cur_id,cur_namespace,name_title_timestamp | cur_namespace | 1 | NULL | 317499 | Using where | | searchindex | eq_ref | si_page | si_page | 4 | cur.cur_id | 1 | Using where | +-------------+--------+-------------------------------------------+---------------+---------+------------+--------+-------------+
Known problems
[edit]A range scan with namespaces containing a large number of records requires that a large portion of the searchindex
table be loaded into RAM.
This is very costly, in part because the searchindex
table contains the full text of the article.
A long term plan is to remove the text from the searchindex
table.
Temporary measures may involve adding fields to searchindex
and indexing them (and using a self join to allow both that index and the fulltext index to be used) or adding a searchindex
index to decrease the number of complete searchindex
records which must be read to match the query.
Schema summary
[edit]MediaWiki version: | ≥ 1.43 |
DESCRIBE searchindex;
+----------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+------------------+------+-----+---------+-------+ | si_page | int(10) unsigned | NO | PRI | NULL | | | si_title | mediumtext | NO | MUL | NULL | | | si_text | mediumtext | NO | MUL | NULL | | +----------+------------------+------+-----+---------+-------+
MediaWiki versions: | 1.10 – 1.42 |
DESCRIBE searchindex;
+----------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+------------------+------+-----+---------+-------+ | si_page | int(10) unsigned | NO | PRI | NULL | | | si_title | varchar(255) | NO | MUL | | | | si_text | mediumtext | NO | MUL | NULL | | +----------+------------------+------+-----+---------+-------+
MediaWiki versions: | 1.1 – 1.9 |
DESCRIBE searchindex;
+----------+-----------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+-----------------+------+-----+---------+-------+ | si_page | int(8) unsigned | NO | PRI | NULL | | | si_title | varchar(255) | NO | MUL | NULL | | | si_text | mediumtext | NO | MUL | NULL | | +----------+-----------------+------+-----+---------+-------+
Indexes
[edit]MediaWiki version: | ≥ 1.43 |
SHOW INDEX IN searchindex;
+-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | searchindex | 0 | PRIMARY | 1 | si_page | A | 0 | NULL | NULL | | BTREE | | | | searchindex | 1 | si_title | 1 | si_title | NULL | NULL | NULL | NULL | | FULLTEXT | | | | searchindex | 1 | si_text | 1 | si_text | NULL | NULL | NULL | NULL | | FULLTEXT | | | +-------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+