Extension:SphinxSearch

From mediawiki.org
MediaWiki extensions manual
SphinxSearch
Release status: unmaintained
Implementation: Search, Hook
Description: Replaces built-in MediaWiki search with Sphinx
Author(s): Svemir Brkic, Paul Grinberg
Latest version: 0.9.1 (2019-09-17)
Compatibility policy: Snapshots releases along with MediaWiki. Master is not backward compatible.
MediaWiki: 1.19 to 1.33
License: GNU General Public License 2.0
Download: README, Change Log
Example: New World Encyclopedia
Parameters: $wgSphinxSearch_host, $wgSphinxSearch_port, $wgSphinxSearch_index, $wgSphinxSearch_index_list, $wgSphinxSearch_index_weights, $wgSphinxSearch_mode, $wgSphinxSearch_sortmode, $wgSphinxSearch_sortby, $wgSphinxSuggestMode, $wgSphinxSearchAspellPath, $wgSphinxSearchPersonalDictionary, $wgSphinxSearch_maxmatches, $wgSphinxSearch_cutoff, $wgSphinxSearch_weights, $wgSphinxSearchMWHighlighter, $wgSearchType, $wgAdvancedSearchHighlighting, $wgEnableMWSuggest
Quarterly downloads: 9 (Ranked 120th)
Issues: Open tasks · Report a bug

One of the most common complaints MediaWiki-based site administrators receive is that the default search engine is far from excellent. The Sphinx Search Engine provides a full text search engine that is both flexible and fast. This extension incorporates the Sphinx engine into MediaWiki to provide a better alternative for searching.

Sphinx operates as a standalone server. It builds an index from an SQL query that retrieves documents from a database (for example, the MediaWiki MySQL database), stores the index, and later returns the rows that match a search.[1]

The extension also works with the Sphinx successor/fork Manticore Search.

Download

Two separate software components are necessary: first, the Sphinx Search Engine or Manticore Search (hereafter both called Sphinx), and second, the SphinxSearch extension (hereafter called the extension).

Installation Instructions

Instructions on how to install Sphinx on Windows or Linux are similar but for a more comprehensive view on Windows and Sphinx see:

If you are running SQLite instead of MySQL, you might have a look at

Step 1 - Install Sphinx

Follow the installation instructions. You do not need to do the "Quick Sphinx usage tour". Note: if installing on a Windows server, you do not need to compile anything; just download the Win32 release binaries.

A more detailed description about the Sphinx Search Engine installation process can be found in Sphinx Search Beginner's Guide.[2]

Step 2 - Configure Sphinx

Extract and copy the file sphinx.conf from the extensions/SphinxSearch directory into the Sphinx installation directory (we will refer to this file as /path/to/sphinx.conf hereafter). This directory should not be web-accessible, so do not use the extensions folder. Adjust all values to suit your setup:

  • If you are using PostgreSQL, you should copy sphinx.conf.postgres to /path/to/sphinx.conf instead.
  • Set correct database, username, and password for your MediaWiki database
  • Update table names in SQL queries if your MediaWiki installation uses a prefix (backslash line breaks may need to be removed if the indexer step below fails)
  • Update the file paths (/var/lib/sphinxsearch/data/..., /var/log/sphinxsearch/...) and create folders as necessary.
  • If your wiki is very large, you may want to consider specifying a query range in the conf file.
  • If your wiki is not in English, you will need to change (or remove) the morphology attribute.
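As a minimal sketch of the folder-creation item in the list above, the following shell function creates the data and log folders named there. The function name prepare_sphinx_dirs and its optional prefix argument are assumptions made for illustration; they are not part of Sphinx or of the extension.

```shell
# Hypothetical helper (not shipped with Sphinx or the extension).
# Creates the data and log folders mentioned in the list above.
# $1: optional prefix, handy for a dry run in a scratch directory;
#     leave it empty to create the folders under the real root.
prepare_sphinx_dirs() {
    prefix="${1:-}"
    mkdir -p "$prefix/var/lib/sphinxsearch/data" \
             "$prefix/var/log/sphinxsearch"
}
```

Run it as a user allowed to write under /var, and afterwards check that the account running indexer and searchd can write to both folders. If your sphinx.conf uses different paths, adjust the two folder names to match.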

Step 3 - Run Sphinx Indexer

Run the Sphinx indexer to prepare for searching:

/path/to/sphinx/installation/bin/indexer --config /path/to/sphinx.conf --all

Once again, make sure to replace the paths to match your installation. Indexing is usually fast, but the time it takes depends on the size of your wiki. Be patient and watch the screen for updates.

Step 4 - Start Sphinx Daemon

To speed up searching on the wiki, Sphinx must run in daemon mode. Add the following to whichever server startup script you have access to (e.g. /etc/rc.local):

/path/to/sphinx/installation/bin/searchd --config /path/to/sphinx.conf >> /var/log/sphinx/sphinx-startup.log 2>&1

Note: without the daemon running, searching will not work. That is why it is critical to make sure the daemon process is started every time the server is restarted.

For Windows see ...

Step 5 - Configure Incremental Updates

To keep the index for the search engine up to date, the indexer must be scheduled to run at a regular interval. If your wiki is small, it's best to comment out wiki_incremental in sphinx.conf and just run the indexer for wiki_main. The reason is that wiki_main and wiki_incremental are additive only. Words that have been removed since wiki_main was updated will still appear even after wiki_incremental is run.

On most UNIX systems edit your crontab file by running the command:

crontab -e

Add this line to set up a cron job for the full index - for example once every night:

 0 3 * * * /path/to/sphinx/installation/indexer --quiet --config /path/to/sphinx.conf wiki_main --rotate >/dev/null 2>&1; /path/to/sphinx/installation/indexer --quiet --config /path/to/sphinx.conf wiki_incremental --rotate >/dev/null 2>&1

Add this line to set up a more frequent cron job that updates the smaller index regularly:

0 9,15,21 * * * /path/to/sphinx/installation/indexer --quiet --config /path/to/sphinx.conf wiki_incremental --rotate >/dev/null 2>&1

As before, make sure to adjust the paths to suit your configuration. Note that the --rotate option is needed if the searchd daemon is already running, so that the indexer does not modify the index file while it is being used. It creates a new file and copies it over the existing one when it is done.

On Windows, commands like these inside a batch file should do the trick, provided you previously created the .CMD files running the indexer:

 at 23:00 /INTERACTIVE /every:M,T,W,TH,F,S,Su "%~dp0%__IndexMain__.cmd"
 at 08:00 /INTERACTIVE /every:M,T,W,TH,F,S,Su "%~dp0%__IndexIncr__.cmd"

Note that those tasks will only be manageable by the "at" command, and not through the control panel "Scheduled tasks" interface.

Also, adjust the SQL query for the src_wiki_incremental source in sphinx.conf to match the time in the crontab for wiki_main, keeping in mind that MediaWiki may be storing the times in UTC while the server that runs the cron job may be using a different time zone.
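To illustrate the time-zone point, here is a small sketch that converts a local cron time into a 14-digit UTC string (YYYYMMDDHHMMSS), the form in which MediaWiki commonly stores timestamps. It assumes GNU date (for the -d option). The function name mw_utc_timestamp is hypothetical, and the exact column and comparison used in your sphinx.conf query should be checked against your own file.

```shell
# Hypothetical helper; assumes GNU date.
# $1: a date and time in the server's local time zone, e.g. "2020-01-15 03:00"
# Prints the same instant as YYYYMMDDHHMMSS in UTC.
mw_utc_timestamp() {
    epoch=$(date -d "$1" +%s) || return 1   # local time -> seconds since the epoch
    date -u -d "@$epoch" +%Y%m%d%H%M%S      # epoch -> UTC, 14 digits
}
```

On a server that is five hours behind UTC, mw_utc_timestamp "2020-01-15 03:00" prints 20200115080000, so a cron job at 03:00 local time corresponds to hour 08 in UTC.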

Step 6 - Extension Preparation - SphinxSearch Folder

Create a 'SphinxSearch' folder, either by extracting a compressed file or by downloading via Git, and place it within the main MediaWiki 'extensions' folder.

Step 6.1 - Extension Preparation - Sphinx PHP API

If you downloaded a recent packaged version of the extension from this page, you probably already have a "vendor" folder in your extension subfolder, and you can skip the rest of this step.

If you fetched the latest version from Git, run Composer inside the extension folder to fetch sphinxapi.php. See Composer if you are not sure how to set up Composer.

 cd SphinxSearch
 composer install

If you are unable to run composer install, or you have an old version from before composer.json was added, you can also get sphinxapi.php from this GitHub repository and save it to the main SphinxSearch extension folder.

Step 6.2 - Extension Installation - PHP Module (optional)

There is also a Sphinx PHP extension (a PECL module) that can be used instead of the Sphinx library described in the previous step. One way to install it is:

pecl install sphinx

After the module is compiled, add it to your PHP configuration alongside other extensions, and restart your web server.

Step 7 - Extension Installation - Local Settings

In the file LocalSettings.php in the main MediaWiki directory (for more help, please see the LocalSettings.php manual), add the following lines:

$wgSearchType = 'SphinxMWSearch';
require_once "$IP/extensions/SphinxSearch/SphinxSearch.php";

Step 8 - Show Sphinx Search Support

If you want to let the general public know that you are using Sphinx as the back-end search engine, you can add the following lines to your SphinxSearch.php. The logo can be downloaded from [2] and copied into the folder .../extensions/SphinxSearch/skins/images/

$wgFooterIcons['poweredby']['sphinxsearch'] = array(
	'src' => "$wgScriptPath/extensions/SphinxSearch/skins/images/Powered_by_sphinx.png",
	'url' => 'http://www.mediawiki.org/wiki/Extension:SphinxSearch',
	'alt' => 'Search Powered by Sphinx',
);

Troubleshooting

What can I do when it doesn't seem to work? What should I check first? Is there a way to switch to some kind of debug mode?

For those and other questions, please consult the troubleshooting page, which is a collection of some of the more common issues that might happen during an installation.

Configuration

Options

For the most part, the extension's default options do not need any modification. However, if tweaking is needed, a number of options can be set in LocalSettings.php after the require_once line shown above:

  • $wgSphinxSearch_host - the hostname on which sphinx's searchd daemon is running (defaults to localhost)
  • $wgSphinxSearch_port - the port number on which sphinx's searchd daemon is running (defaults to 9312)
  • $wgSphinxSearch_mode - the Sphinx search mode. The default mode is the most intuitive. See Sphinx documentation for other valid options.
  • $wgSphinxSearch_matches - the number of search hits to display per result page.
  • $wgSphinxSearch_weights - the way Sphinx orders the results. The default is pretty good. See Sphinx documentation for other valid options.
  • $wgSphinxSearch_groupby, $wgSphinxSearch_groupsort - define how to group the results. See Sphinx documentation for other valid options.
  • $wgSphinxSearch_sortby - sets the sorting mode for matches (defaults to SPH_SORT_RELEVANCE). See Sphinx documentation for other valid options.

Search Box "As-You-Type" Suggestions

  • $wgEnableSphinxPrefixSearch - set to true to return suggestions from the Sphinx index by matching the query against the beginning of page titles.

Namespaces

A description on how to change the default namespaces can be found here.

Did You Mean

A misspelled search query can greatly impair the search results, and a user who has not noticed the misspelling may take a while to figure out why the results are poor. For that reason, this extension has optional "Did You Mean" support. When enabled, this feature suggests a correctly spelled search query if the user makes a spelling mistake. Also, since many wikis use their own jargon, the extension can optionally use a personalized dictionary to make the "Did You Mean" suggestions more reasonable. This is especially useful for users who are not familiar with a wiki's domain-specific language, reducing the time and frustration required to find the desired information.

This section is being updated. In the meantime, please see: Extension:SphinxSearch/Search suggestions

Stop Words

When modifying the sphinx.conf file (see #Step 2 - Configure Sphinx), there is an option for specifying a file containing search stop words. Stop words are common words such as 'a' and 'the' that appear frequently in text and should be excluded from searching. Somewhat complete lists of English stop words can be found at [3], [4] and [5]. Simply copy those words into a text file, and modify your sphinx.conf to point to that file with

stopwords = /path/to/stopwords.txt
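One possible way to build that text file from a rough word list is sketched below. The function name make_stopwords is hypothetical, and the sketch assumes a plain ASCII English list: it lowercases the words, puts one on each line, and removes duplicates.

```shell
# Hypothetical helper: read words on stdin, write a clean stop word list
# on stdout (lowercase, one word per line, sorted, no duplicates).
make_stopwords() {
    tr 'A-Z' 'a-z' |        # lowercase ASCII letters
    tr -s ' \t' '\n' |      # split on spaces and tabs, squeezing repeats
    sed '/^$/d' |           # drop empty lines
    LC_ALL=C sort -u        # sort and remove duplicates
}
```

For example, printf 'The a the An\n' | make_stopwords > /path/to/stopwords.txt writes the three words a, an and the, one per line.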

Sphinx Indexing Performance

Please have a look at How To Improve Sphinx Indexing Performance for more details.

Charsets for all languages

Copy the charset you need from here to the end of the definition of the charset_table in the sphinx.conf file. After doing so you need to run a full index and restart the service. See Sphinx forum or How to tell Sphinx that your document has CJK characters? for additional details.

Compatibility

Unsupported versions of MediaWiki are no longer listed.

MW | Sphinx engine | MW Sphinx Search | Status | Description
1.29.2 | 2.2.9-id64-release (rel22-r5006) | 0.9.0 | | PHP 7.0.28-0ubuntu0.16.04.1, Apache 2.4.18, MySQL 15.1 (MariaDB), Ubuntu 16.04. Note: Same warning as Felipe, managed to disable it with $wgSphinxSearch_mode = null; --Lord Aro (talk) 15:05, 19 April 2018 (UTC)
1.33.0 | 2.2.11-id64-release (95ae9a6) | 0.9.1 | | PHP 7.2.19, Lighttpd, MySQL (10.1.41-MariaDB), Ubuntu 18.04. --Svemir Brkic (talk) 20:46, 17 September 2019 (UTC)

As described above, the extension is also compatible with Manticore Search, the open source, free software fork/successor of Sphinx. It works with both the 2.x and 3.x branches of Manticore and uses the classic binary API via the Sphinx client library mentioned above.

The extension has been shown to work with the following languages. See also #Charsets for all languages above.

Language | Status | Description
English | Works | all versions - (Alpha3)
German | Works | W2k3 and IIS - (80.152.175.189)
German | Works | apache2 on Ubuntu 12.04 - (SmartK)
Chinese | Works | MW1.15 + XAMPP + SphinxSearch 0.7 (MarkYin)
Chinese | Works | Win2003 wamp 1.7.3 - (Alpha3)
Chinese | Works | RHEL 5.4 + Nginx + Mediawiki With HTTPS - (atyu30)
Chinese | Works | OpenBSD 4.5 - (atyu30)
Russian | Works | (XAMPP, Debian) - StasFomin.
Hebrew | Works | W2K3 and IIS - CrushKing.
Japanese | Works | MediaWiki 1.16.5, PHP 5.2.13 (apache2handler), MySQL 5.1.44-community, SphinxSearch (Version 0.7.2), Sphinx 1.10-beta (r2420), Windows Vista --MWJames 18:35, 3 June 2011 (UTC)

Comparisons

The following are some links for comparing different search engines:

Feature requests

Support

For general inquiries, consult the SphinxSearch talk page or the Troubleshooting page; for errors that appear in connection with the extension, file a bug report. Questions about the Sphinxsearch software, the Sphinxsearch API, or the Sphinxsearch indexer itself should be directed to the Sphinxsearch forum.

When reporting problems or issues, always include the Sphinxsearch software version, the MediaWiki version, and the extension version to help track down possible areas of impact.

Revisions

Revisions prior to version 0.8 can be downloaded from SourceForge.

  • v0.8 - September 7, 2011
    • Use of standard MW search interface
    • Support of individual indexed columns weight
    • Support of three different suggestion modes (enchant, soundex, aspell)
    • Still updating the documentation

See also

Wikis that use SphinxSearch

  • New World Encyclopedia is an excellent example of this extension in use.
  • Rhea/Assimi is a visual search engine using the SphinxSearch extension to MediaWiki.
  • despite-behavior.com is a French installation guide.
  • Mars Tekkom DK is a site using an older version of this extension, and it has good installation instructions.
  • IEEE Global History Network uses SphinxSearch to search documentation, analysis and explanation of the history of electrical, electronic, and computer technologies of its Global History Network wiki.

Notes
