Extension:CirrusSearch
CirrusSearch | Release status: stable |
---|---|
Implementation | Search, API, Hook |
Description | Implements searching for MediaWiki using Elasticsearch |
Author(s) | Nik Everett, Chad Horohoe, Erik Bernhardson |
Latest version | Continuous updates |
Compatibility policy | Snapshots are released along with MediaWiki. The main development branch is not backward compatible. |
Composer | mediawiki/cirrussearch |
License | GNU General Public License 2.0 or later |
Download | README |
Quarterly downloads | 263 (ranked 19th) |
Public wikis using the extension | 1,226 (ranked 212th) |
Translate the CirrusSearch extension if it is available at translatewiki.net | |
Vagrant role | cirrussearch |
Issues | Open tasks · Report a bug |
The CirrusSearch extension implements searching for MediaWiki using Elasticsearch.
CirrusSearch will be migrated to use OpenSearch as its backend. Please see Wikimedia Search Platform/Decision Records/Search backend replacement technology for more information.
Elasticsearch is standalone third-party software that you must install as a requirement for this extension. It is a database system that provides search and indexing functionality: the current text of your wiki pages is indexed for faster and improved search results. MediaWiki and Elasticsearch communicate through web services.
See also the help page on using this extension.
Goals
- No native dependencies that would make this difficult to install.
The only dependencies are pure-PHP, MediaWiki extensions, and Elasticsearch itself.
- Provide a near-real-time search index for wiki pages that's extendable by other MediaWiki extensions.
- Provide all of the query options MWSearch has given users, and more.
Dependencies
- PHP and cURL
In addition to the standard MediaWiki requirements for PHP, CirrusSearch requires PHP to be compiled with cURL support.
- Elasticsearch
You must install Elasticsearch.
Elasticsearch versions differ in how their web services work, which causes compatibility problems, so you must install the Elasticsearch version that is compatible with the version of MediaWiki you are using.
Elasticsearch versions before 6.8 are incompatible with PHP 8+.
Note that a Java installation such as OpenJDK is also needed. It's best to use the official Elasticsearch Docker image or a self-hosted version. A managed product like Amazon OpenSearch (formerly Amazon Elasticsearch) can work but may require additional configuration depending on its specifics. For example, Amazon OpenSearch only listens for Elasticsearch API requests over HTTPS on port 443 (i.e., it does not expose the default Elasticsearch port 9200), so a TLS-enabled proxy (e.g., Nginx) can enable CirrusSearch to communicate with an Amazon OpenSearch cluster.
- Elastica
Elastica is a PHP library that lets CirrusSearch talk to Elasticsearch. Install Elastica per the instructions below.
- Other
- Because of the way the CirrusSearch extension handles jobs, it is advisable to set up the job queue in Redis to prevent messages like Notice: unserialize(): Error at offset 64870 of 65535 bytes in JobQueueDB.php and subsequent errors like Unsupported operand types.
See task T157759.
Installation
Even though the instructions below tell you to run Composer only when installing from Git, it may be necessary to run it anyway to install all PHP dependencies.
Elastica
- Download and place the file(s) in a directory called Elastica in your extensions/ folder.
Developers and code contributors should install the extension from Git instead, using:
cd extensions/
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica
- Only when installing from Git, run Composer to install PHP dependencies by issuing composer install --no-dev in the extension directory. (See task T173141 for potential complications.)
- Add the following code at the bottom of your LocalSettings.php file:
wfLoadExtension( 'Elastica' );
- Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.
CirrusSearch
- Download and place the file(s) in a directory called CirrusSearch in your extensions/ folder.
Developers and code contributors should install the extension from Git instead, using:
cd extensions/
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch
- Only when installing from Git, run Composer to install PHP dependencies by issuing composer install --no-dev in the extension directory. (See task T173141 for potential complications.)
- Add the following code at the bottom of your LocalSettings.php file:
wfLoadExtension( 'CirrusSearch' );
- Now follow the setup instructions in the CirrusSearch README delivered with your extension, i.e. $IP/extensions/CirrusSearch/README. Note that not all of the information in it may apply to your version of the extension, especially the supported version of Elasticsearch.
- Configure as required.
- Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.
Enable regex queries
This is an optional step. You will need to install the search-extra plugin for this. Do so by following these steps:
- Execute the following command:
/usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:7.10.2-wmf12
- Add the following line to your LocalSettings.php file:
$wgCirrusSearchWikimediaExtraPlugin[ 'regex' ] = [ 'build', 'use', 'max_inspect' => 10000 ];
- Restart Elasticsearch with the following command:
systemctl restart elasticsearch
- Recreate the search index by executing the following commands:
php path/to/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
php path/to/extensions/CirrusSearch/maintenance/ForceSearchIndex.php
Upgrading
Please follow the upgrade instructions in the CirrusSearch UPGRADE file.
Configuration
The configuration parameters of CirrusSearch are documented in the "settings.txt" file. See also the documentation on CirrusSearch configuration profiles.
The $wgCirrusSearchIndexBaseName configuration parameter is one that needs to be set, e.g.:
$wgCirrusSearchIndexBaseName = 'mywikidatabasename';
Hooks
The CirrusSearch extension defines a number of hooks that other extensions can use to extend the core schema and modify documents. The following hooks are available:
- CirrusSearchAnalysisConfig - allows hooking into the analysis configuration
- CirrusSearchMappingConfig - allows configuring the mapping of fields
- CirrusSearchBuildDocumentParse - allows extensions to modify the Elasticsearch document produced from a page
- CirrusSearchBuildDocumentLinks - allows extensions to process incoming and outgoing links for the document
- CirrusSearchBuildDocumentFinishBatch - called when a batch of pages has been indexed
- CirrusSearchAddQueryFeatures - allows extensions to add query parser features
- CirrusSearchScoreBuilder - allows extensions to define rescore builder functions
- CirrusSearchProfileService - allows extensions to declare various search components and configuration
API
CirrusSearch features can be used in API queries.
Searching happens via the normal search API, action=query&list=search; you can use CirrusSearch-specific features, such as the morelike: special prefix. For example, to find pages related to Marie Curie and radium:
api.php?action=query&list=search&srsearch=morelike:Marie_Curie%7Cradium&srlimit=10&srprop=size&formatversion=2
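The same request can be assembled programmatically. The following is a minimal Python sketch, not part of CirrusSearch: the function name `build_morelike_url` and its parameters are hypothetical, and the sketch only builds the URL (it does not send the request).

```python
from urllib.parse import urlencode


def build_morelike_url(api_base, titles, limit=10):
    """Build a search API URL that uses the CirrusSearch morelike: prefix.

    api_base is the location of the wiki's api.php (an assumption of this
    sketch); titles are the page titles to find related pages for.
    """
    params = {
        "action": "query",
        "list": "search",
        # morelike: takes page titles separated by "|"
        "srsearch": "morelike:" + "|".join(titles),
        "srlimit": limit,
        "srprop": "size",
        "formatversion": 2,
    }
    # safe=":" keeps the prefix colon readable; "|" is percent-encoded as %7C
    return api_base + "?" + urlencode(params, safe=":")


print(build_morelike_url("api.php", ["Marie_Curie", "radium"]))
```

With the inputs shown, the printed URL is the same as the example request above.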
Custom APIs and parameters are provided for querying CirrusSearch configuration and debug information:
- action=cirrusdump module: 2014?action=cirrusdump
- cirrusDumpQuery parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpQuery
- cirrusDumpResult parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult
- An additional parameter, cirrusExplain, can be passed with cirrusDumpResult to have the Lucene explanation of the score included with the result dump: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain
It can also be used to get the explanation in a human-readable format, by giving it one of the values verbose, pretty or hot, such as: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain=pretty
- cirrus-config-dump, cirrus-settings-dump, cirrus-mapping-dump and cirrus-profiles-dump modules to obtain a dump of the CirrusSearch setup: api.php?action=cirrus-config-dump&formatversion=2
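The debug URLs listed above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is an illustration only: the helper names `dump_module_url` and `dump_result_url` are hypothetical, while the module names, the cirrusDumpResult and cirrusExplain parameters, and the verbose, pretty and hot values are the ones listed above.

```python
from urllib.parse import quote, urlencode

# Module names and cirrusExplain values as listed in the text.
DUMP_MODULES = (
    "cirrus-config-dump",
    "cirrus-settings-dump",
    "cirrus-mapping-dump",
    "cirrus-profiles-dump",
)
EXPLAIN_FORMATS = ("verbose", "pretty", "hot")


def dump_module_url(api_base, module):
    """Build an api.php URL for one of the dump modules."""
    if module not in DUMP_MODULES:
        raise ValueError(f"unknown dump module: {module}")
    return api_base + "?" + urlencode({"action": module, "formatversion": 2})


def dump_result_url(wiki_base, query, explain=None):
    """Build a Special:Search URL that dumps the search result.

    explain may be None or False (no explanation), True (bare
    cirrusExplain flag) or one of EXPLAIN_FORMATS.
    """
    url = f"{wiki_base}/Special:Search/{quote(query)}?cirrusDumpResult"
    if not explain:
        return url
    if explain is True:
        return url + "&cirrusExplain"
    if explain not in EXPLAIN_FORMATS:
        raise ValueError(f"unknown cirrusExplain value: {explain}")
    return url + "&cirrusExplain=" + explain


print(dump_module_url("api.php", "cirrus-config-dump"))
print(dump_result_url("https://en.wikipedia.org/wiki", "cat dog chicken", "pretty"))
```

Validating the names up front is a design choice of this sketch; it catches typos before a request is sent to the wiki.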
See also
- General links
- Usage help page - CirrusSearch usage documentation (needed after the install)
- Project page
- Info about Wikimedia Cirrus/Elastic setup
- Configuration help page - sets of tunable parameters that influence various aspects of the indexing
- Extension:WikiSearch - provides a faceted search API for Semantic MediaWiki using Elasticsearch.
- Extension:AdvancedSearch - Enhances Special:Search by providing advanced parameters
- Debugging
Local development
The Elasticsearch service can be run with the Vagrant role (cirrussearch) and MediaWiki Vagrant.
For Docker, you can use a command like:
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.2
Then follow the installation and configuration directions.
If your web host runs in a container, make sure the Elasticsearch container above is on the same network, and reference elasticsearch as the hostname in your LocalSettings.php file.
This setup will not have the WMF plugins but can be sufficient for basic testing.
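Before running the CirrusSearch setup steps against a local container, it can help to confirm that Elasticsearch answers and to check which version it reports. The following Python sketch is an illustration under assumptions: it assumes the service is reachable at http://localhost:9200 (the port published by the Docker command above) and relies on the JSON document that Elasticsearch serves at its root URL, which contains a version.number field. The function names are hypothetical. The PHP 8 check only encodes the statement from the Dependencies section that versions before 6.8 are incompatible with PHP 8+; it says nothing about compatibility with a given MediaWiki release.

```python
import json
from urllib.request import urlopen


def fetch_root(base_url="http://localhost:9200", timeout=5):
    """Return the JSON document Elasticsearch serves at its root URL."""
    with urlopen(base_url, timeout=timeout) as response:
        return json.load(response)


def parse_version(root_doc):
    """Turn the reported version string, e.g. "6.8.2", into a tuple of ints."""
    number = root_doc["version"]["number"]
    # Drop any suffix such as "-SNAPSHOT" before splitting on dots.
    core = number.split("-")[0]
    return tuple(int(part) for part in core.split(".")[:3])


def ruled_out_for_php8(version):
    """True if the version is older than 6.8, which is incompatible with PHP 8+."""
    return version[:2] < (6, 8)
```

Against a running container, `parse_version(fetch_root())` returns the version tuple, which can then be passed to `ruled_out_for_php8`.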
This extension is being used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it is installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.
This extension is included in the following wiki farms/hosts and/or packages: Canasta, Miraheze, MyWikis, semantic::core, wiki.gg, WikiForge. This is not an authoritative list. Some wiki farms/hosts and/or packages may contain this extension even if they are not listed here. Always check with your wiki farm/host or package to confirm.