Extension:CirrusSearch
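CirrusSearch | Release status: stable |
---|---|
Implementation | Search, API, Hook |
Description | Implements searching for MediaWiki using Elasticsearch |
Author(s) | Nik Everett, Chad Horohoe, Erik Bernhardson |
Latest version | Continuous updates |
Compatibility policy | Snapshots are released along with MediaWiki. The master branch is not backward compatible. |
MediaWiki | >= 1.43 |
Composer | mediawiki/cirrussearch |
License | GNU General Public License 2.0 or later |
Download | README |
Quarterly downloads | 282 (Ranked 18th) |
Public wikis using | 1,226 (Ranked 212th) |
Translate the CirrusSearch extension at translatewiki.net | |
Vagrant role | cirrussearch |
Issues | Open tasks · Report a bug |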
The CirrusSearch extension implements searching for MediaWiki using Elasticsearch.
CirrusSearch has been slated for migration to use OpenSearch as its backend, but this decision is being reviewed based on a late August 2024 blog post from the upstream provider of the current search backend concerning its licensing. Please see Wikimedia Search Platform/Decision Records/Search backend replacement technology for more information.
Elasticsearch is standalone third-party software that you must install as a requirement for this extension. It is a database system that provides search and indexing functionality: the current text of your wiki pages is indexed in it for faster and better search results. MediaWiki communicates with Elasticsearch through web services.
See also the help page on using this extension.
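Goals
- Remove the native dependencies that make the extension difficult to install
  - The only dependencies are pure PHP MediaWiki extensions and Elasticsearch itself
- Provide a near-real-time search index for wiki pages that other MediaWiki extensions can extend
- Provide all the query options that MWSearch offers to users, and more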
Dependencies
- PHP and cURL
  - In addition to MediaWiki's standard PHP requirements, CirrusSearch requires PHP to be compiled with cURL support.
- Elasticsearch
  - You need to install Elasticsearch.
Each Elasticsearch release changes how its web services work, which causes compatibility problems. You must install the Elasticsearch version that is compatible with the version of MediaWiki you are currently using.
Elasticsearch versions before 6.8 are incompatible with PHP 8+.
Note that a Java installation such as OpenJDK is also required. Preferably, use the official Elasticsearch Docker image or a self-hosted installation. A managed product like Amazon OpenSearch (formerly Amazon Elasticsearch) can work but may require additional configuration depending on its specifics. For example, Amazon OpenSearch only listens for Elasticsearch API requests over HTTPS on port 443 (i.e., it does not expose the default Elasticsearch port 9200), so a TLS-enabled proxy (e.g., Nginx) can enable CirrusSearch to communicate with an Amazon OpenSearch cluster.
- Elastica is a PHP library for talking to Elasticsearch. Install Elastica by following the instructions below.
- Other
  - Because the CirrusSearch extension does its work through jobs, it is recommended to set up the job queue in Redis to prevent messages such as "Notice: unserialize(): Error at offset 64870 of 65535 bytes in JobQueueDB.php" and subsequent errors such as "Unsupported operand types". See task T157759.
Installation
Although the instructions below tell you to run Composer only when installing from Git, you may need to run it anyway to install all PHP dependencies.
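- Download the files and move the extracted Elastica folder to your extensions/ directory.
  Developers and code contributors should install the extension from Git instead, using:
  cd extensions/
  git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica
- Only when installing from Git, run Composer to install the PHP dependencies by issuing the following in the extension directory. (See task T173141 for potential complications.)
  composer install --no-dev
- Add the following code at the bottom of your LocalSettings.php:
  wfLoadExtension( 'Elastica' );
- Done. Navigate to Special:Version on your wiki to verify that the extension is successfully installed.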
CirrusSearch
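- Download the files and move the extracted CirrusSearch folder to your extensions/ directory.
  Developers and code contributors should install the extension from Git instead, using:
  cd extensions/
  git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch
- Only when installing from Git, run Composer to install the PHP dependencies by issuing the following in the extension directory. (See task T173141 for potential complications.)
  composer install --no-dev
- Add the following code at the bottom of your LocalSettings.php:
  wfLoadExtension( 'CirrusSearch' );
- Now follow the setup instructions in the CirrusSearch README delivered with your extension, i.e.
  $IP/extensions/CirrusSearch/README
  Note that not all information in it may apply to your version of the extension, especially the supported version of Elasticsearch.
- Configure as required (see Configuration below).
- Done. Navigate to Special:Version on your wiki to verify that the extension is successfully installed.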
Enable regex queries
This is an optional step. You will need to install the search-extra plugin for this. Do so by following these steps:
- execute the following command:
/usr/share/elasticsearch/bin/elasticsearch-plugin install org.wikimedia.search:extra:7.10.2-wmf12
- add the following line to your LocalSettings.php file:
$wgCirrusSearchWikimediaExtraPlugin[ 'regex' ] = [ 'build', 'use', 'max_inspect' => 10000 ];
- restart Elasticsearch with the following command:
systemctl restart elasticsearch
- recreate the search index by executing the following commands:
php path/to/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver
php path/to/extensions/CirrusSearch/maintenance/ForceSearchIndex.php
Upgrading
Follow the upgrade instructions in the CirrusSearch UPGRADE file.
Configuration
The configuration parameters of CirrusSearch are documented in the "settings.txt" file. See also the documentation on CirrusSearch configuration profiles.
The $wgCirrusSearchIndexBaseName configuration parameter needs to be set, e.g.:
$wgCirrusSearchIndexBaseName = 'mywikidatabasename';
Hooks
The CirrusSearch extension defines a number of hooks that other extensions can use to extend the core schema and modify documents. The following hooks are available:
- CirrusSearchAnalysisConfig - allows extensions to hook into the analysis configuration
- CirrusSearchMappingConfig - allows configuration of the mapping of fields
- CirrusSearchBuildDocumentParse - allows extensions to modify the Elasticsearch document produced from a page
- CirrusSearchBuildDocumentLinks - allows extensions to process incoming and outgoing links for the document
- CirrusSearchBuildDocumentFinishBatch - called when a batch of pages has been indexed
- CirrusSearchAddQueryFeatures - allows extensions to add query parser features
- CirrusSearchScoreBuilder - allows extensions to define rescore builder functions
- CirrusSearchProfileService - allows extensions to declare various search components and configuration
API
CirrusSearch features can be used in API queries.
Searching happens via the normal search API, action=query&list=search; you can use CirrusSearch-specific features, such as the morelike: special prefix, for example to find pages related to Marie Curie and radium:
api.php?action=query&list=search&srsearch=morelike:Marie_Curie%7Cradium&srlimit=10&srprop=size&formatversion=2
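As a rough illustration, the request above could be assembled programmatically. The sketch below uses only Python's standard library; the function name build_search_url and its defaults are hypothetical and chosen for this example, not part of CirrusSearch or the MediaWiki API.

```python
from urllib.parse import urlencode


def build_search_url(terms, base="api.php", limit=10):
    """Build a list=search request URL that uses the morelike: prefix.

    `terms` are page titles, joined with "|" as in the example URL above.
    The function name and defaults are illustrative assumptions.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": "morelike:" + "|".join(terms),
        "srlimit": limit,
        "srprop": "size",
        "formatversion": 2,
    }
    # Keep ":" literal so the result matches the URL form shown above;
    # "|" is percent-encoded as %7C.
    return base + "?" + urlencode(params, safe=":")
```

Calling build_search_url(["Marie_Curie", "radium"]) reproduces the example request; prefix the result with your wiki's script path before sending it.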
Custom APIs and parameters are provided for querying CirrusSearch configuration and debug information:
- action=cirrusdump module: 2014?action=cirrusdump
- cirrusDumpQuery parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpQuery
- cirrusDumpResult parameter to Special:Search or search API queries: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult
  - An additional parameter, cirrusExplain, can be passed with cirrusDumpResult to have the Lucene explanation of the score included with the result dump: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain It can also be used to get the explanation in a human-readable format, by giving it one of the values verbose, pretty or hot, such as: https://en.wikipedia.org/wiki/Special:Search/cat%20dog%20chicken?cirrusDumpResult&cirrusExplain=pretty
- cirrus-config-dump, cirrus-settings-dump, cirrus-mapping-dump, cirrus-profiles-dump modules to obtain dumps of the CirrusSearch setup: api.php?action=cirrus-config-dump&formatversion=2
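The debug parameters are plain query-string flags, so the example links can be generated rather than typed by hand. A minimal sketch in Python follows; the helper name build_debug_search_url is hypothetical, and the default base URL is simply the one used in the examples on this page.

```python
from urllib.parse import quote


def build_debug_search_url(query, dump="cirrusDumpResult", explain=None,
                           base="https://en.wikipedia.org/wiki/Special:Search/"):
    """Append a CirrusSearch debug flag to a Special:Search URL.

    `dump` is cirrusDumpQuery or cirrusDumpResult. `explain` may be True
    for a bare cirrusExplain flag, or one of "verbose", "pretty", "hot"
    for a human-readable explanation. The helper itself is an assumption
    made for illustration.
    """
    url = base + quote(query) + "?" + dump
    if explain is True:
        url += "&cirrusExplain"
    elif explain:
        url += "&cirrusExplain=" + quote(explain)
    return url
```

For instance, build_debug_search_url("cat dog chicken", explain="pretty") yields the last Special:Search link shown above.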
See also
- General links
- Usage help page - CirrusSearch usage documentation (needed after the install)
- Project page
- Info about Wikimedia Cirrus/Elastic setup
- Configuration help page - sets of tunable parameters that influence various aspects of the indexing
- Extension:WikiSearch - provides faceted search API for Semantic MediaWiki using ElasticSearch.
- Extension:AdvancedSearch - Enhances Special:Search by providing advanced parameters
- Debugging
Local development
The Elasticsearch service can be run with the Vagrant role (cirrussearch) and MediaWiki Vagrant.
For Docker, you can use a command like:
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:6.8.2
Then follow the installation and configuration directions.
If your web host is in a container, make sure the above container is on the same network, and reference elasticsearch as the hostname in the LocalSettings.php file.
This will not have the WMF plugins but can be sufficient for basic testing.
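Before pointing CirrusSearch at the container, it can help to confirm which Elasticsearch version is answering on port 9200. The following is a minimal sketch in Python, assuming the node's root endpoint returns JSON containing a version.number field and that the service is reachable at http://elasticsearch:9200/ (the hostname used in the Docker command; use localhost when checking from the host machine). All function names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_es_version(body):
    """Extract (major, minor) from the JSON returned by the root endpoint."""
    number = json.loads(body)["version"]["number"]  # e.g. "6.8.2"
    major, minor = number.split(".")[:2]
    return int(major), int(minor)


def meets_php8_minimum(version):
    """Dependencies section: versions before 6.8 are incompatible with PHP 8+."""
    return version >= (6, 8)


def fetch_es_version(url="http://elasticsearch:9200/"):
    """Query a running node; the default URL is an assumption for the Docker setup."""
    with urlopen(url, timeout=5) as response:
        return parse_es_version(response.read().decode("utf-8"))
```

With the container running, meets_php8_minimum(fetch_es_version()) gives a quick check against the PHP 8+ note in the Dependencies section. It does not replace checking which Elasticsearch version your MediaWiki release supports.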
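This extension is used on one or more Wikimedia projects. This probably means that the extension is stable and works well enough to be used by such high-traffic websites. Look for this extension's name in Wikimedia's CommonSettings.php and InitialiseSettings.php configuration files to see where it is installed. A full list of the extensions installed on a particular wiki can be seen on the wiki's Special:Version page.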
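This extension is included in the following wiki farms/hosts and/or packages: Canasta, Miraheze, MyWikis, semantic::core, wiki.gg, WikiForge. This is not an authoritative list. Some wiki farms/hosts and/or packages may include this extension even if they are not listed here. Always check with your wiki farm/host or package to confirm.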