Flow/Architecture/Search
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.
Making search work involves three big parts:
- Manage ES config: getting the Elasticsearch configuration right (e.g. how to interpret datatypes: word stemming, highlighter config, ...) and managing the ES indices (validate, reindex, ...)
- Index & search Flow data: self-explanatory, indexes Flow data in Elasticsearch & makes it searchable
- Search front-end: how we'll present the search functionality to users.
The last is mostly blocked on nailing the mockups. Once we're happy with that, we can start building it.
Manage ES config
Patch: https://gerrit.wikimedia.org/r/#/c/161251/
Make CirrusSearch updateOneSearchIndexConfig.php reusable
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78786
There's been a bunch of refactoring in CirrusSearch so that we can reuse most of its code in Flow. For a list of those patches, see the Phabricator task.
Make ES configuration management maintenance script
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78787
How to use
The first four steps below are handled by enabling the 'cirrussearch' role in MediaWiki-Vagrant. We should probably include all of this in MediaWiki-Vagrant, either by default as part of Flow or as an optional role (flow-search?).
- Install ElasticSearch, version >=1.4 (if your MediaWiki-Vagrant doesn't yet have it, see update instructions in Matt's comment on PS12 here: https://gerrit.wikimedia.org/r/#/c/184404/)
- Install Extension:Elastica
- Install Extension:CirrusSearch
- Configure connection to ES (if different from the default 'localhost'):
$wgFlowSearchServers = array( 'searchserver' );
- Flow & ES should now be in touch
- In CLI, run:
php maintenance/FlowSearchConfig.php
This will prepare the search index. If you are using MediaWiki-Vagrant, run vagrant ssh, go to the /vagrant/mediawiki/extensions/Flow folder and run the script within that shell.
- (You can add any of the script's many options if you want to try out a particular piece.)
- Should you, for some reason, need to quickly rebuild your index from scratch, delete it with
curl -XDELETE http://localhost:9200/\*_flow\*
(adjust the URL as needed) and re-run these steps.
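The prepare and rebuild steps above can be sketched as a small shell helper. This is only an illustration: the variable names (ES_URL, FLOW_DIR, DRY_RUN) and function names are made up here and are not part of Flow. By default it only prints the commands, because deleting an index is destructive.

```shell
#!/bin/sh
# Sketch: prepare or rebuild the Flow search index.
# Assumptions (not from Flow itself): the ES_URL, FLOW_DIR and DRY_RUN
# variables and all function names are hypothetical helpers.

ES_URL="${ES_URL:-http://localhost:9200}"
FLOW_DIR="${FLOW_DIR:-/vagrant/mediawiki/extensions/Flow}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# URL pattern matching every index with "_flow" in its name.
flow_index_url() {
    printf '%s/*_flow*\n' "$ES_URL"
}

# Prepare the search index by running the config script from the Flow folder.
prepare_index() {
    echo "+ cd $FLOW_DIR && php maintenance/FlowSearchConfig.php"
    if [ "$DRY_RUN" = "0" ]; then
        (cd "$FLOW_DIR" && php maintenance/FlowSearchConfig.php)
    fi
}

# Delete the Flow indices (destructive).
delete_index() {
    echo "+ curl -XDELETE $(flow_index_url)"
    if [ "$DRY_RUN" = "0" ]; then
        curl -XDELETE "$(flow_index_url)"
    fi
}

# Rebuild from scratch: delete, then prepare again.
reset_index() {
    delete_index && prepare_index
}
```

After sourcing the file, calling reset_index prints the two commands in order; setting DRY_RUN=0 first makes it actually run them (inside vagrant ssh when using MediaWiki-Vagrant).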
Figure out how to deploy Flow search
- Status: To do
- Phabricator: https://phabricator.wikimedia.org/T78796
Index & search Flow data
Patch: https://gerrit.wikimedia.org/r/#/c/126996/
Index Flow data in ES
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78788
How to use
You should look at #Make_ES_configuration_management_maintenance_script, which has more detailed instructions to also properly configure the search index.
- Do steps from #Make_ES_configuration_management_maintenance_script
- In CLI, run:
php maintenance/FlowFixWorkflowLastUpdateTimestamp.php
(to ensure workflow_last_update_timestamps are correct; may not be needed)
- In CLI, run:
php maintenance/FlowForceSearchIndex.php
- Flow data should now be indexed, hopefully
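The order of the maintenance scripts matters, so the sequence above can be captured in a short shell sketch. FLOW_DIR, DRY_RUN, INDEX_SCRIPTS and the function names are illustrative assumptions, not something Flow ships; the script file names are the ones listed on this page. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch: run the Flow indexing maintenance scripts in the documented order.
# Assumptions: FLOW_DIR, DRY_RUN, INDEX_SCRIPTS and the function names are
# hypothetical; only the script file names come from this page.

FLOW_DIR="${FLOW_DIR:-/vagrant/mediawiki/extensions/Flow}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them

# Configure the index, fix timestamps (may not be needed), then index.
INDEX_SCRIPTS="FlowSearchConfig.php FlowFixWorkflowLastUpdateTimestamp.php FlowForceSearchIndex.php"

# Run one maintenance script from inside the Flow folder.
run_maintenance() {
    echo "+ php maintenance/$1"
    if [ "$DRY_RUN" = "0" ]; then
        (cd "$FLOW_DIR" && php "maintenance/$1") || return 1
    fi
}

# Run all scripts in order and stop at the first failure.
index_flow_data() {
    for script in $INDEX_SCRIPTS; do
        run_maintenance "$script" || return 1
    done
}
```

Stopping at the first failure avoids force-indexing into an index whose configuration step did not complete.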
Search indexed Flow data
- Status: Done
- Phabricator: https://phabricator.wikimedia.org/T78789
How to use
- See below, API endpoint is in place already ;)
Search API endpoint
- Status: Partially done
- Phabricator: https://phabricator.wikimedia.org/T78791
How to use
- Do steps from #Index_Flow_data_in_ES
- Set
$wgFlowSearchEnabled = true;
- Add
'script.disable_dynamic: false'
to your elasticsearch.yml (we use a dynamic script to figure out the total number of matching terms)
- Do an API call, e.g.:
http://mediawiki.dev/api.php?page=Main_Page&action=flow&submodule=search&qterm=test
- See search results!
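The API call in the steps above can also be made from the command line. The following is a minimal sketch, assuming the same host and query parameters as the example URL; API_URL, DRY_RUN and the function names are made up for illustration. It prints the request by default and only contacts the wiki when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: query the Flow search API submodule with curl.
# Assumptions: API_URL, DRY_RUN and the function names are hypothetical;
# the query parameters are the ones from the example URL on this page.

API_URL="${API_URL:-http://mediawiki.dev/api.php}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the request, 0 = send it

# Build the request URL for a term and page that need no URL-encoding.
# $1 = search term, $2 = page title
search_url() {
    printf '%s?page=%s&action=flow&submodule=search&qterm=%s\n' "$API_URL" "$2" "$1"
}

# Send the search request; curl URL-encodes each parameter value.
# $1 = search term, $2 = page title (defaults to Main_Page)
flow_search() {
    term="$1"
    page="${2:-Main_Page}"
    if [ "$DRY_RUN" = "0" ]; then
        curl -s -G "$API_URL" \
            --data-urlencode "page=$page" \
            --data-urlencode "action=flow" \
            --data-urlencode "submodule=search" \
            --data-urlencode "qterm=$term"
    else
        echo "+ GET $(search_url "$term" "$page")"
    fi
}
```

Letting curl encode the values (-G with --data-urlencode) keeps search terms containing spaces or ampersands from breaking the query string.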
Search front-end
- Status: To do
- Phabricator: https://phabricator.wikimedia.org/T78790
For mockups, see Phabricator task.
There is a patch with a very barebones GUI - it's linked to in the Phabricator task.