Dear MediaWiki CirrusSearch community,
I would like to ask how to set up a MediaWiki Elasticsearch (ES) index to allow Czech full-text search, applying the following in order: icu_folding, Czech stemmer, and lowercasing. I have installed CirrusSearch, Elastica, and Elasticsearch alongside my MediaWiki (MW) installation. I am currently on MW 1.31 with ES 5.6.16. I am on these versions for internal reasons, but an upgrade to MW 1.35 and ES 6.5.4 is possible.
I believe I have installed (LocalSettings.php references, running php maintenance/update.php, and so on) and configured everything properly according to these steps:
1) Add to LocalSettings.php: $wgDisableSearchUpdate = true;
2) Generate the ES index: php extensions/CirrusSearch/maintenance/updateSearchIndexConfig.php
* Because the index had already been created with my settings, the script asks for one of these options:
--startOver or --reindexAndRemoveOk
I ran it with --startOver, because --reindexAndRemoveOk did nothing except produce the same prompt.
3) Remove from LocalSettings.php: $wgDisableSearchUpdate = true;
4) Bootstrap index: php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipLinks --indexOnSkip
5) Bootstrap index: php extensions/CirrusSearch/maintenance/forceSearchIndex.php --skipParse
6) Add $wgSearchType = 'CirrusSearch';
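Before step 2 it can help to look at what Elasticsearch already holds, since (as noted under step 2) the script asks for --startOver or --reindexAndRemoveOk when an index is already there. A minimal sketch, assuming ES listens on localhost:9200; ES_URL and cat_url are names made up for this illustration, not part of CirrusSearch:

```shell
# ES_URL is an assumed variable; override it if ES runs elsewhere.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: build the URL of a _cat endpoint (e.g. indices, plugins).
# The ?v parameter asks Elasticsearch to print column headers.
cat_url() {
  printf '%s/_cat/%s?v' "$ES_URL" "$1"
}

# Usage (needs a running Elasticsearch):
# curl -s "$(cat_url indices)"   # lists the indices that already exist
# curl -s "$(cat_url plugins)"   # shows whether analysis-icu is installed
```

The plugin listing matters here because the icu_folding filter is only available when the analysis-icu plugin is installed on every node.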
Approximate index settings (one of my examples, without a dictionary), created with curl on the command line (I know the real MW index is more detailed):
curl -X PUT localhost:9200/omkmediawikitest_general_first/ -d '
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "analysis": {
        "analyzer": {
          "czech": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "czech_stemmer", "icu_folding"]
          }
        },
        "filter": {
          "czech_stemmer": {
            "type": "stemmer",
            "name": "czech"
          }
        }
      }
    }
  }
}'
curl -X PUT localhost:9200/omkmediawikitest_content_first/ -d '
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "czech": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", "czech_stemmer", "icu_folding"]
          }
        },
        "filter": {
          "czech_stemmer": {
            "type": "stemmer",
            "name": "czech"
          }
        }
      }
    }
  }
}'
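One way to check whether the czech analyzer defined above behaves as intended is to pass a sample phrase through the index's _analyze endpoint. This is only a sketch: the helper names analyze_body and analyze are made up for illustration, the sample phrase is arbitrary, and it assumes the analysis-icu plugin is installed.

```shell
# Hypothetical helper: build the JSON body for an _analyze request.
# Note: the text must not contain double quotes or backslashes,
# because no JSON escaping is done here.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

# Hypothetical helper: run the named analyzer of an index on some text.
# The Content-Type header is required from ES 6.0 on and harmless on 5.6.
analyze() {
  curl -s -X POST "localhost:9200/$1/_analyze?pretty" \
    -H 'Content-Type: application/json' \
    -d "$(analyze_body "$2" "$3")"
}

# Usage (needs a running Elasticsearch with the index created above):
# analyze omkmediawikitest_content_first czech 'Příliš žluťoučký kůň'
```

If the filter chain is applied, the returned tokens should be lowercased, stemmed, and free of diacritics. Running the same call against the index that CirrusSearch builds would show whether its own analyzer produces comparable tokens.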
Questions:
- Is it even possible to configure the MediaWiki index to meet these needs? I think the Czech Wikipedia has already solved this, so a solution seems to exist: https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:Verze. Can this be solved by upgrading to MW 1.35, where the corresponding ES version applies these settings automatically?
- Should I change how the MW index is created so that it includes my own settings? Or should I set up the index first and have MW only add its settings on top, without overwriting mine? Or should I add my own settings to the index MW has created and then reindex it with the MW settings plus mine?
- How can I solve this, please?