Extension talk:CirrusSearch

About this board

Discussion related to the CirrusSearch MediaWiki extension.

See also the open tasks for CirrusSearch on phabricator.

Incompatible with ElasticSearch 7.17

Mshastchi (talkcontribs)

I installed ElasticSearch, but when running the maintenance scripts I get an error saying that CirrusSearch is only compatible with ElasticSearch 7.10, which has been EOL for a long time. When will the extension be updated to support the latest versions of ElasticSearch?

Kghbln (talkcontribs)

This is only a guess: probably not at all, since WMF is migrating to OpenSearch. For OpenSearch, I'd expect a currently supported version to be targeted. Let's see what others think.

MetinPueye (talkcontribs)

We have the same problem:

We have an Elasticsearch 7.17 installation.

Before the MediaWiki update, we had version 1.39, and it worked without any problems.

Now we are using MediaWiki version 1.42. According to the wiki, only Elasticsearch 7.10.x is supported.

With maintenance-scripts, I also get the error message that the version is not supported.

Is it intentional that versions > 7.10.x do not work?

Is there perhaps a workaround?  

EBernhardson (WMF) (talkcontribs)

Unfortunately there was a licensing change at Elastic: versions of Elasticsearch after 7.10.2 have a different license, which we chose not to move forward with. It's plausible that the compatibility check in includes/Maintenance/ConfigUtils.php could be loosened and everything would work on 7.17, but we've never tested it.

Longer term the plan is indeed to migrate everything over to OpenSearch. This should happen over the coming months; we already have test instances running CirrusSearch with OpenSearch 1.
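
To make concrete what loosening a version check would mean, here is a minimal Python sketch of a version gate. It is not the logic in includes/Maintenance/ConfigUtils.php: the function names and the accepted ranges are assumptions for illustration only, and running against 7.17 remains untested.

```python
import re

def parse_version(version: str) -> tuple[int, int, int]:
    """Parse 'major.minor.patch' (ignoring any suffix) into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch

def is_accepted(version: str, strict: bool = True) -> bool:
    """Hypothetical gate: strict accepts only 7.10.x, loose accepts any 7.x >= 7.10."""
    major, minor, _patch = parse_version(version)
    if strict:
        return (major, minor) == (7, 10)
    return major == 7 and minor >= 10

print(is_accepted("7.17.0"))                # strict gate rejects 7.17
print(is_accepted("7.17.0", strict=False))  # loosened gate lets it through
```

A loosened gate only removes the refusal to run; it says nothing about whether the index settings and plugins behave the same on the newer release.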

Greek search no longer truly diacritics insensitive

Spiros71 (talkcontribs)

For example, go to https://en.wiktionary.org/wiki/Wiktionary:Main_Page and try inputting ανθρωπος. Two existing entries will not appear: άνθρωπος and ἄνθρωπος. The same can be seen in my recent upgrade to ElasticSearch 7.10.2, core ICU plugin and extra:7.10.2-wmf12. Go to https://lsj.gr/ and try inputting σιφων. Missing entries will appear when using σίφων (σίφων and σίφωνας). Any advice on how to remedy this would be warmly appreciated!

TJones (WMF) (talkcontribs)

After writing up everything below, I realized I'm diagnosing the current behavior because @EBernhardson (WMF) thought this might be related to some recent work I did on diacritic folding, but now I don't think that's it. The info below might still be helpful, though. It's possible that there have been some changes to the weighting of exact prefix matches in suggestions, so I'll also invite @DCausse (WMF) to weigh in. He's more likely to remember any autocomplete changes that weren't so recent.


I believe you are talking about the drop-down list of suggestions (which we call the "autocomplete" suggestions), since ἄνθρωπος and άνθρωπος are the top two results in the full search results list for ανθρωπος.

The autocomplete search isn't truly insensitive to anything—including case, spaces, punctuation, and diacritics—in that exact matches can always be ranked a little better than inexact matches.

For case, consider the autocomplete suggestions for hun, Hun, ȟun, and hün on English Wiktionary:

  • hun: hun, hunger, hunt, hundred, hund, Hund, Hun, hung...
  • Hun: Hun, hunger, hun, hunt, hundred, hund, Hund, hung...
  • ȟun: hunger, hun, hunt, hundred, hund, Hund, Hun, hung...
  • hün: hün, Hündin, Hüne, hünkâr, Hündchen, hündür, hünnap...

I think the ȟun results are the "truest" results for h+u+n because there are no exact matches. In the other cases, exact matches (hun, Hun) become the first result, and exact prefix matches (everything starting with hün..) can also rank higher.

Note that if you add spaces to hün and search for h ü n, you get the same list as for ȟun above, because spaces can be ignored, and there are no exact matches or exact prefix matches with those spaces.

The problem with ανθρωπος is that it has many exact prefix matches (for those following along who don't read Greek, it's "anthropos", which is the beginning of way more than ten other Greek words), so they rank higher than άνθρωπος and ἄνθρωπος and push them out of the top ten suggestions. If you instead search for ἇνθρωπος (analogous to ȟun in the examples above), you get the results that I think you expect, with ἄνθρωπος and άνθρωπος as the first two suggestions because there is no exact match or exact prefix match.
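
The split between recall and ranking described above can be illustrated with a toy model in Python. This is a sketch only, not CirrusSearch's completion suggester: the `fold` function, the three ranking tiers, and the sample titles are assumptions chosen to mirror the hun examples, and real rankings also depend on per-page scores.

```python
import unicodedata

def fold(text: str) -> str:
    """Lowercase and strip combining marks (a rough stand-in for ICU folding)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def suggest(query: str, titles: list[str], limit: int = 10) -> list[str]:
    """Recall on folded prefixes, then rank exact matches and exact prefixes first."""
    recalled = [t for t in titles if fold(t).startswith(fold(query))]

    def tier(title: str) -> int:
        if title == query:
            return 0          # exact match
        if title.startswith(query):
            return 1          # exact prefix match
        return 2              # matched only after folding

    # sorted() is stable, so titles keep their input order within a tier.
    return sorted(recalled, key=tier)[:limit]

titles = ["hunger", "hun", "hunt", "Hund", "Hun", "hün", "Hündin"]
print(suggest("Hun", titles))
print(suggest("ȟun", titles))
print(suggest("hun", titles, limit=3))
```

With a small `limit`, the folded-only matches fall off the end of the list as soon as enough exact prefix matches exist, which is the same effect that pushes άνθρωπος and ἄνθρωπος out of the top ten.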

Unless I'm misreading the diacritics (which is 100% possible with Greek diacritics!) it looks like σιφων does the right thing on both English Wiktionary and LSJ, presumably because there aren't as many exact prefix matches competing for space in the suggestions list.

As for remedies, it depends on what you are looking for. If you want exact prefix matches to count for less, or for diacritics to be completely ignored, I'm not sure there's anything to be done. It might help in this case, but it would cause problems in general.

If you want a solution that you, as a savvy searcher, can use in cases like this where you know or suspect that there might be relevant results that differ by diacritics but which are being swamped by exact prefix matches, you can use the space hack we used for h ü n: if you search for α ν θ ρ ω π ο ς (or less ridiculously, just α νθρωπος or ανθρωπο ς), then you get suggestions without any exact prefix re-ranking. Of course there is always the chance that you get some exact prefix matches after adding one space. If there are too many, add another space—not a great solution, but it works.

For less savvy searchers, hitting return will give you the full-text search results, which do not look for arbitrary prefix matches (though stemming matches can still be prefixes), and at least in this case, the desired results are the top two.

(Note: I've been testing in the search bar on the search results page rather than the search box at the top of the page. These are usually the same, but weird differences can occur. The only thing I've noticed today is that the two boxes seem to use different events to trigger autocomplete searches. Editing hun to hün gives different results because typing ü on my American keyboard uses dead keys, which trigger Javascript events in the big search results search box, but not in the search box at the top of the page. Historical UI cruft, that is. Sigh.)

Spiros71 (talkcontribs)

Tray, that is a very thorough and exhaustive reply as usual!

The points I am making are:

1) I can see a clear change on this from the times of the ElasticSearch 5.6 implementation, and

2) usability (for Greek and Ancient Greek)—being able to get what one is looking for with the minimum effort. When it comes to Ancient Greek (polytonic) many "weird" accents/spirits are used which are not readily available in most keyboards, cases, etc. and users prefer to omit them (this is also typical of how Greek users search on Google even for Modern Greek which only has one accent/spirit). So, in the specific example, using ανθρωπος I would expect to get two search results in autocomplete which are "perfect" matches (minus the diacritics of course). But I do not get these results! A savvy user or a scholar "might" use the full diacritics version (speaking of Ancient Greek here), but the average user will be dumbfounded as they get no results at all with the no-diacritics approach. Also, yes, one could hit search and still get them, but the point of autocomplete is faster access to information.

I am not advocating a sweeping approach here for all languages, as I am not an expert, but I can see clearly the benefit for Greek and Ancient Greek.

DCausse (WMF) (talkcontribs)

Regarding ανθρωπος, άνθρωπος, and ἄνθρωπος on English Wiktionary:

These two results are found at positions 11 and 12: https://en.wiktionary.org/w/api.php?action=opensearch&format=json&formatversion=2&search=%CE%B1%CE%BD%CE%B8%CF%81%CF%89%CF%80%CE%BF%CF%82&namespace=0&limit=12

Unfortunately we display only 10.
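
To check where a title lands beyond the ten displayed suggestions, the same opensearch request can be built with a larger limit. A minimal Python sketch: the helper names and the User-Agent string are illustrative, and the live result order may differ from what is reported here.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

API = "https://en.wiktionary.org/w/api.php"

def opensearch_url(query: str, limit: int = 12, namespace: int = 0) -> str:
    """Build the opensearch URL with a configurable limit."""
    params = {
        "action": "opensearch",
        "format": "json",
        "formatversion": 2,
        "search": query,
        "namespace": namespace,
        "limit": limit,
    }
    return f"{API}?{urllib.parse.urlencode(params)}"

def rank_of(title: str, query: str, limit: int = 12) -> Optional[int]:
    """Return the 1-based position of title in the suggestions, or None if absent."""
    request = urllib.request.Request(
        opensearch_url(query, limit),
        headers={"User-Agent": "suggestion-rank-check/0.1 (example script)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        suggestions = json.load(response)[1]  # opensearch: [query, titles, ...]
    return suggestions.index(title) + 1 if title in suggestions else None

print(opensearch_url("ανθρωπος"))
```

Calling `rank_of("άνθρωπος", "ανθρωπος")` performs the network request and reports the position, if the title is within the requested limit.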

If you enter Special:Search these two should move back to the top: https://en.wiktionary.org/w/index.php?go=Go&search=%CE%B1%CE%BD%CE%B8%CF%81%CF%89%CF%80%CE%BF%CF%82&title=Special%3ASearch&ns0=1

Unfortunately, completion search only ranks higher the one suggestion that is a perfect exact match. It does not rank suggestions that appear to be fully written titles above those that appear to be partially written. We know this is not quite perfect, but we don't yet have a solution for it.

Another cause is also that completion prefers suggestions that match a prefix with its accents:

  • ανθρωποσφαγή is preferred over άνθρωπος when searching ανθρωπος

note that ς is just considered identical to σ here.
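
That preference can be seen by comparing the query against each title with only the sigma mapping applied and accents left intact. A small Python sketch; `same_sigma` and `keeps_accents_prefix` are hypothetical helpers, not CirrusSearch functions.

```python
def same_sigma(text: str) -> str:
    """Treat final sigma (ς) as identical to medial sigma (σ)."""
    return text.replace("ς", "σ")

def keeps_accents_prefix(query: str, title: str) -> bool:
    """True if title starts with query once ς and σ are unified, accents untouched."""
    return same_sigma(title).startswith(same_sigma(query))

query = "ανθρωπος"
for title in ["ανθρωποσφαγή", "άνθρωπος", "ἄνθρωπος"]:
    print(title, keeps_accents_prefix(query, title))
```

Only ανθρωποσφαγή passes this accent-preserving check, so it counts as an exact prefix match while the two accented titles match only after folding.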

If this issue is quite recent, I'm not sure what could have caused it; I don't think anything changed in the software that could have directly caused this behavior. Could it be that more pages being added over time caused these suggestions to slip out of the 10 displayed results?

See phab:T132637 for when we first implemented diacritics folding for Greek; the example query αθανατος used at the time to report the bug is still working as expected.

Spiros71 (talkcontribs)

Yes, David, you pointed very aptly to some of the culprits here

Another cause is also that completion prefers suggestions that match a prefix with its accents:

  • ανθρωποσφαγή is preferred over άνθρωπος when searching ανθρωπος

note that ς is just considered identical to σ here

My point is that ς considered identical with σ is something that could resolve such cases. The former is of course only used at the end of a word. And quoting from that phab issue, I concur with Tray:

French speakers usually have no trouble typing French diacritics, but they may have no idea how to type Ancient Greek polytonic diacritics—which speakers of Modern Greek may also have trouble with, just as speakers of Modern English usually don't know how to type ð, þ, æ, or ē, despite them all being used in the first few lines of Beowulf! Hwæt! (You call me a language nerd, now I gotta act like one.)

TJones (WMF) (talkcontribs)

I can see a clear change on this from the times of the ElasticSearch 5.6 implementation

Wow.. that was 5 years ago for us, so I can't recall every change that might have been relevant in that time. Not sure when it would have changed.

I understand your usability argument, but it is often the case in search engineering that optimizing for one use case breaks others. We are already ignoring the Greek diacritics for the recall phase, but the exact matches come into play for the ranking phase. It's an issue for ανθρωπος (ignoring final sigma, see below) because there are so many words without diacritics that match better.

There's been a similar complaint about overly exact case matching (T364888), but I don't think we can only ignore Greek diacritics or only ignore case for ranking, which, on English Wiktionary for example, would mean that typing an would give "exact" matches with an, àn, ån, án, än, ân, An, Ân, ãn, ān, ăn, ản, ǎn, Ấn, ấn, ẩn, and ắn. (Those are the top full-text results, though.. and I missed aN!) You could argue these results are less usable in autocomplete, since most people most of the time will not be looking for them (on English Wiktionary).

We also were tossing around ideas for improving full-title matches, which could have similar side-effects for short queries. (This also applies, albeit less voluminously, to queries longer than 2 letters, but I stopped looking for details because it's a lot of manual searching for examples since autocomplete doesn't work that way at the moment.)

There's always a trade-off, and having to fall over to full-text search is not the worst trade-off.

I've opened a ticket for the final/non-final sigma issue (T377495), though I'm not sure it will help you. It definitely makes sense on Greek-language wikis, but not as much for non-Greek wikis, like English Wiktionary. (LSJ looks to be using English as its analysis language, too.)

You should be able to set CirrusSearchICUNormalizationUnicodeSetFilter and CirrusSearchICUFoldingUnicodeSetFilter to "[^ς]" in mediawiki/extensions/CirrusSearch/extension.json in your LSJ installation to exempt ς from folding, but that would disable the ς to σ mapping everywhere (autocomplete, full-text, template lookups, etc.), and it still won't work if your language is set to Greek because the Greek-specific lowercase filter also maps ς to σ. Everyone really wants that mapping to happen! But it's an immediate config option that might help.

Spiros71 (talkcontribs)

Interestingly, Tray, ανθρωπος appears not to be an issue in my case https://ibb.co/StV1J6M There is one other funny thing happening though, not sure if this is up your alley (David included), but I do get search results for a non-existent page https://ibb.co/HgrzHpc σῑ́φων.

TJones (WMF) (talkcontribs)

With respect to the autocomplete of ανθρωπος on LSJ, we know the recall portion of autocomplete gets everything we'd want, but the ranking is where things go awry. Would it make sense on LSJ for those ἄνθρωπος and άνθρωπος to be much more popular? I don't recall all the factors that go into ranking the autocomplete results, but different stats on your site could lead to different rankings that could overpower the exact prefix match advantage.

As for the non-existent page for σῑ́φων, I can't reproduce it, which I think is because I'm not logged in so I don't get offers to create pages. My guess is that there is either some normalization that isn't happening (ι + ̄ + ́ vs ῑ + ́) or there's an invisible character (soft hyphens will cause this and are common enough in non-Greek contexts), etc. An example of a lack of normalization you can easily see is that searching for GrEeK LaNgUaGe on enwiki will offer to let you create that exact page.
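
One way to test both guesses is to inspect the code points of the title in question. A minimal Python sketch using the standard `unicodedata` module; the helper names are illustrative, and the set of invisible characters checked is an assumption and far from exhaustive.

```python
import unicodedata

# A few characters that render as nothing (illustrative, not exhaustive).
INVISIBLE = {
    "\u00ad": "SOFT HYPHEN",
    "\u200b": "ZERO WIDTH SPACE",
    "\u200d": "ZERO WIDTH JOINER",
}

def describe(text: str) -> list[str]:
    """List each code point with its Unicode name, to spot hidden differences."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

def same_after_nfc(a: str, b: str) -> bool:
    """True if the two strings are canonically equivalent."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def hidden_characters(text: str) -> list[str]:
    """Return the names of any known invisible characters found in text."""
    return [INVISIBLE[ch] for ch in text if ch in INVISIBLE]

decomposed = "\u03b9\u0304\u0301"   # ι + combining macron + combining acute
precomposed = "\u1fd1\u0301"        # ῑ + combining acute
print(decomposed == precomposed, same_after_nfc(decomposed, precomposed))
print(describe(precomposed))
```

If the pasted title and the stored title differ as raw strings but `same_after_nfc` returns True, the mismatch is a normalization issue; if `hidden_characters` returns anything, an invisible character is the more likely culprit.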

(BTW, it's "Trey" with an "e".. cognate with τρεις, no less.)

Spiros71 (talkcontribs)

Τριάκις, thanks, Trey! Yes, I think the stats would make the difference as άνθρωπος is a very common word.

Cloned dB and search stopped working

Summary by Spiros71

I had to reindex.

Spiros71 (talkcontribs)

I cloned my dB, then pointed LocalSettings.php at the new dB, and search stopped working ("An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later"). The ElasticSearch service is running OK (I also restarted it) and I can see that the indexes exist. Is there anywhere else that I should change the dB name?

MW 1.31, ElasticSearch 5.6.13

In error log I see:

2024-09-27 10:24:00 host.xxx.gr xxx_1_31_0_bkp: Search backend error during full_text search for 'σαῦρα' after 2: illegal_argument_exception: no mapping found for field [suggest]

Summary by DCausse (WMF)
213.61.173.172 (talkcontribs)

I have one large page with a large wikitable (12 columns, 3300 rows).

This page is the only large page and the only one not indexed, because of a "TypeError" in the HtmlFormatter:

api.php?action=query&format=json&prop=cirrusbuilddoc&pageids=1231&formatversion=2

{
    "error": {
        "code": "internal_api_error_TypeError",
        "info": "[8c19427845d7c9abfb9f5240] Exception caught: HtmlFormatter\\HtmlFormatter::onHtmlReady(): Argument #1 ($html) must be of type string, null given, called in /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php on line 314",
        "errorclass": "TypeError",
        "trace": "TypeError at /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(90)\nfrom /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(90)\n#0 /var/www/w/vendor/wikimedia/html-formatter/src/HtmlFormatter.php(314): HtmlFormatter\\HtmlFormatter->onHtmlReady()\n#1 /var/www/w/includes/content/WikiTextStructure.php(179): HtmlFormatter\\HtmlFormatter->getText()\n#2 /var/www/w/includes/content/WikiTextStructure.php(221): WikiTextStructure->extractWikitextParts()\n#3 /var/www/w/includes/content/WikitextContentHandler.php(167): WikiTextStructure->getOpeningText()\n#4 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(95): WikitextContentHandler->getDataForSearchIndex()\n#5 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/ParserOutputPageProperties.php(70): CirrusSearch\\BuildDocument\\ParserOutputPageProperties->finalizeReal()\n#6 /var/www/w/extensions/CirrusSearch/includes/BuildDocument/BuildDocument.php(172): CirrusSearch\\BuildDocument\\ParserOutputPageProperties->finalize()\n#7 /var/www/w/extensions/CirrusSearch/includes/Api/QueryBuildDocument.php(58): CirrusSearch\\BuildDocument\\BuildDocument->finalize()\n#8 /var/www/w/includes/api/ApiQuery.php(671): CirrusSearch\\Api\\QueryBuildDocument->execute()\n#9 /var/www/w/includes/api/ApiMain.php(1904): ApiQuery->execute()\n#10 /var/www/w/includes/api/ApiMain.php(879): ApiMain->executeAction()\n#11 /var/www/w/includes/api/ApiMain.php(850): ApiMain->executeActionWithErrorHandling()\n#12 /var/www/w/api.php(90): ApiMain->execute()\n#13 /var/www/w/api.php(45): wfApiMain()\n#14 {main}"
    }
}


I experimented and found that when the page length is below roughly 793,845 bytes, it works without error. Above roughly 793,978 bytes I get the TypeError.

I think the page length counts only the content, so the limit seems to be 1 MB for the whole HTML page.

Changing $wgMaxArticleSize or $wgAPIMaxResultSize does not solve the issue.

I looked into the settings of PHP, the JVM, MediaWiki, and nginx but did not find a solution.


Are there any settings to extend the limit?

DCausse (WMF) (talkcontribs)
213.61.173.172 (talkcontribs)

Thank you very much. This was exactly the issue.

I manually updated HtmlFormatter.php in my v1.39.7 installation to the new version: https://gerrit.wikimedia.org/r/c/HtmlFormatter/+/997959/2/src/HtmlFormatter.php#b306

Then I updated the index with:

php /var/www/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now

php /var/www/w/extensions/CirrusSearch/maintenance/ForceSearchIndex.php

Now everything is fine.

Search in files + extract and display related part of text

Allanext2 (talkcontribs)

I would now like to show a portion of the text related to the search, say 2-3 pages before and after the matched text, without making it possible to open the entire PDF.

How would you approach this with CirrusSearch? Are there parameters that I can tweak? Would you recommend API calls or hooks directly into CirrusSearch, or would you suggest a different approach?

I've noticed that both PdfHandler with pdfToText and TikaAllTheFiles get the PDF content indexed.

Thank you!

DCausse (WMF) (talkcontribs)

CirrusSearch is not aware of the structure of the pdf file, so I'm not sure how I would approach this problem with CirrusSearch...

Note that MW is generally not designed to allow fine-grained access to the content so if the file is uploaded then it'll be viewable and it might be hard to prevent users from viewing it.

Getting a better highlight experience for PDFs might be challenging, and Cirrus alone might not be enough. It might just provide some text snippets that you could then search for again in the PDF, using a library that can manipulate PDFs to reconstruct a shorter PDF on the fly (e.g. https://pymupdf.readthedocs.io/en/latest/).
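
A rough Python sketch of that idea with PyMuPDF, assuming the search snippet is available as a plain string and PyMuPDF is installed. The function names are hypothetical, nothing here is wired into CirrusSearch or MediaWiki, and it does not address the access-control concern above.

```python
def page_window(hit_pages: list[int], page_count: int, radius: int = 2) -> list[int]:
    """Return sorted 0-based page numbers within radius of any hit, clipped to the document."""
    keep: set[int] = set()
    for hit in hit_pages:
        keep.update(range(max(0, hit - radius), min(page_count, hit + radius + 1)))
    return sorted(keep)

def extract_around(src_path: str, dst_path: str, needle: str, radius: int = 2) -> list[int]:
    """Write a shorter PDF containing only pages near matches of needle."""
    import fitz  # PyMuPDF; imported lazily so page_window works without it

    with fitz.open(src_path) as doc:
        # search_for returns a list of rectangles; an empty list means no match.
        hits = [page.number for page in doc if page.search_for(needle)]
        pages = page_window(hits, doc.page_count, radius)
        if pages:
            doc.select(pages)                 # keep only the chosen pages, in order
            doc.save(dst_path, garbage=4)     # new file; drop objects no longer referenced
    return pages

print(page_window([5], 20))
```

Keeping the page arithmetic in `page_window` separate from the PDF handling makes the windowing easy to check on its own; overlapping windows from nearby hits are merged automatically because the pages are collected in a set.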

Search results and possible leaking of restricted content

Masin Al-Dujaili (WMDE) (talkcontribs)

We have an internal wiki with several namespaces which in turn have different access permissions set. Does CirrusSearch actively prevent leakage of content of pages a user has no access to?

DCausse (WMF) (talkcontribs)

Creating index > ResponseException

Q2e.jua (talkcontribs)

I am trying to set up the CirrusSearch extension, but I am stuck with the following exception while running UpdateSearchIndexConfig.php:

Elastica\Exception\ResponseException from line 178 of /wiki/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php

Setup:

  • Mediawiki 1.39
  • PHP 8.1
  • ElasticSearch 7.10.1 (via Docker)

Output of extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php:

Updating cluster ...

indexing namespaces...

mw_cirrus_metastore missing, creating new metastore index.

Creating metastore index...

mw_cirrus_metastore_first

Scanning available plugins...none

Elastica\Exception\ResponseException from line 178... (full stacktrace truncated)

Ciencia Al Poder (talkcontribs)

Looks like you're using ElasticSearch 7.10.1 but the requirement is 7.10.2. I'm not sure if a minor version difference is important, though

Q2e.jua (talkcontribs)
EBernhardson (WMF) (talkcontribs)

We would need to know what is in the ResponseException to offer a path forward. The name ResponseException itself isn't too meaningful; it mostly means elasticsearch said no. Perhaps something in either mediawiki logging or elasticsearch logging says what exactly the problem was?
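
When Elasticsearch rejects a request, the reason is usually in the JSON body of the response rather than in the exception name. A minimal Python sketch for pulling out the error type and reason; `summarize_error` is a hypothetical helper, and the sample body is illustrative (its type and reason are borrowed from a different error quoted on this board), not output from this setup.

```python
import json

def summarize_error(body: str) -> str:
    """Return 'type: reason' from an Elasticsearch error response body."""
    payload = json.loads(body)
    error = payload.get("error")
    if error is None:
        return "no error field in response"
    if isinstance(error, str):       # some responses carry a plain string
        return error
    return f"{error.get('type', 'unknown')}: {error.get('reason', 'no reason given')}"

sample = json.dumps({
    "error": {
        "type": "illegal_argument_exception",
        "reason": "no mapping found for field [suggest]",
    },
    "status": 400,
})
print(summarize_error(sample))
```

Replaying the failing request by hand (for example with curl against the same host and port) and feeding the response body to a helper like this is one way to see what Elasticsearch objected to.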

Q2e.jua (talkcontribs)

Yeah, I'm still searching for some kind of log, but could not find anything.

  • The elasticsearch docker log does not contain related information.
  • Mediawiki log: I think I need to configure the log, because there is nothing in the default installation.

My next steps:

  • Checking for more logs or activating enhanced logging in MediaWiki
  • Try to manually run ElasticSearch 7.10.2 docker image, so we can exclude minor version differences
Q2e.jua (talkcontribs)

Update:

  • I had to manually load the 7.10.2 ElasticSearch docker image, because it was not available in the Plesk Repository.
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.10.2
  • After all, the ElasticSearch version was not the issue. The real issue was the configuration of the ElasticSearch container, where it is important to configure the discovery type:
discovery.type=single-node

I will report this as an improvement issue to the maintainers:

  • Add a meaningful output message, or at least catch PHP exceptions and print the error message and the response message in the case of curl requests.
  • Maybe add a simple setup guide for ElasticSearch to the README

Search Suggestion using DisplayTitle

Oetterer (talkcontribs)

I just installed CirrusSearch on my knowledge base wiki and it's working mostly as intended. I have no problems with my documents in the main namespace (where I put mostly infrastructure pages like teams, services, etc.). However, all my actual knowledge base articles reside in a second namespace (named KB) and have a pagename consisting of randomly generated numbers. The actual displayed title is set via {{DISPLAYTITLE:Nice Title}}.

The KB namespace is searched correctly and the results are displayed as intended. What is rather annoying is that the search suggestion works only on the actual page title (that is, something like KB:1234567). Is there a way I can have the suggester use and display the display title for a) that namespace or b) all namespaces?

DCausse (WMF) (talkcontribs)

Sadly not at the moment; some work has been done to start collecting this data in the search index, but nothing more. I would suggest filing a task at https://phabricator.wikimedia.org with the tag CirrusSearch describing your use case.

Thanks!

DCausse (WMF) (talkcontribs)
Oetterer (talkcontribs)

Thank you for taking the time to answer!

An error has occurred while searching: We could not complete your search due to a temporary problem.

Clarasiir (talkcontribs)

For the past month or so, CirrusSearch has suddenly and randomly stopped working and given the message "An error has occurred while searching: We could not complete your search due to a temporary problem. Please try again later."

Our wiki has used CirrusSearch for a good while now with no issues, but recently our traffic has slowly been improving, and that is when trouble with search began. Restarting our VPS would solve the issue, but over time search would eventually go down again. As wiki traffic gradually increased, so did the frequency of the error, up to the point where search would go down daily.

Thinking it might be a memory use issue, I created a custom.options file in elasticsearch/jvm.options.d with the settings

-Xms3g

-Xmx3g

Nothing changed at first, as I didn't restart ElasticSearch, but the next morning search was down per usual, so I rebooted the VPS to get it working again. This time, that didn't solve the problem. The message "An error has occurred while searching" was still appearing.

I deleted the custom.options file I had created, and rebooted the VPS again. Still this didn't solve the problem.

To avoid not having any search function at all, we're now using the default mediawiki search. But I would much rather have CirrusSearch back again, so does anyone know what I should do to solve this issue and stop search giving nothing but error messages?

DCausse (WMF) (talkcontribs)

Hi,

I would suggest analyzing the elasticsearch logs to understand whether it is having issues and why. The error you describe could have a wide variety of causes:

  • network issue between mediawiki and elastic
  • health status of your search indexes
  • elasticsearch crashing

https://www.elastic.co/guide/en/elasticsearch/reference/8.13/fix-common-cluster-issues.html might be interesting. Note that this doc is for 8.13 and you might be running an older version, but I suspect that most of the information there still applies to 7.x.

Please let us know if you have more precise information about the issue you are facing.

Good luck!

Clarasiir (talkcontribs)

Okay, well I checked the elasticsearch.log but it didn't seem to have anything useful. It did have the message "Native controller process has stopped - no new native processes can be started," but there were no other error messages or an explanation as to why search stopped. I'm not even sure if that's an error or that's just when I disabled CirrusSearch because it had already stopped working anyway and was showing the "error has occurred while searching" message.

I have noticed that with CirrusSearch disabled, our server's total memory use is very low. Just enabling CirrusSearch makes it jump to over 65%, and as time passes that number will slowly creep higher to around 80-83% before search then goes down.

To me that sounds similar to the "Circuit breaker errors" in the guide you linked, but there's no error like that in the logs. Our wiki does use an older 7.x version of elasticsearch, so I don't know if error logs work differently for this older version?

DCausse (WMF) (talkcontribs)

Unfortunately, without more details I can only give you very broad guidance. First, I would try to understand whether elasticsearch dies or not. It could be killed by the JVM itself because of high GC overhead, in which case you would see an error in the logs, or it could be killed by the system OOM killer (which can be inspected in the system logs or dmesg). Circuit breaker errors should also be logged in the elasticsearch logs, so you would have seen those, but if you believe this is affecting your setup please see: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/fix-common-cluster-issues.html#circuit-breaker-errors
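
For the OOM killer case, the kernel log can be scanned for its kill messages. A minimal Python sketch; the regular expression reflects typical kernel wording, which varies between kernel versions, and the sample lines are made up for illustration.

```python
import re

# Typical kernel wording; the exact text varies between kernel versions.
OOM_PATTERN = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")

def oom_kills(log_lines):
    """Yield (pid, process_name) for each OOM-killer line found."""
    for line in log_lines:
        match = OOM_PATTERN.search(line)
        if match:
            yield int(match.group(1)), match.group(2)

sample = [
    "[12345.678] Out of memory: Killed process 4321 (java) total-vm:6291456kB",
    "[12346.000] unrelated kernel message",
]
print(list(oom_kills(sample)))
```

Feeding it the saved output of dmesg or the system journal and finding a `java` entry around the time search went down would point at the system, rather than the JVM, ending the process.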

Have you followed https://www.elastic.co/guide/en/elasticsearch/reference/7.10/system-config.html when setting up elasticsearch? If not, I would encourage you to follow this documentation and make sure that your system is properly set up for elasticsearch to run smoothly.

Hope it helps, good luck!

Clarasiir (talkcontribs)

I'm not sure what more details you want me to provide, but our elasticsearch certainly does not have a server to itself, and I wouldn't want to change the settings as if it did only to have that crash our server or something.

We use a shared VPS server with 4 cores and our container having 8 GB guaranteed ram. That has been enough for elasticsearch to function with the default settings up until recently.

DCausse (WMF) (talkcontribs)

As I said earlier, the CirrusSearch error message alone is not precise enough to identify the cause of the issue, and without knowing the cause I can't guide you to a solution. I can only advise you to continue troubleshooting the problem until you understand its cause. Have you tried seeking help on other forums more dedicated to elasticsearch? You might get more precise guidance there on how to troubleshoot an elasticsearch instance.

Clarasiir (talkcontribs)

I see, then I will try an elasticsearch forum and see if I can get more help figuring things out there, thank you.

Username and password authentication for Elastic server?

Brooke Vibber (WMF) (talkcontribs)

I tried setting up a local development instance using a default Docker installation of Elastic. This creates a username "elastic" and a generated password for authentication, however I can't find anything about specifying authentication in search host configuration for CirrusSearch, and the updater script won't connect with anything I've devised yet.

It doesn't seem to work to specify "elastic:<password>@localhost:9200" as the host:

Elastica\Exception\Connection\HttpException from line 186 of /var/www/html/w/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php: Malformed URL


nor "elastic:<password>@localhost" with default port assumed:

Elastica\Exception\Connection\HttpException from line 186 of /var/www/html/w/extensions/Elastica/vendor/ruflin/elastica/src/Transport/Http.php: Couldn't connect to host, Elasticsearch down?

Prefacing with 'http:' or 'https:' makes no difference.

Any ideas? I'm hoping to get this running so I can do some fixes on Extension:MediaSearch on my local development site. Thanks!

EBernhardson (WMF) (talkcontribs)

Generally I would suggest using docker-registry.wikimedia.org/repos/search-platform/cirrussearch-elasticsearch-image:v7.10.2-5 as it contains the extra plugins we use. This is based off the elasticsearch-oss image. Alternatively you can directly use the https://www.docker.elastic.co/r/elasticsearch/elasticsearch-oss image. This image doesn't contain authentication, because the auth isn't part of the OSS offering.

I haven't tested it, but you should be able to provide authentication as part of the connection configuration. For example (untested):

    $wgCirrusSearchServers = [
        [
            'host' => 'localhost',
            'port' => '9200',
            'username' => '...',
            'password' => '...',
        ]
    ];
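
Before testing through MediaWiki, it can help to confirm that the credentials work against the server directly. A minimal Python sketch using HTTP Basic authentication; the helper names are illustrative, and the URL, username, and password are placeholders to be replaced with your own values.

```python
import base64
import json
import urllib.request

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def cluster_info(url: str, username: str, password: str) -> dict:
    """Fetch the server's root endpoint with Basic auth and return the parsed JSON."""
    request = urllib.request.Request(
        url, headers={"Authorization": basic_auth_header(username, password)}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)

print(basic_auth_header("elastic", "secret").split()[0])
```

A call such as `cluster_info("http://localhost:9200", "elastic", "your-password")["version"]["number"]` should return the server version if the credentials are accepted. Images that enable TLS by default would need an https URL and certificate handling, which this sketch leaves out.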
GregRundlett (talkcontribs)

The above dictionary worked for me. (configuring $wgCirrusSearchClusters['default'])
