Help talk:CirrusSearch/Flow

About this board

This page is an archive. Do not add new topics here.

Please ask new questions at Help talk:CirrusSearch instead.

How to exclude redirects from search results?

16 comments • 04:54, 7 December 2024 13 days ago

Summary last edited by Speravir 07:31, 11 June 2024 6 months ago

phabricator:T90807, Closed, Declined.

phabricator:T64680, Closed.

Thread:Help talk:CirrusSearch/exclude redirections ?

phabricator:T204089

Automatik (talkcontribs)

Hi. Is there any way to exclude redirects from search results? For example, I want to find entries that are not redirects and that contain a certain character in their title. How can I do that?

Reply 03:15, 23 August 2018 6 years ago

TJones (WMF) (talkcontribs)

Unfortunately, there's no easy way to exclude redirects from search results.

However, depending on the scope of the task you are trying to complete and your technical ability you could try to use the Search API to semi-automatically do what you need.

This query will give you back the top results with "English Wikipedia" in the title or a redirect:

https://en.wikipedia.org/w/api.php?action=query&list=search&srlimit=50&srsearch=intitle:%22english%20wikipedia%22

The default format is JSON converted to HTML so it's easy to read for a human, but hard to read for a computer. If you only have a small number of queries to deal with, and only need a limited number of results from each (up to 500—set by srlimit), you might be able to get what you need by getting these results and looking through the titles by hand.

If you need a computer to process the results for you, say, because you have many queries, you can get real JSON by adding &format=json:

https://en.wikipedia.org/w/api.php?action=query&format=json&list=search&srlimit=50&srsearch=intitle:%22english%20wikipedia%22

On a Unix-like command line (I'm working in Terminal on OS X) you can use curl to fetch the JSON, python to make it pretty, and grep to pull out the titles, and grep again to find the specific ones you want:

curl -s "https://en.wikipedia.org/w/api.php?action=query&list=search&format=json&srlimit=50&srsearch=intitle:%22english%20wikipedia%22" | python -m json.tool | grep "\"title\":" | grep -i "english wikipedia"

Note that the API URL is URL-encoded (spaces become %20, quotes become %22, etc.).

Results:

   "title": "English Wikipedia",
   "title": "Simple English Wikipedia",
   "title": "Notability in the English Wikipedia",

The results aren't pretty, and in this case there are only 8 results total and 3 that are not redirects. If you are searching for specific characters, you may need to do some more pre-processing before the final grep. (If you are searching for "e", everything will match, because "title" has an "e" in it, for example.) If you need to go through more than the top 500 results, you'll have to figure out how to get the API to give you additional results, etc.

It's not pretty and it's not easy, but it's a start.
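
For anyone who would rather parse the JSON than grep it, here is a minimal Python sketch of the same idea. It assumes the response has the usual query → search → title layout shown in the results above; the function names are made up for illustration. Matching against the parsed title value sidesteps the problem of the word "title" itself matching the search character.

```python
import json
from urllib.parse import urlencode, quote

API = "https://en.wikipedia.org/w/api.php"


def build_search_url(srsearch, srlimit=50):
    """Build the search API URL with URL-encoded parameters."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srlimit": srlimit,
        "srsearch": srsearch,
    }
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return API + "?" + urlencode(params, quote_via=quote)


def matching_titles(response_text, needle):
    """Return result titles that contain `needle`, ignoring case."""
    data = json.loads(response_text)
    # Assumed layout: {"query": {"search": [{"title": ...}, ...]}}
    results = data.get("query", {}).get("search", [])
    needle = needle.lower()
    return [r["title"] for r in results if needle in r["title"].lower()]
```

The URL from build_search_url can be fetched with curl as above (or any HTTP client) and the response body passed to matching_titles.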

Reply Edited 16:30, 23 August 2018 6 years ago

Automatik (talkcontribs)

Thanks for this answer. It is clearly not easy or convenient, and pretty similar to running the query manually (then filtering visually with CTRL+F "(redirection" and picking only the results without the "(redirection" text highlighted). Developers should add a "do not follow redirects" option, to avoid tedious work for everyone using this functionality. I guess it is not so difficult, as this option already exists in some use cases (e.g. when displaying a page with &redirect=no).

Reply Edited 16:45, 23 August 2018 6 years ago

TJones (WMF) (talkcontribs)

It is very similar to the ctrl-F solution, just more automatic! For me, somewhere around 25 to 50 queries it would be faster (or at least less boring and thus less error-prone) to go for a hacked-together semi-automatic solution.

Adding a title-only index is probably not a trivial change to make from our current state. We have a search index for intitle:, with the text from titles and redirects in it. There's no differentiation between the title and redirect text once it's in the index. I think we'd have to create another field that was title-only (and maybe a redirect-only field would be equally useful—which together would be bigger than the size of the current title index).

It's not clear to me how many people would need such an index. I'm really curious what your use case is—both to get a sense of how useful title-only search would be, and to see if there's a better clever way to get what you need.

You could open a Phabricator ticket and ask for this feature, but that certainly doesn't guarantee that it would be implemented any time soon.

Reply 17:03, 23 August 2018 6 years ago

Automatik (talkcontribs)

On the French Wiktionary, we use the typographic apostrophe in titles, instead of the typewriter/vertical apostrophe. I was looking for titles that use the vertical apostrophe and are not redirects.

Moreover, I am using Windows, which is less convenient than a Unix-like system for command-line tools (unclear documentation, no unified way to run commands, etc.).

Reply 17:18, 23 August 2018 6 years ago

TJones (WMF) (talkcontribs)

Ah, that's a sensible use case. No other obvious solution comes to mind, but I'll think about it more and if I think of anything useful I'll let you know.

If you are already familiar with Unix-like commands (or want to learn), but just don't have them available because you are on Windows, you could look at Cygwin (English WP, French WP, website)—it's not an emulator or virtual machine, it just gives you versions of standard Unix commands that work on Windows. I used it about 15 years ago when I had a Windows machine for my job. I found it very useful back then, but haven't used it since.

Reply 17:28, 23 August 2018 6 years ago

Automatik (talkcontribs)

Thanks for the advice; however, the bash terminal from Cygwin does not work (and the solution suggested in https://superuser.com/questions/1172759/cygwin-error-failed-to-run-bin-bash-no-such-file-or-directory does not work either). Moreover, now that I have installed the program, I cannot uninstall it anymore (at least, not easily), as it does not appear in "Programs and features", and when I click "Uninstall" from a right click on the program icon, it just opens the "Programs and features" window anyway.

Reply 18:08, 23 August 2018 6 years ago

TJones (WMF) (talkcontribs)

Oh no! I should have known better than to suggest software I haven't used in so long—but it was so nice back in the day. I haven't used Windows in almost 15 years either, so I don't really have any helpful advice. Crap, I'm sorry!

Reply 18:20, 23 August 2018 6 years ago

Automatik (talkcontribs)

No worries: I "uninstalled" it by removing its folders, and re-installed it using another repository, and now it works! Thanks for the tip then. To look for more than 500 results, I added the &sroffset=500 parameter (then 1000, 1500,... until no results are found)
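
A rough sketch of that paging loop in Python, for reference. The fetch_page argument is a hypothetical callable that takes an sroffset value and returns the parsed JSON for that page; the 500 step and the 10000 ceiling are assumptions taken from the numbers mentioned in this thread, not verified API limits.

```python
def collect_titles(fetch_page, step=500, max_offset=10000):
    """Collect titles page by page until a page comes back empty.

    fetch_page(offset) is a caller-supplied (hypothetical) function that
    requests the search API with &sroffset=<offset> and returns the
    parsed JSON response as a dict.
    """
    titles = []
    offset = 0
    while offset < max_offset:
        page = fetch_page(offset)
        results = page.get("query", {}).get("search", [])
        if not results:
            # No more results: stop, as described in the post above.
            break
        titles.extend(r["title"] for r in results)
        offset += step
    return titles
```

Keeping the HTTP request outside the loop makes the paging logic easy to try out with canned data before pointing it at a real wiki.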

Reply 22:32, 23 August 2018 6 years ago

Speravir (talkcontribs)

Oh, slightly funny: Unaware of this thread I recently opened a ticket on Phabricator: phab:T204089.

Reply 22:21, 19 September 2018 6 years ago

197.235.98.211 (talkcontribs)

It seems that it used to be possible to filter redirects at some point, and that this was removed: see https://phabricator.wikimedia.org/T5174 and https://phabricator.wikimedia.org/rMW52e699441edf2958701cea692a5dc3243ec3b064.

It seems developers are confused and going back and forth between removing and readding redirects to search. As the old saying goes, "clients don't know what they want". Anyway, a more sensible approach would be a degree of faceting, where it returns all results but aggregates similar properties, e.g. many pages will be in the same category, or many pages will be redirects, disambiguations, poor quality stubs, etc...

It is probably simpler to resolve this using the API, since it already has options for redirect titles. There are also at most about 10000 results, so it would probably be less challenging to filter through those. Anyway, if the search results aren't too many it is easier to include redirect title in API search results and use your favorite replace tool to clean up all those that don't match, e.g. https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&format=json&list=search&srsearch=shakespeare&srlimit=500&srprop=redirecttitle . This would be easier if CSV was a valid API output format.
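
As a sketch of that clean-up step in Python rather than a replace tool: the function below splits parsed search results into those that matched directly and those that matched through a redirect. It assumes that, with srprop=redirecttitle, a result that matched via a redirect carries a "redirecttitle" key and other results do not; that behaviour should be checked against real output before relying on it.

```python
def split_by_redirect(results):
    """Split search results into direct matches and redirect matches.

    `results` is the list found under query -> search in the parsed
    JSON. Assumption: entries matched through a redirect include a
    non-empty "redirecttitle" value.
    """
    direct = []
    via_redirect = []
    for r in results:
        if r.get("redirecttitle"):
            via_redirect.append((r["title"], r["redirecttitle"]))
        else:
            direct.append(r["title"])
    return direct, via_redirect
```

The first list is the "no redirects" view; the second keeps the redirect title next to its target so the dropped entries can still be reviewed.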

Reply 23:00, 19 September 2018 6 years ago

197.235.98.211 (talkcontribs)

Also that task is a duplicate: https://phabricator.wikimedia.org/T90807

Reply 23:20, 19 September 2018 6 years ago

Speravir (talkcontribs)

(Nitpicking) @IP, apparently not: User/developer debt closed phab:T90807 as declined, but with the words “If there is more of a use case than what is in this ticket, please reopen and show examples / steps to reproduce.” Well I did not reopen, because this ticket was not found in a search for older tickets, but the same user/dev debt did not close the ticket opened by me. It seems I showed some valid use cases.

Reply 23:39, 19 September 2018 6 years ago

197.235.98.211 (talkcontribs)

Well, it seems more sensible to formulate it as "restore ability to remove redirects from search results". This was explicitly and deliberately removed for specific reasons.

The general problem with wikis is that they attempt to cater to two sometimes conflicting groups. Pure readers, and editors. The average reader wants the best results, and doesn't even know about the existence of redirects. An editor sometimes wants worse results because they want to address a specific problem.

There are several orders of magnitude more readers than editors, and that's likely the reason it was removed. There is no doubt that such filters have their uses, although the question is whether that justifies restoring the older functionality. Also, chances are that "debt" forgot about the older ticket; otherwise they would likely have reopened it and closed that task as a duplicate.

Reply 00:06, 20 September 2018 6 years ago

Speravir (talkcontribs)

Fair enough.

Reply 00:10, 20 September 2018 6 years ago

Fgnievinski (talkcontribs)

A partial workaround is to restrict search to Talk pages, which often are missing for redirects.

Reply 04:54, 7 December 2024 13 days ago

Can't run UpdateSearchIndexConfig.php file

7 comments • 00:18, 7 December 2024 13 days ago

Summary by DCausse (WMF)

Solved by downgrading from php 8.4 to php 8.2

75.130.249.175 (talkcontribs)

MediaWiki 1.39:

When I run the command php UpdateSearchIndexConfig.php in the CirrusSearch/maintenance folder, I get the following error:

[930f130bf0cbf86ca7483c41] [no req] Error: Class "MediaWiki\Extension\AbuseFilter\Parser\RuleCheckerFactory" not found
Backtrace:
from /var/www/html/w/extensions/AbuseFilter/includes/ServiceWiring.php(113)
#0 /var/www/html/w/vendor/wikimedia/services/src/ServiceContainer.php(124): require()
#1 /var/www/html/w/includes/MediaWikiServices.php(447): Wikimedia\Services\ServiceContainer->loadWiringFiles()
#2 /var/www/html/w/includes/MediaWikiServices.php(285): MediaWiki\MediaWikiServices::newInstance()
#3 /var/www/html/w/includes/Hooks.php(174): MediaWiki\MediaWikiServices::getInstance()
#4 /var/www/html/w/includes/exception/MWExceptionHandler.php(807): Hooks::runner()
#5 /var/www/html/w/includes/exception/MWExceptionHandler.php(336): MWExceptionHandler::logError()
#6 /var/www/html/w/includes/AutoLoader.php(244): MWExceptionHandler::handleError()
#7 /var/www/html/w/includes/AutoLoader.php(244): require(string)
#8 /var/www/html/w/extensions/AbuseFilter/includes/ServiceWiring.php(113): AutoLoader::autoload()
#9 /var/www/html/w/vendor/wikimedia/services/src/ServiceContainer.php(124): require(string)
#10 /var/www/html/w/includes/MediaWikiServices.php(447): Wikimedia\Services\ServiceContainer->loadWiringFiles()
#11 /var/www/html/w/includes/MediaWikiServices.php(285): MediaWiki\MediaWikiServices::newInstance()
#12 /var/www/html/w/includes/Setup.php(322): MediaWiki\MediaWikiServices::getInstance()
#13 /var/www/html/w/maintenance/doMaintenance.php(83): require_once(string)
#14 /var/www/html/w/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php(117): require_once(string)
#15 {main}

This file does exist in the AbuseFilter/includes/Parser folder. Does anyone know what's going on here?

Reply 05:20, 15 November 2024 1 month ago

PMiazga (WMF) (talkcontribs)

It's difficult to say what could be causing this. Let me ask a couple of questions and offer some suggestions beforehand:

- AbuseFilter files are autoloaded automatically thanks to composer `AutoloadNamespaces`. Please check if you have the merge-plugin enabled (see Composer#Using composer-merge-plugin).
- Did you update or install anything recently? Is it a new set-up with new extensions that you're trying to finalise, or is it something that worked before but stopped working after an update?
- I assume you already have both the AbuseFilter and CirrusSearch extensions enabled (by calling wfLoadExtension() in the LocalSettings file).
- Can you specify the exact version of MediaWiki? Are you on 1.39.10? I tried to run it and it worked for me, so it may be related to a specific version or a version mismatch.

Reply 14:08, 15 November 2024 1 month ago

This post was hidden by DCausse (WMF) (history)

Bawolff (talkcontribs)

To confirm, does MediaWiki work normally (e.g. during web views), and do maintenance scripts in MediaWiki core work fine (e.g. view.php)? What version of AbuseFilter do you have?

Reply 15:13, 15 November 2024 1 month ago

75.130.249.175 (talkcontribs)

- I'm on MediaWiki 1.39.5.

- Just enabled the merge plugin, all composer json files should be correct. It should be noted that I installed the plugin using the tarball file, as I can't figure out how to install it from git and have it be compatible with MediaWiki 1.39.

- Both AbuseFilter and CirrusSearch extensions are enabled in LocalSettings.php

- I haven't installed any new plugins recently - I just did a full php re-install to see if that was the problem and it wasn't

- My wiki works normally in web view; scripts like view.php don't work because they run into the same error from AbuseFilter

Reply 07:08, 16 November 2024 1 month ago

DCausse (WMF) (talkcontribs)

Could you check if you have multiple versions of AbuseFilter installed?

I wonder if, with multiple versions installed, an old one gets its classes loaded (likely <= 1.36) while the new one gets its ServiceWiring file executed.

Perhaps one way to investigate would be to print the location of an AbuseFilter class:

$reflector = new \ReflectionClass( 'MediaWiki\Extension\AbuseFilter\FilterUser' );
print("FilterUser class location: " . $reflector->getFileName() . "\n");

You could perhaps put this at the very beginning of /var/www/html/w/extensions/AbuseFilter/includes/ServiceWiring.php?

Reply 10:48, 21 November 2024 29 days ago

Bawolff (talkcontribs)

Just to close the loop, this user reports that the issue went away after they downgraded from php 8.4 -> php 8.2

Reply 11:07, 21 November 2024 29 days ago

Search index update

3 comments • 13:26, 19 November 2024 1 month ago

Jonteemil (talkcontribs)

The page says that the search index is updated at least once a day. I've been trying to fix broken files over at Commons that have dimensions of 0 x 0 px. I used the search fileh:0 filew:0 filetype:image -filemime:image/tiff to find them. However, files I fixed weeks ago are still listed in the results. When will they go away?

Reply 12:48, 24 July 2023 1 year ago

DCausse (WMF) (talkcontribs)

Thanks for reporting the problem. There seems to be an issue with the way CirrusSearch handles these edits; I filed Phab:T342562 to track and fix it.

Reply Edited 17:07, 24 July 2023 1 year ago

Jonteemil (talkcontribs)

Okay, perfect.

Reply 18:58, 24 July 2023 1 year ago

How to search the fields of the File information template on Commons?

6 comments • 12:49, 10 November 2024 1 month ago

Prototyperspective (talkcontribs)

Please see this thread. How to search for example for a specific string specifically in the source field?

Also how can one search for files from a specific uploader? (I'd like to check which of my video2commons uploads were imported below resolution at source.)

Reply 20:42, 10 October 2024 2 months ago

EBernhardson (WMF) (talkcontribs)

Unfortunately, the image description is simply an argument to a template. CirrusSearch doesn't do anything at that level and can't be that specific. Something like insource:kathmandu would require the wikitext source to have the word kathmandu in it, but it's not a great substitute.

Regarding filtering by uploader, I'm not too familiar with how the P170 there is structured, but with structured data available it seems plausible the appropriate information could be indexed. Today, though, P170 is indexed as a plain statement and does not include any context about it. The best workaround I can offer is that the Information template used on many images renders in such a way that searching for "Author <name>", with the quotes, tends to bring up only pictures from that author.

Reply 21:05, 10 October 2024 2 months ago

Prototyperspective (talkcontribs)

- I don't know why, but the results for insource:"kathmandu" don't seem to show the intended results.
- The uploader username is not in the structured data.
- The link you shared only shows original works by that username.
- So I will create an issue for enabling showing uploads by a particular user (please let me know if this could/should be changed in a tool other than CirrusSearch).
I think the best workaround currently would be to use insource with the field name first so for example I searched for insource:"|source=[https://soundcloud.com to identify files for c:Category:Audio files from Soundcloud.com. I think easily searching fields of the File pages' Information template could be enabled by
1. Developing some regex that searches for any content after e.g. |source=
2. Creating some alias for it so instead of writing some complex regex query every time one can simply enter e.g. info-source:"soundcloud.com"
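
To make step 1 concrete, here is a minimal offline sketch in Python of the kind of regex meant. It is not a CirrusSearch feature and says nothing about how such a search would be implemented server-side; it only pulls a named field out of wikitext you already have. It assumes each field sits on its own line and has a single-line value, which will not hold for every file page. The function name is hypothetical.

```python
import re


def information_field(wikitext, field):
    """Return the value of a `|field=` line in wikitext, or None.

    Simplifying assumptions: one field per line, single-line values,
    no handling of nested templates or multi-line values.
    """
    pattern = re.compile(
        r"^[ \t]*\|[ \t]*" + re.escape(field) + r"[ \t]*=[ \t]*(.*)$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(wikitext)
    if match is None:
        return None
    return match.group(1).strip()
```

For example, information_field(text, "source") would return whatever follows |source= on that line, which could then be checked for "soundcloud.com".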

Reply 18:10, 11 October 2024 2 months ago

Keith D (talkcontribs)

A problem with searching the information template fields for things like author is that author also appears in the {{tl|Credit line}} template and the 2 could be different.

Reply 14:46, 14 October 2024 2 months ago

Prototyperspective (talkcontribs)

I first misunderstood what you were saying but understood it via your comment in your proposal. That may be an issue for other templates, but I think in that case it doesn't matter because it would also contain the same author name, so it would even be best if both fields are searched (actually it would be a problem if it doesn't search both fields).

Reply Edited 15:07, 14 October 2024 2 months ago

This post was hidden by Clump (history)

Searching talk pages that use Structured Discussions

3 comments • 03:55, 26 October 2024 1 month ago

HaeB (talkcontribs)

Is it possible to use CirrusSearch to search (the topic pages of) a particular talk page that uses Structured Discussions (like this one)? I.e. restrict search results to only topics from that talk page.

Reply Edited 03:51, 29 September 2024 2 months ago

Pppery (talkcontribs)

No.

Reply 04:44, 29 September 2024 2 months ago

Tacsipacsi (talkcontribs)

Unfortunately, it’s not possible to search Structured Discussions pages using CirrusSearch at all, with or without constraining the search to a particular page. This is one of the many reasons for which Structured Discussions is deprecated and to be replaced with DiscussionTools.

Reply 16:32, 29 September 2024 2 months ago

Abuse filter logs on plwikiquote

4 comments • 11:55, 14 October 2024 2 months ago

Ferien (talkcontribs)

I'm not really sure why this is occurring, but I'm pretty sure this isn't supposed to happen in abuse filter logs to this level.

Reply 14:35, 1 September 2024 3 months ago

Tacsipacsi (talkcontribs)

I’m pretty sure plwikiquote shouldn’t block the account from being created (filter 3). I don’t think CirrusSearch is really at fault here – it just tries to create its account on first use. (Since it doesn’t succeed, the next time also counts as the first use. And the next one. And so on.)

Reply 22:21, 1 September 2024 3 months ago

Ferien (talkcontribs)

Thanks, I didn't know why it was occurring or what abuse filter it was relating to as I can't understand the language.

Reply 22:23, 1 September 2024 3 months ago

DCausse (WMF) (talkcontribs)

Thanks, I reported phab:T373778 to have a closer look into it.

Reply 07:29, 2 September 2024 3 months ago

Archive broken?

5 comments • 10:40, 12 July 2024 5 months ago

Beland (talkcontribs)

I was looking for an older thread...I don't see it on this page, so I assume it's been archived. Unfortunately, Help talk:CirrusSearch/LQT Archive 1 is showing up as blank for me. -- Beland (talk) 01:26, 25 June 2024 (UTC)

Reply 01:26, 25 June 2024 5 months ago

Pppery (talkcontribs)

That's intentionally blank, as a result of an untidy refactoring in 2015 that's not worth fixing now. This page uses Structured Discussions, which doesn't have the concept of archiving, and instead uses an infinite scroll system.

Reply 03:17, 25 June 2024 5 months ago

Beland (talkcontribs)

Aha! Hmm, that seems somewhat poor. There's no indication in the UI that scrolling down to the bottom of the page and staying there will show more threads, and there's no apparent facility for searching the entire history of the page? The URL I loaded was:

https://www.mediawiki.org/wiki/Help_talk:CirrusSearch#Relevance_52070

It seems like that should take me to the thread if it's on the page, but I can't tell if it is or isn't, and searching on my username doesn't really work because some threads are collapsed. -- Beland (talk) 03:40, 25 June 2024 (UTC)

Reply 03:40, 25 June 2024 5 months ago

Pppery (talkcontribs)

That would be Topic:S8cojikw0xzel2u8 (found via your contributions). The URL seems to have dated from way back when LiquidThreads was involved, and stopped working in 2015 when this page was migrated.

The chance of this getting fixed, realistically, is zero, since both discussion systems are deprecated and going to be removed someday.

Reply 03:48, 25 June 2024 5 months ago

Beland (talkcontribs)

Ah, whew, I was a bit worried these were going to be spreading to other wikis. 8)

Reply 07:24, 25 June 2024 5 months ago

Update en-wiki

4 comments • 22:22, 19 May 2024 7 months ago

2001:14BA:9CD6:4200:D43C:5ABA:9AD8:104 (talkcontribs)

Ping @User:JWBTH (or anyone who notices this). Referring to these edits, can you also update en:Help:Searching/Regex#Workarounds_for_some_character_classes? I noticed en-wiki's 􏿽 doesn't work but this MediaWiki's 􏿿 does work for newlines.

Reply 12:36, 8 April 2024 8 months ago

2001:14BA:9CD6:4200:D43C:5ABA:9AD8:104 (talkcontribs)

That ping didn't work so I'll try again: User:JWBTH

Reply 14:45, 8 April 2024 8 months ago

JWBTH (talkcontribs)

Done, thanks for pointing it out.

Reply 21:34, 15 May 2024 7 months ago

This post was hidden by JWBTH (history)

Automatically jump to first result

2 comments • 13:19, 22 April 2024 7 months ago

Aschroet (talkcontribs)

Hello everybody, is there a possibility to automatically jump/redirect to the first result? Obviously this works for dewiki

https://de.wikipedia.org/w/index.php?search=Espenfeld

but not for Wikidata:

https://www.wikidata.org/w/index.php?search=Am_Hanffgraben_(Berlin)

Maybe there is a parameter that can be added to the URL?

Thank you in advance.

15:21, 3 October 2023 1 year ago

TheDJ (talkcontribs)

There is no such functionality. What you are seeing is title matching. If your search exactly matches the title of a page, it will take you to that page. For wikidata the title of a page is its Q id. So you can do https://www.wikidata.org/w/index.php?search=Q111351350 and it will take you to that Q id.

19:12, 3 October 2023 1 year ago

Multiple keyword searches

2 comments • 11:21, 19 April 2024 8 months ago

Seeker1030 (talkcontribs)

Hi, how do I search using multiple keywords? For example: Libra ascendant born in 1965. How could we search with these parameters?

Reply 05:28, 25 February 2024 9 months ago

Speravir (talkcontribs)

Simply by typing libra ascendant born 1965 into the search form (I assume "on" is a so-called stop word). If there are dedicated categories for a topic, you could also use the filter keyword incategory, e.g. ascendant libra incategory:"1965 births".

Reply Edited 20:09, 26 February 2024 9 months ago
