Talk:Cross-wiki Search Result Improvements

Ignore image annotations on commons

2
Vexations (talkcontribs)
DTankersley (WMF) (talkcontribs)

Hi @Mduvekot, the testing URL that you're using in your sample is currently awaiting a new code update to not display the commons / multimedia results in the sister project snippets on English Wikipedia.

Reply to "Ignore image annotations on commons"

How to start cross-wiki search on my own wiki?

3
36.227.241.50 (talkcontribs)
Omotecho (talkcontribs)

Hi, I looked at the linked image and its /ja page: have you had a look at Cross-wiki Search Result Improvements, where everything seems to start/evolve? Pardon me if I am too short-sighted. Cheers,

220.246.250.11 (talkcontribs)

I've seen it, but the documentation doesn't tell me how to enable it on other wikis.

Reply to "How to start cross-wiki search on my own wiki?"

Issue: Commons multimedia results disabled all wikis?

7
197.235.241.38 (talkcontribs)

Steps to reproduce:

  1. Go to pt:wikipedia

Expected:

In the sidebar for interwiki searches some images of milk from commons, e.g. commons:Special:search/~milk

Actual:

Sidebar with interwiki results but no images from commons.

197.235.241.38 (talkcontribs)

I think this was only meant to be disabled on English Wikipedia.

DTankersley (WMF) (talkcontribs)
197.235.248.150 (talkcontribs)

You're welcome.


As a sidenote, I think the reason people didn't realize it was missing is partly because it is completely random whether anything will ever appear because the sidebar with search results isn't always there. I noticed this problem months ago, but originally thought perhaps the images were simply not matching or it was taking a long time to load.


As a potential future change, it might be useful to keep a heading or button always available to show potential results. Consider, for example, searching Google (www.google.com/search?q=lx14566&source=lnms&tbm=isch&sa=X&ved=0ahUKEwjYns3wnMTkAhWTlFwKHTD8DHsQ_AUIEygC&biw=1299&bih=637) for a non-existent image. Even though the search does not "currently" show any images, the page still contains a heading that people can click on to see if there are any.


Perhaps there could be a button like "Show sister wiki results", or something like that. Just a thought.


DTankersley (WMF) (talkcontribs)

Thanks for the suggestions! :) The patch is rolling out this week on the train, please let us know if you see more weirdness on the search results page.

197.235.230.130 (talkcontribs)

It seems to be working again.


Although it is odd that the same search string (in different wikis) may not always bring up commons search results. For example:

Leopard seal

shrimp fisherman

What makes it weird is that the file title actually contains those strings. On a positive note, it seems to be nicely matching some commons structured data, so while it can't seem to find the content in one label, it finds it in another, e.g.:


It also seems to be quite accurate when searching for a single word, presumably if that word is at least included in the title. But with two or more words it seems to fail fairly often, even in cases where there is a perfect match in the title, e.g. "leaning tower of pisa", https://fr.wikipedia.org/w/index.php?sort=relevance&search=leaning+tower+of+pisa->https://commons.wikimedia.org/wiki/File:Leaning_tower_of_pisa.jpg.

197.218.88.244 (talkcontribs)

Apparently this issue came back; none of the links above seem to show any commons images.

Reply to "Issue: Commons multimedia results disabled all wikis?"

Suggestion: Provide search suggestions using synonyms from wiktionary when the search matches a title

7
Summary by 197.218.91.95
197.218.83.157 (talkcontribs)

Issue

As a user I'd like to be presented with suggestions to improve my search.

Background

Currently search depends entirely on a word either matching the search terms, or matching the title of a page. This reduces the usefulness of the search when a word can mean so many things, for example, looking for "trunk", one may mean a proboscis ("elephant's trunk"), boot (a part of a car), a part of a tree, part of a body, and so forth.

Proposed solution

  • Provide a search suggestion: "you may be interested in: proboscis, boot", using words extracted from the Synonyms sub-heading

Considering the different wiktionaries and different headings or rules in each wiki, this may not be feasible until there is some way to store these in a structured manner.

Even so, just showing the contents under the synonym (and similar ones in other wiktionaries) heading will be a good short term improvement.
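The short-term idea above, reading whatever sits under a Synonyms heading, might be sketched roughly as follows. This is only an illustration: the sample wikitext, the `====Synonyms====` heading level and the `{{l|en|...}}` link template are assumptions about how an entry could be laid out, and `extract_synonyms` and `suggestion` are hypothetical helpers, not part of any existing search code.

```python
import re

# Hypothetical sample of an entry's wikitext (assumed layout, not real data).
SAMPLE_WIKITEXT = """\
==English==
===Noun===
# An example definition.

====Synonyms====
* {{l|en|boot}}
* {{l|en|proboscis}}

====Translations====
* French: coffre
"""

# A heading line such as "====Synonyms====" (same number of "=" on both sides).
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")
# The linked term inside an assumed {{l|<lang>|<term>}} template.
LINK_TEMPLATE = re.compile(r"\{\{l\|[^|}]+\|([^|}]+)")


def extract_synonyms(wikitext):
    """Collect linked terms listed under any 'Synonyms' heading."""
    synonyms = []
    in_section = False
    for line in wikitext.splitlines():
        heading = HEADING.match(line)
        if heading:
            # Enter the section on a Synonyms heading, leave it on any other.
            in_section = heading.group(2).lower() == "synonyms"
            continue
        if in_section:
            synonyms.extend(LINK_TEMPLATE.findall(line))
    return synonyms


def suggestion(wikitext):
    """Build the suggestion line proposed above, or None if nothing was found."""
    terms = extract_synonyms(wikitext)
    if not terms:
        return None
    return "you may be interested in: " + ", ".join(terms)
```

As noted above, each wiktionary has its own headings and rules, so a pattern like this would need per-wiki variants until the data is stored in a structured way.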

197.218.83.157 (talkcontribs)
DTankersley (WMF) (talkcontribs)

Hi, this is an intriguing suggestion, thanks for posting.

However, using 'food' or 'greed' or 'house' as a search term goes directly to the article page on that search query. Are you suggesting to add the 'you may be interested in ___' phrase with synonym subheading within article pages?

197.218.80.160 (talkcontribs)

The suggestion was actually to add it just below the search box in Special:Search.


>search term goes directly to the article page on that search query

Well, that's because those are single word queries, and also because that's English wikipedia, and they are addicted to creating redirects for everything. In fact, the reason it works is probably because they used a bot to find synonyms and add redirects to make up for the limitations in the search engine.

In smaller Wikipedias, uncommon words will always have such problems:

In some of the above results wiktionary sister search provides enough context for a person to improve their search. In other cases, it doesn't help.

Seems like a somewhat similar idea has been suggested (although not exactly the same):

https://phabricator.wikimedia.org/T127874

https://phabricator.wikimedia.org/T85770

The automobile example still doesn't show the expected results, and wiktionary snippets are not that helpful in that case, but its synonyms are (https://en.wiktionary.org/wiki/car#Synonyms): "(private vehicle that moves independently): auto, motorcar, vehicle; automobile (US), motor (British colloquial), carriage (obsolete)".

If the user had been suggested "US automobile", and they used it, they would have likely found the page they were looking for.

197.218.80.160 (talkcontribs)

Also, it might be worth evaluating the possibility of disabling (for unregistered users) the "automatic go to article".

It can often be very confusing to type something and suddenly be taken to an article, which in some cases may be unrelated, as the user may merely want to have an idea of the existing pages before improving search keywords.

Entering a wrong keyword there can also quickly pull you into a completely unrelated wikimedia project, e.g. Special:Search/meta:monkeys or randomly push you into random wikis (https://www.mediawiki.org/wiki/Special:GoToInterwiki/wikicities:monkeys).

Not to mention that it may result in a lot of dead ends and needlessly skew the search results statistics.

DTankersley (WMF) (talkcontribs)

Hi,

I've added this conversation to the older (but similar) ticket you noted earlier: https://phabricator.wikimedia.org/T85770. We'll take a look at this and scope out the work that this type of new update would probably need and then prioritize it from there. :)

197.218.91.95 (talkcontribs)

This project would likely give the search engine a huge boost: https://phabricator.wikimedia.org/T986 (although that may take a year or more to complete), since it will make it possible to differentiate between synonyms and the same word in different dialects, e.g. boot vs trunk, pants vs underwear, and automobile vs motor.

Thanks for considering the idea!

Reply to "Suggestion: Provide search suggestions using synonyms from wiktionary when the search matches a title"

How is the ranking of sister projects determined?

5
Summary by 197.218.81.79

The wiki blocks are ordered by recall (most to least number of articles returned from each project). Large wikis are likely to be ordered first frequently. Concerning wikivoyage there's a small variation: wikivoyage results are filtered on title.

Commons uses boosted-templates (https://phabricator.wikimedia.org/T163223).

Tbayer (WMF) (talkcontribs)

I just tried out searching for some popular travel locations, and Wikivoyage came out last or next to last (i.e. below the fold) in each of these examples:

(Examples are not cherry-picked, these were just the first few that came to my mind.)

197.218.81.178 (talkcontribs)
MPopov (WMF) (talkcontribs)

Projects should – in theory – be ordered according to recall (most to least number of articles returned from each project). And this is mostly true if you open each sister project's search results page separately.

Looking at InterwikiSearchResultSetWidget in MediaWiki Core, it does not appear there is explicit front-end code for ordering the projects when the SERP is rendered and the order is determined by Cirrus (IIRC) on the back-end when it returns the interwiki results – which should be according to recall.

Looking at InterwikiSearcher.php in Cirrus source, we still have code from the first cross-wiki search A/B test where results can be returned in a random order if the configuration requests it, but they should be returned according to recall in production. Although maybe we're accidentally still using the static order that the switch statement defaults to. I'll reach out to @EBernhardson (WMF) and @DCausse (WMF) for clarification.
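As a rough illustration of the behaviour described here, the sketch below orders per-project result blocks by recall, randomly, or by a static fallback. It is not the Cirrus code: `order_blocks`, the `mode` values, the `STATIC_ORDER` list and the sample hit counts in the usage note are all hypothetical.

```python
import random

# Hypothetical static fallback order; the real default list is not given here.
STATIC_ORDER = ["wiktionary", "wikivoyage", "wikiquote"]


def order_blocks(hits_per_project, mode="recall", rng=None):
    """Return project names in display order.

    hits_per_project maps a project name to the number of articles it returned.
    """
    projects = list(hits_per_project)
    if mode == "recall":
        # Most to least number of articles returned from each project.
        return sorted(projects, key=lambda p: hits_per_project[p], reverse=True)
    if mode == "random":
        # Only used when the configuration asks for it (as in an A/B test).
        rng = rng or random.Random()
        rng.shuffle(projects)
        return projects
    # Any other value falls through to the static order, like a switch default.
    return [p for p in STATIC_ORDER if p in hits_per_project]
```

With made-up counts such as `{"wikivoyage": 3, "wiktionary": 40, "wikiquote": 12}`, the recall mode puts wiktionary first and wikivoyage last, while an unrecognised mode silently yields the static order, which is the kind of accidental default speculated about above.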

DCausse (WMF) (talkcontribs)

Absolutely, the wiki blocks are ordered by recall. Large wikis are likely to be ordered first frequently. Concerning wikivoyage there's a small variation. During the RFC it was requested to strongly filter wikivoyage results on title. Today we ensure that 80% of the search terms (stop-words excluded) appear in a title for wikivoyage results. In other words it decreases recall for wikivoyage and is probably one of the reasons you feel that wikivoyage is ranked so badly. Without the title filter wikivoyage would be ranked #3 (just below wiktionary) for the query Alaska.
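The title filter described above could be approximated like this. It is a minimal sketch under stated assumptions: the stop-word list is a tiny made-up one, terms are split on whitespace, and `passes_title_filter` is a hypothetical name; the production filter runs inside the search back-end and may tokenize and match differently.

```python
# Hypothetical, very small stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "of", "in", "to"}


def passes_title_filter(query, title, threshold=0.8):
    """True if at least `threshold` of the non-stop-word query terms occur in the title."""
    terms = [t for t in query.lower().split() if t not in STOP_WORDS]
    if not terms:
        # Assumption: a query made only of stop words is not filtered at all.
        return True
    title_words = set(title.lower().split())
    matched = sum(1 for t in terms if t in title_words)
    return matched / len(terms) >= threshold


def filter_results(query, titles):
    """Keep only the titles that pass the filter for this query."""
    return [t for t in titles if passes_title_filter(query, t)]
```

Because every dropped title lowers the number of results the project returns, a filter like this directly lowers the project's position in a recall-based ordering.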

197.218.81.79 (talkcontribs)
Reply to "How is the ranking of sister projects determined?"

Suggestion: Always make wiktionary results as the first in the sidebar

8
197.218.90.174 (talkcontribs)

Issue:

As a user, it frequently happens that when I mistype a search string I give up searching because it shows no local results.

Background:

Wiktionary results are almost always relevant. The reason is pretty simple: they add a natural disambiguator, and they may serve as an improved "did you mean". They may help the reader / user to correct their query and search again. As wiktionary contains a lot of words and relevant synonyms, it also helps in the scenario where one looks for "automobile", "vehicle", "car", helping non-native speakers find more common words.

Finally the primary reason is that it is often the case that someone may be interested in a very simple fact about something, and they may find exactly what they are looking for without clicking any search results, by just looking at the wiktionary text snippet.

Proposed solution

  1. Always put the wiktionary results as the first box in the sidebar cross-wiki results
  2. Allow snippets to store more info so the wiktionary snippet can show more info at a glance
  3. Potentially look into using Extension:TextExtracts to parse the page and show only the most relevant snippets without all maintenance templates and other unnecessary content.
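For the third point, a request to the TextExtracts API might look like the sketch below. It only builds the URL and makes no network call. The helper name `extract_url`, the default host and the 200-character limit are assumptions, and how useful plain-text extracts are for Wiktionary entries has not been checked here.

```python
from urllib.parse import urlencode


def extract_url(title, host="en.wiktionary.org", max_chars=200):
    """Build a query URL asking TextExtracts for a short plain-text extract."""
    params = {
        "action": "query",
        "prop": "extracts",      # provided by Extension:TextExtracts
        "explaintext": 1,        # plain text instead of limited HTML
        "exchars": max_chars,    # assumed snippet length
        "titles": title,
        "format": "json",
    }
    return "https://" + host + "/w/api.php?" + urlencode(params)
```

Calling `extract_url("trunk")` gives a URL that can be opened in a browser to see what such an extract contains for a given entry.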

It might also be fruitful to deploy the wiktionary results to all projects, including this one. The context it provides tends to help users retype their search queries and find what they are looking for.

DTankersley (WMF) (talkcontribs)

Hi,

Thanks for your suggestions—we have a future update to the search results page that is a Wiktionary widget that I hope would be good for you and all our users. You can read more here and the A/B test page is here as well as a self-guided test you can add to your own logged in account.

Once we get done with all the testing and chat with the community about it, I think it would be great if we can put it on all Wikipedias and sister projects.

However, Wiktionary results won't always display, it just depends on the query the user inputs. But, we'll see what surfaces in our testing. :)

197.218.83.157 (talkcontribs)

It looks pretty good. But it is missing the most interesting thing: clearly highlighted synonyms!

The results don't seem to clearly emphasize them: https://en.wikipedia.org/w/index.php?search=trunk&title=Special:Search&profile=default&fulltext=1&searchToken=3h3zlq4ixyinhbe07ko2gqjp1

It currently shows synonyms jumbled together with basic definitions of the word, when these are pretty useful for anyone searching and should at least be in bold.

It might also be useful to include the related image (https://upload.wikimedia.org/wikipedia/commons/thumb/e/e5/Yellow_birch_trunk.jpg/220px-Yellow_birch_trunk.jpg).

In cases where wiktionary doesn't show anything, the other projects might make up for it.

197.218.90.174 (talkcontribs)

Correction: As a user, it frequently happens that when I mistype a search string I give up searching because it shows no local results.

Daylen (talkcontribs)

197.218.90.174 I have updated your initial question. If you press the three dots next to a post, an option to edit the post appears. Have a nice day!

197.218.90.174 (talkcontribs)

Thanks. In case you weren't aware, logged-out users cannot edit their posts using Flow; they can only edit the title...

Daylen (talkcontribs)

Thank you for letting me know. It was my understanding that IP users could edit posts that they created. Why can't they, @Quiddity (WMF)?

Quiddity (WMF) (talkcontribs)

Hi, it's configured that way because IP addresses can be shared between many people (hundreds or even thousands). (Known addresses are sometimes tagged with a template, such as w:en:Template:Shared IP header templates at Enwiki, but it's not always obvious, and is inconsistently & manually applied.)

Note that aside from that restriction, there is also a restriction on who can edit other people's posts (Flow#Can I edit other people's posts?).

There was some discussion (a long time ago) about the desire to "enable IPs to edit their posts for x minutes after save" (5 or 30 etc). But I don't think anyone ever filed a task for it. I've now filed phab:T169167.

Reply to "Suggestion: Always make wiktionary results as the first in the sidebar"

Simple English Wikipedia

2
Daylen (talkcontribs)

Why isn't the Simple English Wikipedia shown in the cross-wiki search results for articles which have identical titles in the English Wikipedia? The Simple English Wikipedia is not known by many Wikipedia users, and I believe that it would be beneficial to include results from it with other cross-wiki results on the English Wikipedia.

DTankersley (WMF) (talkcontribs)

Hi @Daylen, thanks for the question.

The Simple English Wikipedia has its own sister projects displaying in the search results page, but only Simple Wiktionary and Simple Commons (multimedia), as shown in this query for Paris (Simple Wikiquote is locked).

As you mentioned, Simple English Wikipedia is not well known by most Wikipedia users; however, including results from SimpleEnWiki in the sister projects on English Wikipedia would probably cause a lot of confusion for the general population, because they don't know the project exists.

It'd be great if there was a better way to encourage discovery, reading and editing in Simple English Wikipedia; please let us know if you have any ideas. :)

Reply to "Simple English Wikipedia"

Do we want these new search results to work across all Wikimedia projects?

10
CKoerner (WMF) (talkcontribs)
  1. For example, if I'm on Wikiquote, do I want to also see relevant search results from Wikivoyage, Wikipedia or Wikinews?
  2. Or, if I'm on Wikipedia, just show me results from other projects?
Jeblad (talkcontribs)

I wonder if specific projects have a given relevance for other projects, like Wiktionary having a higher relevance for Wikipedia, and a lower one for Wikispecies. It will probably also change given the categorization of pages within the projects. Wikispecies has a high relevance for articles in Wikipedia within biology, but would have a low relevance for art.

If you do a search in a project, then the categories could be used as an indicator for how relevant (likely) some other project would be, given this specific result set. If a project is highly relevant, then the number of hits could be increased from 1 to 3 (just an example, use whatever number).
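The idea could be sketched along these lines. Everything here is hypothetical: the `RELEVANCE` table, its scores, the category names, the cutoff and the function name `hits_to_show` are made up to illustrate the 1-versus-3 example, not taken from any existing configuration.

```python
# Hypothetical relevance of each sister project per category (0.0 to 1.0).
RELEVANCE = {
    "biology": {"wikispecies": 0.9, "wiktionary": 0.5},
    "art": {"wikispecies": 0.1, "wiktionary": 0.5},
}


def hits_to_show(project, result_categories, high=3, low=1, cutoff=0.6):
    """Return how many hits to show for `project`, given the categories
    found in the local result set."""
    scores = [RELEVANCE.get(c, {}).get(project, 0.0) for c in result_categories]
    average = sum(scores) / len(scores) if scores else 0.0
    # A highly relevant project gets more hits (for example 3 instead of 1).
    return high if average >= cutoff else low
```

With these made-up scores, a result set categorized only under "biology" would show three Wikispecies hits, and one categorized under "art" would show a single hit.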

EncycloPetey (talkcontribs)

It really depends on the nature of the question. If someone is looking for the meaning of the Latin word ''vicesimanus'', Wiktionary information will be of most use, and it may not matter which language Wiktionary the results come from, as the word may only appear in a few projects, and might be illustrated with a picture, with a list of translations into other languages, or at least with an explanation in another language besides Latin. Likewise if someone is looking up the pronunciation of a word, or its syllabification for the purposes of hyphenating it, or synonyms. All of these features of a word may be presented on any Wiktionary, and may be found independently of the project language.

TJones (WMF) (talkcontribs)

I don’t think the average user searching English Wiktionary would be happy with a definition of a Latin term that was in Finnish, Russian, or Chinese—generally in any non-Indo-European language or any language that doesn't use the Latin alphabet. The lack of readable cognates makes those pages useless. Look at the Russian page for gato (Spanish "cat"). If you don't at least know some Cyrillic, you can't get much out of that page. Finnish gato is actually better than I expected, but only because there are some cognates (Espanja, Portugali, and substantiivi). You can translate those pages using your browser or online tools, but I think that's getting into the realm of “power users” unfortunately.

My intuition is that what most people want is results in the language of the project they are on, or projects in the same language. (Exception: when their query is clearly in another language. Exception to the exception: when they are on Wiktionary—which is where I often go for words I don’t know even when they are not in English.) Users could also use results in other languages they can read (which they need to specify or we need to surmise, say, based on browser settings). Only power users and researchers are going to dig into results for languages they don't know. This may change over time as machine translation gets better and people become more sophisticated about handling text in other languages—but I think most people aren't there yet.

I’m open to other opinions on user preferences and the typicality of any given use cases, of course!

However, there may be some technical limitations. We can’t index English Wiktionary both with all the other English projects and with all the other Wiktionary projects. Searching across all Wiktionaries without a shared index is probably too resource intensive to be practical.

EncycloPetey (talkcontribs)

Re: "Only power users and researchers are going to dig into results for languages they don't know." I disagree. During the time I was seriously active on Wiktionary, requests for translations into languages the user did not know were very common. We had daily requests for assistance.

TJones (WMF) (talkcontribs)

Interesting! Requests for translations into, say, Russian, seem very different from using a Wiktionary page in Russian (without machine translation).

197.218.89.65 (talkcontribs)

For example, if I'm on Wikiquote, do I want to also see relevant search results from Wikivoyage, Wikipedia or Wikinews?

Or, if I'm on Wikipedia, just show me results from other projects?

This answers it better than anything (in short both):

https://xkcd.com/214/

To be clear, this "problem" should be expanded to most projects so that anyone can keep jumping from wiktionary or commons to others, and back again in a perfect loop. If nothing else it helps with cross-wiki vandalism detection.

DTankersley (WMF) (talkcontribs)

Great suggestion and yes, we've thought about it and Wikivoyage is actively talking about it (adding in results from other projects into their project).  :)

I've added a phab task to keep it on our backlog for now.

197.218.80.220 (talkcontribs)

Considering this has been ~1% deployed for years in the Italian wiki projects (it.wiktionary, it.wikivoyage, etc.) (https://it.wiktionary.org/w/index.php?search=~rome&title=Speciale:Ricerca&profile=default&fulltext=1&searchToken=4lpgbyehwomrct7tasvm04c7), and probably without complaints, it seems that this would be pretty much welcome on most sister projects.

It might be worth considering sister search for mediawiki.org. The natural sisters could be:

  • meta.wikimedia.org - (it is also confusing how documentation is split between this wiki and meta and this might bridge the gap)
  • commons.wikimedia.org - PDFs (e.g. about Lua programming, programming in general, and other illustrations)
  • wikitech.wikimedia.org - (there seems to be documentation tidbits there that are useful for general mediawiki users)

The meta wiki on the other hand could probably search all wiki projects (https://wikimediafoundation.org/wiki/Our_projects) as suggested here: https://phabricator.wikimedia.org/T87632, and https://meta.wikimedia.org/w/index.php?title=Meta:Babel&oldid=11078192#footer. Perhaps wikimediafoundation.org or wikimedia.org could also be considered.

Enabling it on meta should be pretty non-controversial, since it already has users who agreed to it, and it would make meta the go-to place to search all wikis.

Also, the original task to enable it everywhere seems to be this.

DTankersley (WMF) (talkcontribs)

Thanks! I've added your suggestions to the ticket I created yesterday and I also added a comment onto the original task as well. :)

Reply to "Do we want these new search results to work across all Wikimedia projects?"

Suggestion: Reduce the weight of the description page and add a reporting tool

2
197.218.81.62 (talkcontribs)

One of the drawbacks of simply searching the file description and title is that it can be very unreliable and can easily be used to vandalize. This means that searching for "whore" surfaces obvious vandalism, such as the image of a famous woman being "shown in results" for whore, or a "molester" (https://en.wikipedia.org/wiki/File:Photo_of_molester.jpg, Carl_Sagan).

Proposed solutions:

  1. Reduce the weight of the file description page - it can often be misleading and just plain bad especially on non-English wikis.
  2. Add a reporting tool to make it easier for people to report / remove vandalism - this will also help get more eyes on commons, and potentially somewhat reduce their workload.
  3. Prioritize media used locally in multimedia results - images used locally might be more relevant to the project and to the search.
DTankersley (WMF) (talkcontribs)
Reply to "Suggestion: Reduce the weight of the description page and add a reporting tool"

Suggestion: Show page image (thumbnail) for wiktionary results

2
197.218.89.1 (talkcontribs)

Issue

Search results don't provide enough visual clues to make it easy to find content.

Background

While normal search results from commons are unstructured and may result in a lot of false positives, Wiktionary's narrow focus makes it quite useful as a "visual" dictionary, and it often avoids controversial images.

Proposed solution

When wiktionary returns search results, show the page image used in the page as a thumbnail:

  • A small icon near the wiktionary results (the pageimage)
  • A multimedia result at the top of the side box

This seems to give quite good results compared to commons, e.g.:

Term | Commons | Wiktionary (see image)
hair hair https://en.wiktionary.org/w/index.php?title=hair&action=info
ram ram https://en.wiktionary.org/w/index.php?title=ram&action=info
unicorn https://en.wiktionary.org/w/index.php?title=unicorn&action=info
shag https://en.wiktionary.org/w/index.php?title=shag&action=info
Honey (mel, honig)

As demonstrated by the last row, it also returns useful images for the same term in various languages. Until a better system comes around this seems to be a reasonable alternative.

DTankersley (WMF) (talkcontribs)

Thanks for the suggestion and samples! We'll take it into account when we start working on the thumbnail icons next to the search results. :)

Reply to "Suggestion: Show page image (thumbnail) for wiktionary results"