
Topic on Talk:Search/Old

New search not good at searches of the form "Bloggs, Joe"

Jason Quinn

I do a lot of reference improvement on Wikipedia. Our cite templates put an author's last name first, followed by a comma and the given name (e.g., "Bloggs, Joe"). One very common task among people who improve references is to look for existing articles about cited authors so that authorlinks can be added. In Firefox, for instance, this was convenient to do by highlighting the name and using the right-click menu to search Wikipedia, as I have it set up to do. The old search was pretty amazing at finding the correct author article even when the search had the names backwards. The new search engine is hit-or-miss, and overall I would say it is pretty terrible at this. For example, a search for "Gettleman, Jeffrey" on the English Wikipedia completely misses the existing article "Jeffrey Gettleman". With this search format, I'd estimate that the new search completely misses the obvious intended page about a quarter of the time, that the intended target is within roughly the top 5 results about half the time, and that it actually finds the target the remaining quarter of the time. The old search, by comparison, is almost always spot on.

I don't know how the internals of the old search work, but I would venture a guess that when given a search term like "X, Y", it searches for both "X, Y" and "Y X". Perhaps that is missing in the new search. When I said above that the old search was pretty amazing, I meant it. Very often it returns the intended result first, even when initials are used for the first and/or middle name. It's also possible that it takes redirects and the like into account to figure out the intended target.
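The guess above (run the query as typed and also with the two comma-separated parts swapped) can be illustrated with a small query-rewriting sketch. This is a minimal illustration under that assumption only: `name_query_variants` is a hypothetical helper, not code from lsearch or CirrusSearch, and it does not describe how either engine actually behaves.

```python
def name_query_variants(query: str) -> list[str]:
    """Return the query plus a "Given Surname" variant for "Surname, Given" input.

    Hypothetical helper illustrating the guess above; it is not taken from
    lsearch or CirrusSearch.
    """
    variants = [query]
    parts = [part.strip() for part in query.split(",")]
    # Only rewrite when there is exactly one comma with text on both sides.
    if len(parts) == 2 and all(parts):
        surname, given = parts
        variants.append(f"{given} {surname}")
    return variants


print(name_query_variants("Gettleman, Jeffrey"))
# ['Gettleman, Jeffrey', 'Jeffrey Gettleman']
print(name_query_variants("Georgiadis, Nicholas J."))
# ['Georgiadis, Nicholas J.', 'Nicholas J. Georgiadis']
```

An engine following this approach would still have to merge and rank the results of both queries. A rewrite alone also cannot help when the query contains a token the article lacks, such as a middle initial.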

I've been using the new search (the beta implementation) for a while now. My overall impression is that, apart from the issue above where the new search is inferior, I haven't noticed any significant change in the quality of the results. They are about equally good, and if I had to guess which engine I was using, I would have trouble telling.

Jason Quinn

Another example on the English Wikipedia, which I found right after my post above, is "Georgiadis, Nicholas J.", which does not list any author article in the results. Searching for "Nicholas J. Georgiadis" still does not find any author article. Finally, if "Nicholas Georgiadis" is searched, it finds the article "Nicholas Georgiadis". Clearly, the search is not as "fuzzy" as it needs to be.

NEverett (WMF)

I can explain the "Georgiadis, Nicholas J." issue: his page doesn't contain a "J." at all. If you add a redirect from "Georgiadis, Nicholas J." to "Georgiadis, Nicholas", Cirrus will pick it up. Anything else would work too, so long as a "J." ends up in the page. I also tried the search with lsearch, and it didn't find the "Georgiadis, Nicholas J." article at all either.

As for how the old search handles "X, Y" versus "Y X":

1. It searches for articles containing X and Y and takes the union of the result sets.
2. Among those, it walks through the positions of X and Y in each article, and if they appear close to each other and in the same order as in the search query, it pushes that match up in the ranking.
3. If they appear close together but not in the query's order, it still pushes the match up, but not as far.

(I think this is true, at least.)
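The three steps can be sketched as a scoring function for a two-word query. This is a minimal illustration under stated assumptions: the tokenization, the `window` of 3 tokens, and the boost values are arbitrary choices, and `tokenize`, `proximity_score`, and `rank` are hypothetical names, not lsearch's or Cirrus's actual code. For simplicity, a document only counts as a match if it contains both terms.

```python
import re
from itertools import product


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation such as the comma in "Bloggs, Joe" is dropped.
    return re.findall(r"\w+", text.lower())


def proximity_score(query: str, doc_text: str, window: int = 3,
                    in_order_boost: float = 2.0, reversed_boost: float = 1.0) -> float:
    # Assumes a two-word query such as "Gettleman, Jeffrey"; raises ValueError otherwise.
    first, second = tokenize(query)
    tokens = tokenize(doc_text)
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    # Step 1: the document matches only if it contains both terms.
    if not pos_first or not pos_second:
        return 0.0
    score = 1.0
    # Signed distance from each occurrence of the first term to each occurrence of the second.
    offsets = [j - i for i, j in product(pos_first, pos_second)]
    if any(0 < d <= window for d in offsets):
        # Step 2: close together and in query order, so a large boost.
        score += in_order_boost
    elif any(-window <= d < 0 for d in offsets):
        # Step 3: close together but reversed, so a smaller boost.
        score += reversed_boost
    return score


def rank(query: str, docs: dict[str, str]) -> list[str]:
    # Return the titles of matching documents, best score first.
    scored = {title: proximity_score(query, text) for title, text in docs.items()}
    return sorted((t for t, s in scored.items() if s > 0), key=lambda t: -scored[t])


docs = {
    "Jeffrey Gettleman": "Jeffrey Gettleman is an American journalist.",
    "Some essay": "Jeffrey Smith wrote a long essay about many things, citing Gettleman.",
    "Jeffrey": "Jeffrey is a given name.",
}
print(rank("Gettleman, Jeffrey", docs))
# ['Jeffrey Gettleman', 'Some essay']
```

In this toy setup, dropping step 3 (passing `reversed_boost=0.0`) makes the "Jeffrey Gettleman" article score the same as the unrelated essay, which roughly mirrors the reversed-name problem described in this thread.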

Right now Cirrus only does steps 1 and 2. I think I can replicate step 3 in Cirrus, which should help your searches.

There are other searches you could do in the meantime that would pull the author up in the results, but they aren't as quick to type, for example <<Georgiadis, Nicholas hastemplate:"Template:Infobox person">>. That would be useful for a tool but isn't fun to type by hand.

Jason Quinn

This still seems to be an issue. The new search simply isn't good at finding articles when the given name and surname are reversed.

NEverett (WMF)

I agree. It's something that I spent some time working on but never got to finish.
