Talk:Edit Review Improvements/Proposed Huggle improvements

Queue ordering and the value of icons

JMatazzoni (WMF) (talkcontribs)

Thanks so much @Excirial for your feedback. It’s essential we hear from people like you so that we avoid just the type of errors you’re describing. I wanted to start this thread to discuss one of the specific issues you raise. You write:

“For me using Huggle equals looking at the displayed diff while using keyboard shortcuts to navigate the edit queue. I wouldn't take note of any added queue icons or the ORES score….I am manually checking each edit so a machine evaluation is redundant.”

So it sounds like you generally examine edits in the queue sequentially; you don’t pick and choose based on Huggle’s icons or other clues. This is good feedback. You don’t say which filtering option you use for the queue. I’m going to guess it’s “Filtered edits,” which sorts by Huggle score? Is that right?

If so, then at the top of your queue you're seeing the edits Huggle's internal algorithm thinks are the most likely vandalism, followed by less and less likely candidates. So that would explain why going in order works for you. But what if I said there is research to suggest that ORES’s quality predictions (based on its “damaging” test) provide more reliable predictions? (I think this is true; @Halfak (WMF) can comment).

Assuming that’s so, then a queue sorted by ORES score could help you to more quickly eliminate the worst edits, putting your efforts where they'll make the most difference.  What might work for you, I'm thinking, is if we simply offer one or more alternative sorting options based on the ORES scores. Then you could try them and decide which options you prefer. Would that be interesting, do you think?

It sounds like you find Huggle's queue icons irrelevant. I wonder if other users feel differently? There must be a reason why this complex system evolved? I’d be interested in hearing from users who do look at these icons for clues. What is useful, what not? If there are such users, I’m inclined to believe that implementing the ORES icons could still provide valuable data for some. (Though I assume we’ll  let users turn them on or off as they prefer.)

Excirial (talkcontribs)

Hello @JMatazzoni (WMF),

The Huggle queue I use is indeed the default filtered queue, which lists all edits except those made by whitelisted users. Beyond this I keep the page history and user info tabs closed and the ORES plugin disabled in order to minimize the network traffic and parsing required for each diff.

Before disabling the ORES plugin I did monitor the scores for some time though. In most cases the scores seem to be decent, but I thought there were too many false positives and negatives to base anything beyond queue sorting on them. For example, there is currently a wave of vandalism that adds cat-like typing to articles (e.g. "Zmmajjsnd klksmww Oskkdmma wqpidif"). While clearly unproductive to a human editor, the ORES score for these edits generally seems to be in the 0-150 range. The inverse also occurs: I saw a new user wikilink a few general words and receive a 500+ ORES score for his effort. In the latter case I am a bit concerned that adding "Likely bad" icons may cause a patroller to go along with the machine assessment too rapidly and revert without thinking.

Using ORES to sort the edit queue may work, though I would note that Huggle's queue mechanic is suboptimal for this. In my case I can generally keep up with the queue, so there wouldn't be a lot to sort. Ignoring this for a second, I'd note that Huggle's queue is limited to 200 items before it stops the recent changes feed (default setting), which would lead to lost edits regardless of scoring. The alternative setting is to let Huggle trim the queue by removing the oldest edits. In this case one could possibly use ORES to - for example - remove the lowest-scored edits and retain likely vandalism edits in the queue.
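
The score-based trimming idea can be illustrated with a minimal sketch. This is not Huggle's actual code: the class and method names (`ScoreTrimmedQueue`, `add`, `in_review_order`) are hypothetical, and the score is assumed to be any number where higher means "more likely damaging". Only the 200-item capacity comes from the discussion above.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedEdit:
    # Hypothetical record: only the score takes part in ordering.
    score: float
    rev_id: int = field(compare=False)


class ScoreTrimmedQueue:
    """Bounded queue that drops the lowest-scored edit when it is full."""

    def __init__(self, capacity=200):
        self.capacity = capacity
        self._heap = []  # min-heap, so the lowest score sits at index 0

    def add(self, rev_id, score):
        """Add an edit and return the rev_id that was dropped, or None."""
        entry = QueuedEdit(score, rev_id)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
            return None
        # Push the new edit, then pop whichever edit now has the lowest
        # score (this may be the new edit itself).
        dropped = heapq.heappushpop(self._heap, entry)
        return dropped.rev_id

    def in_review_order(self):
        """Return rev_ids with the highest-scored edits first."""
        ranked = sorted(self._heap, key=lambda e: e.score, reverse=True)
        return [e.rev_id for e in ranked]


queue = ScoreTrimmedQueue(capacity=3)
for rev_id, score in [(1, 0.1), (2, 0.9), (3, 0.5)]:
    queue.add(rev_id, score)
dropped = queue.add(4, 0.7)  # queue is full, so the lowest score is dropped
order = queue.in_review_order()
```

Compared with trimming by age, this keeps the edits a classifier rates as most suspicious, at the cost of silently discarding low-scored edits that the classifier may have misjudged.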

This mechanic would have its own drawback though. Huggle is a fantastic piece of software, but one major issue is that it synergizes horribly. If two patrols use Huggle at the same time both will look at and evaluate the exact same edits with the only variance being where they are currently in the queue. While two sets of eyes is generally a good thing, ten concurrent patrols would lead to a lot of wasted effort. If all these patrols would sort based on ORES score we can be nigh certain they would be looking at the exact same queue.

One Wikipedia addition that I have long been hoping for to counter this - and I apologize as I may be straying somewhat out of the boundaries of this subject - is an aggregated edit feed that tracks unreviewed edits. In other words: pretty much the ReviewStream edit review improvement, extended with the ability to track unreviewed edits and the ability to provide extended feedback on an edit as a user. I imagine the ReviewStream returning edits until a quorum of editors (perhaps two?) has viewed a specific edit and found it not problematic. Not only would this solve a major issue I have seen for years (excessive numbers of vandalism patrollers wasting time on the same edits at one hour, and nobody reviewing whatsoever at another hour), but I can imagine it being employed for other types of feedback as well. For example: more than once I have seen diffs where a user seems to struggle to achieve something. I try to leave those users an invitation to the Teahouse, but I can never really follow this up. If one could simply flag the edit as "Promising new editor", another tool or editor could fetch these edits / flagged editors and follow up on them. Heck, I can imagine a situation where a user would set Huggle's queue to "Promising new editors". This would in turn query the ReviewStream for all new editors registered as good who haven't been welcomed by another user yet.
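
A rough sketch of how such a quorum-based feed might behave is given below. It is purely illustrative and not based on ReviewStream's real interface: `ReviewTracker` and all of its methods are hypothetical names, and the quorum of two is simply the figure floated above.

```python
from collections import defaultdict


class ReviewTracker:
    """Serves an edit until `quorum` distinct reviewers found it fine."""

    def __init__(self, quorum=2):
        self.quorum = quorum
        self._pending = []                      # rev_ids in arrival order
        self._ok_reviewers = defaultdict(set)   # rev_id -> reviewers who approved
        self._flags = defaultdict(list)         # label -> flagged rev_ids

    def add_edit(self, rev_id):
        if rev_id not in self._pending:
            self._pending.append(rev_id)

    def next_for(self, reviewer):
        """Return the oldest pending edit this reviewer has not yet approved."""
        for rev_id in self._pending:
            if reviewer not in self._ok_reviewers[rev_id]:
                return rev_id
        return None

    def mark_ok(self, rev_id, reviewer):
        """Record an approval; retire the edit once the quorum is reached."""
        self._ok_reviewers[rev_id].add(reviewer)
        reached = len(self._ok_reviewers[rev_id]) >= self.quorum
        if reached and rev_id in self._pending:
            self._pending.remove(rev_id)

    def flag(self, rev_id, label):
        """Attach free-form feedback such as "Promising new editor"."""
        if rev_id not in self._flags[label]:
            self._flags[label].append(rev_id)

    def flagged(self, label):
        return list(self._flags[label])


tracker = ReviewTracker(quorum=2)
tracker.add_edit(10)
tracker.add_edit(11)
tracker.mark_ok(10, "PatrollerA")        # one approval: edit 10 stays pending
first_for_b = tracker.next_for("PatrollerB")
tracker.mark_ok(10, "PatrollerB")        # quorum reached: edit 10 is retired
tracker.flag(11, "Promising new editor")
```

Because the state lives in one shared place rather than in each patroller's local queue, two people working at the same time are steered toward different edits once the quorum for an edit is met, and a follow-up tool could read the flagged list independently.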

If anything, I believe that providing a means to cooperate with other users would be more valuable than providing an individual vandalism patroller with more details regarding an edit.

Excirial (talkcontribs)

Correcting pingback and fixing some typos.

Smalljim (talkcontribs)

Great thoughts on the next generation of AV programs, Excirial. I really think a new approach is needed instead of trying to ginger up the faithful old workhorse.

What do you think about the proposed Huggle improvements?

Jmatazzoni (talkcontribs)

Please provide responses to the new icon scheme, the standardized ORES classifications, the User Info enhancements, the enhanced filtering, etc.

Smalljim (talkcontribs)

This is a bit off-topic. I'm an admin on en.wikipedia and I've had a lot of experience with anti-vandal/recent changes patrol (AV/RCP) work there, over many years (up to last summer). In 2014 I became dissatisfied with Huggle's limitations and built my own prototype AV/RCP program that I called AddBad. I set out the principles behind it at https://en.wikipedia.org/wiki/User:Smalljim/AddBad , which you might like to read. I haven't done anything with the program for nearly a year now, but the principles on which I built it proved to be very effective at distinguishing vandalism from good-faith edits - as evidenced by the 30,000 or so edits (mostly reverts/warnings/blocks) that I made to en.wikipedia with its aid, with an extremely low error rate. The two main things I learned are 1. there is a lot of useful information available that isn't being made use of by any of the current generation of AV/RCP programs. 2. Extensive user customisation should be available so that each editor doing RCP doesn't end up chasing the same bad edits (as happens with Huggle now). I have to say that when I tried ORES last year I was unimpressed with its predictions, and ReviewStream appeared to be clogged up with uninteresting information, while omitting the stuff that would be really useful for AV work (I don't know if they have got better since). Hope this helps, sorry it's rather negative, happy to answer any questions, etc... Smalljim (talk) 00:34, 8 February 2017 (UTC)

Excirial (talkcontribs)
(Copied over from the EnWiki Huggle feedback page)
Having read through the full proposal I cannot shake the feeling that it was written with the absolute best of intentions, but without full working knowledge of Huggle as a tool. To explain, I'd like to highlight two main characteristics of Huggle:
  • Huggle is realtime only: Huggle connects each editor to the recent changes feed and generates a local list of edits that have to be checked without prioritization beyond listing edits from previously reverted editors first.
  • Huggle is for high-speed editing: On an average evening EnWiki had about 170 edits a minute with spikes to 220 and more edits a minute. By default Huggle filters whitelisted editors (long term editors, bots and so on) but there is still a very sizeable portion that has to be verified every minute. Huggle pre-caches all diffs before adding them to the queue so navigation between diffs is nearly instant. When reverting Huggle will handle the work in the background (Rollback + warn) while the user can review the next edit in the meantime.
As such, Huggle excels at a single job: it allows a user to single-handedly monitor a wiki that is as large and busy as the English Wikipedia. It is essentially a single queue that a user can navigate at high speed in order to cull any clear-cut vandalism that got through ClueBot NG or the edit filters. It is not well suited for slow patrol, as it will not load old edits and won't list any more edits once it reaches its queue capacity of 200 edits (which on EnWiki is reached in minutes). Comparing this to the proposal as written, I would conclude that the suggestions included wouldn't truly fit Huggle's methodology of vandalism patrol - though I hasten to add I am but one user and other users may use Huggle in an entirely different fashion.
To explain: the proposal as written seems to focus on extending the Huggle interface so that it provides more contextual information about editors, chiefly based on the calculated ORES score / editor account age. Due to Huggle's nature I would personally find this information irrelevant: for me, using Huggle equals looking at the displayed diff while using keyboard shortcuts to navigate the edit queue. I wouldn't take note of any added queue icons or the ORES score. The queue icons wouldn't be too useful, as any edit in the queue would be reached in seconds: on average I spend about 2 seconds looking at a revision before concluding it is fine or clear-cut vandalism and moving to the next one. If an edit cannot readily be identified as clear vandalism, it shouldn't be dealt with through Huggle in the first place. Using the same methodology I am not using the ORES score either: I am manually checking each edit, so a machine evaluation is redundant.
The proposal as written would make a lot of sense in another vandalism patrol tool though: Stiki. While Huggle is real-time, Stiki uses a server backend to log edits. Logged edits are forwarded for evaluation to editors who use Stiki. Since Stiki will happily serve weeks-old edits, it is much better suited to slow patrol and thus to handling edits that may not be ideal yet aren't clear cases of malicious intent. I find that both tools complement each other well: Huggle is perfect for first-line defence against clear-cut vandalism, while Stiki is better suited to dealing with the edge cases or cases where good-faith editors may run into trouble. Since the goal of this project is new editor engagement, I would argue that a slow patrol tool that doesn't have a time limit on its queue is the better option to implement this. Excirial (Contact me,Contribs) 22:58, 7 February 2017 (UTC)