Moderator Tools/Automoderator

Group:	Moderator Tools
Team members:	Jason Sherman (software engineer), Susana Cardenas Molinar (software engineer), Katy Graessle (software engineer), Dennis Mburugu (engineering manager), Olga Tichonova (designer), Krishna Chaitanya Velaga (analyst)
Backlog:	#Automoderator
Lead:	Sam Walton (product manager)

Automoderator is an automated anti-vandalism tool developed by the Moderator Tools team. It allows administrators to enable and configure automated reversion of bad edits based on scoring from a machine learning model. Automoderator behaves similarly to anti-vandalism bots such as ClueBot NG, SeroBOT, Dexbot and Salebot, but is available to all language communities. Please see Extension:AutoModerator for technical details on the AutoModerator extension.

Communities can now request that Automoderator be deployed on their Wikipedia.

Updates

October 2024 - A dashboard is now available to track metrics about Automoderator's behaviour on the projects on which it is deployed.
October 2024 - We have made new datasets available to test the Multilingual Revert Risk model. We anticipate that this model should have better performance than the Language Agnostic model which is currently in use, but need your feedback to make an informed decision.
September 2024 - Indonesian and Ukrainian Wikipedias start using Automoderator (Automoderator; Автомодератор).
June 2024 - Turkish Wikipedia starts using Automoderator (Otomoderatör).
February 2024 - Designs have been posted for the initial version of the landing and configuration pages. Thoughts and suggestions welcome!
February 2024 - We have posted initial results from our testing process.
October 2023 - We are looking for input and feedback on our measurement plan, to decide what data we should use to evaluate the success of this project, and have made testing data available to collect input on Automoderator's decision-making.
August 2023 - We recently presented this project, and other moderator-focused projects, at Wikimania. You can find the session recording here.

Motivation

Wikimania presentation (13:50)

A substantial number of edits are made to Wikimedia projects which should unambiguously be undone, reverting a page back to its previous state. Patrollers and administrators have to spend a lot of time manually reviewing and reverting these edits, which contributes to a feeling on many larger wikis that there is an overwhelming amount of work requiring attention compared to the number of active moderators. We would like to reduce these burdens, freeing up moderator time to work on other tasks.

Indonesian Wikipedia community call (11:50)

Many online community websites, including Reddit, Twitch, and Discord, provide 'automoderation' functionality, whereby community moderators can set up a mix of specific and algorithmic automated moderation actions. On Wikipedia, AbuseFilter provides specific, rules-based functionality, but can be frustrating when moderators have to, for example, painstakingly define a regular expression for every spelling variation of a swear word. It is also complicated and easy to break, causing many communities to avoid using it. At least a dozen communities have anti-vandalism bots, but these are community-maintained, require local technical expertise, and usually have opaque configurations. These bots are also largely based on the ORES damaging model, which has not been retrained in a long time and has limited language support.

Goals

Reduce moderation backlogs by preventing bad edits from entering patroller queues.
Give moderators confidence that automoderation is reliable and is not producing significant false positives.
Ensure that editors caught in a false positive have clear avenues to flag the error / have their edit reinstated.

Design research

To learn about the research and design process we went through to define Automoderator's behaviour and interfaces, see /Design.

Model

Automoderator uses the 'revert risk' machine learning models developed by the Wikimedia Foundation Research team. There are two versions of this model:

A multilingual model, with support for 47 languages.
A language-agnostic model. This is the model which Automoderator currently uses, while we test the Multilingual model to better understand its performance.

These models can calculate a score for every revision denoting the likelihood that the edit should be reverted. Each community can set their own threshold for this score, above which edits are reverted (see below).

The models currently support only Wikipedia, but could be trained on other Wikimedia projects in the future. Additionally, they are currently trained only on the main (article) namespace. We would like to investigate re-training the model on an ongoing basis as false positives are reported by the community. (T337501)

Before we moved forward with this project we provided opportunities for testing out the language-agnostic model against recent edits, so that patrollers could understand how accurate the model is and whether they felt confident using it in the way we proposed. The details and results of this test can be found at Moderator Tools/Automoderator/Testing.

We are also testing the Multilingual model to understand if it is preferable to use it instead of the Language Agnostic model. See Moderator Tools/Automoderator/Multilingual testing to help us review the model's scores.

How it works

To request that Automoderator be deployed on your Wikimedia project, please see Extension:AutoModerator/Deploying.

Automoderator fetches a revert risk score for every main namespace edit on a Wikimedia project, reflecting how likely that edit is to be reverted, and reverts any edit which scores above a threshold that local administrators can configure. The revert is carried out by a system account, so it looks and behaves like other accounts: it has a Contributions page and a User page, shows up in page histories, etc.

To reduce false positives and other undesirable behaviour, Automoderator will never revert the following kinds of edits:

Edits in which an editor reverts one of their own edits
Reverts of one of Automoderator's actions
Edits made by administrators or bots
New page creations
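
The decision described above can be sketched as follows. This is a minimal illustration rather than the extension's actual code: the Edit class, its field names, the should_revert function and the example threshold value are all hypothetical, and the score is assumed to be a probability between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    """Hypothetical, simplified view of an edit (not the extension's data model)."""
    namespace: int                       # 0 is the main (article) namespace
    score: float                         # assumed revert risk score in [0, 1]
    is_new_page: bool = False
    author_is_admin: bool = False
    author_is_bot: bool = False
    is_self_revert: bool = False
    reverts_automoderator: bool = False


def should_revert(edit: Edit, threshold: float) -> bool:
    """Return True if the edit would be reverted under the rules listed above."""
    # Only main namespace edits are scored.
    if edit.namespace != 0:
        return False
    # Kinds of edits that are never reverted, whatever their score.
    if (
        edit.is_self_revert
        or edit.reverts_automoderator
        or edit.author_is_admin
        or edit.author_is_bot
        or edit.is_new_page
    ):
        return False
    # Otherwise revert only when the score is above the configured threshold.
    return edit.score > threshold
```

For example, with an illustrative threshold of 0.985, should_revert(Edit(namespace=0, score=0.99), 0.985) is True, while the same edit made by a bot is left alone.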

Configuration

Automoderator is configured via a Community Configuration form located at Special:CommunityConfiguration/AutoModerator, which edits the page MediaWiki:AutoModeratorConfig.json (the latter can be watchlisted so that updates show up in your Watchlist). After deployment, Automoderator will not begin running until a local administrator turns it on via the configuration page. In addition to turning Automoderator on or off, there is a range of settings which can be customised to fit your community's needs, including the revert threshold, minor and bot edit flags, and whether Automoderator sends a talk page message after reverting (see below).

Certain configuration changes, such as changing Automoderator's username, can only be made by MediaWiki developers. To request such a change, or to request other kinds of customisation, please file a task on Phabricator.

Localisation of Automoderator should primarily be carried out via TranslateWiki, but local overrides can also be made by editing the relevant System message (Automoderator's strings all begin with automoderator-).

Caution levels

One of the most important configurations to set is the 'Caution level' or 'threshold' - this determines the trade-off Automoderator will make between coverage (how many bad edits are reverted) and accuracy (how frequently it will make mistakes). The higher the caution level, the fewer edits will be reverted, but the higher the accuracy; the lower the caution level, the more edits will be reverted, but the lower the accuracy. We recommend starting at a high caution level and gradually decreasing it over time as your community becomes comfortable with how Automoderator is behaving.
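
To make the trade-off concrete, here is a small sketch using made-up data. The sample scores, the labels and the three threshold values are invented for illustration and are not taken from the model or from Automoderator's real caution levels; 'accuracy' is computed here as the share of reverted edits that really were bad.

```python
# Each pair is (revert risk score, whether the edit really was bad).
# These values are invented purely for illustration.
SAMPLE = [
    (0.99, True), (0.98, True), (0.97, False), (0.96, True),
    (0.95, True), (0.93, False), (0.90, True), (0.85, False),
]


def summarise(scored_edits, threshold):
    """Return (number reverted, coverage, accuracy) for a given threshold."""
    reverted = [bad for score, bad in scored_edits if score > threshold]
    true_positives = sum(reverted)
    total_bad = sum(bad for _, bad in scored_edits)
    # Coverage: share of all bad edits that get reverted.
    coverage = true_positives / total_bad if total_bad else 0.0
    # Accuracy: share of reverts that hit a genuinely bad edit.
    accuracy = true_positives / len(reverted) if reverted else 1.0
    return len(reverted), coverage, accuracy


# A higher threshold stands in for a higher caution level.
for threshold in (0.985, 0.955, 0.92):
    count, coverage, accuracy = summarise(SAMPLE, threshold)
    print(f"threshold={threshold}: reverted={count}, "
          f"coverage={coverage:.2f}, accuracy={accuracy:.2f}")
```

On this toy data, lowering the threshold raises the number of reverts from 1 to 4 to 6 and coverage from 0.20 to 0.60 to 0.80, while accuracy falls from 1.00 to 0.75 to 0.67.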

Talk page message

To ensure that reverted editors who were making a good faith change are well equipped to understand why they were reverted, and to report false positives, Automoderator has an optional feature to send every reverted user a talk page message. This message can be translated in TranslateWiki and customised locally via the Automoderator-wiki-revert-message system message. The default (English) text reads as follows:

Hello! I am AutoModerator, an automated system which uses a machine learning model to identify and revert potentially bad edits to ensure Wikipedia remains reliable and trustworthy. Unfortunately, I reverted one of your recent edits to Article title.
Because the model I use is not perfect, it sometimes reverts good edits. If you believe the change you made was constructive, please report it here.

Learn more about my software.

To learn more about editing visit your Newcomer Homepage. --Automoderator (talk) 01:23, 1 January 2024 (UTC)

If the same user receives another revert soon after the first, they are sent a shorter message under the same section heading. Default (English) text:

I also reverted one of your recent edits to Article title because it seemed unconstructive. Automoderator (talk) 01:23, 1 January 2024 (UTC)
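
The choice between the two messages could be sketched like this. It is an assumption-laden illustration: this page does not say how soon 'soon after' is, so the 24-hour window, the function name and the return values below are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the real window for a follow-up message is not documented here.
FOLLOW_UP_WINDOW = timedelta(hours=24)


def choose_message(last_message_at: Optional[datetime], now: datetime) -> str:
    """Pick which talk page message a reverted user would receive.

    Returns "follow-up" (the shorter text, added under the existing section
    heading) if the user was already messaged within the window, and "full"
    (the complete message under a new heading) otherwise.
    """
    if last_message_at is not None and now - last_message_at <= FOLLOW_UP_WINDOW:
        return "follow-up"
    return "full"
```

A user who has never been messaged, or whose last message is older than the window, gets the full text again.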

False positive reporting

Automoderator's 'report false positive' link.

Because no machine learning model is perfect, Automoderator will sometimes accidentally revert good edits. When this happens we want to reduce friction for the user who was reverted, and give them clear next steps. As such, an important step in configuring Automoderator is creating a false positive reporting page. This is a normal wiki page, which will be linked to by Automoderator in the talk page message, and in page histories and user contributions, as an additional possible action for an edit, alongside Undo and/or Thank.

Metrics

You can track data about how Automoderator is behaving on Wikimedia projects at the Activity Dashboard.

For data on the expected number of reverts that Automoderator would make per day on your project, see the testing subpage. Similar data for the multilingual model (not currently in use) can be found at /Multilingual testing.

Usage

Automoderator is currently deployed on the following Wikimedia projects:

Project	Deployment request	Username	Configuration	Dashboard
Indonesian Wikipedia	T365792	Automoderator	CommunityConfiguration	Dashboard
Turkish Wikipedia	T362622	Otomoderatör	CommunityConfiguration	Dashboard
Ukrainian Wikipedia	T373823	Автомодератор	CommunityConfiguration	Dashboard
Vietnamese Wikipedia	T378343	Kiểm tra tự động	CommunityConfiguration
Afrikaans Wikipedia	T376597	OutoModerator	CommunityConfiguration	Dashboard