Moderator Tools/Automoderator/Unified Activity Dashboard

From mediawiki.org

Unified Automoderator Activity Dashboard

The Unified Automoderator Activity Dashboard is intended to help the Moderator Tools team, the community, and other interested people monitor and evaluate key metrics related to Automoderator's activity. The dashboard tracks activity across all the wikis where Automoderator is deployed.

You can view the dashboard by logging in to the public Superset instance with your Wikimedia global account at
https://superset.wmcloud.org/superset/dashboard/unified-automoderator-activity-dashboard/

The metrics on the dashboard are categorized into three tabs:

  • Monitoring metrics: These are updated daily, to enable operational monitoring of Automoderator's activity.
  • Key metrics: These are updated monthly and are more reliable for evaluation and reporting purposes.
  • Configuration: This is updated weekly and provides an overview of current deployment status and configuration by wiki.

Monitoring vs. key metrics

Monitoring metrics are a subset of the key metrics. The monitoring pipelines rely on the MariaDB replicas and run daily, while key metrics are updated monthly and cover data up to the last day of the previous month. To accurately calculate the number of revisions reverted by a single revert action, especially when multiple edits are reverted using rollback, we need to match revisions on the content hash (rev_sha1) in the revision table. This is complex to implement, computationally expensive to run daily across multiple wikis, and would essentially reinvent the wheel of mediawiki_history. Many of the fields required for these metrics (especially those related to reverts) are already calculated by the MediaWiki history pipeline using the notion of a revision identity revert. MediaWiki history snapshots are released monthly.

Waiting an entire month to track Automoderator's activity, taking action if needed, and then waiting another month for the results does not fit our needs. So we decided to track a subset of the key metrics daily, with the goal of keeping the daily pipelines lightweight. This approach is prone to a degree of error that is acceptable for our use case: primarily, we may undercount when multiple edits are reverted by a single revert (rollback). Monitoring metrics are therefore meant for operational monitoring only (for example, ensuring there is no major issue), while key metrics, updated monthly based on MediaWiki history, are more accurate for reporting and evaluation.
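
To make the content-hash matching concrete, here is a minimal sketch of identity-revert counting over a single page's history. It is a simplified illustration, not the mediawiki_history implementation: the function name and the sample data are hypothetical, and the real pipeline may apply further rules (for example, limits on how far back a match is allowed).

```python
from typing import Dict, List, Tuple


def count_identity_reverted(history: List[Tuple[int, str]]) -> Dict[int, int]:
    """Map each reverting revision ID to the number of revisions it undid.

    `history` is one page's revisions in chronological order, given as
    (rev_id, rev_sha1) pairs. A revision is treated as an identity revert
    when its content hash equals that of an earlier revision; every revision
    strictly between the two is counted as reverted.
    """
    last_index_by_sha1: Dict[str, int] = {}
    reverted_counts: Dict[int, int] = {}
    for index, (rev_id, sha1) in enumerate(history):
        earlier = last_index_by_sha1.get(sha1)
        # Skip back-to-back identical hashes: nothing lies in between.
        if earlier is not None and index - earlier > 1:
            reverted_counts[rev_id] = index - earlier - 1
        last_index_by_sha1[sha1] = index
    return reverted_counts


# Hypothetical page history: revision 4 restores the content of revision 1,
# so it reverts revisions 2 and 3 in a single action (as a rollback would).
page_history = [(1, "aaa"), (2, "bbb"), (3, "ccc"), (4, "aaa")]
reverts = count_identity_reverted(page_history)
print(reverts)
```

A daily pipeline that counts one reverted edit per revert would record 1 for revision 4 here, while hash matching records 2. That gap is the undercounting described above.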

Data pipelines

[Figure: Data pipelines overview]
[Figure: Major assets]

Configuration

The source of truth for Automoderator's deployment status, and for the localised username needed to fetch the edits it makes, is InitialiseSettings.php, which defines the settings as PHP arrays: wmgUseAutoModerator lists the wikis where Automoderator is currently deployed, and wgAutoModeratorUsername holds the localised version of the username.
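
As a rough illustration of how these two settings combine, the sketch below resolves the deployment list and username per wiki. It assumes the PHP arrays have already been converted into Python dictionaries keyed by wiki database name with a 'default' fallback. That layout, the wiki names, and the usernames are all assumptions made for the example, and any other kinds of keys the real file may use are ignored.

```python
from typing import Dict

# Hypothetical excerpt of the two settings, already parsed out of PHP.
# Wiki names and usernames are made up for illustration.
settings = {
    "wmgUseAutoModerator": {
        "default": False,
        "examplewiki": True,
        "otherwiki": False,
        "testwiki": True,
    },
    "wgAutoModeratorUsername": {
        "default": "AutoModerator",
        "examplewiki": "ExampleModerator",
    },
}


def resolve(setting: Dict[str, object], wiki: str) -> object:
    """Return the wiki-specific value, falling back to the 'default' key."""
    return setting.get(wiki, setting.get("default"))


def deployed_wikis(config: Dict[str, Dict[str, object]]) -> Dict[str, str]:
    """Map each wiki with Automoderator enabled to its localised username."""
    enabled = config["wmgUseAutoModerator"]
    usernames = config["wgAutoModeratorUsername"]
    return {
        wiki: str(resolve(usernames, wiki))
        for wiki in sorted(enabled)
        if wiki != "default" and enabled[wiki]
    }


deployment = deployed_wikis(settings)
print(deployment)
```

The resulting mapping is what the downstream pipelines would need: which wikis to query, and which username to filter on in each.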

Pipeline (frequency - weekly)

Small wiki classification

Automoderator may be enabled on small wikis by trusted global admins (T372280). The dashboard will need a filter to view activity of Automoderator on all such wikis. Although the dashboard will have a filter to select specific wikis, selecting all of them individually will be tedious. The small wikis filter is a quick access filter to select all small wikis.

Pipeline (frequency - ad hoc)

As the data is unlikely to change regularly, this will be updated on an ad hoc basis.

Daily monitoring snapshots

The daily monitoring snapshots are intended to help with operational monitoring of Automoderator's activity. They are based on data from the MariaDB replicas and use the following tables: revision, page, change_tag, change_tag_def, actor and user.
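
The sketch below shows the general shape such a daily query might take. It is not the pipeline's actual query: it uses an in-memory SQLite database as a stand-in for the replicas, recreates only the columns it needs from four of the tables listed above (page and user are omitted), and the chosen metric (Automoderator edits per day, and how many of them carry the mw-reverted tag) is an illustrative assumption, as are the sample rows and the username.

```python
import sqlite3

# In-memory stand-in for a wiki replica, with a reduced set of columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, actor_name TEXT);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER,
                       rev_actor INTEGER, rev_timestamp TEXT);
CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER);

-- Hypothetical sample data.
INSERT INTO actor VALUES (1, 'AutoModerator'), (2, 'SomeEditor');
INSERT INTO revision VALUES
    (10, 100, 2, '20250101080000'),
    (11, 100, 1, '20250101080500'),
    (12, 101, 1, '20250101120000'),
    (13, 101, 2, '20250102090000');
INSERT INTO change_tag_def VALUES (1, 'mw-reverted');
INSERT INTO change_tag VALUES (12, 1);
""")

# Count Automoderator's edits per day and how many were later reverted.
DAILY_QUERY = """
SELECT substr(r.rev_timestamp, 1, 8) AS day,
       COUNT(*) AS automod_edits,
       COUNT(ct.ct_rev_id) AS automod_edits_reverted
FROM revision r
JOIN actor a ON a.actor_id = r.rev_actor
LEFT JOIN change_tag ct
       ON ct.ct_rev_id = r.rev_id
      AND ct.ct_tag_id = (SELECT ctd_id FROM change_tag_def
                          WHERE ctd_name = 'mw-reverted')
WHERE a.actor_name = ?
GROUP BY day
ORDER BY day
"""

rows = conn.execute(DAILY_QUERY, ("AutoModerator",)).fetchall()
print(rows)
```

Because the query relies only on tags and actor joins rather than content-hash matching, it stays cheap enough to run every day on every wiki.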

Pipeline (frequency - daily)

Monthly activity snapshots

The monthly activity snapshots are intended to provide a comprehensive overview of Automoderator's activity across wikis. These snapshots are based on data from wmf.mediawiki_history.
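
As a sketch of the kind of aggregation a monthly snapshot could perform, the example below summarises rows shaped like mediawiki_history records in plain Python. The field names follow the mediawiki_history schema, but the function, the metric definitions, and the sample rows are hypothetical. The production pipeline queries the table itself and may compute different metrics.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Tuple


def monthly_summary(
    rows: Iterable[Mapping[str, object]],
    usernames: Mapping[str, str],
) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Count Automoderator's reverts per (wiki, month), and how many of
    those reverts were themselves reverted."""
    summary = defaultdict(lambda: {"reverts": 0, "reverts_reverted": 0})
    for row in rows:
        # Only revision-creation events are edits.
        if row["event_entity"] != "revision" or row["event_type"] != "create":
            continue
        # Keep edits made by that wiki's localised Automoderator account.
        if row["event_user_text"] != usernames.get(row["wiki_db"]):
            continue
        if not row["revision_is_identity_revert"]:
            continue
        month = str(row["event_timestamp"])[:7]  # "YYYY-MM"
        key = (str(row["wiki_db"]), month)
        summary[key]["reverts"] += 1
        if row["revision_is_identity_reverted"]:
            summary[key]["reverts_reverted"] += 1
    return dict(summary)


def make_row(user: str, ts: str, is_revert: bool, is_reverted: bool) -> dict:
    """Build a hypothetical sample row for a made-up wiki."""
    return {
        "wiki_db": "examplewiki",
        "event_entity": "revision",
        "event_type": "create",
        "event_user_text": user,
        "event_timestamp": ts,
        "revision_is_identity_revert": is_revert,
        "revision_is_identity_reverted": is_reverted,
    }


sample_rows = [
    make_row("ExampleModerator", "2025-01-05 10:00:00.0", True, False),
    make_row("ExampleModerator", "2025-01-09 12:30:00.0", True, True),
    make_row("SomeEditor", "2025-01-09 12:45:00.0", True, False),
]
result = monthly_summary(sample_rows, {"examplewiki": "ExampleModerator"})
print(result)
```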

Pipeline (frequency - monthly)

Potential vandalism reverted

One of the main goals of the Automoderator project is to see whether it reduces the workload of patrolling new edits. The aggregate metrics are calculated using wmf.mediawiki_history.

Pipeline (frequency - monthly)

Airflow DAG - automoderator_potential_vandalism_reverted_monthly_dag.py, uses:

Toolforge

From the published datasets, jobs based on the Toolforge jobs framework update the corresponding tables in the Automoderator metrics database (s56213__unified_automod_metrics_p) in ToolsDB, which can be accessed by the public Superset instance.