ORES/New model checklist

From mediawiki.org
Warning: The ORES infrastructure is being deprecated by the Machine Learning team. See wikitech:ORES for more information.

Overview

This is a guide to training new ORES models and enabling them on wikis.

This guide is intended for members of the Scoring Platform team.

Learn more about ORES and machine learning as a service.

See historical examples of adding new models: task T130213

Step 1: Determine which models need to be built

Each model can target one set of classifications. The current norm is to begin with damaging and goodfaith models for a single language wiki database.

Note: Obtaining labeled training observations can require a large investment of volunteer time. This can be a limiting factor when making decisions about which models to build.

Step 2: Create tasks in Phabricator

Create the following set of tasks for the Scoring Platform team in Phabricator:

  • Optional parent epic.
  • Task to gather labeled test data.
  • Task to engineer language features.
  • Task to collect word lists.
  • Task to train, test, and tune the model (blocked by test data).
  • Task to deploy the model (blocked by training).
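
The blocking relationships in this list determine the order in which the tasks can be worked. A minimal sketch of that ordering, using Python's standard `graphlib` module; the task names are shortened from the list above, and the sketch encodes only the two "blocked by" relationships the list states:

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that block it.
# Only the blockers named in the checklist are encoded here.
BLOCKED_BY = {
    "gather labeled test data": set(),
    "engineer language features": set(),
    "collect word lists": set(),
    "train, test, and tune the model": {"gather labeled test data"},
    "deploy the model": {"train, test, and tune the model"},
}


def work_order(blocked_by):
    """Return the tasks in an order that respects every blocker."""
    return list(TopologicalSorter(blocked_by).static_order())


for position, task in enumerate(work_order(BLOCKED_BY), start=1):
    print(position, task)
```

Tasks without blockers (language features, word lists) can appear anywhere before or after the others; only the test data, training, and deployment tasks have a fixed relative order.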

Step 3: Compile the test dataset

  • For edit quality, run a WikiLabels campaign on the target wiki:
      ◦ File a Phabricator ticket to set up the campaign.
      ◦ Announce the campaign on-wiki.
      ◦ Run the campaign until it is complete. Update the community regularly during the campaign. Typically, around 5,000 articles will need to be assessed.
  • For an article quality model, we'll need to collect the set of all articles already given evaluations, grouped by quality classification.
  • For some models, test data may be extracted by using database queries.
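
For the article quality case, grouping existing evaluations by classification can be sketched as follows. This is an illustration only: the `(title, quality_class)` pair format, the sample titles, and the class labels are hypothetical and do not describe the actual extraction tooling.

```python
from collections import defaultdict


def group_by_class(labeled_articles):
    """Group (title, quality_class) pairs into {quality_class: [titles]}."""
    groups = defaultdict(list)
    for title, quality_class in labeled_articles:
        groups[quality_class].append(title)
    return dict(groups)


# Hypothetical sample data; real labels come from the wiki's own assessments.
sample = [
    ("Article A", "Stub"),
    ("Article B", "FA"),
    ("Article C", "Stub"),
]

groups = group_by_class(sample)
for quality_class, titles in sorted(groups.items()):
    print(quality_class, len(titles))
```

Printing the per-class counts makes it easy to spot classes with too few observations before training starts.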

Step 4: Create badwords and informal words lists

Badword lists (also known as BWDS lists) and informal word lists have already been generated for a number of languages. An appropriate BWDS list for the new model may be found in the existing revision scoring word lists.

Learn more about how to sort BWDS-generated word lists.
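
To illustrate how a finished word list is typically used as a language feature, the sketch below compiles a list into a single regular expression and counts whole-word matches in a piece of text. It uses only Python's standard `re` module; the function names and the sample words are assumptions made for illustration, not the actual revscoring implementation.

```python
import re


def compile_word_list(words):
    """Build one case-insensitive regex matching any listed word as a whole word."""
    # Escape each word so punctuation in the list is matched literally.
    # Longest first, so longer entries are tried before their prefixes.
    escaped = sorted((re.escape(word) for word in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def count_matches(pattern, text):
    """Count how many times any word from the list appears in the text."""
    return len(pattern.findall(text))


# Hypothetical informal-word list for demonstration only.
informal = compile_word_list(["lol", "haha", "omg"])
print(count_matches(informal, "LOL that was great, haha"))
```

The word-boundary anchors keep short entries from matching inside longer, unrelated words, which matters when reviewing generated lists for false positives.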

Step 5: Add the new model to configuration files and Makefiles

Step 6: Train the new model

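Build the model with make from the editquality repository. For example, to train the French Wikipedia damaging model:

    (p3)ladsgroup@ores-compute-01:~/editquality$ make models/frwiki.damaging.gradient_boosting.model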
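
The make target in this example follows a `models/<wiki>.<model>.<algorithm>.model` pattern. A small sketch that builds the corresponding commands is shown below; it assumes that other models (such as goodfaith) use the same naming pattern, which should be checked against the Makefile, and the helper names are hypothetical.

```python
def model_target(wiki, model, algorithm="gradient_boosting"):
    """Return the make target path for a model file, following the example above."""
    return f"models/{wiki}.{model}.{algorithm}.model"


def train_command(wiki, model):
    """Return the make invocation as an argument list (not executed here)."""
    return ["make", model_target(wiki, model)]


# Print the commands for the two models usually built first.
# The goodfaith target name is an assumption based on the damaging example.
for model in ("damaging", "goodfaith"):
    print(" ".join(train_command("frwiki", model)))
```

The argument-list form can be passed to `subprocess.run` from the editquality checkout if the commands should be executed rather than printed.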

Step 7: Test, cross-validate, review model health

Step 8: Add and commit the model

If the model requires a new language dictionary to be installed, add the dictionary to the ORES base configuration in Puppet.

Step 9: Deploy the new model

The final step is to deploy the new model.

There are a number of spaces where new models can be deployed. Refer to the following chart to determine the appropriate space.

|                   | Production                            | Beta                                   | WMFLabs                                            | Staging                                                |
| Group             | Production-ish                        | Production-ish                         | Experimental                                       | Experimental                                           |
| URL               | ores.wikimedia.org                    | ores-beta.wmflabs.org                  | ores.wmflabs.org                                   | ores-staging.wmflabs.org                               |
| Suggested use     | Serving real-time requests            | Making sure we don't break production  | Experimenting with new code and running analyses   | Making sure we don't break the experimental install    |
| Stability         | Stable                                | Not stable                             | Mostly stable                                      | Not stable                                             |
| Performance       | Fast                                  | Extremely limited                      | Pretty fast                                        | Extremely limited                                      |
| Code version      | Stable code                           | Experimental code                      | Experimental code                                  | Experimental code                                      |
| Parallel requests | Up to 2 parallel requests per second  | Up to 1 request per second             | Up to 4 parallel requests per second               | Up to 1 request per second                             |
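
After deploying to one of these spaces, a quick smoke check is to request a score for a known revision from that host. The sketch below only builds the request URL; it assumes the ORES v3 scores path layout (`/v3/scores/<wiki>/<revision ID>/<model>`), and the revision ID shown is a hypothetical placeholder.

```python
from urllib.parse import quote


def score_url(host, wiki, rev_id, model):
    """Build a score request URL for one revision and one model.

    Assumes the v3 scores path layout; verify against the deployed service.
    """
    if not isinstance(rev_id, int) or rev_id <= 0:
        raise ValueError("rev_id must be a positive integer")
    return f"https://{host}/v3/scores/{quote(wiki)}/{rev_id}/{quote(model)}"


# Check the beta install before production, per the chart above.
# 12345 is a placeholder revision ID, not a real test case.
for host in ("ores-beta.wmflabs.org", "ores.wikimedia.org"):
    print(score_url(host, "frwiki", 12345, "damaging"))
```

Keep smoke checks well under the request limits listed in the chart, especially on the beta and staging installs.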

How to deploy server and MediaWiki components of ORES on a new wiki

  1. Follow all of the steps on the beta cluster.
  2. Enable $wmgUseORES for the wiki.
  3. Configure wiki thresholds to reasonable defaults (how do we guess these?) in $wmgOresThresholds.
  4. Request new DB tables.

See Also

For additional information contact the Scoring Platform team.