Mediawiki-utilities/List
Datasource
mwxml -- XML dump processing
This library contains a collection of utilities for efficiently processing MediaWiki's XML database dumps. There are two important concerns that this module intends to address: complexity and performance of streaming XML parsing.
- Complexity
- Streaming XML parsing is gross. XML dumps consist of (1) some site metadata, (2) a collection of pages that contain (3) collections of revisions. The module allows you to think about dump files in this way and ignore the fact that you're streaming XML. A mwxml.Dump contains a mwxml.SiteInfo and an iterator of mwxml.Page objects. A mwxml.Page contains page metadata and an iterator of mwxml.Revision objects. A mwxml.Revision contains revision metadata and text.
- Performance
- Performance is a serious concern when processing large XML database dumps. Regretfully, Python's Global Interpreter Lock prevents us from running threads on multiple CPUs. This library provides mwxml.map(), a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work over multiple CPUs.
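The pages-then-revisions view described under Complexity can be illustrated with the standard library alone. This is a minimal sketch, not mwxml's implementation: `iter_pages` and the sample document are hypothetical, the sample omits the XML namespace that real dumps declare, and revisions are collected into a list per page rather than streamed.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump file (real dumps declare an XML namespace).
SAMPLE = b"""<mediawiki>
  <siteinfo><sitename>ExampleWiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><text>first</text></revision>
    <revision><id>11</id><text>second</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>2</id>
    <revision><id>20</id><text>only</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, [(rev_id, text), ...]) one page at a time."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text") or "")
                for rev in elem.iter("revision")
            ]
            yield title, revisions
            elem.clear()  # drop the finished page's subtree to bound memory


pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, revisions in pages:
    print(title, [rev_id for rev_id, _ in revisions])
```

The caller only sees pages and revisions; the event-driven XML handling stays inside the generator, which is the kind of separation the description above attributes to mwxml.Dump and mwxml.Page.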
See also dumps.wikimedia.org, Special:Export, and Manual:DumpBackup.php.
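The performance point above (one process per dump file, to work around the Global Interpreter Lock) can be sketched with the standard library's multiprocessing module. This is an illustration of the idea only and does not use mwxml.map(); `count_revisions` and `map_over_files` are hypothetical names, and counting `<revision>` tags per line is a crude stand-in for real dump processing.

```python
import multiprocessing
import os
import tempfile


def count_revisions(path):
    """Worker run in a child process: return (path, number of <revision> tags)."""
    with open(path, encoding="utf-8") as f:
        return path, sum(line.count("<revision>") for line in f)


def map_over_files(worker, paths, processes=2):
    """Run worker on every path in a process pool, yielding results as they finish."""
    with multiprocessing.Pool(processes) as pool:
        yield from pool.imap_unordered(worker, paths)


if __name__ == "__main__":
    # Write two tiny stand-in "dump" files so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, n in [("a.xml", 2), ("b.xml", 3)]:
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as f:
                f.write("<revision></revision>\n" * n)
            paths.append(path)
        # Results arrive in completion order, not input order.
        for path, count in map_over_files(count_revisions, paths):
            print(os.path.basename(path), count)
```

Each worker returns a small, picklable result, so only summaries cross process boundaries while the heavy parsing stays inside the child process.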
mwapi -- API querying and session management
This library provides a set of basic utilities for interacting with MediaWiki's "action" API, usually available at /w/api.php. The most salient feature of this library is the mwapi.Session class, which provides a connection session that sustains a logged-in user status and provides convenience functions for calling the MediaWiki API. See get() and post().
- Authentication
- mwapi.Session provides convenient login() and logout() methods.
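To make the shape of an "action" API call concrete, here is a minimal sketch that only builds the request URL with the standard library. It does not use mwapi and sends nothing over the network; `build_api_url` and the host are hypothetical, and the assumption is that a session's get() ultimately issues a request of roughly this form.

```python
from urllib.parse import urlencode


def build_api_url(host, **params):
    """Return the GET URL for an action API request (hypothetical helper)."""
    query = dict(params, format="json")  # ask the API for a JSON response
    return host.rstrip("/") + "/w/api.php?" + urlencode(query)


url = build_api_url("https://wiki.example.org", action="query", meta="siteinfo")
print(url)
```

A session object adds what this sketch leaves out: cookies that keep the user logged in across calls, and error handling for API responses.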
mwdb -- Database connection and querying
pip install mwdb
This library provides a set of utilities for connecting to and querying a MediaWiki database.
Authentication & authorization
mwoauth -- OAuth connection handler for MediaWiki
This library provides a simple means of performing an OAuth handshake with a MediaWiki installation that has the OAuth extension installed.
Data processing
mwdiffs -- Revision diff processing
This library provides a set of utilities for generating information about the differences between revisions.
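As an illustration of what "information about the differences between revisions" can look like, the sketch below computes token-level operations between two revision texts. It swaps in difflib from the standard library and is not mwdiffs' algorithm or API; `token_diff` is a hypothetical name and whitespace splitting is a simplifying assumption.

```python
import difflib


def token_diff(old_text, new_text):
    """Return (operation, removed_tokens, added_tokens) for each changed span."""
    old, new = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (op, old[a1:a2], new[b1:b2])
        for op, a1, a2, b1, b2 in matcher.get_opcodes()
        if op != "equal"
    ]


changes = token_diff("the quick brown fox", "the quick red fox jumps")
for change in changes:
    print(change)
```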
mwreverts -- Revert detection
This library provides a set of utilities for detecting reverts (see mwreverts.Detector and mwreverts.detect()) and identifying the reverted status of edits to a MediaWiki wiki.
See also m:R:Revert detection.
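One common way to detect reverts is to look for a revision whose content is identical to a recent earlier revision; everything in between was reverted. The sketch below shows that idea with standard-library hashing. It is not mwreverts' code: `detect_identity_reverts` is a hypothetical name, and the window size (`radius`) of 15 is an illustrative assumption.

```python
import hashlib
from collections import deque


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting_id, reverted_ids, reverted_to_id).

    revisions: iterable of (rev_id, text) in chronological order.
    """
    recent = deque(maxlen=radius)  # (checksum, rev_id) of the last few revisions
    for rev_id, text in revisions:
        checksum = hashlib.sha1(text.encode("utf-8")).hexdigest()
        history = list(recent)
        # Search from the most recent revision backwards for identical content.
        for i in range(len(history) - 1, -1, -1):
            if history[i][0] == checksum:
                reverted = [rid for _, rid in history[i + 1:]]
                if reverted:  # identical consecutive revisions revert nothing
                    yield rev_id, reverted, history[i][1]
                break
        recent.append((checksum, rev_id))


history = [(1, "a"), (2, "a b"), (3, "vandalism"), (4, "a b"), (5, "a b c")]
reverts = list(detect_identity_reverts(history))
print(reverts)
```

Here revision 4 restores the content of revision 2, so revision 3 is reported as reverted.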
mwsessions -- Edit session processing
This library provides a set of utilities for grouping MediaWiki user actions into sessions. mwsessions.Sessionizer and mwsessions.sessionize() can be used by Python scripts to group activities into sessions, or the command-line utilities can be used to operate directly on data files. Such methods have been used to measure editor labor hours[1].
See m:R:Activity session.
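Grouping actions into sessions is usually done with an inactivity cutoff: a gap longer than the cutoff starts a new session. The following is a minimal standard-library sketch of that idea, not the mwsessions implementation; `group_into_sessions` is a hypothetical name and the one-hour cutoff is an assumption chosen for illustration.

```python
from collections import defaultdict


def group_into_sessions(events, cutoff=3600):
    """Group (user, timestamp) events into per-user sessions.

    events must be sorted by timestamp (seconds). A user's event joins their
    latest session unless more than `cutoff` seconds passed since their last one.
    """
    sessions = defaultdict(list)
    for user, timestamp in events:
        user_sessions = sessions[user]
        if user_sessions and timestamp - user_sessions[-1][-1] <= cutoff:
            user_sessions[-1].append(timestamp)
        else:
            user_sessions.append([timestamp])
    return dict(sessions)


events = [("alice", 0), ("bob", 100), ("alice", 1800), ("bob", 4000), ("alice", 9000)]
sessions = group_into_sessions(events)
print(sessions)
```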
mwpersistence -- Content persistence processing
This library provides a set of utilities for measuring content persistence and tracking authorship in MediaWiki revisions.
See also m:R:Content persistence.
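Content persistence asks how long the content added by an edit survives later edits. The sketch below is a deliberately simplified approximation, not mwpersistence's method: it treats each revision as a set of whitespace-separated words rather than tracking individual tokens through diffs, and `token_persistence` is a hypothetical name.

```python
def token_persistence(revisions):
    """Return {rev_id: {token: number of following revisions it survived}}.

    revisions: list of (rev_id, text) in chronological order.
    """
    token_sets = [set(text.split()) for _, text in revisions]
    result = {}
    previous = set()
    for i, (rev_id, _) in enumerate(revisions):
        added = token_sets[i] - previous  # words introduced by this revision
        survived = {}
        for token in added:
            count = 0
            for later in token_sets[i + 1:]:
                if token not in later:
                    break  # stop at the first revision that removed it
                count += 1
            survived[token] = count
        result[rev_id] = survived
        previous = token_sets[i]
    return result


persistence = token_persistence([
    (1, "apples are red"),
    (2, "apples are blue"),
    (3, "apples are red and sweet"),
])
print(persistence)
```

In this example "apples" and "are" survive both later revisions, while "red" is removed immediately and counts as newly added again in revision 3.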
mwparserfromhell -- Easy-to-use parser for wikitext
This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.
Basic utilities
mwtypes -- A basic type system for MediaWiki data
This library provides a set of standardized types to be used when processing MediaWiki data. All of the types in this package make use of jsonable and therefore can be trivially serialized as JSON documents.
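The idea of types that round-trip through JSON can be sketched with standard-library dataclasses. This does not use mwtypes or jsonable; the `User` and `Revision` classes, their fields, and the `to_json`/`from_json` methods are hypothetical stand-ins chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class User:
    id: Optional[int]
    text: str


@dataclass
class Revision:
    id: int
    timestamp: str
    user: User
    minor: bool = False

    def to_json(self):
        """Return a plain dict that json.dumps can serialize."""
        return asdict(self)

    @classmethod
    def from_json(cls, doc):
        """Rebuild a Revision from a dict produced by to_json()."""
        return cls(
            id=doc["id"],
            timestamp=doc["timestamp"],
            user=User(**doc["user"]),
            minor=doc.get("minor", False),
        )


rev = Revision(id=10, timestamp="2015-01-01T00:00:00Z", user=User(id=5, text="Example"))
line = json.dumps(rev.to_json(), sort_keys=True)
restored = Revision.from_json(json.loads(line))
print(line)
```

Serializing each record as one JSON document per line makes it easy to pass data between command-line tools.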
mwcli -- Utilities for unix command-line data processing
pip install mwcli
Incubator
These libraries are experimental and may change dramatically or be discontinued.
mwmetrics -- A collection of statistics and measurements for MediaWiki
mwevents -- A generalized event extraction and processing framework
pip install mwevents
[1] Using Edit Sessions to Measure Participation in Wikipedia. R. Stuart Geiger & Aaron Halfaker. (2013). CSCW (pp. 861-870). DOI:10.1145/2441776.2441873.