Analytics/Wikistats/Database API
Database API | |
---|---|
Component | General |
Creation date | |
Author(s) | Erik Zachte, Diederik van Liere |
Document status | draft withdrawn |
Reportcard API
This document outlines the new API for the Reportcard. Please leave your thoughts at the Talk page. This proposal is incomplete.
Parameter | Description | Syntax |
---|---|---|
action | The MediaWiki API is very rich and the Reportcard API augments this richness. Every call to the Reportcard API starts with action=analytics; further parameters are appended with an ampersand (&). | action=analytics |
theme | The metrics are grouped into 8 themes. | theme=readers |
metric | The measure for which you want to fetch data. Open question: how do we expose the available metrics? | metric= |
country_code | We use the ISO 3166-1 country codes. See http://www.iso.org/iso/english_country_names_and_code_elements (About ISO 3166-1 Standard). | country_code=UK |
project | The project for which you want to query the data. The default is Wikipedia (wp). Valid choices are: wb = Wikibooks, wk = Wiktionary, wn = Wikinews, wp = Wikipedia, wq = Wikiquote, ws = Wikisource, wv = Wikiversity, co = Commons, wx = Other projects. | project=wb |
language_code | The language of the project that you want to query. By default no language is specified, so you retrieve data for all languages of the given project. | language_code=FR |
from | A yyyy-mm-dd string that indicates the start of the timeframe. Returned data includes this date. HH:MM:SS is currently not supported. | from=2012-01-01 |
to | A yyyy-mm-dd string that indicates the end of the timeframe. Returned data includes this date. HH:MM:SS is currently not supported. | to=2012-06-01 |
format | The only output format currently supported is JSON. It is the default and does not need to be specified explicitly. | format=json |
meta | Fetch information about relevant actions for the Reportcard API. This parameter cannot be combined with any of the other parameters. | meta=list_metrics, meta=list_geographies |
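As a rough illustration of how the parameters in the table combine into one request, the following Python sketch assembles a query URL. The host name, the helper name `build_reportcard_url`, and the metric name `pageviews` are assumptions made for the example; the proposal names neither an endpoint nor the available metrics.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the proposal does not name a host.
BASE_URL = "https://example.org/w/api.php"


def build_reportcard_url(metric, theme=None, country_code=None, project="wp",
                         language_code=None, date_from=None, date_to=None,
                         fmt="json"):
    """Assemble a Reportcard API query URL from the documented parameters."""
    # Every call starts with action=analytics.
    params = {"action": "analytics", "metric": metric}
    optional = {
        "theme": theme,
        "country_code": country_code,
        "project": project,
        "language_code": language_code,
        "from": date_from,
        "to": date_to,
        "format": fmt,
    }
    # Leave out parameters that were not supplied so the defaults apply.
    params.update({k: v for k, v in optional.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)


# "pageviews" is a placeholder metric name, not one defined by the proposal.
url = build_reportcard_url("pageviews", theme="readers", project="wb",
                           language_code="FR",
                           date_from="2012-01-01", date_to="2012-06-01")
print(url)
```

Omitting `language_code` in this sketch mirrors the documented default of returning data for all languages of the project.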
Example JSON output
Please fill in.
Old API call for data analysis metrics
- concept documentation for new API call
* action=analytics *

Collect data from the analytics database.

Parameters

- metric - Type of data to collect. One value. Parameter is always required.
  About metric names: these include the source of the data, to allow for alternate sources of similar metrics, which are likely defined differently or have other intrinsic issues (e.g. precision/reliability).
  - comscoreuniquevisitors
    - definition: Unique persons that visited one of the Wikimedia wikis at least once in a certain month
    - filters: selectregions, selectcountries
    - [implementation: table comscore, field visitors]
  - comscorereachpercentage
    - definition: Percentage of total unique visitors to any web property which also visited a Wikimedia wiki
    - filters: selectregions, selectcountries
    - [implementation: table comscore, field reach]
  - squidpageviews
    - definition: Total articles (htm component) requested from nearly all Wikimedia wikis (exceptions are mostly special purpose wikis, e.g. wikimania wikis). Totals are based on the archived 1:1000 sampled squid logs.
    - filters: selectregions, selectcountries, selectwebproperties, selectprojects, selectwikis, selectplatform
    - [implementation: table page_views, field views_non_mobile_raw, views_mobile_raw, views_non_mobile_normalized, views_mobile_normalized depending on normalized and select_platform]
  - dumparticlecount
    - definition: All namespace 0 pages which contain an internal link, minus redirect pages (for some projects extra namespaces qualify)
    - filters: selectprojects, selectwikis
    - [implementation: table comscore, field reach]
  - dumpbinarycount
    - definition: All binary files (nearly all of which are multimedia files) available for download/article inclusion on a wiki
    - filters: selectprojects, selectwikis
    - [implementation: table , field ]
  - dumpedits
    - definition: All edits on articles (as defined by dumparticlecount)
    - filters: selectprojects, selectwikis
    - [implementation: table wikistats, field edits]
  - dumpnewregisterededitors
    - definition: All registered editors that in a certain month crossed the threshold of 10 edits since signing up for the first time
    - filters: selectprojects, selectwikis
    - [implementation: table wikistats, field editors_new]
  - dumpactiveeditors5
    - definition: All registered editors that made 5 or more edits in a certain month
    - filters: selectprojects, selectwikis
    - [implementation: table wikistats, field editors_ge_5]
  - dumpactiveeditors100
    - definition: All registered editors that made 100 or more edits in a certain month
    - filters: selectprojects, selectwikis
    - [implementation: table wikistats, field editors_ge_100]
  - estimatereadersoffline
    - definition: People who access Wikipedia through an offline reader
    - [implementation: table offline, field readers]
  - other metrics which are likely to follow at some stage (for now included for brainstorm purposes only): squidpageedits, worldbankpopulationpercountry, worldbankinternetuserspercountry
- startmonth - First month to include in the time series, or the single month to include. One value: single month as yyyy-mm-dd. Parameter is always required.
- endmonth - Last month to include in the time series. One value: single month as yyyy-mm-dd.
- select... - Return data per month per qualifying row of data. For each select parameter, specify the criteria in any of four ways (only cB and cC can be combined):
  - cA: * for all known values, e.g. selectregions=*
  - cB: one or more codes separated by pipe, e.g. selectregions=NA|SA
  - cC: one or more codes separated by plus sign, which returns the required data totalled for all specified codes, e.g. selectregions=NA+SA
  - cD: highest n (number) occurrences, using values for the most recent selected month for ranking, e.g. selectcountries=top:12

  Available select... parameters:
  - selectregions (cA cB cC): for valid region codes see Filter select_regions below
  - selectcountries (cB cC cD): for valid country codes see Filter select_countries below
  - selectwebproperties (cC cD): This parameter requires extra authorisation. Example: selectwebproperties=top:10
  - selectprojects (cC): for valid project codes see Filter select_projects below
  - selectwikis (cC): specify each wiki code as project:language, e.g. wp:en for English Wikipedia, wq:de for German Wikiquote. Example: selectwikis=wp:en|wp:de
  - selecteditors (cB cC): A for anonymous user, R for registered user, B for bot. Example: selecteditors=R|A|R+A|B
  - selectedits (cB cC): M for manual, B for bot-induced. Example: selectedits=M|B
  - selectplatform (cB cC, only squidpageviews): M for mobile, N for non-mobile (anyone knows a better term?). Example: selectplatform=M|N|M+N
- normalized - Y or N. Only applies to squidpageviews, where data for each month are recalculated to 30 days (other metrics may follow). Default: N (WMF Report Card will use normalized time series when available)
- data - One or more types of data to be returned, separated by comma. Default: timeseries. Values:
  - timeseries: returns an ordered list of value pairs, one for each month within range
  - timeseriesindexed: like timeseries, but each month's value will be relative to the oldest month's value, which is always 100
  - percentagegrowthlastmonth, percentagegrowthlastyear, percentagegrowthfullperiod: growth percentages are relative to the oldest value (80->100=25%). Although trivial, requesting these metrics through the API ensures all clients use the same calculation.
- reportlanguage - Language code, used to expand region and country codes into region and country name. Default: en. Supported: en
- format - (csv,json,... see elsewhere)

Examples:

api.php?action=analytics&months=2008-03:2011-03&metric=squidpageviews&selectcountries=US|UK&selectmobile=M|N&normalized&data=timeseries|percentagegrowthlastmonth|percentagegrowthlastyear&format=xml

returns four sets of metrics (time series plus two percentages): one for United States/mobile, one for United States/non-mobile, one for United Kingdom/mobile, one for United Kingdom/non-mobile
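The derived data types described above (timeseriesindexed, the growth percentages, and the 30-day normalization) amount to simple arithmetic. The Python sketch below shows one possible reading. The function names are hypothetical, and the choice of which months each growth figure compares (previous month, 12 months earlier, first month of the range) is an assumption inferred from the parameter names, not something the text specifies.

```python
def indexed(series):
    """timeseriesindexed: each value relative to the oldest value, which becomes 100."""
    base = series[0]
    return [100.0 * value / base for value in series]


def growth_percentage(oldest, newest):
    """Growth relative to the oldest value, e.g. 80 -> 100 gives 25.0."""
    return 100.0 * (newest - oldest) / oldest


def normalize_to_30_days(monthly_total, days_in_month):
    """normalized=Y: rescale a monthly total to a 30-day month."""
    return monthly_total * 30.0 / days_in_month


def growth_summary(series):
    """Growth figures for a monthly series ordered oldest first.

    Assumption: lastmonth compares the final two months, lastyear compares
    the final month with the one 12 months earlier, and fullperiod compares
    the final month with the first month in the range.
    """
    latest = series[-1]
    summary = {"percentagegrowthfullperiod": growth_percentage(series[0], latest)}
    if len(series) >= 2:
        summary["percentagegrowthlastmonth"] = growth_percentage(series[-2], latest)
    if len(series) >= 13:
        summary["percentagegrowthlastyear"] = growth_percentage(series[-13], latest)
    return summary


print(indexed([80, 100, 120]))         # [100.0, 125.0, 150.0]
print(growth_percentage(80, 100))      # 25.0
print(normalize_to_30_days(3100, 31))  # 3000.0
```

Computing these values in one place, as the text argues, keeps every client on the same definition of growth.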
Further details on value ranges
Filter select_regions
For comscore_... filters:
- AS = Asia Pacific
- C = China
- EU = Europe
- I = India
- LA = Latin-America
- MA = Middle-East/Africa
- NA = North-America
- US = United States
- W = World
Filter select_countries
Valid country codes are ISO 3166-1
Filter select_projects
- wb = Wikibooks
- wk = Wiktionary
- wn = Wikinews
- wp = Wikipedia
- wq = Wikiquote
- ws = Wikisource
- wv = Wikiversity
- co = Commons
- wx = Other projects
Filter select_wikis
Specify as project:language.
For valid project codes see select_projects above.
For full lists of valid language codes per project see Wikimedia projects.
The following overview of exported wiki databases (aka dumps) can also be useful: lists of Wikimedia dumps.
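The select... criteria syntax from the old API call (cA to cD) can be read mechanically. The following Python sketch, with the hypothetical helper `parse_select`, shows one way a server or client might interpret a value. It does not validate codes against the lists above, and it does not enforce which of the four forms each select parameter accepts.

```python
def parse_select(value):
    """Interpret a select... parameter value.

    cA: "*"          -> all known values
    cD: "top:<n>"    -> the n highest-ranking values
    cB/cC: "A|B+C"   -> one row per pipe-separated group; codes joined
                        with "+" are totalled into a single row
    """
    if value == "*":
        return {"mode": "all"}
    if value.startswith("top:"):
        return {"mode": "top", "n": int(value[len("top:"):])}
    groups = [group.split("+") for group in value.split("|")]
    return {"mode": "codes", "groups": groups}


print(parse_select("*"))
print(parse_select("top:12"))
print(parse_select("R|A|R+A|B"))
print(parse_select("wp:en|wp:de"))
```

Each inner list in `groups` corresponds to one returned row, so `R|A|R+A|B` yields four rows, the third being the total for registered and anonymous users.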
Return value
The outermost object of the return value is keyed by the name of the metric. It also carries the API call used to generate the object, the start date of any time series returned, and the granularity. Each object inside the metric lists the filters that were used to obtain the data, along with any constraints on the data.
{
  "comscore_views": //metric name
  [
    {
      "country_code" : "us", //various filters that apply to this data to interpret it properly
      "language_code": "en",
      "language_name": "English",
      "normalized": "false",
      "modality": "indexed",
      "data_type": "time_series", //the data type
      "data": [14,16,72,9034], //the data itself
      "comments": "COMMENT STRING" //any additional comments
    },
    {
      "country_code": "uk", "language_code":"en", ......
    },
    {
      "country_code": "fr", "language_code":"fr", ......
    }
  ],
  "generated_by":"APICALL", //api call which generated this data
  "time_start": "20110213000000", //start date of all the data in this object in MW timestamp format
  "granularity": "2592000", //granularity of this data in seconds
  "report_language" : "en"
}
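A client would presumably combine time_start and granularity to date each value in a data array. The Python sketch below assumes that data points are evenly spaced, granularity seconds apart, with the first point at time_start; the proposal does not state this explicitly. The helper name `label_points` is hypothetical.

```python
from datetime import datetime, timedelta


def label_points(time_start, granularity, data):
    """Pair each value with a timestamp derived from time_start and granularity.

    time_start is a MediaWiki timestamp (YYYYMMDDHHMMSS); granularity is a
    number of seconds, given as a string in the sample output.
    """
    start = datetime.strptime(time_start, "%Y%m%d%H%M%S")
    step = timedelta(seconds=int(granularity))
    return [(start + i * step, value) for i, value in enumerate(data)]


points = label_points("20110213000000", "2592000", [14, 16, 72, 9034])
for when, value in points:
    print(when.date().isoformat(), value)
```

With a granularity of 2592000 seconds (30 days), the points fall on fixed 30-day steps, not on calendar month boundaries.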