Manual:Stats
The Stats library (or StatsLib) in MediaWiki core lets any core component or extension generate operational metrics from PHP code. It was introduced in MediaWiki 1.41 and supersedes the older StatsdDataFactory service, which used the liuggio/statsd-php-client library (see RFC T249164 and T240685). The main differences are native support for key-value labels and support for Prometheus (via DogStatsD) in addition to StatsD.
Global Configuration
The global configuration is in MainConfigSchema.php and can be overridden in LocalSettings.php.
- $wgStatsFormat is the output format all metrics will be rendered to.
  - Default: null, which disables metrics rendering.
  - Supported options are statsd, dogstatsd, and null. To use MediaWiki with Prometheus, use the dogstatsd format and point it to an instance of prometheus/statsd_exporter.
  - See Wikimedia\Stats\OutputFormats::SUPPORTED_FORMATS for an up-to-date list of options.
- $wgStatsTarget is the URI the metrics will be forwarded to, e.g. udp://127.0.0.1:8125.
  - Default: null, which disables sending metrics.
- $wgStatsPrefix is the prefix which will be applied to all generated metrics. Required. Default: MediaWiki.
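As a minimal sketch, the three settings above could be combined in LocalSettings.php as follows. The target address is the example value given above, and 'mediawiki' is the prefix assumed by the examples on this page; treat all three values as placeholders for your own setup.

```php
# Illustrative values only; adjust for your environment.
$wgStatsFormat = 'dogstatsd';             // or 'statsd'; null disables rendering
$wgStatsTarget = 'udp://127.0.0.1:8125';  // null disables sending
$wgStatsPrefix = 'mediawiki';             // applied to every generated metric
```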
Metric Types
The supported metric types are:
- CounterMetric
  - An only-ever-incrementing counter.
  - Great for tracking rates.
  - Implements increment() and incrementBy($number).
- GaugeMetric
  - A settable value.
  - Implements set($number).
- TimingMetric
  - Observes timing data.
  - When the backend is configured to do so, histograms of timing metrics are generated.
  - Implements observe($number), start(), and stop().
  - start() can be called many times; a call to stop() can throw if start() was not called first.
Requesting a Metric
Use a getter to get a metric from the StatsFactory:
- getCounter($name)
- getGauge($name)
- getTiming($name)
A Simple Example
Let's create a counter for tracking each time a function is called:
// Get the StatsFactory from MediaWikiServices
$statsFactory = MediaWikiServices::getInstance()->getStatsFactory();
// Get a CounterMetric and increment the counter
$statsFactory->getCounter( 'example_total' )
    ->setLabel( 'action', 'my_function' )
    ->increment();
// StatsD output: "mediawiki.example_total.my_function:1|c"
// DogStatsD output: "mediawiki_example_total:1|c|#action:my_function"
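Gauges and timings follow the same shape as the counter example. The sketch below wraps both in a helper so it can be read on its own; the function name, metric names, and the 'queue' label are hypothetical, and $statsFactory is assumed to be the StatsFactory service obtained from MediaWikiServices.

```php
<?php
/**
 * Hypothetical helper for illustration only.
 * $statsFactory is assumed to be the StatsFactory service.
 */
function sketchRecordQueueStats( $statsFactory, int $queueLength, callable $work ): void {
    // GaugeMetric: report the current value of something that can go up or down.
    $statsFactory->getGauge( 'example_queue_length' )
        ->setLabel( 'queue', 'default' )
        ->set( $queueLength );

    // TimingMetric: measure how long a piece of work takes.
    $timing = $statsFactory->getTiming( 'example_work_seconds' )
        ->setLabel( 'queue', 'default' );
    $timing->start();
    $work();
    $timing->stop();
}
```

Inside MediaWiki this might be called as sketchRecordQueueStats( $statsFactory, 3, $callback ), where $callback is whatever work should be timed.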
StatsD Metric Namespace
StatsLib produces predictable metric names, following this pattern:
// StatsD/Graphite
$output = implode( '.', [ $wgStatsPrefix, ...$components, $name, ...$labelValues ] );
// DogStatsD/Prometheus
$output = implode( '_', [ $wgStatsPrefix, ...$components, $name ] );
For example:
// assuming $wgStatsPrefix = 'mediawiki'
$statsFactory = MediaWikiServices::getInstance()->getStatsFactory();
$statsFactory->getCounter( 'example_total' )
    ->setLabel( 'action', 'my_function' )
    ->setLabel( 'namespace', 'User' )
    ->increment();
// StatsD: "mediawiki.example_total.my_function.User:1|c"
// DogStatsD: "mediawiki_example_total:1|c|#action:my_function,namespace:User"
// Prometheus: mediawiki_example_total{action="my_function", namespace="User"}
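To make the two naming patterns concrete, here is a standalone sketch that only mirrors the implode() pattern described above. The sketch* functions are hypothetical helpers written for this page, not part of MediaWiki, and the real formatters may differ in detail.

```php
<?php
// Hypothetical helpers that mirror the documented naming pattern.

// StatsD/Graphite: label values are appended to the dotted name, in order.
function sketchStatsdName( string $prefix, array $components, string $name, array $labels ): string {
    return implode( '.', [ $prefix, ...$components, $name, ...array_values( $labels ) ] );
}

// DogStatsD/Prometheus: the name is joined with underscores...
function sketchDogStatsdName( string $prefix, array $components, string $name ): string {
    return implode( '_', [ $prefix, ...$components, $name ] );
}

// ...and labels travel separately as key:value tags.
function sketchDogStatsdTags( array $labels ): string {
    $pairs = [];
    foreach ( $labels as $key => $value ) {
        $pairs[] = "$key:$value";
    }
    return $pairs ? '|#' . implode( ',', $pairs ) : '';
}

$labels = [ 'action' => 'my_function', 'namespace' => 'User' ];
$statsdLine = sketchStatsdName( 'mediawiki', [], 'example_total', $labels ) . ':1|c';
$dogstatsdLine = sketchDogStatsdName( 'mediawiki', [], 'example_total' )
    . ':1|c' . sketchDogStatsdTags( $labels );
```

With these inputs, $statsdLine and $dogstatsdLine reproduce the StatsD and DogStatsD lines from the example above. Note that only the StatsD form depends on the order of the labels.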
Features
StatsFactory->withComponent()
Returns a new StatsFactory instance with a component field appended to the globally-configured prefix. Intended for core components and extensions to ensure all metrics from that code share the same prefix. This trades greppability of the full metric name for less repetition of the shared prefix:
// assuming $wgStatsPrefix = 'mediawiki'
$exampleStats = $statsFactory->withComponent( 'example' );
$exampleStats->getCounter( 'foo_total' )
    ->increment();
$exampleStats->getCounter( 'bar_total' )
    ->increment();
// DogStatsD: "mediawiki_example_foo_total:1|c"
// DogStatsD: "mediawiki_example_bar_total:1|c"
This is equivalent to:
// assuming $wgStatsPrefix = 'mediawiki'
$statsFactory->getCounter( 'example_foo_total' )
    ->increment();
$statsFactory->getCounter( 'example_bar_total' )
    ->increment();
// DogStatsD: "mediawiki_example_foo_total:1|c"
// DogStatsD: "mediawiki_example_bar_total:1|c"
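One way to apply this in practice is to scope the factory once, where a class is constructed, so that every metric it emits shares the component. The class below is a hypothetical sketch, not an existing MediaWiki class; $statsFactory is assumed to be the StatsFactory service.

```php
<?php
// Hypothetical class for illustration; not part of MediaWiki.
final class SketchExampleRecorder {
    /** Component-scoped factory, assumed to be a StatsFactory. */
    private $stats;

    public function __construct( $statsFactory ) {
        // Scope once; all metrics below share the "example" component.
        $this->stats = $statsFactory->withComponent( 'example' );
    }

    public function recordFoo(): void {
        // With $wgStatsPrefix = 'mediawiki' this corresponds to
        // mediawiki_example_foo_total in DogStatsD output.
        $this->stats->getCounter( 'foo_total' )
            ->increment();
    }
}
```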
MetricInterface->copyToStatsdAt()
When StatsFactory is given a StatsdDataFactory service, metrics will be copied to the legacy StatsD service. This provides continuity for old metrics, and lets you build up a few weeks of data in your new time series before transitioning dashboards and alerts to new StatsLib metrics. For example:
// assuming:
// $wgStatsdServer = 'old_statsd_server:8125'
// $wgStatsdMetricPrefix = 'MediaWiki'
//
// $wgStatsFormat = 'dogstatsd'
// $wgStatsPrefix = 'mediawiki'
// $wgStatsTarget = 'udp://new_statsd_server:8125'
$action = 'my_function';
$namespace = 'User';
$statsFactory->getCounter( 'example_total' )
    ->setLabel( 'action', $action )
    ->setLabel( 'namespace', $namespace )
    ->copyToStatsdAt( "Example.$action.$namespace" )
    ->increment();
// dogstatsd/Prometheus: mediawiki_example_total:1|c|#action:my_function,namespace:User
// statsd/Graphite: MediaWiki.Example.my_function.User:1|c
With the legacy interface, it was not uncommon to emit the same logical increment multiple times under different prefixes to help with querying in Graphite. For example:
$statsdFactory->increment( 'Example_calls_all' );
$statsdFactory->increment( "Example_calls.$action.$namespace" );
This is generally not needed with Prometheus, because queries like sum(mediawiki_example_total) aggregate across labels, and you control which labels, if any, to select. For compatibility, the legacy names can still be emitted by passing an array to copyToStatsdAt():
$action = 'my_function';
$namespace = 'User';
$statsFactory->getCounter( 'example_total' )
    ->setLabel( 'action', $action )
    ->setLabel( 'namespace', $namespace )
    ->copyToStatsdAt( [
        'Example_calls_all',
        "Example_calls.$action.$namespace"
    ] )
    ->increment();
// dogstatsd/Prometheus: mediawiki_example_total:1|c|#action:my_function,namespace:User
// statsd/Graphite: MediaWiki.Example_calls_all:1|c
// MediaWiki.Example_calls.my_function.User:1|c
MetricInterface->setSampleRate()
Configures the metric to emit a subset of the samples recorded. Takes a float from 0.0 (sends 0% of samples) to 1.0 (sends 100% of samples). Note: the sample rate must be configured before any samples are recorded; otherwise an IllegalOperationException is thrown. This can happen inadvertently, because metrics pulled from cache may already have samples recorded.
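The ordering constraint can be sketched as follows: set the sample rate first, then record. The function and metric names are hypothetical, and $statsFactory is assumed to be the StatsFactory service.

```php
<?php
// Hypothetical helper for illustration only.
function sketchRecordSampledHit( $statsFactory ): void {
    $counter = $statsFactory->getCounter( 'example_hits_total' );

    // Configure sampling before any sample is recorded on this metric.
    $counter->setSampleRate( 0.1 ); // emit roughly 10% of samples

    $counter->setLabel( 'action', 'my_function' )
        ->increment();
}
```

Because a metric fetched a second time may already hold samples, it is safest to set the rate at the first point where the metric is obtained.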
Notes
Cardinality
High-cardinality metrics present challenges for service operators and consumers of time-series data. Avoid using unbounded values in labels or names. See also: Cardinality is Key.
Examples of high-cardinality data to avoid:
- IDs and UUIDs
- Usernames
- IP Addresses
- User agent strings
- Page titles
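One common way to keep label values bounded is to map arbitrary input onto a small, fixed set before using it as a label. The helper below is a hypothetical, standalone sketch of that idea; the allowed values and the 'other' fallback are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: collapse any value outside a fixed allow-list
// into a single fallback so the label cannot grow without bound.
function sketchBoundedLabel( string $value, array $allowed, string $fallback = 'other' ): string {
    return in_array( $value, $allowed, true ) ? $value : $fallback;
}

$allowedNamespaces = [ 'Main', 'User', 'Talk' ];
$knownLabel = sketchBoundedLabel( 'User', $allowedNamespaces );
$unknownLabel = sketchBoundedLabel( 'Some_arbitrary_page_title', $allowedNamespaces );
```

Here $knownLabel keeps its value, while $unknownLabel collapses to the fallback, so the label can take at most four distinct values.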
Recommendations
Labels
For StatsD output, the order in which labels are declared matters. Take care to declare labels in the order you would like them to appear.
Metrics
The WMF SRE Observability Team recommends following the Prometheus metric and label naming guidance from the upstream Prometheus project. TL;DR:
- Metrics should not include variable strings in the metric name.
- Metrics should not include a label name also in the metric name.
- Metrics must use the same unit across all measurements over time (i.e. do not mix or switch from seconds to milliseconds, or from counter to timing on the same metric name).
- Metrics should represent the same logical thing across all label dimensions (i.e. metric foo{something=A} and foo{something=B} should be combinable). If you need to store different types of data, use a separate metric name. This ensures that sum() or avg() give meaningful and stable output, even when new labels are introduced in the future.
- Metrics should use base units when naming metrics (e.g. seconds and bytes, not milliseconds or kilobytes).
- Metrics should have a suffix describing the base unit in plural form (e.g. _total for a counter, _seconds for a timing).
- Metrics must exist with the same consistent label keys across all measurements. Labels may not conditionally exist. Use a neutral value like "none" or "unknown" if needed.
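The last recommendation can be sketched as follows: the label is always set, and a neutral value stands in when the real one is not known. The function, metric, and label names are hypothetical, and $statsFactory is assumed to be the StatsFactory service.

```php
<?php
// Hypothetical helper for illustration only.
function sketchRecordUpload( $statsFactory, ?string $mediaType ): void {
    $statsFactory->getCounter( 'example_uploads_total' )
        // Always set the label so every sample carries the same label keys;
        // fall back to a neutral value instead of omitting it.
        ->setLabel( 'media_type', $mediaType ?? 'unknown' )
        ->increment();
}
```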
Troubleshooting
Nested timers
The TimingMetric->start() and TimingMetric->stop() helpers are not supported in cases of nesting or recursion (T368073), where the same timer would be restarted while it is already running. If this is unavoidable, measure the time separately and call observe() when done:
$startTime = microtime( true );
# <do work>
$statsFactory->getTiming( 'foo' )
    ->observe( ( microtime( true ) - $startTime ) * 1000 ); # expects ms
mw.track() and Statsv
Refer to:
- ResourceLoader/Core modules#mw.track
- Statsv on Wikitech.
PerDbNameStatsdDataFactory
This shortcut for the StatsdDataFactory was not re-implemented in StatsFactory. The recommended replacement is to explicitly set a "wiki" label on specific metrics via setLabel(), using the value of WikiMap::getCurrentWikiId(). See also T359436.
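A minimal sketch of that replacement is shown below. The function and metric names are hypothetical; $statsFactory is assumed to be the StatsFactory service, and inside MediaWiki the caller would pass WikiMap::getCurrentWikiId() as $wikiId.

```php
<?php
// Hypothetical helper for illustration only.
function sketchRecordEdit( $statsFactory, string $wikiId ): void {
    $statsFactory->getCounter( 'example_edits_total' )
        // Per-wiki breakdown via an explicit label rather than a
        // per-database metric prefix.
        ->setLabel( 'wiki', $wikiId )
        ->increment();
}
```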
Developers
Local Testing in Docker
Add a statsd-exporter service to docker-compose.override.yml:
services:
  statsd-exporter:
    ports:
      - "9112:9112"
    image: docker.io/prom/statsd-exporter:v0.22.2
    command: "--web.listen-address=:9112"
Configure LocalSettings.php:
# statsd-exporter target config
$wgStatsFormat = 'dogstatsd';
$wgStatsTarget = 'udp://statsd-exporter:9125';
Scrape the metrics endpoint:
$ watch 'curl -s localhost:9112/metrics| grep mediawiki_'
Note that the production statsd-exporter configuration may differ from the default set and make the metrics render differently, especially timing metrics. Please refer to the production statsd-exporter configuration to get the most accurate Prometheus metrics representation.