Analytics/Data Quality/status
Last update on: 2014-03-monthly
2013-09-monthly
The Analytics team discovered that we were incorrectly tagging traffic as Wikipedia Zero, which led to overcounting W0 traffic by approx. 10%. A joint effort by the Wikipedia Zero, Ops and Analytics teams uncovered the root cause, and a fix was applied.
2013-10-monthly
We created dashboards for several Wikipedia Zero partners (Orange Madagascar, Banglalink, Umniah Jordan), and identified and fixed Wikipedia Zero data issues in collaboration with the Zero team.
2013-11-monthly
We identified issues with over-counting page views, and deployed a fix in November. Data from July onward were restated.
2013-12-monthly
The team continues to spend a large amount of time on data quality. The primary effort in December was isolating and fixing an error in Wikistats that inflated page views from July to December by a significant amount. The error was patched in early December and the statistics were recalculated. There were also issues with Wikipedia Zero traffic and an outage caused by a single point of failure in the legacy infrastructure.
2014-01-monthly
The team has spent an intense month analyzing data to explain the page view issues identified in December. The team's report was shared at the February metrics meeting.
2014-02-monthly
We've fixed the following production issues:
- Resolved the missing sampled-1000 TSV file for 2014-02-06 on stat1002;
- The Wikipedia Zero team investigated a ~30% increase in the number of lines in the zero TSVs between the 20140218 and 20140220 files;
- The Wikipedia Zero team investigated a slight drop in zero requests around 2014-02-08;
- Prepared data on ULSFO cache performance for an Ops blog post.
2014-03-monthly
We fixed a number of issues around data quality in Wikistats, Wikipedia Zero and Wikimetrics.