Some small comments and one big one:
- Were there any edits to the Barack Obama page between the two measurements of page HTML size? Did they add or remove bytes?
- Apart from the fact that the Ganglia graphs combine desktop and mobile traffic, there is also the fact that the majority(!) of our traffic on the image cache servers is not in service of Wikimedia page views but rather images which are hot-linked from other sites. [Edit: I should be more careful here; I am basing this on an informal remark, and have not verified it myself.]
- It does not make too much sense to combine "upper and 95th percentile". "upper" represents the single highest value processed by statsd for the aggregation interval in question.
- I'm not sure what 17 days of analysis means. Was the analysis done over the course of seventeen days, or after seventeen days had passed?
- It's very confusing to have negative decreases. We call those increases :) Might be better to rename the column to "percent change".
- Using Graphite data as the basis for analysis is sketchy. To understand why, consider the journey that a single sample makes from the user's browser to your plate:
- The browser measures the duration of some operation and sends it to our servers.
- The data is received by statsd, a metric aggregation service. Statsd does not pass it onward. Instead, it accumulates a minute's worth of samples, and then it sends the average, p50, p75, p99, etc. of that minute's worth of data to Graphite. It does this continuously.
- Graphite is designed to make efficient use of disk space by making the resolution of data coarser as the data ages. To see what I mean, take a look at these two graphs:
1) https://graphite.wikimedia.org/S/BO
2) https://graphite.wikimedia.org/S/BN
Each of these graphs shows first paint on mobile for anons over the course of an hour. The first graph shows the past sixty minutes (relative to the time you load the graph, not the time I write these words). The second graph shows 1:00 AM to 2:00 AM on 1 March 2016.
Notice how the line in the first graph moves up and down whereas the line in the second graph is flat. That is because as the data has aged, Graphite has reduced resolution by averaging multiple datapoints into coarser and coarser intervals: first 5 minutes, then 15 minutes, then one hour.
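To make the flattening concrete, here is a rough sketch of what averaging datapoints into coarser intervals does to a spiky series. The values are made up, and `roll_up` is a hypothetical helper that only mimics the averaging described above; it is not Graphite's actual code or retention configuration.

```python
def roll_up(points, factor):
    """Average each run of `factor` consecutive datapoints into one."""
    rolled = []
    for i in range(0, len(points), factor):
        chunk = points[i:i + factor]
        rolled.append(sum(chunk) / len(chunk))
    return rolled

# Made-up per-minute first paint values (ms) for one hour, with one spike.
per_minute = [900 + 50 * (i % 4) for i in range(60)]
per_minute[30] = 3000

five_min = roll_up(per_minute, 5)   # 12 datapoints
hourly = roll_up(per_minute, 60)    # 1 datapoint

print("1-minute range:", max(per_minute) - min(per_minute))
print("5-minute range:", max(five_min) - min(five_min))
print("hourly value:  ", hourly[0])
```

The spike that dominates the one-minute series is mostly gone after the five-minute rollup, and at one-hour resolution there is nothing left but a single number.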
As I understand it, the figures in the tables average the data from 13 days prior to the change and 11 days after. That means that the figures you are reporting are averages of averages of averages of averages, and medians of averages of medians of averages. It also means that if you re-do this analysis a week from now, using the exact same scripts to analyze the same interval of time, you will end up with very slightly different results (since the data will have been rolled up into coarser aggregation intervals).
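Here is a small sketch of why stacked aggregates drift away from the real thing. It uses synthetic samples (not our data) split into per-minute buckets with uneven traffic, and compares statistics computed over all samples with the same statistics computed over per-minute summaries. The bucket sizes and distribution are assumptions chosen to exaggerate the effect.

```python
import random
import statistics

random.seed(0)

# Synthetic samples: 60 one-minute buckets with uneven traffic.
buckets = []
for minute in range(60):
    n = 5 if minute < 30 else 200           # quiet half-hour, then busy half-hour
    centre = 2000 if minute < 30 else 800   # quiet minutes happen to be slower
    buckets.append([random.lognormvariate(0, 0.5) * centre for _ in range(n)])

all_samples = [s for bucket in buckets for s in bucket]

# Statistics over the full set of individual samples.
true_mean = statistics.mean(all_samples)
true_median = statistics.median(all_samples)

# Statistics over per-minute summaries: every minute counts equally,
# no matter how many samples it contained.
mean_of_means = statistics.mean([statistics.mean(b) for b in buckets])
mean_of_medians = statistics.mean([statistics.median(b) for b in buckets])

print(f"mean of all samples:       {true_mean:.0f}")
print(f"mean of per-minute means:  {mean_of_means:.0f}")
print(f"median of all samples:     {true_median:.0f}")
print(f"mean of per-minute medians:{mean_of_medians:.0f}")
```

The per-minute summaries give each minute equal weight, so the handful of slow samples from the quiet minutes pull the aggregate far above what the individual samples say.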
We tolerate the loss of precision at multiple layers of aggregation for a specific use-case -- to generate real-time graphs of operational data, so we can know which way the wind blows in production.
If you want to have descriptive statistics that you can define concisely and precisely, you can't start with Graphite. You have to start with the full set of individual samples. That is why we capture each sample in the EventLogging database.
I usually start by dumping the samples into a flat text file that I can download and analyze locally. I do that with short Python scripts, like this one, for example: https://gist.github.com/atdt/5e6d44fcb1c7795d52a8ace102f522e0
The shape of the data that you get by doing things this way ends up being very close to what you get from Graphite. It's close enough that I do sometimes use Graphite graphs in presentations, but not numbers with single-millisecond precision. Starting with the full set of samples is the only way to end up with statistical measures that have standard, agreed-upon definitions, which helps people understand what you're talking about, and makes the analysis reproducible.
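For illustration, this is roughly what the analysis looks like once you have a flat file with one sample per line. It is a minimal sketch, not the script from the gist: the file format, the inline sample values, and the helper names `load_samples` and `percentile` are all assumptions. The percentile uses the nearest-rank definition, which is one of several standard ones; whichever you pick, state it.

```python
import math
import statistics

def load_samples(lines):
    """Parse one numeric sample per line, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Stand-in for the lines of a downloaded file (hypothetical values, in ms).
lines = ["812\n", "640\n", "\n", "1530\n", "905\n", "4210\n",
         "770\n", "998\n", "860\n", "1120\n", "730\n"]
samples = load_samples(lines)

summary = {
    "n": len(samples),
    "mean": statistics.mean(samples),
    "p50": percentile(samples, 50),
    "p95": percentile(samples, 95),
}
print(summary)
```

Because every figure is computed directly from the individual samples, re-running this on the same file a week later gives the same numbers.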
I hope this makes some sense. Thank you for all the hard work you have put into this so far. It is not easy.