Lazy loading of references on Russian Wikipedia
On 1 September, lazy loading of references was disabled on Russian Wikipedia. Lazy loading of both images and references had been enabled there in July. Lazy loading of images, which had proven beneficial, remained enabled. Previously every view of references was routed via the API; now references are served in the page HTML.
The results suggest that very few users need references in a page view. In the week after the experiment ended, an additional 338 GB of page HTML was shipped to Russian Wikipedia mobile users, while bytes shipped via the API decreased by only 0.11 GB.
In the worst case (the 95th percentile), disabling the experiment increased fully loaded time by about 1.6 seconds (about 3.2 seconds over HTTP/1) and first paint by about 0.5 seconds.
What we noticed
Impact on performance
In progress [ToDo: Normalise sample size]
Fully loaded time, first paint and DOM interactive time were compared before and after the experiment was disabled, using the following query on the NavigationTiming EventLogging table:
select * from NavigationTiming_15485142 where wiki = 'ruwiki' and event_mobileMode = 'stable' and event_action ='view' and timestamp > 20160822000000 and timestamp < 20160909000000
Fully loaded
Label | Sample size | 95th percentile (ms) | Median (ms) |
---|---|---|---|
With lazy loaded references | 35582 | 14868.5 | 2692.0 |
Without lazy loaded references | 39506 | 16447.75 | 3067.0 |
With lazy loaded references (anons) | 35555 | 14899.3 | 2693.0 |
Without lazy loaded references (anons) | 39467 | 16450.0 | 3067.0 |
With lazy loaded references (http2) | 26872 | 11518.15 | 2359.0 |
Without lazy loaded references (http2) | 30016 | 13009.25 | 2697.0 |
With lazy loaded references (http1) | 8710 | 23583.05 | 4146.5 |
Without lazy loaded references (http1) | 9490 | 26780.95 | 4767.5 |
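The medians and 95th percentiles above summarise raw per-page-view timings. As a minimal sketch, assuming each row returned by the query has been reduced to a (timestamp, milliseconds) pair and that rows are split at 1 September 00:00 (the exact cut-over time is an assumption), the summary could be computed as below. Linear interpolation between ranks is also an assumption; the analysis does not say how its percentiles were calculated. The sample rows are made up.

```python
import math

# Assumed cut-over: rows before this timestamp count as "with lazy loaded
# references", later rows as "without".
CUTOVER = 20160901000000


def percentile(values, p):
    """Percentile (0 <= p <= 100) with linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("no samples")
    rank = (len(data) - 1) * p / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    fraction = rank - lower
    return data[lower] + (data[upper] - data[lower]) * fraction


def summarise(rows):
    """Split (timestamp, ms) rows at CUTOVER; report (sample size, p95, median)."""
    groups = {"with": [], "without": []}
    for timestamp, ms in rows:
        key = "with" if timestamp < CUTOVER else "without"
        groups[key].append(ms)
    return {
        label: (len(ms), percentile(ms, 95), percentile(ms, 50))
        for label, ms in groups.items()
    }


# Made-up sample rows, not real NavigationTiming data.
rows = [
    (20160825120000, 2100), (20160826120000, 2700), (20160827120000, 9000),
    (20160902120000, 2500), (20160903120000, 3100), (20160904120000, 11000),
]
print(summarise(rows))
```

With real data the two groups would also have different sizes, which is what the ToDo about normalising sample size refers to.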
First paint
Label | Sample size | 95th percentile (ms) | Median (ms) |
---|---|---|---|
With lazy loaded references | 20217 | 6927.6 | 1427.0 |
Without lazy loaded references | 22440 | 7469.15 | 1502.0 |
DomInteractive
Label | Sample size | 95th percentile (ms) | Median (ms) |
---|---|---|---|
With lazy loaded references | 35582 | 7181.0 | 1081.0 |
Without lazy loaded references | 39506 | 7580.5 | 1121.0 |
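The worst-case figures quoted at the top can be checked directly against these tables. This small sketch only copies the table values (in milliseconds) and computes the change from "with" to "without" lazy loaded references; like the tables, it is not normalised for the different sample sizes.

```python
# (with lazy loaded references, without lazy loaded references), in ms.
TABLES = {
    "fully loaded": {"p95": (14868.5, 16447.75), "median": (2692.0, 3067.0)},
    "fully loaded (http1)": {"p95": (23583.05, 26780.95), "median": (4146.5, 4767.5)},
    "fully loaded (http2)": {"p95": (11518.15, 13009.25), "median": (2359.0, 2697.0)},
    "first paint": {"p95": (6927.6, 7469.15), "median": (1427.0, 1502.0)},
    "dom interactive": {"p95": (7181.0, 7580.5), "median": (1081.0, 1121.0)},
}


def changes(tables):
    """Return {(metric, stat): (delta_ms, percent_change)} for with -> without."""
    result = {}
    for metric, stats in tables.items():
        for stat, (with_lazy, without_lazy) in stats.items():
            delta = without_lazy - with_lazy
            result[(metric, stat)] = (delta, 100 * delta / with_lazy)
    return result


for (metric, stat), (delta, pct) in changes(TABLES).items():
    print(f"{metric:22} {stat:6} {delta:+8.1f} ms ({pct:+.1f}%)")
```

Every statistic gets slower; the 95th-percentile fully loaded time rises by about 1579 ms overall and about 3198 ms over HTTP/1, and first paint by about 542 ms.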
Impact on bytes shipped
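The following Hive query was run once per day over all page views on the Russian mobile site (ru.m.wikipedia.org); $i is the day of the month, and month is 8 for the August days and 9 for the September days: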
hive -e "use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 8 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path rlike '^/wiki/([^:])+$' and content_type rlike '^text/html' and agent_type = 'user' and http_status = '200' group by month, day;"
Month | Day | Total bytes shipped | Total bytes shipped (GB) |
---|---|---|---|
8 | 23 | 195433081582 | 195.4330816 |
8 | 24 | 196452275069 | 196.4522751 |
8 | 25 | 194610708845 | 194.6107088 |
8 | 26 | 193316844255 | 193.3168443 |
8 | 27 | 204064575497 | 204.0645755 |
8 | 28 | 213554827616 | 213.5548276 |
8 | 29 | 193135615810 | 193.1356158 |
8 | 30 | 197245837879 | 197.2458379 |
8 | 31 | 184946052684 | 184.9460527 |
9 | 1 | 181258661113 | 181.2586611 |
9 | 2 | 218129458634 | 218.1294586 |
9 | 3 | 253029949643 | 253.0299496 |
9 | 4 | 271154670250 | 271.1546703 |
9 | 5 | 244388334543 | 244.3883345 |
9 | 6 | 247460404402 | 247.4604044 |
9 | 7 | 247948861959 | 247.948862 |
9 | 8 | 249034788519 | 249.0347885 |
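The uri_path condition in the query keeps only article views: `^/wiki/([^:])+$` matches `/wiki/` followed by one or more characters, none of which is a colon. This excludes namespaced pages such as `Special:` pages, and also any article whose title contains a colon. A quick check of the same pattern with Python's `re` module, whose syntax agrees with Hive's Java regular expressions for this pattern. The example paths are made up (the first is "Москва" percent-encoded).

```python
import re

# Same pattern as the uri_path condition in the Hive query.
ARTICLE_PATH = re.compile(r"^/wiki/([^:])+$")


def is_article_view(uri_path):
    """True if the path is /wiki/<title> and the title contains no colon."""
    return ARTICLE_PATH.search(uri_path) is not None


# Made-up example paths.
for path in [
    "/wiki/%D0%9C%D0%BE%D1%81%D0%BA%D0%B2%D0%B0",  # counted
    "/wiki/Special:Search",                        # excluded: colon
    "/w/api.php",                                  # excluded: not /wiki/
    "/wiki/",                                      # excluded: empty title
]:
    print(path, is_article_view(path))
```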
Week | Bytes shipped (GB) |
---|---|
24-30 August (with lazy loaded references) | 1392.380685 |
2-8 September (without lazy loaded references) | 1731.146468 |
Increase | 338.765783 |
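The weekly figures are plain sums of the daily rows, converted with 1 GB = 10^9 bytes as in the daily table. Note that 31 August and 1 September, around the switch-over, fall in neither week. A sketch reproducing the totals from the daily byte counts:

```python
# Daily page HTML bytes from the table above, keyed by (month, day).
DAILY_BYTES = {
    (8, 24): 196452275069, (8, 25): 194610708845, (8, 26): 193316844255,
    (8, 27): 204064575497, (8, 28): 213554827616, (8, 29): 193135615810,
    (8, 30): 197245837879,
    (9, 2): 218129458634, (9, 3): 253029949643, (9, 4): 271154670250,
    (9, 5): 244388334543, (9, 6): 247460404402, (9, 7): 247948861959,
    (9, 8): 249034788519,
}


def weekly_total_gb(daily, month, first_day, last_day):
    """Sum bytes for month/first_day..last_day inclusive, in decimal GB."""
    total = sum(daily[(month, day)] for day in range(first_day, last_day + 1))
    return total / 1e9


with_lazy = weekly_total_gb(DAILY_BYTES, 8, 24, 30)
without_lazy = weekly_total_gb(DAILY_BYTES, 9, 2, 8)
print(f"{with_lazy:.6f} GB, {without_lazy:.6f} GB, "
      f"increase {without_lazy - with_lazy:.6f} GB")
# -> 1392.380685 GB, 1731.146468 GB, increase 338.765783 GB
```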
We also had to consider the load that lazy loading places on the API, which served every view of references while the experiment ran. Bytes shipped by the API for references requests before and after the change were measured using the following queries:
# August 23-31
for i in $(seq 23 31); do
  hive -e "use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 8 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path like '%api.php%' and uri_query like '%action=mobileview%sections=references%' and http_status = '200' group by month, day;" > ru-8-$i.tsv
done
# September 1-9
for i in $(seq 1 9); do
  hive -e "use wmf; select month, day, sum(response_size) from webrequest where year = 2016 and month = 9 and day = $i and uri_host = 'ru.m.wikipedia.org' and uri_path like '%api.php%' and uri_query like '%action=mobileview%sections=references%' and http_status = '200' group by month, day;" > ru-9-$i.tsv
done
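The API queries select requests with SQL `LIKE` patterns, where `%` matches any run of characters and `_` any single character. The sketch below translates each pattern into a Python regular expression to show which requests are counted. The request URIs are hypothetical examples, not taken from the logs. One consequence of the pattern is that a query string listing `sections=references` before `action=mobileview` is not counted.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern (% and _ wildcards, no escapes) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


PATH_LIKE = like_to_regex("%api.php%")
QUERY_LIKE = like_to_regex("%action=mobileview%sections=references%")


def is_references_api_request(uri_path, uri_query):
    """True if both LIKE conditions from the Hive query match (whole string)."""
    return bool(PATH_LIKE.fullmatch(uri_path) and QUERY_LIKE.fullmatch(uri_query))


# Hypothetical requests.
print(is_references_api_request(
    "/w/api.php", "?action=mobileview&format=json&page=X&sections=references"))
print(is_references_api_request(
    "/w/api.php", "?sections=references&action=mobileview"))  # wrong order
```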
Month | Day | API bytes shipped | MB | GB |
---|---|---|---|---|
8 | 23 | 1015320157 | 1015.320157 | 1.015320157 |
8 | 24 | 1003732951 | 1003.732951 | 1.003732951 |
8 | 25 | 991807250 | 991.80725 | 0.99180725 |
8 | 26 | 998057985 | 998.057985 | 0.998057985 |
8 | 27 | 1042861398 | 1042.861398 | 1.042861398 |
8 | 28 | 1124165529 | 1124.165529 | 1.124165529 |
8 | 29 | 1014439791 | 1014.439791 | 1.014439791 |
8 | 30 | 1067402736 | 1067.402736 | 1.067402736 |
8 | 31 | 955704418 | 955.704418 | 0.955704418 |
9 | 1 | 1012481681 | 1012.481681 | 1.012481681 |
9 | 2 | 1008896513 | 1008.896513 | 1.008896513 |
9 | 3 | 984832141 | 984.832141 | 0.984832141 |
9 | 4 | 982142387 | 982.142387 | 0.982142387 |
9 | 5 | 954347415 | 954.347415 | 0.954347415 |
9 | 6 | 1046599717 | 1046.599717 | 1.046599717 |
9 | 7 | 1142749069 | 1142.749069 | 1.142749069 |
9 | 8 | 1011144808 | 1011.144808 | 1.011144808 |
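Each loop iteration writes one file such as ru-8-23.tsv. A sketch of collecting those files into the daily rows of this table, assuming each file holds a tab-separated month, day and sum line (Hive's default output, without a header row). It converts with 1 MB = 10^6 and 1 GB = 10^9 bytes, as the table does. The function names are hypothetical; the sample line is the 23 August row.

```python
from pathlib import Path


def parse_daily_row(line):
    """Parse one 'month<TAB>day<TAB>bytes' line from hive -e output."""
    month, day, total = line.strip().split("\t")
    return int(month), int(day), int(total)


def load_daily_rows(directory="."):
    """Read every ru-*.tsv file in directory; return rows sorted by (month, day)."""
    rows = []
    for path in Path(directory).glob("ru-*.tsv"):
        for line in path.read_text().splitlines():
            if line.strip():  # skip blank lines
                rows.append(parse_daily_row(line))
    return sorted(rows)


def table_row(month, day, total):
    """Format a row like the table: bytes, MB (10^6) and GB (10^9)."""
    return f"{month} | {day} | {total} | {total / 1e6} | {total / 1e9} |"


print(table_row(*parse_daily_row("8\t23\t1015320157\n")))
# -> 8 | 23 | 1015320157 | 1015.320157 | 1.015320157 |
```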
Week | API bytes shipped (GB) |
---|---|
24-30 August (with lazy loaded references) | 7.24246764 |
2-8 September (without lazy loaded references) | 7.13071205 |
Increase | -0.11175559 |
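A quick check of the headline claim, using only the two weekly totals: the extra page HTML far outweighs the small drop in API traffic for references.

```python
HTML_INCREASE_GB = 338.765783  # page HTML, week without vs. week with lazy loading
API_CHANGE_GB = -0.11175559    # references API over the same weeks (negative = decrease)

net_change_gb = HTML_INCREASE_GB + API_CHANGE_GB
ratio = HTML_INCREASE_GB / abs(API_CHANGE_GB)
print(f"net change: {net_change_gb:.2f} GB; "
      f"HTML increase is ~{ratio:.0f}x the API decrease")
# -> net change: 338.65 GB; HTML increase is ~3031x the API decrease
```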
Questions and answers
Q: Can we attribute this change to lazy loaded references?
A: No changes that we know of rolled out during this period that we'd expect to cause such a large increase in HTML bytes shipped.
As we considered in the analysis for lazy loading images, it's possible that Russian Wikipedia editors made a lot of edits that week that changed the size of pages, but it would take a lot of editing to account for such a large number of bytes!
One theory might be that a change in traffic during this period contributed to the increase in bytes shipped, but the graph shows this is not the case.
An additional 338 GB of page HTML shipped on mobile is significant. Had a core change caused it, desktop traffic would have been affected as well and ops would have noticed, so it's safe to say that no core change caused this. We could analyse desktop traffic for the same period if we lack that confidence, but I feel it would be a lot of effort for little gain.