@ABaso (WMF) is there a reason you did not randomly sample from the Before and Post groups, so that both samples end up with the same number of data points, prior to computing the 95th percentiles? Also, did you do a hypothesis test to see whether the Before and Post differences are statistically significant?
Topic on Talk:Reading/Web/Lazy loading of images on Japanese Wikipedia
The analysis we've done so far wasn't very scientific. The numbers reported are simply counts of events before and after roughly the time of the change. I used about 3 weeks before and 1 week after. This preliminary analysis was just to check whether the numbers were going down or up.
Following up on this: as discussed in our earlier face-to-face conversation, the NavigationTiming data were already sampled. In the updated analysis, however, the Kolmogorov-Smirnov significance test was applied to the two groups, as recommended, and the null hypothesis was rejected.
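For anyone who wants to see what the comparison discussed in this thread looks like in code, here is a minimal sketch. It is not the analysis that was actually run. The `before` and `post` lists are synthetic placeholder timings, and `percentile` and `ks_two_sample` are hypothetical helpers written for illustration; the p-value uses the asymptotic Kolmogorov approximation, which is only reasonable for large samples. In practice `scipy.stats.ks_2samp` would be the usual choice.

```python
import bisect
import math
import random


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def ks_two_sample(a, b):
    """Two-sample KS statistic D and an asymptotic p-value."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # D is the largest gap between the two empirical CDFs; the gap can
    # only change at an observed value, so checking those is enough.
    d = max(
        abs(bisect.bisect_right(a, x) / n - bisect.bisect_right(b, x) / m)
        for x in a + b
    )
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.05:
        # The series below converges slowly here; the true value is ~1.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        for k in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


# Synthetic placeholder data (NOT real NavigationTiming measurements):
# a larger "Before" group and a smaller "Post" group, in milliseconds.
random.seed(0)
before = [random.lognormvariate(7.0, 0.5) for _ in range(3000)]
post = [random.lognormvariate(6.9, 0.5) for _ in range(1000)]

# Randomly subsample the larger group so both samples have equal size.
size = min(len(before), len(post))
before_sample = random.sample(before, size)
post_sample = random.sample(post, size)

p95_before = percentile(before_sample, 95)
p95_post = percentile(post_sample, 95)
d_stat, p_value = ks_two_sample(before_sample, post_sample)

print(f"p95 before: {p95_before:.0f} ms, p95 post: {p95_post:.0f} ms")
print(f"KS statistic: {d_stat:.4f}, asymptotic p-value: {p_value:.4g}")
```

Note that the KS test compares the whole distributions, so rejecting the null hypothesis says the two groups differ somewhere, not specifically that their 95th percentiles differ.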