Universal Language Selector/Compact Language Links/metrics
This page describes the scripts and queries for collecting data about Interlanguage links, especially in the context of Compact Language Links. For the collected data, see the page Universal Language Selector/Compact Language Links/metrics/data.
Measuring interlanguage links clicks
[edit]For getting information one day before, run:
$ ssh stat1002
$ ./all_projects.sh
For a particular date, run (note: no leading zeros in the date):
$ ./all_projects.sh 2016 7 7
The output will be in the files with names such as prev_wikipedia_2016_7_7.out and curr_wikipedia_2016_7_7.out, where "prev" is for the source language and "curr" is for target language.
Copy the results from the .out file to a text editor, and then copy them from the text editor to the relevant column in the spreadsheet.
Pageviews data
[edit]Updating daily data
[edit]THIS IS ON THE WAY TO DEPRECATION, AND MOVING TO QUERIES, SEE BELOW
Get the pageviews data for the projects that are measured:
- Wikipedia 1: https://tools.wmflabs.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-10&sites=af.wikipedia.org%7Cals.wikipedia.org%7Cbn.wikipedia.org%7Cca.wikipedia.org%7Ccs.wikipedia.org%7Cda.wikipedia.org%7Cde.wikipedia.org%7Ceo.wikipedia.org
- Wikipedia 2: https://tools.wmflabs.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-10&sites=es.wikipedia.org%7Cfr.wikipedia.org%7Che.wikipedia.org%7Chi.wikipedia.org%7Chr.wikipedia.org%7Cit.wikipedia.org%7Clv.wikipedia.org%7Cpt.wikipedia.org
- Wikipedia 3: https://tools.wmflabs.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-10&sites=se.wikipedia.org%7Csq.wikipedia.org%7Cta.wikipedia.org%7Czh-yue.wikipedia.org%7Czh.wikipedia.org%7Czu.wikipedia.org
- Wikivoyage: https://tools.wmflabs.org/siteviews/?platform=desktop&source=pageviews&agent=user&range=latest-10&sites=de.wikivoyage.org%7Cen.wikivoyage.org%7Ces.wikivoyage.org%7Cfr.wikivoyage.org%7Cit.wikivoyage.org%7Cru.wikivoyage.org
Copy the results to the relevant cells in the spreadsheet.
Adding a new language for calculating percentage of clicks out of views
[edit]- Find the right place, alphabetically, for the language to appear.
- Select all the rows for the language before which you want to add another one, including an empty row after it.
- Copy (Ctrl+C/⌘ Command+C).
- Right-click on the row numbers and click "Add rows above: 7".
- Select the added rows. Paste (Ctrl+P/⌘ Command+P).
- Change the language code in the first pasted row.
- Go to https://tools.wmflabs.org/siteviews/
- Select a range of initial dates that you want to backfill.
- Type the project name. The need chart will be shown.
- Click the CSV button and download the file. Open the CSV file in your desktop spreadsheet software.
- In your desktop spreadsheet software, select the numbers for the relevant dates and Copy (Ctrl+C/⌘ Command+C).
- Go back to the Google spreadsheet. Paste (Ctrl+P/⌘ Command+P) in some empty space and again Copy (Ctrl+C/⌘ Command+C).
- Right-click on the first cell in the "views" row for the new language. Paste special -> Transpose.
- Update the URLs for pageviews stats in the section on pageviews above.
Query for getting pageviews
[edit]From Phab T142505
SELECT
year,
month,
day,
CONCAT(year, "-", LPAD(month, 2, "0"), "-", LPAD(day, 2, "0")) AS date,
SUM(view_count) AS views
FROM
wmf.pageview_hourly
WHERE
year = 2016 AND
month = 8 AND
day = 16 AND
project = 'ru.wikipedia' AND
agent_type = 'user' AND
access_method = 'desktop' AND
NOT ( -- See https://phabricator.wikimedia.org/T141506
user_agent_map['browser_family'] = 'Chrome' AND
user_agent_map['browser_major'] = '41' AND
user_agent_map['os_family'] LIKE 'Windows%' AND
page_title = 'Заглавная_страница'
)
GROUP BY
year,
month,
day
ORDER BY
year,
month,
day
LIMIT 1000;
The clicks query
[edit]These queries run from the all_projects.sh script. They are here only for reference.
-- Assuming that your username is "kartik"...
-- Create your database (this only needs to be done once!)
CREATE DATABASE kartik;
-- Go to your database.
use kartik;
-- Create table of cross wiki navigation events.
-- Modify time parameters for your needs!
-- This takes around 5 minutes per day.
DROP TABLE IF EXISTS cross_wiki_navigation;
CREATE TABLE cross_wiki_navigation AS
SELECT year,
month,
day,
prev,
curr,
COUNT(*) AS n
FROM (SELECT day,
month,
year,
REGEXP_EXTRACT(parse_url(referer,'HOST'), '^([a-z-]{2,}).wikipedia.org', 1) AS prev,
normalized_host.project AS curr
FROM wmf.webrequest
-- select a relevant timespan to query over
WHERE year = ${hiveconf:yesterday_year}
AND month = ${hiveconf:yesterday_month}
AND day = ${hiveconf:yesterday_day}
AND webrequest_source = 'text'
AND is_pageview
-- only consider wikipedia article requests from users
AND agent_type = 'user'
AND normalized_host.project_class = 'wikipedia'
-- only consider wikipedia article referers (this is an approximation)
AND parse_url(referer,'HOST') LIKE '%wikipedia.org'
-- exclude .m.wikipedia.org and .zero.wikipedia.org
AND parse_url(referer,'HOST') NOT LIKE '%.%.wikipedia.org'
AND parse_url(referer,'PATH') LIKE '/wiki/%'
) log_with_parsed_referer
WHERE prev != curr
GROUP BY
year,
month,
day,
prev,
curr
;
-- Interlanguage links clicks by source language.
-- This takes about a minute per day.
SELECT
prev,
sum(n)
FROM
cross_wiki_navigation
GROUP BY
prev
ORDER BY
prev
LIMIT
1000;
-- Interlanguage links clicks by target language.
-- This takes about a minute per day.
SELECT
curr,
sum(n)
FROM
cross_wiki_navigation
GROUP BY
curr
ORDER BY
curr
LIMIT
1000;