Talk:Analytics

Only Wikipedia?

From mail:analytics: "A mailing list for the Analytics Team at WMF and everybody who has an interest in Wikipedia and analytics. If you enjoy counting stuff on / about Wikipedia and you have a question then please sign up and the Analytics Team will try to help you as best as it can". Only Wikipedia? Research and data questions usually go to mail:wiki-research-l, and I haven't seen them on this list yet; isn't the list rather about WMF infrastructure for analytics and research? --Nemo 09:38, 10 January 2013 (UTC)

It's mostly about infrastructure and software for analytics, but I would like to be inclusive when it comes to the topics. Drdee (talk) 20:24, 1 October 2013 (UTC)

Question

I was wondering whether it would be hard to get a table with all the languages that these 80 articles (en:Wikipedia:WikiProject_Medicine/Translation_task_force/Popular_pages) are written in, together with the current page views per day. Are there tutorials available for non-software-engineers on how to do this? --Tobias1984 (talk) 20:26, 9 July 2013 (UTC)

Or would it be possible to track all the articles that use a WikiProject's template? There would be some interest from WikiProject Medicine in tracking all their tagged articles and also seeing how many views they get in different languages. --Tobias1984 (talk) 21:00, 23 July 2013 (UTC)
Hi, Tobias1984,
Assuming that you don't need up-to-the-minute numbers, I believe that both of these lists could be produced with an occasional database report. That means that you want to ask someone like User:MZMcBride or User:Ironholds for help. They'll both know people who can do this and might be interested. If you want it to run regularly, then User:Mr.Z-man (who runs the bot that updates the popular pages reports) would be my suggestion for the first person to contact.
Your idea about having a tutorial for normal people (like you and me) to learn how to do this is also a good one. We'd need someone to create an instructional page. User:Ocaasi is good at explaining things, but I don't know if he knows how to do this either. Do you think that a written instruction page would be best, or a video (like a recording of someone teaching a "class" on this)?
BTW, I'm likely to see your ideas much sooner if you post them at w:en:WT:MED.  ;-) Whatamidoing (WMF) (talk) 19:02, 18 August 2014 (UTC)
It's a little more complicated than a simple database report (since page view statistics aren't in the database). In essence, to put together such a list you would need to do the following steps:
  1. Get a list of all interwiki links for those pages.
    This can be pretty easily accomplished with SQL; see http://quarry.wmflabs.org/query/263
    The query I used is as follows:
SELECT p2.page_title as "English article", ll_lang as "Lang code", ll_title as "Translated name"
FROM categorylinks
inner join page p1 on cl_from = p1.page_id and p1.page_namespace = 1
inner join page p2 on p2.page_title = p1.page_title and p2.page_namespace = 0 and p1.page_namespace = 1
left outer join langlinks on p2.page_id = ll_from
WHERE cl_to = 'WikiProject_Medicine_Translation_Task_Force_articles';
    If you're looking for an introduction to writing SQL queries, this is probably not the best first example, but I'll try to explain what it does. First of all, the second line (FROM categorylinks) says we want to get things in a certain category, and the last line (WHERE) says which category. This gives a list of page ids of things in w:category:WikiProject_Medicine_Translation_Task_Force_articles. Line 3 converts these page ids (cl_from is the page id a particular category entry is from) into normal page names. However, they're all in the talk namespace (because everything in that category is in the talk namespace); the and p1.page_namespace = 1 part discards any non-talk-namespace entries in that category as a precaution. Line 4 takes the page names we found in line 3 and finds what the page_id is for those same pages but in the main namespace. We've now converted the page ids we got from the category in line 6 into page ids for the relevant article-namespace pages. Finally, line 5 takes every page we got from line 3 and looks up all the interlanguage links on those pages. The left outer join part means a dummy entry is included if there are no interlanguage links for a particular page (compared to inner join, which would discard anything that doesn't have any corresponding entries). Lastly, line 1 says to take all the results and format them into 3 columns. The results of this query can be viewed at [1]
    Since we probably want to use this list to do further work (instead of just looking at it), it helps to have it in an easier format to work with. A good format is tab-separated values (a list of rows, with the items in a row separated by tabs and the rows separated by newlines). You can get the data in that format quite easily if you have access to Tool Labs; you would just run the command:
    echo 'select p2.page_title as "English article", ll_lang as "Lang code", ll_title "Translated name" from categorylinks inner join page p1 on cl_from = p1.page_id and p1.page_namespace = 1 inner join page p2 on p2.page_title = p1.page_title and p2.page_namespace = 0 and p1.page_namespace = 1 left outer join langlinks on p2.page_id = ll_from where cl_to = "WikiProject_Medicine_Translation_Task_Force_articles" order by p2.page_title, ll_lang;' | sql enwiki_p > WikiMedTranslationArticles.txt
    
    Note: compared to the earlier query, I changed some single quotes to double quotes and added an order by clause. The resulting tab-separated value file can be viewed at https://tools.wmflabs.org/bawolff/WikiMedTranslationArticles.txt
  2. We need to take the articles listed in this file, and find out how many views each has. This is the harder part.
    The raw page view data is available at http://dumps.wikimedia.org/other/pagecounts-raw/ . However, using it involves a few processing steps: redirects are not taken into account, and each file covers only an hour's worth of data. Alternatively, you could probably use the grok.se stats.
    If you were using the grok.se stats, what you would do is the following (a rough Python sketch of these steps is included after this outline):
    1. Go over each line of https://tools.wmflabs.org/bawolff/WikiMedTranslationArticles.txt (typically using a for loop in the computer language of your choice). The first line is Abortion, an, Alborto (the English title, language code, and translated title, separated by tabs). Don't forget to make your computer sleep between lines so as not to be too hard on the grok.se server.
    2. You'd take the last 2 columns and form a URL like http://stats.grok.se/json/an/latest30/Alborto (remembering to properly URL-encode the page name; this is especially important for languages that use non-Latin letters).
    3. Take the results. These are formatted as JSON, and almost all computer languages have libraries to deal with this format. You may want to combine the entries to get a total for a month. Depending on your use case, you may also want to replace the latest30 in the URL you are fetching with latest60, or with a month in YYYYMM form (e.g. http://stats.grok.se/json/an/201407/Alborto ).
  3. Last of all, you have to take all the results and format them into wikitext to insert into the page (or find some other way to display them); the second sketch below shows one way to do that. Ideally you would want to make a bot that did this automatically on a regular basis.
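
To make step 2 a bit more concrete, here is a very rough Python sketch of that loop. A few assumptions on my part: the downloaded file has no header line, articles with no interlanguage links show up with NULL in the language columns (because of the left outer join), and the grok.se JSON reply has a "daily_views" object mapping dates to counts. Check one of the URLs in a browser before trusting any of that.

#!/usr/bin/env python3
# Very rough sketch of step 2 above. Assumed (not checked): no header line in
# the file, NULL in the language columns for articles without interlanguage
# links, and a "daily_views" object in the grok.se JSON reply.
import json
import time
from urllib.parse import quote
from urllib.request import urlopen

def latest30_views(lang, title):
    """Sum the last 30 days of view counts for one article on one wiki."""
    # URL-encode the page name; this matters most for non-Latin scripts.
    url = "http://stats.grok.se/json/%s/latest30/%s" % (lang, quote(title))
    with urlopen(url) as response:
        data = json.load(response)
    return sum(data.get("daily_views", {}).values())

views = {}  # (English title, lang code) -> views over the last 30 days
with open("WikiMedTranslationArticles.txt", encoding="utf-8") as f:
    for line in f:
        english, lang, translated = line.rstrip("\n").split("\t")
        if lang == "NULL":   # dummy row: article has no interlanguage links
            continue
        views[(english, lang)] = latest30_views(lang, translated)
        time.sleep(1)        # be gentle with the grok.se server

And a similarly rough sketch of step 3. The table layout is just an illustration, not the format the existing popular pages reports use; a real bot would save the result to a wiki page on a schedule instead of printing it.

# Rough sketch of step 3: turn the totals from the sketch above into a
# sortable wikitable. views is the dict built above, mapping
# (English title, lang code) to a view count.
def to_wikitable(views):
    lines = ['{| class="wikitable sortable"',
             '! English article !! Language !! Views (last 30 days)']
    for (english, lang), total in sorted(views.items()):
        lines.append("|-")
        lines.append("| [[:en:%s]] || %s || %d"
                     % (english.replace("_", " "), lang, total))
    lines.append("|}")
    return "\n".join(lines)

# For example: print(to_wikitable(views))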

So that's a rough outline of what you would need to do if you wanted to do it yourself (the last half is particularly rough, as I started to rush). It probably still isn't that useful unless you already know how to program. Anyway, I hope this was at least somewhat informative, or at least better than nothing. I'd be happy to try to explain any confusing parts better. Bawolff (talk) 22:45, 18 August 2014 (UTC)

Another question

This page is infrequently updated. Does that mean that this project is in limbo? Or just that the team are volunteers and haven't had time to work on it? 69.125.134.86 22:57, 19 August 2013 (UTC)

It means that I am not receiving alerts :(. Please come talk to us on IRC in #wikimedia-analytics. Drdee (talk) 20:24, 1 October 2013 (UTC)

Changes

I've made some fairly wholesale changes to the site in preparation for our quarterly review later this month.

All of the engineering resources have been moved to https://www.mediawiki.org/wiki/Analytics/Engineering

— Preceding unsigned comment added by TNegrin (WMF) (talk • contribs) 01:19, 2 October 2013 (UTC)

New Report Card - request for earlier data

Hey all,

The New Report Card is really great. Is there any chance you can get it to show data all the way back to the beginning rather than just back to 2011? I'm interested in looking at our whole history of active editor numbers, and would also like to be able to compare active, medium-active and very-active editor levels.

Thanks, — Scott • talk 13:16, 27 October 2013 (UTC)

Is there anything that stats: doesn't have? That's the main site and has all the important data. --Nemo 20:33, 27 October 2013 (UTC)