Hope this is the right section... My issue is that the current metrics are based on Wikipedia, while other projects (e.g. Wikisource) have a different architecture. For example, Wikisource "should" have edits at the "book level", which is unfortunately impossible with the current software architecture. But a feasible hack could be summing up the edits of the different subpages and attributing them to the main page. This is doable both in the main namespace and in the Index/Page namespaces. This alone would be a major improvement for the Wikisource community.
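As a rough illustration of the rollup idea above, here is a minimal Python sketch. It assumes edit counts are already available per page title and that a subpage is any title containing "/"; the function names and the sample titles are made up for illustration and are not part of Wikistats.

```python
from collections import Counter


def root_title(title: str) -> str:
    """Return the part of a title before the first '/', i.e. the parent page."""
    return title.split("/", 1)[0]


def rollup_edits(edits_per_page: dict[str, int]) -> Counter:
    """Sum the edit counts of subpages into their root page."""
    totals: Counter = Counter()
    for title, edits in edits_per_page.items():
        totals[root_title(title)] += edits
    return totals


# Hypothetical sample data: one ns0 book with two chapters,
# plus two scans of it in the Page namespace.
sample = {
    "Moby-Dick": 3,
    "Moby-Dick/Chapter 1": 10,
    "Moby-Dick/Chapter 2": 7,
    "Page:Moby-Dick.djvu/11": 4,
    "Page:Moby-Dick.djvu/12": 5,
}
book_totals = rollup_edits(sample)
```

The same grouping covers the Page namespace, since "Page:Moby-Dick.djvu/12" rolls up to "Page:Moby-Dick.djvu". Titles that legitimately contain a slash would be merged by mistake, so a real implementation would probably need to know which namespaces have subpages enabled.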
Talk:Wikistats 2.0 Design Project/RequestforFeedback/Round1/Site dashboard
After following up with @Aubrey, here are some details:
Wikisource has this "book" entity, which is a logical entity but is not baked into the software; so every book often has a set of main namespace pages, one or more Index pages, and several Page pages. The simple, more common model: 1 Index, n Page pages, 1 ns0 page, m subpages. Good to have: stats relative to the 1 ns0 page and its m subpages, and possibly stats about the 1 Index page and its n Page pages. The ns0 pages are for "readers" to read: pure text. The Index and Page pages are for "editors" to proofread and validate; this is the main difference between WP and WS. We have "two" places: one for readers, one for editors. Of course there are many corner cases, but this is the main structure.
Also, I may add that a snapshot of how much *all books* are read would be informative. For example, a graph that shows the very likely power law of Wikisource books could inform editors to give priority to certain books/topics/subjects/authors over others. Please bear in mind that Wikisource is hundreds of times smaller than Wikipedia, so many things are possible for us that are not possible (due to complexity and "cost") for Wikipedia. For Wikisource, it would not be impossible to have charts/graphs regarding *all books/all users/all authors*, one by one. That would be overkill for Wikipedia. It's just a reminder that the small scale of the project works in our favor, I think.
It's really important to us to get metrics that work for all projects, so thank you for the insight. I think I understand what you mean about sub-page edits belonging to their main page, but does that influence the kinds of metrics we're looking at? I figure numbers like "active editors" would be the same, while numbers like "pages created" might be different based on how we count (just the main pages or all subpages). If you have time (no pressure), we would love to have a chat and find out more about which specific metrics would be useful for Wikisource and which are wrong or could be adjusted. Thanks again either way.
In the main ns we structure works as sequential subpages, so they basically link to each other; as mentioned, there is a relationship to a book. So, thinking off the top of my head ...
If we are talking about a novel, a reader will/should read these in chapter order, presumably from start to end.
- We don't know how many people start a work and wind their way through to the end, or read a bit and then stop (so are all subpages read, or how far are people progressing)
- We don't know whether a search for a work throws people to a subpage, and they work their way back to the top. (so what is the landing point for a work)
- We don't know how people arrive to read literary works, external search, internal search, or links
- We don't know whether, for literary works, we need a search that presents top level pages only, or whether we should also present chapters (chapters that are named creatively Chap. 1, 2, ...n)
- Counts matter for the work
- Long pages versus short pages, does a long chapter scare them away?
- Collectively does poetry get read, or is it our novels?
For a biographical/encyclopaedic work it is more likely that they will dip in and out, often back to the main page before delving down, looking for something
- We don't know whether our biographical works are entered from a search engine externally, or internally, or from Wikipedia links; or from a biographical entry, or from the root of a work.
- For something like the 63 volume DNB with its thousands of biographical works (not following a normal naming pattern), how many of these are read, so collection data
For official works, we can guess that the US records are most used, but who knows. If we categorised these works differently, i.e. used Wikidata cross-referencing, what can it tell us about what our visitors want to read, or what they read? Does WDering assist our analytical skills?
How effective is our namespace structure? Are pages found, from where, and are links followed?
- author pages (lots of curatorial work)
- portals (some curatorial work)
- categories (where we don't do much work)
Tell me the % of pages visited per namespace, or maybe dwell time.
Can you tell us which works are read on mobile, versus which are read on desktop? Or are they all read on mobile, and we need to rethink our presentation componentry? Do visitors just arrive via the main page?
Lots of questions for which there haven't been answers. I could not tell you that I know anything about how a visitor arrives and travels through our sites.
Aubrey raises some good points here. A "book's" edit (and viewership) count could be the summation of all edits done on all of its pages (including subpages) in main, Index, and Page namespaces. That's probably quite hard, and certainly different from other wikis (although, similar to Wikibooks in some ways).
At the very least, is it possible to customize things like where the metrics refer to 'articles'? Because Wikisource has 'books' as the most interesting unit of how-many-things-do-we-have, and doesn't really have 'articles' at all, or if it does then other namespaces should be included in that count.
We could bring this up at the next Wikisource hangout.
Thanks @Samwilson, we will see what we can do about making a project-dependent notion of "article". I guess it would have to have a customizable definition and name, and in Wikisource's case refer to the top level book. Is this exactly the same in Wikibooks? I guess I should spend some time getting more familiar with all our projects.
Yes, I think you could probably, with reasonable accuracy, not worry about particular projects, but just work with the concept that subpages can be considered "part of" their parent page and included as such in some sorts of metrics. That's a concept that's tied to MediaWiki design, and not to particular communities (if you see what I mean?). :-)
But most of all, thank you for working on this stuff! It's brilliant.
Hi Millimetric, I don't have any problem with a chat, the only issue is how and when ;-) Where I live the connection is not excellent, but a Hangout/Skype call is probably doable.
I don't know where the right place is to discuss the stats for the number of articles (NS0) and their subpages. The subpage structure depends on the book structure. Some Wikisources have used one subpage per word for a 60,000-word dictionary. So subpages should not be counted in the total number of articles. They should be counted and presented nested under their main page. And what Wikisource needs most is proofreading stats (Validated, Proofread, No text and Problematic) for each user.
@Aubrey also pointed me to https://tools.wmflabs.org/phetools/statistics.php?diff=1 and https://tools.wmflabs.org/phetools/stats.html which look like great tools for us to understand how people need their stats. Thanks to you both!
I would say that phetools is an internal reflective piece of data, and maybe a little bit of interwiki tease.
@Jayantanth, see the discussion just above, Aubrey had the same thought. From the technical point of view, this can be tricky, but I promise to do my best :)
For comparison: a higher aggregation level also exists on wikibooks (probably different). I built a set of reports to cover that aspect of Wikibooks. But I have to say few ever used it or at least I received hardly any feedback. https://stats.wikimedia.org/wikibooks/EN/WikiBookIndex.htm Maybe I didn't present the things people cared about. Or maybe people didn't know it was there.
That's pretty cool! :) Will it still be updated? I guess it's harder to do page-views of each book. (Or did I mis-read the page you linked?).
Page views per book could be done, but it would require custom code invoking the page view API. And sorry, I have no intent to maintain that code. I only gave the link to serve as an example of what per book metrics could entail. Right now the large index of chapters colors each chapter title based on the amount of text in the chapter.
Another stat that would be cool: number of active users per day. I can't stress enough the fact that for sister projects we can think of stats that are maybe meaningless or too expensive for Wikipedia. Also, we love to have graphs/charts, to better understand the picture.
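For what it is worth, the custom code mentioned here could look roughly like the Python sketch below. It assumes the `per-article` endpoint of the public Wikimedia Pageviews REST API, a list of subpage titles obtained elsewhere, and a caller-supplied `fetch_json` function that returns the decoded response; `article_url`, `book_pageviews` and `fetch_json` are illustrative names, not existing Wikistats code.

```python
from typing import Callable, Iterable
from urllib.parse import quote

# Per-article endpoint of the Wikimedia Pageviews REST API.
API = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
       "{project}/all-access/user/{article}/daily/{start}/{end}")


def article_url(project: str, title: str, start: str, end: str) -> str:
    """Build the request URL for one page; dates are YYYYMMDD strings."""
    # Spaces become underscores; everything else unsafe (including '/') is escaped.
    article = quote(title.replace(" ", "_"), safe="")
    return API.format(project=project, article=article, start=start, end=end)


def book_pageviews(project: str, titles: Iterable[str], start: str, end: str,
                   fetch_json: Callable[[str], dict]) -> int:
    """Sum the daily views of a book's main page and subpages."""
    total = 0
    for title in titles:
        data = fetch_json(article_url(project, title, start, end))
        total += sum(item["views"] for item in data.get("items", []))
    return total
```

Passing the fetcher in keeps the sketch independent of any HTTP library; in practice it would be a thin wrapper that sends a descriptive User-Agent and handles pages with no data for the period.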
If you just want the pageviews of the current page and subpages, you could try Massviews with "Subpages" set as the "Source". E.g. http://tools.wmflabs.org/massviews/?platform=all-access&agent=user&source=subpages&target=https%3A%2F%2Fen.wikibooks.org%2Fwiki%2FMuggles%2527_Guide_to_Harry_Potter&range=latest-20&sort=views&direction=1&view=list [Edit]: My comment is merely to help you if you need this data right now. I do not work on Wikistats and do not mean to portray Massviews as some sort of replacement :)
The numbers are made up, but the choice of metrics is deliberate. If this is not a good overview, what other metrics would you highlight?
Not really. "Top Contributors" is a very poor choice for showing on the dashboard. Many users do not want to be present in such lists, but if a way to opt out is offered, the list would become irrelevant. Instead, I would go with "very active editors" or "returning editors" (if those exist).
Also, media uploads is relevant only at wiki level (and even then, mostly for Commons). Mixing free media with protected materials brings little overview on the activity.
Got it. We could exclude Top Contributors by default and allow the dashboard to be customized for those who want it. And we could replace it with something less competitive that brings attention to the fact that people edit Wikipedia. Any ideas on that?
I agree with Strainu on choosing a different word to replace "Top contributors".
I would like to clarify that my objection was not about the naming. I believe regardless of the name, such a stat cannot be useful in the dashboard. My proposal was to replace it with another statistic.
I'm wondering: Analytics/Wikistats/DumpReports/Future per report#Most prolific contributors .2839.29 got the most votes for keeping it (39).
Would you prefer a different default time range?
Is it a rolling year? Is current month included or is it last month? E.g. if I visit this page today (3 February), what will I see?
Yes, a rolling year up through the most recent complete month. So on February 3rd we could show January 2016 through January 2017. For daily data through the current day, the user would have to click on detail and choose the right time range.
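To make the default range concrete, here is a minimal Python sketch of that rule. The function name `rolling_year` and the (year, month) tuple representation are assumptions for illustration only.

```python
from datetime import date


def rolling_year(today: date) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((start_year, start_month), (end_year, end_month))."""
    # The most recent complete month is the one before today's month.
    if today.month == 1:
        end = (today.year - 1, 12)
    else:
        end = (today.year, today.month - 1)
    # Same month one year earlier, so the range holds 13 monthly points.
    start = (end[0] - 1, end[1])
    return start, end


start, end = rolling_year(date(2017, 2, 3))
```

For a visit on 3 February 2017 this gives January 2016 as the start and January 2017 as the end, matching the example above.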
Makes sense. Would be great to have MTD values available somewhere (possibly with projections), like in https://stats.wikimedia.org/EN/TablesPageViewsMonthlyCombined.htm . It might not make sense for all metrics but it might be disappointing to still see January only on 28 February.
Detail: MTD only appears from day 8 of the month. And there is the issue that sometimes the first, say, 10 days will contain one weekend, sometimes two, which hampers extrapolation to the full month.
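The weekend effect can be shown with a small Python sketch. It assumes the simplest possible projection, a linear scale-up of the month-to-date total, which is not necessarily what the existing reports do; both function names are hypothetical.

```python
import calendar
from datetime import date


def naive_projection(mtd_total: float, year: int, month: int,
                     days_elapsed: int) -> float:
    """Scale the month-to-date total linearly to the full month."""
    days_in_month = calendar.monthrange(year, month)[1]
    return mtd_total / days_elapsed * days_in_month


def weekend_days(year: int, month: int, days_elapsed: int) -> int:
    """Count Saturdays and Sundays among the first days of the month."""
    return sum(1 for d in range(1, days_elapsed + 1)
               if date(year, month, d).weekday() >= 5)


projected = naive_projection(1000, 2017, 2, 10)  # 10 days of February 2017
feb_weekends = weekend_days(2017, 2, 10)         # one weekend in the window
apr_weekends = weekend_days(2017, 4, 10)         # two weekends in the window
```

If weekend traffic differs from weekday traffic, the two windows are not comparable, so a projection would probably need to weight weekdays and weekend days separately.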
I was wondering if the time range could be more prominent, maybe put it next to the metric name.
...
Not from the mockup. But if the cursor changes, it should be enough.
Yes, it is.
How so? What would you change?
Sure. I would probably choose a smaller number of stats for a longer period of time.
Some metrics are important to some people that are not important to others, etc. It would be cool to be able to customize the dashboard for the metrics you track on your report, and for which specific Wikimedia projects.
YoY means year over year and MoM means month over month. These numbers can be used to account for seasonality. For example, in places that celebrate Christmas, editing activity decreases so it wouldn't be useful to compare December numbers with July numbers. Are these numbers useful at a glance or better left in the detail pages?
I would say yes, these numbers are useful. But please say what you compare MoM: a day or a month? If a month, are months normalised (e.g. will March have an 11% increase over February for well-known reasons)?
Well, we can use whatever approach people find useful, but typically MoM compares the average of this month to the average of the previous month. This takes care of the normalizing issue you mention, which is a good point.
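A small Python sketch of why comparing daily averages removes the month-length effect. The totals are made-up numbers and `mom_change` is a hypothetical helper, not the actual Wikistats computation.

```python
def mom_change(prev_total: float, prev_days: int,
               cur_total: float, cur_days: int) -> float:
    """Percent change between the daily averages of two months."""
    prev_avg = prev_total / prev_days
    cur_avg = cur_total / cur_days
    return (cur_avg - prev_avg) / prev_avg * 100


# Made-up example: a steady 100 edits per day in February (28 days)
# and in March (31 days).
feb_total, mar_total = 2800, 3100
raw = (mar_total - feb_total) / feb_total * 100        # compares monthly totals
normalised = mom_change(feb_total, 28, mar_total, 31)  # compares daily averages
```

With identical daily activity, the raw totals show an increase of roughly 11% purely because March is longer, while the normalised figure is zero.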
Then yes, both seem useful for me.
It depends on the statistic; for instance, it doesn't make much sense to compare the number of edits MoM.
Yes, it is important.
I don't like the enormous dark grey space at the bottom. I know this is trendy and many corporate sites have a huge space like that with their contact details, etc. but it seems inefficient and ugly. Even dropdown lists would be better IMO.