Flow/Analytics
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up to date.
Goals:
- Determine engagement with Flow boards (a Trello card).
  - We'll do this by running queries against the Flow DB.
  - We'll probably want to compare with regular talk pages.
- Measure how people use the UI (a Trello card, plus others).
  - We'll do this using EventLogging to log m:Schema:FlowReplies and action events.
  - This also involves qualitative user research.
Determining engagement
Flow can determine metrics like new topics and the average number of replies per topic, because these are separate DB updates.
We'll probably want to compare with regular talk pages. Wikimetrics can show edit metrics for regular talk pages, but only for a cohort (a defined group of users).
- Wikimetrics page edits aren't currently talk-aware. Determining similar metrics (new topics, reply counts, etc.) for regular user talk pages is harder. Echo has a DiscussionParser that can help, but it's expensive: it parses each revision.
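To make the difficulty concrete, here is a rough sketch of the crude alternative to full revision parsing: classifying talk page edits from their edit summaries alone. The revision tuples are invented for illustration, and the "new section" autosummary convention is the only real signal relied on; Echo's DiscussionParser does far more by parsing each revision's text.

```python
# Rough sketch: estimate "new topic" vs. "reply" activity on a regular talk
# page from edit summaries alone -- a much cruder stand-in for Echo's
# DiscussionParser. The revision tuples below are illustrative, not real
# API output.
import re

def classify_talk_edits(revisions):
    """revisions: iterable of (rev_id, edit_summary) tuples."""
    new_topics, replies = 0, 0
    # MediaWiki's "new section" feature leaves a recognizable autosummary.
    new_section = re.compile(r"/\*.*\*/\s*new section", re.IGNORECASE)
    for rev_id, summary in revisions:
        if new_section.search(summary or ""):
            new_topics += 1
        else:
            replies += 1  # crude: every other edit counted as a reply
    return {"new_topics": new_topics, "replies": replies}

sample = [
    (1, "/* Help with templates */ new section"),
    (2, "/* Help with templates */ re: try the template"),
    (3, "fix typo"),
]
print(classify_talk_edits(sample))  # {'new_topics': 1, 'replies': 2}
```

This misclassifies section edits and reverts as replies, which is exactly why the parsing approach, expensive as it is, exists.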
Cohort
Typical wikimetrics involves identifying a cohort ("People who signed up at our Editathon") and then tracking their page edit success.
Flow doesn't have obvious cohorts to compare; we could just pick a group of newly-registered users. Danny has manually counted regular talk page edits vs. Flow board edits.
Implementation
http://flow-reportcard.wmflabs.org/ runs on the front-end web server limn1.eqiad.wmflabs.
Dan Andreescu set up the analytics/limn-flow-data repository (see its Gerrit patches), based off mobile's repo.
- This commit deploys the Flow metadata to limn1
- reportcard.json defines our default dashboard.
- Gerrit change 171465 sets up a cron job to generate Flow statistics
The Flow analytics repository is regularly checked out on the stats back-end machine stat1003 to /a/limn-flow-data.
Log output from the generate.py cron job (not much) appears in /var/log/limn-data/limn-flow-data.log.
To generate new data, the Flow team "only" needs to:
- commit Python query scripts (based on mobile's) to the limn-flow-data repo's flow directory
- update reportcard.json to reference their output.
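To give a feel for what those query scripts do, here is a minimal sketch of the pattern: run a SQL query and write a dated datapoint to a CSV that Limn can graph. The `flow_workflow` table and column name are assumptions about the Flow schema, and sqlite3 stands in for the real database connection the script would open on stat1003.

```python
# Sketch of a Limn-style query script: query the Flow DB, append a dated
# row to a CSV datafile. Table/column names are hypothetical; sqlite3 is
# an in-memory stand-in for the real MariaDB replica.
import csv
import sqlite3

def count_new_topics(conn, day):
    # Hypothetical schema: one flow_workflow row per board topic.
    cur = conn.execute(
        "SELECT COUNT(*) FROM flow_workflow WHERE workflow_created = ?", (day,)
    )
    return cur.fetchone()[0]

def append_datapoint(csv_path, day, value):
    # Limn datafiles are plain CSVs with a leading date column.
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([day, value])

# Demo with an in-memory stand-in database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flow_workflow (workflow_created TEXT)")
conn.executemany(
    "INSERT INTO flow_workflow VALUES (?)",
    [("2014-11-01",), ("2014-11-01",), ("2014-11-02",)],
)
print(count_new_topics(conn, "2014-11-01"))  # 2
```

The cron job would run a script like this daily and the resulting CSV is what reportcard.json points a graph at.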
Deploying new front-end code
Deploying new code on the front-end is a separate process. You need to check out https://github.com/wikimedia/limn-deploy locally.
limn-deploy uses Fabric (www.fabfile.org) to execute commands remotely on limn1 via SSH, so you need to be able to ssh to limn1.eqiad.wmflabs.
It has "stages" for deployment; flow is one of the stages, so:
$ cd your/git/analytics
$ git clone https://github.com/wikimedia/limn-deploy
$ cd limn-deploy
$ sudo pip install -e .
$ fab -l # lists available stages and commands
Then to push changes to the Flow analytics front-end:
$ fab flow deploy.only_data
How to get info to a dashboard?
- Limn for now
The mobile and multimedia teams have automated this: each has a labs server (http://mobile-reportcard.wmflabs.org and http://multimedia-metrics.wmflabs.org) running cron jobs and generating Limn graphs.
The multimedia team also has server-side graphing in Ganglia.
Dan Andreescu will tell us where the code is, how these teams do it, etc.
Example: Echo dashboard
http://ee-dashboard.wmflabs.org/dashboards/enwiki-features has Echo, AFT, Page curation, and WikiLove stats. (An interesting one is Echo views by category.) All these dashboards are actually puppetized web hosts on the limn1 server.
- wikitech:EE Dashboard has some info about setting this up
- The enwiki-features dashboard definition has multiple graph_ids, including "enwiki_echo_all".
- The enwiki_echo_all datasource definition points to the URL http://datasets.wikimedia.org/public-datasets/enwiki/echo/echo_all.csv
  - which is on wikitech:Datasets.wikimedia.org
  - which is stat1001, where I think we can run cron jobs to create datasets, or possibly stat1003.
(Note: "ee-dashboard" sounds like a labs machine for the Editor Engagement team (what the Flow team used to be called), but it actually belongs to editor engagement research (User:DarTar).)
Privacy: not too much data, not too long, not too personal
- Don't store data for long periods.
- Don't store personally-identifiable information.
- Don't log for every single user.
Note that Echo does all this for logged-in users who click on Echo's red [NN] badge.
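One common way to satisfy "don't log for every single user" is deterministic sampling: hash a per-session token and only send events for a fixed fraction of them, so no record of who was sampled needs to be stored. The 1-in-100 rate and the use of a session token (rather than a username) below are illustrative choices, not a description of how EventLogging actually samples.

```python
# Sketch of deterministic sampling for event logging: stable per token,
# no PII stored, only ~1 in `rate` users ever send events. The rate and
# token format are illustrative assumptions.
import hashlib

def in_sample(session_token, rate=100):
    """True for roughly 1 in `rate` tokens, stable for a given token."""
    digest = hashlib.sha256(session_token.encode()).hexdigest()
    return int(digest, 16) % rate == 0

tokens = ["tok-%d" % i for i in range(10000)]
sampled = sum(in_sample(t) for t in tokens)
print(sampled)  # roughly 100 of the 10000 tokens
```

Because the decision is a pure function of the token, a user's client is consistently in or out of the sample for the whole session without the server remembering anything about them.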
Next steps
[edit]Talk to Dan Andreescu
Make sure we define what success is
For comparison, Analytics has developed a well-defined funnel for "editor success": the user registers, edits successfully, and sticks around.
Possible model
- how many people visit a talk page
  - and never try again
  - or try to edit
  - or add a new topic/section
  - and "get their answer"
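The funnel above could be tallied from event data by classifying each visit by the furthest step reached. The event names and session records below are invented for illustration; real input would come from EventLogging.

```python
# Sketch of the funnel model above: count visitors by the furthest funnel
# step they reached. Event names are hypothetical placeholders.
from collections import Counter

FUNNEL = ["visit", "try_edit", "new_topic", "got_answer"]

def funnel_counts(sessions):
    """sessions: list of lists of event names, one list per visitor."""
    counts = Counter()
    for events in sessions:
        furthest = "visit"
        for step in FUNNEL:
            if step in events:
                furthest = step
        counts[furthest] += 1
    return counts

sessions = [
    ["visit"],                                            # never tried again
    ["visit", "try_edit"],                                # tried to edit
    ["visit", "try_edit", "new_topic"],                   # added a topic
    ["visit", "try_edit", "new_topic", "got_answer"],     # got their answer
]
print(funnel_counts(sessions))
```

Comparing these counts between Flow boards and regular talk pages would be one way to operationalize "success".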
UI event logging
We understand this pretty well.
Can Extension:Flow simply require EventLogging, or can it be decoupled through a "track" interface? (See how VisualEditor decouples.)
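The decoupling question above is essentially a publish/subscribe pattern: Flow emits events through a generic track(topic, data) call, and EventLogging, if installed, subscribes later and receives the backlog. This mirrors how mw.track / mw.trackSubscribe decouple producers from consumers in MediaWiki's front-end JS; the Python below is only an illustration of the pattern, not Flow's actual code.

```python
# Sketch of a decoupled track() interface: producers never reference the
# consumer directly, and a late subscriber replays the queued backlog.
_queue = []        # events emitted before any subscriber exists
_subscribers = []

def track(topic, data):
    event = (topic, data)
    _queue.append(event)
    for handler in _subscribers:
        handler(*event)

def track_subscribe(handler):
    _subscribers.append(handler)
    for event in _queue:   # replay the backlog to the late subscriber
        handler(*event)

# Flow logs an event before EventLogging is hooked up...
track("flow.reply", {"topicId": "abc123"})

# ...then EventLogging (or any consumer) subscribes and catches up.
received = []
track_subscribe(lambda topic, data: received.append(topic))
track("flow.edit", {"topicId": "abc123"})
print(received)  # ['flow.reply', 'flow.edit']
```

With this shape, Flow only soft-depends on EventLogging: if no subscriber ever registers, track() calls are cheap no-ops from the producer's point of view.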
See also
- m:Research:Data/Dashboards links to lots of good Limn dashboards, e.g.
- http://multimedia-metrics.wmflabs.org/dashboards/mmv (lots of EventLogging, not much insight)