
Product Analytics/Superset Access

From mediawiki.org
This page refers to instances of Superset and Turnilo that provide access to data in the Analytics Data Lake. For information about how to access Fundraising's instance of Superset, see Fundraising Access Request or email fr-analytics@wikimedia.org.
Screenshot of an edit counts dashboard
Screenshot of a traffic dashboard
Gallery of chart types available in Superset
Screenshot of hourly edit counts in Turnilo

Hello, dear reader! If you're on this page you're probably interested in checking out Superset.

Apache Superset (available at superset.wikimedia.org) is a dashboarding and data exploration tool. The screenshots on the right show two examples of it – reading and editing metrics dashboards created by Connie Chen – and illustrate some core features such as formatted text (which can include links), menus for user input, and interactive charts. Superset offers many ways to slice and dice the data and many chart types for visualizing it (see the gallery of chart types on the right).

Turnilo (available at turnilo.wikimedia.org) is a data exploration tool and can be thought of as a much lighter version of Superset. Every dataset available in one is also available in the other, because both tools connect to the same backend database, Apache Druid, where the datasets are stored as "data cubes" (called "Druid Datasources" in Superset). Unlike Superset, Turnilo does not let you create dashboards, and its charting capabilities are very limited. If you're familiar with pivoting data in spreadsheets, that is essentially what Turnilo is for.

Access


If you've tried to open either of those links (or any Superset/Turnilo links in the past), you've seen something like this:

Wikimedia Developer Single Sign-On Portal

It's understandable (and completely okay, even!) if you're not sure what exactly it's asking you or how to get past that.

Let's demystify it a bit.

A Wikimedia developer account is the account you use for developer services hosted by WMF. Within the Wikimedia tech world it's also known as an "LDAP account" (or "LDAP credentials"), and you can create one yourself. The UNIX shell username you pick and your developer account password are the login credentials you will use to access Superset/Turnilo.

Why?


Great question! There are two reasons:

  1. The data in Druid/Superset/Turnilo has not been cleared for public release. Only a few datasets have public versions, and those have been altered for security/privacy reasons before release (for example, the geoeditors dataset has both a private and a public version). Other data may never become publicly available.
  2. LDAP is how the ops engineers on the SRE and Analytics Engineering teams programmatically manage account access, group membership, and permissions. They have a whole system[1] for managing key-based authentication, permissions, and access to resources such as SWAP (for running queries and performing analyses), the data lake (which includes events from EventLogging and traffic data containing IP addresses), and BI tools like Superset. This system ties into the general system they use to manage all of Wikimedia's infrastructure.

Besides being able to use Superset and Turnilo, having a developer account enables you to contribute to Wikimedia projects (including MediaWiki Core and MediaWiki extensions) on Gerrit (similar to GitHub) and use Wikimedia Cloud Services like Cloud VPS (similar to Amazon EC2) and Toolforge for things like making your own Wikipedia & Commons bots.

Requesting access


Once you have a developer account, you will need to request to be added to the following groups:

  • analytics-privatedata-users (for access to dashboards which depend on data restricted to this group)
  • either the wmf or the nda group
    • The wmf group is usually for full-time req# staff, and the nda group is for external collaborators and contractors who have signed an NDA. For details on your specific use case, see these instructions on Phabricator.

To make the request, use this Phabricator form tagged with LDAP-Access-Requests. In the form, you will need to:

  • Specify who you are and your developer account username
  • Specify which group you're requesting to be added to
    • For wmf group: mention that you're a full-time WMF employee.
    • For nda group: you will need your supervisor to confirm by commenting on the task.
    • For analytics-privatedata-users group: add the project tag SRE-Access-Requests. Learn more here.
  • Include the reason for your request – e.g. "to access Superset/Turnilo"

Here are some examples of requests to use as reference: Danny Horn's and Kate Zimmerman's.

It might take a few days for your request to be processed, but once it's done, it's done. Congratulations, you're one of the cool kids now!

Aside: if this is your first time hearing about it, Phabricator is the tool Wikimedia uses for project management, software bug reports, and feature requests. Refer to these instructions for creating a Phabricator account by logging in with either your Wikimedia unified account or your developer account. You can link the other account to your Phabricator account later[2] so that you can log in with either.

Troubleshooting: once your request is processed and you see a confirmation that you've been added to one of those groups, try logging in with your shell username (check this page if you don't remember what you picked) and developer account password. If for some reason your access is denied, you will need to contact the Analytics Engineering team for support.

Training


If you're interested in learning more about Superset/Turnilo and what you can do with them, you can book an appointment with anyone on the Product Analytics team to have a certified data professional give you a tour of Turnilo, dashboarding in Superset, and explain the various datasets available within those tools. Training can be done 1:1 or in groups. See our Office Hours page for more details.

We also have a training video available on YouTube to anyone with a @wikimedia.org account.

Further reading


References
