Product Analytics/Reporting Guidelines
The Product Analytics team produces several types of reports:
- One-time substantial reports after the completion of a project and/or analysis. These are typically published as either a wiki page or a PDF.
- One-off analyses of specific questions from product teams. These are often published as comments on a Phabricator ticket.
- Recurring reports such as weekly or monthly statistics about a project. These are typically published internally, or made available externally through a shared analytics resource.
Common guidelines
- All types of reports should describe the purpose of the project and the analysis.
- Metrics should be defined and should reference the relevant standardized definitions where applicable (e.g. the standardized retention metric).
- There's no need to create a standalone report if simply reporting the results in a relevant Phabricator ticket will suffice.
Types of reports
Phabricator comments
Sometimes it is enough to post results of your analysis as a comment on the relevant Phabricator task.
- Provide sufficient detail about the methods used to gather data (e.g. provide a SQL query, define date ranges).
- Define any relevant assumptions.
- Uploading graphs directly to Phabricator is fine. If community members ask, the graphs can also be uploaded to Wikimedia Commons.
Generating tables
Python:
# df is a Pandas DataFrame; to_markdown() requires the tabulate package
print(df.to_markdown(index=False))
R:
library(knitr)
# kable() returns one string per table row, so join them with newlines
cat(kable(data_frame, format = "markdown"), sep = "\n")
In either case you will need to remove the alignment row. This is the second row of the table; its colons specify column alignment in most Markdown flavors, but it does not work in Phabricator's flavor.
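The alignment row can also be removed programmatically rather than by hand. The following is a minimal sketch, not an established team utility: strip_alignment_row is a hypothetical helper name, and it assumes the table is a string shaped like the output of df.to_markdown() or kable(), with the alignment row on the second line.

```python
import re

# Matches a Markdown alignment row such as "|:---|---:|" or "| --- | :-: |".
_ALIGNMENT_ROW = re.compile(r"\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*")


def strip_alignment_row(markdown_table: str) -> str:
    """Return the table without its alignment row (hypothetical helper).

    Only the second line is inspected, and it is dropped only if it
    looks like an alignment row; all other lines are left untouched.
    """
    lines = markdown_table.splitlines()
    if len(lines) >= 2 and _ALIGNMENT_ROW.fullmatch(lines[1]):
        del lines[1]
    return "\n".join(lines)


# Example input in the shape pandas produces (values are made up).
example = "| wiki   |   edits |\n|:-------|--------:|\n| enwiki |      10 |"
print(strip_alignment_row(example))
```

With pandas, the same helper could be applied as strip_alignment_row(df.to_markdown(index=False)) before pasting the result into a Phabricator comment.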
Recurring reports
- Recurring reports are expected to be lightweight. They do not need to provide in-depth discussion or analysis of the data; they are expected to be more of a data summary.
- If the report is generated from a Jupyter Notebook, include a button to show/hide the code.
- Make it clear when the report was last updated and what range of data it contains.
- The report should be easily accessible to relevant stakeholders (e.g. by having it hosted publicly if possible).
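One way to make the update time and data range explicit is to generate a short note each time the report is re-run. This is only an illustrative sketch: report_freshness_note and its parameter names are hypothetical, and the wording of the note is an assumption rather than a team convention.

```python
from datetime import date, datetime, timezone
from typing import Optional


def report_freshness_note(
    data_start: date,
    data_end: date,
    updated_at: Optional[datetime] = None,
) -> str:
    """Build a one-line note stating the update time and data range."""
    if data_start > data_end:
        raise ValueError("data_start must not be after data_end")
    if updated_at is None:
        updated_at = datetime.now(timezone.utc)
    # Convert to UTC so the label is accurate; naive datetimes are
    # interpreted as local time by astimezone().
    updated_utc = updated_at.astimezone(timezone.utc)
    return (
        f"Last updated: {updated_utc:%Y-%m-%d %H:%M} UTC. "
        f"Data range: {data_start.isoformat()} to {data_end.isoformat()} (inclusive)."
    )


# Example with made-up dates.
print(report_freshness_note(date(2024, 1, 1), date(2024, 1, 31)))
```

In a notebook-based report, the returned string could be displayed in the first cell so that readers see it before any tables or graphs.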
Substantial reports
(for lack of a better name)
- These reports should include an executive summary of the results and the recommendations that follow from the analysis.
- If the definition of a given method/metric becomes substantial, consider moving it to an appendix and pointing interested readers to it.
- Publishing the reports on-wiki (e.g. as a sub-page of a team's pages on MediaWiki-wiki) enables translation into other languages through standardized translation practices on wikis.
- For PDF reports, we recommend using R Markdown and the wmfpar template.
Publishing reports
Important information: Before publishing any report, please consult and follow the Data Publication Guidelines.
HTML
While you can convert a Jupyter notebook to HTML using the built-in feature (which uses nbconvert under the hood), the recommended way to create an HTML report from a Jupyter notebook is to use Quarto. Refer to these instructions for installing and using Quarto.
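If the conversion needs to be scripted, for instance as part of a recurring report, it can be wrapped in a small Python function. This is a rough sketch under stated assumptions: it assumes the quarto command-line tool is installed and on the PATH, it relies on the quarto render command with its --to option, and the function names and the notebook filename are hypothetical. The linked instructions remain the authoritative reference.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def quarto_render_command(notebook: Path) -> List[str]:
    """Build the command line for rendering a notebook to HTML."""
    return ["quarto", "render", str(notebook), "--to", "html"]


def render_notebook(notebook: Path) -> None:
    """Render the notebook with Quarto, failing loudly on errors."""
    if shutil.which("quarto") is None:
        raise RuntimeError("The quarto executable was not found on the PATH")
    # check=True raises CalledProcessError if Quarto exits with an error.
    subprocess.run(quarto_render_command(notebook), check=True)


# Show the command that would be run for a hypothetical notebook.
print(" ".join(quarto_render_command(Path("report.ipynb"))))
```

Keeping the command construction separate from its execution makes it possible to inspect or log the exact command without Quarto being installed.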
analytics.wikimedia.org
When publishing from the analytics cluster (stat100X hosts), follow these instructions. This will make pages available on analytics.wikimedia.org, e.g. https://analytics.wikimedia.org/published/reports/wikipedia-android-app/suggested-edits-v2.html and https://analytics.wikimedia.org/published/reports/wikipedia-android-app/metrics/
NOTE: For security reasons, Jupyter restricts a user's write permissions to their home directory. Be sure to copy or move files into /srv/published from a terminal connected via SSH, not from the Terminal inside Jupyter. If you plan to schedule a recurring job via crontab to re-run and publish a report at some frequency, that also has to be set up via SSH (not the Jupyter Terminal).
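For a recurring report, the copy step can be scripted so that a scheduled job can call it. The following is a minimal sketch, assuming it is run from an SSH session rather than from Jupyter, as the note above requires. publish_report, its parameters, and the example subdirectory are hypothetical names; only the /srv/published default comes from the note above.

```python
import shutil
from pathlib import Path


def publish_report(
    report: Path,
    subdirectory: str,
    published_root: Path = Path("/srv/published"),
) -> Path:
    """Copy a rendered report into a subdirectory of the published root.

    Returns the path of the copied file. The destination directory is
    created if it does not exist yet.
    """
    destination_dir = published_root / subdirectory
    destination_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves the modification time of the source file.
    return Path(shutil.copy2(report, destination_dir / report.name))


# Hypothetical usage, run over SSH on a stat100X host:
# publish_report(Path("report.html"), "reports/my-project")
```

Making published_root a parameter allows the function to be tried out against a temporary directory before pointing it at the real location.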
people.wikimedia.org
There's also the option of hosting the page in your personal directory on people.wikimedia.org (example).
You can either upload the files using an SFTP client like Transmit or use scp in Terminal. Another method is to put the files in a public Git repository, clone the repo to ~/public_html on people.wikimedia.org, and schedule a cron job to run git pull periodically.
To restrict access to the files to users of the wmf and nda groups (similar to how Superset is restricted), follow these instructions (example at T290693#7343430).
nbviewer.org
You can also upload the Jupyter notebook to a publicly accessible repository on GitLab (preferred) or GitHub and use nbviewer.org to render the notebook, which usually renders the notebook better than the built-in renderer in GitLab or GitHub.
Quarto Pub
If you are uploading a report created with Quarto to Quarto Pub, make sure that it adheres to the Data Publication Guidelines. Because Quarto Pub is a non-Wikimedia server, the report must be either Low Risk, or Medium Risk but sanitized.
PDFs
For PDF reports, the recommendation is to upload them to Wikimedia Commons. After uploading, edit the file page to include the following meta information:
=={{int:license-header}}==
{{WMF-staff-upload|license=cc-by-sa-4.0}}
{{Wikimedia trademark}}
See Impact of sitemaps on Italian Wikipedia search engine-referred traffic for an example.
Future work
- Describing the report with Template:Wikimedia engineering project information and defining the project's start and end dates will automatically categorize the report into the relevant date-based categories for WMF projects. [FIXME: have a Product Analytics-specific template for this]
- If the report covers a large range of data, consider adding the ability to filter/focus on parts of the data through dynamic graphs. [FIXME: we need to know how to do this]