Wikimedia Product/Better use of data/DACI
During Q1 FY 2018-2019 (July 2018 - September 2018), the Better Use of Data Working Group and interested Wikimedia Audiences team members collaborated on the following DACI (Driver, Approver, Contributor, Informed) matrix to help formalize the instrumentation (usually taken to mean event logging) process. This artifact is part of Output 3.1 Instrumentation.
Activity | Comments | Head of Product Analytics | Data Analyst / Data Scientist | Reading Infrastructure Engineering Manager | Reading Infrastructure Data Engineer | Product Software Engineer | Software Engineer Research | Analytics Engineering Engineering Manager | Analytics Engineering Software Engineer | Product Manager / Program Manager / Researcher / Designer | Legal / Privacy | Security | Management (usually, product Director) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Decide metrics needs | Usually the product manager or researcher realizes a need for metrics and gets the OK from management; sometimes management is the one initiating the request in the first place. Either way, the workflow is initiated. | Contributor | Informed | Informed | Driver | ||||||||
Define research questions, metrics, instrumentation data | The product manager defines the questions that the instrumentation data needs to help us answer. The product manager and the data analyst then determine the concrete metrics to be derived from the instrumentation and define the concrete data that the instrumentation will need to generate for this purpose. The data engineer and product software engineer reality-check the instrumentation definition to ensure the data can actually be generated with the event model of the client. | Approver | Contributor | Contributor | Contributor | Driver | |||||||
Instrumentation specification ticket and schema blob (precise definition of schema fields w/ validation, data dictionary cross-check, sampling approach, purging & whitelisting approach, queries, metadata) | The schema, queries, and other attributes of the logging and its use are collaboratively defined in Phabricator and in a schema blob through a session between the data analyst, product software engineer, AE software engineer (optional), and data engineer. The data analyst creates the database queries that will be used to generate results (the equivalent of test-driven development (TDD), for data); hedged sketches of a schema blob and a draft query appear after this table. Once these are done, the Head of Product Analytics reviews them for adherence to best practices with the Product Manager and then advances to the next step for privacy review. Everyone else is looped in on the ticket. A heads-up of 7-10 business days is given to Legal/Privacy via privacy@ that schema privacy review will be needed on date X. | Approver | Contributor | Informed | Contributor | Contributor | Informed | Contributor | Driver | Informed | Informed | Informed |
Schema privacy review | privacy@ is emailed a link to the ticket the business day before (or the business morning of) date X so that Legal/Privacy can review and either approve the formulation or ask for re-engineering. | Contributor | Driver | Approver | Contributor | ||||||||
Schema creation with backlink to ticket | The data analyst, data engineer, and product engineer finalize the schema, with a reference back to the Phabricator ticket. Ideally, all metadata is self-documented in the schema / schema registry itself. The Head of Product Analytics reviews with the Product Manager and confirms when the schema is satisfactory for being instrumented against. | Approver | Contributor | Contributor | Contributor | Driver | |||||||
Instrumentation, pipeline scripts (if needed), alerting & monitoring | The Product Software Engineer does the plumbing, with assistance from the data engineer as needed. The data analyst reviews and approves the code/config artifacts, and testing is arranged. | Approver | Contributor | Contributor | Informed | Driver |||||||
Testing | The software engineers (usually product software engineers), the data analyst, and possibly a QA tester collaboratively test the instrumentation. The data analyst reviews the outcome with the product manager and, if the work is satisfactory, moves the process forward. | Approver | Contributor | Contributor | Contributor | Driver |||||||
Update data dictionary if needed | The data dictionary (initially on-wiki, probably built into the schema registry as a software fixture when the schema registry is built) is updated by the data analyst | Approver | Contributor | Contributor | Contributor | Contributor | Driver | ||||||
Activation | The product software engineer or data analyst activates production level logging. | Contributor | Contributor | Contributor | Approver | Driver | |||||||
Verification and fixes | Analytics Engineering Software Engineer confirms no system degradation. | Approver | Driver | ||||||||||
Dashboard productionization | If visualization of some of the instrumentation data in the form of a dashboard has been requested, the data analyst configures the dashboarding, consulting with the data engineer, the product software engineer, and the Analytics Engineering software engineer as needed. | Driver | Contributor | Contributor | Contributor | Approver |||||||
Long-term Support (LTS) | The data analyst monitors the quality of the logged data (a sketch of one such data-quality check appears after this table) and reports anomalies/bugs to the Reading Infrastructure Engineering Manager and Product Software Engineer. | Driver | Informed | Contributor | Informed | Informed | Approver ||||||
Decommissioning | If logging is no longer needed, the Head of Product Analytics requests sign-off on decommissioning. The data analyst and software engineers provide consultation and do the work along the way to actually decommission things. | Driver | Contributor | Informed | Contributor | Contributor | Informed | Contributor | Contributor | Contributor | Informed | Approver
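
To make the "schema blob" step above more concrete, here is a loose sketch (written as a Python dict for readability; real EventLogging schemas are JSON documents hosted on Meta-Wiki or, later, in a schema registry) of the kind of information the specification ticket is expected to pin down: field definitions with validation hints, plus sampling, purging, and whitelisting notes. All field names, values, and retention choices below are invented for illustration and do not describe any actual Wikimedia schema.

```python
# Hypothetical schema blob for an instrumentation specification ticket.
# Every field name, type, and retention decision here is illustrative only.
EXAMPLE_SCHEMA = {
    "description": "Clicks on a hypothetical reading feature.",
    "properties": {
        "session_token": {
            "type": "string",
            "required": True,
            "description": "Random per-session token; sensitive, so purged.",
        },
        "action": {
            "type": "string",
            "enum": ["open", "click", "dismiss"],
            "required": True,
            "description": "What the reader did; safe to retain long-term.",
        },
        "wiki": {
            "type": "string",
            "required": True,
            "description": "Wiki database name, e.g. 'enwiki'.",
        },
    },
}

# Sampling, purging, and whitelisting notes that would accompany the schema
# in the Phabricator ticket (values are placeholders, not recommendations).
SAMPLING_RATE = 0.01                      # log roughly 1 in 100 sessions
PURGE_AFTER_DAYS = 90                     # non-whitelisted fields purged
WHITELISTED_FIELDS = ["action", "wiki"]   # fields kept past the purge window
```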
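The "queries" item in the same row — drafting the result-generating queries before any events exist, as a test-driven-development-style check on the schema — might look roughly like the following sketch. The table name, column names, and the generic DB-API connection are assumptions for illustration, not the actual tables or tooling analysts use.

```python
# "TDD for data": write the query that will produce the metric before any
# events are logged, so missing or mis-typed schema fields surface early.
# The table and column names below are hypothetical.
DAILY_CLICKS_QUERY = """
SELECT
    DATE(dt) AS day,
    wiki,
    COUNT(*) AS events,
    COUNT(DISTINCT session_token) AS sessions
FROM example_reading_feature_events
WHERE action = 'click'
GROUP BY DATE(dt), wiki
ORDER BY day, wiki
"""

def run_query(connection, query=DAILY_CLICKS_QUERY):
    """Run the draft query once data starts flowing (any DB-API connection)."""
    cursor = connection.cursor()
    cursor.execute(query)
    return cursor.fetchall()
```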
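For the long-term support row, here is a minimal sketch of the kind of data-quality check a data analyst might automate: flagging days whose event volume drops sharply against a trailing average, which would then be reported to the Reading Infrastructure Engineering Manager and Product Software Engineer per the matrix above. The window size and threshold are arbitrary placeholders, not an existing Wikimedia tool.

```python
# Minimal volume-anomaly check for long-term support of an instrumented
# schema. Window and threshold values are illustrative only.
def find_volume_anomalies(daily_counts, window=7, drop_threshold=0.5):
    """daily_counts: list of (day, event_count) pairs, oldest first.
    Returns days whose count fell below drop_threshold * trailing average."""
    anomalies = []
    for i in range(window, len(daily_counts)):
        trailing = [count for _, count in daily_counts[i - window:i]]
        average = sum(trailing) / window
        day, count = daily_counts[i]
        if average > 0 and count < drop_threshold * average:
            anomalies.append((day, count, average))
    return anomalies

if __name__ == "__main__":
    # Seven healthy days followed by a suspicious drop on the eighth.
    counts = [("2018-09-0%d" % d, 1000) for d in range(1, 8)]
    counts.append(("2018-09-08", 300))
    print(find_volume_anomalies(counts))   # -> [('2018-09-08', 300, 1000.0)]
```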