Data Platform Engineering/Data Products/work focus
FY 24/25
The Data Products Team is primarily focused on delivering towards the first goal of the FY 24/25 Annual Plan, Infrastructure, by working towards the Signals and Data Services (SDS) objectives and key results. Our secondary focus is on the essential work of maintaining the systems, components, and products that we own. "Essential work is “keeping the lights on” work that the P&T department recognizes as essential to continue the immediate operations of the Foundation and its projects."
The majority of our work will be in support of SDS Objective 2: Product managers can quickly, easily, and confidently evaluate the impacts of product features.
Big Picture Goal: drive innovation and improve Movement outcomes through the effective use of an experimentation platform that is iteratively developed and adopted across product teams.
Our first targeted Key Result (2.1) is: by the end of Q2, we can support 1 product team to evaluate a feature or product via basic split A/B testing that reduces their time to logged-in user interaction data by 50%.
We are beginning the new fiscal year with two hypotheses to support this KR:
(2.1.1) If we create an integration test environment for the proposed 3rd party experimentation solutions, we can collaborate practically with Data SRE, SRE, and QTE to evaluate the solution’s viability within WMF infrastructure in order to make a confident build/install/buy recommendation.
[DRAFT] (2.1.6) If we dogfood Metrics Platform instrument configuration (MPIC), we can ensure that the functional prototype is ready for user testing.
We will also support the Growth Team's adoption of the Metrics Platform client libraries as they begin designing their experimentation plan for the Community Updates module and implementing instrumentation. We hope to support their needs and create a standard click-through rate instrument that can be used by any product team.
FY 23/24
Data Products Goals
At a high level, our team is currently focused on two hypotheses within the WMF Product & Technology FY 23/24 annual plan:
- 2.5.1: "If we develop a data contract composed of schema fragments and a consistent cross-platform API, we can reduce the number of steps required to instrument for an experiment and produce consistent data used across teams and experiments."
- 3.4.1: "If our trusted datasets are all in the same place following the same conventions in dimension semantics, naming, and granularity considerations; it will be easier to combine and extract the data and serve data that can be easily evaluated in terms of privacy."
We are also working on the committed work of Commons Impact Metrics and on other essential work to maintain the systems we steward and decrease their maintenance burden.
Sprint Goals
The goals for the current sprint (23/10/24 - 23/11/214) are:
- [HIGHEST] Commons Impact Metrics: Prep for GLAM Wiki Conference
- [HIGH] SDS 2.5: Core Interaction API Design, Implementation & Documentation
- [MEDIUM] Transition to 50/25/25 capacity structure
- [LOW] Sunset AQS 1
Past Sprints
- SDS 2.5.1: Prepare to onboard the rest of the team
- Traffic to all six services routed to AQS 2. AQS 1 is ready to sunset.
- Technical strategy for Commons Impact Metrics prototype including implementation draft
- Dumps 2: Bring to completion or pause with a plan for the future.
- Knowledge gaps: pause until we open work on SDS 3.4
- At least one client library is refactored to include the new data contract (core schema and schema fragments) and an existing instrument is prototyped [receiving live data?]
  - Not yet
  - Almost two client libraries refactored
  - Merge requests have not quite landed
- [Continue] Generate XML dumps for simplewiki
  - Not yet
  - XML generated with everything but data quality issues from input
  - How we import is the remaining work
- 100% of traffic routed to Media, Pageviews [Edit and Editor Analytics next]
  - Media done 🎉
  - Pageviews is waiting on SRE
- Knowledge Gaps Index metrics receive production traffic
  - Waiting on SRE
- Data dumps transition has been clearly communicated across stakeholders
  - Done 🎉
- Generate XML dumps for a simplewiki
- Core interaction schema and schema fragments are prototyped and tested in preparation for updating Metrics Platform client libraries next sprint
- 100% of traffic routed to Geo and Media Analytics
- Identify and mitigate risks associated with MediaWiki History pipeline