Platform Engineering Team/Data Value Stream
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. Since June 2023 this team is now part of Data Platform Engineering.
Data Platform
Mission / Objective
The Data Platform team serves data producers and data consumers by providing a stack of software solutions to support the following capabilities:
- Scheduled Dataset Creation
- Data Persistence
- Data Gateway
- Data Events
- Event Driven Dataset Creation
- Data Discovery
The team's primary focus is building out these capabilities while creating clear and comprehensive documentation so teams can utilize these services to build out their own data pipelines. The Data Platform team will partner with data producers and consumers to understand use case needs, provide design recommendations, review code and deploy code to the stack.
The Data Platform team's ultimate goal is to centralize data pipeline creation and ensure good software development standards are encouraged throughout the process.
What isn't the Data Platform team?
The Data Platform team is not scaled to build out and own data pipelines for other teams unless there is an explicit need or a lack of technical expertise in the requesting team. This will be handled on a case-by-case basis as support is needed.
The team's focus is on building capabilities for the platform to support dataset producer and consumer use cases.
What is generated data?
The Data Platform team defines this as any dataset generated from the results of a data pipeline that requires persistence for use in a process that serves knowledge content and knowledge experiences. The team considers generated data which has a primary use in analytics or machine learning as out of scope.
NOTE: The team will continue to support AQS, but our main focus will be on our capabilities. We will review ownership of AQS as we progress with our platform services.
What is a data pipeline?
A data pipeline is a series of data processing steps. It typically involves ingestion of data from a source, one or many steps to transform, enrich, or aggregate that data, and a final step to output the resulting dataset to a data store.
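The ingest-transform-output pattern described above can be sketched in a few lines of Python. This is a minimal illustration under assumed, hypothetical names (ingest, transform, output, and an in-memory store); it is not part of any Data Platform API.

```python
# A minimal sketch of the data pipeline pattern: ingest from a source,
# transform/aggregate the records, and output the dataset to a store.
# The source data, function names, and the dict-based store are all
# hypothetical stand-ins for illustration only.

def ingest():
    """Read raw records from a source (here, an in-memory list)."""
    return [
        {"page": "Earth", "views": 120},
        {"page": "Moon", "views": 45},
        {"page": "Earth", "views": 30},
    ]

def transform(records):
    """Aggregate total views per page."""
    totals = {}
    for record in records:
        totals[record["page"]] = totals.get(record["page"], 0) + record["views"]
    return totals

def output(dataset, store):
    """Persist the resulting dataset to a data store (here, a dict)."""
    store.update(dataset)
    return store

store = {}
result = output(transform(ingest()), store)
print(result)  # {'Earth': 150, 'Moon': 45}
```

In a production pipeline each step would typically be a separate scheduled or event-driven job, with the store being a database or object storage rather than an in-memory dict.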
Still not sure what we do? Hopefully this table helps clarify...
Use Cases | Which team to go to? |
---|---|
I have a notebook that produces some cool output that I think could be useful for contributors | Data Platform & Structured Data Team |
I want to access MediaWiki data and produce some reports and analysis | Metrics Platform |
I created a machine learning model and want somewhere to run it | Machine Learning Team |
I want to consume data events, trigger a micro service and store the data so an API can serve dataset consumers | Data Platform |
I built a process that the community is using but I can no longer support it | If you want to migrate it yourself: Data Platform. If you can no longer support it at all or migrate it: ??? |
I have a data pipeline process but I am not sure of the benefit of the output | Data Platform & Structured Data Team |
Work Intake Process
For significant projects, you can follow the Platform Engineering teams' "how to work with us" process.
For work related to bugs, features, or support in the Data Platform's areas of responsibility, you can contact Data Product Manager Luke Bowmaker directly or create a task on our Phabricator board, tagged with #generated_data_platform and assigned to lbowmaker (Luke Bowmaker). The team meets throughout the week and triages on a rolling basis.
Data Platform Value Stream Demo
On a regular cadence, the Data Platform Value Stream team will post demos of our developments/works in progress here to provide transparency and gather feedback.
Demo Sessions
Note: You'll need to be signed in with your WMF account to view these videos.
Date | Links | Presenters |
---|---|---|
2021/09/08 | First Demo | Data Team |