Platform Engineering Team/Data Value Stream
This page is obsolete. It is being retained for archival purposes. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date. Since June 2023 this team is now part of Data Platform Engineering.
Data Platform
Mission / Objective
The Data Platform team serves data producers and data consumers by providing a stack of software solutions to support the following capabilities:
- Scheduled Dataset Creation
- Data Persistence
- Data Gateway
- Data Events
- Event Driven Dataset Creation
- Data Discovery
The team's primary focus is building out these capabilities while creating clear and comprehensive documentation so teams can utilize these services to build out their own data pipelines. The Data Platform team will partner with data producers and consumers to understand use case needs, provide design recommendations, review code and deploy code to the stack.
The Data Platform team's ultimate goal is to centralize data pipeline creation and ensure good software development standards are encouraged throughout the process.
What isn't the Data Platform team?
The Data Platform team is not scaled to build out and own data pipelines for other teams unless there is an explicit need or a lack of technical expertise in the requesting team. This will be handled on a case-by-case basis as support is needed.
The team's focus is on building capabilities for the platform to support dataset producer and consumer use cases.
What is generated data?
The Data Platform team defines this as any dataset generated from the results of a data pipeline that requires persistence for use in a process that serves knowledge content and knowledge experiences. The team considers generated data which has a primary use in analytics or machine learning as out of scope.
NOTE: The team will continue to support AQS, but our main focus will be on our capabilities. We will review ownership of AQS as we progress with our platform services.
What is a data pipeline?
A data pipeline is a series of data processing steps. It typically involves ingestion of data from a source, one or many steps to transform, enrich, or aggregate that data, and a final step to output the resulting dataset to a data store.
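The ingest-transform-output pattern described above can be sketched in a few lines of Python. This is a minimal illustration under assumed, hypothetical names (ingest, transform, output, and an in-memory store); it is not part of any Data Platform API.

```python
# A minimal sketch of the data pipeline pattern: ingest from a source,
# transform/aggregate the records, and output the dataset to a store.
# The source data, function names, and the dict-based store are all
# hypothetical stand-ins for illustration only.

def ingest():
    """Read raw records from a source (here, an in-memory list)."""
    return [
        {"page": "Earth", "views": 120},
        {"page": "Moon", "views": 45},
        {"page": "Earth", "views": 30},
    ]

def transform(records):
    """Aggregate total views per page."""
    totals = {}
    for record in records:
        totals[record["page"]] = totals.get(record["page"], 0) + record["views"]
    return totals

def output(dataset, store):
    """Persist the resulting dataset to a data store (here, a dict)."""
    store.update(dataset)
    return store

store = {}
result = output(transform(ingest()), store)
print(result)  # {'Earth': 150, 'Moon': 45}
```

In a production pipeline each step would typically be a separate scheduled or event-driven job, with the store being a database or object storage rather than an in-memory dict.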
Still not sure what we do? Hopefully this table helps clarify...
Use Cases | Which team to go to? |
---|---|
I have a notebook that produces some cool output that I think could be useful for contributors | Data Platform & Structured Data Team |
I want to access MediaWiki data and produce some reports and analysis | Metrics Platform |
I created a machine learning model and want somewhere to run it | Machine Learning Team |
I want to consume data events, trigger a micro service and store the data so an API can serve dataset consumers | Data Platform |
I built a process that the community is using but I can no longer support it | If you want to migrate it yourself: Data Platform. If you can no longer support it at all or migrate it: ??? |
I have a data pipeline process but I am not sure of the benefit of the output | Data Platform & Structured Data Team |
Work Intake Process
For significant projects, you can follow the Platform Engineering teams' "how to work with us" process.
For work related to bugs, features, or support in the Data Platform's areas of responsibility, you can contact Data Product Manager Luke Bowmaker directly or create a task on our Phabricator board, tagged with #generated_data_platform and assigned to lbowmaker (Luke Bowmaker). The team meets throughout the week and triages on a rolling basis.
Data Platform Value Stream Demo
On a regular cadence, the Data Platform Value Stream team will post demos of our developments/works in progress here to provide transparency and gather feedback.
Demo Sessions
Note: You'll need to be signed in with your WMF account to view these videos.
Date | Links | Presenters |
---|---|---|
2021/09/08 | First Demo | Data Team |