Core Platform Team/Initiatives/Unify Parsers-Phase 2
This page is obsolete and is retained for archival purposes. It may document extensions or features that are obsolete or no longer supported. Do not rely on the information here being up to date. The Core Platform Team and its initiatives no longer exist; since 2023, see MediaWiki Engineering Group instead.
Unify Parsers-Phase 2
Initiative Description
- Summary
MediaWiki currently has two wikitext parsers, the legacy parser and Parsoid, which support different use cases. This project aims to arrive at a single parser that supports all use cases.
- Significance and Motivation
Parsoid was developed to support HTML-editing clients and is also used by some, but not all, read-view use cases. Maintaining two parsers is not tenable in the long term: it hamstrings development and upgrades to the parsing codebase, wikitext, and templates, because every change would have to be supported in both codebases. More importantly, the parsing pipelines in the two parsers differ, which makes replicating functionality in both more complex.
We would like to consolidate behind Parsoid as the new default parser given its support for HTML clients, its annotated HTML output, and its more structured internal pipeline. This requires identifying all output and feature incompatibilities between Parsoid and the legacy parser and bridging those gaps. It may also require updating bots, gadgets, extensions, and wikitext. This project aims to minimize all such changes by handling any differences with appropriate tooling and support.
Once Parsoid is deployed as the default and only parser for all wikitext-based use cases, we can embark upon much-needed work to enhance wikitext and templates and make them easier to use, more performant, less error-prone, and easier to write tools for.
- Outcomes
- Baseline Metrics
None given
- Target Metrics
None given
- Stakeholders
- Client teams (Web, VE, Flow, CX, Apps)
- Bot, Gadget, and Extension authors (only as pertaining to the Wikimedia cluster initially)
- Editing community
- Core Platform
- Known Dependencies/Blockers
Reduce Extension Interface Surface Area
Epics, User Stories, and Requirements
- Fix known issues in Parsoid relating to using Parsoid HTML for read views
- Complete language variant support
- Address any other issues in Parsoid/Known differences with PHP parser output
- Finish updating legacy PHP parser media output to match Parsoid
- This might require updates to some bots and gadgets
- Identify any other Parsoid feature gaps (This can/will reveal new work)
- Finalize new parser hooks API (Parsoid and legacy PHP parser have different pipelines and internals)
- Migrate over Wikimedia extensions using existing hooks
- Compatibility Testing (this can/will reveal new work)
- Establish regular visual diff QA runs to identify uncaught issues
- Analyze results and file Parsoid bugs or identify any wikitext changes required on wikis
- Decide what level of compatibility is acceptable (100% compatibility is not achievable, and some output differences may be insignificant)
- Connect with CL and engage with community if we require any wikitext / templates to be fixed (This can/will reveal new work)
- Production Readiness
- Improve Parsoid performance (undefined until phase 1 is complete)
- Switch over all read views to Parsoid on the Wikimedia cluster
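As a rough illustration of the compatibility-testing epic above, the sketch below compares two HTML strings and checks them against an acceptance threshold. It is a text-based stand-in only: the plan calls for visual diff runs, which compare rendered pages rather than markup. The function names, the whitespace normalization, and the `0.95` threshold are all hypothetical and are not taken from Parsoid or any Wikimedia tooling.

```python
import difflib
import re


def normalize(html: str) -> str:
    """Collapse whitespace runs so purely cosmetic spacing differences are ignored."""
    return re.sub(r"\s+", " ", html).strip()


def similarity(legacy_html: str, parsoid_html: str) -> float:
    """Return a 0.0-1.0 similarity ratio between the two normalized strings."""
    matcher = difflib.SequenceMatcher(
        None, normalize(legacy_html), normalize(parsoid_html)
    )
    return matcher.ratio()


def is_acceptable(legacy_html: str, parsoid_html: str, threshold: float = 0.95) -> bool:
    """Hypothetical acceptance rule: outputs pass if they are at least
    `threshold` similar. The threshold value is an assumption, not a project decision."""
    return similarity(legacy_html, parsoid_html) >= threshold


if __name__ == "__main__":
    legacy = "<p>Hello  world</p>\n"
    parsoid = "<p>Hello world</p>"
    print(similarity(legacy, parsoid), is_acceptable(legacy, parsoid))
```

A real run would also need to decide which differences count as insignificant (for example, attribute order or Parsoid's extra annotation attributes), which is exactly the open question the epic leaves to later analysis.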
Time and Resource Estimates
- Estimated Start Date
Late FY1920 Q1
- Actual Start Date
None given
- Estimated Completion Date
None given
- Actual Completion Date
None given
- Resource Estimates
18-24 months
3.5 FTE and 0.5 Engineering and Project Manager for the duration
Possibly augmented by other engineers, but more clarity is needed.
- Collaborators
- Parsing Team
- Core Platform
- Performance
- SRE
Open Questions
- To what extent do we want to refactor the Parsing Interface in Core? It is currently coupled with the legacy wikitext parser and the templating implementation.
- What is acceptable output disparity between Parsoid and the PHP parser? How do we decide this? What qualitative analysis should be used?
- What are our strategies for engaging with the community on any changes this might require them to make?
- What additional work is required on the Linter extension to better support editors with any required wikitext and template changes?
Documentation Links
- Phabricator
https://phabricator.wikimedia.org/tag/parsoid-read-views/
- Plans/RFCs
- Other Documents
- Parsoid/Known differences with PHP parser output
- Parsing/Parser Hooks Stats
- Parsing/Media structure
- Parsoid/LanguageConverter
Subpages
Blocked, waiting for phase 1 to be complete.
Some work will remain loosely defined until several tasks, which are expected to define the rest of the project, are complete.