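Parsoid/Known differences with core parser output
This page tracks known differences in HTML output between Parsoid and the PHP parser, along with the resolutions proposed for them. For a user-facing version, see Parsoid/Parser Unification/Known Issues.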
Differences caused by implementation differences or feature gaps
Difference | Description | Proposed resolution | Status |
---|---|---|---|
Parsoid generates <figure> tags for block images whereas the PHP parser uses <div> | This is another HTML4 / HTML5 fallout. Parsoid uses semantic HTML5 markup that was not available in HTML4 when the PHP parser was written. Once this code is ready to be merged, and before we deploy it, we'll work with bot and gadget authors to use the new markup that will be generated. | T118517 is the RFC for updating PHP parser output. See Parsing/Media structure | Done |
Parsoid doesn't handle language variants yet | Parsoid doesn't yet parse language variant markup and doesn't provide a variant-specific rendering for reading clients. | Language variant support in Parsoid has landed and been deployed. The remaining work is to finish support for all languages. | In progress |
Edge case differences between Parsoid's native implementation of some extensions and the PHP implementations of the same | For any extension that processes wikitext (e.g. Cite, Gallery), Parsoid needs its own native implementation. However, because of implementation differences, there are edge cases where the output differs (e.g. T51538, T96555, and a few others related to gallery). | Some of these (T104662, T96555) will be fixed in Parsoid. Others might be tweaked in the PHP implementation, or we might just treat the edge case differences as undefined behavior that editors shouldn't rely on. Since these are edge cases, they should be fairly uncommon on wikis (otherwise, we would have fixed them). | In progress |
Unavailability of some parser hooks in Parsoid compared to PHP parser | Parsoid and the PHP parser have different internals, so not all the PHP parser's tag hooks are available in Parsoid. This page with parser hook stats lists extensions and the parser hooks they use. Some hooks, such as ParserBeforeStrip and ParserAfterStrip, have no equivalent in Parsoid. So, in a Parsoid-only world, this could affect the output and functioning of extensions like <translate> | We are going to develop a parser hooks API that is implementation independent (without exposing the internal details of how parsing happens) and port all the Wikimedia extensions to use this new API. Parsoid is developing an extension API to support existing Parsoid-native extensions cleanly (Cite, Gallery, Poem, etc.). We plan to extend the API gradually based on experience with adapting more extensions to work with Parsoid. In parallel, we will continue to deprecate unnecessary hooks and possibly rename some to reflect desired semantics. This task is likely to be completed after Parsoid moves to core. | In progress |
Parsoid doesn't handle pages in some namespaces properly (e.g. File, Category) | Parsoid doesn't have special handling for pages in namespaces that have generated content. For example, the content of a page in the Category namespace is generated dynamically, and a page in the File namespace similarly has some generated content. There is a good argument that Parsoid shouldn't duplicate this support and that clients should fetch this content from the MediaWiki API directly. However, this leaves Parsoid clients in a bind, because they don't know which namespaces are special in this way. So, a good resolution of this problem would be helpful. Maybe Parsoid should handle requests for content in all namespaces and, where that content is better served from the MediaWiki API, redirect the client to the right URL? | With Parsoid's integration into core and the ParserOutputAccess, RevisionRenderer, and ContentModelHandler classes, Parsoid is only involved where wikitext processing is needed. Elsewhere, core code handles other functionality. | Done |
Parsoid doesn't generate metadata needed for updating the links and page_props tables. | See T310512 | We'll have to add that at some point before Parsoid can replace the existing Parser class. | To do |
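
For bot and gadget authors affected by the <figure> versus <div> row above, the following is a minimal sketch of a scraper that accepts either wrapper. It is not taken from Parsoid or MediaWiki code: the `thumb` class name, the two sample snippets, and the names `BlockImageFinder` and `find_block_images` are assumptions made for illustration, so check them against the real output of the wiki you target.

```python
from html.parser import HTMLParser


class BlockImageFinder(HTMLParser):
    """Collect <img> sources that sit inside a block-image wrapper.

    Two wrapper shapes are recognised (both are assumptions of this sketch):
      * <figure> ... <img> ... </figure>             (Parsoid-style markup)
      * <div class="thumb ..."> ... <img> ... </div>  (legacy-style markup)
    """

    def __init__(self):
        super().__init__()
        self.stack = []    # one entry per open <figure>/<div>; None if not a wrapper
        self.sources = []  # (wrapper kind, img src) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "figure":
            self.stack.append("figure")
        elif tag == "div":
            classes = (attrs.get("class") or "").split()
            self.stack.append("div.thumb" if "thumb" in classes else None)
        elif tag == "img":
            # Innermost enclosing wrapper, if any.
            kind = next((k for k in reversed(self.stack) if k), None)
            if kind and attrs.get("src"):
                self.sources.append((kind, attrs["src"]))

    def handle_endtag(self, tag):
        if tag in ("figure", "div") and self.stack:
            self.stack.pop()


def find_block_images(html):
    """Return (wrapper kind, src) pairs for block images in either markup."""
    finder = BlockImageFinder()
    finder.feed(html)
    finder.close()
    return finder.sources


# Hypothetical samples approximating the two output styles.
PARSOID_SAMPLE = (
    '<figure typeof="mw:Image/Thumb"><a href="./File:Foo.jpg">'
    '<img src="Foo.jpg"></a><figcaption>caption</figcaption></figure>'
)
LEGACY_SAMPLE = (
    '<div class="thumb tright"><div class="thumbinner">'
    '<a href="/wiki/File:Foo.jpg"><img src="Foo.jpg"></a>'
    '<div class="thumbcaption">caption</div></div></div>'
)
```

Calling `find_block_images` on either sample yields one entry for `Foo.jpg`, tagged with the wrapper that matched, while an inline image outside both wrappers is ignored. Matching on the wrapper rather than on a fixed element path keeps such a tool working while both kinds of output are still in circulation.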
Differences identified via visual diff testing
We run mass visual diff tests comparing the rendering of Parsoid output and PHP parser output. This table will be filled out as we inspect the visual diffs and identify their underlying causes. In addition to the differences above, some specific differences have also been found.
Difference | Description | Bug / proposed resolution | Status |
---|---|---|---|
Long tail of bugs related to read views | Fix bugs filed under the Read Views column on the Parsoid Phabricator workboard | Fix all the bugs! | In progress |
Missing resource modules in Parsoid output | http://sv.wikipedia.org/wiki/Mir has a bunch of modules (ext.gadget.*) which the Parsoid output is missing | T161278 | In progress |
CSS differences in Cite | Cite output needs styling (T156351 and T156350). This should also cover the styling requirements for cite ref links: some wikis like eswiki and frwiki skip the brackets. In addition, knwiki (Kannada) uses Kannada numerals for the ref text. | The necessary styles for these various wikis are being added to the visual diffing code. Most of these styles are suitable to add to commons.css on these specific wikis. However, as part of this, we've also identified some limitations in the Cite CSS output. We'll have to figure out how to resolve that. | In progress. This is mostly done now for desktop views; see dedicated wiki page here. |
Broken / missing support for some extensions | Pages extension output for wikisource pages is missing some wrapping divs (with associated styles). (Example) Pages on viwiki are missing mapframe / osm maps (Example) | To be investigated | To do |
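
The page does not describe how the visual diff tests score a pair of renderings. As a purely illustrative sketch, one simple approach is to compare two same-sized screenshots pixel by pixel and report the fraction that differ. The function name `diff_ratio`, the `tolerance` parameter, and the in-memory image format (rows of RGB tuples) are all assumptions of this example, not part of the actual test infrastructure.

```python
def diff_ratio(img_a, img_b, tolerance=0):
    """Return the fraction of pixels that differ between two images.

    Each image is a list of rows, each row a list of (r, g, b) tuples.
    A pixel counts as different when any channel differs by more than
    `tolerance`. This is a hypothetical scoring rule for illustration.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")

    total = sum(len(row) for row in img_a)
    if total == 0:
        return 0.0

    differing = sum(
        1
        for row_a, row_b in zip(img_a, img_b)
        for px_a, px_b in zip(row_a, row_b)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(px_a, px_b))
    )
    return differing / total


# Two tiny 2x2 "screenshots" that differ in exactly one pixel.
WHITE, BLACK = (255, 255, 255), (0, 0, 0)
php_render = [[WHITE, WHITE], [WHITE, WHITE]]
parsoid_render = [[WHITE, WHITE], [WHITE, BLACK]]
```

A score like this only flags that two renderings differ. Finding the underlying cause still requires inspecting the page, which is what the table above records.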
See also
- https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/513692/2/src/Config/Env.php#721 has some commentary from Brad about handling page metadata that Parsoid would need to emit when it is slated to replace the current parser.
- Parsoid/limitations
- The known differences column on the Parsoid phab board
- Parsoid/Parser Unification
- m:Special:MyLanguage/Migration to the new preprocessor — a similar page listing the rendering differences introduced by Tim Starling's 2008 rewrite of the wikicode parser preprocessor, released in MediaWiki 1.12