Hi, I see you're listed as Parsoid lead.
Recently I've submitted quite a few bug reports, and some WMF staff have expressed puzzlement at why my Flow experience is so miserable and why I run into a steady stream of problems, up to and including "Unable to parse content due to a Parsoid failure". I think I've figured out a big issue. I was looking at Parsoid/Round-trip_testing. It looks like the WMF attempted to collect a large body of representative test data, and parsoid-tests currently claims the following (a sketch of the round trip being measured follows the list):
- 100% parsed without errors,
- 99.84% round-tripped without semantic differences, and
- 74.56% round-tripped with no character differences at all.
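For reference, the round trip being measured is wikitext → Parsoid HTML → wikitext. Here is a minimal sketch of that trip, assuming a locally running Parsoid service exposing the v3 transform endpoints; the base URL, port, and form field names are assumptions that may differ by version or setup:

```python
import requests

# Assumed base URL for a local Parsoid service (commonly port 8000);
# adjust the domain/port for your installation.
BASE = "http://localhost:8000/en.wikipedia.org/v3/transform"

def round_trip(wikitext):
    """Wikitext -> Parsoid HTML -> wikitext via the v3 transform endpoints."""
    html = requests.post(BASE + "/wikitext/to/html/",
                         data={"wikitext": wikitext}).text
    return requests.post(BASE + "/html/to/wikitext/",
                         data={"html": html}).text

sample = "''Hello'' [[world]]\n"
restored = round_trip(sample)
print("char-identical:", restored == sample)
```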
But if your body of test data isn't actually representative of real-world use, you're going to miss a lot of issues that rarely or never appear in the data you did test against.
The majority of edits are not saves: most editing actually happens within an edit-preview cycle, possibly ending with a single save. I can tell you, you're going to find a *lot* of stuff that shows up within the edit-preview cycle that rarely shows up in a final save. It also seems you didn't test against edits outside of article space, so there's another big chunk of stuff you're missing. (In Flow, merely clicking between the two editing modes triggers a round trip that routinely mangles what I had... and some of those no-semantic-difference changes are gross.)
Is there any chance you can collect a large body of preview diffs to run a round-trip test against? Preferably one test set drawn from article space, and one sampled from all non-article spaces?
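To make the ask concrete, here is a hypothetical harness that buckets a corpus of captured preview wikitext into the same three categories as the stats above. It reuses round_trip() from the sketch earlier; the whitespace-based normalization is only a stand-in for whatever the real RT suite does to judge semantic equivalence:

```python
from collections import Counter

def classify(original, restored):
    """Bucket one round-trip outcome the way the RT stats report them."""
    if restored == original:
        return "char-identical"
    # Placeholder normalization: a real harness would canonicalize
    # insignificant whitespace/markup before calling two strings
    # semantically equivalent.
    if restored.split() == original.split():
        return "semantic-identical"
    return "semantic-diff"

def report(samples):
    """Print the percentage of samples falling into each bucket."""
    tallies = Counter(classify(s, round_trip(s)) for s in samples)
    total = sum(tallies.values()) or 1
    for bucket, n in sorted(tallies.items()):
        print(f"{bucket}: {100 * n / total:.2f}%")
```

Running report() once over article-space samples and once over non-article-space samples would give exactly the two test sets I'm asking for.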