
Talk:VisualEditor on mobile/VE mobile default/Flow


How should we measure which editing interface is "better" for newer contributors?

PPelberg (WMF) (talkcontribs)

In running this A/B test, we are trying to figure out which editing interface is a “better” default for newer contributors.

To answer this question, we first need to define what “better” means so we can compare the two test groups and, ultimately, decide whether to explore making the mobile visual editor the default mobile editing interface for more contributors on more wikis.

How do you think we should measure which editing interface is "better" for newer contributors?

Alsee (talkcontribs)

The true goal is the health and productivity of our projects. In simpler and more measurable terms, the goal is to maximize contributions. The number of edits or the size of edits are poor measures, because there may be different work patterns between the two editors. So as a practical matter, the metric we want is user retention and sustained contributions. Sustained contributions are particularly significant, as an edit by a knowledgeable, experienced user is far more reliable and valuable than edits by newbies. We want to look at new user retention over as long a time span as practicable. If you want to expand on that, you could count the number of days the user has been active. That would avoid any small-scale differences in editing style between the two editing environments.
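The retention-and-days-active metric suggested above could be computed roughly like this (a minimal Python sketch; the `edits` structure, the user names, and the 30-day retention horizon are illustrative assumptions, not anything defined in this discussion):

```python
from datetime import date

# Hypothetical input: per-user lists of edit dates within the study window.
edits = {
    "NewUserA": [date(2019, 9, 1), date(2019, 9, 1), date(2019, 9, 20)],
    "NewUserB": [date(2019, 9, 2)],
}

def days_active(edit_dates):
    """Number of distinct days on which the user edited."""
    return len(set(edit_dates))

def retained(edit_dates, first_day, horizon_days=30):
    """True if the user edited again at least `horizon_days` after starting."""
    return any((d - first_day).days >= horizon_days for d in edit_dates)

for user, dates in edits.items():
    print(user, days_active(dates), retained(dates, min(dates)))
```

Counting distinct active days, rather than raw edit counts, is what insulates the comparison from the two interfaces encouraging different save patterns.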

Metrics such as edit-completion rate may be more convenient to measure, and may be useful for catching certain glaring issues; however, past studies on VE have demonstrated that there are complexities in defining and interpreting that metric. If edit completion rates were to point in the opposite direction from user retention and contributions, then obviously we should disregard the irrelevant completion rate. For example, it's a not-uncommon part of the wikitext workflow to open additional throwaway edit sessions just to view or copy wikitext from a page. Closing such a session without saving does not indicate any sort of failure. The original VE research project explicitly excluded any session where the user closed the editor without any content change. Failing to account for that issue will result in invalid low figures for wikitext success rate.
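The exclusion described above can be sketched as a filter applied before computing the completion rate (a minimal illustration; the session fields `editor`, `changed`, and `saved` are hypothetical names, not the actual instrumentation schema):

```python
# Hypothetical session records: the editor used, whether the page content
# was ever modified during the session, and whether the edit was saved.
sessions = [
    {"editor": "wikitext", "changed": False, "saved": False},  # view/copy only
    {"editor": "wikitext", "changed": True,  "saved": True},
    {"editor": "visual",   "changed": True,  "saved": False},
    {"editor": "visual",   "changed": True,  "saved": True},
]

def completion_rate(sessions, editor):
    """Share of saved edits among sessions where content actually changed.

    Sessions with no content change are excluded, mirroring the original VE
    research, which dropped sessions closed without any modification.
    """
    attempts = [s for s in sessions if s["editor"] == editor and s["changed"]]
    if not attempts:
        return None
    return sum(s["saved"] for s in attempts) / len(attempts)
```

Without the `changed` filter, the throwaway view-and-copy session above would count as a wikitext "failure" and depress that editor's apparent success rate.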

Another metric you want to look at is whether there is any preferential direction in users switching away from one editor and into the other. Even assuming retention and contributions are roughly equal between the editors, we obviously should not be defaulting new users into an interface they preferentially switch away from.

Shifting to a related subject, I'd like to note that the Foundation has spent years trying to get positive results for VE, and that some of the documentation here continues to indicate a significant bias toward a specific outcome for this research. The Foundation previously attempted to roll out a VisualEditor default as part of the SingleEditTab project. I'd also like to note that your graph of per-interface retention rates shows that mobile and desktop retention rates for VE are roughly half the rate for wikitext: when users are defaulted into VE, they generally either quit or switch to wikitext. Likewise, your table on use of the visual and wikitext editing interfaces shows that when wikitext is the default, nearly one hundred percent of editors stick with it, while when VE is the default the overwhelming majority of editors flee VE and switch to wikitext. For the last several years the Foundation has been battling the community trying to push VE, and the community has been continually fighting back, insisting that wikitext is the better tool for the job. When the Foundation attempted an unannounced rollout of a VE default, the Polish community unanimously demanded that it be rolled back, and two other wikis (including EnWiki) went so far as to write hacks into the sitewide JavaScript to reverse the VE default. A constant flow of new users into the community is extremely important to us. We are strongly motivated to protect their on-ramp to success.

And finally, I'd like to note that the test scenarios for positive test results lay out "a proposal to make the VisualEditor the default mobile editing interface on all wikis", but if the test results are negative they instead direct analysis toward figuring out why you didn't get the desired results. Can we please get that changed? If the research finds that a VE default is actively harmful to new users, that obviously warrants an equal-and-opposite proposal to make wikitext the default mobile editing interface on all wikis.

Test scenarios

PPelberg (WMF) (talkcontribs)

In our 18 September 2019 update, we shared the next steps we plan to take, depending on the results of the A/B test. Do those actions seem correct to you? Is there something we haven't considered? Please let us know what you think...

197.235.242.46 (talkcontribs)

There are plenty of nuances (from a research perspective) in this study:

  • Participants can change editors on a whim - regardless of whether a user gets the VisualEditor or the source editor, they still have the choice to jump from one to the other. This means there will be a bias towards the wikitext editor, primarily because it is the default fallback experience and because the VisualEditor is an incomplete editor: it can't handle things like "Undo", it can't handle edit conflicts, and it is not available in all text areas or in some namespaces.
  • Fallback to the source editor - if for whatever reason the VisualEditor can't load, there is no alternative besides falling back to the source editor. So for huge pages or browser errors, you'll be dumping editors into the wikitext editor regardless of their choice or the test settings.
  • Bias towards the wikitext editor - the source editor has a lot of inertia; it has been used for more than a decade, and even if that weren't the case, all edits made with it are logged in recent changes. So they may be more carefully reviewed, and some may be reverted more quickly.

It might be prudent to look into similar previous studies. One of the issues with those studies is that they don't take into account the differing nature of edits: creating a page is different from making a minor edit, and the difficulty and burden differ accordingly.

Some suggestions:

  • Consider edit type, creating a page vs. editing a page (it is much harder to create a new page, especially on mobile devices).
  • Consider the number of LintErrors introduced by each editor (wikitext vs. VisualEditor). While many lint errors do not really affect a page, some can break rendering, e.g. Special:LintErrors/wikilink-in-extlink and misnested tags ([1], [2], [3], [3b]). Scroll the page and you'll note several full paragraphs with strange styling (e.g. bbbb, bbb, italic). Note how the error has existed since the page was first created ([4]). Of course, there were recent parser changes and it didn't always render like that, but that doesn't change the fact that an average editor shouldn't have to care whether a <b> HTML tag is closed or not.
  • Consider how many edits were needed to correct such errors.
  • Consider the ORES rating for the edit; vandals, random kids, or bored people are more likely to get reverted regardless of whether the edit was completed successfully.

There are many more things that should be considered. For instance, in my opinion any large-scale study of the VisualEditor should temporarily obscure all VisualEditor change tags ([5]). Otherwise you're just painting a target on those users, which could make them quit.

Also, it is worth considering that an incomplete or unsaved edit isn't necessarily a bad sign. Many people open the editor out of curiosity, because they inadvertently clicked a redlink, through some random click, or just to preview content.

Whatamidoing (WMF) (talkcontribs)

I think that "opening the editor out of curiosity" should be equally likely no matter which editing environment is displayed.

PPelberg (WMF) (talkcontribs)

Thank you for the time and thought you put into drafting your comments, 197.235.242.46.

I'm going to list the points you shared and then do my best to reply to each one, with a few follow-up questions included along the way…


1. Test participants can switch editors and those presented with VisualEditor by default will be more likely to switch to editing using the wikitext editor.

You're right, contributors can switch editing interfaces if they choose, and this is behavior we will be analyzing. That said, how do you see this impacting the test results, considering the test is targeted at contributors with little to no experience editing Wikipedia who, we assume, are unlikely to be familiar with the wikitext editor?


2. Fallback to source editor if the VisualEditor can't load

Have you seen the wikitext editor appear on mobile in instances where the VisualEditor took longer than a certain amount of time to load? This is not something we expect to happen, so if it is, something might be wrong and we'd value getting to the bottom of it!

More generally, you make a good point about load times, and specifically about how they can impact a contributor's likelihood of completing an edit. In fact, one of the things the team is curious to understand from the test results is the relationship between load times and edit success. More info here: T232175#5545364


Previous studies and the importance of considering edit type in analyses

First, I hadn't thought to use search like this to filter for "research" about a particular topic...this is wonderful!

Second, can you say a bit more about how you think edit type ought to be considered/evaluated in a test like this? You mention differentiating between edits to existing articles and edits associated with creating articles...are you thinking we should limit our comparison of the edit completion rates of the mobile wikitext and mobile visual editors to certain types of edits? And if so, why?


LintErrors

In bringing up LintErrors, what are you suggesting in the context of the A/B test? Are you thinking that, if we were to compare mobile wikitext and mobile VE by the number of edits completed in each, it would be important to consider how many edits completed in mobile VE were made to correct these errors?


Using ORES to score edit quality

It sounds like you're suggesting edit quality is something we should use to evaluate whether the mobile wikitext or the mobile visualeditor provides contributors with a "better" editing experience...am I understanding you correctly there?

If so, edit quality is something we will be evaluating as part of our analysis. See: A/B test.

Our current measure of quality is whether an edit was reverted or not. We chose this approach over ORES because ORES models are not yet deployed on all of the wikis included in the test. You can see the ORES deployments here: https://tools.wmflabs.org/ores-support-checklist/
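The revert-based quality measure described above amounts to a per-interface revert rate (a minimal sketch; the edit records, interface labels, and any revert time window are illustrative assumptions, not the team's actual pipeline):

```python
# Hypothetical edit log: each edit records the interface used and whether it
# was later reverted (e.g. within some fixed window; the window is assumed).
edits = [
    {"interface": "mobile-ve",       "reverted": False},
    {"interface": "mobile-ve",       "reverted": True},
    {"interface": "mobile-wikitext", "reverted": False},
    {"interface": "mobile-wikitext", "reverted": False},
]

def revert_rate(edits, interface):
    """Fraction of an interface's edits that were reverted."""
    group = [e for e in edits if e["interface"] == interface]
    return sum(e["reverted"] for e in group) / len(group) if group else None
```

A lower revert rate would then serve as the proxy for higher edit quality when comparing the two test groups.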


Obscure all VisualEditor change tags…

Interesting thought...are you suggesting that some contributors might have a negative bias towards edits made in VE, which could make those contributors more likely to revert such edits, in turn creating a bad experience for the contributors who made them and ultimately driving them away?


Many people just open the editor out of curiosity, or because they inadvertently click a redlink, or due to some random click or just to preview content.

This is a great point and something the team is trying to figure out. More specifically, we're trying to understand how we might be able to measure intent.

Said another way: how can we detect whether a contributor is tapping the edit button with the intention to edit or if they're just – as you said – curious?

An idea we've thought of: Do contributors make any changes to the article before abandoning? This is something we'll soon be able to measure. See: T229079
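The idea above, classifying abandoned sessions by whether any change was made first, could be expressed like this (an illustrative sketch only; the `made_change` field is a hypothetical stand-in for whatever T229079's instrumentation records):

```python
# Hypothetical abandoned-session records: did the contributor modify anything
# before closing the editor without saving?
abandoned = [
    {"made_change": False},  # likely curiosity or an accidental open
    {"made_change": True},   # likely a genuine, but unfinished, edit attempt
    {"made_change": False},
]

def likely_intent_share(sessions):
    """Share of abandoned sessions that look like genuine edit attempts."""
    return sum(s["made_change"] for s in sessions) / len(sessions)
```

Sessions abandoned with no change at all would then be discounted when interpreting abandonment as "failure", echoing the earlier point about curiosity-driven opens.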

If you have any other ideas of how we might be able to detect edit intent, we'd be keen to hear!

Thanks again for all your thought,


Do you think your wiki should be included in the A/B test?

PPelberg (WMF) (talkcontribs)

We are in the process of determining what wikis to include in the A/B test. If you'd like to see a particular wiki included, please let us know!
