Talk:Wikimedia Apps/iOS Suggested edits project

Feedback about alt-text experiment

Latest comment: 9 months ago · 8 comments · 3 people in discussion

Hi everyone, sorry for the confusion; the project page for this effort is actually here. I also appreciate that a multi-place ping is less than ideal, but I realized that this discussion is happening in three different places and wanted to bring it to the actual project page where this effort is being worked on, especially because it is a bit more language-neutral.

By the way, my name is Jaz; I am the Lead Product Manager for the apps.

As I mentioned on EN Wikipedia, @Graham87, @TheDJ and @PerfektesChaos, I fully agree that it is important to be intentional: in our effort to address the 95% of images on Wikipedia that do not have alt-text, and in our quest to create a more equitable experience for low-vision folks and folks with low bandwidth where images do not load, we should not replicate the issues we see with the 3% of images that do have alt-text. Our goal is to increase the 2% of helpful alt-text. As you'll notice in our consultation strategy, we've brought a prototype to the GLAM conference and gathered our first round of feedback from volunteers in attendance; I welcome you to try it out. The insights were positive, but there is a lot more careful testing we need to do before moving forward, which is what we are working on at the moment. All of our work is being done in lockstep with an accessibility organization based in Latin America; our initial target audience is folks in Latin America and the Caribbean speaking French, Spanish and Portuguese.

Starting out, the feature will only be available for people with at least 50 unreverted edits, a measure we introduced for all app Suggested Edits including Article Descriptions in 2022 based on community feedback. We partnered with several community members to evaluate whether there was an increase in the quality of suggested edits and did see a lift by adding that gate, although I am always happy to partner on improvements. The iOS Suggested Edits project page will receive an update next week with additional details about the next phase of our experiment, now that we've gotten feedback about the prototype. That update will also include an updated decision matrix for the next phase of this experiment. I will ensure Amal tags you all when the update is there so that you can review our decision matrix and contribute to other guardrails we can put in place to ensure this experiment is meaningful in deciding if we should proceed.

The questions that @Brooke Vibber (WMF) and @ARamadan-WMF are asking about the Linter are meant to investigate whether using the Linter is a promising way of providing a feed of contextualized alt-text in the case we have positive indicators. This investigation is not to convey that we are definitively rolling this out. I also want to assure you that this is a true experiment; it will be turned off after some people try the tool. We will evaluate the usefulness of what is produced in partnership with an accessibility organization before we proceed and will not leave the tool on while decisions are made. If you'd like to be a part of the evaluation process, I also welcome your partnership when we get to that stage! If you have advice for Brooke about use of the Linter for the feed in particular, feel free to share it in the original discussion places or on Phab, and I'm happy to be your point person to discuss the effort overall.

P.S. For folks that may join the conversation later, it started here on English Wikipedia, here on German Wikipedia, and in T344378 on Phabricator. JTanner (WMF) (talk) 01:16, 8 March 2024 (UTC)

The fifty unreverted edits stipulation eases my fears a little bit, but not entirely. It's entirely possible to make fifty unreverted edits to badly watched articles (which are most of them) and still have them all be complete nonsense; here's an extreme example courtesy of the frwiki-based long-term abuser the time thief. (I know this is a /64 IP range which isn't directly comparable and not everyone knows how to properly check their edits). I've also encountered people achieving these kinds of editing rates using semi-automated grammar checkers or rapid linking, like this. If this feature must continue and ends up being rolled out more widely, I'd be very concerned about people in areas where the wiki's language isn't the main one spoken in the area having access to it (e.g. people in South Asia, the Philippines, and many former British colonies of Africa on the English Wikipedia), because I've encountered far too many cases of people from these areas overestimating their fluency in English. Adding alt text is difficult to evaluate because most general readers won't see it and the ones who will (screen reader users like me and those with images turned off) won't be able to see the original image to properly evaluate the results. Graham87 (talk) 08:00, 8 March 2024 (UTC)

As I already pointed out at the other places, I strongly oppose pushing anybody who has not been trained to create alt texts.

I remove some 75 % of alt texts when I encounter them, since they are counterproductive, even when made by experienced article authors.
The task is: If you do not see the picture in advance, just hear or read the text. Then close your eyes and imagine what you received. Now open the eyes, make the image visible, and compare. If it is not rather similar, you made the situation worse.
This does need some poetry, it will require some five or ten minutes for each alt text, and you need the context which aspects of the article text are to be illustrated.
That is a very advanced issue, and it does need excellent guidance as well as intensive training and feedback.
- An operational guidance, a cookbook is to be provided first, before pushing people to write alt= texts.
- In English, some good help is available by associations outside WMF, but I am not aware of any wiki page which is breeding alt= text editors.
- In German, the external sites did not establish editor guidance yet. Some first steps were made, but not sufficient. German Wikipedia is attempting to develop an operational manual, currently on user page level.
If you have unselected people create an image description, the first error is that they just copy the legend (the caption). This is complete nonsense, since blind people will hear the same story twice.
If blind people experience that it is a waste of time to receive image descriptions, they will disable them entirely for wikis, or never request them any more.
Bad image descriptions make the situation worse, but are nice for statistics. You might measure that 3 % of images had alt texts initially, and you raised that quota to 5 %, but you need to measure whether these are helpful stories. Most alt texts do exist but are pointless.

Validation tools will tell you that a certain page should be equipped with alt texts.

This is a nice recommendation for a website with 150 images. They may do so.
It is not feasible for a wikiverse heading for 100 million pictures, transcluded in several hundred languages, in billions and billions of transclusion contexts.

A major part of images is presentational, and the best thing is not to mention that they exist.

Please see this page. There are four images, but screen readers will not announce that they are present, since all of them are eye-catchers and decoration. They are reinforcing the textual content. Please do not announce that there is a red circle with a diagonal white cross. Just skip it. They have no alt text, and your statistics will count them as undescribed, and your tool will ask people to add an image description.
Many images are duplicating such text. If you have a list of winners of the Olympic Games, the text “USA” will be decorated with the Stars and Stripes. Do not provide any alt text describing 13 horizontal white and red stripes, a blue rectangle in the upper left corner, and 50 white stars.
Do not try to describe this or that picture. Just forget about it; the caption may be “structure”.

Our experience is that it takes several loops of attempts, feedback and improvement before editors are able to write suitable alt texts or suppress undesired images.

Wikis are promising that everybody can contribute and write Wikipedia articles, but for some issues you do need additional skills.

Those who know about the alt= text traps will deal with them anyway, if they are editing articles in detail, but nobody else must be nudged to invent anything.

Greetings --PerfektesChaos (talk) 11:14, 8 March 2024 (UTC)

Hi @PerfektesChaos, thanks for such a thoughtful reply. It is helpful to hear someone's perspective from German Wikipedia. As mentioned, we are starting out with Spanish and Portuguese Wikipedias in partnership with affiliates. Additionally, we are working in partnership with accessibility experts in Latin America, focusing on the use cases that have been surfaced there. The flow we are planning to use for those audiences for the sake of the experiment is this: if a logged-in experienced user adds an image to an article without adding alt-text, we provide guidance and onboarding so that they can add alt-text. We are first trying to understand whether the guidance provided would help already interested editors add alt text when they add an image or make a productive edit to an article, and whether the result would be of quality in a controlled environment with the guidance provided. As I mentioned, Brooke and Amal are just seeking out options to explore alt-text as a suggested edit in a feed should the experiment results end up positive. If we see that the guidance is useful and we lower the barrier, particularly for folks on mobile devices, while increasing quality, at that point we would explore whether it makes sense to have a feed (suggested edit) and, if so, who should have access.

It's great to hear that you all are working on an operational manual for German Wikipedia; good luck as it evolves! It is helpful to hear that, from your perspective, there does not appear to be good guidance for German Wikipedia. Should our experiment prove successful and scaling is an option, we'd be sure to be mindful of that and share our learnings to see if such a tool would make sense for German Wikipedia if the alt-text guidance was adopted or if it isn't a good fit. JTanner (WMF) (talk) 22:04, 8 March 2024 (UTC)

Hi @Graham87! Thanks again for this reply. Yes, it wouldn't be a good idea for time thieves to have access to suggested edits. Suggested Edits are only available to logged-in users with over 50 edits, so no IP users currently do or would have access to suggested edits in the apps. While it may not align with a screen reader to add alt-text to articles using the tool, if you'd like to review the output of the experiment and determine if the alt-text is effective, I am happy to have the alt-text produced during the experiment translated so that you can share an opinion on the quality alongside the accessibility experts. Our goal is to lower the barrier for folks like Angie from Wikipedia Argentina, who hold edit-a-thons to add alt-text but find that our existing tools make it very tough and arduous, while not making it so easy that folks who will not go the extra mile to learn the proper way of adding alt-text can do it. By the way, here is a video of the presentation that was delivered during the GLAM conference, where you can hear more from Angie about her use case and some of the goals of the project. JTanner (WMF) (talk) 14:42, 8 March 2024 (UTC)

The time thief did use accounts too, but I take your point. I couldn't really follow Angie's presentation because the recording was very quiet and there was a lot of background noise, but I think I got the gist ... surely there's a more lightweight way of doing this that would benefit people like Angie without potentially inconveniencing article watchers/patrollers? A toolforge tool/separate app (that would need the API, I know), for instance. The way I see [sic] it, finding an image without alt text in most cases is as easy as shooting fish in a barrel. If people are interested in/have expertise in a particular topic, want to add alt text to a Wikipedia article about it, and have a good idea of what alt text is, sure, give them a way to do that. But ... I dunno ... using linter and suggested edits seems way too intrusive. Also, re the person in the video who said "Photograph" was inadequate alt text, on the English Wikipedia, the alt text complements the caption so it's encouraged when the caption is adequate alt text. Having said all that, I think I'll trust the accessibility experts to evaluate what is and isn't good alt text; I can't see the images so that puts me at a disadvantage. Graham87 (talk) 15:28, 8 March 2024 (UTC)

@Graham87 I understand this perspective, and agree the audio on the video is less than ideal!

Thanks for taking the time to express your thoughts. It sounds like we are aligned in wanting to support intentional folks in adding alt-text that is of quality, and not expose the feature to anyone that will burden patrollers or make life harder for anyone using a screen reader or in a low bandwidth environment. I will continue to iterate on balancing these two important points as the experiment is scoped and assessed, and will keep you updated.

As far as technical complexity, Seddon (WMF) can better speak to this. JTanner (WMF) (talk) 22:37, 8 March 2024 (UTC)

The best operational guidance in English I could dig out currently is by whatwg.org – you cannot summarize this in two lines on an interactive form.

And it is not written for a Wikitext audience.

However, on the lyrical side there is more to work out: by which textual structure a similar image will appear in your mind. Some creative writing is needed to draw a picture within the head of the listener.

Best --PerfektesChaos (talk) 08:52, 10 March 2024 (UTC)

Please provide data set supporting the initial claims

Latest comment: 9 months ago · 7 comments · 5 people in discussion

I am surprised to see the claim that "50% of images on Wikipedia have captions" and that 10% have alt text. My anecdotal experience is that the percentage of (non-icon) images with captions is much higher. Can you please provide the supporting data behind that claim? Are you looking only at article space? Do the percentages vary by the Wikipedia's language? Are you counting images inside templates? Are you counting images with deliberately blank alt text? Jonesey95 (talk) 01:49, 8 March 2024 (UTC)

And I'd say the percentage of images with alt text is much lower and almost every single article on the English Wikipedia will have at least one image without alt text. It's only specifically added by a very small number of users, especially those writing pages classified as featured/good articles/lists, on the English Wikipedia. Graham87 (talk) 08:05, 8 March 2024 (UTC)

Hi @Jonesey95 @Graham87, I am happy to share the origin of these percentages. The work comes from professional researchers, found in this research paper. You can also read the more digestible diff post. As mentioned above, our initial focus is Spanish, Portuguese and French Wikipedia. The article does, however, mention that on English Wikipedia only 46% of images have captions and 10% have alt-text, with 3% having alt-text that is appropriate. If you have a data set as well, I am happy to pass it along to the researchers, although their information is available in the posts if you'd prefer direct outreach. JTanner (WMF) (talk) 14:26, 8 March 2024 (UTC)

I looked through the research paper. It has methodology for excluding certain types of images ("Using the CSS class in the HTML code, we exclude all images that appear as icons (for example, portals or Wikiprojects)"), which is good, but it does not appear to cite a percentage of images with or without captions.

The diff post claims that "only 46% of images in English Wikipedia come with a caption text" but does not have methodology. That post links to phab T276849, which says that they looked for captions using this criterion: "If we exclude all gif, tiff and png images, English Wikipedia has 7,811,234 images. Among those, 3,645,913 have a caption: 46.6%." That is different from using CSS classes. Why would a gif or tiff not need a caption? I'm assuming the search was limited to article space, but I don't know for sure.
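As a quick check of the arithmetic quoted from the Phabricator task, here is a minimal sketch that recomputes the caption share from the two counts given there. The function name is made up for illustration; only the two numbers come from the task. The result comes out a little above 46.6%, so the quoted figure looks truncated rather than rounded, which does not change the substance of the claim.

```python
# Counts quoted from phab T276849 (English Wikipedia, gif/tiff/png excluded).
total_images = 7_811_234
images_with_caption = 3_645_913


def caption_share(with_caption: int, total: int) -> float:
    """Return the percentage of images that have a caption."""
    return 100 * with_caption / total


# caption_share is a hypothetical helper name, not part of any existing tool.
share = caption_share(images_with_caption, total_images)
print(f"{share:.3f}% of counted images have a caption")
```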

Frankly, without seeing the specific criteria, I can't trust these numbers. Did they include the .svg lock icons that show the protection status of articles, or were those excluded because of their class? Those icons don't have captions and should not, but may have passed through their criteria. What about an infobox caption such as that at John Dalton, where the caption is rendered in a div that is separate from the File: call?

Given all of these questions, how sure can we be about the fundamental data on which this project is founded? If you really want data, I recommend a post at the English Village Pump (technical) page requesting help with developing criteria that would get you good data about images. Jonesey95 (talk) 16:59, 8 March 2024 (UTC)

Hi @Jonesey95, thank you for sharing these questions. I passed them along to the research team. Most of them live in Europe, so I may not hear back until next week. Additionally, we are starting with an experiment in partnership with Spanish and Portuguese Wikipedia affiliates. Once we learn from that experiment and share the outcomes of it, we can revisit if the tool makes sense for English Wikipedia or not. Happy to stay connected and share what we are learning along the way. JTanner (WMF) (talk) 21:32, 8 March 2024 (UTC)

Hello, we can confirm the fraction reported for the alt-text. More recent work matches the findings with a new independent analysis. You can find the dataset here: https://github.com/elisakreiss/concadia

We are looking into the captions and will post an update once we have more information. Tizianopiccardi (talk) 21:46, 11 March 2024 (UTC)

Hello!

I know the task is focused on alt-text, but I wanted to add some pointers here about missing captions data.

First, our research mostly focused on the notion of alt-text, how to extract its statistics (Tiziano can write more about his method), and how to bridge this large gap. We haven't performed an in-depth analysis of the state of captions yet, as our work mostly focused on solutions for automated image description, or on the broad understanding of the role of images on Wikipedia.
Looking again at the Phabricator task, I agree with you that the initial ~50% estimate might include images that do not need a caption, and exclude images that need one. That method was our best attempt to exclude all potential "icons" with the knowledge that we had at the time. We used images in the article space only. Some examples of images without captions:

revision_id='1003691028', image_name='Diplomatic_relations_of_South_Korea.png'

revision_id='1002493008', image_name='Two_girls_examining_a_bulletin_board_posted_on_a_fence._An_advertisement_painted_above_them_asks_%22Are_You_a_Woman%3F%22.jpg'

revision_id='1002493008', image_name='Something_you_wouldn%27t_see_in_America_(cropped).jpg'

revision_id='1003137489', image_name='MiliziaSanMarino.JPG'

revision_id='1002566756', image_name='Sg-map.png'

revision_id='1000778202', image_name='Ceylan-map.png'

revision_id='994548000', image_name='P200084A.jpg'

revision_id='994548000', image_name='P200084B.jpg'

revision_id='994548000', image_name='P200084C.jpg'

revision_id='994548000', image_name='P200084D.jpg'

So I did a quick analysis from a sample of the data in the paper; you can find the results here. In a nutshell, for 33% of non-icon images the caption is empty, and for 50% the caption is three words or fewer.
- One caveat is that this data is from 2021 and was extracted via a wikitext parser. So this analysis should be repeated with fresh data and with an HTML parser that can better detect links and all forms of captions.
- Additionally, this data includes many infobox and gallery images, and we should probably agree on whether they should be included as images "needing captions".
- Finally, this data is for English Wikipedia only. We might find that these numbers vary across languages, but we haven't done this analysis yet.
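To make the two caption figures above concrete, here is a minimal sketch of how such shares could be computed from a list of image records. This is not the analysis code itself: the `ImageRecord` type, the `caption_stats` function and the toy captions are assumptions for illustration, and it assumes that the "three words or fewer" share also counts empty captions.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """Hypothetical record shape: one image occurrence in an article revision."""
    revision_id: str
    image_name: str
    caption: str  # empty string when no caption was detected


def caption_stats(records, short_threshold=3):
    """Return (share with empty caption, share with caption of <= short_threshold words).

    Empty captions have zero words, so they are also counted as short.
    """
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    empty = sum(1 for r in records if not r.caption.strip())
    short = sum(1 for r in records if len(r.caption.split()) <= short_threshold)
    return empty / total, short / total


# Toy sample: the first two ids and names are from the caption-less examples listed
# above; the last two records and their captions are invented for illustration.
sample = [
    ImageRecord("1003691028", "Diplomatic_relations_of_South_Korea.png", ""),
    ImageRecord("1002566756", "Sg-map.png", ""),
    ImageRecord("0", "hypothetical_a.jpg", "Map of Singapore"),
    ImageRecord("0", "hypothetical_b.jpg", "A view of the harbour at sunset in 2019"),
]
empty_share, short_share = caption_stats(sample)
print(f"empty: {empty_share:.0%}, short: {short_share:.0%}")
```

Whether infobox and gallery images belong in `records` at all is exactly the open question raised in the list above, so any real run would need that filter decided first.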

So given that this feature is about alt text, and that we have reliable statistics on it, I suggest we focus on those and revisit the caption numbers if/when needed. Miriam (WMF) (talk) 15:18, 13 March 2024 (UTC)