Wikimedia Research/Showcase/Archive/2020/12
December 2020
- Theme
- Disinformation and reliability of sources in Wikipedia
December 16, 2020 Video: YouTube
- Quality assessment of Wikipedia and its sources
- By Włodzimierz Lewoniewski (Poznań University of Economics and Business, Poland)
- Information in Wikipedia can be edited in over 300 languages independently, so the same subject is often described differently depending on the language edition. To compare information between editions, one usually needs to understand each of the languages considered. We work on solutions that can help automate this process, leveraging machine learning and artificial intelligence algorithms. The crucial component, however, is the assessment of article quality, so we need to know how to define and extract different quality measures. This presentation briefly introduces some recent activities of the Department of Information Systems at Poznań University of Economics and Business related to quality assessment of multilingual content in Wikipedia. In particular, we demonstrate some approaches to the reliability assessment of sources in Wikipedia articles. Such solutions can help enrich the various language editions of Wikipedia, and other knowledge bases, with information of better quality. (A minimal sketch of extracting simple quality measures follows the reference list below.)
- Modeling Popularity and Reliability of Sources in Multilingual Wikipedia, https://doi.org/10.3390/info11050263
- Multilingual Ranking of Wikipedia Articles with Quality and Popularity Assessment in Different Topics, https://doi.org/10.3390/computers8030060
- Measures for Quality Assessment of Articles and Infoboxes in Multilingual Wikipedia, https://doi.org/10.1007/978-3-030-04849-5_53
- Slides on figshare
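As a concrete illustration of what "quality measures" can look like, here is a minimal Python sketch that fetches an article's wikitext through the MediaWiki API and computes a few simple, language-independent measures (text length, reference count, section count, image count, external links). The choice of measures and the helper names (`fetch_wikitext`, `quality_measures`) are illustrative assumptions for this page, not the feature set used in the papers above.

```python
# Minimal sketch: simple, language-independent quality measures for a
# Wikipedia article. Measures and helper names are illustrative assumptions,
# not the actual feature set of the cited papers.
import re
import requests

API = "https://{lang}.wikipedia.org/w/api.php"

def fetch_wikitext(title: str, lang: str = "en") -> str:
    """Fetch the raw wikitext of an article via the MediaWiki parse API."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": 2,
    }
    resp = requests.get(API.format(lang=lang), params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()["parse"]["wikitext"]

def quality_measures(wikitext: str) -> dict:
    """Compute a handful of simple quality measures from wikitext."""
    return {
        "length_chars": len(wikitext),
        "references": len(re.findall(r"<ref[ >]", wikitext)),
        "sections": len(re.findall(r"^==+[^=]+==+\s*$", wikitext, re.M)),
        "images": len(re.findall(r"\[\[(?:File|Image):", wikitext)),
        "external_links": len(re.findall(r"https?://", wikitext)),
    }

if __name__ == "__main__":
    # Compare the same subject across two language editions.
    for lang, title in [("en", "Poznań"), ("pl", "Poznań")]:
        print(lang, quality_measures(fetch_wikitext(title, lang)))
```

Comparing the same title across language editions with such measures is the kind of signal the multilingual ranking work builds on; the actual models combine many more features than this sketch shows.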
- Challenges on fighting Disinformation in Wikipedia
- Who has the (ground-)truth?
- By Diego Saez-Trumper (Research, Wikimedia Foundation)
- Unlike the major social media websites, where the fight against disinformation mainly means preventing users from massively replicating fake content, fighting disinformation on Wikipedia requires tools that allow editors to apply the content policies of verifiability, no original research, and neutral point of view. Moreover, while other platforms try to apply automatic fact-checking techniques to verify content, the ground truth for such verification is typically drawn from Wikipedia itself; for obvious reasons, we can't follow the same pipeline for fact-checking content on Wikipedia. In this talk we explain the ML approach we are developing to build tools that efficiently support Wikipedians in discovering suspicious content, and how we collaborate with external researchers on this task. We also describe a group of datasets we are preparing to share with the research community in order to produce state-of-the-art algorithms for improving the verifiability of content on Wikipedia. (A heuristic sketch of surfacing unsourced content follows the reference below.)
- Online Disinformation and the Role of Wikipedia, https://arxiv.org/abs/1910.12596
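To make the verifiability idea concrete, here is a deliberately simple heuristic sketch in Python that flags long body paragraphs containing no `<ref>` citation. This is an illustration of the kind of signal such tooling surfaces, not the Wikimedia Foundation's actual ML approach, which is learned from data rather than rule-based; the length threshold and the helper name (`unsourced_paragraphs`) are invented for this example.

```python
# Heuristic sketch: flag paragraphs long enough to need a source but citing
# none. An illustrative assumption, not the Foundation's actual ML model.
import re

def unsourced_paragraphs(wikitext: str, min_chars: int = 200):
    """Yield body paragraphs above min_chars that contain no <ref> tag."""
    for para in re.split(r"\n\s*\n", wikitext):
        para = para.strip()
        if len(para) < min_chars:
            continue  # skip short fragments and captions
        if para.startswith(("{{", "[[", "=", "*", "#", "|")):
            continue  # skip templates, headings, lists, tables
        if "<ref" not in para:
            yield para

if __name__ == "__main__":
    sample = (
        "A well sourced claim about the topic.<ref>Some source</ref>\n\n"
        "A long unsourced paragraph making several factual claims that an "
        "editor would likely want to verify before trusting, because no "
        "citation supports any of the statements it contains, which is "
        "exactly the situation verifiability tooling tries to surface."
    )
    for para in unsourced_paragraphs(sample):
        print("SUSPICIOUS:", para[:80], "...")
```

A rule like this only approximates the problem; the talk describes learned models and shared datasets intended to do much better than such hand-written heuristics.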