
Wikimedia Research/Showcase/Archive/2022/06

From mediawiki.org

June 2022

Time
(4:00am PDT / 7:00am EDT / 13:00 CEST)
Theme
Wikipedia's languages.

June 15, 2022
Video: YouTube

Quantifying knowledge synchronisation in the 21st century
By Jisung Yoon (Pohang University of Science and Technology)
Humans acquire and accumulate knowledge through language and eagerly exchange that knowledge in pursuit of advancement. Although geographical barriers previously limited communication, the emergence of information technology has opened new avenues for knowledge exchange. However, it is unclear which communication pathway is dominant in the 21st century. Here, we explore the dominant path of knowledge diffusion in the 21st century using Wikipedia, the largest communal dataset. We evaluate the similarity of shared knowledge between population groups, distinguished by their language usage. Population groups that are more engaged with each other have more similar knowledge structures, where engagement is indicated by socio-economic connections such as cultural, linguistic, and historical features. Moreover, geographical proximity is no longer a critical requirement for knowledge dissemination. Furthermore, we integrate our data into a mechanistic model to better understand the underlying mechanism, and suggest that the knowledge "Silk Road" of the 21st century is based online.

Relevant links: paper (preprint), slides

The Language Geography of Wikipedia
By Martin Dittus
Every language is a system of being, doing, knowing, and imagining. With over 7,000 active languages in the world, how many languages are fully represented online? To answer this question, the digital non-profit Whose Knowledge? initiated the first-ever report on the State of the Internet's Languages. As part of this report, Martin Dittus and Mark Graham investigated the languages of Wikipedia. Wikipedia began with a single English-language edition more than two decades ago and now offers more than 300 language editions, which places it at the forefront of digital language support. However, this does not mean that speakers of these languages have access to the same content: Wikipedia's language editions vary widely in scale. We further find that this inequality is also reflected in Wikipedia's geographic coverage: not all places are covered in every language. Wikipedia's coverage often follows the global distribution of speakers of the respective language. Yet even when we account for the distribution of language populations, certain language communities are much more strongly represented on Wikipedia than others. As a consequence, for many countries in Africa, Central and South America, and South Asia, most of the content about those countries is in a foreign language, often a European colonial language. In other words, in many of these places, people may need to speak a second (possibly foreign) language to access Wikipedia information about their own places. Why do we see these differences? And what can be done to improve things?

Relevant links: The Language Geography of Wikipedia, State of the Internet's Languages Report, slides