User:TJones (WMF)/Notes/Fallback Languages

From mediawiki.org

October 2016 — See TJones_(WMF)/Notes for other projects. (related to T146358 and T147959)

Background

For some languages, if we don't have a language analyzer, we can specify that we should fall back to a different language analyzer, if available. For example, there is no Ukrainian analyzer, so we fall back to the Russian analyzer. Obviously this is far from perfect, but it's better than nothing. It's also good to keep these in mind when making changes to an analyzer that is a default for other languages.
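The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual search configuration code: the function name `pick_analyzer`, the `FALLBACKS` excerpt, and the `ANALYZERS` set are all hypothetical, and the set of languages with analyzers is made up for the example.

```python
from typing import Dict, List, Optional, Set

# Hypothetical excerpt of the fallback table (language -> ordered fallbacks).
FALLBACKS: Dict[str, List[str]] = {
    "uk": ["ru"],
    "rue": ["uk", "ru"],
}

# Assumed for illustration: languages that have their own analyzer.
ANALYZERS: Set[str] = {"ru", "fr", "es"}


def pick_analyzer(lang: str,
                  fallbacks: Dict[str, List[str]] = FALLBACKS,
                  analyzers: Set[str] = ANALYZERS) -> Optional[str]:
    """Return lang if it has an analyzer, else the first listed fallback that does."""
    if lang in analyzers:
        return lang
    for candidate in fallbacks.get(lang, []):
        if candidate in analyzers:
            return candidate
    return None  # no analyzer and no usable fallback


print(pick_analyzer("uk"))   # Ukrainian has no analyzer here, so Russian is used
print(pick_analyzer("rue"))  # "uk" is skipped (no analyzer), "ru" is used
```

Trying the fallbacks in listed order keeps the choice predictable when a language names more than one.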

In the Ukrainian/Russian case, this seems to mean that changes made to Russian-language wikis will happen on Ukrainian-language wikis, and changes necessary for Ukrainian-language wikis will have to be made for Russian-language wikis, too, unless we special-case them by language (which is doable, but hacky). That complicates matters.

I've done my best to expand the language and orthography codes (in parens) for ease of reading and searching the page, but if their expansion disagrees with the code, the code is correct.

This list of codes was extracted from the $fallback variables in the source files mediawiki/languages/messages/Messages<Xxx>.php. (Turns out that this fact is documented in Manual:Language; who knew‽) They were extracted on October 7, 2016, and have been getting stale since then. Nemo kindly points out that there's a very handy up-to-date list at Localisation statistics. (Thanks!)
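For anyone wanting to redo the extraction, here is a rough sketch of pulling a fallback list out of one message file's text. It assumes the variable is written as a single quoted, comma-separated string (for example `$fallback = 'zh-hant, zh-hans';`). That format, the `parse_fallback` helper, and the sample text are assumptions for illustration, not a copy of any real file.

```python
import re
from typing import List

# Matches a line like:  $fallback = 'zh-hant, zh-hans';
FALLBACK_RE = re.compile(
    r"""^\s*\$fallback\s*=\s*(['"])(.*?)\1\s*;""",
    re.MULTILINE,
)


def parse_fallback(php_source: str) -> List[str]:
    """Return the ordered list of fallback codes, or [] if none is declared."""
    match = FALLBACK_RE.search(php_source)
    if match is None:
        return []
    return [code.strip() for code in match.group(2).split(",") if code.strip()]


# Hypothetical file contents, for illustration only.
sample = """<?php
$fallback = 'zh-hant, zh-hans';
"""

print(parse_fallback(sample))
```

Running this over every Messages<Xxx>.php file and keying the results by language code would give a mapping with the same shape as the tables on this page.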

Update: After looking into this some more, I now realize that these fallbacks (which are defined in Messages<Xxx>.php and used in the jQuery Internationalization library) have an obvious geographical and historical basis, which is not necessarily linguistic at all. I'd picked out three representative examples that made no linguistic sense: Guaraní / Spanish, Wolof / French, and Chechen / Russian. For each X / Y pair, it does make sense that a speaker of X is more likely to know Y than most other world languages, so Y is a reasonable fallback for messages, banners, etc., when no native option is available. However, the list should not have been extended to matters of language processing. My favorite current example of nonsensical language processing is searching for lorsqu'ele on the Wolof Wikipedia. If you know a little French and a few secrets about the French analysis chain, it's no surprise that it returns matches on element, but I don't think that's the expected result in Wolof (though Wolof speakers are more likely to know some French, and thus may not be entirely surprised). We're going to try to address this issue over time. See T147959.
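To make the lorsqu'ele example concrete, here is a toy sketch of how a French-style chain could map that query and the word element to the same token. The real filters are not spelled out on this page, so everything here is an assumption: the elision prefix list is modeled on typical French elision lists, and `toy_stem` is a made-up stand-in for a real stemmer, not the actual algorithm.

```python
import re

# Assumed elision prefixes: a leading article or conjunction plus an apostrophe.
ELISION_RE = re.compile(
    r"^(?:lorsqu|puisqu|quoiqu|jusqu|qu|[lmtnsjdc])['\u2019]",
    re.IGNORECASE,
)


def strip_elision(token: str) -> str:
    """Drop a leading elided prefix such as l' or lorsqu'."""
    return ELISION_RE.sub("", token)


def toy_stem(token: str) -> str:
    """Hypothetical stand-in for a stemmer: drop a trailing 'ment'."""
    if token.endswith("ment") and len(token) > 4:
        return token[:-4]
    return token


def analyze(token: str) -> str:
    """Lowercase, strip elision, then apply the toy stemmer."""
    return toy_stem(strip_elision(token.lower()))


print(analyze("lorsqu'ele"))  # the elided prefix is removed
print(analyze("element"))     # the toy stemmer strips the suffix
```

Under these assumptions both inputs reduce to the same token, which is why a query that is nonsense in Wolof can still match ordinary words when the wiki inherits the French chain.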

Graphical Representation

This may be even more outdated, but it is useful nonetheless to get a sense of the Big Picture: MediaWiki fallback chains.

Sorted by Original Language Code

If this language isn't found… … try to use one of these languages
ab (Abkhazian) ru (Russian)
ace (Acehnese) id (Indonesian)
ady (Adyghe) ady-cyrl (Adyghe-Cyrillic)
aeb (Tunisian Arabic) aeb-arab (Tunisian Arabic-Arabic)
aeb-arab (Tunisian Arabic-Arabic) ar (Arabic)
aln (Gheg Albanian) sq (Albanian)
an (Aragonese) es (Spanish)
anp (Angika) hi (Hindi)
arn (Mapudungun) es (Spanish)
arq (Algerian Arabic) ar (Arabic)
arz (Egyptian Arabic) ar (Arabic)
ast (Asturian) es (Spanish)
av (Avar) ru (Russian)
ay (Aymara) es (Spanish)
azb (Southern Azerbaijani) fa (Persian)
ba (Bashkir) ru (Russian)
ban (Balinese) id (Indonesian)
bar (Bavarian) de (German)
bbc (Batak Toba) bbc-latn (Batak Toba-Latin)
bbc-latn (Batak Toba-Latin) id (Indonesian)
bcc (Southern Balochi) fa (Persian)
be-tarask (Belarusian-Taraškievica) be (Belarusian)
bgn (Western Balochi) fa (Persian)
bh (Bihari) bho (Bhojpuri)
bjn (Banjar) id (Indonesian)
bm (Bambara) fr (French)
bpy (Bishnupriya Manipuri) bn (Bengali)
bqi (Bakhtiari) fa (Persian)
bug (Buginese) id (Indonesian)
bxr (Buryat) ru (Russian)
cbk-zam (Chavacano) es (Spanish)
cdo (Min Dong) nan (Min Nan Chinese), zh-hant (Chinese-Traditional)
ce (Chechen) ru (Russian)
co (Corsican) it (Italian)
crh (Crimean Tatar) crh-latn (Crimean Tatar-Latin)
crh-cyrl (Crimean Tatar-Cyrillic) ru (Russian)
cs (Czech) sk (Slovak)
csb (Kashubian) pl (Polish)
cv (Chuvash) ru (Russian)
de-at (German-at) de (German)
de-ch (German-ch) de (German)
de-formal (German-formal) de (German)
dsb (Lower Sorbian) de (German)
dtp (Kadazan Dusun) ms (Malay)
dty (Dotyali) ne (Nepali)
egl (Emilian) it (Italian)
eml (Emilian-Romagnol) it (Italian)
ext (Extremaduran) es (Spanish)
ff (Fula) fr (French)
fit (Tornedalen Finnish) fi (Finnish)
frc (Cajun French) fr (French)
frp (Franco-Provençal) fr (French)
frr (North Frisian) de (German)
fur (Friulian) it (Italian)
gag (Gagauz) tr (Turkish)
gan (Gan) gan-hant (Gan-ChineseTraditional), zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
gan-hans (Gan-ChineseSimplified) zh-hans (Chinese-Simplified)
gan-hant (Gan-ChineseTraditional) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
gl (Galician) pt (Portuguese)
glk (Gilaki) fa (Persian)
gn (Guarani) es (Spanish)
gom (Goan Konkani) gom-deva (Goan Konkani-Devanagari)
gom-deva (Goan Konkani-Devanagari) hi (Hindi)
gsw (Swiss German) de (German)
hak (Hakka) zh-hant (Chinese-Traditional)
hif (Fiji Hindi) hif-latn (Fiji Hindi-Latin)
hrx (Hunsrik) de (German)
hsb (Upper Sorbian) de (German)
ht (Haitian) fr (French)
ii (Nuosu) zh-cn (Chinese-cn), zh-hans (Chinese-Simplified)
inh (Ingush) ru (Russian)
iu (Inuktitut) ike-cans (Eastern Canadian Inuktitut-CanadianSyllabics)
jut (Jutish) da (Danish)
jv (Javanese) id (Indonesian)
kaa (Karakalpak) kk-latn (Kazakh-Latin), kk-cyrl (Kazakh-Cyrillic)
kbd (Kabardian) kbd-cyrl (Kabardian-Cyrillic)
khw (Khowar) ur (Urdu)
kiu (Kirmanjki (individual language)) tr (Turkish)
kk (Kazakh) kk-cyrl (Kazakh-Cyrillic)
kk-arab (Kazakh-Arabic) kk-cyrl (Kazakh-Cyrillic)
kk-cn (Kazakh-cn) kk-arab (Kazakh-Arabic), kk-cyrl (Kazakh-Cyrillic)
kk-kz (Kazakh-kz) kk-cyrl (Kazakh-Cyrillic)
kk-latn (Kazakh-Latin) kk-cyrl (Kazakh-Cyrillic)
kk-tr (Kazakh-tr) kk-latn (Kazakh-Latin), kk-cyrl (Kazakh-Cyrillic)
kl (Greenlandic) da (Danish)
ko-kp (Korean-kp) ko (Korean)
koi (Komi-Permyak) ru (Russian)
krc (Karachay-Balkar) ru (Russian)
ks (Kashmiri) ks-arab (Kashmiri-Arabic)
ksh (Ripuarian) de (German)
ku (Kurdish) ku-latn (Kurdish-Latin)
ku-arab (Kurdish-Arabic) ckb (Sorani)
kv (Komi) ru (Russian)
lad (Ladino) es (Spanish)
lb (Luxembourgish) de (German)
lbe (Lak) ru (Russian)
lez (Lezgian) ru (Russian)
li (Limburgish) nl (Dutch)
lij (Ligurian) it (Italian)
liv (Liv) et (Estonian)
lki (Laki) fa (Persian)
lmo (Lombard) it (Italian)
ln (Lingala) fr (French)
lrc (Northern Luri) fa (Persian)
ltg (Latgalian) lv (Latvian)
luz (Southern Luri) fa (Persian)
lzh (Literary Chinese) zh-hant (Chinese-Traditional)
lzz (Laz) tr (Turkish)
mai (Maithili) hi (Hindi)
map-bms (Banyumasan) jv (Javanese), id (Indonesian)
mg (Malagasy) fr (French)
mhr (Meadow Mari) ru (Russian)
min (Minangkabau) id (Indonesian)
mo (Moldovan) ro (Romanian)
mrj (Hill Mari) ru (Russian)
mwl (Mirandese) pt (Portuguese)
myv (Erzya) ru (Russian)
mzn (Mazandarani) fa (Persian)
nah (Nahuatl) es (Spanish)
nan (Min Nan Chinese) cdo (Min Dong), zh-hant (Chinese-Traditional)
nap (Neapolitan) it (Italian)
nds (Low Saxon) de (German)
nds-nl (Dutch Low Saxon) nl (Dutch)
nl-informal (Dutch-informal) nl (Dutch)
olo (Livvi) ru (Russian)
os (Ossetian) ru (Russian)
pcd (Picard) fr (French)
pdc (Pennsylvania German) de (German)
pdt (Plautdietsch) de (German)
pfl (Palatinate German) de (German)
pms (Piedmontese) it (Italian)
pt (Portuguese) pt-br (Portuguese-Brazilian)
pt-br (Portuguese-Brazilian) pt (Portuguese)
qu (Quechua) es (Spanish)
qug (Chimborazo Highland Quichua) qu (Quechua), es (Spanish)
rgn (Romagnol) it (Italian)
rmy (Romani) ro (Romanian)
rue (Rusyn) uk (Ukrainian), ru (Russian)
ruq (Megleno Romanian) ruq-latn (Megleno Romanian-Latin), ro (Romanian)
ruq-cyrl (Megleno Romanian-Cyrillic) mk (Macedonian)
ruq-latn (Megleno Romanian-Latin) ro (Romanian)
sa (Sanskrit) hi (Hindi)
sah (Sakha) ru (Russian)
scn (Sicilian) it (Italian)
sdh (Southern Kurdish) fa (Persian)
ses (Koyraboro Senni Songhai) fr (French)
sg (Sango) fr (French)
sgs (Samogitian) lt (Lithuanian)
sk (Slovak) cs (Czech)
sli (Lower Silesian) de (German)
sr (Serbian) sr-ec (Serbian-ec)
srn (Sranan) nl (Dutch)
stq (Saterland Frisian) de (German)
su (Sundanese) id (Indonesian)
szl (Silesian) pl (Polish)
tcy (Tulu) kn (Kannada)
tg (Tajik) tg-cyrl (Tajik-Cyrillic)
tt (Tatar) tt-cyrl (Tatar-Cyrillic), ru (Russian)
tt-cyrl (Tatar-Cyrillic) ru (Russian)
ty (Tahitian) fr (French)
tyv (Tuvan) ru (Russian)
udm (Udmurt) ru (Russian)
ug (Uyghur) ug-arab (Uyghur-Arabic)
uk (Ukrainian) ru (Russian)
vec (Venetian) it (Italian)
vep (Vepsian) et (Estonian)
vls (West Flemish) nl (Dutch)
vmf (Mainfränkisch) de (German)
vot (Votic) fi (Finnish)
vro (Võro) et (Estonian)
wa (Walloon) fr (French)
wo (Wolof) fr (French)
wuu (Wu) zh-hans (Chinese-Simplified)
xal (Kalmyk) ru (Russian)
xmf (Mingrelian) ka (Georgian)
yi (Yiddish) he (Hebrew)
za (Zhuang) zh-hans (Chinese-Simplified)
zea (Zeelandic) nl (Dutch)
zh (Chinese) zh-hans (Chinese-Simplified)
zh-cn (Chinese-cn) zh-hans (Chinese-Simplified)
zh-hant (Chinese-Traditional) zh-hans (Chinese-Simplified)
zh-hk (Chinese-hk) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-mo (Chinese-mo) zh-hk (Chinese-hk), zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-my (Chinese-my) zh-sg (Chinese-sg), zh-hans (Chinese-Simplified)
zh-sg (Chinese-sg) zh-hans (Chinese-Simplified)
zh-tw (Chinese-tw) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
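Some entries in the table point at codes that have their own fallbacks (aeb → aeb-arab → ar), and a few pairs point at each other (cs ↔ sk, pt ↔ pt-br, cdo ↔ nan). Whether any given consumer follows these links transitively is not stated here, so the sketch below only shows what the table implies if one did. The `expand_chain` helper is hypothetical, and the data is a small excerpt of the rows above.

```python
from typing import Dict, List

# Excerpt of the table above (language -> ordered fallbacks).
FALLBACKS: Dict[str, List[str]] = {
    "aeb": ["aeb-arab"],
    "aeb-arab": ["ar"],
    "cs": ["sk"],
    "sk": ["cs"],
    "zh-mo": ["zh-hk", "zh-hant", "zh-hans"],
    "zh-hk": ["zh-hant", "zh-hans"],
    "zh-hant": ["zh-hans"],
}


def expand_chain(lang: str, fallbacks: Dict[str, List[str]]) -> List[str]:
    """Follow fallbacks of fallbacks, keeping first-seen order and skipping repeats."""
    chain: List[str] = []
    seen = {lang}  # never fall back to the starting language itself
    pending = list(fallbacks.get(lang, []))
    while pending:
        code = pending.pop(0)
        if code in seen:
            continue  # guards against loops such as cs <-> sk
        seen.add(code)
        chain.append(code)
        pending.extend(fallbacks.get(code, []))
    return chain


print(expand_chain("aeb", FALLBACKS))
print(expand_chain("cs", FALLBACKS))
print(expand_chain("zh-mo", FALLBACKS))
```

The `seen` set is what keeps mutual fallbacks from looping forever. Explicitly listed fallbacks are always tried before anything they pull in.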

Sorted by Fallback Language Code

If this language isn't found… … try to use one of these languages
ady (Adyghe) ady-cyrl (Adyghe-Cyrillic)
aeb (Tunisian Arabic) aeb-arab (Tunisian Arabic-Arabic)
aeb-arab (Tunisian Arabic-Arabic) ar (Arabic)
arq (Algerian Arabic) ar (Arabic)
arz (Egyptian Arabic) ar (Arabic)
bbc (Batak Toba) bbc-latn (Batak Toba-Latin)
be-tarask (Belarusian-Taraškievica) be (Belarusian)
bh (Bihari) bho (Bhojpuri)
bpy (Bishnupriya Manipuri) bn (Bengali)
nan (Min Nan Chinese) cdo (Min Dong), zh-hant (Chinese-Traditional)
ku-arab (Kurdish-Arabic) ckb (Sorani)
crh (Crimean Tatar) crh-latn (Crimean Tatar-Latin)
sk (Slovak) cs (Czech)
jut (Jutish) da (Danish)
kl (Greenlandic) da (Danish)
bar (Bavarian) de (German)
de-at (German-at) de (German)
de-ch (German-ch) de (German)
de-formal (German-formal) de (German)
dsb (Lower Sorbian) de (German)
frr (North Frisian) de (German)
gsw (Swiss German) de (German)
hrx (Hunsrik) de (German)
hsb (Upper Sorbian) de (German)
ksh (Ripuarian) de (German)
lb (Luxembourgish) de (German)
nds (Low Saxon) de (German)
pdc (Pennsylvania German) de (German)
pdt (Plautdietsch) de (German)
pfl (Palatinate German) de (German)
sli (Lower Silesian) de (German)
stq (Saterland Frisian) de (German)
vmf (Mainfränkisch) de (German)
an (Aragonese) es (Spanish)
arn (Mapudungun) es (Spanish)
ast (Asturian) es (Spanish)
ay (Aymara) es (Spanish)
cbk-zam (Chavacano) es (Spanish)
ext (Extremaduran) es (Spanish)
gn (Guarani) es (Spanish)
lad (Ladino) es (Spanish)
nah (Nahuatl) es (Spanish)
qu (Quechua) es (Spanish)
liv (Liv) et (Estonian)
vep (Vepsian) et (Estonian)
vro (Võro) et (Estonian)
azb (Southern Azerbaijani) fa (Persian)
bcc (Southern Balochi) fa (Persian)
bgn (Western Balochi) fa (Persian)
bqi (Bakhtiari) fa (Persian)
glk (Gilaki) fa (Persian)
lki (Laki) fa (Persian)
lrc (Northern Luri) fa (Persian)
luz (Southern Luri) fa (Persian)
mzn (Mazandarani) fa (Persian)
sdh (Southern Kurdish) fa (Persian)
fit (Tornedalen Finnish) fi (Finnish)
vot (Votic) fi (Finnish)
bm (Bambara) fr (French)
ff (Fula) fr (French)
frc (Cajun French) fr (French)
frp (Franco-Provençal) fr (French)
ht (Haitian) fr (French)
ln (Lingala) fr (French)
mg (Malagasy) fr (French)
pcd (Picard) fr (French)
ses (Koyraboro Senni Songhai) fr (French)
sg (Sango) fr (French)
ty (Tahitian) fr (French)
wa (Walloon) fr (French)
wo (Wolof) fr (French)
gan (Gan) gan-hant (Gan-ChineseTraditional), zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
gom (Goan Konkani) gom-deva (Goan Konkani-Devanagari)
yi (Yiddish) he (Hebrew)
anp (Angika) hi (Hindi)
gom-deva (Goan Konkani-Devanagari) hi (Hindi)
mai (Maithili) hi (Hindi)
sa (Sanskrit) hi (Hindi)
hif (Fiji Hindi) hif-latn (Fiji Hindi-Latin)
ace (Acehnese) id (Indonesian)
ban (Balinese) id (Indonesian)
bbc-latn (Batak Toba-Latin) id (Indonesian)
bjn (Banjar) id (Indonesian)
bug (Buginese) id (Indonesian)
jv (Javanese) id (Indonesian)
min (Minangkabau) id (Indonesian)
su (Sundanese) id (Indonesian)
iu (Inuktitut) ike-cans (Eastern Canadian Inuktitut-CanadianSyllabics)
co (Corsican) it (Italian)
egl (Emilian) it (Italian)
eml (Emilian-Romagnol) it (Italian)
fur (Friulian) it (Italian)
lij (Ligurian) it (Italian)
lmo (Lombard) it (Italian)
nap (Neapolitan) it (Italian)
pms (Piedmontese) it (Italian)
rgn (Romagnol) it (Italian)
scn (Sicilian) it (Italian)
vec (Venetian) it (Italian)
map-bms (Banyumasan) jv (Javanese), id (Indonesian)
xmf (Mingrelian) ka (Georgian)
kbd (Kabardian) kbd-cyrl (Kabardian-Cyrillic)
kk-cn (Kazakh-cn) kk-arab (Kazakh-Arabic), kk-cyrl (Kazakh-Cyrillic)
kk (Kazakh) kk-cyrl (Kazakh-Cyrillic)
kk-arab (Kazakh-Arabic) kk-cyrl (Kazakh-Cyrillic)
kk-kz (Kazakh-kz) kk-cyrl (Kazakh-Cyrillic)
kk-latn (Kazakh-Latin) kk-cyrl (Kazakh-Cyrillic)
kaa (Karakalpak) kk-latn (Kazakh-Latin), kk-cyrl (Kazakh-Cyrillic)
kk-tr (Kazakh-tr) kk-latn (Kazakh-Latin), kk-cyrl (Kazakh-Cyrillic)
tcy (Tulu) kn (Kannada)
ko-kp (Korean-kp) ko (Korean)
ks (Kashmiri) ks-arab (Kashmiri-Arabic)
ku (Kurdish) ku-latn (Kurdish-Latin)
sgs (Samogitian) lt (Lithuanian)
ltg (Latgalian) lv (Latvian)
ruq-cyrl (Megleno Romanian-Cyrillic) mk (Macedonian)
dtp (Kadazan Dusun) ms (Malay)
cdo (Min Dong) nan (Min Nan Chinese), zh-hant (Chinese-Traditional)
dty (Dotyali) ne (Nepali)
li (Limburgish) nl (Dutch)
nds-nl (Dutch Low Saxon) nl (Dutch)
nl-informal (Dutch-informal) nl (Dutch)
srn (Sranan) nl (Dutch)
vls (West Flemish) nl (Dutch)
zea (Zeelandic) nl (Dutch)
csb (Kashubian) pl (Polish)
szl (Silesian) pl (Polish)
gl (Galician) pt (Portuguese)
mwl (Mirandese) pt (Portuguese)
pt-br (Portuguese-Brazilian) pt (Portuguese)
pt (Portuguese) pt-br (Portuguese-Brazilian)
qug (Chimborazo Highland Quichua) qu (Quechua), es (Spanish)
mo (Moldovan) ro (Romanian)
rmy (Romani) ro (Romanian)
ruq-latn (Megleno Romanian-Latin) ro (Romanian)
ab (Abkhazian) ru (Russian)
av (Avar) ru (Russian)
ba (Bashkir) ru (Russian)
bxr (Buryat) ru (Russian)
ce (Chechen) ru (Russian)
crh-cyrl (Crimean Tatar-Cyrillic) ru (Russian)
cv (Chuvash) ru (Russian)
inh (Ingush) ru (Russian)
koi (Komi-Permyak) ru (Russian)
krc (Karachay-Balkar) ru (Russian)
kv (Komi) ru (Russian)
lbe (Lak) ru (Russian)
lez (Lezgian) ru (Russian)
mhr (Meadow Mari) ru (Russian)
mrj (Hill Mari) ru (Russian)
myv (Erzya) ru (Russian)
olo (Livvi) ru (Russian)
os (Ossetian) ru (Russian)
sah (Sakha) ru (Russian)
tt-cyrl (Tatar-Cyrillic) ru (Russian)
tyv (Tuvan) ru (Russian)
udm (Udmurt) ru (Russian)
uk (Ukrainian) ru (Russian)
xal (Kalmyk) ru (Russian)
ruq (Megleno Romanian) ruq-latn (Megleno Romanian-Latin), ro (Romanian)
cs (Czech) sk (Slovak)
aln (Gheg Albanian) sq (Albanian)
sr (Serbian) sr-ec (Serbian-ec)
tg (Tajik) tg-cyrl (Tajik-Cyrillic)
gag (Gagauz) tr (Turkish)
kiu (Kirmanjki (individual language)) tr (Turkish)
lzz (Laz) tr (Turkish)
tt (Tatar) tt-cyrl (Tatar-Cyrillic), ru (Russian)
ug (Uyghur) ug-arab (Uyghur-Arabic)
rue (Rusyn) uk (Ukrainian), ru (Russian)
khw (Khowar) ur (Urdu)
ii (Nuosu) zh-cn (Chinese-cn), zh-hans (Chinese-Simplified)
gan-hans (Gan-ChineseSimplified) zh-hans (Chinese-Simplified)
wuu (Wu) zh-hans (Chinese-Simplified)
za (Zhuang) zh-hans (Chinese-Simplified)
zh (Chinese) zh-hans (Chinese-Simplified)
zh-cn (Chinese-cn) zh-hans (Chinese-Simplified)
zh-hant (Chinese-Traditional) zh-hans (Chinese-Simplified)
zh-sg (Chinese-sg) zh-hans (Chinese-Simplified)
hak (Hakka) zh-hant (Chinese-Traditional)
lzh (Literary Chinese) zh-hant (Chinese-Traditional)
gan-hant (Gan-ChineseTraditional) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-hk (Chinese-hk) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-tw (Chinese-tw) zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-mo (Chinese-mo) zh-hk (Chinese-hk), zh-hant (Chinese-Traditional), zh-hans (Chinese-Simplified)
zh-my (Chinese-my) zh-sg (Chinese-sg), zh-hans (Chinese-Simplified)
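The fallback-sorted view is useful for answering "if I change the analyzer for language Y, which other languages are affected?" A minimal sketch of building that reverse lookup from the forward mapping follows. The `invert_fallbacks` helper is hypothetical, and the data is a small excerpt of the rows above.

```python
from collections import defaultdict
from typing import Dict, List

# Excerpt of the table above (language -> ordered fallbacks).
FALLBACKS: Dict[str, List[str]] = {
    "uk": ["ru"],
    "rue": ["uk", "ru"],
    "tt": ["tt-cyrl", "ru"],
    "tt-cyrl": ["ru"],
}


def invert_fallbacks(fallbacks: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each fallback code to the sorted list of languages that name it."""
    dependents = defaultdict(list)
    for lang, targets in fallbacks.items():
        for target in targets:
            dependents[target].append(lang)
    return {target: sorted(langs) for target, langs in dependents.items()}


by_fallback = invert_fallbacks(FALLBACKS)
print(by_fallback["ru"])  # languages that list Russian anywhere in their fallbacks
print(by_fallback["uk"])
```

Note that this counts a language as a dependent even when the fallback is not its first choice (rue lists uk before ru), which errs on the side of caution when deciding whether a change is safe.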