Manual:Adding and removing languages

From mediawiki.org

MediaWiki is heavily multilingual and localized. Adding even more languages to it is a frequent activity. Sometimes, languages have to be removed, too. This is done in various contexts, and the procedures can be quite different, both from the technical and the community policies perspectives. This page documents the various language adding procedures.

Some conventions for this document:

  • "qqf" is a generic example language code.

Core MediaWiki

Adding to core

MediaWiki core is usually not the first place to which a language is added; the first places are usually translatewiki.net and language-data.

A new language is usually added to MediaWiki core when the translation of the messages in the "MediaWiki core" group in translatewiki.net reaches the export threshold (as of February 2023, it's 13%; see :translatewiki:Translating:MediaWiki and the "mediawiki" section in repoconfig.yaml). When this happens, the translations are automatically exported to languages/i18n/qqf.json, a bot adds the new language to Translating:MediaWiki/New languages, and one of the translatewiki administrators creates a Phabricator task to add the new language (example: T294729).
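The threshold check itself is plain arithmetic. The following is a minimal sketch, assuming you already have the number of translated messages and the total number of messages in the "MediaWiki core" group. The function name is hypothetical, and the 13% default is the February 2023 value quoted above, so check repoconfig.yaml for the current one:

```python
# Hypothetical helper, not part of translatewiki.net or MediaWiki.
# 13 is the February 2023 value; repoconfig.yaml is authoritative.
EXPORT_THRESHOLD_PERCENT = 13


def reaches_export_threshold(translated: int, total: int,
                             threshold_percent: int = EXPORT_THRESHOLD_PERCENT) -> bool:
    """Return True if translated/total is at least threshold_percent percent."""
    if total <= 0:
        raise ValueError("total must be a positive message count")
    # Integer arithmetic avoids floating-point rounding exactly at the boundary.
    return translated * 100 >= threshold_percent * total
```

For example, 400 translated messages out of 3,000 is about 13.3%, so reaches_export_threshold(400, 3000) returns True.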

Boilerplate for a checklist for a new Phabricator task:

[ ] Verify autonym correctness in language-data
[ ] Add to Wikifunctions's language list in function-schemata
[ ] Add to `includes/languages/data/Names.php`
[ ] Add to RELEASE-NOTES
[ ] Set fallback if needed
-- [ ] In core (`languages/messages/Messages*.php`)
-- [ ] In jquery.i18n (`src/jquery.i18n.fallbacks.js`)
-- [ ] Remove fallback gender aliases if needed
[ ] Set rtl if needed
-- [ ] In core (`languages/messages/Messages*.php`)
-- [ ] In MobileFrontend (`src/mobile.languages.structured/rtlLanguages.js`)
[ ] Add namespace translations
-- [ ] Add gender aliases if needed
-- [ ] Check that namespace translations don't conflict with language codes
[ ] Add $linkTrail
[ ] Add date formats (if needed)
[ ] Add digit conversion (if needed)
[ ] Define wider line-height in `resources/src/mediawiki.skinning/i18n-headings.less` (if needed)
[ ] Remove the language from Wikibase and WikibaseLexeme (if needed)
[ ] Remove from translatewiki after deploying the core change
-- [ ] `mw-config/LanguageSettings.php`
-- [ ] `mw-config/FallbackSettings.php`
[ ] Test language search in UniversalLanguageSelector and fix the search index (if needed)

Preparation steps:

  1. Determine the ISO 639 code. No ISO 639 code—no adding! (At this stage, it should have already been checked when adding to language-data and translatewiki, but you still need to double-check it, because Names.php is one of the most notable and stable locations for language configuration in MediaWiki.)
  2. Determine the autonym. By this point, this should have already been done when the language was added to translatewiki or language-data, but it doesn't hurt to double-check. Check the patches that added the languages to those other places, and verify the sources for the autonym.
  3. Do basic quality control on the translations: No one can know all the languages, but it's possible for anyone with MediaWiki experience to check simple things:
    1. Messages must be actually translated and not just copied from English.
    2. Magic words, links, and other wikitext syntax must be used correctly.
  4. Decide whether the language needs a fallback language. English is the default final fallback language and doesn't need an explicit definition. Don't guess this; ask native speakers whether it's better for most people who speak this language to see untranslated text in English or in some other language. Remember that one language may be spoken in several countries.
  5. Optional, but highly recommended: Determine all the characters that are necessary for writing this language's words. This is necessary for defining the linktrail. The Wikipedia article about the language is often a good source for the alphabet, but double-check it with reliable external sources. If the language is written in the plain 26-letter ASCII Latin alphabet without any diacritics or special characters, then an explicit linktrail is not necessary.
    • Note: Languages that are written in scripts of East Asia (Chinese, Japanese, Korean) or Southeast Asia (such as Thai, Burmese, Javanese, etc.) probably don't need a linktrail. If in doubt, ask a speaker.
  6. Optional, but highly recommended: Ask people who know this language well (for example, trustworthy translatewiki translators or Incubator contributors) to give you translations for the namespace names. Useful documentation about this for translators can be found on the page Translating:MediaWiki#Translating namespace names. The list is short, but give the translators some time to think about it: it's a bit difficult to change it later, so it's important to get it right.
    • Note: Check that none of the namespaces are the same as language codes! This creates ambiguity with interlanguage links in wikitext.
  7. Optional: Get date formats for the language.
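Two of the simple checks above can be partly automated. The following is a hypothetical sketch (the function names are not part of MediaWiki), assuming the translations, the English source messages, the proposed namespace names (keyed by namespace number), and the known language codes have already been loaded into plain Python dictionaries and sets. An identical string is only a candidate for review, because short messages (for example, "OK") may legitimately be the same as in English.

```python
def untranslated_copies(translations: dict[str, str],
                        english: dict[str, str]) -> list[str]:
    """Return message keys whose text is identical to the English source."""
    return sorted(
        key for key, text in translations.items()
        if key in english and text.strip() == english[key].strip()
    )


def namespace_code_conflicts(namespace_names: dict[int, str],
                             language_codes: set[str]) -> dict[int, str]:
    """Return namespaces whose name matches a language code (ignoring case).

    A namespace called, say, "Es" would be ambiguous with interlanguage
    links such as [[es:Page]].
    """
    codes = {c.lower() for c in language_codes}
    return {ns: name for ns, name in namespace_names.items()
            if name.lower() in codes}
```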

If the language has a fallback language, update it in the jquery.i18n library: Make a patch in the src/jquery.i18n.fallbacks.js file in the wikimedia/jquery.i18n GitHub library. The file is self-explanatory. Don't worry about deploying it; the library will be autoupdated later.
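Conceptually, the languages in a fallback chain are tried in order until a message is found, and English is always the last resort. A minimal sketch, assuming the configured fallbacks are given as an ordered list per language code; the names are hypothetical, and the sketch only uses the list configured for the language itself, without following a fallback language's own fallbacks:

```python
from typing import Optional


def fallback_chain(code: str, fallbacks: dict[str, list[str]]) -> list[str]:
    """Return the order in which languages are tried for `code`."""
    chain = [code]
    # English is appended as the implicit final fallback.
    for lang in fallbacks.get(code, []) + ["en"]:
        if lang not in chain:
            chain.append(lang)
    return chain


def resolve_message(key: str, code: str, fallbacks: dict[str, list[str]],
                    messages: dict[str, dict[str, str]]) -> Optional[str]:
    """Return the first available translation of `key` along the chain."""
    for lang in fallback_chain(code, fallbacks):
        text = messages.get(lang, {}).get(key)
        if text is not None:
            return text
    return None
```

For example, with {"qqf": ["fr"]} as the configuration, the chain for "qqf" is ["qqf", "fr", "en"].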

Make a Gerrit patch in core:

  1. Add an entry about adding this language to the "Languages updated" section in the newest version's RELEASE-NOTES.
  2. Edit includes/languages/data/Names.php. Add an entry for the language. Copy the autonym from translatewiki or language-data (they should be the same; if they aren't, something may be wrong somewhere). The name doesn't have to begin with a capital letter; English requires it, but most languages don't. Put the language's English name in a comment.
  3. If the language is written in a writing system that requires it, define wider line-height. This is mostly needed for South Asian and South East Asian writing systems, such as Devanagari, Bengali, Thai, etc. This is done in the file resources/src/mediawiki.skinning/i18n-headings.less.
  4. If you have any information to add there, create the file languages/messages/MessagesQqf.php. If you don't have any information to put there, skip this step. You can copy the boilerplate from a file for a similar language, but make sure to change all the necessary parts. Details:
    • Add fallback in the beginning, if necessary.
    • Add the $linkTrail in the end, if you have it.
    • If the language is written from right to left, add $rtl = true;.
    • Add namespace names in the variable $namespaceNames, if you have them. Make sure to use underscores instead of spaces in the strings.
    • If necessary, add gender aliases for them in the variable $namespaceGenderAliases.
    • If the new language uses a fallback that has gender aliases, such as French, Russian, Spanish, or Portuguese, but the new language itself doesn't need them, reset them by adding $namespaceGenderAliases = [];.
    • Add date formats in the variables $datePreferences, $defaultDateFormat, and $dateFormats, if you have them.
    • If the language needs different numerals, add them in the variable $digitTransformTable. Examples can be found in Persian (fa), N'Ko (nqo), and Burmese (my). Don't guess this; consult native speakers and ask whether they actually use them. Some languages have traditional native numerals defined in the Unicode block for their writing system, but in practice they use Arabic numerals or some other system.
    • Add magic words and special page aliases if you have them.
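As an illustration of the $linkTrail format, the following hypothetical sketch builds a pattern string from a list of letters, and applies it with Python's re module to show which characters after a link would be absorbed into the link text. It assumes the '/^([...]+)(.*)$/sDu' shape that many existing Messages*.php files use, and lists lowercase letters only; compare the result with the file of a similar language before copying it. It keeps only characters that Python considers alphabetic, so scripts that rely on combining marks need manual review.

```python
import re

BASIC_LATIN = set("abcdefghijklmnopqrstuvwxyz")


def linktrail_pattern(alphabet: str) -> str:
    """Build a $linkTrail-style pattern string (lowercase letters only)."""
    letters = {ch.lower() for ch in alphabet if ch.isalpha()}
    if BASIC_LATIN <= letters:
        # List the extra letters explicitly and use a range for a-z.
        char_class = "".join(sorted(letters - BASIC_LATIN)) + "a-z"
    else:
        char_class = "".join(sorted(letters))
    return "/^([" + char_class + "]+)(.*)$/sDu"


def split_link_trail(pattern: str, text_after_link: str) -> tuple[str, str]:
    """Return (absorbed trail, remaining text) for text that follows a link."""
    # Strip the PHP delimiters and modifiers; re.DOTALL stands in for "s".
    body = pattern[1:pattern.rindex("/")]
    match = re.match(body, text_after_link, re.DOTALL)
    if match is None:
        return "", text_after_link
    return match.group(1), match.group(2)
```

For a German-like alphabet, linktrail_pattern("abcdefghijklmnopqrstuvwxyzäöüß") returns '/^([ßäöüa-z]+)(.*)$/sDu', and split_link_trail with that pattern and "s are" returns ("s", " are"), so [[Link]]s would render as a single link.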

If the language is written from right to left, make a Gerrit patch in extensions/MobileFrontend (necessary because of task T342447):

  1. Add the language code to the file src/mobile.languages.structured/rtlLanguages.js.
  2. Run npm install and npm run build in the repo's root directory. This will generate some files based on your modifications.

After this is done and deployed:

  1. Remove the language from translatewiki.net configuration.
  2. Test whether you can change your interface to this language using the "Internationalisation" section in Preferences, and using Universal Language Selector.
  3. Check whether there are any special definitions for this language in Wikibase and Wikidata, and remove them. When the language is in Names.php, it's fully supported in Wikibase (and Wikidata), too.

Updating in Names.php

Boilerplate for a checklist for a new Phabricator task:

A user requested to change the autonym of the ______ language (qqf).

Current name: ______

Requested name: ______

Discussion links:
* ...
* ...

[ ] check the history of the name in MediaWiki files
[ ] verify that the change is correct, necessary, and there is consensus for it
[ ] change in translatewiki.net configuration (if needed; for example, if it has variants that still aren't in Names.php)
[ ] change in language-data
[ ] update in jquery.uls
[ ] update in ULS extension
-- [ ] add an alias with the old name to ULS search index (if needed)
-- [ ] run the script to auto-update the ULS search index
[ ] change in jquery.ime (if necessary)
[ ] change in Names.php
[ ] change in the top comment in MessagesQqf.php
[ ] mention the update in core RELEASE-NOTES
[ ] verify using https://codesearch.wmcloud.org that the old name is not used anywhere
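The last checklist item uses codesearch, which covers the Wikimedia repositories. For a local checkout, a rough equivalent is a plain text scan. The following is a hypothetical sketch; the file extensions and the skipped directories are assumptions to adjust for the repository at hand:

```python
import os


def find_old_name(root: str, old_name: str,
                  extensions=(".php", ".json", ".js", ".yaml", ".md")) -> list[tuple[str, int]]:
    """Return sorted (path, line number) pairs where old_name occurs."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata and installed dependencies.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules", "vendor"}]
        for filename in filenames:
            if not filename.endswith(extensions):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if old_name in line:
                        hits.append((path, lineno))
    return sorted(hits)
```

An empty result means that the old name wasn't found in the scanned files; it doesn't replace the codesearch check across all repositories.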

It's possible to change a name in core if you found a better autonym. Make sure that relevant stakeholders who know the language in translatewiki and some wikis are aware of the change. Do not change it without a consensus.

The procedure is:

  • Make sure that a Phabricator task about it is filed. The task must explain the motivation for the change and give references supporting the better autonym.
  • Mention it in the newest version's RELEASE-NOTES.
  • Edit the file Names.php and change the autonym on the language's line.
  • Edit the file languages/messages/MessagesQqf.php and update the autonym in the top comment.

In addition, you should probably update the following:

  • language-data: update the name, and make sure to update jquery.uls and the UniversalLanguageSelector extension.
  • jquery.ime, if it's there.
  • After the above, you likely need to update the language search index in the UniversalLanguageSelector extension, so that the language is findable by the old name.
  • Update the name in WikiLambda.

Special scenarios for core

Some languages may have a MessagesQqf.php file, but no entry in Names.php. This is useful, for example, for right-to-left languages that don't have full localization support yet. There are two cases when this can happen:

  1. Historical languages, which aren't usable for new modern content or software localization, but are usable for lexicography or storing historical texts, for example Ottoman Turkish or Jewish Babylonian Aramaic. This is necessary for proper RTL support in Wikidata, for example.
  2. Modern languages that may have localization and content, but haven't yet crossed the localization threshold. This is necessary for proper RTL support in Wikidata, as well as Wikimedia Incubator, translatewiki.net and other places.

In such cases, add a MessagesQqf.php file with $rtl = true; and a clarifying comment of why this is only partial configuration. (This comment should be removed when the language becomes fully supported.)

TODO: Any other special scenarios?

Removing from core

TODO

CLDR

Language names

The CLDR extension of MediaWiki holds a copy of the official Unicode CLDR Project. It is used to display language names and other information about languages, in a lot of languages. Examples:

  • {{#language:fr|de}} = The language name of "fr" (French) in "de" (German) = Französisch
  • {{#language:ko|en}} = The language name of "ko" (Korean) in "en" (English) = Korean
  • {{#language:en|ar}} = The language name of "en" (English) in "ar" (Arabic) = الإنجليزية

It is used by the Babel, Wikibase and WikibaseLexeme extensions too (test page).

Therefore, if the new language code is not part of CLDR, only the autonym will be displayed instead of the language's name in the requested language or in the user's language.

To override this locally:

  1. Check whether the language code is in CldrMain/CldrMainEn.php and is correct:
    1. Yes: Go to the next section
    2. No: check if the language code is in LocalNames/LocalNamesEn.php. This is a MediaWiki-specific file for languages that are not part of Unicode CLDR or are wrongly translated.
      1. Yes: Go to the next section.
      2. No: Write a patch which adds the new language code to LocalNamesEn.php. Translations into other languages are welcome. Example commit: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/cldr/+/846592
    3. If the official Unicode CLDR translation is wrong, it's a bit more complex: file a ticket at Unicode CLDR, and write a patch that overrides the name for that language code in LocalNamesEn.php (and, if needed, in the LocalNames files for other languages). Sadly, fixing a translation in the official Unicode CLDR is a very slow process.

A name in English should always be present because English is the ultimate fallback for all languages, and this ensures that some name will appear. You may also add names in other languages to appropriate LocalNames*.php files, and they will be shown to people who use MediaWiki in those languages.
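The lookup order described above can be modelled roughly as follows. This is a hypothetical sketch, not the CLDR extension's actual code: plain dictionaries stand in for the LocalNames*.php and CldrMain*.php files, local names are assumed to take precedence (since they exist to fill gaps and correct wrong CLDR translations), English is the last language tried, and the autonym is the final fallback. The real extension may also consult other sources, such as Names.php.

```python
def language_name(code: str, in_lang: str,
                  fallbacks: dict[str, list[str]],
                  local_names: dict[str, dict[str, str]],
                  cldr_names: dict[str, dict[str, str]],
                  autonyms: dict[str, str]) -> str:
    """Return the name of language `code` as displayed in language `in_lang`."""
    chain = [in_lang]
    for lang in fallbacks.get(in_lang, []) + ["en"]:
        if lang not in chain:
            chain.append(lang)
    for lang in chain:
        # Local names are checked before the CLDR data for each language.
        for source in (local_names, cldr_names):
            name = source.get(lang, {}).get(code)
            if name:
                return name
    # Nothing found in any language: show the autonym (or the bare code).
    return autonyms.get(code, code)
```

For example, if "qqf" only has an English entry in the local names, a German-speaking user sees that English name rather than the autonym.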

Plural rules

Plural rules are by default pulled from the CLDR. Some languages that are supported in MediaWiki are not supported in CLDR. Plural rules for those languages can be added in core MediaWiki, in the file languages/data/plurals-mediawiki.xml.

If the language that you are adding uses rules that are identical to the rules of a language that already appears in that file, don't add a new XML element, but add the language code to an existing element.
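To find an existing element with identical rules, you can compare the rule conditions while ignoring the "@integer" and "@decimal" sample lists. A hypothetical sketch, assuming the file uses the CLDR layout of pluralRules elements with a space-separated locales attribute and pluralRule children keyed by count. It only reads the XML (ElementTree would drop the comments if it rewrote the file), so make the actual edit by hand:

```python
import xml.etree.ElementTree as ET


def rule_sets(root: ET.Element) -> list[tuple[list[str], dict[str, str]]]:
    """Return (locales, {count: condition}) for every pluralRules element."""
    result = []
    for rules in root.iter("pluralRules"):
        conditions = {}
        for rule in rules.findall("pluralRule"):
            # Keep only the condition; the samples start at the first "@".
            conditions[rule.get("count")] = (rule.text or "").split("@")[0].strip()
        result.append((rules.get("locales", "").split(), conditions))
    return result


def locales_with_rules(root: ET.Element, wanted: dict[str, str]) -> list[str]:
    """Return the locales of the first element whose conditions equal `wanted`."""
    for locales, conditions in rule_sets(root):
        if conditions == wanted:
            return locales
    return []
```

Usage: load the file with ET.parse("languages/data/plurals-mediawiki.xml").getroot() and pass the new language's conditions; an empty list means that no element in that file has the same rules.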

If the language that you are adding uses rules that are identical to the rules of a language that appears in the standard CLDR, copy that language's section from the file languages/data/plurals.xml and update the language code.

Always add a comment that explains the presence of that language and links to a relevant Phabricator task.

Add a test for the new rules in tests/phpunit/includes/languages/LanguageQqfTest.php.

jquery.ime

jquery.ime is a library for typing in various languages and alphabets. It is embedded into websites, and the users don't have to install anything or change their settings. The only condition is using a modern browser with JavaScript enabled.

The detailed instructions for adding an input method appear in the README files in the root directory of jquery.ime and in the rules/ directory.

In MediaWiki, jquery.ime is integrated into the UniversalLanguageSelector extension, but because jquery.ime is designed to be independently usable without MediaWiki, it has its own list of languages. Therefore, when adding a new keyboard layout for a language that is not supported yet, the language must be added to that list:

  1. Determine the ISO 639 code and the autonym.
  2. If it's not there already, add the language to Universal Language Selector (language-data, jquery.uls, ULS extension).
  3. In the jquery.ime GitHub repository, add the input method's code (see the README in the repository). After that, edit src/jquery.ime.inputmethods.js. In the $.extend( $.ime.languages, { section, add a new block, ordered by the ISO 639 code. In it, write the autonym in the autonym field and list the input method identifiers in the inputmethods field.

To test, run a local webserver, open http://localhost/examples, and try selecting the language and using your new input method.

To get this deployed to Wikimedia wikis and other sites that run MediaWiki, it has to be updated in the UniversalLanguageSelector extension. To do this, run scripts/update-jquery-ime.sh in the extension directory, and submit a patch to Gerrit.

It's important to also write documentation for the keyboard as a subpage of Help:Extension:UniversalLanguageSelector/Input methods.

jquery.uls

jquery.uls is the generic MediaWiki-independent library that provides the core functionality of the Universal Language Selector extension. To add a language to jquery.uls:

  1. Add the language to language-data.
  2. After that, make a GitHub patch in the wikimedia/jquery.uls repository. In the repo's root folder, run the script ./scripts/fetch-language-data.sh.
  3. Check the diff. In the commit message, describe the changes (using the English name of the language), and give a link to the latest language-data commit on GitHub.
    Example commit: https://github.com/wikimedia/jquery.uls/pull/422
  4. Update Universal Language Selector.

language-data

language-data is a library with a list of language codes. Its primary purpose is to make the language selectable in Universal Language Selector, although it may also be used in other contexts. As of late 2022, it's stored in the wikimedia/language-data repository.

It's one of the most "liberal" places for adding a language. Since the generic Universal Language Selector is used for various purposes in MediaWiki, and even outside the MediaWiki and Wikimedia world, it's not as strictly limited to new languages as, for example, the Wikimedia Language proposal policy for new wikis, and it may include ancient and constructed languages or variants that can be generally useful.

Despite being relatively liberal, it still strictly requires that only languages with a valid ISO 639 code are added. No ISO 639 code—no adding!

To add a language, determine the autonym and make a GitHub patch in the wikimedia/language-data repository. Before doing it for the first time, make sure that you are familiar with that repository's README and with the other documentation to which it links.

  1. Update your copy of the language-data repository.
  2. Run the script php src/util/ulsdata2json.php and view the diff. If it creates any changes, make a patch. In the commit message, say that the change is automatic, but describe the changes.
    • The ulsdata2json.php script automatically downloads a file with data about languages from the CLDR server. Usually, these automatic changes are adding or removing languages used in countries.
  3. When the ulsdata2json.php script doesn't create any changes, edit the file data/langdb.yaml. Add a line for that language in alphabetical order of language codes. List the ISO 15924 code of the language's script, the continents where the language is spoken, and the autonym.
    • The ISO 15924 code must appear in one of the scriptgroups sections towards the end of the file. If the language is written in a script that doesn't yet appear in any of these groups, determine its writing system, make sure that it has a valid four-letter ISO 15924 code, and add it to one of the groups. Use your judgment to put it in an appropriate group.
  4. Run the script php src/util/ulsdata2json.php.
  5. Using git diff, check that the file data/language-data.json was updated.
  6. Run npm test. All the tests must succeed.
  7. In the commit message, mention the reason for adding the language (using the language's English name), and the source for the autonym. (This is similar to adding a language to translatewiki.)
  8. Submit the changes as a pull request.
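Step 3 above can be sanity-checked before editing the file. The following hypothetical sketch assumes that each langdb.yaml entry has the shape "code: [Script, [regions], autonym]" and that scriptgroups maps group names to lists of ISO 15924 codes; verify both against the current file. Autonyms that contain YAML-special characters may need quoting, which this sketch doesn't handle.

```python
import bisect
import re

# ISO 15924 codes are four letters with an initial capital (e.g. Latn, Arab).
ISO_15924 = re.compile(r"[A-Z][a-z]{3}")


def format_langdb_entry(code: str, script: str, regions: list[str], autonym: str) -> str:
    """Format one entry in the assumed `code: [Script, [regions], autonym]` shape."""
    return f"{code}: [{script}, [{', '.join(regions)}], {autonym}]"


def check_new_entry(code: str, script: str, existing_codes: list[str],
                    scriptgroups: dict[str, list[str]]) -> list[str]:
    """Return a list of problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not ISO_15924.fullmatch(script):
        problems.append(f"{script!r} does not look like an ISO 15924 code")
    if not any(script in scripts for scripts in scriptgroups.values()):
        problems.append(f"{script} is not listed in any scriptgroups entry")
    if code in existing_codes:
        problems.append(f"{code} is already listed")
    return problems


def insertion_index(code: str, sorted_codes: list[str]) -> int:
    """Return the position that keeps the language codes in alphabetical order."""
    return bisect.bisect_left(sorted_codes, code)
```

For example, format_langdb_entry("qqf", "Latn", ["AF", "EU"], "Qqfish") produces the line "qqf: [Latn, [AF, EU], Qqfish]".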

After the pull request is merged, update jquery.uls.

Consider also adding a keyboard layout for the language using jquery.ime.

Names.php

See Core MediaWiki.

translatewiki

Adding to translatewiki

Boilerplate for a checklist for a new Phabricator task:

Title: Add ________ (qqf) to translatewiki.net

Requested at _________

Autonym according to _____: .

[ ] Add to language-data
-- [ ] Run `src/util/ulsdata2json.php` before adding
-- [ ] Add a script to `scriptgroups` (if needed)
-- [ ] Add a script to `rtlscripts` (if needed)
-- [ ] Run `src/util/ulsdata2json.php` after adding
[ ] Add to translatewiki configuration
-- [ ] Add assistant languages to `mw-config/FallbackSettings.php` (if needed)
-- [ ] Configure as `always-export-languages` in `repoconfig.yaml` (if needed; most languages don't need it)
[ ] Deploy configuration to translatewiki
[ ] Add a Messages*.php file with `$rtl = true;` (if needed)
[ ] Deploy the Messages*.php file with `$rtl = true;` to translatewiki
[ ] Add an entry to LocalNames/LocalNamesEn.php in the CLDR extension (if needed)
[ ] Add jquery.ime keyboard (if needed)
-- [ ] Add keyboard documentation
[ ] Update jquery.uls and jquery.ime in the ULS extension
[ ] Deploy ULS to translatewiki
[ ] Create the translatewiki language portal
[ ] Make sure that the category for the language is not marked as disabled
[ ] Update the translatewiki language portal to indicate that the language is enabled
[ ] Add the requester (and possibly other relevant users) to the translators list on the portal
[ ] Enter the autonym in the Wikimedia Portals project
[ ] Test language search in UniversalLanguageSelector and fix the search index (if needed)

Adding new languages to translatewiki usually begins as a request on the page Support. The request must include the language's ISO 639-3 language code.

Before adding, read the policy at Translatewiki.net languages and make sure that the language fits it. Ask the requester for clarifications if needed. If the language doesn't fit the policy, politely decline the request.

Determine the autonym. The autonym should be mentioned in the request, but you still have to verify it according to the instructions on this page.

Make a Gerrit patch in the translatewiki repository:

  1. Edit the file mw-config/LanguageSettings.php. Add a line for this language in alphabetical order of language codes. Write its autonym in the quotes. In a comment on the same line, add the English name of the language, and write your name and date.
  2. Optional: Add default assistant languages. Edit the file mw-config/FallbackSettings.php, add a line for that language, and list the languages in the array. The language that is the most likely to be helpful to translators should be at the top. Even though the file where it's done is called FallbackSettings.php and the variable in question is called $wgTranslateLanguageFallbacks, this is not actually a fallback language, but an assistant language that is shown to translators as an aid, when a translation is available. This will make translation easier for people who didn't define assistant language preferences. Only languages that are already available on translatewiki should be added there. English doesn't have to be added there, because it's always shown. Add no more than six languages. Consider also adding the new language as a default assistant language to other languages. The languages to add there are:
    • A common foreign or official language in the country where this language is spoken. E.g., if a language is spoken in Indonesia, add Indonesian; if a language is spoken in a Francophone African country, add French; etc.
    • Languages from the same family. Don't just add everything from the same linguistic family; only add languages that the translators are likely to find helpful.
    • Other languages spoken in the same country or region.
  3. If a language is a variant of another language, and needs only a partial translation, edit the file repoconfig.yaml and add the language code to the relevant always-export-languages sections. (Example: "pap-aw".)
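The constraints on default assistant languages in step 2 can be checked mechanically. A hypothetical sketch (the function is not part of translatewiki's code), assuming you have the proposed list in order of helpfulness and the set of language codes already available on translatewiki:

```python
MAX_ASSISTANT_LANGUAGES = 6  # "Add no more than six languages."


def check_assistant_languages(code: str, assistants: list[str],
                              translatewiki_languages: set[str]) -> list[str]:
    """Return a list of problems with a proposed assistant-language list."""
    problems = []
    if len(assistants) > MAX_ASSISTANT_LANGUAGES:
        problems.append(f"{len(assistants)} languages listed; use at most {MAX_ASSISTANT_LANGUAGES}")
    if "en" in assistants:
        problems.append("'en' is always shown, so it doesn't need to be listed")
    if code in assistants:
        problems.append(f"{code} cannot be its own assistant language")
    if len(set(assistants)) != len(assistants):
        problems.append("the list contains duplicates")
    for lang in assistants:
        if lang != "en" and lang != code and lang not in translatewiki_languages:
            problems.append(f"{lang} is not available on translatewiki yet")
    return problems
```

An empty result doesn't mean the choice is good: whether a language is actually helpful to translators still has to be decided by people.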

In the commit message:

  1. Give a reason for adding the language (using the language's English name). If it's a request on the Support page, give a direct permanent link to the relevant thread. If the reason is different, explain it clearly.
  2. Give the source for the autonym.

See the git log for mw-config/LanguageSettings.php for example patches.

After the patch is reviewed and merged, deploy the updated configuration to the production translatewiki server. After the deployment, test that the language was added correctly: go to Special:Translate, click the target language selector, and type the new language's code. The new language should appear in the results.

Actions in other repositories

Right to left languages

If the language is written from right to left, do the following:

ULS and language-data

Add the language to language-data and then to Universal Language Selector. The translatewiki.net change can be deployed before the Universal Language Selector change is done, and the language will be usable as a target language for localization, but selecting it can be broken in some cases, so don't forget it and do it as quickly as possible.

Consider also adding a keyboard layout for the language using jquery.ime.

CLDR

Update your checkout of the CLDR extension and check whether the language code appears there in the CldrMain/CldrMainEn.php or LocalNames/LocalNamesEn.php file. If not, add it according to the instructions in the CLDR section.

Actions on the translatewiki.net website itself

Make sure that there is a language portal for the language on translatewiki. The portals have names in the form "Portal:qqf". Some portals for languages that aren't yet configured already exist, but the language may be marked as "disabled". See the instructions at Template:Portal.

Add the requester (and possibly other relevant users) to the translators list on the portal. Simply click "edit" in the Translators section and add a list item as {{user|username}}.

Add the language's native and transliterated autonyms to the Wikimedia Portals message group localization (not to be confused with translatewiki's own language portals).

Removing from translatewiki

Languages are removed from the translatewiki configuration in two cases:

  1. When their MediaWiki localization statistics cross the necessary threshold, full-fledged support for them is added to MediaWiki. In this case, they remain usable on translatewiki.net as a target language for translation, and they also become usable as a user interface language.
  2. Full removal because the language is deemed invalid for translatewiki.

Crossing the threshold

If the language has crossed the export threshold (see Translating:MediaWiki) and was added to MediaWiki core's Names.php, it should be removed from translatewiki.net's own configuration. In this case:

  1. Make sure that the language is actually in Names.php, and that the MediaWiki revision that includes it has been deployed to translatewiki!
  2. Remove the language's entry from the file mw-config/LanguageSettings.php.
  3. Check the language's entry in the file mw-config/FallbackSettings.php. If the language has no entry, you're all set. If it has an entry, check whether any of the assistant languages has been set as the language's fallback in core MediaWiki; if so, remove that language from FallbackSettings.php. Leave the rest of the languages, as they may still be useful as assistant languages.
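Step 3 amounts to filtering the assistant-language list. A minimal sketch with hypothetical names, which keeps the remaining languages in their original order of helpfulness:

```python
def prune_assistant_languages(assistants: list[str], core_fallbacks: list[str]) -> list[str]:
    """Drop assistant languages that core MediaWiki now uses as fallbacks."""
    fallback_set = set(core_fallbacks)
    # Preserve order: the most helpful assistant language stays first.
    return [lang for lang in assistants if lang not in fallback_set]
```

For example, if the language now falls back to French in core, prune_assistant_languages(["fr", "ln", "sw"], ["fr"]) returns ["ln", "sw"].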

Invalid languages

Sometimes, the translatewiki administrators decide that MediaWiki and other projects hosted on translatewiki shouldn't be localized into that language. This may happen, for example, if the language had been added not according to policy, if grave mistakes were made while adding the language, or if the localizations are very low quality, and it's better to remove them and start the work in that language from scratch some time later.

  1. Remove the language's entry from the file mw-config/LanguageSettings.php and from mw-config/FallbackSettings.php.
  2. If the language is being removed because it was added by mistake, check whether it appears in any special configurations in repoconfig.yaml, and remove them if needed (to be extra-sure, grep the repository for more appearances).
  3. Consider also deleting all the translations from the site.
  4. Delete the language's portal page or mark the language as disabled.
  5. TODO: anything more to document here?

Universal Language Selector

Universal Language Selector is the MediaWiki extension that provides language selection functionality in various contexts.

To add a language to Universal Language Selector, first add it to language-data and then to jquery.uls. After that is merged, make a Gerrit patch for Universal Language Selector:

  1. From the repo's root folder, run the script scripts/update-jquery-uls.sh.
  2. Check the diff. In the commit message, describe the changes (using the language's English name), and give a link to the latest jquery.uls commit on GitHub.

Example commit: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/793082

(For updating keyboard layouts, see the jquery.ime section.)

Language search index

The Universal Language Selector extension has a little built-in search engine for languages. It allows users to search for any language in any language using data from CLDR, from language-data, and from this extension's own code. The search index should be regularly updated after every update in the CLDR extension and in language-data.

In addition, if the autonym of the language that you are adding has commonly used aliases or alternate spellings, add them to the $specialLanguages array in the file data/LanguageNameIndexer.php. This keeps the ULS search box useful: many people search for the language they need using a name that is significantly different from its autonym. The file itself contains examples of cases where this is needed:

  • A language that has a common alias in the language itself. For example, in Spanish, the Spanish language itself is often called both "español" and "castellano". The name "español" is used as the autonym, but since "castellano" is a common alias, it is also added here.
  • Some languages that aren't written in the Latin alphabet are frequently searched using the Roman alphabet, such as Armenian or Japanese.
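To see why aliases matter, here is a toy prefix search over language names. It is a hypothetical sketch, not how LanguageNameIndexer.php or ULS actually work: names are case-folded and stripped of diacritics, and every name, including aliases like "castellano" and romanizations like "hayeren", maps back to its language code.

```python
import unicodedata


def normalize(name: str) -> str:
    """Case-fold and strip diacritics so "Español" and "espanol" match."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(names_by_code: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every normalized name (autonym, alias, romanization) to its codes."""
    index: dict[str, set[str]] = {}
    for code, names in names_by_code.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(code)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the codes of all languages with a name starting with the query."""
    q = normalize(query)
    return sorted({code for name, codes in index.items() if name.startswith(q)
                   for code in codes})
```

Without the "castellano" entry, a search for "castel" would find nothing, even though Spanish is in the index.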

After adding the language:

  • Make sure that you have an up-to-date version of the CLDR extension.
  • Run php LanguageNameIndexer.php in the same directory. It will update the file LanguageNameSearchData.php. Check that the updated file includes the necessary additions, and that it doesn't remove any useful search strings. If you are not sure about anything, it's best to consult with people who speak the relevant languages.
  • Commit the changes to Gerrit.

Wikibase

Wikibase is the MediaWiki extension that powers the Wikidata project.

Term language codes (language codes for labels)

  • Until T273627 is resolved, in mediawiki-config:
    • Add it to InitialiseSettings.php, under $wmgExtraLanguageNames['wikidata']. (This list must remain consistent with the one in getDefaultTermLanguages(), because some parts of Wikidata still fetch languages from $wmgExtraLanguageNames until it is removed in the above-mentioned task.)

NOTE: changes to mediawiki-config require deployment. It makes sense to merge the Wikibase patch at the same time as the config one.

To test whether it worked, try to add a new item on wikidata.org and check whether you can find the added language code in the language dropdown list.

Adding a language code only for monolingual text

Add a local name to CLDR, and the language code will be available for monolingual text. The old method is deprecated, but it is still available for exceptional circumstances when adding a local name to CLDR is inappropriate:

Old method

NOTE: This type of change does not require deployment. Put a patch up for review as usual. Wikibase gets localized language names from the CLDR extension. Names provided by CLDR are in the CldrMain directory. The CLDR extension falls back to $wmgExtraLanguageNames and Names.php, so a name might still be shown for languages not in the CLDR extension if the language is defined in one of those places instead. Adding language names to CLDR is not part of the Wikidata team scope.

To test whether it worked, try to add a new statement on an item on wikidata.org (e.g. on the item sandbox). Choose a property with datatype monolingual text (for example, nickname) and look for the newly added language code in the language input field.
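Both the term and the monolingual-text checks can also be done without the UI, through the Wikibase API. The sketch below assumes the Wikibase query module meta=wbcontentlanguages with the parameters wbclcontext (for example term or monolingualtext) and wbclprop. It also assumes that the response lists languages under query.wbcontentlanguages, either keyed by code or as a list of entries with a code field. Check these assumptions in the API sandbox on wikidata.org before you rely on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://www.wikidata.org/w/api.php"


def content_languages_url(context: str) -> str:
    """Build a query URL for the content languages of one context (assumed API)."""
    params = {
        "action": "query",
        "meta": "wbcontentlanguages",
        "wbclcontext": context,  # assumed values: "term" or "monolingualtext"
        "wbclprop": "code|autonym",
        "format": "json",
        "formatversion": "2",
    }
    return API_URL + "?" + urlencode(params)


def codes_from_response(data: dict) -> set[str]:
    """Extract language codes, accepting either a dict keyed by code or a list."""
    langs = data.get("query", {}).get("wbcontentlanguages", {})
    if isinstance(langs, dict):
        return set(langs)
    return {entry["code"] for entry in langs}


def is_available(code: str, context: str) -> bool:
    """Fetch the list from wikidata.org and check for `code` (needs network access)."""
    request = Request(
        content_languages_url(context),
        # Wikimedia APIs expect a descriptive User-Agent; this one is a placeholder.
        headers={"User-Agent": "language-check-sketch/0.1 (example)"},
    )
    with urlopen(request, timeout=30) as response:
        return code in codes_from_response(json.load(response))
```

For example, is_available("qqf", "monolingualtext") should return True only after the change is live on wikidata.org.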

Adding a language code for Lexemes

Add a local name to CLDR, and the language code will become available for Lexemes. The old method is deprecated, but it is still available for exceptional circumstances when adding a local name to CLDR is inappropriate:

Old method

NOTE: This type of change does not require deployment. Put a patch up for review as usual.

Past patch for example

To remove the language code from Lexemes, remove the message wikibase-lexeme-language-name-qqf from the <language-code>.json file.
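If the message key is present in several JSON message files, a short script can do the removal. This is a sketch under assumptions. The directory that holds the message files is passed in by you, because the exact path is not given here. Every *.json file in it is treated as a flat message file. The output is written with tab indentation and unescaped non-ASCII text, which is the usual MediaWiki i18n style, but review the resulting diff before committing.

```python
import json
from pathlib import Path


def remove_message_key(i18n_dir: str, key: str) -> list[str]:
    """Delete `key` from every *.json message file in i18n_dir.

    Returns the names of the files that were changed. The order of the
    remaining keys is preserved because json.loads returns an ordered dict.
    """
    changed: list[str] = []
    for path in sorted(Path(i18n_dir).glob("*.json")):
        messages = json.loads(path.read_text(encoding="utf-8"))
        if key not in messages:
            continue
        del messages[key]
        path.write_text(
            json.dumps(messages, ensure_ascii=False, indent="\t") + "\n",
            encoding="utf-8",
        )
        changed.append(path.name)
    return changed
```

For example, remove_message_key("i18n", "wikibase-lexeme-language-name-qqf") returns the files it edited. The directory name i18n is a hypothetical example.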

To test whether it worked, try to add a new lexeme and check if you can find the added language code in the menu of the “Spelling variant of the Lemma” field.

Anything else?

Wikifunctions

The Wikifunctions project and Abstract Wikipedia have their own, more expansive definition of languages than MediaWiki. Consequently, their code maps MediaWiki-supported language codes to their own language objects. This mapping is stored in the function-schemata repository on GitLab. When a language is added to or modified on the Wikimedia platform, it should also be updated in this repository, and the update should be pulled through to the WikiLambda repository on Gerrit.

In general, this can be left to the Abstract Wikipedia team, but the workflow is roughly:

  1. Make sure the new MediaWiki language code is covered in the function-schemata repository. Either
    1. Create the language object using the script: node bin/generateNaturalLanguage.js code "autonym" "English name", e.g. node bin/generateNaturalLanguage.js en-ie "Hiberno-English" "Irish English", commit the resulting changes to a branch, push it to GitLab, and create a Merge Request for it to be merged; or
    2. Identify an existing language object to which the new code should be added (for example, when MediaWiki switches to a more modern code for a language that is already listed), edit the object to extend its Z60K2 value with the new code (or add an initial Z60K2 value), update the references, and commit, push, and create an MR as above.
  2. Once the team has reviewed the above change and merged it into main, create pull-through requests in each of the four repositories that use function-schemata, most importantly the WikiLambda MediaWiki extension, using the local ./bin/updateSubmodule.sh script. Push them for review to Gerrit or GitLab as appropriate, so that the team can review and merge them.

Wikimedia Portals

Wikimedia Portals are the main language-neutral pages of Wikipedia, Wiktionary, and some other Wikimedia projects. They have www at the beginning of their URL, rather than a language code.

Autonym configuration

To make sure the language's autonym is properly supported in portals, do the following steps:

  1. Make sure that the language is configured and usable in translatewiki.net.
  2. If the autonym is written in an alphabet that uses letter casing, such as Latin or Cyrillic, begin the autonym with a capital letter, unless the language specifically requires a lowercase letter. If in doubt, verify with someone who knows the language well.
  3. Go to Special:Translate on translatewiki.net.
  4. Select "Wikimedia Portals" in the message group selector.
  5. Select the target language.
  6. Click "..." in the toolbar (next to "Translated") and check the "Optional messages" box.
  7. In the key portals-language-name, add the autonym.
  8. If the autonym is not in the Latin alphabet: In the key portals-language-name-romanized, add a transliteration of the autonym. It may include diacritics and various special characters.
  9. If the key portals-language-name or portals-language-name-romanized includes any characters outside the basic 26-letter Latin alphabet: In the key portals-language-name-romanized-sorted, add a transliteration of the autonym without any special characters.
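The conditions in steps 2, 8, and 9 can be expressed as a small helper that suggests which keys need values. This is a hypothetical sketch, not part of any portal tooling. It detects "Latin alphabet" by checking that every letter's Unicode name starts with LATIN. It considers only letters, so spaces and punctuation are ignored. Stripping diacritics through NFD decomposition is only a first draft for the romanized-sorted value, because letters such as ø have no decomposition and still need a manual choice.

```python
import string
import unicodedata


def _letters(text: str) -> list[str]:
    return [ch for ch in text if ch.isalpha()]


def is_latin_script(text: str) -> bool:
    """True if every letter in text is a Latin-script letter (by Unicode name)."""
    letters = _letters(text)
    return bool(letters) and all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


def is_basic_latin(text: str) -> bool:
    """True if every letter is one of the basic 26 Latin letters (either case)."""
    return all(ch in string.ascii_letters for ch in _letters(text))


def starts_lowercase(autonym: str) -> bool:
    """True if the first letter is lowercase in a cased script (step 2: double-check it)."""
    letters = _letters(autonym)
    return bool(letters) and letters[0].islower()


def strip_diacritics(text: str) -> str:
    """Drop combining marks after NFD decomposition; undecomposable letters remain."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def portal_keys_needed(autonym: str, romanized: str = "") -> list[str]:
    """Suggest which portals-language-name* keys need values (steps 7 to 9)."""
    keys = ["portals-language-name"]
    if not is_latin_script(autonym):
        keys.append("portals-language-name-romanized")
    if not (is_basic_latin(autonym) and is_basic_latin(romanized)):
        keys.append("portals-language-name-romanized-sorted")
    return keys
```

For example, "Deutsch" needs only portals-language-name. "Español" also needs the sorted key, for which strip_diacritics suggests "Espanol". "Русский" needs all three keys.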

Make sure to read the documentation (qqq) for each of the messages above!

The rest of the "Wikimedia Portals" message group should also be translated, but this, of course, should be done by people who know the language.

Configuring the project on the portal in production

If there is a project in that language, but its name appears incorrectly, doesn't appear on the portal at all, or appears in the wrong section, please create a Phabricator ticket with the tag "#wikimedia-portals".

TODO:

  • Describe the actual fixing for cases when the name doesn't appear or appears in the wrong section.
  • Describe rtl language handling (they probably have to be added while creating a new wiki, but verify this).
  • Describe various caveats, exceptions, overrides.

Wikistats

Wikistats 2 has a manual step for adding a new language. See Data Engineering/Systems/Wikistats 2#Adding languages. It would be nice to make it more automatic (task T336752), but for now it's manual.

Determining the autonym

When people ask to add a language or to change the autonym of an already-configured language, do your best to verify it. This is sometimes challenging. MediaWiki already supports well over 400 languages, and these naturally include most of the world's better-documented languages. As a result, many of the languages being added now are less well documented, and it's generally harder to find information about them. Sometimes an autonym is hard to find in any external source. In other cases, different sources cite different autonyms, and you'll have to make a decision without actually knowing the language. Use your best judgment and try to reach a reasonable compromise between the information in available third-party sources and the information given to you by the requesters.

A generally useful guideline is to remember the autonym's most important purpose: to let users select the language they need from a long list of languages.

Since autonyms usually appear in lists of languages, they have to be unique. The users must be able to choose the precise language that they need, and not something with a similar name.

A special note is needed about languages known as "creole", "pidgin", or "patois". Their names often include the word "creole", "pidgin", or "patois", usually adapted to their own spelling, and their speakers often call them by that word alone. However, there are many languages of this kind, and their names have to be unique. Therefore, try to find a name that includes at least one other word, such as the name of the place where the language is spoken, or a completely unique name. Apart from that, all the other suggestions about autonyms apply to these languages.
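When several similar languages are configured, a quick way to catch names that users could not tell apart is to look for collisions after normalization. This is a hypothetical sketch. The input mapping from language code to autonym is an assumption, because in practice you would build it from wherever the names are configured. Comparing NFC-normalized, case-folded strings is one reasonable notion of "the same name", and it will not catch names that are merely similar.

```python
import unicodedata
from collections import defaultdict


def duplicate_autonyms(names: dict[str, str]) -> dict[str, list[str]]:
    """Group language codes whose autonyms collide after NFC + casefold.

    names: mapping of language code -> autonym (assumed input format).
    Returns normalized autonym -> sorted codes, only for collisions.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for code, autonym in names.items():
        key = unicodedata.normalize("NFC", autonym).strip().casefold()
        groups[key].append(code)
    return {key: sorted(codes) for key, codes in groups.items() if len(codes) > 1}
```

For example, two hypothetical codes qqf and qqg that are both named "Kreyòl" would be reported together, which signals that at least one of them needs a more specific name.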

If possible, avoid mentioning country names in autonyms. Do it only if you have a specific reason, such as adding a variant of a language that is based on the standard used in that country. When you have to do it, write the country name in the language.

Autonyms usually don't have to be written with a capital letter. English spelling conventions require that names of languages be written with a capital letter in English, but most languages don't have such a requirement. Use a capital letter only if this is specifically required by the language's own orthography or if you are doing it in an environment where a capital letter is needed for another specific reason.

Some sources where you may find autonyms:

  • A good autonym may appear in the Wikipedia article about the language in major languages, such as English, French, German, Portuguese, Spanish, or Russian. You can also look for it on the Wikidata item page for the language. The name there may well be correct, but, as with every statement in Wikipedia and Wikidata, treat it as a starting point and verify it with an external reliable source.
  • The best source for autonyms is a professionally written and published book about the language, online or printed: a dictionary, a grammar reference, a standard orthography guide, a history of the language, etc. Academic articles about the language are a good source, too. Glottolog often lists the titles of such sources, although it doesn't always link to an online copy. The Internet Archive is a good place to find the actual books, or at least parts of them (many such books are available for free time-limited lending in its e-book library). Some providers in The Wikipedia Library, such as L'Harmattan, Cambridge, and others, offer free access to relevant e-books. Google Books and Amazon.com may have enough content in the free preview to find the autonym.
  • Websites and apps in the language are a very good source, especially if they have a language selector. One particular kind of app to check is Android keyboard apps, such as Gboard: install the app, enable the language, and check how its name appears when you choose to type in it. Also check the language settings in the Windows operating system; recent versions have a very large selection of languages.
  • The UN's Universal Declaration of Human Rights is one of the most translated documents in history, and translations are available online. Unfortunately, the lookup system for languages is not so convenient, and the presentation is non-standardized and quite inconsistent. Nevertheless, it is sometimes a useful source.
  • Ethnologue: the autonym is usually shown on its freely accessible pages, although it's not always the best option, so check other sources, too.

Note that when writing in English, for example in English-language discussions, code comments, Git commit messages, etc., you should use the English name of the language and not the autonym.

See also
