Writing systems
This page contains basic information about supporting various aspects of writing systems: languages written in multiple writing systems, text directionality, font rendering, and input.
Multiple writing systems, multiple dialects
Many languages are written in more than one script. Supporting this is often possible, but software support is lacking, and sometimes it is difficult, if not impossible, to implement. Some languages have a LanguageConverter that adds support for multiple writing systems.
Some languages have very similar dialects that are written with the same writing system(s) and can, on a technical level, be treated like different writing systems.
LanguageConverter
For documentation about using LanguageConverter, see Writing systems/Syntax.
LanguageConverter (LC) is a system based on language variants that automatically converts the content of a page into a different variant. A variant is mostly the same language in a different script. To use the LanguageConverter, go to your Internationalisation preferences. If you are on a wiki that supports conversion, you'll see an extra option for choosing the script.
Phab:T21044 -- this needs more documentation!
It is implemented for the following languages (as of July 2023; see languagesWithVariants for the latest list):
- Balinese (ban): Balinese (ban-bali), Latin (ban-latn) [1.36+]
- Crimean Tatar (crh): Latin (crh-latn), Cyrillic (crh-cyrl)
- English (en): Normal (en), Pig Latin (en-x-piglatin) (for testing, only when $wgUsePigLatinVariant is enabled)
- Gan (gan): Simplified (gan-hans), Traditional (gan-hant)
- Inuktitut (iu): Latin (ike-latn), Syllabics (ike-cans) [1.18+]
- Kazakh (kk): Cyrillic (kk-cyrl), Latin (kk-latn), Arabic (kk-arab). Discontinued in 2023; see the reasons at phab:T268143 and phab:T350684.
- Kurdish (ku): Latin (ku-latn), Arabic (ku-arab) [1.11+]
- Serbo-Croatian (sh): Cyrillic (sh-cyrl), Latin (sh-latn) [1.40+]
- Tachelhit (shi): Tifinagh (shi-tfng), Latin (shi-latn) [1.19+]
- Serbian (sr): Cyrillic (sr-ec), Latin (sr-el)
- Tajik (tg): Cyrillic (tg-cyrl), Latin (tg-latn)
- Talysh (tly): Cyrillic (tly-cyrl), Latin (tly-latn) [1.36+]
- Uzbek (uz): Cyrillic (uz-cyrl), Latin (uz-latn) [1.20+]
- Wu (wuu): Simplified (wuu-hans), Traditional (wuu-hant) [1.41+]
- Tamazight (zgh): Tifinagh (zgh-tfng), Latin (zgh-latn) [1.42+]
- Chinese (zh):
  - Simplified Chinese (zh-hans): China (zh-cn), Singapore (zh-sg), Malaysia (zh-my)
  - Traditional Chinese (zh-hant): Taiwan (zh-tw), Hong Kong (zh-hk),[1] Macau (zh-mo)
And it is needed for many more languages!
Language code tags for scripts should follow the ISO 15924 standard.
However, for legacy reasons, Serbian is an exception, with sr-ec instead of sr-cyrl and sr-el instead of sr-latn.
This is under discussion in phab:T117845.
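As a rough illustration of this naming convention, the following Python sketch maps the two legacy Serbian codes onto their ISO 15924-style equivalents and leaves every other code unchanged. The function name and the mapping table are hypothetical helpers written for this example; they are not part of MediaWiki.

```python
# Legacy variant codes named in the text above, mapped to their
# ISO 15924-style equivalents. Hypothetical table for illustration only.
LEGACY_VARIANT_CODES = {
    "sr-ec": "sr-cyrl",
    "sr-el": "sr-latn",
}


def normalize_variant_code(code: str) -> str:
    """Return the ISO 15924-style form of a variant code (hypothetical helper)."""
    lowered = code.lower()
    # Codes that are not in the legacy table are returned as-is (lowercased).
    return LEGACY_VARIANT_CODES.get(lowered, lowered)


print(normalize_variant_code("sr-ec"))    # sr-cyrl
print(normalize_variant_code("kk-arab"))  # kk-arab
```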
A current limitation of this system is that it may be particularly bad at dealing with multiple writing systems based on the same underlying script.
Chinese Wikipedians occasionally use => (unidirectional) rules for failing cases.
Because LC always tries to consume the largest chunk of words it can match, using strtr in PHP, -{}- (which breaks up words) is often useful too.
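The effect of longest-match conversion, and of breaking a word apart, can be sketched in Python. This is a simplified model and not the LanguageConverter implementation: the conversion table is a made-up toy, the function names are hypothetical, and the -{}- marker is modelled only by splitting the input at that exact string before converting each piece.

```python
BREAK_MARKER = "-{}-"


def convert_longest_match(text: str, table: dict) -> str:
    """Replace substrings using `table`, preferring the longest key at each
    position and never rescanning text that was already converted."""
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No key matches at this position: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_with_breaks(text: str, table: dict) -> str:
    """Convert each piece between break markers separately (simplified model)."""
    pieces = text.split(BREAK_MARKER)
    return "".join(convert_longest_match(piece, table) for piece in pieces)


# Toy table: "abc" is the longest key, so it wins over "ab" followed by "c".
TOY_TABLE = {"ab": "X", "abc": "Y", "c": "Z"}

print(convert_with_breaks("abc", TOY_TABLE))       # Y
print(convert_with_breaks("ab-{}-c", TOY_TABLE))   # XZ
```

In the second call the marker prevents the three-character key from matching, so the two shorter rules apply instead. This is the situation in which breaking up a word helps when the longest match is the wrong one.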
Supporting configuration
The wgULS/wgUVS functions in zhwp's sitelib (now deprecated, see zh:Wikipedia:HanAssist for the current version) allow for easy variant selection in userscript UIs.
This can help scriptwriters produce a variant-aware interface for users.
For other places unreachable by LC, {{int:Conversionname}} can be used to fetch the current UI language/variant.
The PreviewWithVariant gadget allows Wikipedians to check conversion results in the editor preview. You can configure it for your own wiki.
"Foreign language marker" templates like {{lang}} should add "disable conversion" markers -{ text }- around the quoted foreign text to avoid mis-conversion.
On Hans/Hant wikipedias this becomes a concern for Japanese Kanji and Vietnamese Han Nom, while on Wikipedias with Latin text marked for conversion this concern should be immediate.
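What such a template might emit can be sketched as follows. This is a hypothetical Python helper, not the actual {{lang}} template; the output shape (a span with a lang attribute) is an assumption, and the sketch omits the spaces shown inside the marker above on the assumption that they are there for readability.

```python
from html import escape


def protect_foreign_text(text: str, lang_code: str) -> str:
    """Wrap `text` in a lang-tagged span with "disable conversion" markers.

    Hypothetical helper: the span markup is an assumed output shape.
    """
    if "}-" in text:
        # A literal "}-" would close the marker early; refuse rather than guess.
        raise ValueError("text contains '}-', which would end the marker early")
    return '<span lang="' + escape(lang_code, quote=True) + '">-{' + text + "}-</span>"


print(protect_foreign_text("東京", "ja"))  # <span lang="ja">-{東京}-</span>
```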
The WikitextLC module allows for easily inserting LC commands to Lua output. The NoteTA and CGroup system allow for accessing pre-defined sets of subject-specific conversions. Module:地区用词 allows for an adaptive output of the form "foo, known in PLACE and PLACE as bar, and PLACE as baz".
Automated title redirection on URLs may cause apparent inconvenience for interfaces without this feature. See T49725 for the Lua task and T160952 for the section-anchor task.
URL Redirection
In some installations of MediaWiki, a short URL is employed.
For example, in Chinese Wikipedia, instead of https://zh.wikipedia.org/wiki/维基百科 (if no variant is specified) or https://zh.wikipedia.org/w/index.php?title=维基百科&variant=zh-cn (if the variant is specified without rewrite rules), a shortened URL such as https://zh.wikipedia.org/zh-cn/维基百科 can be used as a temporary link to the specified script variant (zh-cn in this case).
This behaviour can be seen on several language editions of Wikipedia, such as the Chinese and Serbian Wikipedias.
However, others, such as the Gan Chinese and Balinese Wikipedias, often keep the long URL with index.php&variant=.
This is controlled by $wgVariantArticlePath and web server rewrite rules (see the short URL manuals for Apache and nginx).
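The three URL forms discussed in this section can be sketched in Python. The function name is hypothetical, and the "/$2/$1" pattern is an assumption about how a wiki might set $wgVariantArticlePath, with $1 standing for the title and $2 for the variant; check the $wgVariantArticlePath manual before relying on it.

```python
from urllib.parse import quote, unquote, urlencode


def variant_urls(host: str, title: str, variant: str,
                 variant_article_path: str = "/$2/$1") -> dict:
    """Build the default, long, and short URL forms for a page (sketch).

    `variant_article_path` is an assumed pattern: $1 = title, $2 = variant.
    """
    encoded_title = quote(title, safe="")
    short_path = variant_article_path.replace("$2", variant).replace("$1", encoded_title)
    return {
        # No variant specified.
        "default": f"https://{host}/wiki/{encoded_title}",
        # Variant specified as a query parameter, no rewrite rules needed.
        "long": f"https://{host}/w/index.php?" + urlencode({"title": title, "variant": variant}),
        # Variant specified in the path; needs matching web server rewrite rules.
        "short": f"https://{host}{short_path}",
    }


urls = variant_urls("zh.wikipedia.org", "维基百科", "zh-cn")
for name, url in urls.items():
    # unquote() is used only to make the printed URLs readable.
    print(name, unquote(url))
```

The short form only works when the web server rewrites the variant path back to index.php, which is why it appears on some wikis and not on others.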
See also
- m:Automatic conversion between simplified and traditional Chinese
- m:Wikipedias in multiple writing systems
- Specs/HTML#Language conversion blocks
- Parsoid/Language conversion
Directionality
Most writing systems operate as characters written left-to-right (LTR), with lines stacked from top-to-bottom (TtB).
A few common scripts (Arabic and Hebrew in particular) write characters right-to-left (RTL) -- see directionality support for more details on how we handle right-to-left and mixed bidirectional text with HTML output and CSS styles.
Note that an individual language can be used with scripts that have different directionalities, such as Kazakh and Kurdish which support Latin and Arabic variants.
Note also that the World Wide Web Consortium has defined more directionalities for the use in web pages, such as North East Asian top-to-bottom ones, with lines stacked either from left to right or right to left.[2]
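Because one language can use scripts with different directions, the direction is better derived from the script than from the language. A minimal Python sketch, assuming variant codes that carry an explicit script subtag such as kk-arab: the function name is hypothetical, the set of right-to-left scripts is deliberately incomplete, and bare language codes (for example "ar") would need a separate language-to-script table that is not shown here.

```python
# Scripts written right-to-left, by lowercase ISO 15924 code.
# Deliberately incomplete; only the two scripts named in the text above.
RTL_SCRIPTS = {"arab", "hebr"}


def text_direction(variant_code: str) -> str:
    """Guess "rtl" or "ltr" from the script subtag of a code such as "kk-arab"."""
    subtags = variant_code.lower().split("-")
    # Skip the first subtag (the language) and look for a known RTL script.
    return "rtl" if any(tag in RTL_SCRIPTS for tag in subtags[1:]) else "ltr"


for code in ("ku-latn", "ku-arab"):
    direction = text_direction(code)
    # The result would typically feed the HTML dir attribute.
    print(f'<div lang="{code}" dir="{direction}">...</div>')
```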
Font rendering and input
Many scripts do not have proper fonts easily available to users. This may be because operating systems do not ship these fonts, or users don't know how to install them or don't have enough permissions to do this. The UniversalLanguageSelector extension tries to solve this by embedding the fonts in the wiki itself. Fonts will be served from the server and the user's system would not need to have the fonts installed.
UniversalLanguageSelector also provides input methods for typing in a given script, so users do not have to rely on external tools or on support from their own systems.
Notes
- ↑ Taiwan and Hong Kong are two major variants written in the same Traditional script, with significant differences in phrase usage due to market separation and influence from local zho languages, so you likely want to keep at least CN, TW, and HK in your list of variants. If you insist on flattening the scope of Chinese variants to a script-based Simp/Trad separation, follow what the reporter did in phab:T149278.
- ↑ CSS Writing Modes Level 3