User:Liangent/wb-lang

From mediawiki.org

cat-redir | wb-lang (dev) | webfonts-cjk

Target: Language fallback and conversion feature for data stored in Wikibase / Wikidata.
Mentor: User:Denny, User:Aude

About me

Liangent (talk | global | mail | irc | gerrit | gerrit-review).

Preference

I'm submitting two proposals: one for the category redirects idea, and another for the wiki(base|data) language fallback & conversion idea. I would be happy to do either, and I hope the other one can be done by someone else, because both of them solve problems for our (zhwp) community. Since the first one has been listed on our idea list page, it's more likely that some other student will want to choose it too, and in that case I can do the fallback one.

From talking with my possible mentors, I learned that Bawolff wants to implement category redirects themselves if no student takes it, and that in Denny's plan language fallback is not scheduled for short-term development. So if both proposals are given a chance, I'd like to do the Wikibase one. Besides the availability of mentors, another reason is that new data is being added to Wikidata rapidly these days. If language fallback is not handled properly, we may face more cleanup work in the future after people enter duplicate data in multiple languages and/or variants, recreating the situation that existed when LanguageConverter was first introduced on zhwiki.

For the above reasons, the main proposal is the Wikibase language work, and the category redirects one is kept as a backup.

Introduction

Currently Wikidata stores multilingual content. Labels (names, descriptions, etc.) are expected to be written in every language, so that every user can read them in their own language. But there are some problems at present:

  • If some content doesn't exist in a specific language, users with exactly that language set in their preferences see something meaningless (the item's ID instead). This makes languages with fewer users (and thus fewer labels filled in) practically unusable.
  • Some similar languages often share the same value. Populating strings for every language one by one wastes resources and may let them drift out of sync later.
  • Even for languages which are not "that similar", MediaWiki already has a facility to transliterate (i.e. convert) content from a sister language (i.e. a variant), which can be used to provide better results for users.

This proposal aims to resolve these issues by displaying content from another language to users based on user preferences (some users may know more than one language), language similarity (the language fallback chain), or the possibility of transliteration, and by allowing proper editing of this content.

Although Wikidata is still in a fast development stage, lots of data has already been added to it. The later we resolve these issues, the more duplication may be created, which will require more cleanup work in the future, like what we had to face before and when the language converter (that transliteration system) was introduced for the Chinese Wikipedia. Besides, having this included in the Wikidata design is better than patching it in later in ad hoc ways.

Finally, we don't have much workforce for LanguageConverter-related work, so it would be nice to have me working on this now. :)

Requirements

  • Every user may define their own language preference order
  • Every language has its system fallback order
  • Some languages can be derived (converted) automatically from other (prime & sister) languages (variants)
  • Display what the user prefers most, to the extent that it is available in the current data
    • with an annotation saying what language a string is actually in, when it falls back to another language
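
A minimal sketch of how these requirements could fit together, written in Python purely for illustration (the real work would live in Wikibase itself). Everything named here is an assumption made for the example: the `SYSTEM_FALLBACKS` table, the `CONVERTERS` table, the toy `hant_to_hans` character map (standing in for LanguageConverter), and the `ResolvedLabel` structure are all hypothetical and not existing Wikibase or MediaWiki interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical system fallback chains (illustrative values only).
SYSTEM_FALLBACKS: Dict[str, List[str]] = {
    "zh-cn": ["zh-hans", "zh"],
    "zh-tw": ["zh-hant", "zh"],
    "de-at": ["de"],
}


def hant_to_hans(text: str) -> str:
    """Toy stand-in for a real variant converter: maps two characters only."""
    return text.translate(str.maketrans({"電": "电", "腦": "脑"}))


# Hypothetical conversion table: (source language, target language) -> converter.
CONVERTERS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("zh-hant", "zh-hans"): hant_to_hans,
}


@dataclass
class ResolvedLabel:
    value: str       # text to display
    requested: str   # the user's first-choice language
    source: str      # language the stored string is actually in (the annotation)
    converted: bool  # True if the value was derived by conversion


def candidate_languages(user_prefs: List[str]) -> List[str]:
    """Expand the user's preference order with each language's system fallbacks."""
    ordered: List[str] = []
    for lang in user_prefs:
        for code in [lang, *SYSTEM_FALLBACKS.get(lang, [])]:
            if code not in ordered:
                ordered.append(code)
    return ordered


def resolve_label(labels: Dict[str, str], user_prefs: List[str]) -> Optional[ResolvedLabel]:
    """Return the best available label, or None if nothing usable is stored."""
    for lang in candidate_languages(user_prefs):
        if lang in labels:
            # A stored string exists in this language; use it as-is.
            return ResolvedLabel(labels[lang], user_prefs[0], lang, False)
        for (src, dst), convert in CONVERTERS.items():
            if dst == lang and src in labels:
                # No stored string, but one can be derived from a sister variant.
                return ResolvedLabel(convert(labels[src]), user_prefs[0], src, True)
    return None
```

For example, with stored labels `{"zh-hant": "電腦", "en": "computer"}` and a user preference order of `["zh-cn", "en"]`, this sketch walks `zh-cn`, then `zh-hans`, finds no stored string but a possible conversion from `zh-hant`, and returns the converted text annotated with `source="zh-hant"`. Trying a stored string before a conversion at each step is only one possible ordering; whether a converted string in a preferred language should beat a stored string in a less preferred one is a design question left open here.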

Timeline

May 27, 1900 UTC
Announced.
May 28 - June 6
(I'll be busy during the first week or two after June 17, so I may have to start early to compensate for that.)
Investigate places where visible (to users and to external developers such as bot authors) work is needed. These may include the API (new interfaces or parameters may be needed), the repo front-end (obviously), the client front-end (for example, the add-link dialog) and exported data (for example, data dumps, if we plan to provide per-language dumps at some point). Design the interface where needed.
June 7 - June 16
Investigate current data exchange structures (API, embedded JavaScript data, or anything else; the internal storage structure shouldn't be affected much, as we're just doing fallback before data is sent out) and see whether they still meet our needs. Design new data structures where necessary.
June 17
Beginning.
June 17 - June 30 (and as soon as any design is done)
Send designs of the interface and data structures to mentors and others for review. During this period I may be somewhat busy, so some delay in others' responses won't block me much.
July 1 - July 20
Code up anything internal (data structures, API, etc.) based on the designs from the previous periods.
July 21 - July 29
Front-end development based on design, part I.
July 29, 1900 UTC - August 2, 1900 UTC
Mid-term; write a summary of the current design and coding work as mid-term evaluation documents.
August 3 - August 11
Front-end development, part II.
August 12 - August 25
Test it to see whether it works well as an integrated product; tweak code when necessary.
August 26 - September 2
Test it on a larger data set? (optional; continue coding & testing work if it's not done yet)
September 3 - September 16
Try to have it deployed on Wikidata and test it in the real world? (optional; continue coding & testing work if it's not done)
September 16 / September 23, 0900 UTC - September 27, 1900 UTC
Final documentation and reports.

Target

Expected target
Have it deployed on Wikidata.
Minimal target
Have a working codebase with the required features completed. In this case I'll still be interested in getting it live eventually, as it will be a needed feature for Wikidata users.