Topic on Talk:Content translation

Request from Jawiki to abolish machine translation on CX2

Sethemhat (talkcontribs)

Hello. I am Sethemhat, a member of the Translation Confirmation project on jawiki. On behalf of the project, I am here to officially request that the machine translation service in Content Translation (CX2) be abolished. This request has been discussed many times. We know that on many Wikipedias, including enwiki, machine translation in CX2 cannot be used.

The project has been reviewing articles made with CX2. As a result, we have concluded that machine translation in CX2 has clearly harmed Wikipedia, through both the articles themselves and the cost of dealing with badly translated articles. About half of the articles from CX2 have been reviewed as "Bad quality translation". CX2 itself is not the cause of the harm; the machine translation in it is.

I regret to say that, after several months of surveying, the members of the project have agreed, and are ready, to resort to banning most use of CX2 through edit filters unless the CX2 team abolishes the machine translation service on jawiki.

I hope you choose the better option. Thank you.

UOzurumba (WMF) (talkcontribs)

Hello Sethemhat,

Sorry about the issue with translators not using the initial machine translation appropriately. We have started working on requiring initial translations to be modified by at least 75% before they can be published on the Japanese Wikipedia. Once that is done, please let us know if it helps. We are open to adjusting it further as needed. We will also evaluate published translations from our end to see if further adjustment is needed.

Thank you so much for bringing this up.

Best regards

Omotecho (talkcontribs)

Hello, @UOzurumba (WMF), if you need the statistical data in which jawp Senior translators have invested their time and wisdom, please reach out.

AFAIK, the aggressive threshold of a match rate under 75% is too high. Kindly offer discussion and evidence that 75 is a logically acceptable number, and please let us have a say on the numbers; as I discuss herein, the system itself has challenges to overcome. I am very happy to give you samples. And from an editor's perspective, other teams at WMF have solutions that would make things easier for the Tech team to tackle.

I am targeting under 85%, because I am not adding local perspective by scholars or professionals on CX2 workflow. That number comes with the condition that I output to my own w:ja:user:Omotecho/sandbox/article_title, and do research on citations or counterarguments by local-ja academics/professionals.

I am reluctant, but I wish to say this directly. Kindly imagine we CX2 supporters are sitting and watching that CX2 inevitably traps goodwill users to RfD for their CX2 output, and if they persist and protest "I obeyed rules and how-tos offered on CX2," for a Ban. That is not why we invented CX1. We need to reach out to more would-be editors globally, as a Movement strategy and its Initiative. How can we fill the gap between reality and our hope? That saddens me on two levels.

  • You might be aware how mismatched citation templates between language pairs inevitably strip off reliable sources, making my CX2 output unreliable and an easy target for RfD. That is why I seldom output from CX2 to the page namespace directly. After using and rethinking CX2 usability over the years, I chose to give up getting higher "points" on CX2 and accept its shortcomings, and I am still patching up those "wormholes" on CX2 as much as I can. I intend this to signal to your team how time-consuming CX2 is if you try to obey global Wikipedia principles. Are you sure CX2 matches such principles? How do you match the variety of citation templates among output languages? Is that such a huge task that it requires a high percentage of your fiscal budget?
  • How about planning a CX3? An introductory lesson combined with CX2 to start with? And using your edit history to limit where your CX output is accepted, either in your user:username/sandbox or directly in the page namespace? Optionally, moving that sandbox page to a regular namespace would require review by Senior editors; perfect.
  • I know that kind of CX3 would be a burden for some newbies or Junior translators who are eager to see an article posted by themselves (though no article belongs to anybody). But as academic papers have shown, Japanese is among the most challenging languages for MT engineers, and on top of that, language education matters to me.

Education path. My idea that has led to the dream of CX3: I have no statistics, but I can safely say that compulsory language classes in Japanese schools do not introduce you to translation. In both Japanese and English courses at high school or college/university level, they teach grammar, but not how to convey ideas written in non-Japanese languages into our native tongue.

Multilingual Wikimedians. When you think about how you spend your day-to-day activities off-wiki being at least bilingual, and about the percentage of such people among your peers in local communities, I doubt the extended ja community would rank in the top 50 in our Movement. Street signs given in English (Tokyo/Osaka/Kyoto) are for tourists, but you will find how hard it is, anywhere in my country, to get an answer in spoken English to a simple question about how to get to a famous place. I mean that, IMHO, our community might need a path for Junior editors, or for those wishing to enrich Wikimedia wikis using CX2: a special preparation dashboard for learning translation. I have been preparing to draw a full picture of how tech and education could shake hands on jawp.

Yes, we have a good one on the Programs and Events Dashboard: I am translating the "how to translate" and "To edit" sections, hoping that the technical/systemic side of translation and the contribution path of our activities will be supported by an educational viewpoint and good teaching materials. How sad and angry you will feel that you (believe) you followed CX2 rules, output on jawp, then being henpecked as if you are a total idiot? I have been in that position by trying out the standard workflow we offer to newbies: I used CX2, felt the grudge, and learned that the translation-CX2-contribution path to jawp has at least the three walls I outlined above.

Sethemhat (talkcontribs)

I am sad that UOzurumba (WMF)-san did not understand my request correctly: I asked you to ABOLISH the machine translation system in CX2, not to establish a threshold at any percentage. All you have to do is abolish it, nothing else. I say this because you do not seem to understand how difficult the Japanese language is and how inaccurate machine translation from English to Japanese is.

I also thank Omotecho-san for explaining his thinking. I will amplify what Omotecho-san has said in his sentences: "CX2 inevitably traps goodwill users to RfD for their CX2 output, and if they persist and protest "I obeyed rules and how-tos offered on CX2," for a Ban. " and "How sad and angry you will feel that you (believe) you followed CX2 rules, output on jawp, then being henpecked as if you are a total idiot?". This explains precisely what is happening on jawp. When kind and motivated (beginner) editors translate articles with CX2, almost HALF of their articles are sent to AFD and deleted. How awful! We, the translation confirmers, mourn that we are obliged to delete these kindly translated articles because of their poor translation accuracy: CAUSED BY MACHINE TRANSLATION. What is happening on jawp is beyond description: maybe more than 10000 articles have collapsed due to machine translation. Do bad and hard-to-read articles follow the philosophy of Wikipedia? I don't think so. I really hope WMF personnel understand our situation. Thank you.

Sethemhat (talkcontribs)

P.S. Thresholds are completely futile in the case of Japanese. One intelligent translator on jawp, who has translated over 600 articles, reached the conclusion above because his manual translation of one article was diagnosed as "90% machine translation".

He also seems to be fed up with dealing with machine-translated articles and has paused his activity. I am a so-so writer on jawp myself, but I haven't written articles since August because I have had to deal with the machine translation problem. In this way, machine translation deprives capable users of precious time that should have been spent on more productive writing.

UOzurumba (WMF) (talkcontribs)

Sorry, Omotecho and Sethemhat, for misunderstanding your request. But, I would like you, Omotecho, to help me understand your comment that said,

"I am targeting under 85% because I am not adding local perspective by scholars or professionals on CX2 workflow". Are you saying that the machine translation limit is an option that the Language team can consider for your community? But the machine translation can be modified 85% instead of 75%?

Then Sethemhat, you mentioned that an experienced translator who translates without using the Machine Translation support got a message that his translation is 90% machine translation when he tried to publish. Because of that, making the machine translation limit stricter is not an option. If the above is what you mean, I would like to understand this more because this might be an issue that needs fixing. Better still, the person can explain this further if they are open to it. I will be available to understand this more.

Please, understand that my questions are in good faith to help with the problems you have with the Machine translation support.

I look forward to your feedback.

Thank you!

Omotecho (talkcontribs)

@UOzurumba (WMF), hello, and thank you for keeping your eye on this thread. Indeed, it is a pity that numerical data does little to help those of us wishing to support a better environment for translation. And for multiple reasons, please let us not focus on a percentage as the savior of the issue. The problem is rooted deeper than surface numbers/statistics AFAIK, which is why I wish to join the discussion here as a volunteer. I need to repeat that there is a wide gap, which we need to visualize, between the semantic theory of human translation and the pattern-analysis approach of MT. That is a chicken-or-egg debate; I am not a philosopher but a practitioner, and thus wish to offer my view of how CX2 can be upgraded as a translation workflow. Maybe we could list the aspects that need attention from both sides, to enhance statistical research and data-based discussion. In the age of machine learning, some CX2 defects look so simple to solve IMHO; or are they resource-demanding?

The area of articles and CX2

The subject area does not make a difference in workability to me. Maybe there is some between Physical Science and Literature, but CX2 suffers rather from issues of linguistic theory, as much as from bridging MT output and wikification templates. Both @Sethemhat and I work on History articles, and I translate pages in the Humanities, too. However, the articles categorized under Geology or Food and Drinks that I work on suffer as much as History. If we understand the "features" CX2 offers us users at the moment, Medical or Space Science would suffer as much as Machine Technology articles.

Could those shortcomings be the target of any reasonable upgrade, and if so, what kind of feedback should we users offer?

If numbers matter or not.

As CX2 output is earmarked as "Maybe a low quality translation" if I hit over 85%, I have chosen to be practical, only so as not to alarm my peer translators watching over the standard of jawp articles/CX2 output. I do not mean that the precision score or number certifies that the output is "good enough" or readable as a jawp article. Say, as Google translation employs translation memory, matches it against my edited text on CX2, and statistically gives a score, I "fool" the CX2 scoring system to hit below 85%.

For statistics, I spend 10-20% of my wiki hours editing poorly translated pages. Without watching pages such as Recentchanges or New articles, I read jawp articles in the relevant area to support rewriting my CX2 output in User:Omotecho/sandbox, and inevitably hit articles that I feel obliged to help fix. It is very helpful that CX2 flags its output: when I check the edit history, I find that the user made the article with CX2, and I read two articles, the enwp article that is the translation original as well as its cousin on jawp. It is a marvel to find how honestly, or with what good intentions, CX2 users output bad quality articles, without twisting scores intentionally as I do. Ironically, I can rely on the CX2 output to tell which parts to fix, as follows.

I am discussing citation templates and their parameter mismatch between language pairs only as an example that looks low-cost to patch up. May I ask plainly whether the CX2 team on the WMF Tech team realizes that users suffer from "stolen" citation templates? CX2 lowers the reputation of CX2 users, and peer editors don't know why a Junior editor insists that a translation lacking citations is acceptable. You could make Junior editors leave jawp because they don't know why their translated article was called low quality and were given no hint or dashboard to mend the "errors"; they themselves are not responsible for the missing citations, which are caused by CX2.

  • Citation errors CX2 produces: above 80% of CX2 output, by myself or by other editors, lacks citations. That is fatal if you are contributing to an encyclopedia; a basic principle is violated if you lack citations to reliable sources. During the CX2 workflow I am warned that the system omitted a citation template that exists in the original article on enwp. Given that CX2 strips off citations when processing, why not introduce an algorithm or a matching table for the varied params applied on enwp and jawp? Does it require a human to mend/supplement the missing parameters that cause error flags on jawp?
  • I am more than sure that if you are a Junior editor, you don't notice that CX2 has "deleted" citations that are in the enwp article you are working on. On top of that, I don't have any statistics on whether editors, regardless of their accumulated wiki hours, realize that their CX2 output may be nominated for deletion as unreliable/low quality for lacking citations to reliable sources. They don't know that CX2 affects the editor's reputation that way.
  • Short-term patch-up #1: The warning message could be shown in an alarming color like red. We need to tell the editor, at least, to expect that their output might be checked when they publish such CX2 output to the page namespace. Too simple, but it would do until we upgrade CX2.
  • Short-term effect of proposition #1 above: Being told that Senior editors are not arbitrarily henpecking a Junior editor's CX2 output might help Junior editors feel more comfortable and less hostile, and not trapped by the system, and we would retain them to become Senior editors.
  • The most relevant parameter is "access-date" in the cite_web template. It seems not to trigger an error on enwp if you leave access-date blank (?), but it triggers an error flag on jawp. The catch is that you need to switch to Edit mode to see which ref tag contains the error, which is not handy at all to me.
  • So we are looking at how each language Wikipedia applies its error threshold to citations. Is there any solution or fix we can apply to CX2? How can we keep the thinking from getting too philosophical here?
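
To make the access-date point above concrete, here is a minimal Python sketch of a per-wiki check for blank citation parameters. It is only an illustration under stated assumptions: the rule table reflects nothing more than the observation in this thread (a blank access-date seems to pass on enwp but is flagged on jawp), and the names `REQUIRED_PARAMS`, `missing_required` and `flag_citations` are hypothetical. It is not how CX2 or either wiki actually validates citations.

```python
# Hypothetical rule table: which citation parameters must be non-empty
# on each wiki. Assumption drawn only from this thread, not from the
# real template documentation of either wiki.
REQUIRED_PARAMS = {
    "enwiki": set(),
    "jawiki": {"access-date"},
}


def missing_required(params, wiki):
    """Return the required parameter names that are absent or blank."""
    required = REQUIRED_PARAMS.get(wiki, set())
    return sorted(
        name for name in required if not params.get(name, "").strip()
    )


def flag_citations(citations, wiki):
    """Return (position, missing parameters) for each citation that would be flagged."""
    report = []
    for position, params in enumerate(citations, start=1):
        missing = missing_required(params, wiki)
        if missing:
            report.append((position, missing))
    return report


# Illustrative citations only; the URLs and titles are placeholders.
citations = [
    {"url": "https://example.org/a", "title": "A", "access-date": "2022-10-01"},
    {"url": "https://example.org/b", "title": "B", "access-date": ""},
    {"url": "https://example.org/c", "title": "C"},
]

jawiki_report = flag_citations(citations, "jawiki")
enwiki_report = flag_citations(citations, "enwiki")
print(jawiki_report)
print(enwiki_report)
```

A report like this, shown before publishing, would tell the translator which reference needs attention without switching to Edit mode.
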
Omotecho (talkcontribs)

Fixed dagger comments.

YellowSmileyFace (talkcontribs)

Hello UOzurumba (WMF). I am YellowSmileyFace, also a member of the Translation Confirmation project on jawiki (ja:プロジェクト:翻訳検証), and I sometimes translate articles from English to Japanese. Even though every day I see a number of articles published from CX with little to no modification of the machine translation, I am against the idea of preventing someone from publishing an article because the system finds that they did not modify the machine translation enough. For instance, when I publish a translation, I sometimes get a popup from CX saying I have not modified the machine translation enough, even though I modified it to meet my standards. I also sometimes get this popup when I translate from the original text without the help of machine translation, which is rather annoying. It would be great if your team could fix this problem, or possibly abandon the idea itself. Thanks for your time.

P.S. I do not think I am the person mentioned by Sethemhat. But I wish I was.

Sethemhat (talkcontribs)

I'm disappointed with @UOzurumba (WMF)-san for sticking to setting thresholds. I say again, and strongly, that your new threshold of 75% is entirely futile. There are three reasons:

1. As Omotecho-san stated, the figure of 75 percent is completely baseless. You need to be familiar with English-Japanese translation to determine a threshold. Do you truly believe that you have a deeper understanding of English-Japanese translation than our community? It is obvious that this figure was given without sufficient understanding, which renders any so-called threshold meaningless. Moreover, it is actively harmful, as I mentioned in the case of the veteran translator.

2. My point was about abolishing machine translation entirely on the Japanese wiki. I did not ask you to set the threshold higher. As editors of Wikipedia, we cannot let standards slip, as we have a duty to netizens to provide a trustworthy information source. Setting thresholds will lower accuracy and legibility and will not solve the problem in any way; we must abolish machine translation in its entirety.

3. You did not suggest changing the threshold as an option; you created the Phab task. You intended to deal with my demand by replacing abolition with a threshold setting, but I know what the point is.

This may be the final warning to you. We, the Japanese Wikipedia, would rather not have to ban CX2, because CX2 is said to be very useful when a veteran translator translates. However, we also have a duty to the public to uphold standards, and currently CX2 compromises them with its quick machine translation service. If the CX2 team cannot listen to the feedback given by the community, I am sure that bans will continue across other Wikipedias as well. We cannot bear to look at the current state of CX2, and we have had enough of tolerating it. We strongly call for a swift ban on machine translation until such a time comes, not in the foreseeable future, that machine translations are able to meet our standards (Revised).

UOzurumba (WMF) (talkcontribs)

Hello Sethemhat

Sorry, I lost track of this ticket. However, the machine translation limit can be adjusted anytime. So, it is not a problem. To be 100% clear, you are requesting that the machine translation support be abolished, not Content Translation itself, as that is what I understood from our previous discussions.

Also, it would be helpful if you could provide the following information for the Language team:

  • Examples of mismatching citations in translations. You can show how they appear in articles that usually have the problem, and provide more details in screenshots.
  • Samples of automatic machine translation of three articles (English to Jawiki) and the corrected translation, side by side. You can use this spreadsheet to provide the information.

Finally, would you, Omotecho, YellowSmileyFace and anyone else in your community be open to a virtual meeting with the Language team, if need be and if I succeed in arranging it? Let me know if it is something you can do.

Omotecho (talkcontribs)

@UOzurumba (WMF), hello, thank you for your kind offer. I am open to joining an online conversation, in the standard way Wikimedia functions between volunteers and tech guardians. I, too, am not very optimistic about MT for the ja language; however, to make our comprehension of the situation clearer and shareable, would you consider two things?

I wish that we can start with a sample that attendees will be able to test by themselves. To do that, would you kindly point me to how I pick up CX2 workflow ID on FireFox please? My input on Mediawiki has a link to my testing threshold 75, ended up 77 with no MT but not avail to output the translation anywhere. I wish to provide workflow ID so that you will be able to #1 reverse MT and #2 further investigate why/how citations does not migrate from enwp-CX2-jawp.

The second thought is still just an idea; maybe you have already shared and evaluated it with the language specialists and tech people you exchange advice with. Does anybody on the Language team have contacts at the European Association for Machine Translation? Ideally, please reach out to their sister group, the Asia-Pacific Association for MT (AAMT), which ran a wonderful survey in the 1990s on ja-en-ja MT, continuing efforts operated by MT engineers at corporations such as TOSHIBA and Fujitsu, who were developing their own MT app packages. AAMT membership was also shared among leading MT service providers in Japan. They moved on to speech translation/interpretation in the 2010s.

It might take wisdom to carry out, and I am crossing my fingers that those specialists in en-ja-en MT might take a seat if we have an extended online meeting, which would be an opportunity to dissect the issue with a technical overview and analysis: how fit or unfit the ja language and MT are for each other, good for handbooks for photocopiers or microwave ovens, but not for the other subjects we are proud of on our Wikipedias. If I remember correctly, translation algorithms have shifted to a memory/pattern-analytic approach for Latin-script languages with limited success, but not so for double-byte languages that use Subject-Object-Verb sentence patterns. We need to remember that ja writing often omits the Subject, which is fatal to a pattern-analytic MT algorithm. Cheers,

UOzurumba (WMF) (talkcontribs)

Thank you Omotecho, for your feedback. I will enquire if there is a contact with the mentioned associations from my team.

Pardon me, I don't understand your request here: "I wish that we can start with a sample that attendees will be able to test by themselves. To do that, would you kindly point me to how I pick up CX2 workflow ID on FireFox please? My input on Mediawiki has a link to my testing threshold 75, ended up 77 with no MT but not avail to output the translation anywhere. I wish to provide workflow ID so that you will be able to #1 reverse MT and #2 further investigate why/how citations does not migrate from enwp-CX2-jawp."

Are you asking that I provide an article for you to translate or how you can share the translations with me?

Omotecho (talkcontribs)

(A summary follows the English text.) Ideally, as @User:Sethemhat has pointed out, a bilingual MT/linguistics specialist would be a great help in moving us forward. Both sides here are "MT users/unisers" as we discuss; we did not invent it. That said, I need to tell you that I am not promoting MT, bitter as it might sound; but based on where we stand as users/editors and on keeping Wikimedia running budget-wise, my understanding is that we are dealing with a delicate subject IMHO. I wish CX2 to remain the great resource to me as an editor that it was before the "75-line" was drawn rather hastily. My apologies; I admit my carelessness in not following the discussion report in more detail until September.

Now.

Your inquiry hits the center of what we are looking at: two parts actually.

  1. Does the CX2 team need to see the log point in order to investigate the code and find what is not working? If so, we need a log to pick up and analyse. Excuse me for the very basic question, but how can I show you the point in the workflow of an en-ja CX2 procedure where I suspect a problem? Don't we need an exact log point so that we can agree on what to focus on? FYI, I recall a past conversation on the mw:CX2 talk page about how to pinpoint one log entry in the CX2 workflow, maybe on a Microsoft browser. They said that if somebody wanted to share a case and request analysis, the log point was very useful for engineers. (I myself use Firefox, and seldom use Safari.)
  2. Selecting samples: either way is fine, and I recall that the jawp translator circle has already talked about the case analyses we are accumulating on jawp. Those are evaluated on the jawp side; are you interested in analysing those same articles as well?
  3. Those jawp proofreading cases are probably mostly from before the 75 threshold, so we can time travel together to see what jawp translators base their judgement on, in terms of living language suited to encyclopedic entries.

What do you think? Maybe we both try (1) an en-CX2-ja translation, and then (2) reverse-translating the raw output, ja_(1)-CX2-en_(2)? How can we stand on level ground to see what is not working?

I dream: our trial could find some technical solution or post-processing to consider for CX2, advantageous for languages with SOV grammar.

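On the question of which side should provide the samples to run through CX2: I am aware that this is a delicate matter, also from the standpoint of Wikimedia's operations, but shall we each run the same articles through CX2 and work on them, so as to make clear where the problems lie, whether as a shared understanding or as a difference in understanding? I believe I have already mentioned in this discussion the reviews of CX2 output accumulated by the Japanese Wikipedia translators (I would like UOzurumba and colleagues to use those same articles to evaluate CX2). That said, most of those samples should predate the threshold of 75. In any case, I propose that both sides test with the same method and examine the problems concretely.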

UOzurumba (WMF) (talkcontribs)

Hello Omotecho, based on your suggestion, I have provided the MT translation of a section of 2 different articles (the article is in English, and I used the CX2 MT to translate it to Japanese with no edit). Please, manually translate the Japanese translation to English in the other column. Once you have done that, I will share the original article to compare.

Machine translation of a section of an article in English | The human translation in English (please don't use Machine translation)
命名規則
命名規則は、物事に名前を付けるための一連の合意、規定、または一般に受け入れられている基準、規範、社会規範、または基準です。
親は、子供の名前を選択する際に命名規則に従うことができます。出生順にアルファベットの名前を選んだ人もいます。一部の東アジアの文化では、2 音節の名の 1 つの音節が直系の兄弟と同じ世代名であることが一般的です。多くの文化では、息子が父親または祖父にちなんで名付けられるのが一般的です.カメルーンなどの特定のアフリカの文化では、長男は彼の名前から姓を取得します。他の文化圏では、名前に居住地や出生地が含まれる場合があります。ローマの命名規則は社会的地位を表します。
主な命名規則は次のとおりです。
"天文学では、天文学的な命名規則
生物学では、二項命名法
化学では、化学命名法
古典では、ローマの命名規則
コンピュータ プログラミングにおける識別子の命名規則
コンピュータ ネットワークでは、コンピュータの命名規則
惑星科学では、惑星命名法
一般に、科学では、さまざまなものの体系的な名前"
製品は命名規則に従う場合があります。自動車には通常、2007 年のChevrolet Corvetteのように、年式に加えて、"make" (メーカー) と "model" の 2 項名があります。車の「装飾レベル」または「トリム ライン」の名前がある場合もあります。たとえば、貴金属の後にCadillac Escalade EXT Platinumなどがあります。コンピューターの名前には、次世代を示すために数字が増えていることがよくあります。
学校のコースは通常、命名規則に従います。つまり、サブジェクト領域の略語と、難易度の高い順に番号が付けられます。
多くの番号 (銀行口座、政府 ID、クレジット カードなど) はランダムではなく、内部構造と規則を持っています。名前や番号を割り当てる事実上すべての組織は、これらの識別子を生成する際に何らかの規則に従います。航空会社のフライト番号、スペース シャトルのフライト番号、さらには電話番号にも内部規則があります。
初期の人生とキャリア
オバマ氏は 1961 年 8 月 4 日 、ハワイ州ホノルルの女性と子供のためのカピオラニ医療センターで生まれました。 彼は、隣接する 48 州以外で生まれた唯一の大統領です。 彼はアメリカ人の母親とケニア人の父親から生まれました。彼の母親、アン・ダナム(1942–1995) はカンザス州ウィチタで生まれ、大部分が英国系であったが、2007 年に彼女の高祖父ファルマス・カーニーがアイルランドのマネーゴール村から米国に移住したことが判明した。 1850年。 2012 年 7 月、 Ancestry.comは、ダナムが 17 世紀にバージニア植民地に住んで奴隷にされたアフリカ人、ジョン パンチの子孫である可能性が高いことを発見しました。 オバマの父、バラク・オバマ・シニア ( Barack Obama Sr. ) (1934–1982) は結婚していた ニャンゴマ・コゲロ出身のルオ・ケニア人。 オバマ氏の両親は、1960 年にハワイ大学マノア校のロシア語クラスで出会いました。オバマ氏の父親は奨学金を受けている留学生でした。 カップルは、オバマが生まれる6か月前の1961年2月2日に、ハワイのワイルクで結婚しました。
1961 年 8 月末、バラクが生まれて数週間後、バラクと母親はシアトルのワシントン大学に移り、そこで 1 年間暮らしました。その間、バラクの父親はハワイで経済学の学士号を取得し、1962 年 6 月に卒業しました。彼は奨学金を得てハーバード大学の大学院に通い、そこで経済学の修士号を取得しました。オバマ氏の両親は 1964 年 3 月に離婚しました。 オバマ・シニアは 1964 年にケニアに戻り、そこで 3 度目の結婚をして、財務省の上級経済アナリストとしてケニア政府で働きました。 オバマが 21 歳だった 1982 年に自動車事故で亡くなる前、彼は 1971 年のクリスマスに一度だけハワイの息子を訪ねた 。 オバマ氏は幼い頃を思い出して、「私の父は私の周りの人々とはまったく似ていなかった.彼はピッチのように黒く、私の母はミルクのように白い. 彼は、彼の多民族の遺産に対する社会的認識を調整するための若い成人としての彼の闘争について説明しました.
1963 年、ダナムはハワイ大学でロロ・ソエトロと出会いました。彼は地理学のインドネシア東西センターの大学院生でした。カップルは 1965 年 3 月 15 日にモロカイ島で結婚しました。 J-1 ビザの 1年間の延長が 2 回行われた後、ロロは 1966 年にインドネシアに戻りました。彼の妻と義理の息子は、16 か月後の 1967 年に続きました。一家は当初、南ジャカルタのテベット地区にあるメンテン ダラム地区に住んでいました。 1970 年から、彼らは中央ジャカルタのメンテン地区の裕福な地区に住んでいました。


Please share the articles you said have been analysed by the community before making the MT limit 75% stricter with illustrations or explanations that will help us understand the errors or problems. Thank you for providing a sample in the spreadsheet and for the context everyone has provided in this thread.

Once I get the above translations, I will forward all the information provided to the Language team for evaluation towards understanding the problem.

Best regards.

Omotecho (talkcontribs)

Hello, @UOzurumba (WMF), thank you for your patience as my reply took weeks to come back.

Regarding the bilateral analysis of CX2 output: I owe you an apology for not being clear about my offer. Anyway, your two examples are (1) technical writing and (2) a biography, and I respect your choices: yes, those writings contrast well when we test how well each is suited to machine translation.

However, my proposal for a bilateral check is based on the trove of cases that jawp translators have checked and accumulated. That is a treasure trove, is it not?

That way, whenever you need clarification, I can try to canvass insights on the point at which each turned-down CX2 contribution fell below the acceptable threshold. Many times we blamed not how CX2 works, but the raw MT output that the CX2 user did not correct.

Post-editing after MT on CX2 requires a higher degree of fluency in language and translation, as my jawp friends may have noted elsewhere in this thread and in the past insights your team exchanged with jawp members.

What bothers me, as the one who translated the Meta page describing CX2, is that it does not fully indicate that post-editing on CX2 is your responsibility as a CX2 user: you can easily believe that you output to local wikis and then somebody angelic will clean up the mess. That has been the bitter reality on jawp AFAIK.

Actually, proofreading a poor translation is a very tiring task; it is much better to spend the same amount of time translating from the first word. There are many people cleaning up the mess.


Then there is the other bunch of people, who outnumber the "cleaners" and find joy in leaving footprints on jawp. Maybe they praise themselves for their edit count, or grin to see edit history pages starting with their usernames on the first line: being #1? Pioneers?

If they are made to believe they are using CX2 correctly, by the book, then I, who translated the Meta page, owe them a big apology.

Sethemhat (talkcontribs)

@UOzurumba (WMF)-san; Yes, you are right on the point that I am not requesting abolishing CX2 itself. If "the machine translation limit can be adjusted anytime", please do it as soon as you can. Thank you.

I responded to your proposal a little: I tried just to turn the machine translation into acceptable Japanese, but it was so corrupted and annoying to read that I dismissed it entirely and translated from the original text. Veteran translators usually not only translate (English) articles but also add information. However, I think this trial is in vain because I do not trust post-edited machine translation at all.

I am able to join the virtual meeting, but I won't, because I don't think I have to and I am really busy. I am not willing to tell you about Japanese-English translation systems because you will use my information to improve "machine translation", right? A layman (you, a person who doesn't understand both Japanese and English) can't change anything. I won't cooperate with you unless you try to tackle the machine translation problem by inviting a person (e.g. a college professor) who is a professional in English-Japanese translation. I beg your patience.

Sethemhat (talkcontribs)

I: Let me try CX2, since I haven't used it. Manual translation, of course. I found an interesting Egyptian nomarch.

(...Translating...)

I: I did it! Let's publish it.

CX2: 78% MACHINE TRANSLATION! YOU CANNOT PUBLISH ARTICLE.

I: :| ...FORGE IT.

(...Adding Japanese old story to disguise article...)

I: Ok, Let's see how it goes.

Local Wikipedia: REF ERROR! NO CATEGORIES! NO DEFAULTSORT!

I: :|

It took me 20 minutes to fix the problems caused by CX2. Again, I say that thresholds are meaningless.

Sethemhat (talkcontribs)

? What are you doing, @UOzurumba (WMF)-san? I'm getting annoyed. I told you that our sample test will be in vain unless you do it with a professional en-ja translator. You told me "the machine translation limit can be adjusted anytime". Why didn't you do that first, before asking us about a somewhat meaningless sample test?

1234qwer1234qwer4 (talkcontribs)

Just a notification for @Pginer-WMF; this seems to be pretty much the largest problem with CX right now from my experience of recent (~a year) posts on this talk page.

Pginer-WMF (talkcontribs)

Thanks everyone for sharing their perspectives. The Language team's intention is to support each community in the best possible way. When issues with low-quality translations are reported, we try to understand them in detail and explore options and their potential impact.

Machine translation is used in over 90% of the translations to Japanese, showing that many find it convenient. Disabling it would affect many users, both those making bad use of it and those making good use of it.

If we look at the number of deletions for this year (Jan-Oct 2022), we find that 9.5% of the translations are deleted in Japanese Wikipedia. This is interesting because:

  • The deletion rate for translations is lower than the deletion rate for articles created without using Content Translation in Japanese Wikipedia (14%). This shows that writing the whole article without machine translation support does not make articles less likely to be deleted.
  • Assuming that all deleted translations were using machine translation, there would still be a majority of 81% of translations that were using machine translation but were not deleted.
  • Looking at the distribution by user expertise, less experienced users with a total edit count lower than 100 edits made 668 translations this year and had 20% of their translations deleted. More experienced users with an edit count over 1000 edits made 941 translations and had only 3% of their translations deleted.
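
The figures in the list above can be cross-checked with simple arithmetic. The following is a minimal Python sketch that uses only the percentages and counts quoted in this post. The variable names are illustrative, "over 90%" is treated as exactly 90% (a lower bound), and the absolute numbers of deleted translations are estimates derived from the quoted rates, not figures reported by the Language team.

```python
# Figures quoted in the post (Japanese Wikipedia, Jan-Oct 2022).
mt_share = 0.90        # "over 90%" of translations use MT; 90% is a lower bound
deleted_share = 0.095  # share of all translations that were deleted

# Worst case: assume every deleted translation used machine translation.
# The result is expressed as a share of all translations.
mt_not_deleted = mt_share - deleted_share
print(f"MT translations not deleted: at least {mt_not_deleted:.1%}")

# Estimated deleted counts per experience group (derived, not reported).
groups = {
    "edit count < 100": (668, 0.20),
    "edit count > 1000": (941, 0.03),
}
estimated_deleted = {
    name: round(total * rate) for name, (total, rate) in groups.items()
}
for name, count in estimated_deleted.items():
    print(f"{name}: about {count} translations deleted")
```

With the 90% lower bound the worst-case share comes out at 80.5%, which is consistent with the roughly 81% quoted above given that actual machine translation use is "over 90%".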

Quality is hard to assess just by looking at numbers, but these figures show that perceptions like those captured in the discussion, stating that about "half of the translations using machine translation are deleted", may not be accurate. Drastic measures like disabling machine translation would also affect the experienced translators who are making good use of it with very low deletion rates.

Disabling machine translation is equivalent to setting a limit that allows 0% of unedited machine translation when publishing. That is, forcing users to write the whole translation on their own. Moving the limit from the current default of 99% to 0% seems a big change to make in one go. That's why we proposed an initial adjustment as a first step in that direction, to evaluate the effects and decide how much further to move. Setting an intermediate limit (as strict as needed, but not 0%) would allow users to still use machine translation while reducing the cases where it is not edited enough.
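
To make the idea of a publishing limit concrete, here is a minimal Python sketch. It is not the Content Translation implementation: the function names, the character-level comparison with `difflib`, and the sample strings are all assumptions chosen for illustration. It estimates what share of the text being published is carried over unchanged from the initial machine translation, and blocks publishing when that share exceeds the limit.

```python
from difflib import SequenceMatcher


def unedited_mt_percent(mt_output: str, final_text: str) -> float:
    """Estimate the percentage of final_text carried over unchanged from mt_output.

    Hypothetical measure: characters of final_text that fall inside blocks
    matching the MT output, divided by the length of final_text.
    """
    if not final_text:
        return 0.0
    matcher = SequenceMatcher(None, mt_output, final_text, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * unchanged / len(final_text)


def can_publish(mt_output: str, final_text: str, limit_percent: float) -> bool:
    """Allow publishing only if the unedited share does not exceed the limit."""
    return unedited_mt_percent(mt_output, final_text) <= limit_percent


mt = "the cat sat on the mat"

# Publishing the MT output untouched means 100% unedited text,
# which exceeds a 99% limit.
print(can_publish(mt, mt, limit_percent=99))

# A stricter, purely illustrative limit applied to a lightly edited text.
print(can_publish(mt, "the cat sat on a mat", limit_percent=25))
```

A character-level comparison like this one also counts incidental overlaps between an independent manual translation and the machine output, so the choice of comparison unit matters as much as the limit itself.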

Adjusting this is an iterative process; we don't need to find the perfect limit the first time. This is about enforcing the amount of modification that you want to make sure is done, based on the quality of the machine translation, while avoiding a radical change in the system for many users. For this we need the community's collaboration, and your input is very welcome.

Thanks!

YellowSmileyFace (talkcontribs)

Hello @Pginer-WMF; I'm not planning to go deep about this issue, but I thought I would share my thoughts.

The thing with machine translation is that one user could produce hundreds of articles with poor translations (machine translations) and it would take a really long time to delete them/fix the translations. An example of this is an AfD in the Japanese Wikipedia, :ja:Wikipedia:削除依頼/IP:2400:4050:9920:BA00:0:0:0:0/64の作成記事(追加分). This pretty much lists 246 articles that were likely created by the same person without major fixes to the machine translation. It's taking more than a month to check which articles are worth keeping and which are not. Maybe, just maybe, we could keep machine translation, but we'll need a good, easy, and quick way to assess the translations made with CX2, because it takes so much time to check them by hand.

Pginer-WMF (talkcontribs)

Thanks for the input @YellowSmileyFace. Apart from the issue with machine translation (which we are in the process of disabling), it would be very useful for us to learn more about what kind of support would help in reviewing translations created with Content Translation. Are you thinking of aspects such as showing how much the user has edited each part of the content? Whether the user has had previous translations reverted/accepted? Something else?

YellowSmileyFace (talkcontribs)

Thanks for deciding to disable machine translation! I think being visually able to see if a certain user has had their previous translations deleted would be pretty useful as it helps us when we are taking translated articles into consideration.

YellowSmileyFace (talkcontribs)

Hi @UOzurumba (WMF) and @Pginer-WMF;

It looks like there is some confusion between us, so I will try to make it clear.

We're pretty much saying that "If you don't ban machine translation from CX2 for translations to Japanese, we will (try to) ban CX2 itself", because there are way too many low-quality translated articles made with CX2 for the community to find and delete.

Hopefully that helps you understand what we're trying to say.

Omotecho (talkcontribs)

@Pginer-WMF, I am thankful that at last I have a place to share and exchange with your team what the shortcomings are, and what is good for the Wikimedia sphere.

Below, summaries of the main points only (originally in Japanese) are added in 〈 〉 brackets 〈remarks by Omotecho〉.

That said, I am afraid we are forgetting about our readers, or rather our editors-to-come; remember Movement Strategy 2030? Are we sure that jawp readers, looking at us Wikimedians doing our job to the fullest, feel safe relying on our projects/local Wikipedias, even to prepare for their semester-end tests? 〈Are we keeping our readers in view, and is this consistent with Movement Strategy 2030 (securing future readers and contributors)?〉

I am very keen to know on what basis, and against what comparison, you evaluate the "lowness" of the share of turned-down/deleted CX2 output on jawp. Could we discuss hard numbers/analysis, please? Would you point me to any research, on Meta, MediaWiki, or in scholarly papers?

I can counter your points:

  • The deletion rate for translations is lower than the deletion rate for articles created without using Content Translation in Japanese Wikipedia (14%). 〈"Without CX2" covers both translated articles and originally written articles, so the two cannot simply be compared. There is a group that takes it upon itself to deal with CX2 output, whereas articles created without CX2 are looked after only by sysops or individual volunteers.〉
  • # Are you comparing translated articles with/without CX2? Do those articles created without using Content Translation include original writings or not? FYI, regarding jawp and its deletion rate, there is no specific group watching that larger group of translated articles, only ad hoc editors and sysops.
  • Showing that writing the whole article without machine translation support does not make the articles less likely to be deleted. 〈That could be a hypothesis, but since people can copy and paste the output of machine translation run outside Wikipedia and publish it without disclosing that, the comparison does not hold. Also, the quirks of machine translation can be learned: watchers use them to spot it, and contributors can retouch those spots to avoid detection. This analysis does not take that into account.〉
  • # Maybe yes, but people copy machine translation output from translate.google.com and publish it as their original article, bypassing CX2 altogether. A trained editor can see through that tactic by MT-specific features, which veteran cheaters are in turn well aware of, and they mask/retouch such weak points.
  • Assuming that all deleted translation were using machine translation, there would still be a majority of 81% of translations that were using machine translation but were not deleted. 〈If the number of contributors using machine translation and the number of editors watching and dealing with it were about equal, I could agree with that view.〉
  • # If you had matching numbers of contributors applying machine translation and editors counteracting those publications, surely I would agree with you.
  • Looking at the distribution by user expertise, less experienced users with a total edit count lower than 100 edits made 668 translations this year and had 20% of their translations deleted. More experienced users with an edit count over 1000 edits made 941 translations and had only 3% of their translations deleted. 〈This is very close to Omotecho's personal impression. Please teach me the method for obtaining these statistics. Users with over 1,000 edits, i.e. "Seniors", (a) have learned the level that is accepted for a translated article; (b) the knack of improving a translation to decent quality, such as adding Japanese-language sources, is as Sethemhat has already shown. I presume that those who watch CX2-output translations also find time in their busy schedules to coach others on how to improve low-quality translations.〉
  • # That sounds very interesting, as I have seen the same trend. I wish to grasp it statistically; would you care to tell me which API I should use to recreate your analysis? My hunch is that editors with a 1000+ edit count, aka Seniors, are either self-trained or have been taught by us watchers: (a) what is acceptable as a translated article, and (b) how to improve a translation to good quality, as @Sethemhat has told you, by adding local citations and so forth. In a way, busy Translated Article watchers are doing our best to be instructive, and do not always grumble in an angry tone.

It's very easy for me to forget about CX2 and leave the room, but my responsibility as the translator on Meta:CX2 page does not allow me to drop CX2.

Serving our readership.

May I ask what statistics on editors/readers you keep track of for those sectors?

〈Serving readers: (1) The number of people dealing with CX2 output; please judge whether it is a fair proportion against 3,500+ active contributors. (2) The total number of active translators (regardless of experience) and the trend in the CX2 usage rate. (3) Statistics on CX2-output articles contributed to Japanese Wikipedia: (a) total, (b) deleted or nominated for deletion, (c) kept and expanded by other editors. For (1)-(3), please share the statistics of the language editions that are "satisfied with CX2".〉

- (1) How many CX2-output watchers are you aware of? Is it fair to have that small number of people break their backs to safeguard the quality of jawp, with its active editorship of 3,500+?

- (2) What is the total number of active translators on jawp, Junior and Senior? Is CX2 helpful for them, and what percentage of them use CX2? Is that increasing or decreasing?

- (3) What statistics do you have for CX2 output in the regular/article namespace on jawp: (a) the total; (b) declined or nominated for deletion; and (c) successful and expanded by other editors? Ratios, please?

  • Compare with other language versions of Wikipedia where you assume users/readers are "happy" with CX2 output. Would you mind sharing in this thread the figures for those three items (1)-(3) from those counterpart Wikipedias, so that we can contrast them?

Quality versus publishing articles

Kindly try to track cases where Junior editors stop contributing after their first/second CX2 output on jawp. What statistics would you get?

〈Quality: please track users who contributed with CX2 and then quickly stopped being active on Wikipedia, and produce statistics. Translation watchers put weight on quality, and their view may be stricter than for originally written articles.〉

Quality is everything for translation watchers; maybe the threshold we apply is stricter than for non-translated articles. Sure, Junior editors try out CX2 (as they are encouraged to by the system), and as the person who translated the Meta page canvassing CX2 users, I feel very guilty that I am offering genuine-hearted Junior editors up to a field of hen-pecking, so to speak.

Instruction

We need better localized instructions on the Meta page for CX2. Juniors are eager to be published, and that has to come with quality per jawp policy.

〈Meta-Wiki encourages people to use CX2, but shouldn't it explain the side effects of using it, i.e. what kind of response they will get on Wikipedia? It is too miserable if they use it without knowing that, get worn out, and leave Wikimedia because they were "bashed" by watchers.〉

But I sense that Senior editors, or Translated Articles / New Articles watchers, are too busy and worn out to give Junior editors/CX2 beginners good guidance on how to mature as translators. It is bitter to see Juniors being warned about the low quality of their CX2 articles, and heartbreaking to see them leave Wikimedia and go back to their usual busy chasing of SNS trends.

On jawp, we who watch CX2 output are very busy tackling the accumulating backlog, and I am sure you have looked into the workflow or the records the group has accumulated. That is what lies behind the SMALL number you have pointed out.

〈Quality checking of CX2 output relies on sheer manpower. If the answer to being unable to give careful guidance to contributors whose CX2 output is poor is to make up for the shortage of hands, it comes down to the nonsense of telling volunteers to spend, say, twice the time they do now.〉

As an example: to increase the number of watched/patrolled CX2 outputs, you are asking me to raise the share of my wiki-time spent watching substandard CX2 output from 10% to at least 20%. Is that realistic? You might imagine the scenario as:

  • Those very practical/dry comments given to Junior editors who did not post-edit MT output on CX2 are in line with the CoC on jawp, but
  • they are written in jawp jargon, which sounds intimidating to the soft hearts of Juniors.
  • And Juniors are often not told why they are being scolded as if they were disobeying the basic rules of translation into Japanese, or ignoring the Code of Conduct. Because
  • those who care to post on Junior editors' User_talk pages are too worn out to step into the shoes of a teacher or mentor.

Our future editors

Going through the Junior to Senior phases, how can we take care of the Juniors who thought CX2 was a magic wand for publishing their first article, but found themselves bashed because they used CX2 as instructed by the system = us?

〈If people use CX2 thinking it is a magic wand and are then exposed to criticism because of it, isn't the Meta-Wiki documentation falling short? Shouldn't it be changed?〉

  • No, the Meta page on CX2 does not prepare them for the hardship that follows if they do not thoroughly post-edit machine translation (MT);
  • No, jawp does not have the Mentorship project installed, which might serve as a seat belt for Juniors to endure the turmoil/roller-coaster ride after their CX2 output is published in the regular/article namespace. Because;
  • No, we (Meta users) don't advise them to publish their CX2 work to User:somebody/sandbox/article_name and ask for advice/co-editing. Hence;
  • The Meta documentation needs to be improved so that it supports Junior editors' independence and good judgement in using CX2.

Please see the issue as multi-faceted. It is not only about language, but involves the mindset toward Wikimedia in the larger picture, and that is what we are learning together for Movement Strategy 2030. How can we stop the slowdown of outreach/new editors joining our Movement if CX2 makes their first steps very hard to walk? And is there any factor larger than MT that traps Junior editors who publish their translations on jawp, pushing them into criticism in which they are partly victims?

CX2 is a good tool, trust me, if only it did not compare my manual translation against its translation memory and shamelessly claim "hey, you are using MT!". To publish the fruit of my labor, I need to cheat the system; a good example is illustrated by the replies here. That is neither productive nor ethical toward those who welcome your insight.

〈Even if I translate entirely by hand in CX2, it checks my text against the translation memory and reports a match rate. As the examples already show, I can publish by resorting to tricks to get past the threshold, but that defeats the purpose. Personally, I find it painful that this goes against ethics.〉

Sethemhat (talkcontribs)

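From here on I will write in my native language, Japanese. I am very sorry that you do not seem to have correctly recognized the problems with machine translation. I came to MediaWiki holding on to a sliver of hope that you would understand, but it seems to have been in vain. @UOzurumba (WMF)-san said that the issue could be addressed, yet never actually tried to do so. That is bizarre, so sensing something conspiratorial may not be entirely wrong. Have you forgotten that the final decision-making authority (or rather, power) lies with the local community? I warned you at the start, but if you insist on clinging to machine translation even to the point of ignoring local opinion, very well. We are prepared accordingly. To begin with, the very fact that a project like ja:PJ:翻訳検証 had to be organized is an abnormal situation. It shows that there is a quantity of sloppy machine-translated articles that cannot be overlooked, and the time for standing by silently and watching is over. I won't go so far as to tell you to brace yourselves for the worst, as that would be too rude, but I intend to put all my effort into bringing this problem to a close, in whatever form (although I am a bit busy right now). I will take this matter back to the local community for now.

Below are some impressions and rebuttals regarding your view. @Omotecho-san has already rebutted, with reasons, your grounds for not abolishing machine translation in CX2, so I will speak by way of supplement. As a premise, you make no attempt to recognize the situation in which the local community has gone out of its way to come to MediaWiki to make this appeal, and as a result you lack the willingness to face the problems of machine translation. You are certainly looking at this from a "higher-level" standpoint, as the name suggests, but that does not take the feelings of people on the ground into account at all.

  1. "An article is unlikely to be deleted" does not equal "the article is of sufficient quality". Please learn how deletion actually works inside the Japanese community before arguing. Within the content translation review space, the cases that most clearly cannot be overlooked are dealt with first; borderline cases, or those that simply went unnoticed, have to be left in place for lack of resources. I have also compiled a list of users who particularly abused machine translation under ja:PJ:TRANSI/USER, but it looks like verifying these will not be possible for several more years. Therefore, because the number of unacceptable articles and the number of deletions diverge, it is impossible to argue from the deletion rate.
  2. The question about "less experienced users" and "More experienced users" is how many people each group contains. My guess is that the former are many and the latter few. It is very frustrating for me too to take a tool away from those who can translate properly and are making good use of it, but if the actual harm elsewhere is large, we have no choice but to restrict it.

First, let me describe the thinking of the project participants. Once we learned that CX2 was spitting out translated articles of a quality that requires intervention, we concluded that restrictions had to be imposed. Abolishing CX2 will probably increase manual copy-and-paste, but in that case we will simply flag all of the manual translations as well (we know this is technically possible), so that is not a problem. Even if it comes to that, we believe that shutting off the source of the inflow will have a certain effect.

As for Content Translation itself, I actually tried it (only recently), and once again my only impression was "what a useless tool". At present it is hard to insert Japanese-language sources, it throws errors because of template incompatibilities, and footnotes cannot be inserted; I see no intention to let editors improve a translation using the sources they have at hand. To begin with, the source article is never perfect, so editors need to improve the article further at their own judgment. Did you anticipate that? Judging from the current state, I think not. I would not think the developers have never written a Wikipedia article, but I wonder what the reality is.

I have already spoken about the threshold, but let me do so again. It is utterly laughable that you try to set a threshold for translation into Japanese without any knowledge of Japanese; that alone puts it out of the question. And what checks the degree of match with machine translation is itself a machine of unknown pedigree, so this is unacceptable. A filter that "rejects text with a high match rate against machine translation but lets through text that a Japanese speaker would find unproblematic" cannot possibly be developed in cooperation with you, who have no knowledge of Japanese, without also working with specialists. It is like someone who has never driven a car designing a machine that issues traffic tickets automatically. The judgment depends entirely on the situation, doesn't it?

I wish you had understood how many capable editors have been worn out until now by cases involving machine translation, including those made with Content Translation.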

EWikiLearner (talkcontribs)

Hello, @UOzurumba (WMF) and @Pginer-WMF,

Thank you for your attention to this matter. It seems to me, unfortunately, that we have not yet managed to help you understand what our problem is.

So let me try to add some explanation below, although there might be some overlap with what my friends from Japan have already mentioned.

  • You are comparing the deletion rates of CX articles and non-CX articles. This is absolutely pointless. Here's why: whenever a CX article gets deleted, it's because the quality of that article is poor. On the other hand, the vast majority of non-CX articles get deleted because they lack notability or contain third-party copyright violations. These problems are practically nonexistent in the source articles for translation, because those have presumably been cleared of such problems beforehand. If you do want to compare the deletion rates between CX and non-CX articles, you should at least exclude those deleted based on non-notability and 3rd party copyright vio's. You may compare deletion rates, for example, between Google Translate articles and Yandex articles and tell which one is better, but otherwise it's just illogical.
  • The deletion rate for CX articles as you see it today does not mean in any way that these articles are qualified. The Japanese Wikipedia community is constantly shorthanded in terms of human resources for evaluating the quality of translated articles. There is only a handful of editors who are capable (both in terms of proficiency and availability) of engaging in quality checks. Therefore, there's a huge lag between the moment an article is published and the time it even gets any attention. In other words, the low deletion rate you see is a result of most articles being simply overlooked, not "passed" after any quality assurance check. You should be very worried about this situation. In my relatively short experience with Wikipedia, I'd say the Japanese community is already in a quasi-defunct state for assessing translation quality. As time goes on, there will only be a growing number of articles of horrible quality that are never checked.
  • Let me provide you with a real-life example. Here's an example of a summary I had to write up in an AFD to prove, for Japanese admins and other editors in the community, that this article has a problem. It did get deleted after all, and only as the first one for this particular editor, who has published another 150 articles since summer last year that are yet to be scrutinized. While I have to spend a good deal of time (if not hours), this editor can publish an article like this with a few clicks. And yes, this one's got an edit count of over 4,000, so s/he is certainly contributing to a low deletion rate for veterans. There's another one who's published more than 70 articles in a single month, accounting for nearly half of the translated articles that month. None of his/her articles have ever been seriously evaluated so far, partly because, I guess, they're all too long.
  • As for the CX "threshold", I have to say, it is now meaningless to think about adjusting it. It is almost amazing how "innovative" those machine-translation addicts are. I dare not detail the way they get around the threshold restriction currently in place, as I'm sure you are aware of it. Be it 75%, 50%, or even 0%, they know how to smuggle in their terrible translations. Please, please, just disable the machine translations within CX. People will still use machine translation by copying & pasting between browsers, but removing it from CX will make it "just a little more" cumbersome to create a Wikipedia article than readily posting a rubbish Google translation with only a few clicks as you watch the World Cup on TV.

Regards,

Sethemhat (talkcontribs)

I will give @UOzurumba (WMF)-san and @Pginer-WMF-san one last chance to reconsider abolishing machine translation. Yes, I am confused by your failure to understand the machine translation problem, although you are multilingual speakers.

BTW, another local wiki's translation confirmer has already contacted me via e-mail to know the situation in Jawiki and to apply the method in his wiki. I have already said before: "If the CX2 team cannot listen to the feedback given by the community, I am sure that there will be a continuation of a ban throughout other Wikipedia as well." This is about to occur real soon. There are many local Wikipedias whose languages are more complicated than English or major languages. I'm sure it's time for local editors to raise a riot against the tyranny of CX2 machine translation.

Please, choose the best way for you.

Pginer-WMF (talkcontribs)

Hi all,

We'll go ahead with the proposal to disable machine translation for Japanese. I created a ticket where you can follow the progress. At this time of the year, the deployment process may take some extra time but I'd expect this to be completed in less than two weeks.

I don't think disabling machine translation is the best solution, since support for machine translation was requested in the past by Japanese editors, and this change will negatively impact those making good use of it. I understand the effort required to deal with low-quality content and the urgency of solving this. However, I think there was still room to investigate the issues in more detail and come up with solutions that could provide a better balance. Having said that, our intention is to surface information and considerations that may not be easy for communities to gather, and to try to learn as much as possible from those directly involved. If the community decides that disabling machine translation is what will work best, we are OK to give it a try and observe the impact this change has in the upcoming months.

Omotecho (talkcontribs)

Hi, @Pginer-WMF, @UOzurumba (WMF), I have confirmed that MT is disabled on CX2, and I appreciate your prompt arrangement. Surely, well-intentioned and thoughtful CX2 users are bothered that they can no longer use MT as a cross-check during their post-editing / translation. I count myself among them, and wish to keep the communication channel open with your team as you have kindly offered. Ping me anytime if you have any queries or suggestions, thank you.

For the time being, jawp translated article checkers will run a Notice for local jawp users that MT is not usable on CX2. Cheers,

Doc James (talkcontribs)

Hey All. Wondering if it would be possible for certain translators, who have a history of good work, to be granted the ability to use machine translation on a case by case basis? If poor translations result, then that privilege could be rescinded. These types of articles are being created as part of our medical translation project and the translator working on them finds that the machine translation is somewhat useful. Increasing the threshold just causes problems and I recommend we not do that.

YellowSmileyFace (talkcontribs)

Hello @Doc James; I am sorry that a member of the medical translation project is affected by the change. Your idea sounds good to me, but I actually don't know if it would be possible technically. It also might force more workload for Jawp users as people will have to "sign up" to use machine translation and others will have to look through one's contribution history and see if their translations are good enough.

I also want to note that, even if it might be a bit more work for the translator, they can always use machine translation on the web.

Doc James (talkcontribs)

It is technically possible. The current question is, is this acceptable to the JA community? Doc James (talk) 02:54, 24 December 2022 (UTC)

YellowSmileyFace (talkcontribs)

It's clear that if we let people on Medical translation project use mt, other people will request to use it also. The whole point of banning mt was to stop the creation of poorly translated articles with just a few clicks, so letting people on the medical translation project use mt pretty much means letting everyone else use mt again also. That is my main concern.

Doc James (talkcontribs)

Meh. That is incorrect and not what I proposed. But no worries. Have a nice day.

Omotecho (talkcontribs)

From my background as a long-time Machine Translation (MT) user, I see that certain fields of writing are a better fit for MT: engineering, medicine and physics. And given the urgency in those areas of helping local communities access better solutions to their/our challenges, I support the option that some measure is needed to keep access to MT.

In fact, those three areas are where MT software houses in general find better financial support / contributions towards renewing thesauri and terminology, and that is why better output is enjoyed.

That said, @Doc James, would you kindly share why our ContentTranslation2 is better to work with for translating such technical documentation? In particular, may I ask whether any of the following is an advantage of using CX2 on Wikimedia wikis?

  • Speaking of medical terminology, does the CX2 system on Wikipedia offer terminology that is better refined/more adequate for medical translation, because its users are encyclopedia writers/editors? ->(a) (b)
  • Is it because of the licensing issues involved, i.e. that users are confident they are in a safe zone from infringement?
  • Is it to offer equity in working as a translator, regardless of the means of web connection? My imagination extends to the Internet-in-a-box project we are amazed at (mobile router).

(a) I wonder about the cycle of terminology updates: how often does CX2 offer Google's thesauri the terms that Wikimedia users have chosen when translating encyclopedia articles?

(b) I assume the translation memory holds MT output and post-edited sentences, which will be used to filter/screen and update the MT dictionary. Are we Wikimedians perhaps offered a custom-made MT dictionary, or not?

Doc James (talkcontribs)

User:Omotecho the medical translation task force's primary translator into Japanese has found that while MT is far from perfect, it is somewhat helpful for their work. The MT is okay at picking synonyms for technical words, its grammar and sentence structure on the other hand is horrible into JA.

They are starting from content that is CC BY SA 3.0 and therefore there are no concerns regarding licensing issues or infringement.

They will continue translating without MT if it is no longer available. But it would be unfortunate to punish those doing good work just because certain folks are using MT poorly.

Omotecho (talkcontribs)

@Doc James: Thank you for the insight from the medical translation task force. So even in medical translation, the grammatical structure of Japanese is a challenge for MT (machine translation) systems. If you are a veteran translator, you will manage to find synonyms with MT output.

Copyright issue.

Sorry, I made a vague mention of copyright: I was thinking that some external MT engines are offered with limitation on how the output is BY-SA, not CC, while in a local discussion among jawp translators/checkers it was pointed out that they, or MT developers/packagers, do not necessarily state copyright conditions clearly in their user handbooks.

That means that when you use such an MT engine outside of Wikimedia wikis, you need to be very cautious about bringing MT output into our projects, and need to post-edit to the extent that the text can no longer be claimed to be the output of such MT systems.

FYI, the Asia-Pacific Association for Machine Translation publishes a newsletter, and the December 2022 issue is available online here (pdf link). Its analysis of present MT systems (to/from Japanese) extends to evaluating how MT output can be merged into TV news items and so forth.

Doc James (talkcontribs)

Because the input is "SA" the output is also required to be "SA" by law. The MT systems cannot change this.

Sethemhat (talkcontribs)

Hello, I am Sethemhat, who started this conversation. This user is Japanese, and he said: "Medical articles can be information that bears on people's lives, so when translating something whose sources are not stated, I check its accuracy." I can say that this user is qualified to get a special license for using MT in CX2, because this comment shows that he checks the original text. '''Others don't'''. This is the core problem of using MT: if the translator doesn't read the original text, the probability of mistranslation soars.

I'm sure you grasp the problem of using MT in en-ja translation correctly, because you said "its grammar and sentence structure is horrible into JA." Thus, I am willing to consider allowing MT in CX2 in this particular case: using MT for choosing specific terms. (I didn't compromise when I tried to abolish MT in CX2 because the CX2 team couldn't understand that MT's sentences are horrible in en-ja translation.) However, I think we need to toss around several problems:

  1. We, the Project of Translation Confirmation, do not always have the resources or an editor who can judge whether a specific user has the ability to "control" MT. For example, I am currently so busy that I'll leave Wikipedia temporarily in the near future. This is my main concern when I judge whether your proposal is acceptable.
  2. We have to check whether some users really do need MT. I have already asked Wakkie1379-san about this on his talk page.
  3. We may have to survey all users on whether they want to use MT in CX2, because we would have allowed one user to use it.

EWikiLearner (talkcontribs)

Hi all,

hi Doctor @Doc James: ,

I am sorry to hear that a member of your task force is affected by the countermeasure we had to take. Actually, the Japanese Wikipedia community constantly lacks people who can participate in verifying the quality of translations, so if you know a good translator in any specific area, we need to ask them to take part in checking articles made by others, besides their own translation work. There's no doubt medicine is a highly specialized area that requires highly educated talent, and we are definitely short of such people.

I am not sure if we can say there are only a few people posting articles with poor-quality machine translation results, although it can be clearly said that there are several MT users creating huge, noticeable problems. Those are the ones who spend little time reviewing the output from MT apps, making themselves prolific and problematic.

Allowing MT only for certified members with proven qualifications (from Wiki Project Med, for example) for certain areas may be a solution. In that case, others should be strictly prohibited from using MT for the same area, or otherwise it is meaningless. People may (or may not, I honestly don't know) consider such a solution if it is proposed by a Wikimedia-wide task force like yours.

Even if MT is currently disabled within the Content Translation Tool (CX), it is no secret that anyone can still use an MT engine of his/her choice. The user only has to copy and paste between browsers, and that's possible with or without CX. In fact, the top "translator" of the month-to-date at ja-WP is a user who is translating medical articles, relying heavily on Google Translate. While I can hardly tell whether those "translations" are good or not, it is obvious they are made by minimal edits (to match the superficial encyclopedic "writing style") to raw MT output. This is exactly what is called "light post-editing", as the ISO 18587 standard defines it: "understandable" and "good enough" for gisting, but not for publishing. (E.g. ja:覚醒下開頭術, ja:静脈内局所麻酔) It appears to me that the resulting articles are nearly the same as fully automatic MT results, except for the minimal changes. This is a matter not limited to medical articles but is a Wiki(p)/(m)edia-wide issue. Many of the translated articles today get less than even the lightest post-editing, without even correcting mistranslations. If these are really "good enough", there is no point in having human beings manually paste machine output onto the Japanese Wikipedia anymore. Just implement a button on every page to allow readers to see machine-translated text automatically, and that's it. I would happily retire from worrying about post-edited articles.

At the end of the day, it comes down to how serious WMF is about the "quality" of articles. As far as articles translated into Japanese are concerned, I'd suggest WMF get in touch with either a linguistic research lab specializing in translation or an LSP based in Japan to assess the current situation.


Regards,

Doc James (talkcontribs)

Thanks User:EWikiLearner. Yes, it would be nice for the WMF to build the option of an MT flag that can be given to users by the JA community. For those who have the flag, machine translation will work within the tool. Everyone else will start without the flag and thus without MT. If the JA community would accept this as a compromise, we can bring it to the CTX engineering team to be built.

With respect to our current processes, it involves first improving the leads of articles significantly in English before putting them up for translation, as if you are not starting with high quality EN content, the output will not be good either. We are only translating the leads of articles at this time as more than 60% of our readers never go past the lead.

My interest is mainly in the area of health care, and thus this is the area we are recruiting translators within. Volunteers are hard to find and I am not sure we will be able to recruit people interested in other topic areas.

EWikiLearner (talkcontribs)

Hi Doctor @Doc James:

Apparently I didn't make myself understood, and that's my fault.

Let me put it this way:

First, we need to have an effective mechanism that prevents machine-translated, lightly post-edited articles from being posted. Since the beginning of December, there have been about 40-60 machine translations out of a total of 120-140 translated articles to date, and we are still suffering from poor quality.

Only once that's achieved can we consider any further steps.

Please ask your TF member to look into the two articles I mentioned in my previous post, and tell us what he thinks of the quality of the translation. We are in desperate need of help from those people to think about how we can handle the problem, instead of just asking for privilege.

I am afraid the ja-WP community is no longer capable of handling this matter alone. We need help from external resources who can provide us with consistent, reliable and professional service to sort this situation out.

Thanks,

Doc James (talkcontribs)

So currently we have seen the WMF turn off MT translation within CTX for JA WP. Is your concern related to MT articles ending up within JA WP via other methods?

With respect to the two articles you mention above, we always work to improve the content in EN before translating. The starting material for both of these could use work. I do not have the ability to judge the translation quality.

Quality of content is an issue across many languages, including EN WP. We are currently only translating leads of articles, for topics in which nothing currently exists in the target language. We might one day look into a mechanism to translate content where the existing content is low quality / poorly translated, but doing this is much more complicated.

With respect to your request for "consistent, reliable and professional service to sort this situation out", all we are able to support with is hopefully a few more health care articles that are high quality.

EWikiLearner (talkcontribs)

Doctor @Doc James:

>>So currently we have seen the WMF turn off MT translation within CTX for JA WP. Is your concern related to MT articles ending up within JA WP via other methods?

That's right. One need not be super tech-savvy to do it. Having MT within CX only makes machine translation even easier (meaning it'll worsen the situation).

>>I do not have the ability to judge the translation quality.

I understand. And sorry for insisting; that's why I am asking you to have the Japanese person take a look at them and tell you what he thinks of them, so that you can get a better idea of what I have been trying to tell you all along.

>>Quality of content is an issue across many languages, including EN WP. We are currently only translating leads of articles, for topics in which nothing currently exists in the target language. We might one day look into a mechanism to translate content where the existing content is low quality / poorly translated, but doing this is much more complicated.

Again, I am sorry that removing MT from CX has caused you this inconvenience. If only the community had designed and implemented some structure back in 2016/2017, when neural machine translation came into play, we probably wouldn't be in this situation now. In my purely personal opinion, it is the innocence and indifference of the larger community that let this problem grow so big that we finally had to take this extreme countermeasure. We can't put this off anymore.

>>With respect to your request for "consistent, reliable and professional service to sort this situation out", all we are able to support with is hopefully a few more health care articles that are high quality.

Thank you, and I have no doubt your team is qualified to handle anything related to the medical area. That said, what I mean is that this issue of machine translation needs to be considered as a whole. If there's a solution like spinning medical content off from the local-language Wikipedias into a single Wiki Med site containing articles in all languages, that's perfectly fine with me.

Thanks,

Omotecho (talkcontribs)

I’d like you to note a separate but related issue. 41 out of 43 MDwiki translations en-ja are flagged as using unknown parameters in their infobox. There is a category for maintainers.

I am not extending our focus to include other med/healthcare infoboxes; more may have slipped into jawp via CX2.

  • Kindly keep in mind that even though MT is not usable, templates are still copied across languages on CX2.

@Doc James: FYI, the source articles for those flagged translations come from MDwiki. I would appreciate it if the med translators were informed of the situation.

Doc James (talkcontribs)
Omotecho (talkcontribs)

No error message is involved here. Just scroll down to the bottom of [ https://ja.wikipedia.org/wiki/%E6%BB%91%E6%B6%B2%E5%8C%85%E7%82%8E ]. Do you see either of them below the standard categories? They appear grouped as:

Category:隠しカテゴリ (hidden categories), and two are present

Your Preferences control whether they are shown. Go to [表示] (display, second from the left in the top menu bar) -> scroll down to [LaTeX...] and, just above it, [隠しカテゴリを表示] (show hidden categories) -> activate.

Doc James (talkcontribs)
Omotecho (talkcontribs)
Doc James (talkcontribs)