Data on the deletion ratio of Wikipedia articles created with and without Content Translation.
Wikis with higher deletion ratios for CX created articles
As part of T286636, we reviewed wikis where articles created with Content Translation (CX) were deleted at a higher rate than articles created with other tools during a specified timeframe. The data is updated quarterly (every three months) to track how deletion rates evolve as improvements are made. A three-month timeframe was selected to give editors sufficient time to review content and to limit seasonality effects.
Data comes from mediawiki_history and compares the deletion ratio of main namespace articles created using Content Translation with the deletion ratio of main namespace articles created without the tool. Bots were excluded. We also removed wikis where 15 or fewer articles were created with Content Translation during the reviewed timeframe, to reduce noise and focus on wikis with more representative data.
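The calculation described above can be sketched in Python. This is a minimal illustration, not the query used to produce the report: the names `WikiCounts`, `summarize`, and `flag_wikis` are hypothetical, the input counts are assumed to have been extracted from mediawiki_history already (main namespace only, bots excluded), and the "at least 2% higher" rule is assumed to mean two percentage points.

```python
from dataclasses import dataclass


@dataclass
class WikiCounts:
    wiki: str
    created_cx: int
    created_non_cx: int
    deleted_cx: int
    deleted_non_cx: int


def pct(deleted: int, created: int) -> float:
    """Deleted articles as a percentage of created articles."""
    return 100.0 * deleted / created if created else 0.0


def summarize(row: WikiCounts) -> dict:
    cx = pct(row.deleted_cx, row.created_cx)
    non_cx = pct(row.deleted_non_cx, row.created_non_cx)
    return {
        "wiki": row.wiki,
        "cx_ratio": round(cx, 2),
        "non_cx_ratio": round(non_cx, 2),
        # Non-CX minus CX: negative means CX articles were deleted more often.
        "difference": round(non_cx - cx, 2),
    }


def flag_wikis(rows, min_cx_created=16, min_gap=2.0):
    """Keep wikis with more than 15 CX articles whose CX deletion ratio
    is at least `min_gap` percentage points higher (assumed reading)."""
    flagged = []
    for row in rows:
        if row.created_cx < min_cx_created:
            continue  # 15 or fewer CX articles: excluded as noise
        summary = summarize(row)
        if -summary["difference"] >= min_gap:
            flagged.append(summary)
    # Most negative difference (largest gap) first.
    return sorted(flagged, key=lambda s: s["difference"])


# Counts taken from the October through December 2023 table.
rows = [
    WikiCounts("ocwiki", 45, 1053, 3, 22),
    WikiCounts("svwiki", 104, 12910, 19, 1618),
]
for summary in flag_wikis(rows):
    print(summary)
```

The difference here is computed from the unrounded ratios and rounded afterwards; computing it from already rounded ratios can shift the last digit by 0.01.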
Wikipedias with higher deletion ratios for articles created with Content Translation (at least 2% higher deletion rate, as compared to articles created without using CX)
Reviewed Time Period: October through December 2023 (FY 23 Q2)

| Wikipedia | Created CX Articles | Created non-CX Articles | Deleted CX Articles | Deleted non-CX Articles | CX Articles Deletion Ratio | Non-CX Articles Deletion Ratio | Deletion Ratio Difference |
|---|---|---|---|---|---|---|---|
| kuwiki [2] | 22 | 15023 | 3 | 24 | 13.64% | 0.16% | -13.48% |
| svwiki | 104 | 12910 | 19 | 1618 | 18.27% | 12.53% | -5.74% |
| ocwiki [2] | 45 | 1053 | 3 | 22 | 6.67% | 2.09% | -4.58% |
| dewiki | 535 | 53710 | 89 | 6865 | 16.64% | 12.78% | -3.86% |

[1] Excludes Wikipedias with 15 or fewer articles created with Content Translation during the reviewed time period.
[2] Also identified in the prior quarter as a wiki with a higher deletion ratio for articles created with Content Translation.
Wikis with higher deletion ratios for articles created with Content Translation
Reviewed Time Period: January 2021 through March 2021 (Q3)

| Wiki project [1] | Created CX Articles | Created non-CX Articles | Deleted CX Articles | Deleted non-CX Articles | CX Articles Deletion Ratio | Non-CX Articles Deletion Ratio | Deletion Ratio Difference |
|---|---|---|---|---|---|---|---|
| hawwiki | 64 | 85 | 25 | 1 | 39.06% | 1.18% | −37.89% |
| kuwiki | 204 | 4011 | 34 | 69 | 16.67% | 1.72% | −14.95% |
| lawiki | 25 | 923 | 3 | 48 | 12.00% | 5.20% | −6.80% |
| ltwiki | 23 | 2272 | 13 | 1135 | 56.52% | 49.96% | −6.57% |
| fiwiki | 83 | 9448 | 12 | 787 | 14.46% | 8.33% | −6.13% |
| fiu_vrowiki | 16 | 131 | 1 | 4 | 6.25% | 3.05% | −3.20% |
| eowiki | 123 | 5221 | 5 | 62 | 4.07% | 1.19% | −2.88% |
| kawiki | 122 | 5434 | 23 | 889 | 18.85% | 16.36% | −2.49% |
| arzwiki | 110 | 37033 | 3 | 335 | 2.73% | 0.90% | −1.82% |
| thwiki | 17 | 4635 | 1 | 208 | 5.88% | 4.49% | −1.39% |
| bewiki | 256 | 5225 | 9 | 115 | 3.52% | 2.20% | −1.31% |
| mrwiki | 164 | 4771 | 3 | 60 | 1.83% | 1.26% | −0.57% |
| bswiki | 52 | 1953 | 5 | 187 | 9.62% | 9.58% | −0.04% |

[1] Excludes wikis with 15 or fewer articles created with Content Translation during the reviewed time period.
Monthly deletion ratios for representative wikis (Jan 2016 - Jan 2019)
Monthly data about the deletion of articles created with and without Content Translation on several Wikipedias was prepared as part of T215397. This data is for whole months, not quarters, from January 2016 until January 2019.
This spreadsheet is publicly shared and can be filtered, copied, etc. Note that only pages in the main namespace are counted. This may lead to discrepancies between this data and the data at Special:CXStats, which includes all namespaces.
To examine the queries used to create this data and to run them yourself, see query 53775 in Quarry. To look at different languages and dates, replace the database name and the timestamp value.
The use of the CX deletion statistics comparison data
One way the WMF Language team uses this data is to decide when to adjust the machine translation (MT) limit on a wiki. The limit enforces review and modification of the initial machine translation before an article is published, to encourage quality translations. Below are the criteria for changing the MT limit in the tool when a Wikipedia shows a high deletion rate for CX articles.
More than 50% of articles are deleted. (Note: this threshold applies only when more than ten translated articles were created in the quarter; it does not apply when fewer than 10 articles were created.)
- Occurs in one quarter: Investigate.
- Occurs in two consecutive quarters (5 to 6 months): Take action based on the findings of the earlier investigation. If there are no tangible findings, make the MT limit 5% stricter and monitor for changes.
More than 75% of articles are deleted. (Note: this threshold applies only when more than ten translated articles were created in the quarter; it does not apply when fewer than 10 articles were created.)
- Occurs in one quarter: Make the MT limit 10% stricter and monitor changes.
- Extends to another quarter: Community consultation is required to:
  - Gather samples of problematic translations, so that any further tightening of the MT limit is based on sound reasoning.
  - Understand whether there are underlying issues with the tool.
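The criteria above can be read as a small decision function. The following Python sketch is one possible reading, not the team's tooling: the function name `absolute_threshold_action` and the `quarters_in_a_row` counter (consecutive quarters, including the current one, in which a threshold was exceeded) are hypothetical, and a wiki with exactly ten translated articles is assumed to fall outside the rule.

```python
def absolute_threshold_action(cx_deleted: int, cx_created: int,
                              quarters_in_a_row: int) -> str:
    """Return the action suggested by the absolute deletion thresholds."""
    # Assumption: the thresholds only apply above ten translated articles.
    if cx_created <= 10:
        return "no action (too few translations)"

    ratio = 100.0 * cx_deleted / cx_created  # CX deletion ratio in percent

    if ratio > 75:
        if quarters_in_a_row >= 2:
            return "community consultation"
        return "make MT limit 10% stricter and monitor"
    if ratio > 50:
        if quarters_in_a_row >= 2:
            return "act on findings, else make MT limit 5% stricter and monitor"
        return "investigate"
    return "no action"


# ltwiki in January through March 2021: 13 of 23 CX articles deleted (56.52%).
print(absolute_threshold_action(13, 23, quarters_in_a_row=1))
```

For simplicity the sketch uses a single counter for both thresholds; tracking a separate streak per threshold would follow the criteria more closely.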
Difference in deletions: translations (CX) vs. new articles (non-CX)
The difference between the two deletion ratios (non-CX minus CX) exceeds 50 percentage points in magnitude, that is, it is more negative than -50%.
- Occurs in a quarter: Make the MT limit 5% stricter and monitor changes.
- Extends to another quarter: Make the MT limit another 5% stricter and monitor changes.
The difference between the two deletion ratios (non-CX minus CX) exceeds 80 percentage points in magnitude, that is, it is more negative than -80%.
- Occurs in a quarter: Make the MT limit 10% stricter and monitor changes.
- Extends to another quarter: Community consultation is required to:
  - Gather samples of problematic translations, so that any further tightening of the MT limit is based on sound reasoning.
  - Understand whether there are underlying issues with the tool.
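Because the difference is reported as non-CX minus CX, it is negative whenever CX articles are deleted more often, which makes the threshold wording easy to misread. The Python sketch below assumes the thresholds trigger when the difference is more negative than -50% or -80%; the function name `difference_action` and the `quarters_in_a_row` counter are hypothetical.

```python
def difference_action(cx_ratio_pct: float, non_cx_ratio_pct: float,
                      quarters_in_a_row: int) -> str:
    """Return the action suggested by the gap between the two ratios."""
    # Same sign convention as the tables: non-CX minus CX.
    difference = non_cx_ratio_pct - cx_ratio_pct

    if difference < -80:
        if quarters_in_a_row >= 2:
            return "community consultation"
        return "make MT limit 10% stricter and monitor"
    if difference < -50:
        if quarters_in_a_row >= 2:
            return "make MT limit another 5% stricter and monitor"
        return "make MT limit 5% stricter and monitor"
    return "no action"


# hawwiki in January through March 2021: 39.06% (CX) vs. 1.18% (non-CX).
print(difference_action(39.06, 1.18, quarters_in_a_row=1))
```

Under this reading, even the largest gap in the January through March 2021 table (hawwiki, −37.89%) stays short of the first threshold.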
Articles created with CX decrease by 90% from the previous month.
- First quarter: Investigate and check for technical issues.
- Three consecutive months: Hold a community consultation and determine the next step from its outcome. The next step can be making the MT limit 10% less strict and monitoring changes.
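Detecting such a drop from monthly counts can be sketched as follows. This is an illustrative Python example only: the function names and the sample counts are hypothetical, and a month is assumed to qualify when CX article creation falls by at least 90% relative to the month before.

```python
def sharp_drop_flags(monthly_cx_created, drop=0.90):
    """For each month after the first, True if CX article creation fell
    by at least `drop` relative to the previous month."""
    flags = []
    for prev, cur in zip(monthly_cx_created, monthly_cx_created[1:]):
        # A previous month with zero articles cannot show a relative drop.
        flags.append(prev > 0 and (prev - cur) / prev >= drop)
    return flags


def trailing_streak(flags):
    """Number of consecutive True values at the end of the list."""
    streak = 0
    for flag in reversed(flags):
        if not flag:
            break
        streak += 1
    return streak


# Hypothetical monthly counts of CX-created articles on one wiki.
counts = [100, 50, 4]
flags = sharp_drop_flags(counts)
print(flags, trailing_streak(flags))
```

A streak of one would point to the "investigate" step, while a streak of three would correspond to three consecutive months and a community consultation.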
- This page is currently maintained by Krishna Chaitanya Velaga, Product Analytics.
- Please reach out to kcvelagawikimedia.org or pginerwikimedia.org with any questions or feedback.
↑Starting FY23 Q2, it was decided to emphasize only wikis where the deletion rate of CX-created articles is at least 2% higher than that of non-CX-created articles. Because the goal of these reports is to inform changes to the thresholds, insignificant differences add no value and cannot be the basis for any changes.
↑Compared to previous quarters, this is an increase in the deletion rate for CX-created articles. However, in September 2023 there was increased activity and a higher deletion rate of articles (both CX and non-CX) on Uzbek Wikipedia, likely due to a campaign, which raised the quarter's average. Excluding Uzbek Wikipedia, the deletion rate for CX-created articles for the quarter is 3.65%.