ORES/Issues/Article quality
To file a report, append the following template to a section named for your wiki:
{{misclassification report | type = <!-- false-positive or true-positive --> | wiki = <!-- interwiki prefix to link to the target wiki; e.g. fr for French Wikipedia, etc. --> | rev_id = <!-- the rev_id of the version that was scored --> | model = <!-- the model with the problematic classification (e.g. "damaging") --> | score = <!-- ORES prediction --> | comment = <!-- Description of the problem --> | resolved = <!-- To be filled in once resolved --> }}
English Wikipedia
French Wikipedia
Portuguese Wikipedia
See also information about the current version of the model and the meaning of each statistic.
Articles which are better than predicted by ORES
- misclassification (version) articlequality: 1 (2.13): Good quality with low classification
- What would you say this label should be? "2" or "3"? --EpochFail (talk) 13:27, 24 April 2020 (UTC)
- It's a good stub, so at least a 2, if not a 3.--DarwIn (talk) 16:47, 24 April 2020 (UTC)
- misclassification (version) articlequality: 1 (1.1): Non problematic article gets almost minimum classification
- This looks like a Stub to me. Isn't this exactly what is expected for a "1" class article? --EpochFail (talk) 13:27, 24 April 2020 (UTC)
- Yes, but it's a non problematic stub. Yet, it is getting a lower classification (1.1) than articles that do not even meet the minimum standards (1.4).--DarwIn (talk) 16:49, 24 April 2020 (UTC)
- misclassification (version) articlequality: 1 (1.09): Fine stub gets almost minimum classification
- This looks like a Stub to me. Isn't this exactly what is expected for a "1" class article? --EpochFail (talk) 13:27, 24 April 2020 (UTC)
- (see above).--DarwIn (talk) 16:49, 24 April 2020 (UTC)
- misclassification (version) articlequality: 3 (4.06): Prediction was reduced from 4 to 3 after fixing some translation issues (of the "Gradient boosting" article, btw ;-); Setting feature.ptwiki.revision.cn_templates=0.0 (instead of 1.0) restores the prediction to 4.
- This one is interesting because on one hand, the change to the prediction is minimal. The weighted sum moves from 4.10 to 4.06 so I'm guessing the prediction was close to the threshold of 3 and 4. But also, it seems interesting because the article contains English language before it is cleaned up. I wonder if we should be tracking use of English and other langs as a feature of quality. We do this for Basque Wikipedia because they have a bit of various language content left over from translations. In order for this to work, we need articles that have been assessed poorly because they need translation work. Do you think people update the talk page quality labels *before* fixing translation issues? --EpochFail (talk) 13:27, 24 April 2020 (UTC)
- I believe having a feature for the (relative or absolute) amount of foreign language used in the article could be useful. But I don't know if users did update the template on talk page before fixing the issues. Helder 17:00, 24 April 2020 (UTC)
- misclassification (version) articlequality: 2 (2.28): ORES predicted quality lowers as real quality improves (adding category and navigation template). Initial ORES quality at creation 2 (2.66). After adding category: 2 (2.3). After adding navigation template: 2 (2.28)
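The feature override mentioned above (setting feature.ptwiki.revision.cn_templates=0.0) can be tried by adding the override to the score request. The following is a minimal sketch of building such a URL; the base URL, the `features` flag and the rev_id `12345` are assumptions for illustration, not values taken from these reports.

```python
from urllib.parse import urlencode

# Assumed base URL of the ORES v3 scoring endpoint.
ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def injection_url(wiki, rev_id, model, overrides):
    """Build a score URL that requests feature values and overrides some of them.

    `overrides` maps full parameter names (e.g.
    "feature.ptwiki.revision.cn_templates") to the value to inject.
    """
    params = [("features", "true")]  # assumed flag that asks for feature values
    for name, value in sorted(overrides.items()):
        params.append((name, value))
    return f"{ORES_BASE}/{wiki}/{rev_id}/{model}?{urlencode(params)}"


# 12345 is a placeholder rev_id, not one of the reported revisions.
url = injection_url(
    "ptwiki",
    12345,
    "articlequality",
    {"feature.ptwiki.revision.cn_templates": 0.0},
)
print(url)
```

Comparing the response with and without the override shows whether a single feature is enough to move the prediction across a class boundary.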
Articles which are worse than predicted by ORES
- Done misclassification (version) articlequality: 1 (1.37): Very bad quality, should be getting the minimum classification
- The scores are: 1, 2, 3, 4, AB and AD, so 1 is the minimum. Helder 22:58, 23 April 2020
- I think this suggests we have few (if any) items in our training set that look like this and get a "1" label. I wonder if we could somehow add them. Alternatively, it's possible that we have some observations that look like this and get a better than "1" label. We could potentially re-process the dataset to look for revisions with no citations that appear > "1" category. --EpochFail (talk) 13:34, 24 April 2020 (UTC)
- It should also check for interlinks, and eventually Infobox. It doesn't even seem to score the pictures when they are inside an infobox.--DarwIn (talk) 16:45, 24 April 2020 (UTC)
- It is supposed to be detecting infobox images. Do you have an example of an article which has an image which is not taken into account by ORES? (I've updated the ORES link, so it shows the values for some of characteristics it measures for each article. You can search for "image" to see the relevant counts) Helder 17:44, 24 April 2020 (UTC)
- It doesn't detect them when they are brought by a Wikidata Infobox.--DarwIn (talk) 19:32, 24 April 2020 (UTC)
- The issue is that it doesn't seem appropriate to give the same score (a bigger score, actually) to articles that do not even meet the minimum standards, and to good/acceptable stubs. I wouldn't mark this one as resolved.--DarwIn (talk) 16:45, 24 April 2020 (UTC)
- Done misclassification (version) articlequality: 1 (1.41): Very bad quality, should be getting the minimum classification
- The scores are: 1, 2, 3, 4, AB and AD, so 1 is the minimum. Helder 22:58, 23 April 2020
- List articles are weird. Do people label these list articles with the same quality scale? --EpochFail (talk) 13:34, 24 April 2020 (UTC)
- The article does not even meet the minimum standards. It's just a list of names. If anyone tags it for half-speed elimination, it will be gone in 4 days for sure. Even so, it managed to get a higher classification than the very acceptable stubs listed above.--DarwIn (talk) 16:53, 24 April 2020 (UTC)
- Could it be due to the number of wikilinks (11)? Helder 17:39, 24 April 2020 (UTC)
- misclassification (version) articlequality: 3 (2.55): Bad quality article well classified
- Again with the zero references. I wonder how someone might classify this article now given that it does have some content. --EpochFail (talk) 13:34, 24 April 2020 (UTC)
- misclassification (version) articlequality: 3 (2.33): Bad quality article well classified
- misclassification (version) articlequality: 3 (3.14): Bad quality article well classified
- misclassification (version) articlequality: AD (3.64): Bad/mediocre quality being proposed as Featured Article
- misclassification (version) articlequality: 3 (2.85): Article without a single source (just an unformatted URL) gets classified as 3
- misclassification (version) articlequality: AB (4.02): Blanked article classified as Good Article
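The idea raised in this section, re-processing the training set to find revisions with no citations that were labeled above "1", could be sketched roughly as follows. The record layout and the `ref_count` feature name are hypothetical; the real dataset and feature names may differ.

```python
def unreferenced_above_minimum(observations, ref_feature="ref_count", minimum_label="1"):
    """Return labeled observations with zero references and a label above the minimum.

    `ref_feature` is a hypothetical feature name standing in for whatever
    reference count the model actually uses.
    """
    return [
        obs
        for obs in observations
        if obs["features"].get(ref_feature, 0) == 0 and obs["label"] != minimum_label
    ]


# Hypothetical labeled observations; rev_ids and feature values are made up.
sample = [
    {"rev_id": 101, "label": "1", "features": {"ref_count": 0}},
    {"rev_id": 102, "label": "3", "features": {"ref_count": 0}},
    {"rev_id": 103, "label": "3", "features": {"ref_count": 7}},
]

flagged = unreferenced_above_minimum(sample)
print([obs["rev_id"] for obs in flagged])
```

Revisions flagged this way would be candidates for manual review or relabeling before retraining.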
Other cases
- misclassification (version) articlequality: 3 (3.35): Weird case where vandalism that was inserted and then reverted increased the predicted quality: 3 (3.39)
- misclassification (version) articlequality: 1 (1.83): An edit to the article decreased the weighted sum from 2.01 to 1.83
- (version) articlequality: 3: The weighted sum decreased from 3.17 to 2.96 after an edit which added a few wikilinks, and merged two references
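The reports on this page quote each score as a predicted class followed by a weighted sum in parentheses. A minimal sketch of how the two can diverge, assuming the classes are weighted 1 to 6 in the order 1, 2, 3, 4, AB, AD (the weights and the probabilities below are illustrative assumptions, not model output):

```python
# Class order as given in the discussion; the 1..6 weights are an assumption.
CLASSES = ["1", "2", "3", "4", "AB", "AD"]


def weighted_sum(probability, classes=CLASSES):
    """Sum of class probabilities, each weighted by its 1-based position."""
    return sum(weight * probability[name] for weight, name in enumerate(classes, start=1))


def predicted_class(probability):
    """The class with the highest probability."""
    return max(probability, key=probability.get)


# Hypothetical probability distribution for one revision.
example = {"1": 0.40, "2": 0.30, "3": 0.15, "4": 0.10, "AB": 0.04, "AD": 0.01}

print(predicted_class(example), round(weighted_sum(example), 2))
```

Because the prediction is the single most likely class while the weighted sum averages over all classes, a small edit can shift the sum slightly, or flip the prediction when two classes have similar probabilities.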
Summary
| rev_id | Human | model 0 (april'20) | model 1 (more data) | model 2 (w2w) |
|---|---|---|---|---|
| | 2 or 3 | 1 (2.13) | 2 (2.63) | 3 (3.21) |
| | 1.4 | 1 (1.1) | 1 (1.10) | 1 (1.11) |
| | 1.4 | 1 (1.09) | 1 (1.20) | 1 (1.16) |
| | 4 | 3 (4.06) | 4 (3.88) | 4 (3.996) |
| | 2.66 | 2 (2.28) | 2 (2.53) | 2 (2.58) |
| | | 3 (2.65) | 3 (2.87) | 2 (2.89) |
| | | 3 (2.33) | 3 (2.30) | 1 (2.23) |
| | | 3 (3.14) | 3 (3.16) | 3 (3.10) |
| | | 6 (3.64) | 4 (3.81) | 4 (3.76) |
| | | 3 (2.85) | 2 (2.79) | 3 (2.87) |
| | | 5 | 5 | 2 |
EpochFail, DarwIn, He7d3r: This table summarizes how performance changed between models. We are satisfied with model 2 and are moving forward with it. Please raise any concerns you may have, or any other misclassifications you would like us to look over. Chtnnh (talk) 14:43, 18 May 2020 (UTC)