Wikidata talk:Item quality/Archive 1

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Large number of statements

Does it really matter how many statements an item has? 100 is not necessarily better than 10, though 10 is probably better than 1. Perhaps the proportion of statements which have a source matters more. And also the variety of properties. --Nemo 16:19, 30 January 2017 (UTC)

Hi Nemo. I have revised the quality criteria. Could you take another look and let me know if anything in the revised criteria is not right? Thanks! --Glorian WD (talk) 17:48, 3 February 2017 (UTC)
Thanks, this looks better. When I mentioned "variety", I meant not only the number of unique properties but also the "kind" of property. An obvious example: a statement with many different identifiers, or even terms, is not necessarily a useful or "pleasant" one, especially if those statements are partially redundant (think of a VIAF identifier which actually "implies" other identifiers linked from it) or not self-explanatory. --Nemo 15:13, 6 February 2017 (UTC)
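
The three signals raised in this thread (statement count, share of sourced statements, variety of properties) can be sketched in a few lines. This is only an illustration, not the tool discussed here: it assumes the item is available as a dictionary in the entity JSON layout (a `claims` mapping from property ID to a list of statements, each with an optional `references` list), and the function name `statement_metrics` is made up for this example.

```python
from typing import Any, Dict


def statement_metrics(item: Dict[str, Any]) -> Dict[str, float]:
    """Count statements, distinct properties, and the share of sourced statements.

    Assumes `item` follows the entity JSON layout: item["claims"] maps a
    property ID to a list of statement dictionaries.
    """
    claims = item.get("claims", {})
    statements = [s for group in claims.values() for s in group]
    total = len(statements)
    # A statement counts as sourced if it carries at least one reference.
    referenced = sum(1 for s in statements if s.get("references"))
    return {
        "statements": total,
        "unique_properties": sum(1 for group in claims.values() if group),
        "referenced_share": referenced / total if total else 0.0,
    }


# Hypothetical item with four statements on three properties, two of them sourced.
example_item = {
    "claims": {
        "P31": [{"references": [{"snaks": {"P248": []}}]}],
        "P569": [{"references": []}, {}],
        "P19": [{"references": [{"snaks": {"P854": []}}]}],
    }
}

metrics = statement_metrics(example_item)
print(metrics)
```

A count like this says nothing about the "kind" of property, which is the second point made above; telling identifiers apart from other statements would need the datatype of each property as well.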

Statement about what divides an E/Stub from a D/Start item

"nikki" said the following in IRC and I thought we should capture it somewhere:

I would say that to get out of the stub class, the statements need to provide enough information to easily identify the item

This sounds like a good criterion to me. --EpochFail (talk) 19:06, 30 January 2017 (UTC)

So what is Coniophora arida (Q10646558) missing? --Succu (talk) 20:19, 28 February 2017 (UTC)

Grading scheme

Hi Multichill, Jheald, Abián, EpochFail, Ladsgroup!

Below is the grading scheme that I have revised based on the project chat discussion. I added only two criteria, namely "number of unique properties" and "items contain all expected properties".

What do you think is still missing or not right? Could you give me some suggestions to fix it?

Thanks! :) --Glorian WD (talk) 17:20, 3 February 2017 (UTC)

I just moved this from the main page to here --EpochFail (talk) 17:44, 3 February 2017 (UTC)

Hi Glorian WD,
Perhaps the exposed points are too verbose for users to classify any item quickly and easily. A decision tree or another representation that can reduce the cognitive load and remove redundancies or information that won't be used to get the final degree of completeness would be helpful. As an extra, a tool with yes/no questions (similar to the Wikidata game) that helps users with these decisions when training your system could also be useful and, with it, you could ignore decisions that can be made automatically by a machine, such as the presence or absence of a link to an image from the item. --abián 13:37, 4 February 2017 (UTC)
Abián & Glorian WD, I've made an edit that I think cleans things up a lot and removes some of the redundancies. See Special:Diff/441493641/443412841. --EpochFail (talk) 16:50, 7 February 2017 (UTC)
Thanks, EpochFail! Now, do you think that we could write some approximate ranges for ambiguous terms like "some" or "minimal"? I'm not suggesting that we define ambiguity-free rules, but wondering if we could specify, for example, a reasonable interval of aliases that can be considered "minimal" (e.g., 1-4 aliases), "some" (2-6 aliases), etc. These ranges could overlap, but they would let users know if they're clearly misunderstanding the criteria or not. --abián 23:03, 7 February 2017 (UTC)
Abián hmm. I think we should drop "minimal" as it can certainly be confused with "some". But "most" is clearly different from "some". I'd like to make reference to criteria for "important" or "critical" set of languages for translation (maybe referencing the languages with the largest number of speakers -- e.g. see https://stats.wikimedia.org/EN/Sitemap.htm). --EpochFail (talk) 16:57, 8 February 2017 (UTC)
EpochFail, please find my feedback below:
  • I am unsure about the high quality image criterion. I think most showcase items have images of good quality rather than high quality.
  • Concerning scales B and C, we have the criteria "Some translations completed: labels, ..." and "Some completed translations: labels, ...". I think these mean basically the same thing. I would suggest changing the scale B criterion to "Translations are mostly completed: labels, ..."
  • Concerning scales B and C, I think we should have "appropriate ranks" and "qualifiers where applicable" under the criterion "properties for this type of item have statements with" because we want to evaluate ranks and qualifiers on these scales.
  • Where is the criterion "items should have unique properties"? I think this is one of the important criteria mentioned in the discussion.
  • Concerning scale B, we have the criterion "Most appropriate aliases exist in most important languages". I think people have different important languages. Thus, if we use this criterion, it may lead to high inconsistency. I would suggest simply removing the word "important".
  • Concerning scale C, we have the criterion "References for some non-trivial statements". We should add "External references for some non-trivial statements" in order to emphasize that we are looking for external references instead of references to Wikimedia projects.
  • Concerning scale D, we have the criterion "Some relevant properties for this type of item have statements". I believe you forgot to add three points under this criterion, namely external references, qualifiers where applicable, and appropriate ranks.
  • Concerning scale D, we have the criterion "Minimal aliases and translations". The word "translations" seems redundant because the previous point already specifies the criterion for translations (i.e. "Minimal translations: labels, ...").
All in all, I still think the previous criteria (the ones that you think are redundant) are better than these because, despite being a bit redundant, they clearly explain each criterion.
The criteria that you made here are indeed simpler. But I think they comprise many implicit criteria which are hard to notice. --Glorian WD (talk) 14:23, 8 February 2017 (UTC)

Hi Glorian WD, nice project, very interesting indeed. The criteria for classes A and B differ in that the first has all relevant statements, whereas the second has all the most important ones. This distinction is unclear to me; could you explain it, please? Also, why is class B the class of showcase Items? Thanks, --Alessandro Piscopo (talk) 11:37, 22 February 2017 (UTC)

Hi Alessandro Piscopo! Regarding the difference between those two scales, class B items should have slightly fewer relevant statements than class A items. So, let's say items with the statement instance of: human (Q5) should contain statements with the properties sex or gender, date of birth, and place of birth. Class A items should contain all of these properties, whereas class B items may miss only one of them. Note that people can use their own judgment to decide whether an item contains all relevant statements. You can read further about this on Wikidata:Item_quality#Relevant_statements_.28completeness.29.
Regarding showcase items falling under the class B criteria, I found that the existing showcase items do not seem to satisfy the class A criteria. However, I am still consulting Lydia Pintscher (WMDE) about this decision. --Glorian WD (talk) 20:24, 24 February 2017 (UTC)
Hi Glorian WD, thank you for the reply. Now it is clearer, I will keep following this project for further developments.--Alessandro Piscopo (talk) 08:53, 6 March 2017 (UTC)
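
The human (Q5) example above can be written down as a small check. This is a rough sketch under stated assumptions rather than the actual grading rule: it uses the property IDs P21 (sex or gender), P569 (date of birth) and P19 (place of birth), assumes the item is a dictionary in the entity JSON layout, and covers only the completeness criterion. The names `missing_expected` and `completeness_hint` are invented for this illustration, and an assessor's own judgment still takes precedence.

```python
from typing import Any, Dict, List

# Properties named in the reply above for items that are instance of human (Q5).
EXPECTED_FOR_HUMAN = {
    "P21": "sex or gender",
    "P569": "date of birth",
    "P19": "place of birth",
}


def missing_expected(item: Dict[str, Any], expected: Dict[str, str]) -> List[str]:
    """Return the expected property IDs that have no statement on the item."""
    claims = item.get("claims", {})
    return [pid for pid in expected if not claims.get(pid)]


def completeness_hint(item: Dict[str, Any], expected: Dict[str, str]) -> str:
    """Map the number of missing expected properties to a rough class hint.

    Only the completeness criterion is covered: nothing missing suggests "A",
    exactly one missing suggests "B", anything else falls below B.
    """
    missing = len(missing_expected(item, expected))
    if missing == 0:
        return "A"
    if missing == 1:
        return "B"
    return "below B"


# Hypothetical item about a person that lacks a place of birth.
example_item = {"claims": {"P31": [{}], "P21": [{}], "P569": [{}]}}

print(missing_expected(example_item, EXPECTED_FOR_HUMAN))
print(completeness_hint(example_item, EXPECTED_FOR_HUMAN))
```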

C to B quality threshold

Glorian WD said:

I would suggest changing the scale B criterion to "Translations are mostly completed: labels, ..."

I don't think there's a good way to operationalize "translations are mostly complete". How many languages is "complete"? "Mostly" means that we'd have to have 50% of all languages translated for. That's just not practical. --EpochFail (talk) 16:15, 8 February 2017 (UTC)
EpochFail, then how do you want to change the specific criterion in these scales? We have the criteria "Some translations completed: labels, ..." and "Some completed translations: labels, ..." in scales B and C respectively. Do you think "Some translations completed" and "Some completed translations" have different meanings? --Glorian WD (talk) 10:54, 9 February 2017 (UTC)
I think those have the same meaning. We should pick one wording and apply that across the set. --EpochFail (talk) 22:30, 11 February 2017 (UTC)
Since those have a similar meaning, why do we add them for both scales B and C? --Glorian WD (talk) 08:39, 12 February 2017 (UTC)
Glorian WD, basically, the difference in wording is incidental. You could call it a typo. As I said in my last message "We should pick one wording and apply that across the set." --EpochFail (talk) 16:28, 14 February 2017 (UTC)
As we have agreed on IRC, we will use "Most important translations are completed" in scale B --Glorian WD (talk) 19:18, 14 February 2017 (UTC)

Glorian WD said:

Concerning scales B and C, I think we should have "appropriate ranks" and "qualifiers where applicable" under the criterion "properties for this type of item have statements with" because we want to evaluate ranks and qualifiers on these scales.

I think that evaluating based on the quality of ranks and qualifiers should be reserved for top quality criterion. We're not saying that B class would have "no ranks and qualifiers". I'd rather say something like "Some half-assed attempt at qualifiers and ranking" than specify that it must be "good" or "appropriate" in the mid/low quality classes. --EpochFail (talk) 16:15, 8 February 2017 (UTC)
EpochFail, yes. We can leave those out of scale "C". However, I guess we still have to evaluate the qualifiers and rankings for grade "B". I still think it is fine to add "appropriate ranks" and "qualifiers where applicable" to scale "B", because some "appropriate ranks" or some "qualifiers" do not really make sense to me. --Glorian WD (talk) 13:54, 9 February 2017 (UTC)
How do you mean that it doesn't really make sense for you? Can you please help me understand what is confusing. --EpochFail (talk) 22:30, 11 February 2017 (UTC)
I agree that we should remove the criteria "appropriate ranks" and "qualifiers" from scale C. However, I think we should have them in scale B. In scale B, the criterion can be written as something like some half-assed qualifiers and ranks. But, I do not know a good wording for this. --Glorian WD (talk) 08:39, 12 February 2017 (UTC)
OK. That works. I'd say just use the word "some". As in "Some appropriate ranks" and "Some qualifiers (if applicable)". --EpochFail (talk) 16:28, 14 February 2017 (UTC)
+1 --Glorian WD (talk) 19:18, 14 February 2017 (UTC)

Glorian WD said:

"Most appropriate aliases exist in most important languages". I think people have different important languages. Thus, if we use this criterion, it may lead to high inconsistency. I would suggest simply removing the word "important".

OK. We have many measures of importance of language. E.g. number of speakers. See https://stats.wikimedia.org/EN/Sitemap.htm for a ranking of languages by number of speakers. The list goes: English, Chinese, Hindi, Arabic, Spanish, Malay, ... I'm pretty sure we can come up with a list of languages we'd like to see held as a minimum threshold for translations. --EpochFail (talk) 16:15, 8 February 2017 (UTC)
Ok EpochFail. Do we want to discuss how to measure the importance of a language here, or would it be better to open another subsection? --Glorian WD (talk) 11:15, 9 February 2017 (UTC)
Sure. Please be bold. :) --EpochFail (talk) 22:30, 11 February 2017 (UTC)
Hmm. I have just realized that the mentioned statistics (https://stats.wikimedia.org/EN/Sitemap.htm) are for Wikipedia. I do not know if it is okay to use them as a guide for measuring the importance of a language when people evaluate Wikidata item quality. --Glorian WD (talk) 08:39, 12 February 2017 (UTC)
That page has a count of the number of speakers per language. That is not Wikipedia specific. --EpochFail (talk) 16:28, 14 February 2017 (UTC)
Apologies! I missed that --Glorian WD (talk) 19:18, 14 February 2017 (UTC)

References and where they are required to be external

Glorian WD said:

Concerning scale C, we have the criterion "References for some non-trivial statements". We should add "External references for some non-trivial statements" in order to emphasize that we are looking for external references instead of references to Wikimedia projects.

I disagree. I think that external reference requirements should be reserved for high quality levels and that a poorly referenced item with most of the important properties can still be a C-class. --EpochFail (talk) 16:15, 8 February 2017 (UTC)
Okay fair enough. I find this can help increase the evaluation consistency --Glorian WD (talk) 13:54, 9 February 2017 (UTC)
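
The distinction between external references and references to Wikimedia projects can be approximated mechanically. The sketch below is a simplification under stated assumptions, not an agreed rule: it treats a reference as a Wikimedia import when its snaks use P143 (imported from Wikimedia project), treats any other non-empty reference as external, and assumes the entity JSON layout. The names `is_external` and `reference_summary` are hypothetical.

```python
from typing import Any, Dict

# Assumption: a reference that uses P143 (imported from Wikimedia project)
# points at a Wikimedia project rather than at an external source.
WIKIMEDIA_IMPORT_PROPERTIES = {"P143"}


def is_external(reference: Dict[str, Any]) -> bool:
    """True if the reference has snaks and none of them is a Wikimedia import."""
    snaks = reference.get("snaks", {})
    return bool(snaks) and not (set(snaks) & WIKIMEDIA_IMPORT_PROPERTIES)


def reference_summary(item: Dict[str, Any]) -> Dict[str, int]:
    """Count statements by the best kind of reference they carry."""
    counts = {"external": 0, "wikimedia_only": 0, "unreferenced": 0}
    for group in item.get("claims", {}).values():
        for statement in group:
            refs = statement.get("references", [])
            if any(is_external(r) for r in refs):
                counts["external"] += 1
            elif refs:
                counts["wikimedia_only"] += 1
            else:
                counts["unreferenced"] += 1
    return counts


# Hypothetical item: one statement cites P248 (stated in), one was imported
# from a Wikimedia project, and one has no reference at all.
example_item = {
    "claims": {
        "P569": [{"references": [{"snaks": {"P248": []}}]}],
        "P19": [{"references": [{"snaks": {"P143": []}}]}],
        "P21": [{}],
    }
}

summary = reference_summary(example_item)
print(summary)
```

Under this simplification a reference that mixes P143 with an external source is counted as a Wikimedia import, so the external count is a lower bound.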

Ranks and qualifiers in low quality classes

Glorian WD said:

Concerning scale D, we have the criterion "Some relevant properties for this type of item have statements". I believe you forgot to add three points under this criterion, namely external references, qualifiers where applicable, and appropriate ranks.

Nope. I deleted those on purpose. D is the second lowest quality class. I don't think specifying that ranks and qualifiers must be good makes any sense at that level. --EpochFail (talk) 16:15, 8 February 2017 (UTC)
Okay fair enough. I find this can help increase the evaluation consistency --Glorian WD (talk) 13:54, 9 February 2017 (UTC)

Translating the text

Maybe it could come in handy to translate this page into different languages like German, French and Dutch. I don't know how to put an article into translation mode, but I am willing to help translate it into Dutch. Q.Zanden questions? 19:35, 7 February 2017 (UTC)

Hi Q.Zanden, do you mean translating the quality criteria into languages other than English? --Glorian WD (talk) 19:36, 7 February 2017 (UTC)
Yes, translating the talk page wouldn't be very interesting, I think... ;) Q.Zanden questions? 19:38, 7 February 2017 (UTC)
Thanks for the initiative! As I have explained in Wikidata:Project_chat#Quality_Criteria_for_Building_a_Tool_to_Evaluate_Item_Quality, these criteria are supposed to be used as a guide for a labeling campaign. So actually, I am not really sure that we can maintain the context of the criteria if we translate them into languages other than English. But I would like to hear EpochFail's opinion about this. --Glorian WD (talk) 20:28, 7 February 2017 (UTC)
I think translations will be hard, but we need them. It's not OK to expect it all to be in English. I'm also mostly unfamiliar with how to make something like this easy to translate, but I'm very happy to work with anyone who does. --EpochFail (talk) 22:09, 7 February 2017 (UTC)
There is a start for translating the page. The problem is the template that uses the page WD:Item quality/Class. I think we should translate that page also, or there should be language specifiers in /class. Q.Zanden questions? 00:00, 10 February 2017 (UTC)
Hi Q.Zanden, IMO, we should wait until the quality criteria have been finalized before translating them. The quality criteria are still under discussion. If you want, I can notify you when they are finished ;) --Glorian WD (talk) 10:14, 10 February 2017 (UTC)
Glorian WD, that would be great! Q.Zanden questions? 16:23, 10 February 2017 (UTC)
Hi Q.Zanden! I want to give you the current status of the criteria. So, they are pretty much done for the time being. I am going to run a pilot campaign using the existing criteria soon. But after the pilot campaign, the criteria might be revised again. So, I ask for your patience. I will notify you again once we have the final quality criteria. Thanks for your help :) --Glorian WD (talk) 09:16, 4 March 2017 (UTC)
Hi Glorian! Thanks for letting me know the next step of the procedure for the quality criteria. I hope it will work soon! --Q.Zanden questions? 21:27, 4 March 2017 (UTC)
Hi Q.Zanden, I think you can start translating the quality criteria now. Thanks for your help! I do appreciate it! --Glorian WD (talk) 15:18, 9 April 2017 (UTC)

Property Constraint

We can improve the existing quality criteria by adding a criterion about Wikidata property constraints. For instance, Items grade "A" should follow all relevant property constraints. --Glorian WD (talk) 20:41, 8 February 2017 (UTC)

See Property talk:P21#Documentation for an example of a set of property constraints. This proposed change makes sense to me. --EpochFail (talk) 22:04, 8 February 2017 (UTC)
See also Template:Property documentation. It seems like we should link to this in the description of "A" class. --EpochFail (talk) 22:06, 8 February 2017 (UTC)
For the moment, although we can define some consistency criteria, I think that we can't evaluate these constraints because some work behind the Special:ConstraintReport should be finished first (perhaps Lydia can shed some light on this). Anyway, I don't believe in more than two levels of consistency, as an item is simply consistent if it satisfies all the mandatory constraints, or not consistent if we can find any violation of a mandatory constraint. --abián 23:22, 8 February 2017 (UTC)
Yeah let's simply use "satisfies all mandatory constraints" and "doesn't satisfy all mandatory constraints". --Lydia Pintscher (WMDE) (talk) 14:29, 13 February 2017 (UTC)
Lydia Pintscher, are you suggesting that we limit an editor's assessment to the formally defined set of property constraints? Or is it OK if we allow the assessing editor to use their judgements about how constraints ought to be applied to property usage (whether or not a formal constraint has been stated on a property)? If you are stating the former, then I don't think that really makes sense. These constraints are under a refinement process. An item that satisfies the explicit constraints one day may violate the constraints on another. I'd like to advocate that we allow assessing editors the power to judge what makes sense for an item (explicitly constrained or otherwise). --EpochFail (talk) 16:46, 14 February 2017 (UTC)
Makes sense. Do you have a good wording for that? --Lydia Pintscher (WMDE) (talk) 17:54, 14 February 2017 (UTC)
How about "uses properties appropriately" with a link to discussions about property constraints. Is there some essay or other document we could point people to? --EpochFail (talk) 17:02, 15 February 2017 (UTC)
I fear not, but maybe there is a help page I am not aware of. Does anyone else know? --Lydia Pintscher (WMDE) (talk) 17:28, 15 February 2017 (UTC)

Labels of Used Properties

EpochFail argues that we should not add "labels of used properties" to the criteria because evaluating properties is external to the item and hence, out of scope. Maybe Lydia Pintscher (WMDE) can explain the reason why we should evaluate this specific criterion. --Glorian WD (talk) 20:16, 14 February 2017 (UTC)

If the labels are not there in your language or a fallback you see the ID only. That makes it hard to understand what a particular statement is about. --Lydia Pintscher (WMDE) (talk) 17:27, 15 February 2017 (UTC)
Lydia Pintscher So, I agree that this reduces the utility of an item page, but it seems that this is a quality characteristic of "properties" and not the item itself. You shouldn't be able to improve the quality of an item by editing something other than the item itself. --EpochFail (talk) 21:26, 15 February 2017 (UTC)
We can put it like that. But at the end of the day I wouldn't want an item with a lot of missing labels in properties to be a showcase item. From a reader-perspective it doesn't matter what you have to edit to fix a quality issue. --Lydia Pintscher (WMDE) (talk) 12:06, 19 February 2017 (UTC)
This sounds similar to making sure that a Featured Article doesn't have any redlinks. After all, an article is more useful if its related articles exist, but that doesn't reflect on the quality of the article itself. If we include things that are not part of an item itself, then a revision of the item would not have a static quality level and I find that highly undesirable. We'd need to take a snapshot of the rest of the characteristics of related wiki elements in order to know the quality of an item. This just seems to over-complicate things. IMO, this should be kept out of scope WRT "item quality". --EpochFail (talk) 18:58, 20 February 2017 (UTC)
OK. I have dropped the criterion "translation for labels of used properties" for all quality scales. EpochFail, do you want me to add "capturing item quality based on characteristics of related wiki elements" as a potential future work?--Glorian WD (talk) 20:26, 20 February 2017 (UTC)
Maybe we could just call that Wikidata:Property quality. --EpochFail (talk) 21:19, 20 February 2017 (UTC)

Languages

For translations, the first column does not have the same order as the Wikipedia page cited as a source:

Wikipedia:

Rank   Language    Internet users  Percentage
1      English     948,608,782     26.3%
2      Chinese     751,985,224     20.8%
3      Spanish     277,125,947     7.7%
4      Arabic      168,426,690     4.7%
5      Portuguese  154,525,606     4.3%
6      Japanese    115,111,595     3.2%
7      Malay       109,400,982     3.0%
8      Russian     103,147,691     2.9%
9      French      102,171,481     2.8%
10     German      83,825,134      2.3%
11–36  Others      797,046,681     22.1%
Total              3.61 Billion    100%

Wikidata:

rank Internet users
1 Chinese
2 Spanish
3 English
4 Hindi
5 Arabic
6 Portuguese
7 Bengali
8 Russian
9 Japanese
10 Punjabi

For the "native speakers" column, I think that list of languages by total number of speakers (Q1394450) is more important than the native language: Hindi and French are not among the 7 languages (Chinese, Spanish, English, Arabic, Portuguese, Russian, and Japanese) although they are spoken much more than Japanese.

rank  Total number of speakers  Native speakers
1     Chinese                   English
2     English                   Chinese
3     Spanish                   Spanish
4     Hindi                     Arabic
5     Arabic                    Portuguese
6     Malay                     Japanese
7     Russian                   Malay
8     French                    Russian
9     Portuguese                French
10    German                    German

Are you OK to change this?

Tubezlob (🙋) 18:07, 12 March 2017 (UTC)

New proposal: 8 languages: Chinese, Spanish, English, Arabic, Portuguese, Russian, French and German.
rank  Internet users[1]  Total number of speakers[2]  Wikipedia editors[3]
1     English            Chinese                      English
2     Chinese            English                      German
3     Spanish            Spanish                      French
4     Arabic             Hindi                        Spanish
5     Portuguese         Arabic                       Japanese
6     Japanese           Malay                        Russian
7     Malay              Russian                      Italian
8     Russian            French                       Chinese
9     French             Portuguese                   Portuguese
10    German             German                       Arabic
Thanks Tubezlob. That looks great to me. --EpochFail (talk) 14:50, 14 March 2017 (UTC)
I've added the new proposed table to the page (and cleaned up a bit of formatting) in Special:Diff/466229578 --EpochFail (talk) 14:55, 14 March 2017 (UTC)
Agree. It makes sense to use the list of total number of speakers rather than native speakers. Thanks Tubezlob --Glorian WD (talk) 15:03, 14 March 2017 (UTC)
  1. https://en.wikipedia.org/wiki/Languages_used_on_the_Internet#Internet_users_by_language
  2. https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers
  3. https://en.wikipedia.org/wiki/List_of_Wikipedias
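
The eight-language proposal above lends itself to a simple coverage check. The following is a sketch only: it assumes the usual language codes for the proposed languages (zh, es, en, ar, pt, ru, fr, de), assumes the item is a dictionary in the entity JSON layout with `labels` and `descriptions` keyed by language code, and uses the invented names `translation_coverage` and `missing_labels`.

```python
from typing import Any, Dict, List

# The eight languages from the proposal above, keyed by assumed language code.
PROPOSED_LANGUAGES = {
    "zh": "Chinese",
    "es": "Spanish",
    "en": "English",
    "ar": "Arabic",
    "pt": "Portuguese",
    "ru": "Russian",
    "fr": "French",
    "de": "German",
}


def translation_coverage(item: Dict[str, Any]) -> Dict[str, Dict[str, bool]]:
    """Report, per proposed language, whether a label and a description exist."""
    labels = item.get("labels", {})
    descriptions = item.get("descriptions", {})
    return {
        code: {"label": code in labels, "description": code in descriptions}
        for code in PROPOSED_LANGUAGES
    }


def missing_labels(item: Dict[str, Any]) -> List[str]:
    """List the proposed language codes for which the item has no label."""
    labels = item.get("labels", {})
    return [code for code in PROPOSED_LANGUAGES if code not in labels]


# Hypothetical item labelled in three of the eight languages.
example_item = {
    "labels": {"en": {"value": "example"}, "de": {"value": "Beispiel"}, "fr": {"value": "exemple"}},
    "descriptions": {"en": {"value": "illustrative item"}},
}

print(missing_labels(example_item))
print(translation_coverage(example_item)["de"])
```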