Wikidata talk:Item quality

Media quality criteria

Glorian WD said:

I am unsure about the high quality image criteria. I think most showcase items have images of good quality rather than high quality.

I think I might be able to agree with this. The Featured Article criteria on enwiki specifies the following:

Media. It has images and other media, where appropriate, with succinct captions, and acceptable copyright status. Images included follow the image use policy. Non-free images or media must satisfy the criteria for inclusion of non-free content and be labeled accordingly.

I think that adopting a statement that is much more in line with this could be more appropriate. We could just say something like "has images and other media where appropriate". One thing I'd like to make clear in this criterion is that it should be both "good enough" and "best available". You can't have a showcase item if the "best available" image is not sufficiently illustrative. E.g. for a long time, en:Knee had File:Male Knee by David Shankbone.jpg as its main image. This may have been the "best available" knee image, but it is not "good enough" for an article about knee anatomy. The current image, File:Blausen_0597_KneeAnatomy_Side.png, is much more appropriate. --EpochFail (talk) 16:00, 8 February 2017 (UTC)

EpochFail, I am wondering if there is a standard that we can use to define the "best quality image". If such a standard exists, I think we can use it to evaluate the image. --Glorian WD (talk) 13:54, 9 February 2017 (UTC)
Maybe something like en:Wikipedia:Manual_of_Style/Images --EpochFail (talk) 15:48, 10 February 2017 (UTC)
Ok EpochFail. How about attaching en:Wikipedia:Manual_of_Style/Images#Choosing_images as an explanation/hint for this specific criterion? So, if the image in an item meets all criteria on that manual, that image is deemed as high quality. --Glorian WD (talk) 15:09, 11 February 2017 (UTC)
I'm OK with that, but I'm not sure how Wikidata editors feel about using an English Wikipedia reference for the quality scale. Maybe it's OK for right now. --EpochFail (talk) 22:28, 11 February 2017 (UTC)
That is a good point. I think Lydia should have a say for this proposed criterion. --Glorian WD (talk) 08:39, 12 February 2017 (UTC)
Let's use it for now. It's the best we have. --Lydia Pintscher (WMDE) (talk) 14:27, 13 February 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── In another thread, Jane023 said:

As a visually oriented person, I do try to illustrate Wikidata items when possible, but the quality of the image is not a main target. Generally speaking any image at all is better than none, because generally the image has a wider reach than any text-based description. Any item might be linked from eight or ten other items, and it is this interlinked aspect that adds the most value to users. So an artwork item might have a very high quality without an image, because it has an extraordinary provenance or is cited as a source for lots of other copies, or because it inspired scientific journal articles for some reason, etc.

I totally agree. I think this is what we were hoping for with "if applicable". I think we should explicitly call out what Jane highlights as it is certainly a point of confusion. Currently, we have an "(if applicable)" in the criteria for images. We should extend this to "(if applicable and available)" and discuss a specific example in our section about High quality media. The example should be an item where an image would be relevant, but we're unable to have one hosted on commons for whatever reason. Maybe Jane023 could point us to a nice, clear example. --EpochFail (talk) 18:31, 14 March 2017 (UTC)

Class A

Is there any example for this? --Succu (talk) 22:18, 7 February 2017 (UTC)

Maybe Douglas Adams (Q42)? He is also the reasonator preview... Q.Zanden questions? 22:21, 7 February 2017 (UTC)
It misses "External references for non-trivial statements", as far as I'm concerned. --Succu (talk)
Hi Succu, I would say Douglas Adams (Q42) has a bunch of external references (references to non-Wikimedia projects) because the references on this item link to other sources than merely "Imported from Wikipedia". --Glorian WD (talk) 13:32, 8 February 2017 (UTC)
E.g. genre (P136) lacks a reference. I think this is a non-trivial statement, Glorian WD. --Succu (talk) 22:23, 8 February 2017 (UTC)
According to the existing criteria, all appropriate properties for this type of item have statements with: External references for non-trivial statements. So, since Douglas Adams (Q42) is an instance of human, genre (P136) probably does not fall under this specific criterion. --Glorian WD (talk) 14:28, 9 February 2017 (UTC)
If I understand you right „appropriate properties“ are domain specific (in this case a member of domain human (Q5))? Then by what means and by whom is this set of properties defined? Common sense (=no label (Q15812177))? Technical constraints? Community decision? --Succu (talk) 21:15, 9 February 2017 (UTC)
Correct, the expected properties of human (Q5). I would expect people to evaluate this using their common sense. Yes, there could be inconsistencies, but I would hope the classifiers that I am going to implement can make up for it. --Glorian WD (talk) 10:14, 10 February 2017 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Succu, I'm not sure, but it seems like you are worried about people applying their judgement when evaluating the quality of items. One thing that I'd like to note is that this looks a lot like the criteria for Good Articles in English Wikipedia: "Provides a [...] complete description of the topic". What decides "completeness"? Well, that's both context and topic dependent. All bios need to have a certain set of things. That's very different from, say, a road, a building or an abstract concept (e.g. en:Externality). There can be no complete guide, so considering context and applying judgement will always be necessary. --EpochFail (talk) 22:36, 11 February 2017 (UTC)

The criteria is „Items containing all relevant statements“. The question is: which statements are relevant to rate externality (Q275372) as class A. --Succu (talk) 22:59, 11 February 2017 (UTC) PS: Good article (Q21167453)? --Succu (talk) 23:19, 11 February 2017 (UTC)
Succu exactly. That is the question we will ask people doing the evaluations to figure out how to answer on their own. They'll be able to draw from their intuitions, the set of property constraints, and other sources of information. This is just like the evaluation that editors perform for the Wikipedia article about en:Externality. There's nothing on Wikipedia that says, "An article about an abstract economic concept should cover topics A, B, and C." So the assessing editor is going to need to work that out with what they have available to them. --EpochFail (talk) 16:38, 14 February 2017 (UTC)
Sorry, but I think this proposed rating of item quality should be computable and not depend on the subjective judgement of people. I doubt Wikidata users are willing to judge the quality of millions of geographical entities or taxa. --Succu (talk) 19:38, 14 February 2017 (UTC)
Succu, no one would have to judge the quality of "millions of geographical entities or taxa". Once we finish formalizing this quality scale, we hope to label a small set of items and train a machine learning model to label the rest for us. In the end, quality is inherently subjective, so I think we'll find that prematurely asserting computability will tie our hands. Let's let humans use their judgement and let the computer make it "computable" later. --EpochFail (talk) 22:23, 14 February 2017 (UTC)
EpochFail (aka User:Halfak (WMF)), does your "we" mean WMF? --Succu (talk) 21:41, 15 February 2017 (UTC)
"We" means me and Glorian WD as we find time for it. This isn't a WMF project. At least not yet. --EpochFail (talk) 22:19, 15 February 2017 (UTC)
Thanks for clarification that two WMF guys have a pet project. --Succu (talk) 22:36, 15 February 2017 (UTC)
@Succu: That's not really fair and I don't understand why you are so upset. Glorian has done an internship with WMDE. This has ended now and now he is working on this topic as part of his master thesis. He is not employed but getting support from me. Aaron is helping Glorian as well to get the basics right. I appreciate the work they're putting into Wikidata just like everyone else here. --Lydia Pintscher (WMDE) (talk) 12:04, 19 February 2017 (UTC)
Hey Lydia, somehow I missed your remark. I was upset because the relationship to WMF was unclear and I think you know how some people react to such circumstances. Personally I do not like broad ranking systems like this one. --Succu (talk) 19:43, 3 March 2017 (UTC)

high quality references

How do you define them? --Succu (talk)

How should high quality references be provided? Is it sufficient to use reference URL (P854) to point to a URL only, or should it be done via stated in (P248) referring to a structured item with more details? --Succu (talk) 21:47, 16 March 2017 (UTC)
Just another thought: Once upon a time Cactaceae (Q14560) was a Project:Featured articles (Q16465) (= showcase item). How do additions from a Wikimedia project (sourced by imported from (P143)) like this (Wikispecies) or unreferenced addition based on an external id like this (Quora) have an impact? Shouldn't A class rated items undergo a continuous reevaluation? --Succu (talk) 19:35, 24 March 2017 (UTC)
@User:Glorian WD: Any thoughts? --Succu (talk) 19:38, 11 April 2017 (UTC)
Hey. Sorry, just saw your comments. First of all, I have revised the word "high quality references" on the page. Secondly, I am afraid I do not understand your concern. Could you please rephrase it? Thanks --Glorian WD (talk) 19:47, 11 April 2017 (UTC)
To solid references I assume, Glorian WD? --Succu (talk) 19:52, 11 April 2017 (UTC)


We might need to add a criterion which covers identifiers. For instance, check whether the identifier connects to the correct item in the corresponding database. Maybe Lydia Pintscher (WMDE) can suggest a good criterion for this? --Glorian WD (talk) 20:08, 14 February 2017 (UTC)

Glorian WD (aka User:Glorian Yapinus (WMDE)): Probably you know Wikibase-Quality-External-Validation? --Succu (talk) 21:51, 15 February 2017 (UTC)
I think something like "all external identifiers link to the correct concept in the other project/database/knowledgebase/..." would be ok. --Lydia Pintscher (WMDE) (talk) 10:24, 7 March 2017 (UTC)

Plurality of references

Something we noted in the last discussion ( Wikidata:Requests for comment/Data quality framework for Wikidata ) is that statements should preferably be referenced with multiple ( not correlated ) sources. I'd insert this as a criterion for A items.
--- Jura 06:06, 5 March 2017 (UTC)

I think this makes sense. We don't want the same source for every statement in a high-quality item. --Lydia Pintscher (WMDE) (talk) 10:25, 7 March 2017 (UTC)
Ok. This has been added on the Notes section ;) --Glorian WD (talk) 16:51, 11 March 2017 (UTC)
BTW: Wikidata:Item quality#Plurality of references. I think we'll want to link to this from the criteria. --EpochFail (talk) 16:53, 11 March 2017 (UTC)
EpochFail: How do you weight primary source (Q112754), secondary source (Q905511) and tertiary source (Q1063801) in your Plurality of references? --Succu (talk) 19:47, 11 April 2017 (UTC)
Succu, I'm not sure why you're calling me out specifically. I'm not weighing things. They aren't my "plurality of references". If you have questions about what sources are appropriate, see Help:Sources and Wikidata:Verifiability. If you'd like to extend the criteria to weigh source arity, I think it'd be prudent to start a new thread about it. --EpochFail (talk) 21:44, 11 April 2017 (UTC)
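
One way to make the "plurality of references" idea from this thread concrete is to count how many distinct sources back each statement, and how much of the item leans on a single source. The sketch below is only an illustration under stated assumptions: the data layout (a dict mapping a property ID to a list of source identifiers) and both function names are made up for this example and are not part of Wikidata's data model or of any agreed criterion. It also cannot tell whether two sources are correlated, which would still need human judgement.

```python
from collections import Counter


def distinct_sources_per_statement(statements):
    """Map each property to the number of distinct sources cited for it.

    `statements` is a hypothetical layout: {property_id: [source_id, ...]}.
    """
    return {prop: len(set(sources)) for prop, sources in statements.items()}


def dominant_source_share(statements):
    """Fraction of referenced statements that cite the most common source.

    A value of 1.0 means every referenced statement cites the same source.
    Returns 0.0 when no statement has any reference.
    """
    counts = Counter()
    referenced = 0
    for sources in statements.values():
        unique = set(sources)
        if unique:
            referenced += 1
            counts.update(unique)
    if referenced == 0:
        return 0.0
    return counts.most_common(1)[0][1] / referenced


# Hypothetical item: source names are placeholders, not real references.
item = {
    "P569": ["source-A", "source-B"],
    "P19": ["source-A"],
    "P136": [],  # unreferenced statement
}
print(distinct_sources_per_statement(item))
print(dominant_source_share(item))
```

A reviewer might treat a high dominant share as a hint to look for additional independent sources, rather than as a hard threshold.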

Wikipedia sources

I think the text should make it clear that Wikipedia sources may be on items, but simply aren't considered as references (when such are needed). The current text ("Some statements will be sourced with a reference to another Wikipedia or another Wikimedia wiki, but this is undesirable") may lead people to remove such sources.
--- Jura 06:06, 5 March 2017 (UTC)

Thanks for the suggestion, Jura. To clarify, references to Wikipedia or other Wikimedia wikis are still considered references. However, they are of lower quality than external references (read: External_references). Thus, such references are not in the class A and B criteria.
Pertaining to the text (Note section), what wording would you suggest so that people will not remove references to Wikipedia or other Wikimedia wikis? --Glorian WD (talk) 10:17, 6 March 2017 (UTC)
Somehow it might get clearer if the terms "references" and "sources" are used differently. We require references for some statements and try to include sources for automated statements. Confusingly, both end up in the same "reference" section, that was called "sources"-section.
--- Jura 14:43, 6 March 2017 (UTC)
Seems like Help:Sources does not differentiate the word "references". --EpochFail (talk) 23:09, 6 March 2017 (UTC)
I think it was written before "sources" was changed to "references" ;)
--- Jura 08:50, 7 March 2017 (UTC)

This was the version on the page:

=== Statement sources ===
See also Help:Sources
Most statements should indicate where the data comes from via a source. Sources are not required for undisputed common knowledge, for statements that refer to an external source of information (e.g. authority control), or when the item itself is a source for a statement (e.g. the author of a book).
==== External references ====
Some statements will be sourced with a reference to another Wikipedia or another Wikimedia wiki, but this is undesirable. Where possible, sources should reference external information sources.

However, please note that a reference to Wikipedia or other Wikimedia projects is still counted as a reference. But, the quality of such reference is lower than references to external information sources (i.e. external references).

Here is a rephrased version:

=== Statement references ===
Most claims should include a reference supporting it (this makes a claim "a statement"). Such references are not required for undisputed common knowledge, for claims that refer to an external source of information (e.g. authority control), or when the item itself is the reference for a claim (e.g. the author of a book).
Claims may also include the source from where the claim comes (e.g. "imported from Wikipedia"). Unless it's an external source, this has no impact on the reference requirement.

Please suggest amendments. Please feel free to remove any mention of "claim".
--- Jura 21:19, 11 March 2017 (UTC)

I've reverted your removal. That was unnecessary. You can discuss content on the page without removing it. If you want to make an update, make your edit. If someone disagrees, they can revert you and start a discussion. Do not re-revert them until the discussion has completed. This is the bold, revert, discuss cycle. --EpochFail (talk) 14:35, 14 March 2017 (UTC)
On the merits of your suggested changes, I don't see a clear improvement. What exactly is your issue with how the section is currently worded? It seems like linking to Help:Sources is essential for this section and you seem to have removed that. --EpochFail (talk) 14:40, 14 March 2017 (UTC)

High quality image

Not sure if this should be a requirement. Wikidata is not Commons and sitelinks don't need to be to featured articles either.
--- Jura 06:06, 5 March 2017 (UTC)

Having a statement with a high quality image (when applicable) generally makes for a good data item. High quality images are characteristic of high quality Wikipedia articles. Why not Wikidata items? --EpochFail (talk) 23:12, 6 March 2017 (UTC)
High quality articles are highly desirable as well.
--- Jura 09:13, 7 March 2017 (UTC)
I think the image is a bit special though. It is used to represent the concept more than the article. Think of suggesters for example or Special:Nearby and similar services. --Lydia Pintscher (WMDE) (talk) 10:27, 7 March 2017 (UTC)
So image quality would depend on how its thumbnail shows key features?
--- Jura 10:38, 7 March 2017 (UTC)
The thumbnail would be important, yes. Did you read the criteria we currently have? It clearly states that the subject should be large and preferably by itself in the photo/image. These make for a higher quality thumbnail and other useful characteristics of an image. Please see the past discussion about media quality, #Media quality criteria --EpochFail (talk) 16:23, 7 March 2017 (UTC)
I removed the section for now. Apparently there is no consensus for it.
--- Jura 21:30, 11 March 2017 (UTC)
There was past consensus. Just because you have shown up and gotten confused does not mean that consensus has been lost. --EpochFail (talk) 15:00, 14 March 2017 (UTC)
I don't see a community consensus for this. It seems that it's just a staff proposal.
--- Jura 20:55, 14 March 2017 (UTC)

Item used as value

For class A and B items, one should ensure that these are used as value in a good/reasonable amount of items. This is something we tend to ensure for featured items and something that was brought up in the last discussion ( Wikidata:Requests for comment/Data quality framework for Wikidata ).
--- Jura 06:06, 5 March 2017 (UTC)

It seems to me that this would be a matter of applying property constraints, right? --EpochFail (talk) 23:13, 6 March 2017 (UTC)
No. Maybe @Lydia Pintscher (WMDE): can arrange for internal coordination and get basic questions sorted out prior to WMF staff commenting here.
--- Jura 08:48, 7 March 2017 (UTC)
At least a part of this is covered by constraint reports though via the inverse and symmetric value constraints for example. The question is if we need more of that in the criteria. --Lydia Pintscher (WMDE) (talk) 09:40, 7 March 2017 (UTC)
Montblanc (Q761735) was hardly used in other items when it was proposed as a showcase item. One could easily imagine that some or all items that are using it now would link to another duplicate class E item (cf. "From related items" on the Reasonator page I linked).
--- Jura 10:03, 7 March 2017 (UTC)
That's fair enough. Then the question left is: Is being linked to a quality feature of the item in question or of Wikidata as a whole? --Lydia Pintscher (WMDE) (talk) 10:28, 7 March 2017 (UTC)
Good question. One could also expand it in the opposite direction: imagine (for a given item) all item-datatype statements use duplicate class E items as values.
--- Jura 10:43, 7 March 2017 (UTC)
Seems to me that'd be a larger problem with Wikidata as a whole that would be resolved through merges. --EpochFail (talk) 15:55, 11 March 2017 (UTC)
I added an explanation about this to the criteria. How does it fit?
--- Jura 21:31, 11 March 2017 (UTC)
I don't think it makes sense at all. The presence of a duplicate item does not take away from the quality of an item. If it did, you could change the quality of an item without editing it and that seems deeply problematic. I don't think you've addressed this concern Jura -- or the concern that Lydia raised about this being a feature of Wikidata as a whole. I suggest you do that before making any more edits to the criteria. --EpochFail (talk) 15:17, 14 March 2017 (UTC)

Check for duplicates/merge

For classes A and B (and maybe C), the detailed guidelines should include that one should check for possible duplicate items. To some extent this is included in the description of what sitelinks should be there, but it should be there explicitly.
--- Jura 07:35, 5 March 2017 (UTC)

OK so this may be a fair point about duplicate items, but I don't see why an item would be any lower quality if there were a duplicate somewhere. What's your thinking here? --EpochFail (talk) 23:13, 6 March 2017 (UTC)
A part of them are also hopefully caught via the constraints on external identifiers. --Lydia Pintscher (WMDE) (talk) 09:41, 7 March 2017 (UTC)
Sure, some maybe. A quick search with the item's label should help identify them. Sometimes this leads to multiple items that each have identifiers from different sites.
--- Jura 11:00, 7 March 2017 (UTC)
As discussed above, this seems to be a property of Wikidata and not of the item. --EpochFail (talk) 15:55, 11 March 2017 (UTC)

Languages / translations of labels/aliases/etc.

For class C, I think it should require translations into languages with at least two different scripts (e.g. Latin and Cyrillic).
--- Jura 08:19, 5 March 2017 (UTC)

What's your thinking there? Why is this, specifically, important? --EpochFail (talk) 23:14, 6 March 2017 (UTC)
I think the point is that for names for example the language often doesn't matter to you being able to read them but the script does. --Lydia Pintscher (WMDE) (talk) 09:42, 7 March 2017 (UTC)
I see! That's an interesting point. Do you think we should have independent criteria for script coverage in labels? --EpochFail (talk) 16:51, 7 March 2017 (UTC)
I've added Wikidata:Item quality#Different scripts since we seem to all agree here. --EpochFail (talk) 16:52, 11 March 2017 (UTC)
In that case, I also restored the addition to class C.
--- Jura 21:32, 11 March 2017 (UTC)
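
The "at least two different scripts" idea discussed above can be checked roughly by looking at which writing systems appear in an item's labels. The sketch below is a heuristic offered under assumptions, not an agreed implementation: it takes the first word of each character's Unicode name (e.g. LATIN, CYRILLIC) as a stand-in for the script, and the function names and the input layout (a dict mapping a language code to a label) are hypothetical.

```python
import unicodedata


def scripts_in_labels(labels):
    """Return the set of script names found across all labels.

    Heuristic: the first word of a letter's Unicode name is used as its
    script (e.g. 'LATIN CAPITAL LETTER D' -> 'LATIN'). This is an
    approximation, not the official Unicode Script property.
    """
    scripts = set()
    for text in labels.values():
        for ch in text:
            if not ch.isalpha():
                continue  # skip spaces, digits and punctuation
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts


def has_two_scripts(labels):
    """True if the labels cover at least two different scripts."""
    return len(scripts_in_labels(labels)) >= 2


labels = {"en": "Douglas Adams", "ru": "Дуглас Адамс"}
print(scripts_in_labels(labels))
print(has_two_scripts(labels))
```

Labels in several Latin-script languages (say English, German and French) would count as one script here, which matches the point made above that the script, more than the language, decides whether a name is readable.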

Optional aliases

Not sure if it's of much help to mention aliases. A quick reading of the pages suggests that one has to add them, but looking into the detailed description the "if applicable" leaves it fairly vague.

Obviously, they are good to have, but I wouldn't mention them beyond class A.
--- Jura 08:19, 5 March 2017 (UTC)

Thanks for the insight Jura! Do you mean that class A must have aliases for the most important languages? Personally, I am not sure if there are a lot of items which have aliases in the most important languages. --Glorian WD (talk) 10:10, 6 March 2017 (UTC)
I wouldn't change the wording for class A, but drop mentioning "alias" from all others.
--- Jura 14:15, 6 March 2017 (UTC)
Hmm, I am not sure about dropping the alias criterion for all scales other than A. Perhaps Lydia Pintscher (WMDE) can give a decision about this. --Glorian WD (talk) 23:03, 6 March 2017 (UTC)
The "if applicable" seems to mean "if there are any relevant aliases to file". So I don't think I agree with removing it. Maybe we should clarify it more. For items below class A, it seems that aliases are an easy threshold to cross. I feel like easy thresholds should exist in the lower quality classes. Am I missing something about aliases? --EpochFail (talk) 23:16, 6 March 2017 (UTC)

I rephrased it. I wonder if the class A samples meet the criteria for aliases. "Douglas N. Adams" isn't included in Portuguese and Spanish.
--- Jura 07:28, 11 March 2017 (UTC)

I don't see how your rephrasing makes sense and we certainly haven't reached consensus here. I see that Glorian WD has reverted your changes. Please don't make them again until we've reached a conclusion in this discussion. --EpochFail (talk) 16:05, 11 March 2017 (UTC)
Unfortunately, we can't include the current version if there is no consensus for it. I will revert all additions about aliases.
--- Jura 21:09, 11 March 2017 (UTC)
I removed the stuff about aliases. What version can we agree on? Why had Q42 been a sample item without having the alias "Douglas N. Adams" in Portuguese and Spanish?
--- Jura 21:35, 11 March 2017 (UTC)
Past consensus stands. The first revert leads to a discussion. No more reverts until the discussion concludes. That's how it works. --EpochFail (talk) 15:20, 14 March 2017 (UTC)
Jura, I think Q42 (Douglas Adams) does have aliases in Spanish and Portuguese. --Glorian WD (talk) 15:27, 14 March 2017 (UTC)
@Glorian WD: Please re-read my question. The question is not if it has any alias in Spanish and Portuguese, but all.
--- Jura 20:34, 14 March 2017 (UTC)
What do you mean by "all"? Do you mean Q42 should have aliases in all languages? Perhaps it would be helpful if you rephrased your question. --Glorian WD (talk) 22:30, 14 March 2017 (UTC)
If it's still under discussion, there is no consensus for it. You can't just revert people that remove your additions. Unless it's an office action, I don't see why we should include this in a draft.
--- Jura 20:34, 14 March 2017 (UTC)

The other ..

As a sample, the page currently gives Donald Trump (Q22686) as a class "A" item. Personally, I think it needs a separate item for Donald Trump (Q27947481) to qualify. I think it's debatable if the two need to be interlinked with different from (P1889), but even if they aren't, both need to be in Wikidata. For classes A and B, this should be included in the detailed criteria.
--- Jura 08:19, 5 March 2017 (UTC)

I'm not sure I'm understanding you clearly. Do you think that Donald Trump (Q22686) can't be high quality unless Donald Trump (Q27947481) exists? Or are you saying that Donald Trump (Q22686) can't be high quality unless it is distinguished from other "Donald Trump"s? --EpochFail (talk) 23:20, 6 March 2017 (UTC)
I wonder how that would be different. What usecases do you have in mind?
--- Jura 09:11, 7 March 2017 (UTC)
I asked a specific question. I'm not sure what you are asking about re. "usecases". Maybe you intended to respond to a different topic. --EpochFail (talk) 16:49, 7 March 2017 (UTC)
I'd be happy to answer that, but I don't quite see for which usecases this makes a difference. Can you clarify your question?
--- Jura 21:15, 9 March 2017 (UTC)
Jura1, I'm sorry for the confusion here. You're the one who proposed a change. I asked a clarifying question. And now you're asking me what usecases would apply. I don't know! You're the one who made an (apparently unclear) proposal! --EpochFail (talk) 16:08, 11 March 2017 (UTC)

Add an additional class (e.g. "F")

Items that don't meet the threshold of notability form another class: these should be deleted.
--- Jura 10:37, 5 March 2017 (UTC)

I thought the class "E" has served this purpose already? --Glorian WD (talk) 10:02, 6 March 2017 (UTC)
I don't quite see why you would want to delete the items listed under "E". Why do you think these don't meet our notability criteria?
--- Jura 14:14, 6 March 2017 (UTC)
Notability is orthogonal to quality. You could have a very high quality item about something that is not notable. Does Wikidata even have a notability criteria? --EpochFail (talk) 23:23, 6 March 2017 (UTC)
Yes. At Wikidata:Notability. --Lydia Pintscher (WMDE) (talk) 10:06, 7 March 2017 (UTC)
Ahh! Of course. It seems that this just riffs off of the various other wikis' notability criteria via sitelinks -- which is what I meant to gesture towards. Still I shouldn't have brought it up because the assertion that "notability is orthogonal to quality" stands on its own. --EpochFail (talk) 16:47, 7 March 2017 (UTC)
  • I rephrased class E and included it. Obviously, if people agree that non-notable items can be A/B/C/D, we could remove it. Maybe Lydia wants to comment on this?
    --- Jura 07:28, 11 March 2017 (UTC)
Hi Jura. I apologize for reverting your changes to the criteria; I did so because I thought we had not reached a consensus on those changes. I think we should reach a consensus about those changes first, prior to modifying the criteria. --Glorian WD (talk) 16:09, 11 March 2017 (UTC)
I think you also undid all other changes. Please don't add anything for which there is no consensus.
--- Jura 21:08, 11 March 2017 (UTC)

Community feedback ?

It seems there is quite a lot of discussion among staff (or private accounts of staff members and staff members). Maybe some input from the community should be sought as well.
--- Jura 06:06, 5 March 2017 (UTC)

Hi Jura, I have started a discussion in the project chat (Wikidata:Project_chat/Archive/2017/02#Quality_Criteria_for_Building_a_Tool_to_Evaluate_Item_Quality) prior to moving here. --Glorian WD (talk) 09:50, 6 March 2017 (UTC)
It would probably help if a single staff member was assigned to commenting here. I think it's highly annoying having to explain Wikidata to WMF staff, people who are paid to support Wikidata. @Lydia Pintscher (WMDE): can you arrange for coordination within WMF/WMDE?
--- Jura 08:43, 7 March 2017 (UTC)
Jura1, no one is asking you to explain Wikidata to them. We're asking you to explain your opinions. --EpochFail (talk) 16:43, 7 March 2017 (UTC)
In that case, Lydia Pintscher (WMDE) and myself might have misunderstood your question above ("Does Wikidata even have a notability criteria? --EpochFail (talk) 23:23, 6 March 2017 (UTC)").
--- Jura 23:04, 9 March 2017 (UTC)
Linking me to a policy does not "explain Wikidata". If you're done attacking me for asking for a link, maybe we can focus our discussions on Wikidata:Item quality. --EpochFail (talk) 16:34, 11 March 2017 (UTC)
Well, it's fairly basic for Wikidata and you might want to read it in more detail.
--- Jura 21:24, 11 March 2017 (UTC)

"At least one sitelink (if applicable)" (class D)[edit]

Not quite sure what that would mean.

  • Class D items should have 1 sitelink?
  • Items with 1 sitelink are class D?
  • Items where all sitelinks, but 1 are on a second item?

The last would meet the definition, but I'm not sure if this is what is intended. Maybe we should just remove it.
--- Jura 07:28, 11 March 2017 (UTC)

Items in grade D should have at least one sitelink, meaning they can have 1 or more sitelinks. --Glorian WD (talk) 16:17, 11 March 2017 (UTC)
If it's not applicable, what then?
--- Jura 21:07, 11 March 2017 (UTC)
If the item meets the other criteria on D, then it will fall on D. Else, it will fall on E. Anyway, do you think "if applicable" is potentially confusing people and should be removed? --Glorian WD (talk) 14:10, 14 March 2017 (UTC)
I don't think a number should be included. Maybe the general problem is that there is a relative/absolute way to see this: Wikidata:Project_chat#Item_quality_and_the_number_of_sitelinks.
--- Jura 20:47, 14 March 2017 (UTC)
Ok. So I guess your concern is more about the word "one" sitelink, right? You think that scale D should not mention any number of sitelinks. Probably, it should be rephrased just like the other scales by using the word "some", "most", etc. Do I correctly understand your concern? --Glorian WD (talk) 22:39, 14 March 2017 (UTC)
It seems to me that "one" and "none" are two very useful numbers. I don't see the problem. Maybe Jura1 could provide an example of an item or two that might be inappropriately classified using these constraints. --EpochFail (talk) 13:08, 15 March 2017 (UTC)
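
To pin down one reading of "At least one sitelink (if applicable)" as it was explained in this thread, here is a minimal sketch. It assumes that the criterion is simply waived when sitelinks are not applicable to an item, so that the item's class then depends only on the remaining class D criteria; that reading, the function name, and the `sitelinks_applicable` flag are all assumptions made for illustration.

```python
def meets_class_d_sitelink_criterion(sitelink_count, sitelinks_applicable=True):
    """Check the class D sitelink criterion under one possible reading.

    Assumption: when sitelinks are not applicable to the item, the
    criterion is waived rather than failed.
    """
    if not sitelinks_applicable:
        return True  # criterion does not apply, so it cannot fail
    return sitelink_count >= 1  # "at least one" means one or more


print(meets_class_d_sitelink_criterion(0))
print(meets_class_d_sitelink_criterion(3))
print(meets_class_d_sitelink_criterion(0, sitelinks_applicable=False))
```

Under this reading, an item with no sitelinks where sitelinks would be expected fails the criterion, while an item for which sitelinks do not apply is judged on the other class D criteria alone.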

some brief comments about quality

I think this is an interesting effort, even if all it does in the end is to show how difficult it is to judge the state of completeness of any given item. I think blanket assertions such as "All available information is recorded with reliable references" are of course unrealistic. We will never know if we have all available information or whether the information we have is in fact relevant. I mean, is it relevant that we have the brother linked in an item about an archeologist? We may have it, but that has nothing to do with that person's claim to fame, namely archeology. So would we delete that statement in order to make the item conform to some theory of relevance? No. On the other hand, if all we had on an archeologist item besides Q5+gender was a single statement with a link to their famous brother, then I think the item might not qualify for inclusion in Wikidata, even though the item agrees with our "notability rule" of one link away from something notable. On the subject of label translation, I disagree that we need 7 translated labels for quality. Reach is a different concept than quality in terms of data, though I am inclined to agree that the main purpose of Wikidata is to increase "findability" rather than increase knowledge directly. On the subject of images, for the case of modern art we lack the images on Commons due to copyright restrictions, and this should not impact quality. Though good images may be a bonus, it would be ridiculous to only promote those artworks for which we have high quality images. Art quality doesn't work that way and neither should Wikidata. On the subject of sitelinks, a high number only shows relevance to the wikiverse, but we should be able to measure relevance to Wikidata, i.e. incoming links. So for some specific story from the Bible it would be interesting to measure the number of domains that link to it (art/architecture/literature etc). Just my 2c. Jane023 (talk) 19:26, 11 March 2017 (UTC)

@Jane023: Thanks for your insightful feedback. I think it's good to have more Wikidata contributors comment on it.
  1. Interesting comment about "all available information" and "all relevant statements". It does sound very ambitious for class A. BTW I think your sample archeologist would have human (Q5) as type and be required to be complete compared to all other people.
  2. I think translations into the 7 listed languages are only required for class A, so I think it's "not unreasonably" ambitious for these items. Maybe User:Pasleim/Language_statistics_for_items proves me wrong. It suggests that the gap is most likely in Arabic.
  3. The image quality criteria do seem odd. Obviously, we shouldn't pick a bad image if better ones are available, but I don't think once a reasonable pick is made this should have further impact on the item quality. If none is available, I guess it's not "applicable". As there seems no consensus for this, I removed it for now. Please see #High_quality_image above.
  4. The parts about sitelinks are fairly vague. I tend to think that they are always complete if someone actually did some consistent editing on an item. The main risk is duplicates (IMHO); apparently some (non-contributing) user doesn't want to have this checked systematically (see #Check_for_duplicates.2Fmerge above). The only specific number of sitelinks is mentioned at class D. I don't quite get that, see #.22At_least_one_sitelink_.28if_applicable.29.22_.28class_D.29 just above.
  5. I added a suggestion about links from other items: see #Item_used_as_value above. You might want to comment there.
    --- Jura 22:16, 11 March 2017 (UTC)
Hi @Jane023:. Re. all available information, we're trying to mimic what's done in Wikipedia regarding articles. Specifically, the Featured Article criteria states that FA quality articles are "comprehensive: it neglects no major facts or details and places the subject in context;". I think we're going for something similar here. It's hard to judge that an item has "comprehensive" coverage, in the same way that it's hard to do for an encyclopedia article, but yet, it's what we strive for.
Regarding the image quality, I think you might have not read the "if applicable". Maybe we should change that to "if available". If you'd like to discuss the media quality criteria more, I suggest we continue in the relevant section and ping the people who formed the last (arguably light) consensus. Please see #Media quality criteria. --EpochFail (talk) 15:12, 14 March 2017 (UTC)
Well my main problem is in fact the idea that you can copy any such quality model from one of the other projects (Commons or Wikipedia) and expect that to work for Wikidata. What is considered comprehensive for one language will not match what another language wants or needs to know. Especially for places, so e.g. in nlwiki many Dutch city articles have their public transit lines listed, but this is information that casual readers in other languages most probably don't need to know. I guess the main reason I edit Wikidata is because it enables me to track my progress easily in the domains that interest me. Over time, it has enabled insights that wouldn't have occurred to me otherwise. So for example I had no idea that most art museums contain less than 5% artworks by women in their holdings. I also assumed that the "greatest works" of any artist would be the same in any language wiki. That said, it is also fascinating to see that most artworks held by Western art museums are in certain fashionable art movements such as "impressionism". I guess what Wikipedians will choose to write about reflects what art museum goers want to see in their locale. In any case, I would say the main value of Wikidata is enabling queries in any language, and having references per statement in some monolingual language along the lines of the Wikipedia Featured Article requirement is not in fact very useful to Wikidata users who don't speak that language. Even adding a reference at all is not easy, unless you are adding a link. Most references on Wikipedia are from books, and those books don't have Wikipedia articles. To make a reference in a Wikidata statement to a book, you first need to create an item for the book. The same goes for external ids, though I guess certain large databases might be beneficial to more than one language. As a visually oriented person, I do try to illustrate Wikidata items when possible, but the quality of the image is not a main target. 
Generally speaking any image at all is better than none, because generally the image has a wider reach than any text-based description. Any item might be linked from eight or ten other items, and it is this interlinked aspect that adds the most value to users. So an artwork item might have a very high quality without an image, because it has an extraordinary provenance or is cited as a source for lots of other copies, or because it inspired scientific journal articles for some reason, etc. Jane023 (talk) 16:16, 14 March 2017 (UTC)
It seems like we're continuing to discuss the image quality outside of the #Media quality criteria topic. I'm going to move some of your statements there and respond. I hope you'll follow me there. I'll respond to the rest of your comment separately. --EpochFail (talk) 18:32, 14 March 2017 (UTC)
Surely you mean to say that we can't copy a quality model from other projects and just expect it to work. I certainly agree. That's why we're iterating on a lot of Wikidata-specific concerns here. I think you're hitting on a difference between what the expected coverage of a topic is between, say, English Wikipedia, Dutch Wikipedia, and Wikidata. I think that's just fine. Of course, this is Wikidata after all and we get to define this in ways that make sense to the work we do here. We get to set our own definition of what "complete" or maybe "comprehensive" means in this context. I'd say that there's a strong weight put on a set of expected/common properties, e.g. an item with instance of (P31):human (Q5) should have sex or gender (P21), but only might have doctoral advisor (P184). It seems that criteria like this have found easy consensus (there's a lot of tooling & research around property recommendation). But you're pushing on the idea that there's any judgement at all that can be applied to items to say whether or not they are roughly "complete"/"comprehensive". I think that someone, at some point, applied this judgement to the showcase items (e.g. Douglas Adams (Q42)). I'm really hoping we can capture some aspects of that judgement in the criteria, but at a bare minimum, we can say "this judgement must be made" (which roughly translates to "someone thought this was comprehensive for some definition of 'comprehensive'"). --EpochFail (talk) 18:44, 14 March 2017 (UTC)
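To make the "expected/common properties" idea above concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the `EXPECTED` table and the function `missing_expected` are hypothetical names, the item is represented as a plain dict mapping property IDs to lists of values rather than anything fetched from Wikidata, and the only expectation encoded is the P31/P21 example from the comment above.

```python
# Hypothetical lookup table: for a given "instance of" (P31) value, the
# properties we expect an item to have. Only the human (Q5) -> sex or
# gender (P21) pairing comes from the discussion; anything else would
# have to be agreed on separately.
EXPECTED = {"Q5": {"P21"}}


def missing_expected(claims):
    """Return the expected properties that the item lacks, sorted.

    `claims` maps property IDs to lists of values, e.g. {"P31": ["Q5"]}.
    Optional properties such as doctoral advisor (P184) are not checked.
    """
    missing = set()
    for cls in claims.get("P31", []):
        for prop in EXPECTED.get(cls, set()):
            if not claims.get(prop):
                missing.add(prop)
    return sorted(missing)


# Illustrative items; the values are placeholders, not real data.
item_ok = {"P31": ["Q5"], "P21": ["placeholder_gender_value"]}
item_gap = {"P31": ["Q5"], "P184": ["placeholder_advisor"]}

print(missing_expected(item_ok))   # []
print(missing_expected(item_gap))  # ['P21']
```

A check like this only covers the mechanical part of the criterion; whether an item is "comprehensive" beyond the expected properties would still be a human judgement.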
Yes I am saying we can't copy quality models from other projects. No I am not saying we can't judge anything. I am just not convinced we can measurably judge anything yet. Showcase items like the one you linked or I linked were and I guess still are used a lot for newbies to help them understand how statements work with qualifiers and properties. They are not the same thing as quality indicators in the sense that you can measure those individual statements on those items as a measure of quality. When it comes to questions of quality, most people associate this with a usage score, i.e. is the thing fit for purpose etc. When you give a person a hammer and no other information or tools however, everything around them starts to look like a nail. When you look at one of these showcase items in specific domains, and then change your language to something other than English, like Macedonian, it can be very educational to see the number of Q numbers for labels on properties and qualifiers. The way an item hangs in a hierarchy of things is important to the way we absorb information, and it is this "being anchored in the hierarchy" that should be measured, not the typical "is this statement sourced with three references" type of Wikipedia thinking. So we should identify key hierarchies (ones that ARE translated into lots of languages) and then somehow measure an item's placement within such hierarchies as a measure of quality. Jane023 (talk) 11:29, 16 March 2017 (UTC)
EpochFail: we're trying to mimic what's done in Wikipedia regarding articles. That's the problem. It doesn't work here that way. --Succu (talk) 22:40, 14 March 2017 (UTC)
That's one opinion. I don't think that the term "mimic" captures what we're doing here. I'd say we're inspired by what Wikipedians do WRT articles. In the thread that we posted on Wikidata:Project chat a month ago, some folks asked specifically for something that looked like enwiki's wp10 scale and that makes sense to me too. We even have Lydia Pintscher here helping us. So, obviously you don't speak for me or the rest of the Wikidata community. It seems like we've been doing a pretty good job of capturing an ordinal scale representing quality so far and I have faith that we'll be able to keep iterating on it to make it better. If you're ideologically opposed to this effort, you don't need to help us. That's OK, of course. --EpochFail (talk) 13:04, 15 March 2017 (UTC)
You used the term mimic, not I, and I'm not ideologically opposed to this effort. As Lydia can probably confirm, I have been concerned about Wikidata's reliability since the beginning of my involvement in this project. I simply don't think your approach (vote on a scale from A to ?) helps to make WD more accepted within the Wikipedia communities. --Succu (talk) 22:21, 16 March 2017 (UTC)

Additional class "NA" ?

For category, template, disambiguation, Wikinews page and Wikisource page items, I'd create an additional class. I don't see much benefit in attempting to measure their quality. A flag could probably be calculated.
--- Jura 15:14, 12 March 2017 (UTC)

In English Wikipedia, they have a separate quality scale for "lists" and I do believe that "disambiguation" pages get excluded. Could we have "NA" appear in a template somewhere? I don't think it belongs on this scale. However, maybe we could have a statement about which items are intended to be rated with this scale. What would you call the items that are not category/template/disambig/etc.? --EpochFail (talk) 14:58, 14 March 2017 (UTC)

Page protection

It seems that Ladsgroup has protected the page due to edit warring. If we could all take a breath and have some calm & productive discussion, then I'm sure Ladsgroup will lift the protective status. Ladsgroup, what would you like to see here before you unlock the page? In the meantime, I guess we'll ping you about changes that reach consensus. --EpochFail (talk) 13:57, 15 March 2017 (UTC)

Number of sitelinks

It seems that Jura1 has posted on Project_chat about the number of sitelinks. It seems that he's asking a question about "relative" and "absolute" measurements, but I think the real question is this: What does it mean when we say "All appropriate sitelinks to corresponding Wikimedia projects"? Does it mean that an item must have a sitelink-able article on all ~900 wikis or it can't be A class? Or does it mean that all projects with a relevant article have a sitelink? I think that it's 100% clear to us (the authors of this page) that we mean the latter, but that's not clear to a reader from the wording of the criteria.

I propose that we change "All appropriate sitelinks to corresponding Wikimedia projects" to something like "All appropriate sitelinks to corresponding pages that exist on other wikis" (with emphasis). Alternatively, we could have a section in the notes linked from "appropriate sitelinks" that goes into detail about which sitelinks are required, but I imagine that would be a very short section that essentially says the same thing. --EpochFail (talk) 14:30, 15 March 2017 (UTC)

This is the wrong approach

Any item is only relevant given the items it is connected to. It follows that many items are relevant because the amount of items that connect to them. For instance for a university, the number of items for "educated at" and "employed at" are certainly more relevant than the plain facts. When you talk about quality; the quality is not only in the numbers but also in the comparison with other sources. When Wikidata has more relations than a Wikipedia category for instance, it has a higher quality.

Quality exists in the comparison with other sources. When other sources confirm what Wikidata indicates, it is a sign of quality. When the level of detail is more precise in Wikidata than in an external source (some sources only indicate the year for birth and death) it is a sign of quality. When Wikidata differs from other sources and it has sources for the statements that differ it is a sign of quality.

The benefit of an approach like this is that quality becomes actionable. When Wikidata differs from other sources, it is an invitation to seek clarity, find sources, and curate both Wikipedia and the external source. Static quality as proposed will do us no good. Thanks, GerardM (talk) 08:47, 17 March 2017 (UTC)

You say that "many items are relevant because the amount of items that connect to them.", but I think you might be confusing "quality" with "importance". Here's a copy of a response I gave to you about this topic in another thread.
GerardM said "the value is in the connections between the items and not in the item itself" and I think we're in agreement here. Statements connect items directly and like sets of statements (E.g. instance of (P31):human (Q5) and occupation (P106):singer (Q177220)) allow interesting cross sections to be drawn across items. If you're talking about counting the indegree of an item's links, I think we're talking more about "importance" (see PageRank for a deeper discussion of network theoretic importance measures). I'm working on that too. See m:Research:Understanding Wikidata's Value.
When you say "Quality exists in the comparison with other sources. When other sources confirm what Wikidata indicates, it is a sign of quality." I think you're stating a basic tenet of Wikimedia validity that Wikidata:Verifiability captures pretty well. When we assess quality, I believe that making sure that claims are not just sourced but that the source actually supports the claim is implied at a more basic level.
You keep saying "actionable" and you say the grading scale described here is "static", but what do you mean by that? How is quality not "actionable"? How is the scale "static"? --EpochFail (talk) 16:57, 29 March 2017 (UTC)
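For readers unfamiliar with "indegree" as used above, here is a rough sketch of counting it. It assumes statements are available as simple (subject, property, target) triples; the function name `indegree` and all identifiers in the sample data are made up for illustration and are not real Wikidata items.

```python
from collections import Counter


def indegree(statements):
    """Count how many distinct items link to each target item.

    `statements` is an iterable of (subject, property, target) triples.
    Several statements from the same subject to the same target count once.
    """
    sources_by_target = {}
    for subject, _prop, target in statements:
        sources_by_target.setdefault(target, set()).add(subject)
    return Counter({t: len(srcs) for t, srcs in sources_by_target.items()})


# Hypothetical sample data, loosely following the university example.
statements = [
    ("person_1", "educated at", "university_x"),
    ("person_2", "educated at", "university_x"),
    ("person_2", "employer", "university_x"),
    ("person_3", "employer", "museum_y"),
]

counts = indegree(statements)
print(counts["university_x"])  # 2
print(counts["museum_y"])      # 1
```

A count like this says something about how much other items depend on a target, which is the "importance" signal referred to above, and it is computed independently of how complete the target item itself is.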
The problem with the word "sources" is that there are two kinds. There are sources like books, papers, etc. There are also Sources like a Wikipedia article, VIAF, etc., which make statements with or without attribution to a source. When Wikidata is in alignment with another Source, it follows that the statements are the same. This implies that we can concentrate on the things where we differ. This is a much more useful application of our time. This is what I mean by actionable. With all the items that we have, there is more value in curating the differences, finding sources for what is right. In this the lack of statements in Wikidata can easily be remedied by adding them from a Source (it is what we do all the time, from Wikipedia, from Swiss-Prot) and it does not diminish our quality really.
I do not subscribe to your notion of "importance". An item for a person may be necessary to complete the people who won an award, who are the founders of a fraternity or are the venue where an artist had a show. When you compare an item with an article, the links in the article versus the statements in an item, it would be great when we can model them all. THAT is for me the level of quality we should strive for. THAT requires that we have an item for each red link. THAT will improve quality. THAT will allow for a better comparison between articles in different Wikipedias as well. It will prevent a large amount of the faulty wiki links that I find regularly (more than 1%).
The current grading scale is static because in the way it has been presented so far it is all about the individual item. I do not care that much about the value in the grading scale of a single item. Only once the value on a scale is queryable in combination with a data set, it becomes possible to use it as a tool that allows for determining what to do for the people who have an interest in a subset. For instance, what people who have "catalog" "Black Lunch Table" have no "profession" and have a low value in the grading scale, here you find the easy wins.
Personally I care more about "Sources" and less about "sources". That is for other people with the typical Wikipedia outlook on quality but at the same time I do really applaud all the tools that are being employed for adding source statements. You will get more people enthused to work on this when they can target the subset they care for, so I hope that these tools enable the targeting of a subset. Thanks, GerardM (talk) 19:04, 29 March 2017 (UTC)
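The "easy wins" query described above (items in a given catalog, with no profession, and a low grade) could look roughly like this once grades are available alongside the data. This is a sketch only: the dict layout, the function name `easy_wins`, and the choice of D and E as "low" grades are all assumptions made for illustration, and the sample items are invented.

```python
def easy_wins(items, catalog, low_grades=("D", "E")):
    """Return IDs of items in `catalog` that lack an occupation and have a low grade.

    Each item is a dict with the hypothetical keys "id", "catalogs",
    "occupation" and "grade".
    """
    wins = []
    for item in items:
        in_catalog = catalog in item.get("catalogs", [])
        no_occupation = not item.get("occupation")
        low = item.get("grade") in low_grades
        if in_catalog and no_occupation and low:
            wins.append(item["id"])
    return wins


# Invented sample data.
items = [
    {"id": "item_1", "catalogs": ["Black Lunch Table"], "occupation": [], "grade": "E"},
    {"id": "item_2", "catalogs": ["Black Lunch Table"], "occupation": ["painter"], "grade": "E"},
    {"id": "item_3", "catalogs": ["Black Lunch Table"], "occupation": [], "grade": "B"},
    {"id": "item_4", "catalogs": [], "occupation": [], "grade": "E"},
]

print(easy_wins(items, "Black Lunch Table"))  # ['item_1']
```

The point of the sketch is that a per-item grade becomes useful for targeting work as soon as it can be combined with other conditions over a subset.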
GerardM, it seems that you are arguing for a practical process for maintaining consistency between Wikipedia and Wikidata (Sources). I do not see how that is at odds with the quality scale that is being proposed here. I'm a fan of consistency as I'd imagine others are. It seems that verifiability is also critical. I don't see how they are at odds.
Re. importance, this is not my notion, but rather that of a field of people who have been building up measurement theory for years. It seems that it would help you make sense of this stuff if you'd familiarize yourself with what we know about the signal that can be extracted from indegree. This is why I linked you to articles about it.
FWIW, I think we're on the same page with regards to collections of like items. See the discussions below. I'm not sure what to say other than I agree with you that *collections are important*. Having a complete set of items in a collection above a certain quality level means that the collection is highly valuable as a dataset. That's the point of this scale. It's actually not that unique to Wikidata. WikiProjects in Wikipedia look at collections with the same group-wise view. It's better to have a C-class article for all scientists than to just have a few Featured Articles.
The quality model we're building will allow you to target subsets of items that are mostly complete or completely lacking of something important. --EpochFail (talk) 20:58, 29 March 2017 (UTC)

Language support

When language support is considered to be a quality indicator, the question should be why? When it is just to collect labels, Amir has a bot that he runs regularly that ensures that a label is added when a Wikipedia has an article. When language is to be relevant, make it relevant and support text generation based on Wikidata statements. This has been done in several languages in Reasonator for many years and it is a reason to add statements.

Limiting the number of languages to just a few and at that the biggest is a missed opportunity. Again; quality should be actionable. When labels are added it should mean something. I use Reasonator because it makes a difference to add statements when I disambiguate. I add labels when I find that I easily can. I do it because added quality means additional benefit.

Quality needs to be actionable and should not be static. Thanks, GerardM (talk) 08:53, 17 March 2017 (UTC)

"I add labels when I find that I easily can. I do it because added quality means additional benefit." it seems you answered your own question. See past discussions /Archive_1#Languages, /Archive_1#Grading_scheme, and /Archive_1#Translating_the_text for why we are pushing on "important" languages. --EpochFail (talk) 16:18, 29 March 2017 (UTC)
Struck out one link because it's not relevant after all.--EpochFail (talk) 16:49, 29 March 2017 (UTC)

Pilot Campaign

Hi Q.Zanden, Jane023, Tubezlob, Nemo, Multichill, Jheald, Abián, Alessandro Piscopo, and others that I may miss to mention!

EpochFail and I have launched the pilot campaign to test the Wikidata:Item quality scale. We've loaded 250 items into Wiki labels and we're hoping you'll help us try to label them using the current version of the quality scale. To participate in the pilot campaign, you can go to, and click “Request Workset”. Afterwards, you can grade items using the criteria described in this page. This shouldn't take us long to do. Once we're done, we'll report some statistics about our agreement with regard to the grading scale and iterate from there.

Thanks for your help! --Glorian WD (talk) 10:43, 22 March 2017 (UTC)

I did one set of ten, but I encountered a few very specific biological terms without any sitelinks, but with lots of statements and references. I didn't know what grade to give them. Maybe there should be some more specific rules about this type of biological term... (item Q27592994). Also, most of the scientific articles do not have many sitelinks, but can be very comprehensive. How should we handle these? Q.Zanden questions? 14:31, 22 March 2017 (UTC)
Hi Q.Zanden! Thanks for helping! I hope you will label some more items too :). I'd say biological terms that have no sitelinks but lots of statements and references could fall into class "D", since class "D" may not have sitelinks (note the "if applicable" on the corresponding criterion). Nevertheless, I would say that you can rely on your intuition in determining the grade for such items. In my opinion, it makes sense that very specific biological terms such as Q27592994 do not have any sitelinks. I am not sure if we should create articles for very specific biological terms. What do you think about creating articles for such biological terms, EpochFail?--Glorian WD (talk) 20:01, 22 March 2017 (UTC)
Items like this manifest the difference between a Wikidata entity and a Wikipedia article. --Succu (talk) 21:00, 22 March 2017 (UTC)
I'm not a fan of setting a minimum sitelinks criteria. In this case, there would not be any applicable sitelinks, so zero would be just fine. --EpochFail (talk) 22:07, 22 March 2017 (UTC)
So what's your rating for Newbury-1 virus (Q18967535)? --Succu (talk) 22:23, 22 March 2017 (UTC)
I'd say "D". What do you think? --Glorian WD (talk) 22:47, 22 March 2017 (UTC)
„containing all relevant statements, with solid references“ it's A --Succu (talk) 22:56, 22 March 2017 (UTC)
I would argue it won't be an "A" item because it lacks translations. --Glorian WD (talk) 11:05, 23 March 2017 (UTC)
That's quite a difference in grade. But I would say it's C. There are a few important statements, but I think there are more that apply to this virus. Q.Zanden questions? 23:02, 22 March 2017 (UTC)
Q.Zanden, there is always a "more" or "better" involved. Do you miss statements that could be mapped to existing properties? --Succu (talk) 22:29, 24 March 2017 (UTC)
Hey :) just to clarify: When I added the sitelink criteria to the showcase item page ages ago my intention wasn't to enforce that there have to be articles on Wikipedia or any other project about the topic. The intention was to make sure we have links to all articles that already exist. If none exists that is fine. No new articles need to be written to improve the quality of the item. --Lydia Pintscher (WMDE) (talk) 10:50, 23 March 2017 (UTC)

I have an idea: how about adding "if applicable" to the criterion related to sitelinks on all quality scales? That way, specific biological terms could fall on higher quality scales (i.e. A or B). Using this modified criterion, biological terms which have a comprehensive set of statements, references, and translations but no sitelinks could fall into class "A", because it may not be applicable for these items to have sitelinks. --Glorian WD (talk) 11:03, 23 March 2017 (UTC)
Yeah. And maybe reword it to "if the article exists" or something? --Lydia Pintscher (WMDE) (talk) 11:20, 23 March 2017 (UTC)

I encountered also a redirect item. Q.Zanden questions? 22:18, 22 March 2017 (UTC)

Yeah. This has been noted. Thanks for the feedback Q.Zanden! --Glorian WD (talk) 22:31, 22 March 2017 (UTC)

Bad approach according to my experience

The proposed approach, which aims to use only 5 categories, is useless for quality improvement. The reason is the mixing of 3 different criteria in one system.

Just take the description of class A: "Items containing all relevant statements, with solid references, and complete translations, aliases, sitelinks, and a high quality image".

Nice description but please explain us how you classify "Items containing all relevant statements, with no references, and complete translations, aliases, sitelinks, and a high quality image" ? Class A or B ? Or E ? Having three parameters mixed in one final classification can lead to very different situations mixed in the same class.

For example how can I differentiate an item having all relevant statement but without references from an item having most of relevant statements but with some references. The number of references in one compensates the number of relevant statements. How the proposed classification can distinguish between these different cases ?

Second problem: the use of the quality classification for improvement purposes. If I am a contributor speaking several languages, I can be interested in improving the quality of items by working on item labels and descriptions. How can I use the classification to target only the items with very few translations and not the items having few relevant statements or few references ?

My proposition is to use a different classification for each parameter. For example 3 classes for the number of relevant statements, 3 for the number of references and 3 for the number of translations for labels and descriptions. This leads to a global classification of 27 classes. For simplification, these 27 classes can then be grouped into 3-5 groups, but in any case we can retrieve the original classification based on 3 independent marks. Snipre (talk) 21:17, 22 March 2017 (UTC)

Class | Relevant statement | Reference | Labels and descriptions
Class A | >90% of the relevant statements are present | >90% of the statements have at least one reference | >90% of the labels/descriptions are translated
Class B | >50% of the relevant statements are present | >50% of the statements have at least one reference | >50% of the labels/descriptions are translated
Class C | <50% of the relevant statements are present | <50% of the statements have at least one reference | <50% of the labels/descriptions are translated

This leads to a multi-evaluation like AAA, AAB, ABB,... Then all combinations composed of at least one A and without C (AAA, AAB, ABA, BAA, ABB, BAB, BBA) can be grouped in a superclass A. Snipre (talk) 21:51, 22 March 2017 (UTC)
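The three-letter rating and the superclass A rule described above can be sketched as follows. This is only an illustration of the proposal, not an agreed implementation: the function names are invented, the inputs are assumed to be fractions between 0 and 1, and because the table leaves exactly 50% unspecified, the sketch assumes it falls into class C.

```python
from itertools import product


def grade(fraction):
    """Map a completeness fraction (0.0 to 1.0) to A, B or C per the table.

    Assumption: exactly 50% is treated as C, since the table does not say.
    """
    if fraction > 0.9:
        return "A"
    if fraction > 0.5:
        return "B"
    return "C"


def rate(statements, references, labels):
    """Combine the three per-dimension grades into a rating such as 'ABB'."""
    return grade(statements) + grade(references) + grade(labels)


def in_superclass_a(rating):
    """Superclass A: at least one A and no C, as stated above."""
    return "A" in rating and "C" not in rating


print(rate(0.95, 0.6, 0.7))     # ABB
print(in_superclass_a("ABB"))   # True
print(in_superclass_a("BBB"))   # False

# Enumerating all 27 ratings recovers the seven combinations listed above.
all_ratings = ["".join(p) for p in product("ABC", repeat=3)]
superclass_a = sorted(r for r in all_ratings if in_superclass_a(r))
print(superclass_a)
```

Keeping the three letters around means the single superclass can always be traced back to the dimension that is holding an item back, which is the main point of the proposal.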

Hi Snipre, this makes sense and I agree that quality is certainly multidimensional. However, it's so useful to put quality on a single scale as that's how we've been doing it everywhere else for years in all the other wikis. Have you ever had experience with a multidimensional rating scale for quality? Was it on a wiki? I'm a big fan of keeping things simple, but I think that there is certainly merit to your concerns about dimensionality. In my personal opinion, an item with lots of relevant, but unsourced statements *is* low quality but it would be *better* if we could differentiate that item from an item with few relevant, but sourced statements.
One cool thing that we can do with this is continue to label items on a single dimensional scale and allow the model to tell us what criteria seem to be holding an item back from a higher quality level. We're working on that right now for the article quality models in Wikipedia. See Phab:T158679. The cool thing about this strategy is that we don't have to think of all of the dimensions ahead of time. --EpochFail (talk) 22:17, 22 March 2017 (UTC)
So you agree but because it works well for Wikipedia means that it is still better to "keep it simple". The other projects were wikipedias and not Wikidata. It is different and for a reason that easily enables the wrong kind of behaviour you persist? Thanks, GerardM (talk) 05:19, 23 March 2017 (UTC)
GerardM, I think the importance of maintaining simplicity applies on broad subjects, not only Wikipedia or Wikidata. In the meantime, I have tried to label some items on the pilot. I learned that a more complex criteria could potentially overwhelm editors in labeling items, something that we all want to avoid. Speaking of wrong behavior, I would humbly think you should also reflect on how you express your opinion to EpochFail. --Glorian WD (talk) 15:51, 23 March 2017 (UTC)
When there is a problem with overwhelming editors, it is a machine that will apply those same complexities. The problem that we face is that a flawed model will allow for deletion arguments for items that have a direct bearing on quality. When a professional like EpochFail insists on a flawed approach and insists that "we are to blame", he can be and should be called to answer for the arguments in his position. When he is responsible for your experiment, he should accept that when he does not reflect on the arguments and insists that the model is valid he gets in equal measure of what he has been giving out.
For you and a team (that may include Aaron) to work on an initial approach to individual quality assessment is no problem. What is not acceptable is calling it valid based on a set of data that does not protect the data from policies but is based on a mistaken importance given to an initial set of data that is difficult to grade by trainers. It is not acceptable because policies invariably arise from such a quality approach. Thanks, GerardM (talk) 06:16, 24 March 2017 (UTC)
The table looks fine to me, but one thing: all disambiguation pages and Wikimedia categories are automatically translated into all languages. That means that they would all be in superclass A because >90% of the labels and descriptions are translated. I have my doubts if that is what we want with these sorts of items. Q.Zanden questions? 22:23, 22 March 2017 (UTC)
@QZanden: No, because there is no reference for disambiguation page and Wikimedia category items, so you will get an A for the labels/descriptions but a C for the references. If we assume an A for the relevant statements, the score is ACA, so not part of the superclass A but merely part of superclass B. Snipre (talk) 22:09, 23 March 2017 (UTC)
@EpochFail: Just have a look at the current classification used in Wikipedia: just as example this evaluation table for chemical articles in WP:en.
Two parameters are evaluated for each article: the importance of the articles and the quality of the articles. The table allows you to access to the category you want without any need to have one unique quality value.
And if you read my proposition again, I didn't forget the importance of having one value for communication purposes (read again my proposition of superclass), but we should be able to use this information. However, your target is to have only one value, without the possibility of going deeper in the analysis to extract how the quality assessment of an item was obtained. And as you propose to use different parameters, you have to provide the influence of each parameter on the final value. Snipre (talk) 20:10, 23 March 2017 (UTC)
Hey Snipre. I think I see the point of confusion. We're not intending to evaluate the importance of a Wikidata item -- just its quality. As you say, importance is independent of quality. I'm working on an independent analysis of importance in Wikidata and Wikipedia. See m:Research:Automated classification of article importance and m:Research:Understanding Wikidata's Value. I think we'll want to continue to look at these attributes independently. When it comes to nuanced quality, I agree that we can and should discuss the influence of each parameter on the final value. I think you're looking at the scale as though it is supposed to be mechanically applied. Personally, I look at it like it is a set of guidelines that are intended to be discarded (or fixed) when they don't make sense. The machine learning process will eventually learn weights (influence) that should be applied to reflect human judgement and those will, of course, be reported. --EpochFail (talk) 17:05, 29 March 2017 (UTC)
Support User:Snipre with multi dimensional ratings
At least statements/references should be treated differently, because of how data is internally organized at Wikidata.
  • References are added using special tools at Wikidata (standard tools are quite slow to perform mass edits)
  • We have additional steps (statement nodes-reference nodes) to access references
Suggestions for statements are not directly applicable to references. d1g (talk) 07:38, 18 May 2017 (UTC)

m:Research:Wikidata gap analysis

I don't remember if this was linked. --Nemo 23:47, 22 March 2017 (UTC)

Wikipedia references are apparently OK for images

The article says "Some statements will be sourced with a reference to another Wikipedia or another Wikimedia wiki, but this is undesirable".

Actually, it is not always true: Wikipedia references are apparently OK for images. See

Could someone with write permission please mention that?

Thanks! Syced (talk) 09:18, 23 March 2017 (UTC)

Hi Syced, it seems the criteria do say "note that a reference to Wikipedia or other Wikimedia projects is still counted as a reference" and explicitly link to Help:Sources for more information. It doesn't seem like the conversation you've linked to has come to a conclusion. In fact, it seems like JakobVoss has laid out several options for how to source an image for an item. I don't see any conclusion that suggests which option should be considered policy. Still, I think you've brought up a good point about explicitly saying that there are exceptions to the need for "external" sources, so I've edited the page to clearly say "See Help:Sources for exceptions." --EpochFail (talk) 13:59, 5 April 2017 (UTC)

From showcase items to showcase datasets

(... or, at least, to viably useful datasets)

To an extent this echoes what other people have already written (and some at length), but -- especially compared to Wikipedias -- I think it is worth noting that for many of us, I think our natural focus and programme of activity here is centred on the dataset (ie a group of items), rather than the individual item. In that respect, individual editors here are perhaps more like WikiProjects on Wikipedias than individual editors there, and the kind of progress readout we may be most interested in, or most motivated to put on sub-pages of our user pages, may be more like the broader progress tables such as eg this for en:WP:Physics rather than progress on individual items. That's the kind of thing where it might be interesting to see eg month-by-month rating comparisons, that might actually be motivating.

Typically such sets will be defined by the intersection of one or more property-value pairs, which may vary quite widely from editor to editor. For example User:Jane023 has done amazing work focussed on various sets of items with creator (P170) = a particular artist; User:GerardM has recently been very involved helping activity around items with catalog (P972) = Black Lunch Table (Q28781198); various botanical and biological specialists will work on a group of items at a time defined by parent taxon (P171)* = some taxon; I have recently been working on items with instance of (P31) = civil parish (Q1115575), and other types of UK administrative entity. These are the kind of sets that it would be useful to be able to produce time-series (or even just snapshot-sequences in page histories) to show evolution in overall quality as a whole for.

Another thing I would say is that, unlike a Wikipedia where editors get very involved in getting single items up to Good Article or Featured Article status, I am not so interested in the higher ranges of your quality scale. Rather than the ratings for the best items, I am much more interested in where the rating for the 90% or 95% or 99% percentile down towards the bottom of the set comes in, and to try to raise that -- because that (to me, with a personal interest more in queries than in infoboxes) is the difference between a usable and a non-usable dataset.
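As a rough illustration of the "where does the 90% / 95% / 99% point come in" idea above, here is a minimal Python sketch. It assumes each item in the set already carries a single letter grade on the E-to-A scale; the function name `grade_reached_by` and the sample distribution are hypothetical and only serve the example.

```python
# Grade order from worst to best, following the E-to-A scale discussed on this page.
GRADE_ORDER = "EDCBA"


def grade_reached_by(grades, percent):
    """Return the best grade that at least `percent` % of the items reach or exceed.

    `grades` is a list of single-letter grades, one per item in the set.
    `percent` is an integer from 1 to 100.
    """
    if not grades:
        raise ValueError("need at least one graded item")
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    # Sort from best to worst, then walk down to the item that sits
    # at the requested share of the set.
    ranked = sorted(grades, key=GRADE_ORDER.index, reverse=True)
    cutoff = (percent * len(ranked) + 99) // 100  # integer ceiling
    return ranked[cutoff - 1]


# Hypothetical set of 100 items: 5 A, 20 B, 50 C, 20 D, 5 E.
sample = ["A"] * 5 + ["B"] * 20 + ["C"] * 50 + ["D"] * 20 + ["E"] * 5
for p in (50, 90, 95, 99):
    print(p, grade_reached_by(sample, p))
```

Tracking this value month by month for a set would show whether the bottom of the set is moving, independent of how good the best items are.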

One thing that worries me though, with a dataset of maybe ten or twenty thousand items, is that one could spend eg a month and make a lot of edits -- have QuickStatements going round the clock -- yet after all that, one might still only have turned a largely unworked-on E dataset into perhaps only a D, or maybe a C-minus -- potentially quite dispiriting for so much work. What would be motivationally useful would be to have much more detail, than just the single atomistic scores: to be able to see, plotted as a progress bar on that group review page, how each component element that fed into the overall score was doing across the aggregate -- eg how complete are each of the key properties (and which are judged "must-have"/strongly expected); which have the most constraint conflicts; which have statements that are over-broad/insufficiently precise. This would feel like a much deeper assessment; and at the same time, for a big dataset, just to push up one of those quality bars would feel like a recognisable (and, now, recognised?) achievement -- much more motivational.
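The per-component readout described above could look something like the following Python sketch. It is only an illustration under stated assumptions: items are represented as plain dictionaries mapping property IDs to lists of values, the item keys and values are made up, and `property_coverage` and `progress_bar` are hypothetical helper names, not part of any existing tool.

```python
def property_coverage(items, expected_properties):
    """Return, for each expected property, the share of items with at least one value.

    `items` maps an item key to a dict of {property ID: list of values}.
    """
    total = len(items)
    coverage = {}
    for prop in expected_properties:
        have = sum(1 for statements in items.values() if statements.get(prop))
        coverage[prop] = have / total if total else 0.0
    return coverage


def progress_bar(fraction, width=20):
    """Render a fraction between 0 and 1 as a fixed-width text bar."""
    filled = round(fraction * width)
    return "#" * filled + "-" * (width - filled)


# Hypothetical four-item set; only P31 and P625 are checked here.
items = {
    "item-1": {"P31": ["Q1115575"], "P625": ["51.5,-0.1"]},
    "item-2": {"P31": ["Q1115575"]},
    "item-3": {"P31": ["Q1115575"], "P625": []},
    "item-4": {},
}
for prop, share in property_coverage(items, ["P31", "P625"]).items():
    print(f"{prop:5} {progress_bar(share)} {share:.0%}")
```

The same pattern would extend to other components mentioned above, such as constraint conflicts per property, provided those counts are available for each item.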

Another thing that is very important for such a dataset is the idea of completeness (and, also, over-completeness -- what is being returned with the dataset that perhaps shouldn't be). It's hard to judge, of course, the quality of a dataset based on what is not in the dataset. But sometimes there may be a key external identifier, so eg every real-world thing that is on a list of those identifiers should have an item and be in the dataset; and nothing should be in the dataset that does not have the identifier (though it can take time to match them up). Such identifiers (and, potentially filters for them, eg "property X with a value that starts E04") could perhaps be declared at the top of the page, together with the expected number to be found, and the script could give weight to how well they had been found (probably not a linear function of the proportion).
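To make the completeness and over-completeness check above concrete, here is a small Python sketch. It assumes the external list of identifiers is available as plain strings and that each item in the set has at most one value for the key identifier; the function name `identifier_coverage`, the item keys, and the identifier values are hypothetical. It reports the plain share found and does not attempt the non-linear weighting suggested above.

```python
def identifier_coverage(expected_ids, item_ids, prefix=""):
    """Compare a set of items against an external list of identifiers.

    `expected_ids` is the external list; only entries starting with `prefix` count.
    `item_ids` maps an item key to its identifier value (or None if it has none).
    """
    wanted = {ident for ident in expected_ids if ident.startswith(prefix)}
    found = set()
    unexpected = []
    for item, ident in item_ids.items():
        if ident in wanted:
            found.add(ident)
        else:
            # No identifier, or one that is not on the filtered list.
            unexpected.append(item)
    return {
        "found_share": len(found) / len(wanted) if wanted else 0.0,
        "missing": sorted(wanted - found),
        "unexpected": sorted(unexpected),
    }


# Hypothetical external list and item set, filtered to values starting with "E04".
expected = ["E04000001", "E04000002", "E04000003", "E05000001"]
in_set = {
    "item-a": "E04000001",
    "item-b": "E04000003",
    "item-c": None,
    "item-d": "E05000001",
}
print(identifier_coverage(expected, in_set, prefix="E04"))
```

Here `missing` lists identifiers that still need an item, and `unexpected` lists items returned with the set that perhaps should not be.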

On the other hand, looking at the current Item Quality list, there may be some things that should perhaps be given a bit less weight. People have already brought up the question of sitelinks, and I share their concern. This is essentially something that (in most cases) is already an external given in the dataset, over which I have no control. If some wikis have decided to create pages for a particular subject, or others have not, that's not something that speaks to the quality of the item on Wikidata. Indeed, quality may (sometimes) be increased by splitting an existing item into two separate items, which must share the sitelinks between them. That should be done cautiously, because of the reduction in co-referencing and inter-wiki-ability that such splitting brings; but the increased precision and definition it may bring to the items may well be justified, and the overall score for the dataset should not be punished on that account.

Another thing which might in some circumstances also cause problems could be an over-emphasis on referencing. For example, with coordinate location (P625) I know I am much happier if co-ordinates do not exactly match those from the UK Ordnance Survey or from GeoNames -- because both of those databases are copyright, and explicitly not released as CC0. It's one thing to give links to the corresponding OS or GeoNames items, but including data from them beyond that may be highly dangerous, particularly the OS who have a history of some quite heavy litigations in the past. So in my view it's a lot happier if somebody has gone to a map and estimated the coordinates themselves, rather than taken them from a database. From other databases too, we should be cautious about how much we quote before it becomes a 'substantial taking'; we should be cautious about any signals we give that might be (mis?)interpreted as pressure in that direction.

I haven't taken part in any of the grading exercise -- in the past on en-wiki I've found it's not really my thing; I'd rather spend the time fixing a little bit more from the mass of problems I already know about. But I wish the project well. However, as I've been saying, I think it will make its real impact if it starts to offer meaningful, attractive, updated reports about large user-definable sets of items -- so thinking about how to get to that stage, and quickly, could be a useful focus if the project wants for its work to be taken up and make a difference. Jheald (talk) 14:44, 28 March 2017 (UTC)

You are right. But that is not what is at issue. As an approach it is Wikipedia writ large. Items are not relevant, as it is like with a telephone network: the value is in the connections between the items and not in the item itself. The issue is that there are two separate things. There is the project itself, and it is run by a student. When he is able to run it with the time he has available, it is already an accomplishment when we get a working tool. When the results are available through query, we will be able to get some qualitative results on a subset and that is good. The tool can be refined later on.
The second thing is the assumptions for this project. There have been talks with a WMF researcher about this and so the Wikipedia approach was validated for Wikidata. There have been no user stories for Wikidata based on these results; in fact there have been some that negate the model. For me this indicates that at the end of the project not only the tool but also the assumptions need to be validated.
I am ok with this project going forward because it will give us a new tool and this will also be the basis to put tools and quality on the agenda, with a date and the need to do a proper evaluation of this project. An evaluation of the assumptions, the use of the tool and the validity of the results of the approach. We can determine what user stories it covers. Thanks, GerardM (talk) 15:40, 28 March 2017 (UTC)
Thanks for the compliment! I think it is crucial to try to work on a dataset in order to even understand what the benefit of Wikidata is. It is only by learning how to model data in incremental phases of accuracy that you start to get a feeling for what quality means in Wikidata terms. I agree with Gerard that the value lies more in connections than in "nodes", but of course how items are fleshed out is important and I really like the suggestor tool that we have to suggest additional properties once the P31 is filled in. That said, I find it very difficult to explain what, if anything, should be measured in order to give an impression of quality. For items that (currently) don't need more than two statements (such as categories), I don't think it's really important to look at those items individually. It is however, highly interesting to see which languages share category trees and exactly which ones are shared. As far as referencing goes, many items are wholly sourced to one source (a GLAM, or a book, or a Commons file, or a Wikipedia article) and repeating the same reference for each statement just seems a bit redundant to me and I don't see the potential value of this besides jacking up the number of "references per statement". For items with lots and lots of external references, is it going to be necessary to have ALL external references propagated to each statement they agree with in the item? I would hope not, but then maybe our user interface will change so that you don't see all of those references that might just make it hard to get a feeling for what the item is about. The problem with measuring something and calling it a quality measurement, is that it tends to generate the wrong kind of behavior and the wrong kind of discussions. I really think it is too premature to have that discussion yet, because I just don't think we can measure anything yet that will actually give an indication of "quality" at the item level. 
Jane023 (talk) 20:02, 28 March 2017 (UTC)
Hey folks. Sorry for my silence. I've been on a self-imposed Wikibreak to get some distance from the tensions. I think we're actually in agreement here. I'll pick on a few different topics that have been brought up in this thread separately in case we want to keep discussing them like that.
  • Jheald, I agree that a good dataset consists of a roughly complete set of items. If there are a few items in the dataset that are not at the same quality level as everything else, we should probably focus on building those up so that use of the dataset is not compromised. There's also the problem of missing items -- which I think is out of scope for this specific project but I do not think is out of scope for Wikidata as a whole. It's also a problem when assessing any cross-section of Wikipedia. Check out this fascinating analysis by Emijrp that gives a sense for how incomplete Wikipedia is. We certainly suffer in similar ways in Wikidata and any discussion of the overall utility of wikidata must keep that in mind. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
  • Jheald, it seems we also agree that there's a hugely important difference between items that are essentially empty ("E" class) and items that are at least identifiable ("D" class). When I think about using the prediction model that we are working on, I also think about Wikidata items in terms of collections of related objects for which we might be concerned about overall quality. As you say, this is how most WikiProjects think about their work assessing Wikipedia articles too. It's only a few editors who pride themselves on their construction of Featured Articles. I think there's a good argument for why our efforts are often better spent expanding Stubs/E-class than pushing a Good-Article/B-class to FA/A-Class -- with few exceptions. So I also value the "showcase" threshold less than having some discussion of quality at a more basic level. I also agree that the tedious work of grading all of the items and keeping the grades up to date would be a waste of time. This is why I'm working with Glorian to build a machine learning model to automate the process. We'll only need a small set of carefully applied quality grades in order to train a model and apply it to the entire corpus of Wikidata. That should save loads of time and energy. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
  • Jane023, I agree with you that there are some items for which many statements are not necessary, there are others for which no sitelinks are relevant, and there are many types of statements for which a high quality external reference is necessary while for others, it's unnecessary or even impossible! Generally, I think we need to capture enough of a gist of what is to be expected in the criteria, but ultimately what it comes down to is Wikidata editor judgement. If you'd say that an item is roughly complete and there's not much else to do with it, that should be "A" and we should adapt the criteria to correspond to your judgement. We're doing a lot of that in this talk page. Many of the concerns you have brought up are actively being addressed in discussion right now! Feel free to chime in on #Number of sitelinks. It seems we're all in agreement. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
  • GerardM said "the value is in the connections between the items and not in the item itself" and I think we're in agreement here. Statements connect items directly and like sets of statements (E.g. instance of (P31):human (Q5) and occupation (P106):singer (Q177220)) allow interesting cross sections to be drawn across items. If you're talking about counting the indegree of an item's links, I think we're talking more about "importance" (see PageRank for a deeper discussion of network theoretic importance measures). I'm working on that too. See m:Research:Understanding Wikidata's Value. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
  • Generally, Jane023 seems concerned over how the quality scale might *direct* people's efforts on Wikidata in good or bad ways. I suspect that if we do a good job of iterating on the scale while we apply our best judgement towards items, then the scale will most closely reflect reality and direct/reward efforts that actually do directly improve quality. It's not like we're tying our hands here. We're just taking the first step. We currently don't have a criterion for "references per statement", but if you think we should have one (or explicitly discuss not having one), start a thread about that specifically so we can use it as a reference in the future. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
Stepping back a bit, it seems I'm personally being accused of having Wikipedia-brain. I don't think that such accusations are at all accurate, and they certainly don't further the conversation productively. It would be much better if we could limit ourselves to topics that are relevant to the quality level of items (as that is the topic of this page). In the end, Lydia came to me and Glorian to ask us to work on this. Does Lydia have Wikipedia-brain? I think not. We're just trying to shepherd a process so that we can do something valuable. Many people supported the idea of a Wikipedia-like grading scale in our original conversations at Project chat. Even Alessandro Piscopo, the researcher who brought a much more nuanced approach to assessing Wikidata quality, supports this approach and project as it would enable his work towards building up a more nuanced view of Wikidata quality. In the end, I'm not personally invested in this scale working one way or another. I only hope that we can construct a scale based on consensus process (what we've been doing in this talk page for months) and then apply that scale to a set of items so that we can train an AI to recognize them. --EpochFail (talk) 16:47, 29 March 2017 (UTC)
A Wikipedia approach is what I say. They are imho accurate because you insist on the grading to be done by humans and denying a more complex scale because it is more difficult. This is a fallacy because this grading is only for the initial subset. All the grading will be automated. So what you need is data to train the engine. Seeking data that fits a "story" in the grading suffices particularly when it is about relations. In this current model it is only about existing items and there is no relation to "Sources" in assessing quality. The relevance of Wikidata is high because it connects any and all Wikipedia articles on the same topic and they all have the same "concept cloud". All the relations between articles should be modelled in one Wikidata relation through statements. All the red links and often not only red links need items in Wikidata. That is a Wikidata approach, looking at Wikidata as a stand alone project only begins to determine the quality of Wikidata. Its quality is in the alignment with "Sources".
The reason why YOU are accused is because YOU accused us of being hostile to an/your approach. You did not go into the ARGUMENTS that were put forward, they were not addressed. There are several points to the resistance. First, the model is too simple, models like this tend to lead to assumptions that because of a "low" quality, things can be deleted. The importance of user involvement is not on a same level as with Wikipedia; for a training model you need to add to the dataset so that new issues can be learned. The one thing that we have not heard is how the ratings can be used. If it cannot be combined with query outcomes it will not help us focus on the issues people care about individually.
As you made that accusation the conversation changed from one where Glorian was addressed to one where a Wikimedia researcher is being addressed. I expect more of you. I expect you to appreciate how Wikidata can drive Wikipedia quality and how a complete Wikidata quality model enables this. Thanks, GerardM (talk) 05:00, 30 March 2017 (UTC)

For the record, I do not agree that anyone is being hostile to anyone else and I do believe that this discussion is worthwhile having. I am still however, highly doubtful that measurements would have any value at all to any Wikimedia project (including Wikidata) but that is because I think I have a pretty good feeling for the current state of progress of wikidata and though admirable, it is still very young. That said, I applaud any efforts to attack the "problem" if you can call it that, and just reading the comments here have helped me form my opinion. I am really sorry that the conversation has deteriorated to the point of the word "hostile" appearing in comments and even a "wikibreak"!!!! I mean really, I guess that yes, I am concerned that "you get what you measure" and therefore am against measurements that I know up front could be detrimental to user behavior. On the other hand, all I do on Wikidata is constantly measure stuff to check my own progress, and these measurements have even been an inspiration to others. So measuring is good, and the nature of Wikidata makes it easy to measure stuff, so let's do it! But do we have to call it "quality"? That is such a heavy word. Thanks for spending your time on this, everybody. Jane023 (talk) 07:29, 30 March 2017 (UTC)

An alternative to calling it quality might be to call it the "Glorian rating". ChristianKl (talk) 12:37, 3 April 2017 (UTC)
Oooh I like that idea!!! Jane023 (talk) 08:53, 4 April 2017 (UTC)

Proposed Changes from the Pilot Campaign Analysis Result

I have finished analyzing the result from the recent pilot campaign. The analysis result can be found here.

Based on the pilot campaign result and feedback from various editors, I would like to propose the below changes:

  • In the quality criteria, add "if applicable" to all sitelink-related criteria. Since biology-related items (e.g. protein, glucose, RNA) are unlikely to have sitelinks, this change will place such items more fairly on the quality scale. As far as I can see, biology-related items such as hsa-miR-424-5p (Q27595296) and SRY (sex determining region Y)-box 9 (Q21990154) lack translations. I guess this is why their quality scale dropped to "D", even though they are rich in statements with external references. With this change, biology-related items that have some completed translations will fall at least under quality scale "C".
  • In the quality criteria on quality scale D, add "if applicable" to "Minimal aliases".
  • Exclude unwanted pages such as Wikimedia disambiguation page, Wikimedia category, Wikinews article, Wikimedia template, and Wikimedia list article from the sample in the full campaign. Perhaps someone can point out other unwanted pages which are not mentioned here.
  • Add the explanation of "if applicable". Define what we mean with "if applicable" on the "Notes" section.
  • Explain that we do evaluate the references of item identifiers. This should be emphasized on the "Notes" section.

--Glorian WD (talk) 11:18, 3 April 2017 (UTC)

Hi Glorian WD, it's good to see the results. I agree with your comments on the sitelink-related criteria. Several biology-related items (given above) were of very high quality with references from external sources (journals etc.). However, I don't think we should completely remove the above 'unwanted pages' from the campaign, since they are useful (especially Wikimedia disambiguation page, Wikimedia category, Wikimedia template and Wikimedia list article). I feel we need to define the quality criteria for such items. When do we say the item page for a category/template/infobox is of high quality? Jsamwrites (talk) 12:29, 3 April 2017 (UTC)
Jsamwrites, IMO, we might need different quality criteria for 'unwanted pages' as they are different than normal items. I think the evaluation of the 'unwanted pages' might be added as future iteration of this project. --Glorian WD (talk) 14:33, 4 April 2017 (UTC)
Agreed. Other projects have separate quality scales for lists (e.g. en:Wikipedia:Featured list criteria) and it seems they skip disambiguation pages. It seems like we could, in the future, set up specific criteria for disambiguation items, categories, etc. Personally, I think it will eventually make sense that even content-items (anything that's not falling into Glorian's list) will even have more specific criteria set up by subject-matter focused WikiProjects. E.g. WikiProject sum of all paintings might state more specific criteria around use of identifiers and acceptable external references. In the meantime, it seems valuable to target the general criteria toward items that would be considered for the showcase. --EpochFail (talk) 13:42, 5 April 2017 (UTC)
Thanks for the write-up, Glorian. The adjustments make sense to me. I believe we can start the campaign. --Lydia Pintscher (WMDE) (talk) 18:18, 10 April 2017 (UTC)
Thanks for your support, Lydia. We'll get the full campaign started ASAP. --EpochFail (talk) 18:30, 10 April 2017 (UTC)

Being clear about the purpose of quality metrics

Whenever one creates metrics for an attribute, the metrics might measure the attribute in a way that's useful for making decisions about it, or they might measure it in a way that's misleading.

This means that to know whether or not a given metric is a good metric, it's necessary to be clear about its purpose. In this case that means we need a list of use-cases for the quality metric, so that we can optimize the metric to give good results for those use-cases.

I'm opposed to building a quality metric without any concern about whether it actually measures something useful, simply because it's possible to have a metric.

If we discuss a question like whether having many sitelinks is a sign of quality that heavily depends on the use-case and it's quite pointless to discuss the question without being clear about why we want the metric.

It's also not possible to talk about which items are "unwanted" for the campaign without being clear about the goals of the metric. I could imagine use-cases that depend on every item having a rating. ChristianKl (talk) 12:26, 3 April 2017 (UTC)

Hi, perhaps answers your question concerning to possible use cases? --Glorian WD (talk) 13:53, 3 April 2017 (UTC)
Criteria (1) "see if and how the quality of items on Wikidata is improving over time. This will help us better make the case to Wikipedians for example that we are improving Wikidata's data" It seems to me like Wikipedians don't care at all for things like quantity of statements, names of labels or aliases. Wikipedia cares about the amount of references and I don't see how the proposed metric helps Wikipedians decide to interact with Wikidata differently.
Important decisions in Wikipedia are about whether a given template should import data from Wikidata. Item-level quality with the proposed scale is irrelevant for this and I don't think it provides actionable information for the decision.
(2) Currently, we showcase areas of Wikidata that are supposedly good via . Can you explain how the proposed quality criteria help with that? How do they help us to know whether a given query rests on high quality data or whether the data behind a query is low quality?
(3) "create worklists for wiki projects to help them find the items in their area of expertise that could use most help" for that goal it's not useful to include translations in the quality score. Translations are best done by native speakers, so they have no relevance for worklists. Even if good work is done about an item it would stay at D when the translations are lacking.
It seems like for both (1) and (3) Snipre's approach would provide more actionable data.
(4) "assess quality improvement over time" Is your claim that it would be bad if the average quality rating according to your proposed criteria would get lower over time? I don't know whether I would support that claim. If a lot of new items are created I don't have a problem with the average amount of statements per item going down.
(5) Could you explain in more detail how you see "work groups by quality level focusing effort"? What does that user story look like?
Is there a reason why you think mentioning sitelinks helps with any of the use-cases you proposed?
It seems like for all of the mentioned use-cases it would be good to label duplicate items as low-quality.
ChristianKl (talk) 15:50, 3 April 2017 (UTC)
(1) It is one piece of the puzzle. It is not supposed to be the only thing we use to show we are working to improve the data on Wikidata.
(2) That Twitter account is not related to quality. It is just a showcase of cool/weird/... queries and other things on Wikidata. The idea behind the original point is that at some point we will be able to find items of high quality in a given area. Say I am someone interested in Marvel comics. I'd like to see which items on Wikidata related to Marvel comics are high-quality so I can use them to show off Wikidata to someone else.
(3) I do believe that translations should be part of a worklist. If an item isn't readable by a large number of people it is a big problem for us and should be worked on (by a native speaker). But even that native speaker needs a way to figure out which items to concentrate on.
(4) I have been tracking the average number of statements per item for quite a while now. It is going up and that is how it should be at this point. This might change in the future but I would consider this something we need to work against. If we start knowing less about more, that is an issue.
Hope that clarifies it a bit. --Lydia Pintscher (WMDE) (talk) 18:17, 10 April 2017 (UTC)
The problem is that putting it on the general worklist has nothing to do with letting it get worked on by a native speaker of the language. Given the proposed quality standards there's nothing a monolingual English speaker can do alone to get an item from quality grade D to quality grade C. The only way they can get the item to quality grade C is to add a translation in a language for which they don't have competency.
The only way they can increase the quality grade is by doing edits they should not be doing because they lack the language abilities. Snipre's proposal on the other hand provides a way where the monolingual English speaker can actually increase the quality rating of all items on the worklist if he puts in effort for which he's qualified. ChristianKl (talk) 09:13, 11 April 2017 (UTC)