Wikidata:Requests for comment/Quality is measurable
An editor has requested the community to provide input on "Quality is measurable" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- This is not an RfC or a similar format that can really be discussed. John F. Lewis (talk) 15:19, 14 May 2014 (UTC)
We all work on Wikidata for our own reasons. In addition, there are the reasons why Wikidata exists and the objectives of the Wikimedia Foundation.
- The aim of the WMF is to share in the sum of all knowledge.
- The first purpose of Wikidata was to serve Wikipedia with its interlanguage links. Other projects followed; Wikiquote is the latest to be connected to Wikidata for its interlanguage links.
- The second identified purpose is to serve as a source of data for information boxes.
Consequently, quality can be measured by identifying to what extent Wikidata serves these purposes.
Interlanguage links for Wikipedia, Wikivoyage, Wikisource, Wikiquote
Arguably this is where Wikidata shines. Because the information is centralised in one place, updates are obvious and automatic. The number of interlanguage links [1] is increasing all the time.
If there is any concern, it is that many articles have not yet been identified as being about the same subject, particularly when the languages involved are not among the big ten or use a different script.
Serving information
Technical considerations
- When Wikidata is to provide the information that is currently in an infobox, every article of a Wikipedia that includes that infobox requires an item.
- When an infobox is specific to a "subclass", every instance of that subclass needs an item. This means that we want to include the information specified in that infobox regardless of whether the item has an article in a given Wikipedia.
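The first requirement above, that every article using a given infobox needs an item, can be checked with a simple coverage count. The following is only a minimal sketch, not an existing tool: the function name `item_coverage`, the article titles and the item identifiers are made up for illustration, and it assumes that the list of articles and the article-to-item mapping have already been collected by some other means.

```python
def item_coverage(articles_with_infobox, article_to_item):
    """Return (coverage ratio, sorted titles of articles that lack an item).

    articles_with_infobox: iterable of article titles using one infobox.
    article_to_item: dict mapping an article title to its item id.
    Both inputs are assumed to come from elsewhere (hypothetical data).
    """
    articles = set(articles_with_infobox)
    # An article counts as uncovered when it has no mapped item id.
    missing = sorted(a for a in articles if not article_to_item.get(a))
    if not articles:
        # Nothing to cover, so nothing is missing.
        return 1.0, missing
    covered = len(articles) - len(missing)
    return covered / len(articles), missing


# Illustrative data only: titles and item ids are placeholders.
articles = ["Article A", "Article B", "Article C", "Article D"]
sitelinks = {
    "Article A": "Q900001",
    "Article B": "Q900002",
    "Article D": "Q900004",
}

ratio, missing = item_coverage(articles, sitelinks)
print(f"{ratio:.0%} of articles have an item; missing: {missing}")
```

A list of missing titles is returned alongside the ratio because the ratio alone does not say where the work is still to be done.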
Human considerations
- As everybody works towards his or her own goals, objectives differ, and consequently so does the way success is measured. The statistics [1] clearly show that we are slowly but surely adding more statements and labels to more items. As a consequence, Reasonator [2] is becoming more informative on more subjects.
- As success, and therefore quality, is understood in many ways, it is imperative to understand how quality can be measured and how the different quality requirements stack up towards providing the service indicated above.
- The objective of working towards quality is that we can agree on how to achieve our technical objectives, and strive to include the personal objectives as well.
Measuring quality
When a community indicates an interest in using Wikidata for its information, there are several technical quantifiers involved. They are:
- having an item for each article
- having at least all the information that exists in all the infoboxes
Until the moment we actually serve that information, we need to keep comparing the information in Wikidata with the information in that Wikipedia. Differences need to be identified, and a reconciliation process needs to be in place. Information in Wikidata can come from multiple sources, so no single source is obviously correct.
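The comparison described above can be sketched as follows. This is an illustration under assumptions, not a description of any existing process: the function name `compare_records`, the property labels and the sample values are hypothetical, and the values are assumed to be already normalised to comparable strings. Since no source is obviously correct, the sketch only reports differences and leaves the decision to a later reconciliation step.

```python
def compare_records(wikipedia_values, wikidata_values):
    """Classify each property by how the two data sets relate.

    Both arguments are dicts mapping a property label to one value.
    The function only flags differences; it never picks a winner.
    """
    report = {
        "match": [],
        "conflict": [],
        "missing_in_wikidata": [],
        "missing_in_wikipedia": [],
    }
    for prop in sorted(set(wikipedia_values) | set(wikidata_values)):
        wp = wikipedia_values.get(prop)
        wd = wikidata_values.get(prop)
        if wp is None:
            report["missing_in_wikipedia"].append(prop)
        elif wd is None:
            report["missing_in_wikidata"].append(prop)
        elif wp != wd:
            report["conflict"].append(prop)
        else:
            report["match"].append(prop)
    return report


# Hypothetical sample values for a single subject.
infobox = {
    "date of birth": "1952-03-11",
    "occupation": "writer",
    "place of birth": "Cambridge",
}
item = {
    "date of birth": "1952-03-11",
    "occupation": "novelist",
    "date of death": "2001-05-11",
}

report = compare_records(infobox, item)
for kind, props in report.items():
    print(kind, props)
```

Keeping the categories separate matters because they call for different follow-up: missing data can often be added, whereas a conflict needs a source to decide the matter.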
Sources and quality
Obviously we want sources for each statement. The statistics [1] show, however, that the data we have is piss poor. With more than 50% of our items having no statements or only one, we do not even identify the subclass of most of our items. Many Wikipedias, though not all, require sources. As it is, we do not and cannot harvest the sources associated with statements from our Wikipedias.
As we value sources, an effort needs to be made to develop tools that harvest sources from Wikipedias. Given our current lack of data, adding only data with sources by hand is not feasible. Arguably, when we show that our data is of the same quality as our sources, the need for sources in Wikidata at this time is less pressing.
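The "none or one statement" figure mentioned above is a simple share that can be recomputed whenever new statistics are available. The following is a minimal sketch with made-up numbers: the function name `sparse_item_share` and the sample counts are hypothetical, and it assumes a list with the number of statements per item is available from a dump or a statistics page.

```python
def sparse_item_share(statement_counts, threshold=1):
    """Return the share of items with at most `threshold` statements."""
    counts = list(statement_counts)
    if not counts:
        # No items at all: report zero rather than divide by zero.
        return 0.0
    sparse = sum(1 for n in counts if n <= threshold)
    return sparse / len(counts)


# Hypothetical statement counts for eight items.
sample_counts = [0, 1, 1, 3, 0, 7, 2, 1]

share = sparse_item_share(sample_counts)
print(f"{share:.1%} of items have at most one statement")
```

Tracking this share over time would show whether the data is actually becoming richer, independently of how many items are added.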
Comparing external sources with Wikidata
In the WMF we have a history of sharing data with external sources. Of particular value are the processes where we report the differences we identify to our partners. There are two kinds of differences: either we or our partner lacks data, or the data differs between the two.
- Arguably, when we identify differences between the two data sets, we need sources that decide the matter.
- Arguably, when we compare data, we make use of a database but we do not copy it. Consequently, it cannot be considered a copyright violation.
- Arguably, when a source indicates in its terms of use that we cannot compare its data with the data we have, it is not an open data source and it is debatable whether we should link to it.
- The statistics show that Wikidata is becoming more informative.
- We can measure the quality of subsets of data; for instance, the "deaths of 2014" is a subset where Wikidata currently has more information than any Wikipedia.
- When we work toward our goals, consideration should be given as to how we are progressing by using queries and statistics.
Thanks, GerardM (talk) 08:00, 10 April 2014 (UTC)
- [1] Wikidata stats
- [2] Reasonator on Wikidata