Recoin ("Relative Completeness Indicator") is a user script that extends Wikidata items about persons with an indicator of how complete their information is relative to similar items. Relative completeness refers to the extent of information found on an item in comparison with other similar items. At the moment, it works for all instances of human that have at least one occupation.
The indicator aggregates the extent of information into a colored progress bar with five color-coded levels of completeness, ranging from Very detailed information to Very basic information. Technically, there are two user scripts: the first only adds the indicator, while the second shows details of how the indicator level is computed.
Recoin is intended both to help authors see where to focus their attention and to make data consumers aware of the degree of information found in a specific article.
What it is good for
Recoin is intended to assist both authors and consumers of Wikidata.
For users (consumers), it provides a handy summary of how complete the information in Wikidata is, which may help them decide whether to rely on Wikidata to satisfy their information need. Judging purely by article length is not always a good idea: for instance, the chess player Jeff Sarwer (Q3494327) has a long article because of many statements about his Elo rating, but is missing even very basic information such as citizenship or family name.
For authors, it similarly shows which persons' information is more complete than others', allowing them to focus on the less complete persons. For an individual person, it shows the most important missing properties, which authors can then complete or, if no value exists for a property, mark with a novalue assertion.
How it works
The script computes the relative completeness of a person by comparison with all persons that have the same profession. The explanation script shows the most frequent attributes that the other persons have but the person does not, sorted by frequency (shown in brackets). The completeness indicator is then computed from the share of properties appearing on the other items that also appear on the present item.
For instance, Jimmy Wales (Q181) is missing, among other things, the properties languages spoken, written or signed (P1412), member of political party (P102) and position held (P39), which are specified for 13.435%, 9.347% and 8.376% of people with the same occupation, respectively. In fact, he is missing 29 of the 50 most frequent properties for people of his occupations, so his relative completeness indicator is computed as fair.
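The comparison described above can be sketched roughly as follows. This is only an illustration, not Recoin's actual code: the function name `missingFrequentProperties`, the shape of its inputs, and the two sample entries for P569 and P21 are assumptions made for the example. Only the three percentages for P1412, P102 and P39 come from the text above.

```javascript
// Hypothetical sketch; not taken from the Recoin source.
// peerFrequencies: { propertyId: percentage of comparison-group items that have it }
// itemProperties:  list of property IDs present on the item being rated
// topN:            how many of the most frequent properties to consider
function missingFrequentProperties(peerFrequencies, itemProperties, topN = 50) {
  const present = new Set(itemProperties);

  // Rank properties by how often they occur in the comparison group.
  const ranked = Object.entries(peerFrequencies)
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN);

  // Keep only those the item lacks; the frequency order is preserved.
  const missing = ranked
    .filter(([property]) => !present.has(property))
    .map(([property, percent]) => ({ property, percent }));

  return {
    missing,
    considered: ranked.length,
    // Share of the considered properties that the item does have.
    presentShare:
      ranked.length === 0 ? 0 : (ranked.length - missing.length) / ranked.length,
  };
}

// Toy data: P569 and P21 with their percentages are made up for illustration.
const peers = { P1412: 13.435, P102: 9.347, P39: 8.376, P569: 95.0, P21: 90.0 };
const result = missingFrequentProperties(peers, ["P569", "P21"]);
console.log(result.missing.map((m) => `${m.property} (${m.percent}%)`).join(", "));
```

How the resulting share is mapped onto the five indicator levels is not specified here, so the sketch stops at the share itself.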
For persons with multiple occupations, the script uses the preferred occupation if one exists. Obama, for instance, has the occupations politician, memoirist, lawyer and political writer, but politician is set as the preferred value. Otherwise, it takes all people that have any of the occupations as the comparison group.
For instance, Arno Kompatscher (Q15074414) is both a politician and a jurist. There are 297,370 politicians and 12,635 jurists in Wikidata. Of these, 40% have the property position held (P39) set, which makes it the most frequent attribute among these persons that Arno Kompatscher does not have. In second place comes member of political party (P102) with 35%, and in third place languages spoken, written or signed (P1412) with 22%.
The computation of relevant attributes is similar to the one in the Wikidata Property Suggestor. However, the focus here is on being more specific to the background of the individual person: the suggestions by the Wikidata Property Suggestor appear to be more generic (e.g., its first suggestion for most living people is to add a date and place of death).
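The choice of comparison group for persons with several occupations might look roughly like the sketch below. It assumes the entity is given in the standard Wikidata JSON layout (statements under `claims.P106`, each with a `rank` and a `mainsnak`). The helper name `comparisonOccupations` and the placeholder item IDs are hypothetical and not taken from the Recoin source.

```javascript
// Hypothetical helper; not taken from the Recoin source.
// Returns the occupation item IDs that define the comparison group:
// the preferred-rank occupation(s) if any exist, otherwise all occupations.
function comparisonOccupations(entity) {
  const statements = (entity.claims && entity.claims.P106) || []; // P106 = occupation

  // Extract the item ID from a statement; novalue/somevalue snaks have no datavalue.
  const toId = (statement) =>
    statement.mainsnak && statement.mainsnak.datavalue
      ? statement.mainsnak.datavalue.value.id
      : null;
  const idsOf = (list) => list.map(toId).filter((id) => id !== null);

  const preferred = statements.filter((s) => s.rank === "preferred");
  // How ranks other than "preferred" are treated is not described in the text,
  // so the fallback simply uses every occupation statement.
  return preferred.length > 0 ? idsOf(preferred) : idsOf(statements);
}

// Placeholder IDs, for illustration only.
const occupation = (id, rank) => ({
  rank,
  mainsnak: { snaktype: "value", datavalue: { value: { id } } },
});
const withPreferred = {
  claims: { P106: [occupation("Q-politician", "preferred"), occupation("Q-lawyer", "normal")] },
};
const withoutPreferred = {
  claims: { P106: [occupation("Q-politician", "normal"), occupation("Q-jurist", "normal")] },
};
console.log(comparisonOccupations(withPreferred));
console.log(comparisonOccupations(withoutPreferred));
```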
So far, the tool only works for persons, and only for those that have at least one of the 100 most frequent occupations. Extensions to arbitrary persons and other objects are under development. In addition, all properties marked as unique identifiers are currently filtered out, as is the property name in kana (P1814).
There are two components of the script: the core module and the explanations module. Enabling the core module adds the status indicator to all items of humans with a profession. Enabling the explanations module shows, at the top of such person articles, a list of important missing properties, which are the basis of the computation of the status indicator.
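The property filter mentioned above could be expressed along these lines. The sketch assumes that "unique identifier" properties can be recognised by the Wikidata datatype `external-id`. That is an assumption of this example, as are the function name `isCountedProperty` and the input shape. The actual script may detect identifiers differently.

```javascript
// Hypothetical sketch; not taken from the Recoin source.
// Properties excluded explicitly, by ID.
const EXCLUDED_PROPERTY_IDS = new Set(["P1814"]); // name in kana

// property: { id: "P…", datatype: "…" }
// Assumption: identifier properties carry the datatype "external-id".
function isCountedProperty(property) {
  return (
    property.datatype !== "external-id" &&
    !EXCLUDED_PROPERTY_IDS.has(property.id)
  );
}

const candidates = [
  { id: "P39", datatype: "wikibase-item" }, // position held
  { id: "P214", datatype: "external-id" },  // an identifier property
  { id: "P1814", datatype: "string" },      // name in kana
];
console.log(candidates.filter(isCountedProperty).map((p) => p.id));
```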
To install the scripts, copy the following lines into your common.js file:
importScript( 'User:Ls1g/recoin-core.js' );
importScript( 'User:Ls1g/recoin-explanations.js' );
Your common.js file is located at
Publication: Assessing the Completeness of Entities in Knowledge Bases, Albin Ahmeti, Simon Razniewski, Axel Polleres, ESWC 2017
This script is developed in the context of the TaDaQua project at the Free University of Bozen-Bolzano. We are looking forward to your feedback:
- Simon Razniewski - firstname.lastname@example.org (Conceptual lead)
- Albin Ahmeti - email@example.com (Technical lead)
You might also be interested in the related project COOL-WD, which allows asserting the completeness of individual properties directly inside Wikidata.