Wikidata:2020 report on Property constraints

by David Abián | January 2020 | CC BY-SA

Unless a software bug is introduced, the Wikibase data model, that of Wikidata, is inviolable: it is not possible to add a paragraph with a quantity-type Property, a statement cannot belong to two different Items, an Item cannot have several descriptions in the same language, etc. These restrictions are fixed and apply equally to all data; they give Wikidata basic representational consistency without preventing it from representing infinitely many possible entities in infinitely many ways. These rules do not vary according to the stored information, yet the presence of certain information does make other information necessary or impossible, so another system of rules is used: Property constraints. These rules are configured, or not, for each Property depending on its particular meaning; they are improved over time; their configurations are part of the Property data; and their compliance is checked but currently never enforced by the software.

How to read this report

This report does not present decisions, only data and evidence-based conclusions and suggestions about Wikidata Property constraints to assist those who contribute to Wikidata in their regular decision-making processes.

This report cannot and is not intended to analyze all the features and efforts related to Property constraints. Some relevant aspects outside the scope of this report include, but are not limited to, constraint violations, user interfaces, the Lexeme namespace, the code of WikibaseQualityConstraints and the current workflow of the Wikidata development team.

As of December 2019, the percentages regarding the reported behavior and opinions of active Wikidata editors can be extended to all active Wikidata editors with a confidence level of 95% and a margin of error of ±10%. These figures are based on a survey sent to the active Wikidata editors who had the email reception enabled in Wikidata.
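As a rough illustration of what a 95% confidence level and a ±10% margin of error imply, the following sketch applies the usual formula for a proportion under simple random sampling. The number of survey responses and the size of the active-editor population are not stated in this report, so the function names and any values passed to them are hypothetical, and the survey's actual sampling procedure may differ from this textbook model.

```python
import math

Z_95 = 1.96  # z-score for a two-sided 95% confidence level

def margin_of_error(n, p=0.5, population=None):
    """Half-width of the 95% confidence interval for a proportion.

    p=0.5 is the worst case (widest interval). If a population size is
    given, the finite population correction is applied.
    """
    moe = Z_95 * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

def required_sample_size(moe, p=0.5):
    """Smallest n whose worst-case margin of error does not exceed `moe`
    (ignoring the finite population correction)."""
    return math.ceil(Z_95 ** 2 * p * (1 - p) / moe ** 2)

# Hypothetical usage: how many responses would a ±10% margin require?
n_needed = required_sample_size(0.10)
```

Under these assumptions, a ±10% margin corresponds to a sample on the order of a hundred responses; a smaller population would lower the margin further through the correction term.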

The figures regarding the definition and configuration of Property constraints on Wikidata have no statistical margin of error because they have not been sampled: all Wikidata Properties and their constraints have been analyzed.

Glossary

For the purposes of this report:

  • An active Wikidata editor is one who has made 5 edits or more to Wikidata with a non-automated user account in the 30 days prior to the data capture (December 2019).
  • A Property constraint is a Wikibase statement whose:
    • subject is a Wikibase Property (the constrained Property);
    • property is property constraint (P2302) on Wikidata, or the analogous Property defined in WikibaseQualityConstraints on other Wikibase installations;
    • value is a Wikibase Item representing a constraint type.
  • An exception to a constraint is an exception to constraint (P2303) qualifier on that constraint.

For other terms and concepts, see Wikidata:Glossary.

Goals

Throughout this report, goals with progress bars are presented. These goals represent ideal states of Property constraints; while not all are necessarily achievable, the progress bars visualize how far Wikidata currently is from those ideal states.

Goal: There are no Wikidata Properties without constraints.
98% completed

These goals do not reflect the only aspects that should be taken into account when estimating the current quality or maturity of Property constraints.

Knowledge, perception and ease of use

Best practices in business suggest that ensuring the quality of large volumes of data requires investments in strategic plans, carefully designed processes, software solutions and highly qualified staff. With this framework, achieving the optimal configuration of data quality systems for a project of the size and growth rate of Wikidata solely by volunteers, many of them anonymous, from all cultures and professional backgrounds, using only free software and open standards, seems an unprecedented challenge.

Wikipedia, the most comprehensive reference work in history, has well proven that these conditions do not limit, but rather encourage, the writing of a sufficiently useful free encyclopedia. However, when the goal is not to write an encyclopedia but to build an interoperable knowledge base, some of the What-You-See-Is-What-You-Get features disappear, mastery of natural language is no longer a key skill, and it is more difficult to contribute with correct changes. With these drawbacks, the software needs to code more rules and processes to make quality management tasks easier and reduce the cognitive load of volunteers, most of whom either have no specific training in data management or are unfamiliar with the particular conventions and use cases of Wikidata.

clarity of constraint violation messages according to active editors
😠 😠 😐 😐 😐 😀 😀 😀 😀 😀

Although the direct analysis of user interfaces is outside the scope of this study, the ease of use has been analyzed in an indirect way using the perceptions of the editors surveyed. When asked whether the warnings about constraint violations are clear enough, most active editors (51%) said that, "in general, they are clear enough" (😀); some of them (28%) said that "sometimes they are clear enough, but sometimes are not" (😐); and a smaller proportion (18%) said that "they are not as clear as they should be" (😠).

The experience and commitment of the editors have some influence on the clarity that they find in constraint violation messages. While 66% of administrators said that, in general, these warnings are clear enough (😀), 58% of Property creators and only 40% of the rest of active editors provided this response.

perceived accuracy of constraint violations among active editors
😠 😐 😐 😐 😀 😀 😀 😀 😀 😀

When asked how often the data needs to be corrected when a constraint violation appears on Wikidata, most active editors (63%) said "often" (😀), some of them (27%) said "about half the time" (😐) and a small proportion (9%) said "occasionally" (😠).

general ease of use of Property constraints according to active editors
😠 😠 😠 😐 😐 😐 😐 😀 😀 😀

When asked how easy or hard to use they thought Wikidata's system of Property constraints is in general, 36% said it is "neither hard nor easy" to use (😐), 31% said it is "relatively hard" to use (😠), and 27% said it is "relatively easy" to use (😀). Although the goal of making the system easy to use for all active editors would be unrealistic, even unnecessary if some editors are not interested in data quality issues, some improvements may still be necessary.

User interfaces are the communication tool with the most potential, more effective at announcing new features to editors or instilling good habits in them than mailing lists, messaging groups, documentation or help pages, social profiles, talk pages or the project chat, which are sometimes not read. It is suggested to make user interfaces as self-explanatory as possible and, when additional information on Property constraints is actually required, to design one or more automatically generated documentation pages similar to Special:ListDatatypes to reduce the efforts needed to keep this information consistent, complete and updated.

Understanding of constraint types

Active editors were given a list of names of constraint types and were asked whether they knew and remembered what each of the listed constraint types was for, with three possible answers: "not really"; "yes, more or less"; or "yes, totally". Their answers reflect their knowledge of the constraint types and how descriptive the name of each constraint type is.

Data: Proportion of active editors who think they know at least "more or less" the purpose of each constraint type given its name
single value: 91%
allowed units: 88%
citation needed: 88%
allowed qualifiers: 86%
conflicts-with: 86%
item requires statement: 86%
mandatory qualifier: 86%
format: 82%
distinct values: 77%
type: 77%
value type: 77%
inverse: 76%
multi-value: 72%
none of: 67%
value requires statement: 67%
range: 65%
allowed entity types: 64%
Commons link: 64%
one-of: 64%
integer: 62%
symmetric: 62%
single best value: 53%
difference within range: 51%
contemporary: 48%
Property scope: 36%
no bounds: 26%
Reading assistance
82% of active editors think they know either "more or less" or "totally" what the format constraint type is for when they read its name.

Constrained Properties

Goal: There are no Wikidata Properties without constraints.
98% completed

The number of Wikidata Properties without constraints is low enough. 98% of Wikidata Properties have at least one constraint, and 61% have at least four. However, some constraints should be better defined to be effective, and some widely applicable constraint types are underused (e.g., the allowed entity types and Property scope constraint types) or poorly known.

Most of the unconstrained Properties (which make up 2% of all Properties) are cases in which editors forgot to define constraints or were unaware that doing so was necessary. This could be related to the fact that the web user interface neither requires nor suggests the addition of statements, and constraints in particular, when creating a Property, nor does it show the constraints section when there are no statements with property constraint (P2302).

Since many constraint types can be applied to some Property types, and very few to others, the differences in the number of constraints between Property types are significant. This is not necessarily a negative fact, since both the amount of information that each Property type is able to represent and the impact of each constraint type are variable.

Data: Number of uses of Wikidata Properties according to their types
Property type npr min q1 med q3 max sum mean stdev skew kurto gini
CommonsMedia 59 1 106 962 9,615 3,008,919 3,671,630 62,231.0 391,393.7 7.4 53.3 0.95
WikibaseItem 1,304 1 76 793 8,650 61,831,810 277,293,348 212,648.3 2,271,918.6 20.5 487.6 0.97
ExternalId 4,419 1 48 380 2,433 22,680,832 109,238,577 24,720.2 470,817.1 41.9 1,881.1 0.96
Url 61 2 33 662 6,774 22,601,823 28,335,282 464,512.8 2,904,325.2 7.4 54.2 0.97
String 276 2 103 780 8,922 26,891,580 146,106,679 529,372.0 3,431,474.8 7.1 49.6 0.97
Quantity 533 1 13 122 1,883 1,675,113 8,360,690 15,686.1 88,706.0 13.6 233.7 0.94
Time 49 4 425 15,902 90,626 32,930,175 78,828,717 1,608,749.3 6,496,680.3 4.6 19.0 0.93
GlobeCoordinate 9 2 74 8,715 8,752 7,550,701 7,586,025 842,891.7 2,515,432.3 2.5 4.1 0.89
Monolingualtext 47 3 312 2,228 64,886 29,454,147 32,975,920 701,615.3 4,289,767.4 6.6 41.8 0.96
WikibaseProperty 13 4 44 167 842 14,124 25,318 1,947.5 4,035.5 2.4 4.6 0.79
Math 15 4 18 19 31 4,032 4,360 290.7 1,035.1 3.5 10.1 0.88
GeoShape 2 3 853 1,704 2,554 3,404 3,407 1,703.5 2,404.9 0.0 −2.0 0.50
TabularData 4 2 2 4 8 18 27 6.8 7.6 1.0 −0.8 0.47
WikibaseLexeme 12 4 10 68 834 2,788 7,433 619.4 966.3 1.5 0.6 0.71
WikibaseForm 4 5 7 68 580 1,939 2,079 519.8 947.9 1.1 −0.7 0.71
WikibaseSense 15 4 7 19 116 1,167 3,187 212.5 395.1 1.7 1.1 0.77
MusicalNotation 4 2 8 14 82 273 304 76.0 131.5 1.1 −0.7 0.68
Reading assistance
Each Wikidata Property of type Time has, on average (mean), 1,608,749 uses.
277,293,348 uses (sum) correspond to Wikidata Properties of type WikibaseItem.
The set of Properties of type WikibaseItem is the one that accumulates the highest number of uses (sum).
4,419 Properties (npr) have the type ExternalId.
25% of the Properties of type ExternalId have 48 uses or less, while 75% have 48 uses or more (q1).
Half of the Properties of type Quantity have 122 uses or more, while the other half have 122 uses or less (med).
The Gini coefficient of 0.97 (gini), close to 1.00, indicates that a few Properties of type WikibaseItem concentrate most uses.
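The columns of the table above can be illustrated with a small sketch. The GeoShape row is convenient because it has only two Properties, so its two use counts are simply the row's min and max (3 and 3,404). The definitions below (inclusive quartiles, sample standard deviation, and a Gini coefficient without small-sample correction) are an assumption about how such figures can be computed, not a statement of the method actually used for this report; they do, however, reproduce the GeoShape row after rounding.

```python
import statistics

def gini(values):
    """Gini coefficient of non-negative values.

    0 means all values are equal; values near 1 mean a few items
    concentrate almost everything. No small-sample correction is applied.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Uses of the two GeoShape Properties (min and max of that table row).
geoshape_uses = [3, 3404]

q1, med, q3 = statistics.quantiles(geoshape_uses, n=4, method="inclusive")
summary = {
    "npr": len(geoshape_uses),
    "sum": sum(geoshape_uses),
    "mean": statistics.mean(geoshape_uses),
    "q1": q1,
    "med": med,
    "q3": q3,
    "stdev": statistics.stdev(geoshape_uses),  # sample standard deviation
    "gini": gini(geoshape_uses),
}
```

With only two very unequal values the Gini coefficient is about 0.50; the values close to 1.00 seen for the larger Property types require many Properties with few uses alongside a handful with very many.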

Most constraint types do not apply to most Property types. However, when editing the statements, all the constraint types are suggested by the web user interface for all the Property types. As there are a considerable number of different constraint types, this opens the door to confusion between the names of different constraint types and makes some of the applicable constraint types less visible. If a user defines a constraint that cannot be applied to the constrained Property because of an incompatible Property type, the software system allows the constraint to be saved and the problem is not reported.

Data: Number of constraints by constraint type and Property type
Property type aet aq au c cl cn cw dv dwr f int inv irs mq mv nb no oo ps r s sbv sv t vrs vt
CommonsMedia 2 13 56 21 12 170 18 4 1 33 1 6 25
ExternalId 8 93 5 578 4097 4107 5070 22 393 17 3963 2488
GeoShape 2 1 1 2
GlobeCoordinate 1 7 1 13 1 3 6 2
Math 1 1 2 3 2 8 1 7
Monolingualtext 4 10 2 14 3 25 17 18 4 18
MusicalNotation 1 1 2
Quantity 7 78 425 32 35 1 1 132 122 130 1 73 308 245 14 55 271
String 29 5 4 51 76 212 104 30 133 1 90 139
TabularData 2 1 4 2
Time 1 10 3 20 7 16 1 16 20 2 19 29
Url 6 9 1 3 29 42 51 5 27 3 13 17
WikibaseForm 1 1 2
WikibaseItem 33 150 106 31 275 55 112 473 81 14 33 149 602 36 5 149 801 271 804
WikibaseLexeme 5 2 1 2 2 2 1 1 1
WikibaseProperty 2 1 2 1 1 4 2 8 1 4 7 9
WikibaseSense 10 3 8 1 1 11 4 1
Reading assistance
There are 245 range constraints on Properties of type Quantity.

Severity levels

Wikidata as a community, and the WikibaseQualityConstraints extension as an implementation, classify all constraints and their possible violations into three mutually exclusive categories. These categories are sometimes called statuses, the English label of constraint status (P2316) and original name in the system of template-based Property constraints; and sometimes called levels, a less ambiguous name that is preferred over statuses by 73% of active editors. It is recommended to apply the necessary changes to use a single name for this categorization.

Severity levels are largely unknown among editors. No reliable percentages can be provided in this regard, but several active editors who claim to have edited Property constraints say they do not know what "severity levels" or "statuses" of Property constraints are, while others show that they are unaware of the existence of such a categorization.

The current severity levels of Property constraints are, ordered from highest to lowest severity, the mandatory constraint level, the normal constraint level and the suggestion level.

Data: Number of constraints by constraint type and severity level
aet aq au c cl cn cw dv dwr f int inv irs lrl lrlc lvrlc mq mv nb no oo ps r s sbv sv t vrs vt
mandatory 38 60 96 17 52 6 458 1,149 2 2,016 34 5 1,709 0 0 0 28 0 6 8 60 1,221 99 5 3 600 619 29 103
normal 41 337 329 89 13 62 550 3,121 6 2,506 98 104 3,910 13 5 1 240 21 67 22 88 349 160 35 36 3,698 3,179 230 710
suggestion 0 3 0 0 0 11 9 7 0 40 0 4 273 0 0 0 7 1 0 3 1 0 6 1 4 9 9 19 1
aet aq au c cl cn cw dv dwr f int inv irs lrl lrlc lvrlc mq mv nb no oo ps r s sbv sv t vrs vt
mandatory 48% 15% 23% 16% 80% 8% 45% 27% 25% 44% 26% 4% 29% 0% 0% 0% 10% 0% 8% 24% 40% 78% 37% 12% 7% 14% 16% 10% 13%
normal 52% 84% 77% 84% 20% 78% 54% 73% 75% 55% 74% 92% 66% 100% 100% 100% 87% 95% 92% 67% 59% 22% 60% 85% 84% 86% 84% 83% 87%
suggestion 0% 1% 0% 0% 0% 14% 1% 0% 0% 1% 0% 4% 5% 0% 0% 0% 3% 5% 0% 9% 1% 0% 2% 2% 9% 0% 0% 7% 0%
Reading assistance
There are 1,221 mandatory constraints of the Property scope constraint type.
80% of Commons link constraints are mandatory.
84% of contemporary constraints are neither mandatory constraints nor suggestions.
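The per-level totals and shares can be recomputed from the counts in the table above. The following sketch copies the three rows of counts in column order (aet, aq, au, … vt) and sums them; the variable names are illustrative only.

```python
# Counts copied from the table above, in its column order (aet ... vt).
counts = {
    "mandatory": [38, 60, 96, 17, 52, 6, 458, 1149, 2, 2016, 34, 5, 1709, 0, 0, 0,
                  28, 0, 6, 8, 60, 1221, 99, 5, 3, 600, 619, 29, 103],
    "normal": [41, 337, 329, 89, 13, 62, 550, 3121, 6, 2506, 98, 104, 3910, 13, 5, 1,
               240, 21, 67, 22, 88, 349, 160, 35, 36, 3698, 3179, 230, 710],
    "suggestion": [0, 3, 0, 0, 0, 11, 9, 7, 0, 40, 0, 4, 273, 0, 0, 0,
                   7, 1, 0, 3, 1, 0, 6, 1, 4, 9, 9, 19, 1],
}

# Total number of constraints per severity level and overall.
totals = {level: sum(row) for level, row in counts.items()}
grand_total = sum(totals.values())

# Share of each severity level among all constraints, in percent.
shares = {level: round(100 * t / grand_total, 1) for level, t in totals.items()}

# Per-type share, e.g. Commons link (cl) is the fifth column (index 4).
CL = 4
cl_total = sum(row[CL] for row in counts.values())
cl_mandatory_share = round(100 * counts["mandatory"][CL] / cl_total)
```

The resulting shares agree with the percentages quoted for each level in the subsections below (29.2%, 69.4% and 1.4%).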

The names of the three severity levels have consistency problems, and even the levels themselves are not clearly defined. The lexical category that distinguishes the mandatory constraint level is an adjective ("mandatory"), the one that distinguishes the suggestion level is a noun ("suggestion"), and the normal level has no defined name. Sometimes the word "suggestion" is understood as contradicting the name of the system in which it is framed, "Property constraints", while some users consider the word "mandatory" redundant. Some users consider that talking about "mandatory constraints" is misleading because these constraints are also violated. When active editors were asked whether they would change the current names of the severity levels to any of a list of suggestions, 67% said "maybe" they would change the names, 24% said they would definitely change the names, and 9% said they would not change the names.

mandatory

The mandatory constraint level was introduced by the original system of template-based Property constraints and continues today, although there is no agreement on when to apply it or what it currently means. A constraint belongs to this level when it has the constraint status (P2316) qualifier with value mandatory constraint (Q21502408). This level represents 29.2% of constraints on Wikidata.

In 2014, the user in charge of developing and running the bot that updated the violation reports created a specific report of mandatory constraint violations, still in use, to "prevent data structure degradation," and requested to mark as mandatory those constraints without violations, without exceptions, without known conflicts, and whose constrained Properties had "existed for some time." According to the described criteria, this severity level was not determined by the meaning, support or relevance of the constraint, but it was understood as a rank to which, ideally, all constraints should aspire when they reached full compliance. From this perspective, the mandatory constraint level would prevent data degradation by preserving the past achievements in terms of data quality.

Today these conditions for specifying the mandatory constraint level are not met and have not been openly discussed; due to the low visibility of the page where they were written, it is often not known that they exist. In particular, 0.3% of mandatory constraints have exceptions, and there is one exception for every 125 mandatory constraints. When active editors were asked if they agreed that "mandatory constraints should never have exceptions," 40% of them said they did not agree, 36% said they were not sure, and only 24% said they agreed. Administrators and Property creators seem less opposed than average to the idea that mandatory constraints should never have exceptions, with an agreement-disagreement ratio of about 50-50.

normal

This is an implicit level without an agreed name and represents an undefined, default, normal, common or less-than-mandatory constraint level. A constraint belongs to this level when it has no constraint status (P2316) qualifier. This level represents most (69.4%) constraints on Wikidata.

When active editors were asked to keep the current names of the other severity levels and give a name to this one from a list of proposals:

  • 36% of them chose "warning level" (with "warning" being a noun, as in the suggestion level),
  • 21% of them chose "standard constraint level" (with "standard" being an adjective, as in the mandatory constraint level),
  • 12% of them chose "non-mandatory constraint level", and
  • 10% of them chose "normal constraint level".

Other active editors chose other options ("undefined constraint level", "default constraint level"), proposed other names ("hint level", "recommendation level"), did not feel able to make a choice or expressed the opinion that this level should not be considered as such or should not exist.

suggestion

The suggestion level, introduced in 2019, is intended to make editors aware of the possibility of making a change. In other words, the suggestion level specifies possibility, not necessity or convenience. A constraint belongs to this level when it has the constraint status (P2316) qualifier with value suggestion constraint (Q62026391). This level only represents 1.4% of constraints on Wikidata.

As announced by the Project Manager Community Communication for Wikidata, this severity level was created "[i]n order to allow more flexibility and subtlety in constraints definition" so that "editors can distinguish the really crucial constraint violations from the ones that only suggest additional edits that would be nice to make."
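The rule that assigns a constraint to one of the three severity levels, as described in the subsections above, can be sketched as follows. The `PropertyConstraint` class is a simplified, hypothetical stand-in for a property constraint (P2302) statement and is not the data structure used by WikibaseQualityConstraints; treating any unrecognized status value as the normal level is also an assumption of this sketch.

```python
from dataclasses import dataclass, field

P_CONSTRAINT_STATUS = "P2316"  # constraint status
Q_MANDATORY = "Q21502408"      # mandatory constraint
Q_SUGGESTION = "Q62026391"     # suggestion constraint

@dataclass
class PropertyConstraint:
    """Hypothetical, simplified view of a P2302 statement."""
    constrained_property: str   # e.g. "P227"
    constraint_type: str        # Item ID of the constraint type
    # Qualifiers: property ID -> list of Item IDs used as values.
    qualifiers: dict = field(default_factory=dict)

def severity_level(constraint):
    """Return 'mandatory', 'suggestion' or 'normal' for a constraint."""
    statuses = constraint.qualifiers.get(P_CONSTRAINT_STATUS, [])
    if Q_MANDATORY in statuses:
        return "mandatory"
    if Q_SUGGESTION in statuses:
        return "suggestion"
    # No constraint status qualifier: the implicit, unnamed default level.
    return "normal"

# Illustrative constraints (the qualifier choices here are made up).
plain = PropertyConstraint("P227", "Q21502404")
strict = PropertyConstraint("P227", "Q21502404", {P_CONSTRAINT_STATUS: [Q_MANDATORY]})
soft = PropertyConstraint("P227", "Q21502404", {P_CONSTRAINT_STATUS: [Q_SUGGESTION]})
levels = [severity_level(c) for c in (plain, strict, soft)]
```

The sketch makes the asymmetry discussed above visible: two levels are identified by an explicit qualifier value, while the third is defined only by the absence of one.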

Exceptions

Goal: No constraint on Wikidata has exceptions.
95% completed

4.6% (1183) of constraints have one or more exceptions defined on Wikidata. In particular:

  • 0.3% of mandatory constraints have exceptions,
  • 6.5% of normal constraints have exceptions, and
  • 4.9% of suggestion constraints have exceptions.

There is an average of 0.42 exceptions per constraint. Specifically, there is an average of:

  • 0.008 exceptions of mandatory constraints per mandatory constraint,
  • 0.6 exceptions of normal constraints per normal constraint, and
  • 0.2 exceptions of suggestion constraints per suggestion constraint.

While the number of constraints with exceptions can be considered reasonably low, the high number of exceptions on these poorly defined constraints, which overall amounts to more than 2 exceptions for every 5 constraints, reveals some bad habits among the users who edit them. In some cases, users indiscriminately add constraints to meet a need of their exclusive interest. In other cases, editors, prioritizing the short term, or without daring to substantially modify a system they feel they do not know well, add false positives as exceptions to appease violation warnings instead of identifying the constraints as wrong and rethinking them.
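As a consistency check, the overall average of exceptions per constraint can be approximated as a weighted mean of the per-level averages. In the sketch below, the number of constraints per level is obtained by summing the rows of the severity table in this report, and the per-level averages are the rounded figures quoted above, so the result is only approximate.

```python
# Constraints per severity level (row sums of the severity table).
constraints = {"mandatory": 8423, "normal": 20020, "suggestion": 408}

# Rounded average number of exceptions per constraint, per level.
exceptions_per_constraint = {"mandatory": 0.008, "normal": 0.6, "suggestion": 0.2}

# Weighted mean: estimated exceptions divided by all constraints.
estimated_exceptions = sum(
    constraints[level] * exceptions_per_constraint[level] for level in constraints
)
overall_average = estimated_exceptions / sum(constraints.values())

# "One exception every N mandatory constraints" is the reciprocal of 0.008.
mandatory_constraints_per_exception = round(1 / exceptions_per_constraint["mandatory"])
```

The weighted mean comes out at roughly 0.42, that is, slightly more than 2 exceptions for every 5 constraints, and almost all of it is contributed by the normal level.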

Data: Proportion of constraints without exceptions for each constraint type
allowed entity types: 100%
Commons link: 100%
allowed units: 99%
mandatory qualifier: 99%
one-of: 99%
Property scope: 99%
allowed qualifiers: 98%
integer: 98%
citation needed: 97%
conflicts-with: 97%
distinct values: 97%
format: 97%
no bounds: 97%
inverse: 96%
type: 95%
value requires statement: 95%
value type: 95%
item requires statement: 94%
contemporary: 92%
single value: 92%
range: 91%
Reading assistance
98% of integer constraints on Wikidata have no exceptions.
Data: Number of exceptions of constraints according to their types
constraint type ncon q3 max sum mean stdev skew kurto gini
allowed qualifiers 400 0 2 10 0.0 0.2 8.2 72.9 0.98
allowed units 425 0 3 8 0.0 0.2 12.5 182.3 0.99
citation needed 79 0 3 4 0.1 0.4 7.7 59.9 0.98
conflicts-with 1,017 0 85 153 0.2 2.7 30.2 938.9 0.99
contemporary 106 0 19 35 0.3 1.9 8.7 80.5 0.96
difference within range 8 0 4 4 0.5 1.4 2.3 3.1 0.88
distinct values 4,277 0 2,095 3,497 0.8 32.5 62.8 4,042.1 1.00
format 4,562 0 104 556 0.1 1.8 42.7 2,275.4 0.99
integer 132 0 1 2 0.0 0.1 7.9 61.0 0.98
inverse 113 0 5 12 0.1 0.6 6.6 46.6 0.97
item requires statement 5,892 0 683 3,747 0.6 11.0 45.7 2,582.7 0.99
mandatory qualifier 275 0 2 4 0.0 0.1 11.2 132.7 0.99
multi-value 22 0 1 1 0.0 0.2 4.4 17.0 0.95
no bounds 73 0 1 2 0.0 0.2 5.8 31.5 0.97
one-of 149 0 1 1 0.0 0.1 12.1 144.0 0.99
Property scope 1,570 0 1 8 0.0 0.1 13.9 191.3 0.99
range 265 0 39 99 0.4 2.7 11.7 154.8 0.97
single best value 43 0 11 17 0.4 1.7 5.5 30.2 0.95
single value 4,307 0 124 2,244 0.5 4.8 18.3 397.0 0.98
symmetric 41 0 2 4 0.1 0.4 4.0 15.8 0.94
type 3,807 0 21 363 0.1 0.7 15.5 326.3 0.97
value requires statement 278 0 22 41 0.1 1.4 15.2 241.8 0.98
value type 814 0 6 62 0.1 0.4 7.6 85.4 0.96
Reading assistance
The distinct values constraint with the highest number of exceptions has 2,095 exceptions (max).
The constraints of the single value type have 2,244 exceptions in total (sum).
On average (mean), each distinct values constraint has 0.8 exceptions.

There are 15 constraints with more than 100 exceptions each. Two of these constraints have so many exceptions (2,095 and 683) that WikibaseQualityConstraints cannot afford to read them, which results in an error message ("The parameters of this constraint could not be imported because they were too long"). The Property RKDartists ID (P650) has three poorly defined constraints of the same type which also add up to 683 exceptions.

# Property constraint type exceptions
1 taxon name (P225) distinct values 2,095
2 IMA status and/or rank (P579) item requires statement 683
3 RKDartists ID (P650) item requires statement 228
4 RKDartists ID (P650) item requires statement 228
5 RKDartists ID (P650) item requires statement 227
6 Swiss municipality code (P771) item requires statement 190
7 Mérimée ID (P380) distinct values 189
8 category's main topic (P301) distinct values 187
9 GND ID (P227) distinct values 127
10 traffic sign (P14) distinct values 126
11 RKDimages ID (P350) single value 124
12 IMA Number, broad sense (P484) single value 123
13 IMDb ID (P345) single value 117
14 MusicBrainz artist ID (P434) single value 109
15 Philippine Standard Geographic Code (P988) format 104

It is suggested to propose a bot task that automatically removes all constraints with an unreasonable number of exceptions (e.g., more than 70, 80 or 100), and to define and implement an upper hard limit (e.g., 100, 150 or 200) above which software systems, including WikibaseQualityConstraints and the described bot, may fail. Applying only a hard limit at the lower threshold, without the bot task, is discouraged, as it would not allow the automatic detection and removal of wrongly defined constraints, which would then accumulate at the maximum number of exceptions.
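The two-threshold idea can be sketched as a simple triage function. The threshold values, the names `REMOVAL_THRESHOLD`, `HARD_LIMIT` and `triage`, and the wording of the outcomes are all hypothetical; the sample rows are taken from the table of constraints with more than 100 exceptions above.

```python
from typing import NamedTuple

class ConstraintInfo(NamedTuple):
    prop: str
    constraint_type: str
    exceptions: int

REMOVAL_THRESHOLD = 100  # hypothetical: above this, the bot proposes removal
HARD_LIMIT = 200         # hypothetical: above this, software may refuse to process

def triage(constraint):
    """Classify a constraint by its number of exceptions."""
    if constraint.exceptions > HARD_LIMIT:
        return "over hard limit"
    if constraint.exceptions > REMOVAL_THRESHOLD:
        return "candidate for removal"
    return "keep"

# Rows copied from the table of constraints with more than 100 exceptions.
sample = [
    ConstraintInfo("P225", "distinct values", 2095),
    ConstraintInfo("P579", "item requires statement", 683),
    ConstraintInfo("P771", "item requires statement", 190),
    ConstraintInfo("P434", "single value", 109),
    ConstraintInfo("P988", "format", 104),
]
report = {c.prop: triage(c) for c in sample}
```

Keeping the removal threshold below the hard limit leaves a band in which badly defined constraints are still readable by the software and can therefore be detected and removed automatically before they reach the limit.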

Although there are many technical tasks to be solved in Wikibase and it is not one of the objectives of this report to analyze them, it is recalled that the problem described by phab:T168379 is a relevant source of exceptions and false positives of the range, difference within range and contemporary constraint types, and some editors have requested that this fact be highlighted in this report.

In order to draw complete and clarifying conclusions on the quality and maturity of Property constraints, the number of exceptions should be studied together with the number, distribution, authorship and context of constraint violations.

Fictional entities

Only 0.13% of Wikidata Items are fictional classes or instances, but they represent 2% of exceptions, which is more than 15 times the proportion of fictional Items on Wikidata.

Listing as exceptions all the individual fictional Items to which certain constraints do not apply is not feasible. But there is another problem that does not only concern Property constraints: not all data on fictional Items are fictional, and there is no method for unambiguously distinguishing fictional data from non-fictional data on Wikidata. Most Properties can specify either fictional or non-fictional facts from any universe and, with the current data, it is not possible to algorithmically solve this ambiguity.

Until a correct representation of fictional facts is agreed and extended, some lists of exceptions and some lists of constraint violations will remain longer than desirable, some wrong data from fictional entities will not be detected as such, and some normal constraints will be prevented from being promoted to the mandatory severity level even when their performance with non-fictional facts is perfect.

Constraint types

Data: Number of checks of Wikidata constraints according to their types
constraint type ncon min q1 med q3 max sum mean stdev skew kurto gini
allowed entity types 79 1 14 142 2,562 2,460,006 5,555,826 70,326.9 347,282.4 5.9 34.3 0.95
allowed qualifiers 400 2 229 5,531 86,462 61,831,810 354,132,851 885,332.1 4,712,908.3 8.4 83.8 0.95
allowed units 425 1 17 177 2,315 1,675,113 7,929,409 18,657.4 98,418.3 12.4 191.6 0.93
citation needed 79 5 131 2,542 57,750 3,805,020 12,747,019 161,354.7 567,722.5 4.7 23.5 0.91
Commons link 65 2 126 1,190 12,061 3,008,919 6,651,212 102,326.3 505,631.2 5.4 27.5 0.94
conflicts-with 1,017 1 865 6,677 66,233 32,233,113 664,977,413 653,861.8 2,864,473.2 7.6 66.4 0.93
contemporary 106 5 9,206 41,220 227,522 61,831,810 101,097,635 953,751.3 6,062,071.0 9.7 94.8 0.92
difference within range 8 562 1,320 1,799 36,476 1,824,418 1,961,446 245,180.8 639,504.2 2.2 3.1 0.85
distinct values 4,277 1 53 426 2,632 22,680,832 114,354,129 26,737.0 480,720.6 40.7 1,787.7 0.96
format 4,562 1 80 597 3,732 29,454,147 728,425,356 159,672.4 1,885,352.7 14.1 202.5 0.99
integer 132 2 104 847 7,080 488,464 2,485,412 18,828.9 59,833.7 5.4 33.7 0.87
inverse 113 4 226 1,660 13,591 948,368 6,073,958 53,751.8 176,893.9 4.0 15.1 0.89
item requires statement 5,892 1 159 881 3,912 22,680,832 240,752,445 40,860.9 644,521.1 30.8 1,045.9 0.96
Lexeme requires language 13 4 9 381 1,287 1,747 8,689 668.4 694.4 0.3 −1.5 0.55
Lexeme requires lexical category 5 9 24 94 102 814 1,043 208.6 340.9 1.4 0.2 0.65
mandatory qualifier 275 1 22 208 2,432 2,683,444 10,549,680 38,362.5 244,271.8 9.6 98.5 0.96
multi-value 22 8 32 70 2,972 726,585 798,893 36,313.3 154,332.5 4.3 17.0 0.93
no bounds 73 3 138 486 4,498 161,764 659,132 9,029.2 28,137.6 4.5 20.3 0.86
none of 33 29 32,068 182,753 764,525 61,831,810 140,261,360 4,250,344.2 12,519,587.3 3.7 13.1 0.88
one-of 149 1 100 1,382 5,729 61,831,810 77,475,731 519,971.3 5,088,205.2 11.9 140.4 0.98
Property scope 1,570 1 37 348 3,320 61,831,810 234,759,450 149,528.3 2,139,493.3 22.0 543.3 0.98
range 265 2 49 670 7,919 32,930,175 78,869,468 297,620.6 2,835,207.6 11.2 124.9 0.98
single best value 43 7 2,002 35,058 126,436 3,805,020 11,765,340 273,612.6 717,067.4 3.7 13.5 0.84
single value 4,307 1 52 428 2,758 29,454,147 193,866,012 45,011.8 788,918.5 30.6 1,003.7 0.98
symmetric 41 6 103 1,053 9,066 339,374 1,227,489 29,938.8 71,472.1 3.0 8.6 0.84
type 3,807 1 55 494 4,074 32,233,113 290,281,292 76,249.4 1,106,181.5 23.7 592.6 0.98
value requires statement 278 3 182 1,844 13,214 61,831,810 104,341,243 375,328.2 3,785,267.4 15.5 248.9 0.97
value type 814 1 113 1,448 12,502 37,769,801 173,109,817 212,665.6 1,848,885.5 15.4 270.7 0.97
Reading assistance
There are 5,892 constraints (ncon) defined of the constraint type item requires statement.
The constraint type format has 728,425,356 checks (sum).
The set of constraints of type format is the one with the highest number of checks (sum).
Each constraint of type none of has, on average (mean), 4,250,344 checks.

The information and specific goals of each constraint type are presented below. They should be complemented with the rest of the report.

The absence of exceptions is indicated as a goal only for those constraint types with at least 50 constraints.

First constraint types

This section presents information on the current constraint types that were conceived between 2013 and 2015. These were initially represented only as templates and implemented with bots by the community in the first place.

single value

Goal: No single value constraint on Wikidata has exceptions.
92% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "single value" is for.
91% completed

The single value constraint type (see also: Item single-value constraint (Q19474404); help page; PHP class) specifies that entities should have no more than one claim with the constrained Property.
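The rule can be illustrated with a minimal sketch that works on plain (entity, property, value) triples. This is not how WikibaseQualityConstraints implements the check: it ignores ranks, qualifiers and any other refinements, the function name is hypothetical, and the sample claims are invented for illustration.

```python
from collections import Counter

def single_value_violations(claims, constrained_property, exceptions=frozenset()):
    """Return the entities with more than one claim for the Property.

    `claims` is an iterable of (entity_id, property_id, value) triples.
    Entities listed in `exceptions` (cf. exception to constraint, P2303)
    are never reported.
    """
    per_entity = Counter(
        entity for entity, prop, _ in claims if prop == constrained_property
    )
    return sorted(
        entity
        for entity, count in per_entity.items()
        if count > 1 and entity not in exceptions
    )

# Invented sample data: Q1 has two values for P345, Q2 has one.
claims = [
    ("Q1", "P345", "tt0000001"),
    ("Q1", "P345", "tt0000002"),
    ("Q2", "P345", "tt0000003"),
    ("Q2", "P434", "example-id"),
]
violations = single_value_violations(claims, "P345")
with_exception = single_value_violations(claims, "P345", exceptions={"Q1"})
```

The second call shows how listing an entity as an exception silences the warning without changing the data, which is the habit discussed in the section on exceptions.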

This is the third constraint type with the most constraints defined (4,307, that is, 14.9% of constraints) on Wikidata, only after the item requires statement and format constraint types.

Single value constraints are applied:

format

Goal: All Wikidata Properties of the ExternalId type have a format constraint.
90% completed
Goal: No format constraint on Wikidata has exceptions.
97% completed
Goal: No format constraint allows an infinite number of values.
37% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "format" is for.
82% completed

The format constraint type (see also: Item format constraint (Q21502404); help page; PHP class) specifies that values with the constrained Property should meet a certain pattern defined by a regular expression.

This is the second constraint type with the most constraints defined (4,562, that is, 15.8% of constraints on Wikidata), the one with the most checks (728,425,356, that is, 20.4% of constraint checks on Wikidata) and one of the most important constraint types for external identifiers.

Format constraints are applied:

The most common regular expressions for format constraints on Wikidata are the following. Regular expressions 1 and 3 represent the same pattern, which is used in more than 700 different Properties.

#    regex           uses   character set   accepted values
1    [1-9]\d*        645    0123456789      infinite
2    \d+             523    0123456789      infinite
3    [1-9][0-9]*     83     0123456789      infinite
4    [^\s\/]+        71     immense         infinite
5    [1-9]\d{0,5}    46     0123456789      999,999
6    [1-9]\d{0,4}    42     0123456789      99,999
7    [1-9]\d{0,6}    33     0123456789      9,999,999
8    \d{9}           33     0123456789      1,000,000,000
9    [1-9]\d+        32     0123456789      infinite
10   [^ ]+           30     immense         infinite

63% of format constraints on Wikidata accept an infinite number of distinct values and most patterns for format constraints are too generic. These facts do not mean that current format constraints are not useful, but that they have significant room for improvement. It is advisable to avoid infinite repetitions (*, + or {…,}) whenever possible and to better check and fit the actual range of acceptable values for each Property.
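The finite counts listed for the bounded patterns follow from simple arithmetic. As a minimal sketch, assuming Python and two hypothetical helper names that are not part of this study, the number of strings accepted by a bounded pattern can be computed as follows:

```python
def count_nonzero_start(max_trailing_digits: int) -> int:
    # Strings accepted by [1-9]\d{0,n}: 9 choices for the first digit,
    # then 10**k choices for each allowed number k of trailing digits.
    return sum(9 * 10**k for k in range(max_trailing_digits + 1))


def count_fixed_length(length: int) -> int:
    # Strings accepted by \d{length}: 10 choices per position.
    return 10**length


print(count_nonzero_start(5))  # 999999, as for [1-9]\d{0,5}
print(count_fixed_length(9))   # 1000000000, as for \d{9}
```

Replacing an unbounded repetition with an explicit upper bound is what turns an infinite set of accepted values into one of these finite counts.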

Besides, dots (.), non-digit metacharacters (\D), non-whitespace metacharacters (\S), non-word metacharacters (\W) and negated character classes ([^…]) are discouraged because they allow the values to contain thousands of different invalid characters without this problem being obvious to the users who write or read the patterns. This is the case for the fourth ([^\s\/]+) and tenth ([^ ]+) most used regular expressions for format constraints on Wikidata. Even when the set of acceptable characters is large and uncertain, it is advisable to try to specify all the potentially acceptable characters rather than listing those that are not.
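The difference in scale between a negated character class and an explicit list can be estimated by counting the single characters that each one accepts. The following is only a sketch: it assumes that Python's re module behaves similarly enough to the engine that actually checks format constraints, and the allow-list [A-Za-z0-9_-] is a hypothetical example, not a recommendation for any particular Property.

```python
import re

# Every Unicode code point as a one-character string (assumption: Python's
# "re" module stands in for the engine that actually checks the constraint).
ALL_CHARS = "".join(map(chr, range(0x110000)))


def accepted_characters(char_class: str) -> int:
    # Count the single characters that the given character class matches.
    return len(re.findall(char_class, ALL_CHARS))


negated = accepted_characters(r"[^\s\/]")         # block-list used in regex 4
explicit = accepted_characters(r"[A-Za-z0-9_-]")  # hypothetical allow-list

print(negated, explicit)
```

Under these assumptions, the negated class accepts more than a million distinct characters, while the explicit class accepts only the 64 characters that were written down.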

The knowledge and intuition that active editors have are sufficient to interpret the simplest regular expressions composed of literals and digit ranges, but insufficient for all other regular expressions. The following table presents the hit rates that resulted when active editors were asked to indicate which strings matched each of the four regular expressions listed, among the most widely used in format constraints. These results may have an optimistic bias because they only consider active editors who were willing to respond. Several values close to 50% should be interpreted as a zero level of understanding, since this is the approximate percentage that would be obtained with random responses.

string     [1-9]   [1-9]\d*   [^\s\/]+   [1-9]\d{0,5}
0          74%     85%        57%        87%
1          96%     54%        48%        50%
21.5       96%     80%        52%        83%
24         80%     70%        43%        61%
048        85%     87%        54%        89%
48/9       96%     93%        65%        100%
E480       98%     98%        63%        100%
48000      83%     63%        43%        52%
9999999    83%     70%        48%        91%
#Á@½þ€ŋ    100%    100%       59%        98%
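For reference, the expected answers can be reproduced mechanically. The sketch below assumes that a value complies only when the whole string matches the pattern, and it uses Python's re module with the ASCII flag so that \d covers only the digits 0 to 9; the function name matching_strings is hypothetical.

```python
import re

PATTERNS = [r"[1-9]", r"[1-9]\d*", r"[^\s\/]+", r"[1-9]\d{0,5}"]
STRINGS = ["0", "1", "21.5", "24", "048", "48/9", "E480",
           "48000", "9999999", "#Á@½þ€ŋ"]


def matching_strings(pattern: str) -> list[str]:
    # Assumption: the whole value must match, not just a part of it.
    compiled = re.compile(pattern, re.ASCII)
    return [s for s in STRINGS if compiled.fullmatch(s)]


for pattern in PATTERNS:
    print(pattern, matching_strings(pattern))
```

Under these assumptions, only "1" matches [1-9], and "48/9" is the only string rejected by [^\s\/]+.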

Regular expressions are currently stored with Wikibase Properties of the String type. It is suggested to create a RegularExpression (or Pattern, or similar) Property type to better edit, monitor, manage, check and process patterns represented by regular expressions. Some technical issues not addressed by the current study that are related to the lack of ability to manage regular expressions are described on phab:T176312, phab:T214378, phab:T236150 and phab:T240884.

distinct values

Goal: All Wikidata Properties of the ExternalId type have a distinct values constraint.
93% completed
Goal: No distinct values constraint on Wikidata has exceptions.
97% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "distinct values" is for.
77% completed

The distinct values constraint type (see also: Item distinct-values constraint (Q21502410); help page; PHP class), originally unique value as a template, specifies that each value should only be defined once on Wikidata with the constrained Property. Analogous to UNIQUE SQL constraints, it is one of the oldest and most basic constraint types of database management systems.

This is the fourth constraint type with the most constraints (4,277, that is, 14.8% of constraints) defined, the second with the most exceptions (3,497) defined and the one with the highest ratio of constraint exceptions per constraint (0.8 exceptions per constraint) on Wikidata.

Applicable to almost any Property type, it is particularly important for external identifiers: 96% of distinct values constraints belong to Properties of the ExternalId type, and 93% of Properties of the ExternalId type have distinct values constraints.

Because of their importance for external identifiers, distinct values constraints are applied:

The distinct values constraint type and the format constraint type are complementary and closely related. Poorly defined format constraints allow several identifiers that refer to the same resource to be stored differently (for example, "123" and "0123") and prevent the corresponding distinct values constraints from detecting duplicates. In the opposite situation, for the Property of an external catalog completely matched on Wikidata, when a distinct values constraint is combined with a well-defined format constraint, both act as a virtual full constraint, which turns any addition or change into at least one violation.
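The interaction between both constraint types can be illustrated with the "123" and "0123" example. The following is a minimal sketch with hypothetical data and function names; it does not reproduce how the checks are actually implemented.

```python
import re
from collections import defaultdict

LOOSE = re.compile(r"\d+", re.ASCII)        # allows leading zeros
STRICT = re.compile(r"[1-9]\d*", re.ASCII)  # forbids leading zeros


def format_violations(statements, pattern):
    # Values that do not match the whole pattern.
    return [value for _, value in statements if not pattern.fullmatch(value)]


def distinct_values_violations(statements):
    # Values that appear on more than one entity.
    entities_by_value = defaultdict(set)
    for entity, value in statements:
        entities_by_value[value].add(entity)
    return {v: sorted(e) for v, e in entities_by_value.items() if len(e) > 1}


# Hypothetical data: entities A and B refer to the same external resource.
statements = [("A", "123"), ("B", "0123"), ("C", "456")]

print(format_violations(statements, LOOSE))    # []
print(distinct_values_violations(statements))  # {}
print(format_violations(statements, STRICT))   # ['0123']

# Once the value flagged by the strict pattern is corrected,
# the duplicate becomes visible to the distinct values check.
fixed = [(entity, value.lstrip("0")) for entity, value in statements]
print(distinct_values_violations(fixed))       # {'123': ['A', 'B']}
```

With the loose pattern neither check reports anything, so the duplicate remains hidden; the strict pattern is what brings it to light.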

conflicts-with

Goal: No conflicts-with constraint on Wikidata has exceptions.
97% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "conflicts with" is for.
86% completed

The conflicts-with constraint type (see also: Item conflicts-with constraint (Q21502838); help page; PHP class) specifies that entities with the constrained Property should not have certain other Properties or statements.

This is the second constraint type with the most checks (664,977,413, that is, 21.1% of checks) on Wikidata, only after the format constraint type. It can be applied to all Property types, although it is especially used for Properties of the ExternalId and WikibaseItem types. It has a high proportion of mandatory constraints (45%).

item requires statement

Goal: No item requires statement constraint on Wikidata has exceptions.
94% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "item requires statement" is for.
86% completed

The item requires statement constraint type (see also: Item item-requires-statement constraint (Q21503247); help page; PHP class), originally item, then item requires claim, specifies that entities with the constrained Property should also have certain other statements.

This is the constraint type with the most constraints (5,892, that is, 20.4% of constraints on Wikidata) and exceptions (3,747) defined.

It would be possible to change the name of this constraint type to subject requires statement or similar both to accommodate new Wikibase entity types other than Items and to avoid confusion with the value requires statement constraint type.

type

Goal: No type constraint on Wikidata has exceptions.
95% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "type" is for.
77% completed

The type constraint type (see also: Item subject type constraint (Q21503250); help page; PHP class) specifies that entities with the constrained Property should be instances or subclasses of other entities.

A high number of constraints (3,807) on almost all Property types and a high number of checks (290,281,292) belong to this constraint type, which shares space with all the other constraint types.

Concerning the relations that these constraints specify:

  • 89.3% of type constraints require a relation of "instance of,"
  • 7.3% require a relation of "subclass of," and
  • the remaining 3.4% allow either of the two relations.
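The three relations can be sketched over a toy class hierarchy. The entity names below are hypothetical, and the sketch assumes that "subclass of" links are followed transitively, which this report does not specify.

```python
# Hypothetical statements: entity -> classes it is linked to.
INSTANCE_OF = {"douglas": {"human"}}
SUBCLASS_OF = {"human": {"mammal"}, "mammal": {"animal"}}


def with_superclasses(cls):
    # The class itself plus everything reachable through "subclass of".
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def satisfies(entity, required_class, relation):
    def reaches(direct_classes):
        return any(required_class in with_superclasses(c) for c in direct_classes)

    via_instance = reaches(INSTANCE_OF.get(entity, ()))
    via_subclass = reaches(SUBCLASS_OF.get(entity, ()))
    if relation == "instance of":
        return via_instance
    if relation == "subclass of":
        return via_subclass
    if relation == "instance or subclass of":
        return via_instance or via_subclass
    raise ValueError(f"unknown relation: {relation}")


print(satisfies("douglas", "animal", "instance of"))  # True
print(satisfies("human", "animal", "instance of"))    # False
print(satisfies("human", "animal", "subclass of"))    # True
```

The same entity can therefore comply with or violate a constraint depending only on which relation the constraint requires.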

The name of this constraint type may be considered inaccurate or confusing for two reasons:

  • rdf:type is an RDF property to state that a resource is an instance of a class, but the type constraint type is also used to specify that a resource should be a subclass of another class, for which rdfs:subClassOf is used instead;
  • the word type is already used to refer to Property types (ExternalId, String, etc.), value types, entity types (Item, Lexeme, etc.) and, most confusingly, constraint types.

The name subject class is proposed, since this constraint type is used to specify the class of which the subject entity is an instance or a subclass. Likewise, for the current value type constraint type, the name value class is proposed.

allowed qualifiers

Goal: No allowed qualifiers constraint on Wikidata has exceptions.
98% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed qualifiers" is for.
86% completed

The allowed qualifiers constraint type (see also: Item allowed qualifiers constraint (Q21510851); help page; PHP class) specifies that statements for the constrained Property should not have any qualifiers other than the listed ones.

Commons link

Goal: No Commons link constraint on Wikidata has exceptions.
100% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "Commons link" is for.
64% completed

The Commons link constraint type (see also: Item Commons link constraint (Q21510852); help page; PHP class) specifies that values for the constrained Property should be valid names of existing pages on Wikimedia Commons within a certain namespace.

56 (86%) constraints specify the File namespace; 4 (6%), the Data namespace; 2 (3%), the Category namespace; 1, the Creator namespace; 1, the Institution namespace, and one remaining constraint does not specify any namespace.

This is one of only two types whose constraints have the mandatory severity level (52, 80%) more often than the normal severity level (13, 20%). Consistently, it has no exceptions and no constraints with the suggestion level. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.

difference within range

Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "difference within range" is for.
51% completed

The difference within range constraint type (see also: Item difference-within-range constraint (Q21510854); help page; PHP class) specifies that the difference between the values for two Properties should be within a certain range or interval.

This constraint type only has 7 constraints defined for 7 Properties: date of death (P570), dissolved, abolished or demolished date (P576), service retirement (P730), date of disappearance (P746), date of baptism (P1636), time in space (P2873) and date of burial or cremation (P4602).
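As an illustration of the rule, the sketch below checks the difference between two dates. The pairing with a date of birth and the bounds of 0 and 150 years are assumptions chosen for the example, not the configuration of any of the constraints listed above.

```python
from datetime import date


def difference_within_range(value, other, min_years, max_years):
    # Approximate difference in years between the two dates.
    years = (value - other).days / 365.25
    return min_years <= years <= max_years


birth = date(1952, 3, 11)  # hypothetical value of the other Property

print(difference_within_range(date(2001, 5, 11), birth, 0, 150))  # True
print(difference_within_range(date(1950, 1, 1), birth, 0, 150))   # False
print(difference_within_range(date(2152, 1, 1), birth, 0, 150))   # False
```

A negative difference (a death before a birth) and an implausibly large one are both reported by the same check.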

inverse

Goal: No inverse constraint on Wikidata has exceptions.
96% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "inverse" is for.
76% completed

The inverse constraint type (see also: Item inverse constraint (Q21510855); help page; PHP class) specifies that the constrained Property has an inverse Property, and values for the constrained Property should have a statement with the inverse Property pointing back to the original entity.
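The rule can be sketched over a small set of statements written as (subject, Property, value) triples. The entities, the Property names "child" and "mother" and the function name are hypothetical.

```python
def inverse_violations(triples, prop, inverse_prop):
    # Statements with `prop` whose value lacks the statement pointing back.
    existing = set(triples)
    return [
        (subject, p, value)
        for (subject, p, value) in triples
        if p == prop and (value, inverse_prop, subject) not in existing
    ]


triples = [
    ("A", "child", "B"),
    ("B", "mother", "A"),  # points back to A, so the first statement is fine
    ("A", "child", "C"),   # C has no "mother" statement pointing back to A
]

print(inverse_violations(triples, "child", "mother"))
# [('A', 'child', 'C')]
```

Passing the same Property as both arguments checks a Property that is its own inverse.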

mandatory qualifier

Goal: No mandatory qualifier constraint on Wikidata has exceptions.
99% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "mandatory qualifier" is for.
86% completed

The mandatory qualifier constraint type (see also: Item required qualifier constraint (Q21510856); help page; PHP class) specifies that a certain qualifier is mandatory for the constrained Property.

multi-value

Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "multi-value" is for.
72% completed

The multi-value constraint type (see also: Item multi-value constraint (Q21510857); help page; PHP class) specifies that entities that use the constrained Property should have more than one statement with it.

Only 22 constraints on Wikidata belong to this constraint type, 14 of them applied to Properties of the WikibaseItem type.

one-of

Goal: No one-of constraint on Wikidata has exceptions.
99% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "one-of" is for.
64% completed

The one-of constraint type (see also: Item one-of constraint (Q21510859); help page; PHP class) specifies that only certain values are allowed for a Property. It is limited to Properties of the WikibaseItem type. It would be possible to broaden its scope, although this could result in an overlap, and possible inconsistencies, with the format constraint type.

range

Goal: No range constraint on Wikidata lacks a lower bound.
100% completed
Goal: No range constraint on Wikidata has exceptions.
91% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "range" is for.
65% completed

The range constraint type (see also: Item range constraint (Q21510860); help page; PHP class) specifies that values for the constrained Property should be within a certain range or interval.

Range constraints are applied to 74.2% of Properties with integer constraints and 72.6% of Properties with no bounds constraints.

symmetric

Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "symmetric" is for.
62% completed

The symmetric constraint type (see also: Item symmetric constraint (Q21510862); help page; PHP class) specifies that a Property is symmetric, and values for that Property should have a statement with the same Property pointing back to the original Item.

value requires statement

Goal: No value requires statement constraint on Wikidata has exceptions.
95% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "value requires statement" is for.
67% completed

The value requires statement constraint type (see also: Item value-requires-statement constraint (Q21510864); help page; PHP class), originally target requires claim or target required claim, specifies that values for the constrained Property should have a certain other statement.

This constraint type can be considered a more general, but less powerful, set of rules (superclass) of other constraint types, such as the inverse, symmetric and value type constraint types, which also require a statement on the value entity.

value type

Goal: All Wikidata Properties of the WikibaseItem type have a value type constraint.
61% completed
Goal: No value type constraint on Wikidata has exceptions.
95% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "value type" is for.
77% completed

The value type constraint type (see also: Item value-type constraint (Q21510865); help page; PHP class) specifies that values of the constrained Property should be instances or subclasses of a given entity. Since all Wikidata entities should be specified at least as instances or subclasses, and since the value type constraint type allows arbitrarily general entities to be specified as values, all the Properties of the WikibaseItem type could have a value type constraint, although this is only the case for 61% of them.

Value type constraints are applied:

Concerning the relations that these constraints specify:

  • 77.4% of value type constraints require a relation of "instance of,"
  • 11.4% require a relation of "subclass of," and
  • the remaining 11.2% allow either of the two relations.

The name of this constraint type may be considered inaccurate or confusing for two reasons:

  • rdf:type is an RDF property to state that a resource is an instance of a class, but the value type constraint type is also used to specify that a resource should be a subclass of another class, for which rdfs:subClassOf is used instead;
  • the word type is already used to refer to Property types (ExternalId, String, etc.), value types, entity types (Item, Lexeme, etc.) and, most confusingly, constraint types.

The name value class is proposed, since this constraint type is used to specify the class of which the value is an instance or a subclass. Likewise, for the current type constraint type, the name subject class is proposed.

allowed units

Goal: All Wikidata Properties of the Quantity type have an allowed units constraint.
80% completed
Goal: No allowed units constraint on Wikidata has exceptions.
99% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed units" is for.
88% completed

The allowed units constraint type (see also: Item allowed units constraint (Q21514353); help page; PHP class) specifies that only a certain set of units may be used with the constrained Property. It can only be applied to Properties of the Quantity type. 80% of these Properties have allowed units constraints, but ideally all of them should; values expressed in unexpected units whose conversion is impossible or unknown are not only useless for exploitation, but they can also easily go unnoticed and produce misinformation if other units are taken for granted by software agents or data reusers.

Allowed units constraints are applied:

  • to 84.9% of Properties with no bounds constraints,
  • to 83.5% of Properties with range constraints, and
  • to 82.6% of Properties with integer constraints.

The most frequently specified units are, after the absence of unit (158 Properties):

To make this constraint type easier to use correctly and to maintain, it is suggested that implementations accept classes of units (e.g., the class unit of length), which is usually better than trying to list most or all the instances of such a class (e.g., the instances light-year, astronomical unit, foot, parsec, ångström, metre, centimetre, millimetre, and potentially hundreds of other instances). Once this implementation is finished, it is suggested to transform at least the longest lists of instances into their corresponding classes.

The Wikidata Properties with the longest, but not necessarily complete, lists of allowed units are:

Modern constraint types

This section presents information on the current constraint types that were first implemented between 2018 and 2019, that is, when the use of templates to define constraints was already obsolete, and after a time period of more than two years without new constraint types. Most of these constraint types are not as well known among active editors as the constraint types above.

In addition to these constraint types, there exist the Lexeme requires lexical category, the Lexeme requires language and the Lexeme value requires lexical category constraint types, which are outside the scope of this study.

contemporary

Goal: No contemporary constraint on Wikidata has exceptions.
92% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "contemporary" is for.
48% completed

The contemporary constraint type (see also: Item contemporary constraint (Q25796498); help page; PHP class) specifies that the subject and value entities linked through the constrained Property should coexist at some point in history. It is only applicable to Properties of the WikibaseItem type.

no bounds

Goal: No no bounds constraint on Wikidata has exceptions.
97% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "no bounds" is for.
26% completed

The no bounds constraint type (see also: Item no-bounds constraint (Q51723761); help page; PHP class) specifies that the value of the Property, which must be of type Quantity, should not be used with upper or lower bounds.

The no bounds constraint type serves its purpose, although it is the least known constraint type and one of the constraint types with the least descriptive names according to active editors. Consistently, it is the constraint type with the lowest number of checks (659,132).

allowed entity types

Goal: All Wikidata Properties specify the entity types to which they apply.
1% completed
Goal: No allowed entity types constraint on Wikidata has exceptions.
100% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "allowed entity types" is for.
64% completed

The allowed entity types constraint type (see also: Item allowed-entity-types constraint (Q52004125); help page; PHP class) specifies the Wikibase entity types (Items, Lexemes, Properties…) where the constrained Property should be used. Ideally, all Properties could have this information specified, but this only happens for 1% of Wikidata Properties.

This constraint type is the third with the highest proportion of mandatory constraints (48%), only after the Commons link and Property scope constraint types. Consistently, it has no constraints with the suggestion level and no exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.

single best value

Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "single best value" is for.
53% completed

The single best value constraint type (see also: Item single-best-value constraint (Q52060874); help page; PHP class) specifies that a Property should have a single "best" value. It may have any number of values, but exactly one of them, the "best" one by whatever criteria, should have preferred rank.
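A minimal sketch of this rule follows. It assumes, beyond what is stated above, that deprecated statements are ignored and that a lone statement without preferred rank is accepted as the best one; the function name is hypothetical.

```python
def has_single_best_value(ranks):
    # ranks: one entry per statement, each "preferred", "normal" or "deprecated".
    candidates = [rank for rank in ranks if rank != "deprecated"]
    preferred = candidates.count("preferred")
    if preferred:
        # With preferred statements present, exactly one must be preferred.
        return preferred == 1
    # Assumption: without preferred rank, at most one candidate may remain.
    return len(candidates) <= 1


print(has_single_best_value(["preferred", "normal", "normal"]))  # True
print(has_single_best_value(["normal", "normal"]))               # False
print(has_single_best_value(["preferred", "preferred"]))         # False
```

Unlike a single value check, this check accepts several statements as long as the ranks single out one of them.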

none of

Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "none of" is for.
67% completed

The none of constraint type (see also: Item none-of constraint (Q52558054); help page; PHP class) specifies that certain values are not allowed for the constrained Property.

This constraint type only applies to Properties of the WikibaseItem type.

integer

Goal: No integer constraint on Wikidata has exceptions.
98% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "integer" is for.
62% completed

The integer constraint type (see also: Item integer constraint (Q52848401); help page; PHP class) specifies that the value for the constrained Property should be an integer, that is, a quantity without decimal places.

Integer constraints are applied to 72.6% of Properties with no bounds constraints.

Property scope

Goal: All Wikidata Properties have their scope (main value, qualifier, reference…) specified.
22% completed
Goal: No Property scope constraint on Wikidata has exceptions.
99% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "Property scope" is for.
36% completed

The Property scope constraint type (see also: Item property scope constraint (Q53869507); help page; PHP class) specifies whether or not the constrained Property should be used for the main value of a statement, in a reference or as a qualifier. It applies to all Property types. Ideally, the scope of all Properties should be specified and this constraint type should be used for that; however, only 22% of Wikidata Properties have a Property scope constraint.

This is one of only two types whose constraints have the mandatory severity level (1,221, 78%) more often than the normal severity level (349, 22%). Consistently, it has no constraints with the suggestion level and is one of the constraint types with the fewest exceptions. Widely applicable constraint types without exceptions, with a high proportion of mandatory constraints and with a clear and controlled set of parameters should be considered good candidates for becoming default Wikibase features.

The Property scope constraint type is the second least known constraint type and one of the constraint types with the least descriptive names according to active editors. Its purpose can easily be confused with those of the type and allowed entity types constraint types.

citation needed

Goal: No citation needed constraint on Wikidata has exceptions.
97% completed
Goal: All active editors think they know "more or less" or "totally" what the constraint type with the name "citation needed" is for.
88% completed

The citation needed constraint type (see also: Item citation needed constraint (Q54554025); help page; PHP class) specifies that statements for a certain Property should have at least one reference.

Having imported its name from the Wikipedia culture, it is one of the best understood constraint types despite its recent introduction.

Complex constraints

The so-called complex constraints, which are based on templates, are distinguished less by their complexity than by the need to specify them from scratch as SPARQL queries. The word "complex" has misleading connotations that may discourage work with these constraints. It is suggested to change the name of these template-based constraints to custom, a name that is already used on some pages.

Suggested constraint types

When active editors were asked whether it would be better to develop new constraint types or to improve existing features:

  • 47% said both things were equally important,
  • 31% said it would be better to improve existing features,
  • 17% did not answer, and
  • only 5% said it would be better to focus on developing new constraint types.

The names of the proposed constraint types listed below are not necessarily the most convenient or the original ones.

acyclic

The constraint type most demanded by active editors (40 supporters, 69%) by a wide margin (12 supporters away from the second most demanded constraint type) would be the one that would allow checking "whether a Property causes impossible cycles (e.g., A is B's mother, B is C's mother, C is A's mother)."

This constraint type was proposed with the tentative name acyclic as the task phab:T173771 in 2017. Among other Properties, this constraint type would apply to the Property subclass of (P279) and therefore help to improve the Wikidata class hierarchy.
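The kind of check that such a constraint type would need can be sketched as a depth-first search for cycles over the statements of one Property. This is only an illustration with hypothetical names and data, not the design proposed in the task.

```python
def find_cycle(edges):
    # edges: (subject, value) pairs for a single Property.
    graph = {}
    for subject, value in edges:
        graph.setdefault(subject, []).append(value)
        graph.setdefault(value, [])

    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = dict.fromkeys(graph, UNSEEN)
    path = []

    def visit(node):
        state[node] = ACTIVE
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == ACTIVE:
                # nxt is already on the current path: a cycle is closed.
                return path[path.index(nxt):] + [nxt]
            if state[nxt] == UNSEEN:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in graph:
        if state[node] == UNSEEN:
            found = visit(node)
            if found:
                return found
    return None


# "A is B's mother, B is C's mother, C is A's mother"
print(find_cycle([("A", "B"), ("B", "C"), ("C", "A")]))  # ['A', 'B', 'C', 'A']
print(find_cycle([("A", "B"), ("B", "C")]))              # None
```

A statement whose subject and value are the same entity is reported as a cycle of length one, so the same search also covers self-links.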

Other suggestions

  1. unused (28 supporters, 48%): "Check whether an obsolete Property is used." This constraint type was proposed as the task phab:T214244.
  2. sibling reference (22 supporters, 38%): "Check whether a Property used as a reference lacks a certain other Property as a reference (e.g., 'reference URL' lacks 'retrieved' with the access date)." This constraint type was proposed as the task phab:T229178.
  3. type of snak (20 supporters, 34%): "Check whether a Property is used with novalue or somevalue but should not." This constraint type was proposed as the task phab:T172129.
  4. label in language (18 supporters, 31%): "Check whether the Item with a Property does not have a label in a required language." This constraint type was proposed as the task phab:T195178.
  5. self-link (17 supporters, 29%): "Check whether a Property wrongly defines a self-link (the subject Item and the value Item are the same but should not be)." This constraint type was proposed as the task phab:T224837. It could become less useful with the presence of the acyclic constraint type.
  6. geographic precision (13 supporters, 22%): "Check whether a Property has a geographic location with a precision lower or higher than required."
  7. description in language (12 supporters, 21%): "Check whether the Item with a Property does not have a description in the required language." This constraint type was proposed as the task phab:T195179.
  8. time precision (10 supporters, 17%): "Check whether a Property has a date with a precision lower or higher than required."
  9. minimum number of statements (10 supporters, 17%): "Check whether the Items with a Property have fewer statements (regardless of the properties) than required." This constraint type was proposed as the task phab:T195181.
  10. calendar model (10 supporters, 17%): "Check whether a Property specifies dates with calendar models other than those required (e.g., other than the Gregorian calendar)."
  11. single value per language (10 supporters, 17%): "Check whether a Property has more than one value (text string) for each language but it should not." This constraint type was proposed as the task phab:T213967.
  12. identical values (7 supporters, 12%): "Check whether different properties that should be used with the same values are not identical." This constraint type was proposed as the task phab:T191963. This would be a special case of a general derivative statement constraint type.
  13. globe (6 supporters, 10%): "Check whether a Property has the geographic location of a celestial body other than the required one (e.g., other than Earth)."
  14. number of values (6 supporters, 10%): "Check whether a Property has fewer or more statements than required on the same Item (customized minimum and maximum)." This constraint type was proposed as the task phab:T172134.

Acknowledgements

This work has been made possible by a Rapid grant from the Wikimedia Foundation. You are invited to participate in the program.

Thanks to the Wikimedia Foundation, to Wikimedia Deutschland, to the 58 active editors who completed the survey on Property constraints and to all those who at some point have dedicated their time and effort to improving the quality of Wikidata data.

In memory of Amrapali Zaveri (Q34315853) (1984–2020).