Help talk:Property constraints portal

From Wikidata

Improvements for 2018

For me, constraints are a form of quality control: they can be used to find mistakes and to measure completeness. Say we have a scale from 0, an empty item, to 100, an item with all relevant labels, descriptions, sourced statements, etc. A lot of the constraints are used to keep track of this completeness. For example, every item using RKDimages ID (P350) should have collection (P195). This way you can bring a certain subset of items up to a given level and keep track that it stays at that level. Over time, of course, we hope these subsets grow and their level rises, because that means our overall quality is improving.
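The 0 to 100 scale could be made concrete in many ways. One minimal sketch, assuming each item simply has a set of applicable checks that either pass or fail (the check names and the "average over the subset" rule are hypothetical, not an existing Wikidata metric):

```python
def completeness(item_checks):
    """Score one item from 0 to 100 as the share of its applicable checks that pass.

    item_checks: dict mapping a check name -> True (passes) or False (violation).
    By assumption, an item with no applicable checks scores 0.
    """
    if not item_checks:
        return 0.0
    passed = sum(1 for ok in item_checks.values() if ok)
    return 100.0 * passed / len(item_checks)


def subset_level(items):
    """Average completeness over a subset, e.g. all items using RKDimages ID (P350).

    items: dict mapping item ID -> that item's check results.
    """
    scores = [completeness(checks) for checks in items.values()]
    return sum(scores) / len(scores) if scores else 0.0
```

Tracking `subset_level` for the same subset over time would show whether its quality holds steady or slips.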

Possible new constraints:

  • Label in language: An item using a certain property should have at least a label in this language or these languages. Example defined on Property talk:P650. ---> phab:T195178
  • Description in language: Same as above, but for description ---> phab:T195179
  • Minimum number of statements: An item using a certain property should have at least this number of statements (available in RDF for a while) ---> phab:T195181
  • Minimum number of identifiers: Same for identifiers, might be less useful, but might as well implement it right away
  • Minimum number of sitelinks: Same for sitelinks, might be less useful, but might as well implement it right away
  • Minimum number of labels: Not directly in RDF, but probably useful
  • Minimum number of descriptions: Not directly in RDF, but probably useful
  • Probably dead: Date of birth is set and it's more than 100 years in the past. The person is probably no longer around (example)
  • Related item is incomplete: An item using a certain property should also have another property, and the target of that other statement should have a certain property. For example, every item using RKDimages ID (P350) should have creator (P170), and that target should have RKDartists ID (P650). Example defined on Property talk:P350.
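Several of these proposed checks are simple enough to sketch. The following is a minimal Python sketch, assuming a hypothetical, much simplified in-memory item model (`Item`, `Statement`) rather than the real Wikibase data model or API; all function names are illustrative only. P569 is assumed to be the date-of-birth property, and the 100-year threshold comes from the proposal above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Statement:
    value: object                                   # item ID ("Q5"), string, or date
    references: list = field(default_factory=list)  # not used by these checks


@dataclass
class Item:
    qid: str
    labels: dict = field(default_factory=dict)        # language code -> label
    descriptions: dict = field(default_factory=dict)  # language code -> description
    claims: dict = field(default_factory=dict)        # property ID -> [Statement]


def missing_label_languages(item, languages):
    """Label in language: required languages that have no label on the item."""
    return [lang for lang in languages if lang not in item.labels]


def has_min_statements(item, minimum):
    """Minimum number of statements, counted over all properties."""
    return sum(len(stmts) for stmts in item.claims.values()) >= minimum


def probably_dead(item, today, birth_property="P569", years=100):
    """Probably dead: some date of birth lies at least `years` full years before `today`."""
    for stmt in item.claims.get(birth_property, []):
        born = stmt.value
        if isinstance(born, date):
            # Full years elapsed, minus one if this year's birthday hasn't come yet.
            age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
            if age >= years:
                return True
    return False


def related_item_incomplete(item, store, via="P170", required="P650"):
    """Related item is incomplete: the item lacks `via`, or a target lacks `required`.

    store: dict mapping item ID -> Item, standing in for lookups of target items.
    """
    targets = item.claims.get(via, [])
    if not targets:
        return True
    for stmt in targets:
        target = store.get(stmt.value)
        if target is None or required not in target.claims:
            return True
    return False
```

For a painting with RKDimages ID (P350) whose creator (P170) has RKDartists ID (P650), `related_item_incomplete` returns `False`; removing P650 from the creator would flag the painting.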

The first step is to have a certain statement; the second step is to have it properly sourced. It would also be nice to be able to express that a statement should have a (valid) reference. For example, on RKDartists ID (P650) a constraint is set for sex or gender (P21); I would like to add that all of those statements should be sourced (phab:T195052). That brings me to a related concept: constraint status (P2316). It's currently either mandatory constraint (Q21502408) or nothing. It would be nice to have something like "complete with exceptions" to indicate constraints whose violations have been cleared out, except for a few that can't be solved.
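The "should be sourced" idea and a "complete with exceptions" status could combine roughly as follows. This is a hedged sketch over a hypothetical, simplified data shape (plain dicts, each statement carrying a `references` list), not how the constraint extension actually works; treating a missing statement as a violation reflects the "first step, then second step" ordering above.

```python
def unsourced_violations(items, scope_property, target_property, exceptions=frozenset()):
    """Return IDs of in-scope items whose `target_property` is missing or unsourced.

    items: dict mapping item ID -> {property ID: [statement dicts]}, where each
        statement dict has a "references" list (hypothetical simplified shape).
    scope_property: only items using this property are checked (e.g. "P650").
    target_property: the property that must be present and sourced (e.g. "P21").
    exceptions: item IDs accepted as unsolvable ("complete with exceptions").
    """
    violations = []
    for qid, claims in sorted(items.items()):
        if scope_property not in claims or qid in exceptions:
            continue
        statements = claims.get(target_property, [])
        # Step one: the statement must exist. Step two: every copy must be sourced.
        if not statements or any(not s.get("references") for s in statements):
            violations.append(qid)
    return violations
```

Once the result is empty apart from the listed exceptions, the constraint could be marked as "complete with exceptions".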

We can probably think of more constraints to raise the quality. Any suggestions? I haven't filed Phabricator tasks yet because I would like to discuss this first.

So we have all these constraints, and more of them are added over time. It becomes increasingly hard to keep track of them all. Pages like Wikidata:Database reports/Constraint violations/Mandatory constraints/Violations become too big and mix too many things. So I think the basic workflow for a user (or group of users) is:

  1. Figure out a constraint that's doable to clear out (every human using RKDartists ID (P650) should have sex or gender (P21))
  2. Clear out the list of violations until you're done
  3. Maintenance mode: Every once in a while a violation occurs and needs to be cleared

The last step (the "maintenance mode") is currently quite hard. Over time you clear out more and more subsets and you just lose track of them. It's like maintaining a gigantic garden: some parts are really nice, but others are still a wilderness. Currently I'm trying to group things together, like on User:Multichill/Humans no gender, but that doesn't seem to scale.
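One way to make maintenance mode depend less on memory is to keep a snapshot of each cleared constraint's violations and report only what is new since the last run. A minimal sketch, assuming violation lists are already available as sets of item IDs (for example, extracted from the database reports); the constraint names and the snapshot format are hypothetical.

```python
def maintenance_report(tracked, previous):
    """For each tracked constraint, list violations that are new since the last run.

    tracked: dict mapping a constraint name -> set of currently violating item IDs.
    previous: dict of the same shape from the previous run (hypothetical snapshot).
    Constraints with no new violations are left out of the report.
    """
    report = {}
    for name, current in sorted(tracked.items()):
        new = sorted(current - previous.get(name, set()))
        if new:
            report[name] = new
    return report
```

A constraint that is not in the previous snapshot reports all of its current violations, which fits a subset that has just entered maintenance mode.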

I would like to gather some input here, and maybe we can also discuss this in person at the Barcelona hackathon.

@Ivan A. Krestinin, Pasleim, Lydia Pintscher (WMDE), Lucas Werkmeister (WMDE), Sjoerddebruin: you're probably interested in this. Multichill (talk) 14:05, 27 April 2018 (UTC)

Yeah, let's discuss this in a few weeks! :) Sjoerd de Bruin (talk) 14:31, 27 April 2018 (UTC)
Yes I think this is a good idea, and also the bit about suggestions, which should be generated from this somehow. So yes if you have RKDartist id, then you could have gender, etc. Jane023 (talk) 14:40, 27 April 2018 (UTC)
Sjoerddebruin: :) "let me open a ticket so we can discuss".
--- Jura 15:29, 27 April 2018 (UTC)
  • Comment Maybe the complex constraints could be converted into a statement-based system. This could make it easier to replicate them to other properties. The main disadvantage of that approach is that one won't be able to query the violations through the query service, at least once that option is available.
    --- Jura 15:32, 27 April 2018 (UTC)
  • Good ideas, here are some more:
Ivan A. Krestinin (talk) 07:58, 29 April 2018 (UTC)
We met up yesterday at the hackathon and created some new Phabricator tasks, which have been added to the original post. Multichill (talk) 10:08, 20 May 2018 (UTC)

If I may jump in at this slightly late date, I would very much like to have relative ranges. For example, the producers of a work of art can be just about any agent (person, organization, etc.), but the producers of a film are human. Likewise, the offspring of humans are humans, but the range of offspring in general is much broader. Peter F. Patel-Schneider (talk) 13:31, 23 May 2018 (UTC)
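A relative range can be read as a value-type rule keyed on both the property and the subject's class. A minimal sketch, assuming hypothetical property and class names in place of real IDs, and ignoring subclass of (P279) reasoning, which a real check would need:

```python
# Hypothetical rules: (property, subject class) -> classes the value may belong to.
RELATIVE_RANGES = {
    ("producer", "film"): {"human"},
    ("offspring", "human"): {"human"},
}


def relative_range_violations(subject_classes, prop, value_classes, rules=RELATIVE_RANGES):
    """Return the subject classes whose relative range for `prop` the value fails.

    A subject class with no rule for `prop` imposes nothing, so the general,
    broader range of the property still applies to it.
    """
    failed = []
    for cls in sorted(subject_classes):
        allowed = rules.get((prop, cls))
        if allowed is not None and not (allowed & set(value_classes)):
            failed.append(cls)
    return failed
```

An organization as producer of something that is both a film and a work of art would then be flagged only because of the film rule.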

My insight: we should have the option to prevent adding what is known to be representationally wrong, but this mainly depends on the resources and decisions of the development team. We are an open project, we accept all points of view and we assume that truth is relative, so we don't manage truth, we manage verifiability, we want to continue doing so, we want to move away from the dystopia in which "no human has successfully edited the site in years, with flocks of admin-enabled AI bots reverting any such attempt, citing concerns about referential integrity", and I agree with all this. But, at the same time, we can't just continue labeling everything as wrong, this does not ensure that labeled issues will be solved at some point: our community is small, there are too many entities per active editor and I guess we can't easily take care of more data. Fortunately, there are some constraints related to the representation of information that must always be met. These are some truly mandatory points that could be enforced, although other points could be added:
  • Every new item and property must have a non-reflexive statement with instance of (P31) or subclass of (P279) (or both). Since the degree of precision is arbitrary, there's no valid excuse, everyone can find a class, precise enough or not (in the worst case, something like object (Q17553950), occurrence (Q1190554), etc.), to link an entity with. No exceptions.
  • Well-identified symmetric properties are always symmetric. But currently this constraint can't be enforced, because it's not possible to atomically add, modify or remove two symmetric statements. It would be great if the development team could make this possible, although it seems technically hard.
  • Mandatory format constraints can also be enforced. If it's not technically feasible to use an arbitrary regular expression, the simplest solution would be useful too: minimum and maximum lengths, just numbers, just a-z and A-Z letters, no uppercase characters, no lowercase characters, no special characters, URL format, etc.
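The "simplest solution" format checks and the symmetric-property check are both easy to express. A minimal sketch, assuming hypothetical check names (not an existing Wikibase option set) and representing a symmetric property's statements as plain (subject, object) pairs:

```python
import re
from urllib.parse import urlparse

# Hypothetical names for the simple format checks listed above.
SIMPLE_CHECKS = {
    "digits": lambda v: re.fullmatch(r"[0-9]+", v) is not None,
    "ascii_letters": lambda v: re.fullmatch(r"[A-Za-z]+", v) is not None,
    "no_uppercase": lambda v: not any(c.isupper() for c in v),
    "no_lowercase": lambda v: not any(c.islower() for c in v),
    "no_special": lambda v: re.fullmatch(r"[A-Za-z0-9]*", v) is not None,
    "url": lambda v: urlparse(v).scheme in ("http", "https") and bool(urlparse(v).netloc),
}


def format_errors(value, min_len=None, max_len=None, checks=()):
    """Return a list of failed format rules for a string value (empty means valid)."""
    errors = []
    if min_len is not None and len(value) < min_len:
        errors.append("shorter than %d" % min_len)
    if max_len is not None and len(value) > max_len:
        errors.append("longer than %d" % max_len)
    for name in checks:
        if not SIMPLE_CHECKS[name](value):
            errors.append(name)
    return errors


def missing_symmetric(pairs):
    """Given (subject, object) pairs of a symmetric property, return pairs lacking a mirror."""
    present = set(pairs)
    return sorted((a, b) for a, b in present if (b, a) not in present)
```

Enforcing symmetry on save would still need the atomic two-statement edit described above; this only detects the gaps afterwards.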
Apart from this, I miss some UI improvements that could increase completeness, reduce mistakes and, as a result, reduce the work of volunteers who have to correct these mistakes later. I think it would be especially useful to use pre-filled models to create or complete common instances (humans, books vs. editions, places, etc.) and add some help messages and warnings while editing.
I would like to ping Lydia Pintscher in case she wants to say which points she agrees with and, if any, how volunteers could help the development team address them more easily, or what the alternatives are if that's simply not possible. --abián 22:54, 26 May 2018 (UTC)

Clarification on English label for allowed qualifiers constraint (Q21510851)

allowed qualifiers constraint (Q21510851) is described in several places, and its English label and description suggest that the listed qualifiers are the "allowed" ones, i.e. that other qualifiers are not allowed at all. However, in practice, is this just a "preference" that triggers constraint reports? Could we modify the English description to be a bit more flexible? To some audiences it communicates "forbidden" vs. "allowed", rather than what we actually have, which is more of a way to monitor inconsistency. Sadads (talk) 21:38, 10 May 2018 (UTC)

From the implementation point of view, it is not just a preference. The listed properties are allowed to be used as qualifiers, and all others trigger a violation warning. Does that help? --Lydia Pintscher (WMDE) (talk) 15:36, 11 May 2018 (UTC)
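Put differently, the constraint behaves like a whitelist whose failures are reported rather than blocked. A minimal sketch of that behaviour, assuming statements are given as a hypothetical mapping from a statement ID to the qualifier properties used on it (the property IDs in any example are arbitrary placeholders):

```python
def qualifier_violations(statements, allowed):
    """Report qualifiers that are not on the allowed list, per statement.

    statements: dict mapping a statement ID -> list of qualifier property IDs used on it.
    allowed: the properties listed on the allowed qualifiers constraint.
    Nothing is rejected: the result is only a list of warnings.
    """
    allowed = set(allowed)
    report = {}
    for statement_id, qualifiers in sorted(statements.items()):
        extra = sorted(set(qualifiers) - allowed)
        if extra:
            report[statement_id] = extra
    return report
```

A statement without qualifiers never violates the constraint; only qualifiers outside the list do.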

property scope constraint (Q53869507)

What is property scope constraint (Q53869507)? Is it somewhere documented? Was its introduction discussed somewhere? Please don't add a new constraint to thousands of properties without prior announcement! --Pasleim (talk) 18:07, 22 May 2018 (UTC)

Hello @Pasleim:, see the discussion here. Lea Lacroix (WMDE) (talk) 10:55, 23 May 2018 (UTC)
@Lea Lacroix (WMDE): That discussion is long, and not clear. I share Pasleim's concerns. Please can we have a single-paragraph explanation of property scope constraint (Q53869507), with some annotated examples? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 2 June 2018 (UTC)
Why are you asking Lea? What does she have to do with it? Ivan created this mess, not the developers of WMDE. Help:Property constraints portal/Scope was created, but isn't very clear. Multichill (talk) 20:37, 2 June 2018 (UTC)