Wikidata:Property proposal/external error


suspected external error


   Under discussion
Description: (use as qualifier) the database towards which this external identifier links made the suspected error
Data type: Item
Example 1: South America (Q18) VIAF ID (P214) 258766259 → multiple values for exact match (Q96473051)
Example 2: South America (Q18) VIAF ID (P214) 170506329 → multiple values for exact match (Q96473051)
Example 3: MISSING

Motivation

Currently, there are many constraint violations in properties like VIAF ID (P214) for single value constraint (Q19474404) and distinct values constraint (Q21502410). Some of them are caused by errors within Wikidata and others by errors in VIAF. If we tagged the values that are errors in the external database, it would be much easier for the external database to fix its errors than when it gets a list that contains both our errors and theirs. Using this as a qualifier has the additional benefit that we can create queries that list all single value constraint (Q19474404) violations for VIAF ID (P214) that don't have this qualifier. That makes the list of errors easier to work with: we can either fix the error on our side or mark the item as an external error, so no item stays on our worklist without an action that clears it. ChristianKl❫ 13:26, 20 June 2020 (UTC)
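
The worklist idea in the motivation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Wikidata's constraint reports work: the proposed property has no ID yet, so `P_SUSPECTED_EXTERNAL_ERROR` is a placeholder, the `Statement` type and the function name are made up for this sketch, and the two `Q_HYPOTHETICAL_*` items and their values are invented. It also assumes one particular reading of "cleared", namely that an item leaves the worklist as soon as any of its statements for the property carries the qualifier.

```python
from collections import defaultdict
from dataclasses import dataclass

# Placeholder only: the proposed property has not been assigned an ID.
SUSPECTED_EXTERNAL_ERROR = "P_SUSPECTED_EXTERNAL_ERROR"


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified stand-in for a Wikidata statement."""
    item: str                 # subject item, e.g. "Q18"
    prop: str                 # property, e.g. "P214"
    value: str                # identifier value
    qualifiers: tuple = ()    # (property, value) pairs


def open_single_value_violations(statements, prop):
    """Items with several values for `prop` and no suspected-external-error qualifier."""
    values = defaultdict(set)
    tagged = set()
    for s in statements:
        if s.prop != prop:
            continue
        values[s.item].add(s.value)
        # Assumption: one tagged statement is enough to clear the item.
        if any(q_prop == SUSPECTED_EXTERNAL_ERROR for q_prop, _ in s.qualifiers):
            tagged.add(s.item)
    return sorted(
        item for item, vals in values.items()
        if len(vals) > 1 and item not in tagged
    )


sample = [
    # The two examples from the proposal, tagged as suspected external errors.
    Statement("Q18", "P214", "258766259", ((SUSPECTED_EXTERNAL_ERROR, "Q96473051"),)),
    Statement("Q18", "P214", "170506329", ((SUSPECTED_EXTERNAL_ERROR, "Q96473051"),)),
    # Invented data: an untagged violation and an item with a single value.
    Statement("Q_HYPOTHETICAL_A", "P214", "made-up-1"),
    Statement("Q_HYPOTHETICAL_A", "P214", "made-up-2"),
    Statement("Q_HYPOTHETICAL_B", "P214", "made-up-3"),
]

worklist = open_single_value_violations(sample, "P214")
print(worklist)
```

With this sample only the untagged item with two values stays on the worklist. South America (Q18) drops off because both of its values are marked, which is the "action that clears the item" described above.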

Discussion

  • Oppose. This info is not relevant to display inside entities; there are constraint violation pages and other ways to query for errors. Germartin1 (talk) 16:50, 20 June 2020 (UTC)
  • Constraint violation pages don't classify errors and as such mix errors within Wikidata with errors made in external databases. Declaring a statement as an external error is a value judgment that you don't get via a query if you have no way to input the data. ChristianKl❫ 17:30, 20 June 2020 (UTC)
  • Comment: I think I can see the utility of this; however, can we always be certain that something is an error of this sort in an external db? What if there really are two distinct entities that look like the same thing but are really different somehow, so it's not actually an error? How do we tell? ArthurPSmith (talk) 13:24, 22 June 2020 (UTC)
  • @ArthurPSmith: If we have a list of things we believe to be errors in an external database, that allows us to talk about them with the people who maintain that database. To the extent that certain statements turn out not to be errors, we can remove the claims and see what needs to be done on our side.
Talking might either mean contacting them or having them take part in Wikidata directly. ChristianKl❫ 18:47, 22 June 2020 (UTC)
Hmm, maybe a less accusatory property name is needed then? "suspect error" maybe? ArthurPSmith (talk) 20:33, 22 June 2020 (UTC)
@ArthurPSmith: I changed it to "suspected external error". ChristianKl❫ 18:45, 23 June 2020 (UTC)
  • Comment: A general question to ask oneself: is it an external error or an exception to a constraint defined at Wikidata?
    VIAF as a sample has another issue: is it really productive to look at its identifiers and try to handle them like standard external identifiers? VIAF clusters are reorganized regularly, also based on Wikidata.
    Thirdly, VIAF clusters beyond items for people don't necessarily work that well. Some might say the same about Wikidata though. --- Jura 10:01, 23 June 2020 (UTC)
@Jura1: This property is designed for those things that are external errors and not just exceptions to a constraint in Wikidata. The fact that VIAF reorganizes regularly raises hope that they are willing to take our listing of their errors into account when they reorganize, especially if they can easily get a list of what we consider to be errors in VIAF that excludes the cases where we might have a constraint violation that's not an error on their side. ChristianKl❫ 12:03, 23 June 2020 (UTC)
  • They do that already, at least for people. There is no need for any additional action than adding the additional identifiers to the item. Besides, I don't think the single value constraint on P214 should be read literally. --- Jura 12:11, 23 June 2020 (UTC)
  • While I'm aware of an example where they did it for people I'm not aware of a source that shows that it happens at scale. If there's such a source I would be happy to see it. It would be quite useful in the discussions with dewiki about reusing our IDs.
Currently, there seem to be plenty of unfixed errors. When I looked at both the single value constraints and distinct value constraints of VIAF ID (P214), I found plenty of errors on our part and on VIAF's part. Currently it seems to be very cumbersome to work through the list of constraint violations because some errors can be fixed while others can't. As a result we don't get data cleanup. Unfortunately, we have a few people who add additional data based on existing VIAF identifiers, and as a result the errors accumulate. ChristianKl❫ 18:45, 23 June 2020 (UTC)
  • You could follow Krbot's updates for merged VIAF clusters.
    I doubt it's worth looking into VIAF clusters beyond Q5-items. --- Jura 13:50, 25 June 2020 (UTC)
  • Support: I think this is worth having. ArthurPSmith (talk) 19:18, 24 June 2020 (UTC)
  • Comment: This seems useful, and I agree strongly with the overall idea. But how is this different from the currently practiced approach, for example at Peter Müller (Q62333), of using reason for deprecation (P2241) together with conflation (Q14946528) or duplicate entry (Q1263068)? It seems to me this would achieve the same goals; I assume your approach does not rely on deprecation and allows better modelling? --Hannes Röst (talk) 21:44, 24 June 2020 (UTC)
    • @Hannes Röst: I think the key difference is that this property can be used where we don't deprecate values, for example where there are two identifiers for the same person.
      • Comment: It would be good to see some samples. I don't think the one about Q18 is that useful. --- Jura 18:41, 26 June 2020 (UTC)
      • Comment: I agree it would be helpful to see some examples of (i) two external identifiers for the same item and (ii) a single conflated identifier for two or more items. --Hannes Röst (talk) 19:45, 26 June 2020 (UTC)