Wikidata:WikiProject Data Quality/Issues

What data quality issues did you spot recently? And, did they occur repeatedly?

Should property "of" (P642) be deprecated?

There has been discussion (most strongly in "P642 considered harmful", an essay by Lucas Werkmeister, and also on the Wikidata Telegram channel) arguing that "of" (P642) is convenient for editors but is not structured data, and therefore makes finding information in Wikidata more difficult.

I believe this is a data quality issue and it should be addressed systematically within this project by identifying the cases where P642 is used today and what other properties might be used instead, and then creating lists or queries where editors or projects could work to replace these uses. Lucas identifies a number of scenarios in his essay - are there others? Can we identify best practice alternatives? Do we need new properties for some of these uses? - PKM (talk) 00:42, 13 November 2021 (UTC)
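A minimal sketch of the kind of inventory query proposed above, assuming the standard Wikidata Query Service prefixes (`pq:` for qualifier values, `wikibase:claim` to map a statement predicate back to its property). The helper name `build_qualifier_usage_query` and its parameters are hypothetical; it only builds the SPARQL text, which can then be pasted into the query service. A count over every P642 use may well time out, so treat this as a starting point rather than a finished report.

```python
def build_qualifier_usage_query(qualifier_pid: str = "P642", limit: int = 50) -> str:
    """Return SPARQL text counting, per main property, the statements that carry the qualifier.

    Hypothetical helper for illustration; it does not contact any service.
    """
    # Accept only identifiers of the form "P" followed by digits.
    if not (qualifier_pid.startswith("P") and qualifier_pid[1:].isdigit()):
        raise ValueError(f"not a property ID: {qualifier_pid!r}")
    if limit <= 0:
        raise ValueError("limit must be positive")

    return f"""SELECT ?property (COUNT(?statement) AS ?uses) WHERE {{
  ?statement pq:{qualifier_pid} ?ofValue .   # statement has the qualifier
  ?item ?claim ?statement .                  # item linked to that statement
  ?property wikibase:claim ?claim .          # map predicate back to its property
}}
GROUP BY ?property
ORDER BY DESC(?uses)
LIMIT {limit}"""


if __name__ == "__main__":
    print(build_qualifier_usage_query())
```

Sorting by use count would put the most heavily affected properties first, which could help decide where replacement work pays off soonest.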

Case: Setting scope of "subclass of"

One use of P642 Lucas identified that I find most annoying is in defining the scope of a subclass statement:
This structure is widely used (probably because people like me see it, assume it is standard practice, and copy it). On the face of it this produces an illogical statement: "camera model" is a subclass of "model of camera". This use can be improved by adding the proper scope statements (<has parts of the class>, <country>, <jurisdiction>) and removing the <of> qualifier. Perhaps this would be a good place to start? PKM (talk) 20:55, 13 November 2021 (UTC)
@PKM We now have "is metaclass for" (P8225) as a replacement for the "of" qualifier, to be used as a main statement. author TomT0m / talk page 17:31, 9 February 2022 (UTC)
Thanks, I've added that use case to the list. Swpb (talk) 20:37, 9 February 2022 (UTC)
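For this case, a worklist of affected statements could be drawn up with a query along these lines. This is only a sketch: it assumes the usual Wikidata Query Service prefixes (`p:`, `ps:`, `pq:`), the helper name `build_subclass_of_query` is hypothetical, and the `LIMIT` is there because an unrestricted run may time out. It lists candidates only; deciding which replacement fits each one is still an editorial call.

```python
def build_subclass_of_query(limit: int = 100) -> str:
    """Return SPARQL text listing "subclass of" (P279) statements qualified with "of" (P642).

    Hypothetical helper for illustration; it does not contact any service.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    return f"""SELECT ?item ?broaderClass ?ofValue WHERE {{
  ?item p:P279 ?statement .            # item has a "subclass of" statement
  ?statement ps:P279 ?broaderClass ;   # the class it is declared a subclass of
             pq:P642 ?ofValue .        # the "of" qualifier to be replaced
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(build_subclass_of_query(20))
```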

Approach thoughts

I helped sort out the uses of the deprecated "as" property a few years ago, and I can say P642 is going to be a beast, but it must be done. Lucas' groupings, helpful as they are, are broad classes. If we can get down to specific use cases where each use can be handled exactly the same way, we can automate (or at least standardize) a lot of the migration. The downside is there are probably hundreds of such cases, including probably quite a few single-instance cases. We will need several new properties. The result will be an incremental removal of P642 over time. My question is: should we patch the hull before we start bailing? Meaning: should we deprecate P642 immediately, and then sort the uses out? That would prevent new uses, but it could drive editors to use inconsistent alternatives. It might be better, albeit more effort up front, to set up complex constraints to flag uses for which we've identified alternatives. We should also use Wikidata property example (P1855) to direct users to our table of use cases and their handling, to encourage consistency. Also, note that right now, nine properties have P642 as a required qualifier; those might be a good place to start. Swpb (talk) 15:58, 26 January 2022 (UTC)

@PKM, Lucas Werkmeister: I've started a table of use cases based on some of Lucas' examples here: Wikidata:WikiProject Data Quality/Issues/P642. Obviously there are many more. Please feel free to add to/modify it, optimize the queries that time out (I'm not great with SPARQL) and move it to the project namespace if you know of an appropriate location. Swpb (talk) 21:20, 27 January 2022 (UTC)

@Swpb, Lucas Werkmeister: Thanks for taking this on. I'm overwhelmed IRL at the moment but I'll be back in a week or two (nothing to worry about, just one wrenched shoulder and one sprained ankle in a two-person household). - PKM (talk) 01:15, 30 January 2022 (UTC)
Sorry to hear that! Take care. Swpb (talk) 14:10, 31 January 2022 (UTC)