Wikidata:Property proposal/broader concept


broader concept

Originally proposed at Wikidata:Property proposal/Generic

Description: Broader concept that the present item is part of, as mapped by an external source
Data type: Item
Domain: (UPDATED) this would be a qualifier on statements
Example: folk tale (Q1221280) → oral literature (Q986539)
-- object named as (P1932) "folk literature"
-- stated in (P248) Library of Congress Genre/Form Terms (Q47537953)
-- reference URL (P854) <http://id.loc.gov/authorities/genreForms/gf2014026344.html>


pyjamas (Q193204) Art & Architecture Thesaurus ID (P1014) 300215942;
-- broader concept nightwear (Q1187616)

folk tale (Q1221280) Library of Congress authority ID (P244) gf2014026344
-- subject named as (P1810) "folk tales"
-- broader concept oral literature (Q986539)
Source: external thesauruses
See also: narrower external class (P3950)

Motivation

External thesauruses often contain a hierarchical structure, with links from subjects to broader and narrower terms. These links may or may not correspond to Wikidata triples. For example, "broader" may often correspond to subclass of (P279); but sometimes it might correspond to facet of (P1269), part of (P361) or instance of (P31); sometimes there may be no direct link at all, because we might have an intervening class in between the two.

To assist import, sourcing, and verification, and as relevant information to have in its own right, it would be useful to be able to represent the connection here as presented by the external source. This is what this property is intended to do. Note that it is absolutely not intended to be used as a substitute for P279, P1269, P361, or P31, only to complement them. Therefore, its use should always be referenced to an external source.

One might also consider an inverse property "narrower term". But because that might sometimes have a very large number of values, which would clutter up items, I have not proposed it. The existence of narrower classes that do not (yet) have items in Wikidata can be indicated by narrower external class (P3950). Jheald (talk) 14:22, 2 February 2018 (UTC)

REVISED PROPOSAL (25 February): I have revised the proposal to suggest that this property is used as a qualifier on an external ID. This is necessary on a technical level, for a SPARQL path query to be able to extract the items corresponding to part of a hierarchy of a particular external thesaurus. But it may also have other advantages. For discussion and detailed information on this revised proposal, including a full experimental worked example, see the new section below. Jheald (talk) 22:22, 25 February 2018 (UTC)

Discussion

  • For the discussion of the original proposal, see /archive

Revised proposal; and experiment

@Pigsonthewing, ArthurPSmith, Vladimir Alexiev, ChristianKl: @TomT0m, Peter F. Patel-Schneider, Yair rand, PKM: @John Cummings, Jneubert, MisterSynergy, Thryduulf:

I am revising the original proposal to suggest that the new property should now be a qualifier on external IDs and similar statements, rather than a statement in its own right. It would therefore generally appear "below the fold", out of the main section of statements about the item, instead usually appearing as part of the "External IDs" section, qualifying an external identifier.

The overwhelming reason for making this change is technical: if the property were a main-statement property, it would not be possible to write a path query based on wdt:broader_concept* to extract items from the particular hierarchy of a particular single thesaurus. It is simply not possible to control a SPARQL path query of this form using qualifiers. (See Wikidata:Request_a_query#Virtual_graph_? for further discussion on this point).

But if the property is used as a qualifier, then it is possible to extract the tree of a single thesaurus, using a path query for

( p:Pnnn / pq:broader_concept )* .

To demonstrate this, I have run an experiment based on the 'costume' section of the Getty Art & Architecture Thesaurus ID (P1014) thesaurus. This hierarchy can be found at Wikidata:WikiProject_Fashion/Taxonomy/aat, with items currently matched here linked in blue.

For the purposes of the experiment, rather than using the proposed new qualifier "broader concept" I have used the existing property part of (P361). However, as discussed below, I do believe it would be better to instead use a new property for this purpose.

With the information added in this way, it becomes easy to extract all the items matched to a particular section of a particular external thesaurus. The following query extracts all the items in the 'costume' section of the Getty Art & Architecture Thesaurus ID (P1014) thesaurus:

SELECT ?item ?itemLabel WHERE { 
   wd:Q9053464 (^pq:P361/^p:P1014)* ?item  .  # inverted form is much faster
   SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

To achieve the above I have taken the hierarchical information and used it to annotate the Art & Architecture Thesaurus ID (P1014) statements, using the qualifier part of (P361). So, for example, for the experiment a typical statement on an item looks like Q193204#P1014.

I have also annotated items where we do not have an item matched for the immediate parent entry in the thesaurus, but only one for a higher-up entry. An example of such a statement is Q763457#P1014.

Here I have again used qualifier part of (P361) to indicate an item higher in the thesaurus tree, but also qualified with sourcing circumstances (P1480) = hierarchical link is not direct (Q50095342), to indicate that the hierarchical relationship in the thesaurus is not a direct one.
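
A minimal sketch of how such non-direct links could be listed, assuming the experiment's modelling exactly as described above (part of (P361) as a qualifier on the P1014 statement, with sourcing circumstances (P1480) set to Q50095342). The variable names are illustrative only:

```sparql
# Items whose AAT statement points to a higher-up thesaurus entry
# rather than to the immediate parent (assumes the experiment's modelling).
SELECT ?item ?itemLabel ?ancestor ?ancestorLabel WHERE {
  ?item p:P1014 ?statement .               # the external-ID statement node
  ?statement pq:P361 ?ancestor ;           # hierarchy qualifier used in the experiment
             pq:P1480 wd:Q50095342 .       # flagged as "hierarchical link is not direct"
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Because both qualifiers are read from the same statement node, the flag is tied to the specific thesaurus link it annotates, not to the item as a whole.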


Now that one can identify items that correspond to a part of the tree of an external thesaurus, it becomes possible to also investigate their state.

For example, the following query identifies items in the part of the AAT thesaurus below costume (Q9053464) where the upward relationship in the thesaurus cannot as yet be 'explained' by our existing subclass of (P279) relations:

query link.
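
A minimal sketch of what a check of this kind might look like, again assuming the experiment's use of part of (P361) as the hierarchy qualifier on P1014 statements. This is an illustration under those assumptions, and not necessarily the query behind the link:

```sparql
# Items under 'costume' in the AAT tree whose thesaurus parent is not
# reachable through a chain of subclass of (P279) statements.
SELECT ?item ?itemLabel ?parent ?parentLabel WHERE {
  wd:Q9053464 (^pq:P361/^p:P1014)* ?item . # walk down the thesaurus tree from costume
  ?item p:P1014 ?statement .
  ?statement pq:P361 ?parent .             # parent as given by the thesaurus
  FILTER NOT EXISTS { ?item wdt:P279+ ?parent . }  # no P279 chain explains the link
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

Using `wdt:P279+` rather than a single `wdt:P279` step means that an item is not reported merely because an intermediate class sits between it and its thesaurus parent.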

There may be a variety of reasons for rows appearing in the above query, eg:

  • the wrong item has been matched to the AAT id;
  • items have been matched to the external hierarchy, but no subclass of (P279) has been created here at all;
  • matched items missing what might be an appropriate extra subclass of (P279) connection;
  • editors here having made a different choice of how to classify the class, compared to the thesaurus.

All of these, I submit, are interesting, and useful to be able to reveal with a query.

They are not accessible by querying on our existing structures (pace User:Pigsonthewing), because they directly use the external structure to identify current variants or deficiencies in our own structures. I also do not believe the same can be achieved by a federated query (@ User:Jneubert), because I don't believe such a query could jump over unmatched entries in the way required. I think the query is useful, because it lets us compare the alignment of our items with the way entries in external thesauruses are aligned, which I believe can be very valuable, and which at the moment we simply don't do.

Making the new property a qualifier will, I hope, also go some way to making @ User:Peter_F._Patel-Schneider more comfortable about it, now that it will not appear in the block of main statements, but will be annotating an external property, in the external properties section, 'below the fold'.

As the above experiment shows, a new property is not absolutely needed. It would be possible to proceed, as the experiment above does, using part of (P361).

But I believe a new property would be useful, because IMO a more appropriate use of P361 as an annotative qualifier would be to parallel skos:scheme. Where we have a property like British Museum thesaurus ID (P3632) or LoC and MARC vocabularies ID (P4801), in which a single property covers a number of different thesauruses offered by a particular site (cf eg the thesaurus vocabularies broken out from this page for the British Museum), one could use P361 to identify which thesaurus the particular match is coming from, in line with how such sites may often use skos:scheme for this purpose internally.

I therefore commend to the community this proposal to represent the attribute of "broader concept" with a new property.

Since, in qualifier form, it is now also substantially different from the original proposal, I suggest archiving the discussion so far, and considering this a new discussion for decision, on the proposal for this property as a qualifier.

@ArthurPSmith, Thryduulf, Jneubert, Jheald, PKM, TomT0m: ✓ Done: broader concept (P4900). − Pintoch (talk) 16:49, 1 March 2018 (UTC)