Wikidata:Property proposal/KBpedia ID

KBpedia ID

Originally proposed at Wikidata:Property proposal/Authority control

   Done: KBpedia ID (P8408) (Talk and documentation)
Description: Identifier for the KBpedia knowledge graph, which provides consistent mappings across seven large-scale public knowledge bases including Wikidata, and is used to promote data interoperability and extraction of training sets for machine learning.
Aliases: KBpedia, KKO, KBpedia knowledge graph, KBpedia ontology, kbpedia, Kbpedia
Represents: KBpedia (Q64139102)
Data type: External identifier
Domain: items
Allowed values: ([A-Z\d])([(A-Za-z\d-_)]+)+
Syntax clarification: PascalCase format with digits allowed for first character and hyphen or underscore in remaining positions except trailing
Example 1: Afghan cuisine (Q383096) → AfghanCuisine
Example 2: bat (Q28425) → Bat-Mammal
Example 3: C-C motif chemokine ligand 16 (Q21113921) → CCL16
Example 4: Dixie Highway (Q818896) → DixieHighway
Example 5: ecosystem (Q37813) → Ecosystem
Example 6: fire station (Q1195942) → FireStation
Source: https://kbpedia.org, https://github.com/Cognonto/kbpedia
Planned use: Knowledge graph-based subset creation and item (Qid) retrieval from Wikidata; much more going forward
Number of IDs in source: 44,368
Expected completeness: always incomplete (Q21873886)
Formatter URL: https://kbpedia.org/kko/rc/$1
Robot and gadget jobs: Likely to use OpenRefine for import; maybe later validity checks
See also: DBpedia, Wikidata, Wikipedia, GeoNames, schema.org, UNSPSC, Cyc
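
The Allowed values, Syntax clarification, and Formatter URL entries above can be exercised with a minimal Python sketch. The pattern used here is a simplified reading of the syntax clarification, not the proposed constraint verbatim, and the names ID_PATTERN, is_valid_kbpedia_id, and formatter_url are hypothetical helpers introduced only for illustration.

```python
import re

# Assumption: a simplified reading of the "Syntax clarification" entry, not the
# proposal's regex verbatim. First character: uppercase letter or digit;
# middle characters: letters, digits, hyphen, or underscore; last character:
# letter or digit. Like the proposed regex, it needs at least two characters.
ID_PATTERN = re.compile(r"[A-Z0-9][A-Za-z0-9_-]*[A-Za-z0-9]")

# Formatter URL from the proposal; "$1" is the placeholder for the ID.
FORMATTER_URL = "https://kbpedia.org/kko/rc/$1"

# Example IDs listed in the proposal.
EXAMPLE_IDS = [
    "AfghanCuisine",
    "Bat-Mammal",
    "CCL16",
    "DixieHighway",
    "Ecosystem",
    "FireStation",
]


def is_valid_kbpedia_id(candidate: str) -> bool:
    """Return True if the whole string matches the simplified ID pattern."""
    return ID_PATTERN.fullmatch(candidate) is not None


def formatter_url(kbpedia_id: str) -> str:
    """Substitute a checked ID into the formatter URL."""
    if not is_valid_kbpedia_id(kbpedia_id):
        raise ValueError(f"not a plausible KBpedia ID: {kbpedia_id!r}")
    return FORMATTER_URL.replace("$1", kbpedia_id)


if __name__ == "__main__":
    for kbpedia_id in EXAMPLE_IDS:
        print(kbpedia_id, formatter_url(kbpedia_id))
```

All six example IDs pass this check, while a value such as `Bat-` (trailing hyphen) or `bat` (lowercase first character) is rejected.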

Motivation

  • KBpedia is a logically consistent knowledge graph -- written in RDF, SKOS, and OWL -- with 58 K concepts that integrate 200 K key items from Wikidata, Wikipedia, DBpedia, GeoNames, schema.org, Cyc, and UNSPSC products and services. The KBpedia ontology is a computable framework for reasoning and for making fine-grained entity subset selections from these sources to aid data interoperability and machine learning (AI). The 44 K concepts mapped to Wikidata are one means for Wikidata users to combine, select, and access entities across Wikidata. Mkbergman (talk) 15:36, 17 June 2020 (UTC)
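
One way to picture the entity selection described above is a query that starts from a Wikidata class carrying a KBpedia ID and collects that class's instances. The Python sketch below only builds a SPARQL query string and sends nothing. It assumes the property number P8408 recorded for this proposal, an endpoint such as the Wikidata Query Service where the wdt: prefix is predefined, and a hypothetical helper name build_instances_query.

```python
import re

# Assumption: only letters, digits, hyphen, and underscore are accepted, so the
# ID can be placed inside a SPARQL string literal without escaping.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def build_instances_query(kbpedia_id: str, limit: int = 100) -> str:
    """Build a SPARQL query for instances of the class mapped to a KBpedia ID.

    P8408 is the KBpedia ID property; P31 is "instance of" and P279 is
    "subclass of" in Wikidata.
    """
    if not _SAFE_ID.fullmatch(kbpedia_id):
        raise ValueError(f"unexpected characters in ID: {kbpedia_id!r}")
    return (
        "SELECT ?item WHERE {\n"
        f'  ?class wdt:P8408 "{kbpedia_id}" .\n'
        "  ?item wdt:P31/wdt:P279* ?class .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_instances_query("FireStation", limit=10))
```

Rejecting unexpected characters before interpolation keeps a malformed ID from altering the query text.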

Discussion

  • @Mkbergman: Can you add information about what KBpedia is to the description of this property? ChristianKl 07:46, 21 June 2020 (UTC)
    • Is this the kind of expanded description you were seeking? If not, I can revise. Mkbergman (talk) 14:10, 22 June 2020 (UTC)
  • Support --Jneubert (talk) 19:22, 22 June 2020 (UTC)
  • Support. Comment: It seems like this is a consumer of Wikidata (they claim "KBpedia has 98% coverage of Wikidata") and this would lead us to add such an identifier to every single item in Wikidata, which I am not sure is such a good idea, or at least warrants a larger discussion before we start doing this. I generally see the point in adding crosslinks to orthogonal / upstream databases but I am not sure about automatically generated downstream databases. Or maybe I misunderstood what you plan to do? --Hannes Röst (talk) 19:28, 24 June 2020 (UTC)
    • @Hannes Röst: You are correct that KBpedia is a 'consumer' of Wikidata information, and Wikidata is likely the most important contributor among KBpedia's seven major knowledge bases. But, no, it is not likely that the number of mappings to Wikidata will increase much. KBpedia principally links to types and classes, and is itself unlikely to grow much beyond its current 58 K reference concepts (44 K of which now map to Wikidata). KBpedia acts more like a table of contents than a comprehensive compilation of instances. Wikidata and other constituent knowledge bases are the proper location for that specific content. KBpedia maps to Wikidata instances (Q items) via their parent classes or types, and generally not directly unless the instance is quite prominent, like Rome or John F. Kennedy. Let me know if I can offer additional commentary. Mkbergman (talk) 22:50, 24 June 2020 (UTC)
      • I see, so the idea is that we would import all the 1:1 mappings and then eventually complete the other 12k concepts so that we have a complete mapping from WD to the other seven knowledge bases? If that is the plan, then I am in favor (changed my position). --Hannes Röst (talk) 18:20, 1 July 2020 (UTC)
        • @Hannes Röst: Exactly. I'm not sure if all currently missing 12 K concepts would find a match, since some are needed to maintain the integrity of the knowledge graph, but your understanding is correct. For example, one of the seven knowledge bases, UNSPSC (Q1361569), currently maps (UNSPSC Code (P2167)) to about 1000 WD Q entities. That would immediately increase to about 6500 with the KBpedia linkage. Mkbergman (talk) 19:11, 1 July 2020 (UTC)
          • I guess that is a discussion for later, but it may make sense to represent them here as well even if they are internal to KBpedia. It seems worthwhile to have the additional 6500 items with UNSPSC Code (P2167), but how did KBpedia do the mapping, and what is the quality of the mapping? Was it done by hand with high quality, or by some automated process with a high error rate? --Hannes Röst (talk) 15:03, 2 July 2020 (UTC)
            • @Hannes Röst: All mappings have been manually vetted, though to differing degrees of scrutiny. Checks for types and class relationships are the most stringent, followed by mappings, and then annotations, with alternative labels the least scrutinized. During builds, disjointness checks and logical inconsistency and satisfiability checks are applied. Builds cannot be accepted with such errors. All new versions go through multiple builds until they build without error. The overall process, then, is semi-automatic, with manual inspection the final step. That does not guarantee there are no errors, which we correct as identified in subsequent releases. We think we have an F1 score (Q6975395) as high as or higher than other 'gold standard' knowledge bases, but that remains to be independently checked. Mkbergman (talk) 18:11, 2 July 2020 (UTC)
@Mkbergman, ChristianKl, Jneubert, Hannes Röst: ✓ Done KBpedia ID (P8408) Pamputt (talk) 08:20, 5 July 2020 (UTC)