

Copied here from Project Chat.

Schema.org provides a vocabulary for structured data markup on the Web. Markup using this vocabulary can be found on more than 30% of pages in a sample of 10 billion. Schema.org is used by a wide range of organizations, from the New York Times to the WWF, from global organizations like Greenpeace or UNESCO to local establishments such as cinemas and pubs. In all, more than 10 million organizations worldwide use it to publish data on the Web. You can read more about Schema.org in CACM or on Wikipedia.

A major cost factor for applications using this data is aggregating data about a given entity from different sources. While the vocabulary is standardized (Schema.org defines properties and types), the identifiers for individual items are not. This was done by design, to make publishing easier (see the aforementioned CACM paper for details).

To reduce the cost of aggregating fragments of markup from different sources, in particular for smaller organizations and individual developers consuming that markup, Schema.org is considering encouraging the use of Wikidata as a common entity base for the target of the schema:sameAs relation (not to be confused with owl:sameAs).
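To make the aggregation step concrete, here is a minimal sketch in Python of how a consumer might group fragments from different sites once they share a Wikidata URI as their sameAs target. The fragments are made up for illustration, and the helper names (`wikidata_ids`, `group_by_entity`) are hypothetical, not part of any Schema.org or Wikidata tooling.

```python
from collections import defaultdict

# Canonical prefix of Wikidata entity URIs.
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def wikidata_ids(fragment):
    """Return the Wikidata Q-IDs found in a fragment's sameAs value."""
    same_as = fragment.get("sameAs", [])
    if isinstance(same_as, str):  # sameAs may be a single URL or a list of URLs
        same_as = [same_as]
    return [
        url[len(WIKIDATA_PREFIX):]
        for url in same_as
        if url.startswith(WIKIDATA_PREFIX)
    ]


def group_by_entity(fragments):
    """Group markup fragments from different sites by shared Wikidata Q-ID."""
    groups = defaultdict(list)
    for fragment in fragments:
        for qid in wikidata_ids(fragment):
            groups[qid].append(fragment)
    return dict(groups)


# Made-up fragments, as two different sites might publish them.
fragments = [
    {
        "@type": "Person",
        "name": "Douglas Adams",
        "sameAs": "http://www.wikidata.org/entity/Q42",
        "birthDate": "1952-03-11",
    },
    {
        "@type": "Person",
        "name": "Douglas Noel Adams",
        "sameAs": ["http://www.wikidata.org/entity/Q42"],
        "jobTitle": "Writer",
    },
    {"@type": "Person", "name": "Someone without a sameAs link"},
]

merged = group_by_entity(fragments)
```

Without the shared identifier, the first two fragments would have to be matched on name strings, which is exactly the costly and error-prone step the proposal aims to avoid.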

There is also a class of entities that may be standardized: those intermediate in generality between very general terms such as Person and birthDate and very specific concepts such as individual persons or movies. This includes lists such as the lists of languages and countries. The idea is to use SPARQL queries to produce and publish easy-to-use URIs for those items. These would be published by Schema.org, with a mapping to Wikidata, as part of the normal release process. They are needed because they will be easier to use and reuse than the Q-ID based Wikidata URIs.
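As a rough illustration of the SPARQL-based approach, the Python sketch below holds a query for countries and their ISO 3166-1 alpha-2 codes, then turns result rows into a mapping from readable URIs to Wikidata URIs. The base URL `https://example.org/country/` is a placeholder assumption, not the URI scheme under discussion, and the sample rows stand in for a real query response so that the snippet runs offline.

```python
# Query for the Wikidata Query Service: countries with their ISO codes.
COUNTRY_QUERY = """
SELECT ?item ?code WHERE {
  ?item wdt:P31 wd:Q6256 ;   # instance of: country
        wdt:P297 ?code .     # ISO 3166-1 alpha-2 code
}
"""

# Hypothetical namespace, used only for illustration.
HYPOTHETICAL_BASE = "https://example.org/country/"


def readable_uri_mapping(bindings):
    """Map a readable URI per country code to the matching Wikidata URI.

    `bindings` follows the SPARQL JSON results layout: each row maps a
    variable name to a dict with "type" and "value" keys.
    """
    mapping = {}
    for row in bindings:
        code = row["code"]["value"]
        mapping[HYPOTHETICAL_BASE + code] = row["item"]["value"]
    return mapping


# Sample rows in the shape a SPARQL endpoint returns (no network needed).
sample_bindings = [
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q183"},
        "code": {"type": "literal", "value": "DE"},
    },
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q142"},
        "code": {"type": "literal", "value": "FR"},
    },
]

mapping = readable_uri_mapping(sample_bindings)
```

A publisher could then write the short, readable URI in markup while consumers resolve it to the Q-ID through the published mapping.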

This will allow anyone to grab data from different sites and integrate it with much less effort than is currently needed. To name just one example: IMDB publishes data about movies using Schema.org, and Wikidata uses these pages as references. If IMDB used Wikidata identifiers, scripts like the one developed by Adam Shoreland would be able to compare the existing data in Wikidata with such external data sources much more easily, and on many more sites. Schema.org would like to discuss this step with the Wikidata community before implementing it, in order to identify potential issues early and prepare for them. So I am here to open this discussion. --Denny (talk) (wearing his Google hat) 17:57, 4 May 2017 (UTC)


Issues, comments and feedback

Please see Wikidata