Wikidata:Flemish art collections, Wikidata and Linked Open Data/Whitepaper
The whitepaper below was written in October 2015, for the project Linked Open Data publication with Wikidata. In this project, several Flemish museums contribute collection data to Wikidata. This whitepaper may be interesting for other parties as well – especially other cultural institutions who consider contributing data to Wikidata.
Feedback is welcome. Questions and remarks can be placed on the talk page. This paper was written by Sandra Fauconnier (User:Spinster), Bert Lemmens & Barbara Dierickx (PACKED vzw). It is published under a Creative Commons Attribution Share-Alike license.
This whitepaper is the first project deliverable of the project Linked Open Data publication with Wikidata (D1. Whitepaper open data management in Wikidata) and was described in the project plan as follows:
PACKED vzw and Wikimedia develop a shared vision on how data managers in museums may publish collection data on Wikidata, and how they can update this collection data at regular intervals. This includes:
- how collection data is modeled in Wikidata;
- how the upload and export to Wikidata works.
This vision is recorded in a whitepaper that is proposed to the steering committee of this project. The aim of the whitepaper is to align the objectives of museums with those of Wikimedia and Wikidata.
This paper was originally written in Dutch and then translated to English in order to distribute it more broadly in the Wikimedia community.
Based on this whitepaper, further actions within the project are planned out. This will happen in accordance with the Wikimedia volunteers who carry out these actions.
This paper consists of three chapters.
The conclusion of this whitepaper contains a SWOT analysis of using Wikidata to make your collection data available on the web.
- 1 Wikimedia, Wikipedia and Wikidata
- 2 Costs and benefits of contributing data to Wikidata
- 3 Crosswalk, data delivery and upload method
- 4 Conclusion
- 5 Annex: Usage guidelines
- 6 Notes and references (if not included as hyperlinks)
Wikimedia, Wikipedia and Wikidata
This chapter introduces Wikidata in relation to its 'parent' project Wikimedia and its well-known sister project Wikipedia. This information may be quite familiar to experienced Wikimedia editors.
Wikimedia is a world-wide movement with a mission to make educational content freely available to the world. Its most well-known project is Wikipedia, the free encyclopedia. More than a dozen less well-known projects (such as Wikimedia Commons, Wikidata and the software MediaWiki) belong to the same family.
All Wikimedia projects are edited by a community of users (mostly volunteers) and run on MediaWiki software. All contributions to the projects fall under a Creative Commons license, so that the contents can be re-used, edited, copied and freely distributed - also for commercial purposes.
The various projects in the Wikimedia movement support each other, and exchange content where possible. Wikimedia Commons is the free media bank where images, sound files and videos of the other Wikimedia projects are hosted. The free database Wikidata serves as a central 'data hub' between the various Wikimedia projects.
Wikipedia is the best-known project in the Wikimedia movement. The free encyclopedia was founded in 2001 and exists in 290 different languages (as of October 2015). It is a reference work; this means, for instance, that it summarizes information from other (i.e. secondary) sources; it is not a place for original research.
In September–October 2015, the English Wikipedia had almost 5 million articles, edited by nearly 30,000 active users.
Wikidata was founded in October 2012. It's a free knowledge base that intends to cover the whole world; Wikidata is designed to be readable by humans and by machines. It provides data in all languages covered by the Wikimedia projects.
The data on Wikidata is even more 'free' than the information on other Wikimedia projects: Wikidata's data is made available under the Creative Commons CC0 license, in order to allow third parties to re-use the data as freely as possible (see below for more info about this license). The data on Wikidata is explicitly intended to be universally useful and re-usable by anyone, for any purpose – from educational to commercial.
Wikidata is financially supported by, among others, Google. In 2015, Google de-activated its own free knowledge base, Freebase. Presumably, the Google Knowledge Graph will therefore be partly based on data in Wikidata. Large search engines are not the only ones that can use Wikidata's data; because of the CC0 license, every developer is allowed to do so.
On Wikipedia itself, the first steps are being taken to re-use data from Wikidata in Wikipedia articles. With the scripting language Lua, Wikipedia editors can retrieve data from Wikidata and use it in so-called templates (such as infoboxes). This development proceeds at a different pace on each Wikipedia; decisions about retrieving data from Wikidata depend on consensus within the local Wikipedia community. Some Wikipedias are more open to experimentation in this area than others.
- The list of paintings by Jacob van Ruisdael on English Wikipedia is automatically generated from Wikidata.
- Infoboxes about French cheeses (Modèle:Infobox fromage) are generated from Wikidata. Example: Brie (fromage).
- The so-called 'authority control' template in biographies on the English-language Wikipedia pulls its data directly from Wikidata. Find an example at the bottom of the English Wikipedia article of Théo van Rysselberghe.
External developers can retrieve data from Wikidata via its API. Since mid-2015, SPARQL queries have been possible as well.
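As a rough illustration of these two access routes, the sketch below only builds the request URLs and does not contact any server. It assumes the public MediaWiki API endpoint with the `wbgetentities` action and the public query service endpoint; the helper names `entity_api_url` and `sparql_url` are made up for this example. The sample query asks for works whose creator (P170) is James Ensor (Q158840).

```python
from urllib.parse import urlencode

# Assumed public endpoints; check the current Wikidata documentation before relying on them.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def entity_api_url(qid: str) -> str:
    """Build a URL asking the API (action=wbgetentities) for one item as JSON."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def sparql_url(query: str) -> str:
    """Build a URL that submits a SPARQL query and asks for JSON results."""
    params = {"query": query, "format": "json"}
    return f"{SPARQL_ENDPOINT}?{urlencode(params)}"


# Works whose creator (P170) is James Ensor (Q158840); limited to ten results.
ENSOR_WORKS_QUERY = "SELECT ?work WHERE { ?work wdt:P170 wd:Q158840 } LIMIT 10"

if __name__ == "__main__":
    print(entity_api_url("Q158840"))
    print(sparql_url(ENSOR_WORKS_QUERY))
```

The resulting URLs can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the actual request makes the example easy to inspect without network access.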
Items in Wikidata
Wikidata consists of a collection of interlinked items. An item refers to an object from the real world (e.g. a building, artwork or person), a concept or an event. Each item has a label (a human-readable name) in at least one language, has its own identifier, and contains metadata. Each item also has its own page on Wikidata and a unique Q number. Examples of items on Wikidata:
- James Ensor – Q158840 – http://www.wikidata.org/entity/Q158840
- peace – Q454 – http://www.wikidata.org/entity/Q454
- the Royal Question (a Belgian political crisis) – Q2666386 – https://www.wikidata.org/entity/Q2666386
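The relation between a Q number and its entity URI, as in the examples above, can be sketched as follows. This is a minimal helper under the assumption that item URIs always follow the pattern shown in the list; the function names are illustrative and not part of any Wikidata library.

```python
import re

ENTITY_PREFIX = "http://www.wikidata.org/entity/"

# Accepts both the http and https spellings that appear in the examples above.
_URI_PATTERN = re.compile(r"^https?://www\.wikidata\.org/entity/(Q[1-9][0-9]*)$")
_QID_PATTERN = re.compile(r"^Q[1-9][0-9]*$")


def qid_from_uri(uri: str) -> str:
    """Extract the Q number from an item URI, or raise ValueError."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a Wikidata item URI: {uri!r}")
    return match.group(1)


def uri_from_qid(qid: str) -> str:
    """Build the entity URI for a Q number, or raise ValueError."""
    if _QID_PATTERN.match(qid) is None:
        raise ValueError(f"not a Q number: {qid!r}")
    return ENTITY_PREFIX + qid


if __name__ == "__main__":
    print(qid_from_uri("http://www.wikidata.org/entity/Q158840"))
    print(uri_from_qid("Q454"))
```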
In October 2015, Wikidata had approximately 15 million unique items. The 'earliest' items result from a mass import of concepts that correspond with existing Wikipedia articles. Since this large-scale import of concepts from Wikipedia, volunteers and bots (scripts) have been adding thousands of new items daily.
What is Wikidata's scope? Which items belong on Wikidata, and which do not? For an explanation of notability on Wikidata, see below (section 'Notable').
Data in Wikidata
Wikidata started with (open) data imported from all Wikipedias in the world. Every topic with a Wikipedia article received its own Wikidata item. Metadata about these topics was retrieved from the infoboxes on Wikipedia, mostly by bots or scripts, and added to the items as properties or statements.
In addition to this information from Wikipedia, external (open) data is constantly added to Wikidata. Examples:
- Artists who don't have a Wikipedia article yet, but who are relevant (they are mentioned in reputable publications/sources) and who fulfil a structural role, for instance as creators of artworks on Wikidata. An example is the Dutch artist Klaas Kloosterboer (Q19938879), a few of whose artworks are already included on Wikidata.
- All Dutch Rijksmonumenten (protected buildings) have their own Wikidata item, even if the building or monument doesn't have a Wikipedia article yet. Example: the Nederlands Hervormde Kerk in Sprang-Capelle (Q17441238).
The Wikidata community is open to larger-scale uploads of open data, initiated by external institutions. More information about this can be found on the project page Wiki Loves Open Data.
This project page lists the expectations of the Wikidata community with regard to donated open data. Ideally, such data has the following characteristics:
- Free, and more specifically licensed under CC0, for easy use, reuse, and manipulation without legal hassles.
- Notable, as in relatable to entities featured or qualifying to be featured in Wikimedia projects – see Wikidata:Notability.
- Referenced, allowing verification of data and publication of multiple values according to different sources.
- Queryable, enabling tools to perform publication and maintenance processes as automatically as possible.
- Editable, like the rest of Wikimedia content, which implies that you must be open to integrate improvements on your data.
- Maintained, as opposed to dumped once and forgotten – Wikidata is here for a long term relationship.
Several of these points are explained further below.
Free
As a museum, you only add metadata to Wikidata for which you have waived all copyright. You do not claim any exclusive right over the use of the data. You do this to lower the barrier to re-using and modifying this data in other applications as much as possible, and to have your data disseminated as widely as possible.
The data that finds its way to Wikidata should be available under a Creative Commons CC0 license, in order for third parties to freely re-use the data. This implies that anyone can use the data for any purpose, from educational to commercial applications.
CC0 allows anyone to re-use what is published under the license without any kind of attribution. The license does not legally demand attribution; in contrast to, for example, a CC-BY license, it is not an intrinsic part of the licensing conditions. For those who re-use the data, CC0 is a clear added value: when you combine data from different sources, or enrich or rework it, giving proper attribution could otherwise become quite complex.
Yet there are also re-users with the best intentions, who would really like to give you attribution for what you made available to them. How do you solve this when you're not legally demanding it through a license? One solution lies in the shift from a legal requirement to a social request. You accept that attribution is not required, but you add a moral element that encourages it. This can be done by creating usage guidelines and adding them to your data.
Europeana was one of the first to create such usage guidelines alongside its CC0 metadata publication policy. The Europeana Usage Guidelines for Metadata contain the following:
- Give credit where credit is due: give attribution to those who made the digital material and its metadata available. These organisations play a crucial role in collecting, managing and harmonising data so that they may become widely available and interoperable.
- Metadata is dynamic: consider using the metadata via the Europeana APIs or by linking. The metadata can be subject to change (renewal, additions, etc.) and is therefore best used through a dynamic call method.
- Mention your modifications of the metadata and make your modified metadata available under the same terms: don't claim to be the source of the data if it already comes from another provider.
- Please note that you use the metadata at your own risk: if you use incomplete information, you do so at your own risk. Europeana collects metadata that was delivered to it by third parties.
Dan Cohen, the Executive Director of the Digital Public Library of America, referred to this kind of attribution request as follows:
I have been calling this implied or ethical attribution. Or, if you like short and snappy symbols, think of it as CC0 (+BY) rather than CC-BY (or ODC-BY).
He also notes that a cynic could argue that people with bad intentions may do bad things with all that open data. But that is an intrinsic characteristic of the web: whatever license you apply to your information, someone with bad intentions will take it anyway. We worry so much about possible misuse that the use which is in line with what we hope to achieve almost goes unnoticed. In the DPLA's experience, many software developers who work with its data give proper attribution of their own accord, guided by the DPLA Data Use Best Practices, even though the CC0 license does not force them to do so.
I think CC0 (+BY) is the best of both worlds: the data in a free-flowing environment that enables creativity and reuse, with attribution still maintained by the vast majority of people who consider themselves part of a social contract. – Dan Cohen
The DPLA and Europeana are not alone in this way of working: others have followed in their tracks. Tate recently opened up metadata on about 70,000 works of art and 3,500 artists. They did this using a CC0 license, but next to the license declaration a user also finds the heading ‘Usage guidelines’. The American institutions MoMA and Cooper-Hewitt followed the same idea. (See the annex of this whitepaper for a summary.)
In the project Linked Open Data publication with Wikidata, such usage guidelines are also an integral part of the Data Usage Agreement signed by the project partners. Although these guidelines are non-binding, they will be published alongside the different published datasets. The minimal usage guidelines that are proposed in this project make clear:
- that the material only contains metadata (no images);
- that attribution of the collection of origin is appreciated;
- that deceptive and irresponsible use is not appreciated;
- that changes and improvements to the material may occur and may be integrated by the project partners;
- that (re)use of the material happens at the user's own risk.
Depending on each institution's intentions, these guidelines may of course be further specified or extended.
Notable
Which information belongs on Wikidata, which doesn't? In its initial phase, Wikidata has the following two goals:
- to centralize interlanguage links across Wikimedia projects
- and to serve as a general knowledge base for the world at large.
An item is acceptable on Wikidata if and only if it fulfills at least one of these two goals, that is if it meets at least one of the criteria below:
- It contains at least one valid sitelink to a page on Wikipedia, Wikivoyage, Wikisource, Wikiquote, Wikinews, Wikibooks, Wikidata or Wikimedia Commons.
- It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references.
- It fulfills some structural need, for example: it is needed to make statements made in other items more useful.
The data provided in the project Linked Open Data publication with Wikidata usually falls under goal #2 and criterion #2. In a few cases, Wikipedia articles already exist about artworks in the contributing collections, which makes these fall under goal #1 and criterion #1. The same principle applies to the artists who created the artworks in the contributing collections.
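The "at least one criterion" rule can be written down as a small check. The sketch below is only an illustration of the logic described above; the class and field names are invented for this example, and in practice each criterion requires human judgement rather than a boolean flag.

```python
from dataclasses import dataclass


@dataclass
class CandidateItem:
    """Hypothetical summary of how a candidate item scores on the three criteria."""
    has_valid_sitelink: bool          # criterion 1: sitelink to a Wikimedia project page
    is_referenced_entity: bool        # criterion 2: identifiable entity with serious public references
    fulfils_structural_need: bool     # criterion 3: needed by statements in other items


def is_acceptable(item: CandidateItem) -> bool:
    """An item is acceptable if it meets at least one of the criteria."""
    return any(
        (item.has_valid_sitelink, item.is_referenced_entity, item.fulfils_structural_need)
    )


if __name__ == "__main__":
    # A museum artwork without a Wikipedia article, described in art historical literature.
    artwork = CandidateItem(False, True, False)
    print(is_acceptable(artwork))
```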
As of late 2015, no significant problems have emerged in terms of notability of unique artworks (paintings, drawings, installations, unique sculptures) from public collections, described in art historical literature and/or whose creator is mentioned in reputable sources.
Notability of items produced in series is under discussion. Individual copies of a mass-distributed publication (like a book) don't belong on Wikidata. As of October 2015, there is no community consensus or 'best practice' yet on describing individual prints of e.g. engravings or lithographs, of which different copies may exist in various art collections.
Individual everyday objects in art collections are usually not relevant or notable enough for Wikidata. An exception can be made if it is a very special object, described individually in independent and reputable sources. A good example is the Saliera (Cellini Salt Cellar (Q697208)) by Benvenuto Cellini, in the collection of the Kunsthistorisches Museum Wien. This object has a Wikipedia article in many languages and is covered in many publications.
Referenced
Datasets in the project Linked Open Data publication with Wikidata have a number of (persistent) URI fields. These URIs refer to sources for a number of statements, such as the creator of an artwork, its date and inventory number.
Queryable
Datasets in the project Linked Open Data publication with Wikidata are made available for upload/import as static (csv) files. In the upload phase, they can be queried and edited/cleaned by Wikidata volunteers. Ideally, such datasets are made available publicly and permanently, like MoMA and Tate have done via GitHub, and/or are queryable through an API, like Europeana. The participating collections of the project Linked Open Data publication with Wikidata plan to build their own data hub which will also make this possible.
Editable
Wikidata, like any Wikimedia project, is filled and maintained by a community of (mainly) volunteers. A data donor maintains and controls its own data in its own databases and platforms/websites. After import to a lively platform like Wikidata, the data will be enriched and edited there by volunteers and bots. Data donors must be aware of this, and must be open to additions and improvements by external parties on Wikidata.
What is the relationship between the work of volunteers on Wikidata and the carefully compiled and controlled content by experts? And how does the Wikidata community find partners that want to engage in long-term management of their information on Wikidata?
Who edits (art and culture on) Wikidata?
In September 2015, Wikidata had 25,917 registered users, of which 6,126 can be considered active. These users edit Wikidata mainly in their free time. According to their areas of interest, Wikidata volunteers organize themselves (among other things) in so-called WikiProjects. In the area of the visual arts, the following WikiProjects are active:
- WikiProject Visual Arts – 14 active volunteers in October 2015 – discussions on 'best practices' for the description of visual art on Wikidata
- WikiProject Sum of All Paintings – 24 active volunteers in October 2015 – strives to create a Wikidata item for every notable painting in the world
Most volunteers of the cultural WikiProjects are passionate, well-read culture and art lovers; some work for cultural institutions. Several edit Wikidata both by hand and with bots they have written.
A typical Wikimedia volunteer keeps track of watchlists of articles to which he/she actively contributes. With these watchlists, a user can keep an eye on recent edits in his/her area of interest and can react promptly if needed. Most Wikimedia projects, including Wikidata, have a specific workflow and dedicated volunteers who focus on countering vandalism. Nonsensical edits are typically reverted within a few minutes.
Museums as authorities
On Wikidata, museums are considered authorities on their own collections. Wikidata strives to have reliable sources for all statements; references to reputable (online) publications by museums are very suitable for this.
The data donation in the project Linked Open Data publication with Wikidata contains a number of such references: persistent links to artwork descriptions on the participating museums' websites. These references are included in the upload to Wikidata. Of course, after the upload volunteers can add other references to these statements as well.
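How a delivered row might carry its reference along can be sketched as follows. This is an illustrative structure only, not the format used by any particular upload tool: the CSV column names and the example URL are hypothetical, and the output is a simplified dictionary rather than Wikidata's actual JSON. The property numbers are the Wikidata properties inventory number (P217), inception (P571) and reference URL (P854).

```python
import csv
import io

# Hypothetical export from a collection database; column names are assumptions.
SAMPLE_CSV = (
    "inventory_number,date,object_uri\n"
    "INV-0001,1903,https://example.org/collection/INV-0001\n"
)


def row_to_statements(row: dict) -> list:
    """Turn one CSV row into simplified statements that all cite the museum's persistent URI."""
    reference = {"P854": row["object_uri"]}  # reference URL
    return [
        {"property": "P217", "value": row["inventory_number"], "references": [reference]},
        {"property": "P571", "value": row["date"], "references": [reference]},
    ]


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    for statement in row_to_statements(rows[0]):
        print(statement)
```

Attaching the reference to each statement, rather than to the item as a whole, mirrors the idea that every individual claim should be verifiable against a source.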
Contradictory and 'wrong' information on Wikidata?
Contradictory statements can find a place on Wikidata. When various (reputable) sources contradict each other (for instance in the attribution of an artwork or the birth date of a person), both statements – with their own sources – can be included. If one statement is considered 'the most up-to-date', it is possible to give it a 'preferred' status. For historical and research purposes, it is very interesting to maintain (and not delete!) an older, 'deprecated' statement that might have been considered 'true' in the past.
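The way ranks let contradictory statements coexist can be illustrated with a small selection function. The sketch assumes three rank values ('preferred', 'normal' and 'deprecated', where 'normal' is the default for statements without a special status) and uses invented attribution data; the dictionary layout is a simplification for this example.

```python
def best_statements(statements: list) -> list:
    """Return preferred statements if any exist, otherwise all non-deprecated ones.

    Deprecated statements are never returned, but they stay in the input list:
    nothing is deleted, so the historical record is preserved.
    """
    usable = [s for s in statements if s["rank"] != "deprecated"]
    preferred = [s for s in usable if s["rank"] == "preferred"]
    return preferred or usable


# Invented example: three sources disagree on the attribution of one artwork.
ATTRIBUTIONS = [
    {"value": "artist A", "rank": "deprecated", "source": "older catalogue"},
    {"value": "artist B", "rank": "preferred", "source": "recent study"},
    {"value": "artist C", "rank": "normal", "source": "museum website"},
]

if __name__ == "__main__":
    for statement in best_statements(ATTRIBUTIONS):
        print(statement["value"], "according to", statement["source"])
```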
It must be emphasized that references to sources are crucial. If a volunteer or an expert considers a specific statement 'true' or 'false', he/she must be able to support this claim with an independent, trustworthy source.
Contributing incomplete and unchecked data?
Perfect is the enemy of good, said Voltaire (or Montesquieu?). Most museum collection websites show only a selection of the whole collection: only those items that have been approved by curators or other museum staff. These items have been thoroughly checked and are considered good enough for publication.
However, collection management databases usually contain a multitude of information that has not been 'cleaned' or checked yet. Is it acceptable (or even preferable) to also publish 'messy', potentially unclean and incomplete data online, and to include this in a data donation? MoMA, for instance, has decided to do this when publishing its collection data under a CC0 license on GitHub. Sufficiently 'clean' and checked data is marked 'curator approved' in the dataset; other data is included too, but without this notice. Fiona Romeo, MoMA's Director of Digital Content & Strategy, states that this decision is inspired by a proven need from researchers:
...a bigger cultural shift lies behind the records that are marked “not curator approved.” More than half of the records included in this data release have incomplete information and may contain errors. There is established evidence that researchers want online access to collection records as quickly as possible, “whatever the perceived imperfections or gaps in the records.” We therefore decided that we would share this work in progress in order to provide a more comprehensive view of MoMA’s collection.
Also literally: 'authority control' on Wikidata
Wikidata is a knowledge base that wants to cover the whole world. In October 2015, Wikidata contained, for instance, almost 3 million people. In order to clearly identify and distinguish all items, and in order to embed Wikidata as a data hub among other information sources, authority control is a central activity for many Wikidata volunteers.
Wikidata items are linked, as much as possible, with reputable external authority databases. An up-to-date overview of the many authority properties on Wikidata can be found on https://www.wikidata.org/wiki/Wikidata:List_of_properties/all#Authority_control.
In the visual arts, the following selection of authority databases is (among many others) referred to on Wikidata:
- People and organisations: ULAN, RKDartists, VIAF
- Places: Thesaurus of Geographic Names
- Concepts/keywords: Art and Architecture Thesaurus
If external, donated datasets (like the datasets in Linked Open Data publication with Wikidata) have already been matched with external authority databases (for example, artist names are already linked with their identifiers in RKDartists), this helps to identify exactly the right people on Wikidata.
It's important to note that, for the sake of clarity, concepts are (almost) always linked directly to other Wikidata items; the connection to an external authority database is only made on a second level.
Artworks, for instance, are described as follows:
<item (artwork)> → creator (P170) → <item (person)> → RKDartists ID (P650) → identifier in RKDartists
<item (artwork)> → depicts (P180) → <item> → AAT ID (P1014) → identifier in the Art and Architecture Thesaurus
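The two-level pattern above (the artwork links to another item, and only that item carries the external identifier) can be sketched with plain dictionaries. All item identifiers and identifier values below are placeholders, and the data layout is a simplification for illustration, not Wikidata's actual JSON.

```python
# Placeholder items: an artwork, its creator and a depicted concept.
ITEMS = {
    "Q_ARTWORK": {"claims": {"P170": ["Q_PERSON"], "P180": ["Q_CONCEPT"]}},
    "Q_PERSON": {"claims": {"P650": ["00000"]}},        # RKDartists ID (placeholder value)
    "Q_CONCEPT": {"claims": {"P1014": ["300000000"]}},  # AAT ID (placeholder value)
}


def external_ids(items: dict, item_id: str, link_property: str, authority_property: str) -> list:
    """Follow link_property from item_id, then collect authority_property on each linked item."""
    found = []
    for target_id in items[item_id]["claims"].get(link_property, []):
        found.extend(items[target_id]["claims"].get(authority_property, []))
    return found


if __name__ == "__main__":
    # Level 1: artwork -> creator (P170); level 2: creator -> RKDartists ID (P650).
    print(external_ids(ITEMS, "Q_ARTWORK", "P170", "P650"))
    # Level 1: artwork -> depicts (P180); level 2: concept -> AAT ID (P1014).
    print(external_ids(ITEMS, "Q_ARTWORK", "P180", "P1014"))
```

Note that the artwork itself holds no authority identifier for its creator; a lookup on the artwork for P650 returns nothing, which reflects the indirect modelling described above.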
Wikidata and new terminology
Creating and maintaining authority databases is time-intensive and often requires long discussions between publishers and experts (such as between Getty, RKD and the international cultural sector for the maintenance of the Art and Architecture Thesaurus). Wikidata, by contrast, is quick to react to new developments: new terminology emerges quickly, for instance as soon as a Wikipedia article is written about a topic for the first time. For instance, the concept Internet art (Q1569950) is not present yet in the Art and Architecture Thesaurus, but does have an item on Wikidata.
When data from cultural institutions is uploaded to Wikidata, it will be edited there by Wikidata volunteers. Therefore, Wikidata should be considered an external and open platform for a dialogue about museum objects and cultural heritage. In that regard, Wikidata is complementary to – and does not replace – the institutions' own, internally managed collection databases and websites.
The dialogue on Wikidata consists of enrichment, corrections, and the juxtaposition of different opinions. The Wikidata community expects a certain commitment from the museum and heritage community to effectively participate in that dialogue. Ideally this also involves regular updates of the data, for instance when new items have been added to the collection.
How do both the Wikidata community and the museum/heritage community benefit from such a dialogue? The next chapter investigates costs and benefits for Wikimedia projects, for museums and for society in the broad sense.
Costs and benefits of contributing data to Wikidata
This chapter presents an analysis of the costs and benefits of using Wikidata to make information about works of art available as open data. The analysis is made for museums, the Wikimedia community and society as a whole.
For museums / art collections
The benefits of a data donation to Wikidata for museums were also explained in a screencast specifically made for the project Linked Open Data publication with Wikidata. (Screencast is in Dutch.)
The Wikidata platform is a cheap, solid infrastructure to make data available for re-use on the web. The platform offers a robust interface and API to manage data and to integrate it in other applications.
Museums save on the development and technical expertise to create and manage a similar platform in-house.
Wikidata and the related Wikimedia projects have a big public outreach. The Wikidata platform itself has 6,000 active users. This is a very high number, since it is a specific and technically experienced crowd who often develop applications by themselves. Because of the known brand and openness of Wikidata, developers from outside Wikimedia also easily find their way to the platform (cf. the recent support by Google).
Through Wikidata, museums may find access to a vast, international and very diverse audience. Through Wikidata, the collection reaches a much wider audience than a museum can realise through its own educational and communication departments.
Next to pure outreach, the 6,000 active Wikidata users offer something extra, namely the capacity to open new perspectives on the collection. Wikidata reaches a specific public of ‘digital natives’ who interact very spontaneously and creatively in mixing and processing data in web applications. In addition, this is also a young group that could help museums to make the necessary translation towards younger people.
Through Wikidata, museums find access to a precious reservoir of creativity that can help them to communicate (about) collections in an efficient way to this growing group of ‘digital natives’.
Museums’ data doesn’t end up in Wikidata in a specialised ‘silo’, but in a knowledge base that covers the entire world. This means that the data is placed in a wide and rich context. Wikidata also contains metadata of (and links to authorities about) subjects that are depicted in artworks, like historical events and famous persons. Wikidata is a data hub of external authority and terminology sources like VIAF and the Art and Architecture Thesaurus. Lastly, artworks become part of artists’ oeuvres across the boundaries of individual collections.
Loss of data exclusivity
The data that you publish on Wikidata is released for re-use under a CC0 license. Museums distance themselves explicitly from any form of exclusivity over the data they publish to the Wikidata platform. Once the data has been published under CC0, this license is irrevocable.
Through Wikidata, museums donate data to society. By doing so, museums give up any possible revenue model based on making data available for re-use to third parties. Specifically, this means for example that a museum cannot gain revenue by selling licenses on the collection data. Museums also cannot claim a share in the profit that third parties make by re-using data in a product.
Time investment for updates
Museums that publish data in Wikidata are expected to commit to regularly updating that data and to engaging in the dialogue with other (non-professional) Wikidata users regarding the correctness and completeness of the data.
This requires that data managers in the museums get acquainted with the Wikidata interfaces through which they can manage data and engage themselves as Wikipedians. This engagement is voluntary, but essential in order to facilitate the re-use of the data.
Time investment for data cleanup
Data from collection management systems needs to be cleaned and normalised before it can be uploaded to the Wikidata platform. With the tools that are currently available to do so, this requires a considerable amount of manual work – including exports, linking to authorities, normalisation and mapping of data, etc.
This requires the data manager to have specific expertise in transferring data from one system to another and to be familiar with specific tools for data cleaning.
The project Linked Open Data publication with Wikidata is part of a broader strategy to renew the digital infrastructure of the Flemish art museums. This group of museums has already gone through an intensive trajectory in which data was cleaned, normalised, enriched and identified with persistent URIs. Because of this, a large part of this cost is already covered and the data can be uploaded to the Wikidata platform with minimal adaptations.
The Wikimedia community regularly works together with social and cultural organisations, from UNESCO and the British Library to educational institutions and museums all over the world. Collaboration with cultural partners happens under the umbrella of the GLAMwiki project (Galleries/Libraries/Archives/Museums).
Liam Wyatt, the first Wikipedian in Residence (British Museum, 2010): “We’re doing the same thing, for the same reason, for the same people, in the same medium. Let’s do it together.”
A data donation from art collections brings the following benefits and costs for Wikimedia and Wikidata:
Institutions donate data that is compliant with the mission of the Wikimedia movement and that falls within the notability criteria of Wikidata.
Institutions donate data that is carefully edited, of high quality, containing references to reliable sources.
A data donation project like Linked Open Data publication with Wikidata offers a learning opportunity for the Wikidata community, in the areas of collaboration with experts from the cultural sector, data modelling, import and re-use.
Stepping stone for inclusion of more free content
The donated data will hopefully encourage the participating museums/collections to enrich and add more free information (like images and other media).
The donated data takes up server space of the Wikimedia Foundation and, therefore, generates a certain cost in terms of storage, maintenance and energy use.
The donated data is edited, managed and maintained by volunteers. This requires a significant amount of goodwill and investment of people's free time.
Need for new tools
The donation of data creates a need for more and better tools (e.g. for mass uploading, measuring and updating data); there is possibly no budget and time for this in the short run.
For society (funding body, commissioning organisation, the public, taxpayers...)
A data donation increases the findability, visibility and accessibility of the heritage that is preserved in (Flemish) museums (in accordance with the mission of the Flemish Art Collection).
The use of an open platform is also cost-efficient in terms of infrastructure for society as a whole.
Donating data to Wikidata is a tangible implementation of the European PSI directive. Data produced by institutions financed with public money is made available as open data, in a sustainable way. Museums can use an additional instrument, the Wikidata API, to re-use the data in their own daily work.
The donated data can serve as a good source for article writers on Wikipedia. In this way, the public also benefits: a source of information for writers is provided, and a worldwide audience can consult this information. The same is true for developers who use the Wikidata API.
An organisation like the Flemish Art Collection wants to present Flemish art heritage to an international audience. Through a platform like Wikidata, information on such works ends up in Wikipedia articles in different languages. Since an artwork has only one Wikidata item to fall back on, every article writer draws on the same record, which contains authoritative information.
Investment of taxpayers' money
The data is produced with public budgets.
Crosswalk, data delivery and upload method
This chapter proposes a minimal input profile for an artwork on Wikidata. Which CC0 data is needed to create a minimal but complete Wikidata item for an artwork? How is this described and recorded in Wikidata?
Next, we briefly outline the method of delivery and how the delivered data is integrated in Wikidata by volunteers (October 2015).
A good example artwork item is The Reading by Emile Verhaeren (Q21012032), a painting by Théo van Rysselberghe.
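As an illustration of how such an item can be read back through the Wikidata API, the sketch below builds a `wbgetentities` request for Q21012032 and pulls the linked Q numbers out of a response. It is a minimal sketch, not part of the project's tooling: the helper names `entity_url` and `item_values` are hypothetical, the sample response is hand-written in the shape the API returns, and the Q numbers inside it are placeholders rather than the real identifiers.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_url(qid):
    """Build a wbgetentities request URL for a single item."""
    params = {"action": "wbgetentities", "ids": qid, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def item_values(entity, prop):
    """Return the Q numbers an item links to through one property."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim["mainsnak"]
        if snak.get("snaktype") != "value":
            continue  # skip 'unknown value' and 'no value' statements
        value = snak["datavalue"]["value"]
        # Some responses carry only a numeric id, so fall back to it.
        values.append(value.get("id") or "Q%d" % value["numeric-id"])
    return values


# Hand-written sample in the shape of one entity from an API response.
# The Q numbers are placeholders, NOT the real identifiers.
ENTITY = {
    "id": "Q21012032",
    "labels": {"en": {"language": "en", "value": "The Reading by Emile Verhaeren"}},
    "claims": {
        "P170": [{"mainsnak": {"snaktype": "value", "property": "P170",
                  "datavalue": {"type": "wikibase-entityid",
                                "value": {"entity-type": "item",
                                          "numeric-id": 999999999,
                                          "id": "Q999999999"}}}}],
        "P195": [{"mainsnak": {"snaktype": "value", "property": "P195",
                  "datavalue": {"type": "wikibase-entityid",
                                "value": {"entity-type": "item",
                                          "numeric-id": 999999998}}}}],
    },
}

print(entity_url("Q21012032"))
print(item_values(ENTITY, "P170"))
```

Fetching the printed URL with any HTTP client returns the live item; the parsing step is kept offline here so the sketch does not depend on network access.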
|Metadata to deliver in dataset.
(Please note: data that was not provided in the original dataset might be added by volunteers later)
|Title of the work, in at least one language||Label of the Wikidata item (in the language(s) provided)||Titles may be provided in more than one language, if the dataset clearly indicates which language(s). Alternative titles are welcome too; these are stored as aliases in Wikidata and improve findability of each item.|
|Creator(s) of the work||Property creator (P170)||Preferably formatted as Firstname Lastname, or first name and last name in separate fields. Formatting as Lastname, Firstname is less clear.
The uploader must 'match' all creators in Wikidata – i.e. the exact, correct person with his/her Q number must be found on Wikidata. Therefore it helps if the original dataset already contains a match with Wikidata. Wikidata also stores VIAF, ULAN and RKDartists identifiers; in this way, artists are also findable. Providing (a selection of) these IDs in the original dataset is thus also very helpful.
|Type of object (what kind of artwork is it?)||Property instance of (P31)||The uploader must 'match' all object types in Wikidata – i.e. the exact, correct concept with its Q number must be found on Wikidata. Therefore it helps if the original dataset already contains a match with Wikidata. Wikidata also stores AAT identifiers; in this way, types and genres of artworks are also findable. Providing (a selection of) these IDs in the original dataset is thus also very helpful.|
|Collecting institution||Property collection (P195)||The uploader must 'match' all organisations in Wikidata – i.e. the exact, correct organisation with its Q number must be found on Wikidata. Therefore it helps if the original dataset already contains a match with Wikidata. Wikidata also stores ISIL identifiers; in this way, organisations are also findable. Providing (a selection of) these IDs in the original dataset is thus also very helpful.|
|Inventory number in this collection||Property inventory number (P217)|
|Date (if known)||Property inception (P571)||Wikidata only supports precise dates that coincide with
Dates like 'circa 1856' and 'between 1574 and 1603' can't be expressed in Wikidata. Approximations will be used in a data import.
|URL / URI||Preferably persistent / a permalink. A URL that refers to more information about the artwork.|
|Image(s)||Property image (P18)||Like all other Wikimedia projects, Wikidata only includes (links to) images and media that are available under a free license (public domain, CC-BY, CC-BY-SA) and that are uploaded to the media bank Wikimedia Commons.|
|What is depicted on the artwork?||Property depicts (P180)|
|Location of the artwork||Property location (P276)||In most cases the same as the collecting institution, but may be different (e.g. in case of long-term loan, art in public space...)|
|Material||Property material used (P186)|
|Genre||Property genre (P136)|
|Art movement||Property movement (P135)|
|Width||Property width (P2049)|
|Height||Property height (P2048)|
|Weight||Property mass (P2067)|
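The crosswalk above can be written down as a simple lookup table. The following is a minimal sketch, assuming a dataset whose column names are the hypothetical keys shown below (`creator`, `object_type`, and so on); only the P numbers come from the table. The title becomes the item label rather than a statement, and the URL row is left out because the table lists no property for it. The inventory number in the sample record is invented for illustration.

```python
# Dataset field (hypothetical column name) -> Wikidata property from the crosswalk.
CROSSWALK = {
    "creator": "P170",
    "object_type": "P31",
    "collection": "P195",
    "inventory_number": "P217",
    "date": "P571",
    "image": "P18",
    "depicts": "P180",
    "location": "P276",
    "material": "P186",
    "genre": "P136",
    "movement": "P135",
    "width": "P2049",
    "height": "P2048",
    "weight": "P2067",
}


def to_statements(record):
    """Split one record into an item label and (property, value) pairs.

    Empty values and fields without a crosswalk entry are skipped.
    """
    label = record.get("title")
    statements = [
        (CROSSWALK[field], value)
        for field, value in record.items()
        if field in CROSSWALK and value
    ]
    return label, statements


# Illustrative record; the inventory number and URL are invented placeholders.
RECORD = {
    "title": "The Reading by Emile Verhaeren",
    "creator": "Théo van Rysselberghe",
    "object_type": "painting",
    "inventory_number": "EXAMPLE-001",
    "date": "",
    "url": "https://example.org/artwork/1",
}

label, statements = to_statements(RECORD)
print(label)
for prop, value in statements:
    print(prop, value)
```

The values here are still plain text; before upload, names such as the creator and the object type would have to be matched to their Q numbers, as the remarks in the table describe.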
Data delivery and upload method
At the time of writing this whitepaper (October 2015), no tools exist yet for easy/straightforward mass upload of external data to Wikidata.
At this point, upload of external data is performed by an experienced volunteer. He/she usually uploads the data with a custom script (a 'bot'). Such an upload bot can handle data in many different formats. The most important condition is a clear and logical structure in the dataset.
Among others, the following delivery formats are suitable. If a data donor has any questions or doubts, he/she is advised to contact the uploading volunteer.
- csv, tsv or otherwise 'delimited' text file
- Excel file, Google Sheet, OpenOffice spreadsheet...
- XML or RDF
- a Microsoft Access export (though a 'flat' file is preferred)
- a publicly accessible API
The order of fields/metadata in the dataset is not important.
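To illustrate why the column order does not matter for a delimited file, the sketch below reads the same record from a comma-separated and a tab-separated sample with Python's standard `csv.DictReader`, which keys every value by its column header. The function name `read_records`, the column names and the sample rows are assumptions made for this example only.

```python
import csv
import io


def read_records(text, delimiter=","):
    """Parse delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# The same (invented) record, delivered with different delimiters and column orders.
CSV_TEXT = "title,creator,inventory_number\nWork 1,Artist A,1\n"
TSV_TEXT = "inventory_number\ttitle\tcreator\n1\tWork 1\tArtist A\n"

csv_records = read_records(CSV_TEXT)
tsv_records = read_records(TSV_TEXT, delimiter="\t")

# Both deliveries yield the same record, whatever the column order.
print(csv_records == tsv_records)
print(csv_records[0]["creator"])
```

What does matter is that the headers are unambiguous, so that each column can be mapped to the right Wikidata property.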
After receiving the dataset, the uploading volunteer and his/her bot will
- match people, organisations, concepts in the datasets with Wikidata (i.e. look up the corresponding Q items)
- create missing people, organisations and concepts on Wikidata
- check if any artworks in the dataset already exist on Wikidata and if so, make sure that they are not duplicated during the upload
- add each new artwork, one by one, as a new Wikidata item with its own Q number; all delivered metadata will be added as properties, according to the principles outlined in the crosswalk above
- add persistent links / URIs to/as properties and references where relevant
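The steps above can be sketched as a planning pass over the dataset before anything is written to Wikidata. This is a simplified illustration under stated assumptions, not the script the volunteers actually use: the function `plan_upload`, the field names, and the placeholder identifiers are all hypothetical, matching is done by exact name only, and duplicates are detected by the combination of collection and inventory number.

```python
def plan_upload(records, known_entities, existing_keys):
    """Decide, per record, what an upload would have to do.

    known_entities: name -> Q number for people and organisations already matched.
    existing_keys: (collection, inventory number) pairs already on Wikidata.
    """
    seen = set(existing_keys)
    missing = []    # names that still need a Wikidata item
    new_items = []  # artworks to create, with matched Q numbers filled in
    skipped = []    # artworks left out to avoid duplicates
    for record in records:
        key = (record["collection"], record["inventory_number"])
        if key in seen:
            skipped.append(key)
            continue
        seen.add(key)  # also guards against duplicates inside the dataset
        matched = dict(record)
        for field in ("creator", "collection"):
            name = record[field]
            qid = known_entities.get(name)
            if qid is None and name not in missing:
                missing.append(name)
            matched[field] = qid  # None until the missing item is created
        new_items.append(matched)
    return {"missing": missing, "new_items": new_items, "skipped": skipped}


# Invented sample data; the identifiers are placeholders, not real Q numbers.
RECORDS = [
    {"title": "Work 1", "creator": "Artist A", "collection": "Museum X", "inventory_number": "1"},
    {"title": "Work 2", "creator": "Artist B", "collection": "Museum X", "inventory_number": "2"},
    {"title": "Work 3", "creator": "Artist A", "collection": "Museum X", "inventory_number": "3"},
]
KNOWN = {"Artist A": "Q-EXAMPLE-1", "Museum X": "Q-EXAMPLE-2"}
EXISTING = {("Museum X", "3")}

plan = plan_upload(RECORDS, KNOWN, EXISTING)
print(plan["missing"])
print(plan["skipped"])
print(len(plan["new_items"]))
```

In practice, matching usually needs more than an exact name comparison, which is why identifiers such as VIAF, ULAN, RKDartists, AAT or ISIL in the delivered dataset are so helpful.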
Maintenance of the data, manual changes and updates, and RDF extraction will be covered in this project's handbook (Deliverable 4, December 2015).
Wikidata is still a young project. It was launched in 2012 and is continuously in development, both in terms of technology and data modelling.
In October 2015, many questions and issues remain open, such as
- data modelling of artworks produced in series
- precise and correct dates for artworks
- tools for import, statistics, maintenance and mutual updates of donated data
Advantages of early participation in Wikidata:
- A large data donation places the issues above higher on the agenda of the Wikidata community
- Practical experiences and arguments of early data donors can influence future developments
- Earlier donated data will, in case of changes to Wikidata's data model, be updated towards the new situation, together with all other data on Wikidata.
SWOT analysis of data donation to Wikidata
Annex: Usage guidelines
The Museum of Modern Art has made its collection data available as a csv file, under CC0 license, on GitHub: https://github.com/MuseumofModernArt/collection
This includes a README file with additional usage guidelines: https://github.com/MuseumofModernArt/collection/blob/master/README.md
In brief, these guidelines outline:
- Images not included
- Research in progress
- Give attribution to MoMA
- Do not misrepresent the dataset
Tate has made its collection data available as csv files, under CC0 license, on GitHub: https://github.com/tategallery/collection
This includes a README file with additional usage guidelines: https://github.com/tategallery/collection/blob/master/README.md
In brief, these guidelines outline:
- Give attribution to Tate
- Metadata is dynamic
- Mention your modifications of the Metadata and contribute your modified Metadata back
- Be responsible
The Smithsonian Cooper-Hewitt, National Design Museum has made its collection data available as csv files, under CC0 license, on GitHub: https://github.com/cooperhewitt/collection
This includes a README file with additional usage guidelines: https://github.com/cooperhewitt/collection/blob/master/README.md
In brief, these guidelines outline:
- Give credit where credit is due. Give attribution to Smithsonian Cooper-Hewitt, National Design Museum
- Metadata is dynamic
- Mention your modifications of the Metadata and contribute your modified Metadata back
- Be responsible
- Ensure that you do not mislead others or misrepresent the Metadata or its sources
- Please note that you use the Metadata at your own risk
- For an overview of all Wikimedia projects, see https://wikimediafoundation.org/wiki/Our_projects
- A Wikimedia user is considered 'active' when (s)he makes at least five edits per month. For extensive statistics, see https://stats.wikimedia.org/EN/TablesWikipediaEN.htm
- Google's announcement on deactivating Freebase: https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc
- See a blog post by Dan Cohen from November 2013, available at http://www.dancohen.org/2013/11/26/cc0-by/
- Statistics about Wikidata's editors can be found at http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaWIKIDATA.htm
- “...their most important wish is that online access to museum databases to be provided as quickly as possible, even if the records are imperfect or incomplete.” From http://www.rin.ac.uk/our-work/using-and-accessing-information-resources/discovering-physical-objects-meeting-researchers-