User:Multichill/Connecting paintings and images

This page describes the possibilities and challenges of connecting paintings and images. The ultimate goal is to have every painting (Q3305213) here on Wikidata illustrated and to have every image of a painting on Commons link back to its Wikidata item.

Illustrating painting items

Every painting (Q3305213) should have an image (P18) claim linking it to the best available image of the painting on Commons. This is not always possible because not every painting has an image on Commons. This could be due to copyright restrictions or because no image is available anywhere on the web. We might want to somehow mark these cases so we don't keep coming back to try to illustrate the same items. At the moment (March 2016) we have about 30,000 paintings with an illustration (query) and about 90,000 paintings without an illustration (query).
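
Counts like these could be produced with a SPARQL query. The following is a minimal sketch that only builds the query text; it assumes paintings are modelled as instance of (P31) painting (Q3305213) and that the usual `wd:` and `wdt:` prefixes are available. The helper name `build_count_query` is hypothetical, and sending the query to a SPARQL endpoint is left out.

```python
def build_count_query(with_image: bool) -> str:
    """Build a SPARQL query counting paintings with or without image (P18).

    Assumption: a painting is any item that is instance of (P31)
    painting (Q3305213).
    """
    if with_image:
        image_filter = "FILTER EXISTS { ?item wdt:P18 ?image }"
    else:
        image_filter = "FILTER NOT EXISTS { ?item wdt:P18 ?image }"
    return (
        "SELECT (COUNT(DISTINCT ?item) AS ?count) WHERE {\n"
        "  ?item wdt:P31 wd:Q3305213 .\n"
        f"  {image_filter}\n"
        "}"
    )


# Print the query for paintings that still need an illustration.
print(build_count_query(with_image=False))
```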

Linking Commons images to Wikidata

Every image of a painting on Commons should use the Artwork template. This template offers the possibility to link an image to Wikidata. If the link is set, the Artwork template categorizes the image in Category:Artworks with Wikidata item; otherwise it categorizes the image in Category:Artworks without Wikidata item. In the end all paintings should have a Wikidata link, and multiple images of the same painting should link to the same painting item on Wikidata. For the Google Art Project paintings more specific categories have been created: Category:Google Art Project paintings with Wikidata item and Category:Google Art Project paintings without Wikidata item.

Syncing from Wikidata to Commons

If an item about a painting (Q3305213) has a (valid) image (P18) claim, then the target image on Commons should link to the item here. See for example The Milkmaid (Q167605) and File:Johannes Vermeer - Het melkmeisje - Google Art Project.jpg. A bot should run on a regular basis to add the missing links from Commons to Wikidata (example). This way a user only has to add a link in one direction and a bot makes sure the data is consistent.
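
The edit such a bot makes on the Commons side can be sketched as a pure text transformation. This is a rough sketch, not the actual bot code: it assumes the Artwork template takes the item id in a parameter named `wikidata`, it only looks at plain wikitext with regular expressions (a real bot would more likely use a proper wikitext parser), and the function name `add_wikidata_link` is hypothetical.

```python
import re


def add_wikidata_link(wikitext: str, qid: str) -> str:
    """Set `wikidata = <qid>` in the first Artwork template of a file page.

    The text is returned unchanged if there is no Artwork template or if
    a wikidata parameter with a Q-id is already present.
    """
    template = re.search(r"\{\{\s*Artwork\b", wikitext, flags=re.IGNORECASE)
    if template is None:
        return wikitext
    # Already linked: never overwrite an existing item id.
    if re.search(r"\|\s*wikidata\s*=\s*Q\d+", wikitext, flags=re.IGNORECASE):
        return wikitext
    # An empty wikidata parameter exists: fill it in place.
    empty = re.search(
        r"(\|\s*wikidata\s*=)[ \t]*(?=\n|\||\}\})", wikitext, flags=re.IGNORECASE
    )
    if empty:
        return wikitext[: empty.end(1)] + " " + qid + wikitext[empty.end():]
    # No parameter at all: insert one right after the template name.
    end = template.end()
    return wikitext[:end] + "\n |wikidata = " + qid + wikitext[end:]


page = "{{Artwork\n |artist = Johannes Vermeer\n}}"
print(add_wikidata_link(page, "Q167605"))
```

Running the function a second time on its own output leaves the text unchanged, so a periodic bot run would not produce repeated edits.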

Syncing from Commons to Wikidata

If an image on Commons links to a Wikidata item, and that item is about a painting and doesn't have an image (P18) claim, a bot should add the image to the Wikidata item to illustrate it.
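
The rule above can be sketched as a small planning function. The data shapes here are assumptions made for illustration: items are reduced to a dictionary of property id to list of values (much simpler than the real Wikidata JSON), Commons links are (filename, item id) pairs, and `plan_image_claims` is a hypothetical name. When several images link to the same item, this sketch simply keeps the first one; picking the best image is a separate problem.

```python
from typing import Dict, Iterable, List, Tuple

PAINTING = "Q3305213"

# Simplified item: property id -> list of values (an assumption, not the
# real Wikidata JSON layout).
Item = Dict[str, List[str]]


def plan_image_claims(
    items: Dict[str, Item], commons_links: Iterable[Tuple[str, str]]
) -> Dict[str, str]:
    """Return {item id: filename} for image (P18) claims a bot could add."""
    planned: Dict[str, str] = {}
    for filename, qid in commons_links:
        item = items.get(qid)
        if item is None:
            continue  # link points at an item we did not harvest
        if PAINTING not in item.get("P31", []):
            continue  # item is not about a painting
        if item.get("P18"):
            continue  # item is already illustrated
        planned.setdefault(qid, filename)  # keep the first image per item
    return planned


items = {
    "Q100": {"P31": [PAINTING]},                      # needs an image
    "Q200": {"P31": [PAINTING], "P18": ["Old.jpg"]},  # already illustrated
    "Q300": {"P31": ["Q5"]},                          # not a painting
}
links = [("A.jpg", "Q100"), ("B.jpg", "Q100"), ("C.jpg", "Q200"), ("D.jpg", "Q300")]
print(plan_image_claims(items, links))
```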

Matching painting items and images

Syncing data in either direction is relatively straightforward. Correctly matching painting items and images on Commons is much harder. The general process is:

  • Try to match an item and an image based on available data
  • Optionally offer it to a user for verification
  • Make the link (in one or both directions)

Available data

The available data is tricky to work with because Commons doesn't store structured data, so you end up trying to extract data from the Artwork template. Some quite useful sources are:

Depending on what data you're able to find, you might end up with a strong match. For example, if you manage to match "artist", "institution" and "accession number", you have a very strong match.
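
One way to turn this into something a bot can act on is a simple weighted score over the matched fields. The sketch below is only an illustration: the weights, the thresholds, the `Record` class and the function names are all assumptions, and the sample values are just example data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Fields extracted from a Wikidata item or from an Artwork template."""
    artist: Optional[str] = None
    institution: Optional[str] = None
    accession_number: Optional[str] = None


# Illustrative weights: an accession number is far more specific than an
# artist or institution name on its own.
WEIGHTS = {"artist": 2, "institution": 2, "accession_number": 4}


def normalise(value: Optional[str]) -> Optional[str]:
    """Lower-case and collapse whitespace so trivial differences don't matter."""
    return " ".join(value.lower().split()) if value else None


def match_score(item: Record, image: Record) -> int:
    """Sum the weights of all fields that are present and equal on both sides."""
    score = 0
    for field_name, weight in WEIGHTS.items():
        left = normalise(getattr(item, field_name))
        right = normalise(getattr(image, field_name))
        if left is not None and left == right:
            score += weight
    return score


def match_strength(score: int) -> str:
    """Map a score to a label; 8 means all three fields matched."""
    if score >= 8:
        return "strong"
    if score >= 4:
        return "weak"
    return "none"


item = Record("Johannes Vermeer", "Rijksmuseum", "SK-A-2344")
image = Record("johannes  vermeer", "Rijksmuseum", "sk-a-2344")
print(match_strength(match_score(item, image)))
```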

General process

An automated process should probably harvest two data sets:

  1. All paintings here on Wikidata with relevant properties
  2. All images in Category:Artworks without Wikidata item about paintings

A matching algorithm should be run to match items in these two data sets. If the match is strong, the link should be added right away; if it's weaker, it should somehow be offered to a user to decide whether the match is correct. User decisions should be recorded and remembered so we don't keep offering the same (incorrect) link. If matches are made (with the tool or some other way), the database should be updated so they won't be offered to users anymore. The whole thing should probably be offered in some game-like way and should probably have a leaderboard.
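
The triage step described above could look roughly like this. It is a sketch under several assumptions: each candidate already carries a numeric match score, the threshold of 8 is arbitrary, rejected pairs and already linked files are kept in plain sets rather than a real database, and the name `triage` is hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Candidate = Tuple[str, str, int]  # (item id, Commons filename, match score)
Pair = Tuple[str, str]            # (item id, Commons filename)


def triage(
    candidates: Iterable[Candidate],
    rejected: Set[Pair],
    linked_files: Set[str],
    strong_threshold: int = 8,
) -> Tuple[List[Pair], List[Pair]]:
    """Split candidates into links to add right away and links to review.

    Pairs a user rejected earlier and files that already have a link are
    skipped, so they are never offered again.
    """
    done = set(linked_files)  # local copy; the caller's set is not modified
    auto_link: List[Pair] = []
    needs_review: List[Pair] = []
    # Handle the strongest candidates first so that a strong match claims
    # a file before weaker candidates for the same file are queued.
    for qid, filename, score in sorted(candidates, key=lambda c: -c[2]):
        if (qid, filename) in rejected or filename in done:
            continue
        if score >= strong_threshold:
            auto_link.append((qid, filename))
            done.add(filename)
        else:
            needs_review.append((qid, filename))
    return auto_link, needs_review


candidates = [("Q1", "A.jpg", 8), ("Q2", "A.jpg", 4), ("Q3", "B.jpg", 4)]
auto, review = triage(candidates, rejected={("Q3", "B.jpg")}, linked_files=set())
print(auto, review)
```

After a user accepts or rejects a reviewed pair, the caller would add it to `linked_files` or `rejected` before the next run.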

User interaction

How to offer a possible match between a Wikidata painting item and a Commons image is a challenge in its own right. Plenty of paintings look the same if you just look at the metadata. Titles can be quite generic and used multiple times. The best way to offer a match to the user would be a visual one. If a user is shown the image on the museum website next to the image on Commons, it's easy to make the match. A robot might even be able to do that.

The hard part is extracting the image from the (museum) website. At the very least the painting item needs to have a described at URL (P973) link. Of the about 90,000 paintings without an image, about 75,000 (80%+) have an external link, so that part looks promising. The code to extract the relevant image from the museum website will probably take the most effort to write.
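
As a starting point for that extraction, some websites announce their main image in an Open Graph `og:image` meta tag. Whether a given museum website does so is an assumption that has to be checked per site, and many sites will probably need their own extraction logic. The sketch below only parses HTML that has already been downloaded; the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class OpenGraphImageParser(HTMLParser):
    """Remember the content of the first <meta property="og:image"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.image_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.image_url is not None:
            return
        attributes = dict(attrs)
        if attributes.get("property") == "og:image" and attributes.get("content"):
            self.image_url = attributes["content"]


def extract_image_url(html: str) -> Optional[str]:
    """Return the og:image URL of a page, or None if the page has none."""
    parser = OpenGraphImageParser()
    parser.feed(html)
    return parser.image_url


page = (
    "<html><head>"
    '<meta property="og:image" content="https://example.org/painting.jpg">'
    "</head><body></body></html>"
)
print(extract_image_url(page))
```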

The tool should probably also offer an option to create a painting item here, but that's a different user flow.

Other approach

A completely different approach would be to just upload all the images from all the museum websites and match these. This way we don't have to deal with all the current images on Commons with messy metadata. That still leaves us with a lot of images of paintings on Commons that don't have a link to Wikidata.