Wikidata:WikiProject Städel Museum Wikidata Clean-Up/Learnings


Scope and nature of the clean-up

Scope: For this pilot we focused on a small set of highlight artworks from the Städel Museum's collection. We only addressed statements that referenced the Städel Museum as a source, and decided to limit the clean-up to statements made using the properties listed in the table below. We have not touched any other statements, as this information was not part of our project, but part of other users' research.

Nature of the clean-up: We compared the existing Wikidata statements with our current data. Where a statement was already up to date, we added a new reference using our updated permalink URLs (https://staedelmuseum.de/go/ds/xxx). Where more recent data was available, we added a new statement with the updated information, referenced with our permalink URL. We did not remove any outdated statements; instead, we gave our updated statements preferred rank (Help:Ranking).
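
The rule described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the pilot: the `Statement` class and the `clean_up` function are hypothetical names, the statement model is heavily simplified compared with the real Wikidata data model, and the permalink keeps the `xxx` placeholder from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Placeholder pattern quoted from the text; "xxx" is not a real identifier.
PERMALINK = "https://staedelmuseum.de/go/ds/xxx"


@dataclass
class Statement:
    """Simplified stand-in for one Wikidata statement (hypothetical model)."""
    value: str
    rank: str = "normal"  # Wikidata ranks: "preferred", "normal", "deprecated"
    reference_urls: list[str] = field(default_factory=list)


def clean_up(statements: list[Statement], current_value: str, permalink: str) -> str:
    """Apply the pilot's rule to the statements of one property on one item."""
    for statement in statements:
        if statement.value == current_value:
            # Already up to date: only attach the permalink as a reference.
            if permalink not in statement.reference_urls:
                statement.reference_urls.append(permalink)
            return "existing statement referenced"
    # More recent data: add a new preferred statement and keep the old ones.
    statements.append(Statement(current_value, "preferred", [permalink]))
    return "new preferred statement added"
```

Existing statements are never deleted or re-ranked in this sketch, which mirrors the decision to leave other users' contributions in place.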

Wikidata property Städel Museum data field Comment
instance of (P31) "Object Type" / "Objektart" Consulted Wikidata:WikiProject_Visual_arts/Item_structure#Types_of_visual_artworks.
inception (P571) "Date" / "Datum" (date) Where the exact date of creation is not known, we followed the conventions described at Help:Dates#Inexact_dates
image (P18) If not already in place, we added a statement linking to the highest-resolution image file available on Wikimedia Commons.
title (P1476) "Title" / "Titel" (title_en, title_de)
location (P276) "Institution" In every case this was Städel Museum (Q163804)
genre (P136) "Genre" / "Motivgattung" (motiv_general) We learned that the community around Wikidata:WikiProject_sum_of_all_paintings aims to restrict entries for genre to a limited list wherever possible, to make it possible to effectively query using these statements. Most used genres can be found at Wikidata:WikiProject_sum_of_all_paintings/Top_genres.

However, our data generally includes multiple entries for the genre data field, and these can be difficult to map to the options preferred by the Wikidata community. Each artwork needed to be researched individually. We did not establish a consistent strategy for the use of this property during this pilot.

main subject (P921) "Main motif" / "Motiv" (motiv_specific) Similar to genre (P136), we learned that, to support effective querying, the community tries to control the entries for this property. Artworks should, wherever possible, only list one main subject (in rare cases more than one) and, wherever possible, this should be limited to the most frequently used main subjects detailed in the lists at Wikidata:WikiProject_sum_of_all_paintings/Main_subject. (Thanks to @multichill for the hint)

However, our data doesn't specify a single main motif. Instead we list multiple entries for this field. This meant that, for each artwork, we had to research a suitable term. Again, we struggled to establish a consistent approach during the pilot.

depicts (P180) "Main motif" / "Motiv" (motiv_specific) In Wikidata, the constraints on the use of this property are much looser. It is possible to add multiple depicts statements, and the values need not be limited to any predefined list.

We used this property to add further items from our "main motif" field. However, because of the difficulties with the main subject (P921) statements, we did not manage to be very consistent here either.

creator (P170) (creator) In most cases this was straightforward. Where not, we consulted Wikidata:WikiProject_Visual_arts/Item_structure#Use_of_creator_(P170)_in_uncertain_cases
inventory number (P217) (object_number)
made from material (P186) "Physical Description" / "Material und Technik" Whilst the Wikidata property made from material (P186) should focus on materials only, our data includes widely used standardised terms like mixed technique (Q17141444) that cover both the materials and the techniques used. Mapping our data to Wikidata required us to research options for many artworks individually. Again, we struggled to be consistent in our use of this property for this pilot.
width (P2049), height (P2048), diameter (P2386), thickness (P2610) "Dimensions" / "Maße" Consulted Wikidata:WikiProject_Visual_arts/Item_structure#Dimensions.
depicts Iconclass notation (P1257) Iconclass Here we listed both primary and secondary Iconclass statements according to our database.
described at URL (P973) Permalink We added statements for our new permalink with the qualifiers:
  • language of work or name (P407): German, English
  • publisher (P123): Städel Museum (Q163804)
copyright status (P6216) "Picture Copyright" / "Bildrechte" All artworks in this pilot are already in the public domain (Q19652).

We were advised to qualify this statement with the following qualifiers:

However, as we are in the European Digital Single Market and all our works become public domain 70 years after the death of the artist, we are unsure if this was the right thing to do (see comment below).

on focus list of Wikimedia project (P5008) In every case this was WikiProject Städel Museum Wikidata Clean-Up (Q124393172)
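
As an illustration of the described at URL (P973) row in the table above, the statement and its two qualifiers can be written down as a small Python structure. This is a sketch under assumptions: the function name and the dictionary layout are hypothetical and do not follow the Wikibase JSON format, and Q188 and Q1860 are the Wikidata items for German and English, which the table names only by label.

```python
def described_at_url_statement(permalink: str) -> dict:
    """Build a simplified described at URL (P973) statement (hypothetical layout)."""
    return {
        "property": "P973",  # described at URL
        "value": permalink,
        "qualifiers": {
            # language of work or name: German (Q188) and English (Q1860)
            "P407": ["Q188", "Q1860"],
            # publisher: Städel Museum
            "P123": ["Q163804"],
        },
    }


# The "xxx" placeholder is kept exactly as it appears in the text.
example = described_at_url_statement("https://staedelmuseum.de/go/ds/xxx")
```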


Outstanding issues from the pilot

  • We did not delete any information; we added current information and ranked it as preferred. We would appreciate feedback on whether it is OK to delete information that is definitely incorrect or outdated, for example when "depicts" lists something that is definitely not in the picture.

Observations and Questions

  • We think it makes sense to edit the existing data records instead of simply adding new information, as we do not want to create duplicate values or overwrite the existing statements. However, individual editing takes more time than simply generating new information or data records.
  • There are different ways of assigning data: different data models (e.g. CIDOC CRM, LIDO) and database-specific procedures. As a result, there is sometimes no standardized procedure, and categories are filled in differently. We have added and updated data here to the best of our knowledge and are happy to receive feedback.
  • We have some quality concerns from a data science perspective: Different perspectives on data from Wikidata are interesting, but to effectively query or classify the data, it needs to be clean. For example, we found that for some works “oak wood” is specified as the material/surface, in others “oak panel”. For our part, we have tried to work as consistently as possible, but as mentioned, we have not deleted any other edits. What stage is the Wikidata community at? Is it about collecting more data first or standardising the data and improving the quality?
  • The works we edited are all in the public domain, but we came across differing information for "applies to jurisdiction" and "determination method" in the copyright status statements. Some works had the indication "countries with 80 years pma or shorter" with "80 years or more after the death of the author(s)"; others had the equivalent indication with 100 or 95 years. Since June 2021, the new copyright law has been in force in Germany. It transposes the EU Digital Single Market Directive into German law and clearly states in Part 7, Section 68 of the Copyright and Related Rights Act that "reproductions of public domain works of visual arts are not protected by related rights under Parts 2 and 3". As the works are located in Germany and are in the public domain after 70 years under German law, the information should actually read "countries with 70 years pma or shorter" and "70 years or more after author(s) death". We see the advantage of specifying the longest applicable period: it makes clear that the work is also in the public domain for someone living in a country with 100 years of copyright protection. However, this would require works to be checked regularly to see which period they have reached. Our wish would be for all works to carry a 70-year indication, and for us to be able to standardize this, so that there is no confusion as to why some of our works have a 100-year indication and others a 70-year indication. We would be pleased to receive feedback on this.
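
The trade-off in the last point can be illustrated with a small calculation. The sketch below is hypothetical and not legal advice. It assumes that a term of N years pma runs to the end of calendar year (year of death + N), so that the work is in the public domain from 1 January of the following year. The function names and the term list are illustrative, and the 95-year indication is left out because the sketch only models terms counted from the author's death.

```python
from __future__ import annotations

# Illustrative pma terms; the source mentions 70-, 80- and 100-year indications.
PMA_TERMS = (70, 80, 100)


def expired_terms(death_year: int, check_year: int, terms=PMA_TERMS) -> list[int]:
    """Return the pma terms that have already expired in check_year.

    Assumption: a term of N years ends on 31 December of death_year + N.
    """
    return [term for term in terms if check_year > death_year + term]


def longest_expired_term(death_year: int, check_year: int, terms=PMA_TERMS) -> int | None:
    """Return the longest expired term, or None if the work is still protected."""
    return max(expired_terms(death_year, check_year, terms), default=None)
```

For a hypothetical artist who died in 1940, `longest_expired_term` gives 70 when checked in 2011, 80 in 2021 and 100 in 2041. The "longest applicable period" therefore changes over time and would need to be re-checked, whereas a fixed 70-year indication stays valid once it has been reached.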