Wikidata:WikiProject Cultural heritage/Reports/Towards Ingesting GLAM Inventories on a Global Scale

Version 1.0, 11 January 2017

Author: Beat Estermann

Introduction to the Case Report

After documenting my experience with the ingestion of data about Swiss heritage institutions (see case report), I set out to create an overview of cultural heritage-related projects on the Wikidata platform and to break down the various tasks of the data ingestion process. The aim is to enable a larger number of contributors to take part in ingesting data about heritage institutions and other heritage-related aspects throughout the world.

The example case this report focuses on again concerns the ingestion of data about heritage institutions, but this time on a global scale. It was inspired by the fact that data about virtually all heritage institutions from over ten countries have been gathered for the OpenGLAM Benchmark Survey but not yet ingested into Wikidata. It was further inspired by the intention of the International Council of Museums (ICOM) to run a project in cooperation with the Wikimedia Foundation and other partners, with the goal of ingesting data about all the museums in the world into Wikidata, if possible by ingesting databases that already exist in the various countries, or otherwise by means of crowdsourcing.

Once this example case has been set up, it can be abstracted and applied to other topic areas, such as built heritage (cf. Wiki Loves Monuments), artworks in the public space (cf. Wiki Loves Public Art), the performing arts domain, and so forth.

Some of the work related to this case (3.5 days) has kindly been funded by eCH, the Swiss E-Government Standardization Organization, and has been coordinated via the eCH Specialized Group “Open Government Data”. The present report marks the completion of the funded tasks, namely the creation of guidelines for the collection and description of data sources on Wikidata and for batch editing on Wikidata in spreadsheet mode.

Goals

This project was called into being in fall 2016 after the experience of ingesting the data about Swiss heritage institutions had led to the following insights:

  • A good overview of cultural heritage related projects and data was lacking on the Wikidata platform.
  • The learning curve for being able to handle the data ingestion process autonomously was (and still is) rather steep, confronting the average potential contributor with insurmountable hurdles.
  • The huge potential of Wikidata for the heritage domain can hardly be tapped into without better coordination, documentation, and outreach to relevant partners.

The goals can therefore be summarized as follows:

  • Provide an overview of cultural heritage-related projects on the Wikidata platform.
  • Create an example WikiProject page for coordinating the ingestion of data about heritage institutions, which can then be used as a blueprint for creating similar project pages for other topic areas.
  • Break the various tasks required in the context of the data ingestion down into smaller bits and provide guidelines for new contributors.
  • Provide a basis for discussion both for the project in cooperation with ICOM and in view of the coordination of the ingestion of data about built heritage in the context of the next edition of Wiki Loves Monuments.

Achievements so Far

Overview of heritage-related projects on Wikidata

I first set out to get an overview of all heritage-related projects on Wikidata. In doing so, I followed the approach suggested in the earlier case report about the ingestion of data about Swiss heritage institutions. The result is a navigation template that can be included on the various project pages to provide links to related content.

Analysis of the content and the structure of the various heritage-related WikiProjects

I then had a closer look at each of the WikiProjects. An overview of the content and structure of the various WikiProjects can be found in Annex 1 of the present report.

Proposed structure and content of heritage-related WikiProject Pages

The next step consisted of devising a draft structure and a content overview for WikiProject pages pertaining to the ingestion of cultural heritage-related data, with a view to harmonizing them. The draft structure and content can be found in Annex 2 of the present report.

Creation of the WikiProject “Cultural heritage” as a central hub

Next, I created the new version of the WikiProject “Cultural heritage” and started to set it up as a central hub for the coordination of all heritage-related content (an earlier version focused exclusively on built heritage). It comprises a description of the aim and vision as well as of the concrete goals of the project, which have been largely inspired by the stated goals of the various cultural heritage-related WikiProjects on Wikidata. The declared aim of the project is "to coordinate, facilitate and promote the ingestion of cultural heritage related data into Wikidata, to facilitate the cleansing and enhancement of this data and to promote its use across Wikipedia, its sister projects and beyond"; it is in line with the vision of "establishing Wikidata as a central hub for data integration, data enhancement, and data management in the heritage domain".

Example WikiProject page

I then created the WikiProject page Heritage institutions as a first implementation of the newly proposed structure for WikiProjects pertaining to the ingestion of heritage-related data. In doing so, I tried to break down all the work steps required for the ingestion of the data into smaller bits and to provide guidelines for contributors. In most cases, the different steps of the work process are described at the level of the given WikiProject, as they are rather specific to the chosen domain (heritage institutions). They can, however, easily be adapted to other domains later.

The various instructions are spread across the pages of the WikiProject Heritage institutions, with the section “Tools & Tasks” providing an overview of the various tasks and links to the associated instructions. Some task descriptions and guidelines are still missing, but the general breakdown of tasks should be clear, and there are plenty of tasks that potential contributors can start to engage in.

Guidelines for the Editing of Data in Spreadsheet Mode

One set of guidelines – the Guidelines for the Editing of Data in Spreadsheet Mode – was not directly integrated into the WikiProject Heritage institutions, as it is of a more general nature. It has been positioned at the level of the WikiProject Cultural heritage for now, but may eventually be moved to a more general section on the Wikidata platform.

Guidelines with regard to the ingestion of entire data sets

There are various existing guidelines and case reports pertaining to the ingestion of entire data sets, some focusing specifically on heritage data and some at a more general level. They have been referenced on the central page of the WikiProject Cultural heritage. However, no effort has yet been made to analyze the different guidelines and to synthesize them into converging best practices. This job still needs to be done, but it is probably useful to first gather feedback from users on the various guidelines.

Next Steps

The next steps consist of having different users put the various guidelines to the test and of gathering feedback from the community on the proposed structure and content of the cultural heritage-related WikiProjects. Throughout this process, insights about further coordination issues should be gathered and documented.

The present report will also serve as a basis for further discussions in view of the project in cooperation with ICOM. One possible next step in this context may consist in matching different tasks to various stakeholder groups and contributor profiles in order to develop communication material adapted to each of them.

Finally, the structure created for the ingestion of data from the cultural heritage sector may also be reproduced for other areas, e.g. government data or sports data.

Annex 1: Structure and Content of Cultural Heritage-related WikiProject Pages

Below is an overview of the structure and content of cultural heritage-related WikiProject Pages as of November/December 2016 that served as a basis for the proposal of a harmonized WikiProject structure for WikiProjects relating to the ingestion of external databases (see Annex 2).

Museums

  • Main page
  • Talk page:
    • How to deal with the subclasses of museums?
      • via the property “museum” and qualifier “main subject”?
      • by enumerating the different collections? (property “has part”)
    • Add Properties: opening hours, entrance fees to provide more useful information for museums & tourism sites as well as for Wikivoyage
      • “hours” not supported yet
      • How to model entrance fees?
    • Wikidata museum map: motivational tool; showing completeness of data
    • Where to add coordinates? - on building? or on institution? - institutions stretching across several buildings should have coordinates for every building
    • Employees of museums
    • Changes of museum names in history
    • Museum as organization vs. museum as location
    • How to proceed step-by-step when ingesting heritage data? need for coordination
  • Musée de France (instructions on how to ingest data about museums of the class “Musée de France”)
  • Talk page (Musée de France):
    • museums statistics as data source
    • base de données “Muséofile” vs. base de données “Musées de France” / “musée de France” vs. “Musée de France”

Books

  • Main page:
    • mission / scope;
    • Properties (for different classes; property templates; with explanations and reference to external conceptual model)
    • Mapping to infobox templates and to other ontologies (in Google Doc)
    • Participants
    • Related Links
    • Tools
    • Navigation boxes:
      • Authority control properties
      • Book properties
      • Bibliographical properties
  • Talk page (excerpt, 2016):
    • how to model “lost literary works”?
    • automatically import book references from ISBN; premature, still struggling to model the ontology (ISBNs are related to “editions” of books)
    • Guiding principles for ontology modelling (account for complexity; think in terms of queries; parsimony: create items only when we need them)
    • minimum set of statements aimed for in items of a given class
    • Think about the use case when modeling data
    • Big discussion about various ontology modeling issues
    • Translation issues regarding concepts, cf. classification of genre
    • Guidelines on how to handle pseudonyms
  • (not accessible via the menu:) Queries

Periodicals

  • Main page:
    • mission / scope
    • Properties (for different classes; property templates)
    • Participants
    • Navigation box:
      • Wikidata properties related to bibliographic metadata
    • Related links
    • Tools (link to separate page) + Search items without claims (?)
  • Talk page:
    • Ontology discussion
    • Abbreviations
    • Data imports
    • Examples of data aggregations and visualizations (lists & graphics based on queries)

Visual Arts

  • Home:
    • Goals
    • Participants
    • Related projects
    • Interesting resources
    • Boring tasks to be completed
  • Item structure
    • Properties (for different classes; property templates and explanations; custom-made tables to represent ontology/vocabularies)
  • Questions
    • (in place of a talk page)
  • Maintenance
    • Property cleanup (constraint violations)
  • Tools

Sum of all paintings

  • Main page
    • background / scope
    • Ways to participate by contributing metadata
    • Work to be done
    • Data structure
    • Ways to contribute
    • Participants
    • Use cases (examples how the data is being used)
    • To-do list
  • Talk page
    • data modelling issues
    • maintenance (missing inventory numbers - lists)

Heraldry

  • Main page
    • Mission/scope
    • Properties/qualifiers that need to be created
  • Talk page
    • Show case item
    • Modelling / description issues

Built heritage (see also Connected Open Heritage Project as well as WLM database project)

  • Main page
    • Remark: Scope and functioning of the project currently under debate
    • Goal
    • Ingesting the Monuments Database, todos
    • Participants
  • Talk page
    • international coordination
    • data modelling for the different countries / preparation in view of data ingestion
    • Template: Cultural heritage properties (list of identifiers) → scope? (“cultural heritage” or “built heritage”?)
    • Data modelling issues
    • Calls for help / contributions

See also: WikiProject WLM

  • Main page
    • Goal: Wikidatafy WLM
    • Link to country projects
  • Talk page
    • Country specificities

Lighthouses

  • Main page
    • scope
    • pointer to AutoEdit gadget
    • Properties (property templates and explanations)
    • Missing properties
    • Open modelling questions
    • Links to Wikipedia sources
    • List of Wikipedia categories
    • pointer to PetScan tool
    • Tasks
    • Mapping of infobox parameters
    • Participants
    • Links
  • Talk page
    • Modelling: separate lighthouses and islands? (problem: interwiki links don’t necessarily always link pages about exactly the same concept)

Film

  • Home
    • Goal / scope
    • Participants
    • Reference to bot
    • Links to other projects
    • Sample items
  • Properties / Structure:
    • Core properties (property templates)
    • Main properties (ditto…)
    • Other properties
    • Main qualifiers
    • Awards and ratings
    • External identifiers
    • Conventions regarding labels and descriptions
    • Category:Properties in a WikiProject
    • Talk page:
      • to do
      • property discussions
      • ...
  • New items:
    • Listeria list: 500 most recent items for films
  • Tools / Maintenance reports:
    • Toolbox
    • Useful queries (missing)
    • Wikipedia infobox mapping
    • Maintenance reports (constraint reports, top missing properties, …)
    • Tasks
  • New films
    • Listeria list:
  • Statistics (manually updated, Dec. 2015)
    • Key indicators
    • Number of films with a given property
    • Films by number of sitelinks
    • Films with max sitelinks by year
    • number of properties
    • various other lists
  • Reference lists / Infobox
    • Wikidata:WikiProject Movies/lists (subpage with various lists of movies)
    • Queries
    • Infoboxes making use of data from Wikidata
  • Main Talk page
    • Modelling issues
    • Bot imports

Music

  • Main page
    • aim, scope
    • properties (property templates)
    • Current tasks
    • Participants
    • Related links
    • Tools
    • Navigation boxes (authority control properties)
  • Talk page:
    • scope / create an item for each song from each album of notable musicians? -- notability
    • ontology / modelling issues
    • mapping Wikidata - schema.org
    • opera: modelling scenes and acts

Broadcasting

  • Main page
    • Goal / scope
    • Terminology
    • Todo

Video games

  • Main page
    • Properties
    • Reproduction of en:wp category listings
    • Links to WP embassies / related Wikiprojects on Wikipedia
    • Members
    • Template for own user page
  • Talk page
    • Modeling issues

Software

  • Main page:
    • Related WikiProjects
    • Links to infoboxes
    • Properties (property templates plus explanations)
    • Queries
    • Reference ontologies
    • Participants
  • Talk page:
    • modelling issues

Websites

  • Main page:
    • aim/scope
  • Participants (one)

Theatre

  • Home (Welcome; Participants)
  • Properties (Different classes; property templates)
  • Todo (data imports from performing arts schools; under the label “people and organizations”)
  • Queries (focus on: drama schools)
  • Talk page


Annex 2: Proposed Structure and Content of WikiProject Pages

Below is the proposed structure for WikiProject Pages, based both on an analysis of existing WikiProjects related to cultural heritage data and on the concrete case of ingesting data about heritage institutions.

Home (with talk page)

  • Background / project aim / project scope
  • Project history
  • Ways to contribute
  • Participants
  • Links to related Wikipedia WikiProjects
  • Related Links
  • Navigation boxes

Data Structure (with talk page)

  • List of properties (for different classes; sorted by relevance; based on property templates with explanations; custom-made tables to represent ontology/vocabularies)
  • Vocabularies
  • Reference ontologies
  • Mapping to reference ontologies
  • Mapping to infobox templates

Typology

  • Main typologies used in the area covered (e.g. heritage institutions)

This section may be included in the “Data Structure” section if it is not too large.

Data Sources

  • List of sources to import data from
  • Mapping from data sources to data structure
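
To illustrate what a “mapping from data sources to data structure” could look like in practice, here is a minimal Python sketch. It is not taken from the WikiProject pages: the column names, the sample row and the helper names (`COLUMN_TO_PROPERTY`, `map_row`, `map_source`) are hypothetical. Only the property IDs are real Wikidata properties (P31 “instance of”, P17 “country”, P856 “official website”). Values are kept as plain strings; reconciling them with Wikidata items would be a separate step.

```python
import csv
import io

# Hypothetical mapping from source column names to Wikidata property IDs.
COLUMN_TO_PROPERTY = {
    "type": "P31",      # instance of
    "country": "P17",   # country
    "website": "P856",  # official website
}


def map_row(row):
    """Return (property, value) pairs for the non-empty mapped columns of one row."""
    pairs = []
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = (row.get(column) or "").strip()
        if value:  # skip columns that are missing or empty in the source
            pairs.append((prop, value))
    return pairs


def map_source(csv_text):
    """Map every row of a CSV export to (name, [(property, value), ...])."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["name"].strip(), map_row(row)) for row in reader]


# Made-up sample data, for illustration only.
SAMPLE = "name,type,country,website\nExample Museum,museum,Switzerland,\n"

if __name__ == "__main__":
    for name, pairs in map_source(SAMPLE):
        print(name, pairs)
```

Keeping the mapping in a single table makes it easy to document on the project page and to review it before any data is ingested.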

Use Cases

  • Various Listeria Lists
  • List of infobox templates using data from Wikidata

Tools & Tasks

  • Tasks
  • Tools
  • Maintenance lists etc.