Wikidata:Mismatch Finder

Mismatch Finder
Tackling mismatching data between Wikidata and external databases.

Video giving an introduction to the Mismatch Finder data quality tool.

Mismatch Finder is a tool, developed by Wikimedia Deutschland, that helps editors work on mismatches between Wikidata's data and other databases or websites. The tool stores mismatching data between Wikidata and external databases, then presents it to editors to review and fix. It can also be used to suggest new statements that are missing from Wikidata but should be reviewed by a human before they are added.

What is the purpose of Mismatch Finder?

The tool highlights differences between the data in Wikidata and in other databases in order to improve data quality in the open linked data ecosystem. The tool itself doesn’t browse databases automatically: a list of possible mismatches must be uploaded first so that Wikidata editors can analyze and process them. By providing such a tool, we hope to support Wikidata editors in spotting and fixing mistakes in Wikidata, and to give organizations that reuse Wikidata’s data a convenient way to contribute back by reporting lists of possible mismatches and suggestions for new statements.

Who should use this tool?

This tool is primarily made for:

  • Editors who care about data quality and want another tool to help them work through potential issues in Wikidata's data
  • Data re-users and researchers who find issues in Wikidata's data and want a meaningful way to give back to Wikidata by providing lists of potential data quality issues

How does this tool work?

At its core, this system has a store for mismatches. Different people and organizations can load mismatches they find into this system. Various tools can then get mismatches from the system to help editors resolve them.

Mismatches can come from many sources. These may include categories on Wikipedia that indicate a mismatch between the local value on that Wikipedia and the corresponding value on Wikidata. Research organizations and large data re-users can also contribute mismatches they find in their internal processes when doing quality assurance on Wikidata’s data. Read more about how you can provide mismatches.

One of the tools that will process the mismatches is the Mismatch website.

Simply put, the system's workflow is:

  • Get mismatching data.
  • Add the mismatches to a mismatches store via an API.
  • Allow the user to look for mismatches on multiple Items through the mismatches website.
  • Display all the mismatches between Wikidata and the external databases in a table.
  • Provide the user with links to the statement on Wikidata and to the other database. There the user can take a closer look, find out the nature of the mismatch and (hopefully) resolve it.
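
The first two steps of this workflow (getting mismatching data and preparing it for the store) can be sketched in Python. This is a minimal illustration only, not the tool's actual import format or API: the `Mismatch` fields, the `find_mismatches` and `to_csv` helpers, the sample values and the example.org URL are all assumptions made for this example. Consult the Mismatch Finder documentation for the real upload format.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class Mismatch:
    """One possible mismatch. Field names are illustrative assumptions."""
    item_id: str          # Wikidata Item, e.g. "Q42"
    property_id: str      # Wikidata Property, e.g. "P569"
    wikidata_value: str   # value currently stated on Wikidata
    external_value: str   # value found in the external database
    external_url: str     # where a reviewer can check the external value


def find_mismatches(wikidata_rows, external_values, url_template):
    """Compare Wikidata values with an external source.

    wikidata_rows: iterable of (item_id, property_id, value) tuples.
    external_values: dict keyed by (item_id, property_id).
    url_template: format string with an {item_id} placeholder.
    """
    mismatches = []
    for item_id, property_id, wd_value in wikidata_rows:
        ext_value = external_values.get((item_id, property_id))
        if ext_value is None:
            continue  # the external source has nothing to compare against
        if ext_value != wd_value:
            mismatches.append(
                Mismatch(
                    item_id=item_id,
                    property_id=property_id,
                    wikidata_value=wd_value,
                    external_value=ext_value,
                    external_url=url_template.format(item_id=item_id),
                )
            )
    return mismatches


def to_csv(mismatches):
    """Serialize mismatches to CSV text, ready to hand to an upload step."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[f.name for f in fields(Mismatch)],
        lineterminator="\n",
    )
    writer.writeheader()
    for mismatch in mismatches:
        writer.writerow(asdict(mismatch))
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample data: only the first row disagrees with the external source.
    wikidata_rows = [
        ("Q42", "P569", "1952-03-11"),
        ("Q42", "P570", "2001-05-11"),
    ]
    external_values = {
        ("Q42", "P569"): "1952-03-12",
        ("Q42", "P570"): "2001-05-11",
    }
    found = find_mismatches(
        wikidata_rows, external_values, "https://example.org/record/{item_id}"
    )
    print(to_csv(found))
```

Note that the sketch only collects candidate mismatches; it does not decide which side is wrong. That matches the workflow above, where a human editor follows the links and resolves each case.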

Editors who want to help review mismatches may install a gadget that shows mismatches directly on an Item page.