Wikidata:Mismatch Finder

Mismatch Finder
Tackling mismatching data between Wikidata and external databases.


Video giving an introduction to the Mismatch Finder data quality tool.

The Wikidata development team is working on a tool that checks for mismatching data between Wikidata and external databases.

What is the purpose of Mismatch Finder?

The tool highlights differences between the data in Wikidata and in other databases, in order to improve data quality across the open linked data ecosystem. The tool does not search databases on its own. A list of possible mismatches must first be uploaded, and Wikidata editors can then analyze and process them. By providing such a tool, we hope to help Wikidata editors spot and fix mistakes in Wikidata. We also hope to give organizations that reuse Wikidata’s data a convenient way to contribute back, by reporting lists of possible mismatches and suggesting new statements.
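The following minimal Python sketch illustrates what handling an uploaded list might involve. It parses a CSV of possible mismatches and rejects rows with malformed Item or property IDs. The column names (`item_id`, `property_id`, `wikidata_value`, `external_value`, `external_url`) and the function `parse_mismatch_list` are hypothetical. The actual upload format is defined by the Mismatch Finder itself and may differ.

```python
import csv
import io
import re
from dataclasses import dataclass, fields


# Hypothetical record layout for one possible mismatch; the real
# Mismatch Finder upload format may use different columns.
@dataclass(frozen=True)
class Mismatch:
    item_id: str         # Wikidata Item, e.g. "Q42"
    property_id: str     # Wikidata property, e.g. "P569"
    wikidata_value: str  # value currently stored on Wikidata
    external_value: str  # value found in the external database
    external_url: str    # where the external value can be checked


ITEM_ID = re.compile(r"Q[1-9][0-9]*")
PROPERTY_ID = re.compile(r"P[1-9][0-9]*")


def parse_mismatch_list(text: str) -> list[Mismatch]:
    """Parse a CSV list of possible mismatches, rejecting malformed IDs."""
    mismatches = []
    reader = csv.DictReader(io.StringIO(text))
    # Data rows start on line 2 (line 1 is the header), assuming no
    # quoted field spans several lines.
    for line_no, row in enumerate(reader, start=2):
        if not ITEM_ID.fullmatch(row["item_id"]):
            raise ValueError(f"line {line_no}: bad item ID {row['item_id']!r}")
        if not PROPERTY_ID.fullmatch(row["property_id"]):
            raise ValueError(f"line {line_no}: bad property ID {row['property_id']!r}")
        mismatches.append(Mismatch(**{f.name: row[f.name] for f in fields(Mismatch)}))
    return mismatches
```

Checking IDs at upload time keeps malformed rows from ever reaching the editors who review the list.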

Who are we doing this for?

This tool is primarily made for:

  • Editors who care about data quality and want another tool to help them work through potential issues in Wikidata's data
  • Data re-users and researchers who find issues in Wikidata's data and want a meaningful way to give back to Wikidata by providing lists of potential data quality issues

Big-picture solution

We will build a system with a store for mismatches at its core. Different people and organisations can load the mismatches they find into this system. Various tools can then retrieve mismatches from it to help editors resolve them.
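The store at the core of this design can be sketched as a small in-memory structure. Producers load batches of mismatches, and tools query them by Item. This Python illustration is hypothetical: `MismatchStore`, `add` and `for_items` are invented names, and the `source` field is an assumption. A real store would sit behind an API and a database rather than a dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mismatch:
    item_id: str         # Wikidata Item, e.g. "Q42"
    property_id: str     # Wikidata property, e.g. "P569"
    wikidata_value: str  # value currently stored on Wikidata
    external_value: str  # value reported by the external source
    source: str          # hypothetical: who loaded this mismatch


class MismatchStore:
    """Hypothetical in-memory store: producers add, tools query by Item."""

    def __init__(self) -> None:
        # Index mismatches by Item so tools can look them up per Item.
        self._by_item: dict[str, list[Mismatch]] = defaultdict(list)

    def add(self, mismatches: list[Mismatch]) -> int:
        """Load a batch of mismatches and return how many were stored."""
        for m in mismatches:
            self._by_item[m.item_id].append(m)
        return len(mismatches)

    def for_items(self, item_ids: list[str]) -> list[Mismatch]:
        """Return stored mismatches for the given Items, in the order requested."""
        # .get avoids creating empty entries for unknown Items.
        return [m for item_id in item_ids for m in self._by_item.get(item_id, [])]
```

Indexing by Item matches how the tools look mismatches up, for example when reviewing one Item page or a list of Items.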

Mismatches can come from many sources. We will start with mismatches that we found during previous work on finding references for unreferenced statements (also known as the Reference Treasure Hunt). In the future, Wikipedia categories that flag a mismatch between a Wikipedia's local value and the corresponding value on Wikidata could also serve as a source. Research organisations and large data re-users could also contribute mismatches they find in their internal processes while doing quality assurance on Wikidata’s data.

One of the tools that will process the mismatches is the mismatch website.

Simply put, the solution is to

  • Get mismatching data from the byproducts of the Reference Treasure Hunt.
  • Add the mismatches to a mismatch store via an API.
  • Allow users to look up mismatches for multiple Items through the mismatch website (hosted on toolforge.org).
  • Display all mismatches between Wikidata and the external databases in a table.
  • Provide users with links to the statement on Wikidata and to the corresponding record in the external database. There, users can take a closer look, determine the nature of the mismatch and (hopefully) resolve it.
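The last two steps can be illustrated with a hypothetical Python helper. It renders mismatches as a plain-text table and builds a link for each side. The helper assumes the Wikidata link points to the Item page, anchored at the property's statement group (for example `https://www.wikidata.org/wiki/Q42#P569`). That anchor lands near the statement rather than on one specific statement. The function names and row keys are illustrative only.

```python
WIKIDATA_WIKI = "https://www.wikidata.org/wiki/"


def statement_link(item_id: str, property_id: str) -> str:
    """Link to an Item page, anchored at the property's statement group."""
    return f"{WIKIDATA_WIKI}{item_id}#{property_id}"


def render_table(rows: list[dict[str, str]]) -> str:
    """Render mismatches as a plain-text table with a link for each side."""
    header = ["Item", "Property", "Wikidata value", "External value",
              "Wikidata link", "External link"]
    lines = [" | ".join(header)]
    for r in rows:
        lines.append(" | ".join([
            r["item_id"], r["property_id"],
            r["wikidata_value"], r["external_value"],
            statement_link(r["item_id"], r["property_id"]),  # Wikidata side
            r["external_url"],                                # external side
        ]))
    return "\n".join(lines)
```

On the real website the table would be HTML with clickable links. The plain-text form here only shows which information each row carries.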

Editors who want to help review mismatches may install a gadget that shows mismatches directly on an Item page.