Wikidata:Pywikibot - Python 3 Tutorial/Changing Items


In this chapter we will look at how to fix statements that link to the wrong item. That means finding statements in which a property is used with a mistakenly chosen item and correcting the link so that it points to the right item.

This is a really common type of mistake on Wikidata, because people choose or accidentally click on the wrong item. The mistakenly used item usually has a similar label. An example is a statement that uses color (P462) to link to orange (Q13191) (the fruit) instead of orange (Q39338) (the color). The two words are similar in many languages.

Danger Zone

This example edits Wikidata proper (not test.wikidata). Running it will most likely not cause many edits, but it is your responsibility to check those edits. You don't need a bot flag, because the total number of edits will probably be fewer than 5 and waiting about 8 seconds between edits won't matter. Be careful when customizing the example. It is probably a good idea to make test edits on test.wikidata first.

Example

Staying with this example of non-colour items used as colours, we will first investigate the mistakenly used item on the following page:

At the time of writing the following list of mistakenly used items was found, and placed together with the correct items in a dictionary:

error_dict = {"Q13191": "Q39338",      #orange - "fruit": "color"
              "Q897": "Q208045",       #gold - "element": "color"
              "Q753": "Q2722041",      #copper - "element": "color"
              "Q25381": "Q679355",     #amber - "material": "color"
              "Q134862": "Q5069879",   #champagne - "drink": "color"
              "Q1090": "Q317802",      #silver - "element": "color"
              "Q1173": "Q797446",      #burgundy - "region": "color"
              "Q13411121": "Q5148721", #peach - "fruit": "color"
              }

If you speak a European language, the errors will be immediately apparent to you. But chances are high that some mistakes can only be understood in the context of another language (with 300+ languages and dialects, the chances of similar words are very high).
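Because every entry in this dictionary later turns into real edits, it can help to sanity-check it before running anything. The following is a minimal sketch, not part of the original tutorial: the helper name check_error_dict and the rules it applies (well-formed item IDs, no item mapped to itself, no correct item that is also listed as a wrong one) are assumptions chosen for illustration.

```python
import re

error_dict = {"Q13191": "Q39338",      #orange - "fruit": "color"
              "Q897": "Q208045",       #gold - "element": "color"
              "Q753": "Q2722041",      #copper - "element": "color"
              "Q25381": "Q679355",     #amber - "material": "color"
              "Q134862": "Q5069879",   #champagne - "drink": "color"
              "Q1090": "Q317802",      #silver - "element": "color"
              "Q1173": "Q797446",      #burgundy - "region": "color"
              "Q13411121": "Q5148721", #peach - "fruit": "color"
              }

# Item IDs are the letter Q followed by a positive integer.
QID = re.compile(r"^Q[1-9]\d*$")

def check_error_dict(mapping):
    """Hypothetical helper: return a list of problems found in the mapping."""
    problems = []
    for wrong, right in mapping.items():
        for qid in (wrong, right):
            if not QID.match(qid):
                problems.append("{} is not a valid item ID".format(qid))
        if wrong == right:
            problems.append("{} maps to itself".format(wrong))
        if right in mapping:
            # A "correct" item that is also a key would be changed again later.
            problems.append("{} is both a correction and an error".format(right))
    return problems

problems = check_error_dict(error_dict)
print(problems or "error_dict looks consistent")
```

An empty list means the dictionary passed these checks; it does not prove that the chosen items are the right ones, which still needs a human look.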

Now let us write the code that will find the items with the wrong statements. We will use a query so we only iterate over those items that really contain errors:

import pywikibot
from pywikibot import pagegenerators as pg

# Connect to Wikidata proper and get its data repository for later use.
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
property = "P462"  # color
# Maps each mistakenly used item to the item that should be used instead.
error_dict = {"Q13191": "Q39338",      #orange - "fruit": "color"
              "Q897": "Q208045",       #gold - "element": "color"
              "Q753": "Q2722041",      #copper - "element": "color"
              "Q25381": "Q679355",     #amber - "material": "color"
              "Q134862": "Q5069879",   #champagne - "drink": "color"
              "Q1090": "Q317802",      #silver - "element": "color"
              "Q1173": "Q797446",      #burgundy - "region": "color"
              "Q13411121": "Q5148721", #peach - "fruit": "color"
              }

for key in error_dict:
    # Doubled braces ({{ and }}) become literal braces after .format().
    wdq = 'SELECT DISTINCT ?item WHERE {{ ?item p:{0} ?statement0. ?statement0 (ps:{0}) wd:{1}. }} LIMIT 5'.format(property, key)
    # Generator over the items that match the query.
    generator = pg.WikidataSPARQLPageGenerator(wdq, site=site)

The example does not edit anything yet. We load the site of Wikidata proper and its data_repository() for later use. We define the property we are running corrections for and the dictionary that maps each wrong item to the correct one. We then iterate over the dictionary: for each key we build a string containing the query. For the first key, the pattern inside the query reads ?item p:P462 ?statement0. ?statement0 (ps:P462) wd:Q13191. and matches all items that contain the statement color (P462) = orange (Q13191). This string is passed to WikidataSPARQLPageGenerator, which gives us a generator over the matching items. The LIMIT 5 at the end of the query caps each result at five items; it is not really required, because each query should return fewer than 5 items.
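To see what the query string described above looks like once the placeholders are filled in, it can be built and printed without contacting Wikidata at all. This is a small sketch; the names QUERY_TEMPLATE and build_query are hypothetical and only wrap the same .format() call used in the example.

```python
# Same query text as in the example, split over several lines for readability.
# Doubled braces are escapes: .format() turns "{{" into "{" and "}}" into "}".
QUERY_TEMPLATE = (
    "SELECT DISTINCT ?item WHERE {{ "
    "?item p:{0} ?statement0. "
    "?statement0 (ps:{0}) wd:{1}. "
    "}} LIMIT 5"
)

def build_query(prop, wrong_item):
    """Hypothetical helper: fill in the property and the wrongly used item."""
    return QUERY_TEMPLATE.format(prop, wrong_item)

print(build_query("P462", "Q13191"))
# SELECT DISTINCT ?item WHERE { ?item p:P462 ?statement0. ?statement0 (ps:P462) wd:Q13191. } LIMIT 5
```

Printing the query first also makes it easy to paste it into the query service and inspect the results by hand before any edit is made.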

All we still have to do is add a function that replaces the false claim with the correct one. The full example can look something like this:

import pywikibot
from pywikibot import pagegenerators as pg

site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
property = "P462"  # color
# Maps each mistakenly used item to the item that should be used instead.
error_dict = {"Q13191": "Q39338",      #orange - "fruit": "color"
              "Q897": "Q208045",       #gold - "element": "color"
              "Q753": "Q2722041",      #copper - "element": "color"
              "Q25381": "Q679355",     #amber - "material": "color"
              "Q134862": "Q5069879",   #champagne - "drink": "color"
              "Q1090": "Q317802",      #silver - "element": "color"
              "Q1173": "Q797446",      #burgundy - "region": "color"
              "Q13411121": "Q5148721", #peach - "fruit": "color"
              }

def correct_claim(generator, key):
    """Point every claim of `property` that targets `key` at the correct item."""
    for page in generator:
        item_dict = page.get()
        # Query results can lag behind the live data, so the property
        # may already be gone from the item; fall back to an empty list.
        claim_list = item_dict["claims"].get(property, [])
        for claim in claim_list:
            trgt = claim.getTarget()
            # "unknown value" and "no value" claims have no target item.
            if trgt is not None and trgt.id == key:
                print("Correcting {} to {}".format(key, error_dict[key]))
                correct_page = pywikibot.ItemPage(repo, error_dict[key], 0)
                # This saves the change to Wikidata.
                claim.changeTarget(correct_page)

for key in error_dict:
    wdq = 'SELECT DISTINCT ?item WHERE {{ ?item p:{0} ?statement0. ?statement0 (ps:{0}) wd:{1}. }} LIMIT 5'.format(property, key)
    generator = pg.WikidataSPARQLPageGenerator(wdq, site=site)
    correct_claim(generator, key)

As you can see, we load each page from the generator and then its item dictionary. We unpack the claims and then the P462 claims. In that list of claims we look for the claim that links to the wrong item and replace (.changeTarget()) the wrong item with the correct one.
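The matching step described above can be tried out without touching Wikidata by running it against stand-in objects. The sketch below is an illustration only: FakeItem and FakeClaim are hypothetical classes that imitate just the getTarget(), changeTarget() and id parts of the Pywikibot objects, and correct_claims is a hypothetical helper, not a Pywikibot function.

```python
class FakeItem:
    """Stand-in for an item page; it only carries an item ID."""
    def __init__(self, item_id):
        self.id = item_id

class FakeClaim:
    """Stand-in for a claim; it stores the target instead of saving an edit."""
    def __init__(self, target):
        self.target = target

    def getTarget(self):
        return self.target

    def changeTarget(self, new_target):
        self.target = new_target

def correct_claims(claim_list, wrong_id, right_id):
    """Retarget claims that point at wrong_id; return how many were changed."""
    changed = 0
    for claim in claim_list:
        target = claim.getTarget()
        # Skip claims without a target item and claims that are already fine.
        if target is not None and target.id == wrong_id:
            claim.changeTarget(FakeItem(right_id))
            changed += 1
    return changed

claims = [
    FakeClaim(FakeItem("Q13191")),  # wrong: orange (the fruit)
    FakeClaim(FakeItem("Q3142")),   # some other item ID, left untouched
    FakeClaim(None),                # a claim without a target item
]
print(correct_claims(claims, "Q13191", "Q39338"))  # 1
print([c.getTarget().id if c.getTarget() else None for c in claims])
# ['Q39338', 'Q3142', None]
```

Only the first claim is changed; the other two are left alone, which is the behaviour we want from the real script as well.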

This is all we need to do to automate a relatively boring Wikidata maintenance task. Be sure to look at other properties where similar mistakes occur and customize the script to fix those problems.