Wikidata:Pywikibot - Python 3 Tutorial/Changing Items
In this chapter we will look at how to fix statements that link to the wrong item. That means finding statements in which a property points to a mistakenly used item, and correcting the link so that it points to the right item.
This is a very common type of mistake on Wikidata, because people choose, or accidentally click on, the wrong item. The mistakenly used item usually has a similar label. Examples are statements that use color (P462) to link to orange (Q13191) (the fruit) instead of orange (Q39338) (the colour). Both words are similar in many languages.
Danger Zone
This example edits Wikidata proper (not test.wikidata). Running it will most likely not cause many edits, but it is your responsibility to check on those edits. You don't need a bot flag, because the total number of edits will probably be fewer than 5, and waiting about 8 seconds between edits won't matter. Be careful when customizing the example. It is probably a good idea to make test edits on test.wikidata first.
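The estimate of fewer than 5 edits is an assumption about the current state of the data. If you would rather have the script enforce that limit than trust it, a small guard can stop the run once a budget is used up. `EditBudget` is a hypothetical helper written for this sketch, not a pywikibot class; the limit of 5 and the 8-second delay simply mirror the numbers in the paragraph above. pywikibot already waits between writes on its own (the `put_throttle` setting in user-config.py), so the delay here is only illustrative.

```python
import time


class EditBudget:
    """Hypothetical guard (not part of pywikibot): allow at most max_edits
    edits and wait `delay` seconds between consecutive edits."""

    def __init__(self, max_edits=5, delay=8.0, sleep=time.sleep):
        self.max_edits = max_edits
        self.delay = delay
        self.sleep = sleep  # injectable, so the guard can be tested without waiting
        self.used = 0

    def spend(self):
        """Call once right before each edit; raises when the budget is exhausted."""
        if self.used >= self.max_edits:
            raise RuntimeError("edit budget of {} exhausted".format(self.max_edits))
        if self.used:  # no pause before the very first edit
            self.sleep(self.delay)
        self.used += 1


# Usage sketch: call budget.spend() immediately before each claim.changeTarget(...)
pauses = []
budget = EditBudget(max_edits=5, sleep=pauses.append)
for _ in range(5):
    budget.spend()
print(pauses)  # [8.0, 8.0, 8.0, 8.0]
```

A sixth call to `budget.spend()` raises `RuntimeError`, which halts the script before it can make more edits than you planned to review.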
Example
Staying with this example of non-colour items used as colours, we will first investigate the mistakenly used item on the following page:
At the time of writing, the following mistakenly used items were found and placed, together with the correct items, in a dictionary:
error_dict = {"Q13191": "Q39338",      # orange - "fruit": "color"
              "Q897": "Q208045",       # gold - "element": "color"
              "Q753": "Q2722041",      # copper - "element": "color"
              "Q25381": "Q679355",     # amber - "material": "color"
              "Q134862": "Q5069879",   # champagne - "drink": "color"
              "Q1090": "Q317802",      # silver - "element": "color"
              "Q1173": "Q797446",      # burgundy - "region": "color"
              "Q13411121": "Q5148721", # peach - "fruit": "color"
              }
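Before running anything against Wikidata, it can help to sanity-check the mapping itself. The helper below is not part of the original tutorial: `check_error_dict` and the rules it enforces (item IDs of the form "Q" followed by digits, no item mapped to itself, and no correct item that is itself listed as wrong, which would make a later pass "correct" the correction) are assumptions about what a useful check looks like.

```python
import re

# Mapping from the tutorial: mistakenly used item -> correct colour item.
error_dict = {"Q13191": "Q39338",      # orange - "fruit": "color"
              "Q897": "Q208045",       # gold - "element": "color"
              "Q753": "Q2722041",      # copper - "element": "color"
              "Q25381": "Q679355",     # amber - "material": "color"
              "Q134862": "Q5069879",   # champagne - "drink": "color"
              "Q1090": "Q317802",      # silver - "element": "color"
              "Q1173": "Q797446",      # burgundy - "region": "color"
              "Q13411121": "Q5148721", # peach - "fruit": "color"
              }

# Assumed shape of a Wikidata item ID: "Q" followed by a positive integer.
QID = re.compile(r"Q[1-9][0-9]*")


def check_error_dict(mapping):
    """Return a list of human-readable problems found in a wrong -> correct mapping."""
    problems = []
    for wrong, right in mapping.items():
        for qid in (wrong, right):
            if not QID.fullmatch(qid):
                problems.append("malformed item ID: {}".format(qid))
        if wrong == right:
            problems.append("{} maps to itself".format(wrong))
        if right in mapping:
            problems.append("{} -> {}: target is itself listed as wrong".format(wrong, right))
    return problems


print(check_error_dict(error_dict))  # []
```

An empty list means none of these particular problems were found; it says nothing about whether each pair is semantically right, which still needs a human check.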
If you speak a European language, the errors will be immediately apparent to you. But chances are high that some mistakes only make sense in the context of another language (with 300+ languages and dialects, the chance of similar words is very high).
Now let us write the code that finds the items with the wrong statements. We will use a SPARQL query, so that we only iterate over items that really contain errors:
import pywikibot
from pywikibot import pagegenerators as pg
# Connect to Wikidata proper and its data repository
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
property = "P462"  # color (P462)
# Mistakenly used item -> correct colour item
error_dict = {"Q13191": "Q39338",      # orange - "fruit": "color"
              "Q897": "Q208045",       # gold - "element": "color"
              "Q753": "Q2722041",      # copper - "element": "color"
              "Q25381": "Q679355",     # amber - "material": "color"
              "Q134862": "Q5069879",   # champagne - "drink": "color"
              "Q1090": "Q317802",      # silver - "element": "color"
              "Q1173": "Q797446",      # burgundy - "region": "color"
              "Q13411121": "Q5148721", # peach - "fruit": "color"
              }
for key in error_dict:
    # Find up to 5 items with a P462 statement (of any rank) pointing to the wrong item
    wdq = 'SELECT DISTINCT ?item WHERE {{ ?item p:{0} ?statement0. ?statement0 (ps:{0}) wd:{1}. }} LIMIT 5'.format(property, key)
    generator = pg.WikidataSPARQLPageGenerator(wdq, site=site)
The example doesn't change anything yet. All we are doing is loading the site of Wikidata proper and its data_repository() for later use. We define the property we are running corrections for and the dictionary containing the wrong and correct items. We then iterate over the dictionary: for each key we build a string containing the query. The first query looks like this:
SELECT DISTINCT ?item WHERE { ?item p:P462 ?statement0. ?statement0 (ps:P462) wd:Q13191. } LIMIT 5
It returns the items that contain the statement color (P462) = orange (Q13191). This string is used to load a generator. The LIMIT 5 clause caps each query at five items, which is not really a restriction, because each query returns far fewer than 5 items.
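The query string built inside the loop can also be produced by a small function, which makes it easy to print and inspect before anything runs. `build_query` and `QUERY_TEMPLATE` are hypothetical names; the template itself is the one used in the tutorial code, with the limit made configurable. A design note: the simpler pattern `?item wdt:P462 wd:Q13191` would only match truthy (best-rank) statements, whereas the `p:`/`ps:` pair used here matches statements of every rank, including deprecated ones.

```python
# Hypothetical helper; the template matches the tutorial's query string.
# Doubled braces {{ }} come out of str.format as literal { }.
QUERY_TEMPLATE = (
    "SELECT DISTINCT ?item WHERE {{ "
    "?item p:{0} ?statement0. "
    "?statement0 (ps:{0}) wd:{1}. "
    "}} LIMIT {2}"
)


def build_query(prop, wrong_item, limit=5):
    """Return the SPARQL query for items whose `prop` statement targets `wrong_item`."""
    return QUERY_TEMPLATE.format(prop, wrong_item, limit)


print(build_query("P462", "Q13191"))
# SELECT DISTINCT ?item WHERE { ?item p:P462 ?statement0. ?statement0 (ps:P462) wd:Q13191. } LIMIT 5
```

Pasting the printed query into the Wikidata Query Service is a quick way to see which items the script would touch.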
All we still have to do is add a function that replaces the false claim with the correct one. The full example can look something like this:
import pywikibot
from pywikibot import pagegenerators as pg
# Connect to Wikidata proper and its data repository
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()
property = "P462"  # color (P462)
# Mistakenly used item -> correct colour item
error_dict = {"Q13191": "Q39338",      # orange - "fruit": "color"
              "Q897": "Q208045",       # gold - "element": "color"
              "Q753": "Q2722041",      # copper - "element": "color"
              "Q25381": "Q679355",     # amber - "material": "color"
              "Q134862": "Q5069879",   # champagne - "drink": "color"
              "Q1090": "Q317802",      # silver - "element": "color"
              "Q1173": "Q797446",      # burgundy - "region": "color"
              "Q13411121": "Q5148721", # peach - "fruit": "color"
              }
def correct_claim(generator, key):
    """Point every P462 claim that targets `key` to error_dict[key] instead."""
    for page in generator:
        item_dict = page.get()
        claim_list = item_dict["claims"][property]
        for claim in claim_list:
            trgt = claim.getTarget()
            # "Unknown value" / "no value" claims have no target item
            if trgt is not None and trgt.id == key:
                print("Correcting {} to {} on {}".format(key, error_dict[key], page.getID()))
                correct_page = pywikibot.ItemPage(repo, error_dict[key])
                claim.changeTarget(correct_page)
for key in error_dict:
    # Find up to 5 items with a P462 statement (of any rank) pointing to the wrong item
    wdq = 'SELECT DISTINCT ?item WHERE {{ ?item p:{0} ?statement0. ?statement0 (ps:{0}) wd:{1}. }} LIMIT 5'.format(property, key)
    generator = pg.WikidataSPARQLPageGenerator(wdq, site=site)
    correct_claim(generator, key)
As you can see, we load each page from the generator and then its item dictionary. We then unpack the claims and, from those, the P462 claims. In that list of claims we look for the claim that links to the wrong item, and replace the wrong item with the correct one using changeTarget().
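To check the matching logic offline, without touching Wikidata, the loop can be modelled with stand-in objects. `FakeItem`, `FakeClaim`, `plan_corrections` and `apply_plan` are hypothetical names invented for this sketch; they only mimic the `.id` attribute and the `getTarget()`/`changeTarget()` calls used above. Unlike `correct_claim`, which handles one wrong item per call, this version checks each claim against the whole dictionary, and it separates planning from editing so the planned changes can be reviewed before anything is written.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeItem:
    """Hypothetical stand-in for pywikibot.ItemPage (only the .id attribute)."""
    id: str


@dataclass
class FakeClaim:
    """Hypothetical stand-in for pywikibot.Claim."""
    target: Optional[FakeItem]

    def getTarget(self) -> Optional[FakeItem]:
        return self.target

    def changeTarget(self, value: FakeItem) -> None:
        self.target = value


def plan_corrections(claims: List[FakeClaim],
                     error_dict: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """List (claim index, wrong item, correct item) without editing anything."""
    plan = []
    for i, claim in enumerate(claims):
        target = claim.getTarget()
        # Claims with "unknown value" or "no value" have no target item.
        if target is not None and target.id in error_dict:
            plan.append((i, target.id, error_dict[target.id]))
    return plan


def apply_plan(claims: List[FakeClaim], plan: List[Tuple[int, str, str]]) -> None:
    """Carry out a reviewed plan by changing each listed claim's target."""
    for i, _wrong, correct in plan:
        claims[i].changeTarget(FakeItem(correct))


claims = [FakeClaim(FakeItem("Q13191")),  # orange (fruit): wrong
          FakeClaim(FakeItem("Q39338")),  # orange (colour): already right
          FakeClaim(None)]                # "no value" claim
error_dict = {"Q13191": "Q39338", "Q897": "Q208045"}
plan = plan_corrections(claims, error_dict)
print(plan)  # [(0, 'Q13191', 'Q39338')]
apply_plan(claims, plan)
print([c.target.id if c.target else None for c in claims])  # ['Q39338', 'Q39338', None]
```

Running `plan_corrections` a second time after `apply_plan` returns an empty list, which is a cheap way to confirm that the corrections are idempotent.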
That is all we need to do to automate a relatively boring Wikidata maintenance task. Be sure to look at other properties where similar mistakes occur, and customize the script to fix those problems.
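One way to extend the script to other properties is to nest the dictionaries by property. This is a sketch under assumptions: the `corrections` layout and `iter_jobs` are hypothetical, and only the P462 entries come from this tutorial. Because the tutorial's `correct_claim` reads the global `property`, a multi-property version would also need to take the property as a parameter.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical layout: property ID -> {mistakenly used item: correct item}.
# Only color (P462) comes from this tutorial; add further properties yourself.
corrections: Dict[str, Dict[str, str]] = {
    "P462": {"Q13191": "Q39338",  # orange - "fruit": "color"
             "Q897": "Q208045"},  # gold - "element": "color"
}


def iter_jobs(corrections: Dict[str, Dict[str, str]]) -> Iterator[Tuple[str, str, str]]:
    """Yield (property, wrong item, correct item) for every configured fix."""
    for prop, mapping in corrections.items():
        for wrong, right in mapping.items():
            yield prop, wrong, right


for prop, wrong, right in iter_jobs(corrections):
    # In the real script, build the SPARQL query for (prop, wrong) here and
    # pass prop explicitly to the correction function.
    print(prop, wrong, "->", right)
```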