Wikidata:Requests for permissions/Bot/Identifier sync bot
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved --Ymblanter (talk) 20:04, 20 December 2022 (UTC)
Identifier sync bot
Operator: Harej
Task/s: Add database identifiers to items, mainly journal article items.
Code: GitHub
Function details: Identifier sync bot adds identifiers to Wikidata by matching bibliographic records across databases. The bot takes the identifiers already on an item and compares them against records in other databases to find further identifiers. The match is typically made through a common cross-database identifier such as DOI, but in the case of Fatcat (Q59296908), many records are already mapped directly to Wikidata. Each added statement comes with a citation describing where the bot found the additional identifier and how it concluded that the edit should be made. You can see this in practice with this citationgraph bot edit. My first two sources for this bot are Fatcat and OpenAlex (Q107507571). --Harej (talk) 02:57, 22 November 2022 (UTC)
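The matching step described above could be sketched roughly as follows. This is an illustration only, not the bot's actual code (which is linked under "Code"): the function `propose_statements`, the `ProposedStatement` record, the item ID `Q_EXAMPLE`, and all identifier values are hypothetical. The property IDs shown are the Wikidata properties for DOI, PubMed ID and PMCID; the set of identifiers the bot really handles may differ.

```python
from dataclasses import dataclass

# Illustrative subset of Wikidata identifier properties (DOI, PubMed ID, PMCID).
PROPERTY_FOR = {"doi": "P356", "pmid": "P698", "pmcid": "P932"}


@dataclass(frozen=True)
class ProposedStatement:
    item_id: str      # Wikidata item the statement would be added to
    property_id: str  # identifier property to add
    value: str        # identifier value found in the external record
    source: str       # database where the identifier was found
    matched_on: str   # identifier that justified matching the two records


def propose_statements(item_id, item_ids, record, source, key="doi"):
    """Return statements for identifiers present in `record` but missing on the item.

    `item_ids` and `record` are dicts of identifier name -> value.
    Nothing is proposed unless both sides agree on the shared `key`.
    """
    if key not in item_ids or key not in record:
        return []
    # DOIs are case-insensitive, so compare them case-insensitively.
    if item_ids[key].upper() != record[key].upper():
        return []
    matched_on = f"{key}={item_ids[key]}"
    return [
        ProposedStatement(item_id, PROPERTY_FOR[name], value, source, matched_on)
        for name, value in record.items()
        if name in PROPERTY_FOR and name not in item_ids
    ]


# Made-up example data.
item = {"doi": "10.1000/EXAMPLE.123"}
record = {"doi": "10.1000/example.123", "pmid": "12345678", "pmcid": "7654321"}
proposals = propose_statements("Q_EXAMPLE", item, record, "OpenAlex")
for p in proposals:
    print(p)
```

Each proposal carries its `source` and `matched_on` fields so that the resulting edit can cite where the identifier came from and why the two records were treated as the same work.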
- Comment: I read the code and it looks good to me. This will improve the scientific article items already in Wikidata but will not add new ones. It will result in millions of new statements that link the graph together better. Given the still unsolved backend scaling issues in Wikidata, I suggest we allow this bot only with a throttle on the number of edits per hour, so that growth is slower than it would otherwise be and the WMF gets a little more time to fix the problems.
- I suggest 100 edits/hour, which translates to 2400 edits a day. Every edit can add a mean of 40 cites work statements, each consisting of a few triples. This means up to 2400*40*3 = 288,000 new triples/day. Could you make some test edits we can look at? (Full disclosure: We are colleagues at Turn All References Blue (Q115136754)) --So9q (talk) 06:02, 22 November 2022 (UTC)
- This bot isn't adding "cites work" statements. Really, the bot won't be adding that much to items, and it should be able to operate within the rules all other bots follow. Harej (talk) 15:53, 22 November 2022 (UTC)
- I have made these three test edits: [1] [2] [3]. Harej (talk) 19:11, 22 November 2022 (UTC)
- This seems fine. Are you taking into account that DOIs in Wikidata are upper-cased (and that DOI and possibly some other relevant identifiers are case-insensitive)? ArthurPSmith (talk) 20:22, 22 November 2022 (UTC)
- ArthurPSmith, I have accounted for DOIs being uppercase on Wikidata, and in general I have checked each type of identifier in the source against its equivalent Wikidata property and applied string transformations where appropriate. Harej (talk) 20:37, 22 November 2022 (UTC)
- I am going to approve the bot in a couple of days provided no objections have been made. --Ymblanter (talk) 20:09, 18 December 2022 (UTC)
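Two technical points from the discussion above, DOI case handling and the proposed edit throttle, can be illustrated with a short sketch. It makes several assumptions: `normalize_doi` and `seconds_between_edits` are hypothetical helpers and not taken from the bot's code, the list of URL prefixes to strip is a guess at what external sources might return, and the throttle was only a suggestion in the thread, not a condition of the approval.

```python
def normalize_doi(raw: str) -> str:
    """Return a DOI in the upper-case form used on Wikidata.

    Assumes a source may deliver the DOI with a resolver URL or "doi:" prefix.
    """
    doi = raw.strip()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # DOIs are case-insensitive; Wikidata stores them upper-cased.
    return doi.upper()


def seconds_between_edits(edits_per_hour: int) -> float:
    """Minimum pause between edits needed to stay under an hourly edit cap."""
    return 3600 / edits_per_hour


print(normalize_doi("https://doi.org/10.1000/abc.Def"))
print(seconds_between_edits(100))  # pause implied by the suggested 100 edits/hour
print(100 * 24)                    # edits per day at that rate
```

A bot honouring the suggested cap would sleep for at least `seconds_between_edits(100)` seconds between saves; comparing identifiers only after `normalize_doi` avoids treating two spellings of the same DOI as different works.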