Wikidata:Requests for permissions/Bot/ADSBot English Paper
From Wikidata
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 21:17, 22 July 2022 (UTC)
ADSEnglishBot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: Feliciss (talk • contribs • logs)
Task/s: Importing scholarly articles from the ADS database to Wikidata by creating a Wikidata item for each scholarly article (and optionally author items) and adding statements and statement-related properties to the item. Part of Outreachy Round 24.
Code: import_papers_from_ads.py
Function details: --Feliciss (talk) 11:16, 15 July 2022 (UTC)
- Query Wikidata for all English surnames. There are about 7,000 English surnames on Wikidata.
- Use each surname as the first_author search key to find papers in the ADS database whose first author has that surname.
- Extract the DOI from each paper and try to find a matching item on Wikidata.
- If a DOI exists in ADS:
  - If there is no article with that DOI on Wikidata:
    - Create an item for the scholarly article (and optionally author items) on Wikidata and add statements and statement-related properties to it.
  - If there is already an article with that DOI on Wikidata:
    - ADSBot English Statement will handle this case.
- If no DOI is available in ADS:
  - Extract the ADS bibcode from the paper and try to find a matching item on Wikidata.
    - If there is already an article with that ADS bibcode on Wikidata:
      - ADSBot English Statement will handle this case.
    - If there is no article with that ADS bibcode on Wikidata:
      - Create an item for the scholarly article (and optionally author items) on Wikidata and add statements and statement-related properties to it.
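The branching in the function details can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the code in import_papers_from_ads.py: the AdsPaper record, the search_first_author callable and the two lookup callables are hypothetical stand-ins for an ADS first_author query and for Wikidata searches on the DOI (P356) and ADS bibcode (P819) properties. Passing the lookups in as functions keeps the decision logic testable without network access.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional, Tuple


# Hypothetical record type holding only the fields the decision needs.
@dataclass(frozen=True)
class AdsPaper:
    bibcode: str               # every ADS record has a bibcode
    title: str                 # every ADS record has a title
    doi: Optional[str] = None  # the DOI may be missing


# Hypothetical lookup: returns the QID of a matching Wikidata item, or None.
Lookup = Callable[[str], Optional[str]]


def papers_for_surnames(
    surnames: Iterable[str],
    search_first_author: Callable[[str], List[AdsPaper]],
) -> Iterator[AdsPaper]:
    """Yield ADS papers whose first author has one of the given surnames."""
    for surname in surnames:
        # search_first_author stands in for an ADS query on first_author.
        yield from search_first_author(surname)


def decide_action(
    paper: AdsPaper, find_by_doi: Lookup, find_by_bibcode: Lookup
) -> Tuple[str, Optional[str]]:
    """Return ("create", None) or ("statement_bot", qid) for one paper."""
    if paper.doi:
        # A DOI exists in ADS: match on the DOI only.
        qid = find_by_doi(paper.doi)
    else:
        # No DOI in ADS: fall back to the ADS bibcode.
        qid = find_by_bibcode(paper.bibcode)
    if qid is None:
        return ("create", None)      # no item yet: create a new one
    return ("statement_bot", qid)    # item exists: left to ADSBot English Statement


if __name__ == "__main__":
    # Made-up toy identifiers, for illustration only.
    ads = {
        "Smith": [
            AdsPaper("bibcode-1", "Paper with DOI", doi="10.0000/EXAMPLE-1"),
            AdsPaper("bibcode-2", "Paper without DOI"),
        ],
    }
    doi_index = {"10.0000/EXAMPLE-1": "Q_EXISTING"}
    bibcode_index: dict = {}
    for paper in papers_for_surnames(["Smith"], lambda s: ads.get(s, [])):
        print(paper.title, decide_action(paper, doi_index.get, bibcode_index.get))
```

Note that, following the steps as written, a paper that has a DOI in ADS is matched on the DOI alone, so a bibcode-only match on Wikidata would not stop a new item from being created in that branch. DOIs on Wikidata are conventionally stored in upper case, so a real DOI lookup would likely need to normalise case first.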
Trials:
Examples:
Notes:
- For anyone curious about which statements will be added to Wikidata from the ADS database, this item lists them: https://www.wikidata.org/wiki/Q112684896
- An estimated 47 of every 50 articles in the ADS database have a DOI. Every paper in the ADS database has a title.
- The approach originates from Pathway 1 of a draft diagram on Wikimedia Phabricator, for anyone interested: https://phab.wmfusercontent.org/file/data/lnlj5477majaglrd4eas/PHID-FILE-gidyiuwdukmtjap42zgi/Approach_to_Surnames_%282%29.png
- The bot will run regularly in case new surnames are added to Wikidata.
- By my estimate, this bot run will add approximately 290,000 articles to Wikidata. The figure is 6,994 (English surnames on Wikidata) × 5 (estimated number of authors sharing each surname) × 10 (estimated average number of papers per author) × (1 - (13,300,000 / 749,103 / 100)), where the last factor is meant as the percentage of ADS articles not yet on Wikidata, 13,300,000 is the total number of articles in ADS, and 749,103 is the number of Wikidata articles with an ADS bibcode value.
- Trials list unsuccessful edits, while Examples show successful imports.
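The arithmetic behind the 290,000 estimate can be checked with a few lines of Python. The constants are the ones quoted in the estimate note. The second function is only an illustrative alternative reading (an assumption, not the operator's formula), in which the last factor is the share of ADS articles without a bibcode item on Wikidata, 1 - 749,103 / 13,300,000 (about 0.944), rather than 1 - 13,300,000 / 749,103 / 100 (about 0.822) as written.

```python
# Inputs quoted in the estimate note.
ENGLISH_SURNAMES = 6994
AUTHORS_PER_SURNAME = 5      # estimated
PAPERS_PER_AUTHOR = 10       # estimated
ADS_TOTAL = 13_300_000       # total articles in ADS
WIKIDATA_BIBCODES = 749_103  # Wikidata items with an ADS bibcode

# Candidate papers before discounting those already on Wikidata: 349,700.
CANDIDATES = ENGLISH_SURNAMES * AUTHORS_PER_SURNAME * PAPERS_PER_AUTHOR


def estimate_as_written() -> float:
    """Factor exactly as stated: 1 - 13,300,000 / 749,103 / 100 (about 0.822)."""
    return CANDIDATES * (1 - ADS_TOTAL / WIKIDATA_BIBCODES / 100)


def estimate_missing_share() -> float:
    """Alternative reading (assumption): share of ADS articles not yet on Wikidata."""
    return CANDIDATES * (1 - WIKIDATA_BIBCODES / ADS_TOTAL)


if __name__ == "__main__":
    print(round(estimate_as_written()))     # 287612, i.e. roughly 290,000
    print(round(estimate_missing_share()))  # 330004
```

Both readings land in the same order of magnitude, a few hundred thousand articles, which is far below the roughly 12.5 million ADS articles not yet on Wikidata.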
- Hi Feliciss - ADS is a good source for this information, so I'm glad to see somebody working on it. However, I don't understand your estimate of the number of articles. Why would it not be roughly the total number of non-existent articles in ADS (i.e. 13300000 - 749103 or about 12.5 million)? Are there so many that have first-author surnames that are somehow not in our list? ArthurPSmith (talk) 16:17, 15 July 2022 (UTC)
- Hi. To answer your questions:
- 1. Why would it not be roughly the total number of non-existent articles in ADS (i.e. 13300000 - 749103 or about 12.5 million)? - Because this bot searches only by the English surnames of first authors, and thus mainly finds English papers, it will not reach all the non-existent articles in ADS. It covers only the small portion of them that are in English.
- 2. Are there so many that have first-author surnames that are somehow not in our list? - Definitely yes. The third-party dataset at https://github.com/philipperemy/name-dataset contains 983K last names, while Wikidata currently returns only 501,607 (about 501K) surname results. Feliciss (talk) 09:01, 18 July 2022 (UTC)
- I've committed the updated code today, and I expect there should be no more errors or unsuccessful edits from this bot.
- @ArthurPSmith Can you review this bot request? Feliciss (talk) 11:25, 21 July 2022 (UTC)
- Support. Feliciss is an Outreachy intern, being mentored by User:Mike Peel and me. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:46, 17 July 2022 (UTC)
- Support Ok, I guess I don't really get the "English surname" thing, but adding these papers would definitely be useful, and the example items look fine, so yes let's approve the bot. ArthurPSmith (talk) 12:54, 21 July 2022 (UTC)
- @Feliciss: Please register the bot--Ymblanter (talk) 21:18, 22 July 2022 (UTC)
- I flagged ADSEnglishBot--Ymblanter (talk) 19:32, 25 July 2022 (UTC)
- I've updated the username in the bot request, to avoid further confusion. Thanks. Mike Peel (talk) 14:38, 4 August 2022 (UTC)