Wikidata:Requests for permissions/Bot/OpenAlexBot
OpenAlexBot
Operator: So9q
Task/s: Import new scientific items from OpenAlex.
Code: https://github.com/dpriskorn/OpenAlexBot
Function details: Given a list of DOIs
- OpenAlexBot looks up the DOI in OpenAlex.
- If the DOI is found in OpenAlex, the bot looks it up in Wikidata.
- If the DOI is not found in Wikidata, the bot imports it as a new item.
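The steps above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: `plan_imports`, `fetch_openalex_work` and `find_wikidata_item` are hypothetical names, and the two lookups are passed in as plain callables so the decision logic can be read (and run) without any network access. Upper-casing the DOI is an assumption based on how DOIs are conventionally stored in Wikidata.

```python
from typing import Callable, Dict, Iterable, List, Optional


def plan_imports(
    dois: Iterable[str],
    fetch_openalex_work: Callable[[str], Optional[dict]],  # hypothetical lookup
    find_wikidata_item: Callable[[str], Optional[str]],    # hypothetical lookup
) -> Dict[str, List[str]]:
    """Sort DOIs into three buckets following the workflow described above."""
    plan: Dict[str, List[str]] = {
        "import": [],
        "already_in_wikidata": [],
        "not_in_openalex": [],
    }
    for doi in dois:
        # Assumption: normalise to upper case before comparing.
        normalized = doi.strip().upper()
        if fetch_openalex_work(normalized) is None:
            # Step 1 failed: nothing to import from.
            plan["not_in_openalex"].append(normalized)
        elif find_wikidata_item(normalized) is not None:
            # Step 2: an item with this DOI already exists, so skip it.
            plan["already_in_wikidata"].append(normalized)
        else:
            # Step 3: known to OpenAlex, unknown to Wikidata.
            plan["import"].append(normalized)
    return plan


# Usage with in-memory stand-ins for the two services:
openalex = {"10.1/A": {"id": "W1"}, "10.1/B": {"id": "W2"}}
wikidata = {"10.1/B": "Q42"}
print(plan_imports(["10.1/a", "10.1/b", "10.1/c"], openalex.get, wikidata.get))
```

Keeping the lookups injectable also makes it easy to swap in a different source later without touching the decision logic.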
It supports importing all these properties:
- AUTHOR = "P50"
- AUTHOR_NAME_STRING = "P2093"
- CITES_WORK = "P2860"
- DOI = "P356"
- INSTANCE_OF = "P31"
- ISSUE = "P433"
- MAIN_SUBJECT = "P921"
- OPENALEX_ID = "P10283"
- PAGES = "P304"
- PMID = "P698"
- PUBLICATION_DATE = "P577"
- PUBLISHED_IN = "P1433"
- SERIES_ORDINAL = "P1545" # aka author position
- STATED_AS = "P1932"
- TITLE = "P1476"
- VOLUME = "P478"
It does not support retraction status (whether a paper is retracted) or LANGUAGE_OF_WORK = "P407" (information missing).
It can import an item in a few seconds, and OpenAlex has about 60M items in total.
Most are still missing from WD, so this bot could double the number of scientific papers in Wikidata. Because of the concerns surrounding the WDQS backend, the author suggests that the community decide how fast the import should be allowed to run.
If you approve this bot, please specify the speed, e.g. 100 items/day (= 36,500 items a year) or 1000 items/day (= 365,000 items a year).
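To make the effect of a daily cap easier to compare, here is a small back-of-the-envelope calculation. The function names are made up for this illustration, and 60M is used as an upper bound on the backlog, since the request only says that most of the OpenAlex items are still missing.

```python
def items_per_year(items_per_day: int, days_per_year: int = 365) -> int:
    """Yearly total for a fixed daily import cap."""
    return items_per_day * days_per_year


def years_to_import(total_items: int, items_per_day: int) -> float:
    """Years needed to work through a backlog at a fixed daily cap."""
    return total_items / items_per_year(items_per_day)


BACKLOG_UPPER_BOUND = 60_000_000  # assumption: all ~60M OpenAlex items

for rate in (100, 1_000, 10_000):
    print(
        f"{rate:>6} items/day -> {items_per_year(rate):>9,} items/year, "
        f"{years_to_import(BACKLOG_UPPER_BOUND, rate):,.1f} years for 60M"
    )
```

At 1000 items/day the full 60M would take roughly 164 years, so the chosen cap largely determines whether the bot is a trickle or a bulk import.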
Caveats:
- Currently, the bot tries to detect the language of titles using langdetect. Unfortunately, this does not always work well, because a title is often too short to give a reliable result. If anyone has a solution, I would be happy to change this. For example, we could get the language from another API, or use "mul" as the language code if that is available.
--So9q (talk) 11:09, 18 February 2022 (UTC)
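One possible way to handle the short-title caveat is to accept a detected language only when the title is long enough and the detector is confident, and otherwise fall back to "mul". The sketch below is an assumption-laden illustration, not what the bot does: `choose_title_language`, the thresholds, and the `detect` callable are all hypothetical, and whether "mul" is accepted for title statements is exactly the open question raised above. With langdetect, `detect` could be a thin adapter around `detect_langs`, which returns candidates with `.lang` and `.prob` attributes.

```python
from typing import Callable, Sequence, Tuple

# A detector returns (language_code, probability) pairs for a text.
Detector = Callable[[str], Sequence[Tuple[str, float]]]


def choose_title_language(
    title: str,
    detect: Detector,
    min_words: int = 4,           # assumed threshold, tune on real data
    min_confidence: float = 0.9,  # assumed threshold, tune on real data
    fallback: str = "mul",        # assumption: "mul" is an accepted code
) -> str:
    """Return a language code for a title, or the fallback when unsure."""
    if len(title.split()) < min_words:
        # Too short for statistical detection to be trustworthy.
        return fallback
    candidates = detect(title)
    if not candidates:
        return fallback
    language, probability = max(candidates, key=lambda pair: pair[1])
    return language if probability >= min_confidence else fallback


# Usage with a stand-in detector (no external library required):
def fake_detect(text: str) -> Sequence[Tuple[str, float]]:
    return [("en", 0.97), ("nl", 0.03)]


print(choose_title_language("A survey of open scholarly metadata", fake_detect))
print(choose_title_language("Corrigendum", fake_detect))
```

The thresholds trade coverage for accuracy: raising them sends more titles to the fallback but reduces wrongly tagged ones.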
- OpenAlex appears to be an aggregator; wouldn't it be better to import the data from its primary sources? Also, not all of its sources are compatible with CC0. Abbe98 (talk) 12:03, 24 February 2022 (UTC)
- This is a valid point. However, that would probably require more programming effort, and the concepts found in OpenAlex are unique to this source AFAIK.
- Do we have a bot currently importing from Crossref? If not, I can write one using this bot as a skeleton.
- @Csisc, would you be willing to propose a new task for @OpenCitations Bot to also import new items, now that WMF has a disaster plan? So9q (talk) 10:05, 25 February 2022 (UTC)
- @So9q: Probably not. Let us see the discussion on solving this matter. --Csisc (talk) 10:28, 26 February 2022 (UTC)