Wikidata:Requests for permissions/Bot/OpenAlexBot

OpenAlexBot

OpenAlexBot (talk · contribs · new items · new lexemes · SUL · Block log · User rights log · User rights · xtools)
Operator: So9q (talk · contribs · logs)

Task/s: Import new scientific items from OpenAlex.

Code: https://github.com/dpriskorn/OpenAlexBot

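Function details: Given a list of DOIs, for each DOI:

  • OpenAlexBot looks up the DOI in OpenAlex.
  • If OpenAlex has a record for it, the bot checks whether Wikidata already has an item with that DOI.
  • If Wikidata has no such item, the bot imports the work as a new item.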
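
The steps above can be sketched roughly as follows. This is a minimal illustration, not the bot's actual code: `plan_imports`, `find_in_openalex` and `exists_in_wikidata` are hypothetical names, and the two lookups are passed in as plain functions so that no particular API is assumed. The DOIs are made up.

```python
from typing import Callable, Iterable, List, Optional


def plan_imports(
    dois: Iterable[str],
    find_in_openalex: Callable[[str], Optional[dict]],
    exists_in_wikidata: Callable[[str], bool],
) -> List[dict]:
    """Return the OpenAlex records that should become new Wikidata items."""
    to_import = []
    for doi in dois:
        record = find_in_openalex(doi)  # step 1: look the DOI up in OpenAlex
        if record is None:
            continue  # unknown to OpenAlex, nothing to import
        if exists_in_wikidata(doi):  # step 2: is there already an item?
            continue  # already present, skip to avoid a duplicate
        to_import.append(record)  # step 3: queue the work as a new item
    return to_import


# Toy stand-ins for the two lookups (hypothetical data, illustration only).
openalex = {"10.1000/a": {"doi": "10.1000/a"}, "10.1000/b": {"doi": "10.1000/b"}}
wikidata = {"10.1000/b"}
planned = plan_imports(
    ["10.1000/a", "10.1000/b", "10.1000/c"],
    openalex.get,
    lambda doi: doi in wikidata,
)
print([record["doi"] for record in planned])
```

Only the first DOI is queued here: the second already exists in the toy Wikidata set and the third is unknown to the toy OpenAlex lookup.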

It supports importing all these properties:

   AUTHOR = "P50"
   AUTHOR_NAME_STRING = "P2093"
   CITES_WORK = "P2860"
   DOI = "P356"
   INSTANCE_OF = "P31"
   ISSUE = "P433"
   MAIN_SUBJECT = "P921"
   OPENALEX_ID = "P10283"
   PAGES = "P304"
   PMID = "P698"
   PUBLICATION_DATE = "P577"
   PUBLISHED_IN = "P1433"
   SERIES_ORDINAL = "P1545"  # aka author position
   STATED_AS = "P1932"
   TITLE = "P1476"
   VOLUME = "P478"
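
To illustrate how such constants might be used, here is a small sketch that turns one work record into (property, value, qualifiers) triples. The record layout (`doi`, `openalex_id`, `title`, `publication_date`, `volume`, `issue`, `authors`) is a simplified assumption for this example and not the real OpenAlex response format; `work_to_claims` is a hypothetical helper, not a function from the bot.

```python
from typing import Any, Dict, List, Tuple

# Property IDs copied from the list above.
DOI = "P356"
OPENALEX_ID = "P10283"
TITLE = "P1476"
PUBLICATION_DATE = "P577"
VOLUME = "P478"
ISSUE = "P433"
AUTHOR_NAME_STRING = "P2093"
SERIES_ORDINAL = "P1545"

Claim = Tuple[str, Any, Dict[str, str]]


def work_to_claims(work: Dict[str, Any]) -> List[Claim]:
    """Turn one (simplified, assumed) work record into claim triples."""
    claims: List[Claim] = []
    simple_fields = [
        (DOI, "doi"),
        (OPENALEX_ID, "openalex_id"),
        (TITLE, "title"),
        (PUBLICATION_DATE, "publication_date"),
        (VOLUME, "volume"),
        (ISSUE, "issue"),
    ]
    for prop, key in simple_fields:
        value = work.get(key)
        if value:  # leave out properties the source has no value for
            claims.append((prop, value, {}))
    # Authors become name strings, qualified with their position in the byline.
    for position, name in enumerate(work.get("authors", []), start=1):
        claims.append((AUTHOR_NAME_STRING, name, {SERIES_ORDINAL: str(position)}))
    return claims


work = {
    "doi": "10.1000/EXAMPLE",  # made-up DOI
    "title": "An example title",
    "volume": "12",
    "authors": ["A. Author", "B. Author"],
}
claims = work_to_claims(work)
for claim in claims:
    print(claim)
```

Missing fields simply produce no claim, so a sparse record yields a sparse item rather than empty statements.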

It does not support importing whether a paper is retracted, or LANGUAGE_OF_WORK = "P407" (the information is missing).

It can import an item in a few seconds, and OpenAlex has about 60M items in total.

Most are still missing from Wikidata, so this bot could double the number of scientific papers in Wikidata. Because of the concerns surrounding the WDQS backend, the author suggests that the community decide how fast the import should be allowed to run.

If you approve this bot, please specify the speed, e.g. 100 items/day (= 36,500 items a year) or 1000 items/day (= 365,000 items a year).
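
For anyone weighing a rate, the yearly total is simply the daily rate times 365, and spreading the edits evenly over a day gives the pause between two imports. A small sketch of that arithmetic (the function names are made up for this example):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def items_per_year(items_per_day: int) -> int:
    """Yearly total for a given daily rate (365-day year)."""
    return items_per_day * 365


def seconds_between_items(items_per_day: int) -> float:
    """Pause between imports if they are spread evenly over the day."""
    return SECONDS_PER_DAY / items_per_day


for rate in (100, 1000):
    print(
        f"{rate} items/day = {items_per_year(rate):,} items/year, "
        f"one item every {seconds_between_items(rate):.1f} s"
    )
```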

Caveats:

  • Currently, it tries to detect the language of titles using langdetect. Unfortunately, this does not always work well, because titles are often too short to give a reliable result. If anyone has a solution, I would be happy to change that. For example, we could get the language from another API, or use "mul" as the language code if that is available.
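
One possible way to handle the short-title problem is to trust the detector only when the title is long enough and the guess is confident, and to fall back to "mul" otherwise. The sketch below is an assumption-laden illustration, not the bot's code: `choose_title_language` is a hypothetical helper, the thresholds are arbitrary, the (language, probability) pairs are assumed to come from a detector such as langdetect's `detect_langs`, and it presumes that "mul" is accepted as a language code.

```python
from typing import List, Tuple


def choose_title_language(
    title: str,
    candidates: List[Tuple[str, float]],
    min_words: int = 4,  # arbitrary threshold, tune as needed
    min_probability: float = 0.9,  # arbitrary threshold, tune as needed
    fallback: str = "mul",  # assumes "mul" is an accepted code
) -> str:
    """Pick a language code for a title, falling back when detection is shaky."""
    if len(title.split()) < min_words or not candidates:
        return fallback  # too little text for a reliable guess
    language, probability = max(candidates, key=lambda pair: pair[1])
    return language if probability >= min_probability else fallback


# Hypothetical detector output for two titles.
long_title = choose_title_language(
    "Deep learning for protein structure prediction", [("en", 0.99)]
)
short_title = choose_title_language("Editorial", [("it", 0.71)])
print(long_title, short_title)
```

The one-word title falls back to "mul" regardless of what the detector guessed, which avoids recording a wrong language for very short titles.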

--So9q (talk) 11:09, 18 February 2022 (UTC)

OpenAlex appears to be an aggregator; wouldn't it be better to import the data from its primary sources? Also, not all of its sources are compatible with CC0. Abbe98 (talk) 12:03, 24 February 2022 (UTC)
This is a valid point. However, that would probably require more programming effort, and the concepts found in OpenAlex are unique to this source, AFAIK.
Do we have a bot currently importing from Crossref? If not, I can write one using this bot as a skeleton.
@Csisc, would you be willing to propose a new task for @OpenCitations Bot to also import new items, now that WMF has a disaster plan? So9q (talk) 10:05, 25 February 2022 (UTC)
@So9q: Probably not. Let us see the discussion on solving this matter. --Csisc (talk) 10:28, 26 February 2022 (UTC)