Wikidata talk:WikiProject Source MetaData
Importing from OpenAlex
Hi. I just posted this in the Wikidata chat in Telegram:
Related to the issues concerning BlazeGraph, there is a new thread here: https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Source_MetaData#Chaotic_end_to_the_activities_of_WikiCite_and_WikiProject_Source_MetaData I participated in the WDQS scaling community call yesterday and I invite anyone interested to join the next call. Meanwhile I'm continuing work on my new bot, with the goal of importing 20M+ articles into Wikidata from OpenAlex, now that we have a disaster plan and don't have to make fear-based decisions. If BG breaks, WMF simply cuts the scientific articles out of WDQS according to the plan. Anyone can set up a Wikibase, import a part of Wikidata, and make it possible to run SPARQL queries on the scientific items, and I predict someone will do it within a month of the disaster plan being executed. I will post the request for a bot flag here once it is ready.
The code is here https://github.com/dpriskorn/OpenAlexBot --So9q (talk) 08:09, 18 February 2022 (UTC)
- Here is the request https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot/OpenAlexBot --So9q (talk) 15:00, 23 February 2022 (UTC)
- @So9q: It looks like you are lower-casing DOIs instead of upper-casing them? All DOIs in Wikidata right now are upper-case, and you will not find matches with WDQS (or, I think, haswbstatement) if you have the wrong case. ArthurPSmith (talk) 17:30, 24 February 2022 (UTC)
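The case issue raised above could be handled by normalizing every DOI before matching. The following is a minimal sketch, not code from OpenAlexBot: the function names and the example DOI are hypothetical, and it assumes the DOI is stored under DOI (P356) and searched with the `haswbstatement` keyword.

```python
import re

# Matches common resolver prefixes such as "https://doi.org/" or "doi:".
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip a resolver prefix and upper-case the DOI, as Wikidata stores it."""
    return DOI_PREFIX.sub("", raw.strip()).upper()


def haswbstatement_query(raw_doi: str) -> str:
    """Build a search string that looks up items by DOI (P356)."""
    return f"haswbstatement:P356={normalize_doi(raw_doi)}"


# Hypothetical DOI, used only to show the normalization.
print(haswbstatement_query("https://doi.org/10.1000/example.doi"))
```

Normalizing in one place means the same string can be used for the duplicate check and for the value that is finally written.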
University adding portraits
This project collects a lot of academic publications, and therefore creates structured data for authors. Until now I do not think we have had an example of an organization that has tried to give us an image collection of its researchers, but here is one:
- de:Wikipedia:WikiProjekt ETH-Portraits
- en:ETH Library in Zurich
- ETH Library (Q684773)
- https://twitter.com/niggegraf/status/1500911202364444679
Bluerasberry (talk) 20:19, 7 March 2022 (UTC)
Wikidata software profiling hackathon, June 6&8
Those interested in software + Wikidata are invited to the Scholia Hackathon 6&8 June 2022.
WD:Scholia is a Wikidata front end which does scholarly profiling, and is best known as a tool for browsing the WikiCite collection of WD:WikiProject Source Metadata.
An example Scholia profile for the software Stata (Q1204300) is
Anyone interested in examining any part of Wikidata connecting to software is welcome. Bluerasberry (talk) 20:40, 19 May 2022 (UTC)
Suggestions on adding affiliation string to author names
Hi there. I used my automated tool to create a scholarly article item, adding an affiliation string for each author from the ADS database. Link is here: https://www.wikidata.org/wiki/Q113322652 Would adding affiliation strings to author or author name string statements be useful? I'd like to hear your advice and suggestions. Feliciss (talk) 12:29, 28 July 2022 (UTC)
- @Feliciss: Yes, adding these would be useful, but I would prefer they be exactly as in the article, not parsed/edited. For your example, the Stanford affiliation in the article is listed as "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A.", so I would think that should be the string used here? ArthurPSmith (talk) 20:36, 28 July 2022 (UTC)
- @ArthurPSmith Can you link to where the string "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A." comes from? In my case, it's exactly the same as what we see in the article. I get the whole affiliation strings for each author from https://ui.adsabs.harvard.edu/abs/1982CMAME..32..199B/abstract. You can view the affiliations of each author by clicking "Show affiliations". Feliciss (talk) 07:28, 29 July 2022 (UTC)
- I see. It's from the DOI link in the article. Since my bot only gets affiliation strings from ADS, it's not possible (or does not make sense) to fetch the affiliation strings a second time via the DOI. Feliciss (talk) 07:37, 29 July 2022 (UTC)
- Ok, you are adding the reference to the ADS bibcode there so I guess that's fine. Obviously ADS is doing some parsing of affiliations but I think they're pretty reliable about that so this is ok. ArthurPSmith (talk) 17:11, 29 July 2022 (UTC)
- Another example: https://www.wikidata.org/wiki/Q113380669
- I think ADS is parsing some but not all affiliation strings. Feliciss (talk) 08:27, 2 August 2022 (UTC)
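To make the discussion above concrete, here is a rough sketch of how one author name string statement with an affiliation qualifier and an ADS reference might be written as a QuickStatements-style line. This is not Feliciss's tool: the function name, author name, and affiliation text are hypothetical placeholders, and the property IDs other than P2093 (P1545 for series ordinal, P6424 for affiliation string, P819 for ADS bibcode) are my assumption and should be checked before use.

```python
def quote(text: str) -> str:
    """Wrap a string value in double quotes (embedded quotes are not escaped here)."""
    return f'"{text}"'


def author_statement_line(item, name, ordinal, affiliation, bibcode):
    """Build one tab-separated line: author name string plus qualifiers and a source."""
    fields = [
        item, "P2093", quote(name),      # author name string
        "P1545", quote(str(ordinal)),    # assumed: series ordinal qualifier
        "P6424", quote(affiliation),     # assumed: affiliation string qualifier, kept verbatim
        "S819", quote(bibcode),          # assumed: ADS bibcode as the reference
    ]
    return "\t".join(fields)


# Placeholder author and affiliation; item and bibcode are the ones linked in the thread.
print(author_statement_line(
    "Q113322652", "A. N. Author", 1,
    "Example University, Example City", "1982CMAME..32..199B",
))
```

Passing the affiliation through unchanged keeps the string exactly as the cited source gives it, which is the point ArthurPSmith raises.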
Who is the author "JC Shakespeare"?
A cautionary tale: https://shkspr.mobi/blog/2022/08/who-is-the-author-jc-shakespeare/ - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:28, 20 August 2022 (UTC)
Reports published by policy and research organisations, can they be considered generally reliable?
- also posted at w:WP:Village pump (policy) since it’s relevant there too
I’m looking for opinions on institutional policy and research reports in general as reliable sources as part of the WikiProject Policy Reports project. The example source types on WP:RS (scholarship, news, vendor etc) don’t quite cover our area of interest: reports, conference papers, discussion and briefing papers, strategies, policies and other docs (sometimes called grey literature). These are generally self-published by organisations (e.g. the WHO publishes WHO reports) but it’s obviously not the same as someone’s self-published blog or book.
I realise that for specific citations in WP it’s case-by-case. However, we’re looking for some guidance on what principles or criteria we could use to prioritise/sort organisations into 1) Generally reliable / 2) unclear / 3) generally unreliable since these sorts of items are likely often useful as potential WP sources in addition to books/journals/newspapers. As part of the project we’re looking to prioritise which organisations’ reports are most useful to upload metadata to Wikidata about. If general principles aren’t really possible, it’d be helpful to have some examples to calibrate on e.g. these five organisations:
- The Australia Institute is an independent public policy think tank based in Canberra, Australia that carries out research on a broad range of economic, social, and environmental issues (APO-listed reports)
- Australian Institute of Health and Welfare (AIHW) is Australia's national agency for information and statistics on Australia's health and welfare (APO-listed reports).
- Ministry of Business, Innovation and Employment is the New Zealand government department responsible for contributing to economic productivity and growth (APO-listed reports).
- Lowitja Institute is a national research centre focusing on Aboriginal and Torres Strait Islander health and wellbeing (APO-listed reports).
- Australian Council of Social Service (ACOSS) is the peak body for the community services sector in Australia and advocates for action to reduce poverty and inequality (APO-listed reports).
Thanks in advance for the feedback on these! We’ve >70 publishing organisations that we’re focusing on so these will help us calibrate which sorts of organisations are worth focusing on uploading metadata to Wikidata. If anyone has an interest in the full list, please let me know and I can loop you in on the full project. Brigid vW (talk) 07:06, 28 September 2022 (UTC)
- Notability of the organization might be a useful guide, for example the number of sitelinks to language wikipedias for the organization? ArthurPSmith (talk) 21:19, 28 September 2022 (UTC)
Some scholarly article (Q13442814) statistics
As of 4 April 2023
- 38,911,011 – scholarly article items in all [3]
- 26,581,931 – scholarly article items without author (P50) and with author name string (P2093) = author name strings only [4]
- 1,111,985 – scholarly article items with author (P50) and without author name string (P2093) = all author name strings disambiguated to author Qids [5]
- 32,030,055 – scholarly article items with PubMed ID (P698) [6]
- 28,392,394 – scholarly article items with DOI (P356) [7]
- 22,720,690 – scholarly article items without main subject (P921) [8]
- 25,279,923 – scholarly article items without language of work or name (P407) [9]
- TODO – scholarly article items without title (P1476)
Kpjas (talk) 20:19, 4 April 2023 (UTC)
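The shares implied by the counts above can be worked out directly. This is only a small arithmetic sketch using the figures as posted (the variable and function names are mine); it gives roughly 42% of scholarly article items with a main subject and roughly 35% with a language.

```python
# Counts copied from the list above (as of 4 April 2023).
TOTAL_ARTICLES = 38_911_011
ITEMS_WITHOUT = {
    "main subject (P921)": 22_720_690,
    "language of work or name (P407)": 25_279_923,
}


def share_with(total: int, without: int) -> float:
    """Fraction of items that do have the property, given the count lacking it."""
    return (total - without) / total


for label, missing in ITEMS_WITHOUT.items():
    print(f"{label}: {share_with(TOTAL_ARTICLES, missing):.1%} of items have it")
```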
- Interesting observations. I think that "Articles without main subject" is especially important because:
- "Main subject" is the main reason to have articles in WD, since WD is not an authoritative article source, and doesn't have Abstracts.
- WD is flooded with articles about X but the item X itself is missing. Example:
- Sliding window protocol (Q592860) was made in 2012 from Wikipedias
- 300 articles mentioning "sliding window" were imported from various sources
- I made Sliding Window (Q80681012) only in 2020: but that's the archetypical item, after which Sliding window protocol (Q592860) is named; and all those articles could benefit from having "main subject" set to one of the two items.
- From your statistics, one might think that about 40% of the articles have a main subject (38,911,011 in total, 22,720,690 without): not so bad, right? However, an article should typically have at least 5-10 subjects, and there's no assessment of whether those that have at least one have adequate subjects
- Vladimir Alexiev (talk) 07:01, 30 April 2023 (UTC)
- @Vladimir Alexiev AFAIK the options for adding main subject (P921) to scientific article items are:
- by hand -- rather impractical
- with a specific tool such as QuickStatements (QS) -- carefully select scientific article items with phrases in their titles that we assume would make an adequate main subject, e.g. "BRCA1 mutation" or "Huntington's disease"
- provided by other tools or bots like SourceMD [10]
- PubMed metadata contain keywords and MeSH -- why not pull these (copyright issue?)
- lo and behold ChatGPT is quite good at summarizing, perhaps also at providing main subjects for scientific articles, huh ?
- Kpjas (talk) 19:38, 30 April 2023 (UTC)
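The QuickStatements option in the list above could look roughly like the sketch below: pick items whose title contains a phrase and emit one `P921` line per match. The function name, the article IDs, and the titles are hypothetical placeholders; only Sliding Window (Q80681012) comes from this thread, and every proposed line would still need human review before upload.

```python
import re


def propose_main_subject(articles, phrase, subject_qid):
    """Return tab-separated 'item P921 subject' lines for titles containing the phrase."""
    # Whole-phrase, case-insensitive match; plural or hyphenated variants are not caught.
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [
        f"{qid}\tP921\t{subject_qid}"
        for qid, title in articles
        if pattern.search(title)
    ]


# Placeholder items and titles, not real Wikidata entries.
sample = [
    ("Q_ARTICLE_1", "Sliding window protocols revisited"),
    ("Q_ARTICLE_2", "Gene expression in yeast"),
    ("Q_ARTICLE_3", "A SLIDING WINDOW filter for sensor data"),
]
for line in propose_main_subject(sample, "sliding window", "Q80681012"):
    print(line)
```

The first placeholder title shows why review matters: it matches the phrase, but Sliding window protocol (Q592860) might be the better subject for it.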
- I think there's a tagger by James Hare
- MeSH is a great taxonomy system, but to my unpleasant surprise PubMed articles don't have nearly enough MeSH terms. E.g. search https://pubmed.ncbi.nlm.nih.gov/?term=semantic+web:
- https://pubmed.ncbi.nlm.nih.gov/31213735/ has no MESH
- https://pubmed.ncbi.nlm.nih.gov/36320897/#mesh-terms has some MESH but no "systematic review" (D000078202, D000078182)
- I've done some GPT trials and it's great at classifying papers (just from the title, not even the abstract) with MeSH terms, but terrible at picking the precise MeSH code. So a secondary lookup will be needed. Examples below
- I've tagged the WikiCite Telegram group, let's see if they react https://t.me/c/1497612692/4268
- I asked GPT-4:
- classify these PubMed papers with MESH descriptors and qualifiers.
- (I provided title and abstract, but showing only title below)
- 1. Semantic Web in Healthcare: A Systematic Literature Review of Application, Research Gap, and Future Research Avenues
- MESH Descriptors:
- Semantic Web
- Health Care
- Systematic Reviews
- Electronic Health Records
- Medical Informatics
- Interoperability
- Decision Support Systems, Clinical
- Information Management
- Medical Informatics Applications
- MESH Qualifiers:
- utilization
- trends
- organization & administration
- methods
- standards
- 2. Semantic Web Technologies for Sharing Clinical Information in Health Care Systems
- MESH Descriptors:
- Semantic Web
- Health Care
- Medical Informatics
- Interoperability
- Information Management
- Ontologies (Information Science)
- Electronic Health Records
- Clinical Information Systems
- Information Storage and Retrieval
- MESH Qualifiers:
- methods
- trends
- organization & administration
- standards
- utilization
- -- Vladimir Alexiev (talk) 07:17, 1 May 2023 (UTC)
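The secondary lookup mentioned above could be sketched as follows: take the labels a language model proposes and resolve them against a table of known MeSH headings, flagging anything that does not match exactly. The function name and the table contents are hypothetical (the IDs are placeholders, not real MeSH codes); a real table would come from an official MeSH download or lookup service.

```python
def resolve_mesh_labels(labels, heading_to_id):
    """Split proposed labels into exact (case-insensitive) matches and unmatched ones."""
    index = {heading.casefold(): (heading, mesh_id)
             for heading, mesh_id in heading_to_id.items()}
    matched, unmatched = {}, []
    for label in labels:
        hit = index.get(label.strip().casefold())
        if hit:
            matched[hit[0]] = hit[1]
        else:
            unmatched.append(label)  # needs manual review or a fuzzy search
    return matched, unmatched


# Placeholder table: these IDs are NOT real MeSH descriptor codes.
KNOWN_HEADINGS = {
    "Semantic Web": "D_PLACEHOLDER_1",
    "Medical Informatics": "D_PLACEHOLDER_2",
}
proposed = ["semantic web", "Medical Informatics", "Interoperability"]
matched, unmatched = resolve_mesh_labels(proposed, KNOWN_HEADINGS)
print(matched)
print(unmatched)
```

Keeping the unmatched labels separate avoids writing a guessed code to Wikidata when the model's wording differs from the official heading.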
Source reliability + assessment
I recently drafted Wikidata:WikiProject Source Reliability to capture efforts to annotate source entities with information related to their reliability. I'd love feedback on how to make that a useful complement to SourceMD. Sj (talk) 17:07, 25 April 2023 (UTC)
- Please see property proposal: Wikidata:Property proposal/assessed source reliability. Harej (talk) 03:01, 26 April 2023 (UTC)
- Nice, that seems the minimal property that could capture a range of different evaluations.
Notified participants of WikiProject Source MetaData Sj (talk) 16:38, 28 April 2023 (UTC)
Finalize a rename to WikiProject WikiCite
Move Wikidata:WikiProject Source MetaData -> Wikidata:WikiProject WikiCite
Previously discussed and approved in 2017, with mostly approvals, some abstentions, and no opposition.
I called for comment, got it, then neglected to execute the move.
I am posting again to express intent to do the rename soon based on past approval. I do not expect objections, and in my view, over the years the scope of activity here has always overlapped with WikiCite activities.
Comments from anyone? Bluerasberry (talk) 17:59, 5 May 2023 (UTC)
Support ArthurPSmith (talk) 16:59, 8 May 2023 (UTC)
Comment Why not just Wikidata:WikiCite? Harej (talk) 04:03, 11 May 2023 (UTC)
- I like that, @JakobVoss: already redirected that here a while back. Sj (talk) 20:49, 15 May 2023 (UTC)
- Ok with me too. ArthurPSmith (talk) 16:44, 17 May 2023 (UTC)
Support for Wikidata:WikiCite. --Daniel Mietchen (talk) 22:07, 16 May 2023 (UTC)