Wikidata talk:WikiProject Source MetaData


Importing from OpenAlex

Hi. I just posted this in the Wikidata chat in Telegram:

Related to the issues concerning BlazeGraph, there is a new thread here.
I participated in the WDQS scaling community call yesterday, and I invite anyone interested to join the next call.
Meanwhile, I'm continuing work on my new bot, with the goal of importing 20M+ articles from OpenAlex into Wikidata, now that we have a disaster plan and don't have to make fear-based decisions. If BlazeGraph (BG) breaks, the WMF simply cuts the scientific articles out of WDQS according to the plan.
Anyone can set up a Wikibase, import a part of Wikidata, and make it possible to run SPARQL queries on the scientific items. I predict someone will do it within a month of the disaster plan being executed.
I will post the request for botflag here once it is ready.

The code is here --So9q (talk) 08:09, 18 February 2022 (UTC)

Here is the request (talk) 15:00, 23 February 2022 (UTC)
@So9q: It looks like you are lower-casing DOIs instead of upper-casing them? All DOIs in Wikidata right now are upper-case, and you will not find matches with WDQS (or, I think, haswbstatement) if you use the wrong case. ArthurPSmith (talk) 17:30, 24 February 2022 (UTC)
In the WikiCite group, @Harej suggested we lowercase them all (in Wikidata). I use CirrusSearch, which is based on Elasticsearch, which has case handling built in. Compare [1] and [2] :) So9q (talk) 08:48, 25 February 2022 (UTC)
I want to clarify that although that is my personal opinion, as I understand it there is currently consensus to capitalize DOIs in Wikidata, and drift away from this has been accidental (and largely a product of inconsistent enforcement). Harej (talk) 17:06, 25 February 2022 (UTC)
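Since the current consensus is upper-case DOIs, a bot importing from a source that returns lower-case DOIs could normalize them before looking items up. The sketch below assumes the source may prefix DOIs with `https://doi.org/`, `http://dx.doi.org/` or `doi:` (OpenAlex, for instance, typically returns DOIs as `https://doi.org/` URLs); the helper names are hypothetical, and the search string has not been tested against every character a DOI may contain.

```python
import re

# Prefixes that may precede a DOI in imported data. This list is an
# assumption about typical sources, not an exhaustive one.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(raw: str) -> str:
    """Strip a URL or 'doi:' prefix and upper-case the DOI, matching the
    upper-case convention currently used for DOI (P356) values."""
    doi = _DOI_PREFIX.sub("", raw.strip())
    return doi.upper()


def haswbstatement_query(raw: str) -> str:
    """Build a CirrusSearch query string for items whose P356 equals the DOI."""
    return f"haswbstatement:P356={normalize_doi(raw)}"


if __name__ == "__main__":
    print(haswbstatement_query("https://doi.org/10.1234/abc.def"))
```

Normalizing on the way in keeps exact-match lookups (WDQS string comparison, `haswbstatement`) consistent, regardless of whether the search backend itself happens to be case-insensitive.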

University adding portraits

This project collects a lot of academic publications, and because of that it creates structured data for authors. I do not think we have previously had an example of an organization trying to give us an image collection of its researchers, but here is one:

Bluerasberry (talk) 20:19, 7 March 2022 (UTC)

Wikidata software profiling hackathon, June 6&8

Those interested in software and Wikidata are invited to the Scholia Hackathon on 6 and 8 June 2022.

WD:Scholia is a Wikidata front end which does scholarly profiling, and is best known as a tool for browsing the WikiCite collection of WD:WikiProject Source Metadata.

An example Scholia profile for the software Stata (Q1204300) is

Anyone interested in examining any part of Wikidata connecting to software is welcome. Bluerasberry (talk) 20:40, 19 May 2022 (UTC)

Suggestions on adding affiliation string to author names

Hi there. I used my automated tool to create a scholarly article item that adds affiliation strings from the ADS database to each author. Link is here: Would adding an affiliation string to author or author name string statements be useful? I'd like to hear your advice and suggestions. Feliciss (talk) 12:29, 28 July 2022 (UTC)

@Feliciss: Yes, adding these would be useful, but I would prefer they be exactly as in the article, not parsed/edited. For your example, the Stanford affiliation in the article is listed as "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A.", so I would think that should be the string used here? ArthurPSmith (talk) 20:36, 28 July 2022 (UTC)
@ArthurPSmith Can you link to where the string "Division of Applied Mechanics Stanford University, Stanford, CA 94305, U.S.A." comes from? In my case, it's exactly the same as what we see in the article. I get the whole affiliation strings for each author from You can view the affiliations of each author by clicking "Show affiliations". Feliciss (talk) 07:28, 29 July 2022 (UTC)
I see. It comes from the DOI in the article. Since my bot only gets affiliation strings from ADS, it's not possible (or does not make sense) to get the affiliation strings a second time from the article's DOI. Feliciss (talk) 07:37, 29 July 2022 (UTC)
Ok, you are adding the reference to the ADS bibcode there, so I guess that's fine. Obviously ADS is doing some parsing of affiliations, but I think they're pretty reliable about that, so this is ok. ArthurPSmith (talk) 17:11, 29 July 2022 (UTC)
Another example:
I think ADS is parsing some, but not all, affiliation strings. Feliciss (talk) 08:27, 2 August 2022 (UTC)
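The approach discussed above (affiliation strings kept verbatim from ADS, with the ADS bibcode as the reference) could be expressed as QuickStatements commands. The sketch below is a hypothetical helper, not Feliciss's tool: it assumes author name string (P2093), series ordinal (P1545), affiliation string (P6424) and ADS bibcode (P819, used as a source with `S819`), and the tab-separated QuickStatements V1 syntax. It creates new author name string statements rather than qualifying existing ones, and does not escape double quotes inside strings.

```python
from dataclasses import dataclass


@dataclass
class Author:
    name: str          # author name as supplied by ADS
    ordinal: int       # position in the author list
    affiliation: str   # affiliation string, kept verbatim (may be empty)


def quickstatements_lines(article_qid: str, bibcode: str, authors: list[Author]) -> list[str]:
    """Emit one QuickStatements V1 line per author, each sourced to the bibcode."""
    lines = []
    for a in authors:
        fields = [article_qid, "P2093", f'"{a.name}"',   # author name string
                  "P1545", f'"{a.ordinal}"']            # series ordinal qualifier
        if a.affiliation:
            fields += ["P6424", f'"{a.affiliation}"']   # affiliation string qualifier
        fields += ["S819", f'"{bibcode}"']              # reference: ADS bibcode
        lines.append("\t".join(fields))
    return lines


if __name__ == "__main__":
    demo = [Author("Doe, J.", 1, "Stanford University"), Author("Roe, R.", 2, "")]
    for line in quickstatements_lines("Q1", "EXAMPLEBIBCODE", demo):  # placeholder IDs
        print(line)
```

Skipping the qualifier when ADS supplies no affiliation avoids writing empty strings to Wikidata.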

Who is the author "JC Shakespeare"?

A cautionary tale: - Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:28, 20 August 2022 (UTC)

Reports published by policy and research organisations, can they be considered generally reliable?

Also posted at w:WP:Village pump (policy), since it’s relevant there too.

I’m looking for opinions on institutional policy and research reports in general as reliable sources, as part of the WikiProject Policy Reports project. The example source types on WP:RS (scholarship, news, vendor, etc.) don’t quite cover our area of interest: reports, conference papers, discussion and briefing papers, strategies, policies and other documents (sometimes called grey literature). These are generally self-published by organisations (e.g. the WHO publishes WHO reports), but that’s obviously not the same as someone’s self-published blog or book.

I realise that for specific citations in WP it’s case-by-case. However, we’re looking for some guidance on what principles or criteria we could use to prioritise and sort organisations into 1) generally reliable, 2) unclear, or 3) generally unreliable, since these sorts of items are often useful as potential WP sources in addition to books, journals and newspapers. As part of the project, we’re looking to prioritise the organisations whose reports are most useful to upload metadata about to Wikidata. If general principles aren’t really possible, it’d be helpful to have some examples to calibrate on, e.g. these five organisations:

  • The Australia Institute is an independent public policy think tank based in Canberra, Australia that carries out research on a broad range of economic, social, and environmental issues (APO-listed reports)

Thanks in advance for the feedback on these! We’re focusing on more than 70 publishing organisations, so these will help us calibrate which sorts of organisations are worth prioritising when uploading metadata to Wikidata. If anyone is interested in the full list, please let me know and I can loop you in on the full project. Brigid vW (talk) 07:06, 28 September 2022 (UTC)

Notability of the organization might be a useful guide, for example the number of sitelinks to language Wikipedias for the organization? ArthurPSmith (talk) 21:19, 28 September 2022 (UTC)
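The sitelink-count suggestion could be checked for a whole list of organisations with a single WDQS query. This is a minimal sketch, assuming the organisations are already identified by QID; the function names are hypothetical, and the ranking only orders organisations by notability, which is a proxy for prioritisation, not a reliability judgement. `wikibase:sitelinks` is the per-item sitelink count exposed by WDQS, and the parser reads the standard SPARQL JSON results format.

```python
import re

_QID = re.compile(r"^Q\d+$")


def sitelinks_query(org_qids: list[str]) -> str:
    """Build a WDQS query returning the sitelink count of each organisation."""
    if not all(_QID.match(q) for q in org_qids):
        raise ValueError("expected QIDs such as 'Q42'")
    values = " ".join(f"wd:{q}" for q in org_qids)
    return f"""SELECT ?org ?orgLabel ?sitelinks WHERE {{
  VALUES ?org {{ {values} }}
  ?org wikibase:sitelinks ?sitelinks .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?sitelinks)"""


def rank_by_sitelinks(results: dict) -> list[tuple[str, int]]:
    """Turn SPARQL JSON results into (QID, sitelink count), most-linked first."""
    ranked = [
        (row["org"]["value"].rsplit("/", 1)[-1], int(row["sitelinks"]["value"]))
        for row in results["results"]["bindings"]
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Sorting again on the client side keeps the ranking correct even if the results were fetched or merged without the `ORDER BY`.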

As of 4th Apr 2023

Kpjas (talk) 20:19, 4 April 2023 (UTC)

Interesting observations. I think "Articles without main subject" is especially important because:
  • "Main subject" is the main reason to have articles in WD, since WD is not an authoritative article source and doesn't have abstracts.
  • WD is flooded with articles about X, but the item X itself is missing. Example:
  • From your statistics, one might think that 1/3 of the articles have a main subject: not so bad, right? However, an article should typically have at least 5 to 10 subjects, and there's no assessment of whether those that have at least one have adequate subjects.
Vladimir Alexiev (talk) 07:01, 30 April 2023 (UTC)
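The missing assessment mentioned above could start from two numbers: the share of articles with no main subject, and, among those with at least one, the share below some minimum. The sketch below assumes the data has already been fetched (e.g. from a SPARQL query or a dump) into a hypothetical mapping from article QID to its main subject (P921) values; the default minimum of 5 is taken from the lower end of the 5 to 10 suggestion, not from any agreed threshold.

```python
def subject_coverage(subjects: dict[str, list[str]], min_subjects: int = 5) -> dict[str, float]:
    """Summarise main subject (P921) coverage for a set of articles.

    subjects maps an article QID to the QIDs of its main subjects.
    """
    total = len(subjects)
    if total == 0:
        return {"without_subject": 0.0, "under_annotated": 0.0}
    counts = [len(set(values)) for values in subjects.values()]  # ignore duplicate values
    without = sum(1 for c in counts if c == 0)
    with_some = [c for c in counts if c > 0]
    under = sum(1 for c in with_some if c < min_subjects)
    return {
        # share of all articles that have no main subject at all
        "without_subject": without / total,
        # share of the annotated articles that have fewer than min_subjects
        "under_annotated": under / len(with_some) if with_some else 0.0,
    }
```

Reporting the second figure only over annotated articles keeps it from being masked by the large number of articles with no subject at all.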
@Vladimir Alexiev AFAIK, the situation with providing scientific article items with main subject (P921) is as follows:
  • by hand: rather impractical
  • with a specific tool, namely QS (QuickStatements): carefully select scientific article items with phrases in their titles that we assume would make an adequate main subject, e.g. "BRCA1 mutation" or "Huntington's disease"
  • by other tools or bots like SourceMD [10]
  • PubMed metadata contain keywords and MeSH terms: why not pull these (copyright issue?)
  • lo and behold, ChatGPT is quite good at summarizing; perhaps it is also good at providing main subjects for scientific articles, huh?
Kpjas (talk) 19:38, 30 April 2023 (UTC)
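The title-phrase approach with QuickStatements can be sketched as follows. The phrase-to-item mapping is a hypothetical input that an editor would curate by hand (no subject QIDs are asserted here), and matching is case-insensitive on word boundaries. Output lines use the tab-separated QuickStatements V1 form. Title matching still produces false positives (e.g. a paper that merely mentions a disease), so the generated lines need the careful review described above before running them.

```python
import re


def main_subject_statements(articles: list[tuple[str, str]],
                            phrase_to_qid: dict[str, str]) -> list[str]:
    """Propose main subject (P921) statements from phrases found in titles.

    articles: (article QID, title) pairs.
    phrase_to_qid: curated phrase -> subject QID mapping (hypothetical input).
    Phrases that start or end with punctuation will not match, because
    the pattern relies on \\b word boundaries.
    """
    patterns = {
        phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in phrase_to_qid
    }
    lines = []
    for article_qid, title in articles:
        seen = set()  # avoid duplicate statements when phrases share a QID
        for phrase, pattern in patterns.items():
            subject = phrase_to_qid[phrase]
            if subject not in seen and pattern.search(title):
                seen.add(subject)
                lines.append(f"{article_qid}\tP921\t{subject}")
    return lines
```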
I asked GPT-4:
classify these PubMed papers with MESH descriptors and qualifiers.
(I provided title and abstract, but showing only title below)
1. Semantic Web in Healthcare: A Systematic Literature Review of Application, Research Gap, and Future Research Avenues
  • MESH Descriptors:
Semantic Web
Health Care
Systematic Reviews
Electronic Health Records
Medical Informatics
Decision Support Systems, Clinical
Information Management
Medical Informatics Applications
  • MESH Qualifiers:
organization & administration
2. Semantic Web Technologies for Sharing Clinical Information in Health Care Systems
  • MESH Descriptors:
Semantic Web
Health Care
Medical Informatics
Information Management
Ontologies (Information Science)
Electronic Health Records
Clinical Information Systems
Information Storage and Retrieval
  • MESH Qualifiers:
organization & administration
-- Vladimir Alexiev (talk) 07:17, 1 May 2023 (UTC)
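Whether the descriptors come from PubMed records or from a model like GPT-4, they would still need to be resolved to Wikidata items before becoming main subject (P921) statements. One way is to match on MeSH descriptor ID (P486) rather than on labels. The sketch below assumes the descriptor names have already been resolved to MeSH descriptor UIDs (the "D" plus digits identifiers) outside this code, and that model-suggested descriptors are reviewed by a person; the function names are hypothetical, and the parser reads the standard SPARQL JSON results format.

```python
import re

# MeSH descriptor UIDs are "D" followed by digits, e.g. "D000001".
_MESH_UID = re.compile(r"^D\d+$")


def mesh_items_query(descriptor_uids: list[str]) -> str:
    """Build a WDQS query finding items whose MeSH descriptor ID (P486) is listed."""
    bad = [u for u in descriptor_uids if not _MESH_UID.match(u)]
    if bad:
        raise ValueError(f"not MeSH descriptor UIDs: {bad}")
    values = " ".join(f'"{u}"' for u in descriptor_uids)
    return (
        "SELECT ?mesh ?item WHERE {\n"
        f"  VALUES ?mesh {{ {values} }}\n"
        "  ?item wdt:P486 ?mesh .\n"
        "}"
    )


def mesh_to_qid(results: dict) -> dict[str, str]:
    """Map each MeSH UID to the first Wikidata QID returned for it."""
    mapping: dict[str, str] = {}
    for row in results["results"]["bindings"]:
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        mapping.setdefault(row["mesh"]["value"], qid)
    return mapping
```

Keeping only the first QID per UID is a simplification; if several items share a MeSH ID, that usually signals a data issue worth checking by hand.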

Source reliability + assessment

I recently drafted Wikidata:WikiProject Source Reliability to capture efforts to annotate source entities with information related to their reliability. I'd love feedback on how to make that a useful complement to SourceMD. Sj (talk) 17:07, 25 April 2023 (UTC)

Please see property proposal: Wikidata:Property proposal/assessed source reliability. Harej (talk) 03:01, 26 April 2023 (UTC)
Nice, that seems the minimal property that could capture a range of different evaluations.
Mattsenate (talk) 13:11, 8 August 2014 (UTC)
KHammerstein (WMF) (talk) 13:15, 8 August 2014 (UTC)
Mitar (talk) 13:17, 8 August 2014 (UTC)
Mvolz (talk) 18:07, 8 August 2014 (UTC)
Daniel Mietchen (talk) 18:09, 8 August 2014 (UTC)
Merrilee (talk) 13:37, 9 August 2014 (UTC)
Pharos (talk) 14:09, 9 August 2014 (UTC)
DarTar (talk) 15:46, 9 August 2014 (UTC)
HLHJ (talk) 09:11, 11 August 2014 (UTC)
Blue Rasberry 18:02, 11 August 2014 (UTC)
JakobVoss (talk) 12:23, 20 August 2014 (UTC)
Finn Årup Nielsen (fnielsen) (talk) 02:06, 23 August 2014 (UTC)
Jodi.a.schneider (talk) 09:24, 25 August 2014 (UTC)
Abecker (talk) 23:35, 5 September 2014 (UTC)
Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:21, 24 October 2014 (UTC)
Mike Linksvayer (talk) 23:26, 18 October 2014 (UTC)
Kopiersperre (talk) 20:33, 20 October 2014 (UTC)
Jonathan Dugan (talk) 21:03, 20 October 2014 (UTC)
Hfordsa (talk) 19:26, 5 November 2014 (UTC)
Vladimir Alexiev (talk) 15:09, 23 January 2015 (UTC)
Runner1928 (talk) 03:25, 6 May 2015 (UTC)
Pete F (talk)
econterms (talk) 13:51, 19 August 2015 (UTC)
Sj (talk)
addshore 17:43, 18 January 2016 (UTC)
Bodhisattwa (talk) 16:08, 29 January 2016 (UTC)
Ainali (talk) 16:51, 29 January 2016 (UTC)
Shani Evenstein (talk) 21:29, 5 July 2018 (UTC)
Skim (talk) 07:17, 6 November 2018 (UTC)
PKM (talk) 23:19, 19 November 2018 (UTC)
Ocaasi (talk) 22:19, 29 November 2018 (UTC)
Trilotat Trilotat (talk) 15:43, 16 February 2019 (UTC)
Alessandra Boccone
Pablo Busatto (talk) 05:40, 23 June 2020 (UTC)
Blrtg1 (talk) 17:20, 23 July 2020 (UTC)
Kosboot (talk) 21:32, 23 July 2020 (UTC)
Matlin (talk) 09:38, 11 August 2020 (UTC)
Carrierudd (talk) 11:44, 3 November 2020 (UTC)
So9q (talk) 11:35, 16 January 2021 (UTC)
pdesai (talk) 16:00, 8 February 2021 (UTC)
 Donald Trung/徵國單  (討論 🀄) (方孔錢 💴) 18:43, 17 May 2021 (UTC)
Notified participants of WikiProject Source MetaData Sj (talk) 16:38, 28 April 2023 (UTC)

Finalize a rename to WikiProject WikiCite

Move Wikidata:WikiProject Source MetaData -> Wikidata:WikiProject WikiCite

Previously discussed and approved in 2017, with mostly support, some abstentions, and no opposition.

I called for comment, got it, then neglected to execute the move.

I am posting again to express my intent to do the rename soon, based on the past approval. I do not expect objections; in my view, the scope of activity here has always overlapped with WikiCite activities over the years.

Comments from anyone? Bluerasberry (talk) 17:59, 5 May 2023 (UTC)