Wikidata talk:WikiProject Source MetaData

New Tool for Creating Items from a Pubmed ID

Hi All, I made a tool to help create items for journal articles from a Pubmed ID. It uses WikidataIntegrator, which is a Python package created by User:Sebotic for creating bots and interacting with Wikidata. Check out the page here, and let me know any comments or suggestions. – The preceding unsigned comment was added by Gstupp (talk • contribs) at 20:50, 28 January 2017 (UTC).

Conflict-of-interest metadata

Some sources have serious conflicts of interest, which are not immediately obvious. For example:

Rippe, J. M; Angelopoulos, T. J (2016). "Sugars, obesity, and cardiovascular disease: Results from recent randomized control trials". European Journal of Nutrition. 55 (Suppl 2): 45–53. doi:10.1007/s00394-016-1257-2. PMC 5174142. Freely accessible. PMID 27418186.

This looks like it might be a solid medical review, and a good medical source. However, there is some information missing from this citation.

This isn't any old article in the w:European Journal of Nutrition. It says "Suppl"; it's actually from a "supplement sponsored by Rippe Health" (list of accessible COIs in European Journal of Nutrition supplements). Rippe Health is in turn sponsored by producers of sugary foods, among others, like the w:Corn Refiners Association (sic). The editor of the supplement is w:James M. Rippe, the founder and director of Rippe Health. Apparently the editor and the lead author are the same person.

James Rippe's COIs as an author are declared in the paper; there is no declaration of his COIs as an editor or as the director of the supplement funder that I can find. I can't find information about the funding sources or COIs of the European Journal of Nutrition, or its editorial staff (although the latter might be available via academic homepages, articles published by them, etc.).

To take another example, the American Journal of Clinical Nutrition (and the Journal of Nutrition) are run by the W:American Society for Nutrition. The ASN has received some criticism for industry funding; see W:American Society for Nutrition#Corporate relationship concerns and the list at w:Talk:American Society for Nutrition#Funding. The COIs of the editorial board of AJCN are declared here and summarized here.

These examples are expanded from w:Wikipedia talk:WikiProject Medicine#Sponsored supplement?.

The European Food Safety Authority runs a journal; see this project's influence on the use of this journal at Wikidata talk:WikiProject Source MetaData/Archive 3#Importing all articles from the EFSA journal. See the W:European Food Safety Authority#Criticism for third-party statements about the agency's COIs. I looked at the journal's website for COI information and found this page, which appears to state that the database is empty and the EFSA will tell you the COIs of its editors on request.

If most of the more standard COIs could be tracked automatically, and missing information flagged, it would make scholarly communications much more transparent. Wikimedia is a high-value target for shilling and misinformation, and finding truly independent sources can be difficult and time-consuming for editors. I think a pop-up COI details flag on references, for instance, would be great.

We have a start with Crossref funder ID. Does anyone have suggestions for other properties or approaches that would be useful? HLHJ (talk) 18:40, 26 March 2018 (UTC)

It seems as if supplement (Q2915731) with the properties sponsor (P859) and editor (P98) would be best. I'm not sure how one would indicate the relationship of a supplement to the journal that it is a supplement of, though. HLHJ (talk) 01:38, 7 April 2018 (UTC)
Tried putting it in Template:Bibliographic properties. Please let me know if I've messed it up. HLHJ (talk) 01:49, 7 April 2018 (UTC)
Example made at no label (Q56479527). Note that the editor, the supplement publication sponsor, and the lead author are the same person. Note the sponsorship, too. This paper is not an independent source, but it is currently still cited as a MEDRS in Wikipedia. (some text copied from self on WP:MED talk) HLHJ (talk) 18:45, 5 September 2018 (UTC)
It should have been at the pre-existing Sugars, obesity, and cardiovascular disease: results from recent randomized control trials. (Q37521442), as Daniel Mietchen pointed out. I've merged them and resolved all the duplicate data except the supplement (which actually has a name, Supplement on Sugar Consumption Controversy (Q56479539)). The supplement has both separate funding and a separate editor from the European Journal of Nutrition, and according to DGG would have been mailed to individual subscribers, but not to libraries (he categorizes it as a non-peer-reviewed ad). If the supplement does not have its own item, I'm not sure how to tag an individual article with an editor, especially as it would presumably clash with the editor of the EJN. Tagging an individual article with a sponsor makes sense, and I've done a related example at Dietary fats, carbohydrates and atherosclerotic vascular disease. (Q40050232), but here the sponsor sponsored the article, not the publication, which was apparently in ignorance of the sponsorship.
All the other papers from this supplement are already in Wikidata, too, although WhatamIdoing has recently removed them from en:Sugar:
What new properties do you think might be needed? I wrote a summary of some of the issues we might want to document at en:Conflicts of interest in academic publishing. Tags for journals' pledges to follow widely-recognised codes of conduct might be useful; the most recent version of the most common of those is Good publication practice in physiology 2017: Current Revisions of the Recommendations for the Conduct, Reporting, Editing and Publication of Scholarly Work in Medical Journals. (Q50061640). It seems to me that we need a way to tag papers with honorary author (Q42889533) and ghost author (Q43155099) (when documented), and a way of listing the declared or reliably reported institutional COIs of the journal.
I've added the consulting listed by the original paper's lead author in the COI declaration to his record as "employers". The same could be done for journal staff, and peer reviewers in the case of open peer review. I've probably got some of this wrong, please let me know what. HLHJ (talk) 19:35, 9 September 2018 (UTC)
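As a rough illustration of the kind of automatic flagging suggested above, here is a minimal sketch. It assumes the relevant statements (author, editor, sponsor) have already been fetched into plain Python dicts; the dict layout and the function name `coi_flags` are hypothetical, and the sample values are taken from the Rippe example in this thread.

```python
def coi_flags(article, venue):
    """Return simple warning strings for an article and the venue
    (journal or supplement) it appeared in.

    Both arguments are hypothetical plain dicts, not Wikidata API objects.
    """
    flags = []
    authors = set(article.get("authors", []))
    editors = set(venue.get("editors", []))

    # Someone who both wrote and edited the publication is worth a flag.
    for person in sorted(authors & editors):
        flags.append(f"author is also editor: {person}")

    # Surface any recorded sponsor of the venue.
    for sponsor in sorted(venue.get("sponsors", [])):
        flags.append(f"venue sponsored by: {sponsor}")

    # Missing information is flagged rather than silently ignored.
    if not editors:
        flags.append("editor information missing")
    return flags


article = {"authors": ["J. M. Rippe", "T. J. Angelopoulos"]}
supplement = {"editors": ["J. M. Rippe"], "sponsors": ["Rippe Health"]}

for flag in coi_flags(article, supplement):
    print(flag)
```

This only catches exact name overlaps; linking a sponsor to its director, as in the example above, would need extra statements on the sponsor's item.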

Books, editions, volumes, and exemplars

Wikidata:WikiProject Books allows different items to be created for a book (Q571) (i.e. the underlying work); a version, edition, or translation (Q3331189); a volume (Q1238720); or an individual book (Q53731850).

Presumably, the citation template should allow any of these to be cited, with the user free to specify a particular edition or volume or copy either by choosing a particular Q-number (which may or may not exist), or by specifying that parameter explicitly.

However, items for the more specific levels will not necessarily re-specify all the bibliographic information if it is the same as that for the parent level; e.g. an item for a copy would not usually repeat author/publisher/publication-date information specified for an edition.

Are the citation templates able (or will they be able) to supply the missing fields by tracing back up the hierarchy? Jheald (talk) 22:43, 18 May 2018 (UTC)

@Jheald: The problem is to define at which level each piece of data has to be stored. Currently the data model does not provide a clear overview of the different levels and of the related properties. Once this classification is done, it will be easy to develop a program to retrieve information from the correct level. I was starting to develop a single table covering everything for books (see Wikidata:WikiProject Books/Book data model), but recent discussions in the project didn't convince me to continue. Snipre (talk) 21:10, 15 August 2018 (UTC)
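To make the "tracing back up the hierarchy" idea concrete, here is a minimal sketch. It assumes each item is a plain dict with a hypothetical `parent` key pointing at the next level up (copy to edition to work); the item ids, field names and values are invented for illustration and do not reflect how any existing citation template works.

```python
# Hypothetical in-memory items; ids, keys and values are illustrative only.
ITEMS = {
    "work": {"author": "A. Author", "title": "Example Work"},
    "edition": {"parent": "work", "publisher": "Example Press",
                "publication_date": "1851"},
    "copy": {"parent": "edition", "collection": "Example Library"},
}


def resolve_field(item_id, field, items=ITEMS):
    """Return `field` from the item, or from its nearest ancestor that has it."""
    seen = set()  # guards against accidental parent loops
    current = item_id
    while current is not None and current not in seen:
        seen.add(current)
        item = items.get(current, {})
        if field in item:
            return item[field]
        current = item.get("parent")
    return None  # no level in the chain supplies this field


print(resolve_field("copy", "publisher"))  # found on the edition
print(resolve_field("copy", "author"))     # found on the work
```

The lookup only works once it is agreed which level owns which property, which is the classification problem described above.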

Multiple versions of the same statement

An item can contain multiple versions of essentially the same statement if different information is contributed in qualifiers like stated as (P1932); see e.g. the authorship at The history and description of the county of Salop (Q29572671).

Can the citation templates condense multiple occurrences, if they have the same value? Jheald (talk) 20:06, 19 May 2018 (UTC)
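A minimal sketch of what such condensing could look like, assuming the statements for one property have already been reduced to dicts with a `value` and an optional `stated_as` qualifier. The dict layout, the function name and the sample values are hypothetical.

```python
def condense_statements(statements):
    """Merge statements that share the same value, keeping first-seen order
    and collecting the distinct 'stated as' strings for each value."""
    merged = {}  # dicts preserve insertion order in Python 3.7+
    for statement in statements:
        value = statement["value"]
        entry = merged.setdefault(value, {"value": value, "stated_as": []})
        label = statement.get("stated_as")
        if label and label not in entry["stated_as"]:
            entry["stated_as"].append(label)
    return list(merged.values())


# Invented example: the same author item stated two different ways.
authors = [
    {"value": "Q_AUTHOR_1", "stated_as": "J. Smith"},
    {"value": "Q_AUTHOR_1", "stated_as": "John Smith, Esq."},
    {"value": "Q_AUTHOR_2"},
]

for entry in condense_statements(authors):
    print(entry)
```

A template would then still have to choose which of the collected name strings to display.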

Identifying duplicate items without duplicate IDs

A while ago, for a separate project, I put up all articles from the Royal Society Biographical memoirs and associated titles, with a tracking page at User:Andrew Gray/Royal Society biographies. I looked it over recently and realised that it's proving to be a useful way of seeing how much duplicate uploading we have going on. I've merged a few but left others up for now as a demonstration, e.g. no label (Q52399202) and Franz Bergel. 13 February 1900-1 January 1987 (Q47480577).

A couple of things are worth noting:

  • This is likely to be more common for older papers in a pre-universal-DOI era - in most cases here they're being imported from Pubmed, which doesn't always have DOIs for older papers, and a DOI-based import won't have Pubmed IDs, so there are no overlapping identifiers.
  • The slight metadata differences may make purely automated matching a bit challenging - note different title punctuation, different author string punctuation (sometimes full name in one and initials in another), different publication date (the Pubmed imports seem to be inferring 1/1/xx for year only data?), different approaches to counting pagination, and sometimes discrepancies with issue numbers (presumably because this journal didn't systematically use issue numbering). All of these are "obviously the same" to a human reader but might cause difficulties for a script.

None of this is a massive problem, of course (I estimate ~10 duplicate records for that title last month, and that was high), but it's something to be aware of and I thought I'd flag it up. I don't know what if anything at the moment is being done to catch and merge multiple uploads. Andrew Gray (talk) 08:42, 3 October 2018 (UTC)
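One possible way to surface candidates like these is to compare titles after stripping punctuation and case, and to compare only the publication year, since the day and month may have been inferred. A minimal sketch follows; the record tuples and ids are hypothetical, and the second title variant is invented to show a punctuation difference.

```python
import re
import unicodedata


def normalize_title(title):
    """Lower-case a title and reduce it to letters and digits separated by spaces."""
    text = unicodedata.normalize("NFKD", title)
    # Drop combining accents so that accented letters match unaccented ones.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.casefold())
    return " ".join(text.split())


def candidate_duplicates(records):
    """Group (item_id, title, year) records that share a normalized title and year."""
    groups = {}
    for item_id, title, year in records:
        groups.setdefault((normalize_title(title), year), []).append(item_id)
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    ("Q_A", "Franz Bergel. 13 February 1900-1 January 1987", 1988),
    ("Q_B", "Franz Bergel, 13 February 1900 – 1 January 1987", 1988),
    ("Q_C", "Some other memoir", 1988),
]

print(candidate_duplicates(records))
```

This would only produce a list for human review; differences in author strings, pagination or issue numbers are ignored here, and an unrelated pair with identical titles and years would also be flagged.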

Where do I see the record of Source M.D.?

I ran a batch, but see no evidence that it worked. I cannot find anything about the ISBNs that I listed. I cannot find the books that it might have created based on those ISBNs. I see nothing that indicates the effort produced anything. Where should I look? If the "batch created" is relevant, it was 20181010142459. Thank you. -Trilotat (talk) 19:42, 10 October 2018 (UTC)