Wikidata:ScienceSource project/Focus list, Coronavirus
Using the property on focus list of Wikimedia project (P5008) with a qualifier, article items related to COVID-19 (Q84263196) are now being added to the ScienceSource focus list. One example is The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application (Q87647368).
As of April 2020, the idea is to tag the items for articles used as references in Wikipedia articles about COVID-19. The initial purpose is just to improve the article items by adding statements, which can be of a number of kinds. There is also scope for improving the items about the journals in which the articles are published, and the items about the publishers of those journals.
The pandemic means that Wikidata has to deal with a great deal of biomedical literature about coronavirus topics. Newly-created article items may well be sparse, and missing even basic identifiers. Material at PubMed will be updated over time, so an initial creation cannot be very full, and there is reason to go back later. The way citations link articles obviously evolves over time. The focus list is a rather simple device to enable some tracking, in a changing situation.
At the main WikiProject
Wikidata:WikiProject COVID-19 has its own focus list, and you can read it with this query:
# Focus list of the Wikidata WikiProject COVID-19
SELECT ?item ?itemLabel
WHERE {
  ?item wdt:P5008 wd:Q87748614.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
For the context see Wikidata:WikiProject COVID-19/Focus list.
Missing articles
Checking Wikipedia articles related to coronavirus is a way to find cited articles that do not yet have Wikidata items. Currently these include the following, all cited in w:Coronavirus disease 2019:
- Clinical Characteristics of SARS-CoV-2 Infected Pneumonia with Diarrhea, DOI 10.2139/ssrn.3546120
- Smell and taste dysfunction in patients with COVID-19, DOI 10.1016/S1473-3099(20)30293-0, PMC 7159875
- New SARS-like virus in China triggers alarm, DOI 10.1126/science.367.6475.234, PMID 31949058
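Before creating an item, it is worth checking whether one already exists for the DOI. The query below is a minimal sketch of such a check for the three DOIs listed above. It assumes that DOI (P356) values are stored in upper case on Wikidata, which is why the strings are passed through UCASE; the variable names are illustrative only.

```sparql
# Look up existing items for a handful of DOIs.
# Assumption: DOI (P356) values are stored in upper case, hence UCASE().
SELECT ?doi ?item ?itemLabel
WHERE {
  VALUES ?rawDoi {
    "10.2139/ssrn.3546120"
    "10.1016/S1473-3099(20)30293-0"
    "10.1126/science.367.6475.234"
  }
  BIND(UCASE(?rawDoi) AS ?doi)
  # OPTIONAL keeps a row for each DOI even when no item matches.
  OPTIONAL { ?item wdt:P356 ?doi. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

A row with an empty ?item column suggests that the article still needs an item, though a search by title or PMID is a sensible second check in case the existing item lacks its DOI.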
Viewing the list
This simplified query shows the 25 most recently published scholarly articles on the focus list, in reverse chronological order:
SELECT ?item ?itemLabel ?date
WHERE {
  ?item wdt:P31 wd:Q13442814;
        wdt:P5008 wd:Q55439927;
        wdt:P577 ?date.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?date)
LIMIT 25
This version brings up the main subject (P921) statements:
# Display of focus list papers from 2020 onwards, with subjects where they have been added.
SELECT ?item ?itemLabel ?date ?subjectLabel
WHERE {
  ?item wdt:P5008 wd:Q55439927;
        wdt:P577 ?date.
  OPTIONAL { ?item wdt:P921 ?subject. }
  FILTER(year(?date) >= 2020)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?date)
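Since newly-created article items may be missing even basic identifiers, a further variant can pick out the focus list items that need attention. The following is a sketch only, built on the same focus list value (Q55439927) as the queries above; it lists scholarly articles lacking at least one of DOI (P356), PubMed ID (P698) and PMCID (P932).

```sparql
# Focus list articles that lack one or more of the basic identifiers.
SELECT ?item ?itemLabel ?doi ?pmid ?pmcid
WHERE {
  ?item wdt:P5008 wd:Q55439927;
        wdt:P31 wd:Q13442814.
  OPTIONAL { ?item wdt:P356 ?doi. }
  OPTIONAL { ?item wdt:P698 ?pmid. }
  OPTIONAL { ?item wdt:P932 ?pmcid. }
  # Keep only the items where at least one identifier is absent.
  FILTER(!BOUND(?doi) || !BOUND(?pmid) || !BOUND(?pmcid))
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

An empty column shows which identifier is missing. Not every article has a PMCID, so an empty ?pmcid is not necessarily a gap that can be filled.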
Relevant types of statement
- Basic identifiers. Any of DOI (P356), PubMed ID (P698) and PMCID (P932) may be missing from the Wikidata item.
- main subject (P921) statements. From the point of view of this project, these should be referenced (to the article or PubMed); referenced main topics from PubMed that have a MeSH descriptor ID (P486) are preferred. COVID-19 (Q84263196) will appear in PubMed as a supplementary topic, and of course can be added. MeSH topics may not be available, because the review for PubMed may not have happened yet (or, indeed, the article may not be indexed by PubMed). Author-generated keywords have disadvantages: (i) they are not from a controlled vocabulary like MeSH, so their meaning may be vaguer; (ii) they may not currently have a Wikidata item, while just about all MeSH terms do (and in harder cases the item can be found via https://meshb.nlm.nih.gov/ and https://tools.wmflabs.org/wikidata-todo/resolver.php with 486 in the first field); (iii) they are relatively random, may use domain-specific abbreviations, and so on. Keywords may therefore be used with discretion, with the qualifier statement object has role (P3831) subject heading (Q1128340) and a reference.
- full work available at URL (P953) statements. Texts may appear on PubMed Central or on publishers' sites, and because of the pandemic much more of the literature than usual can currently be read, including "closed access" papers and journals. PubMed gives such links, which may not last for ever, of course.
- copyright license (P275) statements. Where the article has a Creative Commons license, it can be entered here. One advantage of full work available at URL (P953) statements that link to PubMed Central is that its pages give license information in the header.
- instance of (P31) statements. There are at least ten kinds to add. Every relevant item should be an instance of scholarly article (Q13442814). Further classification, such as may appear in PubMed, bears directly on an article's value as a reference: examples are review article (Q7318358), systematic review (Q1504425) and meta-analysis (Q815382). For "systematic review" in particular, being described that way by PubMed is an endorsement of method. A case report (Q2782326) ranks lower on the scale of clinical evidence. All these statements should be added, if they can be referenced. (There may be points to discuss about data modelling, but the value in Wikipedia terms is evident.) There are also descriptive statements, such as editorial (Q871232), correspondence (Q1784733) ("Letter" in PubMed), comment (Q58897583), erratum (Q1348305), retracted scholarly article (Q77253277) and retraction notice (Q7316896), the last three with qualifiers. These show the scientific debate as it develops.
- cites work (P2860). Ideally one wants to go both backwards and forwards: to see where an article item is cited, one can use a simple SPARQL query (performance permitting), or use "What links here" with due caution.
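For the "forwards" direction of cites work (P2860), the simple query mentioned in the last point might look like the sketch below. It uses the example article from the introduction (Q87647368) as the cited work; any other article item can be substituted, and the LIMIT is an arbitrary choice made to keep the query light.

```sparql
# Items that cite a given article, newest first.
SELECT ?citing ?citingLabel ?date
WHERE {
  ?citing wdt:P2860 wd:Q87647368.
  # The publication date is optional, so undated citing items are still listed.
  OPTIONAL { ?citing wdt:P577 ?date. }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?date)
LIMIT 100
```

Swapping the subject and object of the triple (wd:Q87647368 wdt:P2860 ?cited) gives the "backwards" direction, that is, the works the article itself cites. Either result reflects only the citation statements currently on Wikidata, which may be incomplete.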
It is also quite possible that clinical trial (Q30612) items could usefully be brought in, though it is less obvious how.