Wikidata:ScienceSource project/MEDRS report
This is a report on the MEDRS algorithm developed by ScienceSource. WikiMed in this report stands for the medical editing community on the English Wikipedia. MEDRS means the guideline used on English Wikipedia to determine which "reliable sources" are acceptable for referencing health information: it is more stringent than the basic w:WP:RS guideline. Automation of MEDRS throws up numerous questions, and the details below supply both data-led answers and introductions to debates. WikiMed proceeds by relying on informed, individual judgement calls.
The SPARQL code in the third section at the least provides a uniform approach, and raises issues about how Wikidata could fill the current gaps in its medical bibliography. It is applied in different ways, and an understanding of its basic structure would allow users to modify it. At present, the state of the "debates" means a definitive algorithm is some way off.
w:WP:MEDRS is a significant guideline page, regulating the citations that support health information on the English Wikipedia. This report assumes familiarity with its typical workings. MEDRS goes rather further than w:Wikipedia:Scientific citation guidelines. Where w:Wikipedia:Reliable_sources#Medical_claims mentions also "textbooks ... medical guidelines and position statements", what is addressed here is the use of biomedical journal articles. These outnumber other types of source.
In this area, differences between scientific research and medical knowledge become apparent. The onus is put on the secondary biomedical literature. The aim is to identify the right parts of that literature, solving a bibliographical question. Doing that cannot, by itself, prevent misuse. It can be helpful to think of the citation question for Wikimedia's health information as a "dirty data" issue. The lay approach can accept misinterpretation of the text, but also misunderstanding of the sources. What tends to happen is adversarial source criticism, first. What goes on in ScienceSource homes in on finding data useful for cleaning up the stream of "dirty data", stepwise. In a similar fashion, what makes for a proper apprehension of the import of biomedical literature can be seen as a concentric process, not just gatekeeping at a single gate.
The algorithm described below has been developed in response to quite a number of "nudges" in planning, and the solutions that were found are sometimes placed in this report's footnotes, to keep things readable. The code section shows what is essentially the same idea used in half-a-dozen ways. There is a basic structure, but the inputs, outputs and lists (information built into the code but curated in its own right) can be modified at will. It should be noted that dates, which play a key role, are only used in the form of calendar years. Any greater stretching for accuracy would miss the point of how MEDRS is applied in practice, which is flexibly.
Medical publishing in general
Much of the input into determining the value of citations to medical literature comes from detailed knowledge of the article, journal and publisher in question. The complexity of the issues can be seen with the publishers, with over 50 types of body being relevant.[1]
There is also complexity, and hazard, in "access status". There is no "access status" property on Wikidata, which would help here, and some experts recommend that there should be. This is an area in which missing information still makes Wikidata somewhat harder to use than it should be.[2] One recommendation in this direction is that the RoMEO publisher ID (P6617) property should be given a companion property for the identifiers on that site for journals (via ISSNs). It is a most helpful site for the fine details of open access (OA). The Directory of Open Access Journals (DOAJ), which is more concerned with editorial process, rather than what can be read when, is central to what goes on.
Working assumptions and overview
To be algorithmic is to work with a definite set of inputs. The major assumption is that the main inputs can be stored on Wikidata (or if not there, in future on the ScienceSource wiki).
The algorithms we are talking about here are expressed in SPARQL. Essentially the same code can then be used in a number of different "modes". From the development point of view, it has been easiest to run the code on the project focus list (see WD:SSFL), a matter of adding one line concerned with on focus list of Wikimedia project (P5008). But there are now performance issues in using the whole list at once.
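As a minimal sketch of the "one line" restriction, the query below lists a small sample of review articles on the focus list. It assumes the prefixes predefined by the Wikidata query service, and reuses the focus list item (Q55439927) and review article class (Q7318358) that appear in the working code of this report. The LIMIT value is an arbitrary choice made here to keep the run time short.

```sparql
#Sketch only: sample of review articles on the ScienceSource focus list
SELECT ?article ?articleLabel
WHERE
{
  ?article wdt:P5008 wd:Q55439927; #the one line restricting to the focus list
           wdt:P31 wd:Q7318358.    #tagged as a review article
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
LIMIT 50 #arbitrary cap, chosen for performance
```

Removing the P5008 line widens the query to all review articles on Wikidata, which is where the performance issues begin.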
Modes of use
Of various possible ways in which the MEDRS code can be run, the examples shown below are:
- Run a "VALUES" query with a set of article items selected from the focus list.
- Run a focus list query, but restrict the topic.
- "Singleton" mode, with a single article item, used to test one paper.
- Batch derived from a list of identifiers, with a Wikipedia or other external source. The example given below uses PubMed publication ID (P698). Batches of DOIs are certainly also possible, if one is aware of the case-sensitivity issue (UCASE in SPARQL would be useful).
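The DOI case mentioned in the last mode might be sketched as follows. The two DOI strings are hypothetical placeholders, not real identifiers, and the query assumes that DOI (P356) values are held in upper case on Wikidata, which is the usual convention there.

```sparql
#Sketch only: batch lookup by DOI, normalising case first
SELECT ?doi ?article
WHERE
{
  VALUES ?rawdoi { "10.1000/example.one" "10.1000/Example.Two" } #hypothetical placeholder DOIs
  BIND(UCASE(?rawdoi) AS ?doi) #match the upper-case form assumed to be stored on Wikidata
  ?article wdt:P356 ?doi.
}
```

The resulting ?article values can then be fed into a "VALUES" query of the kind shown in the working code.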
At present federation is only playing a supporting role, but that might change in future.
What is actually practical depends on performance, given the 60 second limit on queries. Running the whole focus list might take 5 minutes, without some further optimisation.
Fundamental assumptions
These are the fundamental positive and negative assumptions about the statements on an article item to qualify for MEDRS status:
Property | Type |
title | positive |
publication date | positive |
journal | positive |
publisher on the journal item | positive |
NLM Unique ID (P1055) on the journal item | positive |
retracted by (P5824) | negative |
instance of (P31) retracted paper | negative |
So the journals involved should all have publisher items here. Publisher items for journals used by the focus list were checked and provided by work done for the project: it is not automatic here. This effort built on work in 2018 by John Cummings and Navino Evans.
For the positive criteria, missing information is a reason to filter out the item: for the negative criteria, the presence of a retraction statement is enough to rule out the item for MEDRS purposes. Other criteria used (Directory of Open Access Journals ID (P5115), possibly GARD rare disease ID (P4317) with some further work) are "optional", leading to case analyses.
The algorithm also operates on the publication date, and the credibility that should be given to open-access journals.
Lists used within the algorithm
"Blacklists" and "whitelists" are used. These mean lists of data held in the algorithm itself. At a fundamental level, it is naturally better to replace such lists by explicit criteria, a "counsel of perfection". MEDRS itself subscribes to principles of Evidence-Based Medicine/Practice (EBM/EBP). Any attempt at putting it on a formal footing will begin with a mixture of data, rules and appeals to types of authority. Then bearing down on the appeals should have as high a priority as checking the data, and considering the usefulness of the rules.
The emphasis here is on getting hold of basic data that can be stored on Wikidata. The current lists used are:
- List 1. Deprecated reviews (article blacklist, currently placeholder)[3]
- List 2. Neglected diseases (topical whitelist)[4]
- List 3. OA exceptions (journal whitelist)[5]
- List 4. Beall's list (publisher blacklist)[6]
See commentary below. The first "working code" example shows the lists as they occur in the SPARQL.
Consistent with the practice around MEDRS would be a whitelist of articles, absolute exceptions to the MEDDATE time window that sets the "best before" date for articles. No such list is involved right now, because there was no relevant case that became apparent with the focus list as compiled. It is possible that a whitelist of closed access journals will be needed, to make the algorithm work properly outside the open access sector to which it currently applies. The rare diseases issue mentioned below may be conveniently handled by an exception list for diseases.
Other assumptions
ScienceSource operates with papers published under certain Creative Commons licenses, and the most pressing and immediate problem early in the project was to import that information. Later, scaling up the focus list showed up missing information about the journals involved.
A working assumption, though, is that false negatives are less serious than false positives. There are two faces, at least, of this attitude: to be conservative in what the algorithm accepts as a reliable reference; and to admit that sources will be rejected for missing inputs. It is helpful and indeed a healthy state of affairs if users identify missing data in the system as a whole; and fixing an unwanted rejection by adding information is a one-for-all solution for that particular case. The addition might be on Wikidata, or there might be a change to a blacklist or whitelist. The Lyme disease example at the end, which reaches outside the focus list, shows up all these points.
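One way to find the missing inputs behind an unwanted rejection is a diagnostic query that reports, for each article in a batch, which of the positive inputs are present. The sketch below is an illustration written for this purpose, not part of the MEDRS code itself. The two article items are taken from the batch query in this report, the variable names are chosen here, and the prefixes predefined by the Wikidata query service are assumed.

```sparql
#Sketch only: report which positive inputs exist for each article in a batch
SELECT DISTINCT ?article ?hasTitle ?hasDate ?hasJournal ?hasPublisher ?hasNLM
WHERE
{
  VALUES ?article { wd:Q60919743 wd:Q61445426 } #items from the batch query in this report
  OPTIONAL { ?article wdt:P1476 ?title. }
  OPTIONAL { ?article wdt:P577 ?date. }
  OPTIONAL { ?article wdt:P1433 ?journal. }
  #Paths start from ?article, so nothing is matched against an unbound ?journal
  OPTIONAL { ?article wdt:P1433/wdt:P123 ?publisher. }
  OPTIONAL { ?article wdt:P1433/wdt:P1055 ?nlm. }
  BIND(BOUND(?title) AS ?hasTitle)
  BIND(BOUND(?date) AS ?hasDate)
  BIND(BOUND(?journal) AS ?hasJournal)
  BIND(BOUND(?publisher) AS ?hasPublisher)
  BIND(BOUND(?nlm) AS ?hasNLM)
}
```

A "false" in any column points to the statement that would need adding on Wikidata before the article could pass.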
Incremental route to a "universal" algorithm
It is theoretically possible to run a MEDRS algorithm on all Wikidata's article items, but currently that is impractical for performance reasons, at least through the regular SPARQL endpoint.
Running such an algorithm somehow, and posting the results in a fashion that allowed lookup, would "solve" the MEDRS issue, to the extent that Wikidata also had items on all of the relevant articles. Time-dependence (MEDDATE) means that the data is anyway a moving target.
What is now possible is, firstly, to repurpose the focus list, so that the tagging indicates that the article items on it corresponded to sources passing MEDRS, as tested by the current version of the algorithm. Secondly, to build up and maintain the list, particularly by including the "closed access" literature (e.g. Cochrane reviews). This would be an incremental process, driven by additions of metadata here, and removals from the focus list (a) because MEDDATE in its qualified version applied to publication dates indicated the "staleness" of a source, and (b) because the algorithm and its list inputs had been tweaked.
A definitive algorithm is unlikely to emerge, until the MEDREV issue mentioned below is closer to an open-data solution. What Wikidata can offer at present is a curated set approach, where inclusion is gradually moderated to match the citation needs of Wikipedia. This pursuit is of m:Knowledge Integrity on the grand scale. At present, the MEDRS case looks quite disjunctive (requiring exhaustive case analyses), and technically punishing (operations on sets of items in the tens of millions, that are moving targets), while requiring inputs from discriminating judges. But not actually out of reach.
One does one's best. There are numerous facets to the quality issue within MEDRS. MEDDATE and MEDREV refer to sections in w:WP:MEDRS. See also MEDASSESS. The current algorithm does not capture all the content of the guideline.
Journals and "closed access"
The publication charges for open access publishing raise well-known issues on quality. Since “open access” articles form a fragmented body of works, it is not straightforward to decide how to handle them. The Directory of Open Access Journals (DOAJ) sets editorial standards for open access journals, and constantly expands its approved list. DOAJ ID statements with P5115, having the starting date for DOAJ approval, work well here.
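The starting date could in principle be compared with the publication date. The sketch below is a possible refinement, not something the working code in this report does: that code only tests for the presence of a DOAJ ID. It assumes the approval date is recorded as a start time (P580) qualifier on the P5115 statement, and it compares calendar years only, in line with the rest of the algorithm. The two article items are taken from the batch query in this report.

```sparql
#Sketch only: was the journal DOAJ-approved by the year of publication?
SELECT DISTINCT ?article ?journal ?ydate ?doajyear ?approved
WHERE
{
  VALUES ?article { wd:Q60919743 wd:Q61445426 } #items from the batch query in this report
  ?article wdt:P1433 ?journal;
           wdt:P577 ?date.
  ?journal p:P5115 ?doajstatement.
  OPTIONAL { ?doajstatement pq:P580 ?start. } #assumed qualifier holding the approval date
  BIND(year(?date) AS ?ydate)
  BIND(year(?start) AS ?doajyear)
  #With no start date the comparison fails, and COALESCE falls back to false
  BIND(COALESCE(?doajyear <= ?ydate, false) AS ?approved)
}
```

Treating a missing start date as "not approved" is the conservative choice, matching the view that false negatives are less serious than false positives.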
Some leading OA publications (e.g. BMJ and WHO bulletins) are not DOAJ-approved. So there is a whitelist. See Wikidata:ScienceSource project/Focus list open access journals for full details.
As a general rule, "closed access" publication comes in the general form of "hybrid open access", a publishing model meaning some authors pay for publication as OA, while other authors agree that subscriptions will finance their papers. The assumption being that the review process for acceptance is the same in both cases, there is less need to scrutinise the hybrid journals.
There is certainly a view, however, that some journals can be excluded a priori for low w:impact factor, in other words by a calculation from w:bibliometrics. There are troubling points about this:
- It is not clear that "low" can be defined in an absolute way, rather than relative to some idea of what is normal for a given medical subfield.
- This kind of exclusion likely makes the "neglected disease" phenomenon worse, and risks being no better than guilt by association.
- The argument that certain fringe papers can be kept at bay as sources, in this way, shows up the weakness traditionally called out as "w:hard cases make bad law".
Journal and paper blacklists can function instead as precedents, which can be annotated with rationales. An alternative and simple approach would be to look at Danish Bibliometric Research Indicator level (P1240), which simply divides journals into levels 1 and 2, with level 2 superior to level 1. The JUFO ID (P1277) has a similar system with three levels, not stored on Wikidata though. This "Scandinavian" approach gives simple advice. Norwegian Register journal ID (P1270), not yet much used on journal items, does some ranking. If there is a consensus view on a way to apply bibliometric data consistently, from a definite source, then the point can be revisited, and journals filtered.
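As a rough sketch of how the Danish levels could be inspected, the query below shows the P1240 value, where one exists, for three journals taken from the OA whitelist in this report. The commented-out filter is an assumption about how a "level 2 only" rule might be written; STR() is used so that the comparison does not depend on the datatype of the stored value.

```sparql
#Sketch only: display the Danish Bibliometric Research Indicator level of some journals
SELECT ?journal ?journalLabel ?level
WHERE
{
  VALUES ?journal { wd:Q546003 wd:Q2928049 wd:Q5030320 } #journals from the OA whitelist in this report
  OPTIONAL { ?journal wdt:P1240 ?level. }
  #FILTER(STR(?level) = "2") #uncomment to keep only journals at the higher level
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

With the filter switched on, journals lacking a P1240 statement drop out as well, so coverage of the property would need checking before any such rule was adopted.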
Wikidata would benefit in the longer term from an "indexed by" property, to record in which repositories a given journal is indexed. Accumulating that kind of information here is more in the Wikidata spirit, as would be computations from citation data stored here (which would be open) as opposed to taking numbers seriously that depend on closed data.
Information about retracted articles, retraction notices and “expressions of concern” is available on PubMed. With the ScienceSource metadata tool, some articles with Wikidata items have been tagged as retracted and reviews.
Evidently retracted articles should be excluded. They may be found in one of two ways:
- Statements instance of (P31) retracted paper (Q45182324).
- Statements retracted by (P5824) with object a retraction notice.
While the second is the better way, it requires the creation here of an item for the retraction notice. The presence of the retraction notice cannot just be assumed, so it is safer to exclude on both grounds, as is done in the MINUS section of the working code.
Given the extensive citation data now on Wikidata, it would be possible also to exclude articles citing retracted reviews. That would currently amount to about 100 suspect articles, a number that will grow (PubMed knows of about 6K retracted articles). For this set of articles, a flag has been raised, and that means there is an issue “blacklist or whitelist”: should there be an assumption guilty until proven innocent, or the other way round? MEDRS seems not to supply an answer.
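A query of roughly the following shape would list the suspect articles. It is a sketch under stated assumptions: it relies on cites work (P2860) statements being present, it finds retracted reviews only by the instance of (P31) route, and the LIMIT is an arbitrary cap added here for performance.

```sparql
#Sketch only: articles citing a review that is marked as retracted
SELECT DISTINCT ?article ?retracted
WHERE
{
  ?retracted wdt:P31 wd:Q45182324; #retracted paper
             wdt:P31 wd:Q7318358.  #review article
  ?article wdt:P2860 ?retracted.   #cites work
}
LIMIT 200 #arbitrary cap
```

Whether the results go onto a blacklist, or are merely flagged for human inspection, is the open question described above.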
w:User:RetractionBot is working in this area, on Wikipedia.
MEDDATE: Handling of "neglected diseases"
Not all neglected diseases are "neglected tropical diseases" (NTDs), but the most flagrant cases of neglect mostly fall into that category.
WikiMed practice is to be lenient in requirements for referencing key material on areas relatively neglected in the biomedical literature. To implement this type of approach requires specific rules, which is easy for the five-year "best before" date applied to review articles, which can be relaxed to eight years, say.
To define "neglected" is another matter. The project is adopting a list from the editorial policy of a journal, PLoS Neglected Tropical Diseases (Q3359737). It is more inclusive than the corresponding WHO list.
MEDDATE case study: handling of rare diseases
While rare diseases tend to be neglected, they are not usually found on NTD lists, because neglected diseases that are common in certain environments have to take priority. Their research literature does tend to be sparse. A "best before" date ten years after publication seems justified. Definition by GARD rare disease ID (P4317) is feasible. The same mechanism with an OPTIONAL clause as is currently used for the neglected disease list can apply.
On the other hand, some quite common diseases (diabetes, Alzheimer's) are included if P4317 is employed. A list of exclusions would have to be compiled, to make proper use of the information in the GARD system.
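The rare disease mechanism might be sketched as below. This is an illustration only: the ten-year figure is the one suggested above, the two article items are taken from the batch query in this report, and the two excluded items are assumed to be the Wikidata items for Alzheimer's disease and diabetes mellitus, which should be verified before any real use.

```sparql
#Sketch only: ten-year MEDDATE window for rare diseases, with an exclusion list
SELECT DISTINCT ?article ?ydate ?rarebound ?medrsyear
WHERE
{
  VALUES ?article { wd:Q60919743 wd:Q61445426 } #items from the batch query in this report
  ?article wdt:P577 ?date.
  OPTIONAL {
    ?article wdt:P921 ?rare.
    ?rare wdt:P4317 [ ]. #main subject has a GARD rare disease ID
    #Exclusions: QIDs assumed to be Alzheimer's disease and diabetes mellitus
    FILTER(?rare NOT IN (wd:Q11081, wd:Q12206))
  }
  BIND(BOUND(?rare) AS ?rarebound)
  BIND(year(?date) AS ?ydate)
  BIND(IF(?rarebound, ?ydate+10, ?ydate+5) AS ?medrsyear)
}
```

A curated exclusion list would in practice be longer, and could be held as a VALUES list in the same way as the other lists in the algorithm.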
MEDREV: What is a review?
The prototype algorithm is based on PubMed's tagging of articles as reviews. The MEDRS nutshell speaks of both literature reviews and systematic reviews, while the main discussion in the guideline is closer to an EBM approach. Currently the algorithm trusts PubMed, but has a blacklist for "deprecated reviews".
Good practice in the production of systematic reviews is illuminating for the MEDRS algorithm, in the broader ScienceSource context. There has to be a selection of literature relevant to the "clinical question" at issue; it is then whittled down by a principled, repeatable set of criteria representing expert judgement; and the actual content of articles examined critically by humans. Replacing the question by the MEDDATE issue, and the examination of content by the ScienceSource review stage, that would amount to an overall description of the pipeline. The quality achieved, in both cases, depends on what is left tacit in the top-down view.
ScienceSource makes a clear distinction between data inputs, relying on Wikidata content, and compiled lists. The structure of the algorithm really just provides a suitable framework for bringing this information to bear, made as complete and authoritative as it can be.
Best references: Reproducibility and systematic reviews
There is a debating point associated with MEDRS: good references, or "the best" references? In terms of systematic reviews, there is more of a chance of objectivity in judging the "best" ones. Reproducibility of the survey protocols is a recognised quality indicator in systematic reviews.[7] When a survey is repeated with the same protocols, applied to more recent literature, the resulting systematic review can properly be said to supersede the previous one.
This criterion is not primarily topical, because it depends on the protocols being described in a close to formal way. It may result in the scope of the new review being more inclusive, but that is not perhaps a given. The question addressed should be the same.
In principle the information "review A supersedes review B" can be stored in a machine-readable way. It is unlikely that Wikidata would want to host it, unless the statement was objective. Statements of this kind involving judgements could be stored on the ScienceSource wiki, for example, and used in a MEDRS algorithm.
Working code
The same essential piece of SPARQL can be used in different contexts. It may require some minor adaptations, and for better performance some testing can be assumed. (For example, conditions need not be imposed on publisher items in the query itself, when they can be tested beforehand.)
Working batch query
#Uses 40 most recent articles from the focus list as batch
#Demo showing neglected disease impact on MEDDATE
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
WHERE
{
VALUES ?article
{wd:Q60919743 wd:Q61445426 wd:Q61796242 wd:Q61796245 wd:Q61796826
wd:Q61797172 wd:Q61797316 wd:Q61797563 wd:Q61797782 wd:Q61798333
wd:Q61798848 wd:Q61798909 wd:Q61798960 wd:Q61799857 wd:Q61801109
wd:Q60311053 wd:Q61795885 wd:Q61802156 wd:Q61810971 wd:Q62491494
wd:Q61804762 wd:Q61805331 wd:Q61442848 wd:Q61443180 wd:Q61443288
wd:Q61811251 wd:Q61811294 wd:Q61445837 wd:Q61446085 wd:Q61446089
wd:Q61446093 wd:Q61812282 wd:Q61812472 wd:Q61446120 wd:Q61812779
wd:Q61446176 wd:Q61446365 wd:Q61446539 wd:Q61813488 wd:Q61447393}
#Positive criteria on the journal and its publisher
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher;
wdt:P1055 [ ].
?publisher wdt:P31/wdt:P279* wd:Q2085381.
#Positive criteria on the article
?article wdt:P5008 wd:Q55439927;
wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P31 wd:Q7318358;
wdt:P1476 [ ].
#The journal qualifies by class, by whitelist, or by DOAJ ID
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}
}#List 3. OA exceptions (journal whitelist)
UNION {?journal wdt:P5115 [ ].}
#Negative criteria: retractions, then the blacklists
MINUS
{?article wdt:P31 wd:Q45182324}
MINUS
{?article wdt:P5824 [ ]}
MINUS
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}
}#List 4. Beall's list (publisher blacklist)
MINUS
{VALUES ?article {wd:Q26746153}
}#List 1. Deprecated reviews (article blacklist, currently placeholder)
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }#List 2. Neglected diseases (topical whitelist)
?article wdt:P921 ?mainsubject.}
#MEDDATE: eight years for neglected diseases, otherwise five
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
FILTER(?medrsyear >= ?ynow)
#In general, you can remove this "filter line"
#to see the workings of the medrsyear variable displayed
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
Initial testing of this batch
Of the articles initially rejected by the query in its state on 2019-05-30, from the initial 40, five[8] were then fixed up with missing journal and publisher data, and passed. Others, from two journals,[9] would require changes in the OA journal whitelist and (in one of the cases) the publisher blacklist to pass.
Single-value test version
#As above, batch size one
#Demo version showing neglected disease impact on MEDDATE
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
WHERE
{
VALUES ?article
{ } #Place the item for the one article to be tested between these braces
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher;
wdt:P1055 [ ].
?publisher wdt:P31/wdt:P279* wd:Q2085381.
?article wdt:P5008 wd:Q55439927;
wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P31 wd:Q7318358;
wdt:P1476 [ ].
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}}
UNION {?journal wdt:P5115 [ ].}
MINUS
{?article wdt:P31 wd:Q45182324}
MINUS
{?article wdt:P5824 [ ]}
MINUS
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}}
MINUS
{VALUES ?article {wd:Q26746153}}
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }
?article wdt:P921 ?mainsubject.}
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
FILTER(?medrsyear >= ?ynow)#Remove this filter line to see the medrsyear variable displayed
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
2013 test set
#As above, batch taken from the focus list, last 100 dates in 2013
#Demo showing neglected disease impact on MEDDATE
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
WHERE
{
VALUES ?article
{wd:Q26991987 wd:Q31147214 wd:Q34795368 wd:Q35091165 wd:Q37535454
wd:Q37563132 wd:Q41837402 wd:Q54313739 wd:Q27001739 wd:Q26859056
wd:Q37598449 wd:Q37444803 wd:Q37486731 wd:Q54318026 wd:Q54331401
wd:Q34746108 wd:Q37424905 wd:Q37428770 wd:Q37488800 wd:Q38174963
wd:Q38181085 wd:Q41610163 wd:Q41821146 wd:Q21131296 wd:Q27001269
wd:Q26860902 wd:Q30399417 wd:Q35080136 wd:Q36234685 wd:Q37415488
wd:Q37424916 wd:Q37430213 wd:Q37444678 wd:Q38198431 wd:Q37564661
wd:Q38223686 wd:Q27027614 wd:Q34394530 wd:Q35081249 wd:Q35081255
wd:Q38329445 wd:Q27001098 wd:Q30358199 wd:Q37455520 wd:Q38174396
wd:Q38202106 wd:Q38215500 wd:Q24607667 wd:Q26853657 wd:Q27012569
wd:Q37419857 wd:Q37427512 wd:Q37449369 wd:Q37456955 wd:Q54334851
wd:Q27008874 wd:Q33440399 wd:Q35073534 wd:Q37451050 wd:Q41858635
wd:Q41851925 wd:Q37597560 wd:Q37703820 wd:Q26851339 wd:Q27011832
wd:Q27680496 wd:Q27680497 wd:Q28304672 wd:Q34393461 wd:Q35101581
wd:Q35073498 wd:Q35078355 wd:Q37411773 wd:Q38172970 wd:Q38173346
wd:Q38216022 wd:Q26827570 wd:Q26999444 wd:Q27011272 wd:Q33650196
wd:Q34398735 wd:Q35075737 wd:Q35075771 wd:Q35079805 wd:Q35081252
wd:Q37433321 wd:Q37425897 wd:Q38176010 wd:Q42385487 wd:Q26822652
wd:Q30441089 wd:Q35071418 wd:Q35075351 wd:Q35075628 wd:Q35557773
wd:Q37400355 wd:Q37424844 wd:Q37425963 wd:Q26853711 wd:Q27691382}
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher;
wdt:P1055 [ ].
?publisher wdt:P31/wdt:P279* wd:Q2085381.
?article wdt:P5008 wd:Q55439927;
wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P31 wd:Q7318358;
wdt:P1476 [ ].
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}}
UNION {?journal wdt:P5115 [ ].}
MINUS
{?article wdt:P31 wd:Q45182324}
MINUS
{?article wdt:P5824 [ ]}
MINUS
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}}
MINUS
{VALUES ?article {wd:Q26746153}}
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }
?article wdt:P921 ?mainsubject.}
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
FILTER(?medrsyear >= ?ynow) #Remove this filter line to see the workings of the medrsyear variable displayed
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
Testing of this batch
The year 2013 is on the cusp for MEDDATE: the algorithm normally would compute the medrsyear for these papers as 2018, and so they wouldn't qualify. As first run, there were six hits: one, Biomarkers in Japanese encephalitis: a review (Q37425963) because of a neglected disease, and five for a technical reason that counts as an artefact: a print publication date being in 2014. Since MEDRS is not interpreted strictly by date, these results may stand.
One more paper from the batch, Paragonimus and paragonimiasis in Vietnam: an update. (Q35091165), would qualify if the OA journal Korean Journal of Parasitology (Q21385511) were whitelisted. The page WD:SSFLOAJ is for determining such issues.
A query showed that the candidates on the focus list (published 2013, main subject on the neglected disease whitelist, review articles), came to 77.[10]
Infectious Disease tweaked version
#Demo using Infectious Disease specialty.
#(1) Conditions usually imposed on the publisher are just assumed.
#(2) The line "?mainsubject wdt:P1995 wd:Q788926" would cause a bound variable clash
#with the OPTIONAL section, so the variable ?mainsubject1 is used instead.
#(3) The filter line, comparing ?medrsyear and ?ynow, is omitted.
#(4) The results are in chronological order.
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
WHERE
{
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher.
?article wdt:P5008 wd:Q55439927;
wdt:P31 wd:Q7318358;
wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P921 ?mainsubject1.
?mainsubject1 wdt:P1995 wd:Q788926.
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}}
UNION {?journal wdt:P5115 [ ].}
MINUS
{?article wdt:P31 wd:Q45182324}
MINUS
{?article wdt:P5824 [ ]}
MINUS
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}}
MINUS
{VALUES ?article {wd:Q26746153}}
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }
?article wdt:P921 ?mainsubject.}
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
}
ORDER BY ?date
LIMIT 1000
Comments: This query has some modifications, made for performance and for technical reasons (see the points at the top of the query). The filter line comparing ?medrsyear with ?ynow has been removed, so that the neglected disease effect is displayed: ?medrsyear is the publication year plus eight when the article has a main subject on the neglected disease list, and the publication year plus five otherwise. The ?mainsubject1 variable can be replaced by plugging a specific disease into the P921 line, giving a prototype literature search: there is an example below, with main subject set to Lyme disease.
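The date arithmetic in the BIND lines can be tried in isolation. The following is a minimal sketch, not part of the ScienceSource code. It replaces the Wikidata patterns with three invented rows (the labels, dates and ?bound flags are made up for illustration), and it assumes that the omitted filter keeps an article for as long as ?medrsyear is not earlier than ?ynow, which this report does not spell out.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Toy data only: invented publication dates, no Wikidata items involved.
# ?bound = true stands for "main subject is on the neglected disease list".
SELECT ?example ?ydate ?bound ?medrsyear ?ynow ?current
WHERE {
  VALUES (?example ?date ?bound) {
    ("older review, ordinary subject"  "2012-06-01T00:00:00Z"^^xsd:dateTime false)
    ("older review, neglected disease" "2012-06-01T00:00:00Z"^^xsd:dateTime true)
    ("newer review, ordinary subject"  "2019-06-01T00:00:00Z"^^xsd:dateTime false)
  }
  BIND(year(?date) AS ?ydate)
  BIND(year(now()) AS ?ynow)
  # Same rule as in the query above: eight years if ?bound, five otherwise.
  BIND(IF(?bound, ?ydate + 8, ?ydate + 5) AS ?medrsyear)
  # Assumed form of the omitted filter, shown as a column instead of a FILTER.
  BIND((?medrsyear >= ?ynow) AS ?current)
}
ORDER BY ?medrsyear
```

Turning the last BIND into FILTER(?medrsyear >= ?ynow) would drop the rows that are out of date, rather than merely labelling them; displaying the column keeps the neglected disease effect visible.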
With Wikipedia referencing data
The article w:Lyme disease was worked over intensively in 2019, and has (at the time of this writing) 272 references. The PubMed publication ID (P698) values were extracted from the wikitext, and converted as far as possible to Wikidata items.[11] There were 181 distinct article items found here, with eight IDs having no item.[12] The coverage of around 95% may be accounted for by bot work that aims to create items here for sources used on English Wikipedia. This set of article items therefore forms a test set for MEDRS. It does include quite a number of older papers, i.e. pre-2011. MEDRS as applied is concerned with health information, not (for example) historical background.
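The conversion from PubMed IDs to items can be sketched as a lookup on P698. This is an illustration under stated assumptions, not the query behind the footnote: the two ID strings are placeholders to be replaced by values taken from the wikitext, and the OPTIONAL is a design choice made here so that an ID without an item still appears in the results, with ?article left empty.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Sketch: map a list of PubMed IDs to Wikidata items.
SELECT ?pmid ?article
WHERE {
  # Placeholder strings, not real PubMed IDs: substitute the extracted values.
  VALUES ?pmid { "00000001" "00000002" }
  # OPTIONAL keeps IDs that have no item, so the gaps can be counted.
  OPTIONAL { ?article wdt:P698 ?pmid. }
}
ORDER BY ?pmid
```

Rows with an empty ?article column correspond to the missing IDs; the remaining rows give the VALUES list used in the batch query.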
#Demo showing [[w:Lyme disease]] article PMID references with tweaks
#(1) No requirement to be on the ScienceSource focus list.
#(2) No requirement to be marked as a review article: the articles may well be reviews, but not tagged on Wikidata.
#(3) Remove filter line, to display MEDDATE workings.
#(4) Order by ascending date
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
VALUES ?article
{wd:Q58616664 wd:Q57609146 wd:Q57485891 wd:Q50127493 wd:Q48707337
wd:Q47381540 wd:Q46921677 wd:Q44428323 wd:Q40593725 wd:Q39780867
wd:Q38872719 wd:Q38654479 wd:Q38496896 wd:Q38237758 wd:Q37153001
wd:Q36989147 wd:Q36796349 wd:Q35633278 wd:Q35005600 wd:Q34995221
wd:Q34843047 wd:Q34814928 wd:Q34781042 wd:Q34729329 wd:Q34637392
wd:Q34629286 wd:Q34618127 wd:Q34591565 wd:Q34585208 wd:Q34561366
wd:Q34561133 wd:Q34559233 wd:Q34557481 wd:Q34544429 wd:Q34520345
wd:Q34477366 wd:Q34474075 wd:Q34461866 wd:Q34460730 wd:Q34437837
wd:Q34399316 wd:Q34394008 wd:Q34352282 wd:Q34347231 wd:Q34341217
wd:Q34319238 wd:Q34317312 wd:Q34303133 wd:Q34298628 wd:Q34297106
wd:Q34294127 wd:Q34290428 wd:Q34235975 wd:Q34224011 wd:Q34184736
wd:Q34172333 wd:Q34121634 wd:Q34064853 wd:Q34003294 wd:Q33973130
wd:Q33938828 wd:Q33857401 wd:Q30905505 wd:Q30656379 wd:Q29013670
wd:Q28831218 wd:Q28306954 wd:Q28306654 wd:Q28300495 wd:Q28297602
wd:Q28295952 wd:Q28277809 wd:Q28274247 wd:Q28272745 wd:Q28262147
wd:Q28259739 wd:Q28258391 wd:Q28256274 wd:Q28143958 wd:Q24670918
wd:Q24652798 wd:Q24647011 wd:Q24646591 wd:Q24645861 wd:Q24519702
wd:Q22253069 wd:Q22253014 wd:Q22253007 wd:Q22253006 wd:Q22253000
wd:Q22252993 wd:Q22252911 wd:Q22252818 wd:Q22252809 wd:Q22252781
wd:Q22252691 wd:Q22252679 wd:Q22252678 wd:Q22252671 wd:Q22252526
wd:Q22252431 wd:Q22252398 wd:Q22252360 wd:Q22252296 wd:Q22252240
wd:Q22252237 wd:Q22252203 wd:Q22252202 wd:Q22252200 wd:Q22252199
wd:Q22252198 wd:Q22252197 wd:Q22252196 wd:Q22252195 wd:Q22252187
wd:Q22251447 wd:Q22251436 wd:Q22251377 wd:Q22251363 wd:Q22251277
wd:Q22251276 wd:Q22251193 wd:Q22251192 wd:Q22250917 wd:Q22250914
wd:Q22250902 wd:Q22250897 wd:Q22250895 wd:Q22248114 wd:Q22248094
wd:Q22248045 wd:Q22248043 wd:Q22242973 wd:Q22242970 wd:Q22242959
wd:Q22242954 wd:Q22242935 wd:Q22242934 wd:Q22242931 wd:Q22242930
wd:Q22242917 wd:Q22242916 wd:Q22242865 wd:Q22242804 wd:Q22242763
wd:Q22242644 wd:Q22242631 wd:Q22242390 wd:Q22242388 wd:Q22242308
wd:Q22242252 wd:Q22242239 wd:Q22242238 wd:Q22242237 wd:Q22242236
wd:Q22242231 wd:Q22242230 wd:Q22242033 wd:Q22242014 wd:Q22241895
wd:Q22241880 wd:Q22241656 wd:Q22241572 wd:Q22241561 wd:Q22241465
wd:Q22241464 wd:Q22241461 wd:Q22241460 wd:Q22241391 wd:Q22241317
wd:Q22241251 wd:Q22241249 wd:Q22241246 wd:Q22241245 wd:Q22241162
wd:Q22241146 wd:Q22241125 wd:Q22241117 wd:Q22241086 wd:Q21285061
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher;
wdt:P1055 [ ].
?publisher wdt:P31/wdt:P279* wd:Q2085381.
?article wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P1476 [ ].
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}
UNION {?journal wdt:P5115 [ ].}
{?article wdt:P31 wd:Q45182324}
{?article wdt:P5824 [ ]}
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}
{VALUES ?article {wd:Q26746153}
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }
?article wdt:P921 ?mainsubject.}
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
This batch query carries the four modifications listed in the comments at its head.
Testing this batch
With the four changes noted in the comments, the initial test of this batch passed only 17, or fewer than 10%, of the articles from the Wikipedia Lyme disease article. A query directed at the journal metadata[13] brought out some issues:
- Journals marked as academic journal (Q737498), rather than "scientific journal" and its subclasses.
- The majority of journals lacked the double marking that the MEDRS algorithm expects: instance of "scientific journal", plus a second type indicating the access model.
- Journals marked as delayed open access journal (Q5253501), which can be allowed as a form of closed-access publishing. Adding a clause next to {?journal wdt:P31 wd:Q5953270} in the UNION, namely {?journal wdt:P31 wd:Q5253501}, would let the relevant articles through.
- Seven journals lacking a publisher.
Addressing these points brought some improvements.
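A check of this kind can be sketched as follows. It is a hedged illustration rather than the query cited above: it takes only the first three items of the batch, tests direct instance of (P31) statements without following subclasses, and the output column names are invented here.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Sketch: flag journal-level gaps for a few items of the Lyme disease batch.
SELECT DISTINCT ?journal ?scientific ?academic ?delayedOA ?hasPublisher
WHERE {
  # First three items of the batch; extend the list as needed.
  VALUES ?article { wd:Q58616664 wd:Q57609146 wd:Q57485891 }
  ?article wdt:P1433 ?journal.
  # Each OPTIONAL binds its variable only if the statement is present.
  OPTIONAL { ?journal wdt:P31 ?sci. FILTER(?sci = wd:Q5633421) }
  OPTIONAL { ?journal wdt:P31 ?aca. FILTER(?aca = wd:Q737498) }
  OPTIONAL { ?journal wdt:P31 ?del. FILTER(?del = wd:Q5253501) }
  OPTIONAL { ?journal wdt:P123 ?publisher. }
  BIND(BOUND(?sci) AS ?scientific)
  BIND(BOUND(?aca) AS ?academic)
  BIND(BOUND(?del) AS ?delayedOA)
  BIND(BOUND(?publisher) AS ?hasPublisher)
}
```

A journal showing false under ?scientific or ?hasPublisher would be rejected by the batch query as it stands, whatever the merits of the article.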
There is also the condition of the article items themselves to consider. According to this query,[14] as a baseline only two of the articles were marked as "review article", while 28 lacked main subject (P921) information. Lyme disease is a form of borreliosis (Q16006998), which is a candidate to be added to List 2 (neglected diseases), as representative of diseases not tropical by geographical distribution. As things stood initially, Wikidata was lacking much information that could support this batch of references.
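The article-level baseline can be counted in a similar way. The sketch below is an assumption about how such a count might be written, not the query in the footnote; it again uses only the first three items of the batch, and the output names are chosen for illustration.

```sparql
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Sketch: count items marked as review article, and items lacking main subject.
SELECT (COUNT(?article) AS ?articles)
       (SUM(?isReview) AS ?markedAsReview)
       (SUM(?noSubject) AS ?lackingMainSubject)
WHERE {
  {
    # The inner query yields one row per article, with 0/1 flags.
    SELECT DISTINCT ?article ?isReview ?noSubject
    WHERE {
      VALUES ?article { wd:Q58616664 wd:Q57609146 wd:Q57485891 }
      OPTIONAL { ?article wdt:P31 ?rev. FILTER(?rev = wd:Q7318358) }
      OPTIONAL { ?article wdt:P921 ?subject. }
      BIND(IF(BOUND(?rev), 1, 0) AS ?isReview)
      BIND(IF(BOUND(?subject), 0, 1) AS ?noSubject)
    }
  }
}
```

The DISTINCT in the inner query matters: an article with several main subjects would otherwise be counted once per subject.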
Comparison with ScienceSource
Here is an example in the style of the previous section, with main subject set to Lyme disease (Q201989).
#Demo using Lyme disease main subject.
#(1) Conditions usually imposed on the publisher are just assumed.
#(2) The filter line, comparing ?medrsyear with ?ynow, is omitted.
#(3) The results are in chronological order.
SELECT DISTINCT ?article ?articleLabel ?journal ?journalLabel ?bound ?medrsyear ?ynow
?journal wdt:P31 wd:Q5633421;
wdt:P123 ?publisher.
?article wdt:P5008 wd:Q55439927;
wdt:P31 wd:Q7318358;
wdt:P1433 ?journal;
wdt:P577 ?date;
wdt:P921 wd:Q201989.
{?journal wdt:P31 wd:Q5953270}
UNION {VALUES ?journal {wd:Q546003 wd:Q2928049 wd:Q5030320 wd:Q5690746 wd:Q6047666
wd:Q26841926 wd:Q1763668 wd:Q2025726 wd:Q15724513 wd:Q2456339
wd:Q27722384 wd:Q27667673}
UNION {?journal wdt:P5115 [ ].}
{?article wdt:P31 wd:Q45182324}
{?article wdt:P5824 [ ]}
{VALUES ?publisher {wd:Q52636754 wd:Q52635805 wd:Q4689899 wd:Q52620137 wd:Q4732612
wd:Q43080819 wd:Q30270870 wd:Q30297686 wd:Q52661346 wd:Q52636079
wd:Q52557383 wd:Q54958933 wd:Q2896740 wd:Q63254475 wd:Q18712923
wd:Q52609680 wd:Q52609536 wd:Q52636154 wd:Q52609215 wd:Q80796
wd:Q52636535 wd:Q52633727 wd:Q52636944 wd:Q63254434 wd:Q52637577
wd:Q52665969 wd:Q52660711 wd:Q52659576 wd:Q56979398 wd:Q52670242
wd:Q29891111 wd:Q52619294 wd:Q52662151 wd:Q7072722 wd:Q52609375
wd:Q7259709 wd:Q52636843 wd:Q45251004 wd:Q52637573 wd:Q52662489
wd:Q52635330 wd:Q47116994 wd:Q30267116 wd:Q24706265 wd:Q52620720
wd:Q52633876 wd:Q56416796 wd:Q52660351 wd:Q52635690 wd:Q7433770
wd:Q27991304 wd:Q55566796 wd:Q52619286 wd:Q30265175 wd:Q8035326}
{VALUES ?article {wd:Q26746153}
OPTIONAL {VALUES ?mainsubject {wd:Q949694 wd:Q2447562 wd:Q649558 wd:Q326071 wd:Q203133
wd:Q842428 wd:Q11679861 wd:Q2264130 wd:Q2360849 wd:Q1345113
wd:Q1597571 wd:Q2841329 wd:Q2665559 wd:Q30953 wd:Q738292
wd:Q154874 wd:Q2859732 wd:Q39222 wd:Q326638 wd:Q162272
wd:Q809561 wd:Q18975737 wd:Q1017169 wd:Q12090 wd:Q327298
wd:Q326648 wd:Q18975220 wd:Q36956 wd:Q155098 wd:Q690032
wd:Q193216 wd:Q682798 wd:Q922029 wd:Q41083 wd:Q76973
wd:Q16877704 wd:Q247096 wd:Q167178 wd:Q304601 wd:Q331283
wd:Q1475667 wd:Q388646 wd:Q777087 wd:Q1102300 wd:Q1048084
wd:Q719656 wd:Q2528129 wd:Q1760607 wd:Q192100 wd:Q1137321
wd:Q221159 }
?article wdt:P921 ?mainsubject.}
BIND(xsd:boolean(COALESCE(BOUND(?mainsubject),"false")) AS ?bound)
BIND(year(?date) AS ?ydate)
BIND(year(now()) AS ?ynow)
BIND((IF(?bound,?ydate+8,?ydate+5)) AS ?medrsyear)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".}
Auxiliary queries
Generally, the MEDRS code depends on "levelling up" of data, to avoid false negatives.
- [1] missing title
#Focus list journals, missing title
SELECT DISTINCT ?journal ?journalLabel
WHERE {
?item wdt:P5008 wd:Q55439927;
wdt:P1433 ?journal.
MINUS {?journal wdt:P1476 [ ]}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
- [2] missing publication date
- [3] missing NLM ID
- [4] DOAJ journals not marked as open-access journal
#Finding DOAJ journals not instance of open-access journal
SELECT DISTINCT ?journal ?journalLabel
WHERE {
?item wdt:P5008 wd:Q55439927;
wdt:P1433 ?journal.
?journal wdt:P5115 [ ].
MINUS {?journal wdt:P31 wd:Q773668}
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
- [5] For possible implementation: P31 review article, referencing to PubMed
- [6] Variation on implicit "access model" assumption here, that the journals are either open access, or hybrid open access.
- [7] For possible implementation: filter OA journal articles by DOAJ start time, so assumed present
Auxiliary data
Missing data on Wikidata can cause false negatives in the MEDRS algorithm. The algorithm can be thought of as a piece of bureaucratic form-filling, which returns no result unless every required field has been filled in. For testing, and to have a viable algorithm, much data has had to be added here.
