Wikidata:WikiCite/Bibliographic metadata for scholarly articles in Wikidata
Shortcut: WD:BIBO
About
This page is to develop a proposal for how bibliographic metadata for scholarly articles in Wikidata should be structured. It originated from the PLOS hackathon on October 18, 2014 (impression from the meeting) and is now being developed further as part of WikiProject Source MetaData.
Participants at the hackathon
- Jonathan Dugan (PLOS) - @jmdugan
- Jonas Dupuich (PLOS) - @jdup
- Mitar (UC Berkeley/PeerLibrary) - @mitar_m
- Dario Taraborelli (Wikimedia Foundation) - @readermeter
- Daniel Mietchen (WikiProject Open Access) - @evomri
- Karen Coyle - @karencoyle
Sample articles
Examples we're working on
- Journal articles
- Cited using Module:Cite, which gives:
- Trying a test cite module:
Data model (and mapping it to existing bibliographic data models)
- Source: https://github.com/mitar/bib2wikidata/issues/1#issuecomment-50998834
- Output should be a set of proposed properties, e.g.:
- Wikidata:WikiProject Periodicals
- Wikidata:List of properties/Works#Literature
- related Google Doc of unknown origin
- MARC Code List for Relators
Importers
- https://github.com/mitar/bib2wikidata
- see this ticket for discussion: https://github.com/mitar/bib2wikidata/issues/1
Ideas for input
- Schema.org
- MODS
- bibutils can convert several bibliographic formats to/from MODS
- CSL-Data
- pandoc-citeproc can import several bibliographic formats (e.g. BibTeX, RIS and MODS) for further processing as CSL-Data. See also http://johnmacfarlane.net/pandoc/README.html#citations for an example of CSL-Data in YAML (mods2yaml is now included in pandoc-citeproc).
- http://blog.martinfenner.org/2013/07/30/citeproc-yaml-for-bibliographies/
- https://gist.github.com/nichtich/97f06bbfa249f5d33e2d - example and conversion of DOI 10.1371/journal.pone.0010676
- Relevant RDF ontologies for bibliographic data: bibo, schema.org, dcterms, etc.
- BibTeX, BibJSON?
- Mailing list that Mitar set up to discuss what the input should look like: https://common.tnode.com/sympa/info/bibformat-list
RDF and MODS, in contrast to BibTeX and common CSL-Data as used in Zotero, can contain author identifiers instead of raw strings only.
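The difference matters for import: an author that arrives with an identifier can be matched to an item, while a bare name string first has to be disambiguated. The following is a minimal sketch of that distinction, not part of any importer; the `Author` class, its `identifier` field, and the all-zero placeholder identifier are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Author:
    family: str
    given: str
    # Hypothetical field: an identifier such as an ORCID iD,
    # present only when the input format can carry one.
    identifier: Optional[str] = None


def split_by_identifier(authors: List[Author]) -> Tuple[List[Author], List[Author]]:
    """Return (identified, name_only) lists, preserving input order."""
    identified = [a for a in authors if a.identifier]
    name_only = [a for a in authors if not a.identifier]
    return identified, name_only


authors = [
    # Name string only, as in BibTeX or common CSL-Data.
    Author("Pinto", "Jayant M."),
    # Placeholder author and identifier, not a real record.
    Author("Example", "Author", identifier="0000-0000-0000-0000"),
]
identified, name_only = split_by_identifier(authors)
print([a.family for a in identified])
print([a.family for a in name_only])
```

Authors in the second list would need manual or heuristic matching before the article's item can link to them.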
Contributorship links
- Harvard meeting http://projects.iq.harvard.edu/attribution_workshop/home
- CSE panel "Pinning Contributions to Individuals" on http://www.councilscienceeditors.org/events/previous-annual-meetings/cse-2014-annual-meeting/
- http://www.nature.com/news/publishing-credit-where-credit-is-due-1.15033 (table 1 has first pass of biomedical contributorship roles)
Wikidata item page
See also: Wikidata:WikiProject_Books.
- Item label: article title
- label shows up in instant search results; human-readable
- Item description: "scholarly article by [lastname,firstname] [et al], [year]"
- aliases
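As a rough illustration of the label and description pattern above, the sketch below derives both from a CSL-JSON style record. The function names are hypothetical, and it assumes that `issued` holds nested `date-parts` (a list of lists) as in standard CSL-JSON.

```python
from typing import Any, Dict


def make_label(record: Dict[str, Any]) -> str:
    """The item label is simply the article title."""
    return record["title"]


def make_description(record: Dict[str, Any]) -> str:
    """Build 'scholarly article by [lastname, firstname] [et al.], [year]'."""
    authors = record.get("author", [])
    # Assumes CSL-JSON style nested date-parts, e.g. [[2014, 10, 1]].
    year = record["issued"]["date-parts"][0][0]
    description = "scholarly article"
    if authors:
        first = authors[0]
        byline = f"{first['family']}, {first['given']}"
        if len(authors) > 1:
            byline += " et al."
        description += f" by {byline}"
    return f"{description}, {year}"


record = {
    "title": "Olfactory Dysfunction Predicts 5-Year Mortality in Older Adults",
    "author": [
        {"family": "Pinto", "given": "Jayant M."},
        {"family": "Wroblewski", "given": "Kristen E."},
    ],
    "issued": {"date-parts": [[2014, 10, 1]]},
}
print(make_label(record))
print(make_description(record))
```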
Statements
It is yet to be clarified which properties should be used only as qualifiers.
- properties and values
- are also items; have labels & descriptions & aliases
- For existing ones, see
- qualifiers
- sources
- instance of (P31): scholarly article (Q13442814)
- in the following, we will define "recommended properties" that can be ranked
- author (P50):
- Note: Wikidata item for author has to be created before the article's item can link to it
- the author's item should link to the author's ORCID iD (P496) if at all possible
- contributor role - do by qualifiers (to be defined)
- order of authors - integer 1...n
- total number of authors?
- affiliation (P1416) affiliation
- ideally as given in the paper, though that may be too granular for Wikidata (example)
- email address (P968) email
- corresponding author (link to author page; display email from author list)
- published in (P1433) published in
- journal title
- volume (P478) volume
- string property
- should probably be used as a qualifier on the published in (P1433) statement
- issue (P433) issue/number
- string property
- should probably be used as a qualifier on the published in (P1433) statement
- page(s) (P304) page
- string property
- perhaps split into start page/ end page, allowing for several of both (e.g. in case of advertisements in between)
- should probably be used as a qualifier on the published in (P1433) statement
- identifiers
- PubMed publication ID (P698) PubMed
- PMC publication ID (P932) PMCID
- DOI (P356) DOI
- arXiv ID (P818) ArXiv
- reference URL (P854) reference URL
- URL in support of data in article - a link to where the article is accessible when no other field is available for the link
- title (P1476) title
- there is subtitle (P1680), but we don't expect to use it
- Commons category (P373)
- Wikimedia-specific place for any media types in or associated with [this article ID]
- needs to be a categorization of the media on Wikimedia Commons, compliant with the naming rules there
- needs bi-directional links between the Commons category and the Wikidata item
- includes DOI or Q number
- publication date (P577)
- date type: received date, accepted date, publication date
- these date types may go beyond the minimal set: the publication date alone is the minimum
- language of work or name (P407) language
- can be multiples
- copyright license (P275) licence
- qualifiers for version, deed needed
- article ID (P2322) article ID (corresponds to "item_number" within "publisher_item" at CrossRef)
- corrections and retractions
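To make the proposed structure concrete, here is a minimal sketch that maps a CSL-JSON style record to a simplified list of statements using the property IDs listed above. It follows the suggestion that volume, issue and pages "should probably" be qualifiers on the published in (P1433) statement, which is still open. The dictionary layout is an assumption for illustration, not the Wikidata API format, and `Q_JOURNAL` is a placeholder for the journal's item. Author statements are left out because they depend on author items existing first.

```python
from typing import Any, Dict, List

# Property and item IDs are the ones listed in this section.
SCHOLARLY_ARTICLE = "Q13442814"
QUALIFIER_PROPERTIES = (("volume", "P478"), ("issue", "P433"), ("page", "P304"))


def build_statements(record: Dict[str, Any], journal_item: str) -> List[Dict[str, Any]]:
    """Map a CSL-JSON style record to a list of simplified statements."""
    statements: List[Dict[str, Any]] = [
        {"property": "P31", "value": SCHOLARLY_ARTICLE},  # instance of
        {"property": "P1476", "value": record["title"]},  # title
    ]
    if "DOI" in record:
        statements.append({"property": "P356", "value": record["DOI"]})

    # Volume, issue and pages become qualifiers on "published in" (P1433).
    qualifiers = {pid: record[key] for key, pid in QUALIFIER_PROPERTIES if key in record}
    statements.append(
        {"property": "P1433", "value": journal_item, "qualifiers": qualifiers}
    )

    # Publication date (P577) from the first date-parts entry, e.g. [2014, 10, 1].
    date_parts = record["issued"]["date-parts"][0]
    date = "-".join(f"{part:02d}" for part in date_parts)
    statements.append({"property": "P577", "value": date})
    return statements


record = {
    "title": "Olfactory Dysfunction Predicts 5-Year Mortality in Older Adults",
    "DOI": "10.1371/journal.pone.0107541",
    "volume": "9",
    "issue": "10",
    "page": "e107541",
    "issued": {"date-parts": [[2014, 10, 1]]},
}
# "Q_JOURNAL" is a placeholder, not a real item ID.
statements = build_statements(record, journal_item="Q_JOURNAL")
for statement in statements:
    print(statement)
```

Keeping volume, issue and pages on the P1433 statement ties them to one specific publication venue, which helps if an article is ever recorded as published in more than one place.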
TODO
- write this work up, suggest specific recommendations
- build "Scholarly Article template" on Wikidata, similar to
- test corpus of papers from CSL and BibTeX and compare to proposal
- Set up a few Wikidata:Showcase items
- share proposal with STM publishers
- Dario to identify leaders in the Wikidata community to reach out to for support of the proposal
- Automatically generate Wikipedia bibliographic references from Wikidata items (project proposed on meta in 2021)
See also
Examples of metadata in various input formats
For discussion about what to import into Wikidata.
citeproc-json:
{
  "DOI": "10.1371/journal.pone.0107541",
  "ISSN": ["1932-6203"],
  "URL": "http://dx.doi.org/10.1371/journal.pone.0107541",
  "author": [
    {"family": "Pinto", "given": "Jayant M."},
    {"family": "Wroblewski", "given": "Kristen E."},
    {"family": "Kern", "given": "David W."},
    {"family": "Schumm", "given": "L. Philip"},
    {"family": "McClintock", "given": "Martha K."}
  ],
  "container-title": "PLoS ONE",
  "deposited": {"date-parts": [[2014, 10, 1]], "timestamp": 1412121600000},
  "editor": [{"family": "Hummel", "given": "Thomas"}],
  "indexed": {"date-parts": [[2014, 10, 5]], "timestamp": 1412470290002},
  "issue": "10",
  "issued": {"date-parts": [[2014, 10, 1]]},
  "member": "http://id.crossref.org/member/340",
  "page": "e107541",
  "prefix": "http://id.crossref.org/prefix/10.1371",
  "publisher": "Public Library of Science (PLoS)",
  "reference-count": 0,
  "score": 1.0,
  "source": "CrossRef",
  "subject": ["Agricultural and Biological Sciences(all)", "Medicine(all)", "Biochemistry, Genetics and Molecular Biology(all)"],
  "subtitle": [],
  "title": "Olfactory Dysfunction Predicts 5-Year Mortality in Older Adults",
  "type": "journal-article",
  "update-policy": "http://dx.doi.org/10.1371/journal.pone.corrections_policy",
  "volume": "9"
}
See this GIST for more information
TODO: Get the same input in BibTeX, schema.org+JSON-LD, BibJSON.
TODO: How to represent corrections (this paper has two corrections)?
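As a starting point for the BibTeX comparison, the sketch below renders a CSL-JSON style record as a BibTeX `@article` entry. It is an approximation under stated assumptions: the function name, the citation key `pinto2014`, and the field mapping (e.g. `issue` to `number`) are illustrative choices, the record is a shortened subset of the example above, `date-parts` is assumed to be nested as in standard CSL-JSON, and no escaping of special BibTeX characters is attempted.

```python
from typing import Any, Dict


def to_bibtex(record: Dict[str, Any], key: str) -> str:
    """Render a CSL-JSON style journal article record as a BibTeX @article entry."""
    authors = " and ".join(
        f"{a['family']}, {a['given']}" for a in record.get("author", [])
    )
    fields = [
        ("author", authors),
        ("title", record.get("title", "")),
        ("journal", record.get("container-title", "")),
        ("year", str(record["issued"]["date-parts"][0][0])),
        ("volume", record.get("volume", "")),
        ("number", record.get("issue", "")),
        ("pages", record.get("page", "")),
        ("doi", record.get("DOI", "")),
    ]
    lines = [f"@article{{{key},"]
    # Skip fields that are missing or empty in the record.
    lines += [f"  {name} = {{{value}}}," for name, value in fields if value]
    lines.append("}")
    return "\n".join(lines)


record = {
    "DOI": "10.1371/journal.pone.0107541",
    "author": [
        {"family": "Pinto", "given": "Jayant M."},
        {"family": "Wroblewski", "given": "Kristen E."},
    ],
    "container-title": "PLoS ONE",
    "issue": "10",
    "issued": {"date-parts": [[2014, 10, 1]]},
    "page": "e107541",
    "title": "Olfactory Dysfunction Predicts 5-Year Mortality in Older Adults",
    "volume": "9",
}
bibtex = to_bibtex(record, key="pinto2014")  # hypothetical citation key
print(bibtex)
```

Note that the author names stay as raw strings here, so any author identifiers present in a richer input format would be lost in this direction.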