Wikidata:Dataset Imports/Directory of Open Access Journals


Guidelines for using this page

Documenting the import

  • Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
  • Please include notes on all steps of the process.
  • Once a dataset has been imported into Wikidata please edit the page to change the progress status from in progress to complete.
  • It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.

Creating a Wikidata item for the dataset

  • Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata, understand what datasets are available on that topic, and see which of them have been added to Wikidata.
  • If you are working with a very large dataset you can break it into smaller Mix n' Match catalogues, but only create one Wikidata item.
  • Link the dataset Wikidata item to this page using {{P|??}}

Getting help

  • If your dataset import runs into issues please edit the page to change the progress status from in progress to help needed.
  • You can ask for help on Wikidata:Project chat.

Overview

Dataset name

Directory of Open Access Journals

Source

Directory of Open Access Journals

Link

https://doaj.org/faq#metadata

Dataset description

List of open access journals that appear in the Directory of Open Access Journals

Started by

Additional information

Progress of import

The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.

Wikidata item for the dataset | Import data into spreadsheet | Format the spreadsheet to import the data | Structure of data within Wikidata | Match the dataset to Wikidata | Importing data into Wikidata | Visualisations | Maintenance queries and expected results
{{Q|??}} | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet

Edit history

Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.

Date | Notes | Who
2019-11-04 | Imported some data for ~1300 journals with DOAJ seal, of which ~50 new, using the 2019-10-31 JSON dump and OpenRefine 3.3. | Nemo 23:12, 4 November 2019 (UTC)
2019-11-05 | Imported additional data for some 10k existing journals and publishers. About 3000 journals remain missing or unmatched. | Nemo 11:05, 6 November 2019 (UTC)

Discussion of import

These headings are generally useful; please change this section to suit your needs.

Wikidata item for dataset

Import data into spreadsheet

Format the spreadsheet to import the data

Structure of data within Wikidata

Field name | Wikidata property | Notes
DOAJ ID | Directory of Open Access Journals ID (P5115) + eISSN |
 | catalog (P972) Directory of Open Access Journals (Q1227538) |
Journal title | label + title (P1476) | needs language attribute
Alternative title | alias and/or short name (P1813) | needs language attribute
Discontinued date | discontinued date (P2669)? | P2669 in use by a handful of journals as of October 2019, for instance Journal of Systems Chemistry (Q27725594)
Journal URL | official website (P856) |
Journal ISSN (print version) | ISSN (P236) |
Journal EISSN (online version) | ISSN (P236) -- qualifier TODO |
Publisher | |
Subjects | main subject (P921) |
Keywords | ?P921 |
Society or institution | If official journal of a society: | The submission form describes the field as "The name of the Society or Institution that the journal belongs to."
 | Potentially: |
Platform, host or aggregator | software engine (P408) Open Journal Systems (Q1710177) or other |
Country of publisher | country (P17) |
Journal article processing charges (APCs) | If no APC: instance of (P31) APC-free journal (Q73365499) |
 | If APC: instance of (P31) APC-funded journal (Q73365221) |
 | If APC amount known and non-zero: has characteristic (P1552) article processing charge (Q15291071) |
Article Processing Charge information URL | -- described at URL (P973) |
APC amount | -- fee (P2555) qualify with point in time (P585) |
Currency | -- currency (P38) |
Journal article submission fee | has characteristic (P1552) article submission fee (Q50289174) |
Submission fee URL | -- described at URL (P973) | better as a reference?
Submission fee amount | -- fee (P2555) qualify with point in time (P585) |
Submission fee currency | -- currency (P38) |
Number of articles published in the last calendar year | | Not to be imported, but useful as a filter. Very small or inactive journals may be dead or not worth creating items for.
Number of articles information URL | |
Journal waiver policy (for developing country authors etc) | | Related to APC.
Waiver policy information URL | |
Digital archiving policy or program(s) | |
Archiving: national library | |
Archiving: other | |
Archiving information URL | |
Journal full-text crawl permission | |
Permanent article identifiers | |
Journal provides download statistics | |
Download statistics information URL | |
First calendar year journal provided online Open Access content | |
Full text formats | | This field indicates whether the journal supports/allows different full text formats beyond PDF: for instance human-readable HTML and machine-readable JATS XML. Could perhaps use product or material produced or service provided (P1056)
Full text language | |
URL for the Editorial Board page | |
Review process | |
Review process information URL | |
URL for journal's aims & scope | |
URL for journal's instructions for authors | |
Journal plagiarism screening policy | |
Plagiarism information URL | |
Average number of weeks between submission and publication | |
URL for journal's Open Access statement | |
Machine-readable CC licensing information embedded or displayed in articles | |
URL to an example page with embedded licensing information | |
Journal license | copyright license (P275) |
License attributes | | appears to only be used where publisher uses own license in 'Journal license' field
URL for license terms | | use as a reference
Does this journal allow unrestricted reuse in compliance with BOAI? | |
Deposit policy directory | |
Author holds copyright without restrictions | If true (and following true as well?), use product or material produced or service provided (P1056) works copyrighted by authors (Q73362150) | Otherwise, may warrant product or material produced or service provided (P1056) works copyrighted by learned societies (Q73362358) or product or material produced or service provided (P1056) works copyrighted by publishers (Q73362505), but that's not something captured by DOAJ so don't add it automatically. Per https://blog.doaj.org/2015/05/19/copyright-and-licensing-incompatibility-part-1/ , answering "yes" to question 52 probably means that there is no wholesale copyright transfer or copyright assignment to the publisher. However, there may be additional contractual terms beyond the public license (if any).
Copyright information URL | |
Author holds publishing rights without restrictions | | Per above, answering "yes" probably means that there are no such additional contractual terms restricting the author's right to repost the same work elsewhere. The actual implications are unclear, especially if the journal is under an unfree license.
Publishing rights information URL | |
DOAJ Seal | award received (P166) DOAJ seal (Q73548471) |
Tick: Accepted after March 2014 | |
Added on Date | |
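
The APC rows of the mapping above can be read as a small decision rule. The following is only a sketch of that rule in Python: the function name `apc_statements` and its inputs (`has_apc`, `amount`) are hypothetical and do not correspond to field names in the DOAJ dump. Only the property and item IDs come from the table.

```python
from typing import List, Optional, Tuple

# Item and property IDs as listed in the mapping table above.
APC_FREE_JOURNAL = "Q73365499"
APC_FUNDED_JOURNAL = "Q73365221"
ARTICLE_PROCESSING_CHARGE = "Q15291071"


def apc_statements(has_apc: bool, amount: Optional[float] = None) -> List[Tuple[str, str]]:
    """Return (property, item) pairs for the APC part of the mapping.

    `has_apc` and `amount` are hypothetical inputs, not DOAJ field names.
    """
    if not has_apc:
        # If no APC: instance of (P31) APC-free journal
        return [("P31", APC_FREE_JOURNAL)]
    # If APC: instance of (P31) APC-funded journal
    statements = [("P31", APC_FUNDED_JOURNAL)]
    if amount is not None and amount > 0:
        # If APC amount known and non-zero: has characteristic (P1552)
        statements.append(("P1552", ARTICLE_PROCESSING_CHARGE))
    return statements


print(apc_statements(False))
print(apc_statements(True, 1200.0))
```

The fee (P2555), currency (P38) and point in time (P585) qualifiers are left out here, since the table does not yet settle how they attach to the statement.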

Match the dataset to Wikidata

Example OpenRefine schema

Basic schema used after importing the JSON dump with the option "convert numbers, dates etc." and after reconciling the columns for title, publisher, country, license title:

{"itemDocuments":[
  {"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - title"},"nameDescs":[{"name_type":"LABEL_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - title"}}},{"name_type":"DESCRIPTION_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringconstant","value":"open access academic journal"}}}],"statementGroups":[
    {"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q737498","label":"academic journal"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]},{"value":{"type":"wbitemconstant","qid":"Q773668","label":"open-access journal"},"qualifiers":[{"prop":{"type":"wbpropconstant","pid":"P580","label":"start time","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - bibjson - oa_start - year"}}],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}},{"prop":{"type":"wbpropconstant","pid":"P813","label":"retrieved","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - last_updated"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P166","label":"award received","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q73548471","label":"DOAJ seal"},"qualifiers":[{"prop":{"type":"wbpropconstant","pid":"P585","label":"point in time","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - last_updated"}}],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}},{"prop":{"type":"wbpropconstant","pid":"P813","label":"retrieved","datatype":"time"},"value":{"type":"wbdateconstant","value":"2019-10-31"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P17","label":"country","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - country"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P123","label":"publisher","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - publisher"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P275","label":"license","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - license - _ - title"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P236","label":"ISSN","datatype":"external-id"},"statements":[{"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - identifier - _ - id"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]}
  ]},
  {"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - publisher"},"nameDescs":[{"name_type":"LABEL_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - publisher"}}},{"name_type":"DESCRIPTION_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringconstant","value":"academic publisher"}}}],"statementGroups":[
    {"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q2085381","label":"publisher"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},
    {"property":{"type":"wbpropconstant","pid":"P17","label":"country","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - country"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]}
  ]}
]}
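
A schema of this size is hard to review by eye. As a rough aid, the sketch below lists which properties each item document would write. It uses a trimmed fragment in the same shape as the schema above (statements emptied for brevity); the helper name `summarize_schema` is made up for this example and is not part of OpenRefine.

```python
import json

# Trimmed fragment in the same shape as the schema above; only two
# statement groups are kept and their statements are emptied for brevity.
SCHEMA_FRAGMENT = """
{"itemDocuments":[
  {"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - title"},
   "statementGroups":[
     {"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},"statements":[]},
     {"property":{"type":"wbpropconstant","pid":"P236","label":"ISSN","datatype":"external-id"},"statements":[]}
   ]}
]}
"""


def summarize_schema(schema: dict) -> dict:
    """Map each subject column to the (pid, label) pairs it would write."""
    summary = {}
    for doc in schema["itemDocuments"]:
        subject = doc["subject"]["columnName"]
        summary[subject] = [
            (group["property"]["pid"], group["property"]["label"])
            for group in doc["statementGroups"]
        ]
    return summary


summary = summarize_schema(json.loads(SCHEMA_FRAGMENT))
for subject, properties in summary.items():
    print(subject, properties)
```

Running the same helper on the full schema should show two item documents, one keyed on the title column and one on the publisher column.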

Importing data into Wikidata

This is quite a complex dataset, and I'm not sure all of it is suitable for Wikidata. New properties will probably need to be added, and the data may need to be imported as multiple Mix n' Match catalogues to match the journals, publishers etc.

John Cummings (talk) 09:36, 1 March 2018 (UTC)

Please do use the P template to spell out prop names (done a couple below).
Use qualifiers to your advantage (e.g. I've modeled the 3 APC props below using qualifiers).
See the "List of qualifiers" link at a prop discussion (e.g. Property_talk:P236) to discover appropriate qualifiers. --Vladimir Alexiev (talk) 14:56, 1 March 2018 (UTC)
@Vladimir Alexiev:, wonderful, thank you, this is a really nice way of expressing qualifiers. --John Cummings (talk) 17:10, 1 March 2018 (UTC)
  • @John Cummings: I agree this is good data to import, but I don't think Mix n' Match is the right tool for this. Normally with Mix n' Match there's an identifier you are looking to link up, but DOAJ doesn't appear to use any identifiers of its own. Also it would be good to add the fact that these journals are in DOAJ (this isn't part of the DOAJ dataset - unless it's DOAJ Seal?, but just the fact the journals are on this list) - maybe member of (P463) would be suitable? I think the best approach here would be to use OpenRefine or anything similar to that to link up the journals, organizations, publishers etc with existing Wikidata entries (where they exist), and then export to QuickStatements to add things that are not already in Wikidata. ArthurPSmith (talk) 16:43, 2 March 2018 (UTC)
    • @ArthurPSmith:, thanks for the suggestions, we have gotten around the issue of no ID numbers previously by using a temporary ID system that is only used for Mix n' Match to match the columns afterwards. The other contributing factor is I have no idea how to use OpenRefine :( Yes we will definitely add the fact these journals are part of (not sure of the right terminology) DOAJ. --John Cummings (talk) 17:23, 2 March 2018 (UTC)
    • Thanks @Pintoch:, I will get round to it one day, I have put quite a lot of time into learning a system that just about works, and spending another chunk of time learning another process that gets the same results doesn't feel like a priority. I've been working with a few others on collating guidance on data imports at Wikidata:Data_Import_Guide, please take a look and see if you can add anything OpenRefine related, I'm sure there are things to add in the 'commonly used processes' section. --John Cummings (talk) 23:17, 2 March 2018 (UTC)
  • @John Cummings: I totally understand! I do think that you would save a lot of time with that tool though (20 minutes of watching Owen's tutorial would save you hours of Mix'n'Matching, for instance). But I acknowledge that we could have more tutorials to make the learning curve even smoother. Watch this space, I am working on it. − Pintoch (talk) 00:02, 3 March 2018 (UTC)
@Pintoch: great, thanks, the other thing I like about Mix n' Match is there is a public record of what has been matched that can be checked later to fix mistakes. I'm trying with the Data Import Hub to make this easier for other import methods. I'm very aware this page is getting very long, so there should be a version 2 along some time soon. Thanks again. --John Cummings (talk) 10:08, 3 March 2018 (UTC)
@John Cummings: the tutorials I promised are out - I hope you will find them useful − Pintoch (talk) 08:25, 30 May 2018 (UTC)
  • @John Cummings:, great idea and great dataset! I'd love to help with this import. What part are you working on now and how can I help make this import happen faster? Mahdimoqri (talk) 14:55, 20 March 2018 (UTC)
    • @Mahdimoqri:, Thanks very much, if you could take part in the Mix n' Match catalogue that would be great. Having done over 2000 matches I have only found one match to an existing item so it's very safe to just create new items for everything. I will do a check for duplicate ISSN numbers before importing the data so we will catch any duplicates that snuck through :) --John Cummings (talk) 17:49, 20 March 2018 (UTC)
      • @John Cummings: I checked all the ISSNs in your catalogue (uploaded here) against all the existing items with an ISSN property (posted here). There is only one entry in your catalogue (this one) that matched an ISSN of an existing item on Wikidata (here) which I believe is a mistake. Seems you are matching the unmatched items in batches. Anything else I can help with?
  • Also, there is a related discussion on WikiCite here that you might find helpful.
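
The duplicate-ISSN check discussed above could look roughly like the sketch below. It is not the script anyone in this thread actually used: the function names and the shape of the inputs (a catalogue mapping entry IDs to ISSN strings, and a mapping from ISSN to an existing item ID) are assumptions made for illustration. The check-digit rule is the standard ISSN one (weights 8 down to 2, modulo 11, with 10 written as X).

```python
import re
from typing import Dict, List, Optional, Tuple

ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as NNNN-NNNC, or None if malformed or the check digit is wrong."""
    match = ISSN_PATTERN.match(raw.strip())
    if match is None:
        return None
    digits = match.group(1) + match.group(2)
    # Weighted sum of the first seven digits with weights 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if match.group(3).upper() != expected:
        return None
    return f"{match.group(1)}-{match.group(2)}{expected}"


def find_issn_clashes(
    catalogue: Dict[str, List[str]], existing: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """List (entry_id, issn, item_id) where a catalogue ISSN already exists on an item."""
    clashes = []
    for entry_id, issns in catalogue.items():
        for raw in issns:
            issn = normalize_issn(raw)
            if issn is not None and issn in existing:
                clashes.append((entry_id, issn, existing[issn]))
    return clashes


# Hypothetical data: entry IDs and the item ID are placeholders.
catalogue = {"entry-1": ["0378-5955"], "entry-2": ["20493630", "not-an-issn"]}
existing = {"2049-3630": "Q_EXISTING_PLACEHOLDER"}
print(find_issn_clashes(catalogue, existing))
```

Normalizing before comparing matters because the same ISSN may appear with or without the hyphen, or with a lowercase x.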

What is the best way of saying the following:

  • This journal appears in DOAJ
  • This publisher publishes OA journals
  • This publisher publishes journals in DOAJ
  • This organisation publishes OA journals
  • This platform hosts OA journals

Additional statements to add that aren't covered by the field matching

  • This journal appears in DOAJ
  • This publisher publishes OA journals
  • This publisher publishes journals in DOAJ
  • This organisation publishes OA journals
  • This platform hosts OA journals
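
For the first statement, "This journal appears in DOAJ", one option already listed in the mapping table is catalog (P972) with Directory of Open Access Journals (Q1227538); member of (P463) was also suggested in the discussion, so treat the property choice here as open. Assuming P972, a batch for QuickStatements (version 1, tab-separated syntax) might be generated like this. The function name is made up, and the item ID in the usage line is the Wikidata Sandbox item, used only as a placeholder.

```python
from typing import Iterable

DOAJ_ITEM = "Q1227538"  # Directory of Open Access Journals


def doaj_catalog_commands(qids: Iterable[str], prop: str = "P972") -> str:
    """Build QuickStatements V1 lines: item, property, value, then a
    'stated in' (S248) reference, all tab-separated."""
    lines = []
    for qid in qids:
        lines.append("\t".join([qid, prop, DOAJ_ITEM, "S248", DOAJ_ITEM]))
    return "\n".join(lines)


# Q4115189 is the Wikidata Sandbox item, standing in for a real journal item.
print(doaj_catalog_commands(["Q4115189"]))
```

Passing `prop="P463"` would produce the member of variant instead, should that modelling be preferred.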

Property proposals

Comments from matching

Issues:

  • Spelling mistakes
  • Multiple languages, hard to find the right one; Indonesian, Polish, Russian, Spanish and Portuguese are the most common
  • Multiple languages: the English name is often given in DOAJ, while the item name on Wikidata may be in another language
  • Parts of existing items, e.g. library of a university where the university already has an item
  • Joint publishers created as one item
  • Publisher and journal have the same name but shouldn't be the same item; Mix n' Match may have treated them as the same thing (error warnings)
  • Name is only an acronym
  • Name is only a university department, doesn't include the name of the university
  • Some universities have multiple names

--John Cummings (talk) 12:45, 5 May 2018 (UTC)

Import completion notes

Visualisations

Maintenance

Queries and expected results

Query link | Description | Expected results
Link1 | Property1 | Notes1
Link2 | Property2 | Notes2
Link3 | Property3 | Notes3
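
No maintenance queries have been recorded yet. As one possible starting point, the sketch below builds a link to the Wikidata Query Service for items that have a Directory of Open Access Journals ID (P5115) but no ISSN (P236). The query itself and the helper name `query_url` are suggestions only, and the expected result count is unknown.

```python
from urllib.parse import urlencode

# Items with a DOAJ ID (P5115) but no ISSN (P236); a suggested check only.
MISSING_ISSN_QUERY = """
SELECT ?item ?doajId WHERE {
  ?item wdt:P5115 ?doajId .
  FILTER NOT EXISTS { ?item wdt:P236 ?issn }
}
LIMIT 100
"""


def query_url(sparql: str, endpoint: str = "https://query.wikidata.org/sparql") -> str:
    """Return a GET URL for the query service, asking for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql.strip(), "format": "json"})


print(query_url(MISSING_ISSN_QUERY))
```

The sketch only builds the URL, so it can be pasted into the table as a query link without making a network request.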

Schedule of new data released