Wikidata:Dataset Imports/Directory of Open Access Journals
Guidelines for using this page
Documenting the import
- Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
- Please include notes on all steps of the process.
- Once a dataset has been imported into Wikidata please edit the page to change the progress status from in progress to complete.
- It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.
Creating a Wikidata item for the dataset
- Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata and understand which datasets are available on that topic and which of them have been added to Wikidata.
- If you are working with a very large dataset you can break it into smaller Mix n' Match catalogues, but only create one Wikidata item.
- Link the dataset Wikidata item to this page using
{{P|??}}
Getting help
- If your dataset import runs into issues please edit the page to change the progress status from in progress to help needed.
- You can ask for help on Wikidata:Project chat.
Overview
Dataset name
Directory of Open Access Journals
Source
Directory of Open Access Journals
Link
Dataset description
List of open access journals that appear in the Directory of Open Access Journals
Started by
Additional information
Progress of import
The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.
Wikidata item for the dataset | Import data into spreadsheet | Format the spreadsheet to import the data | Structure of data within Wikidata | Match the dataset to Wikidata | Importing data into Wikidata | Visualisations | Maintenance queries and expected results |
---|---|---|---|---|---|---|---|
{{Q|??}} | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet |
Edit history
Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.
Date | Notes | Who |
---|---|---|
2019-11-04 | Imported some data for ~1300 journals with DOAJ seal, of which ~50 new, using the 2019-10-31 JSON dump and OpenRefine 3.3. | Nemo 23:12, 4 November 2019 (UTC) |
2019-11-05 | Imported additional data for some 10k existing journals and publishers. About 3000 journals remain missing or unmatched. | Nemo 11:05, 6 November 2019 (UTC) |
Discussion of import
These headings are generally useful; please change this section to suit your needs.
Wikidata item for dataset
Import data into spreadsheet
Format the spreadsheet to import the data
Structure of data within Wikidata
- Dataset here
- Wikidata property search
- "--" indicates a qualifier applied on the previous property
Field name | Wikidata property | Notes |
---|---|---|
DOAJ ID | Directory of Open Access Journals ID (P5115) + eISSN | |
Journal title | label + title (P1476) | needs language attribute |
Alternative title | alias and/or short name (P1813) | needs language attribute |
Discontinued date | discontinued date (P2669)? | P2669 in use by a handful of journals as of October 2019, for instance Journal of Systems Chemistry (Q27725594) |
Journal URL | official website (P856) | |
Journal ISSN (print version) | ISSN (P236) | |
Journal EISSN (online version) | ISSN (P236) -- qualifier TODO | |
Publisher | ||
Subjects | main subject (P921) | |
Keywords | main subject (P921)? | |
Society or institution | If official journal of a society:
Potentially:
|
The submission form describes the field as "The name of the Society or Institution that the journal belongs to." |
Platform, host or aggregator | software engine (P408) Open Journal Systems (Q1710177) or other | |
Country of publisher | country (P17) | |
Journal article processing charges (APCs) | If no APC: instance of (P31) APC-free journal (Q73365499). If APC: instance of (P31) APC-funded journal (Q73365221). If APC amount known and non-zero: has quality (P1552) article processing charge (Q15291071). | |
Article Processing Charge information URL | -- described at URL (P973) | |
APC amount | -- fee (P2555) | qualify with point in time (P585) |
Currency | -- currency (P38) | |
Journal article submission fee | has quality (P1552) article submission fee (Q50289174) | |
Submission fee URL | -- described at URL (P973) | better as a reference? |
Submission fee amount | -- fee (P2555) | qualify with point in time (P585) |
Submission fee currency | -- currency (P38) | |
Number of articles published in the last calendar year | Not to be imported, but useful as a filter. Very small or inactive journals may be dead or not worth creating items for. | |
Number of articles information URL | ||
Journal waiver policy (for developing country authors etc) | Related to APC. | |
Waiver policy information URL | ||
Digital archiving policy or program(s) | ||
Archiving: national library | ||
Archiving: other | ||
Archiving information URL | ||
Journal full-text crawl permission | ||
Permanent article identifiers | ||
Journal provides download statistics | ||
Download statistics information URL | ||
First calendar year journal provided online Open Access content | ||
Full text formats | This field indicates whether the journal supports/allows different full text formats beyond PDF: for instance human-readable HTML and machine-readable JATS XML. Could perhaps use product or material produced (P1056) | |
Full text language | ||
URL for the Editorial Board page | ||
Review process | ||
Review process information URL | ||
URL for journal's aims & scope | ||
URL for journal's instructions for authors | ||
Journal plagiarism screening policy | ||
Plagiarism information URL | ||
Average number of weeks between submission and publication | ||
URL for journal's Open Access statement | ||
Machine-readable CC licensing information embedded or displayed in articles | ||
URL to an example page with embedded licensing information | ||
Journal license | copyright license (P275) | |
License attributes | appears to only be used where publisher uses own license in 'Journal license' field | |
URL for license terms | use as a reference | |
Does this journal allow unrestricted reuse in compliance with BOAI? | ||
Deposit policy directory | ||
Author holds copyright without restrictions | If true (and following true as well?), use product or material produced (P1056) works copyrighted by authors (Q73362150). Otherwise, may warrant product or material produced (P1056) works copyrighted by learned societies (Q73362358) or product or material produced (P1056) works copyrighted by publishers (Q73362505), but that's not something captured by DOAJ so don't add it automatically. | Per https://blog.doaj.org/2015/05/19/copyright-and-licensing-incompatibility-part-1/ , answering "yes" to question 52 probably means that there is no wholesale copyright transfer or copyright assignment to the publisher. However, there may be additional contractual terms beyond the public license (if any). |
Copyright information URL | ||
Author holds publishing rights without restrictions | Per above, answering "yes" probably means that there are no such additional contractual terms restricting the author's right to repost the same work elsewhere. The actual implications are unclear, especially if the journal is under an unfree license. | |
Publishing rights information URL | ||
DOAJ Seal | award received (P166) DOAJ seal (Q73548471) | |
Tick: Accepted after March 2014 | ||
Added on Date |
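The APC rows of the table above can be sketched in code. This is only an illustration of the proposed mapping, not the import that was actually run: the `ApcInfo` record and its field names are hypothetical, the statements are plain dictionaries rather than any tool's format, and the property and item IDs are the ones listed in the table.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

# Property and item IDs copied from the mapping table above.
INSTANCE_OF = "P31"
HAS_QUALITY = "P1552"
DESCRIBED_AT_URL = "P973"
FEE = "P2555"
CURRENCY = "P38"
POINT_IN_TIME = "P585"
APC_FREE_JOURNAL = "Q73365499"
APC_FUNDED_JOURNAL = "Q73365221"
ARTICLE_PROCESSING_CHARGE = "Q15291071"


@dataclass
class ApcInfo:
    """Hypothetical, simplified view of the APC fields of one DOAJ record."""
    has_apc: bool
    amount: Optional[float] = None
    currency_qid: Optional[str] = None  # assumed to be reconciled already
    info_url: Optional[str] = None
    as_of: Optional[str] = None  # ISO date on which the amount was recorded


def _statement(prop: str, value: str,
               qualifiers: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    return {"property": prop, "value": value, "qualifiers": dict(qualifiers or {})}


def apc_statements(apc: ApcInfo) -> List[Dict[str, Any]]:
    """Apply the three APC rules from the table to one record."""
    if not apc.has_apc:
        return [_statement(INSTANCE_OF, APC_FREE_JOURNAL)]

    statements = [_statement(INSTANCE_OF, APC_FUNDED_JOURNAL)]
    if apc.amount:  # None and 0 both mean "no known non-zero amount"
        qualifiers: Dict[str, Any] = {FEE: apc.amount}
        if apc.currency_qid:
            qualifiers[CURRENCY] = apc.currency_qid
        if apc.info_url:
            qualifiers[DESCRIBED_AT_URL] = apc.info_url
        if apc.as_of:
            qualifiers[POINT_IN_TIME] = apc.as_of
        statements.append(
            _statement(HAS_QUALITY, ARTICLE_PROCESSING_CHARGE, qualifiers))
    return statements


if __name__ == "__main__":
    for statement in apc_statements(
            ApcInfo(has_apc=True, amount=500.0, as_of="2019-10-31")):
        print(statement)
```

Keeping the amount, currency, URL and date as qualifiers on a single has quality (P1552) statement follows the "--" convention of the table, so one statement carries everything known about the charge at a given point in time.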
Match the dataset to Wikidata
Example OpenRefine schema
Basic schema used after importing the JSON dump with the option "convert numbers, dates etc." and after reconciling the columns for title, publisher, country, license title:
{"itemDocuments":[{"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - title"},"nameDescs":[{"name_type":"LABEL_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - title"}}},{"name_type":"DESCRIPTION_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringconstant","value":"open access academic journal"}}}],"statementGroups":[{"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q737498","label":"academic journal"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]},{"value":{"type":"wbitemconstant","qid":"Q773668","label":"open-access journal"},"qualifiers":[{"prop":{"type":"wbpropconstant","pid":"P580","label":"start time","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - bibjson - oa_start - year"}}],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}},{"prop":{"type":"wbpropconstant","pid":"P813","label":"retrieved","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - last_updated"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P166","label":"award received","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q73548471","label":"DOAJ seal"},"qualifiers":[{"prop":{"type":"wbpropconstant","pid":"P585","label":"point in time","datatype":"time"},"value":{"type":"wbdatevariable","columnName":"_ - _ - last_updated"}}],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}},{"prop":{"type":"wbpropconstant","pid":"P813","label":"retrieved","datatype":"time"},"value":{"type":"wbdateconstant","value":"2019-10-31"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P17","label":"country","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - country"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P123","label":"publisher","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - publisher"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P275","label":"license","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - license - _ - title"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P236","label":"ISSN","datatype":"external-id"},"statements":[{"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - identifier - _ - id"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]}]},{"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - publisher"},"nameDescs":[{"name_type":"LABEL_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - publisher"}}},{"name_type":"DESCRIPTION_IF_NEW","value":{"type":"wbmonolingualexpr","language":{"type":"wblanguageconstant","id":"en","label":"en"},"value":{"type":"wbstringconstant","value":"academic publisher"}}}],"statementGroups":[{"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemconstant","qid":"Q2085381","label":"publisher"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]},{"property":{"type":"wbpropconstant","pid":"P17","label":"country","datatype":"wikibase-item"},"statements":[{"value":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - country"},"qualifiers":[],"references":[{"snaks":[{"prop":{"type":"wbpropconstant","pid":"P248","label":"stated in","datatype":"wikibase-item"},"value":{"type":"wbitemconstant","qid":"Q1227538","label":"Directory of Open Access Journals"}}]}]}]}]}]}
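Because the schema is a single long line of JSON, it can be easier to review as a list of "subject column, property, value source" rows. The sketch below is one way to produce that list with the standard library only. `SCHEMA_EXCERPT` is a shortened copy of the schema above (one item document, two statement groups, references removed for brevity), and `summarize_schema` is a hypothetical helper name, not part of OpenRefine.

```python
import json
from typing import List, Tuple

# Shortened excerpt of the schema above; references are left empty for brevity.
SCHEMA_EXCERPT = """
{"itemDocuments":[{"subject":{"type":"wbitemvariable","columnName":"_ - _ - bibjson - title"},
 "statementGroups":[
  {"property":{"type":"wbpropconstant","pid":"P31","label":"instance of","datatype":"wikibase-item"},
   "statements":[{"value":{"type":"wbitemconstant","qid":"Q737498","label":"academic journal"},
                  "qualifiers":[],"references":[]}]},
  {"property":{"type":"wbpropconstant","pid":"P236","label":"ISSN","datatype":"external-id"},
   "statements":[{"value":{"type":"wbstringvariable","columnName":"_ - _ - bibjson - identifier - _ - id"},
                  "qualifiers":[],"references":[]}]}
 ]}]}
"""


def summarize_schema(schema: dict) -> List[Tuple[str, str, str, str]]:
    """Return (subject column, property ID, property label, value source) rows."""
    rows = []
    for doc in schema.get("itemDocuments", []):
        subject = doc["subject"].get("columnName") or doc["subject"].get("qid", "")
        for group in doc.get("statementGroups", []):
            prop = group["property"]
            for statement in group.get("statements", []):
                value = statement["value"]
                # A value is either a column variable, an item constant,
                # or a literal constant.
                source = (value.get("columnName") or value.get("qid")
                          or value.get("value", ""))
                rows.append((subject, prop["pid"], prop["label"], source))
    return rows


if __name__ == "__main__":
    for row in summarize_schema(json.loads(SCHEMA_EXCERPT)):
        print(" | ".join(row))
```

Running the same function on the full schema would also list the second item document, which creates publisher items from the publisher column.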
Importing data into Wikidata
This is quite a complex dataset, and I'm not sure all of it is suitable for Wikidata. I think new properties may need to be added, and the data may need to be imported as multiple Mix n' Match catalogues to match the journals, publishers etc.
John Cummings (talk) 09:36, 1 March 2018 (UTC)
- I think a list of journals is valuable even with just basic data. You could easily just import the list of titles with a few properties and then complete it later. You'd just make sure that existing entries aren't duplicated. A good way to do that would probably be to attempt to complete ISSN for existing entries.
--- Jura 09:47, 1 March 2018 (UTC)
- @Jura1: I think you're right. I'd like at a minimum to get a Mix n' Match catalogue started for both journals and publishers and fill in the hard stuff later once the matching has started. ISSN seems a very sensible way of checking for duplicates. --John Cummings (talk) 09:58, 1 March 2018 (UTC)
- Looks useful. It seems that the directory has what we term a third-party formatter URL (P3303), in the format https://doaj.org/toc/$1 (example: [1]). I've added that to ISSN (P236). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:34, 1 March 2018 (UTC)
- Thanks very much @Pigsonthewing:, this is super helpful, I missed this. I'm going to have a go at preparing the Mix n' Match catalogues today, would be nice if a grown up could check them before I press the magic button :) --John Cummings (talk) 12:21, 1 March 2018 (UTC)
- NP. It's also worth noting that they have redirects, so if a journal has, say, both a print ISSN and an eISSN, both will work. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:50, 1 March 2018 (UTC)
- Thanks very much, I'm going to have to ask for a property for eISSN. --John Cummings (talk) 13:35, 1 March 2018 (UTC)
- @John Cummings: eISSNs have been discussed in the past: Wikidata:Project_chat/Archive/2015/03#ISSNs_for_web_and_print and in French at Topic:U6idbfbzhvlnb3ak. For now we have been adding them with the same property as ISSN and using a qualifier to distinguish the two. − Pintoch (talk) 14:21, 1 March 2018 (UTC)
- @Pintoch:, thanks very much, that's really useful information, nice to have less to do :) --John Cummings (talk) 17:10, 1 March 2018 (UTC)
- @John Cummings: A great initiative!
- Please do use the P template to spell out prop names (done a couple below).
- Use qualifiers to your advantage (e.g. I've modeled the 3 APC props below using qualifiers).
- See the "List of qualifiers" link at a prop discussion (e.g. Property_talk:P236) to discover appropriate qualifiers. --Vladimir Alexiev (talk) 14:56, 1 March 2018 (UTC)
- @Vladimir Alexiev:, wonderful, thank you, this is a really nice way of expressing qualifiers. --John Cummings (talk) 17:10, 1 March 2018 (UTC)
- @John Cummings: I agree this is good data to import, but I don't think Mix n Match is the right tool for this. Normally with Mix n Match there's an identifier you are looking to link up, but DOAJ doesn't appear to use any identifiers of its own. Also it would be good to add the fact that these journals are in DOAJ (this isn't part of the DOAJ dataset - unless it's DOAJ Seal?, but just the fact the journals are on this list) - maybe member of (P463) would be suitable? I think the best approach here would be to use OpenRefine or anything similar to that to link up the journals, organizations, publishers etc with existing wikidata entries (where they exist), and then export to Quickstatements to add things that are not already in wikidata. ArthurPSmith (talk) 16:43, 2 March 2018 (UTC)
- @ArthurPSmith:, thanks for the suggestions, we have gotten around the issue of no ID numbers previously by using a temporary ID system that is only used for Mix n' Match to match the columns afterwards. The other contributing factor is I have no idea how to use OpenRefine :( Yes we will definitely add the fact these journals are part of (not sure of the right terminology) DOAJ. --John Cummings (talk) 17:23, 2 March 2018 (UTC)
- @John Cummings: - have you tried the videos? For a general intro to OpenRefine, the ones at http://openrefine.org are quite nice. For Wikidata reconciliation, there are a number of resources listed at https://tools.wmflabs.org/openrefine-wikidata/ (including videos). I'd be happy to help if there is anything you struggle with. − Pintoch (talk) 18:07, 2 March 2018 (UTC)
- Thanks @Pintoch:, I will get round to it one day. I have put quite a lot of time into learning a system that just about works, and spending another chunk of time learning another process that gets the same results doesn't feel like a priority. I've been working with a few others on collating guidance on data imports at Wikidata:Data_Import_Guide, please take a look and see if you can add anything OpenRefine related, I'm sure there are things to add in the 'commonly used processes' section. --John Cummings (talk) 23:17, 2 March 2018 (UTC)
- @John Cummings: I totally understand! I do think that you would save a lot of time with that tool though (20 minutes of watching Owen's tutorial would save you hours of Mix'n'Matching, for instance). But I acknowledge that we could have more tutorials to make the learning curve even smoother. Watch this space, I am working on it. − Pintoch (talk) 00:02, 3 March 2018 (UTC)
- @Pintoch: great, thanks, the other thing I like about Mix n' Match is there is a public record of what has been matched that can be checked later to fix mistakes. I'm trying with the Data Import Hub to make this easier for other import methods. I'm very aware this page is getting very long, so there should be a version 2 along some time soon. Thanks again. --John Cummings (talk) 10:08, 3 March 2018 (UTC)
- @John Cummings: the tutorials I promised are out - I hope you will find them useful − Pintoch (talk) 08:25, 30 May 2018 (UTC)
- @John Cummings:, great idea and great dataset! I'd love to help with this import. What part are you working on now and how can I help make this import happen faster? Mahdimoqri (talk) 14:55, 20 March 2018 (UTC)
- @Mahdimoqri:, Thanks very much, if you could take part in the Mix n' Match catalogue that would be great. Having done over 2000 matches I have only found one match to an existing item so it's very safe to just create new items for everything. I will do a check for duplicate ISSN numbers before importing the data so we will catch any duplicates that snuck through :) --John Cummings (talk) 17:49, 20 March 2018 (UTC)
- @John Cummings: I checked all the ISSNs in your catalogue (uploaded here) against all the existing items with an ISSN property (posted here). There is only one entry in your catalogue (this one) that matched an ISSN of an existing item on Wikidata (here) which I believe is a mistake. Seems you are matching the unmatched items in batches. Anything else I can help with?
- Also, there is a related discussion on WikiCite here that you might find helpful.
- @John Cummings: It seems your catalogue does not currently deal with any journal title disambiguation. For example, it has mistakenly linked Q15749678 with this item but they are actually not referring to the same journal. I'm compiling a complete list here. Mahdimoqri (talk) 22:36, 25 March 2018 (UTC)
- See also Wikidata_talk:WikiProject_Open_Access#Other_sources_to_import for some other journals imported. IMHO it's best to import data which links various datasets together. It's less important to have all the details of the APC amounts, for instance. I would import relatively stable information like the CC license and DOAJ seal status. Nemo 22:14, 4 November 2019 (UTC)
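The ISSN-based duplicate check discussed in this thread can be sketched as follows. This is a minimal illustration, not the check that was actually run: the function names, the `(title, [ISSNs])` catalogue shape and the `ISSN -> QID` lookup are all assumptions. The check-digit rule is the standard ISSN one (weights 8 down to 2, modulo 11, with X standing for 10), and the URL pattern is the https://doaj.org/toc/$1 formatter mentioned above.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

ISSN_RE = re.compile(r"^([0-9]{4})-?([0-9]{3})([0-9Xx])$")


def normalize_issn(raw: str) -> Optional[str]:
    """Return the ISSN as NNNN-NNNC if it is well formed and its check digit
    is valid, otherwise None."""
    match = ISSN_RE.match(raw.strip())
    if not match:
        return None
    digits = match.group(1) + match.group(2)
    check = match.group(3).upper()
    # Standard ISSN check digit: weights 8..2 over the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    expected = (11 - total % 11) % 11
    expected_char = "X" if expected == 10 else str(expected)
    if check != expected_char:
        return None
    return f"{digits[:4]}-{digits[4:]}{check}"


def doaj_toc_url(issn: str) -> str:
    """Build a DOAJ link from the formatter URL https://doaj.org/toc/$1."""
    return f"https://doaj.org/toc/{issn}"


def match_by_issn(catalogue: Iterable[Tuple[str, List[str]]],
                  existing: Dict[str, str]) -> Dict[str, str]:
    """Map catalogue titles to the QID of an existing item that shares any
    valid ISSN (print or electronic) with the entry."""
    matches: Dict[str, str] = {}
    for title, issns in catalogue:
        for raw in issns:
            issn = normalize_issn(raw)
            if issn and issn in existing:
                matches[title] = existing[issn]
                break
    return matches
```

Normalising before comparing means that "2434561x" and "2434-561X" are treated as the same identifier, and entries with a mistyped ISSN are left unmatched instead of silently creating a wrong link.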
What is the best way of saying the following:
- This journal appears in DOAJ
- This publisher publishes OA journals
- This publisher publishes journals in DOAJ
- This organisation publishes OA journals
- This platform hosts OA journals
Additional statements to add that aren't covered by the field matching
- This journal appears in DOAJ
- This publisher publishes OA journals
- This publisher publishes journals in DOAJ
- This organisation publishes OA journals
- This platform hosts OA journals
Property proposals
Comments from matching
Issues:
- Spelling mistakes
- Multiple languages, hard to find the right one; Indonesian, Polish, Russian, Spanish and Portuguese are the most common
- Multiple languages, English one is often given in DOAJ, item name of Wikidata may be in another language
- Parts of existing items, e.g. library of a university where the university already has an item
- Joint publishers created as one item
- Publisher and journal have the same name but shouldn't be the same item; Mix n' Match may have thought it was the same thing (error warnings)
- Name is only acronym
- Name is only university department, doesn't include the name of the university
- Some universities have multiple names
--John Cummings (talk) 12:45, 5 May 2018 (UTC)
Import completion notes
Visualisations
Maintenance
Queries and expected results
Query link | Description | Expected results |
---|---|---|
Link1 | Property1 | Notes1 |
Link2 | Property2 | Notes2 |
Link3 | Property3 | Notes3 |