Wikidata:Dataset Imports/Directory of Open Access Journals

You may find these related resources helpful:

  • Dataset Imports
  • Why import data into Wikidata
  • Learn how to import data
  • Bot requests
  • Ask a data import question


Guidelines for using this page

Documenting the import

  • Guidelines on how to import a dataset into Wikidata are available at Wikidata:Data Import Guide.
  • Please include notes on all steps of the process.
  • Once a dataset has been imported into Wikidata, please edit the page to change the progress status from "in progress" to "complete".
  • It is strongly recommended to use Visual Editor when making changes to this page, particularly for editing any of the tables.

Creating a Wikidata item for the dataset

  • Please create a Wikidata item for the dataset. This will allow us to improve the coverage of datasets on Wikidata and to understand which datasets are available on a topic and which of them have been added to Wikidata.
  • If you are working with a very large dataset, you can break it into smaller Mix n' Match catalogues, but only create one Wikidata item.
  • Link the dataset Wikidata item to this page using {{P|??}}

Getting help

  • If your dataset import runs into issues, please edit the page to change the progress status from "in progress" to "help needed".
  • You can ask for help on Wikidata:Project chat.

Overview

Dataset name

Directory of Open Access Journals

Source

Directory of Open Access Journals

Link

https://doaj.org/faq#metadata

Dataset description

List of open access journals that appear in the Directory of Open Access Journals

Started by

Additional information

Progress of import

The table below is used to track the progress of importing this dataset. The suggested column headings are most applicable to data being imported from a spreadsheet - you can change some column headings or add new columns as required to best describe the progress of this import.

Wikidata item for the dataset | Import data into spreadsheet | Format the spreadsheet to import the data | Structure of data within Wikidata | Match the dataset to Wikidata | Importing data into Wikidata | Visualisations | Maintenance queries and expected results
{{Q|??}} | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet | Not done yet

Edit history

Use the table below to list batches of edits that have been completed for this dataset. Ideally each entry should have all applicable columns filled out, but at a minimum please make sure to add a date and description to give an idea of what was added to Wikidata and when.

Date | Description | Method | Properties | Qualifiers | References | Statements added | Statements removed | Link to import sheet
Date 1 | Description 1 | Method 1 | Properties 1 | Qualifiers 1 | References 1 | Added Count 1 | Removed Count 1 | Link 1
Date 2 | Description 2 | Method 2 | Properties 2 | Qualifiers 2 | References 2 | Added Count 2 | Removed Count 2 | Link 2
Date 3 | Description 3 | Method 3 | Properties 3 | Qualifiers 3 | References 3 | Added Count 3 | Removed Count 3 | Link 3

Discussion of import

These headings are generally useful; please change this section to suit your needs.

Wikidata item for dataset

Import data into spreadsheet

Format the spreadsheet to import the data

Structure of data within Wikidata

Field name | Wikidata property | Notes
DOAJ ID | Directory of Open Access Journals ID (P5115) |
Journal title | label + title (P1476) | needs language attribute
Alternative title | alias and/or short name (P1813) | needs language attribute
Journal URL | official website (P856) |
Journal ISSN (print version) | ISSN (P236) |
Journal EISSN (online version) | ISSN (P236) | -- qualifier TODO
Publisher | |
Subjects | main subject (P921) |
Keywords | main subject (P921)? |
Society or institution | |
Platform, host or aggregator | |
Country of publisher | country (P17) |
Journal article processing charges (APCs) | has quality (P1552) article processing charge (Q15291071) |
Article Processing Charge information URL | -- described at URL (P973) |
APC amount | -- fee (P2555) | qualify with point in time (P585)
Currency | -- currency (P38) |
Journal article submission fee | has quality (P1552) article submission fee (Q50289174) |
Submission fee URL | -- described at URL (P973) | better as a reference?
Submission fee amount | -- fee (P2555) | qualify with point in time (P585)
Submission fee currency | -- currency (P38) |
Number of articles published in the last calendar year | |
Number of articles information URL | |
Journal waiver policy (for developing country authors etc) | |
Waiver policy information URL | |
Digital archiving policy or program(s) | |
Archiving: national library | |
Archiving: other | |
Archiving information URL | |
Journal full-text crawl permission | |
Permanent article identifiers | |
Journal provides download statistics | |
Download statistics information URL | |
First calendar year journal provided online Open Access content | |
Full text formats | | This seems an important field to include for text reuse
Full text language | |
URL for the Editorial Board page | |
Review process | |
Review process information URL | |
URL for journal's aims & scope | |
URL for journal's instructions for authors | |
Journal plagiarism screening policy | |
Plagiarism information URL | |
Average number of weeks between submission and publication | |
URL for journal's Open Access statement | |
Machine-readable CC licensing information embedded or displayed in articles | |
URL to an example page with embedded licensing information | |
Journal license | license (P275) |
License attributes | | appears to only be used where publisher uses own license in 'Journal license' field
URL for license terms | | use as a reference
Does this journal allow unrestricted reuse in compliance with BOAI? | |
Deposit policy directory | |
Author holds copyright without restrictions | |
Copyright information URL | |
Author holds publishing rights without restrictions | |
Publishing rights information URL | |
DOAJ Seal | |
Tick: Accepted after March 2014 | |
Added on Date | |
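
As an illustration of how the simpler rows of the mapping above might be applied, the following is a minimal sketch that turns one spreadsheet row into QuickStatements (version 1, tab-separated) commands for a new item. It is not the method used for this import. The column names are taken from the "Field name" column above; the function names, the choice of fields, the default language "en" and the sample row are all assumptions made for the example. It only covers plain string values and the monolingual title, and it assumes the journal has already been checked and has no existing item.

```python
import csv
import io

# Field names come from the mapping table; which fields to export is an
# illustrative choice. All of these properties take plain string values.
STRING_FIELDS = {
    "DOAJ ID": "P5115",
    "Journal URL": "P856",
    "Journal ISSN (print version)": "P236",
    "Journal EISSN (online version)": "P236",
}


def quote(value):
    """Wrap a string value in double quotes (embedded quotes are not handled)."""
    return '"' + value + '"'


def row_to_quickstatements(row, lang="en"):
    """Return QuickStatements V1 lines that create one new journal item.

    `lang` is an assumption: the table notes that titles need a language
    attribute, but the source does not say how the language is determined.
    """
    title = (row.get("Journal title") or "").strip()
    lines = ["CREATE", "\t".join(["LAST", "L" + lang, quote(title)])]
    # title (P1476) is monolingual text, hence the language prefix
    lines.append("\t".join(["LAST", "P1476", lang + ":" + quote(title)]))
    for field, prop in STRING_FIELDS.items():
        value = (row.get(field) or "").strip()
        if value:  # skip empty cells rather than emitting empty statements
            lines.append("\t".join(["LAST", prop, quote(value)]))
    return lines


# Made-up sample row, not real DOAJ data.
sample = io.StringIO(
    "DOAJ ID,Journal title,Journal URL,"
    "Journal ISSN (print version),Journal EISSN (online version)\n"
    "abc123,Example Journal,https://example.org,1234-5678,\n"
)
for row in csv.DictReader(sample):
    print("\n".join(row_to_quickstatements(row)))
```

Rows that need item values or qualifiers (country, license, the APC and submission fee statements) are left out of the sketch because they first require matching to existing items.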

Match the dataset to Wikidata

Importing data into Wikidata

This is quite a complex dataset, and I'm not sure all of it is suitable for Wikidata. I think new properties may need to be added, and the data may need to be imported as multiple Mix n' Match catalogues to match the journals, publishers, etc.

John Cummings (talk) 09:36, 1 March 2018 (UTC)

  • I think a list of journals is valuable even with just basic data. You could easily just import the list of titles with a few properties and then complete it later. You'd just make sure that existing entries aren't duplicated. A good way to do that would probably be to attempt to complete ISSN for existing entries.
    --- Jura 09:47, 1 March 2018 (UTC)
    @Jura1: I think you're right, I'd like at a minimum to get a Mix n' Match catalogue started for both journals and publishers and fill in the hard stuff later once the matching has started. ISSN seems a very sensible way of checking for duplicates. --John Cummings (talk) 09:58, 1 March 2018 (UTC)
  • Looks useful. It seems that the directory has what we term a third-party formatter URL (P3303), in the format https://doaj.org/toc/$1 (example: [1]). I've added that to ISSN (P236). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:34, 1 March 2018 (UTC)
Please do use the P template to spell out prop names (done a couple below).
Use qualifiers to your advantage (e.g. I've modeled the 3 APC props below using qualifiers).
See the "List of qualifiers" link at a prop discussion (e.g. Property_talk:P236) to discover appropriate qualifiers. --Vladimir Alexiev (talk) 14:56, 1 March 2018 (UTC)
@Vladimir Alexiev:, wonderful, thank you, this is a really nice way of expressing qualifiers. --John Cummings (talk) 17:10, 1 March 2018 (UTC)
  • @John Cummings: I agree this is good data to import, but I don't think Mix n Match is the right tool for this. Normally with Mix n Match there's an identifier you are looking to link up, but DOAJ doesn't appear to use any identifiers of its own. Also it would be good to add the fact that these journals are in DOAJ (this isn't part of the DOAJ dataset - unless it's DOAJ Seal?, but just the fact the journals are on this list) - maybe member of (P463) would be suitable? I think the best approach here would be to use OpenRefine or anything similar to that to link up the journals, organizations, publishers etc with existing wikidata entries (where they exist), and then export to Quickstatements to add things that are not already in wikidata. ArthurPSmith (talk) 16:43, 2 March 2018 (UTC)
    • @ArthurPSmith:, thanks for the suggestions, we have gotten around the issue of no ID numbers previously by using a temporary ID system that is only used for Mix n' Match to match the columns afterwards. The other contributing factor is I have no idea how to use OpenRefine :( Yes we will definitely add the fact these journals are part of (not sure of the right terminology) DOAJ. --John Cummings (talk) 17:23, 2 March 2018 (UTC)
    • Thanks @Pintoch:, I will get round to it one day, I have put quite a lot of time into learning a system that just about works, and spending another chunk of time learning another process that gets the same results doesn't feel like a priority. I've been working with a few others on collating guidance on data imports at Wikidata:Data_Import_Guide, please take a look and see if you can add anything OpenRefine related, I'm sure there are things to add in the 'commonly used processes' section. --John Cummings (talk) 23:17, 2 March 2018 (UTC)
  • @John Cummings: I totally understand! I do think that you would save a lot of time with that tool though (20 minutes of watching Owen's tutorial would save you hours of Mix'n'Matching, for instance). But I acknowledge that we could have more tutorials to make the learning curve even smoother. Watch this space, I am working on it. − Pintoch (talk) 00:02, 3 March 2018 (UTC)
@Pintoch: great, thanks, the other thing I like about Mix n' Match is there is a public record of what has been matched that can be checked later to fix mistakes. I'm trying with the Data Import Hub to make this easier for other import methods. I'm very aware this page is getting very long, so there should be a version 2 along some time soon. Thanks again. --John Cummings (talk) 10:08, 3 March 2018 (UTC)
@John Cummings: the tutorials I promised are out - I hope you will find them useful − Pintoch (talk) 08:25, 30 May 2018 (UTC)
  • @John Cummings:, great idea and great dataset! I'd love to help with this import. What part are you working on now and how can I help make this import happen faster? Mahdimoqri (talk) 14:55, 20 March 2018 (UTC)
    • @Mahdimoqri:, Thanks very much, if you could take part in the Mix n' Match catalogue that would be great. Having done over 2000 matches I have only found one match to an existing item so it's very safe to just create new items for everything. I will do a check for duplicate ISSN numbers before importing the data so we will catch any duplicates that snuck through :) --John Cummings (talk) 17:49, 20 March 2018 (UTC)
      • @John Cummings: I checked all the ISSNs in your catalogue (uploaded here) against all the existing items with an ISSN property (posted here). There is only one entry in your catalogue (this one) that matched an ISSN of an existing item on Wikidata (here) which I believe is a mistake. Seems you are matching the unmatched items in batches. Anything else I can help with?
  • Also, there is a related discussion on WikiCite here that you might find helpful.
  • @John Cummings: It seems your catalogue does not currently deal with any journal title disambiguation. For example, it has mistakenly linked Q15749678 with this item but they are actually not referring to the same journal. I'm compiling a complete list here. Mahdimoqri (talk) 22:36, 25 March 2018 (UTC)
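
The ISSN-based duplicate check discussed above, and the formatter URL https://doaj.org/toc/$1, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the script anyone in this discussion actually ran: the function names, the input shapes (a list of catalogue entries with temporary IDs, and a dictionary of ISSNs already on Wikidata), the normalisation rules and the placeholder item ID "Q0" are all hypothetical. Matching on ISSN rather than on title also sidesteps the title disambiguation problem raised above.

```python
import re

# Shape check only (NNNN-NNNC); the check digit itself is not verified.
ISSN_PATTERN = re.compile(r"\d{4}-\d{3}[\dX]")


def normalise_issn(raw):
    """Return the ISSN as NNNN-NNNC in upper case, or None if malformed."""
    cleaned = raw.strip().upper().replace(" ", "")
    if len(cleaned) == 8 and "-" not in cleaned:
        cleaned = cleaned[:4] + "-" + cleaned[4:]
    return cleaned if ISSN_PATTERN.fullmatch(cleaned) else None


def doaj_toc_url(issn):
    """Fill the formatter URL https://doaj.org/toc/$1 with an ISSN."""
    return "https://doaj.org/toc/$1".replace("$1", issn)


def find_existing(catalogue, existing):
    """Map catalogue entry ID -> existing item ID wherever an ISSN is shared.

    catalogue: iterable of (entry_id, [issn, ...]) pairs
    existing:  dict of normalised ISSN -> item ID already on Wikidata
    """
    matches = {}
    for entry_id, issns in catalogue:
        for raw in issns:
            issn = normalise_issn(raw)
            if issn and issn in existing:
                matches[entry_id] = existing[issn]
                break  # one shared ISSN is enough to flag the entry
    return matches


# Made-up data: temporary catalogue IDs and a placeholder item ID.
catalogue = [("tmp-1", ["1234-5678"]), ("tmp-2", ["8765432x", ""])]
existing = {"8765-432X": "Q0"}
print(find_existing(catalogue, existing))  # {'tmp-2': 'Q0'}
print(doaj_toc_url("1234-5678"))  # https://doaj.org/toc/1234-5678
```

Entries returned by find_existing would be matched to the existing item instead of being created as new items; everything else remains a candidate for creation.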

What is the best way of saying the following:

  • This journal appears in DOAJ
  • This publisher publishes OA journals
  • This publisher publishes journals in DOAJ
  • This organisation publishes OA journals
  • This platform hosts OA journals

Additional statements to add that aren't covered by the field matching

  • This journal appears in DOAJ
  • This publisher publishes OA journals
  • This publisher publishes journals in DOAJ
  • This organisation publishes OA journals
  • This platform hosts OA journals

Property proposals

Comments from matching

Issues:

  • Spelling mistakes
  • Multiple languages, hard to find the right one; Indonesian, Polish, Russian, Spanish and Portuguese are the most common
  • Multiple languages, the English one is often given in DOAJ, while the item name on Wikidata may be in another language
  • Parts of existing items, e.g. the library of a university where the university already has an item
  • Joint publishers created as one item
  • Publisher and journal have the same name but shouldn't be the same item; Mix n' Match may have thought they were the same thing (error warnings)
  • Name is only acronym
  • Name is only university department, doesn't include the name of the university
  • Some universities have multiple names

--John Cummings (talk) 12:45, 5 May 2018 (UTC)

Import completion notes

Visualisations

Maintenance

Queries and expected results

Query link | Description | Expected results
Link1 | Property1 | Notes1
Link2 | Property2 | Notes2
Link3 | Property3 | Notes3

Schedule of new data released