Wikidata:Requests for permissions/Bot/JVbot 2
The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- Approved--Ymblanter (talk) 12:24, 8 February 2014 (UTC)
JVbot 2
JVbot (talk • contribs • new items • new lexemes • SUL • Block log • User rights log • User rights • xtools)
Operator: John Vandenberg (talk • contribs • logs)
Task/s: Creating items for journals and magazines, initially from Wikipedias, then from the ERA registry, and then from the Scopus Sources list if the property is approved.
Code: User:JVbot/periodicalbot.py -q:-1
# now using item.editEntity
Example: Journal of Adolescent & Adult Literacy (Q15710714)
Function details: During my first ISSN-focused task, I have been manually creating items for journals with an infobox on any Wikipedia. That is nearly complete. As I start to work on stubs, I am finding a very high proportion of Wikipedia pages without an item here. I would like to create those items automatically.
Once all Wikipedia pages about periodicals have a Wikidata item, with an ISSN where possible, I would like to continue uploading items for academic journals on authoritative lists, specifically the Australian national journal list, the Danish national journal list, and the Scopus Sources list. The bot will use a list of all ISSNs already on Wikidata items to minimise creation of duplicate journal entries where the source list uses an ISSN already present on Wikidata. The bot will also use title pattern matching to prevent creating items for titles that have a similar name to any existing Wikidata item that does not have an ISSN yet. --John Vandenberg (talk) 07:08, 4 February 2014 (UTC)
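The two duplicate checks described above (a lookup against ISSNs already on Wikidata, then a title comparison against items that have no ISSN yet) could be sketched roughly as follows. This is only an illustration and not the code in User:JVbot/periodicalbot.py: the function names, the candidate dictionary layout and the normalisation rules are all assumptions, and the real bot's title pattern matching may be looser or stricter.

```python
import re
import unicodedata


def normalize_issn(raw):
    """Return an ISSN as 'NNNN-NNNC' with an uppercase X, or None if malformed.

    Hypothetical helper: it checks the shape only, not the check digit.
    """
    compact = re.sub(r"[\s-]", "", raw).upper()
    if not re.fullmatch(r"\d{7}[\dX]", compact):
        return None
    return compact[:4] + "-" + compact[4:]


def normalize_title(title):
    """Reduce a title to a comparable key (assumed rules, for illustration)."""
    # Strip accents so that e.g. 'Revue Française' and 'Revue Francaise' match.
    decomposed = unicodedata.normalize("NFKD", title)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = plain.lower().replace("&", " and ")
    # Drop a trailing Wikipedia-style disambiguator such as ' (journal)'.
    lowered = re.sub(r"\s*\((journal|magazine)\)\s*$", "", lowered)
    words = re.sub(r"[^a-z0-9]+", " ", lowered).split()
    if words and words[0] == "the":
        words = words[1:]
    return " ".join(words)


def should_create(candidate, known_issns, titles_without_issn):
    """Return True only if neither duplicate check fires."""
    issns = {normalize_issn(value) for value in candidate.get("issns", [])}
    issns.discard(None)
    if issns & known_issns:
        return False  # the ISSN is already on an existing item
    if normalize_title(candidate["title"]) in titles_without_issn:
        return False  # similar title on an item that has no ISSN yet
    return True


# Illustrative data only; the ISSNs are sample strings, not lookups.
known = {"0378-5955"}
pending = {normalize_title("Men and Masculinities (journal)")}
print(should_create({"title": "Men and Masculinities", "issns": []}, known, pending))
print(should_create({"title": "Hypothetical Review", "issns": ["2049-3630"]}, known, pending))
```

An exact match on a normalised key is the most conservative reading of "similar name"; a fuzzier comparison would skip more candidates, which then need manual review.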
- Comment We have so many different types of "periodicals", "journals" and "magazines" that we first need to clean up the mess from the basic WP import before adding properties with different (and still changing) meanings. For example: names and categories do not match, and the translations of the labels are often improvised. The US, for example, has a different understanding of "journals" than other countries; just take the "peer review" thing. --Kolja21 (talk) 08:11, 4 February 2014 (UTC)
- @Kolja21:, this bot is to create the missing items automatically. If this bot isn't approved, I will continue creating them manually one by one. I would like to ensure that we have 95% of useful journals and magazines, so that Help:Sources can strongly recommend that people search for existing items of periodicals. We don't want people creating new items for journals haphazardly, as they won't bother adding an ISSN, etc., resulting in duplicates and a bigger mess. The objective is to ensure we have items with authority control identifiers like ISSNs so that they are uniquely identified. Those authority control properties, like ISSN, won't change meaning. I suspect you may be concerned about P357 (P357) possibly changing meaning; I am a bit worried about that one. When we have a complete list of journal items which exist on the 'pedias, discussions about Wikipedia categories and Wikidata instance/subclass structure will be more thorough. My bot is quite complicated because the 'Infobox' on different projects has different meanings across the 'pedias (e.g. German and Dutch use the same infobox for magazines and journals). I'm working on importing all of those infobox parameters, so that Wikidata has a standard set of properties for all periodical types, and the Wikipedias will hopefully adopt the same consistency. John Vandenberg (talk) 14:44, 4 February 2014 (UTC)
- @John Vandenberg: there are a lot of ambiguities between trade, professional, education, science, scientific, academic and peer-reviewed periodicals / magazines / journals. Looking at the translations, it's a big mess. (Like your example: German and Dutch use the same infobox for magazines and journals. For most non-English readers there is no difference between a magazine and a journal.) IMHO Wikidata:Periodicals task force first needs to develop a classification and then check the items before a bot can start using these properties. --Kolja21 (talk) 23:46, 5 February 2014 (UTC)
- @Kolja21:, I agree that the instanceof/subclassof hierarchy does need to be discussed and developed. However, I believe that needs to be a slow process if we are to engage many languages and disciplines to ensure we arrive at an effective and acceptable consensus. Once we have a taxonomy, we can determine how to update all of the items so that they have the correct classification. At that time, I will offer my services to help, using a bot to automate the updates that can be automated, and manually updating the items that can't be automatically determined. That can be done after we create the Wikidata items. I believe we should create these items now, using a bot, and give them an instance of, so we can easily find them later. Help:Sources currently tells Wikimedians to create these items, with instance of=>scientific journal. That results in duplicates, because Wikimedians are often lazy about items that are not in their area of specialty or interest; they let someone else fix their records. We need to take ownership of this, and give them the world's best journal database, so adding sourced claims is easy.
- Would you be supportive of the 'JVbot 2' task if items were created as instance of => periodical (Q1092563) by default? How can this import be better? John Vandenberg (talk) 08:08, 6 February 2014 (UTC)
- @John Vandenberg: I'm not against this bot and using periodical (Q1092563) sounds like a good idea, since it should cover all types of journals, magazines, series etc. --Kolja21 (talk) 15:33, 6 February 2014 (UTC)
- Would you mind fixing this list before running the next task? --Succu (talk) 22:37, 5 February 2014 (UTC)
- @Succu: I have fixed the majority of the invalid ISSNs[1] and added ISSN validation to the bot so it won't add more invalid ISSNs. The existing dups will take longer to complete, as we also need to merge the articles on the 'pedias if that is the cause, as it often is (e.g. w:Men and Masculinities (journal) and w:Men and Masculinities, and w:Administration Science Quarterly and w:Administrative Science Quarterly). The proposed bot won't create an item if the ISSN is already on Wikidata (no new duplicate will be created), so may I do the de-duping while running the bot? ;-) John Vandenberg (talk) 07:41, 6 February 2014 (UTC)
- @Succu: I've finished the unique value violations (including another ~five dup articles on en.wp - sigh), except for Mitteilungen StRuG (Q15148184)/Broadcasting and History (Q15199499) & Journal of Experimental Psychology (Q6295181)/Journal of Experimental Psychology: General (Q6295187), which are related to each other and used the same ISSN at different times - a common occurrence. John Vandenberg (talk) 05:05, 7 February 2014 (UTC)
- @Succu: The code to detect when an ISSN is already on an item has been added (see code above), so dups cannot be created. There are ~1000 items to create from the Wikipedias for academic journals alone - more for magazines and other periodicals with an ISSN. John Vandenberg (talk) 08:19, 8 February 2014 (UTC)
- Thanks for fixing the constraints and extending your code. I'd like to have all these journal items, but I am not sure we really should create all these unlinked/unused items. --Succu (talk) 09:21, 8 February 2014 (UTC)[reply]
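For reference, the ISSN validation mentioned in the discussion above could look roughly like the following. It is a minimal sketch of the standard ISSN check-digit rule (weights 8 down to 2 over the first seven digits, modulo 11, with X standing for 10), not the bot's actual implementation; the names `issn_check_digit` and `is_valid_issn` are made up for this example.

```python
import re

# Four digits, optional hyphen, three digits, then a digit or X as check character.
ISSN_PATTERN = re.compile(r"^(\d{4})-?(\d{3})([\dXx])$")


def issn_check_digit(first_seven):
    """Compute the check character for the first seven digits of an ISSN."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 from left to right.
    total = sum(int(digit) * weight
                for digit, weight in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(value):
    """Return True if the string has the ISSN shape and a matching check digit."""
    match = ISSN_PATTERN.match(value.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    return issn_check_digit(body) == match.group(3).upper()


# Sample strings chosen to exercise each branch.
for sample in ["0378-5955", "2434-561X", "0378-5954", "12345"]:
    print(sample, is_valid_issn(sample))
```

A check like this only catches typos and malformed strings. It cannot tell whether a well-formed ISSN belongs to the journal in question, so the unique value constraint is still needed.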