User talk:John Vandenberg
Welcome to Wikidata, John Vandenberg!
Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!
Need some help getting started? Here are some pages you can familiarize yourself with:
- Introduction – An introduction to the project.
- Wikidata tours – Interactive tutorials to show you how Wikidata works.
- Community portal – The portal for community members.
- User options – including the 'Babel' extension, to set your language preferences.
- Contents – The main help page for editing and using the site.
- Project chat – Discussions about the project.
- Tools – A collection of user-developed tools to allow for easier completion of some tasks.
Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.
If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.
- 1 Merge
- 2 Property:P1055
- 3 chromosome (P1057)
- 4 Files
- 5 Subpages
- 6 MedalBot
- 7 Wikimedia list article (Q13406463)
- 8 Literary critics
- 9 What is the suffix /R /TS /TN etc mean
- 10 Historisk Tidsskrift
- 11 statements at Wikimedia disambiguation pages
- 12 Wikidata:Requests for permissions/Bot/MedalBot
- 13 New tool to find duplicate items
- 14 Re: Odd description
- 15 Re: Cassida vibex vs Cassida viridis bot problem
- 16 Dharma Drum Buddhist College
- 17 About GZWDer (flood) bot
- 18 Spotted a mistake
- 19 Scholarly journal coverage in Wikidata
- 20 Unused properties
Hallo John Vandenberg,
When you merge items, please use the Merge.js gadget. It helps with merging and nominating, gives the option to always keep the lower number (which is older, so preferable), and makes it a lot easier for the admins to process the requests.
If you don't have an account, you may have to Create Account. With regards,--by Revi at 09:04, 30 November 2013 (UTC)
- Thank you for informing me. I was going to ask about that when I woke up, so you have saved me the trouble. John Vandenberg (talk) 00:12, 31 December 2013 (UTC)
Please discuss whether we can include some subtemplates at Wikidata:Requests for comment/Interwiki links for subpages.--GZWDer (talk) 05:30, 31 December 2013 (UTC)
- WMF is a legal Foundation. I don't think the legal Foundation has any special relationship with these list articles. Also, I have yet to find any evidence that this entity is suitable on items other than Wikipedia list pages. Have you seen list items for Wikibooks or Wikivoyage? If so, are the Wikibooks/Wikivoyage items really of the same class as the Wikipedia list pages? John Vandenberg (talk) 10:22, 4 March 2014 (UTC)
- Hi Kolja21, yes I saw this and a few other problems and stopped the bot until I can undo my bot's edits and prevent this happening again. The cause is that en:Category:Newspapers published in the United Kingdom contains en:Category:British journalists, which contains en:Category:British critics, which contains en:Category:British literary critics. That category tree is insane, but I shouldn't have run my bot on that category without manually approving each edit. I will automatically revert the bot on the problematic data items. John Vandenberg (talk) 21:40, 6 March 2014 (UTC)
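Editorial illustration of the category problem described in the reply above: a minimal sketch, not the bot's actual code. The `SUBCATEGORIES` mapping is a hand-written excerpt using only the category names mentioned in the reply, and `reachable_categories` and its `max_depth` parameter are hypothetical names chosen for this example. It shows how an unbounded walk from the newspapers category reaches the literary critics category, while a depth limit does not.

```python
from collections import deque

# Hypothetical, hand-written excerpt of the category chain named in the reply.
SUBCATEGORIES = {
    "Newspapers published in the United Kingdom": ["British journalists"],
    "British journalists": ["British critics"],
    "British critics": ["British literary critics"],
    "British literary critics": [],
}


def reachable_categories(graph, root, max_depth=None):
    """Return {category: depth} for every category reachable from root.

    With max_depth=None the walk is unbounded; otherwise subcategories of
    a category at depth max_depth are not expanded.
    """
    depths = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depths[current] >= max_depth:
            continue  # do not descend below the depth limit
        for child in graph.get(current, []):
            if child not in depths:  # guard against cycles and repeats
                depths[child] = depths[current] + 1
                queue.append(child)
    return depths


root = "Newspapers published in the United Kingdom"
unbounded = reachable_categories(SUBCATEGORIES, root)
limited = reachable_categories(SUBCATEGORIES, root, max_depth=1)
print("British literary critics" in unbounded)  # reached by the unbounded walk
print("British literary critics" in limited)    # excluded by the depth limit
```

A depth limit is only one possible safeguard; the reply itself names manual approval of each edit as the intended fix.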
What is the suffix /R /TS /TN etc mean
I think there might be an issue with Historisk Tidsskrift (Q15793533). It's a disambiguation page according to its linked English Wikipedia page en:Historisk Tidsskrift, yet it has ERA. -- Finn Årup Nielsen (fnielsen) (talk) 15:20, 14 April 2014 (UTC)
- Great, nice find. There will be a few of these, as I aggressively linked to a Wikipedia page when creating new items if at all possible (if it matched on ISSN, or on title if the title contained a variation of the word 'journal' in either English or local language), rather than allowing the bot to create duplicates. John Vandenberg (talk) 15:38, 14 April 2014 (UTC)
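Editorial illustration of the linking rule described in the reply above: a minimal sketch under stated assumptions, not the bot's actual code. The function name `should_link`, the record/page dictionaries, and the `JOURNAL_WORDS` keyword list are all hypothetical. The sketch links on an ISSN match, or on an identical title that contains a word for 'journal', and shows how the title rule alone can match a disambiguation page.

```python
# Hypothetical keyword list; the words the bot actually used are not given.
JOURNAL_WORDS = ("journal", "tidsskrift", "tijdschrift", "zeitschrift")


def should_link(record, page):
    """Decide whether a journal record may be linked to an existing page.

    Rule 1: link if both sides carry the same ISSN.
    Rule 2: otherwise link if the titles are identical (ignoring case)
            and the title contains a word for 'journal'.
    """
    issn = record.get("issn")
    if issn and issn == page.get("issn"):
        return True
    record_title = record["title"].casefold()
    page_title = page["title"].casefold()
    looks_like_journal = any(word in page_title for word in JOURNAL_WORDS)
    return record_title == page_title and looks_like_journal


# Hypothetical data: the page is a disambiguation page, but rule 2 cannot
# tell, because it only looks at the title.
record = {"title": "Historisk Tidsskrift", "issn": None}
page = {"title": "Historisk Tidsskrift", "issn": None, "disambiguation": True}
print(should_link(record, page))
```

Checking a flag such as the hypothetical `disambiguation` field before linking would be one way to avoid this kind of match.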
statements at Wikimedia disambiguation pages
- Hi, yes the bot has made a few mistakes like this. Thank you for finding and alerting me. The bot was in a once-only aggressive heuristic mode, trying to identify all items which are periodicals or creative works in periodicals, and letting constraint exception reporting highlight the problems. See that item, where nl:Combat (tijdschrift) (a journal) was inappropriately linked with a disambiguation page by another bot. ;-( I have done detailed verification of all items with an ERA Journal ID (P1058) and most items with ISSN (P236), but have not yet carefully reviewed all items with no label (P357). As we now have a good journal database, all ongoing periodical work presumes that any two conflicting pieces of data must be reviewed by the bot operator rather than proceeding with a constraint violation. John Vandenberg (talk) 08:21, 27 April 2014 (UTC)
- Thanks. I saw the email. I will begin work again when I have returned to Indonesia. John Vandenberg (talk) 02:40, 5 May 2014 (UTC)
New tool to find duplicate items
- @Magnus Manske: "Item Christmas Island (Q686310) has potential duplicates: no label (Q16351925)" It should have suggested Christmas Island (Q31063) is the item it is likely a duplicate of.
- These are dups created because of your tools. Based on User talk:GZWDer#Creating duplicates unnecessarily, User talk:Daniel Mietchen#So many Widar enabled errors? (which could be as high as 70% error rate), and others, I wouldn't be surprised if we are talking hundreds of thousands of duplicates which are all going to grow into bot-populated items, until someone puts in a man month or more to clean up the mess. John Vandenberg (talk) 13:50, 13 May 2014 (UTC)
- "These are dups created because of your tools." That is, factually, a lie. Undoubtedly, some of them were created using my tools. How many, I do not know, and neither do you. FWIW, I looked at some of the dupes I merged, and didn't find any created through my OAuth tools; however, I spotted several created by User:GZWDer (flood). Maybe he'll be more receptive to your blame attempts. --Magnus Manske (talk) 15:23, 13 May 2014 (UTC)
Re: Odd description
Re: Cassida vibex vs Cassida viridis bot problem
Dharma Drum Buddhist College
Excuse me for butting in on your work, but can you explain why you added P1188 Dharma Drum Buddhist College place ID to Q668 India? That seems like a strange property for a country to have? SpinningSpark 10:19, 10 June 2014 (UTC)
- I think I have correctly mapped it to the record in their database (http://authority.ddbc.edu.tw/place/search.php?code=PL000000048207). My apologies if it is wrong. John Vandenberg (talk) 03:49, 13 June 2014 (UTC)
About GZWDer (flood) bot
Spotted a mistake
Hi, I hope that this message will help you improve your bot. A newspaper Moskovskij Komsomolets (Q1062623) was mistakenly labelled as an instance of scientific journal (Q5633421) in February last year. There also seem to be some other things labelled as scientific journals, but which don't actually publish scientific research... They should instead be instances of academic journal (Q737498). --BurritoBazooka (talk) 20:34, 21 September 2015 (UTC)
- Not sure about that last bit, actually. I just happened to find one really easily and assumed there are more. It might have just been by chance. --BurritoBazooka (talk) 20:40, 21 September 2015 (UTC)
- okay, I looked through the descriptions and labels of around 390 items labelled "scientific journal" (list was returned by AutoList2) and out of those found 7 (including that newspaper) which do not publish research, or research exclusively, or are not related to science, or science exclusively. I only looked at the labels and tried to guess before looking in more detail, so there might be more of them. All of them were labelled as scientific journals by your bot.
- They were:
- Harvard Law Review (Q1365125) - law review
- Daedalus (Q1270415) - science and arts journal
- Nordlyd (Q12717259) - mainly related to the science of linguistics, but not science exclusively
- Israel Law Review (Q12403262) - law review, described as law journal on Wikipedia so I went with that
- Die Welt des Islams (Q1217470) - academic (anthropology?) journal, focuses on literature and history
- Moskovskij Komsomolets (Q1062623) - newspaper
- International Journal of Ethics (Q15755944) - philosophy journal
Scholarly journal coverage in Wikidata
John, I was very impressed to see the level of coverage of scholarly journals/periodicals we have in Wikidata and I was curious to hear more from you – as someone who's been driving this effort – about:
- the data sources you use;
- any known data quality issues;
- data modeling needs (like missing or problematic identifiers/properties);
and more generally how we can help support this effort as part of m:WikiCite and WD:WikiProject Source. I was at a major scholarly publisher conference in London this week and the figures on journal coverage got quite some attention. --DarTar (talk) 15:58, 5 November 2016 (UTC)
- See also Wikidata:WikiProject Source MetaData/ToDo --DarTar (talk) 00:12, 6 November 2016 (UTC)
- @DarTar:, it was mostly loaded via my bot User:JVbot by merging the existing Wikipedia articles with all of the journals in the ERA 2010 journal list (Q15735759) and ERA 2012 journal list (Q15794938); see bot requests there for more information, and also more info at Wikidata_talk:WikiProject_Periodicals, and there is a Wikipedia page w:Excellence in Research for Australia about the dataset. There are around 10 minor errors in each dataset, which I recorded on Property_talk:P1058 and of course fixed, and some other problems where ERA 'choices' were reasonable but hard to marry up with Wikipedia's choices for the same topics. I deferred to Wikipedia choices in those cases to minimise disruption.
- I could talk and code forever about this topic and dataset, and have been doing it for 10 years, but I have other duties at the moment so I don't have time to participate in your alternative WikiProject. There are lots of data quality issues, but the largest is lack of data governance on Wikidata coupled with quick and nasty bots and humans whose objective is having millions of edits rather than high quality edits. Until that is resolved, I generally do not find it useful to participate in Wikidata broadly, because the size of the total problem grows quicker than it can be fixed. I find it is only useful to load data as part of my own projects, do correlation and data cleansing, on Wikidata and in data extracts, and then leave the data to deteriorate over time rather than get into fights with people who are "building Wikidata" at all costs. These are the same problems that plagued Wikipedia for many years, and will probably sort themselves out in due course, but my attempts have failed and I have all but given up hope for the project (and I recommend my clients use their own Wikibase with the data extract loaded). Please let me know if you have specific questions I can help with. John Vandenberg (talk) 07:02, 6 November 2016 (UTC)
- Thanks for your input. I am wondering whether you can point us to specific deteriorations and problems with data governance, so we can learn from them? — Finn Årup Nielsen (fnielsen) (talk) 16:43, 6 November 2016 (UTC)
This is a kind reminder that the following properties were created more than six months ago: Norway Database for Statistics on Higher education publisher ID (P1271), Norway Import Service and Registration Authority publisher code (P1275), ISMN (P1208), Jufo ID (P1277), ecoregion (WWF) (P1425). As of today, these properties are used on fewer than five items. As the proposer of these properties you probably want to change the unfortunate situation by adding a few statements to items. --Pasleim (talk) 19:17, 17 January 2017 (UTC)