Welcome to Wikidata, Pintoch!
Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!
Need some help getting started? Here are some pages you can familiarize yourself with:
- Introduction – An introduction to the project.
- Wikidata tours – Interactive tutorials to show you how Wikidata works.
- Community portal – The portal for community members.
- User options – including the 'Babel' extension, to set your language preferences.
- Contents – The main help page for editing and using the site.
- Project chat – Discussions about the project.
- Tools – A collection of user-developed tools to allow for easier completion of some tasks.
Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.
If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.
- @Pigsonthewing: Absolutely! But I don't think my extraction from the ORCID dump was the right method. It returns very few IDs, and the method is not very reliable. I have been working on a different extraction procedure that uses the ORCID autocompletion method to retrieve clean metadata associated with the Ringgold IDs. Here is how:
- http://isni.ringgold.org/ provides a database of 400,000 institutions, with ISNI identifiers, but without Ringgold IDs. However, they come from Ringgold's own database, so these institutions also have Ringgold IDs (not sure why Ringgold does not provide them).
- we can match these records to the metadata returned by ORCID's autocompletion method, because both share exactly the same (name, city, region, country) tuple, as they come from the same database.
- by doing so, we obtain a much richer database, with both IDs and a few other interesting columns:
|Ringgold ID, ISNI|
- I am currently processing this dump (slowly, to keep the load on ORCID minimal).
- Concerning Mix'n'Match, I have mixed feelings. I don't think it is worth rushing to use it right now, because a lot of the dataset could be matched automatically, using more reliable methods than name matching. For instance, we could match on existing ISNI identifiers (after first making sure that we have pulled all the ISNIs we can from other sources such as GRID), and also fuzzy-match on the other fields of the dataset (including the URL), which Mix'n'Match does not currently support. Let me know what you think! − Pintoch (talk) 21:22, 27 January 2017 (UTC)
- Matching in IDs is good. My concern with fuzzy matching, especially on "home page" URLs, is that we might wrongly match a faculty to the main university, or a department to a faculty. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:52, 27 January 2017 (UTC)
- Yeah, of course we don't want to match only on URLs, but I think that a combined match on (name,country,URL,type) could be more robust than name matching. There could be a lot of cases where this matching is good enough to be automated, and the harder cases would be done manually. This would be a perfect use case for the reconciliation feature of OpenRefine, so I have just contributed to the bounty there: https://www.bountysource.com/issues/985941. − Pintoch (talk) 23:22, 27 January 2017 (UTC)
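The matching ideas discussed in this thread can be sketched in Python. This is a minimal illustration under stated assumptions, not Pintoch's actual pipeline: the record layout (plain dicts with `name`, `city`, `region`, `country`, `url`, `type`, `isni` and `ringgold` keys), the helper names, the score weights and both thresholds are all hypothetical. It first attaches Ringgold IDs to ISNI records through an exact join on the (name, city, region, country) tuple, then matches a record against candidate items in tiers: exact ISNI first, then a combined (name, country, URL, type) score, where only high-scoring pairs are accepted automatically and weaker ones are sent to manual review.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def record_key(rec):
    """Exact join key assumed to be shared by both sources: (name, city, region, country)."""
    return (rec["name"], rec["city"], rec["region"], rec["country"])


def attach_ringgold_ids(isni_records, orcid_records):
    """Copy the Ringgold ID onto each ISNI record whose tuple matches exactly one ORCID record."""
    by_key = {}
    for rec in orcid_records:
        by_key.setdefault(record_key(rec), []).append(rec)
    joined = []
    for rec in isni_records:
        candidates = by_key.get(record_key(rec), [])
        if len(candidates) == 1:  # skip missing or ambiguous tuples
            joined.append({**rec, "ringgold": candidates[0]["ringgold"]})
    return joined


def domain(url):
    """Lower-cased host name without a leading 'www.', or '' if there is no URL."""
    host = urlparse(url).netloc.lower() if url else ""
    return host[4:] if host.startswith("www.") else host


def combined_score(a, b):
    """Score in [0, 1] from name similarity, URL domain and type; 0 if countries differ."""
    if a["country"] != b["country"]:
        return 0.0
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_domain = domain(a.get("url")) != "" and domain(a.get("url")) == domain(b.get("url"))
    same_type = a.get("type") is not None and a.get("type") == b.get("type")
    # Hypothetical weights: the name dominates, so a shared domain alone cannot force a match.
    return 0.6 * name_sim + 0.25 * same_domain + 0.15 * same_type


def normalize_isni(value):
    """Strip spaces so '0000 0000 0000 0001' and '0000000000000001' compare equal."""
    return value.replace(" ", "") if value else None


def match(record, items, auto_threshold=0.9, review_threshold=0.5):
    """Return (item, how) with how in {'isni', 'auto', 'review'}, or (None, None)."""
    isni = normalize_isni(record.get("isni"))
    if isni:
        for item in items:
            if normalize_isni(item.get("isni")) == isni:
                return item, "isni"
    best = max(items, key=lambda item: combined_score(record, item), default=None)
    if best is None:
        return None, None
    score = combined_score(record, best)
    if score >= auto_threshold:
        return best, "auto"
    if score >= review_threshold:
        return best, "review"
    return None, None
```

With these example weights, a faculty that shares its university's domain and type still scores below the automatic threshold because its name differs, so it lands in the review pile rather than being merged into the parent university. Real weights and thresholds would need tuning against the actual data.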
Country property quickstatements
I note that you added historic countries to a modern school (founded 1989). https://www.wikidata.org/w/index.php?title=Q7408197&curid=7318187&action=history This doesn't seem right. Runner1928 (talk) 21:06, 22 March 2017 (UTC)
- @Runner1928: Whoops, thanks for spotting that! I'll fix it asap. − Pintoch (talk) 21:08, 22 March 2017 (UTC)
Not sure how useful it is to add these, but if you do like  could you please also include retrieved (P813)? You might want to include the start time (P580) too. Do you have an idea about how this data can be used? Multichill (talk) 08:32, 13 May 2017 (UTC)
- @Multichill: I agree, I should have added retrieved (P813), sorry about that! And yes, I have been thinking about start time (P580), but it is not clear to me whether this date is available from RIPE: they have a creation time, which is very often set to epoch (Q2703), and a time of last update. I suppose that if the creation date is not the epoch then it is reasonably safe to assume that it is the right value for start time (P580), but I'm not entirely sure. About using this data, yes, I have plenty of use cases in mind, and I plan to release a tool that uses this data in the coming weeks. − Pintoch (talk) 19:17, 13 May 2017 (UTC)
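The start time heuristic described above, combined with the requested retrieved (P813) reference, can be sketched as a small QuickStatements generator. This is a hypothetical helper, not an existing tool. It assumes RIPE creation times are formatted like `2002-02-27T14:26:36Z` and treats exactly `1970-01-01T00:00:00Z` as the epoch placeholder. It emits one tab-separated QuickStatements line with the start time as a `P580` qualifier and the retrieval date as an `S813` source. The item ID, property ID and value in the usage line are placeholders.

```python
from datetime import date, datetime

# RIPE objects with an unknown creation time are assumed to carry the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def start_time_from_created(created):
    """Return the creation date, or None when it is the epoch placeholder."""
    ts = datetime.strptime(created, "%Y-%m-%dT%H:%M:%SZ")
    if ts == EPOCH:
        return None  # epoch means "unknown", so no start time is asserted
    return ts.date()


def wikidata_day(d):
    """Day-precision (/11) time value in QuickStatements syntax."""
    return f"+{d.isoformat()}T00:00:00Z/11"


def quickstatements_line(item, prop, value, created, retrieved):
    """One statement with an optional P580 qualifier and a retrieved (P813) source."""
    fields = [item, prop, value]
    start = start_time_from_created(created)
    if start is not None:
        fields += ["P580", wikidata_day(start)]
    fields += ["S813", wikidata_day(retrieved)]
    return "\t".join(fields)


if __name__ == "__main__":
    # Placeholder item, property and value: replace them with real IDs before use.
    print(quickstatements_line("Q_ITEM", "P_PROP", '"VALUE"', "2002-02-27T14:26:36Z", date(2017, 5, 13)))
```

Since it is not certain that a non-epoch creation time always reflects the real start of the resource, statements generated this way may still deserve a spot check.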
P17 or not P17…
I just saw this claim:
I'm not sure that country (P17) is appropriate here. Could you take a look?
- @VIGNERON: First, sorry for this batch of country (P17) imports; I had many issues with it. But for that particular example it does not seem too bad to me! The English Wikipedia does put the page in en:Category:International organisations based in Denmark… Would you prefer
? Why is it wrong to add country (P17) in these circumstances? Thanks! − Pintoch (talk) 13:28, 24 May 2017 (UTC)
- No problem, I understand; I have had my fair share of problematic imports myself.
- Yes, headquarters location (P159) would be better, but after a deeper look I think that this item should be split in two: the online international database and the organisation behind it. That way, the article on the French Wikipedia could be linked to the first and the English Wikipedia article to the second. What do you think?
- Cdlt, VIGNERON (talk) 13:49, 24 May 2017 (UTC)
- You can use ORCID iD (Q51044) and ORCID, Inc. (Q19861084) as a model ;-) Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:48, 24 May 2017 (UTC)