User talk:Jheald


Welcome to Wikidata, Jheald!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

  • Introduction – An introduction to the project.
  • Wikidata tours – Interactive tutorials to show you how Wikidata works.
  • Community portal – The portal for community members.
  • User options – including the 'Babel' extension, to set your language preferences.
  • Contents – The main help page for editing and using the site.
  • Project chat – Discussions about the project.
  • Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, please ask me on my talk page. If you want to try out editing, you can use the sandbox. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards! Liuxinyu970226 (talk) 23:58, 10 August 2014 (UTC)

WDQ

Yes, you can filter those results by label matching some characters, but for that you need to use Autolists2. The link is on the same page of autolists, on the top part, it says: "FOR EDITING WIKIDATA, please use this tool's successor, AutoList 2!"--Micru (talk) 06:42, 15 August 2014 (UTC)

Wikidata:WikiProject sum of all paintings

I see you're interested in GLAM and structured data. You might want to join this project. The project has its own goals and it will already give us a lot of experience on how to model data about works of art. Multichill (talk) 09:48, 17 August 2014 (UTC)

Wikidata:WikiProject Structured Data for Commons/Phase 1 progress/Links updates

I think it would be a good idea to set up periodic link updates (for example, once a day), or to provide an update link similar to c:User:OgreBot/Uploads_by_new_users/2014_September_06_06:00. --EugeneZelenko (talk) 14:56, 8 September 2014 (UTC)

@EugeneZelenko: Hi Eugene, thanks for the interest. I have to admit, I'm still quite a newbie when it comes to automation -- at the moment I'm trying to get my very first bot to run (on Commons), and struggling to understand why the perl module I was going to use is refusing to install -- so it looks like the moment has finally come that I'm going to have to get to know Python...
The pages were the best I can do at the moment with my current level of experience and understanding, because I haven't yet really worked out how to get myself set up on Labs to run my own queries; nor the bot framework to put it all together; nor how to trap a button being pressed and get it to run a bot. These are all things that I probably ought to try to find out quite soon, but for the moment it's beyond where I am at.
On the other hand, what should be reasonably easy with the links provided should be to get a current count to see if that's different to what's on the page; and to download a .tsv file that can then be cut-and-pasted in the edit window. Yes it's a pain, and it would be nice to have a shiny blue button that updated everything automatically, or a cron job automatically updating the pages every n days; if somebody wants to make that, I'd be very very happy to see it.
But for the moment, it did the basics of showing what's there. And if some nice person did go ahead and spend an afternoon clearing out eg all the direct file sitelinks that shouldn't be there, then I hope I have made it easy enough for them to regenerate the page. Jheald (talk) 15:33, 8 September 2014 (UTC)
Updated cygwin, so MediaWiki::Bot now installs: so I'm getting there... towards my first bot edit, anyway. Still a long way to writing automated pages though. :-) Jheald (talk) 15:45, 8 September 2014 (UTC)
I fixed some of the problems as time allowed, but I didn't have time for page updates. --EugeneZelenko (talk) 13:57, 9 September 2014 (UTC)
@EugeneZelenko: Wow! You have been busy! Now only 4 links to Creator pages, none at all to Institution pages, and a dozen fewer than there were to File pages. Impressive!
You're right, the updating is a pain (having now done it). It would be good to have auto-updated summary statistics on the summary page. This surely can be built, but I'm not sure I can do it very soon. (Still failing to get my first ever bot edit to actually happen over on Commons -- for some reason MediaWiki::Bot can't re-write the page, so it looks like the day has finally arrived for me to learn Python!)
However one important thing that may help, is to tell people that they need to be logged in to the Quarry tool in order for the "Submit Query" button to appear. Then, particularly if you're just working on one namespace, updating oughtn't be too painful. Jheald (talk) 16:47, 9 September 2014 (UTC)

Creator template

Hi Jheald, I'm traveling these days and I don't have time to look into it, but I recommend taking a look at the Authority template in Wikipedia; maybe you'll get some ideas from there. Good luck!--Micru (talk) 12:30, 22 September 2014 (UTC)

Louis Carrogis Carmontelle (Q982053)

Hi James, I don't get these edits. Care to explain? Multichill (talk) 18:02, 21 November 2014 (UTC)

@Multichill: I was using Magnus's QuickStatements tool to add Commons Creator page (P1472) properties. But it seems I have got some of the Q-numbers wrong. It's conceivable that I wasn't careful enough to check whether a regex capture had been successful, and used an old value. I'll look at the scripts and try to identify what went wrong, and if there were any other Q-numbers that got spurious additional links. Thanks for spotting this & pulling me up about it. Jheald (talk) 18:33, 21 November 2014 (UTC)
@Multichill: Update: It looks like the script was working properly, but the Q-values on the creator templates were wrong. (Presumably due to a never-corrected cut and paste when they were created). There are also a few genuine duplicates, which I will turn into redirects. In all about 100 such Creator templates to go through, so I'll get on with that. All best, Jheald (talk) 19:01, 21 November 2014 (UTC)
Ok, thank you! You did see the constraint report? It's very useful for finding mistakes and duplicates. Can you revert Louis Carrogis Carmontelle (Q982053) when you're done? Multichill (talk) 19:14, 21 November 2014 (UTC)
@Multichill: Thanks, I'd forgotten that was there. So I'll be going through the list, and sorting out the dupes and incorrect Q-numbers. Jheald (talk) 19:19, 21 November 2014 (UTC)
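
For reference, the failure mode first suspected in this thread (a regex capture that silently fails, so a stale value from an earlier page gets reused) can be guarded against as in the minimal Python sketch below. The field pattern and the helper names `extract_qid` and `collect_qids` are hypothetical illustrations, not the actual scripts discussed here, and the real Creator template markup may differ.

```python
import re

# Hypothetical pattern for a "Wikidata = Q..." template field; the real
# Creator template markup may differ.
WIKIDATA_FIELD = re.compile(r"\|\s*Wikidata\s*=\s*(Q[1-9]\d*)", re.IGNORECASE)


def extract_qid(template_text):
    """Return the Q-number found in the template text, or None if absent."""
    match = WIKIDATA_FIELD.search(template_text)
    # Returning None on failure means a value captured from an earlier
    # page can never be reused by accident.
    return match.group(1) if match else None


def collect_qids(pages):
    """Map page title -> Q-number, listing pages with no capture separately."""
    found, skipped = {}, []
    for title, text in pages.items():
        qid = extract_qid(text)
        if qid is None:
            skipped.append(title)
        else:
            found[title] = qid
    return found, skipped
```

Pages that end up in `skipped` can then be reviewed by hand rather than being sent on to QuickStatements.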

BBC Your Paintings artist identifier

You added Art UK artist ID (P1367) to John Jones (Q454248), Francina Margaretha van Huysum (Q15511647) and James Charles (Q6131230). What was the data source? These statements were all wrong but it does not seem like an error by you but by the data source as also other users added exactly the same wrong statements. --Pasleim (talk) 10:02, 2 December 2014 (UTC)

@Pasleim, Jane023: Good catch. These look like automated edits made by synchronising from Magnus's Mix-n-match tool. Somebody has wrongly identified "Your paintings" links with these items in the tool, and it is then trying to re-add the information every time somebody uses the synchronise option. (Choose catalog, then the 'Y' link at the end of the "Your paintings" line).
This will presumably continue until the incorrect identifications are removed from the tool. The best way to do that is probably to set up correct items for these links; then update the tool by importing from Wikidata; then check for double use of IDs. I'll get on to this. I think I can see the "John Jones" one; the others may take a little more investigation. Jheald (talk) 11:46, 2 December 2014 (UTC)
@Pasleim, Jane023: Update. I think I've removed all but the John Jones identification from the mix'n'match tool. The key was to use the 'Search' link, which then had a "Remove match" function. Unfortunately, there seem to be more "John Joneses" than the search can display, so it doesn't give me the option. So for the moment, anyone who uses the update function must remember to remove the "John Jones" link manually from the Quick Statements run. Jheald (talk) 12:38, 2 December 2014 (UTC)
Yes! Thanks Pasleim and Jheald! I have been "unmatching" these whenever they appear, because otherwise they just get added again. So far it seems the error ratio is quite low, but it does worry me. One thing I noticed is that when I unmatch these mistakes and then make another sync run the same day, the mistakes get made again, so you need to check the data if you make a sync run the same day. Otherwise it's best to wait a day between sync runs. Jane023 (talk) 16:18, 2 December 2014 (UTC)

RKD

Hi James, this doesn't work. It's just a report so it's easy to find items to work on. I added these manually now. Multichill (talk) 20:12, 7 December 2014 (UTC)

@Multichill: Okay, that makes more sense now. I thought it was a bit brutal to edit by hand! My immediate current focus is more on the "Your Paintings" list, as organised by time over at en:WP; on trying to wrap up the BL map tagging project; and try to get some experimenting done with en:Content based image retrieval, ideally to have something with the BL collection to have to show for a seminar on the 17th -- so I'm a bit committed at the moment. But I'll try to fit in some of the RKD artist lookups if I can find a moment. All best, Jheald (talk) 21:59, 7 December 2014 (UTC)

GEMET Thesaurus?

See https://www.eionet.europa.eu/gemet/theme_concepts?th=13&langcode=en . What do you think? Should we add it? Multichill (talk) 11:35, 12 April 2015 (UTC)


twins

FYI: https://de.wikipedia.org/wiki/Diskussion:Johann_Zacharias_Richter --- Jura 12:01, 23 September 2015 (UTC)

@Jura1: Very interesting. Thank you! Jheald (talk) 13:34, 24 September 2015 (UTC)

tinyurl and WDQS

You don't have to rely on tinyurl: copy-pasting the URL from WDQS includes the query. This allows building clickable links, at the cost of a slightly less readable diff. I must admit I prefer clickable links. author  TomT0m / talk page 09:05, 8 October 2015 (UTC)
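
As an illustration of the point above, a link that carries the query itself can be built by percent-encoding the SPARQL text into the URL fragment. This is a minimal Python sketch; the base address `https://query.wikidata.org/` and the helper name `wdqs_link` are assumptions for illustration (the service address in use at the time of this comment may have been different).

```python
from urllib.parse import quote

# Assumed address of the query service front end.
WDQS_BASE = "https://query.wikidata.org/"


def wdqs_link(sparql):
    """Return a clickable link with the SPARQL text encoded in the fragment."""
    # safe="" percent-encodes every reserved character, including "/" and "?".
    return WDQS_BASE + "#" + quote(sparql, safe="")


example = wdqs_link("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 10")
```

The resulting link is long, which is the "less readable diff" trade-off mentioned above.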

Reason for deprecation

I've created reason for deprecation (P2241) based on your request. Mbch331 (talk) 11:46, 16 October 2015 (UTC)

Improved grouping

Hi Jheald,

This might help you for improved/simplified grouping. --- Jura 13:20, 18 October 2015 (UTC)

Thanks for commenting. Unfortunately, I don't think your alternate can work out. There are too many variations involved and what works with the English label "John and variants" doesn't necessarily lead to the same with the label for the same item in another language (e.g. ru:"Джон and variants"). --- Jura 14:44, 18 October 2015 (UTC)
@Jura1: So create groups that combine everything that is considered as a variant in any language -- as per the searches at Wikidata:WikiProject Names/given-name variants.
Then, if it makes sense to define particular sub-groups within that overall group, that is straightforward too. Jheald (talk) 15:35, 18 October 2015 (UTC)
overlapping subgroups? --- Jura 15:40, 18 October 2015 (UTC)
@Jura1: Not a problem. An item can be a member of more than one subgroup. It is then possible to query for either subgroup and extract a list of corresponding "instances of". Jheald (talk) 15:45, 18 October 2015 (UTC)
Personally, my primary focus is not querying them. I thought the property might help you with your queries, but it seems it doesn't. I did find a way to solve the identical birth/death day question though.
I'm sure in theory your suggestion might work. It might even work in practice with a single user creating the groups. We frequently get such suggestions or comments in property proposal discussions, but one needs to bear in mind that this is Wikidata: many contributors from different backgrounds, editing in different languages. For things to work, you need to have clearly defined properties that can be referenced and checked. With names this is particularly tricky... --- Jura 15:55, 18 October 2015 (UTC)

Just to know...

... how do you find this category so that you can make such an edit? I ask because if you find it in dewiki then it should be mentioned as a reference. --Aschroet (talk) 06:00, 7 November 2015 (UTC)

@Aschroet: I ran a search for every article-like item that had a sitelink to a Commons category, but didn't have a Commons category (P373). I did think about putting in a reference, but it seemed odd to give Wikidata itself as a reference, and I couldn't find a Q-number for 'sitelink'; and in any case the new 373 claim is only as strong as the existing cross-namespace sitelink, which is unreferenced. So it seemed reasonably appropriate to put it down as a similarly unreferenced bare claim. Jheald (talk) 08:06, 7 November 2015 (UTC)
Would it be possible to do the opposite as well, please - wherever P373 exists but there isn't a sitelink to a Commons category, add the sitelink? That would be incredibly useful for interwiki links on Commons. Thanks. Mike Peel (talk) 13:14, 7 November 2015 (UTC)
Hi @Mike Peel:.
On a purely technical level, it would be entirely possible and, in fact, dead straightforward. The only difference would be one of scale. There were about 80,000 items that had cross-namespace sitelinks but no Commons category (P373), whereas there are about 800,000 items that have a P373 but no sitelink. Magnus's Quick Statements tool is throttled to about 4500 edits an hour, ie about 100,000 edits in 24 hours going full tilt. (I'm currently making the edits in batches of 4,000 or 20,000 at a time). So whereas this job is going to take about a day to complete, the opposite would take about 10 times as long.
But that's not the real issue. The real issue here is political, not technical. Adding P373 statements is (or should generally be) completely uncontroversial -- it is exactly what the property was made for. On the other hand there is a definite controversy about sitelinks that go "cross-namespace", ie from an article-like item here to a category on Commons.
It is a controversy that may be edging towards resolution purely through the development of facts on the ground. A year ago there were 100,000 such cross-namespace links. I ran the same search a couple of months ago and found there were now 200,000. So it does look like we're moving towards a de facto acceptance on the ground. I posted these numbers at the time, both to the mailing list and to Project Chat, to ask whether people were okay with this, because if one wanted to take a definitive view on it, the time to do so would be now. But it seemed the response was just a resounding "Meh".
There definitely was a constituency here for a distinct Category <--> Category, Article <--> Gallery sitelink division. For one thing it means you know what kind of Commons page you're going to end up on, so there's predictability, whether for people or for bots or for tools; and for another thing, it means that if you allow no links from article items to categories, you can never get trapped wanting to add a link to a category but being caught out because there is already a link to a gallery blocking its path.
As I have said, I am not sure to what extent there is or is not still a constituency prepared to take action to enforce such demarcations. But at the same time given the current greyness of the issue, I am not sure that I would want to be one to steam into such muddy ground tooling up to make 800,000 edits. Jheald (talk) 15:17, 7 November 2015 (UTC)
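
The scale estimate in the comment above can be checked with a few lines of arithmetic. This Python sketch uses only the figures quoted there (a throttle of about 4500 edits an hour, about 80,000 and about 800,000 items); the function name `hours_needed` is just an illustrative helper, and the throttle rate is treated as constant.

```python
EDITS_PER_HOUR = 4500  # throttle figure quoted in the comment above


def hours_needed(n_edits, rate=EDITS_PER_HOUR):
    """Hours to make n_edits at a constant rate (edits per hour)."""
    return n_edits / rate


edits_per_day = EDITS_PER_HOUR * 24            # 108,000 edits in 24 hours
small_job_hours = hours_needed(80_000)         # a little under 18 hours
large_job_days = hours_needed(800_000) / 24    # a little over 7 days
```

At full tilt the smaller job fits inside a day and the larger one takes roughly a week, which agrees with the "about a day" versus "about 10 times as long" estimate once batching pauses are allowed for.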

Questionable use of withdrawn identifier value (Q21441764)

Hi, I don't think your practice WRT BBC Your Paintings "Identifiers" is of much help:

  1. Once the redirects are retraced in Mix'n'Match (which I just did) these false duplicates clobber both the duplicate list on Mix'n'Match and on the Value Constraint Report for P1367. Removing non-actionable identifiers from Wikidata and making certain that they won't reappear (by setting them to N/A in Mix'n'Match) seems to me a much more appropriate way of handling this.
  2. Though impressive and somehow under curation (if not we wouldn't have the problem of "vanished" URLs at all) it's just a website which allows incoming links: Obviously they are performing clean-ups but don't even care to implement redirects. I don't think Wikidata's task should be to document changes on that website
  3. Those unusable identifiers stem from Mix'n'Match. Unfortunately the underlying dataset is not documented, Magnus may have harvested the Website at some (single?) point in time and/or may have had access to data files provided to him: So using P2241 actually means documenting the difference between this unclear dataset from reality? Not worth pursuing I think.
  4. Magnus may re-import the Website and equip Mix'n'Match with the set of then current identifiers, i.e. those not valid any more will cease to exist in the Mix'n'Match database but will survive perpetually on Wikidata? So (see point 2 above) Wikidata would provide some persistence for BBCYP "identifiers" the original provider obviously doesn't care about (at the moment). I'm not sure about the fundamental implications of persistence as added value by third parties but in the YP case it would be a crude approximation anyway: We have some peripheral (i.e. current M'n'M database) evidence that a certain identifier has existed (been actionable somewhere in the past) and some (soon to be removed) statement in M'n'M that this identifier was related to a certain Q-item. Transferring that to Wikidata as a P2241-qualified statement leaves us with something completely unverifiable...

Actually, there are cases where this new property together with withdrawn identifier value (Q21441764) (or some variants of it) make sense: I remember the concept of a "cancelled ISSN": There is a regulation that periodicals which use ISSNs against policy (e.g. not getting floated after a no label (Q1514286) or assigning an ISSN for something other than a serial) won't get recycled but remain in the database, tagged as "cancelled". A related case is "wrong ISSNs": If the ISSN printed on the journal does not exist (has a checksum error, i.e. isn't any ISSN at all), is not assigned to any periodical (is formally valid but not existing at that time), or even officially assigned for something other (so the ISSN exists) then it's worth recording because queries may be performed based on face value.

Thus a (non exhaustive) list of withdrawal reasons might be:

  1. identifier formally invalid (but was used anyway)
  2. identifier did never exist (but was used anyway as such)
  3. identifier is not actionable any more (announces error)
  4. identifier is actionable but announces deprecation (acknowledges that it has existed)
  5. identifier exists but the corresponding object is deprecated (think of ODNB biographical articles where later research came to the conclusion that the person described is identical to another person or actually two different persons)
  6. ...
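
To make the "formally invalid" case concrete for ISSNs: the eighth character of an ISSN is a check digit computed from the first seven digits with weights 8 down to 2, modulo 11, with "X" standing for 10. The Python sketch below tests only this formal validity; it cannot tell whether a well-formed ISSN was ever assigned, which is the separate "did never exist" case. The function names are illustrative.

```python
import re

# Seven ASCII digits followed by a digit or "X".
ISSN_SHAPE = re.compile(r"[0-9]{7}[0-9X]")


def issn_check_digit(first_seven):
    """Compute the ISSN check character for a string of seven digits."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """True if the string is formally a valid ISSN (shape and checksum)."""
    compact = issn.replace("-", "").upper()
    if not ISSN_SHAPE.fullmatch(compact):
        return False
    return issn_check_digit(compact[:7]) == compact[7]
```

For example, `is_valid_issn("0378-5955")` holds, while changing the last digit to 4 produces a checksum error.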

A similar thing on a much higher scale can be currently noticed for RKDartists ID (P650):

  1. The initial import accidentally contained thousands of "See" references (they have an identifier of their own, but no link to the object they are referring to)
  2. The initial dataset contained tens of thousands of entries "in bewerking" (under construction). Thousands of them have enough accompanying data to be spotted as quite obvious duplicates of other entries (and thousands of them do not have enough data at the moment to make matches possible - probably they should have been left out from Mix'n'Match at first hand, but identifier-wise these items definitely exist).
  3. There seems to be a massive weeding effort ongoing: Especially for artists from Belgium, writers from Germany and those from the lower parts of the alphabet the links aren't operational any more
  4. I noticed that because an IP from the Dutch National Library started removing bunches of RKD identifiers from items in the Constraint report: So actually there is some feedback loop, Wikidata reports are used by (institutions related to) the original providers to perform or at least prioritise data sanitation.
  5. Again: RKD itself does not care about the fate of identifiers for items which weren't proper items at all or were designating items they have removed for whatever reason (their initial data collection appears to have been extremely broad in scope and even clearly identifiable persons might be well beyond the topical restrictions of RKD).

So many RKD identifiers we currently know about may just be "leaked": They will be withdrawn as provisional or as not relevant (and in many cases as duplicates) and the question would be if we really should document the RKD identifier for persons RKD does not want to deal with at all? -- Gymel (talk) 15:28, 14 November 2015 (UTC)

@Gymel: We may not have been the only people to have harvested the BBC Your Paintings identifiers (or any other set of identifiers). It seems to me that it is useful to record retired identifiers (a discussion that's been had both on the mailing list, and at the Sum of All Paintings project recently), not least because people may match their copy of the old identifiers to our copy of the old identifiers.
As for these messing up the Constraint reports, or MnM single values, then that is simply a bug in the Constraint reports and in MnM that needs to be fixed -- deprecated values should not be considered for the single instance.
Another usefulness is that we now have a SPARQL-searchable list of the retired identifiers -- so for example, we can now generate a report of all retired identifiers for which there are not new identifiers, and ask the PCF "what happened to these?" -- in a couple of cases (of names that look genuine, and don't seem to have a new id) may be a system refresh glitch at their end.
I think I have now marked all the retired identifiers that we have items for (and merged any where we also have items for current identifiers). I think that they are worth keeping.
As for values for reason for deprecation (P2241), you are very welcome to create further value items to document such cases in more detail as you wish. Jheald (talk) 15:44, 14 November 2015 (UTC)
@Gymel:. To add to the above, I think the "single value" constraint report does ignore deprecated values. The 93 multiple values currently reported is similar to the number from several weeks ago (it was actually 97 then) -- as far as I can see, it represents genuine unmerged duplicates on the PCF site, and doesn't seem to have gone up. Jheald (talk) 15:48, 14 November 2015 (UTC)
Here's the start of the thread on wikidata-l : [1]
The discussion also continued into the next month : [2] Jheald (talk) 15:57, 14 November 2015 (UTC)
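
The kind of report mentioned above (retired identifiers on items that have no current identifier) could be expressed roughly as follows. This is a sketch under assumptions: it presumes retired values are stored as deprecated-rank statements qualified with reason for deprecation (P2241) = withdrawn identifier value (Q21441764), as described in this thread, and the Python helper `retired_without_current` is only a hypothetical convenience for building the query text.

```python
def retired_without_current(prop="P1367", reason="Q21441764"):
    """Build SPARQL listing deprecated values of `prop` on items that
    have no non-deprecated value for the same property."""
    return f"""
SELECT ?item ?oldId WHERE {{
  ?item p:{prop} ?statement .
  ?statement ps:{prop} ?oldId ;
             wikibase:rank wikibase:DeprecatedRank ;
             pq:P2241 wd:{reason} .
  # wdt: only exposes best-rank, non-deprecated values, so this keeps
  # items where no current identifier is recorded.
  FILTER NOT EXISTS {{ ?item wdt:{prop} ?currentId }}
}}
""".strip()


query = retired_without_current()
```

The query text can be pasted into the query service as-is; other identifier properties can be substituted through the `prop` argument.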
Interesting, I will pursue that: For VIAF ID (P214) we are sometimes marking identifiers as deprecated if the VIAF cluster exists but is not usable since it conflates different persons and an alternative cluster for that person also exists. VIAF may act on that findings and it would be good to know if the constraint report is not complete once one wants to reaccess these cases. -- Gymel (talk) 15:58, 14 November 2015 (UTC)
OK, I'm not impressed by the discussion on the mailing list. As I said before I can see use cases for keeping deprecated identifiers, but one has to differentiate:
  • VIAF was given as example several times: Every month they automatically cluster and recluster their constituent entries, currently they record >7M redirects (targeting about 28M entries) and provide resolution services. They also provide a change history for any single cluster. Given that Wikidata also has a version history for items actively recording obsolete identifiers here seems overkill.
  • Use cases of outdated information are construed and Wikidata should somehow step in so that the providers of the original data can be asked "what happened" (but not be bothered at the same time): Well, those utilizing the outdated identifiers could ask directly (increasing the pressure on the providers to operate more carefully). Wikidata could only serve as a place for acknowledging that an identifier indeed did exist and does not exist any more. However for these Wikidata cannot be as exhaustive as for valid ones.
  • Admittedly many data providers should invest more into persistence of identifiers, e.g. by at least "supporting" them by redirects. But those who do that usually have an interest of re-users eventually migrating to up-to date values. Wikidata IMHO should not thwart that by establishing a one-stop solution for the absolutely lazy.
  • My RKD example above shows that "support" will have limits: Some things will simply go because they shouldn't ever have been assigned an identifier (from the provider's point of view). That's the downside of presenting provisional entries to the public which IMHO generally is a good thing
  • Some sites like BBC YP are way too sloppy with their handling of what we perceive as identifiers. But are we really in a position to remedy that? You stated that you have recorded the 50 or so obsolete identifiers you deemed important here. But the actual number might be much higher and - as said above - what we can record is only the arbitrary difference between the unknown point in time some data was harvested and today.
  • Last, not least: Unactionable identifiers of that kind cannot be verified (but perhaps in the Your Paintings case by a link to the internet archive). Common opinion here is that identifiers don't have to be sourced, because they can be immediately verified again at any given time. That's obviously not the case here! -- Gymel (talk) 16:42, 14 November 2015 (UTC)
@Gymel: I have recorded all the obsolete identifiers I knew about, that I have so far been able to identify items for, based on the pages in this series, the identifier columns in which are based on Magnus's (or Jane's) original scrape in 2012.
You are correct, that these may no longer be verifiable and can no longer be confirmed. Mistakes may have crept in. But so what? They are dead links and marked as such. If a copying error has crept in, the worst case scenario is that then somebody may not be able to match their old reference link to our old reference link. That doesn't take away from the positive side, that in as many cases as possible, it will be possible for somebody to match their old dead reference to our old dead reference, and mostly we should also be able to give them a live new reference. Jheald (talk) 16:56, 14 November 2015 (UTC)

in support of User:Snipre and issue of (uncontrolled) bot imports from wikipedias

Would you be happy if someone not involved changed your topic? --Succu (talk) 20:10, 19 November 2015 (UTC)

@Succu: It's a project page. It involves everybody; and should have an appropriate neutral header. Jheald (talk) 20:17, 19 November 2015 (UTC)
Really? Any hint where I can find this rule? --Succu (talk) 20:20, 19 November 2015 (UTC)
@Succu: It's common sense, and happens all the time. Nobody 'owns' the header of a section of a public page. It should be whatever best, most neutrally and most succinctly tells the reader what follows, and encourages participation from all points of view. I would revert again, if I hadn't hit the 3 edit limit, because the present header is simply not appropriate, and would also be far clearer if shortened. Jheald (talk) 20:25, 19 November 2015 (UTC)
@Succu: But if you want a reference, here's the en-wiki guidance from en:Wikipedia:Talk_page_guidelines#Editing_comments,
Section headings: Because threads are shared by multiple editors (regardless how many have posted so far), no one, including the original poster, "owns" a talk page discussion or its heading. It is generally acceptable to change headings when a better header is appropriate, e.g., one more descriptive of the content of the discussion or the issue discussed, less one-sided, more appropriate for accessibility reasons, etc.
Wikidata may not have yet the same depth of conduct guidance, but the broad principle still makes sense. Jheald (talk) 20:30, 19 November 2015 (UTC)
The unreflected export of „rules“ of your home community is not very helpful. At dewiki we normally do not change the heading of a discussion (Kmhkmh). Especially if we are not involved in the discussion, Multichill. --Succu (talk) 20:51, 19 November 2015 (UTC)

Preferred rank

Hi,

I'm afraid I have no idea how bots work, SPARQL and so on. I made these changes because the template Spanish Wikipedia uses for national sub-entities has changed and shows all instances as a subtitle. You can check out this problem on the Frankfurt article. Users who made those changes in the template are Agabi10 and Metrónomo. They suggested selecting «preferred rank» so that it only shows those values, and it seems to be solving these problems we have now in nearly every city article. I understand this has caused some problems with bots on Wikidata but, as I said, unfortunately I have no idea on bot operation or how to edit templates. Can you speak Spanish? If so, it would be useful if you read this talk page and further discuss the issue with them. Anyway, I'm going to tell them and hope you can work out this problem together.

Meanwhile, I will stop my edits until a solution is found. Greenny (talk) 15:25, 20 November 2015 (UTC)

Done, you can check the talk here. As I deduce from your userpage that you can't speak Spanish, I've encouraged them to write in English from now on. Greenny (talk) 15:34, 20 November 2015 (UTC)

Hir and Bron

Actually I read "Hir" in an English Star Trek novel some time ago. Captain Riker and the starship Titan visited a planet of alien Invertebrata (Q43806) with only one sex. Instead of "Him" or "Her" they said "Hir".

And Saga says Hen in Swedish, something her Danish colleague dislikes. The Swedish word is considered a gender-neutral version of "Hon" (She) and "Han" (He). The word is widely used in media and is today included in dictionaries. Personally, I think that word still isn't neutral enough, since it is promoted by political groups. I guess the word is imported from Finnish, which does not have genders in the same way as our German-derived languages have.

I followed Bron/The Bridge last season, but stopped watching this season, since I thought there was too much violence. My post-traumatic stress disorder (Q202387) became worse... -- Innocent bystander (talk) 16:10, 25 November 2015 (UTC)

@Innocent bystander: Sorry to hear that. In the pre-series publicity, I thought I had read the lead writer saying they thought they should dial down the body count this series, since they thought it had become a bit excessive the last couple of times. I'll just have to see how it goes -- they do like to pull surprises! Jheald (talk) 16:30, 25 November 2015 (UTC)
I think there is one episode left here (this Sunday) and my wife still follows it. I do not know if the number of bodies has increased or decreased and we are maybe not shown so much violence within the TV-frame. But the description of such things as missing body parts and how they have been removed is a more efficient way to give me new nightmares than many other ways to describe violence. That is the good thing with Star Trek novels. The close combats are few. -- Innocent bystander (talk) 17:26, 25 November 2015 (UTC)

Second Severn Crossing

Hi, I'm confused by this edit to Second Severn Crossing (Q1287969). Surely the bridge is in all of England, Wales, Monmouthshire and South Gloucestershire. However if only the lowest level should be included then why retain Wales? Thryduulf (talk: local | en.wp | en.wikt) 14:55, 28 November 2015 (UTC)

@Thryduulf: I was running an automated process to remove all located in the administrative territorial entity (P131) = England (Q21) when there was also an English county given. The same could be done for Wales, but one step at a time... (though I have now removed Wales in this case).
The Severn Bridge may be a special case, as it joins two different nations. So perhaps, in this case, England & Wales might be justified. But for most places, if we already have that the country = the UK, and the county, then England as well seemed just a distraction. Usually located in the administrative territorial entity (P131) = England is a sign that further refinement is needed. Jheald (talk) 15:04, 28 November 2015 (UTC)
Thanks for the explanation, it makes sense now. Thryduulf (talk: local | en.wp | en.wikt) 15:09, 28 November 2015 (UTC)
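
A rough sketch of the clean-up rule described in this thread, for illustration only: drop England (Q21) from an item's located in the administrative territorial entity (P131) values when a more specific English county is already given. The function name, the way the values are passed in, and the use of Essex (Q23240) as the stand-in county are assumptions made for the example; this is not the actual process that was run.

```python
ENGLAND = "Q21"  # England, as cited in the discussion


def prune_p131(p131_values, english_counties):
    """Return the P131 values with England removed when a county is also present.

    p131_values: list of QID strings currently on the item (hypothetical input).
    english_counties: set of QIDs treated as English counties (hypothetical input).
    """
    has_county = any(value in english_counties for value in p131_values)
    if not has_county:
        # No finer-grained value yet, so England is still the best we have.
        return list(p131_values)
    return [value for value in p131_values if value != ENGLAND]


# Illustrative data: Q23240 (Essex) stands in for "an English county".
counties = {"Q23240"}
print(prune_p131(["Q21", "Q23240"], counties))
print(prune_p131(["Q21"], counties))
```

A case like the Second Severn Crossing, which joins two nations, would need to be excluded by hand or by an extra rule, as the discussion notes.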

Wikidata:Database reports/Wikipedia versions

Dear Jheald; I have seen you contributing a lot to pages linked to https://www.wikidata.org/?curid=24028442# (as of today titled Wikipedia versions, but intended in general for WMF projects). I would be happy if you could review the properties of these pages, create the missing Wikibook and Wikiversity project pages, comment on user:I18n/sandbox (where you may find many useful queries) and comment there with new / additional ideas. Best regards Gangleri also aka I18n (talk) 19:54, 9 January 2016 (UTC)

Hi! I want to let you know that the number of Wikidata:Database reports/WMF projects has increased to more than 385. You may be interested in adding labels and descriptions in other languages, following the discussion at property talk:P1800 and commenting there. Best regards Gangleri also aka I18n (talk) 02:59, 12 January 2016 (UTC)

BBC Your Paintings

...is called Art UK as of today. The properties need to be adjusted. --Jane023 (talk) 09:33, 24 February 2016 (UTC)

I informed Magnus and he is converting them now - 36k links!! --Jane023 (talk) 10:26, 24 February 2016 (UTC)
@Jane023: So: extraordinarily ugly new site, extraordinarily ugly new name, and they changed rafts of identifiers. Are these guys a complete bunch of muppets?
(And I see they don't even own their own twitter handle, so have to use this instead!)
My watchlist is lighting up with lots of old identifiers that Magnus is removing. Do you know if he will be replacing them with new ones?
And is there an old-to-new conversion list, so I can update the pages at en:Wikipedia:GLAM/Your paintings/header ?
Thanks for the heads-up, Jheald (talk) 14:46, 24 February 2016 (UTC)
Ask Magnus for a copy of his list? He has already finished the conversion and will start updating 16k new links. I was very annoyed as well (I was informed by newsletter yesterday). --Jane023 (talk) 15:28, 24 February 2016 (UTC)

links to random items

Hi Jheald,

From time to time I come upon items on categories where you have put in links to random, unrelated items (for example here), apparently because these have a name with the same spelling. Items on categories are not disambiguation pages, but are there to gather and connect sitelinks to categories on the same topic. - Brya (talk) 05:33, 18 May 2016 (UTC)

IIIF-tool for the property relative position within image (P2677)

Hello James! Thanks a lot for the property relative position within image (P2677). I'm not using it yet, but it could be a great improvement for visual artworks. One issue is that we need a tool to help us provide the data. So I made a little fork of Liz Fischer's IIIF tool, created for image annotation using the IIIF standard: Cropper. It's just a draft (I'm not a developer) of what we could have. Maybe it could be interesting for you. Best regards --Shonagon (talk) 01:35, 22 May 2016 (UTC)

Example of use: Virgin among the Virgins (Q21013224) --Shonagon (talk) 02:14, 22 May 2016 (UTC)
Hello Jheald. An additional development to display the image fragments of an artwork has been done. It's multilingual, so it's possible to display labels and links to Wikipedia in different languages. Surely a more robust tool could be built, but we now have a first interface to edit and display image artwork annotations, which is essential for using relative position within image (P2677). Best regards --Shonagon (talk) 07:26, 28 June 2016 (UTC)

Dorset description

Hey Jheald,

Just wanted to let you know I partially reverted this change. The text about "Q21694711" was showing up in search results on Google, Wikipedia.org, top of the article in the Wikipedia app on Android and iOS, and other places that utilise Wikidata descriptions. Thanks! --Krinkle (talk) 03:06, 20 July 2016 (UTC)

Best way to get sitelinks for lots of items at once

Hi! If you're still interested in Special:Permalink/243943252#Best way to get sitelinks for lots of items at once ?, there is probably a much better way: use SPARQL. Query. You can get the data in JSON format by putting that query in place of the {} in this link: https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={}&format=json. You can of course include any other columns you need. --Edgars2007 (talk) 10:41, 4 September 2016 (UTC)
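
As a minimal sketch of the substitution described above, the snippet below URL-encodes a SPARQL query and drops it into the {} placeholder of the link. Only the endpoint string comes from the message; the helper name and the example query are assumptions added for illustration, and no request is actually sent.

```python
from urllib.parse import quote

# Endpoint template exactly as quoted in the message above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={}&format=json"


def build_query_url(sparql):
    """Percent-encode the query text and substitute it for the {} placeholder."""
    return ENDPOINT.format(quote(sparql, safe=""))


# Illustrative query only (an assumption, not the query linked in the message).
example = "SELECT ?item ?article WHERE { ?article schema:about ?item } LIMIT 5"
url = build_query_url(example)
print(url)
```

The resulting URL can then be fetched with any HTTP client to get the JSON results.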

Unused property

This is a kind reminder that the following property was created more than six months ago: metasubclass of (P2445). As of today, this property is used on less than five items. As the proposer of this property you probably want to change the unfortunate situation by adding a few statements to items. --Pasleim (talk) 19:15, 17 January 2017 (UTC)

Art UK links

Hi James, you mentioned ART UK on Commons. One thing I realized with Art UK artist ID (P1367) and Art UK artwork ID (P1679) is that their links are rather unstable. When the name of the artwork changes, so does the URL, breaking our links. That's a shame, because for artworks they do seem to have a unique id. See for example http://artuk.org/discover/artworks/bacchus-and-ariadne-114356, where the id is 114356 (you can find it in the HTML source too). Wouldn't it be nice to be able to just record that integer here instead of "bacchus-and-ariadne-114356"? Do you happen to have any contacts at Art UK you can use? I can easily import several thousand Art UK artwork ID (P1679) links, but I'm a bit reluctant to do that now with the unstable links. Multichill (talk) 11:08, 26 January 2017 (UTC)
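
To illustrate the point about the trailing integer, here is a small sketch that pulls the numeric part out of an Art UK artwork slug or URL. It assumes the integer is always the last hyphen-separated part of the slug, which is only known from the single example above; the function name is hypothetical.

```python
import re


def artuk_numeric_id(slug_or_url):
    """Return the trailing integer of an Art UK artwork slug or URL, or None.

    Assumes the id is the final hyphen-separated component, as in the
    "bacchus-and-ariadne-114356" example; other slug shapes are not handled.
    """
    match = re.search(r"-(\d+)/?$", slug_or_url)
    return match.group(1) if match else None


print(artuk_numeric_id("http://artuk.org/discover/artworks/bacchus-and-ariadne-114356"))
print(artuk_numeric_id("bacchus-and-ariadne-114356"))
```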

Hi @Multichill:
I've just this morning had an email back from User:Charles Matthews. He and wmuk:User:Richard Nevell (WMUK) from WMUK met with some Art UK people last month.
...
I think the painter identifiers we have now are broadly correct -- I will do a verification run to confirm later today, or in the next couple of days.
As for painting identifiers, I was thinking about making a trial run on some of the collections we currently have the best accession number coverage for -- eg National Gallery, National Portrait Gallery, Tate -- but I am very happy to coordinate with you.
As to identifier stability, the important thing of course is to be able to serve people URLs that work. With luck, the big identifier change was when they moved to their new site. Beyond that, until they publish a regular list of recent identifier changes, all I think we can do is regular verification runs, and use the "Accessed" qualifier to make a note of the date on which the identifier was valid. It would be nice if they had a more stable scheme; maybe that will come, and we do need to keep knocking on their door, I think. But it seems we first need to prove ourselves more. Jheald (talk) 13:57, 26 January 2017 (UTC)

@Multichill: I can explain more about the meeting, but in a mail. Charles Matthews (talk) 14:51, 26 January 2017 (UTC)

Trial run sounds like a plan. I'll write some import code; I already have most of it, so it should be done soon. I'll share the link to GitHub here; it will be in Python.
@Charles Matthews: please do :-) Multichill (talk) 15:40, 26 January 2017 (UTC)
@Multichill: I was just going to add Art UK painting identifiers for paintings where we already had accession numbers, and then just add them using QuickStatements. But it would be easy enough to pass you what doesn't match. Jheald (talk) 15:47, 26 January 2017 (UTC)
Ok, bot and example edit. It's running now. Multichill (talk) 16:41, 26 January 2017 (UTC)
Thanks. Jheald (talk) 16:47, 26 January 2017 (UTC)
I'm importing quite a few new links. I updated the constraints on Property talk:P1679 to catch more useful stuff. Might need a bit more tweaking. Multichill (talk) 19:36, 26 January 2017 (UTC)

@Multichill: Grrr... Just done the validation scrape. Over 250 no-longer-working identifiers to investigate. (BTW I saw you're asking Magnus for a full rescrape for Mix'n'Match -- I suppose that will adapt to identifiers that have been updated here in the meantime.) Jheald (talk) 22:09, 31 January 2017 (UTC)

Quantity on ART UK links

Hi James, this seems wrong. Quantity on an identifier of 4? You're trying to say art uk has 4 works, but this is not the way to do it. Also doing such a large controversial import without discussion is not the best way to go or did I miss the discussion somewhere? Multichill (talk) 16:33, 6 February 2017 (UTC)

You are running a bot job, someone objects. You should pause and discuss it. Multichill (talk) 17:25, 6 February 2017 (UTC)
@Multichill: Stopped. (Sorry I didn't see your message sooner).
So, where and how to identify the number of works Art UK has in its catalogue under this identifier?
Because the same artist may sometimes have more than one identifier at Art UK, and this information relates specifically to the identifier rather than the artist, it seems to me the appropriate place is as a qualifier on the identifier.
So then, which property to use? quantity (P1114) seems the most generic, for a "quantity, total number, number of instances, number, amount, total" as its list of (English-language) equivalent names gives for it.
In particular, this is the "number of instances" for the identifier in the Art UK database -- so if P1114 is intended for use including "number of instances", this seems entirely appropriate.
But if there is an alternative that you would suggest, that you think would be more appropriate, then I am very open to discussion.
I would like to get on with things though, because Art UK have been complaining they haven't been getting enough hits from us; so I'd like to be revising and rolling out a template on en-wiki including this information as soon as I can get it done. Jheald (talk) 17:46, 6 February 2017 (UTC)
Got distracted by other things. I did this change to make it a bit clearer, but still doesn't feel right. At first I thought you meant the person had 4 Art UK artist ID (P1367) links. I had to check the link to realize you meant that on the linked page it had 4 paintings.
I'm not sure you should even document it this way. In some point in the future we'll have all art uk works and you can just do a query to get this information. Multichill (talk) 20:17, 6 February 2017 (UTC)
@Multichill: Even if we did, they wouldn't necessarily have it, so the information would still be germane in documenting their database. Besides, I want to use this information in a WP template this week, not at some far distant point in the misty future.
I'm not sure your edit helps, because the "of" is placed as a qualifier on the identifier, not on the number of works. At some point in the future, when the data is next updated and re-written, the ordering could get changed; or other qualifiers might get added and upset the order, eg one for "preferred form of name" (in this database, associated with this identifier). It doesn't feel safe to me to rely that WD is always going to show the same qualifiers always in the same particular order. Jheald (talk) 20:28, 6 February 2017 (UTC)
You're putting time pressure on this. My experience in (wiki) projects is that this hurts the quality. I would appreciate it if you could discuss this in a broader venue before adding more. Multichill (talk) 20:40, 6 February 2017 (UTC)
@Multichill: Okay. Where do you suggest? Jheald (talk) 20:43, 6 February 2017 (UTC)
What about Property talk:P1367 and a link at Wikidata:Project chat to get some people to comment on it? Multichill (talk) 20:46, 6 February 2017 (UTC)
Mind to stop your silly additions? --Succu (talk) 22:15, 9 February 2017 (UTC)
@Succu: Task is now 95% complete, so I am going to finish it. It makes no sense to leave the last 5% not done. Jheald (talk) 22:21, 9 February 2017 (UTC)
Cool, then we have to remove 100% of query results at a certain point of time. --Succu (talk) 22:27, 9 February 2017 (UTC)
@Succu: I'm sorry, what are you talking about? Jheald (talk) 22:32, 9 February 2017 (UTC)
But I am curious as to why you think the addition is "silly" ? Jheald (talk) 22:23, 9 February 2017 (UTC)
Are you prepared to update this fixed number when the count at Art UK (Q7257339) is updated? We have queries for this. --Succu (talk) 22:35, 9 February 2017 (UTC)
@Succu: And how exactly do you propose querying something which is not stored on Wikidata? Jheald (talk) 22:37, 9 February 2017 (UTC)
OK, vice versa. What do you want to express with this addition? --Succu (talk) 22:48, 9 February 2017 (UTC)
@Succu: It expresses that Art UK (a catalogue of UK public collections) has 16 paintings by Esther Tyson (Q21458718), compared to eg only 1 by Hendrick van Zuylen (Q28431499) Jheald (talk) 22:58, 9 February 2017 (UTC)
... which means I can now write queries eg like this, for the total number of works at Art UK by painters that we have items for: tinyurl.com/zgjvucp. Jheald (talk) 00:53, 10 February 2017 (UTC)
+1. Jheald, how do you plan to update those numbers every time when any item is added to the catalogue? Or is there plan to have those numbers obsolete forever? --Infovarius (talk) 16:12, 2 March 2017 (UTC)
@Infovarius: The Art UK external IDs are only mildly stable -- they change if Art UK revise the name for an artist, or modify an artist's dates, or e.g. add a date of death. I asked them whether they could publish a regular record of ID changes, but apparently they can't -- apparently they don't hold the data centrally. It's only quite a small proportion that get changed; but it does mean that at regular intervals we will need to re-check the ID links to make sure they still work; we can check the quantity data at the same time.
The quantities probably won't change much -- Art UK was set up to be a survey of oil-on-canvas works in publicly-owned collections, and that survey is now complete. But they may change a little: Art UK may in future add some sculpture, and a limited number of works on paper.
So it's possible that the numbers may go out of date. But there is a retrieved (P813) date in the referencing for each statement, so it should always be possible to tell how recently the data was checked. Jheald (talk) 16:29, 2 March 2017 (UTC)
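
The two uses of the quantity qualifier discussed in this thread (totalling works across artists, and telling from the retrieved date which figures are due a re-check) can be sketched roughly as follows. The two counts are the ones quoted above; the retrieved dates, the 180-day threshold and all names are made up for the example, and the flat list of records is a stand-in for what a query would return.

```python
from datetime import date

# Counts are the two quoted in the thread; the retrieved dates are
# invented for illustration and do not come from Wikidata.
statements = [
    {"artist": "Esther Tyson (Q21458718)", "works": 16, "retrieved": date(2017, 2, 9)},
    {"artist": "Hendrick van Zuylen (Q28431499)", "works": 1, "retrieved": date(2016, 6, 1)},
]


def total_works(rows):
    """Sum the per-identifier work counts."""
    return sum(row["works"] for row in rows)


def needs_recheck(rows, today, max_age_days=180):
    """List artists whose count was retrieved more than max_age_days ago."""
    return [row["artist"] for row in rows
            if (today - row["retrieved"]).days > max_age_days]


print(total_works(statements))
print(needs_recheck(statements, date(2017, 3, 2)))
```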

GSS

I see that you've been removing GSS codes from a number of items, e.g. [3]. What is the reason for this? This property is currently used by w:zh-yue:Template:Infobox English county. Deryck Chan (talk) 14:15, 16 March 2017 (UTC)

Hi @Deryck Chan: There were a number of GSS codes that were on the wrong items, eg Essex (Q23240) -- they were on items for the ceremonial counties, when they ought to be on the items for the County Council areas, eg Essex (Q21272241). (This is clear eg from the map you get if you follow the GSS links, which excludes eg Southend-on-Sea and Thurrock.)
This also applies to most of the other identifiers on the ceremonial counties, eg FIPS 10-4 (countries and regions) (P901), OSM relation ID (P402), NUTS code (P605) etc, which should also be moved across in the near future.
Compare en:Essex, where the facts that apply to the non-metropolitan county are shown in a different part of the infobox to those that apply to the ceremonial county.
en-wiki combines the two; but to make co-referencing and properties like located in the administrative territorial entity (P131) work properly, we have two different items for the two concepts.
Hope this makes some sense now. All best, Jheald (talk) 15:18, 16 March 2017 (UTC)

Golden Hind

Can we continue geeking out about the Golden Hind here? I think I'm getting pretty far down into the weeds for the Project Chat page. :-)

I'm going to keep digging for an end date for the original. And for the actual citation in Stow - having an oddly difficult time finding it. - PKM (talk) 00:42, 24 March 2017 (UTC)

"The original Golden Hinde remained in Deptford for about 100 years, until it started to disintegrate and had to be broken up." it says here. - PKM (talk) 00:47, 24 March 2017 (UTC)
And here's a citation for inception date, built place, commissioned by, the wharf where it was displayed, and even "ship museum" if you want to use it! http://goldenhind.co.uk/pages/education/the-original-golden-hind/88 - PKM (talk) 00:53, 24 March 2017 (UTC)
And bingo! "AD 1668. John Davies, of Camberwell, the storekeeper of Deptford dockyard, caused a chair to be made out of the remains of the ship, 'The Golden Hind' ... here. - PKM (talk) 01:01, 24 March 2017 (UTC)
@PKM: Superb! Hope you're adding this to en-wiki as well. 1:30 am here, so I'm turning in; but really pleased you're on the case! Jheald (talk) 01:32, 24 March 2017 (UTC)
Will do, soon. - PKM (talk) 05:34, 24 March 2017 (UTC)
EN Wiki updated and I found a source for the date of renaming the Pelican to Golden Hind <does happy dance>. Lots of updates made at Golden Hind (Q546198) since I had all the references open anyway. - PKM (talk) 19:47, 24 March 2017 (UTC)
@PKM: That's looking really good now. Thank you so much. Jheald (talk) 20:46, 24 March 2017 (UTC)

CPs

First, thanks for all the work you're doing adding statements for parishes, but I just wondered what to do with some where there are 2 items but the main one has statements for both, for example Q2055282 (settlement and parish) and Q24674398 (parish only). While I do think we should probably have separate items for districts even if they have similar boundaries (like Exeter), I'd suggest that it is unnecessary for parishes (except for cases like Q1002828 and Q21347409 where the parish doesn't include the settlement). Although I think cases like Q637298 and Q24662858 seem OK, as it is a town and the ONS population is much smaller than the parishes. The reason why some parishes have 2 items is because of Lsjbot, who sometimes created pages for the settlement as well as the parish; maybe they should be marked with Property:P460 or Q17362920, although I think items are only true duplicates if they are unquestionably on the same topic, not just where a distinction has been made. Why don't you also do the same thing with JhealdBatch for wards, as cases like Bristol Q21693433 don't have any parishes? I did create items for wards but most don't have any (although Bristol does). Lucywood (talk) 20:06, 31 March 2017 (UTC)

Hi @Lucywood: Sorry not to get back to you sooner. My wife and I were having a long weekend away from the Internet. (Overdue and very much needed!)
With regard to the CPs, I do hope we're getting there. Some key queries I have been watching:
  • tinyurl.com/n43ysz2 - Latest count of number of distinct, non-deprecated GSS codes for civil parishes. Latest value: 10123 ; should be: 10449 => still to find: 326
  • tinyurl.com/mnoklwy - Items marked as current CPs, that do not have GSS codes. (Currently: 287). I do find this quite a brutally slow list to work through.
Some were CPs, that need to have an end time (P582) qualifier added to their P31. Some are in fact civil parish group (Q29043077)s, though editors on en-wiki may not be aware of the fact. Some are completely other things altogether (eg public baths, etc). Some do match entries in the GSS list, but the formal name that GSS has (and usually Commons too, following the GSS) may be slightly different, sometimes opening up questions of what to link to what, and also whether or not the Commons category tree is accurately reflecting this.
But you are quite right that there is also a very real issue with some CPs being claimed by multiple entries here. (Some of which I may have created or added to, by tagging settlements as CPs). The following queries try to reveal this:
  • tinyurl.com/mw3e4pb GSS values claimed by more than one item. (Currently: 84).
  • tinyurl.com/mfdlwjl Commons categories for CPs claimed by more than one item. (Currently: 81).
  • tinyurl.com/kkuz36e CPs that are in areas that are also claimed as CPs. (Currently: 72).
  • tinyurl.com/mts62qu A query that tries to combine the above. (Currently: 142).
This is partly what I opened the discussion at User talk:Kelly to try to think through.
The last query appears to reveal broadly two groups -- one is (mostly) parishes in South Kesteven, where a settlement item and a parish item share a Commons category; the other, almost completely distinct case, is where there are two items both marked as CPs, with one usually a P131 of the other.
My own view is that the link from Commons to CP items here is very valuable, eg for us to be able to use the very categorisation there to infer statements, to add to items here (ie: which parish is a geographical item in). We would lose out if items here did not have a Commons link.
Equally the link to/from Commons categories via Wikidata items from/to Wikipedia items is clearly very valuable.
User:Nilfanion makes the interesting point that ultimately Commons may be quite happy to have distinct categories for parishes and for settlements of the same name. However, the fact remains that at the moment Commons does not for the most part make such a distinction, and that making and populating such a split will/would be no small amount of work.
So my own view is that for the moment it probably does make sense to combine items for settlements and parishes, until such time as they get split on Commons. The other thing that weighs with me is that so many of the current properties seem to be quite relevant to both parishes and settlements -- eg KEPN ID (P3639), OpenDomesday settlement ID (P3118), Vision of Britain place ID (P3616), British History Online VCH ID (P3628) -- I'd probably place most of these on the settlement, if forced to choose, but it's not a clear-cut thing.
On the other hand I am reluctant to undo somebody else's work, and merge back the items that User:Kelly has split out for the South Kesteven parishes. (And similarly User:Robevans123 for communities in Anglesey).
The bulk of the others, as you note above, appear for the most part to reflect sv-wiki and ceb-wiki stub articles created by Lsjbot, whose operator I understand has since retired from Wikipedia editing.
So what to do with these? said to be the same as (P460) and Wikimedia duplicated page (Q17362920) are both interesting options. But if people are content that we don't try to force there to be separate items, just because of stubs created by a bot, then perhaps the best way forward may just be to kill the stubs, by redirecting the stubs on sv-wiki and ceb-wiki (merging any content that seems particularly useful to keep), allowing the corresponding items to be merged here. Would anybody have any objection to this? (And is there anywhere else we should ask first?)
With regard to wards, I have tended to keep those separate from parishes (and they have different GSS codes, starting "E05"). Around 18 months ago, when I asked the UK project on en-wiki what was most valuable to have in the P131 hierarchy for UK places here, the view was that parishes are useful, because they have typically been comparatively stable over comparatively long periods of time; whereas often wards seem to be much less stable: much more likely to be re-drawn as population numbers change. So I haven't seen it as such a priority to create and populate items for wards. I don't think they should be combined with items for CPs; but a new property "coterminous with" might be useful to connect them with parishes (& v.v.), in the occasions where they do have equivalent boundaries. Jheald (talk) 19:57, 3 April 2017 (UTC)
Whether it makes sense to systematically create wards for unparished areas, ie (typically) areas that were former metropolitan boroughs, I am not sure. For the City of London, I think: certainly -- and I think these items all exist & have Commons cats (though still need GSS codes). For other areas, eg Bristol, I don't know. Clearly if en-wiki has articles, we will have items, and they should be described as well as we can. Beyond that, my inclination would be to see how far Commons goes at the moment. If Commons has categories, particularly if they are well populated, it probably makes sense to have items here. If Commons doesn't have categories, then maybe there are other higher priorities for work here. Jheald (talk) 20:14, 3 April 2017 (UTC)
I think there is a distaste in a significant proportion of both en.wp and Commons communities for using wards for localisation (apart from the City of London). There are several drawbacks:
  1. Wards are very variable units. As an example in Bristol, compare the 2009 wards of Hartcliffe and Bishopsworth to the 2016 wards of Hartcliffe & Withywood, and Bishopsworth. The two wards in 2009 split their combined area into a West and East, while the two wards in 2016 cover the exact same area, but are a North/South split.
  2. Wards have low recognition. If you asked someone where they live, the ward is unlikely to be quoted and an area of the city is more likely to be quoted.
  3. When both exist, there is a complex relationship between CPs and wards. Sometimes one is a subset of the other, sometimes not. That makes a logical hierarchy awkward, as CPs are desirable.
In the absence of anything better, Commons sometimes goes to street-level to provide the granular localisation.
At the same time, there is a strong desire to get localisation within the unparished areas. One possibility is shown by my work in commons:Category:Districts of Plymouth, which basically splits the city into the regions known by residents (and all could potentially have WP articles). A better solution might be to use the city council's neighbourhoods, which are defined in terms of community identity and natural boundaries and, unlike people's perceptions, are objectively defined. Following the Localism Act 2011, Neighbourhood Areas have been established in many large cities; where they exist these might be ideal.--Nilfanion (talk) 23:51, 3 April 2017 (UTC)

wards

@Lucywood: Despite User:Nilfanion's cautions above, I have started adding GSS code (2011) (P836) links to items marked as ward (Q1195098) or ward or electoral division of the United Kingdom (Q589282), on the basis that if that is how items here have been marked, then we might as well link to their boundaries etc.

I have also extracted a list of wards from sub-categories of en:Category:Wards_of_England, which should probably be marked up as such here, since in many cases the items have no existing P31. (Though in some cases they are identified as some sort of human settlement (Q486972)).

A further complication that I now realise (on top of all Nilfanion has written), and that I had not appreciated before, is that there can be a distinct difference between the electoral divisions used to elect County councillors (see eg item note at the OS) and the wards used to elect district councillors -- I should have read en:Wards and electoral divisions of the United Kingdom more closely. I had been happily assuming that everything was the latter, but then I hit Pulborough (Q7259268), this link on OS OpenData, which is significantly different from the district ward "Pulborough and Coldwaltham" this link -- in each case, click on the value for 'Extent' to compare the boundaries.

I am hoping that the only county electoral divisions that have got into Wikidata are those that are subcategories of en:Category:Electoral divisions of England -- but it would be useful if you could confirm. Jheald (talk) 19:08, 16 April 2017 (UTC)

@Lucywood: There were also a couple of wards that you added that I've had a bit of trouble identifying. Is there any help you can give me with either of the following?
  • Courtfield (Q28938159). Said to be in LB Brent. The only ward I could find was in Kensington & Chelsea: [4]
  • Devon (Q27889472). There's one in Newark-on-Trent, in Nottinghamshire [5]. But I couldn't find one in South Kesteven, Lincolnshire.
Jheald (talk) 20:29, 16 April 2017 (UTC)
The first one was probably a mistake, sorry, corrected, the second one used to exist, see [6]. As you know many change quickly but I was using mainly the Ordnance Survey data. Lucywood (talk) 07:49, 17 April 2017 (UTC)
@Lucywood: Thanks. I've found its dates now, thanks to data from the Elections Centre at Plymouth University [7]: appears 1979, disappears 1999 -- I had been thrown because the en-wiki page en:List_of_electoral_wards_in_Lincolnshire#South_Kesteven only had names back to 1999.
It seems quite a random sort of an item to have created. Out of interest, has there been a system or a pattern to the wards you created items for? Jheald (talk) 10:10, 17 April 2017 (UTC)
No there wasn't really apart from Suffolk, Essex and Cumbria and some unusual names like Devon. However as I was suggesting why not use your bot to create items for all of them? Lucywood (talk) 12:59, 17 April 2017 (UTC)
Just to add I see no harm in creating them on Wikidata, but I can't see them getting much use either. However, be aware that there are several classes of wards, and these should be given distinct groupings - ward or electoral division of the United Kingdom (Q589282) may not be a sensible concept.
The various types include:
  1. County electoral divisions (eg Tonbridge for Kent County Council)
  2. Unitary Authority electoral divisions (eg Bugle for Cornwall Council)
  3. "Normal" wards (eg Axminster Rural for East Devon District Council)
  4. English Parish Council wards (eg North for Tavistock Town Council)
  5. Welsh Community Council wards (eg Plymouth for Penarth Town Council)
AFAIK, ONS codes are only applied to the wards that elect to councils with district-level (or unitary) powers.--Nilfanion (talk) 18:12, 22 April 2017 (UTC)

Looking to do DNB queries and data population ...

Here to seek some help.

For enWS, we have a mechanism to check that each article of DNB00/01/12 is in WD (done), and we can run a check to note that each WD item has a main subject (excluding the instances of DNB redirect). What I would now like to ensure is that we have the reciprocal of DNB item/main subject:person item <-> person item/described by:DNB00/01/12 (qualified) stated in:DNB item. Note that we have a number of instances where some have a direct described by "DNB item", often as a duplicate, which we need to remove once we are sure that we have the correct "described by" statements in place. I suspect that we are going to need to do SPARQL queries to work it out.

Hope that you can help. Thanks.  — billinghurst sDrewth 11:18, 6 May 2017 (UTC)

Follow-up, once we have the relationships in place, please hold the queries as then we can look to populate family names from the articles "Surname, Given name ... (DNBXX)" through to the respective people items.
@billinghurst: Let's see if I can translate the above into queries, to see whether I have understood correctly what you've told me (and what you are looking for).
So currently we have 30,684 items tinyurl.com/lmg9aop that are published in (P1433) Dictionary of National Biography (1885-1900) (Q15987216) or Dictionary of National Biography, first supplement (Q16014700) or Dictionary of National Biography, second supplement (Q16014697); and these all have a link to en-wikisource tinyurl.com/kc82p8k (number doesn't change if we add that latter requirement). From what you have written above I infer that you are able independently to confirm that this is the number that there should be.
That number falls to 30,289 if we require that each article-item has a main subject (P921) tinyurl.com/m4686wh.
The remaining 404 tinyurl.com/lwrzpy3 are redirects at wikisource, eg Audelay (DNB00) (Q19052970)[8], and you believe that this is the number that there should be.
These 404 are all tagged as instance of (P31) DNB redirect page (Q19648608) (tinyurl.com/mgeaw9o)
However, only 22,938 (tinyurl.com/ks9lmxr) of those 30,289 subject items have described by source (P1343) the expected release of the DNB; leaving 7351 which do not tinyurl.com/mabf22y -- however this information could now be added from the results of this query using Quick Statements (although there may be a few more checks we want to make first).
Updated, to exclude redirects: tinyurl.com/mzn276h (7219)
Of the 22,938 there are 22,616 that have an appropriate stated in (P248) qualifier linking back to the article item (tinyurl.com/kwl5wnq).
This leaves 451 that don't have a stated in (P248) qualifier linking back to the article item. (tinyurl.com/kbs9xqy).
However, looking at this list reveals some oddities. For example:
So there may be some more checking needed on those main subject (P921) statements before adding the inverses in bulk.
Is that the sort of investigation you were looking for ? Jheald (talk) 14:19, 6 May 2017 (UTC)
Correction on that last query. It was getting confused if there were two different DNB articles both describing (or being purported to describe) the same person.
Here's a revised query, with 323 hits, for when there is a link back to the right release of the DNB, but not the original article-item:
tinyurl.com/mxsjz35. Jheald (talk) 14:33, 6 May 2017 (UTC)
Pretty much. Let me look at fixing the errors firstly, then we can review where we are.

Note that with the DNB articles they can refer to multiple people, so we may not have a one to one relationship in that direction (though guess is that we may be missing numbers of those, and I have an inkling how to track)  — billinghurst sDrewth 15:00, 6 May 2017 (UTC)

@billinghurst: Turns out that most of those 323 were redirects, eg Falconberg (d.1471) (DNB00) (Q19019837) (but which had their own "main subject" property, which is why they were being included).
Excluding the redirects brings the number down to 37 (tinyurl.com/lmqfau3) that have a "main subject", but where the subject does not have a "stated in" in turn. Jheald (talk) 15:09, 6 May 2017 (UTC)
For those 37, it looks as if some do have a "stated in", but it's been added as a reference not a qualifier (I was specifically looking for it as a qualifier). Let me know if you'd like an example query to look for this.
Also, there may be some pseudonyms (eg Sawtrey, James (DNB00) (Q19024116)), where there is a link-back from the subject to the subject's main DNB article, but not the DNB article for their pseudonym. Jheald (talk) 15:17, 6 May 2017 (UTC)
That is lots of twists and turns to get my head around at this late hour. I think that I need to consolidate and clean the oddballs first. I would also like to standardise the DNB redirect set: 1) I don't think that they should have main subjects (they are redirects), 2) it seems worthwhile them having the "of" statement.  — billinghurst sDrewth 15:21, 6 May 2017 (UTC)
I am happy for you to give me lists of inconsistent data approaches/errors/weirds and I will fix those.  — billinghurst sDrewth 15:31, 6 May 2017 (UTC)
@billinghurst: 23 where the "stated in" is in a reference, not a qualifier: tinyurl.com/nxf5e42 ✓ Done
16 "weird" tinyurl.com/ny5ujhx (might include some double-counting). Jheald (talk) 15:32, 6 May 2017 (UTC) ✓ Done
@billinghurst: 816 redirects with no "of": tinyurl.com/l8qdmzj
some of which have a "main subject"; and some of the subject items have a "stated in". Jheald (talk) 15:48, 6 May 2017 (UTC)
Comment I am proposing to enWS that we delete the DNB redirect 'articles' and accordingly delete the wikidata items. They are a redundancy that are placeholders for a book, we don't need them as we can manage by web means. So will park that cpt for the moment.  — billinghurst sDrewth 05:53, 7 May 2017 (UTC)
@billinghurst: I have now added described by source (P1343) + stated in (P248) + imported from Wikimedia project (P143) no label (Q20651139) for most of the 7000 subjects that didn't have it (see Special:Contributions/JhealdBatch), with the rest going in as we speak.
Will be away from my computer now until much later, but it should be easier to pick up & deal with anomalies once this lot are all in. Jheald (talk) 07:50, 7 May 2017 (UTC)
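For anyone wanting to re-run this kind of check later, here is a minimal sketch of how the "main subject without a described-by" query might be assembled. It only builds the query text for WDQS and does not run it. The property and item IDs are the ones named in this thread; the function name is made up for illustration, and the exact queries behind the tinyurl links are not reproduced here, so treat this as an assumed reconstruction rather than the query that was actually used.

```python
# Hypothetical helper: builds (but does not run) a SPARQL query for WDQS.
# IDs are those named in the thread above.
DNB_RELEASES = ("Q15987216", "Q16014700", "Q16014697")  # DNB 1885-1900, first and second supplements
DNB_REDIRECT = "Q19648608"  # DNB redirect page


def missing_described_by_query(releases=DNB_RELEASES, redirect=DNB_REDIRECT):
    """Return SPARQL for DNB articles whose main subject lacks a P1343 to that release."""
    values = " ".join(f"wd:{qid}" for qid in releases)
    return f"""SELECT ?article ?subject ?release WHERE {{
  VALUES ?release {{ {values} }}
  ?article wdt:P1433 ?release ;      # published in a DNB release
           wdt:P921 ?subject .       # main subject
  MINUS {{ ?article wdt:P31 wd:{redirect} }}   # skip DNB redirect pages
  MINUS {{ ?subject wdt:P1343 ?release }}      # subject already "described by" that release
}}"""


if __name__ == "__main__":
    print(missing_described_by_query())
```

Note that wdt: only sees best-rank statements, so a deprecated "described by" statement would count as missing in this sketch.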

State of play

@billinghurst:
The only wrinkle is Henry Elsynge (Q15072637) which does have a described by source (P1343) back to the DNB, but one that has been ranked deprecated, as "misinformation" -- see link on talk page for more.
I think that we can live with the qualification, it is what it is.  — billinghurst sDrewth 14:12, 9 May 2017 (UTC)
@billinghurst: Also the following query, which looks for where the same subject item links back to more than one DNB item, might be worth checking through, just to make sure they're all kosher: tinyurl.com/m7bl37o; currently returns 155 rows. Jheald (talk) 12:40, 9 May 2017 (UTC)
✓ Done fixed misapplied, and applied DNB redirects.  — billinghurst sDrewth 12:56, 10 May 2017 (UTC)
Excellent query, shows up redirects, wrongly attributed and over-enthusiastic. Will take a little while to work through.  — billinghurst sDrewth 14:12, 9 May 2017 (UTC)
@billinghurst: Though on reflection, it may be entirely appropriate that there are additional people and things that can be said to be "described by" a biographical article, in addition to its main subject. Cf this query for items that are not humans "described by" DNB articles tinyurl.com/kkdbsle -- most of which may be entirely appropriate.
So perhaps one should also (or first) look at this way round: tinyurl.com/mqqhuw6 -- articles which have more than one "main subject"; and/or tinyurl.com/mgd9t24 articles where the main subject is not human. Jheald (talk) 14:34, 9 May 2017 (UTC)
This leaves 487 such items with no "of" (query: tinyurl.com/l8qdmzj). Typically these have a "main subject" that is a title, eg "Earl of X", rather than a person. There are also a fair number with no main subject (P921).
I'll leave it to the project to consider whether keeping these redirects is useful. But it may be handy to keep them around, to make sure that e.g. such alt forms are reflected in aliases on the "main subject" item, etc.
  • I won't go adding back any more (DNB00)s etc (diff, diff). But if ppl do seriously want to get rid of these, there are about 30,000 to go... Jheald (talk) 12:22, 9 May 2017 (UTC)
  • Comment @Charles Matthews: for the local DNB project page we should grab the queries that are usable for ongoing checks and other maintenance.  — billinghurst sDrewth 12:56, 10 May 2017 (UTC)

DNB articles

To note that ultimately the DNB articles will all be moved to be subpages of the work, not root level items where they currently sit. It is a quirk of the time when the project started that they sit with their suffix. As to the suffix in items here, or with those that are subpages, the guidance here is that the title is kept simpler, with clarification taking place in the description. So the DNB00/01/12 suffixes have been disappearing, though we have kept the years of life. For other works, we have been removing the book title from the subpage, and showing the article, and the name of the work in the descriptor. There are lots to come as adding them has been problematic to this point of time.  — billinghurst sDrewth 11:50, 9 May 2017 (UTC)

I'll bow to whatever the project thinks best... Personally I do quite like the (DNB00)s and similar on items, as scarecrows to stop people linking to items of the wrong sort (or merging them). But whatever people want. Jheald (talk) 12:26, 9 May 2017 (UTC)
I've argued against removing the suffix style in the past, on grounds of ease of search on Wikisource (which I use all the time). There is a possible technical fix in the search, there. Here, I'm a fan of the suffix style for the same reason as James. Charles Matthews (talk) 13:07, 10 May 2017 (UTC)

The Thames at Westminster (Q19660486)

I noticed that this one was created by Poulpybot and had no external link to an NT website, but of course these are also on Art UK. I wonder if you know how to go through and update these (I didn't check how many have been created) with the Art UK links, but that would be a worthwhile thing to do, in my opinion. Jane023 (talk) 13:04, 21 September 2017 (UTC)

@Jane023:. Hmm. Looks like one can do a search for "National Trust" at Art UK to get pages like this, then follow the link to each pic to get the NT accession number (and NT URL where available), then match on the NT accession number against NT pics here. Shouldn't be too much of a challenge to script that, I'll put it on my to-do list, but I can't promise immediate action -- or would it be helpful to have these links urgently? Jheald (talk) 15:27, 21 September 2017 (UTC)
Seems we only have 25 National Trust paintings in the system at the moment though, tinyurl.com/ybng96om. A motley bunch, doesn't seem to be much rhyme or reason to them. Jheald (talk) 15:31, 21 September 2017 (UTC)
Thanks! No I don't need them urgently - I only noticed because I was working on Canaletto and picked up a few. I think the NT website has permanent urls, and the Art UK site does too, so I thought it might be a good idea to get the back end of both of these hooked up somehow with the Wikidata paintings (at least for the collections we have, so e.g. Tate, NPG, NG etc) Jane023 (talk) 15:38, 21 September 2017 (UTC)
Ha I see now that I have done most of these probably, working with Art UK images, such as The Sense of Taste (Q29569637). Jane023 (talk) 15:40, 21 September 2017 (UTC)

Shand Mason

I seem to have done something wrong and brought about this:[9]

I've tried to understand but I can't. How do I learn what my mistake was? Thanks, Eddaido (talk) 21:55, 23 September 2017 (UTC)

@Eddaido: What's the problem? On 11 August you created a sitelink from en:Shand Mason on English Wikipedia to c:Category:Shand Mason fire engines, a sitelink which seems entirely reasonable, in the process creating the Wikidata item Shand Mason (Q35956189), which is great -- every English wikipedia article ought to have a corresponding Wikidata item.
Today a batch process of mine has added a Commons category (P373) property to the Wikidata item, because sometimes these are easier to deal with for some purposes than sitelinks (and sitelinks can't always be created, eg if the link is already taken by a category). So everything seems fine, just as it should be. Jheald (talk) 22:18, 23 September 2017 (UTC)
I have added a few more statements to the item here. Nothing too incorrect, I hope. Jheald (talk) 22:43, 23 September 2017 (UTC)

note about mistake

https://www.wikidata.org/w/index.php?title=Q11832547&diff=564177610&oldid=527175994 was a mistake and created an invalid link - you may want to check your other related edits Mateusz Konieczny (talk) 15:39, 24 September 2017 (UTC)

@Mateusz Konieczny: Any idea why the URL formatter isn't linking it as a valid Commons category (P373) ? There seems to be no problem with the sitelink to the same Commons page below. Jheald (talk) 15:44, 24 September 2017 (UTC)

Oxford biography

If you can look up the entry for Sir John Maclean, 1st Baronet (Q7527912) I would appreciate it. I would like to update his Wikipedia entry. --RAN (talk) 19:21, 4 December 2017 (UTC)

Thanks! More info than I expected his entry to have, I mostly had Swedish language sources previously, this is nice. --RAN (talk) 19:50, 4 December 2017 (UTC)

Q41336172

In English Wikipedia the GeoNames objects 7297992 and 2644559 are described in one article. In Swedish Wikipedia they are described in two different articles. Should Wikidata follow the English version of Wikipedia? If not, 7297992 is what is described in sv:Lewes (parish). Maundwiki (talk) 18:54, 4 January 2018 (UTC)

@Maundwiki: I tend to follow Commons, which has a very thorough break-down of territorial areas, and the local language wiki. If they both only have one item (even if the Commons one may be primarily for the area, and the Wiki one primarily for the settlement), then I am very very reluctant to have two different principal items here on Wikidata, because that will break the Wiki <-> Commons link.
That's why if I see multiple articles on sv-wiki and ceb-wiki (only), I am now tending to mark the Wikidata item as Wikimedia duplicated page (Q17362920) and concentrate the information on the other item, that is shared between all other languages and Commons. Jheald (talk) 16:29, 6 January 2018 (UTC)
I am for data in one place however not that it is top down. The outcome is that we must follow who controls wikidata and I have problems with that type of centralized control. Not that I was for creation of separate articles for PPL and ADM (this case) if there is only one PPL within the ADM. So if we have one wikidata record there should be two articles for that wikidata record. I know it will not work, but in that case the geoname will have to be in more than one wikidata record. Maundwiki (talk) 21:06, 22 June 2018 (UTC)

National Trust places paintings categories

I have been thinking about this ever since I realized that Art UK has a pretty decent coverage of what is on the National Trust website for paintings. Ideally it would be nice to have images of ALL National Trust objects, but the Art UK site is a good start. I think we should try to set up a way to cover this on Commons that links to Wikidata so we can use Mike Peel's "Wikidata infobox" on those Commons categories - what do you think? That way we could include the Art UK venue link as well as the NT venue link for the paintings. See e.g. this category I just created (I rounded up the paintings using search as many of them were not categorized in any venue at all): c:Category:Paintings in Ascott House. Do you have a list of these venues anywhere? Jane023 (talk) 10:40, 21 February 2018 (UTC)

@Jane023: A list like en:List of National Trust properties in England ?
We're currently showing 233 things owned by (P127) the National Trust tinyurl.com/y8fa45ol, most of which are buildings -- don't know how complete that is, and may well include some exterior landscapes, also some paintings. Also 5 for operator (P137) tinyurl.com/y7vr4ax8.
I would think it should be fairly straightforward to get an infobox to show an Art UK venue ID (P1602) link as well as official website (P856).
Be aware that Art UK are very sensitive about the idea of people taking their image files. Metadata they might (or might not) be easier about, but to date I haven't scraped any of their paintings pages for artist or location or collection or accession number information. Jheald (talk) 11:52, 21 February 2018 (UTC)
Updated version of query with county information, to compare with list from en-wiki; but many of them are missing it: tinyurl.com/yb35worh Jheald (talk) 11:57, 21 February 2018 (UTC)
Nice list - yes that is exactly what I meant. I have uploaded many ArtUk images and probably because of the weird and conflicting copyright notices, I see that lots of people have uploaded their images as Template:Own work which is ridiculous for PD artworks in my opinion. Since their website has all these venues, I think it would be useful to link the venue to the proper category, but of course we need the categories. Thanks to Geograph we have lots of categories for the venues, but for the Art UK links we need "Paintings in XXX" categories, and these are of course there for the larger and better known semi-museums but much less so for the out-of-the-way places. There is also not a 1-1 relationship between ArtUK and National Trust, but there is a huge overlap. Following up on the museum discussions over at the WikiProject for museums, I am not too sure how to proceed. The NT is the overall owner and that should be the main item and all others should be part of it for their collections, I think. Jane023 (talk) 12:22, 21 February 2018 (UTC)
@Jane023: Updated query with column for Commons category (P373): tinyurl.com/y7j9mpfm. If any of these don't already have Commons categories, then they damn well ought to. But it may just be that we don't have P373 statements for them yet.
It should be totally uncontroversial to make "Paintings in XXX" sub-categories for these -- though if a place has a value for Art UK venue ID (P1602) then I don't see why not to include in the infobox for the place as a whole: seems an entirely appropriate link to give people.
For indicating where a painting is, this surely is why we (and Art UK) have both location (P276) as well as collection (P195) -- is that not enough to record and then later fish out the ones that ought to be in a particular category? Jheald (talk) 12:41, 21 February 2018 (UTC)
Yes nice worklist! Of course this is uncontroversial - it just needs to be done and dusted, and I came here for advice on approach for modelling, since you've been so active with the artists side of things. BTW on the copyright side, looks like Art UK is hiring an expert. So as far as my modelling issue goes, take the example I posted above for Ascott House. Here is an item View of Dordrecht (from the Maas) (Q47510248) for a painting that I have now given the Art UK as reference for it being in collection Ascott House. But maybe it should be collection NT since the number is not from Ascott House but an NT number? Would be interested to know your thoughts. Jane023 (talk) 12:53, 21 February 2018 (UTC)
@Jane023: I would use location (P276) = Ascott House, collection (P195) = National Trust. I think that's factually accurate. (And also appears to match Art UK, where "National Trust, Ascott" is linked as a venue, not a collection).
Interesting job ad :-) . But looks like Art UK is hiring a manager rather than a lawyer -- presumably principally to handle permissions and clearances for 2D works still in copyright. Jheald (talk) 13:04, 21 February 2018 (UTC)

Good point about that subtle "National Trust, Ascott" wording. I think you are right, but I would also like to be able to query these by collection at the location level. We now have large museums split into bequests, but all of those can be "collection=museum" because I like to have the location show up as being the museum. For these it is different because the locations are really very far apart. So for this case maybe we need a "Collections of Ascott House" item that is "part of" NT? Then I can use that item for the collection and make it part of NT. The building itself could also be part of this item, or maybe its own listed building item? There are other job openings at Art UK so I guess their grant came through. Jane023 (talk) 13:21, 21 February 2018 (UTC)

@Jane023: I really do think that location (P276) = Ascott House, collection (P195) = National Trust is the right way to do this.
This is also I think the way we handle the Tate, which considers itself to have a single collection, displayed over several locations (Tate Modern, Tate Britain, Tate St Ives etc), between which items can sometimes move.
If you want to display the museum as location, when no specific location (P276) is set, surely this is easy enough to do, whether in SPARQL or in a template or in an additional column of your own spreadsheet?
As for querying by location, surely this also is easy enough to do, by adding an option to specify <location> in the relevant template, then if present looking for location (P276) in the relevant SPARQL query? Jheald (talk) 14:14, 21 February 2018 (UTC)
Yes (I see Maarten has also done this for the Bavarian State collections and Alte Pinakothek). I still would like to link the Art UK venue to the Commons category. So should the venue link appear on the location item then? Or do we still need a collection item for the specific location? Jane023 (talk) 14:56, 21 February 2018 (UTC)
@Jane023: So at the moment this is what we have tinyurl.com/yaasrc9n Those look pretty reasonable to me.
So I would have Art UK venue ID (P1602) as a statement on the item for the building, wherever possible, not the collection even if the collection has a distinct item.
I would have a location (P276) on the collection item, pointing to the building, if the whole collection is housed in one building.
On a painting item, I would have location (P276) pointing to the building item, whenever the collection is spread over more than one building. I would have collection (P195) pointing to the collection, in this case just "National Trust" for everything. I would be very reluctant to create individual items for sub-collections, just because the collection as a whole is displayed over multiple sites. Jheald (talk) 15:17, 21 February 2018 (UTC)
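A rough sketch of how querying under this model could look, assuming paintings carry collection (P195) for the owning body and location (P276) for the building. The function name is invented for illustration, and the QIDs are left to the caller because the item IDs for the National Trust and Ascott House are not given in this thread. It only builds query text for WDQS.

```python
# Hypothetical helper; QIDs are supplied by the caller, none are assumed here.
import re

_QID = re.compile(r"^Q[1-9]\d*$")


def paintings_at_location_query(collection_qid, location_qid=None):
    """SPARQL for items in a collection (P195), optionally narrowed by location (P276)."""
    for qid in (collection_qid, location_qid):
        if qid is not None and not _QID.match(qid):
            raise ValueError(f"not an item ID: {qid!r}")
    lines = [
        "SELECT ?painting ?location WHERE {",
        f"  ?painting wdt:P195 wd:{collection_qid} .",
    ]
    if location_qid is None:
        # No location given: list everything in the collection, showing a location if set.
        lines.append("  OPTIONAL { ?painting wdt:P276 ?location }")
    else:
        # Location given: keep only paintings recorded at that building.
        lines.append(f"  ?painting wdt:P276 wd:{location_qid} .")
        lines.append(f"  BIND(wd:{location_qid} AS ?location)")
    lines.append("}")
    return "\n".join(lines)
```

With one collection item and many locations, the same function serves both the "whole National Trust" view and the per-house view, which is the point of not creating sub-collection items.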

OK fine. This sounds reasonable. When done it will be interesting to see how many of these venues have files on Commons. My gut feeling is a good chunk of them. Jane023 (talk) 15:25, 21 February 2018 (UTC)

@Jane023: That would be good. Great if we could gather them up! There certainly are a lot of venues that do have Commonscats -- though also a lot without a P373 currently tinyurl.com/y7t4s9xa (though I bet a lot of these actually do have categories on Commons). Inevitably, many of the pictures may just be of exteriors; but with luck there should be some that are of the collections too. Jheald (talk) 15:42, 21 February 2018 (UTC)
I think you would be surprised. There are lots and lots of portrait paintings that have many categories, none of which are in the proper artist or location categories. It is those that I mean to round up and put on Wikidata. Jane023 (talk) 16:04, 21 February 2018 (UTC)

Broader concept discussion

@Jheald: apologies for loooong delay in responding to your message but after a hectic couple of weeks I now have a little more time to look into things like this. I confess this isn't something we have come across at the University of Edinburgh... as yet. And I feel the as yet is important to stress. But it is fairly early days in our own forays into formal work with Wikidata so the proposal could very well be pertinent as we move further along (I'd need to re-read through the rather lengthy archive of discussions to date - I see you have moved things on to now having the concept as a qualifier instead). I'll be attempting to move our own work in terms of the Survey of Scottish Witchcraft, the Thesis Collection and (hopefully) more of the Library & University Collections in general in the next few months so I'll be interested to discuss with L&UC colleagues how they view the matter and see how things develop in terms of the creation of the new qualifier. Very best, Stinglehammer (talk) 13:31, 28 February 2018 (UTC)

Commons category edits

You probably want to check out Wikidata:WikiProject sum of all paintings/Painters with Commons category no sitelink and its history when you're done. Multichill (talk) 22:41, 24 March 2018 (UTC)

@Multichill: Yes, I should pretty much empty that list, unless there are any that have a topic's main category (P910). If there are painters that are left with unlinked Commons galleries, it should be possible to pick them in SQL -- or almost all of them. But over 85% of galleries do have links (all but 17,000 total), compared with 600,000 sitelinks that could be added for categories. Jheald (talk) 23:05, 24 March 2018 (UTC)
Check out the query, items that already have a link to Commons or have a topic's main category (P910) link are already filtered out. Multichill (talk) 23:08, 24 March 2018 (UTC)
Ps. You should really use that "show preview" button more often.
@Multichill: I should. :-) Deflate my edit count by a factor of about 5.
So with luck I should completely empty the list; depending on how much falls through the cracks with the LIMIT and OFFSETs I'm using to try to cover the set in multiple bites (fingers crossed for not too much deviation from determinism). Jheald (talk) 23:17, 24 March 2018 (UTC)
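On the LIMIT/OFFSET point: SPARQL gives no guaranteed result order without an ORDER BY, so slices taken from an unordered query can overlap or leave gaps. A minimal sketch of paging with an explicit sort key follows; the function and parameter names are made up for illustration, and this is not the script that was actually used for the run described above.

```python
def paged_queries(base_query, total, page_size, order_var="?item"):
    """Yield copies of base_query with ORDER BY / LIMIT / OFFSET appended.

    base_query is assumed to end with the closing brace of its WHERE clause
    and to select order_var.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for offset in range(0, total, page_size):
        yield f"{base_query}\nORDER BY {order_var}\nLIMIT {page_size} OFFSET {offset}"


if __name__ == "__main__":
    base = "SELECT ?item WHERE { ?item wdt:P373 ?commonscat }"
    for query in paged_queries(base, total=25, page_size=10):
        print(query, end="\n\n")
```

Even with a sort key, items added or removed between pages can still shift the slices, so a final sweep for stragglers remains sensible.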
Seemed to have worked. Not sure what your focus area is. I focus on paintings and painters, but the same query for humans seems to complete and gives plenty of suggestions.
Wikidata:WikiProject sum of all paintings/Duplicate Commons category got some extra entries too, usually a bit of a puzzle. Multichill (talk) 11:24, 26 March 2018 (UTC)
@Multichill: I've also just run a sweep adding P373s to about 60,000 category items that had Commons category sitelinks but no P373s. So some of these new additions to your duplicates list, eg Ball at the Wedding of the Duke of Joyeuse (Q19820066) may be where the item has a topic's main category (P910), and the category item has now acquired a Commons category (P373).
My focus for the painters was really as a proof of concept, to look at systematically adding sitelinks to start bringing down the 600,000 items that could have Commonscat sitelinks but currently don't, in part as a step to being able to run SQL queries on Commons more easily for further categories that could be matched here (or could have Wikidata items created), but so far haven't. User:Mike Peel might take this forward as a bot process; but I'll certainly think about adding some more for human (Q5)s -- it would be clearly advantageous if we could get all Q5s covered.
In the past I've done a bit on Art UK artist ID (P1367) painters, but really all the work that you've inspired makes painter IDs a particularly strong area of Wikidata, so a natural choice to try to improve first.
Other than that, a particular current interest is online thesauruses, and trying to compare their hierarchical structures with ours, to see whether there are items and/or hierarchical links we may be missing. I've been starting with genre/form thesauruses like Library of Congress Genre/Form Terms ID (P4953) and AAT ID (P1014), first doing some matching with OpenRefine, and it's been striking how many matches I'm finding seem to be to items that have never had any information added at all, other than a sitelink. I think getting our hierarchical subclass of (P279) structures properly in place is something we've perhaps neglected a bit, while building up lots of instance of (P31)s, but of crucial importance to represent what those items are. Commons has particularly strong hierarchical structures, so should be very useful to mine, but it's difficult because it's so difficult to write tools that can access both the item properties here and the category system there, and also because there seems to be no easy way to record in a mass-retrievable machine-interpretable way what Commons categories actually mean. Which is so needed to make Structured Data a success, amongst other things. The more that we can link to here the better, but it's a real limitation not to be able to start describing the rest in their own wikibase-for-Commons items, that would be accessible from WDQS. But it's still worth trying to see what we can do, and more sitelinks will help.
Apart from that, I am also getting close to creating description pages for 30,000 old maps on Commons. (project outline / tranche 1), which is really my main big project at the moment, that the rest is all a bit a diversion from. Current issues of interest -- automatic estimation of projection, scale, and heading from the georeferencing data, quite likely based on [10] though the tool seems to have some glitches; trying to get VIAFs and OCLCs out of the British Library for the books and authors, that could be matched here; and (the big remaining issue) for the maps in the set that already have pages on Commons, how to recognise content (categories, user-added descriptions) that may be worth keeping, when the pages get re-written to use the Map template. Once I can get that sorted, then it should at last be possible to really get going! Jheald (talk) 12:18, 26 March 2018 (UTC)
Jheald, wikidata items related to wikipedia articles have already the Commons category as property P373; aren't commonswiki link only for wikidata items related to categories (not articles)? -- Blackcat (talk) 22:27, 27 March 2018 (UTC)
@Blackcat: In a word: No. See here for statistics and historical trends. Jheald (talk) 22:41, 27 March 2018 (UTC)
Whatever the case is, then, communication must be clearer. Until a couple of years ago the last say was that article items "commonlinked" only with gallery on commons if existent, and category items commonlinked to the respective Commons category. There must be a guideline that chases any ambiguity away on this topic. -- Blackcat (talk) 07:49, 28 March 2018 (UTC)
@Blackcat: You might like to look in on Wikidata_talk:Notability#RfC:_Notability_and_Commons which (in part) is considering updating guidance on this point; and also general standards of notability for subjects that have Commons categories.
In practical terms P373 is useful for WDQS queries, and also for linking from Wikipedias. But in the other direction, from Commons, a sitelink is more useful for interwiki, for writing templates, and for SQL queries; and because of their guaranteed 1-to-1 nature. That is what has fuelled the great organic growth in people adding sitelinks. It's true that in 2013 there was a ruling (kind of) that Commons categories should only link to category-items here; but, given the value of sitelinks, Commons people have added them anyway; and there's also come to be an acceptance that it's not desirable to create a category item here just to support a Commonscat sitelink if there's already an article-item that could be linked instead. So de facto the position is now that it's welcome for a Commons category to be linked to an article-item, unless Wikidata has a corresponding category-item. But, as you say, guidance on this could probably benefit from being clearer and more authoritative. Jheald (talk) 08:10, 28 March 2018 (UTC)
[edit conflict] Indeed, I don't care about notability or less, I was talking about the existence of articles that have NOT their respective category in any Wikimedia chapter but on Commons. In this case, what I have always known is that those articles must have only the property P373 (Category on Commons) filled, with no commonslink. On the opposite side we have items with category: for example Liverpool Football Club's Wikidata item (Q1130849) has no commonslink to the respective category on Commons; you'll find it on the wikidata item for Category:Liverpool F.C. (Q7162712). Now, the question is: shall the commonswiki field be filled with the Commons category even in those cases in which the Wikidata item is about an article with no category in whatsoever chapter but Commons? -- Blackcat (talk) 08:13, 28 March 2018 (UTC)
@Blackcat: If there is a category in another main project other than Commons, so that there is a category-item here, then that category-item should be sitelinked to the Commons category.
If there is no category in another main project other than Commons, so that there is no category-item here, (and there is no gallery on Commons), then a new category-item should not be created here just to sitelink to the Commons category, instead the sitelink should go from the article-item here to the Commons category.
This is what I understand the current community consensus to be; reflected now by over 750,000 sitelinks from article-type items to Commons categories. Jheald (talk) 08:24, 28 March 2018 (UTC)
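The two rules above can be written down as a small decision function. This is only a sketch of the consensus as described in this thread, with invented names. The remaining case (no category-item, but a gallery exists on Commons) is not spelled out above, so the sketch assumes the gallery keeps the article-item's sitelink and the category is recorded through P373 only; that branch is an assumption.

```python
def commons_category_sitelink_target(has_category_item, has_commons_gallery):
    """Say which Wikidata item should carry the sitelink to a Commons category.

    Returns "category-item", "article-item", or None when no sitelink is
    suggested.
    """
    if has_category_item:
        # A category exists in another project, so a category-item exists here.
        return "category-item"
    if not has_commons_gallery:
        # No category-item and no gallery: link from the article-item directly.
        return "article-item"
    # Assumption (not stated in the thread): the gallery occupies the
    # article-item's Commons sitelink, so the category is linked via P373 only.
    return None
```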

I don't know about consensus, but such simple queries return garbage: some commonslinks are categories, and some are galleries. Wikidata is supposed to be structured, so that every property and sitelink has a specific value.. not something that "depends". How would one query only the items that have galleries at Commons? Gikü (talk) 10:34, 28 March 2018 (UTC)

@Gikü: You can filter out the Commons categories to find only non-categories like this: tinyurl.com/ybgtdo8z
Or (faster) you can look for Commons gallery (P935). Jheald (talk) 10:44, 28 March 2018 (UTC)
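For anyone post-processing results outside WDQS, the first approach can be roughly illustrated as follows: Commons category pages all carry the "Category:" namespace prefix, so a sitelink title can be sorted into category or non-category on that basis. This is only a sketch; the function names and the sample rows are made up for the example, and it is not a copy of the tinyurl query.

```python
def is_commons_category(sitelink_title):
    """True if a Commons page title is in the Category: namespace."""
    return sitelink_title.startswith("Category:")


def gallery_sitelinks(rows):
    """Keep only (item, title) pairs whose Commons sitelink is not a category."""
    return [(item, title) for item, title in rows if not is_commons_category(title)]


# Hypothetical sample rows: (item ID placeholder, Commons sitelink title)
sample = [
    ("Q_A", "Category:Example town"),
    ("Q_B", "Example town"),
]
print(gallery_sitelinks(sample))
```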

VIAF[edit]

Thanks for your help with the queries. I think it would be good to have VIAF addition automated for cases like this. It is a VIAF item with many links and even a link back to Wikidata. From Wikidata I found it via ISNI -> isni.org -> viaf.org. If a VIAF item has a link to Wikidata and the ISNI in both items match, then the VIAF could be added via an automated process, no? 92.229.165.74 16:32, 19 April 2018 (UTC)

There are bots that do this, particularly from ULAN; but also ISNI, GND etc. I think User:Magnus Manske did a big bot run in the last couple of months; User:Multichill is also active in this area. Jheald (talk) 16:47, 19 April 2018 (UTC)
I've added National Thesaurus for Author Names ID (P1006) to a lot of items based on VIAF (and some sanity checks) and in the past I also did a bit of work on ULAN ID (P245). I thought some bot was doing more structural work cross referencing viaf and other sources, but not sure which one. Multichill (talk) 18:26, 19 April 2018 (UTC)

Excluding properties from query result?[edit]

Hi Jheald. I'm having an issue with running Wikidata:Requests for permissions/Bot/Pi bot 2 in that the version of the query in [11] is returning property IDs such as BLDAM object ID (P2081). That's causing the code to crash at the 'for page in generator' line ("'P2081' is not a valid item page title"), and it's not easy to add some code to avoid that happening. Is there an easy way to exclude properties being returned in the query code? Thanks. Mike Peel (talk) 12:35, 27 April 2018 (UTC)

@Mike Peel: Try adding the line
MINUS {?item wikibase:directClaim [] } .
to the query, immediately below the line
INCLUDE %cats .
I never considered that people might put a P373 on a property page; but I think this should exclude it. Jheald (talk) 12:58, 27 April 2018 (UTC)
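A small Python sketch of the same idea, in case it helps on the bot side: one helper drops the MINUS line in below the INCLUDE line, and a second one acts as a belt-and-braces guard that discards anything that is not a Q-number before it reaches the page generator. The helper names and the query skeleton are hypothetical; the skeleton is a placeholder, not the bot's actual query.

```python
import re

ITEM_ID = re.compile(r"Q[1-9][0-9]*")
MINUS_LINE = "MINUS {?item wikibase:directClaim [] } ."


def add_property_exclusion(query, anchor="INCLUDE %cats ."):
    """Insert MINUS_LINE immediately below the first line equal to anchor."""
    out = []
    inserted = False
    for line in query.splitlines():
        out.append(line)
        if not inserted and line.strip() == anchor:
            indent = line[: len(line) - len(line.lstrip())]
            out.append(indent + MINUS_LINE)
            inserted = True
    if not inserted:
        raise ValueError("anchor line not found in query")
    return "\n".join(out)


def item_ids_only(entity_ids):
    """Client-side guard: keep Q-numbers, drop property IDs such as P2081."""
    return [e for e in entity_ids if ITEM_ID.fullmatch(e)]


# Placeholder skeleton only; the real query defines %cats elsewhere.
skeleton = "SELECT ?item WHERE {\n  INCLUDE %cats .\n}"
print(add_property_exclusion(skeleton))
print(item_ids_only(["Q1130849", "P2081", "Q7162712"]))
```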
Thanks, I've added that line and restarted it. It's done ~60,000 so far, so it's maybe 10% of the way through the complete run. Thanks. Mike Peel (talk) 13:04, 27 April 2018 (UTC)
The tweak seems to be working nicely, thanks! Thinking ahead a step or two with the deployment of Wikidata infoboxes on Commons, the bot's currently walking through the category tree one category at a time, which helps to make sure that it works through the whole of a category at once rather than seeming to be random. However, that also means that it's wasting a lot of time checking each category to see if it has a Wikidata link or if it already has the infobox or an alternative template before adding the infobox. While that's OK to start with, it quickly becomes inefficient for repeat runs. So it then becomes a lot more efficient if I download lists of where the infobox (and the alternative templates) are used and I compare each category against that, and I'll probably implement that soon. But if there's a good way to download a list of all commons sitelinks, that might speed things up even more - I don't suppose you have a query to hand that might be able to provide that list (either in one go or in chunks)? Thanks. Mike Peel (talk) 00:07, 1 May 2018 (UTC)
@Mike Peel: A list of all the Commons sitelinks is a lot of data -- WDQS can only just about count them within the time. It probably can be done, either from the complete data dump or through the fragments service, but it would be messy to keep up to date.
I would have thought a better approach would be to go through the SQL tables -- a single SQL query ought to be able to return all of the subcategories of a particular category, that have a Wikidata sitelink, but don't have any of a list of templates.
I'm a lot less familiar with using and querying the SQL tables, but let me see if I can knock an example together in Quarry. Jheald (talk) 09:04, 1 May 2018 (UTC)
So I think you may want something like this: quarry:query/26771, which finds all the sub-categories of c:Category:Hamlets in England by county that have Wikidata sitelinks, but excludes c:Category:Hamlets in County Durham because it has a Wikidata infobox.
More JOINs could be added to exclude other templates.
Disclaimer: my experience with SQL is quite limited, so each time I use it I am very much feeling my way forward -- it's possible there may be some efficiencies that I have missed. Jheald (talk) 10:55, 1 May 2018 (UTC)
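To show the shape of that kind of query without needing Quarry, here is a toy version run against an in-memory SQLite database. It is a sketch under assumptions: the tables are cut-down imitations of the MediaWiki ones (the column names follow the schema as I understand it, but the real tables have more columns and the layout may differ), the "Example" categories and the item IDs are invented, and it is not a copy of quarry:query/26771.

```python
import sqlite3


def categories_needing_infobox(conn, parent, template):
    """Subcategories of parent that have a Wikidata item but do not use template."""
    sql = """
        SELECT p.page_title
        FROM categorylinks AS cl
        JOIN page AS p ON p.page_id = cl.cl_from
        JOIN page_props AS pp
          ON pp.pp_page = p.page_id AND pp.pp_propname = 'wikibase_item'
        LEFT JOIN templatelinks AS tl
          ON tl.tl_from = p.page_id AND tl.tl_title = ?
        WHERE cl.cl_to = ? AND cl.cl_type = 'subcat'
          AND tl.tl_from IS NULL   -- no link to the template found
        ORDER BY p.page_title
    """
    return [row[0] for row in conn.execute(sql, (template, parent))]


def demo_database():
    """Build a tiny made-up dataset; IDs and 'Example' titles are placeholders."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
        CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
        CREATE TABLE templatelinks (tl_from INTEGER, tl_title TEXT);
        INSERT INTO page VALUES
            (1, 'Hamlets_in_County_Durham'),
            (2, 'Hamlets_in_Example_A'),
            (3, 'Hamlets_in_Example_B');
        INSERT INTO categorylinks VALUES
            (1, 'Hamlets_in_England_by_county', 'subcat'),
            (2, 'Hamlets_in_England_by_county', 'subcat'),
            (3, 'Hamlets_in_England_by_county', 'subcat');
        INSERT INTO page_props VALUES
            (1, 'wikibase_item', 'Q-demo-1'),
            (2, 'wikibase_item', 'Q-demo-2');
        INSERT INTO templatelinks VALUES (1, 'Wikidata_infobox');
    """)
    return conn


conn = demo_database()
result = categories_needing_infobox(
    conn, "Hamlets_in_England_by_county", "Wikidata_infobox"
)
print(result)
```

Excluding a further template would mean one more LEFT JOIN of the same form with its own IS NULL test.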
This bot run seems to have finished now, after just over 400,000 edits, amazingly with only a couple of reverts that were due to bad P373 values. I'll keep running it every so often (not sure if daily/weekly/monthly atm) to catch new results. Do you think it would be worth trying any variants of the query to catch other cases? Also, thanks for your advice above, I'll look into the SQL option, I know SQL a lot better than I do SPARQL! Thanks. Mike Peel (talk) 12:26, 11 May 2018 (UTC)
Hmm, somehow it found another 1,000+ more to edit in a repeat run today, not sure why it didn't get those in the last run-through... Thanks. Mike Peel (talk) 14:22, 11 May 2018 (UTC)
@Mike Peel: Excellent! Hugely impressed with the rollout of the Wikidata infoboxes over on Commons too -- they're a real positive on a category page.
As to the 1000+ in the repeat run, what surprises me actually is that the number was so low. The SELECT ... LIMIT ... OFFSET statements that the query was using to divide up the P373s are not guaranteed to be deterministic, and even less so when run over a period of time with new P373s being added, so if the first run-through really did hit 400,000 and only miss 1000 or so, that's actually well ahead of what I would have expected.
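A toy simulation of why slicing with LIMIT and OFFSET can miss things: if the underlying result order shifts between two slice requests, one row can be returned twice and another not at all. This is plain Python with invented IDs, not WDQS; the two "snapshots" simply stand in for the order the engine happens to produce at two different moments.

```python
def fetch_slice(rows, limit, offset):
    """Mimic SELECT ... LIMIT limit OFFSET offset over an ordered result."""
    return rows[offset:offset + limit]


def sweep(snapshots, limit):
    """Fetch page N from the Nth snapshot, as a paged run over changing data would."""
    seen = []
    for page, rows in enumerate(snapshots):
        seen.extend(fetch_slice(rows, limit, page * limit))
    return seen


# Invented IDs. The second snapshot holds the same rows in a different order.
first_order = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6"]
second_order = ["Q4", "Q1", "Q2", "Q3", "Q5", "Q6"]

fetched = sweep([first_order, second_order], limit=3)
missed = set(first_order) - set(fetched)
print(fetched, missed)
```

Here Q3 comes back twice and Q4 never appears, even though every row was present the whole time.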
In time we probably need to look more closely at the P373s that for one reason or another got excluded. But getting lots more Commons infoboxes in place is the next really exciting step -- it will be interesting to see if (or how soon) there's a tipping point, so people get to the point of *expecting* their categories to have a wikidata infobox, actively linking or creating Wikidata items if their category doesn't. Jheald (talk) 16:44, 11 May 2018 (UTC)
For various reasons, that first run-through was actually about 10 restarts of the code, so that might be why the latest set was smaller. ;-) Now this task is done, I've started Pi bot running through commons:Category:CommonsRoot, so expect to see a lot more infoboxes on Commons in the next couple of weeks (I'm not sure how long it's going to take the bot to run through every commons category -- or if the raspberry pi has enough memory to store the list of them all!). While that's running I'll look into switching to the SQL selection approach for the regular runs later on. I know a few editors are already actively adding the infobox to new categories they create, although I'm not sure how much that translates to new item/content being added here yet. Thanks. Mike Peel (talk) 17:21, 11 May 2018 (UTC)

relative position within image (P2677)[edit]

Hello Jheald!

I would like to notify you about a new development on Crotos that might interest you. It corresponds to the need you rightly expressed some time ago: the possibility to find artworks with a depicts (P180) but without relative position within image (P2677).

On this page http://zone47.com/crotos/lab/cropper/p180iiif.php?q=79746 on the top right there is a link to a SPARQL query to display artworks that have the depicted item but no relative position within image (P2677). And in the SPARQL query's results, there are links to the IIIF Image Cropper on Crotos to locate the element on the image and then fill in the information on the corresponding Wikidata item, which is linked. I haven't communicated on it yet because the IIIF service isn't working well at the moment. I hope the service will soon be fixed in a sustainable way, so that we can play more.

Best regards --Shonagon (talk) 19:25, 17 May 2018 (UTC)

BL System Numbers ID?[edit]

Thanks for the thorough update concerning book editions and copies.

Applying the principles of FRBR is a good way to approach the ingest of bibliographic records and I think it will work well in Wikidata. The FRBR conceptual model is a bit bewildering in theory because the work seems to be an unnecessary level of abstraction (what is a work without an expression?) but the practical implementation in library catalogues tends to have positive results for browsing and resource discovery. Linked open data is an ideal format for FRBR and it will be interesting to see how easy it is to find relevant books using WQS as the number of items increases. The Library of Congress has some really good training resources for FRBR and RDA (Resource Description and Access - cataloguing guidelines based on the principles of FRBR), which contain a lot of useful guidance.

I like the idea of an external identifier property to link the British Library catalogue and I will definitely support such a proposal. You may well already know that there are a few existing properties that provide coverage of BL collections: ESTC citation number (P3939) (books printed before 1800 in the English language or in Britain; hosted by the BL), OCLC control number (P243) (BL contributes to Worldcat) and there will be overlap with LCOC LCCN (bibliographic) (P1144). Still, there is a great deal of unique material that would be covered by a BL identifier. Another option is a COPAC identifier for coverage of all major UK library collections. It isn't obvious but they do use a unique identifier for each item and it is found in the direct link on the record page e.g. https://copac.jisc.ac.uk/id/36275396?style=html&title=Catalogue%20of%20important%20Western%20and%20Oriental%20manuscripts. When shortened to https://copac.jisc.ac.uk/id/36275396, it links directly to the metadata in XML format. Simon Cobb (Sic19 ; talk page) 22:15, 21 May 2018 (UTC)
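As a minimal sketch of the shortening step described above, assuming the record ID is always the last path segment of the link (which is what the one example shows, nothing more), the query string can be dropped with the standard library. The function names are made up for the illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def copac_short_link(record_url):
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(record_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def copac_record_id(record_url):
    """Take the last path segment as the identifier (an assumption, see above)."""
    return urlsplit(record_url).path.rstrip("/").split("/")[-1]


url = (
    "https://copac.jisc.ac.uk/id/36275396?style=html&title=Catalogue%20of%20"
    "important%20Western%20and%20Oriental%20manuscripts"
)
print(copac_short_link(url))
print(copac_record_id(url))
```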

User:Jheald/commons[edit]

Would it be easy to do an update of User:Jheald/commons? It would be interesting to see how the statistics look now, and whether there's still a big gap between numbers of P373 and the sitelinks after the bot work. Thanks. Mike Peel (talk) 11:42, 8 June 2018 (UTC)

@Mike Peel: Sure. Takes a couple of hours or so to run and collate all the queries. I'd like to clear the decks first with some things arising from the Biodiversity Heritage Library book items I've been working on, and the Wikidata:WikiProject BHL pages I've just set up, but then let me see what I can do. Jheald (talk) 11:48, 8 June 2018 (UTC)
Thanks. No rush. :-) Mike Peel (talk) 11:51, 8 June 2018 (UTC)
@Mike Peel: Some updated numbers now at User:Jheald/commons, and also at Wikidata:WikiProject_Commons/Links_and_sitelinks/historical to compare historical trends.
I haven't posted them to Project Chat yet, because I need to stop and think harder if there's another way to get the numbers in the top row (the queries in the method I previously used are timing out). Also I need to think a bit about interpretation of what it all means!
Also, in some cases the numbers aren't quite the answers to the questions one would most want to ask -- eg: how many article items are relying on the topic's main category (P910)/category's main topic (P301) bridge to be connected with CommonsCats (the query I've given doesn't quite give that).
But I'm out of time just right now, so this is what I can do for the present. Jheald (talk) 10:12, 10 June 2018 (UTC)
Thanks! I'm still digesting this too, but it looks like while this was a good step forward, we still have a long way to go - I hadn't realised that there were 6+ million commons categories! Thanks. Mike Peel (talk) 01:56, 12 June 2018 (UTC)

Beiträge zur Biologie der Pflanzen (Q14914936)[edit]

I'm sorry, but most of your additions here are not very useful... --Succu (talk) 20:29, 17 June 2018 (UTC)

@Succu: I can only add the data I can see.
The central BHL title summary file gives dates of 1870-2006 for this publication, as reflected in the section "Publication info: Berlin [etc.]Duncker & Humblot [etc.],1870-2006." on this page for the publication. So that's what I have added for inception (P571) and dissolved, abolished or demolished (P576).
The information summary for the constituent items gives a date of 1870 for each one, hence the derivation that the earliest date available was 1870, and the latest date available was 1870.
If that is not correct, feel free to fix it. Jheald (talk) 21:01, 17 June 2018 (UTC)
I think the way you parse and map the information from here is not valid. Descriptions like | Berlin [etc.]Duncker & Humblot [etc.],1870-2006. | New York Botanical Garden, LuEsther T. Mertz Library are not welcome. --Succu (talk) 21:11, 17 June 2018 (UTC)
@Succu: I didn't create those descriptions. They were made by Magnus when he created the items. I've merely been adding sourced and referenced information to try and fill the items out. Jheald (talk) 21:15, 17 June 2018 (UTC)
You made use of them, so you are responsible. Why do we want this? How is this updated if another volume is scanned? --Succu (talk) 21:32, 17 June 2018 (UTC)
@Succu: That's the point of giving a 'retrieved' date as a reference, to indicate when the information was extracted.
There are various ways to update the information from time to time. To start with, BHL releases a file of information about volumes they have scanned, including a column for the "title" ID of the corresponding series or serial, which they regularly update. So one just needs to look at that file, see what's been added since the last check, and update those items accordingly. As far as I can see, however, this record hasn't been updated since May 2009. In practice I believe BHL very often starts a new title ID when they have a new batch of scans; so it may well be that the information on this record will never have to be updated.
As to why we want this, it is very useful to have an idea what the BHL has or has not got scanned for a particular title. In this case we know that if the date from a reference to this title is 1912, it may well be worth looking in the BHL for a scanned copy; but if it is 1922, then that is not part of what the BHL has scanned.
It's also useful, if BHL has multiple identifiers for the same title, to have an idea of which identifiers cover which date ranges. Jheald (talk) 21:59, 17 June 2018 (UTC)

About placeholder for <somevalue> (Q53569537)[edit]

It isn't clear for me, what do you mean with "special value <somevalue>"? --ValterVB (talk) 16:39, 22 June 2018 (UTC)

@ValterVB: As in eg this diff Jheald (talk) 16:43, 22 June 2018 (UTC)
Then "unknown value" is more correct than "somevalue"; at least, that is what it is called in the English user interface. --ValterVB (talk) 16:47, 22 June 2018 (UTC)
@ValterVB: Yes, the English user interface has "unknown value", but the developers have always called it "somevalue", and that is its intention -- it avoids questions such as "unknown by who", and makes it clear that the use is intended to encompass cases such as here, where the publisher name is known, but hasn't yet been resolved to a Q-number. Jheald (talk) 16:55, 22 June 2018 (UTC)
For italian I can use the translation of "unknown value" because developer don't talk in english :) --ValterVB (talk) 16:58, 22 June 2018 (UTC)

Brazilian pastor a mammalogist?[edit]

https://www.wikidata.org/w/index.php?title=Q10309107&type=revision&diff=694807089&oldid=632945610

Found

at https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P4081&oldid=702014670#"Unique_value"_violations 2.243.118.239 11:08, 25 June 2018 (UTC)

@2.243.118.239: Evidently not, as they lived & worked about 60 years apart (1920s vs 1980s/90s).
Good catch!
cc also User:Ambrosia10 -- Jheald (talk) 11:19, 25 June 2018 (UTC)

Category[edit]

There doesn't seem to be any category other than the one on Commons. Why was this created: Q55243534
--- Jura 14:58, 29 June 2018 (UTC)

@Jura1: Structurally necessary to make templates work on Commons, because a gallery is taking the sitelink from the main item. Jheald (talk) 15:52, 29 June 2018 (UTC)
We corrected that some time ago. These items don't meet our notability guidelines. Please stop creating them. You might be better served by adding the sitelinks directly on the items.
--- Jura 15:55, 29 June 2018 (UTC)
Jura1 Read what I just wrote. The sitelinks can't be added to the main items because they already have sitelinks linking to Commons galleries. In that circumstance, per this diagram of Multichill's the link has to go to a category item.
Commons template c:Wikidata infobox follows these sitelinks in order to draw data eg for c:Category:Deborah Kerr; and building up these sitelinks to structured descriptions of categories is also what is critically needed, to get those resources in place now for the structured data project on Commons.
The pages are staying because they are needed: we need to connect Commons categories to structured data in a way that can be queried at scale. Jheald (talk) 16:10, 29 June 2018 (UTC)
I'm not sure if you recently read the notability guidelines. This is explicitly excluded. Please re-read what I just wrote. We already have enough problems with Commons sitelinks. Please avoid adding more (I think most of your other additions are most helpful, btw).
--- Jura 16:16, 29 June 2018 (UTC)
@Jura1: We need this for the infoboxes, taxo templates, etc, on Commons; and for structured data. That's non-negotiable. If the current wording of the notability guideline is getting in the way of that, then it needs to be fixed. And -- advanced warning to you -- Commons is going to need to roll out category items for intersection categories, so the guidance is going to need to be updated to cope with that too. Jheald (talk) 16:25, 29 June 2018 (UTC)
@Jura1: If you don't like these, please can you propose a different solution that will let us have the commons sitelinks here? Also see the discussion at Wikidata talk:Notability, which fizzled out without a solution. Thanks. Mike Peel (talk) 16:16, 29 June 2018 (UTC)
  • Can we stop this now and check where and why it's supposed to be needed?
    --- Jura 12:33, 30 June 2018 (UTC)
@Jura1: You've been told why it's needed. But for completeness, I have gone through it for you again, at WD:AN. Jheald (talk) 17:15, 30 June 2018 (UTC)

Commons category linking—have I missed a policy/practice change?[edit]

Has there been a change in the practice that the category link for commons now belongs on the WD topic item rather than the WD category item? I am seeing contributions moving them. Thanks.  — billinghurst sDrewth 02:12, 9 July 2018 (UTC)

@billinghurst: No change that I am aware of. Commons category <-> category links still have advantages if particular wikis (eg Wikisource?) only have the category, not the item; and potentially allow a configurable choice as to which one to follow from Commons if there is a choice.
Perhaps remind User:JotaCartas of the existence of Commons category (P373), which most Wikis will follow for their Commons sitelink, and category's main topic (P301)/topic's main category (P910), which WikiCommons infoboxes and interwiki templates can navigate.
It would of course be good to have this written down as community-approved guidance, that one could simply link to. But that never seems to happen. Perhaps once the latest discussion on category notability gets resolved... Jheald (talk) 08:23, 9 July 2018 (UTC)
Thanks. This fellow is purposefully moving them from the category items. I have mentioned on their talk, and I was double-checking before I actively resolved.  — billinghurst sDrewth 12:54, 9 July 2018 (UTC)

ScienceSource and P5008[edit]

Hi there – I need ScienceSource (Q55439927) to be added to the "one of" list for on focus list of Wikimedia project (P5008). This is to support the focus list launched yesterday at WD:SSFL, which is now active and raising exclamation marks, if not eyebrows. I'd be grateful for help in fixing the constraint issue. Charles Matthews (talk) 09:28, 11 July 2018 (UTC)

@Charles Matthews: Done. If you ever need this again, just add another item of property constraint (P2305) qualifier with the Q-number for the list or project, as a qualifier to the property constraint (P2302) = one-of constraint (Q21510859) statement. No preliminary discussion or community authorisation needed -- it's really just a way to keep track of what the property is being used for, so that there's a list that people can easily find. Project looks interesting! Jheald (talk) 11:20, 11 July 2018 (UTC)

Thanks! Charles Matthews (talk) 11:45, 11 July 2018 (UTC)

Lsjbot and Wikimedia duplicated page[edit]

If there are 2 articles like Gnosall, shouldn't we just use them and make the distinction, while if there is only 1 Lsjbot article, as with Q5169821/Q20989256, shouldn't we just merge them, as it doesn't really make sense to split the ceb/sv from the other articles? @Kelly:. Lucywood (talk) 20:19, 15 July 2018 (UTC)

@Lucywood: Happy to see them merged if we can (Q5169821/Q20989256). But as for the other, if ceb-wiki is the only wiki to make the distinctions, then personally I'd be rather more disposed to try to preserve the wikilink between en-wiki and Commons (and in the process permit a rather more meaningful infobox on the Commons category), than accommodating a rogue page on a wiki that's going to have a readership of about one per year. So cut the ceb-wiki loose, and make it an instance of Wikimedia duplicated page (Q17362920), that's what I'd say. Jheald (talk) 20:33, 15 July 2018 (UTC)
I'll leave Q5169821/Q20989256 and similar for a few days to see if Kelly objects but I'd point out that settlements and administrative units are still different things and I'm not sure if "Wikimedia duplicated page" is a good tag (that has been pointed out by Lsj), maybe something like "excessive distinction page" would work better? I'd note that any WP article means we have to have an item here, while that's not the case for Commons categories. Lucywood (talk) 11:21, 16 July 2018 (UTC)
@Lucywood: Regarding Commons categories, I suspect we are moving towards having an item for every Commons cat (and need to) -- see eg current discussion at Wikidata talk:Notability. On the other hand, on Wikidata:Project Chat there's talk of suspending the presumption of notability for wikis with a lot of Lsj articles.
I do think it may be quite a useful test, to look to see whether Commons has a separate 'village' category for the placename, when considering whether to maintain or split an item here.
But I'm happy enough to be over-ruled, if people think it's the right thing to do. The important thing is to come up with a line everyone feels they can live with, so there's no danger of items being flip-flopped backwards and forwards. If necessary, a category could be created on Commons for the village, to preserve Commons <--> wiki linkage. (Though populating it and keeping it maintained may be more difficult).
I haven't done much in this area in about the last year, but it looks like there are currently about 1950 pairs of items (tinyurl.com/y9a2fmqv) modelled similarly to Abbotsbury (Q306685) / Abbotsbury (Q24665923), with the two connected by said to be the same as (P460) and the second designated an instance of (P31) Wikimedia duplicated page (Q17362920). Jheald (talk) 12:28, 16 July 2018 (UTC)
I thought the recent proposals were mainly for situations where there is a gallery taking the main item. Yes I suppose removing presumed notability for Lsj could work but might well cause problems as well.
Very few are split, probably making it too small to be much of a consideration.
I'm not sure what the best option is but I just thought, let Lsjbot decide if we have 2, statements can easily be added for both. I don't really have any strong views either way but I do think that WD can be more specific with this kind of thing. Lucywood (talk) 19:52, 16 July 2018 (UTC)
@Lucywood: ^^ Lsjbot has separate parish pages for almost 2000 items that are currently merged. (Plus more that may have been created recently). I'm not sure that that is "very few". Yes WD can be more specific, and maybe it should be. But do we want to break all the enwiki <--> Commons links? Besides, those en-wiki articles almost all do combine both; not entirely clear whether it's fair to say they're 'primarily' about the village. But if anyone feels like making and populating those 2000 new categories on Commons for the villages, then I won't stand in their way. Jheald (talk) 20:01, 16 July 2018 (UTC)
Having 2 items here wouldn't break the links to en much, though they do include a lot of content for the unit as well. I have started Wikidata:Property proposal/Unusually granulated item. Lucywood (talk) 20:29, 16 July 2018 (UTC)
@Lucywood: ?? Having two items that don't (can't) have the same sitelinks breaks the sitelink to/from Commons 100%. Jheald (talk) 20:31, 16 July 2018 (UTC)
That would only affect the "extra" item, not the "main" item. Lucywood (talk) 20:34, 16 July 2018 (UTC)
@Lucywood: Okay, I may not be understanding you correctly. What I thought you were suggesting was taking the current item (instance of (P31) village (Q532) & civil parish (Q1115575) and sitelinked to both en-wiki and Commons, and making it instance of (P31) village (Q532) only and sitelinked to en-wiki, while the Lsjbot item would take over the role of instance of (P31) civil parish (Q1115575) and be sitelinked to Commons; so that the Commons category and the en-wiki article would then no longer be linked together. But in fact you're suggesting something different? Jheald (talk) 20:40, 16 July 2018 (UTC)
@Lucywood: PS. I have edited your property proposal to indicate the item that (if I understood correctly) you would intend the new property to sit on. Not sure if I did understand it correctly, so you should probably check. Wasn't 100% sure whether what you really wanted to propose was a new property, or whether it was a new class, that the Lsjbot items would be instance of (P31). So hope I got this right. Jheald (talk) 20:48, 16 July 2018 (UTC)
Yes I think you misunderstood, to clarify the "main" item would be for example Abbotsbury (Q306685) and have it marked as a village and have statements for the settlement, this item would contain all sitelinks that don't make a distinction (including Commons) while Abbotsbury (Q24665923) would only contain the ceb/sv sitelinks, but if a split occurs at Commons or another project, that new page could also be linked. Note even if there is only 1 Commons category, the "extra" item could still contain the Commons category (P373). Lucywood (talk) 20:58, 16 July 2018 (UTC)
@Lucywood: But would Abbotsbury (Q306685) continue to be instance of (P31) civil parish (Q1115575) ? Or would it be located in the administrative territorial entity (P131) Abbotsbury (Q24665923), and that be the civil parish? Which item would eg the GSS code (2011) (P836) be on? Or the "civil parish" GeoNames ID (P1566)? Etc, etc. Jheald (talk) 21:24, 16 July 2018 (UTC)
Abbotsbury (Q306685) would only be a village (Q532) and be located in the administrative territorial entity (P131) Abbotsbury (Q24665923). The GSS code (2011) (P836) would be on the parish item, the GeoNames item for the parish would be moved to Abbotsbury (Q24665923), along with Vision of Britain unit ID (P3615) but Vision of Britain place ID (P3616) would be on the village item. However if we were to use the "unusually granulated item" then all the statements would be on the "village" item (similar to how the "duplicate" items are now). Lucywood (talk) 09:55, 17 July 2018 (UTC)
  • I have merged the South Kesteven parishes with no article conflicts. Lucywood (talk) 11:40, 2 August 2018 (UTC)

P373 values that aren't being caught by pi bot's query[edit]

I've started noticing a few cases like John Rankin House (Q14706682), where there was a P373 value but the query that pi bot runs wasn't returning the ID. Wikimania Hackathon 2018 (Q55606654) was another recent case. Any idea what might be happening there? (I've added the sitelink using another script now.) Thanks. Mike Peel (talk) 21:12, 25 July 2018 (UTC)

@Mike Peel: This is the script that steps through all items with a P373, and adds a Commons sitelink if possible, if various conditions are met (ie no other contenders for the sitelink) ?
If I remember correctly, you're doing a new sweep through all the P373s about once a day.
If that's right, then the most likely issue is that the way WDQS returns the list of P373s isn't 100% deterministic and consistent. So an item with a P373 might be missed by one slice, but not necessarily appear in the next. It's a pain, but unless somebody can write a tighter query, it was the only way I could see to be able to get something more-or-less workable within the timeout constraint.
With enough sweeps, one would think an item shouldn't consistently dodge all of them (unless something very unlucky is conspiring to happen), so one would think even the stragglers ought to be picked up after the second or third or fourth run-through.
(Added) But the P373 on John Rankin House (Q14706682) was added back in May 2016, so that would have been a huge number of sweeps by now that it would have consistently missed... Very strange.
The only other thing I can think of is that the P373 statement might have registered on one WDQS server, but another might have missed the update (if that server was under a lot of pressure at the time).
Or it might be something I've completely not thought of.
How are you spotting these items that appear to be being missed? Jheald (talk) 21:26, 25 July 2018 (UTC)
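To put a rough number on the "enough sweeps" argument above: if each sweep missed a given item independently with the same probability, the chance of it dodging every one of n sweeps would be that probability raised to the power n. Both the independence and the per-sweep rate are assumptions for illustration; the rate below just reuses the earlier ballpark of about 1,000 missed out of about 400,000.

```python
def chance_missed_every_sweep(miss_rate, sweeps):
    """Probability of being missed in all sweeps, assuming independent sweeps."""
    return miss_rate ** sweeps


# Illustrative rate only: roughly 1,000 missed out of roughly 400,000.
rate = 1_000 / 400_000

for n in (1, 2, 3, 100):
    print(n, chance_missed_every_sweep(rate, n))
```

On those assumptions an item surviving around 100 sweeps by bad luck alone is effectively impossible, which points towards a systematic cause rather than chance.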
Yup, it's the one in the script at [12], which runs daily. These seem to be stragglers that have missed many runs (~100 in some cases). I have a new script that runs through a specified commons category tree looking for items without Wikidata items linked to them, and then searches Wikidata for the category name to find potential matches - it's maybe 70% accurate at the moment, so I'm running it manually (currently through commons:Category:Long Island, see my latest edits from this account). Most don't have existing P373 links, though. Thanks. Mike Peel (talk) 21:31, 25 July 2018 (UTC)
Hmm, I added a check to see if the image (P18) is in the commons category, which seems to make it ~100% accurate, so maybe this new script should be botified... Thanks. Mike Peel (talk) 22:37, 25 July 2018 (UTC)
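The image check can be sketched roughly like this: a candidate match is only accepted when the item's image (P18) file name appears among the files in the Commons category. This is a hypothetical illustration, not the bot's code; the function names, the underscore/space normalisation and the sample file names are all assumptions.

```python
def normalise(title):
    """Treat underscores and spaces alike, as MediaWiki titles do."""
    return title.replace("_", " ").strip()


def match_is_supported(image_filename, category_files):
    """Accept a candidate match only if the item's P18 image is in the category."""
    if not image_filename:
        return False  # no P18 on the item, so nothing to confirm the match
    members = {normalise(f) for f in category_files}
    return normalise(image_filename) in members


# Made-up file names for illustration.
files_in_category = ["Example_house_front.jpg", "Example house garden.jpg"]
print(match_is_supported("Example house front.jpg", files_in_category))
print(match_is_supported("Unrelated photo.jpg", files_in_category))
```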
@Mike Peel: Nice! Go for it. Jheald (talk) 22:39, 25 July 2018 (UTC)
It's now at Wikidata:Requests for permissions/Bot/Pi bot 8! Thanks. Mike Peel (talk) 22:53, 25 July 2018 (UTC)
This is odd ... pi bot's run out of places to add the infobox! It's finally finished looking through all of the cases where we have commons sitelinks (at least, as of the quarry result from yesterday - I've added a few more since) to add them where it can. I can now (manually) fetch a new list from quarry every so often to catch the latest additions and places where things have changed in the category that might allow the infobox to be added, but that's going to be a lot less frequent than running the bot 24/7! Thanks. Mike Peel (talk) 09:44, 10 August 2018 (UTC)
@Mike Peel: Yea! That's a fantastic milestone. Looks like Pi bot 8 is still finding some new sitelinks to make, so things are still moving. It might be interesting to produce a breakdown of numbers -- ie number of items with Commonscat sitelinks; number with infoboxes; and then the numbers for each reason a Commonscat with a sitelink doesn't have an infobox, to give an idea of how the numbers fall, and whether everything is accounted for. The focus now I guess moves to creating Wikidata items for Commons categories without sitelinks -- ie looking at the Commons categories for artists or engravers or cartographers, or listed buildings, or whatever, can we identify ones that don't have Wikidata items, that should. Also the time may soon have come to start creating systematically items for categories of the form "X by Y", and its intersection subcategories -- eg "cartographers by country", "cartographers from Russia" etc., with appropriate category combines topics (P971) statements. For example, for the "Old maps of... " categories, it would be *exceedingly* useful to be able to query for which places had an old maps category and which didn't, if one had used eg OpenRefine to match a list of map subjects to a list of places. But perhaps that is something that will need to be eased into gently, use-case by use-case. Jheald (talk) 10:51, 10 August 2018 (UTC)
There are definitely more sitelinks that can be added ... as well as pi bot 8 running automatically, I've also been running the script manually without the image requirement, and that's also finding quite a few cases (but at the ~70% accuracy level) that I've been adding with my user account. But after that, starting to add new wikidata items will definitely be the way to go - as you say, people and monuments are good ones to start with, as intersection categories are going to be a lot more controversial... Thanks. Mike Peel (talk) 14:05, 10 August 2018 (UTC)
I found some more sitelinks to be added by removing "MINUS {?item wdt:P910 [] }" from the P373 query and adding some extra Python code to add the sitelink to the category item rather than the topic item - Pi bot's added ~700 of these so far and spot-checks seem to be OK, so I'll leave it running overnight. Relaxing the query a bit more may give us some extra sitelinks that can be added (and maybe in the long term it would be best to deprecate P373 in favour of the sitelinks...) Thanks. Mike Peel (talk) 00:26, 15 August 2018 (UTC)
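A minimal sketch of the relaxation described above, for anyone following along. Only the `MINUS {?item wdt:P910 [] }` clause comes from this thread; the rest of the query, and the names `build_p373_query` and `choose_sitelink_target`, are hypothetical and are not Pi bot's actual code.

```python
from typing import Optional


def build_p373_query(exclude_items_with_category_item: bool = True) -> str:
    """Build a SPARQL query for items that have a Commons category (P373).

    Assumed query shape, for illustration only. When the flag is True, items
    that already point to a category item via P910 are excluded.
    """
    lines = [
        "SELECT ?item ?commonscat WHERE {",
        "  ?item wdt:P373 ?commonscat .",
    ]
    if exclude_items_with_category_item:
        lines.append("  MINUS { ?item wdt:P910 [] }")
    lines.append("}")
    return "\n".join(lines)


def choose_sitelink_target(topic_qid: str, category_qid: Optional[str]) -> str:
    """Pick the item that should receive the Commons sitelink.

    If the topic item has a category item (its P910 value), the sitelink
    goes there; otherwise it goes on the topic item itself.
    """
    return category_qid if category_qid is not None else topic_qid


strict_query = build_p373_query(True)
relaxed_query = build_p373_query(False)
```

The relaxed query returns a superset of the strict one, which is why the extra target-selection step is needed for the newly included items.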

Eep!

I just realized that my comment on the Fashion project Talk page might have come off as a slap at you. It was not!!! You have been more helpful than anyone with my sticky little problems. My frustrations are focused elsewhere. - PKM (talk) 20:30, 8 August 2018 (UTC)

I didn't pick up on any frustration directed at anyone, just an offering up of a particularly knotty question to the community for thought and comments, so no worries at all from this end. Jheald (talk) 21:16, 9 August 2018 (UTC)

Q21385082

Hi! Something is wrong with this item. The journal is replaced by itself. Regards --Succu (talk) 10:19, 26 August 2018 (UTC)

@Succu: Thanks, good catch.
The question here is going to be whether we want one item or two for Stuttgarter Beiträge zur Naturkunde (< 1957 - c.1970) and Stuttgarter Beiträge zur Naturkunde. Serie A: Biologie (c. 1973 - 1999> ).
User:Pigsonthewing originally attached the BHL scans for both of the above to Q21385082 (diff); but probably we want a separate item for Serie A, in the years after the journal split into parts A, B, and C.
One question I never know the answer to is what to put for the end date for the original undivided journal in cases such as these. Do we consider that it ceased in c.1970, to be replaced by the three sub-journals? Or do we consider that it continued, with Serie A, B and C as parts of it?
It would be nice if the style guide at the Periodicals project could give a bit of guidance on questions such as this. Jheald (talk) 17:55, 26 August 2018 (UTC)
From German National Library (Q27302): Stuttgarter Beiträge zur Naturkunde (1961-1972) replaced by Stuttgarter Beiträge zur Naturkunde. A, Biologie (1973-) and Stuttgarter Beiträge zur Naturkunde. Serie B, Geologie und Paläontologie (1972-2007). Stuttgarter Beiträge zur Naturkunde. Serie C, Wissen für alle (1974-) is a new one, not a split. Hope that helps. --Succu (talk) 18:31, 26 August 2018 (UTC)
And there is Stuttgarter Beiträge zur Naturkunde aus dem Staatlichen Museum für Naturkunde in Stuttgart (1957-1972) replaced by A/B/C... --Succu (talk) 18:43, 26 August 2018 (UTC)

Cambridge Wikidata Workshop 20 October

I mailed you an invitation just now, but the address bounced. Charles Matthews (talk) 14:48, 25 September 2018 (UTC)

Game

Feel like playing a game? [13] now has 'Commons category matches' based on suggestions from pi bot using the code we talked about in #P373 values that aren't being caught by pi bot's query (but without image matching). I'll announce it more widely soon, but thought you might like a preview / to do some testing. Thanks. Mike Peel (talk) 18:08, 5 November 2018 (UTC)

@Mike Peel: Thanks for the invite! My time is quite limited at the moment -- two rather big time-consuming things IRL, both having to be dealt with this week. But I'll try and take a look. I'm always a bit apprehensive of these games -- my fear (after clearing up a lot of bad results from Magnus's 'proposed merge' game) is that some players can be a bit casual about matches, or be simply unaware / not wary of how many very similarly named but actually different things the game may throw at them. For example, the first match it's offering me is St. Bernard's Chapel (Q7587280) (Building in Patterson, United States of America) vs c:Category:St. Bernard's Chapel (Heiligenkreuzerhof). I fear that far too many people might just reflexively tick 'yes' and go straight to the next, based just on the similarity of names; even though presumably unless the names were very very similar, the match wouldn't have been offered. So IMO before starting a game like this, people need to be very strongly schooled to approach potential matches from a position of extreme scepticism. The matches will look plausible, or they wouldn't be offered. The aim of the game is not to tick yes. Rather, it is to identify which potential matches need to be rejected. This I think may need to be quite strongly belaboured, because IMO it may be quite a distance from the default mind-set with which people may approach games. People like to say "yes", and probably feel that it is every "yes" that is helping Wikidata. But in reality, the costs of a false-positive are far worse than those of a false-negative. False-positives that get into Wikidata from a game like this can be very insidious, and can be a real pain to fix (even if perhaps in this case not quite as bad to fix as having to manually unmerge and separate out statements on wrongly-merged items, which can be an absolutely monumental pain).
So my instant reaction looking at this is to ask: have you done absolutely everything you can to screen out bad matches before they can get offered? Eg for geographical items, are there co-ordinates you can perhaps check against a bounding-box of a super-category? Anything that can be done to stop items being matched from different counties or even different countries is worth doing.
And to re-iterate my other key request: please try to instill a default standpoint of extreme scepticism when judging potential matches, in any player of the game. The request should be: "These matches look plausible. But are they really? Please help us to reject the bad ones" -- not "please help us to find the good ones". Jheald (talk) 18:37, 5 November 2018 (UTC)
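To illustrate the bounding-box screen suggested above, here is a rough sketch. It assumes each candidate item has a coordinate and each Commons super-category can be given a latitude/longitude box; the `BoundingBox` type, the `inside` helper and the numbers are all hypothetical, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class BoundingBox(NamedTuple):
    """Latitude/longitude limits in decimal degrees (assumed layout)."""
    south: float
    west: float
    north: float
    east: float


def inside(box: BoundingBox, lat: float, lon: float) -> bool:
    """Return True if the point lies within the box, edges included."""
    return box.south <= lat <= box.north and box.west <= lon <= box.east


# Illustrative values only: a rough box around Austria, a point in Vienna,
# and a point in the north-eastern United States.
austria_box = BoundingBox(south=46.3, west=9.5, north=49.1, east=17.2)
vienna_ok = inside(austria_box, 48.2, 16.4)
us_point_ok = inside(austria_box, 41.5, -73.6)
```

A candidate whose coordinates fall outside the super-category's box could then be dropped before it is ever offered to a player.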
Thanks for the feedback. I've modified the description to "Match Commons categories with Wikidata items, and add the commons sitelink to Wikidata.
These matches look plausible. But are they really? Please help us to reject the bad ones by clicking "No" - and if you are sure that it is right, add the link to Wikidata using "Match". If you are not sure, press "Skip".
Bug reports and feedback should be sent to <a href="https://commons.wikimedia.org/wiki/User_talk:Mike_Peel">Mike Peel</a>." - it looks like it might take a short time to show up, though.
So far about 70% of the matches have been rejected, which isn't quite as good as I was hoping, but shows that people (or testers at least) aren't just clicking 'yes' blindly. By their very nature, these are matches that I'm not 100% sure of - otherwise I'd get pi bot to add them automatically. They are ones that need human review, and this seems like a good way of making that easier for people to do, and to get more people doing it. The infobox should make it easier-than-usual for people to spot false matches later on (as people have been spotting the bad bot-added ones). So let's see how it goes. Thanks. Mike Peel (talk) 22:46, 5 November 2018 (UTC)
@Mike Peel: Just wanted to say that I had another play with this, having seen it in the weekly news, and really enjoyed it. Well done! It's very slick, and I really like all the further links for investigation of 50/50 cases. I hope the accuracy is good, and people get used to pressing the 'No' button. One thing that might be interesting would be to re-offer a proportion to other players, and see how often they agree (and whether that is the same across different sorts of objects, or whether some are more likely to get disagreeing matches). There may still be more that could be done towards auto-matching, eg: species with exactly matching name probably good; geo-locations from different parts of America probably bad; ships compared to rivers probably not a match. But overall, looking really really good (and quite addictive!). Great stuff! Jheald (talk) 22:26, 12 November 2018 (UTC)
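A possible way to measure the re-offering idea above, as a sketch only. It assumes the game could log, for each re-offered match, the type of object and the answers given by two different players; the record layout and the name `agreement_by_type` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (object type, first player's answer, second player's answer)
Decision = Tuple[str, str, str]


def agreement_by_type(decisions: Iterable[Decision]) -> Dict[str, float]:
    """Return, per object type, the share of re-offered matches on which
    both players gave the same answer."""
    totals: Dict[str, int] = defaultdict(int)
    agreed: Dict[str, int] = defaultdict(int)
    for object_type, first, second in decisions:
        totals[object_type] += 1
        if first == second:
            agreed[object_type] += 1
    return {t: agreed[t] / totals[t] for t in totals}


# Made-up sample data, purely for illustration.
sample = [
    ("species", "yes", "yes"),
    ("species", "no", "no"),
    ("building", "yes", "no"),
    ("building", "no", "no"),
]
rates = agreement_by_type(sample)
```

Object types with a low agreement rate would be the ones where extra screening, or a stronger warning to players, is most likely to pay off.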