User talk:Jheald

Welcome to Wikidata, Jheald!

Wikidata is a free knowledge base that you can edit! It can be read and edited by humans and machines alike and you can go to any item page now and add to this ever-growing database!

Need some help getting started? Here are some pages you can familiarize yourself with:

Introduction – An introduction to the project.
Wikidata tours – Interactive tutorials to show you how Wikidata works.
Community portal – The portal for community members.
User options – including the 'Babel' extension, to set your language preferences.
Contents – The main help page for editing and using the site.
Project chat – Discussions about the project.
Tools – A collection of user-developed tools to allow for easier completion of some tasks.

Please remember to sign your messages on talk pages by typing four tildes (~~~~); this will automatically insert your username and the date.

If you have any questions, don't hesitate to ask on Project chat. If you want to try out editing, you can use the sandbox to try. Once again, welcome, and I hope you quickly feel comfortable here, and become an active editor for Wikidata.

Best regards! Liuxinyu970226 (talk) 23:58, 10 August 2014 (UTC)

WDQ

Yes, you can filter those results by label matching some characters, but for that you need to use AutoList 2. The link is at the top of the Autolist page, where it says: "FOR EDITING WIKIDATA, please use this tool's successor, AutoList 2!"--Micru (talk) 06:42, 15 August 2014 (UTC)

Wikidata:WikiProject sum of all paintings

I see you're interested in GLAM and structured data. You might want to join this project. The project has its own goals and it will already give us a lot of experience on how to model data about works of art. Multichill (talk) 09:48, 17 August 2014 (UTC)

Wikidata:WikiProject Structured Data for Commons/Phase 1 progress/Links updates

I think it would be a good idea to set up periodic link updates (for example, once a day), or to provide an update link similar to c:User:OgreBot/Uploads_by_new_users/2014_September_06_06:00. --EugeneZelenko (talk) 14:56, 8 September 2014 (UTC)

@EugeneZelenko: Hi Eugene, thanks for the interest. I have to admit, I'm still quite a newbie when it comes to automation -- at the moment I'm trying to get my very first bot to run (on Commons), and struggling to understand why the perl module I was going to use is refusing to install -- so it looks like the moment has finally come that I'm going to have to get to know Python...

The pages were the best I could do at the moment with my current level of experience and understanding, because I haven't yet really worked out how to get myself set up on Labs to run my own queries; nor the bot framework to put it all together; nor how to trap a button being pressed and get it to run a bot. These are all things that I probably ought to try to find out quite soon, but for the moment it's beyond where I am at.

On the other hand, what should be reasonably easy with the links provided should be to get a current count to see if that's different to what's on the page; and to download a .tsv file that can then be cut-and-pasted in the edit window. Yes it's a pain, and it would be nice to have a shiny blue button that updated everything automatically, or a cron job automatically updating the pages every n days; if somebody wants to make that, I'd be very very happy to see it.

But for the moment, it did the basics of showing what's there. And if some nice person did go ahead and spend an afternoon clearing out eg all the direct file sitelinks that shouldn't be there, then I hope I have made it easy enough for them to regenerate the page. Jheald (talk) 15:33, 8 September 2014 (UTC)[reply]

Updated cygwin, so MediaWiki::Bot now installs: so I'm getting there... towards my first bot edit, anyway. Still a long way from writing automated pages though. :-) Jheald (talk) 15:45, 8 September 2014 (UTC)

I fixed some of the problems as time allowed, but I didn't have time to update the pages. --EugeneZelenko (talk) 13:57, 9 September 2014 (UTC)

@EugeneZelenko: Wow! You have been busy! Now only 4 links to Creator pages, none at all to Institution pages, and a dozen fewer than there were to File pages. Impressive!

You're right, the updating is a pain (having now done it). It would be good to have auto-updated summary statistics on the summary page. This surely can be built, but I'm not sure I can do it very soon. (Still failing to get my first ever bot edit to actually happen over on Commons -- for some reason MediaWiki::Bot can't re-write the page, so it looks like the day has finally arrived for me to learn Python!)

However one important thing that may help, is to tell people that they need to be logged in to the Quarry tool in order for the "Submit Query" button to appear. Then, particularly if you're just working on one namespace, updating oughtn't be too painful. Jheald (talk) 16:47, 9 September 2014 (UTC)[reply]

Creator template

Hi Jheald, I'm traveling these days and I don't have time to look into it, but I recommend taking a look at the Authority template in Wikipedia; maybe you'll get some ideas from there. Good luck!--Micru (talk) 12:30, 22 September 2014 (UTC)

Louis Carrogis Carmontelle (Q982053)

Hi James, I don't get these edits. Care to explain? Multichill (talk) 18:02, 21 November 2014 (UTC)[reply]

@Multichill: I was using Magnus's QuickStatements tool to add Commons Creator page (P1472) properties. But it seems I have got some of the Q-numbers wrong. It's conceivable that I wasn't careful enough to check whether a regex capture had been successful, and used an old value. I'll look at the scripts and try to identify what went wrong, and if there were any other Q-numbers that got spurious additional links. Thanks for spotting this & pulling me up about it. Jheald (talk) 18:33, 21 November 2014 (UTC)[reply]
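As an illustration of the failure mode suspected above (a regex capture that silently fails, so that a value left over from the previous record gets reused), here is a minimal Python sketch of the defensive check. The pattern, the `Wikidata` field name and the function names are assumptions made for the example; this is not the script that was actually run.

```python
import re

# Hypothetical pattern: a Q-number stored in a "Wikidata" field of a Creator template.
QID_PATTERN = re.compile(r"\|\s*Wikidata\s*=\s*(Q\d+)", re.IGNORECASE)


def extract_qid(template_text):
    """Return the Q-number found in the template text, or None if there is none."""
    match = QID_PATTERN.search(template_text)
    # Check the match object before reading the capture group, so that a
    # failed match can never fall back to a value from an earlier record.
    return match.group(1) if match else None


def build_rows(templates):
    """Return (qid, creator_name) pairs, skipping templates without a Q-number."""
    rows = []
    for creator_name, text in templates:
        qid = extract_qid(text)  # fresh value on every iteration
        if qid is None:
            continue  # skip rather than reuse the previous qid
        rows.append((qid, creator_name))
    return rows
```

Because the Q-number is taken from a fresh return value on every iteration, a template with no match is skipped instead of inheriting the previous template's number.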

@Multichill: Update: It looks like the script was working properly, but the Q-values on the creator templates were wrong. (Presumably due to a never-corrected cut and paste when they were created). There are also a few genuine duplicates, which I will turn into redirects. In all about 100 such Creator templates to go through, so I'll get on with that. All best, Jheald (talk) 19:01, 21 November 2014 (UTC)[reply]

Ok, thank you! You did see the constraint report? It's very useful for finding mistakes and duplicates. Can you revert Louis Carrogis Carmontelle (Q982053) when you're done? Multichill (talk) 19:14, 21 November 2014 (UTC)[reply]

@Multichill: Thanks, I'd forgotten that was there. So I'll be going through the list, and sorting out the dupes and incorrect Q-numbers. Jheald (talk) 19:19, 21 November 2014 (UTC)[reply]

BBC Your Paintings artist identifier

You added Art UK artist ID (P1367) to John Jones (Q454248), Francina Margaretha van Huysum (Q15511647) and James Charles (Q6131230). What was the data source? These statements were all wrong but it does not seem like an error by you but by the data source as also other users added exactly the same wrong statements. --Pasleim (talk) 10:02, 2 December 2014 (UTC)[reply]

@Pasleim, Jane023: Good catch. These look like automated edits made by synchronising from Magnus's Mix-n-match tool. Somebody has wrongly identified "Your paintings" links with these items in the tool, and it is then trying to re-add the information every time somebody uses the synchronise option. (Choose catalog, then the 'Y' link at the end of the "Your paintings" line).

This will presumably continue until the incorrect identifications are removed from the tool. The best way to do that is probably to set up correct items for these links; then update the tool by importing from Wikidata; then check for double use of IDs. I'll get on to this. I think I can see the "John Jones" one; the others may take a little more investigation. Jheald (talk) 11:46, 2 December 2014 (UTC)[reply]

@Pasleim, Jane023: Update. I think I've removed all but the John Jones identification from the mix'n'match tool. The key was to use the 'Search' link, which then had a "Remove match" function. Unfortunately, there seem to be more "John Joneses" than the search can display, so it doesn't give me the option. So for the moment, anyone who uses the update function must remember to remove the "John Jones" link manually from the Quick Statements run. Jheald (talk) 12:38, 2 December 2014 (UTC)

Yes! Thanks Pasleim and Jheald! I have been "unmatching" these whenever they appear, because otherwise they just get added again. So far it seems the error ratio is quite low, but it does worry me. One thing I noticed is that when I unmatch these mistakes and then make another sync run the same day, the mistakes get made again, so you need to check the data if you make a sync run the same day. Otherwise it's best to wait a day between sync runs. Jane023 (talk) 16:18, 2 December 2014 (UTC)[reply]

RKD

Hi James, this doesn't work. It's just a report so it's easy to find items to work on. I added these manually now. Multichill (talk) 20:12, 7 December 2014 (UTC)[reply]

@Multichill: Okay, that makes more sense now. I thought it was a bit brutal to edit by hand! My immediate current focus is more on the "Your Paintings" list, as organised by time over at en:WP; on trying to wrap up the BL map tagging project; and try to get some experimenting done with en:Content based image retrieval, ideally to have something with the BL collection to have to show for a seminar on the 17th -- so I'm a bit committed at the moment. But I'll try to fit in some of the RKD artist lookups if I can find a moment. All best, Jheald (talk) 21:59, 7 December 2014 (UTC)[reply]

GEMET Thesaurus?

See https://www.eionet.europa.eu/gemet/theme_concepts?th=13&langcode=en . What do you think? Should we add it? Multichill (talk) 11:35, 12 April 2015 (UTC)[reply]

twins

FYI: https://de.wikipedia.org/wiki/Diskussion:Johann_Zacharias_Richter --- Jura 12:01, 23 September 2015 (UTC)[reply]

@Jura1: Very interesting. Thank you! Jheald (talk) 13:34, 24 September 2015 (UTC)[reply]

tinyurl and WDQS

You don't have to rely on tinyurl: the URL you copy from WDQS already includes the query. This makes it possible to build clickable links, at the cost of a slightly less readable diff. I must admit I prefer clickable links. author TomT0m / talk page 09:05, 8 October 2015 (UTC)

Reason for deprecation

I've created reason for deprecated rank (P2241) based on your request. Mbch331 (talk) 11:46, 16 October 2015 (UTC)[reply]

Improved grouping

Hi Jheald,

This might help you for improved/simplified grouping. --- Jura 13:20, 18 October 2015 (UTC)[reply]

Thanks for commenting. Unfortunately, I don't think your alternate can work out. There are too many variations involved and what works with the English label "John and variants" doesn't necessarily lead to the same with the label for the same item in another language (e.g. ru:"Джон and variants"). --- Jura 14:44, 18 October 2015 (UTC)[reply]

@Jura1:: So create groups that combine everything that is considered as a variant in any language -- as per the searches at Wikidata:WikiProject Names/given-name variants.

Then, if it makes sense to define particular sub-groups within that overall group, that is straightforward too. Jheald (talk) 15:35, 18 October 2015 (UTC)[reply]

overlapping subgroups? --- Jura 15:40, 18 October 2015 (UTC)[reply]

@Jura1: Not a problem. An item can be a member of more than one subgroup. It is then possible to query for either subgroup and extract a list of corresponding "instances of". Jheald (talk) 15:45, 18 October 2015 (UTC)[reply]

Personally, my primary focus is not querying them. I thought the property might help you with your queries, but it seems it doesn't. I did find a way to solve the identical birth/death day question though.

I'm sure in theory your suggestion might work. It might even work in practice with a single user creating the groups. We frequently get such suggestions or comments in property proposal discussions, but one needs to bear in mind that this is Wikidata: many contributors from different backgrounds, editing in different languages. For things to work, you need to have clearly defined properties that can be referenced and checked. With names this is particularly tricky .. --- Jura 15:55, 18 October 2015 (UTC)

Just to know...

... how do you find this category so that you can make such an edit? I ask because if you find it in dewiki then it should be mentioned as a reference. --Aschroet (talk) 06:00, 7 November 2015 (UTC)[reply]

@Aschroet: I ran a search for every article-like item that had a sitelink to a Commons category, but didn't have a Commons category (P373). I did think about putting in a reference, but it seemed odd to give Wikidata itself as a reference, and I couldn't find a Q-number for 'sitelink'; and in any case the new 373 claim is only as strong as the existing cross-namespace sitelink, which is unreferenced. So it seemed reasonably appropriate to put it down as a similarly unreferenced bare claim. Jheald (talk) 08:06, 7 November 2015 (UTC)[reply]
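The selection described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout (`qid`, `commons_sitelink`, `P373` keys), the function name and the sample Q-numbers are made up for the example, and the tab-separated output is meant to resemble the row format accepted by QuickStatements rather than to reproduce the actual run.

```python
CATEGORY_PREFIX = "Category:"


def missing_p373_rows(items):
    """Build one tab-separated row per item that has a Commons category
    sitelink but no Commons category (P373) statement yet."""
    rows = []
    for item in items:
        sitelink = item.get("commons_sitelink") or ""
        if not sitelink.startswith(CATEGORY_PREFIX):
            continue  # no Commons sitelink, or it points to a gallery page
        if item.get("P373"):
            continue  # P373 already present, nothing to add
        # P373 stores the category name without the namespace prefix.
        name = sitelink[len(CATEGORY_PREFIX):]
        rows.append(f'{item["qid"]}\tP373\t"{name}"')
    return rows


# Placeholder records; the Q-numbers and category names are invented.
sample = [
    {"qid": "Q1", "commons_sitelink": "Category:Foo", "P373": None},
    {"qid": "Q2", "commons_sitelink": "Category:Bar", "P373": "Bar"},
    {"qid": "Q3", "commons_sitelink": "Baz", "P373": None},
]
rows = missing_p373_rows(sample)
```

Only the first sample record produces a row: the second already has P373, and the third links to a gallery rather than a category.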

Would it be possible to do the opposite as well, please - wherever P373 exists but there isn't a sitelink to a Commons category, add the sitelink? That would be incredibly useful for interwiki links on Commons. Thanks. Mike Peel (talk) 13:14, 7 November 2015 (UTC)[reply]

Hi @Mike Peel:.

On a purely technical level, it would be entirely possible and, in fact, dead straightforward. The only difference would be one of scale. There were about 80,000 items that had cross-namespace sitelinks but no Commons category (P373), whereas there are about 800,000 items that have a P373 but no sitelink. Magnus's Quick Statements tool is throttled to about 4500 edits an hour, ie about 100,000 edits in 24 hours going full tilt. (I'm currently making the edits in batches of 4,000 or 20,000 at a time). So whereas this job is going to take about a day to complete, the opposite would take about 10 times as long.
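The arithmetic behind those estimates can be checked with a few lines of Python. The throttle figure and item counts are the ones quoted above; the function names are just for illustration.

```python
EDITS_PER_HOUR = 4500  # approximate throttle quoted above


def hours_needed(n_edits, edits_per_hour=EDITS_PER_HOUR):
    """Hours required to make n_edits at a constant throttled rate."""
    return n_edits / edits_per_hour


def batches(items, batch_size):
    """Split a list of pending edits into consecutive batches."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


daily_capacity = EDITS_PER_HOUR * 24           # 108,000 edits per day
p373_job_hours = hours_needed(80_000)          # adding P373 statements
sitelink_job_days = hours_needed(800_000) / 24  # adding sitelinks instead
```

At that rate the 80,000-edit job comes to a little under 18 hours, and the 800,000-edit job to roughly 7.4 days of continuous running, which matches the "about a day" versus "about 10 times as long" estimate.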

But that's not the real issue. The real issue here is political, not technical. Adding P373 statements is (or should generally be) completely uncontroversial -- it is exactly what the property was made for. On the other hand there is a definite controversy about sitelinks that go "cross-namespace", ie from an article-like item here to a category on Commons.

It is a controversy that may be edging towards resolution purely through the development of facts on the ground. A year ago there were 100,000 such cross-namespace links. I ran the same search a couple of months ago and found there were now 200,000. So it does look like we're moving towards a de facto acceptance on the ground. I posted these numbers at the time, both to the mailing list and to Project Chat, to ask whether people were okay with this, because if one wanted to take a definitive view on it, the time to do so would be now. But it seemed the response was just a resounding "Meh".

There definitely was a constituency here for a distinct Category <--> Category, Article <--> Gallery sitelink division. For one thing it means you know what kind of Commons page you're going to end up on, so there's predictability, whether for people or for bots or for tools; and for another thing, it means that if you allow no links from article items to categories, you can never get trapped wanting to add a link to a category but being caught out because there is already a link to a gallery blocking its path.

As I have said, I am not sure to what extent there is or is not still a constituency prepared to take action to enforce such demarcations. But at the same time given the current greyness of the issue, I am not sure that I would want to be one to steam into such muddy ground tooling up to make 800,000 edits. Jheald (talk) 15:17, 7 November 2015 (UTC)[reply]

Questionable use of withdrawn identifier value (Q21441764)

Hi, I don't think your practice WRT BBC Your Paintings "Identifiers" is of much help:

1. Once the redirects are retraced in Mix'n'Match (which I just did) these false duplicates clobber both the duplicate list on Mix'n'Match and the Value Constraint Report for P1367. Removing non-actionable identifiers from Wikidata and making certain that they won't reappear (by setting them to N/A in Mix'n'Match) seems to me a much more appropriate way of handling this.
2. Though impressive and somehow under curation (if not we wouldn't have the problem of "vanished" URLs at all) it's just a website which allows incoming links: Obviously they are performing clean-ups but don't even care to implement redirects. I don't think Wikidata's task should be to document changes on that website.
3. Those unusable identifiers stem from Mix'n'Match. Unfortunately the underlying dataset is not documented. Magnus may have harvested the website at some (single?) point in time and/or may have had access to data files provided to him: So using P2241 actually means documenting the difference between this unclear dataset and reality? Not worth pursuing I think.
4. Magnus may re-import the website and equip Mix'n'Match with the set of then current identifiers, i.e. those not valid any more will cease to exist in the Mix'n'Match database but will survive perpetually on Wikidata? So (see point 2 above) Wikidata would provide some persistence for BBCYP "identifiers" the original provider obviously doesn't care about (at the moment). I'm not sure about the fundamental implications of persistence as added value by third parties but in the YP case it would be a crude approximation anyway: We have some peripheral (i.e. current M'n'M database) evidence that a certain identifier has existed (been actionable somewhere in the past) and some (soon to be removed) statement in M'n'M that this identifier was related to a certain Q-item. Transferring that to Wikidata as a P2241-qualified statement leaves us with something completely unverifiable...

Actually, there are cases where this new property together with withdrawn identifier value (Q21441764) (or some variants of it) makes sense: I remember the concept of a "cancelled ISSN": There is a regulation that ISSNs used against policy by periodicals (e.g. not getting floated after a zero issue (Q1514286) or assigning an ISSN to something other than a serial) won't get recycled but remain in the database, tagged as "cancelled". A related case is "wrong ISSNs": If the ISSN printed on the journal does not exist (has a checksum error, i.e. isn't any ISSN at all), is not assigned to any periodical (is formally valid but not existing at that time), or even officially assigned to something else (so the ISSN exists) then it's worth recording because queries may be performed based on face value.
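For the "checksum error" case, the formal validity of a printed ISSN can be tested mechanically. The following Python sketch implements the standard ISSN check-digit rule (weights 8 down to 2 on the first seven digits, modulo 11, with X standing for 10); the function names are illustrative only, and the check says nothing about whether a formally valid ISSN was ever actually assigned.

```python
import re


def issn_check_digit(first_seven):
    """Compute the ISSN check character for the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 over the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def is_valid_issn(issn):
    """Return True if the string is a formally valid ISSN (NNNN-NNNC)."""
    m = re.fullmatch(r"(\d{4})-?(\d{3})([\dXx])", issn.strip())
    if not m:
        return False  # wrong length or illegal characters
    expected = issn_check_digit(m.group(1) + m.group(2))
    return expected == m.group(3).upper()
```

A value that fails this test would fall under "identifier formally invalid (but was used anyway)"; one that passes but is unassigned cannot be detected this way and needs a lookup against the ISSN register.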

Thus a (non exhaustive) list of withdrawal reasons might be:

identifier formally invalid (but was used anyway)
identifier did never exist (but was used anyway as such)
identifier is not actionable any more (announces error)
identifier is actionable but announces deprecation (acknowledges that it has existed)
identifier exists but the corresponding object is deprecated (think of ODNB biographical articles where later research came to the conclusion that the person described is identical to another person or actually two different persons)
...

A similar thing on a much larger scale can currently be seen for RKDartists ID (P650):

The initial import accidentally contained thousands of "See" references (they have an identifier of their own, but no link to the object they are referring to)
The initial dataset contained tens of thousands of entries "in bewerking" (under construction). Thousands of them have enough accompanying data to be spotted as quite obvious duplicates of other entries (and thousands of them do not have enough data at the moment to make matches possible - probably they should have been left out from Mix'n'Match at first hand, but identifier-wise these items definitely exist).
There seems to be a massive weeding effort ongoing: Especially for artists from Belgium, writers from Germany and those from the lower parts of the alphabet the links aren't operational any more
I noticed that because an IP from the Dutch National Library started removing bunches of RKD identifiers from items in the Constraint report: So actually there is some feedback loop, Wikidata reports are used by (institutions related to) the original providers to perform or at least prioritise data sanitation.
Again: RKD itself does not care about the fate of identifiers for items which weren't proper items at all or were designating items they have removed for whatever reason (their initial data collection appears to have been extremely broad in scope and even clearly identifiable persons might be well beyond the topical restrictions of RKD).

So many RKD identifiers we currently know about may just be "leaked": They will be withdrawn as provisional or as not relevant (and in many cases as duplicates) and the question would be whether we really should document the RKD identifier for persons RKD does not want to deal with at all? -- Gymel (talk) 15:28, 14 November 2015 (UTC)

@Gymel: We may not have been the only people to have harvested the BBC Your Paintings identifiers (or any other set of identifiers). It seems to me that it is useful to record retired identifiers (a discussion that's been had both on the mailing list, and at the Sum of All Paintings project recently), not least because people may match their copy of the old identifiers to our copy of the old identifiers.

As for these messing up the Constraint reports, or MnM single values, then that is simply a bug in the Constraint reports and in MnM that needs to be fixed -- deprecated values should not be considered for the single instance.

Another usefulness is that we now have a SPARQL-searchable list of the retired identifiers -- so for example, we can now generate a report of all retired identifiers for which there are not new identifiers, and ask the PCF "what happened to these?" -- in a couple of cases (of names that look genuine, and don't seem to have a new id) may be a system refresh glitch at their end.

I think I have now marked all the retired identifiers that we have items for (and merged any where we also have items for current identifiers). I think that they are worth keeping.

As for values for reason for deprecated rank (P2241), you are very welcome to create further value items to document such cases in more detail as you wish. Jheald (talk) 15:44, 14 November 2015 (UTC)[reply]

@Gymel:. To add to the above, I think the "single value" constraint report does ignore deprecated values. The 93 multiple values currently reported is similar to the number from several weeks ago (it was actually 97 then) -- as far as I can see, it represents genuine unmerged duplicates on the PCF site, and doesn't seem to have gone up. Jheald (talk) 15:48, 14 November 2015 (UTC)[reply]

Here's the start of the thread on wikidata-l : [1]

The discussion also continued into the next month : [2] Jheald (talk) 15:57, 14 November 2015 (UTC)[reply]

Interesting, I will pursue that: For VIAF ID (P214) we are sometimes marking identifiers as deprecated if the VIAF cluster exists but is not usable since it conflates different persons and an alternative cluster for that person also exists. VIAF may act on that findings and it would be good to know if the constraint report is not complete once one wants to reaccess these cases. -- Gymel (talk) 15:58, 14 November 2015 (UTC)[reply]

OK, I'm not impressed by the discussion on the mailing list. As I said before I can see use cases for keeping deprecated identifiers, but one has to differentiate:

VIAF was given as an example several times: Every month they automatically cluster and recluster their constituent entries, currently they record >7M redirects (targeting about 28M entries) and provide resolution services. They also provide a change history for any single cluster. Given that Wikidata also has a version history for items, actively recording obsolete identifiers here seems overkill.
Use cases of outdated information are construed and Wikidata should somehow step in so that the providers of the original data can be asked "what happened" (but not be bothered at the same time): Well, those utilizing the outdated identifiers could ask directly (increasing the pressure on the providers to operate more carefully). Wikidata could only serve as a place for acknowledging that an identifier indeed did exist and does not exist any more. However for these Wikidata cannot be as exhaustive as for valid ones.
Admittedly many data providers should invest more into persistence of identifiers, e.g. by at least "supporting" them by redirects. But those who do that usually have an interest in re-users eventually migrating to up-to-date values. Wikidata IMHO should not thwart that by establishing a one-stop solution for the absolutely lazy.
My RKD example above shows that "support" will have limits: Some things will simply go because they shouldn't ever have been assigned an identifier (from the provider's point of view). That's the downside of presenting provisional entries to the public which IMHO generally is a good thing
Some sites like BBC YP are way too sloppy with their handling of what we perceive as identifiers. But are we really in a position to remedy that? You stated that you have recorded the 50 or so obsolete identifiers you deemed important here. But the actual number might be much higher and - as said above - what we can record is only the arbitrary difference between the unknown point in time some data was harvested and today.
Last, not least: Unactionable identifiers of that kind cannot be verified (except perhaps in the Your Paintings case by a link to the internet archive). Common opinion here is that identifiers don't have to be sourced, because they can be immediately verified again at any given time. That's obviously not the case here! -- Gymel (talk) 16:42, 14 November 2015 (UTC)

@Gymel: I have recorded all the obsolete identifiers I knew about, that I have so far been able to identify items for, based on the pages in this series, the identifier columns in which are based on Magnus's (or Jane's) original scrape in 2012.

You are correct, that these may no longer be verifiable and can no longer be confirmed. Mistakes may have crept in. But so what? They are dead links and marked as such. If a copying error has crept in, the worst case scenario is that then somebody may not be able to match their old reference link to our old reference link. That doesn't take away from the positive side, that in as many cases as possible, it will be possible for somebody to match their old dead reference to our old dead reference, and mostly we should also be able to give them a live new reference. Jheald (talk) 16:56, 14 November 2015 (UTC)[reply]

in support of User:Snipre and issue of (uncontrolled) bot imports from wikipedias

Would you be happy if someone not involved changed your topic? --Succu (talk) 20:10, 19 November 2015 (UTC)

@Succu: It's a project page. It involves everybody; and should have an appropriate neutral header. Jheald (talk) 20:17, 19 November 2015 (UTC)[reply]

Really? Any hint where I can find this rule? --Succu (talk) 20:20, 19 November 2015 (UTC)[reply]

@Succu: It's common sense, and happens all the time. Nobody 'owns' the header of a section of a public page. It should be whatever best, most neutrally and most succinctly tells the reader what follows, and encourages participation from all points of view. I would revert again, if I hadn't hit the 3 edit limit, because the present header is simply not appropriate, and would also be far clearer if shortened. Jheald (talk) 20:25, 19 November 2015 (UTC)[reply]

@Succu: But if you want a reference, here's the en-wiki guidance from en:Wikipedia:Talk_page_guidelines#Editing_comments,

Section headings: Because threads are shared by multiple editors (regardless how many have posted so far), no one, including the original poster, "owns" a talk page discussion or its heading. It is generally acceptable to change headings when a better header is appropriate, e.g., one more descriptive of the content of the discussion or the issue discussed, less one-sided, more appropriate for accessibility reasons, etc.

Wikidata may not have yet the same depth of conduct guidance, but the broad principle still makes sense. Jheald (talk) 20:30, 19 November 2015 (UTC)[reply]

The unreflected export of „rules“ of your home community is not very helpful. At dewiki we normally do not change the heading of a discussion (Kmhkmh). Especially if we are not involved in the discussion, Multichill. --Succu (talk) 20:51, 19 November 2015 (UTC)

Preferred rank

Hi,

I'm afraid I have no idea how bots work, SPARQL and so on. I made these changes because the template Spanish Wikipedia uses for national sub-entities has changed and shows all instances as a subtitle. You can check out this problem on the Frankfurt article. Users who made those changes in the template are Agabi10 and Metrónomo. They suggested selecting «preferred rank» so that it only shows those values, and it seems to be solving the problems we now have in nearly every city article. I understand this has caused some problems with bots on Wikidata but, as I said, unfortunately I have no idea about bot operation or how to edit templates. Can you speak Spanish? If so, it would be useful if you read this talk page and further discuss the issue with them. Anyway, I'm going to tell them and hope you can work out this problem together.

Meanwhile, I will stop my edits until a solution is found. Greenny (talk) 15:25, 20 November 2015 (UTC)

Done, you can check the talk here. As I deduce from your userpage that you can't speak Spanish, I've encouraged them to write in English from now on. Greenny (talk) 15:34, 20 November 2015 (UTC)[reply]

Hir and Bron

Actually I read "Hir" in an English Star Trek novel some time ago. Captain Riker and the starship Titan visited a planet of alien invertebrate (Q43806) with only one sex. Instead of "Him" or "Her" they said "Hir".

And Saga says Hen in Swedish, something her Danish colleague dislikes. The Swedish word is considered a gender-neutral version of "Hon" (She) and "Han" (He). The word is widely used in the media and is today included in dictionaries. Personally, I think that word still isn't neutral enough, since it is promoted by political groups. I guess the word is imported from Finnish, which does not have genders in the same way as our German-derived languages have.

I followed Bron/The Bridge last season, but stopped watching this season, since I thought there was too much violence. My post-traumatic stress disorder (Q202387) became worse... -- Innocent bystander (talk) 16:10, 25 November 2015 (UTC)

@Innocent bystander: Sorry to hear that. In the pre-series publicity, I thought I had read the lead writer saying they thought they should dial down the body count this series, since they thought it had become a bit excessive the last couple of times. I'll just have to see how it goes -- they do like to pull surprises! Jheald (talk) 16:30, 25 November 2015 (UTC)[reply]

I think there is one episode left here (this Sunday) and my wife still follows it. I do not know if the number of bodies has increased or decreased, and maybe we are not shown so much violence within the TV frame. But the description of such things as missing body parts and how they have been removed is a more efficient way to give me new nightmares than many other ways of describing violence. That is the good thing about Star Trek novels. The close combats are few. -- Innocent bystander (talk) 17:26, 25 November 2015 (UTC)

Second Severn Crossing

Hi, I'm confused by this edit to Second Severn Crossing (Q1287969). Surely the bridge is in all of England, Wales, Monmouthshire and South Gloucestershire. However if only the lowest level should be included then why retain Wales? Thryduulf (talk: local | en.wp | en.wikt) 14:55, 28 November 2015 (UTC)[reply]

@Thryduulf: I was running an automated process to remove all located in the administrative territorial entity (P131) = England (Q21) when there was also an English county given. The same could be done for Wales, but one step at a time... (though I have now removed Wales in this case).

The Severn Bridge may be a special case, as it joins two different nations. So perhaps, in this case, England & Wales might be justified. But for most places, if we already have that the country = the UK, and the county, then England as well seemed just a distraction. Usually located in the administrative territorial entity (P131) = England is a sign that further refinement is needed. Jheald (talk) 15:04, 28 November 2015 (UTC)[reply]
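A minimal sketch of the clean-up rule described above, assuming the P131 values for an item are already held in memory as a set of item IDs. The function name and the `english_counties` lookup are hypothetical illustrations, not the actual process that was run.

```python
# Hypothetical sketch: drop P131 = England (Q21) only when an English
# county is also given, as described in the reply above.
ENGLAND = "Q21"


def prune_redundant_england(p131_values, english_counties):
    """Return the P131 values with England removed if a county is also present."""
    values = set(p131_values)
    # Only prune when a more specific English county is already recorded.
    if ENGLAND in values and values & set(english_counties):
        values.discard(ENGLAND)
    return values
```

Items that have only England as their P131 value are left unchanged, since for those the value still signals that further refinement is needed.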

Thanks for the explanation, it makes sense now. Thryduulf (talk: local | en.wp | en.wikt) 15:09, 28 November 2015 (UTC)[reply]

Wikidata:Database reports/Wikipedia versions

Dear Jheald; I have seen you contributing a lot to pages linked to https://www.wikidata.org/?curid=24028442# (as of today titled Wikipedia versions but intended in general for WMF projects). I would be happy if you could review the properties of these pages, create the missing Wikibook and Wikiversity project pages, and comment on user:I18n/sandbox (where you may find many useful queries) with new / additional ideas. Best regards Gangleri also aka I18n (talk) 19:54, 9 January 2016 (UTC)[reply]

Hi! I want to let you know that the number of Wikidata:Database reports/WMF projects has increased to more than 385. You may be interested in adding labels and descriptions in other languages, following the discussion at property talk:P1800 and commenting there. Best regards Gangleri also aka I18n (talk) 02:59, 12 January 2016 (UTC)[reply]

BBC Your Paintings

...is called Art UK as of today. The properties need to be adjusted. --Jane023 (talk) 09:33, 24 February 2016 (UTC)[reply]

I informed Magnus and he is converting them now - 36k links!! --Jane023 (talk) 10:26, 24 February 2016 (UTC)[reply]

@Jane023: So: extraordinarily ugly new site, extraordinarily ugly new name, and they changed rafts of identifiers. Are these guys a complete bunch of muppets?

(And I see they don't even own their own twitter handle, so have to use this instead!)

My watchlist is lighting up with lots of old identifiers that Magnus is removing. Do you know if he will be replacing them with new ones?

And is there an old-to-new conversion list, so I can update the pages at en:Wikipedia:GLAM/Your paintings/header ?

Thanks for the heads-up, Jheald (talk) 14:46, 24 February 2016 (UTC)[reply]

Ask Magnus for a copy of his list? He already finished the conversion and will start updating 16k new links. I was very annoyed as well (I was informed by news letter yesterday). --Jane023 (talk) 15:28, 24 February 2016 (UTC)[reply]

links to random items

Hi Jheald,

From time to time I come upon items on categories where you have put in links to random, unrelated items (for example here), apparently because these have a name with the same spelling. Items on categories are not disambiguation pages, but are there to gather and connect sitelinks to categories on the same topic. - Brya (talk) 05:33, 18 May 2016 (UTC)[reply]

IIIF-tool for the property relative position within image (P2677)

Hello James! Thanks a lot for the property relative position within image (P2677). I'm not using it yet, but it could be a great improvement for visual artworks. One issue is that we need a tool to help us provide the data. So I made a little fork of Liz Fischer's IIIF tool, created for image annotation using the IIIF standard: Cropper. It's just a draft (I'm not a developer) of what we could have. Maybe that could be interesting for you. Best regards --Shonagon (talk) 01:35, 22 May 2016 (UTC)[reply]

Example of use: Virgin among the Virgins (Q21013224) --Shonagon (talk) 02:14, 22 May 2016 (UTC)[reply]

Hello Jheald. An additional development to display the image fragments of an artwork has been done. It's multilingual, so it's possible to display labels and links to Wikipedia in different languages. Surely a more robust tool could be made, but we now have a first interface to edit and display image artwork annotations, which is essential for using relative position within image (P2677). Best regards --Shonagon (talk) 07:26, 28 June 2016 (UTC)[reply]

Dorset description

Hey Jheald,

Just wanted to let you know I partially reverted this change. The text about "Q21694711" was showing up in search results on Google, Wikipedia.org, top of the article in the Wikipedia app on Android and iOS, and other places that utilise Wikidata descriptions. Thanks! --Krinkle (talk) 03:06, 20 July 2016 (UTC)[reply]

Best way to get sitelinks for lots of items at once

Hi! Regarding Special:Permalink/243943252#Best way to get sitelinks for lots of items at once ?: if you're interested in a probably much better way, there is one. Use SPARQL (Query). You can get the data in JSON format by putting that query in place of {} in this link: https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={}&format=json. You can of course include other needed columns there. --Edgars2007 (talk) 10:41, 4 September 2016 (UTC)[reply]
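A minimal sketch of how the link above could be filled in programmatically. The example query and the two item IDs are assumptions chosen for illustration (they are not the query linked in the comment); `build_query_url` is a hypothetical helper name. The query text has to be URL-encoded before it replaces the {} placeholder.

```python
from urllib.parse import urlencode

# Endpoint taken from the link quoted in the comment above.
ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# Illustrative query (an assumption): English Wikipedia sitelinks
# for a fixed list of items.
EXAMPLE_QUERY = """
SELECT ?item ?article WHERE {
  VALUES ?item { wd:Q1287969 wd:Q546198 }
  ?article schema:about ?item ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
"""


def build_query_url(sparql):
    """Return the endpoint URL with the query encoded and JSON output requested."""
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


url = build_query_url(EXAMPLE_QUERY)
```

The resulting URL can be fetched with any HTTP client; further columns can be added to the SELECT clause as needed.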

Unused property

This is a kind reminder that the following property was created more than six months ago: metasubclass of (P2445). As of today, this property is used on less than five items. As the proposer of this property you probably want to change the unfortunate situation by adding a few statements to items. --Pasleim (talk) 19:15, 17 January 2017 (UTC)[reply]

Art UK links

Hi James, you mentioned ART UK on Commons. One thing I realized with Art UK artist ID (P1367) and Art UK artwork ID (P1679) is that their links are rather unstable. When the name of the artwork changes, so does the URL, breaking our links. That's a shame, because for artworks they do seem to have a unique ID. See for example http://artuk.org/discover/artworks/bacchus-and-ariadne-114356, where the ID is 114356 (you can find it in the HTML source too). Wouldn't it be nice to be able to just record that integer here instead of "bacchus-and-ariadne-114356"? Do you happen to have any contacts at Art UK you can use? I can easily import several thousand Art UK artwork ID (P1679) links, but I'm a bit reluctant to do that now with the unstable links. Multichill (talk) 11:08, 26 January 2017 (UTC)[reply]
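A minimal sketch of pulling the trailing integer out of an identifier of the form shown above. It assumes the numeric part always follows the last hyphen, which is inferred only from the single example in the comment; `artwork_numeric_id` is a hypothetical helper name.

```python
import re

# Assumption: the stable numeric ID is the run of digits after the final hyphen.
_TRAILING_ID = re.compile(r"-(\d+)$")


def artwork_numeric_id(slug_or_url):
    """Return the trailing integer of an Art UK artwork slug or URL, or None."""
    match = _TRAILING_ID.search(slug_or_url.rstrip("/"))
    return int(match.group(1)) if match else None
```

Slugs without a trailing number return None rather than raising, so they can be listed for manual checking.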

Hi @Multichill:

I've just this morning had an email back from User:Charles Matthews. He and wmuk:User:Richard Nevell (WMUK) from WMUK met with some Art UK people last month.

...

I think the painter identifiers we have now are broadly correct -- I will do a verification run to confirm later today, or in the next couple of days.

As for painting identifiers, I was thinking about making a trial run on some of the collections we currently have the best accession number coverage for -- eg National Gallery, National Portrait Gallery, Tate -- but I am very happy to coordinate with you.

As to identifier stability, the important thing of course is to be able to serve people URLs that work. With luck, the big identifier change was when they moved to their new site. Beyond that, until they publish a regular list of recent identifier changes, all I think we can do is run regular verification checks, and use the "Accessed" qualifier to make a note of the date on which the identifier was valid. It would be nice if they had a more stable scheme; maybe that will come, and we do need to keep knocking on their door, I think. But it seems we first need to prove ourselves more. Jheald (talk) 13:57, 26 January 2017 (UTC)[reply]

@Multichill: I can explain more about the meeting, but in a mail. Charles Matthews (talk) 14:51, 26 January 2017 (UTC)[reply]

Trial run sounds like a plan. I'll write some import code; I already have most of it, so it should be done soon. I'll share the link to GitHub here; it will be in Python.

@Charles Matthews: please do :-) Multichill (talk) 15:40, 26 January 2017 (UTC)[reply]

@Multichill: I was just going to add Art UK painting identifiers for paintings where we already had accession numbers, and then just add them using QuickStatements. But it would be easy enough to pass you what doesn't match. Jheald (talk) 15:47, 26 January 2017 (UTC)[reply]

Ok, bot and example edit. It's running now. Multichill (talk) 16:41, 26 January 2017 (UTC)[reply]

Thanks. Jheald (talk) 16:47, 26 January 2017 (UTC)[reply]

I'm importing quite a few new links. I updated the constraints on Property talk:P1679 to catch more useful stuff. Might need a bit more tweaking. Multichill (talk) 19:36, 26 January 2017 (UTC)[reply]

@Multichill: Grrr... Just done the validation scrape. Over 250 no-longer-working identifiers to investigate. (BTW I saw you're asking Magnus for a full rescrape for Mix'n'Match -- I suppose that will adapt to identifiers that have been updated here in the meantime.) Jheald (talk) 22:09, 31 January 2017 (UTC)[reply]

Quantity on ART UK links

Hi James, this seems wrong. Quantity on an identifier of 4? You're trying to say art uk has 4 works, but this is not the way to do it. Also doing such a large controversial import without discussion is not the best way to go or did I miss the discussion somewhere? Multichill (talk) 16:33, 6 February 2017 (UTC)[reply]

You are running a bot job, someone objects. You should pause and discuss it. Multichill (talk) 17:25, 6 February 2017 (UTC)[reply]

@Multichill: Stopped. (Sorry I didn't see your message sooner).

So, where and how to identify the number of works Art UK has in its catalogue under this identifier?

Because the same artist may sometimes have more than one identifier at Art UK, and this information relates specifically to the identifier rather than the artist, it seems to me the appropriate place is as a qualifier on the identifier.

So then, which property to use? quantity (P1114) seems the most generic, for a "quantity, total number, number of instances, number, amount, total" as its list of (English-language) equivalent names gives for it.

In particular, this is the "number of instances" for the identifier in the Art UK database -- so if P1114 is intended for use including "number of instances", this seems entirely appropriate.

But if there is an alternative that you would suggest, that you think would be more appropriate, then I am very open to discussion.

I would like to get on with things though, because Art UK have been complaining they haven't been getting enough hits from us; so I'd like to be revising and rolling out a template on en-wiki including this information as soon as I can get it done. Jheald (talk) 17:46, 6 February 2017 (UTC)[reply]

Got distracted by other things. I did this change to make it a bit clearer, but still doesn't feel right. At first I thought you meant the person had 4 Art UK artist ID (P1367) links. I had to check the link to realize you meant that on the linked page it had 4 paintings.

I'm not sure you should even document it this way. At some point in the future we'll have all Art UK works and you can just do a query to get this information. Multichill (talk) 20:17, 6 February 2017 (UTC)[reply]

@Multichill: Even if we did, they wouldn't necessarily have it, so the information would still be germane in documenting their database. Besides, I want to use this information in a WP template this week, not at some far distant point in the misty future.

I'm not sure your edit helps, because the "of" is placed as a qualifier on the identifier, not on the number of works. At some point in the future, when the data is next updated and re-written, the ordering could get changed; or other qualifiers might get added and upset the order, eg one for "preferred form of name" (in this database, associated with this identifier). It doesn't feel safe to me to rely that WD is always going to show the same qualifiers always in the same particular order. Jheald (talk) 20:28, 6 February 2017 (UTC)[reply]

You're putting time pressure on this. My experience in (wiki) projects is that this hurts the quality. I would appreciate it if you could discuss this in a broader venue before adding more. Multichill (talk) 20:40, 6 February 2017 (UTC)[reply]

@Multichill: Okay. Where do you suggest? Jheald (talk) 20:43, 6 February 2017 (UTC)[reply]

What about Property talk:P1367 and a link at Wikidata:Project chat to get some people to comment on it? Multichill (talk) 20:46, 6 February 2017 (UTC)[reply]

Mind to stop your silly additions? --Succu (talk) 22:15, 9 February 2017 (UTC)[reply]

@Succu: Task is now 95% complete, so I am going to finish it. It makes no sense to leave the last 5% not done. Jheald (talk) 22:21, 9 February 2017 (UTC)[reply]

Cool, then we have to remove 100% of query results at a certain point of time. --Succu (talk) 22:27, 9 February 2017 (UTC)[reply]

@Succu: I'm sorry, what are you talking about? Jheald (talk) 22:32, 9 February 2017 (UTC)[reply]

But I am curious as to why you think the addition is "silly" ? Jheald (talk) 22:23, 9 February 2017 (UTC)[reply]

Are you prepared to update this fixed number when the count at Art UK (Q7257339) is updated? We have queries for this. --Succu (talk) 22:35, 9 February 2017 (UTC)[reply]

@Succu: And how exactly do you propose querying something which is not stored on Wikidata? Jheald (talk) 22:37, 9 February 2017 (UTC)[reply]

OK, vice versa. What do you want to express with this addition? --Succu (talk) 22:48, 9 February 2017 (UTC)[reply]

@Succu: It expresses that Art UK (a catalogue of UK public collections) has 16 paintings by Esther Tyson (Q21458718), compared to eg only 1 by Hendrick van Zuylen (Q28431499) Jheald (talk) 22:58, 9 February 2017 (UTC)[reply]

... which means I can now write queries eg like this, for the total number of works at Art UK by painters that we have items for: tinyurl.com/zgjvucp. Jheald (talk) 00:53, 10 February 2017 (UTC)[reply]

+1. Jheald, how do you plan to update those numbers every time when any item is added to the catalogue? Or is there plan to have those numbers obsolete forever? --Infovarius (talk) 16:12, 2 March 2017 (UTC)[reply]

@Infovarius: The Art UK external IDs are only mildly stable -- they change if Art UK revise the name for an artist, or modify an artist's dates, or e.g. add a date of death. I asked them whether they could publish a regular record of ID changes, but apparently they can't -- apparently they don't hold the data centrally. It's only quite a small proportion that get changed; but it does mean that at regular intervals we will need to re-check the ID links to make sure they still work; we can check the quantity data at the same time.

The quantities probably won't change much -- Art UK was set up to be a survey of oil-on-canvas works in publicly-owned collections, and that survey is now complete. But they may change a little: Art UK may in future add some sculpture, and a limited number of works on paper.

So it's possible that the numbers may go out of date. But there is a retrieved (P813) date in the referencing for each statement, so it should always be possible to tell how recently the data was checked. Jheald (talk) 16:29, 2 March 2017 (UTC)[reply]
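A minimal sketch of how the retrieved (P813) date could drive the periodic re-checks mentioned above. The 180-day threshold and the `needs_recheck` name are assumptions for illustration; no re-check interval is stated in the discussion.

```python
from datetime import date


def needs_recheck(retrieved, today, max_age_days=180):
    """Return True when the retrieved date is older than the allowed age.

    max_age_days is an assumed threshold, not one given in the discussion.
    """
    return (today - retrieved).days > max_age_days
```

Statements flagged this way would have both the identifier link and the quantity qualifier verified in the same pass.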

GSS

I see that you've been removing GSS codes from a number of items, e.g. [3]. What is the reason for this? This property is currently used by w:zh-yue:Template:Infobox English county. Deryck Chan (talk) 14:15, 16 March 2017 (UTC)[reply]

Hi @Deryck Chan: There were a number of GSS codes that were on the wrong items, eg Essex (Q23240) -- they were on items for the ceremonial counties, when (as is clear eg from how the map if you follow the GSS links excludes eg Southend-on-Sea and Thurrock), they ought to be on the items for the County Council areas, eg Essex (Q21272241).

This also applies to most of the other identifiers on the ceremonial counties, eg FIPS 10-4 (countries and regions) (P901), OpenStreetMap relation ID (P402), NUTS code (P605) etc, which should also be moved across in the near future.

Compare en:Essex, where the facts that apply to the non-metropolitan county are shown in a different part of the infobox to those that apply to the ceremonial county.

en-wiki combines the two; but to make co-referencing and properties like located in the administrative territorial entity (P131) work properly, we have two different items for the two concepts.

Hope this makes some sense now. All best, Jheald (talk) 15:18, 16 March 2017 (UTC)[reply]

Golden Hind

Can we continue geeking out about the Golden Hind here? I think I'm getting pretty far down into the weeds for the Project Chat page. :-)

I'm going to keep digging for an end date for the original. And for the actual citation in Stow - having an oddly difficult time finding it. - PKM (talk) 00:42, 24 March 2017 (UTC)[reply]

"The original Golden Hinde remained in Deptford for about 100 years, until it started to disintegrate and had to be broken up." it says here. - PKM (talk) 00:47, 24 March 2017 (UTC)[reply]

And here's a citation for inception date, built place, commissioned by, the wharf where it was displayed, and even "ship museum" if you want to use it! http://goldenhind.co.uk/pages/education/the-original-golden-hind/88 - PKM (talk) 00:53, 24 March 2017 (UTC)[reply]

And bingo! "AD 1668. John Davies, of Camberwell, the storekeeper of Deptford dockyard, caused a chair to be made out of the remains of the ship, 'The Golden Hind' ..." here. - PKM (talk) 01:01, 24 March 2017 (UTC)[reply]

@PKM: Superb! Hope you're adding this to en-wiki as well. 1:30 am here, so I'm turning in; but really pleased you're on the case! Jheald (talk) 01:32, 24 March 2017 (UTC)[reply]

Will do, soon. - PKM (talk) 05:34, 24 March 2017 (UTC)[reply]

EN Wiki updated and I found a source for the date of renaming the Pelican to Golden Hind <does happy dance>. Lots of updates made at Golden Hind (Q546198) since I had all the references open anyway. - PKM (talk) 19:47, 24 March 2017 (UTC)[reply]

@PKM: That's looking really good now. Thank you so much. Jheald (talk) 20:46, 24 March 2017 (UTC)[reply]

CPs

First, thanks for all the work you're doing adding statements for parishes, but I just wondered what to do with some where there are 2 items but the main one has statements for both, for example Q2055282 (settlement and parish) and Q24674398 (parish only). While I do think we should probably have separate items for districts even if they have similar boundaries (like Exeter), I'd suggest that it is unnecessary for parishes (except for cases like Q1002828 and Q21347409 where the parish doesn't include the settlement). Although I think cases like Q637298 and Q24662858 seem OK as it is a town and the ONS population is much smaller than the parish's. The reason why some parishes have 2 items is because of Lsjbot, which sometimes created pages for the settlement as well as the parish; maybe they should be marked with Property:P460 or Q17362920, although I think items are only true duplicates if they are unquestionably on the same topic, not just where a distinction has been made. Why don't you also do the same thing with JhealdBatch for wards as well, as cases like Bristol Q21693433 don't have any parishes? I did create items for wards but most areas don't have any (although Bristol does). Lucywood (talk) 20:06, 31 March 2017 (UTC)[reply]

Hi @Lucywood: Sorry not to get back to you sooner. My wife and I were having a long weekend away from the Internet. (Overdue and very much needed!)

With regard to the CPs, I do hope we're getting there. Some key queries I have been watching:

tinyurl.com/n43ysz2 - Latest count of number of distinct, non-deprecated GSS codes for civil parishes. Latest value: 10123 ; should be: 10449 => still to find: 326
tinyurl.com/mnoklwy - Items marked as current CPs, that do not have GSS codes. (Currently: 287). I do find this quite a brutally slow list to work through.

Some were CPs that need to have an end time (P582) qualifier added to their P31. Some are in fact civil parish group (Q29043077)s, though editors on en-wiki may not be aware of the fact. Some are completely other things altogether (eg public baths, etc). Some do match entries in the GSS list, but the formal name that GSS has (and usually Commons too, following the GSS) may be slightly different, sometimes opening up questions of what to link to what, and also whether or not the Commons category tree is accurately reflecting this.

But you are quite right that there is also a very real issue with some CPs being claimed by multiple entries here. (Some of which I may have created or added to, by tagging settlements as CPs). The following queries try to reveal this:

tinyurl.com/mw3e4pb GSS values claimed by more than one item. (Currently: 84).
tinyurl.com/mfdlwjl Commons categories for CPs claimed by more than one item. (Currently: 81).
tinyurl.com/kkuz36e CPs that are in areas that are also claimed as CPs. (Currently: 72).

tinyurl.com/mts62qu A query that tries to combine the above. (Currently: 142).
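The first of these checks can be sketched offline as a simple grouping, assuming the (item, GSS code) pairs have already been exported from a query. The function name and the placeholder code values in any example data are hypothetical; the actual queries behind the short links are not reproduced here.

```python
from collections import defaultdict


def codes_claimed_by_multiple_items(pairs):
    """Map each code to the sorted list of items claiming it, keeping only clashes.

    pairs: iterable of (item_id, code) tuples, e.g. exported query results.
    """
    items_by_code = defaultdict(set)
    for item, code in pairs:
        items_by_code[code].add(item)
    # Keep only codes that more than one distinct item claims.
    return {
        code: sorted(items)
        for code, items in items_by_code.items()
        if len(items) > 1
    }
```

The same grouping applies unchanged to Commons categories claimed by more than one item, by passing (item, category) pairs instead.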

This is partly what I opened the discussion at User talk:Kelly to try to think through.

The last query appears to reveal broadly two groups -- one is (mostly) parishes in South Kesteven, where a settlement item and a parish item share a Commons category; the other, almost completely distinct case, is where there are two items both marked as CPs, with one usually a P131 of the other.

My own view is that the link from Commons to CP items here is very valuable, eg for us to be able to use the very categorisation there to infer statements, to add to items here (ie: which parish is a geographical item in). We would lose out if items here did not have a Commons link.

Equally the link to/from Commons categories via Wikidata items from/to Wikipedia items is clearly very valuable.

User:Nilfanion makes the interesting point that ultimately Commons may be quite happy to have distinct categories for parishes and for settlements of the same name. However, the fact remains that at the moment Commons does not for the most part make such a distinction, and that making and populating such a split will/would be no small amount of work.

So my own view is that for the moment it probably does make sense to combine items for settlements and parishes, until such time as they get split on Commons. The other thing that weighs with me is that so many of the current properties seem to be quite relevant to both parishes and settlements -- eg KEPN ID (P3639), OpenDomesday settlement ID (P3118), Vision of Britain place ID (P3616), British History Online VCH ID (P3628) -- I'd probably place most of these on the settlement, if forced to choose, but it's not a clear-cut thing.

On the other hand I am reluctant to undo somebody else's work, and merge back the items that User:Kelly has split out for the South Kesteven parishes. (And similarly User:Robevans123 for communities in Anglesey).

The bulk of the others, as you note above, appear for the most part to reflect sv-wiki and ceb-wiki stub articles created by Lsjbot, whose operator I understand has since retired from Wikipedia editing.

So what to do with these? said to be the same as (P460) and Wikimedia duplicated page (Q17362920) are both interesting options. But if people are content that we don't try to force there to be separate items, just because of stubs created by a bot, then perhaps the best way forward may just be to kill the stubs, by redirecting the stubs on sv-wiki and ceb-wiki (merging any content that seems particularly useful to keep), allowing the corresponding items to be merged here. Would anybody have any objection to this. (And is there anywhere else we should ask first?)

With regard to wards, I have tended to keep those separate from parishes (and they have different GSS codes, starting "E05"). Around 18 months ago, when I asked the UK project on en-wiki what was most valuable to have in the P131 hierarchy for UK places here, the view was that parishes are useful, because they have typically been comparatively stable over comparatively long periods of time; whereas often wards seem to be much less stable: much more likely to be re-drawn as population numbers change. So I haven't seen it as such a priority to create and populate items for wards. I don't think they should be combined with items for CPs; but a new property "coterminous with" might be useful to connect them with parishes (& v.v.), in the occasions where they do have equivalent boundaries. Jheald (talk) 19:57, 3 April 2017 (UTC)[reply]

Whether it makes sense to systematically create wards for unparished areas, ie (typically) areas that were former metropolitan boroughs, I am not sure. For the City of London, I think: certainly -- and I think these items all exist & have Commons cats (though still need GSS codes). For other areas, eg Bristol, I don't know. Clearly if en-wiki has articles, we will have items, and they should be described as well as we can. Beyond that, my inclination would be to see how far Commons goes at the moment. If Commons has categories, particularly if they are well populated, it probably makes sense to have items here. If Commons doesn't have categories, then maybe there are other higher priorities for work here. Jheald (talk) 20:14, 3 April 2017 (UTC)[reply]

I think there is a distaste in a significant proportion of both en.wp and Commons communities for using wards for localisation (apart from the City of London). There are several drawbacks:

Wards are very variable units. As an example in Bristol, compare the 2009 wards of Hartcliffe and Bishopsworth to the 2016 wards of Hartcliffe & Withywood, and Bishopsworth. The two wards in 2009 split their combined area into West and East, while the two wards in 2016 cover exactly the same area but are a North/South split.
Wards have low recognition. If you asked someone where they live, the ward is unlikely to be quoted and an area of the city is more likely to be quoted.
When both exist, there is a complex relationship between CPs and wards. Sometimes one is a subset of the other, sometimes not. That makes a logical hierarchy awkward, as CPs are desirable.

In the absence of anything better, Commons sometimes goes to street-level to provide the granular localisation.

At the same time, there is a strong desire to get localisation within the unparished areas. One possibility is shown by my work in commons:Category:Districts of Plymouth, which basically splits the city into the regions known by residents (and all could potentially have WP articles). A better solution might be to use the city council's neighbourhoods, which are defined in terms of community identity and natural boundaries and, unlike people's perceptions, are objectively defined. Following the Localism Act 2011, Neighbourhood Areas have been established in many large cities; where they exist, these might be ideal.--Nilfanion (talk) 23:51, 3 April 2017 (UTC)[reply]

wards

@Lucywood: Despite User:Nilfanion's cautions above, I have started adding GSS code (2011) (P836) links to items marked as ward (Q1195098) or ward or electoral division of the United Kingdom (Q589282), on the basis that if that is how items here have been marked, then we might as well link to their boundaries etc.

I have also extracted a list of wards from sub-categories of en:Category:Wards_of_England, which should probably be marked up as such here, since in many cases the items have no existing P31. (Though in some cases they are identified as some sort of human settlement (Q486972).)

A further complication that I now realise (on top of all Nilfanion has written), and had not previously appreciated, is that there can be a distinct difference between electoral divisions used to elect County councillors (see eg item note at the OS), and the wards used to elect district councillors -- I should have read en:Wards and electoral divisions of the United Kingdom more closely. I had been happily assuming that everything was the latter, but then I hit Pulborough (Q7259268), this link on OS OpenData; which is significantly different from the district ward "Pulborough and Coldwaltham" this link -- in each case, click on the value for 'Extent' to compare the boundaries.

I am hoping that the only county electoral divisions that have got into Wikidata are those that are subcategories of en:Category:Electoral divisions of England -- but it would be useful if you could confirm. Jheald (talk) 19:08, 16 April 2017 (UTC)[reply]

@Lucywood: There were also a couple of wards that you added that I've had a bit of trouble identifying. Is there any help you can give me with either of the following?

Courtfield (Q28938159). Said to be in LB Brent. The only ward I could find was in Kensington & Chelsea: [4]
Devon (Q27889472). There's one in Newark-on-Trent, in Nottinghamshire [5]. But I couldn't find one in South Kesteven, Lincolnshire.

Jheald (talk) 20:29, 16 April 2017 (UTC)[reply]

The first one was probably a mistake, sorry, corrected, the second one used to exist, see [6]. As you know many change quickly but I was using mainly the Ordnance Survey data. Lucywood (talk) 07:49, 17 April 2017 (UTC)[reply]

@Lucywood: Thanks. I've found its dates now, thanks to data from the Elections Centre at Plymouth University [7]: appears 1979, disappears 1999 -- I had been thrown because the en-wiki page en:List_of_electoral_wards_in_Lincolnshire#South_Kesteven only had names back to 1999.

It seems quite a random sort of an item to have created. Out of interest, has there been a system or a pattern to the wards you created items for? Jheald (talk) 10:10, 17 April 2017 (UTC)[reply]

No there wasn't really apart from Suffolk, Essex and Cumbria and some unusual names like Devon. However as I was suggesting why not use your bot to create items for all of them? Lucywood (talk) 12:59, 17 April 2017 (UTC)[reply]

Just to add I see no harm in creating them on Wikidata, but I can't see them getting much use either. However, be aware that there are several classes of wards, and these should be given distinct groupings - ward or electoral division of the United Kingdom (Q589282) may not be a sensible concept.

The various types include:

County electoral divisions (eg Tonbridge for Kent County Council)
Unitary Authority electoral divisions (eg Bugle for Cornwall Council)
"Normal" wards (eg Axminster Rural for East Devon District Council)
English Parish Council wards (eg North for Tavistock Town Council)
Welsh Community Council wards (eg Plymouth for Penarth Town Council)

AFAIK, ONS codes are only applied to the wards that elect to councils with district-level (or unitary) powers.--Nilfanion (talk) 18:12, 22 April 2017 (UTC)[reply]

Looking to do DNB queries and data population ...

Here to seek some help.

For enWS, we have a mechanism to check that each article of DNB00/01/12 is in WD (done), and we can run a check to confirm that each WD item has a main subject (excluding the instances of DNB redirect). What I would now like to ensure is that we have the reciprocal links: DNB item/main subject:person item <-> person item/described by:DNB00/01/12 (qualified) stated in:DNB item. Note that we have a number of instances where a person item has a direct described by "DNB item" statement, often as a duplicate, which we need to remove once we are sure that we have the correct "described by" statements in place. I suspect that we are going to need to do SPARQL queries to work it out.

Hope that you can help. Thanks. — billinghurst sDrewth 11:18, 6 May 2017 (UTC)[reply]

Follow-up, once we have the relationships in place, please hold the queries as then we can look to populate family names from the articles "Surname, Given name ... (DNBXX)" through to the respective people items.

@billinghurst: Let's see if I can translate the above into queries, to see whether I have understood correctly what you've told me (and what you are looking for).

So currently we have 30,684 items tinyurl.com/lmg9aop that are published in (P1433) Dictionary of National Biography, 1885–1900 (Q15987216) or Dictionary of National Biography, first supplement (Q16014700) or Dictionary of National Biography, second supplement (Q16014697); and these all have a link to en-wikisource tinyurl.com/kc82p8k (number doesn't change if we add that latter requirement). From what you have written above I infer that you are able independently to confirm that this is the number that there should be.

That number falls to 30,289 if we require that each article-item has a main subject (P921) tinyurl.com/m4686wh.

The remaining 404 tinyurl.com/lwrzpy3 are redirects at wikisource, eg Audelay (DNB00) (Q19052970)[8], and you believe that this is the number that there should be.

These 404 are all tagged as instance of (P31) DNB redirect page (Q19648608) (tinyurl.com/mgeaw9o)
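The shortened query links above are not reproduced here. As a rough sketch only, counts of this kind could be obtained with a query built along the following lines. The property and item IDs are the ones named in the discussion; the function name and its flags are hypothetical, and the actual queries behind the tinyurl links may well differ.

```python
# Sketch: build a SPARQL count query for DNB article-items.
# IDs are taken from the discussion above; the helper itself is hypothetical.
DNB_RELEASES = ("Q15987216", "Q16014700", "Q16014697")  # DNB 1885-1900 and its two supplements
REDIRECT_CLASS = "Q19648608"  # DNB redirect page


def dnb_article_count_query(require_main_subject=False, exclude_redirects=False):
    """Return a SPARQL query counting article-items published in any DNB release."""
    values = " ".join(f"wd:{q}" for q in DNB_RELEASES)
    lines = [
        "SELECT (COUNT(DISTINCT ?article) AS ?n) WHERE {",
        f"  VALUES ?release {{ {values} }}",
        "  ?article wdt:P1433 ?release .",  # published in
    ]
    if require_main_subject:
        lines.append("  ?article wdt:P921 ?subject .")  # main subject
    if exclude_redirects:
        lines.append(
            f"  FILTER NOT EXISTS {{ ?article wdt:P31 wd:{REDIRECT_CLASS} }}"
        )
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dnb_article_count_query(require_main_subject=True))
```

The printed text can be pasted into the query service; toggling the two flags corresponds to the "has a main subject" and "is not a redirect" variants discussed above.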

However, only 22,938 (tinyurl.com/ks9lmxr) of those 30,289 subject items have described by source (P1343) the expected release of the DNB; leaving 7351 which do not tinyurl.com/mabf22y -- however this information could now be added from the results of this query using Quick Statements (although there may be a few more checks we want to make first).

Updated, to exclude redirects: tinyurl.com/mzn276h (7219)
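For the Quick Statements step, a minimal sketch of how the query results might be turned into input rows. It assumes the tab-separated QuickStatements version 1 syntax, in which qualifiers follow the main statement as property/value pairs and reference properties are written with an S prefix; it also assumes that P143 is meant as a reference rather than a qualifier. The function name is hypothetical, and the example pairing is for illustration only.

```python
# Sketch: turn (subject, release, article) triples into QuickStatements v1 rows.
# Assumption: P143 goes in as a reference (hence "S143").
IMPORTED_FROM = "Q20651139"  # the P143 value mentioned in this thread


def quickstatements_rows(triples):
    """Each row: subject, described by source, release, stated in, article, source."""
    rows = []
    for subject, release, article in triples:
        rows.append(
            "\t".join(
                [subject, "P1343", release, "P248", article, "S143", IMPORTED_FROM]
            )
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Illustrative pairing only; it has not been checked against the article.
    print(quickstatements_rows([("Q6282206", "Q15987216", "Q19048153")]))
```

Generating the rows from the query output, rather than typing them, keeps the subject, release and article columns aligned, which matters when several thousand statements go in at once.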

Of the 22,938 there are 22,616 that have an appropriate stated in (P248) qualifier linking back to the article item (tinyurl.com/kwl5wnq).

This leaves 451 that don't have a stated in (P248) qualifier linking back to the article item. (tinyurl.com/kbs9xqy).

However, looking at this list reveals some oddities. For example:

should subject-item William Frederick Wells (Q8009414) -> article-item Wellsted, James Raymond (Q19064860) ?
should subject-item Joseph Collyer (Q6282206) -> both article-item Collyer, Joseph (1748-1827) (DNB00) (Q19048153) and Collyer, Joseph (d.1776) (DNB00) (Q19048161) ?

So there may be some more checking needed on those main subject (P921) statements before adding the inverses in bulk.

Is that the sort of investigation you were looking for ? Jheald (talk) 14:19, 6 May 2017 (UTC)[reply]

Correction on that last query. It was getting confused if there were two different DNB articles both describing (or being purported to describe) the same person.

Here's a revised query, with 323 hits, for when there is a link back to the right release of the DNB, but not the original article-item:

tinyurl.com/mxsjz35. Jheald (talk) 14:33, 6 May 2017 (UTC)[reply]

Pretty much. Let me look at fixing the errors firstly, then we can review where we are.

Note that with the DNB articles they can refer to multiple people, so we may not have a one to one relationship in that direction (though guess is that we may be missing numbers of those, and I have an inkling how to track) — billinghurst sDrewth 15:00, 6 May 2017 (UTC)[reply]

@billinghurst: Turns out that most of those 323 were redirects, eg Falconberg (d.1471) (DNB00) (Q19019837) (but which had their own "main subject" property, which is why they were being included).

Excluding the redirects brings the number down to 37 (tinyurl.com/lmqfau3) that have a "main subject", but where the subject does not have a "stated in" in turn. Jheald (talk) 15:09, 6 May 2017 (UTC)[reply]

For those 37, it looks as if some do have a "stated in", but it's been added as a reference not a qualifier (I was specifically looking for it as a qualifier). Let me know if you'd like an example query to look for this.
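A sketch of what such a query might look like, assuming the standard query service prefixes (p: for statement nodes, pq: for qualifiers, pr: for reference values, prov:wasDerivedFrom for the link from a statement to its reference). The function name is hypothetical and this is not necessarily the query linked later in the thread.

```python
# Sketch: find subjects whose "stated in" link back to the DNB article
# sits in a reference rather than in a qualifier.
DNB_RELEASES = ("Q15987216", "Q16014700", "Q16014697")


def reference_not_qualifier_query():
    """Return SPARQL for 'stated in' present as a reference but not as a qualifier."""
    values = " ".join(f"wd:{q}" for q in DNB_RELEASES)
    return "\n".join(
        [
            "SELECT ?article ?subject WHERE {",
            f"  VALUES ?release {{ {values} }}",
            "  ?article wdt:P1433 ?release ;",
            "           wdt:P921 ?subject .",
            "  ?subject p:P1343 ?statement .",
            # stated in, recorded inside a reference block
            "  ?statement prov:wasDerivedFrom/pr:P248 ?article .",
            # ...and no statement carrying it as a qualifier
            "  FILTER NOT EXISTS { ?subject p:P1343/pq:P248 ?article }",
            "}",
        ]
    )


if __name__ == "__main__":
    print(reference_not_qualifier_query())
```

The only difference between the two cases is the path from the statement node: pq:P248 directly for a qualifier, prov:wasDerivedFrom/pr:P248 for a reference.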

Also, there may be some pseudonyms (eg Sawtrey, James (DNB00) (Q19024116)), where there is a link-back from the subject to the subject's main DNB article, but not the DNB article for their pseudonym. Jheald (talk) 15:17, 6 May 2017 (UTC)[reply]

That is lots of twists and turns to get my head around at this late hour. I think that I need to consolidate and clean the oddballs first. I would also like to standardise the DNB redirect set, 1) I don't think that they should have main subjects (they are redirects), 2) it seems worthwhile them having the "of" statement. — billinghurst sDrewth 15:21, 6 May 2017 (UTC)[reply]

I am happy for you to give me lists of inconsistent data approaches/errors/weirds and I will fix those. — billinghurst sDrewth 15:31, 6 May 2017 (UTC)[reply]

@billinghurst: 23 where the "stated in" is in a reference, not a qualifier: tinyurl.com/nxf5e42

Done

16 "weird" tinyurl.com/ny5ujhx (might include some double-counting). Jheald (talk) 15:32, 6 May 2017 (UTC)[reply]

Done

@billinghurst: 816 redirects with no "of": tinyurl.com/l8qdmzj

some of which have a "main subject"; and some of the subject items have a "stated in". Jheald (talk) 15:48, 6 May 2017 (UTC)[reply]

Comment I am proposing to enWS that we delete the DNB redirect 'articles' and accordingly delete the wikidata items. They are a redundancy that are placeholders for a book, we don't need them as we can manage by web means. So will park that cpt for the moment. — billinghurst sDrewth 05:53, 7 May 2017 (UTC)[reply]

@billinghurst: I have now added described by source (P1343) + stated in (P248) + imported from Wikimedia project (P143) Q20651139 for most of the 7000 subjects that didn't have it (see Special:Contributions/JhealdBatch), with the rest going in as we speak.

Will be away from my computer now until much later, but it should be easier to pick up & deal with anomalies once this lot are all in. Jheald (talk) 07:50, 7 May 2017 (UTC)[reply]

State of play

@billinghurst:

All article-items with published in (P1433) = a release of the DNB now have a corresponding item that links back to them with a described by source (P1343) + stated in (P248) + imported from Wikimedia project (P143) Q20651139 statement (query for exceptions: tinyurl.com/mabf22y):

The only wrinkle is Henry Elsynge (Q15072637) which does have a described by source (P1343) back to the DNB, but one that has been ranked deprecated, as "misinformation" -- see link on talk page for more.

I think that we can live with the qualification, it is what it is. — billinghurst sDrewth 14:12, 9 May 2017 (UTC)[reply]

@billinghurst: Also the following query, which looks for where the same subject item links back to more than one DNB item, might be worth checking through, just to make sure they're all kosher: tinyurl.com/m7bl37o; currently returns 155 rows. Jheald (talk) 12:40, 9 May 2017 (UTC)[reply]
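The same check can be done offline on a downloaded result set. A minimal sketch, assuming the results are available as (subject, article) pairs; the function name is hypothetical, and the example IDs are the ones quoted earlier in the thread.

```python
# Sketch: group (subject, article) pairs and keep subjects that link
# back to more than one DNB article-item.
from collections import defaultdict


def subjects_with_multiple_articles(pairs):
    """Map each subject with two or more distinct articles to its sorted articles."""
    by_subject = defaultdict(set)
    for subject, article in pairs:
        by_subject[subject].add(article)
    return {s: sorted(a) for s, a in by_subject.items() if len(a) > 1}


if __name__ == "__main__":
    pairs = [
        ("Q6282206", "Q19048153"),
        ("Q6282206", "Q19048161"),
        ("Q8009414", "Q19064860"),
    ]
    print(subjects_with_multiple_articles(pairs))
```

Using a set per subject means repeated rows for the same pair, which can appear when a statement has several qualifiers, are not mistaken for multiple articles.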

Done fixed misapplied, and applied DNB redirects. — billinghurst sDrewth 12:56, 10 May 2017 (UTC)[reply]

Excellent query, shows up redirects, wrongly attributed and over-enthusiastic. Will take a little while to work through. — billinghurst sDrewth 14:12, 9 May 2017 (UTC)[reply]

@billinghurst: Though on reflection, it may be entirely appropriate that there are additional people and things that can be said to be "described by" a biographical article, in addition to its main subject. Cf this query for items that are not humans "described by" DNB articles tinyurl.com/kkdbsle -- most of which may be entirely appropriate.

So perhaps one should also (or first) look at this way round: tinyurl.com/mqqhuw6 -- articles which have more than one "main subject"; and/or tinyurl.com/mgd9t24 articles where the main subject is not human. Jheald (talk) 14:34, 9 May 2017 (UTC)[reply]

I have added a of (P642) qualifier for 321 DNB redirect page (Q19648608)-class items, that I could find a chain of main subject (P921) -> described by source (P1343) / stated in (P248) -> redirect-target for. (Query: tinyurl.com/l3ehjwl, none remaining) -- of course, these are only as good as the data in the chain, which I didn't hand-check; but the ones I did see seemed reasonable.
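To make the chain concrete, here is a sketch of the lookup expressed over plain dictionaries. The data structures and function name are hypothetical, and the rule of accepting only an unambiguous target is an assumption of this sketch, not necessarily what the batch did.

```python
# Sketch: follow redirect -> main subject -> articles that describe the subject,
# and propose an "of" target only when exactly one non-redirect article remains.
def propose_of_targets(redirect_subject, subject_articles, redirects):
    """redirect_subject: redirect item -> its main subject (P921).
    subject_articles: subject -> articles reached via P1343 / P248.
    redirects: set of all redirect-class items."""
    proposals = {}
    for redirect, subject in redirect_subject.items():
        targets = [
            article
            for article in subject_articles.get(subject, [])
            if article not in redirects
        ]
        if len(targets) == 1:  # skip missing or ambiguous chains
            proposals[redirect] = targets[0]
    return proposals


if __name__ == "__main__":
    # Placeholder identifiers, not real QIDs.
    print(
        propose_of_targets(
            {"R1": "S1", "R2": "S2"},
            {"S1": ["A1", "R1"], "S2": ["A2", "A3"]},
            {"R1", "R2"},
        )
    )
```

Anything skipped by the uniqueness rule would land in the "no of" residue, which matches the kind of cases described in the next comment (titles as main subject, or no main subject at all).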

This leaves 487 such items with no "of" (query: tinyurl.com/l8qdmzj). Typically these have a "main subject" that is a title, eg "Earl of X", rather than a person. There are also a fair number with no main subject (P921).

I'll leave it to the project to consider whether keeping these redirects is useful. But it may be handy to keep them around, to make sure that e.g. such alt forms are reflected in aliases on the "main subject" item, etc.

I won't go adding back any more (DNB00)s etc (diff, diff). But if ppl do seriously want to get rid of these, there are about 30,000 to go... Jheald (talk) 12:22, 9 May 2017 (UTC)[reply]

Comment @Charles Matthews: for the local DNB project page we should grab the queries that are usable for ongoing checks and other maintenance. — billinghurst sDrewth 12:56, 10 May 2017 (UTC)[reply]

DNB articles

To note that ultimately the DNB articles will all be moved to be subpages of the work, not root level items where they currently sit. It is a quirk of the time when the project started that they sit with their suffix. As to the suffix in items here, or with those that are subpages, the guidance here is that the title is kept simpler, with clarification taking place in the description. So the DNB00/01/12 suffixes have been disappearing, though we have kept the years of life. For other works, we have been removing the book title from the subpage, and showing the article, and the name of the work in the descriptor. There are lots to come as adding them has been problematic to this point of time. — billinghurst sDrewth 11:50, 9 May 2017 (UTC)[reply]

I'll bow to whatever the project thinks best... Personally I do quite like the (DNB00)s and similar on items, as scarecrows to stop people linking to items of the wrong sort (or merging them). But whatever people want. Jheald (talk) 12:26, 9 May 2017 (UTC)[reply]

I've argued against removing the suffix style in the past, on grounds of ease of search on Wikisource (which I use all the time). There is a possible technical fix in the search, there. Here, I'm a fan of the suffix style for the same reason as James. Charles Matthews (talk) 13:07, 10 May 2017 (UTC)[reply]

The Thames at Westminster (Q19660486)

I noticed that this one was created by Poulpybot and had no external link to an NT website, but of course these are also on Art UK. I wonder if you know how to go through and update these (I didn't check how many have been created) with the Art UK links, but that would be a worthwhile thing to do, in my opinion. Jane023 (talk) 13:04, 21 September 2017 (UTC)[reply]

@Jane023:. Hmm. Looks like one can do a search for "National Trust" at Art UK to get pages like this, then follow the link to each pic to get the NT accession number (and NT URL where available), then match on the NT accession number against NT pics here. Shouldn't be too much of a challenge to script that, I'll put it on my to-do list, but I can't promise immediate action -- or would it be helpful to have these links urgently? Jheald (talk) 15:27, 21 September 2017 (UTC)[reply]
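The matching step at the end of that plan might look something like the sketch below. It assumes the Art UK details have already been gathered by whatever means is acceptable, and that both sides are held as simple in-memory records; the field names, the normalisation rule and the function names are all hypothetical.

```python
# Sketch: match Art UK records to Wikidata painting items on accession number.
def normalise(accession):
    """Ignore spacing and case differences (an assumed, deliberately crude rule)."""
    return "".join(accession.split()).upper()


def match_by_accession(artuk_records, wikidata_items):
    """artuk_records: dicts with 'artuk_id' and 'accession'.
    wikidata_items: (qid, accession) pairs.
    Returns (qid, artuk_id) for unambiguous matches only."""
    index = {}
    for qid, accession in wikidata_items:
        index.setdefault(normalise(accession), []).append(qid)
    matches = []
    for record in artuk_records:
        qids = index.get(normalise(record["accession"]), [])
        if len(qids) == 1:
            matches.append((qids[0], record["artuk_id"]))
    return matches


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    artuk = [{"artuk_id": "example-painting-1", "accession": "nt 123456"}]
    items = [("Q_PAINTING_A", "NT 123456"), ("Q_PAINTING_B", "555")]
    print(match_by_accession(artuk, items))
```

Accession numbers shared by more than one item are left unmatched here on purpose, so that they can be looked at by hand.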

Seems we only have 25 National Trust paintings in the system at the moment though, tinyurl.com/ybng96om. A motley bunch, doesn't seem to be much rhyme or reason to them. Jheald (talk) 15:31, 21 September 2017 (UTC)[reply]

Thanks! No I don't need them urgently - I only noticed because I was working on Canaletto and picked up a few. I think the NT website has permanent urls, and the Art UK site does too, so I thought it might be a good idea to get the back end of both of these hooked up somehow with the Wikidata paintings (at least for the collections we have, so e.g. Tate, NPG, NG etc) Jane023 (talk) 15:38, 21 September 2017 (UTC)[reply]

Ha I see now that I have done most of these probably, working with Art UK images, such as The Sense of Taste (Q29569637). Jane023 (talk) 15:40, 21 September 2017 (UTC)[reply]

Shand Mason

I seem to have done something wrong and brought about this:[9]

I've tried to understand but I can't. How do I learn what my mistake was? Thanks, Eddaido (talk) 21:55, 23 September 2017 (UTC)[reply]

@Eddaido: What's the problem? On 11 August you created a sitelink from en:Shand Mason on English Wikipedia to c:Category:Shand Mason fire engines, a sitelink which seems entirely reasonable, in the process creating the Wikidata item Shand Mason (Q35956189), which is great -- every English wikipedia article ought to have a corresponding Wikidata item.

Today a batch process of mine has added a Commons category (P373) property to the Wikidata item, because sometimes these are easier to deal with for some purposes than sitelinks (and sitelinks can't always be created, eg if the link is already taken by a category). So everything seems fine, just as it should be. Jheald (talk) 22:18, 23 September 2017 (UTC)[reply]

I have added a few more statements to the item here. Nothing too incorrect, I hope. Jheald (talk) 22:43, 23 September 2017 (UTC)[reply]

note about mistake

https://www.wikidata.org/w/index.php?title=Q11832547&diff=564177610&oldid=527175994 was a mistake and created an invalid link - you may want to check your other related edits Mateusz Konieczny (talk) 15:39, 24 September 2017 (UTC)[reply]

@Mateusz Konieczny: Any idea why the URL formatter isn't linking it as a valid Commons category (P373) ? There seems to be no problem with the sitelink to the same Commons page below. Jheald (talk) 15:44, 24 September 2017 (UTC)[reply]

Oxford biography

If you can look up the entry for Sir John Maclean, 1st Baronet (Q7527912) I would appreciate it. I would like to update his Wikipedia entry. --RAN (talk) 19:21, 4 December 2017 (UTC)[reply]

Thanks! More info than I expected his entry to have, I mostly had Swedish language sources previously, this is nice. --RAN (talk) 19:50, 4 December 2017 (UTC)[reply]

Q41336172

In English Wikipedia the GeoNames objects 7297992 and 2644559 are described in one article. In Swedish Wikipedia they are described in two different articles. Should Wikidata follow the English version of Wikipedia? If not, 7297992 is what is described in sv:Lewes (parish). Maundwiki (talk) 18:54, 4 January 2018 (UTC)[reply]

@Maundwiki: I tend to follow Commons, which has a very thorough break-down of territorial areas, and the local language wiki. If they both only have one item (even if the Commons one may be primarily for the area, and the Wiki one primarily for the settlement), then I am very very reluctant to have two different principal items here on Wikidata, because that will break the Wiki <-> Commons link.

That's why if I see multiple articles on sv-wiki and ceb-wiki (only), I am now tending to mark the Wikidata item as Wikimedia duplicated page (Q17362920) and concentrate the information on the other item, that is shared between all other languages and Commons. Jheald (talk) 16:29, 6 January 2018 (UTC)[reply]

I am for data in one place however not that it is top down. The outcome is that we must follow who controls wikidata and I have problems with that type of centralized control. Not that I was for creation of separate articles for PPL and ADM (this case) if there is only one PPL within the ADM. So if we have one wikidata record there should be two articles for that wikidata record. I know it will not work, but in that case the geoname will have to be in more than one wikidata record. Maundwiki (talk) 21:06, 22 June 2018 (UTC)[reply]

National Trust places paintings categories

I have been thinking about this ever since I realized that Art UK has a pretty decent coverage of what is on the National Trust website for paintings. Ideally it would be nice to have images of ALL National Trust objects, but the Art UK site is a good start. I think we should try to set up a way to cover this on Commons that links to Wikidata so we can use Mike Peel's "Wikidata infobox" on those Commons categories - what do you think? That way we could include the Art UK venue link as well as the NT venue link for the paintings. See e.g. this category I just created (I rounded up the paintings using search as many of them were not categorized in any venue at all): c:Category:Paintings in Ascott House. Do you have a list of these venues anywhere? Jane023 (talk) 10:40, 21 February 2018 (UTC)[reply]

@Jane023: A list like en:List of National Trust properties in England ?

We're currently showing 233 things owned by (P127) the National Trust tinyurl.com/y8fa45ol, most of which are buildings -- don't know how complete that is, and may well include some exterior landscapes, also some paintings. Also 5 for operator (P137) tinyurl.com/y7vr4ax8.

I would think it should be fairly straightforward to get an infobox to show a Art UK venue ID (P1602) link as well as official website (P856).

Be aware that Art UK are very sensitive about the idea of people taking their image files. Metadata they might (or might not) be easier about, but to date I haven't scraped any of their paintings pages for artist or location or collection or accession number information. Jheald (talk) 11:52, 21 February 2018 (UTC)[reply]

Updated version of query with county information, to compare with list from en-wiki; but many of them are missing it: tinyurl.com/yb35worh Jheald (talk) 11:57, 21 February 2018 (UTC)[reply]

Nice list - yes that is exactly what I meant. I have uploaded many ArtUk images and probably because of the weird and conflicting copyright notices, I see that lots of people have uploaded their images as Template:Own work which is ridiculous for PD artworks in my opinion. Since their website has all these venues, I think it would be useful to link the venue to the proper category, but of course we need the categories. Thanks to Geograph we have lots of categories for the venues, but for the Art UK links we need "Paintings in XXX" categories, and these are of course there for the larger and better known semi-museums but much less so for the out-of-the-way places. There is also not a 1-1 relationship between ArtUK and National Trust, but there is a huge overlap. Following up on the museum discussions over at the WikiProject for museums, I am not too sure how to proceed. The NT is the overall owner and that should be the main item and all others should be part of it for their collections, I think. Jane023 (talk) 12:22, 21 February 2018 (UTC)[reply]

@Jane023: Updated query with column for Commons category (P373): tinyurl.com/y7j9mpfm. If any of these don't already have Commons categories, then they damn well ought to. But it may just be that we don't have P373 statements for them yet.

It should be totally uncontroversial to make "Paintings in XXX" sub-categories for these -- though if a place has a value for Art UK venue ID (P1602) then I don't see why not to include in the infobox for the place as a whole: seems an entirely appropriate link to give people.

For indicating where a painting is, this surely is why we (and Art UK) have both location (P276) as well as collection (P195) -- is that not enough to record and then later fish out the ones that ought to be in a particular category? Jheald (talk) 12:41, 21 February 2018 (UTC)[reply]

Yes nice worklist! Of course this is uncontroversial - it just needs to be done and dusted, and I came here for advice on approach for modelling, since you've been so active with the artists side of things. BTW on the copyright side, looks like Art UK is hiring an expert. So as far as my modelling issue goes, take the example I posted above for Ascott House. Here is an item View of Dordrecht (from the Maas) (Q47510248) for a painting that I have now given the Art UK as reference for it being in collection Ascott House. But maybe it should be collection NT since the number is not from Ascott House but an NT number? Would be interested to know your thoughts. Jane023 (talk) 12:53, 21 February 2018 (UTC)[reply]

@Jane023: I would use location (P276) = Ascott House, collection (P195) = National Trust. I think that's factually accurate. (And also appears to match Art UK, where "National Trust, Ascott" is linked as a venue, not a collection).

Interesting job ad :-) . But looks like Art UK is hiring a manager rather than a lawyer -- presumably principally to handle permissions and clearances for 2D works still in copyright. Jheald (talk) 13:04, 21 February 2018 (UTC)[reply]

Good point about that subtle "National Trust, Ascott" wording. I think you are right, but I would also like to be able to query these by collection at the location level. We now have large museums split into bequests, but all of those can be "collection=museum" because I like to have the location show up as being the museum. For these it is different because the locations are really very far apart. So for this case maybe we need a "Collections of Ascott House" item that is "part of" NT? Then I can use that item for the collection and make it part of NT. The building itself could also be part of this item, or maybe its own listed building item? There are other job openings at Art UK so I guess their grant came through. Jane023 (talk) 13:21, 21 February 2018 (UTC)[reply]

@Jane023: I really do think that location (P276) = Ascott House, collection (P195) = National Trust is the right way to do this.

This is also I think the way we handle the Tate, which considers itself to have a single collection, displayed over several locations (Tate Modern, Tate Britain, Tate St Ives etc), between which items can sometimes move.

If you want to display the museum as location, when no specific location (P276) is set, surely this is easy enough to do, whether in SPARQL or in a template or in an additional column of your own spreadsheet?

As for querying by location, surely this also is easy enough to do, by adding an option to specify <location> in the relevant template, then if present looking for location (P276) in the relevant SPARQL query? Jheald (talk) 14:14, 21 February 2018 (UTC)[reply]
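As an illustration of the optional-location idea, a minimal sketch of a query builder. The function name is hypothetical and the item identifiers in the example are placeholders, not real QIDs; only the two properties, collection (P195) and location (P276), come from the discussion.

```python
# Sketch: select paintings by collection, narrowing by location only when one is given.
def paintings_query(collection_qid, location_qid=None):
    """Return SPARQL for items in a collection, optionally at a given location."""
    lines = [
        "SELECT ?painting WHERE {",
        f"  ?painting wdt:P195 wd:{collection_qid} .",  # collection
    ]
    if location_qid:
        lines.append(f"  ?painting wdt:P276 wd:{location_qid} .")  # location
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholders standing in for the National Trust and Ascott House items.
    print(paintings_query("Q_NATIONAL_TRUST", "Q_ASCOTT_HOUSE"))
```

With this shape, a single collection item serves every venue, and a template parameter for the location simply switches the extra triple on or off.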

Yes (I see Maarten has also done this for the Bavarian State collections and Alte Pinakothek). I still would like to link the Art UK venue to the Commons category. So should the venue link appear on the location item then? Or do we still need a collection item for the specific location? Jane023 (talk) 14:56, 21 February 2018 (UTC)[reply]

@Jane023: So at the moment this is what we have tinyurl.com/yaasrc9n Those look pretty reasonable to me.

So I would have Art UK venue ID (P1602) as a statement on the item for the building, wherever possible, not the collection even if the collection has a distinct item.

I would have a location (P276) on the collection item, pointing to the building, if the whole collection is housed in one building.

On a painting item, I would have location (P276) pointing to the building item, whenever the collection is spread over more than one building. I would have collection (P195) pointing to the collection, in this case just "National Trust" for everything. I would be very reluctant to create individual items for sub-collections, just because the collection as a whole is displayed over multiple sites. Jheald (talk) 15:17, 21 February 2018 (UTC)[reply]

OK fine. This sounds reasonable. When done it will be interesting to see how many of these venues have files on Commons. My gut feeling is a good chunk of them. Jane023 (talk) 15:25, 21 February 2018 (UTC)[reply]

@Jane023: That would be good. Great if we could gather them up! There certainly are a lot of venues that do have Commonscats -- though also a lot without a P373 currently tinyurl.com/y7t4s9xa (though I bet a lot of these actually do have categories on Commons). Inevitably, many of the pictures may just be of exteriors; but with luck there should be some that are of the collections too. Jheald (talk) 15:42, 21 February 2018 (UTC)[reply]

I think you would be surprised. There are lots and lots of portrait paintings that have many categories, none of which are in the proper artist or location categories. It is those that I mean to round up and put on Wikidata. Jane023 (talk) 16:04, 21 February 2018 (UTC)[reply]

Broader concept discussion

@Jheald: apologies for loooong delay in responding to your message but after a hectic couple of weeks I now have a little more time to look into things like this. I confess this isn't something we have come across at the University of Edinburgh... as yet. And I feel the as yet is important to stress. But it is fairly early days in our own forays into formal work with Wikidata so the proposal could very well be pertinent as we move further along (I'd need to re-read through the rather lengthy archive of discussions to date - I see you have moved things on to now having the concept as a qualifier instead). I'll be attempting to move our own work in terms of the Survey of Scottish Witchcraft, the Thesis Collection and (hopefully) more of the Library & University Collections in general in the next few months so I'll be interested to discuss with L&UC colleagues how they view the matter and see how things develop in terms of the creation of the new qualifier. Very best, Stinglehammer (talk) 13:31, 28 February 2018 (UTC)[reply]

Commons category edits

You probably want to check out Wikidata:WikiProject sum of all paintings/Painters with Commons category no sitelink and its history when you're done. Multichill (talk) 22:41, 24 March 2018 (UTC)[reply]

@Multichill: Yes, I should pretty much empty that list, unless there are any that have a topic's main category (P910). If there are painters that are left with unlinked Commons galleries, it should be possible to pick them in SQL -- or almost all of them. But over 85% of galleries do have links (all but 17,000 total), compared with 600,000 sitelinks that could be added for categories. Jheald (talk) 23:05, 24 March 2018 (UTC)[reply]

Check out the query, items that already have a link to Commons or have a topic's main category (P910) link are already filtered out. Multichill (talk) 23:08, 24 March 2018 (UTC)[reply]

Ps. You should really use that "show preview" button more often.

@Multichill: I should. :-) Deflate my edit count by a factor of about 5.

So with luck I should completely empty the list; depending on how much falls through the cracks with the LIMIT and OFFSETs I'm using to try to cover the set in multiple bites (fingers crossed for not too much deviation from determinism). Jheald (talk) 23:17, 24 March 2018 (UTC)[reply]
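On the determinism worry: SPARQL does not guarantee result order without an ORDER BY, so LIMIT/OFFSET slices taken from separate runs can overlap or leave gaps. A sketch of paging with an explicit sort key follows; the function name and parameters are hypothetical, and sorting a very large set may itself be costly on the query service.

```python
# Sketch: generate LIMIT/OFFSET pages over a query with a fixed sort order.
def paged_queries(base_select, order_var, page_size, total):
    """Yield one query per page; ORDER BY makes the slices line up between runs."""
    for offset in range(0, total, page_size):
        yield (
            f"{base_select}\n"
            f"ORDER BY {order_var}\n"
            f"LIMIT {page_size} OFFSET {offset}"
        )


if __name__ == "__main__":
    base = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 }"
    for query in paged_queries(base, "?item", 1000, 2500):
        print(query, end="\n\n")
```

Even with a sort key, edits made to the underlying data between page requests can still shift rows, so a final re-run of the worklist query is a sensible check for anything that fell through.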

Seemed to have worked. Not sure what your focus area is. I focus on paintings and painters, but the same query for humans seems to complete and gives plenty of suggestions.

Wikidata:WikiProject sum of all paintings/Duplicate Commons category got some extra entries too, usually a bit of a puzzle. Multichill (talk) 11:24, 26 March 2018 (UTC)[reply]

@Multichill: I've also just run a sweep adding P373s to about 60,000 category items that had Commons category sitelinks but no P373s. So some of these new additions to your duplicates list, eg Ball at the Wedding of the Duke of Joyeuse (Q19820066) may be where the item has a topic's main category (P910), and the category item has now acquired a Commons category (P373).

My focus for the painters was really as a proof of concept, to look at systematically adding sitelinks to start bringing down the 600,000 items that could have Commonscat sitelinks but currently don't, in part as a step to being able to run SQL queries on Commons more easily for further categories that could be matched here (or could have Wikidata items created), but so far haven't. User:Mike Peel might take this forward as a bot process; but I'll certainly think about adding some more for human (Q5)s -- it would be clearly advantageous if we could get all Q5s covered.

In the past I've done a bit on Art UK artist ID (P1367) painters, but really all the work that you've inspired makes painter IDs a particularly strong area of Wikidata, so a natural choice to try to improve first.

Other than that, a particular current interest is online thesauruses, and trying to compare their hierarchical structures with ours, to see whether there are items and/or hierarchical links we may be missing. I've been starting with genre/form thesauruses like Library of Congress Genre/Form Terms ID (P4953) and Art & Architecture Thesaurus ID (P1014), first doing some matching with OpenRefine, and it's been striking how many matches I'm finding seem to be to items that have never had any information added at all, other than a sitelink. I think getting our hierarchical subclass of (P279) structures properly in place is something we've perhaps neglected a bit, while building up lots of instance of (P31)s, but of crucial importance to represent what those items are. Commons has particularly strong hierarchical structures, so should be very useful to mine, but it's difficult because it's so difficult to write tools that can access both the item properties here and the category system there, and also because there seems to be no easy way to record in a mass-retrievable machine-interpretable way what Commons categories actually mean. Which is so needed to make Structured Data a success, amongst other things. The more that we can link to here the better, but it's a real limitation not to be able to start describing the rest in their own wikibase-for-Commons items, that would be accessible from WDQS. But it's still worth trying to see what we can do, and more sitelinks will help.

Apart from that, I am also getting close to creating description pages for 30,000 old maps on Commons. (project outline / tranche 1), which is really my main big project at the moment, that the rest is all a bit of a diversion from. Current issues of interest -- automatic estimation of projection, scale, and heading from the georeferencing data, quite likely based on [10] though the tool seems to have some glitches; trying to get VIAFs and OCLCs out of the British Library for the books and authors, that could be matched here; and (the big remaining issue) for the maps in the set that already have pages on Commons, how to recognise content (categories, user-added descriptions) that may be worth keeping, when the pages get re-written to use the Map template. Once I can get that sorted, then it should at last be possible to really get going! Jheald (talk) 12:18, 26 March 2018 (UTC)[reply]

Jheald, wikidata items related to wikipedia articles have already the Commons category as property P373; aren't commonswiki link only for wikidata items related to categories (not articles)? -- Blackcat (talk) 22:27, 27 March 2018 (UTC)[reply]

@Blackcat: In a word: No. See here for statistics and historical trends. Jheald (talk) 22:41, 27 March 2018 (UTC)[reply]

Whatever the case is, then, communication must be clearer. Until a couple of years ago the last say was that article items "commonlinked" only with gallery on commons if existent, and category items commonlinked to the respective Commons category. There must be a guideline that chases any ambiguity away on this topic. -- Blackcat (talk) 07:49, 28 March 2018 (UTC)[reply]

@Blackcat: You might like to look in on Wikidata_talk:Notability#RfC:_Notability_and_Commons which (in part) is considering updating guidance on this point; and also general standards of notability for subjects that have Commons categories.

In practical terms P373 is useful for WDQS queries, and also for linking from Wikipedias. But in the other direction, from Commons, a sitelink is more useful for interwiki, for writing templates, and for SQL queries; and because of their guaranteed 1-to-1 nature. That is what has fuelled the great organic growth in people adding sitelinks. It's true that in 2013 there was a ruling (kind of) that Commons categories should only link to category-items here; but, given the value of sitelinks, Commons people have added them anyway; and there's also come to be an acceptance that it's not desirable to create a category item here just to support a Commonscat sitelink if there's already an article-item that could be linked instead. So de facto the position is now that it's welcome for a Commons category to be linked to an article-item, unless Wikidata has a corresponding category-item. But, as you say, guidance on this could probably benefit from being clearer and more authoritative. Jheald (talk) 08:10, 28 March 2018 (UTC)[reply]

[edit conflict] Indeed, I don't care about notability or not; I was talking about the existence of articles that do NOT have a respective category in any Wikimedia project except Commons. In this case, what I have always known is that those articles must have only the property P373 (Category on Commons) filled, with no commonslink. On the opposite side we have items with a category: for example, Liverpool Football Club's Wikidata item (Q1130849) has no commonslink to the respective category on Commons; you'll find it on the Wikidata item for Category:Liverpool F.C. (Q7162712). Now, the question is: shall the commonswiki field be filled with the Commons category even in those cases in which the Wikidata item is about an article with no category in any project but Commons? -- Blackcat (talk) 08:13, 28 March 2018 (UTC)

@Blackcat: If there is a category in another main project other than Commons, so that there is a category-item here, then that category-item should be sitelinked to the Commons category.

If there is no category in another main project other than Commons, so that there is no category-item here, (and there is no gallery on Commons), then a new category-item should not be created here just to sitelink to the Commons category, instead the sitelink should go from the article-item here to the Commons category.

This is what I understand the current community consensus to be; reflected now by over 750,000 sitelinks from article-type items to Commons categories. Jheald (talk) 08:24, 28 March 2018 (UTC)[reply]
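The two rules above can be written down as a small decision function. This is only a sketch of the consensus as described in this thread, not an official policy implementation; the function name and its parameters are made up for illustration.

```python
from typing import Optional


def commons_category_sitelink_target(
    article_item: str,
    category_item: Optional[str],
    has_commons_gallery: bool,
) -> Optional[str]:
    """Return the item that should carry the Commons category sitelink.

    Hypothetical helper: encodes only the two rules stated in the comment
    above. Returns None for the case those rules do not cover.
    """
    # Rule 1: a category-item exists, so it takes the Commons category sitelink.
    if category_item is not None:
        return category_item
    # Rule 2: no category-item and no gallery, so link from the article-item
    # rather than creating a category-item just to hold the sitelink.
    if not has_commons_gallery:
        return article_item
    # A gallery already holds the article-item's sitelink: not covered here.
    return None
```

For the Liverpool example mentioned earlier, `commons_category_sitelink_target("Q1130849", "Q7162712", False)` returns `"Q7162712"`, i.e. the category-item.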

I don't know about consensus, but such simple queries return garbage: some commonslinks are categories, and some are galleries. Wikidata is supposed to be structured, so that every property and sitelink has a specific value.. not something that "depends". How would one query only the items that have galleries at Commons? Gikü (talk) 10:34, 28 March 2018 (UTC)[reply]

@Gikü: You can filter out the Commons categories to find only non-categories like this: tinyurl.com/ybgtdo8z

Or (faster) you can look for Commons gallery (P935). Jheald (talk) 10:44, 28 March 2018 (UTC)[reply]
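If the sitelinks have already been downloaded, the same filtering can be done client-side. A minimal sketch, assuming the results are held as a mapping from item ID to Commons page title and that Commons categories are recognised by the "Category:" prefix; the function name and the sample IDs are placeholders, and the linked query may well do this differently.

```python
from typing import Dict

CATEGORY_PREFIX = "Category:"


def non_category_sitelinks(sitelinks: Dict[str, str]) -> Dict[str, str]:
    """Keep only Commons sitelinks whose title is not in the Category: namespace.

    Assumption: `sitelinks` maps an item ID to its Commons page title.
    What remains is mostly galleries, but could include other namespaces.
    """
    return {
        item_id: title
        for item_id, title in sitelinks.items()
        if not title.startswith(CATEGORY_PREFIX)
    }
```

For example, `non_category_sitelinks({"Q-EX-1": "Category:Example", "Q-EX-2": "Example"})` keeps only the `"Q-EX-2"` entry.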

VIAF

Thanks for your help with the queries. I think it would be good to have VIAF addition automated, for cases like this. It is a VIAF item with many links and even a link back to Wikidata. From Wikidata I found it via ISNI -> isni.org -> viaf.org. If a VIAF item has a link to Wikidata and the ISNI in both items match, then the VIAF could be added via an automated process, no? 92.229.165.74 16:32, 19 April 2018 (UTC)

There are bots that do this, particularly from ULAN; but also ISNI, GND etc. I think User:Magnus Manske did a big bot run in the last couple of months; User:Multichill is also active in this area. Jheald (talk) 16:47, 19 April 2018 (UTC)[reply]

I've added Nationale Thesaurus voor Auteursnamen ID (P1006) to a lot of items based on VIAF (and some sanity checks) and in the past I also did a bit of work on Union List of Artist Names ID (P245). I thought some bot was doing more structural work cross referencing viaf and other sources, but not sure which one. Multichill (talk) 18:26, 19 April 2018 (UTC)[reply]

Excluding properties from query result?

Hi Jheald. I'm having an issue with running Wikidata:Requests for permissions/Bot/Pi bot 2 in that the version of the query in [11] is returning property IDs such as BLDAM object ID (P2081). That's causing the code to crash at the 'for page in generator' line ("'P2081' is not a valid item page title"), and it's not easy to add some code to avoid that happening. Is there an easy way to exclude properties being returned in the query code? Thanks. Mike Peel (talk) 12:35, 27 April 2018 (UTC)[reply]

@Mike Peel: Try adding the line

MINUS {?item wikibase:directClaim [] } .

to the query, immediately below the line

INCLUDE %cats .

I never considered that people might put a P373 on a property page; but I think this should exclude it. Jheald (talk) 12:58, 27 April 2018 (UTC)[reply]
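As a belt-and-braces complement to the MINUS line, the IDs returned by the query could also be screened on the client before they reach the page generator. A minimal sketch, assuming the query results are available as a plain list of ID strings; the helper name is hypothetical, and whether it fits easily into the bot's existing generator code is not something this thread confirms.

```python
import re
from typing import Iterable, List

# Item IDs look like "Q42"; property IDs such as "P2081" do not match.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")


def only_item_ids(ids: Iterable[str]) -> List[str]:
    """Drop anything that is not a Q-number, preserving the input order."""
    return [entity_id for entity_id in ids if ITEM_ID.fullmatch(entity_id)]
```

With this, `only_item_ids(["Q14706682", "P2081", "Q55606654"])` returns the two Q-numbers and skips `P2081`.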

Thanks, I've added that line and restarted it. It's done ~60,000 so far, so it's maybe 10% of the way through the complete run. Thanks. Mike Peel (talk) 13:04, 27 April 2018 (UTC)[reply]

The tweak seems to be working nicely, thanks! Thinking ahead a step or two with the deployment of Wikidata infoboxes on Commons, the bot's currently walking through the category tree one category at a time, which helps to make sure that it works through the whole of a category at once rather than seeming to be random. However, that also means that it's wasting a lot of time checking each category to see if it has a Wikidata link or if it already has the infobox or an alternative template before adding the infobox. While that's OK to start with, it quickly becomes inefficient for repeat runs. So it then becomes a lot more efficient if I download lists of where the infobox (and the alternative templates) are used and I compare each category against that, and I'll probably implement that soon. But if there's a good way to download a list of all commons sitelinks, that might speed things up even more - I don't suppose you have a query to hand that might be able to provide that list (either in one go or in chunks)? Thanks. Mike Peel (talk) 00:07, 1 May 2018 (UTC)[reply]

@Mike Peel: A list of all the Commons sitelinks is a lot of data -- WDQS can only just about count them within the time. It probably can be done, either from the complete data dump or through the fragments service, but it would be messy to keep up to date.

I would have thought a better approach would be to go through the SQL tables -- a single SQL query ought to be able to return all of the subcategories of a particular category, that have a Wikidata sitelink, but don't have any of a list of templates.

I'm a lot less familiar with using and querying the SQL tables, but let me see if I can knock an example together in Quarry. Jheald (talk) 09:04, 1 May 2018 (UTC)[reply]

So I think you may want something like this: quarry:query/26771, which finds all the sub-categories of c:Category:Hamlets in England by county that have Wikidata sitelinks, but excludes c:Category:Hamlets in County Durham because it has a Wikidata infobox.

More JOINs could be added to exclude other templates.

Disclaimer: my experience with SQL is quite limited, so each time I use it I am very much feeling my way forward -- it's possible there may be some efficiencies that I have missed. Jheald (talk) 10:55, 1 May 2018 (UTC)[reply]
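For readers who want to see the shape of such a query without opening Quarry, here is a self-contained sketch of the same idea (join on sitelinks, then exclude categories that use any of a list of templates). It runs against a tiny in-memory SQLite database whose three tables are simplified stand-ins invented for this example; they are not the MediaWiki schema and not the contents of the Quarry query. The "Example County" rows and the Q-placeholders are made-up sample data.

```python
import sqlite3
from typing import List, Sequence, Tuple


def demo_connection() -> sqlite3.Connection:
    """Build a toy database with hypothetical, simplified tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subcats (parent TEXT, title TEXT);
        CREATE TABLE sitelinks (title TEXT, qid TEXT);
        CREATE TABLE template_uses (title TEXT, template TEXT);
        """
    )
    parent = "Hamlets in England by county"
    conn.executemany(
        "INSERT INTO subcats VALUES (?, ?)",
        [
            (parent, "Hamlets in County Durham"),
            (parent, "Hamlets in Example County A"),
            (parent, "Hamlets in Example County B"),
        ],
    )
    conn.executemany(
        "INSERT INTO sitelinks VALUES (?, ?)",
        [
            ("Hamlets in County Durham", "Q-EXAMPLE-1"),
            ("Hamlets in Example County A", "Q-EXAMPLE-2"),
            # Example County B deliberately has no sitelink.
        ],
    )
    conn.execute(
        "INSERT INTO template_uses VALUES (?, ?)",
        ("Hamlets in County Durham", "Wikidata Infobox"),
    )
    return conn


def subcats_needing_infobox(
    conn: sqlite3.Connection, parent: str, templates: Sequence[str]
) -> List[Tuple[str, str]]:
    """Sub-categories of `parent` that have a sitelink but none of `templates`."""
    placeholders = ",".join("?" for _ in templates)
    sql = f"""
        SELECT s.title, l.qid
        FROM subcats AS s
        JOIN sitelinks AS l ON l.title = s.title
        LEFT JOIN template_uses AS t
               ON t.title = s.title AND t.template IN ({placeholders})
        WHERE s.parent = ? AND t.title IS NULL
        ORDER BY s.title
    """
    # Parameters follow the order of the placeholders in the SQL text.
    return conn.execute(sql, [*templates, parent]).fetchall()
```

Passing more template names in `templates` plays the role of the extra JOINs mentioned above: a category is excluded as soon as it uses any one of them.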

This bot run seems to have finished now, after just over 400,000 edits, amazingly with only a couple of reverts that were due to bad P373 values. I'll keep running it every so often (not sure if daily/weekly/monthly atm) to catch new results. Do you think it would be worth trying any variants of the query to catch other cases? Also, thanks for your advice above, I'll look into the SQL option, I know SQL a lot better than I do SPARQL! Thanks. Mike Peel (talk) 12:26, 11 May 2018 (UTC)[reply]

Hmm, somehow it found another 1,000+ more to edit in a repeat run today, not sure why it didn't get those in the last run-through... Thanks. Mike Peel (talk) 14:22, 11 May 2018 (UTC)[reply]

@Mike Peel: Excellent! Hugely impressed with the rollout of the Wikidata infoboxes over on Commons too -- they're a real positive on a category page.

As to the 1000+ in the repeat run, what surprises me actually is that the number was so low. The SELECT ... LIMIT ... OFFSET statements that the query was using to divide up the P373s are not guaranteed to be deterministic, and even less so when run over a period of time with new P373s being added, so if the first run-through really did hit 400,000 and only miss 1000 or so, that's actually well ahead of what I would have expected.
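One way to live with slices that are not guaranteed to be consistent is to treat every sweep as partial and keep a running record of what has been seen, so that each repeat run only has to deal with the stragglers. A minimal sketch of that bookkeeping; the function name is hypothetical, the IDs in the usage note are placeholders, and this is not a description of what the bot actually does.

```python
from typing import Iterable, List, Set


def new_since_last_sweep(seen: Set[str], sweep: Iterable[str]) -> List[str]:
    """Return IDs in this sweep that no earlier sweep returned.

    `seen` is updated in place, so it accumulates across repeat runs.
    """
    fresh = sorted(set(sweep) - seen)
    seen.update(fresh)
    return fresh
```

Usage: start with `seen = set()`; a first sweep returning `["Q3", "Q1"]` yields `["Q1", "Q3"]`, and a later sweep returning `["Q1", "Q2"]` yields only `["Q2"]`, the item the first pass missed.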

In time we probably need to look more closely at the P373s that for one reason or another got excluded. But getting lots more Commons infoboxes in place is the next really exciting step -- it will be interesting to see if (or how soon) there's a tipping point, so people get to the point of *expecting* their categories to have a wikidata infobox, actively linking or creating Wikidata items if their category doesn't. Jheald (talk) 16:44, 11 May 2018 (UTC)[reply]

For various reasons, that first run-through was actually about 10 restarts of the code, so that might be why the latest set was smaller. ;-) Now this task is done, I've started Pi bot running through commons:Category:CommonsRoot, so expect to see a lot more infoboxes on Commons in the next couple of weeks (I'm not sure how long it's going to take the bot to run through every commons category -- or if the raspberry pi has enough memory to store the list of them all!). While that's running I'll look into switching to the SQL selection approach for the regular runs later on. I know a few editors are already actively adding the infobox to new categories they create, although I'm not sure how much that translates to new item/content being added here yet. Thanks. Mike Peel (talk) 17:21, 11 May 2018 (UTC)

relative position within image (P2677)

Hello Jheald!

I would like to notify you about a new development on Crotos that might interest you. It corresponds to the need you rightly expressed some time ago: the possibility to find artworks with a depicts (P180) but without relative position within image (P2677).

On this page http://zone47.com/crotos/lab/cropper/p180iiif.php?q=79746 on the top right there is a link to a SPARQL query to display artworks that have the depicted item but no relative position within image (P2677). And in the SPARQL query's results, there are links to the IIIF Image Cropper on Crotos to locate the element on the image and then fill in the information on the corresponding Wikidata item, which is linked. I haven't communicated on it yet because the IIIF service isn't working well at the moment. I hope the service will soon be fixed in a sustainable way, so that we can play more.

Best regards --Shonagon (talk) 19:25, 17 May 2018 (UTC)[reply]

BL System Numbers ID?

Thanks for the thorough update concerning book editions and copies.

Applying the principles of FRBR is a good way to approach the ingest of bibliographic records and I think it will work well in Wikidata. The FRBR conceptual model is a bit bewildering in theory because the work seems to be an unnecessary level of abstraction (what is a work without an expression?) but the practical implementation in library catalogues tends to have positive results for browsing and resource discovery. Linked open data is an ideal format for FRBR and it will be interesting to see how easy it is to find relevant books using WDQS as the number of items increases. The Library of Congress has some really good training resources for FRBR and RDA (Resource Description and Access - cataloguing guidelines based on the principles of FRBR), which contain a lot of useful guidance.

I like the idea of an external identifier property to link the British Library catalogue and I will definitely support such a proposal. You may well already know that there are a few existing properties that provide coverage of BL collections: English Short Title Catalogue ID (P3939) (books printed before 1800 in the English language or in Britain; hosted by the BL), OCLC control number (P243) (BL contributes to Worldcat) and there will be overlap with Library of Congress Control Number (LCCN) (bibliographic) (P1144). Still, there is a great deal of unique material that would be covered by a BL identifier. Another option is a COPAC identifier for coverage of all major UK library collections. It isn't obvious but they do use a unique identifier for each item and it is found in the direct link on the record page e.g. https://copac.jisc.ac.uk/id/36275396?style=html&title=Catalogue%20of%20important%20Western%20and%20Oriental%20manuscripts. When shortened to https://copac.jisc.ac.uk/id/36275396, it links directly to the metadata in XML format. Simon Cobb (Sic19 ; talk page) 22:15, 21 May 2018 (UTC)[reply]

User:Jheald/commons

Would it be easy to do an update of User:Jheald/commons? It would be interesting to see how the statistics look now, and whether there's still a big gap between numbers of P373 and the sitelinks after the bot work. Thanks. Mike Peel (talk) 11:42, 8 June 2018 (UTC)[reply]

@Mike Peel: Sure. Takes a couple of hours or so to run and collate all the queries. I'd like to clear the decks first with some things arising from the Biodiversity Heritage Library book items I've been working on, and the Wikidata:WikiProject BHL pages I've just set up, but then let me see what I can do. Jheald (talk) 11:48, 8 June 2018 (UTC)[reply]

Thanks. No rush. :-) Mike Peel (talk) 11:51, 8 June 2018 (UTC)[reply]

@Mike Peel: Some updated numbers now at User:Jheald/commons, and also at Wikidata:WikiProject_Commons/Links_and_sitelinks/historical to compare historical trends.

I haven't posted them to Project Chat yet, because I need to stop and think harder if there's another way to get the numbers in the top row (the queries in the method I previously used are timing out). Also I need to think a bit about interpretation of what it all means!

Also, in some cases the numbers aren't quite answers to the questions one would most want to ask -- eg: how many article items are relying on the topic's main category (P910)/category's main topic (P301) bridge to be connected with CommonsCats (the query I've given doesn't quite give that).

But I'm out of time just right now, so this is what I can do for the present. Jheald (talk) 10:12, 10 June 2018 (UTC)[reply]

Thanks! I'm still digesting this too, but it looks like while this was a good step forward, we still have a long way to go - I hadn't realised that there were 6+ million commons categories! Thanks. Mike Peel (talk) 01:56, 12 June 2018 (UTC)[reply]

Beiträge zur Biologie der Pflanzen (Q14914936)

I'm sorry, but most of your additions here are not very useful... --Succu (talk) 20:29, 17 June 2018 (UTC)[reply]

@Succu: I can only add the data I can see.

The central BHL title summary file gives dates of 1870-2006 for this publication, as reflected in the section "Publication info: Berlin [etc.]Duncker & Humblot [etc.],1870-2006." on this page for the publication. So that's what I have added for inception (P571) and dissolved, abolished or demolished date (P576).

The information summary for the constituent items gives a date of 1870 for each one, hence the derivation that the earliest date available was 1870, and the latest date available was 1870.

If that is not correct, feel free to fix it. Jheald (talk) 21:01, 17 June 2018 (UTC)[reply]

I think the way you parse and map the information from here is not valid. Descriptions like | Berlin [etc.]Duncker & Humblot [etc.],1870-2006. | New York Botanical Garden, LuEsther T. Mertz Library are not welcome. --Succu (talk) 21:11, 17 June 2018 (UTC)

@Succu: I didn't create those descriptions. They were made by Magnus when he created the items. I've merely been adding sourced and referenced information to try and fill the items out. Jheald (talk) 21:15, 17 June 2018 (UTC)[reply]

You made use of them, so you are responsible. Why do we want this? How is this updated if another volume is scanned? --Succu (talk) 21:32, 17 June 2018 (UTC)

@Succu: That's the point of giving a 'retrieved' date as a reference, to indicate when the information was extracted.

There are various ways to update the information from time to time. To start with, BHL releases a file of information about volumes they have scanned, including a column for the "title" ID of the corresponding series or serial, which they regularly update. So one just needs to look at that file, see what's been added since the last check, and update those items accordingly. As far as I can see, however, this record hasn't been updated since May 2009. In practice I believe BHL very often starts a new title ID when they have a new batch of scans; so it may well be that the information on this record will never have to be updated.

As to why we want this, it is very useful to have an idea what the BHL has or has not got scanned for a particular title. In this case we know that if the date from a reference to this title is 1912, it may well be worth looking in the BHL for a scanned copy; but if it is 1922, then that is not part of what the BHL has scanned.

It's also useful, if BHL has multiple identifiers for the same title, to have an idea of which identifiers cover which date ranges. Jheald (talk) 21:59, 17 June 2018 (UTC)[reply]

About placeholder for "somevalue" (Q53569537)

It isn't clear for me, what do you mean with "special value <somevalue>"? --ValterVB (talk) 16:39, 22 June 2018 (UTC)[reply]

@ValterVB: As in eg this diff Jheald (talk) 16:43, 22 June 2018 (UTC)[reply]

Then "unknown value" is more correct than "somevalue"; at least, that is what it is called in the English user interface. --ValterVB (talk) 16:47, 22 June 2018 (UTC)

@ValterVB: Yes, the English user interface has "unknown value", but the developers have always called it "somevalue", and that is its intention -- it avoids questions such as "unknown by who", and makes it clear that the use is intended to encompass cases such as here, where the publisher name is known, but hasn't yet been resolved to a Q-number. Jheald (talk) 16:55, 22 June 2018 (UTC)[reply]

For Italian I can use the translation of "unknown value", because the developers don't talk in English :) --ValterVB (talk) 16:58, 22 June 2018 (UTC)

Brazilian pastor a mammalogist?

https://www.wikidata.org/w/index.php?title=Q10309107&type=revision&diff=694807089&oldid=632945610

Found

José Carlos Nogueira (Q10309107), José Carlos Nogueira (Q50824493)

at https://www.wikidata.org/w/index.php?title=Wikidata:Database_reports/Constraint_violations/P4081&oldid=702014670#"Unique_value"_violations 2.243.118.239 11:08, 25 June 2018 (UTC)[reply]

@2.243.118.239: Evidently not, as they lived & worked about 60 years apart (1920s vs 1980s/90s).

Good catch!

cc also User:Ambrosia10 -- Jheald (talk) 11:19, 25 June 2018 (UTC)[reply]

Category

There doesn't seem to be another category other than the one on Commons. Why was this created: Q55243534
--- Jura 14:58, 29 June 2018 (UTC)[reply]

@Jura1: Structurally necessary to make templates work on Commons, because a gallery is taking the sitelink from the main item. Jheald (talk) 15:52, 29 June 2018 (UTC)

We corrected that some time ago. These items don't meet our notability guidelines. Please stop creating them. You might be better served by adding the sitelinks directly on the items.
--- Jura 15:55, 29 June 2018 (UTC)[reply]

Jura1 Read what I just wrote. The sitelinks can't be added to the main items because they already have sitelinks linking to Commons galleries. In that circumstance, per this diagram of Multichill's the link has to go to a category item.

Commons template c:Wikidata infobox follows these sitelinks in order to draw data eg for c:Category:Deborah Kerr; and building up these sitelinks to structured descriptions of categories is also what is critically needed, to get those resources in place now for the structured data project on Commons.

The pages are staying because they are needed: we need to connect Commons categories to structured data in a way that can be queried at scale. Jheald (talk) 16:10, 29 June 2018 (UTC)[reply]

I'm not sure if you recently read the notability guidelines. This is explicitly excluded. Please re-read what I just wrote. We already have enough problems with Commons sitelinks. Please avoid adding more (I think most of your other additions are most helpful, btw).
--- Jura 16:16, 29 June 2018 (UTC)[reply]

@Jura1: We need this for the infoboxes, taxo templates, etc, on Commons; and for structured data. That's non-negotiable. If the current wording of the notability guideline is getting in the way of that, then it needs to be fixed. And -- advanced warning to you -- Commons is going to need to roll out category items for intersection categories, so the guidance is going to need to be updated to cope with that too. Jheald (talk) 16:25, 29 June 2018 (UTC)[reply]

@Jura1: If you don't like these, please can you propose a different solution that will let us have the commons sitelinks here? Also see the discussion at Wikidata talk:Notability, which fizzled out without a solution. Thanks. Mike Peel (talk) 16:16, 29 June 2018 (UTC)[reply]

Can we stop this now and check where and why it's supposed to be needed?
--- Jura 12:33, 30 June 2018 (UTC)[reply]

@Jura1: You've been told why it's needed. But for completeness, I have gone through it for you again, at WD:AN. Jheald (talk) 17:15, 30 June 2018 (UTC)[reply]

Commons category linking: have I missed a policy/practice change?

Has there been a change in the practice that the category link for commons now belongs on the WD topic item rather than the WD category item? I am seeing contributions moving them. Thanks. — billinghurst sDrewth 02:12, 9 July 2018 (UTC)[reply]

@billinghurst: No change that I am aware of. Commons category <-> category links still have advantages if particular wikis (eg Wikisource?) only have the category, not the item; and potentially allow a configurable choice as to which one to follow from Commons if there is a choice.

Perhaps remind User:JotaCartas of the existence of Commons category (P373), which most Wikis will follow for their Commons sitelink, and category's main topic (P301)/topic's main category (P910), which WikiCommons infoboxes and interwiki templates can navigate.

It would of course be good to have this written down as community-approved guidance, that one could simply link to. But that never seems to happen. Perhaps once the latest discussion on category notability gets resolved... Jheald (talk) 08:23, 9 July 2018 (UTC)[reply]

Thanks. This fellow is purposefully moving them from the category items. I have mentioned on their talk, and I was double-checking before I actively resolved. — billinghurst sDrewth 12:54, 9 July 2018 (UTC)[reply]

ScienceSource and P5008

Hi there – I need ScienceSource (Q55439927) to be added to the "one of" list for on focus list of Wikimedia project (P5008). This is to support the focus list launched yesterday at WD:SSFL, which is now active and raising exclamation marks, if not eyebrows. I'd be grateful for help in fixing the constraint issue. Charles Matthews (talk) 09:28, 11 July 2018 (UTC)[reply]

@Charles Matthews: Done. If you ever need this again, just add another item of property constraint (P2305) qualifier with the Q-number for the list or project, as a qualifier to the property constraint (P2302) = one-of constraint (Q21510859) statement. No preliminary discussion or community authorisation needed -- it's really just a way to keep track of what the property is being used for, so that there's a list that people can easily find. Project looks interesting! Jheald (talk) 11:20, 11 July 2018 (UTC)[reply]

Thanks! Charles Matthews (talk) 11:45, 11 July 2018 (UTC)[reply]

Lsjbot and Wikimedia duplicated page

If there are 2 articles, as with Gnosall, shouldn't we just use them and make the distinction? Whereas if there is only 1 Lsjbot article, as with Q5169821/Q20989256, shouldn't we just merge them, as it doesn't really make sense to split the ceb/sv from the other articles? @Kelly:. Lucywood (talk) 20:19, 15 July 2018 (UTC)

@Lucywood: Happy to see them merged if we can (Q5169821/Q20989256). But as for the other, if ceb-wiki is the only wiki to make the distinctions, then personally I'd be rather more disposed to try to preserve the wikilink between en-wiki and Commons (and in the process permit a rather more meaningful infobox on the Commons category), than accommodating a rogue page on a wiki that's going to have a readership of about one per year. So cut the ceb-wiki loose, and make it an instance of Wikimedia duplicated page (Q17362920), that's what I'd say. Jheald (talk) 20:33, 15 July 2018 (UTC)

I'll leave Q5169821/Q20989256 and similar for a few days to see if Kelly objects but I'd point out that settlements and administrative units are still different things and I'm not sure if "Wikimedia duplicated page" is a good tag (that has been pointed out by Lsj), maybe something like "excessive distinction page" would work better? I'd note that any WP article means we have to have an item here, while that's not the case for Commons categories. Lucywood (talk) 11:21, 16 July 2018 (UTC)[reply]

@Lucywood: Regarding Commons categories, I suspect we are moving towards having an item for every Commons cat (and need to) -- see eg current discussion at Wikidata talk:Notability. On the other hand, on Wikidata:Project Chat there's talk of suspending the presumption of notability for wikis with a lot of Lsj articles.

I do think it may be quite a useful test, to look to see whether Commons has a separate 'village' category for the placename, when considering whether to maintain or split an item here.

But I'm happy enough to be over-ruled, if people think it's the right thing to do. The important thing is to come up with a line everyone feels they can live with, so there's no danger of items being flip-flopped backwards and forwards. If necessary, a category could be created on Commons for the village, to preserve Commons <--> wiki linkage. (Though populating it and keeping it maintained may be more difficult).

I haven't done much in this area in about the last year, but it looks like there are currently about 1950 pairs of items (tinyurl.com/y9a2fmqv) modelled similarly to Abbotsbury (Q306685) / Abbotsbury (Q24665923), with the two connected by said to be the same as (P460) and the second designated an instance of (P31) Wikimedia duplicated page (Q17362920). Jheald (talk) 12:28, 16 July 2018 (UTC)[reply]

I thought the recent proposals were mainly for situations where there is a gallery taking the main item. Yes I suppose removing presumed notability for Lsj could work but might well cause problems as well.

Very few are split, probably making it too small to be much of a consideration.

I'm not sure what the best option is but I just thought, let Lsjbot decide if we have 2, statements can easily be added for both. I don't really have any strong views either way but I do think that WD can be more specific with this kind of thing. Lucywood (talk) 19:52, 16 July 2018 (UTC)

@Lucywood: ^^ Lsjbot has separate parish pages for almost 2000 items that are currently merged. (Plus more that may have been created recently). I'm not sure that that is "very few". Yes WD can be more specific, and maybe it should be. But do we want to break all the enwiki <--> Commons links? Besides, those en-wiki articles almost all do combine both; not entirely clear whether it's fair to say they're 'primarily' about the village. But if anyone feels like making and populating those 2000 new categories on Commons for the villages, then I won't stand in their way. Jheald (talk) 20:01, 16 July 2018 (UTC)[reply]

Having 2 items here wouldn't break the links to en much, though they do include a lot of content for the unit as well. I have started Wikidata:Property proposal/Unusually granulated item. Lucywood (talk) 20:29, 16 July 2018 (UTC)[reply]

@Lucywood: ?? Having two items that don't (can't) have the same sitelinks breaks the sitelink to/from Commons 100%. Jheald (talk) 20:31, 16 July 2018 (UTC)[reply]

That would only affect the "extra" item, not the "main" item. Lucywood (talk) 20:34, 16 July 2018 (UTC)[reply]

@Lucywood: Okay, I may not be understanding you correctly. What I thought you were suggesting was taking the current item (instance of (P31) village (Q532) & civil parish (Q1115575) and sitelinked to both en-wiki and Commons, and making it instance of (P31) village (Q532) only and sitelinked to en-wiki, while the Lsjbot item would take over the role of instance of (P31) civil parish (Q1115575) and be sitelinked to Commons; so that the Commons category and the en-wiki article would then no longer be linked together. But in fact you're suggesting something different? Jheald (talk) 20:40, 16 July 2018 (UTC)[reply]

@Lucywood: PS. I have edited your property proposal to indicate the item that (if I understood correctly) you would intend the new property to sit on. Not sure if I did understand it correctly, so you should probably check. Wasn't 100% sure whether what you really wanted to propose was a new property, or whether it was a new class, that the Lsjbot items would be instance of (P31). So hope I got this right. Jheald (talk) 20:48, 16 July 2018 (UTC)[reply]

Yes I think you misunderstood, to clarify the "main" item would be for example Abbotsbury (Q306685) and have it marked as a village and have statements for the settlement, this item would contain all sitelinks that don't make a distinction (including Commons) while Abbotsbury (Q24665923) would only contain the ceb/sv sitelinks, but if a split occurs at Commons or another project, that new page could also be linked. Note even if there is only 1 Commons category, the "extra" item could still contain the Commons category (P373). Lucywood (talk) 20:58, 16 July 2018 (UTC)[reply]

@Lucywood: But would Abbotsbury (Q306685) continue to be instance of (P31) civil parish (Q1115575) ? Or would it be located in the administrative territorial entity (P131) Abbotsbury (Q24665923), and that be the civil parish? Which item would eg the GSS code (2011) (P836) be on? Or the "civil parish" GeoNames ID (P1566)? Etc, etc. Jheald (talk) 21:24, 16 July 2018 (UTC)[reply]

Abbotsbury (Q306685) would only be a village (Q532) and be located in the administrative territorial entity (P131) Abbotsbury (Q24665923). The GSS code (2011) (P836) would be on the parish item, the GeoNames item for the parish would be moved to Abbotsbury (Q24665923), along with Vision of Britain unit ID (P3615) but Vision of Britain place ID (P3616) would be on the village item. However if we were to use the "unusually granulated item" then all the statements would be on the "village" item (similar to how the "duplicate" items are now). Lucywood (talk) 09:55, 17 July 2018 (UTC)

I have merged the South Kesteven parishes with no article conflicts. Lucywood (talk) 11:40, 2 August 2018 (UTC)[reply]

P373 values that aren't being caught by pi bot's query

I've started noticing a few cases like John Rankin House (Q14706682), where there was a P373 value but the query that pi bot runs wasn't returning the ID. Wikimania Hackathon 2018 (Q55606654) was another recent case. Any idea what might be happening there? (I've added the sitelink using another script now.) Thanks. Mike Peel (talk) 21:12, 25 July 2018 (UTC)[reply]

@Mike Peel: This is the script that steps through all items with a P373, and adds a Commons sitelink if possible, if various conditions are met (ie no other contenders for the sitelink) ?

If I remember correctly, you're doing a new sweep through all the P373s about once a day.

If that's right, then the most likely issue is that the way WDQS returns the list of P373s isn't 100% deterministic and consistent. So an item with a P373 might be missed by one slice, but not necessarily appear in the next. It's a pain, but unless somebody can write a tighter query, it was the only way I could see to be able to get something more-or-less workable within the timeout constraint.

With enough sweeps, one would think an item shouldn't consistently dodge all of them (unless something very unlucky is conspiring to happen), so one would think even the stragglers ought to be picked up after the second or third or fourth run-through.

(Added) But the P373 on John Rankin House (Q14706682) was added back in May 2016, so that would have been a huge number of sweeps by now that it would have consistently missed... Very strange.

The only other thing I can think of is that the P373 statement might have registered on one WDQS server, but another might have missed the update (if that server was under a lot of pressure at the time).

Or it might be something I've completely not thought of.

How are you spotting these items that appear to be being missed? Jheald (talk) 21:26, 25 July 2018 (UTC)[reply]
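A minimal sketch of the slicing issue discussed above, assuming the sweep pages through the P373 results with LIMIT/OFFSET. The function name, the slice size and the query shape are illustrative assumptions, not the query Pi bot actually runs. Without an ORDER BY, nothing guarantees that successive slices partition the result set consistently, so an item can fall between slices; adding an ORDER BY makes the slices stable in principle, but may not complete within the timeout.

```python
def sliced_p373_queries(total, slice_size, ordered=False):
    """Build one SPARQL query string per slice of items that have a P373 value.

    The query text is a hypothetical stand-in for the real bot query.
    """
    order_clause = "ORDER BY ?item " if ordered else ""
    queries = []
    for offset in range(0, total, slice_size):
        queries.append(
            "SELECT ?item ?commonscat WHERE { ?item wdt:P373 ?commonscat } "
            f"{order_clause}LIMIT {slice_size} OFFSET {offset}"
        )
    return queries


if __name__ == "__main__":
    for query in sliced_p373_queries(250, 100):
        print(query)
```

Each string would then be sent to the query service in turn; the sketch only builds the strings and makes no network calls.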

Yup, it's the one in the script at [12], which runs daily. These seem to be stragglers that have missed many runs (~100 in some cases). I have a new script that runs through a specified commons category tree looking for items without Wikidata items linked to them, and then searches Wikidata for the category name to find potential matches - it's maybe 70% accurate at the moment, so I'm running it manually (currently through commons:Category:Long Island, see my latest edits from this account). Most don't have existing P373 links, though. Thanks. Mike Peel (talk) 21:31, 25 July 2018 (UTC)[reply]

Hmm, I added a check to see if the image (P18) is in the commons category, which seems to make it ~100% accurate, so maybe this new script should be botified... Thanks. Mike Peel (talk) 22:37, 25 July 2018 (UTC)[reply]
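The image check described above could look roughly like the following. This is a sketch under assumptions: the function name is hypothetical, the list of files in the category is taken as already fetched, and the file names in the example are made up.

```python
def image_confirms_match(item_image, category_files):
    """Return True if the item's image (P18) file is one of the category's files.

    item_image: file name from the item's P18 statement, or None.
    category_files: iterable of file names found in the Commons category.
    """
    if not item_image:
        return False

    def normalise(name):
        # Treat underscores and spaces alike and drop any "File:" prefix.
        name = name.strip().replace("_", " ")
        if name.lower().startswith("file:"):
            name = name[5:].strip()
        # MediaWiki titles are case-insensitive in the first character only.
        return name[:1].upper() + name[1:]

    return normalise(item_image) in {normalise(f) for f in category_files}


if __name__ == "__main__":
    print(image_confirms_match("File:example_house.jpg", ["Example house.jpg"]))
```

A candidate that fails this check is not necessarily wrong; it just falls back to needing human review.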

@Mike Peel: Nice! Go for it. Jheald (talk) 22:39, 25 July 2018 (UTC)[reply]

It's now at Wikidata:Requests for permissions/Bot/Pi bot 8! Thanks. Mike Peel (talk) 22:53, 25 July 2018 (UTC)[reply]

This is odd ... pi bot's run out of places to add the infobox! It's finally finished looking through all of the cases where we have commons sitelinks (at least, as of the quarry result from yesterday - I've added a few more since) to add them where it can. I can now (manually) fetch a new list from quarry every so often to catch the latest additions and places where things have changed in the category that might allow the infobox to be added, but that's going to be a lot less frequent than running the bot 24/7! Thanks. Mike Peel (talk) 09:44, 10 August 2018 (UTC)[reply]

@Mike Peel: Yea! That's a fantastic milestone. Looks like Pi bot 8 is still finding some new sitelinks to make, so things are still moving. It might be interesting to produce a breakdown of numbers -- ie number of items with Commonscat sitelinks; number with infoboxes; and then the numbers for each reason a Commonscat with a sitelink doesn't have an infobox, to give an idea of how the numbers fall, and whether everything is accounted for. The focus now I guess moves to creating Wikidata items for Commons categories without sitelinks -- ie looking at the Commons categories for artists or engravers or cartographers, or listed buildings, or whatever, can we identify ones that don't have Wikidata items but should? Also the time may soon have come to start systematically creating items for categories of the form "X by Y", and their intersection subcategories -- eg "cartographers by country", "cartographers from Russia" etc., with appropriate category combines topics (P971) statements. For example, for the "Old maps of... " categories, it would be *exceedingly* useful to be able to query for which places had an old maps category and which didn't, if one had used eg OpenRefine to match a list of map subjects to a list of places. But perhaps that is something that will need to be eased into gently, use-case by use-case. Jheald (talk) 10:51, 10 August 2018 (UTC)

There are definitely more sitelinks that can be added ... as well as pi bot 8 running automatically, I've also been running the script manually without the image requirement, and that's also finding quite a few cases (but at the ~70% accuracy level) that I've been adding with my user account. But after that, starting to add new wikidata items will definitely be the way to go - as you say, people and monuments are good ones to start with, as intersection categories are going to be a lot more controversial... Thanks. Mike Peel (talk) 14:05, 10 August 2018 (UTC)[reply]

I found some more sitelinks to be added by removing "MINUS {?item wdt:P910 [] }" from the P373 query and adding some extra Python code to add the sitelink to the category item rather than the topic item - Pi bot's added ~700 of these so far and spot-checks seem to be OK, so I'll leave it running overnight. Relaxing the query a bit more may give us some extra sitelinks that can be added (and maybe in the long term it would be best to deprecate P373 in favour of the sitelinks...) Thanks. Mike Peel (talk) 00:26, 15 August 2018 (UTC)[reply]
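A sketch of the two changes described above, under stated assumptions. Only the MINUS clause is quoted from the comment; the rest of the query text, both function names and the example item IDs are hypothetical. The first function toggles the MINUS clause; the second picks the category item (the value of P910) as the sitelink target when there is one, and otherwise the topic item itself.

```python
def p373_query(skip_items_with_topic_category=True):
    """Build a simplified P373 query, optionally keeping the MINUS clause."""
    lines = [
        "SELECT ?item ?commonscat WHERE {",
        "  ?item wdt:P373 ?commonscat .",
    ]
    if skip_items_with_topic_category:
        lines.append("  MINUS {?item wdt:P910 [] }")
    lines.append("}")
    return "\n".join(lines)


def sitelink_target(item_id, claims):
    """Choose which item should receive the Commons category sitelink.

    claims: dict mapping a property ID to a list of value item IDs.
    """
    category_items = claims.get("P910", [])
    return category_items[0] if category_items else item_id


if __name__ == "__main__":
    print(p373_query(skip_items_with_topic_category=False))
    print(sitelink_target("Q_TOPIC", {"P910": ["Q_CATEGORY"]}))
```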

Eep!

I just realized that my comment on the Fashion project Talk page might have come off as a slap at you. It was not!!! You have been more helpful than anyone with my sticky little problems. My frustrations are focused elsewhere. - PKM (talk) 20:30, 8 August 2018 (UTC)[reply]

I didn't pick up on any frustration directed at anyone, just an offering up of a particularly knotty question to the community for thought and comments, so no worries at all from this end. Jheald (talk) 21:16, 9 August 2018 (UTC)[reply]

Q21385082

Hi! Something is wrong with this item. The journal is replaced by itself. Regards --Succu (talk) 10:19, 26 August 2018 (UTC)[reply]

@Succu: Thanks, good catch.

The question here is going to be whether we want one item or two for Stuttgarter Beiträge zur Naturkunde (< 1957 - c.1970) and Stuttgarter Beiträge zur Naturkunde. Serie A: Biologie (c. 1973 - 1999> ).

User:Pigsonthewing originally attached the BHL scans for both of the above to Q21385082 (diff); but probably we want a separate item for Serie A, in the years after the journal split into parts A, B, and C.

One question I never know the answer to is what to put for the end date for the original undivided journal in cases such as these. Do we consider that it ceased in c.1970, to be replaced by the three sub-journals? Or do we consider that it continued, with Serie A, B and C as parts of it?

It would be nice if the style guide at the Periodicals project could give a bit of guidance on questions such as this. Jheald (talk) 17:55, 26 August 2018 (UTC)[reply]

From German National Library (Q27302): Stuttgarter Beiträge zur Naturkunde (1961-1972) replaced by Stuttgarter Beiträge zur Naturkunde. A, Biologie (1973-) and Stuttgarter Beiträge zur Naturkunde. Serie B, Geologie und Paläontologie (1972-2007). Stuttgarter Beiträge zur Naturkunde. Serie C, Wissen für alle (1974-) is a new one, not a split. Hope that helps. --Succu (talk) 18:31, 26 August 2018 (UTC)[reply]

And there is Stuttgarter Beiträge zur Naturkunde aus dem Staatlichen Museum für Naturkunde in Stuttgart (1957-1972) replaced by A/B/C... --Succu (talk) 18:43, 26 August 2018 (UTC)[reply]

Cambridge Wikidata Workshop 20 October

I mailed you an invitation just now, but the address bounced. Charles Matthews (talk) 14:48, 25 September 2018 (UTC)[reply]

Game

Feel like playing a game? [13] now has 'Commons category matches' based on suggestions from pi bot using the code we talked about in #P373 values that aren't being caught by pi bot's query (but without image matching). I'll announce it more widely soon, but thought you might like a preview / to do some testing. Thanks. Mike Peel (talk) 18:08, 5 November 2018 (UTC)[reply]

@Mike Peel: Thanks for the invite! My time is quite limited at the moment -- two rather big time-consuming things IRL, both having to be dealt with this week. But I'll try and take a look. I'm always a bit apprehensive of these games -- my fear (after clearing up a lot of bad results from Magnus's 'proposed merge' game) is that some players can be a bit casual about matches, or be simply unaware / not wary of how many very similarly named but actually different things the game may throw at them. For example, the first match it's offering me is St. Bernard's Chapel (Q7587280) (Building in Patterson, United States of America) vs c:Category:St. Bernard's Chapel (Heiligenkreuzerhof). I fear that far too many people might just reflexively tick 'yes' and go straight to the next, based just on the similarity of names; even though presumably unless the names were very very similar, the match wouldn't have been offered. So IMO before starting a game like this, people need to be very strongly schooled to approach potential matches from a position of extreme scepticism. The matches will look plausible, or they wouldn't be offered. The aim of the game is not to tick yes. Rather, it is to identify which potential matches need to be rejected. This I think may need to be quite strongly belaboured, because IMO it may be quite a distance from the default mind-set with which people may approach games. People like to say "yes", and probably feel that it is every "yes" that is helping Wikidata. But in reality, the costs of a false-positive are far worse than a false-negative. False-positives that get into Wikidata from a game like this can be very insidious, and can be a real pain to fix (even if perhaps in this case not quite as bad to fix as having to manually unmerge and separate out statements on wrongly-merged items, which can be an absolutely monumental pain).

So my instant reaction looking at this is to ask: have you done absolutely everything you can to screen out bad matches before they can get offered? Eg for geographical items, are there co-ordinates you can perhaps check against a bounding-box of a super-category? Anything that can be done to stop items being matched from different counties or even different countries is worth doing.

And to re-iterate my other key request: please try to instill a default standpoint of extreme scepticism when judging potential matches, in any player of the game. The request should be: "These matches look plausible. But are they really? Please help us to reject the bad ones" -- not "please help us to find the good ones". Jheald (talk) 18:37, 5 November 2018 (UTC)[reply]
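The bounding-box screen suggested above could be sketched as follows. This assumes the item's coordinates and a bounding box for the super-category are already known; the function name, the (south, west, north, east) layout and all numbers in the example are illustrative only.

```python
def in_bounding_box(lat, lon, bbox):
    """Return True if (lat, lon) lies inside bbox = (south, west, north, east).

    All values are in degrees. A box whose west edge is greater than its
    east edge is treated as crossing the antimeridian.
    """
    south, west, north, east = bbox
    if not (south <= lat <= north):
        return False
    if west <= east:
        return west <= lon <= east
    return lon >= west or lon <= east


if __name__ == "__main__":
    # Rough, made-up box for a central European super-category.
    box = (46.0, 9.0, 49.5, 17.5)
    print(in_bounding_box(48.2, 16.4, box))   # a point inside the box
    print(in_bounding_box(41.5, -73.6, box))  # a point on another continent
```

A candidate whose item falls outside the box need not be offered to players at all, which removes cross-country mismatches before anyone can tick 'yes'.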

Thanks for the feedback. I've modified the description to "Match Commons categories with Wikidata items, and add the commons sitelink to Wikidata.
These matches look plausible. But are they really? Please help us to reject the bad ones by clicking "No" - and if you are sure that it is right, add the link to Wikidata using "Match". If you are not sure, press "Skip".
Bug reports and feedback should be sent to <a href="https://commons.wikimedia.org/wiki/User_talk:Mike_Peel">Mike Peel</a>." - it looks like it might take a short time to show up, though.

So far about 70% of the matches have been rejected, which isn't quite as good as I was hoping, but shows that people (or testers at least) aren't just clicking 'yes' blindly. By their very nature, these are matches that I'm not 100% sure of - otherwise I'd get pi bot to add them automatically. They are ones that need human review, and this seems like a good way of making that easier for people to do, and to get more people doing it. The infobox should make it easier-than-usual for people to spot false matches later on (as people have been spotting the bad bot-added ones). So let's see how it goes. Thanks. Mike Peel (talk) 22:46, 5 November 2018 (UTC)[reply]

@Mike Peel: Just wanted to say that I had another play with this, having seen it in the weekly news, and really enjoyed it. Well done! It's very slick, and I really like all the further links for investigation of 50/50 cases. I hope the accuracy is good, and people get used to pressing the 'No' button. One thing that might be interesting would be to re-offer a proportion to other players, and see how often they agree (and whether that is the same across different sorts of objects, or whether some are more likely to get disagreeing matches). There may still be more that could be done towards auto-matching, eg: species with exactly matching name probably good; geo-locations from different parts of America probably bad; ships compared to rivers probably not a match. But overall, looking really really good (and quite addictive!). Great stuff! Jheald (talk) 22:26, 12 November 2018 (UTC)[reply]
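One way the re-offering idea could be measured, as a sketch only: the record layout, the type labels and the function name are assumptions, and the game may store its decisions quite differently. For each candidate judged by at least two players, it records whether all verdicts agree, then reports the agreement rate per type of object.

```python
from collections import defaultdict


def agreement_by_type(decisions):
    """Compute the share of re-offered candidates on which all players agreed.

    decisions: iterable of (candidate_id, item_type, player, verdict) tuples.
    Returns a dict mapping item_type to an agreement rate between 0 and 1.
    Candidates judged by only one player are ignored.
    """
    verdicts = defaultdict(list)
    types = {}
    for candidate_id, item_type, _player, verdict in decisions:
        verdicts[candidate_id].append(verdict)
        types[candidate_id] = item_type

    agreed = defaultdict(int)
    total = defaultdict(int)
    for candidate_id, seen in verdicts.items():
        if len(seen) < 2:
            continue
        total[types[candidate_id]] += 1
        if len(set(seen)) == 1:
            agreed[types[candidate_id]] += 1
    return {t: agreed[t] / total[t] for t in total}


if __name__ == "__main__":
    sample = [
        ("c1", "species", "A", "yes"),
        ("c1", "species", "B", "yes"),
        ("c2", "place", "A", "yes"),
        ("c2", "place", "B", "no"),
    ]
    print(agreement_by_type(sample))
```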

Belated thanks for this! I'm glad you like it. There's quite a backlog (over 30,000 links to be checked right now, with many more still to add to the database), so I'm reluctant to re-offer past candidate links to players (particularly as an existing link is quite obvious through the presence of the wikidata infobox). The ultimate test is whether they later get reverted or changed, and I'm keeping a record of the decisions so that can be checked later if we want. I'm kinda tempted to write some bot code that will automatically add species with exactly matching names at some point, as those are nearly always a match - but the other cases are much more tricky to handle in code, and it's probably heading towards the machine learning arena...

BTW, the next evolution of pi bot will probably be to start bot-creating new Wikidata entries for obvious candidates, where all of the search candidates have been ruled out through the game. That way, there's a high probability that it's not creating duplicates - the main question then is going to be about notability, which ideally the presence of the commons category would resolve. Thanks. Mike Peel (talk) 23:54, 23 November 2018 (UTC)[reply]

Version 1 of that next evolution is now at Wikidata:Requests for permissions/Bot/Pi bot 10. Depending on how that goes, I'll do something similar for categories with authority control IDs. Thanks. Mike Peel (talk) 22:13, 1 December 2018 (UTC)[reply]

GIGO?

This. -- Tuvalkin (talk) 07:30, 6 December 2018 (UTC)[reply]

@Tuvalkin: Indeed. The underlying problem stemmed from the en-wiki article Federico Lacroze, which includes a CommonsCat template pointing to c:Category:Tramway Rural, a project that Lacroze was particularly associated with.

On the basis of this User:EdgarsBot added a Commons category (P373) to the Wikidata item in September 2016, and given that P373 I then added a sitelink in March this year (which User:PiBot re-added as soon as you removed it, and would have kept re-adding until the P373 was removed). On the basis of the sitelink, PiBot also added an infobox to the Commons category.

I have created a new Wikidata item for Tramway Rural (Buenos Aires) (Q59490136) (though perhaps it should be merged with Ferrocarril Central de Buenos Aires (Q5445404)), and a new Commons category c:Category:Federico Lacroze, which should sort out most of the issues. Jheald (talk) 11:13, 6 December 2018 (UTC)[reply]

I understand what happened and why it did. That’s why I said GIGO. To the outside visitor, Wikidata adds a veneer of respectability and truthiness to the messy, human reality of Wikipedia and its sister projects, with this kind of dismal result. Another example, the one I caught today: A commune/village in Portugal gets assigned to the wrong municipality because some algorithm decided that it was cuter to trust the French and Dutch wikipedias about it instead of the Portuguese one. Then another user noticed it and tried to fix the mess but unsurprisingly your interface (which is something completely different to an experienced wiki user) was too frustrating and he just changed the description.

And the problem is this, times thousands of such cases. It could be thought something acceptable for the typical Wikimedia projects user, used as we are to having sister projects of lesser quality (Wikivoyage, Wikiversity, Wikinews and so on), but Wikidata gets no slack: 1., It’s obviously so overfunded and overhyped that utmost perfection is expected and demanded (no harping about how Wikidata is just a sister project like any other, please, because you’re not: never heard of Wikisource office hours nor of Wiktionary tech bulletins delivered weekly), and 2., it’s been allowed to have its data transcluded in other projects, often at the expense of local data curation mechanisms of much higher quality, becoming an unwelcome and problematic conduit for all kinds of spam, vandalism, and (as is the case here) gross incompetence. And as time goes by it becomes clear that this is not a bug, but a feature.

What I want for Christmas? I want Wikidata gone. Tuvalkin (talk) 00:54, 9 December 2018 (UTC)[reply]

re: Merging items and category pages

Hi :-) thank you for correcting and for noticing. I know the rules about merging items of categories and articles, that was a mistake (maybe that's because I saw that the only link was a Commons category, which of course is something special and can appear in article items). I'll go check if I made some similar mistakes today. Bye! --Superchilum (talk to me!) 15:18, 7 December 2018 (UTC) p.s.: your correction was not enough, you had to rollback my edits ;-)

@Superchilum: No, the merge was okay I think, because the only sitelinked category was on Commons. It just needed a little more cleaning up after the merge. Jheald (talk) 15:26, 7 December 2018 (UTC)[reply]

Oh, ok... you think that was the best way? --Superchilum (talk to me!) 10:15, 8 December 2018 (UTC)

Barnstar

The Wikidata Barnstar
Thank you for helping link Wikidata and Commons together! Mike Peel (talk) 12:52, 21 February 2019 (UTC)

OSM zoom level

Hi Jheald! I noticed OpenStreetMap zoom level (P6592) hasn't yet been used, and I was wondering if you have plans to start using it, since I'm excited to see this being applied to items :) NMaia (talk) 01:42, 29 March 2019 (UTC)[reply]

@NMaia: Hi! I'm working towards an upload of maps to Commons with items on Wikidata, which I hope to start creating in about mid-May. OpenStreetMap zoom level (P6592) will be one of the fields I will be adding.

In the meantime, P6592 could be calculated for any item for a map that we have a bounding box for -- though at the moment there are only 46. Bounding boxes plotted here: tinyurl.com/yyprjx9x Jheald (talk) 09:56, 29 March 2019 (UTC)[reply]
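A sketch of how a zoom level might be derived from a bounding box, assuming the usual Web Mercator tiling (256-pixel tiles, world width doubling at each zoom level) and an assumed viewport size. The convention P6592 itself expects is not stated here, so the viewport defaults, the clamp to zoom 19 and the function names are all assumptions.

```python
import math

MAX_MERCATOR_LAT = 85.0511  # approximate latitude limit of Web Mercator


def mercator_y(lat_deg):
    """Web Mercator y for a latitude in degrees, in units where the world spans 2*pi."""
    lat = max(-MAX_MERCATOR_LAT, min(MAX_MERCATOR_LAT, lat_deg))
    return math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))


def zoom_for_bbox(south, west, north, east,
                  width_px=256, height_px=256, tile_px=256, max_zoom=19):
    """Largest integer zoom at which the box still fits in the viewport."""
    lon_span = east - west
    if lon_span < 0:
        lon_span += 360.0  # box crosses the antimeridian
    y_span = mercator_y(north) - mercator_y(south)

    candidates = []
    if lon_span > 0:
        candidates.append(math.log2(width_px * 360.0 / (tile_px * lon_span)))
    if y_span > 0:
        candidates.append(math.log2(height_px * 2 * math.pi / (tile_px * y_span)))
    if not candidates:
        return max_zoom  # degenerate box (a single point)

    zoom = math.floor(min(candidates))
    return max(0, min(max_zoom, zoom))


if __name__ == "__main__":
    print(zoom_for_bbox(-0.5, -0.5, 0.5, 0.5))
```

Taking the smaller of the two candidate zooms makes sure both the east-west and the north-south extent of the map fit.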

Just an FYI, this might interest you as well. NMaia (talk) 07:28, 10 April 2019 (UTC)[reply]

Reminder: Do not add unreferenced statements to Wikidata

Hi Jheald,

Please bear in mind that statements you add to Wikidata need, if challenged, to be supported by a reference. See Help:Sources about ways to do that. Please avoid adding statements to Wikidata as a mere service to an acquaintance of yours. It would be highly detrimental to the overall project if we were to go there.

If you are unsure if a statement you are trying to reference is supported by the reference you add, please discuss it first. --- Jura 15:55, 7 April 2019 (UTC)[reply]

@Jura1: Enough of this. This has to stop.

You took this to Wikidata:Requests_for_deletions/Archive/2018/08/06#Q37822184_etc.. The decision unanimously went against you, 4-1. The community did not delete the item, but ruled that it was acceptable, to model membership of the Riigikogu in this way. You asked for a decision; please now respect it.

Whatever you may feel about the item, edit-warring is never the right way to go. Nor is low-level disruption. If you don't agree with the data modelling, take it to the talk page. If you still don't agree, take it to a community page. But do not repeatedly try to break the model by pushing edits on the data page. It is especially unacceptable to create wide-scale constraint violations to push your beef about the data modelling.

There are other ways to go, if you want the community to take a proper overall look at this. But this kind of ongoing low-level disruption at a data level is not acceptable, and edit-warring of the kind you have been engaged in on member of the Riigikogu (Q33129158) is absolutely unacceptable.

Consider yourself warned. Jheald (talk) 16:18, 7 April 2019 (UTC)[reply]

Please refrain from personal attacks merely because you don't want to or can't provide references for your edits. Consider yourself warned as well. --- Jura 16:21, 7 April 2019 (UTC)

Wrong British Library system number (P5199) in edition number (P393) reference

It seems the value for British Library system number (P5199) in the statement at Q63315051#P393 is wrong. Letting you know in case you want to investigate the source and extent of the error. Cheers, --Marsupium (talk) 09:40, 16 May 2019 (UTC)[reply]

@Marsupium: Thanks, that's a really useful catch. I'll look into it, and see whether much else is affected. At least the actual text extracted seems to be the right one, so that's something. Jheald (talk) 10:36, 16 May 2019 (UTC)

Seems 28 items were affected, all on edition number (P393) statements (https://w.wiki/42R), in a batch of edits I made on 11 May using wikidatajs/cli for the first time. So an error I think in the particular script that was preparing those edits.

@Marsupium: I'm incredibly grateful, but how on earth did you spot it? Jheald (talk) 10:55, 16 May 2019 (UTC)[reply]

I'm happy it was useful to you, I wanted to jump to the BL's record and took the first link on my screen and it brought me to an unexpected record :-) Best, --Marsupium (talk) 10:59, 16 May 2019 (UTC)[reply]

Reminder: discuss your edits

Hi Jheald,

If you are not interested in participating in discussing your edits on property constraints, it's probably preferable if you don't edit them. Your conduct on Property:P373 is not helpful. If you are unsure about the way constraints work, don't hesitate to ask. --- Jura 11:14, 27 July 2019 (UTC)[reply]

Quarry oddity

Tracked in Phabricator
Task T233520

Hi, hope everything's well with you? Am I right in remembering that you came up with the original Quarry query code that I'm using at [14]? I've spotted some Commons categories with Wikidata links but no infoboxes, and it looks like the query isn't finding them. Some examples: commons:Category:Broadway_East,_Baltimore, commons:Category:Buddhist_temples_in_Lamphun_Province, commons:Category:Civil_law_notaries, commons:Category:Climate_change_conferences, commons:Category:Former_components_of_the_Dow_Jones_Industrial_Average, commons:Category:Dukes_of_the_Archipelago, commons:Category:Eastern_Catholic_orders_and_societies, commons:Category:English_people_of_Turkish_descent. They seem to be fine in the Python code, e.g., the bot added commons:Category:Buddhist_temples_in_Ubon_Ratchathani_Province when I manually pointed it to that category, so the problem seems to be that the query doesn't return them. Any ideas how it could be tweaked to catch these cases? Thanks. Mike Peel (talk) 09:10, 22 September 2019 (UTC)[reply]

@Mike Peel: Hi Mike, here are a couple of test queries to try to find out what's going on: quarry:query/39085 vs control quarry:query/39088.

Unfortunately they don't seem to be completing (not sure why, but perhaps those LEFT JOINs are more costly than I understood).

But when I did get an earlier version to run (just joining one of the templates), it seemed that the categories above are not showing a Wikidata sitelink in the page_props. Which is odd, because they are showing a Wikidata sitelink, and interwiki links, when the Commons category page itself is displayed. But perhaps this is drawn from a different table (?).

It therefore looks to me as if the SQL updater may have failed to add the sitelink to the page_props table in the cases above. That's something that would need some detailed investigation by the service maintainers to look into how widespread the dropped updates may be, and whether there seem to be any patterns or explanations for them, or any other clues to stop them happening in future. But probably this is one for a phabricator ticket, or to be flagged to the dev team. Jheald (talk) 11:52, 22 September 2019 (UTC)[reply]
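The kind of check described above can be sketched against a toy copy of the two tables. The table and column names follow the standard MediaWiki schema (page, page_props, with the sitelink stored under the 'wikibase_item' property name); the rows, the helper names and the placeholder item ID are made up, and the real Quarry queries also join the template tables, which is left out here.

```python
import sqlite3


def categories_missing_wikibase_item(conn):
    """List category pages (namespace 14) with no 'wikibase_item' page prop."""
    sql = """
        SELECT p.page_title
        FROM page AS p
        LEFT JOIN page_props AS pp
               ON pp.pp_page = p.page_id
              AND pp.pp_propname = 'wikibase_item'
        WHERE p.page_namespace = 14
          AND pp.pp_page IS NULL
        ORDER BY p.page_title
    """
    return [row[0] for row in conn.execute(sql)]


def demo_connection():
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE page_props (pp_page INTEGER, pp_propname TEXT, pp_value TEXT);
        INSERT INTO page VALUES (1, 14, 'Civil_law_notaries');
        INSERT INTO page VALUES (2, 14, 'Hypothetical_category_with_sitelink');
        INSERT INTO page VALUES (3, 0, 'Hypothetical_gallery_page');
        INSERT INTO page_props VALUES (2, 'wikibase_item', 'Q_PLACEHOLDER');
        INSERT INTO page_props VALUES (1, 'hiddencat', '');
    """)
    return conn


if __name__ == "__main__":
    print(categories_missing_wikibase_item(demo_connection()))
```

Putting the property-name condition in the ON clause rather than the WHERE clause is what keeps categories that have other page props, but no sitelink prop, in the result.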

@Mike Peel: Confirmed by Lucas: [15]. Jheald (talk) 12:38, 22 September 2019 (UTC)[reply]

Thanks for looking into it. I've posted it on Phabricator at [16]. Thanks. Mike Peel (talk) 13:09, 22 September 2019 (UTC)[reply]

New page for catalogues

Hi, I created a new page where I started collecting sites that could be added to Mix'n'match and I plan to expand it with the ones that already have scrapers by category. Feel free to use, expand. Best, --Adam Harangozó (talk) 09:53, 17 October 2019 (UTC)[reply]

Writing systems

A lot of the strange results were because "partitur (musical score)" was a subclass of writing system (that's how film scores got there). I've removed that, but the current Query lag means I can't really see how many of the bogus items will disappear from your list. I'm working on a few other things (fonts) as well. - PKM (talk) 21:39, 13 November 2019 (UTC)[reply]

All of the Greek manuscripts were changed from <instance of> "uncial script" to <writing system> "Uncial script", so the list looks much much better now. I'm not sure typefaces belong here (via <subclass of> typeface (Q65770200) per TDKIV Czech Terminology Database of Library and Information Science (Q11284309)) - we ought to be consistent one way or the other but we're not there yet. - PKM (talk) 01:47, 14 November 2019 (UTC)[reply]

@PKM: Thank you so much for digging in to this -- once again, you are utterly great!

Yes: there would seem to be a distinction to be made between "writing system" (eg a particular set of symbols/characters), and the presentation of that writing system (eg uncial script (Q784235) or secretary hand (Q16933853) or blackletter (Q213686), also typefaces like Arial or different versions of Fraktur (Q148443)). A guide for where to draw the line might be whether Unicode encodes the presentation as a separate script, or as a class of fonts of a particular script (eg en:Fraktur says Unicode encodes that as a class of fonts). Yet we may still want to make uncial script (Q784235) a value of writing system (P282). So do we need a subclass of writing system (Q8192) to group together distinct sets of symbols/characters, that are more than just a particular way of writing a particular set of characters? Or perhaps classes of ways of writing a particular set of symbols/characters should be grouped together, that could then be MINUS'd out of a query for writing system, if one wanted just different sets of characters? Hmmm... What are the things gathered at Atlas of Endangered Alphabets (Q74568826), and how do we class that group? Jheald (talk) 10:29, 14 November 2019 (UTC)

It was pretty easy for me to pick this up, as I did a bunch of work on medieval scripts and related items back in early October, so I was familiar with the territory. It doesn't help that AAT distinguishes between "alphabets" and "writing systems", but says "writing systems" <meaning overlaps with> "scripts". Approaching the vexed issue of typeface classification from a Unicode perspective sounds like a great idea, and it never would have occurred to me. :-)

This would be a fun project to work on if we can find a few people who have time/interest to actively participate in discussions of best practices. (What's your time like these days?) I'd be happy to start a project structure if I wouldn't be talking to myself in the dark. - PKM (talk) 19:53, 14 November 2019 (UTC)[reply]

located in the administrative territorial entity (P131) for Scotland

Hello! I noticed that you added some located in the administrative territorial entity (P131) statements to Scottish localities like Dumfries (Q652035) or Lockerbie (Q216045). Since we only need to add the most local admin territory here, we should use start time (P580) and end time (P582) to indicate when these entities started and stopped being the most local admin territory for the locality. According to Wikipedia, the local government functions of Scottish civil parish (Q5124673) were abolished on May 16, 1930, so they are not administrative territorial entity (Q56061) from that date. Can you add end time (P582) in all these cases? And as statistical territorial entity (Q15042037) they should go to part of (P361) too, as we do for other European statistical territorial entity (Q15042037), for lack of a more specific property. By the way, I find it unnecessary to use object has role (P3831) for located in the administrative territorial entity (P131). It would be useful for part of (P361), where we can indicate statistical territorial entity (Q15042037), military district (Q580112) or electoral unit (Q192611), but on located in the administrative territorial entity (P131) it just duplicates instance of (P31) for the statements. Сидик из ПТУ (talk) 17:33, 23 November 2019 (UTC)

Hi User:Сидик из ПТУ. Thanks for getting in touch. Very happy to have another think about this. But given this data modelling is currently used on 70,000 items with a Historic Environment Scotland ID (P709), plus a large number of settlements and natural features as well, can I ask you whether the present modelling is causing you any *urgent* or *practical* problems in the form it is in at the moment? I currently have a very full pipeline of edits already to put through QuickStatements, so I would prefer if we could think about this a bit first, and be sure we get it right. Also pinging @Tagishsimon: and @Andrew Gray: for their thoughts, since they've respectively been doing a lot of work in this area, and have a good familiarity with the historical geography of Scotland.

In principle I am not opposed to adding end time (P582) = 1930 to the P131 statements; though it does seem a little odd, because the civil parishes did go on existing after that point (and indeed do to the current day), both in the public mind and local organisations, and as an organisational unit for records, eg for things like local tax valuations, and (perhaps most immediately relevantly for us) in the recording of things like historic buildings and monuments -- because (right until very recently, and even then not consistently or uniformly), nothing really ever replaced them at that level of locality. So for example the official sites like [17] or [18] or [19] describing eg Gairloch, Charlestown, Charlestown House (Q17828775) all note the civil parish, and the information is used as the basis of our categorisation on Commons for images coming in from eg Wiki Loves Monuments, as well as for other sources such as Geograph; as well as for the system of pages like en:List of listed buildings in Gairloch, Highland on en-wiki. All this remains the case even for buildings and monuments constructed way after 1930, or even in this century, because nothing so systematic has ever replaced the civil parishes at this level of localness.

So what do we do, if we have eg a dam constructed in 1960, and we want to record its civil parish? Should we write inception (P571) = 1960, but located in the administrative territorial entity (P131) = ..., end time (P582) = 1930 ? It does look a little odd. But we do need to adopt a systematic approach for all the buildings and monuments, so the information can be extracted systematically. It's also useful to use P131, so that we can immediately access the information for the pre-1975 counties, just using P131*.

As for part of (P361), I am not entirely sure what you are requesting; but I would be resistant to using it on heritage buildings and monuments items, because P361 already has a more fundamental role in that area, gathering together buildings or monuments (or their constituent parts) that may naturally be collected together into a single group, or may have been listed together under a single listing. I would as far as possible want to avoid using P361 with any other meaning in that context to avoid confusion.

Finally, as for object has role (P3831) = Scottish civil parish (Q5124673), I would want to leave that in place. The different types of administration, each with different cross-cutting boundaries, existing concurrently in the past in Scotland could be quite complex, particularly in the 19th century, with eg parishes existing alongside Scottish burghs, poor-law combinations, civil registration districts, and probably more things I don't even know about, each sometimes at a higher and sometimes at a lower level. So for the moment, unless it's causing some particular immediate difficulty, I'd like to leave things as they are. Jheald (talk) 21:44, 23 November 2019 (UTC)

I think ruwiki has some problems with its infobox. It seems to make some incorrect assumptions about P131 leading to issues there. I think for France, the same user was also complaining. --- Jura 22:27, 23 November 2019 (UTC)[reply]

"Only need to add the most local admin territory" is not in my view correct. Reports I use to get sets of Scottish items will tend to time out if I have to calculate the (missing) higher value from the lower value. Redundancy is good; as is labelling each object's role. There is more to be done by way of applying differential ranks, but afaik we lack a suitable tool for that. --Tagishsimon (talk) 22:39, 23 November 2019 (UTC)

Look, we have documentation for located in the administrative territorial entity (P131) where this rule ("only need to add the most local admin territory") is clearly stated. The property is also declared a transitive Wikidata property (Q18647515) and a hierarchy (Q188619). On that basis we created an algorithm that builds sequences from the village up to the state, such as Cambridge (Q350), Cambridge (Q21713103), Cambridgeshire (Q21272276), Cambridgeshire (Q23112), East of England (Q48006), England (Q21), United Kingdom (Q145), depending on the date. It now works instantly, and not only on Russian Wikipedia (see the usage of Q18008533). So what has now been done with the French settlements is simply a violation of the agreements recorded in the property description. With this approach, a separate algorithm for compiling such sequences would have to be developed for each country. That is of course a much greater evil than the difficulties with queries described above. In effect, it removes the ability to trace the hierarchy through located in the administrative territorial entity (P131) and requires a person to know it already. Сидик из ПТУ (talk) 08:12, 24 November 2019 (UTC)

Yes, there is a certain urgency. Russian Wikipedia uses the transitive Wikidata property (Q18647515) and hierarchy (Q188619) declarations on located in the administrative territorial entity (P131) to build detailed birthplaces in infoboxes, as for Anna Sloan (Q3425063) [20]. start time (P580) and end time (P582) let us do this as of the person's birthdate, so we need to know which entity was the closest administrative territorial entity (Q56061) for Dumfries (Q652035) on February 5, 1991. We expect only one value on any given date, and for the 20th century (Q6927) that can clearly be achieved. I do not dispute that civil parishes continue to be used for various purposes in Scotland, but we are talking about located in the administrative territorial entity (P131), which is used only for administrative territorial entity (Q56061) and not for historical region (Q1620908), cultural region (Q3502482), religious administrative territorial entity (Q20926517) (see located in the ecclesiastical territorial entity (P5607)) or statistical territorial entity (Q15042037) (see the different from (P1889) statement). Just as you want to avoid using P361 with any other meaning, I want to avoid using P131 with any meaning other than administrative territorial entity (Q56061). But using object has role (P3831) with part of (P361) would be the best solution until we create a new special property for historical region (Q1620908) or cultural region (Q3502482). And why not use located in the ecclesiastical territorial entity (P5607) in this case? In any case, what would you do if these parishes were used for the categorisation of cultural objects, but had never been administrative units? Сидик из ПТУ (talk) 08:12, 24 November 2019 (UTC)
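To make the date-dependent lookup described above concrete, here is a minimal sketch in Python. It is not the actual infobox algorithm: the `P131Statement` structure, the placeholder identifiers (`SETTLEMENT`, `DISTRICT_X` and so on) and the choice to treat the end time as exclusive are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class P131Statement:
    """One P131 value with optional qualifiers (hypothetical structure)."""
    value: str                    # identifier of the containing entity
    start: Optional[date] = None  # start time (P580); None means open-ended
    end: Optional[date] = None    # end time (P582); None means open-ended


def valid_on(statements: List[P131Statement], when: date) -> List[str]:
    """Return the values whose [start, end) interval contains `when`."""
    return [
        s.value
        for s in statements
        if (s.start is None or s.start <= when)
        and (s.end is None or when < s.end)
    ]


def chain_on(item: str, when: date,
             p131: Dict[str, List[P131Statement]]) -> List[str]:
    """Walk upward from `item`, following the single value valid on `when`.

    Stops when no value is valid; raises if several are valid or a cycle appears.
    """
    chain = [item]
    while True:
        values = valid_on(p131.get(chain[-1], []), when)
        if not values:
            return chain
        if len(values) > 1:
            raise ValueError(f"{chain[-1]} has {len(values)} values on {when}")
        if values[0] in chain:
            raise ValueError(f"cycle detected at {values[0]}")
        chain.append(values[0])


# Placeholder data only: these are not real items or real boundaries.
EXAMPLE = {
    "SETTLEMENT": [
        P131Statement("DISTRICT_X", end=date(1996, 4, 1)),
        P131Statement("COUNCIL_AREA_Y", start=date(1996, 4, 1)),
    ],
    "DISTRICT_X": [P131Statement("REGION_Z", end=date(1996, 4, 1))],
    "REGION_Z": [P131Statement("COUNTRY")],
    "COUNCIL_AREA_Y": [P131Statement("COUNTRY")],
}

birth_chain = chain_on("SETTLEMENT", date(1991, 2, 5), EXAMPLE)
```

The sketch raises an error when more than one value is valid on the date, which is exactly the "only one value on a date" expectation; a missing qualifier anywhere in the chain would break it.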

@Сидик из ПТУ: Okay, I can see how this is important, particularly for Eastern Europe, where borders have changed so much that the same village might have changed to be not just in a different district, but in a completely different country between the time of somebody's birth and their death, even if they never once left the place. So we need to think how to get this right.

Note that the assumption that "we expect only one P131 value on a date" is not always safe, even today. There are a number of villages that were split between two parishes, and in England still are even today (where the CP is still the lowest level of administration); further up the chain, there were a few parishes that were split between two counties, even though mostly that was resolved in 1890. And if we consider hills, often the boundary between two administrative regions runs along the summit-line, so part of the hill is in each area. So at the minimum, your algorithm needs to be able to cope with P131 statements with applies to part (P518) qualifications.

Since you are primarily focussed on settlements, we can probably add start time (P580) = 1 April 1996 qualifiers to the present-day Scottish council area (Q15060255) values. We will need to add appropriate Scottish district (Q21457810) for the period 1975-1996, which may take a little research, but probably should be in there, and otherwise your chain will break. For the period before 1975, for the moment it may just be easiest to put end time (P582) = 1975 on the parish values. It's not quite right, because the parishes weren't actively administering anything after 1930. But it won't break your algorithm; and it's not so misleading, because the "landward districts" that some administrative powers were vested in in the period 1930-1975 were made up of groups of parishes, so the structure will still nest nicely, and it's rather easier to put in which parishes were part of which landward districts, rather than have to put it in separately for every hill, settlement, and heritage feature, which frankly nobody is interested in. So I would suggest that might be a reasonable way forward.

One thing your algorithm is going to need to be able to cope with is cases where the upward chain splits. E.g. heritage feature A is in parish B which is split between county C and county D, which may in turn lead to further separate values, before the chains come back together again. How to indicate that feature A is in the part of parish B that is in fact in county C, and not in county D ? Jheald (talk) 13:07, 24 November 2019 (UTC)[reply]

First of all, I am pleased to have a constructive dialogue. Yes, there are some cases where a settlement has two or more equivalent administrative territorial entity (Q56061) values. We plan to jump over them if they have a common root in the hierarchy, or to write something like "A, B and C, D" in these cases. Our algorithm takes care of this problem; we only need a hierarchical approach to be followed in Wikidata. Regarding end time (P582) for the role of Scottish civil parish (Q5124673) as administrative territorial entity (Q56061), it is best to follow the most common interpretation of British law. If it ends in 1930, we are ready to honestly wait for more correct units to be added for the period 1930-1975. As for the A→B→C&D case, we can use located in the administrative territorial entity (P131) as a qualifier here. For example, [21]. Maybe we can find a better qualifier for this switch, because the same solution ([22]) is wrong where switches aren't needed. I also want to add that for the parishes in their current roles (not administrative territorial entity (Q56061)), other properties should be found or created, without canceling their role in located in the administrative territorial entity (P131) until 1930. Сидик из ПТУ (talk) 13:57, 24 November 2019 (UTC)
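The "jump over them if they have a common root" idea can be sketched as follows. This is only an illustration in Python, assuming the parent values have already been filtered to a single date and that the hierarchy contains no cycles; the function names and the node `E` (a hypothetical common root above counties C and D) are not from the discussion.

```python
from typing import Dict, List


def upward_paths(item: str, parents: Dict[str, List[str]]) -> List[List[str]]:
    """Enumerate every upward path from `item` to a root (assumes no cycles)."""
    ups = parents.get(item, [])
    if not ups:
        return [[item]]
    return [[item] + rest for up in ups for rest in upward_paths(up, parents)]


def jump_over_splits(item: str, parents: Dict[str, List[str]]) -> List[str]:
    """Keep only the entities that lie on every upward path.

    Entities that appear on some branches but not others (the split part
    of the hierarchy) are skipped, so the result resumes at the common root.
    """
    paths = upward_paths(item, parents)
    return [node for node in paths[0] if all(node in path for path in paths)]


# Feature A is in parish B, which is split between counties C and D;
# both counties are in E (hypothetical).
SPLIT_EXAMPLE = {"A": ["B"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}

all_paths = upward_paths("A", SPLIT_EXAMPLE)
collapsed = jump_over_splits("A", SPLIT_EXAMPLE)
```

Skipping the split levels loses the information about which county feature A is really in, which is why a qualifier on the statement itself, as discussed above, would still be needed to select one branch.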

Add knowledge ID to this

Could somebody please add a knowledge ID to https://www.wikidata.org/wiki/Q81781662?

Lady Agnes Stewart

English Wikipedia says that one of the husbands of Agnes Stewart (Q6469914) was John Maxwell, 4th Lord Maxwell (Q6247264), who had a son, Robert Maxwell, 5th Lord Maxwell (Q13972091). But some sources say this Agnes Stewart was Robert's wife, and John's wife was another Agnes Stewart (Q75249614). Is Wikipedia wrong?--GZWDer (talk) 05:43, 27 February 2020 (UTC)

@GZWDer: Seems there has been a certain amount of discussion about these two at https://www.wikitree.com/wiki/Stewart-589 . I'll try to dig in and see whether I can make any sense of it. Thanks for the heads-up. Jheald (talk) 08:59, 27 February 2020 (UTC)[reply]

Just to complicate things, looks like we also have a numbering issue, John Maxwell, 4th Lord Maxwell (Q6247264) vs John Maxwell, 4th Lord Maxwell (Q75249611) (John Maxwell, 3rd Lord Maxwell). Jheald (talk) 09:06, 27 February 2020 (UTC)[reply]

@GZWDer: Agnes Stewart (Q6469914) can't have married John Maxwell, 4th Lord Maxwell (Q6247264), as he died at the Battle of Flodden. That is the battle her first husband Adam Hepburn, 2nd Earl of Bothwell (Q4497270) also died at. So the dates don't work. Jheald (talk) 10:21, 27 February 2020 (UTC)[reply]

@GZWDer: I think that's everything taken care of now. Also Wikipedias updated, and notes left on the relevant WikiTree profiles. Jheald (talk) 13:35, 27 February 2020 (UTC)[reply]

The Peerage

Note I have still kept a copy of pages of The Peerage as of October 6 2019 (which you can extract spouse information from) at /mnt/nfs/labstore-secondary-tools-project/largedatasetbot/peerage in Toolforge NFS.--GZWDer (talk) 03:15, 28 February 2020 (UTC)[reply]

@GZWDer: Very many thanks for the pointer to these. I've started off by looking at dates of birth and death -- looks like 25,000 more DoBs can be added to items that currently don't have any, haven't looked at DoDs yet, so that should keep QuickStatements busy for a bit, and hopefully help identify some more matches.

I'm also quite interested to see what I can do about positions people may have held -- this too could help track people down. (eg Members of Parliament, where we've got such good reference coverage thanks to Andrew Gray). And also of course trying to identify people with noble ranks without a Peerage match yet, per those PetScan pagepiles you posted. (Probably makes sense to try to make sure we have this information in WD as statements, so we can update such lists with live queries). So that's all to shoot for. But after that then I definitely will try to get some of the marriages in as well. Thanks again! Jheald (talk) 20:57, 29 February 2020 (UTC)[reply]

33,800 date of death (P570) statements now also sent to Quick Statements. Jheald (talk) 22:54, 29 February 2020 (UTC)[reply]

Wikidata:Property proposal/see talk page discussion at

Is this proposal still useful, when we already have a similar property?--GZWDer (talk) 17:47, 12 March 2020 (UTC)[reply]

Commons categories: recently created wikidata items 2 (Q55267305)

Things have changed quite a bit over the last few years, and I'm not sure that Commons categories: recently created wikidata items 2 (Q55267305) is still necessary. What do you think? Thanks. Mike Peel (talk) 21:22, 15 March 2020 (UTC)[reply]

Sure. I don't think we're going to do anything with that set now. There may be some other similar values of on focus list of Wikimedia project (P5008) that could go, too.

Looking at items that are instance of (P31) [23], I think #"" to #4 could probably all go. Jheald (talk) 21:46, 15 March 2020 (UTC)[reply]

Can you remove them through quickstatements, or should I write some python code to do the job? Thanks. Mike Peel (talk) 22:07, 15 March 2020 (UTC)[reply]

I'm a bit maxed out with other things for the next few days. Can try to do it after that. Jheald (talk) 22:09, 15 March 2020 (UTC)[reply]

Long query on the newsletter

Hello Jheald,

Thanks for your contribution to the newsletter!

As the query is super long and people tend to complain when we send thousands of extra characters to their talk pages, what do you think about including the query on a wikipage (a subpage of your user page for example) and linking to it from the newsletter? Lea Lacroix (WMDE) (talk) 08:56, 16 March 2020 (UTC)[reply]

@Lea Lacroix (WMDE): If I can get the query to run again (it was a bit flaky this morning), I'll take a screenshot of the chart & upload it as a file on Commons that can then be linked to. Can you give me about 1/2 hour? Jheald (talk) 15:22, 16 March 2020 (UTC)[reply]

Deprecated rank located in the administrative territorial entity

Hi James, why did you change this to deprecated? According to en:West Hill, Devon, it was located in Ottery St Mary (Q1247828) until 30 March 2017. By setting the rank to deprecated you're saying that it was never in Ottery St Mary (Q1247828). I don't think that was your intention, right? You probably want to set it back to normal and set the current one to preferred. Multichill (talk) 19:09, 12 April 2020 (UTC)[reply]

Timelines and categories

Hi James, hope you're well. I see you originally proposed list related to category (P1753)/category related to list (P1754) back in 2015. What do you think to using it for timelines as well? E.g., I've added it to link timeline of Kyiv history (Q7805783) and Category:Kyiv (Q4620), but it currently causes a constraint violation for instance of (P31)=timeline (Q186117). It's easy to modify the constraint, do you think that's the way to go, or a new property pair? Thanks. Mike Peel (talk) 16:59, 3 May 2020 (UTC)[reply]

Bot lacks proper date logic (TP spouses)

Your bot, JhealdBatch, added a date in the 1400s to Henry IV of England (Q161866), and it was treated as a Gregorian calendar date. Such dates are stated by nearly all sources in the Julian calendar, which was the calendar in widespread use in Europe at the time. A simple remedy is to add logic to the bot to skip any edit involving a date earlier than 1 March 1923, the date the last country (Greece) using the Julian calendar for civil purposes switched to the Gregorian calendar. Jc3s5h (talk) 10:44, 8 August 2020 (UTC)

@Jc3s5h: Thanks. Unfortunately it's a limitation in QuickStatements -- it identifies all dates as Gregorian.

I will go back and fix the dates once they are in, using a tool that allows more nuance like Wikidata-CLI. But in the very short term, it may be useful to have dates even with the wrong calendar, for help in identifying (or ruling out) the duplicates which are such a curse with the items from ThePeerage. Jheald (talk) 12:20, 8 August 2020 (UTC)[reply]

A tool might work for pre-1582 dates. But I've never heard of a tool that can determine which calendar to use for dates between 5 October 1582 and 1 March 1923. Jc3s5h (talk) 01:13, 9 August 2020 (UTC)[reply]

@Jc3s5h: A fair point. But the focus of the database (over 95% of the entries) relates to the UK, so using 1752 as a watershed should give a reasonable start, at least to a first approximation.

There may be ways to go further to guess better, eg by identifying nationality; or event location, or if there is an existing Julian date (eg for the date of birth of one of the children). The source doesn't distinguish calendar, so it will be a case of trying to guess and then guess better. Jheald (talk) 10:36, 9 August 2020 (UTC)[reply]
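The first-approximation rule discussed in this thread could be sketched roughly as below, in Python. The two calendar model URIs are the Wikidata items for the proleptic Gregorian and proleptic Julian calendars; the function name, the `uk_related` flag and the exact British cut-over day (14 September 1752 as the first Gregorian date) are assumptions added for illustration, and the source database itself does not say which calendar a date is in.

```python
from typing import Optional

GREGORIAN = "http://www.wikidata.org/entity/Q1985727"  # proleptic Gregorian calendar
JULIAN = "http://www.wikidata.org/entity/Q1985786"     # proleptic Julian calendar


def guess_calendar(year: int, month: int, day: int,
                   uk_related: bool = True) -> Optional[str]:
    """Guess a calendar model for a date as written in the source.

    Returns None when this heuristic cannot decide, so that the caller
    can skip the edit or fall back to better evidence (nationality,
    event location, an existing Julian date on a related item).
    """
    ymd = (year, month, day)
    if ymd < (1582, 10, 5):
        return JULIAN       # before the first Gregorian adoption
    if ymd >= (1923, 3, 1):
        return GREGORIAN    # after the last civil switch mentioned above
    if uk_related:
        # Assumption: treat the date as following British practice.
        return JULIAN if ymd < (1752, 9, 14) else GREGORIAN
    return None             # ambiguous period, no country information
```

A date in the 1400s, such as one for Henry IV of England, comes out as Julian under this rule, while a non-UK date in the ambiguous period comes out as undecided rather than as a guess.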

Peerage spouses

Thank you for adding these! Will your bot be able to get all of them? Gamaliel (talk) 17:11, 10 August 2020 (UTC)[reply]

@Gamaliel: I'm working on it! There are a few where there is some additional information beyond the simplest form. So I've left those to one side in this initial run, to examine more closely. And I need to think how best to represent "married between DATE1 and DATE2".

There are a handful that relate to individuals that don't have a Wikidata item (plus a very small number that don't have an individual Peerage item). So those may take longer to get added. But yes, I hope I have got every individual with the word "married" in their entry, so with luck that should be all of them at least as of 7 August. Jheald (talk) 19:04, 10 August 2020 (UTC)[reply]

Now proposed: Wikidata:Property_proposal/latest start date, Wikidata:Property_proposal/earliest end date

Jane/Joan Champernowne

There was some previous discussion at User_talk:GZWDer#William_Graham.--GZWDer (talk) 07:54, 13 August 2020 (UTC)

@GZWDer: Thanks. I will update. Jheald (talk) 08:02, 13 August 2020 (UTC)[reply]

Peerage spouses redux

I notice when we merge The Peerage entries we can end up with a spouse in twice, I only noticed when I added in the family tree graphic at Commons. Do you know if a bot will remove the duplicate entries, or do they need to be recognized with a search and removed by hand? --RAN (talk) 21:29, 16 August 2020 (UTC)[reply]

@Richard Arthur Norton (1958- ): The issue is references and qualifiers. On the one hand, it is useful to preserve all the references, to indicate that the source has been consulted (in particular for The Peerage, where it's a useful track to try to identify any spouses on the site that I may have missed in my extraction). But often the import from The Peerage will contain additional information, such as start dates and series numbers (or occasionally the existing entry will). So can one merge the two statements with different qualifiers under the same set of references, given that not all the references may support all of the qualifiers?

I am not sure about this, and was going to ask for community input at the Project Chat.

My feeling is that if we don't merge the claims, then some of the entries become almost unreadable. Also, because I used QuickStatements to do the import, it has smashed the info from The Peerage into whatever was there before, regardless of qualifiers (which in most cases weren't there). But, if we do merge the claims together, are we then giving the impression that eg a marriage start date is supported by a top-notch source like ODNB or History of Parliament, when in reality it may just have come from TP, without reference to either of those sources?

I wasn't sure what was the answer to this, so at the moment I haven't done anything, just merged the items and left the claims duplicated (maybe thinking I would run a script to combine the claims later, when more of the merges have been done). But what do you think? Jheald (talk) 21:41, 16 August 2020 (UTC)[reply]

Any way of doing it will be fine; keeping all the references is OK. Usually we only need more than one if we have divergent information, like two different death dates. The dupes have been showing up in these charts: Commons:Category:Arthur Wellesley, 1st Duke of Wellington. --RAN (talk) 22:38, 16 August 2020 (UTC)
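Finding the duplicated spouse statements is the easy part and could be done with a search rather than by hand. A minimal Python sketch, assuming each statement has already been fetched into a plain dictionary with `value`, `qualifiers` and `references` keys (a hypothetical structure, not the Wikibase JSON format), and using placeholder identifiers:

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_spouses(statements: List[dict]) -> Dict[str, List[dict]]:
    """Group spouse statements by value and keep only values stated more than once."""
    grouped = defaultdict(list)
    for statement in statements:
        grouped[statement["value"]].append(statement)
    return {value: group for value, group in grouped.items() if len(group) > 1}


# Placeholder data: one spouse entered twice, once with a start date qualifier.
SPOUSE_STATEMENTS = [
    {"value": "SPOUSE_1", "qualifiers": {}, "references": ["REF_A"]},
    {"value": "SPOUSE_2", "qualifiers": {}, "references": ["REF_A"]},
    {"value": "SPOUSE_1", "qualifiers": {"start time": "1806"}, "references": ["REF_B"]},
]

duplicates = duplicate_spouses(SPOUSE_STATEMENTS)
```

This only reports the duplicates along with their qualifiers and references. Whether the statements should then be combined under one set of references is the open question raised above, and the sketch deliberately does not decide it.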

Gaps in family trees caused by the deletion of an individual

Please look at Wikidata:Administrators%27_noticeboard#Please_restore_the_red_link%2Flinks_in_this_family_tree where individual people from The Peerage upload are being deleted, creating gaps in the family. If you can find other examples that need to be restored, please add them. Can you think of a way that the process can be automated, a search that would lead to a member of the tree giving a red link? --RAN (talk) 17:18, 20 August 2020 (UTC)[reply]

Blocking your bot

I blocked your bot because it has been editing at speed of 420 edits per minute causing maxlag to increase drastically. Slow down. Amir (talk) 14:28, 22 August 2020 (UTC)[reply]
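For a batch script, one simple way to slow down is to enforce a minimum interval between edits. The Python sketch below is an illustration only: the `Throttle` class is hypothetical, and the figure of 60 edits per minute in the usage line is an arbitrary example, not a stated limit. The MediaWiki API also accepts a `maxlag` parameter that asks the server to refuse requests while replication lag is high, which a client can combine with a throttle like this one.

```python
import time


class Throttle:
    """Space calls at least 60 / max_per_minute seconds apart."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute
        self.clock = clock          # injectable so the class can be tested
        self.sleep = sleep
        self.next_allowed = None    # earliest time the next edit may run

    def wait(self):
        """Block until the next edit is allowed, then reserve the following slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage: call throttle.wait() immediately before each edit request.
throttle = Throttle(max_per_minute=60)
```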

Wrong merge of persons

Hello, you have recently merged Teige Mac Murrough O'Brien (Q7694980) and Teig O'Brien (Q76034235). I wonder why because they are 2 different people according to a genealogy chart in A New History of Ireland volume X. I have already fixed today another wrong merge you did (2 persons who didn't even live in the same century). So could you please fix this one and check if there are no others ... Thank you. --Melderick (talk) 01:40, 28 August 2020 (UTC)[reply]

@Melderick: Thanks for catching this. I have undone the merge, which was made on the basis of looking at a list of pairs of people married to the same spouse (in this case Móre O'Brien (Q76034237)) with very similar names. I am aware that there may have been a few errors, especially if one of the spouse (P26) relations (husband -> wife or wife -> husband) may have previously been mis-identified, and I will be trying to track these down.

If you've got access to the reference materials, it would be really useful if you could look into which of the two was in fact married to Móre O'Brien (Q76034237); or whether that is uncertain; or whether perhaps there were two different More O'Briens, that may have got conflated. Also whether Slany O'Brien (Q75285548) and Slany O'Brien (Q76034114) are indeed different (and valid). Thanks! Jheald (talk) 10:10, 28 August 2020 (UTC)[reply]

@Jheald: 1) Unmerging both items is not enough, you also have to go through each family members of Teig O'Brien (Q76034235) and change their link back to himself instead of Q7694980.

2) As you mentioned, yes, merging 2 people because they married the same person can be a complete mistake due to the origin of a lot of the data. Maybe you could start by checking if those candidates have different parents? In both cases I found it was the case.

3) Now that I have checked, Eleanor FitzAlan (Q5354277) and Eleanor Fitzalan (Q75509012) never had the same spouse, so what was the reason for their merge ?

4) Well, I went through the reference materials I have access to several months ago, and I remember it was quite unclear whom each Teige married. The genealogy chart I mentioned doesn't have any wives or daughters at all. I see that ThePeerage lists both marriages based on BP2003, but I don't have access to this reference. In another reference I have found, only Q7694980 is mentioned, and he is said to have married a More, daughter of Donald More; but for Móre O'Brien (Q76034237) (daughter of Donald More, but are they the same?) only her first marriage is listed, so maybe it was a different More. The Annals of the Four Masters in 1599 says: "More, the daughter of Donnell, son of Conor, son of Turlough O'Brien, died in the month of January. She was a woman praiseworthy in the ways of woman." Likewise, in 1567 and 1577, the deaths of both Teiges are mentioned but no wife is given (which is not uncommon in the annals).

5) As for their daughters Slany, it is quite the same or even worse. The reference that mentioned Q7694980 married to a More O'Brien, also says that Teige O'Brien (Q75582715) married Slany O'Brien daughter of Teige of Smithtown (Q7694980).

And that's about all I can find. Nothing conclusive from my point of view as the absence of a family link is not a proof by itself. Would be interesting to see what BP2003 exactly says about all these Teige/More/Slany, especially Slany O'Brien (Q76034114) who as you said is possibly a duplicate for Slany O'Brien (Q75285548). --Melderick (talk) 15:38, 28 August 2020 (UTC)[reply]

@Melderick: Regarding Eleanor FitzAlan (Q5354277)/Eleanor Fitzalan (Q75509012): these got caught up because as of 16 August Thomas Browne (Q7787966) was shown as married to both of them [24]. It is now clear, thanks to you, that one of these spouse (P26) statements was not correct; but unfortunately it was only in the last stages of my sweep, after I had made rather a lot of these merges, that I realised I hadn't considered errors of this kind.

Looking for whether the items now have multiple fathers or mothers is indeed a good starting point. And yes, there's now quite a list to investigate: https://w.wiki/acD (fathers) https://w.wiki/acJ (mothers). Other tell-tales could be multiple WikiTree IDs, badly divergent multiple dates of birth, and incompatibilities between dates of birth and parents' dates. I am aware that I did slip up here, and there is work to do to track down the errors. (Though it does seem that only a few in the above duplicates list were caused by recent merges - other possibilities include old errors, or multiple statements because of biological vs legal parents.) But yes, I do appreciate that there is clean-up here to do. I have found some of the Irish genealogies very tricky in the past, eg the early Burkes of Clanrickarde, where ThePeerage and editions of Burke's up to the mid 1900s had quite different assignments of dates and marriages as compared to recent works. I do intend to try to do what I can, but I do apologise if you (justifiably) feel that I've made a bit of a ploughed field out of your patch. Jheald (talk) 16:23, 28 August 2020 (UTC)
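Both checks mentioned in this thread (different parents before a merge, multiple fathers or mothers after one) are simple to express once the father (P22) and mother (P25) values are available locally. A rough Python sketch, assuming they have been loaded into dictionaries mapping an item to a list of parent identifiers; the function names and the placeholder identifiers are hypothetical:

```python
from typing import Dict, List

ParentMap = Dict[str, List[str]]


def parents_conflict(a: str, b: str, fathers: ParentMap, mothers: ParentMap) -> bool:
    """True if merge candidates `a` and `b` both have a known father (or both
    a known mother) and the two sets of values share nothing."""
    for relation in (fathers, mothers):
        pa, pb = set(relation.get(a, [])), set(relation.get(b, []))
        if pa and pb and pa.isdisjoint(pb):
            return True
    return False


def items_with_multiple(relation: ParentMap) -> List[str]:
    """List the items that carry more than one distinct value for a parent property."""
    return sorted(item for item, values in relation.items() if len(set(values)) > 1)


# Placeholder data only.
FATHERS = {"PERSON_1": ["FATHER_A"], "PERSON_2": ["FATHER_B"], "PERSON_3": ["FATHER_A", "FATHER_C"]}
MOTHERS = {"PERSON_1": ["MOTHER_A"]}

blocked = parents_conflict("PERSON_1", "PERSON_2", FATHERS, MOTHERS)
suspects = items_with_multiple(FATHERS)
```

A conflict only suggests the candidates are different people. As noted above, multiple values can also be legitimate (biological vs legal parents, or divergent sources), so the output is a list to investigate rather than a list of errors.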

69 items with two wikitree matches: https://w.wiki/ahK ; but for two wikitree IDs and two fathers: only one hit https://w.wiki/ahQ - where there is legitimate divergence in the primary sources as to his parentage. Similarly for two wikitree entries and two mothers https://w.wiki/ahY (same hit). Jheald (talk) 12:16, 29 August 2020 (UTC)[reply]

More efficient might be date anomalies, eg principal born after death of spouse tinyurl.com/yycqztus
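The date-anomaly check can be sketched the same way. This Python fragment assumes birth and death dates are already available as `datetime.date` values keyed by item, with one date per item; the names and the placeholder data are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple


def born_after_spouse_died(spouse_pairs: List[Tuple[str, str]],
                           born: Dict[str, date],
                           died: Dict[str, date]) -> List[Tuple[str, str]]:
    """Return (person, spouse) pairs where the person was born after the spouse died.

    Pairs with a missing birth or death date are skipped, not flagged.
    """
    return [
        (person, spouse)
        for person, spouse in spouse_pairs
        if person in born and spouse in died and born[person] > died[spouse]
    ]


# Placeholder data only.
PAIRS = [("PERSON_1", "PERSON_2"), ("PERSON_3", "PERSON_4")]
BORN = {"PERSON_1": date(1600, 1, 1), "PERSON_3": date(1500, 1, 1)}
DIED = {"PERSON_2": date(1550, 1, 1), "PERSON_4": date(1560, 1, 1)}

anomalies = born_after_spouse_died(PAIRS, BORN, DIED)
```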

@Jheald: Thanks for the explanation for Eleanor FitzAlan (Q5354277)/Eleanor Fitzalan (Q75509012). Do you still have the list of your merges ? That would be interesting to compare it with those lists of people with multiple fathers/mothers.

By the way, I wanted to thank you for adding so many spouses. It makes my work so much easier. --Melderick (talk) 17:58, 28 August 2020 (UTC)[reply]

@Melderick: Here's a combined file of the merges I sent to QuickStatements having merely eyeballed the names https://paste.toolforge.org/view/4b5cc826 . The supposed common spouse is in cols 1 & 2, the items to be merged in 3 & 4 and 5 & 6. It runs to 712 lines, though there are a handful of duplicates. In addition there were maybe about half as many again that I looked at more closely, then did manually. And there was an earlier set of merges for items from the Kindred import that had not previously been matched to anything. Jheald (talk) 20:49, 28 August 2020 (UTC)[reply]

@Melderick: The first merge for the Kindred, based on the spouse names being identical https://paste.toolforge.org/view/d4b56f76 (about 1200 lines); the second set for the Kindred, where I eyeballed the spouse names https://paste.toolforge.org/view/22aae18b (350 lines); and the third, where the common spouse had already been merged https://paste.toolforge.org/view/9cb252af (40 lines). Jheald (talk) 21:04, 28 August 2020 (UTC)[reply]

User page for "JhealdBatch"

Hi. Would you consider creating a global user page for user:JhealdBatch? Its edits are showing in multiple wikis when it operates here and it would be nice if it could not be a red link when it edits. Thanks for the consideration. -- billinghurst sDrewth 07:31, 30 August 2020 (UTC)

Minor error by your bot on 2 August 2020

Hi. Just pointing out a minor error from your bot on 2 August 2020. It performed a merge on Sir John Louis, 2nd Baronet (Q76056579) but failed to follow up with the steps to clear and redirect.[25] This allowed another bot to come along a few minutes later and insert new statements into the mostly blank item. From Hill To Shore (talk) 18:58, 4 September 2020 (UTC)[reply]

@From Hill To Shore: Thanks for spotting this. I have opened a thread on the DeltaBot talk page (Topic:Vtcwe2cb98qbh896), to see whether we can think how to identify any more of these, and how to stop it happening any more. Thank you so much for finding this and flagging it. Jheald (talk) 20:22, 4 September 2020 (UTC)[reply]

@From Hill To Shore: A quick update. Having now written a query to try to track down such cases, it looks like there may have been three other times when it may have happened. So a problem that it would be good to avoid in future, but thankfully it doesn't seem to have been too widespread. Again, thanks for a very good catch. Jheald (talk) 09:38, 7 September 2020 (UTC)[reply]

The Peerage

Were you the person that uploaded all of The Peerage? If not do you know who did it? I have another similar project with the DAHR database but I do not know who, and how, it was uploaded and want that person involved with the DAHR project. --RAN (talk) 17:16, 20 September 2020 (UTC)[reply]

@Richard Arthur Norton (1958- ): The person that made the upload was User:GZWDer. There were a lot of problems with the upload from ThePeerage: questionable notability, poor de-duplication, and only rather a limited part of the data available was uploaded. All of this annoyed rather a lot of people, and created a lot of work for those who decided it nevertheless was worth trying to fix. If thinking about a similar database, I would not necessarily take what was done with TP as a model. Jheald (talk) 17:40, 20 September 2020 (UTC)[reply]

Commercial catalogues

I added AAT trade catalogs to edition of commercial catalogue (Q55089312). Do you think it fits better on commercial catalogue series (Q55089306)? - PKM (talk) 22:41, 20 September 2020 (UTC)[reply]

We sent you an e-mail

Hello Jheald,

Really sorry for the inconvenience. This is a gentle note to request that you check your email. We sent you a message titled "The Community Insights survey is coming!". If you have questions, email surveys@wikimedia.org.

You can see my explanation here.

MediaWiki message delivery (talk) 18:46, 25 September 2020 (UTC)[reply]

Craigavon Borough Council

I see you did some work on Craigavon Borough Council (Q427201), which seems to conflate a local government body and the area it represents. Do you have a preference for how that should be resolved? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:35, 22 December 2020 (UTC)[reply]

@Pigsonthewing: So, looking at it again, for the latest districts it looks like we mostly do have separate items for the territorial entity and the local government authority. (Query https://w.wiki/rhs , which may not catch them all).

For the 1973 to 2015 districts, it looks like we mostly don't have separate entities. (Query https://w.wiki/rhp ). Wikipedias that have entries appear to only have a single entry, named in most cases for the territorial entity. En-wiki mostly seems to only have a single entry, but named for the administrative body.

Going forward, if we only have one entry, I would make it for the administrative area (and rename the English label accordingly). I would be tempted to keep all the Wikipedia items connected to that single entry for the sake of sitelinks, changing the label for the English name for the item as required (eg to Craigavon Borough, or just Craigavon).

If you would like to make separate items for the old councils, to allow replaces (P1365) statements on the items for the new councils, then feel free. But I'm not sure what the best way would then be to preserve sitelinks from the enwiki article to other wikis, if you were to change the sitelinks for the en-wiki articles to the new items for the councils. In such a case accommodating other wikis to enwiki could be easily done with a single redirect from authority -> council on enwiki. But going in the other direction would be harder. So it might be worth leaving the enwiki article sitelinked to the territorial entity. What do you think is the best way forward? Jheald (talk) 15:16, 22 December 2020 (UTC)[reply]

I'm not sure, hence the question. I usually look to the history of an item for answers, but in this case, I see that it was conflating the two subjects from the outset. I definitely think we need two items; but I'm not sure which topic this should be for (being mindful of possible external use). Perhaps, as it's a UK subject, we should favour the en.Wikipedia link? @Tagishsimon: Do you have a view? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:20, 23 December 2020 (UTC)

Recusants

Do you have any thoughts about this question? - PKM (talk) 00:03, 23 February 2021 (UTC)

Stanley Pumphrey (Q94656610) and Stanley Pumphrey (Q76226055)

Hi, Jheald, wondering if you could take a look at both of them. It is obviously about the same person, I guess. Thank you for your time. Lotje (talk) 13:50, 24 February 2021 (UTC)

@Lotje: Will do. I'm quite tied up with things that I am trying to get straight for the new WD:WP EMEW just at the moment, but would it be okay if I tried to get them straightened out by the end of tomorrow (Thursday) ? Cheers, Jheald (talk) 13:53, 24 February 2021 (UTC)[reply]

No problem at all! Thanks in advance. :-) Lotje (talk) 14:09, 24 February 2021 (UTC)

Structured data across Wikimedia is starting![edit]

Hi James! I hope you're fine. :) I sent you an email some days ago about Structured Data Across Wikimedia (SDAW), the new WMF project that is about to start, and I wanted to be sure you received it.

Please do not feel pressured by this message! I'm just curious if you can (or want to) drop a line on the talk page, or if you know other user(s) who might be interested in the topic.

Hope to hear from you soon! --Sannita (WMF) (talk) 17:12, 5 March 2021 (UTC)[reply]

Odd mergers[edit]

Just a heads-up that items like Runnymede (Q105909295) don't seem to be merging properly right now (by JhealdBatch). Thanks. Mike Peel (talk) 13:18, 13 March 2021 (UTC)[reply]

@Mike Peel: Thanks, Mike. As I understand it, there's a known bug with QS and there's a bot that finishes the job after a day or so. But does this go beyond that? Jheald (talk) 13:34, 13 March 2021 (UTC)[reply]

Just that the item gets left as an empty item afterwards, I hadn't seen this behaviour before but I don't use QS. Good to know that it will be bot-tidied later. Thanks. Mike Peel (talk) 13:38, 13 March 2021 (UTC)[reply]

@Mike Peel: cf Wikidata:Project_chat/Archive/2019/11#Merge_with_QuickStatements_not_creating_redirects_? - which AIUI is not something I need to worry about, a bot will take care of it? It spooked me the first time I saw it, too. Jheald (talk) 13:39, 13 March 2021 (UTC)[reply]

Scottish Local Authorities (pre-1996)[edit]

Hi Jheald! It looks like Wikidata has no entities for Scottish Local Authorities from 1975 - 1996 (two tier) nor earlier 1900 - 1975 iterations.

I see you created the current versions and wondered if you had any views before I dive in and start on 1975 - 1996. I'm looking at Aberdeen for now as a template.

I did some research and I've updated the intro section there https://en.wikipedia.org/wiki/Aberdeen_City_Council with that.

For 1975 we would have a regional authority ( in this case Grampian Regional Council ) and a lower tier (not sure of the term) called City of Aberdeen District Council. Each would be a subclass of https://www.wikidata.org/wiki/Q21451695

Does that sound sort of right?

Similarly the 1900 - 1975 would need to be of a new type - an instance of a subclass of Q21451695 - in this case called ‘The County of the City of Aberdeen’ alias ‘Aberdeen Corporation’, and ‘Corporation of the City of Aberdeen’ and ‘Aberdeen City Council’.

Each new entity would have a different from property, and replaces (or follows) and the inverses.

Does that all sound about right to you?

Thanks. Ian Watty62 (talk) 18:34, 8 July 2021 (UTC)[reply]

I had more of a look at this and there are items for Scottish Regional Council Scottish Regional Council (Q105582114) and Scottish District Council Scottish district council (Q66772470). I've tidied them up a little.

It looks like all nine Regional Councils were created. https://w.wiki/3cX3

But only 40 of the Districts exist https://w.wiki/3cXK. Even within regions the coverage of Districts is patchy: In Grampian Banff and Buchan are there, and Gordon but no City of Aberdeen or Moray District councils. I'll have a go at fixing Grampian Ones but I don't have time for a couple of weeks to remediate all Regions' ones. Nor to check the quality of existing ones.

Watty62 (talk) 17:47, 9 July 2021 (UTC)[reply]

Call for participation in a task-based online experiment[edit]

Dear Jheald,

I hope you are doing well.

I am Kholoud, a researcher at King's College London, and I work on a project as part of my PhD research, in which I have developed a personalised recommender system that suggests Wikidata items for the editors based on their past edits. I am collaborating on this project with Elena Simperl and Miaojing Shi.

I am inviting you to a task-based study that will ask you to provide your judgments about the relevance of the items suggested by our system based on your previous edits. Participation is completely voluntary, and your cooperation will enable us to evaluate the accuracy of the recommender system in suggesting relevant items to you. We will analyse the results in anonymised form, and they will be published at a research venue.

The study will start in late January 2022 or early February 2022, and it should take no more than 30 minutes.

If you agree to participate in this study, please either contact me at kholoud.alghamdi@kcl.ac.uk or use this form https://docs.google.com/forms/d/e/1FAIpQLSees9WzFXR0Vl3mHLkZCaByeFHRrBy51kBca53euq9nt3XWog/viewform?usp=sf_link I will contact you with the link to start the study.

For more information about the study, please read this post: https://www.wikidata.org/wiki/User:Kholoudsaa In case you have further questions or require more information, don't hesitate to contact me through my mentioned email.

Thank you for considering taking part in this research.

Regards

Kholoudsaa (talk) 17:48, 3 January 2022 (UTC)[reply]

Anne Browne[edit]

In 2017, w:Special:Contributions/Ancestral-Zhenkoji added a supposed third marriage to w:Anne Browne (daughter of William Browne (Q16197601), died 1514, and Alice Keble (Q76295761)) and related articles, and the information remains there today. (The user has no other edits.) Genealogics.org also has this information. However:

Geni.com says Anne Browne (Q75911120) is the daughter of William Browne (Q110415187), died 1507 (with unspecified wife).
WikiTree says Anne Browne (Q75911120) is the daughter of William Browne (Q16197601) and another wife, which would make the two Anne Brownes half-sisters.
The Peerage says Alice Keble is the wife of the William Browne who died in 1507, which does not agree with the Wikipedia article.

--GZWDer (talk) 22:06, 27 January 2022 (UTC)[reply]

wikitech[edit]

https://wikitech.wikimedia.org/wiki/User:Jheald -- Jheald (talk) 13:42, 29 January 2022 (UTC)[reply]

Peerage references[edit]

There is a user I frequently encounter who removes references to the Peerage on statements for relatives. They don't remove the Peerage property, just the use of it as a reference. I can just revert them, but the thought of getting into yet another conflict with this person exhausts me. Do you think this will interfere with the work people are doing on the Peerage items, or do you think it doesn't matter at all? Gamaliel (talk) 15:09, 23 February 2022 (UTC)[reply]

@Gamaliel: I think it is tiresome of them, because it removes the signpost to where the suggested connection came from; and it removes the ability to directly click through to see just exactly what the entry at The Peerage actually says (in particular, to see whether the entry at The Peerage contains any additional side-information that supports our matching of that entry to the particular Wikidata item, or whether it doesn't -- something which is very helpful to know, if for any reason the claimed relative connection has come into doubt).

So I think what the person is doing is definitely unhelpful. The first purpose of sourcing is "Say where you got it from" -- and that should not be removed, no matter how weak or unreliable somebody may believe that source to be.

We don't source things as a warrant of reliability. We source them so people can check.

Losing the sourcing may not be the biggest deal in the world. But it is certainly unhelpful. Jheald (talk) 16:11, 23 February 2022 (UTC)[reply]

Thanks. I restored the references, and they characteristically retaliated by removing the infobox on the corresponding en.wp article. I've come to expect this kind of behavior, but at least the references are still there for editors working on the Peerage items. Gamaliel (talk) 15:42, 24 February 2022 (UTC)[reply]

BHL Wikidata Project[edit]

Hi Jheald,

I realize that you spearheaded the Wikidata:WikiProject_BHL in 2018. (Thank you!) I've recently joined the project and I am the current BHL Data Manager (hired just last month). I was wondering if we could discuss a few items regarding BHL's Wikidata efforts:

targeted activities for BHL Staff
What else to include on the project page
BHL properties in Wikidata

Please let me know if a meet-up is of interest in the near-term.

Thanks! JJ JJFord BHL (talk) 18:00, 10 March 2022 (UTC)[reply]

OpenRefine and SDC updates: user survey and monthly office hours[edit]

Hello! You are receiving this message because you signed up for updates about the Structured Data on Commons (SDC) features that are currently developed for OpenRefine.

Short survey for SDC features in OpenRefine[edit]

OpenRefine is running a short survey to learn about user needs and expectations for its new SDC features. If you upload files to Wikimedia Commons and/or edit structured data there, please help by filling in this survey!

Monthly OpenRefine and Wikimedia office hours[edit]

OpenRefine's community meetup of February 22 was very well attended. You can see its recording, slides and notes here. The team now hosts monthly, informal office hours for Wikimedians (online, via Zoom). Upcoming office hours are:

Tuesday, April 19, 2022 at 4PM UTC (how late is this in my timezone?)
Tuesday, May 24, 2022 at 8AM UTC (how late is this in my timezone?)
Tuesday, June 21, 2022 at 4PM UTC (how late is this in my timezone?)

The Zoom link of the next office hour will be posted on OpenRefine's info page on Wikimedia Commons. Please drop by and say hi!

All the best! SFauconnier (talk) 14:00, 9 April 2022 (UTC)[reply]

Unparished parts items that are now parished[edit]

What should be done with Q105908846, Q105908861, Q105909183 and Q105908961 now that these districts are entirely parishes? Lucywood (talk) 13:57, 17 July 2022 (UTC)[reply]

@Lucywood: At the moment these items are principally being used in statements like:

Parish Church of St Peter (Q17556167)
located in the administrative territorial entity (P131): Rother (Q1811605)
statement is subject of (P805): unparished part of Rother (Q105908846)

I'm still not sure if that's the right qualifier, but there's surely a case for keeping the statement with an end time (P582) qualifier at standard rank, alongside a statement at preferred rank giving the parish that's now been created. (I find wikidata-cli can be quite a useful tool for moving groups of statements to preferred rank).

As for the unparished area item itself, perhaps a dissolved, abolished or demolished date (P576) statement with end cause (P1534) = an item for "creation of parishes" ? A consequence of text (P9680) qualifier could also be added, if there was a particular statutory instrument that brought the new parishes into being? Jheald (talk) 09:49, 18 July 2022 (UTC)[reply]
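A minimal sketch of how the end time (P582) suggestion above could be turned into a batch, assuming QuickStatements V1 tab-separated syntax and assuming that QuickStatements attaches the qualifier to an existing statement with the same value rather than creating a duplicate (worth checking on one item first). The function name and the year 2021 are illustrative; the year should come from whatever instrument actually created the parish. Rank changes are not covered here.

```python
def end_time_qualifier_rows(statements, year):
    """Build QuickStatements V1 rows adding an end time (P582) qualifier.

    statements: iterable of (subject_qid, property_id, value_qid) tuples.
    year: integer year, written with year precision (/9).
    """
    end_time = f"+{year:04d}-01-01T00:00:00Z/9"  # /9 marks year precision
    return [
        "\t".join([subject, prop, value, "P582", end_time])
        for subject, prop, value in statements
    ]


# Example using the statement quoted above; 2021 is an assumed year.
rows = end_time_qualifier_rows([("Q17556167", "P131", "Q1811605")], 2021)
print("\n".join(rows))
```

The output can be pasted into a QuickStatements V1 batch; setting the new parish statement to preferred rank would still need a separate tool.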

With Rother the unparished area was "Bexhill", the only unparished area that became a parish; in the case of Wellingborough it also became a parish; in the case of Northampton and Eastleigh the unparished area was split into multiple parishes. Q853012 and Q913499 already state they were unparished areas until 2021 and from then were parishes. Q731069 states it was an unparished area until 2021 and Q192240 should probably also have the same statement. Lucywood (talk) 14:07, 18 July 2022 (UTC)[reply]

@Lucywood: replaced by (P1366) + replaces (P1365) may also be useful to connect the old and the new, (the latter sometimes with applies to part (P518) = 'somevalue'). Jheald (talk) 14:12, 18 July 2022 (UTC)[reply]

Shouldn't Parish Church of St Peter (Q17556167) have P131 as Bexhill-on-Sea (Q853012) with P805 as unparished area (Q7897276) end time 2021 and civil parish (Q1115575) start time 2021? Lucywood (talk) 14:18, 18 July 2022 (UTC)[reply]

One could certainly make an argument for that (probably with the more straightforward qualifier object has role (P3831), rather than the more exotic (?and maybe not quite right?) statement is subject of (P805)).

But I think there are also a couple of objections one might run into: i) that the unparished area (Q7897276) wasn't an administrative area prior to 2021, suitable to be the object of a located in the administrative territorial entity (P131) -- what it was was the absence of an administrative area (and that the actual administrative area was Rother (Q1811605)); ii) that we ought to treat Parish Church of St Peter (Q17556167) before 2021 in the same way that we treat places in unparished administrative areas now (so that queries that work for such areas now need minimal adaptation).

On the other other hand, one could make the case that so many of these areas became 'unparished' because they used to be municipal areas, that those former municipalities gave the area key identities (and boundary?) even after abolition, and so maybe they are also a key thing to try to represent? (But possibly best done with <place> P131 <municipal entity> / end time = 1974 -- where the municipal area would be a different thing to the civil parish that has now come into being?)

Pinging @Andrew Gray:, who has a wise sense for this sort of thing -- and what sort of structures may be most useful for later building on. I may also ping him via the telegram group. Jheald (talk) 14:47, 18 July 2022 (UTC)[reply]

Something like coextensive with (P3403) or territory overlaps (P3179) might also be useful to relate the new civil parish to the old municipal area -- albeit perhaps with some suitable qualifier to note they were not contemporaneous -- nor even direct successors? Though can't immediately think what that qualifier might best be? ETA: A qualifier like has characteristic (P1552) = 'non contemporaneous' might be possible. Jheald (talk) 15:03, 18 July 2022 (UTC)[reply]

move "author name string" to "author" based on VIAF[edit]

Hi @Jheald: It seems you added a couple hundred works with this pattern:

author name string: "ACHENBACH, Hermann, Writer of Verse"; qualifier VIAF ID: 57822913 (https://www.wikidata.org/wiki/Q63313824#P2093)

They are reported as VIAF prop scope violations (https://www.wikidata.org/wiki/Wikidata:Database_reports/Constraint_violations/P214#%22Scope%22_violations)

"ACHENBACH, Hermann" is an ambiguous name but in this case both VIAF and WD agree that it's Hermann Achenbach (Q94931290).

Could you please move these "author name string" claims to "author"? I could help with a SPARQL to find feasible cases and maybe generate QS.

Cheers! Vladimir Alexiev (talk) 08:41, 16 September 2022 (UTC)[reply]

Another similar twisted pattern is below. There's no "contributor name string" so you use an "unknown value" :-(

contributor to the creative work or subject: unknown value
object named as: HART, Francis, of Perth, Western Australia.
VIAF ID: 3160155286588287180007

@Vladimir Alexiev: Thanks for this.

Somevalue + "object named as" is a standard pattern to use when the object of a statement is known, but there is as yet no item for them. (cf Help:Statements#Unknown_or_no_values "Unknown value may also mean...")

Here's a query https://w.wiki/5hyJ for people with these VIAFs that now have items (about 38), and I'm happy to update those.

There seem to be https://w.wiki/5hyR about 108 cases where these VIAF qualifiers are being used for people who do not have items. I'm afraid I don't have time to do a decent job of creating decent properly populated items with properly researched properties and properly searched external-IDs for them right now. Jheald (talk) 10:22, 16 September 2022 (UTC)[reply]

Using a WD item for those that currently exist would be a valuable fix. thanks! Vladimir Alexiev (talk) 10:54, 19 September 2022 (UTC)[reply]
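A sketch of how the move discussed above might be automated, under stated assumptions: the SPARQL below is one plausible way to find author name string (P2093) statements whose VIAF ID (P214) qualifier matches an existing item, and is not the query behind the short links above. The helper emits QuickStatements V1 rows that add author (P50) with an object named as (P1932) qualifier and then remove the old string statement with the "-" prefix. Names containing double quotes are not handled, and the function name is hypothetical.

```python
# Assumed query: works whose P2093 statement carries a VIAF qualifier
# that matches the VIAF ID of an existing item.
CANDIDATE_QUERY = """
SELECT ?work ?name ?viaf ?person WHERE {
  ?work p:P2093 ?statement .
  ?statement ps:P2093 ?name ;
             pq:P214 ?viaf .
  ?person wdt:P214 ?viaf .
}
"""


def move_to_author_rows(matches):
    """Turn (work_qid, name_string, person_qid) tuples into QS V1 rows."""
    rows = []
    for work, name, person in matches:
        quoted = '"' + name + '"'
        # Add the item-valued author statement, keeping the original string.
        rows.append("\t".join([work, "P50", person, "P1932", quoted]))
        # Remove the author name string statement it replaces.
        rows.append("\t".join(["-" + work, "P2093", quoted]))
    return rows


# Example taken from the case cited in this thread.
example = [("Q63313824", "ACHENBACH, Hermann, Writer of Verse", "Q94931290")]
print("\n".join(move_to_author_rows(example)))
```

Each candidate row from the query would still need a human check before the batch is run, since a shared VIAF ID does not rule out an earlier mismatch.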

Stop JhealdBatch[edit]

Hi Jheald, please stop your bot User:JhealdBatch until you find time to correct wrong merges, see Talk:Q64326. @MisterSynergy: FYI. --Kolja21 (talk) 02:11, 15 October 2022 (UTC)[reply]

Done Thanks for separating the items. --Kolja21 (talk) 12:06, 16 October 2022 (UTC)[reply]

@Kolja21: Can you help at all with the transcription of what the BM print says about Ludwig of Bavaria (Q104097704) ? With the old spelling and the old lettering, I can't get it accurate enough to get a solid translation from Google Translate -- in particular for what it says about Elizabeth and Hungary; and about Ludwig and Nuremberg. Thanks, Jheald (talk) 12:52, 16 October 2022 (UTC)[reply]

I've tried but I don't know if it is of any help. --Kolja21 (talk) 13:39, 16 October 2022 (UTC)[reply]

Thanks! Jheald (talk) 13:47, 16 October 2022 (UTC)[reply]

Tabular data revisited[edit]

Hi Jheald, now that it's clear that .tab pages will have a hard cap at 2MB, it would be good to revisit the tabular data project: how do we want to publish and version modestly-sized datasets? Sj (talk) 16:57, 7 April 2023 (UTC)[reply]

Art UK matches[edit]

Hi James, in the past you matched Art UK venue ID (P1602), Art UK collection ID (P1751) and Art UK artist ID (P1367) with items. I have https://w.wiki/6qB3 for missing collection and https://w.wiki/6syX for missing artist. Want to help matching these? See also Property talk:P1679 Multichill (talk) 19:06, 26 June 2023 (UTC)[reply]

@Multichill: Thanks for the ping. Nice query. Here https://w.wiki/6tCG is a slightly adapted one with DISTINCT to make sure each painting is counted only once; a variant https://w.wiki/6tDL to include an Art UK painting page URL that can be used to check the Art UK ID for the artist. (Usually this can be predicted from the string, but it's probably worth the check); and one https://w.wiki/6tDR to include all of the Qids for the paintings by the artist, to enable the P170 additions.

It should be a straightforward job using OpenRefine to see which artists can be easily matched to wikidata items, and then to fill in the creator (P170) values on the works and Art UK artist ID (P1367) values on the artists accordingly.
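The fill-in step described above can be sketched roughly as follows, assuming the query results have been grouped by Art UK artist slug and that reconciliation (for example in OpenRefine) has produced a slug-to-item mapping. The function name and the sample slug and QIDs are placeholders, not real matches; the output is QuickStatements V1 syntax.

```python
def art_uk_match_rows(works_by_artist, reconciled):
    """Build QS V1 rows for reconciled Art UK artists and their works.

    works_by_artist: dict of Art UK artist slug -> list of painting QIDs.
    reconciled: dict of Art UK artist slug -> artist QID (matched only).
    """
    rows = []
    for slug, painting_qids in works_by_artist.items():
        artist_qid = reconciled.get(slug)
        if artist_qid is None:
            continue  # not yet matched; leave for manual review
        # Art UK artist ID (P1367) on the artist item.
        rows.append("\t".join([artist_qid, "P1367", '"' + slug + '"']))
        # creator (P170) on each distinct work.
        for painting in sorted(set(painting_qids)):
            rows.append("\t".join([painting, "P170", artist_qid]))
    return rows


# Placeholder data for illustration only.
works = {
    "example-artist-18001870": ["Q15397819", "Q13406268", "Q13406268"],
    "unmatched-artist": ["Q15397819"],
}
matched = {"example-artist-18001870": "Q4115189"}
print("\n".join(art_uk_match_rows(works, matched)))
```

Deduplicating the painting QIDs per artist mirrors the DISTINCT in the adapted query, so a work returned twice only produces one P170 row.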

I have been away from wiki for a few weeks, and have quite a lot of stuff to get back to that was in hand, but let me see if I can take a chunk out of these.

Regarding Art UK artists more generally, I think I last looked at the dataset in 2019, when Sarah Harmon at Art UK was able to get me a spreadsheet with fields

id / name (with dates) / gender / reference / role

for all the sculptors then in their database (1838 rows). Of course this is only a fraction of the total number of artists in their system; and since then they have in particular added a lot more sculptors. Jade King was also in cc for some of the discussion, and had previously (2017) been quite receptive to a list of potential duplicates on their database (some of which she was able to gently point out were mistakes at our end). The 2019 contact with Sarah Harmon came about via an editathon in Scotland she had been co-ordinating with Sara Thomas (WMUK); it's possible Sara may have had further contact with Art UK.

In the spreadsheet the "reference" column is the slug for an artist url that they construct from their name + dates field -- ie it's the value we track with Art UK artist ID (P1367). It is worth noting that they change this (and the artist url) if they update the dates and/or the preferred name. It is quite a long time since we last checked Art UK artist ID (P1367) values for validity, so a number may no longer be correct, and may need to be re-identified.

The "id" column is their own sequential internal numeric ID -- typically 5 digits for most sculptors, but it can be fewer. (Highest value 54067 in the sculptors they sent me). This column would be well worth asking for, if we can persuade them to give us a current list of all their artists, because it would make P1367 value updates so much easier to process going forward. (Assuming that their internal numeric IDs have remained stable). As of 2019 the sheet of the 1838 sculptors was the only list of artists I had as of that date been able to get from them. Jheald (talk) 18:37, 27 June 2023 (UTC)[reply]
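To illustrate why the numeric "id" column would help: if two snapshots of the artist list were available, changed slugs could be found by joining on that id. This is only a sketch, assuming the internal ids are stable and that each row has "id" and "reference" fields as in the 2019 sheet; the function name and sample rows are invented.

```python
def changed_slugs(old_rows, new_rows):
    """Return (id, old_slug, new_slug) for artists whose slug has changed.

    Each row is a dict with at least "id" and "reference" keys.
    Ids present in only one snapshot are ignored.
    """
    old_by_id = {row["id"]: row["reference"] for row in old_rows}
    changes = []
    for row in new_rows:
        old_slug = old_by_id.get(row["id"])
        if old_slug is not None and old_slug != row["reference"]:
            changes.append((row["id"], old_slug, row["reference"]))
    return changes


# Invented sample rows; the slugs do not refer to real Art UK artists.
old = [{"id": 101, "reference": "example-sculptor-b1950"},
       {"id": 102, "reference": "another-sculptor-19001980"}]
new = [{"id": 101, "reference": "example-sculptor-19502020"},
       {"id": 102, "reference": "another-sculptor-19001980"}]
print(changed_slugs(old, new))
```

The resulting pairs could then be used to update Art UK artist ID (P1367) values from the old slug to the new one.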

Welcome back. See also Topic:Xk7vz5wdhfbvq05e. Multichill (talk) 20:26, 27 June 2023 (UTC)[reply]

First P170s now going in (batch), for 676 works by 59 artists who had P1367s newly added by MnM automatcher. Jheald (talk) 09:49, 28 June 2023 (UTC)[reply]
First matches with OR now made. P170s going in for 721 works (batch), based on reconciliations of 63 artists. Jheald (talk) 21:44, 29 June 2023 (UTC)[reply]
P170s going in for a further 236 works (batch), based on reconciliations for a further 34 artists (batch). -- Jheald (talk) 19:31, 8 July 2023 (UTC)[reply]
And a further 85 works (batch), based on reconciliations for 11 more artists. (batch) -- Jheald (talk) 16:22, 20 July 2023 (UTC)[reply]

provenance research loves wikidata – Berlin/online[edit]

Hello Jheald, maybe you have time to join »Provenance Loves Wiki 2024« https://de.wikipedia.org/wiki/Wikipedia:Arbeitsgemeinschaft_Kunstwissenschaften_%2B_Wikipedia/Provenance_loves_Wiki/English in Berlin, Jan 12–14, 2024, (and we figure out online attendance), to work on issues of provenance data and wikidata, together with the attendees. You are kindly invited to join – regards Pippich (talk) 12:05, 24 October 2023 (UTC)[reply]

Q95215645[edit]

Hi, please have a look at your recent matching at Josef Šimůnek (Q95215645) - the surnames and also dates of birth don't match. Vojtěch Dostál (talk) 12:45, 15 December 2023 (UTC)[reply]

Oh, maybe it's just matching by Osiem gwiazdek that has led to your edits. Vojtěch Dostál (talk) 12:46, 15 December 2023 (UTC)[reply]

@Vojtěch Dostál: Thanks for catching this. Yes, I was matching forward from the old DACS id, based on the old and new names associated with it by DACS uniquely matching. If you can identify the right Josef Simek that would be great. Jheald (talk) 12:53, 15 December 2023 (UTC)[reply]

Q59484302[edit]

Hello Jheald,

Jhealdbot has matched an "Angelika Lautenschlager-Kunzmann" to Ursula Abramowski-Lautenschläger (Q59484302). That doesn't seem right: Angelika ≠ Ursula, Lautenschlager is not the same as Lautenschläger (though the DACS ID (2022) (P10706) could have omitted the umlaut here), and Ursula Lautenschläger doesn't have a "Kunzmann" anywhere in her name. She is also known as Ursula Abramowski-Lautenschläger. Regards Rosenzweig (talk) 13:02, 16 December 2023 (UTC)[reply]

@Rosenzweig: Thanks, good catch. Like the case above, the ID was matched forward from the previously existing DACS ID (former) (P4663) statement, which had not been correctly matched. I have now moved the two DACS IDs on to a new item, Angelika Lautenschlager-Kunzmann (Q123888428). Jheald (talk) 10:14, 17 December 2023 (UTC)[reply]

Thanks. Rosenzweig (talk) 21:30, 17 December 2023 (UTC)[reply]