User talk:ArthurPSmith

See User talk:ArthurPSmith/Archive for older discussions.

Author disambiguator

Author disambiguator is the best thing since sliced bread. Thanks soooo much for this tool. - PKM (talk) 20:05, 5 December 2018 (UTC)

  • Great tool. Interestingly, @Alexmar983: asked me for something like it just the other day.
    BTW, if one types a full name (first middle last name), fuzzy search seems to find people without the middle name, but not those where the middle name is limited to its initial. Maybe these should also be found when starting from first+last name.
    Maybe the tool could also check if VIAF is present (and suggest its addition). If you just check for a single one, that might be the most useful one. There are obviously a few others (non-library ones) likely to be found on such author items (notably Scopus, Researchgate, even Linkedout).--- Jura 05:49, 7 December 2018 (UTC)
I follow the needs of the users in real time, so I am often looking for what is under development, I am not surprised ;) Thanks for pinging me--Alexmar983 (talk) 06:06, 7 December 2018 (UTC)
The "fuzzy search" logic definitely needs a bit of work. I'm currently trying to improve the clustering, which doesn't really do what you would expect. Definitely good suggestions on looking at other author ID's besides ORCID! ArthurPSmith (talk) 14:12, 7 December 2018 (UTC)
I noticed, but I wasn't sure what to suggest. In one case, a possibility to sort by journal would have been handy. For another, "check all" was sufficient. --- Jura 08:43, 8 December 2018 (UTC)
@PKM, Jura1, Alexmar983: I thought you might want to know - the service has been considerably updated in a number of different ways: (1) Author name searching is I think much better (though it is now case-sensitive as it is using SPARQL literals), (2) Article clustering should be much more sensible, (3) I added a VIAF search/input form. (4) I've limited the number of articles shown to prevent some out-of-memory and related problems, though there are still some issues with that needing further improvement. And there have been a number of other updates and fixes, so it should be even easier to use... ArthurPSmith (talk) 15:35, 8 January 2019 (UTC)
Thank you ArthurPSmith.--Alexmar983 (talk) 15:55, 8 January 2019 (UTC)
Excellent! Thanks for the update. - PKM (talk) 23:29, 8 January 2019 (UTC)
@ArthurPSmith: Great tool, indeed! I would like to refer to it in a paper. Besides the url, is any paper available to cite? Thank you in advance! --Carlobia (talk) 16:40, 7 January 2021 (UTC)
@Carlobia: Nothing in "paper" form, but I have given a few presentations on it, I don't know if they would count as suitable references? Most recently at the 2020 Wikicite virtual conference - ArthurPSmith (talk) 18:16, 7 January 2021 (UTC)
@ArthurPSmith: Thank you! It will fit perfectly! --Carlobia (talk) 08:16, 8 January 2021 (UTC)
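
The middle-initial case raised earlier in this thread (a full "first middle last" name not finding entries where the middle name is only an initial) could be approached by expanding the typed name into the variants a search should try. This is a minimal sketch in Python, not the tool's actual logic; `name_variants` is a hypothetical helper and the example name is made up. It does not cover the reverse case (starting from first+last only), since the initial cannot be guessed.

```python
def name_variants(full_name: str) -> list[str]:
    """Expand 'First Middle Last' into search variants (hypothetical helper)."""
    parts = full_name.split()
    if len(parts) < 3:
        # Nothing to abbreviate or drop without a middle name.
        return [" ".join(parts)] if parts else []
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    with_dots = " ".join(m[0] + "." for m in middles)
    without_dots = " ".join(m[0] for m in middles)
    variants = [
        " ".join(parts),                   # name as typed
        f"{first} {with_dots} {last}",     # middle names as initials with dots
        f"{first} {without_dots} {last}",  # middle names as bare initials
        f"{first} {last}",                 # middle names dropped
    ]
    # Remove duplicates while keeping the order above.
    return list(dict.fromkeys(variants))


print(name_variants("Jane Quinn Doe"))
```

Because the search is described above as case-sensitive, each variant would have to be tried as an exact string.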

FYI

00:30, 15 December 2018 (UTC)


Could you please look into India (L40021)? It's in English but I think users somehow misunderstood the purpose of the lexeme structure. The sense has pronunciations, translations are written as representations of forms, etc. KaMan (talk) 09:48, 15 December 2018 (UTC)

Yeah, looks like some people got carried away there. I added the standard form and moved the pronunciations there. Not sure what you mean about translations - there aren't any listed right now? Or you mean the sense glosses? I think that's ok. ArthurPSmith (talk) 13:15, 15 December 2018 (UTC)
I mean in form L40021-F1 there are four representations with language "en" (India's), "te" (భారత దేశం యొక్క), "ml" (ഇന്ത്യയുടെ), "bn" (ভারতের). To me they look more like translations but they can be transcriptions as well, I do not know. KaMan (talk) 13:53, 15 December 2018 (UTC)

Closing RFC

Hi Arthur, since you were not involved in the discussion and hoping that you don't have further remarks, could you please close this RFC? The consensus seems to be there, so once it is closed I can add the information to Wikidata:Property creators (or you can do it yourself if you wish so).--Micru (talk) 23:34, 17 December 2018 (UTC)

@Micru: ✓ Done - however in addition to edits to Wikidata:Property creators it wasn't clear to me what the plan was for Proposal 3, hopefully you can sort out what needs to happen there? ArthurPSmith (talk) 16:01, 18 December 2018 (UTC)
Thanks a lot for the closing and for the really nice summary. I will look into it asap.--Micru (talk) 16:02, 18 December 2018 (UTC)

Lemma in English

Should (L40585) be uppercased in first letter or is it just duplication of lemma (L14835)? KaMan (talk) 06:49, 29 December 2018 (UTC)

Looks like an anonymous user was experimenting. I merged them. ArthurPSmith (talk) 15:01, 29 December 2018 (UTC)

Mongols and com

invalid ID (L41371) and invalid ID (L41449) - looking at the lexical category I'm not sure if these lexemes should be deleted or corrected. It's English so I will leave it up to you. KaMan (talk) 08:13, 17 January 2019 (UTC)

Thanks, I've requested their deletion. I wonder if we should have a special Lexeme deletion requests page? ArthurPSmith (talk) 15:09, 17 January 2019 (UTC)
I request deletion of about one "lexeme" per day (today two) and they are usually deleted very fast. I think separate page could be problematic for administrators (yet another page to observe) so I would stay with current global page. KaMan (talk) 15:36, 17 January 2019 (UTC)

Re: Which Federica Fabbri?

Hi Arthur, I think it is the same person but I am not fully certain, so you can delete it if you think it appropriate. Thanks for the tip, Alessandra Boccone (talk) 11:24, 22 January 2019 (UTC)

Colors as subclass of entity?

Back in May 2018, you changed the subclass of various colors from "color" to "entity" (e.g. in this edit). This seems wrong to me, but I thought I'd ask you about it before reverting. Can you explain further? JesseW (talk) 05:04, 6 February 2019 (UTC)

@JesseW: "red" is a color. "color" is not a color. Their ontological status is quite different, so "subclass" makes no sense; the relation "instance of" (P31) was there all along and is correct. dark red (Q5223370) subclass of (P279) red (Q3142) is fine - the more narrowly defined color is subsumed within the broader one. There's no such parent relation available for the primary colors. ArthurPSmith (talk) 14:54, 6 February 2019 (UTC)
Excellent, thank you! I'll copy this on to some of the relevant talk pages, so other people wondering about it can find it more easily. JesseW (talk) 03:30, 7 February 2019 (UTC)
...This seems wrong. All dark reds are reds, all reds are colors, all colors being X would correctly imply that all reds are X. That seems to match the subclass of (P279) relation? It might be that color (Q1075) is associated to a particular sense of "color" which doesn't match this use, but another item with the same label might? I'm not quite sure how best to handle this.
In any case, setting "red" to be a direct subclass of "entity" is certainly not the best answer. --Yair rand (talk) 07:06, 28 February 2019 (UTC)
@Yair rand: No - "the rose is red" is a very different statement from "the rose is color". ArthurPSmith (talk) 15:08, 28 February 2019 (UTC)
That would be the case for any adjective. That sentence uses the adjective sense (the rose isn't actually noun-sense red), which is presumably not the topic of the item. To the best of my knowledge, there are no items with an adjective sense as the topic. I don't even know how that would be possible to work for pretty much anything. --Yair rand (talk) 17:42, 28 February 2019 (UTC)
@Yair rand: I suppose that's a fair point. Nevertheless, when you say "dark reds are reds" and "all reds are colors", the "are" in those two sentences has different meanings - in Wikidata terms the first is P279, the second is P31. If there was some item that could be considered a superclass of "red" in the same sense as the "dark red : red" relationship, it would need to be something like "red in a broader sense" - "red plus infrared" perhaps, or "red and purple". "Color" doesn't make sense to me at all in that role. ArthurPSmith (talk) 18:26, 28 February 2019 (UTC)
@Yair rand: I've been thinking about it some more - I think the main issue is that with the color hierarchy we are using P279 (subclass) as a proxy for a more specific property like "within the color space of". So in reality I think NONE of the colors should be considered classes at all (what are their instances anyway?) - rather they should be treated just as we do with locations - as a possibly overlapping hierarchy of entities with their own parent/child relation. And all instances of "color". What do you think of this approach - i.e. should we propose a new property for this? ArthurPSmith (talk) 15:52, 1 March 2019 (UTC)
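
The distinction argued over in this thread can be made concrete with a toy model. The sketch below is in Python and uses only the statements mentioned in the discussion ("red" instance of "color", "dark red" subclass of "red"); it is not real Wikidata data, and the function names are hypothetical. It follows the common reading where "instance of" is one P31 step followed by any number of P279 steps.

```python
# Toy statements taken from the discussion; not a dump of real Wikidata data.
P31 = {"red": {"color"}}       # instance of
P279 = {"dark red": {"red"}}   # subclass of


def superclasses(item):
    """All classes reachable from item via one or more P279 steps."""
    found, stack = set(), [item]
    while stack:
        for parent in P279.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_subclass_of(item, cls):
    return cls in superclasses(item)


def is_instance_of(item, cls):
    """One P31 step followed by zero or more P279 steps."""
    return any(c == cls or is_subclass_of(c, cls) for c in P31.get(item, ()))


print(is_subclass_of("dark red", "red"))    # True
print(is_subclass_of("red", "color"))       # False
print(is_instance_of("red", "color"))       # True
print(is_instance_of("dark red", "color"))  # False
```

The last line shows the point under debate: in this toy data "dark red" does not become an instance of "color" through its subclass link, so it would need its own P31 statement or a different property.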

Devyn Grillo

invalid ID (L42364) - Is this some kind of proper name in English or candidate for deletion? KaMan (talk) 08:41, 12 February 2019 (UTC)

Deletion - seems to be the users' own name. ArthurPSmith (talk) 14:20, 12 February 2019 (UTC)
What about invalid ID (L42322) (look at lexical category) KaMan (talk) 13:06, 13 February 2019 (UTC)
Not a word - thanks! How are you noticing these? ArthurPSmith (talk) 13:27, 13 February 2019 (UTC)
Every new day I read all new lexemes since last day. It's not that much. KaMan (talk) 15:29, 13 February 2019 (UTC)
Another English word (L42840) KaMan (talk) 07:53, 18 February 2019 (UTC)
@KaMan: Merged! ArthurPSmith (talk) 15:00, 18 February 2019 (UTC)

New parameter proposal to property P5892

Hello, @ArthurPSmith:, I hope you are well! Can I ask for your guidance on where the best place is to propose an alteration to the property UOL Eleições ID (P5892)? I'm finally creating the items of politicians, so the items of the elections can be created and this identifier can be used. To do that, I think an update to the property is needed (I explain here why). To whom or where do I have to submit this request? Thank you in advance, Ederporto (talk) 06:58, 15 February 2019 (UTC)

I commented on the property talk page - maybe next Tuesday (Feb 19) will work to make the change? ArthurPSmith (talk) 19:47, 15 February 2019 (UTC)

P6516 and externalid resolver

Hi, It seems that P6516 formatter URL needs your externalid resolver, as seen at Diaspidiotus juglansregiae (Q10470807). I tried several things at Aonidiella citrina (Q10414113) as well. Can you adjust the resolver and the formatter URL to make it function? Thanks in advance. Lymantria (talk) 22:10, 19 February 2019 (UTC)

@Lymantria: For URL-encoding issues it doesn't actually need any special coding, you can just drop it in. I edited the formatter URL on P6516 to use it, and it seems to work (see the two examples with spaces). You can either stick with the '%20' or use ' ' as the separator here, it seems to work either way. ArthurPSmith (talk) 19:10, 20 February 2019 (UTC)
Thank you. That's weird, I am shown a 404 error all the time. The '%20' is apparently translated into '%2520' by the software, and the scalenet website doesn't accept the ' ' separator either when I try it. Neither in Firefox, nor in Chrome. Lymantria (talk) 22:01, 20 February 2019 (UTC)
Ah, I see. That must be a caching problem. Thanks again. Lymantria (talk) 22:05, 20 February 2019 (UTC)
Oh yes, you have to edit the identifier for it to be recalculated I think. ArthurPSmith (talk) 22:37, 20 February 2019 (UTC)
It works fine. Thanks much. Lymantria (talk) 11:56, 21 February 2019 (UTC)
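
The '%20' turning into '%2520' described in this thread is what happens when a value that is already percent-encoded gets encoded a second time. A short Python sketch using the standard `urllib.parse.quote` shows the effect; the identifier value is made up, and this does not reproduce the Wikidata formatter or the resolver.

```python
from urllib.parse import quote

raw_id = "some id"                # hypothetical identifier containing a space

encoded_once = quote(raw_id)      # the space becomes %20
encoded_twice = quote(encoded_once)  # the % itself becomes %25

print(encoded_once)   # some%20id
print(encoded_twice)  # some%2520id
```

Storing the identifier with a literal space and letting the link formatter encode it once avoids the doubled form.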

subclass of (P279)

Hi Arthur,

Thank you for pointing this out. At first, I had a single item with two values for ISBN-13 (P212), but I got a warning. I do not remember what it said, but I thought it was a way to bypass the problem… Thank you. Genium (talk) 17:47, 13 March 2019 (UTC)

Sure to support a hoax?

"The page is based on a hoax by 'MisterSynergy'". Good luck! 22:52, 8 April 2019 (UTC)

splitting up external ID based on regex

There is no split of

  1. GND ID into
    1. with "-"
    2. without "-"
  2. VIAF ID into
    1. [1-9]\d(\d{0,7})
    2. [1-9]\d(\d{17,20})

etc. Why then would one split out BBLD IDs that match /[0-9]{16}/? 23:10, 8 April 2019 (UTC)

It was requested, no serious objections. I'm sure there's a history I'm unaware of but I don't see how it's relevant. ArthurPSmith (talk) 17:17, 9 April 2019 (UTC)
And no serious support. And no serious evidence for existence. It was created as part of a campaign by Jura1 and MisterSynergy, look at - there are different sources for creating a BBLD ID, but the notion that some belong to "former scheme" and others to "new scheme" is not supported at all. Is Wikidata going to create a new property for each BBLD ID creation mechanism? Or even better, a pair for each mechanism to have former IDs and new IDs (how long is an ID new?).... wait, maybe three, to have former-current-new separated into different properties? Could all that violate en:WP:OR? 23:35, 10 April 2019 (UTC)
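
For readers unfamiliar with what a regex-based split looks like in practice, here is a minimal Python sketch. It uses only the 16-digit pattern quoted in this thread; the function name and the sample values are hypothetical, and nothing here says whether such a split is justified.

```python
import re

SIXTEEN_DIGITS = re.compile(r"[0-9]{16}")  # pattern quoted in the thread


def split_by_pattern(ids, pattern=SIXTEEN_DIGITS):
    """Partition ID strings into (fully matching, everything else)."""
    matching, other = [], []
    for value in ids:
        # fullmatch: the whole string must match, not just a part of it.
        (matching if pattern.fullmatch(value) else other).append(value)
    return matching, other


sample = ["1234567890123456", "abc-1", "12345678901234567"]  # made-up values
print(split_by_pattern(sample))
```

Note that a 17-digit value lands in the second list, because the whole string has to match the pattern.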

Is SourceMD really working?

I am not sure if it was on and then off. Using SourceMD last night, I loaded a list of DOIs and the items are fine (correction... I actually loaded these items a couple weeks ago). I tried loading a few individual DOIs today and SourceMD says the batches were successful, though I cannot find the items (or the DOI). Strange. As example, Batch 6878. Trilotat (talk) 14:30, 15 April 2019 (UTC)

It doesn't seem to be listing anything on that batch, not sure what that means? I haven't tried it myself, it just looked like the change Magnus made would definitely fix the problem we were running into. ArthurPSmith (talk) 15:29, 15 April 2019 (UTC)

Christian hymns / canticles

Hi there! Re: your revert: I'm in the middle of trying to clean up the cluster of hymns, psalms, canticles, national anthems etc. and it will be a little messy for a while as I move things around; I hope you can bear with me. It's a proper mess at the moment, thoroughly mixed together, as the Scandinavian word "salme" is extensively used both for Christian songs and Christian poems, and not just for psalms (and never for sports!), while the Spanish/Portuguese name most of their local, national and sports anthems "himno/hino", getting them mixed in with the religious ones. The Germans have a whole bunch of strict, narrow definitions of course, and the English borrow freely from all the above. So there you have it, hope it doesn't disturb things too much, it shouldn't take too long to fix. Moebeus (talk) 16:17, 20 April 2019 (UTC)

Ok - I just noticed your post on Project Chat about it. I ran into it because you'd created a subclass loop which is a no-no and gets caught in one of our Listeria reports... ArthurPSmith (talk) 16:24, 20 April 2019 (UTC)
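
A subclass loop of the kind mentioned here can be found with a depth-first search over the "subclass of" links. The following Python sketch is only an illustration: it is not the Listeria report, the function name is hypothetical, and the item names in the example are made up.

```python
def find_subclass_loop(p279):
    """Return one loop as a list of items, or None.

    p279 maps each item to an iterable of its direct superclasses.
    Recursive, so only suitable for small graphs.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited / on current path / finished
    state, path = {}, []

    def visit(node):
        state[node] = GREY
        path.append(node)
        for parent in p279.get(node, ()):
            seen = state.get(parent, WHITE)
            if seen == GREY:       # parent is already on the current path: a loop
                return path[path.index(parent):] + [parent]
            if seen == WHITE:
                loop = visit(parent)
                if loop:
                    return loop
        path.pop()
        state[node] = BLACK
        return None

    for start in list(p279):
        if state.get(start, WHITE) == WHITE:
            loop = visit(start)
            if loop:
                return loop
    return None


print(find_subclass_loop({"hymn": ["canticle"], "canticle": ["hymn"]}))
```

For the two-item example this prints `['hymn', 'canticle', 'hymn']`, and it returns None when the hierarchy has no loop.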

WikidataCon submission on Author Disambiguator?

Hi Arthur, are you planning on (i) attending and (ii) such a submission? I will likely not be able to attend in person, but would be interested in helping with something on the topic, especially the part of integration with Scholia or Listeria to round up curation workflows. --Daniel Mietchen (talk) 13:12, 24 April 2019 (UTC)

@Daniel Mietchen: Yes, I submitted a proposal already for a 25-minute presentation - your input on it would be great, thanks! ArthurPSmith (talk) 18:21, 24 April 2019 (UTC)
Sounds good — count me in when preparation time comes. --Daniel Mietchen (talk) 03:01, 25 April 2019 (UTC)

Query service lag

Could you refrain from editing large items for a while? We're experiencing some lag on the query service atm... Sjoerd de Bruin (talk) 14:35, 24 April 2019 (UTC)

Due to the "stop batch" feature not working, I've blocked your account for the query service to recover. Sjoerd de Bruin (talk) 15:01, 24 April 2019 (UTC)
Hi @Sjoerddebruin: - sorry I was traveling. Hmm, I've been working on large items for the last several days, I didn't realize it could contribute to wdqs lag. Is there some background info on why this happens/how to avoid? ArthurPSmith (talk) 18:23, 24 April 2019 (UTC)
I have no idea what caused today's issues, still investigating. Sjoerd de Bruin (talk) 18:25, 24 April 2019 (UTC)
Anyway, thanks for the Grafana pointer, I'm going to run a small collection of updates now and see how it affects things. ArthurPSmith (talk) 18:26, 24 April 2019 (UTC)
Hmm, it does look like a couple of the wdqs servers gain a few minutes lag when I start one of those jobs, and it goes away when I stop it. I'm not sure the pattern's entirely consistent though. I've just restarted the one that was stopped earlier today, which is longer than the others I had prepared; hopefully just running that one will not cause too severe a problem. I'll check in again later today. ArthurPSmith (talk) 20:29, 24 April 2019 (UTC)
@Sjoerddebruin: is there a phab task or other activity I could look at on this? I had 3 batch jobs updating large items running most of last night, and Grafana indicated there was no problem until about 11:00 GMT this morning (jobs had been running since about 01:00 GMT); I checked around 13:30 GMT and noticed the lags were still high on two of the servers, so I stopped the batch jobs. One of the servers seems to have recovered although not immediately, but the other (wdqs1005) still has over a 40 minute lag several hours later. So there's definitely something else going on that's making these lags so bad. ArthurPSmith (talk) 16:02, 26 April 2019 (UTC)
Sorry, we don't have a task for the current issues yet but I do see a pattern between edits done to large items and the query service lag. The volume of your edits in the last 6 hours was 3.2 GB, which all needs to be processed (the query service currently reloads whole items on updates, work is needed on that). Sjoerd de Bruin (talk) 07:25, 27 April 2019 (UTC)
I can confirm that the problem seems to come from batch edits to large items - stopping Daniel Mietchen's jobs impacting large items had a pretty clear effect two days ago. According to Wikiscan you are the only one running batches affecting large items at the moment, so I would expect the lag to reduce if you stop these. − Pintoch (talk) 10:34, 27 April 2019 (UTC)
@Pintoch, Sjoerddebruin: I've stopped the large-edit jobs for now, will watch the lag to see if it's safe to restart. These same jobs were running for about 10 hours earlier yesterday with no bad lag though. From the 24-hour "wikiscan" there were some other people with multi-GB updates in the past day. Any idea why only 2 of the wdqs servers seem to be affected? ArthurPSmith (talk) 13:04, 27 April 2019 (UTC)
It's been 2 hours, and there's no noticeable improvement in the lag. I really don't see a correlation with the edits I've been doing at all. ArthurPSmith (talk) 15:21, 27 April 2019 (UTC)
6 hours now. I'm restarting the jobs, there was no discernible effect of my turning them off. My edit rate is really slow, I have a hard time believing I'm causing the problem here. ArthurPSmith (talk) 19:23, 27 April 2019 (UTC)
And now, with those batches running for the last few hours, query lag has dropped almost to zero for all the servers. My jobs at least seem pretty clearly to be not making things worse. ArthurPSmith (talk) 00:38, 28 April 2019 (UTC)
By the way, I suspect the "wikiscan" is seriously overestimating the impact of the jobs I'm running - they generally make 4 or 5 edits to the same item one after the other, so the actual volume that has to be moved should be at most 1/4 of what's stated, assuming it's counting the size of the item for each edit, and wdqs doesn't copy the data 4 or 5 times when that's not needed. ArthurPSmith (talk) 00:40, 28 April 2019 (UTC)
Thanks for experimenting! "wdqs doesn't copy the data 4 or 5 times when that's not needed": I don't think that is true - my understanding is that it does copy the data 4 or 5 times in these cases (we had a discussion with Stas on IRC about that a few days ago and he confirmed that). − Pintoch (talk) 08:24, 28 April 2019 (UTC)
The lag is now more than two hours again. Like said above, the edit rate isn't the problem but the affected items. Yes, there are a few more with such high edit volume but those edit a lot more items. At some periods of the day there isn't much other activity, thus the query service can handle it. But when others are also running batches it's a problem. Please, for our (data) users: postpone for the time being. Sjoerd de Bruin (talk) 14:43, 29 April 2019 (UTC)
@Sjoerddebruin: I turned it off, the lag continued to climb. It's clearly NOT me that's the problem here. ArthurPSmith (talk) 17:34, 29 April 2019 (UTC)
Note that when you're disabling a mass-editing bot, the lag won't go down immediately. The service still has to go through the accumulated backlog of edits, and the lag starts to go down only when the sync point gets past the point where the bot has been turned off. If you have 100 edits/s for the last hour and the Updater can only do 50 edits/s, then it still takes 2 hrs to go through that hour of updates, even if the editing is turned off now, because the updater is not in the 'now' yet. Which means the lag will be rising. Smalyshev (WMF) (talk) 19:19, 29 April 2019 (UTC)
@Smalyshev (WMF): But I'm NOT doing "50 edits/s". I'm doing about 1 edit per 10 seconds at most. And as noted above (and has happened today) the lag continued to rise for HOURS after I shut down the job. It really can't be my jobs that are the problem here. ArthurPSmith (talk) 20:21, 29 April 2019 (UTC)
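
The backlog arithmetic in the example above (100 edits/s for an hour against an updater that handles 50 edits/s) can be checked with a few lines of Python. This is a sketch under simplifying assumptions that are not from the thread: constant rates, no other editors, and every edit costing the same. The function name is hypothetical.

```python
def backlog_hours(edit_rate, updater_rate, editing_hours):
    """Return (hours to process all the edits, hours still needed after editing stops).

    Rates are in edits per second. Assumes constant rates and no other
    edits arriving, which is a simplification.
    """
    total_edits = edit_rate * editing_hours * 3600
    total_hours = total_edits / updater_rate / 3600
    remaining = max(0.0, total_hours - editing_hours)
    return total_hours, remaining


print(backlog_hours(100, 50, 1))  # (2.0, 1.0)
```

With those numbers the updater needs two hours in total, one of them after the bot has been switched off. The model counts edits only; as discussed above, the cost of an edit also depends on the size of the item being reloaded.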

I really would like to understand the underlying problem here - Smalyshev (WMF) is there documentation of the different servers shown on this Grafana chart? Why are wdqs1004 and wdqs1005 (and sometimes wdqs1006) always the ones with long lags, while the wdqs2001,2,3 are usually fine? Is a few GB of data over 6 hours really overwhelming to the network or something? ArthurPSmith (talk) 20:31, 29 April 2019 (UTC)

1004,5,6 are public eqiad cluster servers, 2001/2/3 are public codfw cluster servers. The eqiad cluster usually gets the most traffic. The problem is not network data size but the number of updates to Wikidata. The Updater has to process all of them, plus process all the query load (and, judging from the number of bans, people still keep ignoring the throttling system and try to force through as many as possible). If there are too many updates or too many queries, the servers get slow, which is reflected as lag.
The cluster setup is described here:
Smalyshev (WMF) (talk) 20:36, 29 April 2019 (UTC)
@Smalyshev (WMF): Thanks, that's very helpful! I didn't know about eqiad/codfw before, and the impact of load balancing explains the discrepancy... I assume all updates have to go to all servers, it's just queries that are load-balanced - so the underlying cause of the lag for the last month or so (given the codfw servers have been fine) has to be high query volume, not a problem with updates (though of course with no updates there would be no lag issue!) However, the peak query time from this chart seems to be daily around noon, while the comparable lag chart seems to usually peak around 22:00 (and is not consistently happening every day). So that doesn't entirely explain things either... Anyway, I guess I'll try to avoid running batches between noon and midnight GMT and see if that helps at all. ArthurPSmith (talk) 21:13, 29 April 2019 (UTC)
Yes, all servers do the updates, but the query load is different. However, it is a threshold problem - if the number of updates incoming is less than the number of updates the server can process, it is fine, regardless of how large the difference is. Once the sum of load + update frequency goes over server capacity, the server starts lagging. While throttling/banning and query expirations can mitigate the load issue to some level, the server still has to process all the updates, so heavy update load can cause lags too. It is the sum of both factors. The servers right now can deal with usual query load + update load, but spikes in either - or both - if they are large enough, can be problematic. Smalyshev (WMF) (talk) 05:24, 30 April 2019 (UTC)
Wikidata's edit rate is pretty steady according to this chart - though there was a significant dip on April 25 that does coincide with a good day for lag - but other than that one day the whole chart doesn't seem clearly correlated with either edit rate or query rate or the combination, and there are mysterious time shifts like from noon (peak query and close to peak edits most days) to 22:00 UTC (peak lag). Anyway, I'll stick with avoiding the 12:00 - 24:00 times for batch edits for now. ArthurPSmith (talk) 11:35, 30 April 2019 (UTC)
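
The threshold behaviour described in this section (no lag while query load plus update load stays under capacity, growing lag once the sum goes over) can be illustrated with a small simulation. This Python sketch is a toy model, not a description of how WDQS schedules work: it assumes queries are served first and updates get whatever capacity is left, and all numbers are invented.

```python
def simulate_backlog(capacity, loads):
    """Return the update backlog after each time step.

    loads is a list of (query_load, update_load) pairs in the same
    units as capacity. Assumption: queries are served first.
    """
    backlog = 0.0
    history = []
    for query_load, update_load in loads:
        spare = capacity - query_load   # capacity left over for updates
        backlog = max(0.0, backlog + update_load - spare)
        history.append(backlog)
    return history


# Steady updates of 50 per step; only the query load changes.
print(simulate_backlog(100, [(40, 50), (60, 50), (60, 50), (30, 50)]))
# [0.0, 10.0, 20.0, 0.0]
```

In this toy run the update rate never changes, yet a backlog appears and disappears purely because of the query load, which is one way a steady batch job could look harmless at some hours and harmful at others.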

What is the plan?

We know about growth in Wikidata, and we know that future ambitions will not cease to be as ambitious as they are and were. As we are unable to service our current ambitions, what is the plan for the future? What growth is planned for, and what are the contingency plans? As I said earlier, Wikidata is not a relational database; what we experience is the consequence of the absence of relational mechanisms. There is a science to this. What are the plans for the future? How are we going to cope? PS throw some iron at the problem. Thanks, GerardM (talk) 05:59, 2 May 2019 (UTC)

@GerardM: See this Phabricator ticket which collects a series of requests for more hardware for WDQS as you suggest. It's not a simple problem - scalability in the long run means having to abandon the "vertical" model (the entire graph on one server) and splitting it up among multiple servers, which is a complex technical problem, and may require changing the underlying graph query software (currently Blazegraph). Meanwhile we need to work within the constraints we have right now. ArthurPSmith (talk) 11:38, 2 May 2019 (UTC)
It does not provide me with the answer I am looking for. It is technical; what I am looking for is the scenarios considered for growth, not the technology that is to follow. Thanks, GerardM (talk) 15:06, 2 May 2019 (UTC)
The issues are technical. If Wikidata is growing faster than the capacity of individual units of computer hardware, then we have to spread the pieces of Wikidata across multiple individual units, which requires significant development. If computer hardware capabilities are growing faster than Wikidata is, then we can just upgrade the hardware and be happy. It looks like we're under the first scenario, not the second. ArthurPSmith (talk) 15:10, 2 May 2019 (UTC)
Technical approaches get you a hack that "makes it work" for now. I am not interested in that, I am interested to learn if exponential growth is expected, planned for and that we are considering "next generation" approaches that enable growth like 1000% in a year (when it is the growth that is considered plausible). Thanks, GerardM (talk) 05:58, 3 May 2019 (UTC)
The technical requirements - and the money to pay for them - are the limiting factor in any growth plan. Read the phabricator tickets I referenced, and you'll see wikidata developers are asking for more capacity, and getting some pushback. Maybe you can spearhead an effort to give the developers more resources? ArthurPSmith (talk) 11:07, 3 May 2019 (UTC)

We had a good week, but today is bad

@Sjoerddebruin, Pintoch, Smalyshev (WMF): Grafana is showing the worst lag since last Monday today - and steadily going up. I stopped all my large-item jobs earlier today, however, this SourceMD batch from GerardM editing large items has been running for over 6 days now. I don't think there's any way to pause it that would allow it to restart? Magnus?? Is there any way to tell what else is happening this morning (or is it just Monday morning heavy query volume?) that may be causing trouble? ArthurPSmith (talk) 12:30, 6 May 2019 (UTC)

I don't think there is a way for admins to stop an individual batch, let alone enabling later resumption. Blocking the user is the only thing I can help with, I am afraid. − Pintoch (talk) 12:39, 6 May 2019 (UTC)
Well, it looks like things are recovering. Maybe around noon UTC on Mondays will always be bad? ArthurPSmith (talk) 14:21, 6 May 2019 (UTC)
Jobs from SourceMD can be stopped using the UI. They can be restarted at a later date. I have no problem with jobs being halted in this way when need be. I have been at work all day. I have stopped the job for now. Given that the job has run for a couple of days, with long periods where everything was smooth, you cannot say that it is this job on its own that is the problem. So what happened at noon that gave us such issues? Thanks, GerardM (talk) 16:20, 6 May 2019 (UTC)
I assume it's heavy query volume - I don't know if it's a small number of specific users, or a more general problem of many people hitting WDQS at the same time. Stas mentioned that the problem seems to happen when query + update volume together go over some threshold, updates don't generally seem to cause trouble on their own. ArthurPSmith (talk) 20:08, 6 May 2019 (UTC)
And today, the following Monday, looks even worse - and all the large-item batch jobs were stopped over 2 hours ago. ArthurPSmith (talk) 12:16, 13 May 2019 (UTC)


Hi ArthurPSmith, thanks for setting Wikidata:Property_proposal/music_video to ready. Would you click "create"? I can then do the other steps. --- Jura 17:56, 30 April 2019 (UTC)

Go ahead, it's now music video (P6718). ArthurPSmith (talk) 20:40, 30 April 2019 (UTC)
  • Thanks once more. BTW, would you do the same for this and that? I will do the other steps. --- Jura 18:51, 28 May 2019 (UTC)
@Jura1: Ok! ISO speed (P6789) and f-number (P6790) ArthurPSmith (talk) 18:58, 28 May 2019 (UTC)

Use of ISSN for DOI identifiers

Hi Arthur. I think we can use DOI identifiers for journals as well. There is a recommendation here. At least Wiley uses it widely. That's why I included it in Q6295227 and other items. Best regards. --Gerwoman (talk) 19:02, 7 May 2019 (UTC)

@Gerwoman: Hmm, ok, but in this case it looks like Journal of Forecasting (Q29011411) was created earlier (based on that DOI)? Perhaps the instance of (P31) there needs to be fixed and the two items merged? ArthurPSmith (talk) 20:28, 7 May 2019 (UTC)
Yes. Now merged. --Gerwoman (talk) 16:18, 8 May 2019 (UTC)
@Gerwoman: Ok, thanks! There may have been some others of yours that I removed DOI's from for the same reason - I'll be more careful checking for that sort of problem in future! This was based on looking at constraint violations on the DOI property. ArthurPSmith (talk) 17:44, 8 May 2019 (UTC)

Aren't chemical elements substances?

Hello Arthur, you reverted my attempt to make 'chemical element' a subclass of 'pure substance'. I'm new to Wikidata and want to understand. I hope this is the correct way to contact you. You stated: '"Chemical element" is not (only) a kind of substance'. That may be right, but I didn't want it to be a substance only. I wanted it to be a substance too. My intuition is that chemicals like sodium or oxygen should somehow be chemical substances and not only abstract classes. Don't you agree? All ontologies I know classify substances like this:

  homogeneous mixture
  heterogeneous mixture
 pure substance


Why shouldn't Wikidata do so? – The preceding unsigned comment was added by Micgra (talk • contribs) at 17:08, May 15, 2019 (UTC).

  • @Micgra: Wikidata's upper-level ontology is a bit of a mess; however, please don't change anything in it without discussion with members of the associated wikiproject - in this case Wikidata:WikiProject Chemistry. On this specific question, Wikidata already has the entry simple substance (Q2512777) which would take the spot you have for "elements" in the suggested classification above, and note that the two entries (chemical element (Q11344) and simple substance (Q2512777)) are linked via a "different from" relation here, which indicates we have considered the relation and for the purposes of Wikidata they are distinct. In particular, "chemical element" here represents both substances and individual atoms whether they are in a pure substance or combined with other elements to form molecules or compound substances or mixtures etc. It is an overarching class - actually a metaclass, whose instances are the individual types of atoms that nature gives us. So yes, they are quite distinct in meaning here. ArthurPSmith (talk) 17:41, 15 May 2019 (UTC)

New duplicate DOIs

Hi, I just found a couple of new duplicate items with DOIs containing < (see Q63976771 and Q64357784). Both were created during the last two weeks by SourceMD. It looks like the problem is encoding in the DOIs but I don't know why they are encoded. The DOI for each article appears to be correct in Crossref - could SourceMD be encoding the < and >? Simon Cobb (User:Sic19 ; talk page) 00:19, 6 June 2019 (UTC)

@Sic19: It sounds like that's what's happening; however it's possible SourceMD is getting the DOI's from somewhere else (ORCID, PMC?) where the real problem is. I have been working on cleaning these up after the fact so it's not a huge problem, but it's still annoying... ArthurPSmith (talk) 15:16, 6 June 2019 (UTC)
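For the after-the-fact cleanup, a minimal sketch of spotting and decoding such DOIs could look like the following. It assumes the stray encoding is HTML-entity encoding (`&lt;` / `&gt;`), which this thread does not confirm; the function names and the sample DOI are made up for illustration.

```python
import html


def decode_doi(raw_doi: str) -> str:
    """Decode HTML entities (e.g. &lt; and &gt;) that may have leaked into a DOI."""
    return html.unescape(raw_doi.strip())


def looks_encoded(raw_doi: str) -> bool:
    """Return True if decoding changes the DOI, i.e. it contained entities."""
    return decode_doi(raw_doi) != raw_doi.strip()


# Hypothetical DOI, not a real identifier.
encoded = "10.1000/EXAMPLE(2019)1:1&lt;1::AID-EX1&gt;3.0.CO;2-X"
print(decode_doi(encoded))
```

Items whose stored DOI satisfies `looks_encoded` would be candidates for a duplicate check against the decoded form.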

How's ORES working out for you?

Hi ArthurPSmith, I'm working with User:EpochFail (@halfak on irc) on a research study to look into how mw:ORES is working out on wikis where it has been enabled. I was hoping to talk a little about the kind of work you do on Wikidata and about how the ORES edit filters and classifiers have been working out. Do you use any tools other than Special:RecentChanges or Special:Watchlist that take advantage of ORES? Do you know of any other tools that are used to patrol that do not use ORES? I'm also interested in any other observations you may have about how the ORES scores are working out. Thank you! Groceryheist (talk) 23:52, 12 June 2019 (UTC)

@Groceryheist: You should probably go visit Wikidata:WikiProject Counter-Vandalism, which lists some tools for counter-vandalism on Wikidata, and people who are heavily involved in it. I've used the Open-ended Recent Changes tool [ORC] a bit. I don't pay a lot of attention to the ORES data; it doesn't seem very well calibrated for Wikidata, which has very different sorts of edits from the language wikipedias. A lot of edits flagged by ORES here are just fine, they were flagged just because an anonymous user did a bunch of work to fix up an item. On the other hand, the volume of edits here that need patrolling is pretty overwhelming so we seem to miss a lot. It would definitely be helpful to have better tools for that. The multilinguality here makes it hard though. ArthurPSmith (talk) 13:09, 13 June 2019 (UTC)
Hi ArthurPSmith, thanks so much for getting back to me so soon! Your comments about anonymous users are particularly helpful. Do you have any other thoughts about how ORES treats anons? It's also interesting that you say ORES doesn't seem well calibrated for Wikidata. Also thank you for pointing me to the counter vandalism project and to ORC! Finally, can you think of anyone else who might want to chat a little bit with me about quality control and ORES on Wikidata? Groceryheist (talk) 20:34, 13 June 2019 (UTC)
@Groceryheist: I think anons on wikidata probably should be treated pretty much the same as any user requiring patrolling (less than 50 edits?); I'm not sure if ORES does something different. A lot of the wikidata edits that ORES seems to flag but which I think are fine are creation of descriptions for items in a new language; ideally ORES would run the description through some kind of translation software to see if the words match to some degree the existing descriptions in other languages. Of course we should flag vulgarities in any language; but it seems to flag a lot of perfectly innocent translation work. For example. On who to talk to - User:YMS I think is particularly knowledgeable about Wikidata vandalism. You might also want to talk to some of the admins who have to deal with vandals. ArthurPSmith (talk) 13:41, 14 June 2019 (UTC)
@ArthurPSmith: The feedback about the difficulty with translations is interesting and I'll pass this on to User:EpochFail. I'll also reach out to User:YMS as you suggest. Thanks for your help! Groceryheist (talk) 20:08, 16 June 2019 (UTC)
@Groceryheist, EpochFail: This discussion on Project Chat mentions several tools used for Countervandalism and some other people involved in it here, so you might want to get hold of that group also. ArthurPSmith (talk) 14:30, 17 June 2019 (UTC)
@ArthurPSmith:, awesome! Thank you! Groceryheist (talk) 18:27, 9 July 2019 (UTC)

Property creation

Hi there!

Could you please create Wikidata:Property proposal/Réunionnais du monde ID?

Cheers, Nomen ad hoc (talk) 12:30, 15 June 2019 (UTC).

@Nomen ad hoc: - it takes me about 10 minutes to create a new property (unless I'm just asked to "click the button" which takes about 1 minute, if you are willing to fill in all the detailed attributes on the property after creation that's a big help). Given there are 70+ properties waiting to be created, that's a lot of work to get them all done... when I get a free 10 minutes I'll take a look at yours next though. ArthurPSmith (talk) 14:22, 17 June 2019 (UTC)
Ah, apologies, I didn't know that it takes so much time! Best regards, Nomen ad hoc (talk) 14:35, 17 June 2019 (UTC).

Property creation

I would like to suggest that you use a script to automatically create properties when the proposals have reached maturity. The script also reports some common issues back to the proposer. This page explains how to improve the property proposals. Regards, ZI Jony (Talk) 02:17, 30 June 2019 (UTC)

re:Wikidata:Property proposal/Wikispecies template for this work

Just gonna say that if this one fails I'm throwing in the towel and someone else will have to be the one proposing it again. Part of the failure rests entirely on me butting head with user:pigsonthewing (not even on Wikidata!) in-between the two attempts I've made at it. I am 100% convinced it is the only reason his support switched to the most misleading oppose I've seen in a long time. He has a storied past for grudges.

Either way, I do hope someone bringing it up has more luck persuading the people at property proposals than I did. Circeus (talk) 19:41, 4 July 2019 (UTC)

@Circeus: I wasn't aware of a previous attempt. I don't think it was linked from the current proposal? In any case, I don't understand exactly where you stand on this - would you support it with URL datatype? Given the centrality of properties to data modeling in Wikidata we do need to strive for consensus on how they are to be, and that means proposers need to be engaged in the discussion and try to be as clear as possible. ArthurPSmith (talk) 22:30, 4 July 2019 (UTC)
The first proposal is briefly mentioned (though not linked) in my answer to Andy. It's here, if you're curious (and should you desire to be really nosy, this is the incident I'm talking about). It was actually more focused (aside from the name) but apparently, that made it even less attractive to the reviewers.
I'm not sure a url version (aside from issues connected to the template moving for whatever reason) would allow the backlink from the template to the work. This is ultimately what the property is intended to provide (and is needed for eventually building automatic lists of work with templates, or nomenclatural acts, as mentioned in the proposal), but doing the link in the other direction ("work that this template generates a reference for" template -> work instead of the proposed "reference template for this work" work -> template) would definitely not be acceptable to the Wikidata users. You gotta pick your fights and all that. At this point, this is just not a fight I want to bother with anymore (especially if Andy, who has no actual interest in Wikispecies's content quality, is going to get in the way out of spite). Circeus (talk) 00:02, 5 July 2019 (UTC)
"Part of the failure rests entirely on me butting head with user:pigsonthewing" False. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:12, 5 July 2019 (UTC)
"Andy, who has no actual interest in Wikispecies's content quality" Also false. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:13, 5 July 2019 (UTC)

Ready to click

Hi ArthurPSmith,

As you consider this and that ready, could you click "create"? Similarly another one. I can then complete the properties. --- Jura 13:20, 19 July 2019 (UTC)

@Jura1: - ok unabbreviated text (P7008), extracted from (P7009) and imprimitur granted by (P7010). ArthurPSmith (talk) 13:26, 19 July 2019 (UTC)

Community Insights Survey

RMaung (WMF) 17:37, 10 September 2019 (UTC)

Reminder: Community Insights Survey

RMaung (WMF) 19:53, 20 September 2019 (UTC)


Hi Arthur, can you give me any advice on the follow-through of this? I'm starting to think it's dying a death. Maybe I even have the wrong end of the stick? Regards Broichmore (talk) 15:20, 21 September 2019 (UTC)

If I understand correctly, you probably need to make some proposals at Wikidata:Property proposal. You might also want to consider starting a Wikiproject (see Wikidata:WikiProjects) around Ships, as there seem to be several interested parties. ArthurPSmith (talk) 19:34, 21 September 2019 (UTC)
@Broichmore: if you aren't following this page! ArthurPSmith (talk) 19:35, 21 September 2019 (UTC)
Thanks Arthur. Appreciate it. Sadly I don't know if there is enough interest for a WikiProject. There is a well established page on Wikipedia, but I've struggled in vain for synergy with Commons. Hence my plea on Wikidata. Broichmore (talk) 12:00, 22 September 2019 (UTC)

Lexeme aaron

Hi there! Re: I noticed you linked the male given name sense to the hebrew script אהרון, rather than the English Aaron. I don't pretend to know how Lexemes work but I'm curious if this was on purpose or perhaps an oversight? Moebeus (talk) 23:34, 4 October 2019 (UTC)

@Moebeus: That was a suggestion from the MatchSinn tool. If those two "given names" are really distinct concepts, then I think according to our data model it is appropriate to link them as different senses of the word "Aaron". Or maybe the two "given name" entries should be merged? I'm not terribly familiar with our name model actually, so I'm not sure of the best approach. ArthurPSmith (talk) 11:53, 7 October 2019 (UTC)
They should definitely not be merged, as the model we use (as championed by Project Names) stresses that names in different scripts and/or with different spellings should be kept apart as separate name items. One reason for this being that the English Romanization might differ from the German or French Romanization, as an example. But anyways, I was just curious and I don't really have an opinion on how the Lexemes should be structured. Thx for the answer! Moebeus (talk) 12:10, 7 October 2019 (UTC)

New page for catalogues

Hi, I created a new page for collecting sites that could be added to Mix'n'match and I plan to expand it with the ones that already have scrapers by category. Feel free to use, expand. Best, Adam Harangozó (talk) 19:55, 19 October 2019 (UTC)

@Adam Harangozó: Thanks. I added a line for "UNESCO Nomenclature" - we should probably propose a property for this too. Did I add it in the area you would have expected? ArthurPSmith (talk) 14:00, 28 October 2019 (UTC)
Thanks! Yes but feel free to decide on the categories as you will! --Adam Harangozó (talk) 16:11, 28 October 2019 (UTC)

Getting data from a property with OpenRefine

Hello! I have an OpenRefine-related question and maybe you can help me with it. I have a list of items and I can easily get a column with their INE municipality code (P772). Is it possible to get the Wikidata item that has this property in a new column? Thanks! -Theklan (talk) 19:00, 27 October 2019 (UTC)

@Theklan: (just replying because I happen to be around)
You can do as follows:
  • Create a new column, the values of the cells in that column should be some random garbage such as "2ebb3698dfff32bc6"
  • Reconcile this column to Wikidata, by enabling the column which contains your INE municipality code (P772) values and matching it to… INE municipality code (P772)
Values which have a corresponding identifier will be matched. The purpose of using random garbage as cell values in the column to reconcile is to make sure they are non-empty (otherwise they would be skipped by reconciliation) and their content does not correspond to any item on Wikidata (otherwise the reconciliation could find items based on their label - we only want it to find items based on their INE municipality code (P772) value). − Pintoch (talk) 05:47, 28 October 2019 (UTC)
@Pintoch: Thanks! -Theklan (talk) 08:41, 28 October 2019 (UTC)
@Theklan, Pintoch: - uh, thanks for sorting it out guys! I wouldn't have thought of the random text trick (I would have advised just reconciling some other sensible column like "name"). ArthurPSmith (talk) 14:01, 28 October 2019 (UTC)
The problem is I didn't have a name column. So the random text trick seems the best option (by the way, I downloaded the file as a spreadsheet, made another sheet there with all the localities and INE codes and used the VLOOKUP function to find the correspondences). -Theklan (talk) 14:22, 28 October 2019 (UTC)
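For anyone who prefers a script to a spreadsheet, the VLOOKUP-style join mentioned above can be sketched roughly as follows. This is only an illustration: the column names, codes and QIDs are made-up placeholders, and the reference table is assumed to have been exported beforehand with one row per INE municipality code (P772) value.

```python
import csv
import io

# Made-up placeholder rows standing in for an exported "code -> item" table.
reference_csv = """ine_code,qid
00001,Q_PLACEHOLDER_1
00002,Q_PLACEHOLDER_2
"""


def build_lookup(csv_text: str) -> dict:
    """Map each INE code (kept as a string to preserve leading zeros) to a QID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["ine_code"]: row["qid"] for row in reader}


def add_qid_column(codes: list, lookup: dict) -> list:
    """Pair every code with its QID, or an empty string when there is no match."""
    return [(code, lookup.get(code, "")) for code in codes]


lookup = build_lookup(reference_csv)
print(add_qid_column(["00002", "99999"], lookup))
```

Keeping the codes as strings matters here, since a numeric conversion would silently drop leading zeros and break the match.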

Author disambiguation with EditGroups

Hi Arthur,

It was great to catch up at WikidataCon! I have added support for your Author Disambiguator tool in EditGroups. For the edits that should be tracked as batches on EditGroups, you can use edit summaries of the following form:

my very informative edit summary ([[:toollabs:editgroups/b/AD/89ead4fe|details]])

where 89ead4fe is a randomly generated hexadecimal string which identifies the batch (and "AD" stands for author disambiguator). This hash is generated by your tool: no interaction with EditGroups is required on your side. All edits with a summary which matches this pattern will be attributed to your tool. See Wikidata:Edit_groups/Adding_a_tool for more details if needed. − Pintoch (talk) 05:40, 28 October 2019 (UTC)
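A minimal sketch of producing summaries in this form, written in Python purely for illustration (the tool itself is PHP). The helper names are hypothetical, and the regular expression is only a guess at what a matcher for this pattern might look like, not the actual EditGroups rule. The batch ID is generated once and then reused for every edit in the same batch.

```python
import re
import secrets

TOOL_ID = "AD"  # tool code from the summary format described above


def new_batch_id() -> str:
    """Return a random 8-character hexadecimal batch ID (call once per batch)."""
    return secrets.token_hex(4)


def edit_summary(message: str, batch_id: str) -> str:
    """Append the EditGroups 'details' link to a human-readable summary."""
    return f"{message} ([[:toollabs:editgroups/b/{TOOL_ID}/{batch_id}|details]])"


# Illustrative pattern only; the real matching rule lives in EditGroups.
SUMMARY_PATTERN = re.compile(
    r"\(\[\[:toollabs:editgroups/b/AD/([0-9a-f]+)\|details\]\]\)$"
)

batch_id = new_batch_id()
for qid in ["Q_PLACEHOLDER_1", "Q_PLACEHOLDER_2"]:  # placeholder item IDs
    print(edit_summary(f"Author Disambiguator set author for [[{qid}]]", batch_id))
```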

Thanks! I just added a github issue to remind myself to get this done! ArthurPSmith (talk) 13:58, 28 October 2019 (UTC)
@Pintoch: I've been testing this out, and it doesn't seem to be working yet? Is there more of a delay than is stated? See for example this group, which includes this edit... the page just says "Edit group "5db8c3db1b8e7" not found" ??? ArthurPSmith (talk) 23:01, 29 October 2019 (UTC)
Whoops, sorry that was a mixup on my side. This is now resolved: A few thoughts:
  • Wouldn't it be nice to link the Qid in the summary? Author Disambiguator change author for [[Q33698593]] ([[:toollabs:editgroups/b/AD/5db8d7bc810d7|details]]) instead of Author Disambiguator change author for Q33698593 ([[:toollabs:editgroups/b/AD/5db8d7bc810d7|details]])
  • Your hashes do not seem random, is that intentional? (Is there no collision risk?)
Pintoch (talk) 08:36, 30 October 2019 (UTC)
Thanks, yes it does work now! And oops on the hashes - they're not working quite as I intended (seems to have a different hash for each edit for one thing, so not grouping at all!) - I have been using php's uniqid() function to generate the hash but that's probably not the right thing to use. Time to revise a bit... ArthurPSmith (talk) 12:50, 30 October 2019 (UTC)
Ok, also it is worth noting that the first summary of the batch is used as summary for the entire batch. So it might not be worth to include the Qid of the item being worked on in the summary (given that it is already clear from the context in most UIs), but rather the item of the authors involved, perhaps? − Pintoch (talk) 14:42, 30 October 2019 (UTC)
Just to follow up - above issues should be addressed now - see for example this page. ArthurPSmith (talk) 20:07, 30 October 2019 (UTC)

about the new features of Author Disambiguator

Thank you for this tool. I have been using it for a couple of months now and it's great!
I think it's a good idea to drop QuickStatements, now that you have your own tag in the edit summary and do the job in a single edit. But there are some issues with these new features (at least, for me).
First, when I'm logged in to my Wikimedia user account and I click on "Link selected works to author", it opens and runs in a new blank tab in my browser. The tool is working (the edits are made on the items concerned), but I cannot see its progress, and after several minutes the "about:blank" tab usually ends in a 504 Gateway Timeout error. I think this is also significantly slowing down my browser and/or my computer. I'm running Firefox 70.0 on Windows 10.
Second, I think that the English label and item number of the author concerned should appear in the edit summary. As an example, something like "Author Disambiguator set author for Q59275792" isn't useful, because it is obvious that it is Q59275792, the edited item, that is concerned. I think it should be something like "Author Disambiguator set author Daniel Muenstermann (Q64856332)".
I hope these comments will be useful and thanks again for this tool! Simon Villeneuve (talk) 13:57, 30 October 2019 (UTC)

@Simon Villeneuve: Hi! Yes, the progression problem is a real issue; I'm working on a couple of fronts to address it, but it's a bit complicated... It shouldn't be slowing your computer much though, unless you are editing papers with a very large number of authors where there's a lot of data to display, is that the case? There's almost no javascript involved, all the real work of matching and editing is being done on the server side (php). On the edit summary - good idea, I'll add a github issue to work on that. ArthurPSmith (talk) 14:07, 30 October 2019 (UTC)
Yes, that's the case (these damn particle physicists and their articles with > 2 000 authors). Simon Villeneuve (talk) 14:19, 30 October 2019 (UTC)
@Simon Villeneuve: By the way the change to the edit summary is installed live. Also a link to "Edit Groups" (see above discussion with Pintoch). ArthurPSmith (talk) 20:08, 30 October 2019 (UTC)
@Simon Villeneuve: There's now a new feature that moves the name to author transformations off to a background process, which you can follow in the browser via the new "Batches" page. However, it doesn't work quite as it did in testing, I still have a bit of tweaking to do. Ping me if you are running into any troubles with it. ArthurPSmith (talk) 21:30, 12 November 2019 (UTC)
Yes, I saw it. This is good!
My only problem for now is that when I finish a batch of 500, I reload the Author Disambiguator page with the same author name to do the rest of the items, but the page doesn't update correctly, even if I purge it. It takes some time before the page refreshes. For example, I finished my batch of Veronique Boisvert (Q67482673) hours ago and the Author Disambiguator page still shows 159 publications found, even though these items were already done (some 10 hours ago). Simon Villeneuve (talk) 21:38, 12 November 2019 (UTC)
This is unfortunately a problem with WDQS - it used to be current within a minute or so most of the time, but lately it is often several hours behind Wikidata updates. ArthurPSmith (talk) 21:44, 12 November 2019 (UTC)

Conservatorio Luca Marenzio (Q30263550)

Good evening ArthurPSmith,

I am contacting you about the item in the title (merely for informational purposes). I removed those items because I found multiple references/IDs (some were out of date or incorrect) for one and the same element, so I proposed to unify them and keep only one reference/ID. Here was my request: link. Thank you for your attention and for your work/contributions to the Wiki projects.

Best regards --BOSS.mattia (talk) 18:43, 27 November 2019 (UTC)

@BOSS.mattia: If you find something out of date or incorrect, this is a wiki, you are free to edit it and make changes. It's not helpful to create new items that duplicate existing ones! Also if something becomes out of date that doesn't mean it should be deleted - it was true at one point in time, so if necessary you can attach the dates (and any source references) to the information. ArthurPSmith (talk) 19:11, 27 November 2019 (UTC)
Actually, the situation was: there were multiple elements already existing before my intervention. I chose one, updated it, and then emptied the other pages/elements in order to have only one correct element/ID and in order to move forward with the work and save time for other users/contributors. Thank you for your kind reply. Yours faithfully, --BOSS.mattia (talk) 19:22, 27 November 2019 (UTC)

list of property proposal for Wikidata

Dear Arthur, I have created a property proposal for wikidata, I hope I have done this correctly. In this process I have noticed that the page was added however to an admin category of pages exceeding the allowed inclusions, I am not sure this is relevant, but I thought perhaps just to drop a message about that. Thank you very much! --Pietromarialiuzzo (talk) 13:56, 6 January 2020 (UTC)

@Pietromarialiuzzo: Sure, it is a good start. Several pointers: (1) the "represents" line ("Subject item" in the template) should point to the item for the specific database ("Beta masaheft" in this case?) that the property is associated with, or other item that is specifically related to what the property does. You should probably create an item for this project if there isn't one already. You can move the current values you have here down to the "Domain" line. (2) the data type for this is surely "external-id" and not "string", as it is a real identifier for an entity with the project. (3) It is useful to link the examples (use the single-square-bracket '[' syntax) using your formatter URL to make it easy for readers to verify these links work and are useful. ArthurPSmith (talk) 14:25, 6 January 2020 (UTC)
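To illustrate point (3): a formatter URL contains a `$1` placeholder that is replaced by the identifier value, and the result can be wrapped in single square brackets to make a clickable example. The sketch below uses a hypothetical formatter URL and identifier on example.org, not the real ones for this proposal.

```python
def format_identifier_url(formatter_url: str, identifier: str) -> str:
    """Substitute the identifier for the $1 placeholder in a formatter URL."""
    return formatter_url.replace("$1", identifier)


def wikitext_link(formatter_url: str, identifier: str) -> str:
    """Build an external wikitext link: [URL label] with single square brackets."""
    return f"[{format_identifier_url(formatter_url, identifier)} {identifier}]"


# Hypothetical formatter URL and identifier, for illustration only.
print(wikitext_link("https://example.org/entity/$1", "EX0001"))
```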

VG databases

Indeed, there sure are a lot :-) I’m conscious of the extra-load I’m causing :-/ ; will stop with these soon. And then process proposals myself, as penance ^_^'. Jean-Fred (talk) 17:53, 20 January 2020 (UTC)

…and thanks for your patience and diligence in processing these proposals (mine and others’), I appreciate it :) Jean-Fred (talk) 17:54, 20 January 2020 (UTC)

Canmore monument type

Would it be possible to mark this proposal as "ready" too ?  :-) Jheald (talk) 18:49, 3 February 2020 (UTC)

@Jheald: Done! ArthurPSmith (talk) 19:57, 3 February 2020 (UTC)

Creating a Bot

Hello Arthur,

I asked if it is possible to add statements with a bot in a way what is similar to QuickStatements. Can you look to the page Wikidata:Bot requests and answer there please if you know a way how it works. I think it can help for big uploads. -- Hogü-456 (talk) 16:27, 10 February 2020 (UTC)

Property proposal discussion progress

Please pardon my visit to your talk page with a request! Would you be willing to take a look at Wikidata:Property_proposal/WeChangEd_ID to see if you'd be up for creating it? Thank you for considering. YULdigitalpreservation (talk) 22:14, 2 March 2020 (UTC)

Done! ArthurPSmith (talk) 14:29, 3 March 2020 (UTC)

Japanese label

I don't think it is the practice to add a space between surname and given name in Japanese labels. No Japanese articles use this practice (example).--GZWDer (talk) 13:48, 9 March 2020 (UTC)

I was just cutting/pasting from the source documents (which were Japanese websites). Chinese names don't use a space, but it seems to be common for Japanese. ArthurPSmith (talk) 14:00, 9 March 2020 (UTC)
@GZWDer: Example here. ArthurPSmith (talk) 14:02, 9 March 2020 (UTC)
But if you think it's wrong I don't mind being corrected! ArthurPSmith (talk) 14:05, 9 March 2020 (UTC)
Japanese Wikipedia does not use this convention (ja:WP:NC#PERSON). I don't know what the practice on Wikidata should be.--GZWDer (talk) 14:10, 9 March 2020 (UTC)
Ok, that jawiki page is fairly clear; I don't see any reason why Wikidata should be different, I'll follow that rule going forward (and fix ones I notice going back). Thanks. ArthurPSmith (talk) 14:39, 9 March 2020 (UTC)

Property proposal - historic county

Thanks for looking in at Property proposal/historic county. Where do we go now to press the button on creating it? Does it need extra work because of the limited "Allowed values"? I've not done one of these before. Thanks. Hogweard (talk) 17:42, 10 March 2020 (UTC)

@Hogweard: Once the property has been created, other people can add constraints or do other parts of the process by editing the property page. So all that's really needed is to hit the "create" button - but you have to have "property creator" permission to do it. Since I was slightly involved I'd prefer if somebody else actually took that on in this case. ArthurPSmith (talk) 18:41, 10 March 2020 (UTC)

Creating properties for premodern works

If you would not mind creating these (I do not have the necessary permissions), I can subsequently tidy them up:

Thank you for your help with these! AndrewNJ (talk) 13:12, 18 March 2020 (UTC)

Your help with all these is most appreciated! AndrewNJ (talk) 19:36, 19 March 2020 (UTC)
@AndrewNJ: You're welcome! All done I think - a few of them were completed by somebody else. ArthurPSmith (talk) 19:43, 19 March 2020 (UTC)

Hindi as language qualifier

I can not add Hindi as a language qualifier for my entries. It shows an error of code.

Please consult!

Murchana (talk) 12:47, 23 March 2020 (UTC)

@Murchana: I'm not sure exactly what you are trying to do - can you give me a link to the item you were trying to edit, and which property you were using? If it's a monolingual text property, then the language should be given as the "Wikimedia language code" - which should be 'hi' for Hindi. See the relevant statements on Hindi (Q1568). But if this isn't helpful please ask again! ArthurPSmith (talk) 13:55, 23 March 2020 (UTC)
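As a rough illustration of what a monolingual text value looks like in the Wikibase JSON data model, here is a small sketch with the language given as the code 'hi'. The helper name and the sample text are made up, and this only shows the shape of the value, not how the editing interface submits it.

```python
import json


def monolingual_text_value(text: str, language_code: str) -> dict:
    """Build a monolingual-text datavalue in the shape used by Wikibase JSON."""
    return {
        "type": "monolingualtext",
        "value": {"text": text, "language": language_code},
    }


# Sample Hindi text with the Wikimedia language code for Hindi.
value = monolingual_text_value("नमस्ते", "hi")
print(json.dumps(value, ensure_ascii=False))
```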

Property creation

Hi Arthur,

Would you review/create this (fairly urgent) and that. If you click create I/others will/can complete the properties. Also, I responded to your feedback on hardiness and Filceolaire. --- Jura 13:14, 1 April 2020 (UTC)

  • @Jura1: Both done but I haven't done any of the other work on these... ArthurPSmith (talk) 13:41, 1 April 2020 (UTC)

Property proposal Donations

Hey @ArthurPSmith:, thank you for moving the Wikidata:Property proposal/Donations to Ready. Is there anything I can/should do to take this further? Or is there some kind of queue admins are working on? Best --Newt713 (talk) 14:48, 4 April 2020 (UTC)

  • @Newt713: It usually takes a few days for a property creator to get to it. ArthurPSmith (talk) 14:04, 6 April 2020 (UTC)
Perfect. Thank you! --Newt713 (talk) 14:46, 6 April 2020 (UTC)


I really don't appreciate the reverts, especially when the topic hasn't been resolved and most people disagreed with the premise that you can't translate a proper noun into English. I'm not doing this based on personal preferences and neither should you. If there's a sound policy-based reason behind it, fine. But I have yet to see one, and the fact that you ignored my examples and can't answer basic questions about it any more than Matthew hk can shows you're not working from one. I've been pretty reasonable about this and I really don't feel like edit warring. We should be able to settle it in a civil manner. Just reverting me instead of actually discussing it, or at least providing an actual policy to base it on, isn't civil though. --Adamant1 (talk) 00:06, 20 April 2020 (UTC)

Btw, I asked about this on help:Label, because unlike you and Matthew hk I'm actually making a good faith effort to work this out. If there isn't a clear answer or the discussion involves canvassing/personal attacks from either of you, I will just revert you and report you for edit warring. Feel free to add a constructive, on topic comment to it if you have one though. --Adamant1 (talk) 00:06, 20 April 2020 (UTC)
@Adamant1: Sorry, I thought we had come to a conclusion there, and I went ahead and looked at the examples presented and tried to judge what was the right solution. If there's a specific question you still have that I haven't answered, feel free to go ahead and ask it again. ArthurPSmith (talk) 15:13, 20 April 2020 (UTC)
It's cool. It seemed to me that you and Matthew hk decided based on your opinions and just went ahead with reverting me without a sound reason behind it or actually explaining anything. Maybe it will be figured out on the label talk page at least. --Adamant1 (talk) 03:49, 21 April 2020 (UTC)


Hi Arthur,

Would you click create on the COVID one? I would complete it as needed. --- Jura 17:06, 28 April 2020 (UTC)

  • Similarly at [1]? --- Jura 16:59, 11 June 2020 (UTC)


Ciao ArthurPSmith, do we continue to propose changes on your Git, but with formatter URL (P1630) =… or do we have to create a developer account?

What is the procedure?

Thanks for all. —Eihel (talk) 17:46, 16 May 2020 (UTC)

Proposing changes in git is fine. Yes, the URL has changed, as it is being switched over to the new domain with this pattern. But nothing else has changed. You'll start seeing this with all the other tools too (if you haven't already)! ArthurPSmith (talk) 01:54, 17 May 2020 (UTC)
@Eihel: Forgot to ping! ArthurPSmith (talk) 01:55, 17 May 2020 (UTC)

property creation: "banned in"

Hi Arthur! As you know, I proposed the property "banned in" recently to aid with interconnecting items. Instead of creating this property "banned in" you have suggested starting to create thousands of new data items like "Ban of slavery in Qatar", "Ban of chewing gum in Singapore", "Ban of Breitbart from en.wp", "Ban of Mediator in France".

I asked you to explain how such tri-partite items would be linked to the data items of fundamental interest on the discussion page but forgot to ping you. Could you explain how those looking for a list of all medicines banned in France in 2009, or for all the dates of slavery being banned, or for all the news organizations banned from en.wp could do so easily from your proposed reification of 1000s of banned in x predicates into items.

This would be very easy with a property, which is why I'm asking you to demonstrate that that information could be as easily recovered automatically with a SPARQL query. Before I start creating the reifications you suggest, I would like to be sure that this information would not remain inaccessible in the labels of thousands of items. Here is a link to the discussion for convenience. Thanks for your help cleaning up the disinformation in Wikidata (e.g. the problem of benfluorex (Q421695) being listed as a medication, without mention that it is banned in France, was taken off the market in Spain, Italy, etc.) SashiRolls (talk) 22:16, 16 May 2020 (UTC)

I appreciate your perseverance. Please don't give up before you've explained your proposed model, though! Based on your additions I assume you plan to generate the query along the following lines: Find all (instances of ban) where (main subject = medication) and (date=2009) and (country = France). The sticking point I see is that (main subject = Benfluorex) is not the same as (main subject = medication) though Benfluorex is an (instance of a medication). I would love to see the SPARQL query you are aiming for... won't you be so kind as to add that last bit of pedagogy, please? So far, we've added an item and 4 statements to accomplish (perhaps?) what could be done with one property statement... SashiRolls (talk) 20:37, 18 May 2020 (UTC)
  • @SashiRolls: I'm afraid I don't have time for this discussion right now. I may get back to it, but please engage other people on this too. ArthurPSmith (talk) 20:48, 18 May 2020 (UTC)
Ok, no problem. Thanks for trying to explain your intuition. SashiRolls (talk) 21:15, 18 May 2020 (UTC)
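For anyone picking this discussion up later, here is a hedged sketch of the kind of query described above. It is not the model anyone in this thread has confirmed: it assumes each reified ban item uses instance of (P31), main subject (P921), country (P17) and point in time (P585), and it bridges the "Benfluorex vs. medication" gap with a P31/P279* path on the subject. The class QIDs `Q_BAN` and `Q_MEDICATION` are placeholders to be replaced with real items; Q142 is France.

```python
def build_ban_query(ban_class: str, subject_class: str, country: str, year: int) -> str:
    """Build a SPARQL query for ban items about subjects of a given class.

    Assumed (unconfirmed) model: ban item -> P31 ban class, P921 subject,
    P17 country, P585 date; subject -> P31/P279* subject class.
    """
    return f"""
SELECT ?ban ?subject WHERE {{
  ?ban wdt:P31/wdt:P279* wd:{ban_class} ;   # instance of (a subclass of) "ban"
       wdt:P921 ?subject ;                  # main subject
       wdt:P17 wd:{country} ;               # country
       wdt:P585 ?date .                     # point in time
  ?subject wdt:P31/wdt:P279* wd:{subject_class} .  # subject belongs to the class
  FILTER(YEAR(?date) = {year})
}}
""".strip()


# Placeholder class QIDs; only Q142 (France) is a real identifier here.
print(build_ban_query("Q_BAN", "Q_MEDICATION", "Q142", 2009))
```

The extra triple on `?subject` is what makes a ban whose main subject is a specific drug show up when asking about medications in general.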

Inviting editors for a short interview study

Hi Arthur,

I recently created an interview study research proposal to talk with Wikidata editors and understand their perception of Wikidata reuse. I have posted an invite on Wikidata project chat but I'm not getting responses. So I'm writing to ask if you could have any suggestions on other ways to recruit editors to participate. Thank you very much! Have a great day. Chuankaz (talk) 20:08, 20 May 2020 (UTC)

@Chuankaz: I'm not sure what to advise. For those who read your proposal in some detail it seemed to be looking for people who were roughly evenly divided in their editing of Wikipedia (do you mean in any language or just English?) and Wikidata, but I'm not sure there are many who fit that criterion - at least for myself I primarily edit only Wikidata, not anything else. I've occasionally edited English Wikipedia, and also in French, German, and some other languages, but in Wikidata I have thousands of times more edits. Maybe you should do a short first-round one or two-question survey to better characterize what you are looking for? ArthurPSmith (talk) 03:02, 21 May 2020 (UTC)
Thank you for the very useful advice! It would be great if participants had experience in editing Wikipedia, only because it could provide a contrast of experience. It is not mandatory, but I did not state that clearly in my proposal. I'll definitely adjust it. Thanks again! Chuankaz (talk) 03:17, 21 May 2020 (UTC)

National Film Board of Canada director identifier

Hi Arthur. Thank you for creating P6891 last year. I just noticed it, and was wondering if I should help by adding the ID codes to items on NFB filmmakers. My question is, how does it help, given that at this time it doesn't seem to appear as an authority control identifier on Wikipedia? Thanks, Shawn à Montréal (talk) 00:40, 27 May 2020 (UTC)

@Shawn à Montréal: It is helpful to link the NFB identifier to all the other external identifiers within Wikidata. I'm not familiar with Wikipedia rules on authority control but I suspect if there's a large enough/reliable set of identifiers on Wikidata then there is a process for adding it. ArthurPSmith (talk) 20:10, 27 May 2020 (UTC)
Ok, with that in mind I will start adding them in WD, if I can. Shawn à Montréal (talk) 20:15, 27 May 2020 (UTC)
Hello again. Actually thanks to Simon Villeneuve at frwiki it is being used. One either adds the template {{ONF}} or better yet it appears alongside other identifiers using {{Bases audiovisuel}}. So it is being put to good use. Shawn à Montréal (talk) 20:41, 4 July 2020 (UTC)


Regarding your revert: In my opinion it would make more sense to have lyricist as a subclass of author and songwriter as a subclass of lyricist and composer. What do you think? --Discostu (talk) 07:42, 20 June 2020 (UTC)

@Discostu: Actually the subclass statements on songwriter (Q753110) don't make sense as they are, and what you propose wouldn't really help: 'A' subclass of 'B' is supposed to mean that every instance of A is also an instance of B, but not every songwriter is a composer. Maybe it should go directly to creator (Q2500638) as superclass? Anyway, songwriter definitely shouldn't be a subclass of lyricist, as not all songwriters are lyricists. ArthurPSmith (talk) 12:52, 22 June 2020 (UTC)
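
The subclass rule stated above (every instance of 'A' must also be an instance of 'B') can be checked mechanically as a set inclusion. The following is a minimal sketch in Python under stated assumptions: the people are hypothetical placeholders, and the membership of each class is assumed purely for illustration, not taken from Wikidata.

```python
# Hypothetical people, named by what they write; not real Wikidata items.
# Assumption: a songwriter may write words, music, or both.
INSTANCES = {
    "songwriter": {
        "person_words_and_music",
        "person_words_only",
        "person_music_only",
    },
    "lyricist": {"person_words_and_music", "person_words_only"},
    "composer": {"person_words_and_music", "person_music_only"},
    "creator": {
        "person_words_and_music",
        "person_words_only",
        "person_music_only",
        "person_painter",
    },
}


def subclass_holds(sub, sup):
    """True if every instance of sub is also an instance of sup."""
    return INSTANCES[sub] <= INSTANCES[sup]


def counterexamples(sub, sup):
    """Instances of sub that are not instances of sup."""
    return sorted(INSTANCES[sub] - INSTANCES[sup])


for sup in ("composer", "lyricist", "creator"):
    print(sup, subclass_holds("songwriter", sup), counterexamples("songwriter", sup))
```

With this toy data, songwriter fails as a subclass of both composer and lyricist because each has a counterexample, while the broader creator class contains every songwriter.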

Sports season

Hello, I must have done that by mistake, maybe due to confusion. Actually it should have been the rugby union season as a subclass of sports season. -- Blackcat (talk) 07:56, 20 June 2020 (UTC)

Dowker-Thistlethwaite name property

Hi! I've amended Wikidata:Property proposal/Dowker-Thistlethwaite name to take your suggestion into account. Could you take a look at the finished proposal, and check if it's OK? Kind regards, The Anome (talk) 14:20, 27 June 2020 (UTC)

New OpenRefine reconciliation service


Thank you for wearing the {{User loves OpenRefine}} userbox on your user page!

Because the existing Wikidata reconciliation service has had severe performance issues recently, I have created a new one which should be faster and more robust. You can add it to OpenRefine in the reconciliation dialog with the following URL: (or by replacing en by any other language code).

If you have any issues with this new service, let me know.

Happy reconciling! − Pintoch (talk)

Thanks, I'll give it a try (I was just using OpenRefine last week, but probably won't be again for a little while...) ArthurPSmith (talk) 16:00, 20 July 2020 (UTC)

Wikidata:Property proposal/Henrik Ibsen skrifter ID Edit conflict

An edit conflict occurred. I hope my last edit made things a bit clearer (?). Please have a second look if possible. Breg Pmt (talk) 18:47, 10 August 2020 (UTC)

@Pmt: Looks ok to me, thanks for letting me know. ArthurPSmith (talk) 18:57, 10 August 2020 (UTC)

Opening hours

I think Syced and Hannes' concerns were addressed in the opening hours proposal. Would you mind going ahead and creating the property? NMaia (talk) 01:24, 7 September 2020 (UTC)

We sent you an e-mail

Hello ArthurPSmith,

Really sorry for the inconvenience. This is a gentle note to request that you check your email. We sent you a message titled "The Community Insights survey is coming!". If you have questions, email

You can see my explanation here.

MediaWiki message delivery (talk) 18:45, 25 September 2020 (UTC)

error in GRID

This entry contains an alias "Zūnyì yīxuéyuàn", which is wrong. How can this be removed?--GZWDer (talk) 16:47, 29 October 2020 (UTC)

@GZWDer: You can send GRID a message, or if you'd prefer I can do it, I've sent them many corrections in the past. It will probably take a few months to show up though, they don't update things very quickly. ArthurPSmith (talk) 19:01, 29 October 2020 (UTC)


Are you sure about this? As far as I know, EMEF never had any operations in Amadora; rather, they were in Campolide, Guifães, FFoz, Barreiro and elsewhere. For several decades Amadora was home to a much bigger rail manufacturing company, Sorefame. Maybe there was some confusion between these two? Tuvalkin (talk) 04:14, 21 November 2020 (UTC)

@Tuvalkin: That was based on the GRID data - here - but I didn't independently verify it, feel free to correct? ArthurPSmith (talk) 14:01, 23 November 2020 (UTC)
Yes, I saw the GRID page you linked to, but how is GRID content created and sourced, and why do we trust it? I will not correct it because I’m not 100% sure myself right now, but when I get around to finding the necessary authoritative sources for each and every EMEF facility (there were many: EMEF is mostly a 1990s privatization of all rolling stock repair workshops in Portugal, and every major station had one) I will add the info to pt:EMEF, S.A. (and pt:Sorefame). I will not come over to Wikidata to waste my time clicking UI rat mazes, no offense. Tuvalkin (talk) 23:58, 24 November 2020 (UTC)

Jewish Museum Berlin object ID

Thanks for fixing my typo at Jewish Museum Berlin object ID and not excoriating me for it! --RAN (talk) 21:35, 29 January 2021 (UTC)

I want my talk page to look like User talk:Belteshassar's talk page

Hi. I'm asking for your help (totally out of the blue) to make my talk page look like User talk:Belteshassar's talk page. LotsofTheories (talk) 21:03, 9 February 2021 (UTC)

L14698 and L184508

Hi! I just came across the two lexemes bow (L14698) and bow (L184508), and I asked myself why these two lexemes were not combined into one lexeme. Since you are the creator of both lexemes, what was your intention behind them? --Gymnicus (talk) 22:07, 20 February 2021 (UTC)

@Gymnicus: Because they are two distinct lexemes, pronounced differently even though they are spelled the same and have the same forms. See the differing senses on the two lexemes. ArthurPSmith (talk) 13:20, 22 February 2021 (UTC)
In my opinion, the different senses are no justification for separate lexemes. As a counter-example to this thesis, I would cite the lexeme Bienenstich (L10226). In the German language, this lexeme describes both the sting of a bee and a sheet cake. These two senses are also completely different and yet it is one lexeme. There must be other differences. For example, for the lexemes Bank (L34723) and Bank (L34791) the reason for the separation is the different grammatical forms. In the case of bow (L14698) and bow (L184508), the different pronunciation you mentioned could be the reason for the separation into two lexemes. But then, from my point of view, the different pronunciation should also be specified. There are various properties that can be used for this in Wikidata. You can see an overview in the template Lexicographical properties. Perhaps you can add at least one of the three properties to each of the two lexemes so that one can see the difference between them. --Gymnicus (talk) 14:44, 22 February 2021 (UTC)
@Gymnicus: I think any English speaker would understand the difference by looking at the senses, that was my implication. Yes hopefully pronunciation properties will be added, but I've added thousands of lexemes without those statements, and working on other things right now, so it's unlikely I will get to it. ArthurPSmith (talk) 16:05, 22 February 2021 (UTC)
One reference on this. ArthurPSmith (talk) 16:07, 22 February 2021 (UTC)
or more humorously... ArthurPSmith (talk) 16:11, 22 February 2021 (UTC)
“I think any English speaker would understand the difference by looking at the senses” - This assumption has already been refuted by me, as you have already noticed. I speak English, but I didn't know the difference between bow (L14698) and bow (L184508). That is why it could have been that I thoughtlessly put the two lexemes together because I saw no difference between them, except for the different meanings. But as already shown on the lexeme Bienenstich (L10226), this is no reason not to merge the two lexemes. I think it's a shame that you don't add the pronunciation. Then one must hope that the two lexemes are not carelessly put together by someone. Unfortunately, I cannot add the pronunciation either, because I have no idea about it. --Gymnicus (talk) 21:29, 22 February 2021 (UTC)

Strange merge

Hi Arthur, you've merged de:Universidad Santa María (Ecuador) (founded 1996) with en:Federico Santa María Technical University (founded 1931). [2] --Kolja21 (talk) 16:23, 26 February 2021 (UTC)

@Kolja21: It is the same institution - however the inception date from GRID would appear to be incorrect (based on the dewiki text). It seems to be a branch of the Chilean university, Federico Santa María Technical University (Q457793); I'll link them together. ArthurPSmith (talk) 16:31, 26 February 2021 (UTC)
Oh, wait, now I don't understand your comment. I didn't merge the de entry with the en entry, the de entry is for the university in Ecuador, the en one is from Chile. ArthurPSmith (talk) 16:35, 26 February 2021 (UTC)


Should Android and iOS versions of apps have separate items? I cannot for the life of me find anything regarding the modeling of apps --Trade (talk) 00:04, 1 March 2021 (UTC)

@Trade: No, I think a single item is generally the right thing for a single piece of software - see Wikidata:WikiProject Informatics/Software/Properties and related pages. ArthurPSmith (talk) 18:53, 1 March 2021 (UTC)
@Trade: @ArthurPSmith: You should create separate items for iOS and Android; otherwise it will become a mess. Eurohunter (talk) 20:44, 15 March 2021 (UTC)


I received a message from you but I couldn't tell whether there is any text.... Did I do something?...  – The preceding unsigned comment was added by Gabriele.badii (talk • contribs) at 18:41, April 16, 2021 (UTC).

@Gabriele.badii: It's just a standard welcome template - I try to add it when I see somebody commenting on here who seems new. Nothing you did was a problem at all! ArthurPSmith (talk) 12:27, 19 April 2021 (UTC)

Thank you Gabriele.badii (talk) 12:34, 19 April 2021 (UTC)