Wikidata:Bot requests

Bot requests
If you have a bot request, add a new section using the button and state exactly what you want. To reduce processing time, first discuss the legitimacy of your request with the community in the Project chat or on the relevant WikiProject's talk page. Please refer to previous discussions justifying the task in your request.

For botflag requests, see Wikidata:Requests for permissions.

Tools available to all users which can be used to accomplish the work without the need for a bot:

  1. PetScan for creating items from Wikimedia pages and/or adding same statements to items
  2. QuickStatements for creating items and/or adding different statements to items
  3. Harvest Templates for importing statements from Wikimedia projects
  4. Descriptioner for adding descriptions to many items
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2017/07.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 2 days.
You may find these related resources helpful:

Data Import Hub
Why import data into Wikidata
Learn how to import data
Bot requests
Ask a data import question
Data Import Archive


Taxon labels

For items where instance of (P31)=taxon (Q16521), and where there is already a label in one or more languages which is the same as the value of taxon name (P225), the label should be copied to all other empty, western alphabet, labels. For example, this edit. Please can someone attend to this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:11, 10 March 2016 (UTC)

Do you mean label or alias? I would support the latter where there is already a label and that label is not already the taxon name. --Izno (talk) 17:03, 10 March 2016 (UTC)
No, I mean label; as per the example edit I gave. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
See your last request: Wikidata:Bot_requests/Archive/2015/08#Taxon_names. --Succu (talk) 18:57, 10 March 2016 (UTC)
Which was archived unresolved. We still have many thousands of missing labels. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
Nope. There is no consensus doing this. Reach one. --Succu (talk) 20:22, 10 March 2016 (UTC)
You saying "there is no consensus" does not mean that there is none. Do you have a reasoned objection to the proposal? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:56, 10 March 2016 (UTC)
Go back and read the linked discussions. In the nursery of wikidata some communities had strong objections. If they changed their mind my bot can easily execute this job. --Succu (talk) 21:19, 10 March 2016 (UTC)
So that's a "no" to my question, then. I read the linked discussions, and mostly I see people not discussing the proposal, and you claiming "there is no consensus", to which another poster responded "What I found, is a discussion of exactly one year old, and just one person that is not supporting because of 'the gadgets then need to load more data'. Is that the same 'no consensus' as you meant?". There are no reasoned objections there, either. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:24, 10 March 2016 (UTC)
For the lazy ones:
--Succu (talk) 21:53, 10 March 2016 (UTC)
I have already done this for Italian labels in the past. Here are two other proposals: May 2014 and March 2015 --ValterVB (talk) 09:54, 11 March 2016 (UTC)
@ValterVB: Thank you. Can you help across any other, or all, western-alphabet languages, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:18, 16 March 2016 (UTC)
Yes, I can do it, but before modifying 2,098,749 items I think it is necessary to have a strong consensus. --ValterVB (talk) 18:14, 16 March 2016 (UTC)
@ValterVB: Thank you. Could you do a small batch, say 100, as an example, so we can then ask on, say, Project Chat? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:03, 18 March 2016 (UTC)
Simply ask with the example given by you. --Succu (talk) 15:16, 18 March 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Pigsonthewing:

  • Test edit: Q14945671, Q21444273, Q2508347, Q25247.
  • Languages: "en","de","fr","it","es","af","an","ast","bar","br","ca","co","cs","cy","da","de-at","de-ch","en-ca","en-gb","eo","et","eu","fi","frp","fur","ga","gd","gl","gsw","hr","ia","id","ie","is","io","kg","lb","li","lij","mg","min","ms","nap","nb","nds","nds-nl","nl","nn","nrm","oc","pcd","pl","pms","pt","pt-br","rm","ro","sc","scn","sco","sk","sl","sr-el","sv","sw","vec","vi","vls","vo","wa","wo","zu"
  • Rule:

Very important: it is necessary to verify that the list of languages is complete. It is the same list that I use for disambiguation items. --ValterVB (talk) 09:42, 19 March 2016 (UTC)
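
For readers following the proposal, the rule from the original request can be sketched roughly as below. This is only an illustration under assumptions, not the script any of the bots here actually runs: the function name, the shortened language tuple, and the plain-dictionary representation of labels are all hypothetical.

```python
# Hypothetical, shortened subset of the language codes listed above.
LATIN_SCRIPT_LANGUAGES = ("en", "de", "fr", "it", "es", "nl", "pl", "pt")


def missing_taxon_labels(labels, taxon_name, languages=LATIN_SCRIPT_LANGUAGES):
    """Return {language: taxon_name} for each listed language that has no label.

    labels maps language codes to the item's existing labels. Nothing is
    proposed unless at least one existing label already equals the taxon
    name (P225), and existing labels are never overwritten.
    """
    if taxon_name not in labels.values():
        return {}
    return {lang: taxon_name for lang in languages if not labels.get(lang)}
```

Because the function only returns additions for languages with no label, a common name that is already present (or added later by hand) is left alone, which is the behaviour described in the request.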

    • I really don't like the idea of this. The label, according to Help:Label, should be the most common name. I doubt that most people are familiar with the latin names. Inserting the latin name everywhere prevents language fallback from working and stops people from being shown the common name in another language they speak. A very simple example, Special:Diff/313676163 added latin names for the de-at and de-ch labels which now stops the common name from the de label from being shown. - Nikki (talk) 10:29, 19 March 2016 (UTC)
      • @Nikki: The vast majority of taxons have no common name; and certainly no common name in every language. And of course edits can subsequently be overwritten if a common name does exist. As for fallback, we could limit this to "top level" languages. Would that satisfy? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
        • As far as I'm aware most tools rely on the absence of certain information. Adding #10,000 csv file of Latin / Welsh (cy) species of birds. would be rendered to handcraft. --Succu (talk) 23:11, 19 March 2016 (UTC)
          • Perhaps this issue could be resolved by excluding certain groups? Or the script used in your example could overwrite the label if it matches the taxon name? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
        • It may be the case that most taxon items won't have a common name in any language, but I don't see anything here which is only trying to target the taxon items which have no common names. Adding the same string to lots of labels isn't adding any new information and as Succu pointed out, doing that can get in the way (e.g. it makes it more difficult to find items with missing labels, it can get in the way when merging (moving common names to the aliases because the target already has the latin name as a label) and IIRC the bot which adds labels for items where a sitelink has been recently added will only do so if there is no existing label). To me, these requests seem like people are trying to fill in gaps in other languages for the sake of filling in the gaps with something (despite that being the aim of the language fallback support), not because the speakers of those languages think it would be useful for them and want it to happen (if I understand this correctly, @Innocent bystander: is objecting to it for their language). - Nikki (talk) 22:40, 22 March 2016 (UTC)
          • Yes, the tolerance against bot-mistakes is limited on svwiki. Mistakes initiated by errors in the source is no big issue, but mistakes initiated by "guesses" done by a bot is not tolerated at all. The modules we have on svwiki have no problem handling items without Swedish labels. We have a fallback-system which can use any label in any language. -- Innocent bystander (talk) 06:39, 23 March 2016 (UTC)
            • @Innocent bystander: This would not involve any "guesses". Your Wikipedia's modules may handle items without labels, but what about third-party reusers? Have you identified any issues with the test edits provided above? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
              • No, I have not found any issue in the examples. But this is not my subject; I would not see an issue even if it was directly under my nose. Adding correct statements for Scientific names and Common names looks more important here for third-party users than labels, which cannot be sourced. NB: thanks to the work of Lsjbot, Swedish and Cebuano probably have more labels than any other language in the taxon set. You will not miss much by excluding 'sv' in this botrun. -- Innocent bystander (talk) 07:00, 24 March 2016 (UTC)
                • If a taxon name can be sourced, then by definition so can the label. If you have identified no errors, then your reference to "guesses" is not substantiated. True, adding statements for Scientific names and Common names is important, but the two tasks are not mutually exclusive, and their relative importance is subjective. To pick one example at random, from the many possible, Dayus (Q18107066) currently has no label in Swedish, and so would benefit from the suggested bot run. Indeed, it currently has only 7 labels, all the same, and all using the scientific name. What are the various European languages' common names for this mainly Chinese genus? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:34, 25 March 2016 (UTC)
          • No, this is not "trying to fill in gaps in other languages for the sake of filling in the gaps". Nor are most of the languages affected served by fallback. If this task is completed, then "find items with missing labels" will not be an issue for the items concerned, because they will have valid labels. Meanwhile, what is the likelihood of these labels being provided manually? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
            • If this is not trying to fill in the gaps for the sake of filling in the gaps, what problem is it solving and why does language fallback not help? (I'm sure the development team would like to know that language fallback is not working properly). The taxonomic names are not the preferred labels, and valid is not the same as useful (adding "human" as the description for humans with no description was valid, yet users found it annoying and useless and they were all removed again). The labels for a specific language in that language are still missing even if we make it seem like they're not by filling in all the gaps with taxonomic names; it's just masking the problem. I can't predict the future so I don't see any point in speculating how likely it is that someone will come along and add common names. They might, they might not. - Nikki (talk) 23:02, 24 March 2016 (UTC)
              • It solves the problem of an external user, making a query (say for "all species in genus X") being returned the Q items with no labels, in their language. This could break third party applications, also. In some cases, there is currently no label in any language - how does language fallback work then? How does it work if the external user's language is Indonesian, and there is only an English label saying, say, "Lesser Spotted Woodpecker"? And, again, taxonomic names are the preferred labels for the many thousands of species - the vast majority - with no common name - or with no common name in a given language. The "human" examples compares apples with pears. This is a proposal to add specific labels, not vague descriptions (the equivalent would be adding "taxon" as a description). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:26, 25 March 2016 (UTC)
                • Why should an external user query a Wikidata internal called label and not rely on a query of taxon name (P225)? --Succu (talk) 22:04, 25 March 2016 (UTC)
                  • For any of a number of reasons; not least that they may be querying things which are not all taxons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:32, 26 March 2016 (UTC)
                    • Grand answer. Maybe they are searching the labels for aliens, gods, fairy tales or something else? A better solution would be if Wikibase could be configured to take certain properties such as taxon name (P225) or title (P1476) as a default value for a language-independent label. --Succu (talk) 21:09, 27 March 2016 (UTC)
                      • Maybe it could. But it is not. That was suggested a year or two ago, in the discussions you cited above, and I see no move to make it so, nor any significant support for doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:19, 27 March 2016 (UTC)
                        • So what? Did you reach an agreement with svwiki, cebwiki, warwiki, viwiki or nlwiki that we should go along with your proposed way? --Succu (talk) 21:43, 27 March 2016 (UTC)
    • @ValterVB: Thank you. I think your rules are correct. I converted the Ps & Qs in your comment to templates, for clarity. Hope that's OK. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
  • Oppose That the majority of taxons do not have a common name does not mean that all western languages should automatically use the scientific name as label. Matěj Suchánek (talk) 13:23, 16 April 2016 (UTC)
    • Nobody is saying "all western languages should automatically use the scientific name as label"; if an item already has a label, it won't be changed. If a scientific name is added as a label, where none existed previously, and then that label is changed to some other valid string, the latter will not be overwritten. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:31, 20 April 2016 (UTC)

We seem to have reached a stalemate, with the most recent objections being straw men, or based on historic and inconclusive discussions. How may we move forward? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:28, 16 May 2016 (UTC)

That's simple: drop your request. --Succu (talk) 18:33, 16 May 2016 (UTC)
Were there a cogent reason to, I would. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:57, 17 May 2016 (UTC)
Anyone? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:04, 10 September 2016 (UTC)
@Pigsonthewing: I'll support the proposal if it is limited to major languages that don't have other fallbacks. For most taxons, the scientific name is the only name, and even for taxons with a common name, having the scientific name as the label is better than having no label at all. I'm reluctant to enact this for a huge number of languages though, as it might make merges (which are commonly needed for taxons) a pain to complete. Kaldari (talk) 23:02, 28 September 2016 (UTC)
@Kaldari: Thank you. Please can you be more specific as to what you mean by "major languages that don't have other fallbacks"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:35, 29 September 2016 (UTC)
@Pigsonthewing: Maybe just the biggest Latin languages: English, German, Spanish, French, Portuguese, Italian, Polish, Dutch. Kaldari (talk) 18:29, 29 September 2016 (UTC)
I'm not sure why we'd limit ourselves to them, but if we can agree they should be done, let's do so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 29 September 2016 (UTC)
@Kaldari: Did you see my reply? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:48, 10 October 2016 (UTC)

Strong oppose As said before...--Succu (talk) 22:02, 10 October 2016 (UTC)

What you actually said was "There is no consensus doing this. Reach one.". My reply was "You saying 'there is no consensus' does not mean that there is none. Do you have a reasoned objection to the proposal?", and you provided none then, nor since. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:11, 11 October 2016 (UTC)

It appears that Succu now agrees that adding scientific names of taxons as labels is a good thing. Shall we now proceed with a bot? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:53, 22 February 2017 (UTC)

Nonsense. My bot is adding labels in a few languages where there is agreement to do so. --Succu (talk) 17:07, 22 February 2017 (UTC)
Your bot is - as can be seen in the diff I provided, by anyone who chooses to look at it - adding scientific names of taxons as labels, which is exactly what I proposed we use a bot to do; and what you claimed to have opposed. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:38, 23 February 2017 (UTC)
...where there is agreement to do so. --Succu (talk) 14:02, 23 February 2017 (UTC)
Agreement from whom? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:09, 23 February 2017 (UTC)
Simply follow the discussions. --Succu (talk) 14:16, 23 February 2017 (UTC)
Please answer the question. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:37, 23 February 2017 (UTC)
I did often enough. So it's meanwhile a little bit boring. --Succu (talk) 15:50, 23 February 2017 (UTC)
I ask you a third time: Agreement from whom? Please answer, succinctly and unambiguously, with links, otherwise people may reasonably conclude that there are no such agreements. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:00, 23 February 2017 (UTC)
Again: follow the links in this thread. There is no need to repeat it again and again. --Succu (talk) 16:06, 23 February 2017 (UTC)
Succu has finally given an answer at Wikidata:Project chat#Agreement to add scientific names of taxons as labels, where discussion continues. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:53, 24 February 2017 (UTC)
I'm not willing to answer in a thread full of insults. It doesn't help to move on.
Over there I stated „At earlier discussions that RfC was accepted as a starting point.“ referring to Wikidata:Requests for comment/Automatic labelling. To find the follow-up discussions please scroll to the top of this thread.
Here I stated „...where there is agreement to do so“. This refers to the current common practice implemented by different bot owners, which stretches the scope of the aforementioned RfC. I'm not aware that the Wikidata community disagreed with these additions. My bot activity regarding adding ruwiki labels based on P225 was scrutinized at User:Succu/Archive/2015#Latest_SuccuBot_activities. There are some more pros and cons about doing this.
Mr. Mabbett, could you please summarize your arguments for why we should mass-add labels written in the en:Western alphabet based on taxon name (P225)? BTW: This query gives back more than 240 language codes, including some non-existent ones such as lat-vul.
--Succu (talk) 21:56, 24 February 2017 (UTC)
The "Western alphabet" is inescapable, regardless of the language of the project, as both major Codes of nomenclature prescribe it. - Brya (talk) 05:51, 25 February 2017 (UTC)
Mr. Mabbett says: „In the light of comments made here I propose shortly to get a bot ...” --Succu (talk) 20:57, 28 February 2017 (UTC)
@billinghurst, Jarekt, Tagishsimon: Sry. --Succu (talk) 21:00, 28 February 2017 (UTC)
Indeed I do. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:29, 19 March 2017 (UTC)
Anyway, from time to time editors appear who clone related labels for several items, or even for small batches of items, but this is a drop in the ocean. Keeping around a million items without labels in many of the most important languages isn't the best choice. XXN, 21:35, 30 June 2017 (UTC)
  • I'd Support this request, but with a clause: do not do this for *absolutely all taxon items*. Begin with taxons which have only a few sitelinks (especially in "botopedias") or no sitelinks at all. These should be taxons which are exotic, uncommon or almost unknown, and most probably they do not yet have a common name in most languages. The percentage of potential "unwanted additions" in such cases is very low. XXN, 21:35, 30 June 2017 (UTC)
At least one of your botopedias said no. --Succu (talk) 21:39, 30 June 2017 (UTC)
So, adding the scientific name as the default Russian and Bulgarian label is OK, but for other languages it is not? Maybe this was discussed somewhere else, I don't know. What would then be the opposing arguments for doing the same for Romanian, for example?
I think this discussion needs wider attention and involvement from the community, and perhaps it should take place either at Project Chat or in an RFC. XXN, 19:01, 2 July 2017 (UTC)
@XXN: Please see Wikidata:Project chat/Archive/2017/03#Agreement to add scientific names of taxons as labels. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:17, 3 July 2017 (UTC)

Add P1082 (population) and P585 (point in time) from PLwiki to Wikidata

It looks like PLwiki has lots of population information that other wikis do not have. It would be useful to have it for all of us. בורה בורה (talk) 18:23, 12 April 2016 (UTC)

It might be helpful to give some supporting links here, to be sure to get the right information from the right place into the right fields. Can you list one pl-article and one corresponding wikidata-item that is manually filled with the desired information? Then I can see if I can get the information filled by a script in the same way. Edoderoo (talk) 18:26, 16 April 2016 (UTC)
Edoderoo sorry for the late reply. I was on vacation. Take for example the article "Żołynia" in PLwiki. It has a population of 5188 as of 2013. However this information does not exist on Wikidata item (Q2363612). There are thousands of examples like this, but you got the idea... PLwiki is really great on population. Share it with us all. בורה בורה (talk) 10:19, 4 May 2016 (UTC)
It would be better to find a reliable source instead. Sjoerd de Bruin (talk) 07:44, 19 September 2016 (UTC)
בורה בורה: No activity for a long time. Marking this one as resolved. Multichill (talk) 11:43, 30 November 2016 (UTC)
Multichill, what am I supposed to say? I put in a request, I explained it, and it is not done yet... In many Wikidata items I see ENwiki or RUwiki as a source. Why can't PLwiki be a source as well? Please reopen this request and put it back in the queue. I would appreciate it if you tagged me in your reply. בורה בורה (talk) 03:36, 1 December 2016 (UTC)
בורה בורה: You request something, you should keep an eye on it for example by putting this page on your watchlist and reply if someone responds. That way tasks get resolved (or declined) quickly.
I had a look at pl:Żołynia. It uses the template pl:Template:Wieś infobox with the fields "liczba ludności" and "rok". I assume "liczba ludności" would map to population (P1082) and "rok" would be the source for the point in time (P585) qualifier like this example?
I agree with Sjoerd that a real source is preferred over import from Wikipedia, but that doesn't mean we shouldn't import this. We can always add references when a real source becomes available.
@Edoderoo: do you want to have a shot at this one? Multichill (talk) 10:42, 1 December 2016 (UTC)
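
The mapping described above could be prototyped along these lines before anyone commits to a bot run. This is a minimal sketch under assumptions: the helper names are hypothetical, the parser only handles simple one-line "| name = value" parameters, and the output line follows my understanding of the tab-separated QuickStatements V1 syntax (year precision written as /9), which should be checked against the tool's help page before use.

```python
import re

# Field names from pl:Template:Wieś infobox, as identified in this thread.
POPULATION_FIELD = "liczba ludności"
YEAR_FIELD = "rok"


def parse_infobox_fields(wikitext):
    """Collect simple one-line '| name = value' template parameters."""
    fields = {}
    for line in wikitext.splitlines():
        line = line.strip()
        if line.startswith("|") and "=" in line:
            name, _, value = line[1:].partition("=")
            fields[name.strip()] = value.strip()
    return fields


def population_statement(item_id, wikitext):
    """Return one QuickStatements-style line, or None if a field is unusable."""
    fields = parse_infobox_fields(wikitext)
    # Drop thousands separators written as (non-breaking) spaces.
    population = "".join(fields.get(POPULATION_FIELD, "").split())
    year = fields.get(YEAR_FIELD, "")
    if not re.fullmatch(r"[0-9]+", population):
        return None
    if not re.fullmatch(r"[0-9]{4}", year):
        return None
    # P1082 = population, P585 = point in time (qualifier, year precision).
    return f"{item_id}\tP1082\t{population}\tP585\t+{year}-00-00T00:00:00Z/9"
```

Returning None for pages where either field is missing or malformed keeps the run conservative: those pages can be listed for manual review instead of producing a statement without a date.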
Thanks for the ping. I can have a look at this one this month. Edoderoo (talk) 10:56, 1 December 2016 (UTC)
Multichill indeed your mapping is correct. I am standing by to see when this is done. Once completed, our articles will be populated automatically as we are retrieving the info from Wikidata! בורה בורה (talk) 18:38, 1 December 2016 (UTC)
Multichill, Edoderoo, any progress here? בורה בורה (talk) 19:15, 18 December 2016 (UTC)
Sorry, my bot got blocked and the bot-bit revoked, so for the time being I can create a script, but I can't test nor execute it... Edoderoo (talk) 15:39, 18 January 2017 (UTC)
To clarify this, I asked Edoderoo to create a RfBots for the particular tasks he wants to be doing, but he decided not to do that. --Vogone (talk) 03:36, 19 January 2017 (UTC)

Take care of disambiguation items

Points to cover

Somehow it should be possible to create a bot that handles disambiguation items entirely. Not sure what are all the functions needed, but I started a list on the right side. Please add more. Eventually a Wikibase function might even do that.
--- Jura 13:36, 18 April 2016 (UTC)

Empty disambiguation: Probably @Pasleim: can create User:Pasleim/Items for deletion/Disambiguation . Rules: item without sitelink, with P31 that has only 1 value: Wikimedia disambiguation page (Q4167410). For the other points my bot already does something (for my bot, a disambiguation is an item with P31 that has only 1 value: Wikimedia disambiguation page (Q4167410)). Descriptions: I use the descriptions used in autoEdit. Label: I add the same label for all the Latin-script languages only if all the sitelinks without disambiguation are the same. With these 2 operations I detect a lot of duplicates: same label+description. For now the list is very long (maybe >10K items), but it isn't possible to merge automatically: too many errors. Another thing to do is to normalize the descriptions; there are a lot of items with non-standard descriptions. --ValterVB (talk) 18:02, 18 April 2016 (UTC)
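
The two rules in the comment above might be expressed roughly as follows. This is a sketch under assumptions, not ValterVB's actual bot code: the function names are hypothetical, items are reduced to a list of P31 values and a dictionary of sitelink titles, and "without disambiguation" is interpreted here as "after dropping a trailing parenthetical qualifier".

```python
import re

DISAMBIGUATION_PAGE = "Q4167410"  # Wikimedia disambiguation page


def is_pure_disambiguation(p31_values):
    """True if P31 has exactly one value and it is Q4167410."""
    return list(p31_values) == [DISAMBIGUATION_PAGE]


def is_empty_disambiguation(p31_values, sitelinks):
    """True for a pure disambiguation item that has no sitelinks left."""
    return is_pure_disambiguation(p31_values) and not sitelinks


def strip_qualifier(title):
    """Drop a trailing parenthetical such as ' (disambiguation)'."""
    return re.sub(r"\s*\([^()]*\)\s*$", "", title)


def common_label(sitelinks):
    """Return the shared title if every sitelink agrees, otherwise None."""
    titles = {strip_qualifier(title) for title in sitelinks.values()}
    return titles.pop() if len(titles) == 1 else None
```

Items for which common_label returns None are simply skipped, so a label is only ever proposed when every linked page uses the same spelling.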
  • Personally, I'm not that much worried about duplicate disambiguation items. Mixes between content and disambiguations are much more problematic. It seems they keep appearing through problems with page moves. BTW, I added static numbers to the points.
    --- Jura 10:06, 19 April 2016 (UTC)
    You will always have duplicate disambiguation items, since svwiki has duplicate disambiguation pages. Some of these duplicates exist because they cover different topics, and some of them exist since the pages would otherwise become too long. A third category is the bot-generated duplicates. They should be treated as temporary, until a carbon-based user has merged them.
    And how are un-normalized descriptions a problem? -- Innocent bystander (talk) 10:58, 19 April 2016 (UTC)
About "un-normalized descriptions": e.g. I have a disambiguation item with label "XXXX" and description "Wikipedia disambiguation". If I create a new item with label "XXXX" and description "Wikimedia disambiguation", I don't see that a disambiguation item "XXXX" already exists. If the description is "normalized", I see immediately that the disambiguation already exists, so I can merge it. --ValterVB (talk) 11:10, 19 April 2016 (UTC)
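
To illustrate why normalization helps, here is a small sketch that maps description variants to one canonical string and then groups items by label and description. The variant list, the canonical wording and the item IDs used in any example are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical list of description variants treated as equivalent.
VARIANTS = {
    "wikipedia disambiguation",
    "wikimedia disambiguation",
    "wikipedia disambiguation page",
    "wikimedia disambiguation page",
}
CANONICAL = "Wikimedia disambiguation page"


def normalize_description(description):
    """Map known variants to the canonical description; keep others as-is."""
    key = " ".join(description.split()).lower()
    return CANONICAL if key in VARIANTS else description


def duplicate_candidates(items):
    """Group (item_id, label, description) tuples that collide once normalized."""
    groups = defaultdict(list)
    for item_id, label, description in items:
        groups[(label, normalize_description(description))].append(item_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The result is only a list of merge candidates. As noted above, merging them automatically produces too many errors, so each group would still need a human check.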
For some fields, this proved quite efficient. If there are several items that can't be merged, at some point there will be something like "Wikimedia disambiguation page (2)", etc.
--- Jura 12:10, 19 April 2016 (UTC)

Lazy start for point (4): 47 links to add instance of (P31)=Wikimedia disambiguation page (Q4167410) to items without statements in categories of sitelinks on Category:Disambiguation pages (Q1982926): en, simple, da, ja, ka, ba, ca, nl, el, hr, sr, tr, eu, hu, ro, no, eo, cs, sv, fi, hy, et, uk, sk, it, mk, kk, pt, zh, sh, id, az, de, be_x_old, be, es, sl, bs, la, pl, fr, lv, ru, nn, lt, sq, bg,
--- Jura 12:07, 23 April 2016 (UTC)

The biggest problem is to define what pages are disambiguation pages, given names and surnames. For example Backman (Q183341) and Backman (Q23773321). I don't see what is the difference between enwiki and fiwiki links. Enwiki page is in category "surnames" and fiwiki page in categories "disambiguation pages" and "list of people by surname", but the page in fiwiki only contains surnames, so basically it could be in the same item as the enwiki link. --Stryn (talk) 13:10, 23 April 2016 (UTC)

I think people at Wikidata could be tempted to make editorial decisions for Wikipedia, but I don't think it's up to Wikidata to determine what Wikipedia has to consider a disambiguation page. If a language version considers a page to be a disambiguation page, then it should go on a disambiguation item. If it's an article about a city that also lists similarly named cities, it should be on an item about that city. Even if some users at Wikidata attempted to set "capital" to a disambiguation page as Wikipedia did the same, such a solution can't be sustained. The situation for given names and family names isn't much different. In the meantime, at least it's clear which items at Wikidata have what purpose.
--- Jura 14:20, 23 April 2016 (UTC)
You then have to love Category:Surname-disambigs (Q19121541)! -- Innocent bystander (talk) 14:35, 23 April 2016 (UTC)
IMHO: In Wikipedia, disambiguation pages are pages that list pages, or possible pages, that have the same spelling; no assumption should be made about the meaning. If we limit the content to partial sets with some specific criterion, we don't have a disambiguation page but a list (e.g. a list of persons with the same surname: List of people with surname Williams (Q6633281)). These pages must use the __DISAMBIG__ tag to permit bots and humans to recognize without doubt a disambiguation from a different item. In Wikidata, disambiguation items are items that connect disambiguation pages with the same spelling. --ValterVB (talk) 20:02, 23 April 2016 (UTC)

Disambiguation item without sitelink --ValterVB (talk) 21:30, 23 April 2016 (UTC)

I'd delete all of them.
--- Jura 06:13, 24 April 2016 (UTC)

Some queries for point (7):

A better way needs to be found for (7a).
--- Jura 08:07, 25 April 2016 (UTC)

I brought up the question of the empty items at Wikidata:Project_chat#Wikidata.2C_a_stable_source_for_disambiguation_items.3F.
--- Jura 09:39, 27 April 2016 (UTC)

As this is related: Wikidata:Project chat/Archive/2016/04#Deleting descriptions. Note, that other languages could be checked. --Edgars2007 (talk) 10:30, 27 April 2016 (UTC)

I don't mind debating if we should keep or redirect empty disambiguation items (if admins want to check them first ..), but I think we should avoid recycling them for anything else. --- Jura 10:34, 27 April 2016 (UTC)
As it can't be avoided entirely, I added a point 10.
--- Jura 08:32, 30 April 2016 (UTC)
Point (3) and (10) are done. For point (2) I created User:Pasleim/disambiguationmerge. --Pasleim (talk) 19:22, 2 July 2016 (UTC)
Thanks, Pasleim.
--- Jura 05:02, 11 July 2016 (UTC)
  • Matěj Suchánek made User:MatSuBot/Disambig errors which covers some of 7b.
    Some things it finds:
    • Articles that are linked from disambiguation items
    • Disambiguation items that were merged with items for concepts relevant to these articles (maybe we should check items for disambiguation with more than a P31-statement or attempt to block such merges)
    • Pages in languages where the disambiguation category isn't correctly set up or recognized by the bot (some pages even have "(disambiguation)" in the page title). e.g. Q27721 (36 sitelinks) – ig:1 (disambiguation)
    • Pages in categories close to disambiguation categories. (e.g. w:Category:Set indices on ships)
    • Redirects to non-disambiguations. (e.g. Q37817 (27 sitelinks) idwiki – id:Montreuil – redirects to id:Komune di departemen Pas-de-Calais (Q243036, not a disambiguation))

Seems like an iceberg. It might be easier to check these by language and once the various problems are identified, attempt to sort out some automatically.
--- Jura 05:02, 11 July 2016 (UTC)

Note that my bot only recognizes pages with the __DISAMBIG__ magic word as disambiguations. If you want a wiki-specific approach, I can write a new script which will work only for chosen wikis. Matěj Suchánek (talk) 09:12, 12 July 2016 (UTC)
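
For anyone wanting to reproduce this kind of check: on wikis running the Disambiguator extension, pages carrying __DISAMBIG__ expose a "disambiguation" page property through the MediaWiki API. The sketch below builds such a query and reads a response. It is not MatSuBot's code; the function names are hypothetical, and the response shape assumes formatversion=2.

```python
from urllib.parse import urlencode


def pageprops_url(api_url, titles):
    """Build a pageprops query asking only for the 'disambiguation' property."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "disambiguation",
        "titles": "|".join(titles),
        "format": "json",
        "formatversion": "2",
    }
    return api_url + "?" + urlencode(params)


def disambiguation_titles(response):
    """Return titles whose pageprops contain the 'disambiguation' key."""
    pages = response.get("query", {}).get("pages", [])
    return {
        page["title"]
        for page in pages
        if "disambiguation" in page.get("pageprops", {})
    }
```

Pages that are disambiguations only by category or by title (the cases listed above) would not be found this way, which matches the limitation described in the comment.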
  • Step #4 should be done for now. The above list now includes links for 160+ sites.
    --- Jura 22:02, 5 August 2016 (UTC)
  • For step #3a, there is now Phab:T141845
    --- Jura 22:30, 5 August 2016 (UTC)
List of disambiguation item with conflict on Label/description --ValterVB (talk) 13:57, 6 August 2016 (UTC)
  • Added #11.
    --- Jura 02:05, 21 September 2016 (UTC)
  • Is it appropriate to add 12. Mix-n-Match should not offer disambiguation items for matching to external authority files. --Vladimir Alexiev (talk) 11:56, 21 January 2017 (UTC)
    • Sure, the list is freely editable, but the focus is mainly on how to handle these items rather than fix other tools. I wonder if things like Topic:Tjgt6ynwufjm65zk aren't just the tip of an iceberg with some other root problem.
      --- Jura 12:18, 21 January 2017 (UTC)

Get GeoNames ID from the Cebuano or Swedish Wikipedia[edit]

Currently there are many concepts such as West Branch Sabbies River (Q22564260) that refer to geographical features that have articles in the Cebuano and Swedish Wikipedia. For most of them there's an infobox with information at the respective Wikipedia, but not all of the information is available in Wikidata. I would propose that the information gets copied over by a bot. There are too many articles to copy the information manually. The GeoNames ID in particular should be easy to copy automatically. ChristianKl (talk) 15:52, 6 July 2016 (UTC)

Be very, very careful! The GeoNames IDs that have been added here before, based on the Wikipedia links in the GeoNames database, are very, very often very, very wrong! Copying the GeoNames IDs from the sv/ceb articles is a good start! We can then detect mismatches between Wikidata and GeoNames. Other kinds of information can thereafter be collected directly from GeoNames. But even that data is often wrong. For example, large parts of the Faroe Islands (Q4628) in GeoNames are located on the bottom of the Atlantic. -- Innocent bystander (talk) 16:26, 12 July 2016 (UTC)
@Innocent bystander: Note: I imported a few thousand GeoNames IDs a few weeks ago. I can't say how many are left. If svwiki had a tracking category, that would be helpful :) --Edgars2007 (talk) 17:18, 31 July 2016 (UTC)
@Edgars2007: I'll see what I can do (tomorrow). One issue here is that a tracking-category cannot separate the Lsjbot-articles from the others. -- Innocent bystander (talk) 18:56, 31 July 2016 (UTC)
@Innocent bystander: To clarify, I'm only asking about a category for the GeoNames parameter, not about the others. I don't see any reason why this fact (who created the article) is relevant in this situation. If needed, that can be obtained with a database query. --Edgars2007 (talk) 19:43, 31 July 2016 (UTC)
@Edgars2007: I intend to create (at least) two categories. One for when P1566 is missing here and one for when WD and WP do not agree about the GeoNames ID. A third potential category could be used to detect when there is a geonames parameter in WP and it matches P1566. In such cases, the parameter could be removed from WP. -- Innocent bystander (talk) 05:25, 1 August 2016 (UTC)
@Edgars2007: ✓ Done Category:Wikipedia:Articles with a geonames-parameter but without P1566 at Wikidata (Q26205593)! It will take some time until the category is completely filled with related articles. It will also take some time after you have added the property here, until the category is removed on svwiki. -- Innocent bystander (talk) 07:01, 1 August 2016 (UTC)
The category is now filled with almost 250000 pages. A category for the cases where WD and svwp contradict each other has ~4000 members. -- Innocent bystander (talk) 07:10, 2 August 2016 (UTC)
Yesterday evening that was some 300 pages (for the first category) :D --Edgars2007 (talk) 07:17, 2 August 2016 (UTC)
@Edgars2007: Any progress? Lsjbot is halted for some more time, so there is a possibility to catch up with hir! I am daily sorting out some of the more complicated constraints-problems and other problems reported on svwiki. -- Innocent bystander (talk) 06:37, 21 August 2016 (UTC)
@Innocent bystander: I haven't forgotten about you. Yes, I haven't had (much) time to do this yet, but I will try to clean up the category. --Edgars2007 (talk) 07:38, 21 August 2016 (UTC)

There are now 500,000+ identifiers to be imported. If you have a bot with 1 second throttle, this will take almost six days. Any volunteers? The more we are, the faster it's done. Matěj Suchánek (talk) 09:35, 29 January 2017 (UTC)

I can give you QS commands. Details in e-mail. --Edgars2007 (talk) 09:43, 29 January 2017 (UTC)
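
The throttle estimate and the kind of QuickStatements commands mentioned above could look roughly like the sketch below. It is not the batch that was actually e-mailed: the function names are hypothetical, the item and GeoNames IDs in the usage note are placeholders, and the item used for the "imported from" reference is an assumption that should be checked before use.

```python
# Minimal sketch: estimate the import time and build QuickStatements
# (version 1, tab-separated) lines for GeoNames ID (P1566).
# All function names are hypothetical helpers, not part of any tool.

SECONDS_PER_DAY = 86400


def estimate_days(edit_count: int, seconds_per_edit: float = 1.0) -> float:
    """Return how many days a bot needs for edit_count edits at a fixed throttle."""
    return edit_count * seconds_per_edit / SECONDS_PER_DAY


def geonames_command(item_id: str, geonames_id: str, source_wiki_item: str) -> str:
    """Build one QuickStatements line: item, P1566, quoted string, reference.

    S143 is the reference form of "imported from" (P143); source_wiki_item
    must be the item of the wiki the value was read from (assumption: the
    caller looks this up and verifies it).
    """
    return "\t".join([item_id, "P1566", f'"{geonames_id}"', "S143", source_wiki_item])
```

For example, `estimate_days(500_000)` gives a little under six days, matching the estimate above, and `geonames_command("Q42", "123", "Q169514")` produces one line ready to paste, where all three arguments are placeholders.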

I want to help. Can I just use Harvest Templates (Q21914398) ? Do I just use the template and parameter, or do I have to use the category (sv:Kategori:Wikipedia:Artiklar där geonames-parametern och P1566 på Wikidata inte överensstämmer) as a filter too? SJK (talk) 13:38, 19 March 2017 (UTC)

(See also below.) Yes, Harvest Templates should work. If that filter works, I recommend using it; it should be faster. Matěj Suchánek (talk) 14:08, 19 March 2017 (UTC)
@Matěj Suchánek: Thanks, it does work, albeit not very well. You can only do a batch of about 1000 or else it times out fetching the pages. More significantly, around 90% of the time it refuses to set P1566 because P31 is absent. So for a batch of 1000 you only get about 100 imported. Probably not worth pursuing this path any further. SJK (talk) 09:41, 20 March 2017 (UTC)
Oh, sorry then. Actually it makes sense that P31 should be present first, so Harvest Templates is useless for now. Matěj Suchánek (talk) 13:48, 20 March 2017 (UTC)

Delete redirects[edit]

Lists like Josve05a's User:Josve05a/dupes tend to be full of redirects with sitelinks. Samples: Q5145525 and Q23812706. Would you delete all sitelinks on these items that are redirects?

Obviously, a solution that would solve this for even more items would be better.
--- Jura 23:41, 12 September 2016 (UTC)

For seeking/finding/implementing a more general solution, I created phab:T145522.
--- Jura 07:38, 14 September 2016 (UTC)
I have cleaned up those lists. I'm almost 100% sure there is a repository with database reports regarding redirects per project. I'll search for it later. Matěj Suchánek (talk) 20:52, 30 January 2017 (UTC)

Import original film titles (P1476) from Wikipedia[edit]

About 25-30% of film items have "title" set, but many don't. WD:WikiProject_Movies/Tools#Wikipedia_infobox_mapping lists infobox fields that include it, but Harvesttemplates doesn't allow import. If the language of the film isn't known, the language code can be set to "und" (=unknown).

Some languages have categories to add the original language of the film (P364), but others don't (notably frwiki for French-language films, itwiki for Italian-language films). Obviously, frequently we could assimilate the categories for the country of origin with a language.
--- Jura 15:24, 21 November 2016 (UTC)

French Wikipedia has the original title in a text block. This could easily be imported with "und" as the language code, or, if the language in the same block can be decoded, with the corresponding language code.
--- Jura 10:53, 10 April 2017 (UTC)

Clean up aliases with registered trademark signs[edit]

I've just fixed an item with the alias "Celexa®". Please can someone check how widespread the use of "®" is; and - if significant - have a bot remove the symbol from aliases? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:05, 27 November 2016 (UTC)

@Pigsonthewing: My query found 2000. Matěj Suchánek (talk) 11:40, 11 December 2016 (UTC)
Thank you. Can anyone fix these? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:30, 19 March 2017 (UTC)
The number is now higher than three months ago, maybe we also want to track where they come from. Are you sure just removing such aliases is always the right thing? Matěj Suchánek (talk) 14:06, 19 March 2017 (UTC)
I don't think we are.
--- Jura 14:29, 19 March 2017 (UTC)
Not removing; cleaning (by removing the symbol). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:54, 20 March 2017 (UTC)
A number of recent additions are in the titles of journal articles. Perhaps they should be kept; or the version with the symbol made an alias, and the version without used as the label. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:58, 20 March 2017 (UTC)
@Harej: (FYI). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:59, 20 March 2017 (UTC)
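
As a rough illustration of "cleaning, not removing", the sketch below strips the "®" sign from aliases and works out which aliases would be dropped and which cleaned forms would be added for one item and language. The function names are hypothetical, the example values are illustrative only, and it deliberately does not decide what to do with journal article titles, which may need to keep the symbol.

```python
# Minimal sketch: plan alias clean-up for one item in one language.
# Function names are hypothetical; no Wikidata API is called here.

def clean_alias(alias: str) -> str:
    """Remove the registered trademark sign and normalise whitespace."""
    return " ".join(alias.replace("®", "").split())


def plan_alias_edits(label: str, aliases: list[str]) -> tuple[list[str], list[str]]:
    """Return (aliases_to_remove, aliases_to_add).

    A cleaned alias is only added when it does not duplicate the label
    or an alias that is already present.
    """
    existing = {label, *aliases}
    to_remove: list[str] = []
    to_add: list[str] = []
    for alias in aliases:
        if "®" not in alias:
            continue
        to_remove.append(alias)
        cleaned = clean_alias(alias)
        if cleaned and cleaned not in existing and cleaned not in to_add:
            to_add.append(cleaned)
    return to_remove, to_add
```

With a label "Citalopram" and the single alias "Celexa®", the plan is to remove "Celexa®" and add "Celexa"; if "Celexa" is already an alias, nothing is added.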

Adding French census data to Wikidata[edit]

Hi, (Info to @Zolo:, @VIGNERON:, @Oliv0:, @Snipre:)
I am looking for a contributor to load the French census data into Wikidata by bot, from Excel tables that I would provide. These tables contain the population data defined by the following properties: INSEE municipality code (P374), population (P1082) and uncertainty, as well as the qualifiers characterizing these data: point in time (P585), determination method (P459) and criterion used (P1013), and two sources (that of the data themselves (INSEE) and that of the census calendar).

In France, the census has been based since 2006 on an annual collection of information, covering all the communal territories in turn over a period of five years. Municipalities with fewer than 10,000 inhabitants carry out a census survey covering the entire population, one in five communes each year. Municipalities with a population of 10,000 or more carry out a sample survey each year on a sample of addresses representing 8% of their dwellings. Each year there are three types of population values:

  • real populations (Q39825)
  • populations estimated by interpolation (Q187631) or extrapolation (Q744069)
  • populations estimated by sampling (Q3490295)

It's therefore necessary to load these qualifiers on Wikidata in order to correctly use the data. Loading only the population data would be insufficient.

There will be one set of data per year since 2006 (i.e. 2006, 2007, 2008, etc. until 2013). The data for 2007 are here:
Is there a volunteer ? Roland45 (talk) 13:04, 2 December 2016 (UTC)
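
To make the request concrete, one row of such a table could be turned into a QuickStatements (version 1) line along the lines of the sketch below. This is only an illustration under assumptions: the function name is hypothetical, the mapping of the three population types to determination method (P459) is a guess, the value for criterion used (P1013) is left to the caller, and only the INSEE source (stated in (P248) INSEE (Q156616)) is shown.

```python
# Minimal sketch: one census row -> one QuickStatements v1 line.
# The helper name and the choice of P459 for the population type are
# assumptions made for illustration.
from typing import Optional


def population_command(
    item_id: str,
    population: int,
    year: int,
    method_item: str,
    criterion_item: Optional[str] = None,
    source_item: str = "Q156616",  # INSEE
) -> str:
    """Build a tab-separated line with qualifiers and a 'stated in' source."""
    parts = [
        item_id,
        "P1082", str(population),
        # /9 marks the time value as having year precision.
        "P585", f"+{year:04d}-01-01T00:00:00Z/9",
        "P459", method_item,
    ]
    if criterion_item:
        parts += ["P1013", criterion_item]
    parts += ["S248", source_item]
    return "\t".join(parts)
```

The item for each municipality would first have to be looked up from its INSEE municipality code (P374); that lookup is not shown here.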

Are the data under a CC0 license? --ValterVB (talk) 13:16, 2 December 2016 (UTC)
It is published by INSEE under "Open Licence": see former discussion, also here and here. Oliv0 (talk) 13:22, 2 December 2016 (UTC)
You mean this Open licence? I read on the wiki page that "Information released under the Open License may be re-used with attribution, such as a URL or other identification of the producer". In my opinion it isn't compatible with CC0 and personally I never add data with a license different from CC0. --ValterVB (talk) 13:36, 2 December 2016 (UTC)
@Oliv0, Roland45: Just read this to understand that the "Open Licence" is compatible with CC-BY and not with CC0. The author of the "Open Licence" defines this compatibility itself, so it is not possible to import the full dataset from INSEE in an automatic way. Snipre (talk) 17:04, 2 December 2016 (UTC)
Please read the links I gave: compatibility is clear, no need to discuss and hinder upload again. Oliv0 (talk) 17:08, 2 December 2016 (UTC)
@Oliv0: None of the links you gave is an official answer or comment from the author of the Open Licence, so unless one of the contributors mentioned in your links takes responsibility for his comments and is ready to go before a tribunal to defend his position, these are just words in the wind. The link I provided states clearly that the author of the Open Licence, Etalab, a French commission, defines its licence as compatible with CC-BY. The text leaves no room for confusion: "According to the Etalab mission, the Licence ouverte / Open licence 'fits into an international context by being compatible with the standards of Open Data licences developed abroad, notably those of the British government (Open Government Licence), as well as other international standards (Open Database Commons-BY, Creative Commons-BY 2.0)'." There is no mention of the US government licence or of CC0. So please provide an official comment if you want to keep defending your position. Snipre (talk) 17:18, 2 December 2016 (UTC)
Please read the arguments in the links I gave, "if you want" to see unnecessary problems; you will see that the unanimous analysis made by the contributors was indeed what I said, and the possible compatibility with CC-BY is quite a different topic. Oliv0 (talk) 17:22, 2 December 2016 (UTC)
What authority do your contributors have? People who mostly contribute under pseudonyms do not weigh much against an official comment (see the description of the Open Licence by Etalab on its own blog here, where once again a very clear link between the Open Licence and CC-BY is mentioned, and nothing concerning CC0). After that, one can argue about whether the data is taken in its entirety, whether there is a creative choice or not, ... but that is clearly playing at the limits of the licences. In the end, it is up to whoever imports the data to take responsibility. Snipre (talk) 17:53, 2 December 2016 (UTC)
@Snipre: You say: "it is not possible to import in an automatic way the full dataset from INSEE." But that's not the case here. The Excel table that I propose is a reconstituted table that you will not find anywhere on the INSEE website. Each data point has its own URL (such as this one). If I only loaded one data point (one line of the table), would it not be eligible either? If you think like that, most of the data in Wikidata aren't eligible, especially all the population data that are already online. Roland45 (talk) 17:38, 2 December 2016 (UTC)
To any bot operator who is interested in importing the data from INSEE under the Open Licence: the author of the Open Licence defines on its blog (see this page, in French) the compatibility of the Open Licence with CC-BY and doesn't mention CC0. So importing data under the Open Licence into WD carries a high risk of not respecting the terms of the Open Licence. Until someone provides an official comment from Etalab, the author of the Open Licence, establishing the compatibility of the Open Licence with the CC0 licence, your responsibility is engaged. Snipre (talk) 17:53, 2 December 2016 (UTC)
In my respectful view, all the discussions (in French and in English) that are linked to this topic say the same thing: it's OK to import. So I don't understand why there are still heated debates… Tubezlob (🙋) 18:05, 2 December 2016 (UTC)
The 2013 data have apparently already been loaded (but of course without the qualifiers mentioned above) with the following entries: imported from (P143) French Wikipedia (Q8447) or stated in (P248) INSEE (Q156616), without any additional precision. I do not know if these imports have been done manually or "en masse", but is it better? Shouldn't they be deleted? Roland45 (talk) 18:10, 2 December 2016 (UTC)
@Roland45, ValterVB, Oliv0, Tubezlob, Snipre: « your responsibility is engaged »: true, but it's the same thing for every addition to Wikidata, including but not limited to every import from Wikimedia projects (which is done daily). And as stated in the CC0 legal code, « A Work made available under CC0 may be protected by copyright and related or neighboring rights ». @Roland45: do you know QuickStatements? I can explain how it works so that you can do the import yourself ;) Regards, VIGNERON (talk) 19:06, 4 December 2016 (UTC)
@Roland45, ValterVB, Oliv0, VIGNERON, Snipre: (Q20666306) is under Open License (Q3238028), and a bot (KrBot) imported a lot of data about persons directly from it. So it seems to be OK, no? Tubezlob (🙋) 14:17, 22 December 2016 (UTC)
@Tubezlob: it is « OK » to me. Regards, VIGNERON (talk) 14:29, 22 December 2016 (UTC)
@Tubezlob: Not for me. I think that the "Open licence", like the "Open Government Licence", falls under the "Creative Commons Attribution (CC-BY) licence", so it is incompatible with the "CC0 license". In case of uncertainty like this I prefer not to intervene. --ValterVB (talk) 09:18, 23 December 2016 (UTC)
To explain better: I need a lawyer/expert of licence who can confirm or deny whether the citation must also be maintained by those who use the Wikidata data or not. In the first case we can't use the data, in the second case we can use it (probably). --ValterVB (talk) 09:25, 23 December 2016 (UTC)
@Tubezlob: People do crazy things; that doesn't prove that they did the right thing. Snipre (talk) 14:56, 28 December 2016 (UTC)
@VIGNERON: Please explain to me where, in the following line, which comes from the body that authored the Open Licence, you read that the Open Licence is compatible with CC0: "A licence (the Open Licence) that fits into an international context by being compatible with the standards of Open Data licences developed abroad, notably those of the British government (Open Government Licence), as well as other international standards (ODC-BY, CC-BY 2.0)." This sentence comes from the official site of Etalab, which wrote the Open Licence for the government. In short, it talks about CC-BY, but not about CC0. Where can one buy the special glasses needed to read CC0 on that page? Because I'd like a pair.
Database law is more complex than the law on isolated objects, but one thing is certain: a single, isolated datum is not protected, whereas the entirety of a data set extracted systematically falls under the European directive on database rights (see the Foundation's commentary on the subject here, and in particular the sentence: Extracting and using a insubstantial portion does not infringe, but the Directive also prohibits the "repeated and systematic extraction" of "insubstantial parts of the contents of the database").
The moral: as long as we merely extract data in an uncoordinated way and not with a systematic aim (roughly, several contributors working independently and with small quantities of data), we stay below the threshold of the directive, but as soon as we bring out the bot, we move to another level and fall under the directive, which grants rights to the owner of the database. That is why the French government asked Etalab to provide a licence for French state data, in order to facilitate the use of data that is protected by the European directive. This licence removes the need to deal with the directive and to obtain formal authorization, but under the conditions of the Open Licence, which defines itself as compatible with CC-BY. Those are the facts, from identifiable bodies that are competent in their field: the European Union, the Foundation's legal team and the French state via Etalab. From there, everyone does what they want, but one cannot claim that it is correct to transfer data under the Open Licence to CC0, because 1) this appears nowhere in the official documents, and 2) it deliberately ignores the relationship that the licence's author established between the Open Licence and the CC-BY licence. And it is on this last point that I come back to responsibility: this deliberate disregard for the thinking behind the Open Licence. Snipre (talk) 14:56, 28 December 2016 (UTC)
@Snipre: you focus on theory and the texts; I am talking rather about practice and the spirit of the texts. Since Wikidata's creation, and every day, imports have been made from sources that are formally perhaps not compatible, starting with the automatic imports from Wikipedia, without anyone finding anything to object to. On the legal side, regarding copyright (but copyright applies to works; is a datum a work?), attribution is always unavoidable in France (and in most countries of the world), so the difference between CC0 and CC-BY is legally almost non-existent. Moreover, the references fulfil a role that seems sufficient to me from the point of view of attribution. As for the sui generis right specific to databases, there again the problem arises for all data, which is nevertheless imported daily into Wikidata, and in the present case it does not seem to me that we are taking a substantial part of the data set. In short, we could quibble for a long time yet, but the import is in progress and I do not see a problem, and in any case, should a responsible party request removal, it will be easy to delete the data concerned. Regards, VIGNERON (talk) 18:11, 28 December 2016 (UTC)
Mention of compatibility with one licence does not exclude compatibility with another one; compatibility with CC0 has been discussed and proved many times in the links given above, no need to discuss it again. Oliv0 (talk) 06:29, 30 December 2016 (UTC)

Why is importing CC-BY data a problem? As long as you indicate the source of the statements in Wikidata, the attribution clause of the CC-BY license should be satisfied, right? − Pintoch (talk) 08:51, 20 January 2017 (UTC)

When releasing data into Wikidata, a user specifies that they have the right to release the data into the public domain (CC0). Reusers of Wikidata are supposed to be able to use parts of Wikidata without carrying along all the sources. ChristianKl (talk) 20:23, 16 February 2017 (UTC)

Unincorporated Communities[edit]

Hi there. I was wondering if a bot is able to fill in stuff. I'm looking at en:Category:Unincorporated communities in Missouri and I was hoping someone could quickly go to the Wikidata items and fill in "unincorporated community in Missouri" in the description field. MechQuester (talk) 05:09, 19 December 2016 (UTC)

If you add statements to these items, eventually a bot would do that or it can be generated by autodescription from the statements.
--- Jura 07:10, 20 December 2016 (UTC)
✓ Done with descriptioner-tool --Pasleim (talk) 10:22, 20 December 2016 (UTC)
LOL wow you are amazing @Pasleim:, may I ask for the syntax of what you wrote? Except this time, it's townships in Missouri and other states? I kind of want to learn how to write some. 04:47, 24 December 2016 (UTC)
This would add the description "township in Missouri" to all items with instance of (P31)=township of Missouri (Q6270791). However, as you will see, there are many red lines because there are many townships with the same label, e.g. 28 Washington Townships in Missouri. That means you need a more specific description. A possibility is to name the county in the description. This you can get by this request. --Pasleim (talk) 11:16, 24 December 2016 (UTC)
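
The collision problem described above can be sketched in a few lines. This is not the Descriptioner request itself: the function names are hypothetical, the county names are placeholders, and the county is assumed to have been fetched beforehand (for instance from located in the administrative territorial entity (P131)).

```python
# Minimal sketch: build township descriptions and detect label/description
# pairs that would collide. Names and data are placeholders.
from collections import Counter
from typing import Optional


def describe(county: Optional[str] = None) -> str:
    """Return a generic description, or a county-specific one if known."""
    if county:
        return f"township in {county}, Missouri"
    return "township in Missouri"


def find_collisions(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every (label, description) pair that occurs more than once."""
    counts = Counter(pairs)
    return sorted(pair for pair, n in counts.items() if n > 1)
```

Two items labelled "Washington Township" collide when both get "township in Missouri", but not when each description names its own county, for example "township in County A, Missouri" and "township in County B, Missouri".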

It looks like the labels of those townships could also do with cleanup; for example:

  • Hubble Township, Cape Girardeau County, Missouri -> Hubble Township
  • Linn Township, Audrain County -> Linn Township

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:32, 24 December 2016 (UTC)

@Pigsonthewing:, they do.... that's a lot of edits though.  – The preceding unsigned comment was added by MechQuester (talk • contribs) at 11:24, 26 December 2016‎ (UTC).
That's why we have bots. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:57, 26 December 2016 (UTC)
Can anyone do this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:33, 19 March 2017 (UTC)

duos without parts[edit]

All these items should have 2 additional items linked with "has part". The items should link back with "part of".

For many it should be possible to create correctly labelled items from this list.
--- Jura 11:17, 31 December 2016 (UTC)

But label, instance of (P31): human (Q5), part of (P361) and perhaps sibling (P3373)/spouse (P26) are the only data that you can guess. Isn't it too little? (But it's an interesting one which I can imagine doing myself.) Matěj Suchánek (talk) 14:56, 23 January 2017 (UTC)
It depends where you come from. Starting from Wikidata:Database_reports/without_claims_by_site, this is much better. Doing even one manually is quite time-consuming.
Besides, it provides a way to find a WP article from the name of a person.
Obviously, once created, adding more information would be helpful too. If there is a VIAF identifier for the duo, frequently there is also one for each part.
--- Jura 10:15, 24 January 2017 (UTC)
@Jura1: So I looked into this today and it is fun. Feel free to suggest what could be better or ask for clarification how the bot works. Examples: Saints Cosmas and Damian (Q76486), no label (Q91789), Sergius and Bacchus (Q140013), Cocl & Seff (Q151489). Matěj Suchánek (talk) 20:41, 29 January 2017 (UTC)
@Matěj Suchánek: that's just what I had in mind. Thanks! Sorry for not responding earlier.
--- Jura 09:03, 26 February 2017 (UTC)
My script is now public. I wonder if the bot should report somewhere which items it created. Matěj Suchánek (talk) 15:06, 4 March 2017 (UTC)
I created Wikidata:WikiProject Q5/lists/duos and added an announcement to the weekly update.
--- Jura 10:54, 10 April 2017 (UTC)


Hi, can someone add the rivers to descriptions. For example en:Category:Rivers of Montana. MechQuester (talk) 06:55, 4 January 2017 (UTC)

and also Q19558910 MechQuester (talk) 19:58, 4 January 2017 (UTC)

you could try out Descriptioner --Pasleim (talk) 16:05, 24 January 2017 (UTC)

OpenStreetMap objects[edit]

(Pinging participants in the deletion discussion for OpenStreetMap Relation identifier (P402): Yurik, Jura1, MaxSem, Kolossos, Susanna Ånäs, Abbe98, Andy Mabbett, d1gggg, Jklamo, Denny, Nikki, Sabas88, Thierry Caro, Glglgl, Frankieroberto, VIGNERON, and Kozuch.)

This is (for now) a draft proposal, but I'd rather not put this in the deletion debate where fewer people with bot experience would see this. I also did not realize this page existed on Wikidata and put a bot suggestion in the community portal a few weeks ago for some reason (no responses), which is why this is rather late (the deletion discussion began in November). Feel free to modify this, because I don't really know how this would work but wanted to make a request anyway because apparently no one's done it yet. Jc86035 (talk) 11:51, 17 January 2017 (UTC)

Could we have a bot which

  1. automatically pulls Wikidata item links from OSM objects' wikidata tags (but, if necessary for data quality, only if the item matches the Wikipedia article in the object's wikipedia tag), from the whole database initially and from new changesets thereafter;
  2. updates Wikidata items' OpenStreetMap Relation identifier (P402) (as well as properties for way and node tags, if created) from the initial dump and afterwards (human-approved if there's more than one object linked to the same Wikipedia article/Wikidata item ID);
  3. deletes the property value(s) from Wikidata items whenever a Wikidata ID is removed from an OSM object and not readded to another object (but, if necessary for data quality, only if the wikipedia tag is also removed without replacement; and only if manually approved for OSM users removing more than 10 of them within 24 hours); and
  4. makes a list of one-way links in the Wikidata → OSM direction and a list of OSM objects/Wikidata items where the links between items don't match each other?

In addition, would it be possible to use the same bot to automate this on OSM in the other direction as well? (I haven't notified the wiki or the OSM mailing lists or anything.)

Many thanks, Jc86035 (talk) 11:51, 17 January 2017 (UTC)
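
Point 4 of the proposal could be sketched as follows, assuming both directions of the links have already been extracted into plain dictionaries (from P402 statements on one side and from OSM `wikidata` tags on the other). The function name and the data layout are hypothetical, and fetching the data is out of scope.

```python
# Minimal sketch: compare Wikidata -> OSM links (P402) with OSM -> Wikidata
# links (the "wikidata" tag on relations). Input dictionaries are assumed
# to have been built elsewhere.

def compare_links(
    wikidata_to_osm: dict[str, int],
    osm_to_wikidata: dict[int, str],
) -> tuple[list[tuple[str, int]], list[tuple[str, int, str]]]:
    """Return (one_way, mismatched).

    one_way:    (item, relation) where the relation carries no wikidata tag.
    mismatched: (item, relation, tagged_item) where the relation points to
                a different item than the one linking to it.
    """
    one_way = []
    mismatched = []
    for item, relation in sorted(wikidata_to_osm.items()):
        tagged = osm_to_wikidata.get(relation)
        if tagged is None:
            one_way.append((item, relation))
        elif tagged != item:
            mismatched.append((item, relation, tagged))
    return one_way, mismatched
```

Both lists are meant for human review; the sketch does not handle several OSM objects pointing at the same item, which the proposal would route to manual approval.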

  • Oppose 1-3, for the reasons given in the deletion discussion; and ad nauseam. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:23, 17 January 2017 (UTC)
  • Oppose because OSM ids are not stable: they can change unexpectedly and without notice, so they should not really be stored. This is also why it was suggested that the OSM relation property should be deleted. What would the use case for having a connection from Wikidata to OSM be in your case? I tried to address the discoverability issue by creating a userscript that displays a link to OSM on Wikidata... --Abbe98 (talk) 14:25, 17 January 2017 (UTC)
    • I don't have any sort of use case for this, but some people in the deletion discussion think this is a good idea and I thought I might as well make this. Jc86035 (talk) 14:50, 19 January 2017 (UTC)
  • Support: the idea is good but it needs some discussion and improvements (the import should respect the constraints of OpenStreetMap Relation identifier (P402), for example). @Abbe98: OSM ids are pretty stable, pretty much as stable as Wikipedia article names, and when they change this is not really « unexpectedly and without notice » (I vaguely remember a tool like an API for querying the changesets). For the use case, I can see hundreds of thousands of cases that benefit from having all the information in one single place; the first one being no need to use two different tools (using just Wikidata Query is, by definition, better than using Wikidata Query and Overpass Turbo). Plus, as there is some discussion against adding Wikidata IDs on OSM (more exactly, IIRC, reverting mass additions until a consensus is reached) and as the rules may be different on the two projects, it's more secure to store the data on our side too (with our rules, like we do for other databases). PS: Your tool seems interesting but I can't make it work (or more probably I did something wrong or I'm not looking in the right place); what is it supposed to do? (I'd like to test it on cases like [2] linking to Wilhelm Trute (Q15987301) or [3] and [4] both linking to Assen (Q798), examples taken from the OpenStreetMap Relation identifier (P402) constraint violations.) PPS: @Jc86035: for the first point, you don't really need a bot, you could do it yourself with an Overpass request and QuickStatements (and some wit to check for inconsistencies). Regards, VIGNERON (talk) 15:12, 17 January 2017 (UTC)
    @VIGNERON:, I do disagree with you on the stability of OSM ids; some are stable (per the examples given during the deletion discussion) but most are not, and just because you can query the changes doesn't make it a streamlined process. Most of the "hundred thousands" of use cases must have been forgotten during the deletion discussion. I can see many use cases too, but none where they are better than the alternatives. On the subject of my tool: it should add a link to the OSM element in the sidebar to the left (under Tools). Please drop me a note on GitHub or on my talk page if the issue persists. --Abbe98 (talk) 15:29, 17 January 2017 (UTC)
    @Abbe98: you can of course disagree, but do you have any figures to support your views? Out of 96 French départements (Overpass request), 88 have had the same number since their creation, so ~91 % are stable. That is pretty stable to me (especially since départements are highly edited objects, so relations on more common objects are less likely to be unstable), in the same order as wiki sitelinks are stable (but more than Wikidata items, though they're not 100 % stable either).
    Oh, my mistake, indeed, I wasn't looking in the right place. I will look into it (right now, I can spot one thing: when several OSM objects link to Wikidata your tool only shows one, see the Assen (Q798) example I gave earlier). It's a great tool for readers and visualisation but it doesn't help editors, re-users, external manipulation or querying. Why not turn this tool into a « compare WD and OSM (via OpenStreetMap Relation identifier (P402)), give a warning if there are inconsistencies/problems and suggest adding/correcting the relation » tool?
    Regards, VIGNERON (talk) 16:04, 17 January 2017 (UTC)
    @VIGNERON: your Overpass query did not return any data for me (should it display anything when output as CSV?). I'm not sure départements are a good example as they are not merged and deleted as often as most OSM elements (AFAIK). No, I have not put any effort into obtaining figures, as it's easy to just analyze a set of non-diverse elements. Please see the deletion discussion here at Wikidata and T145284.
    Yes, it's a known limitation that it only links to one element (I'm not sure how to solve it UI-wise). I wonder how Kartographer deals with multiple Wikidata tags. The reason for not creating such a tool is that I believe OpenStreetMap Relation identifier (P402) should be deleted, and even if OSM identifiers were stable, OpenStreetMap Relation identifier (P402) would be very limited (it's only for relations). --Abbe98 (talk) 19:03, 17 January 2017 (UTC)
    @Abbe98: strange, the Overpass query should give a CSV with the relation IDs of the French départements. It was just an example; if you have a better one, or even better something more general, feel free to share it. I've seen the Phabricator tickets and the two deletion requests (and even participated); I'm still waiting to be convinced by a valid reason for deletion and to see numbers about the instability (either OSM instability or Wikidata instability). Regards, VIGNERON (talk) 19:55, 17 January 2017 (UTC)
    @Abbe98: Maybe you might need to press the magnifying glass button in the map sidebar? If there isn't any data you should see a grey bar at the top of the map something like "blank dataset received". Jc86035 (talk) 14:44, 19 January 2017 (UTC)
    I had my Overpass Turbo set to another Overpass instance which returned broken data. --Abbe98 (talk) 14:12, 21 January 2017 (UTC)
    Ah, okay then. Jc86035 (talk) 15:54, 21 January 2017 (UTC)
  • Oppose. It would need properties for nodes and ways (and in the future areas?) to describe the range of geometries one Wikidata object could be represented with. --Sabas88 (talk) 20:48, 17 January 2017 (UTC)
  • Support. I don't believe there is any serious instability problem with OSM that does not exist with almost any external database. In any case, a bot dedicating time to maintaining the property is not going to hurt anyone. So there is certainly no well-grounded reason to oppose. Thierry Caro (talk) 10:11, 19 January 2017 (UTC)
    • @Thierry Caro: I guess one downside could be that since OSM data can (I think) be directly pulled from WMF projects through GeoJSON, and Wikimedia editors can use their accounts to add data to OSM, there's not really much point in bothering to maintain two separate and redundant databases with a bot. Jc86035 (talk) 14:44, 19 January 2017 (UTC)
  • Support. It's more user-friendly when we have the property. I don't think a little bit of data instability will lead to problems that a bot can't remedy. ChristianKl (talk) 13:14, 29 May 2017 (UTC)

Lighthouses: import P625 from Commons

There are some 300 lighthouses with coordinates at Commons:

Somehow the PetScan option doesn't work for them. It would be good if these could be imported.
--- Jura 20:50, 17 January 2017 (UTC)

@Jura1: Both the import and the mentioned problem need attention but the latter is not obvious to me. Matěj Suchánek (talk) 15:09, 22 January 2017 (UTC)
@Jura1: I can do it using pywikibot. --Mikey641 (talk) 16:41, 22 January 2017 (UTC)
@Jura1: OK, so I'm probably going to do it tomorrow, because since this morning I've been transferring coordinates from hewiki to Wikidata; after I'm done, I'll transfer from Commons. --Mikey641 (talk) 18:29, 22 January 2017 (UTC)
This would be simple if their module added coordinates to the page info in categories as well (it does in files only). That's why it doesn't work in PetScan. Matěj Suchánek (talk) 13:28, 2 February 2017 (UTC)
Interesting. It might be easier to fix that then.
--- Jura 09:04, 26 February 2017 (UTC)

User script to notify participants in a property proposal

Normally, when a property proposal is closed, the participants should be informed with a "ping"; however, this is done manually and takes a lot of effort for long discussions. Would it be possible to create a script that would scrape all the user names in a property proposal page and format the resulting list as a ping (in groups of 5, because I think that is the limit)? --Micru (talk) 20:27, 19 January 2017 (UTC)

This sounds like a good idea. Work that can be done by bots should be done by bots ;) ChristianKl (talk) 10:46, 23 January 2017 (UTC)
See phab:T139898.--GZWDer (talk) 17:23, 24 January 2017 (UTC)
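
A minimal sketch of the script requested above, assuming the proposal page is available as wikitext and that participants are recognised by the [[User:...]] or [[User talk:...]] links in their signatures. The function names are hypothetical, and the group size of 5 is the limit Micru assumes, not a verified one.

```python
import re

# Matches [[User:Name|...]] and [[User talk:Name|...]] links in signatures.
USER_LINK = re.compile(r"\[\[\s*User(?: talk)?\s*:\s*([^\]|/#]+)", re.IGNORECASE)

def extract_usernames(wikitext):
    """Return the distinct linked user names, in order of first appearance."""
    seen = []
    for match in USER_LINK.finditer(wikitext):
        name = match.group(1).strip().replace("_", " ")
        if name and name not in seen:
            seen.append(name)
    return seen

def format_pings(usernames, group_size=5):
    """Format names as {{Ping|...}} calls, at most group_size names per call."""
    lines = []
    for start in range(0, len(usernames), group_size):
        group = usernames[start:start + group_size]
        lines.append("{{Ping|" + "|".join(group) + "}}")
    return lines
```

Custom signatures that do not link to a user page would be missed, so the output is best treated as a draft list to check before posting.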

Natural RU names

Per this discussion, we need to run a bot that would convert all Russian labels for people's entities into a "normal" form - "[First_name] [Patronymic_or_Middle_name] [Last_name]", instead of "Last, First Patronymic". The existing form of the name should be moved to the "Also known as" column. I suspect that the Kyrgyz and Ukrainian languages would also benefit from it. --Yurik (talk) 03:14, 22 January 2017 (UTC)

I am working on this. There are some questions which I'll discuss at the forum. --Infovarius (talk) 20:43, 22 January 2017 (UTC)
  • For people with name in native language (P1559) in Russian, it might be worth creating new items for given names and family names and adding them at the same time.
    --- Jura 17:45, 26 January 2017 (UTC)
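
The label conversion requested in this section could be sketched roughly as follows. This assumes the label contains exactly one comma separating the family name from the rest; anything else is left untouched for manual review. The function name is hypothetical.

```python
def naturalize_name(label):
    """Turn 'Last, First Patronymic' into 'First Patronymic Last'.

    Returns (new_label, alias). The alias is the original form, to be kept
    under "Also known as"; it is None when the label is left unchanged.
    """
    parts = [p.strip() for p in label.split(",")]
    if len(parts) != 2 or not all(parts):
        return label, None
    last, given = parts
    return f"{given} {last}", label
```

Whether an item is actually a person, and whether the comma is part of a title or epithet, still has to be checked outside this function.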

Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)

  • Source: Philippine Statistics Authority
  • Link: (As FOI request, Philippines public domain)
  • Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
  • Structure: Population (P1082)
  • Covers all
    • municipality of the Philippines (Q24764)
    • city of the Philippines (Q104157)
    • province of the Philippines (Q24746)
    • region of the Philippines (Q24698)
  • Example item
    • Municipality : Dasol (Q41917)
    • City : Urdaneta (Q43168)
    • Province : Pangasinan (Q13871)
    • Region: Ilocos Region (Q12933)

Major upload needed, hard to do it manually. 2010 and 2015 data already present --Exec8 (talk) 05:02, 28 January 2017 (UTC)

articles from Norwegian wiki added to wrong items

A bot of the inactive user:Emaus made some wrong edits, like this one adding the sitelink no:Neoclitopa to Neoclitopa nitidipennis (Q14869209) when it should have been added to Neoclitopa (Q18115528). I collected a bunch of them by hand, but there are too many. We need to identify links like that and move them to the proper item. I will try to write a query to identify them, but can someone help with moving them? --Jarekt (talk) 13:20, 15 February 2017 (UTC)

SELECT ?item ?pItem ?taxon ?parentTaxon ?sitelink WHERE {
    ?item  wdt:P171 ?pItem .          # has parent item
    ?item  wdt:P225 ?taxon .          # taxon name
    ?item  wdt:P105 ?rank .           # taxon rank
    ?pItem wdt:P225 ?parentTaxon .    # parent taxon name
    VALUES ?rank {wd:Q7432 }          # restrict rank to species only at this moment
    ?sitelink schema:about ?item .
    FILTER(STRSTARTS(STR(?sitelink), ""))
    FILTER(STRENDS(STR(?sitelink), ENCODE_FOR_URI(?parentTaxon))) # norwegian article name matches parent taxon
    #MINUS{ ?item wdt:P225 ?parentTaxon . }
} LIMIT 100
Here is an example of a query with some of the problem sitelinks. --Jarekt (talk) 13:45, 15 February 2017 (UTC)
Are there any Norwegian speakers who can verify that those are bad sitelinks? --Jarekt (talk) 13:51, 15 February 2017 (UTC)

I can try to check this in a day or two. How many errors might there be, and can they be repaired manually? (I guess we must; a bot cannot sort this out?) Dan Koehl (talk) 21:04, 17 February 2017 (UTC)

Please note Wikidata_talk:WikiProject_Taxonomy#Many_bad_sitelinks_to_Norwegian_Wikipedia. --Succu (talk) 21:07, 17 February 2017 (UTC)

Cycle sport events: move claims from length (P2043) to event distance (P3157) and remove unreferenced bounds from values, if existing

I request a bot run to do the following:

  • In items which have ?item wdt:P31/wdt:P279* wd:Q13406554 (instances of subclasses of sport competition (Q13406554), so basically sports competition items) replace all length (P2043) claims by event distance (P3157) claims. Quantity amount values and units, as well as existing qualifiers and references should be kept.
  • However, plenty of claims still have bounds as a leftover from the time when we were not able to use quantities without bounds. I therefore request to remove all “±0” bounds, if no reference is given in the P2043 statement. It might be worth considering removing all “±[0\.]*1” bounds as well, but I am not fully sure about that (could be repaired manually otherwise as well).

This bot run will affect in total around 2634 claims:

SELECT ?item ?itemLabel ?length ?upperBound ?lowerBound ?diff {
  ?item p:P2043 [ psv:P2043 ?value ] . # items that use P2043 (length)
  ?value wikibase:quantityAmount ?length .
  ?value wikibase:quantityUpperBound ?upperBound; wikibase:quantityLowerBound ?lowerBound .
  BIND(?upperBound - ?lowerBound AS ?diff) .
  ?item wdt:P31/wdt:P279* wd:Q13406554 . # and have P31 with subclass of sport competition (Q13406554)
#  MINUS { # activate this to filter away items that are related to cycle sport
#    VALUES ?cyclingClasses { wd:Q15091377 wd:Q18131152 }
#    ?item wdt:P31/wdt:P279* ?cyclingClasses . # but not P31 with subclass of cycling race (Q15091377) or stage (Q18131152)
#  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en' }
}

There is a commented part in the SPARQL query which tests the types of sports which are affected by this bot request (the MINUS section). In fact, in the field of sports events length (P2043) is exclusively used by cycle sport events (defined by types cycling race (Q15091377) and stage (Q18131152)). Our cycle sport project members were early adopters of the “quantity with units”-properties, including length (P2043). I therefore already talked to the maintainers of Module:Cycling race at Module talk:Cycling race#event distance (P3157) instead of length (P2043), which heavily uses length (P2043). They support a move to the event-specific event distance (P3157) and have already modified their module to support both properties. Via {{ExternalUse}} in Property talk:P2043 we also identified a frwikinews-Module which needs to be moved, but this is not a complicated task to my knowledge.

In general, event distance (P3157) has some advantages over length (P2043) for events. First of all, racing sports events are not physical objects which have a property “length” as a physical dimension. What one wants to express in these cases is the distance along a path which the event participants have or had to cover during the competition. Secondly, in sports events one often uses rather unphysical distance units such as lap (Q26484625), whose use is better reflected by the event distance property. It is therefore useful to gather all event distance information in one property.
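
The requested move and bound cleanup can be sketched on the statement JSON as returned by the Wikibase API. This is only an illustration under assumptions: the function name is hypothetical, only exact “±0” bounds are dropped (the “±[0\.]*1” case is left alone), and the actual saving of the new claim and removal of the old one is not shown.

```python
import copy
from decimal import Decimal

def move_length_to_event_distance(statement):
    """Return a copy of a P2043 statement rewritten as P3157.

    Qualifiers and references are kept. Bounds are dropped only when they
    are exactly ±0 and the statement has no references.
    """
    new = copy.deepcopy(statement)
    snak = new["mainsnak"]
    if snak["property"] != "P2043":
        raise ValueError("not a length (P2043) statement")
    snak["property"] = "P3157"

    quantity = snak["datavalue"]["value"]
    has_bounds = "upperBound" in quantity and "lowerBound" in quantity
    if has_bounds and not new.get("references"):
        amount = Decimal(quantity["amount"])
        if Decimal(quantity["upperBound"]) == amount == Decimal(quantity["lowerBound"]):
            del quantity["upperBound"]
            del quantity["lowerBound"]
    return new
```

Comparing with Decimal rather than as strings treats "+180" and "+180.0" as equal, which seems closer to the intent of "±0".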

Ping involved users @Molarus, Jérémy-Günther-Heinz Jähnick. Feel free to ping more editors, if necessary.

Thanks, —MisterSynergy (talk) 07:39, 20 February 2017 (UTC)

I heard that @Zolo might be interested as well, due to P2043 use in this context in fr:Module:Infobox/Descriptif course cycliste. This is unfortunately not registered by the {{ExternalUse}} template on Property talk:P2043. —MisterSynergy (talk) 11:53, 20 February 2017 (UTC)
I have searched for templates in most wikis for "P2043" and found it:Modulo:Ciclismo. I have not found a cycling template that reads P2043 data via Module Wikidata (but I have found a railway template in enWP that uses P2043). We have to edit those modules as soon as possible after moving the data to the new property. I hope the Lua modules are coded well and don't break.
Wikinews n:fr:Module:Cycling race and the wikis that use our Module:Cycling race will need a new version of this module. I can do this except for esWiki, because I can't edit there. --Molarus 12:51, 20 February 2017 (UTC)

GeoNames ID (P1566), country (P17) and instance of (P31) from the Swedish and Cebuano Wikipedia

Can someone with a bot import GeoNames ID (P1566), country (P17) and instance of (P31) from the "geobox" template of the Cebuano and Swedish Wikipedias? --Cavaliere grande (talk) 10:36, 11 March 2017 (UTC)

For GeoNames ID (P1566), see also #Get GeoNames ID from the Cebuano or Swedish Wikipedia. I haven't yet started working on this since I think this should be done from a dedicated bot account. Matěj Suchánek (talk) 09:44, 13 March 2017 (UTC)
I will try to set up GeoNames IDs for QS2 during this week; OK, Matěj Suchánek? Later I will try to do something for P31 and P17 (at least partly). --Edgars2007 (talk) 15:52, 13 March 2017 (UTC)
Wikidata:Database reports/items without claims categories/svwiki has links to PetScan.
--- Jura 13:34, 19 March 2017 (UTC)

My bot is now collecting information (instance of (P31), country (P17), located in the administrative territorial entity (P131), elevation above sea level (P2044), GeoNames ID (P1566)) for most of the unconnected pages and will then continue to create new items, but the bot works slowly in Python when creating new items. --Mr. Ibrahem (talk) 00:47, 25 June 2017 (UTC)

Import interwikis from Kabiye Wikipedia

Once it's live, interwikis from (kbp:) should be imported. See for updates.
--- Jura 22:00, 19 March 2017 (UTC)

Remove commas from places

Per Help:Label, the label shouldn't contain the disambiguated title, e.g. Q1935676 should be "Bootle", not "Bootle, Cumbria". I suggest that the disambiguated title be added as an alias. Lucywood (talk) 15:33, 20 March 2017 (UTC)

Be careful with this. There are a number of parishes where the official name includes a comma that is not a disambiguation -- eg Aston, Cote, Shifford and Chimney (Q4810933).
I also question whether it's a good idea for churches. Having several hundred churches all called "St Margaret's", rather than "St Margaret's, Thisvillage" would make editing a nightmare. Already there are enough bad merges from some of Magnus's game tools, prompting people to merge things that are actually hundreds of miles apart. This could make things far worse.
There are also a number of places where there is more than one village with the same name in the same county -- see eg the uniqueness violations for KEPN for examples: The danger is that people may see village+county aliases or descriptions, and not realise the possibility of choosing the wrong one.
These issues are not insuperable; but some caution may be advisable. Jheald (talk) 23:12, 20 March 2017 (UTC)
Also pinging @Pigsonthewing:, who I saw recently removed the village name from a church (diff), so may have thoughts on this. Jheald (talk) 12:16, 21 March 2017 (UTC)
Thank you. AIUI, disambiguation should always be done in the description, not in the label. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:05, 21 March 2017 (UTC)
Personally I prefer keeping the disambiguation part in the label until it's properly disambiguated by the description. I have recently run my bot to clean up some Czech labels. Matěj Suchánek (talk) 12:38, 21 March 2017 (UTC)
With automated descriptions, such disambiguation issues are no longer relevant. I often shorten such names; then again, I use Reasonator, and it does include automated descriptions. Thanks, GerardM (talk) 12:57, 21 March 2017 (UTC)
Maybe just doing the ones with the county in the title would be a good idea first. Lucywood (talk) 23:08, 21 March 2017 (UTC)
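
A cautious version of this request, limited to labels whose last comma-separated part is a known county as suggested just above, might look like the sketch below. The function name and the `known_qualifiers` set are hypothetical; the list of county names would have to be supplied by the bot operator.

```python
def split_disambiguated_label(label, known_qualifiers):
    """Split 'Place, Qualifier' into (short_label, alias).

    Only the part after the last comma is examined, and only when it is in
    known_qualifiers (e.g. county names). Otherwise the label is returned
    unchanged with alias None, so official names containing commas survive.
    """
    head, sep, tail = label.rpartition(",")
    if sep and head.strip() and tail.strip() in known_qualifiers:
        return head.strip(), label
    return label, None
```

This leaves names such as "Aston, Cote, Shifford and Chimney" alone, but it does not address the concerns raised about churches or about same-named villages within one county.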

Christie's work IDs

Please can someone import values for Christie's work ID (P3783) from commons:Template:Christie's online? Harvest Templates baulks at it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:54, 21 March 2017 (UTC)

Import from Allgemeine Deutsche Biographie on dewikisource

There are about 26,000 biographical articles in the Allgemeine Deutsche Biographie (Q590208) which are all transcribed, proofread, and linked in a Wikisource project (s:de:Allgemeine Deutsche Biographie). All of those articles have Wikidata items, but most of the items have only the most basic information: instance of (P31) = biographical article (Q19389637) (or cross-reference (Q1302249)), part of (P361) = Allgemeine Deutsche Biographie (Q590208).

There is a way to harvest more statements from Wikisource. A lot of data is available in an infobox, s:de:Vorlage:ADBDaten. For example s:de:ADB:Heyne, Christian Gottlob has:

WS parameter | value | WD equivalent
TITEL | Heyne, Christian Gottlob | title (P1476)  Heyne, Christian Gottlob (German)
VORIGER | Heynatz, Johann Friedrich | follows (P155)  no label (Q21251870)
NÄCHSTER | Heynlin de Lapide, Johannes | followed by (P156)  no label (Q27582972)
BAND | 12 | volume (P478)  12
ANFANGSSEITE | 375 | page(s) (P304)  375-378
ART | Biographie | instance of (P31)  biographical article (Q19389637)
AUTORENKÜRZEL1 | Bursian. | author (P50)  Conrad Bursian (Q87123) AND/OR author name string (P2093)  Bursian
WIKIPEDIA | Christian Gottlob Heyne | main subject (P921)  Christian Gottlob Heyne (Q63182)
WIKISOURCE | Christian Gottlob Heyne |

Some of these might be simple to transclude from Wikisource to Wikidata, some more elaborate. The authors (AUTORENKÜRZEL1, AUTORENKÜRZEL2) are listed in s:de:Modul:ADB/Autoren which is pretty comprehensive, although not perfect. Not every author has a Wikisource author page, so Wikidata items are not always available.
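
As a starting point, the parameter-to-property mapping from the table above could be expressed as below. This is a sketch only: the function name is hypothetical, the authors are left out because they need the lookup in s:de:Modul:ADB/Autoren, and the raw values still have to be resolved (page titles to items, start page to a page range) before anything is written to Wikidata.

```python
# Template parameter -> Wikidata property, following the table above.
ADB_PARAM_TO_PROPERTY = {
    "TITEL": "P1476",        # title
    "VORIGER": "P155",       # follows
    "NÄCHSTER": "P156",      # followed by
    "BAND": "P478",          # volume
    "ANFANGSSEITE": "P304",  # page(s); only the first page is in the template
    "ART": "P31",            # instance of
    "WIKIPEDIA": "P921",     # main subject
}

def map_adb_params(params):
    """Return (property, raw value) pairs for the non-empty parameters present."""
    pairs = []
    for name, prop in ADB_PARAM_TO_PROPERTY.items():
        value = params.get(name, "").strip()
        if value:
            pairs.append((prop, value))
    return pairs
```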

I would really appreciate the help of the community on this. Jonathan Groß (talk) 07:33, 10 April 2017 (UTC)

@PokestarFan: Do you have any idea how to do any of this? It needs to be translated into several bot tasks, I think. Jonathan Groß (talk) 12:50, 2 May 2017 (UTC)

I know nothing about bot coding. PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 18:09, 6 May 2017 (UTC)
@Jonathan Groß: I can help you with this. A few questions:
  1. In no label (Q23758767) somebody added stated as (P1932) as qualifier of author (P50). Should I also do it?
  2. page(s) (P304) is used as qualifier on published in (P1433). Is this okay or should it be an own statement?
  3. What about labels and descriptions? Is there some common format? --Pasleim (talk) 13:57, 2 May 2017 (UTC)

@Pasleim: Thanks for your offer. My answers to your questions:

  1. I don't think that's ideal. I would prefer to have two separate statements in each item, author (P50)  Conrad Bursian (Q87123) AND author name string (P2093)  Bursian, without qualifiers. When there is no author page on Wikisource, only use P2093.
  2. I think it would be best to have page(s) (P304) as its own statement. Qualifiers can be tricky to track down.
  3. Until now, there is no "authoritative" label and description standard. The label should make it clear that this is an item for a biographical article, not for a person. Maybe "ADB article 'Heyne, Christian Gottlob'"? The description could be modelled as "article about TITEL in the Allgemeine Deutsche Biographie (ADB)". Jonathan Groß (talk) 14:13, 2 May 2017 (UTC)
Mostly done. For the moment I left out the labels and descriptions but can add them if there is consensus for the format proposed by Jura1. --Pasleim (talk) 12:10, 9 May 2017 (UTC)
@Pasleim: Thank you very much! This is great! As for labels, I think according to Jura1's example we should use "Heyne, Christian Gottlob (ADB)" in any language, with English description "entry in the Allgemeine Deutsche Biographie", and German description "Artikel in der Allgemeinen Deutschen Biographie". Jonathan Groß (talk) 15:35, 9 May 2017 (UTC)
@Pasleim, Jonathan Groß: FYI items no label (Q26132951) and Franz Joseph Muxel (Q27518264) seem to be mixed. Matěj Suchánek (talk) 09:33, 12 May 2017 (UTC)
@Matěj Suchánek: Thanks! I fixed it. Jonathan Groß (talk) 08:07, 15 May 2017 (UTC)

Alexa rankings

Could a bot automatically update Alexa rank (P1661) for a selected list of items and their websites, much like OKBot did on the English Wikipedia? (The bot was blocked in 2014 for introducing errors to pages, and was never fixed.) Thanks, Jc86035 (talk) 10:03, 15 April 2017 (UTC)

Add P3858 to items

Could route diagram (P3858) be added to items based on the values in |2= of Template:Railway-routemap (Q21400400) and |1= of all the other templates on this page (in all languages)? Thanks, Jc86035 (talk) 10:48, 15 April 2017 (UTC)

items for segments

For many anthology film (Q336144), it can be worth creating an item for each segment (sample: Q16672466#P527). Such items can include details on director/cast/etc as applicable (sample: Q26156116).

The list of anthology films includes already existing items.

This task is similar to #duos_without_parts above.
--- Jura 10:14, 22 April 2017 (UTC)

Import data for TA98 Latin term (P3982)

Request date: 24 May 2017, by: ChristianKl (talk · contribs · logs)

Link to discussions justifying the request

Task description

Someone has incorrectly imported values from enwiki into TA98 Latin term (P3982). The problem is that enwiki lists any Latin names. In 90% of the cases those are the TA98 names, but in many cases they aren't. The correct way to get the right names would be to import them from the source. Fortunately, we already have Terminologia Anatomica 98 ID (P1323) and there's a freely accessible table at that maps the IDs to the names.

It would be great if a bot can import that data and do the mapping for all items that have Terminologia Anatomica 98 ID (P1323) filled.

I think there's a good chance that multiple wikis will import TA98 Latin term (P3982) into their anatomy templates once it is filled with high-quality data.

Licence of data to import (if relevant)

Their website speaks about "Free access to published terminologies" but doesn't specify what they mean with free. It's also questionable whether the data is protected by copyright.

Request process

Cleanup sv/cebwiki imports?

I just noticed that Q24695115 duplicates Q953452 and that I had imported the area 0,36 km² from svwiki for Q24695115, but Q953452 already had 36 acres (15 ha) (as enwiki). I wonder if there are more such issues with area imported from svwiki by others. If yes, some cleanup is needed.
--- Jura 05:27, 26 May 2017 (UTC)

Merge of language of work or name (P407) with original language of work (P364)

Request date: 5 June 2017, by: Snipre (talk · contribs · logs)

Link to discussions justifying the request

Please see Wikidata:Properties_for_deletion#language_of_work_or_name_.28P407.29_and_original_language_of_work_.28P364.29 and especially the last subsections Wikidata:Properties_for_deletion#Migration.

Task description

Following the discussion, the merge of property P407 with P364 was finally decided, in order to have uniform use in WD. We are looking for a bot operator who can do this merge according to the description. The merge should be announced first, so we are currently looking for a bot operator who will be available to perform the task. The date of the merge has to be discussed.

Licence of data to import (if relevant)



@Matěj Suchánek, Pasleim, ValterVB:

  • Oppose for movies until a solution is found. Please proceed for books and other works, it's messy anyways.
    --- Jura 12:12, 5 June 2017 (UTC)
    @Jura1: It's time to wake up. Sorry, but the topic has been discussed on the Wikidata:Properties for deletion page for 3 weeks, and when the question about the application of the merge was raised, nobody was there to mention a problem. I suggest you have a look at the discussion there and provide your feedback. Meanwhile we can already find a bot operator and go through the merge procedure with them, then we will inform the data users, so we still have 2-3 weeks before the merge. Snipre (talk) 13:57, 5 June 2017 (UTC)
    No solution was proposed. I understand that you are still interested in the theoretical appeal, but is there actually a practical problem you want to solve?
    --- Jura 14:07, 5 June 2017 (UTC)
    @Jura1: Please just look at the discussion on Wikidata:Properties_for_deletion#language_of_work_or_name_.28P407.29_and_original_language_of_work_.28P364.29, read the subsection Wikidata:Properties_for_deletion#Movies, where the topic of the movies was discussed, and try to add a useful comment there: you were the first one to comment on the announcement of the merge (see here), but you never deigned to come and discuss the open points, although the announcement included a call for discussion. So the question is whether you really want to discuss. Snipre (talk) 15:38, 5 June 2017 (UTC)
  • I can do it within the next few days. --Pasleim (talk) 13:41, 7 July 2017 (UTC)
Request process

200 Creator templates to import

Request date: 6 June 2017, by: Jarekt (talk · contribs · logs)

Link to discussions justifying the request
Task description

c:Category:Creator templates with authority control data holds Creator templates with authority control identifiers but without link to Wikidata item. In the past we managed to either match all such pages with existing items or create new ones, but a new batch was created. Can someone help with this? I would:

  1. perform search for the names and check if there are any matches (I already did a search based on VIAF and did not find any hits)
  2. create new items and copy as much data as reasonable
  3. add item q-codes to Creator pages, or give them to me and I will add them.

Request process

Videogame descriptions

Request date: 18 June 2017, by: PokestarFan (talk · contribs · logs)

Link to discussions justifying the request
Task description

Use a bot to add a predefined list of descriptions to any item whose only instance of (P31) value is video game (Q7889).

        'en':'video game',
        'en-ca':'video game',
        'en-gb':'video game',   
        'fr':'jeu vidéo',
        'pl':'gra wideo',
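
One way to read the task description is sketched below on the entity JSON as returned by the Wikibase API: an item qualifies only if video game (Q7889) is its sole P31 value, and only missing descriptions are proposed. The function names are hypothetical, and this reading of "only instance of (P31)" is an assumption.

```python
VIDEO_GAME = "Q7889"

DESCRIPTIONS = {
    "en": "video game",
    "en-ca": "video game",
    "en-gb": "video game",
    "fr": "jeu vidéo",
    "pl": "gra wideo",
}

def instance_of_ids(entity):
    """Collect the item IDs used as P31 values in an entity's JSON."""
    ids = []
    for statement in entity.get("claims", {}).get("P31", []):
        snak = statement["mainsnak"]
        if snak.get("snaktype") == "value":
            ids.append(snak["datavalue"]["value"]["id"])
    return ids

def missing_descriptions(entity):
    """Return the descriptions to add, or {} if the item does not qualify."""
    if instance_of_ids(entity) != [VIDEO_GAME]:
        return {}
    existing = entity.get("descriptions", {})
    return {lang: text for lang, text in DESCRIPTIONS.items() if lang not in existing}
```

Existing descriptions are never overwritten here; the API can still reject an edit when the label and description pair is not unique.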
Licence of data to import (if relevant)

Are you sure they are correct? MechQuester (talk) 19:41, 20 June 2017 (UTC)

Request process
  • Starting from these, I've added a couple more descriptions including release years for some of the items. Some 12000 items (tinyurl .com/yd6ey9nt) don't have a publication date, and I'm unsure whether to add such simple descriptions to them (for some of them there will in any case be API conflicts over a non-unique label+description, and the bot will fail to make the needed changes); thus maybe it's better to wait until these items have publication date (P577). XXN, 19:29, 21 June 2017 (UTC)

P18 imports from infoboxes for buildings

Request date: 20 June 2017, by: Nemo bis (talk · contribs · logs)

Link to discussions justifying the request
  • Thanks to [5] I noticed that a lot of items about Italian places lack an image (P18), which makes it hard to find the items which actually need me to shoot/look for a photo.
Task description
  • Take all the templates in w:it:Categoria:Template sinottici - architettura and all their transclusions.
  • Take the filename passed as "Immagine" parameter and add it to the Wikidata item as P18 if there is no P18 yet.
  • Create the item if missing.

--Nemo 20:00, 20 June 2017 (UTC)
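
The core of the task described above, reading the "Immagine" parameter and adding it only when P18 is missing, could be sketched like this. It assumes the page wikitext and the item's entity JSON are already loaded; the function names are hypothetical, and values wrapped in [[File:...]] links or nested templates are not handled.

```python
import re

# Matches "| Immagine = Some file.jpg" up to the next pipe, brace or newline.
IMMAGINE = re.compile(r"\|\s*[Ii]mmagine\s*=\s*([^|}\n]+)")

def extract_image(wikitext):
    """Return the file name passed as 'Immagine', or None if absent or empty."""
    match = IMMAGINE.search(wikitext)
    if not match:
        return None
    name = match.group(1).strip()
    # Drop an optional namespace prefix so that only the bare file name remains.
    name = re.sub(r"^(?:File|Immagine|Image)\s*:\s*", "", name, flags=re.IGNORECASE)
    return name or None

def should_add_p18(entity, image):
    """Add P18 only when an image was found and the item has no P18 yet."""
    return image is not None and not entity.get("claims", {}).get("P18")
```

Checking that the file actually exists on Commons is a separate step that this sketch leaves out.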

Licence of data to import (if relevant)

No database rights in USA, not copyrightable


You can do it in self-service using Doing that for you on all templates --Teolemon (talk) 13:58, 21 June 2017 (UTC)

I could but I don't think it's the best way to import tens of thousands of pages. --Nemo 17:41, 24 June 2017 (UTC)
Maybe ✓ Done, some 977 pages done. But it was quite clunky (couldn't do more than one request at a time to avoid gateway errors) so this may have missed something. --Nemo 21:08, 24 June 2017 (UTC)
Request process

Import from Pauly-Wissowa on dewikisource

Request date: 23 June 2017, by: Jonathan Groß (talk · contribs · logs)

Link to discussions justifying the request
Task description

This task is similar to the one I requested for the Allgemeine Deutsche Biographie (ADB; see here on this page) a few weeks ago; however, this current request is different as it is going to involve re-runs on a regular basis.

German Wikisource has an ongoing project (started in 2008) to digitize the Paulys Realenzyklopädie der klassischen Altertumswissenschaft (Q1138524) (abbreviated as RE), one of the most important reference works for classicists. Since this encyclopedia was published over a period of almost 90 years (from 1893 to 1980), several generations of scholars contributed to it (we're working on identifying them and verifying their dates on a subpage), and while some of them are still alive, a large part of the articles' authors died more than 70 years ago; hence, in the EU, their works are in the public domain.

We collect some valuable data from and metadata about the RE articles in the template REDaten, which in connection with the subcategories of the main project category can very well serve as a source for statements on Wikidata.

There are some things to note. First, there are (by our definition) three different kinds of articles: a) full articles, b) Verweisungen (redirects originally created by the RE editors), c) Nachträge (supplements to articles, mostly additions and amendments, sometimes replacement articles). Our policy is to create Wikisource pages for a and b (note: pages, not redirects), and add c where it was intended by the RE editors: Below the original articles (a), i.e. on the same Wikisource page.

As far as I see, there are two tasks that need to be done on a regular basis:

  1. Check s:de:Kategorie:Paulys Realencyclopädie der classischen Altertumswissenschaft for pages with no Wikidata item and create items for them
  2. Check items for pages from that category for consistency and add labels, descriptions, and statements to them.

The second task is definitely the more complex one, so I'll differentiate and try to be as specific as possible, using the RE article on Apollon as an example:

  1. Labels should be the original heading of the articles with a statement like "Pauly-Wissowa" in brackets. This can be done by taking the page name (e.g. "RE:Apollon"), subtracting the "RE:" part and adding " (Pauly-Wissowa)" to form the label "Apollon (Pauly-Wissowa)".
  2. Descriptions should be something generic, something like "article in Paulys Realencyclopädie der classischen Altertumswissenschaft (RE)". For instances of a, the first part should be "article", for b it should be "cross-reference".
  3. Statements should correspond with infobox template parameters (Template:REDaten):
    1. IF VERWEIS=OFF IS TRUE THEN instance of (P31)  encyclopedic article (Q17329259) ELSE instance of (P31)  cross-reference (Q1302249).
    2. All items should have part of (P361)  Paulys Realenzyklopädie der klassischen Altertumswissenschaft (Q1138524).
    3. title (P1476) can be the same as the label, but there is a problem with languages. Most lemmata are derived from Latin and Greek, but some are actually Greek or Latin themselves. Greek (to be precise, this means Ancient Greek (Q35497)) can be identified by the characters in the title (whenever they are in Greek script), but Latin uses the same characters as German. This is something we still have to figure out.
    4. published in (P1433) can be inferred from BAND=, which adds a category as well. In our example, RE:Apollon has BAND=II,1, which puts it into Category:RE:Band II,1 = Pauly-Wissowa vol. II,1 (Q26414959). Usually, RE is cited with Roman numerals, but Arabic numerals are also in use, hence we should add both (maybe with a qualifier for the numeral system).
    5. publication date (P577) can be added like published in (P1433), since every volume has a specific date of publication.
    6. page(s) (P304) can be parsed from SPALTE_START= and SPALTE_END=.
    7. follows (P155) and followed by (P156) can be inferred from VORGÄNGER= and NACHFOLGER= respectively. If there is no such item, this statement should be left out completely (instead of making it no value).
    8. main subject (P921), arguably the most useful part of this, can be parsed either from WIKIPEDIA= or WIKISOURCE=.
    9. author (P50) can be taken from Template:REAutor at the bottom of the article. Sometimes there are multiple instances of this template in a single article. There is a Module:RE/Autoren which can help.
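
A few of the rules above are mechanical enough to sketch. The helpers below are hypothetical and only cover the label, the description, the instance-of rule exactly as stated (VERWEIS=OFF means article), and the Greek-script check; they assume the template parameters have already been parsed into a dict.

```python
import unicodedata

def re_label(page_name):
    """Build the label from the page name, e.g. 'RE:Apollon'."""
    prefix = "RE:"
    title = page_name[len(prefix):] if page_name.startswith(prefix) else page_name
    return f"{title} (Pauly-Wissowa)"

def re_description(is_cross_reference):
    kind = "cross-reference" if is_cross_reference else "article"
    return f"{kind} in Paulys Realencyclopädie der classischen Altertumswissenschaft (RE)"

def re_instance_of(params):
    """VERWEIS=OFF is read as a full article; anything else as a cross-reference."""
    if params.get("VERWEIS", "").strip().upper() == "OFF":
        return "Q17329259"  # encyclopedic article
    return "Q1302249"       # cross-reference

def is_greek_script(title):
    """True if every letter in the title is a Greek character."""
    letters = [ch for ch in title if ch.isalpha()]
    return bool(letters) and all("GREEK" in unicodedata.name(ch, "") for ch in letters)
```

For example, `re_label("RE:Apollon")` gives "Apollon (Pauly-Wissowa)". The script check can only suggest Ancient Greek for titles written in Greek letters; it says nothing about Latin versus German titles, which remains the open question noted in point 3.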

As far as I can think of, those are the things a bot could handle best. I'm sure there are some points that need further discussion. Please feel free to tell me your opinions and ask questions below. Jonathan Groß (talk) 13:33, 23 June 2017 (UTC)

@Pasleim, Tolanor, THE IT, S8w4, Pfaerrich: Looking forward to your input! Jonathan Groß (talk) 13:33, 23 June 2017 (UTC)


To me this all looks fine! As I said elsewhere, however, for point 8 (which is indeed the most important part of this) to work we need a cleanup initiative for the RE to Wikipedia links first. All links to WP disambiguations and lists should be removed with the help of a bot (links to pages such as w:de:Ariobarzanes, w:de:Antoninus, or w:de:Antistius etc.). There's a non-current list of all links from RE to Wikipedia at User:Pyfisch/RE. I have already corrected many of them, but many remain. --Tolanor (talk) 19:13, 25 June 2017 (UTC)

@Tolanor: I don't think we need to clean up the WIKIPEDIA= links first. On the contrary: once all those links are transferred to Wikidata, cleanup will be a lot easier, because we can run queries for main subject (P921) linking to items with instance of (P31) Wikimedia disambiguation page (Q4167410). That gives us lists of all RE articles linking to disambiguation pages. Jonathan Groß (talk) 08:40, 26 June 2017 (UTC)
Okay, sounds good. --Tolanor (talk) 11:20, 26 June 2017 (UTC)
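The cleanup query described above might look something like the following sketch. It assumes the RE articles already carry published in (P1433) and main subject (P921); the function name `disambiguation_query` is made up for illustration, and Pauly-Wissowa vol. II,1 (Q26414959) is used only as an example volume.

```python
def disambiguation_query(volume_qid):
    """Build a SPARQL query for articles in one volume whose main subject
    is a Wikimedia disambiguation page."""
    return f"""
SELECT ?article ?subject WHERE {{
  ?article wdt:P1433 wd:{volume_qid} ;   # published in the given volume
           wdt:P921 ?subject .           # main subject
  ?subject wdt:P31 wd:Q4167410 .         # Wikimedia disambiguation page
}}
""".strip()


query = disambiguation_query("Q26414959")
print(query)
```

The resulting string can be pasted into the query service; running it per volume keeps each result list short enough to work through by hand.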
@Pasleim, Pyfisch:? I'm afraid we need someone with a bot here. --Tolanor (talk) 19:53, 4 July 2017 (UTC)
@Jonathan Groß, Tolanor: I started to program it, but I encounter problems in distinguishing articles from cross-references. Using the parameter VERWEIS is not an option because many pages don't use it; instead, the category [[Kategorie:RE:Verweisung]] was added manually, for example in s:de:RE:Acilius 46. Relying on the category is not an option either, because there are also articles in s:de:Kategorie:RE:Verweisung, for example s:de:RE:Augustum 1. Do you know of a way to figure out whether a page is an article or a cross-reference? --Pasleim (talk) 20:13, 22 July 2017 (UTC)

Request process

Importing area_imperial from geobox settlement into area (P2046)

Request date: 27 June 2017, by: ChristianKl (talk • contribs • logs)

Task description

I recently requested a query on the density of various cities. While doing that, I discovered that many cities don't have area data in Wikidata but do have area data on enwiki. Naperville (Q243007) is a good example. Importing area_land_imperial and area_water_imperial into area (P2046) along with applies to part (P518) would also be great. ChristianKl (talk) 21:59, 27 June 2017 (UTC)

There's also "Infobox settlement" which has area_total_sq_mi, area_land_sq_mi and area_water_km2 which can be imported. ChristianKl (talk) 22:18, 27 June 2017 (UTC)

I notice the "Infobox settlement" will display converted values if the imperial value is present and the SI value is absent, or vice versa. It would seem that from Wikipedia's point of view, it would be better to specify only one of the values and let the other be computed, to avoid the accidental creation of inconsistent values. I don't know how this would affect the import process. Jc3s5h (talk) 07:59, 14 July 2017 (UTC)
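One way an import could respect this is to take whichever value the infobox actually specifies, in its own unit, and to flag pages where both are given but disagree. The sketch below is only an illustration: the helper names, the 1% tolerance, and the sample numbers are assumptions, not data from any real article.

```python
SQ_MI_TO_KM2 = 2.589988110336  # exact, since 1 mile = 1.609344 km


def pick_area(params, part):
    """Return (amount, unit) for 'total', 'land' or 'water', or None.

    The value is kept in the unit the editor entered, not converted.
    """
    km2 = params.get(f"area_{part}_km2")
    sq_mi = params.get(f"area_{part}_sq_mi")
    if km2:
        return float(km2), "km2"
    if sq_mi:
        return float(sq_mi), "sq_mi"
    return None


def inconsistent(params, part, tolerance=0.01):
    """True if both units are given and differ by more than the tolerance."""
    km2 = params.get(f"area_{part}_km2")
    sq_mi = params.get(f"area_{part}_sq_mi")
    if not (km2 and sq_mi):
        return False
    expected = float(sq_mi) * SQ_MI_TO_KM2
    return abs(expected - float(km2)) > tolerance * expected


# Hypothetical infobox parameters.
params = {"area_total_sq_mi": "39.32", "area_land_sq_mi": "38.77", "area_water_km2": "1.43"}
total = pick_area(params, "total")
water = pick_area(params, "water")
conflict = inconsistent({"area_total_km2": "100", "area_total_sq_mi": "10"}, "total")
print(total, water, conflict)
```

The land and water values would then go into area (P2046) with an applies to part (P518) qualifier, as suggested in the task description.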

Request process

I have started the import with HarvestTemplates but have now stopped due to complaints on Topic:Tu9am4jyv67ga4tf. --Pasleim (talk) 07:40, 14 July 2017 (UTC)

BBF ID

Request date: 30 June 2017, by: Jonathan Groß (talk • contribs • logs)

Link to discussions justifying the request
Task description

In May 2017, the Bibliothek für Bildungsgeschichtliche Forschung - Research Library for the History of Education (Q856552) introduced their new database. The old IDs are now obsolete. On my request, their technicians have been kind enough to scrape the new IDs from the old p-strings, and created a matching list of Wikidata Items, old BBF IDs, and new BBF IDs, available here.

I think a bot should replace the old p-strings in these 637 items with the new IDs. Jonathan Groß (talk) 08:41, 30 June 2017 (UTC)

Request process
  • Accepted by XXN, 20:56, 2 July 2017 (UTC) and under process
  • Done. XXN, 21:08, 2 July 2017 (UTC)

This should not have been done. The old IDs are data, and data is not something which we should discard. How - for example - is someone holding the old ID in their database now supposed to resolve it, using Wikidata? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:09, 3 July 2017 (UTC)

@Pigsonthewing: these are simple IDs. The old BBF IDs became useless after they were replaced by the new ones on the source website. The URLs generated by the property were non-functional with the old IDs, and after the property was updated, those old IDs no longer matched the regex format of the allowed values: /[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}/. Cc Jonathan Groß XXN, 10:39, 3 July 2017 (UTC)
I understand all that; I don't agree that the IDs became "useless" as data when the URLs changed. I note that you did not address my question. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:04, 3 July 2017 (UTC)
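For reference, the format constraint quoted above can be checked like this. The pattern is copied from the comment; both sample values are made up, and `p12345` is only a stand-in, since the real shape of the old p-strings is not given here.

```python
import re

# Pattern quoted in the discussion for the new BBF IDs.
BBF_ID_RE = re.compile(r"[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def is_valid_bbf_id(value):
    """True if the whole value matches the new ID format."""
    return BBF_ID_RE.fullmatch(value) is not None


new_style = "0123abcd-0123-4567-89ab-0123456789ab"  # made-up example
old_style = "p12345"                                # made-up stand-in for an old p-string

print(is_valid_bbf_id(new_style), is_valid_bbf_id(old_style))
```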

@Pigsonthewing: The old IDs may be used somewhere (even in printed publications), but they are useless now. It took the BBF technicians a week of work to identify our 637 entries in their old database using the p-strings, since (as they told me) these were never intended to serve as permanent identifiers. The current IDs are stable, though, and replacing the outdated strings with actual identifiers serves everybody best. Jonathan Groß (talk) 11:06, 3 July 2017 (UTC)

No: they are not "useless", they still identify the same subjects that they did previously. Replacement does not serve the user in my question (which you have also not answered), or someone using printed material such as that to which you refer, at all, much less serve them "best". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:40, 3 July 2017 (UTC)
How do they effectively identify anything, since they're not used in any accessible database? Jonathan Groß (talk) 17:01, 3 July 2017 (UTC)
The last time I looked, Wikidata was an accessible database; and until this unfortunate bot job, the IDs were used in it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:29, 3 July 2017 (UTC)
@Pigsonthewing: Do you have reason to believe that the old IDs are used elsewhere? ChristianKl (talk) 23:28, 3 July 2017 (UTC)
Even if they are, there is no way to use the old IDs for anything now, not even for identification, since they cannot be checked. The old database was taken down and replaced, so the 630-something old IDs that Wikidata (and Wikipedia) used until June 2017 are obsolete. Their replacement doesn't mean that they are not available (there is still this file). It's just that the old IDs are of no use to anybody anymore. I don't know how widely the BBF archival database was used until the database overhaul in May 2017. I know of a few printed publications and websites that refer to it, in some rare instances using p-strings as IDs, but mostly just with words such as "see BBF / DIPF archival database, Personalblatt XYX (accessed 35 May 2010)". Whatever the case, as long as these publications refer to specific archival matter, it is easily possible to find the new identifiers for said matter in the new database, without recourse to any obsolete URLs or IDs. As far as I am concerned, we can happily part with the obsolete identifiers. Jonathan Groß (talk) 06:47, 4 July 2017 (UTC)
"there is no way to use the old IDs for anything now" Indeed so, whereas if they were kept in Wikidata there would be. That's why this bot job was harmful. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:55, 4 July 2017 (UTC)

General comment: AFAIK we do not have an external identifier help page, thus I suggest drafting one in response to this incident which also covers handling of obsolete identifiers and database handles. Situations such as this one have not happened that often until now, but we have to expect more cases in the future. A defined transition process from old to new identifiers would be useful, and this could take into account that old identifiers might be useful for a while after obsolescence, but we probably do not want to pile up plenty of them forever (“BBF ID (old)”, “BBF ID (old2)”, “BBF ID (very old)” and so on…). —MisterSynergy (talk) 10:11, 4 July 2017 (UTC)

Another issue is that a link with the old identifier is useless, since it leads to a 404 page. If the old identifiers are stored, then only as text, without any auto-generated link. Such a solution could also be useful for databases where entries are deleted after, e.g., a person has died (FIDE etc.). One could implement these non-linked IDs either as a new configuration or somehow connect them with the "Deprecated" setting. Steak (talk) 11:48, 4 July 2017 (UTC)
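As a rough illustration of the "Deprecated" route, an obsolete identifier could be kept as a statement with deprecated rank next to the current one. The sketch below only builds the statement in the shape of the Wikibase JSON data model; the function name, the placeholder property ID `P_BBF_ID`, and the sample value are assumptions, and it does not address whether a link would still be generated for such a value.

```python
def obsolete_id_claim(property_id, old_id):
    """Build a deprecated-rank statement for an obsolete identifier value."""
    return {
        "type": "statement",
        "rank": "deprecated",  # kept as data, but not the preferred or normal value
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"value": old_id, "type": "string"},
        },
    }


# Placeholder property ID and made-up old identifier.
claim = obsolete_id_claim("P_BBF_ID", "p12345")
print(claim["rank"], claim["mainsnak"]["datavalue"]["value"])
```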

Replace DOI citations with items

Do we have a bot doing this, and if not, should we?

  1. Find a citation that is DOI (P356) based
  2. Look up that DOI to see if it is used by an item in Wikidata
  3. if so, replace the citation using stated in (P248)
  4. [optional] if not, fetch metadata from the webpage that the DOI resolves to, create an item, then replace the citation

Obviously, any replacements should preserve other qualifiers, like quotations, page number, etc. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:06, 11 July 2017 (UTC)

Note: For sparqling the value of DOI (P356) please use the ASCII uppercase version of the DOI to avoid the creation of duplicates. -Succu (talk) 20:10, 13 July 2017 (UTC)
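Steps 1 to 3 of the proposal, together with the uppercase note, could be sketched as below. This is a simplified model under assumptions: a reference is represented as a plain dict from property IDs to value lists, the DOI-to-item lookup is an in-memory dict rather than a live query, and `Q_EXAMPLE` and the DOI are placeholders.

```python
def normalize_doi(doi):
    """Uppercase ASCII letters only, leaving any other characters untouched."""
    return "".join(c.upper() if c.isascii() else c for c in doi.strip())


def replace_doi_reference(reference, doi_to_item):
    """Swap a DOI (P356) in a reference for stated in (P248), keeping all other parts.

    The reference is returned unchanged if it has no single DOI, already has
    a stated in value, or the DOI is not used by any known item.
    """
    dois = reference.get("P356", [])
    if len(dois) != 1 or "P248" in reference:
        return reference
    item = doi_to_item.get(normalize_doi(dois[0]))
    if item is None:
        return reference
    updated = {prop: values for prop, values in reference.items() if prop != "P356"}
    updated["P248"] = [item]
    return updated


# Placeholder lookup table and reference.
doi_to_item = {"10.1000/EXAMPLE.123": "Q_EXAMPLE"}
reference = {"P356": ["10.1000/example.123"], "P304": ["12-15"]}
result = replace_doi_reference(reference, doi_to_item)
print(result)
```

The optional step 4 (creating a new item from the DOI's metadata) is deliberately left out, since it depends on an external metadata source.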

Bot to mark threads by Mediawiki mass delivery bot on my user talk page as "Read"

Request date: 19 July 2017, by: PokestarFan (talk • contribs • logs)

Link to discussions justifying the request
Task description

I made another request earlier, but it was too broad. This is a follow-up request, and I have specified that it should apply only to my own user talk page, not to all other users' pages.

Licence of data to import (if relevant)
I hope nobody is going to waste his or her time on this. Sjoerd de Bruin (talk) 14:00, 19 July 2017 (UTC)

@PokestarFan: For me this request is a bit weird. It takes you only a few seconds to mark a thread as "read" but a bot operator likely has to spend a few hours to write a script doing the same. --Pasleim (talk) 16:46, 19 July 2017 (UTC)

Request process

Bot to track merges

Request date: 20 July 2017, by: PokestarFan (talk • contribs • logs)

Link to discussions justifying the request
Task description

Provide list of all merges and total amount of merges in 4 hours. Move to subpage with format Year/Month/Day/Number(1-6). --PokestarFan • Drink some tea and talk with me • Stalk my edits • I'm not shouting, I just like this font! 21:40, 20 July 2017 (UTC)
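The subpage naming in this request could be computed as in the sketch below, which splits each UTC day into six 4-hour windows. The zero-padded month and day, the function name, and the sample timestamps are assumptions; collecting the merges themselves is not covered.

```python
from collections import Counter
from datetime import datetime, timezone


def merge_subpage(ts):
    """Return 'Year/Month/Day/Number' where Number (1-6) is the 4-hour window."""
    number = ts.hour // 4 + 1
    return f"{ts.year}/{ts.month:02d}/{ts.day:02d}/{number}"


# Made-up merge timestamps (UTC).
timestamps = [
    datetime(2017, 7, 20, 21, 40, tzinfo=timezone.utc),
    datetime(2017, 7, 20, 23, 5, tzinfo=timezone.utc),
    datetime(2017, 7, 21, 0, 15, tzinfo=timezone.utc),
]

# Total number of merges per subpage.
totals = Counter(merge_subpage(ts) for ts in timestamps)
print(totals)
```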

Licence of data to import (if relevant)

Request process