Wikidata:Bot requests

If you have a bot request, create a new section here and say exactly what you want. You should discuss your request first and wait for the community's decision. Please refer to previous discussions.

For botflag requests, see Wikidata:Requests for permissions.

Tools available to all users which can be used to accomplish the work without the need for a bot:

  1. PetScan for creating items from Wikimedia pages and/or adding same statements to items
  2. QuickStatements for creating items and/or adding different statements to items
  3. Harvest Templates for importing statements from Wikimedia projects
  4. Descriptioner for adding descriptions to many items
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2017/02.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 2 days.
You may find these related resources helpful:

Data Import Hub
Why import data into Wikidata
Learn how to import data
Bot requests
Ask a data import question

Cyrillic merges

This list included pairs of items, each with articles at ruwiki and ukwiki (sample: Q15061198 / Q12171178). Maybe it's possible to find similar items merely based on labels in these languages and merge them. --- Jura 03:33, 19 September 2015 (UTC)

I cannot find any ru-uk pairs. Are they all done? --Infovarius (talk) 16:27, 3 November 2015 (UTC)
The ones on that list are identified based on dates of birth/death and we regularly go through them. The occasional findings there (also with ru/be) suggest that there are more (without dates). A query would need to be done to find them. --- Jura 16:33, 3 November 2015 (UTC)
Today the list includes quite a few, thanks to new dates of birth/death being added. --- Jura 16:43, 2 December 2015 (UTC)
A step could involve reviewing suggestions for missing labels in one language based on labels in another language with Add Names as labels (Q21640602): sample be/ru. --- Jura 11:44, 6 December 2015 (UTC)
I came across a few items that had interwikis in ukwiki to ruwiki, but as they were on separate items, these weren't used to link the articles to existing items (sample, merged since). --- Jura 10:17, 15 December 2015 (UTC)
SELECT DISTINCT ?item ?ruLabel ?item2 ?ukLabel
WHERE
{
  # restricted to two sample items; without the VALUES lines the query times out
  VALUES ?item { wd:Q19909894 }
  ?item wdt:P31 wd:Q5 .

  VALUES ?item2 { wd:Q16704775 }
  ?item2 wdt:P31 wd:Q5 .

  ?item rdfs:label ?ruLabel . FILTER(lang(?ruLabel) = "ru")
  BIND(REPLACE(?ruLabel, ",", "") AS ?ruLabel2)

  ?item2 rdfs:label ?ukLabel . FILTER(lang(?ukLabel) = "uk")

  FILTER(str(?ruLabel2) = str(?ukLabel))
  FILTER(str(?ruLabel) != str(?ukLabel))
}
LIMIT 1

#added by Jura1


The above currently finds one pair. It times out when not limited to specific items ;) Maybe there is a better way to find these.
--- Jura 14:19, 3 April 2016 (UTC)

In the meantime the two items were merged, so it doesn't work anymore.
--- Jura 16:54, 4 April 2016 (UTC)
See also User:Pasleim/projectmerge/ruwiki-ukwiki. XXN, 08:22, 8 September 2016 (UTC)
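
A variant that joins candidate pairs on date of birth instead of enumerating specific items might stay within the timeout. A minimal sketch, assuming the date-matching idea described above (the year filter is just an arbitrary batching device, and already-merged or already-linked pairs are not filtered out):

SELECT DISTINCT ?item ?item2 ?label
WHERE
{
  ?item wdt:P31 wd:Q5 ; wdt:P569 ?dob .
  ?item2 wdt:P31 wd:Q5 ; wdt:P569 ?dob .
  FILTER(str(?item) < str(?item2))   # avoid listing each pair twice
  FILTER(YEAR(?dob) = 1900)          # batch by year of birth (assumption)
  ?item rdfs:label ?label . FILTER(lang(?label) = "ru")
  ?item2 rdfs:label ?label2 . FILTER(lang(?label2) = "uk")
  FILTER(str(?label) = str(?label2))
}
LIMIT 100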

Taxon labels

For items where instance of (P31)=taxon (Q16521), and where there is already a label in one or more languages which is the same as the value of taxon name (P225), the label should be copied to all other empty western-alphabet labels. For example, this edit. Please can someone attend to this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:11, 10 March 2016 (UTC)
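
For illustration, a sketch of a query to list candidate items for one target language; the "en" source and "de" target here are arbitrary assumptions, and the full taxon set would need batching to avoid timeouts:

SELECT ?item ?taxonName
WHERE
{
  ?item wdt:P31 wd:Q16521 ;      # instance of: taxon
        wdt:P225 ?taxonName ;    # taxon name
        rdfs:label ?srcLabel .
  FILTER(lang(?srcLabel) = "en" && str(?srcLabel) = str(?taxonName))
  FILTER NOT EXISTS {
    ?item rdfs:label ?targetLabel .
    FILTER(lang(?targetLabel) = "de")
  }
}
LIMIT 100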

Do you mean label or alias? I would support the latter where there is already a label and that label is not already the taxon name. --Izno (talk) 17:03, 10 March 2016 (UTC)
No, I mean label; as per the example edit I gave. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
See your last request: Wikidata:Bot_requests/Archive/2015/08#Taxon_names. --Succu (talk) 18:57, 10 March 2016 (UTC)
Which was archived unresolved. We still have many thousands of missing labels. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
Nope. There is no consensus doing this. Reach one. --Succu (talk) 20:22, 10 March 2016 (UTC)
You saying "there is no consensus" does not mean that there is none. Do you have a reasoned objection to the proposal? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:56, 10 March 2016 (UTC)
Go back and read the linked discussions. In the nursery of wikidata some communities had strong objections. If they changed their mind my bot can easily execute this job. --Succu (talk) 21:19, 10 March 2016 (UTC)
So that's a "no" to my question, then. I read the linked discussions, and mostly I see people not discussing the proposal, and you claiming "there is no consensus", to which another poster responded "What I found, is a discussion of exactly one year old, and just one person that is not supporting because of 'the gadgets then need to load more data'. Is that the same 'no consensus' as you meant?". There are no reasoned objections there, either. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:24, 10 March 2016 (UTC)
For the lazy ones:
--Succu (talk) 21:53, 10 March 2016 (UTC)
I already did this for Italian labels in the past. Here are two other proposals: May 2014 and March 2015 --ValterVB (talk) 09:54, 11 March 2016 (UTC)
@ValterVB: Thank you. Can you help across any other, or all, western-alphabet languages, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:18, 16 March 2016 (UTC)
Yes I can do it, but before modifying 2,098,749 items I think it is necessary to have a strong consensus. --ValterVB (talk) 18:14, 16 March 2016 (UTC)
@ValterVB: Thank you. Could you do a small batch, say 100, as an example, so we can then ask on, say, Project Chat? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:03, 18 March 2016 (UTC)
Simply ask with the example given by you. --Succu (talk) 15:16, 18 March 2016 (UTC)

@Pigsonthewing:

  • Test edit: Q14945671, Q21444273, Q2508347, Q25247.
  • Language: "en","de","fr","it","es","af","an","ast","bar","br","ca","co","cs","cy","da","de-at","de-ch","en-ca","en-gb","eo","et","eu","fi","frp","fur","ga","gd","gl","gsw","hr","ia","id","ie","is","io","kg","lb","li","lij","mg","min","ms","nap","nb","nds","nds-nl","nl","nn","nrm","oc","pcd","pl","pms","pt","pt-br","rm","ro","sc","scn","sco","sk","sl","sr-el","sv","sw","vec","vi","vls","vo","wa","wo","zu"
  • Rule:

Very important: it is necessary to verify that the list of languages is complete. It is the same one that I use for disambiguation items. --ValterVB (talk) 09:42, 19 March 2016 (UTC)

    • I really don't like the idea of this. The label, according to Help:Label, should be the most common name. I doubt that most people are familiar with the latin names. Inserting the latin name everywhere prevents language fallback from working and stops people from being shown the common name in another language they speak. A very simple example, Special:Diff/313676163 added latin names for the de-at and de-ch labels which now stops the common name from the de label from being shown. - Nikki (talk) 10:29, 19 March 2016 (UTC)
      • @Nikki: The vast majority of taxons have no common name; and certainly no common name in every language. And of course edits can subsequently be overwritten if a common name does exist. As for fallback, we could limit this to "top level" languages. Would that satisfy? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
        • As far as I'm aware most tools rely on the absence of certain information. Adding #10,000 csv file of Latin / Welsh (cy) species of birds would then have to be done by hand. --Succu (talk) 23:11, 19 March 2016 (UTC)
          • Perhaps this issue could be resolved by excluding certain groups? Or the script used in your example could overwrite the label if it matches the taxon name? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
        • It may be the case that most taxon items won't have a common name in any language, but I don't see anything here which is only trying to target the taxon items which have no common names. Adding the same string to lots of labels isn't adding any new information and as Succu pointed out, doing that can get in the way (e.g. it makes it more difficult to find items with missing labels, it can get in the way when merging (moving common names to the aliases because the target already has the latin name as a label) and IIRC the bot which adds labels for items where a sitelink has been recently added will only do so if there is no existing label). To me, these requests seem like people are trying to fill in gaps in other languages for the sake of filling in the gaps with something (despite that being the aim of the language fallback support), not because the speakers of those languages think it would be useful for them and want it to happen (if I understand this correctly, @Innocent bystander: is objecting to it for their language). - Nikki (talk) 22:40, 22 March 2016 (UTC)
          • Yes, the tolerance for bot mistakes is limited on svwiki. Mistakes initiated by errors in the source are no big issue, but mistakes initiated by "guesses" made by a bot are not tolerated at all. The modules we have on svwiki have no problem handling items without Swedish labels. We have a fallback system which can use any label in any language. -- Innocent bystander (talk) 06:39, 23 March 2016 (UTC)
            • @Innocent bystander: This would not involve any "guesses". Your Wikipedia's modules may handle items without labels, but what about third-party reusers? Have you identified any issues with the test edits provided above? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
              • No, I have not found any issue in the examples. But this is not my subject; I would not see an issue even if it was directly under my nose. Adding correct statements for scientific names and common names looks more important for third-party users here than labels, which cannot be sourced. NB: thanks to Lsjbot's work, Swedish and Cebuano probably have more labels than any other language in the taxon set. You will not miss much by excluding 'sv' in this bot run. -- Innocent bystander (talk) 07:00, 24 March 2016 (UTC)
                • If a taxon name can be sourced, then by definition so can the label. If you have identified no errors, then your reference to "guesses" is not substantiated. True, adding scientific names and common names is important, but the two tasks are not mutually exclusive, and their relative importance is subjective. To pick one example at random, from the many possible, Dayus (Q18107066) currently has no label in Swedish, and so would benefit from the suggested bot run. Indeed, it currently has only 7 labels, all the same, and all using the scientific name. And what are the various European languages' common names for this mainly Chinese genus? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:34, 25 March 2016 (UTC)
          • No, this is not "trying to fill in gaps in other languages for the sake of filling in the gaps". Nor are most of the languages affected served by fallback. If this task is completed, then "find items with missing labels" will not be an issue for the items concerned, because they will have valid labels. Meanwhile, what is the likelihood of these labels being provided manually? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
            • If this is not trying to fill in the gaps for the sake of filling in the gaps, what problem is it solving and why does language fallback not help? (I'm sure the development team would like to know that language fallback is not working properly). The taxonomic names are not the preferred labels, and valid is not the same as useful (adding "human" as the description for humans with no description was valid, yet users found it annoying and useless and they were all removed again). The labels for a specific language in that language are still missing even if we make it seem like they're not by filling in all the gaps with taxonomic names; it's just masking the problem. I can't predict the future so I don't see any point in speculating how likely it is that someone will come along and add common names. They might, they might not. - Nikki (talk) 23:02, 24 March 2016 (UTC)
              • It solves the problem of an external user making a query (say, for "all species in genus X") being returned Q items with no labels in their language. This could also break third-party applications. In some cases, there is currently no label in any language - how does language fallback work then? How does it work if the external user's language is Indonesian, and there is only an English label saying, say, "Lesser Spotted Woodpecker"? And, again, taxonomic names are the preferred labels for the many thousands of species - the vast majority - with no common name, or with no common name in a given language. The "human" example compares apples with pears. This is a proposal to add specific labels, not vague descriptions (the equivalent would be adding "taxon" as a description). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:26, 25 March 2016 (UTC)
                • Why should an external user query a Wikidata-internal field called "label" and not rely on a query of taxon name (P225)? --Succu (talk) 22:04, 25 March 2016 (UTC)
                  • For any of a number of reasons; not least that they may be querying things which are not all taxons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:32, 26 March 2016 (UTC)
                    • Grand answer. Maybe they are searching the labels for aliens, gods, fairy tales or something else? A better solution would be if Wikibase could be configured to take certain properties, like taxon name (P225) or title (P1476), as a default, language-independent label. --Succu (talk) 21:09, 27 March 2016 (UTC)
                      • Maybe it could. But it is not. That was suggested a year or two ago, in the discussions you cited above, and I see no move to make it so, nor any significant support for doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:19, 27 March 2016 (UTC)
                        • So what? Did you reach an agreement with svwiki, cebwiki, warwiki, viwiki or nlwiki that we should go along your proposed way? --Succu (talk) 21:43, 27 March 2016 (UTC)
    • @ValterVB: Thank you. I think your rules are correct. I converted the Ps & Qs in your comment to templates, for clarity. Hope that's OK. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
  • Oppose The fact that the majority of taxons do not have a common name does not mean that all western languages should automatically use the scientific name as label. Matěj Suchánek (talk) 13:23, 16 April 2016 (UTC)
    • Nobody is saying "all western languages should automatically use the scientific name as label"; if the items already have a label, it won't be changed. If a scientific name is added as a label where none existed previously, and that label is later changed to some other valid string, the latter will not be overwritten. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:31, 20 April 2016 (UTC)

We seem to have reached a stalemate, with the most recent objections being straw men, or based on historic and inconclusive discussions. How may we move forward? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:28, 16 May 2016 (UTC)

That's simple: drop your request. --Succu (talk) 18:33, 16 May 2016 (UTC)
Were there a cogent reason to, I would. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:57, 17 May 2016 (UTC)
Anyone? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:04, 10 September 2016 (UTC)
@Pigsonthewing: I'll support the proposal if it is limited to major languages that don't have other fallbacks. For most taxons, the scientific name is the only name, and even for taxons with a common name, having the scientific name as the label is better than having no label at all. I'm reluctant to enact this for a huge number of languages though, as it might make merges (which are commonly needed for taxons) a pain to complete. Kaldari (talk) 23:02, 28 September 2016 (UTC)
@Kaldari: Thank you. Please can you be more specific as to what you mean by "major languages that don't have other fallbacks"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:35, 29 September 2016 (UTC)
@Pigsonthewing: Maybe just the biggest Latin languages: English, German, Spanish, French, Portuguese, Italian, Polish, Dutch. Kaldari (talk) 18:29, 29 September 2016 (UTC)
I'm not sure why we'd limit ourselves to them, but if we can agree they should be done, let's do so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:40, 29 September 2016 (UTC)
@Kaldari: Did you see my reply? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:48, 10 October 2016 (UTC)

Strong oppose As said before...--Succu (talk) 22:02, 10 October 2016 (UTC)

What you actually said was "There is no consensus doing this. Reach one.". My reply was "You saying 'there is no consensus' does not mean that there is none. Do you have a reasoned objection to the proposal?", and you provided none then, nor since. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:11, 11 October 2016 (UTC)

Add labels from sitelinks

There used to be a bot that added labels based on sitelinks (enwiki sitelink => en label). I think it stopped running at some point. Maybe an alternative should be found.
--- Jura 08:32, 8 April 2016 (UTC)

I have seen that Pasleim's bot is doing some work in this area, at least for German and French. --Edgars2007 (talk) 16:20, 9 April 2016 (UTC)
I do it for all languages, but only for items that have one of these values in instance of (P31):

There is a problem with uppercase/lowercase. --ValterVB (talk) 16:30, 9 April 2016 (UTC)

Another rule that I use: add the label if the first letter of the sitelink is one of the following:
  • (
  • !
  • ?
  • "
  • $
  • '
  • ,
  • .
  • /
  • 0
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

If you have other suggestions I can add them. --ValterVB (talk) 16:41, 9 April 2016 (UTC)

  • Comment Just to make sure this is clear: this is mainly for items that already exist and where someone manually added a sitelink to, e.g., enwiki, but the item doesn't have a label in the corresponding language yet. It does not concern items that have no English label and no sitelink to enwiki. I don't think search finds such items if they have no label defined at all. It's key that at least a basic label is defined for such items.
    If you are looking for rules to implement, then try the ones used by PetScan (Q23665536). It mainly removes disambiguators in round brackets. I think this works fine for Wikipedia. A large amount of pages are created that way. It might not work well for Wikisource.
    --- Jura 10:50, 10 April 2016 (UTC)
Jura, these rules are applied only on items that have a sitelink but don't have a label in the language of the sitelink. I check all the sitelinks that end with "wiki", except "commonswiki", "wikidatawiki", "specieswiki", "metawiki" and "mediawikiwiki", and I remove the parenthetical disambiguator. --ValterVB (talk) 12:13, 10 April 2016 (UTC)
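
As a cross-check for one site, a sketch of a query for items with an enwiki sitelink but no English label (illustrative only; it does not encode the rules described above and will likely need batching):

SELECT ?item ?title
WHERE
{
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://en.wikipedia.org/> ;
            schema:name ?title .
  FILTER NOT EXISTS {
    ?item rdfs:label ?label .
    FILTER(lang(?label) = "en")
  }
}
LIMIT 100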

{{Section resolved|Sjoerd de Bruin (talk) 07:45, 19 September 2016 (UTC)}}

I think we should still try to do something about this. If we want to be purist and not add them as labels, I think they should at least be added as aliases.
--- Jura 13:32, 20 September 2016 (UTC)

Maybe we should move the discussion to WD:PC or WD:RFC to get community consensus on one or the other solution. --Pasleim (talk) 13:59, 20 September 2016 (UTC)
+1 to move this discussion to WD:PC or WD:RFC, as Pasleim proposed. There are now a lot of items about human settlements for which we have labels only in sv and ceb, but it may be reasonable to copy these labels (from Geonames) to many other languages. --XXN, 22:51, 20 November 2016 (UTC)
The ceb/sv.. question has nothing to do with this request.
--- Jura 23:04, 20 November 2016 (UTC)
Hm, yep, it's an idea for a related, more complex potential future proposal. :) If so, your request is much simpler (less controversial) and a consensus may be reached here too.
At least the addition of labels from sitelinks that don't contain any kind of parenthesis is almost uncontroversial and can be done ASAP. I'd also support adding labels with the disambiguator-removal rule applied, where appropriate. --XXN, 00:28, 21 November 2016 (UTC)
  • I think PLbot is currently working on this task (not sure if for all languages); previously I've also noticed that Dexbot works on this, but only for the ar/fa Wikipedias. --XXN, 15:11, 11 December 2016 (UTC)
+BotNinja[1]. --XXN, 13:13, 1 January 2017 (UTC)

Add P1082 (population) and P585 (point in time) from PLwiki to Wikidata

Looks like PLwiki has lots of population information other wikis do not have. It will be useful to have it for all of us. בורה בורה (talk) 18:23, 12 April 2016 (UTC)

It might be helpful to give some supporting links here, to be sure to get the right information from the right place into the right fields. Can you list one pl-article and one corresponding Wikidata item that is manually filled with the desired information? Then I can see if I can get the information filled by a script in the same way. Edoderoo (talk) 18:26, 16 April 2016 (UTC)
Edoderoo, sorry for the late reply. I was on vacation. Take for example the article "Żołynia" on PLwiki. It has a population of 5188 as of 2013. However, this information does not exist on the Wikidata item (Q2363612). There are thousands of examples like this, but you get the idea... PLwiki is really great on population. Share it with us all. בורה בורה (talk) 10:19, 4 May 2016 (UTC)
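
As a starting point, a sketch of a query for candidate items: Polish places with a plwiki sitelink but no population (P1082) statement. The restriction to human settlement (Q486972) is an assumption to keep the result set to settlements:

SELECT ?place ?title
WHERE
{
  ?sitelink schema:about ?place ;
            schema:isPartOf <https://pl.wikipedia.org/> ;
            schema:name ?title .
  ?place wdt:P17 wd:Q36 ;                    # country: Poland
         wdt:P31/wdt:P279* wd:Q486972 .      # human settlement (assumption)
  FILTER NOT EXISTS { ?place wdt:P1082 [] }  # no population statement yet
}
LIMIT 100
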
It would be better to find a reliable source instead. Sjoerd de Bruin (talk) 07:44, 19 September 2016 (UTC)
בורה בורה: No activity for a long time. Marking this one as resolved. Multichill (talk) 11:43, 30 November 2016 (UTC)
Multichill, what am I supposed to say? I put in a request, I explained it, and it is not done yet... In many Wikidata items I see ENwiki or RUwiki as a source. Why can't PLwiki be a source as well? Please reopen this request and put it back in the queue. I'd appreciate it if you tag me in your reply. בורה בורה (talk) 03:36, 1 December 2016 (UTC)
בורה בורה: If you request something, you should keep an eye on it, for example by putting this page on your watchlist, and reply if someone responds. That way tasks get resolved (or declined) quickly.
I had a look at pl:Żołynia. It uses the template pl:Template:Wieś infobox with the fields "liczba ludności" and "rok". I assume "liczba ludności" would map to population (P1082) and "rok" would be the source for the point in time (P585) qualifier like this example?
I agree with Sjoerd that a real source is preferred over import from Wikipedia, but that doesn't mean we shouldn't import this. We can always add references when a real source becomes available.
@Edoderoo: do you want to have a shot at this one? Multichill (talk) 10:42, 1 December 2016 (UTC)
Thanks for the ping. I can have a look this month to this one. Edoderoo (talk) 10:56, 1 December 2016 (UTC)
Multichill, indeed your mapping is correct. I am standing by to see when this is done. Once completed, our articles will be populated automatically, as we retrieve the info from Wikidata! בורה בורה (talk) 18:38, 1 December 2016 (UTC)
Multichill, Edoderoo, any progress here? בורה בורה (talk) 19:15, 18 December 2016 (UTC)
Sorry, my bot got blocked and the bot bit revoked, so for the time being I can create a script, but I can't test or execute it... Edoderoo (talk) 15:39, 18 January 2017 (UTC)
To clarify this: I asked Edoderoo to create an RfBot request for the particular tasks he wants to be doing, but he decided not to do that. --Vogone (talk) 03:36, 19 January 2017 (UTC)

Take care of disambiguation items

Points to cover

Somehow it should be possible to create a bot that handles disambiguation items entirely. I'm not sure what all the needed functions are, but I started a list on the right side. Please add more. Eventually a Wikibase function might even do that.
--- Jura 13:36, 18 April 2016 (UTC)

Empty disambiguations: probably @Pasleim: can create User:Pasleim/Items for deletion/Disambiguation. Rules: item without sitelinks, whose only P31 value is Wikimedia disambiguation page (Q4167410). For the other points my bot already does something (for my bot a disambiguation is an item whose only P31 value is Wikimedia disambiguation page (Q4167410)). For descriptions I use the descriptions used in autoEdit. For labels: I add the same label for all the Latin languages only if all the sitelinks, without disambiguators, are the same. With these 2 operations I detect a lot of duplicates: same label + description. For now the list is very long (maybe >10K items) but it isn't possible to merge them automatically: too many errors. Another thing to do is normalize the descriptions; there are a lot of items with non-standard descriptions. --ValterVB (talk) 18:02, 18 April 2016 (UTC)
  • Personally, I'm not that much worried about duplicate disambiguation items. Mixes between content and disambiguations are much more problematic. It seems they keep appearing through problems with page moves. BTW, I added static numbers to the points.
    --- Jura 10:06, 19 April 2016 (UTC)
    You will always have duplicate disambiguation items, since svwiki has duplicate disambiguation pages. Some of these duplicates exist because they cover different topics, and some of them exist because the pages would otherwise become too long. A third category is the bot-generated duplicates. They should be treated as temporary, until a carbon-based user has merged them.
    And how are un-normalized descriptions a problem? -- Innocent bystander (talk) 10:58, 19 April 2016 (UTC)
About "un-normalized descriptions": ex I have a disambiguation item with label "XXXX" and description "Wikipedia disambiguation", if I create a new item with label "XXXX" and description "Wikimedia disambiguation" I don't see that already exist an disambiguation item "XXXX", if the description is "normalized" I see immediately the the disambiguation already exist so I can merge it. --ValterVB (talk) 11:10, 19 April 2016 (UTC)
For some fields, this proved quite efficient. If there are several items that can't be merged, at some point there will be something like "Wikimedia disambiguation page (2)", etc.
--- Jura 12:10, 19 April 2016 (UTC)

Lazy start for point (4): 47 links to add instance of (P31)=Wikimedia disambiguation page (Q4167410) to items without statements in categories of sitelinks on Category:Disambiguation pages (Q1982926): en, simple, da, ja, ka, la, ba, ca, nl, pl, el, hr, sr, tr, eu, hu, nn, sq, ro, no, eo, bs, cs, es, sl, lv, fi, hy, ru, et, uk, it, mk, kk, pt, zh, sh, id, az, de, be_x_old, be, sk, fr, lt, sv, bg,
--- Jura 12:07, 23 April 2016 (UTC)

The biggest problem is to define what pages are disambiguation pages, given names and surnames. For example Backman (Q183341) and Backman (Q23773321). I don't see what is the difference between enwiki and fiwiki links. Enwiki page is in category "surnames" and fiwiki page in categories "disambiguation pages" and "list of people by surname", but the page in fiwiki only contains surnames, so basically it could be in the same item as the enwiki link. --Stryn (talk) 13:10, 23 April 2016 (UTC)

I think people at Wikidata could be tempted to make editorial decisions for Wikipedia, but I don't think it's up to Wikidata to determine what Wikipedia has to consider a disambiguation page. If a language version considers a page to be a disambiguation page, then it should go on a disambiguation item. If it's an article about a city that also lists similarly named cities, it should be on an item about that city. Even if some users at Wikidata attempted to set "capital" on a disambiguation page because Wikipedia did the same, such a solution can't be sustained. The situation for given names and family names isn't much different. In the meantime, at least it's clear which items at Wikidata have what purpose.
--- Jura 14:20, 23 April 2016 (UTC)
You then have to love Category:Surname-disambigs (Q19121541)! -- Innocent bystander (talk) 14:35, 23 April 2016 (UTC)
IMHO: In Wikipedia, a disambiguation page is a page listing pages (or potential pages) that share the same spelling; no assumption should be made about the meaning. If we limit the content to partial sets with some specific criterion we don't have a disambiguation page but a list (e.g. a list of people with the same surname, List of people with surname Williams (Q6633281)). These pages must use the __DISAMBIG__ tag so that bots and humans can recognize a disambiguation page without doubt. In Wikidata, disambiguation items are items that connect disambiguation pages with the same spelling. --ValterVB (talk) 20:02, 23 April 2016 (UTC)

Disambiguation item without sitelink --ValterVB (talk) 21:30, 23 April 2016 (UTC)

I'd delete all of them.
--- Jura 06:13, 24 April 2016 (UTC)

Some queries for point (7):

A better way needs to be found for (7a).
--- Jura 08:07, 25 April 2016 (UTC)

I brought up the question of the empty items at Wikidata:Project_chat#Wikidata.2C_a_stable_source_for_disambiguation_items.3F.
--- Jura 09:39, 27 April 2016 (UTC)

As this is related: Wikidata:Project chat/Archive/2016/04#Deleting descriptions. Note, that other languages could be checked. --Edgars2007 (talk) 10:30, 27 April 2016 (UTC)

I don't mind debating if we should keep or redirect empty disambiguation items (if admins want to check them first ..), but I think we should avoid recycling them for anything else. --- Jura 10:34, 27 April 2016 (UTC)
As it can't be avoided entirely, I added a point 10.
--- Jura 08:32, 30 April 2016 (UTC)
Point (3) and (10) are done. For point (2) I created User:Pasleim/disambiguationmerge. --Pasleim (talk) 19:22, 2 July 2016 (UTC)
Thanks, Pasleim.
--- Jura 05:02, 11 July 2016 (UTC)
  • Matěj Suchánek made User:MatSuBot/Disambig errors which covers some of 7b.
    Some things it finds:
    • Articles that are linked from disambiguation items
    • Disambiguation items that were merged with items for concepts relevant to these articles (maybe we should check items for disambiguation with more than one P31 statement, or attempt to block such merges)
    • Pages in languages where the disambiguation category isn't correctly set up or recognized by the bot (some pages even have "(disambiguation)" in the page title), e.g. Q27721 (36 sitelinks) – ig:1 (disambiguation)
    • Pages in categories close to disambiguation categories. (e.g. w:Category:Set indices on ships)
    • Redirects to non-disambiguations, e.g. Q37817 (27 sitelinks) idwiki – id:Montreuil – redirects to id:Komune di departemen Pas-de-Calais (Q243036), not a disambiguation

Seems like an iceberg. It might be easier to check these by language and once the various problems are identified, attempt to sort out some automatically.
--- Jura 05:02, 11 July 2016 (UTC)
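
The mixed-item check mentioned above (disambiguation items that also carry another P31 value) can be sketched as a query:

SELECT ?item ?otherClass
WHERE
{
  ?item wdt:P31 wd:Q4167410 ;
        wdt:P31 ?otherClass .
  FILTER(?otherClass != wd:Q4167410)
}
LIMIT 100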

Note that my bot only recognizes pages with the __DISAMBIG__ magic word as disambiguations. If you want a wiki-specific approach, I can write a new script which will work only for chosen wikis. Matěj Suchánek (talk) 09:12, 12 July 2016 (UTC)
  • Step #4 should be done for now. The above list now includes links for 160+ sites.
    --- Jura 22:02, 5 August 2016 (UTC)
  • For step #3a, there is now Phab:T141845
    --- Jura 22:30, 5 August 2016 (UTC)
List of disambiguation items with conflicts on label/description --ValterVB (talk) 13:57, 6 August 2016 (UTC)
  • Added #11.
    --- Jura 02:05, 21 September 2016 (UTC)
  • Is it appropriate to add point 12: Mix-n-Match should not offer disambiguation items for matching to external authority files? --Vladimir Alexiev (talk) 11:56, 21 January 2017 (UTC)
    • Sure, the list is freely editable, but the focus is mainly on how to handle these items rather than fixing other tools. I wonder if things like Topic:Tjgt6ynwufjm65zk aren't just the tip of an iceberg with some other root problem.
      --- Jura 12:18, 21 January 2017 (UTC)

Bloomberg Private Company Search

Crawl all ~300,000 companies and add them to Wikidata.  – The preceding unsigned comment was added by 192.35.17.12 (talk • contribs) at 16:09, 23 May 2016 (UTC).

(related) I've researched and spoken to Bloomberg employees previously about importing their symbols (BBGID). I've tried quickly proposing clear-cut properties, with some taking nearly a year to be approved (what you'd need). Disappointingly, we've imported notability from Wikipedia, with people worrying about too many items. There are also significant structural problems with Wikidata because it's a crappy mirror of Wikipedia (and the smaller ones at that). Movie soundtracks can't be linked to the article's Soundtrack section (many items => 1 article). Multi-platform video games are currently a mess (1 article => many items).

To start you'll need to propose a new property. Dispenser (talk) 20:09, 23 May 2016 (UTC)

@Dispenser: I added a property proposal: https://www.wikidata.org/w/index.php?title=Wikidata:Property_proposal/Financial_Instrument_Global_Identifier_(FIGI) ChristianKl (talk) 11:20, 24 September 2016 (UTC)
@Dispenser: We just created Bloomberg private company ID (P3377) ChristianKl (talk) 23:21, 3 December 2016 (UTC)

MCN number import

There are 10,031 identifiers for MCN code (P1987) that can be extracted from [3] or this English version. Many (but not all) items cited are animal taxons, which can be easily machine-read. For the rest, it would be useful if the bot generated a list presenting possible meanings (by comparing the English and Portuguese versions of the xls file with Wikidata language entries). Pikolas (talk) 12:38, 14 August 2015 (UTC)

What's the copyright status of those documents? Sjoerd de Bruin (talk) 13:04, 14 August 2015 (UTC)
It's unclear. I've opened a FOIA request to know under what license those are published. For reference, the protocol number is 52750.000363/2015-51 and can be accessed at http://www.acessoainformacao.gov.br/sistema/Principal.aspx. Pikolas (talk) 13:40, 14 August 2015 (UTC)
I heard back from them. They have assured me it's in the public domain. How can I prove this to Wikidata? Pikolas (talk) 01:48, 2 October 2015 (UTC)
@Pikolas: I have only just noticed that you haven't had the courtesy of a reply. The best method would be to get them to put a statement to that effect on their website. Failing that, you could get them to email OTRS. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:55, 10 October 2016 (UTC)
@Sjoerddebruin: Reopening this thread since I forgot to ping you. NMaia (talk) 15:45, 1 June 2016 (UTC)
Updated links: Portuguese version, English version. NMaia (talk) 19:35, 2 June 2016 (UTC)

Get GeoNames ID from the Cebuano or Swedish Wikipedia

Currently there are many concepts, such as no label (Q22564260), that refer to geographical features that have articles in the Cebuano and Swedish Wikipedias. For most of them there's an infobox with information at the respective Wikipedia, but not all of the information is available in Wikidata. I propose that the information gets copied over by a bot. There are too many articles to copy the information manually. Especially the GeoNames ID should be easy to copy automatically. ChristianKl (talk) 15:52, 6 July 2016 (UTC)
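
A sketch of a query for one side of this (items with a cebwiki sitelink but no GeoNames ID (P1566)); whether further restrictions are needed, e.g. to items with coordinates, is left open:

SELECT ?item
WHERE
{
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://ceb.wikipedia.org/> .
  FILTER NOT EXISTS { ?item wdt:P1566 [] }
}
LIMIT 100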

Be very, very careful! The GeoNames IDs that have been added here before, based on the Wikipedia links in the GeoNames database, are very often very wrong! Copying the GeoNames IDs from the sv/ceb articles is a good start! We can then detect mismatches between Wikidata and GeoNames. Other kinds of information can thereafter be collected directly from GeoNames. But even that data is often wrong. An example: large parts of the Faroe Islands (Q4628) in GeoNames are located on the bottom of the Atlantic. -- Innocent bystander (talk) 16:26, 12 July 2016 (UTC)
@Innocent bystander: Note: I imported a few thousand GeoNames IDs a few weeks ago. Can't say how many are left. If svwiki had some tracking category, that would be helpful :) --Edgars2007 (talk) 17:18, 31 July 2016 (UTC)
@Edgars2007: I'll see what I can do (tomorrow). One issue here is that a tracking category cannot separate the Lsjbot articles from the others. -- Innocent bystander (talk) 18:56, 31 July 2016 (UTC)
@Innocent bystander: To clarify, I'm only asking about a category for the GeoNames parameter, not about others. I don't see any reason why this fact (who created the article) is relevant in this situation. If needed, that can be obtained with a database query. --Edgars2007 (talk) 19:43, 31 July 2016 (UTC)
@Edgars2007: I intend to create (at least) two categories. One for when P1566 is missing here and one for when WD and WP do not agree about the GeoNames ID. A third potential category could be used to detect when there is a geonames parameter in WP and it matches P1566. In such cases, the parameter could be removed from WP. -- Innocent bystander (talk) 05:25, 1 August 2016 (UTC)
@Edgars2007: ✓ Done Category:Wikipedia:Articles with a geonames-parameter but without P1566 at Wikidata (Q26205593)! It will take some time until the category is completely filled with related articles. It will also take some time after you have added the property here, until the category is removed on svwiki. -- Innocent bystander (talk) 07:01, 1 August 2016 (UTC)
The category is now filled with almost 250,000 pages. A category for the cases where WD and svwp contradict each other has ~4000 members. -- Innocent bystander (talk) 07:10, 2 August 2016 (UTC)
Yesterday evening that was some 300 pages (for the first category) :D --Edgars2007 (talk) 07:17, 2 August 2016 (UTC)
@Edgars2007: Any progress? Lsjbot is halted for some more time, so there is a possibility to catch up with hir! I am daily sorting out some of the more complicated constraint problems and other problems reported on svwiki. -- Innocent bystander (talk) 06:37, 21 August 2016 (UTC)
@Innocent bystander: I haven't forgotten about you. Yes, I haven't had (much) time to do this yet, but will try to clean up the category. --Edgars2007 (talk) 07:38, 21 August 2016 (UTC)

There are now 500,000+ identifiers to be imported. If you have a bot with 1 second throttle, this will take almost six days. Any volunteers? The more we are, the faster it's done. Matěj Suchánek (talk) 09:35, 29 January 2017 (UTC)

I can give you QS commands. Details in e-mail. --Edgars2007 (talk) 09:43, 29 January 2017 (UTC)

Updating population of US towns

Hello, I was wondering if a bot can be used to update population estimates in the U.S. for 2015. I think a good source of information is here. It is a government website. Is this feasible? MechQuester (talk) 06:09, 26 July 2016 (UTC)

If there is any desire to do this, it should be as additional information, and should not replace official census information from 2010. This is because many laws have different provisions depending on the population of a town or city, and such laws always reference official census results which are done once every 10 years. Interim results from the Census Bureau are not recognized by law. Jc3s5h (talk) 14:22, 10 September 2016 (UTC)

Import Template:Bio from itwiki

To avoid the gap getting too big, it might be worth doing another import. There is a series of steps outlined in Help:Import Template:Bio from itwiki.
--- Jura 17:08, 29 July 2016 (UTC)

  • I updated some of the links (Autolist>PetScan) and did some of it.
    --- Jura 16:34, 17 January 2017 (UTC)

Zerozero footballer IDs & others in refs

It seems that much of en.Wikipedia's use of en:Template:Zerozero is in <ref></ref> tags. Does anyone have a bot that can compare the subject of the target page, and add matches using thefinalball.com ID (P3047)? And similar cases? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:14, 9 August 2016 (UTC)

Likewise:

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:30, 11 August 2016 (UTC)

And:

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:46, 4 October 2016 (UTC)

Remove incorrect svwiki sitelinks

Many svwiki articles about places in China created by Lsjbot have incorrect zhwiki sitelinks, which were imported to Wikidata. Some examples I have fixed manually are:

  1. sv:Gaohu (köpinghuvudort i Kina, Jiangxi Sheng, lat 28,93, long 115,24), incorrectly links to no label (Q11671865) (a person), correct item is Gaohu (Q2656384) (a town)
  2. sv:Yushan (köpinghuvudort i Kina, Chongqing Shi, lat 29,53, long 108,43), incorrectly links to no label (Q22079207) (a person), correct item is no label (Q13714765) (a town)
  3. sv:Hongfenghu, incorrectly links to no label (Q15928338) (a lake), correct item is no label (Q14143028) (a town). I have added the correct svwiki link to the former article
  4. sv:Bianhe (köping i Kina, Anhui), incorrectly links to no label (Q11137293) (a disambiguation page), correct item is no label (Q11137300) (a subdistrict, formerly a town)
  5. sv:Chenyaohu, incorrectly links to no label (Q16935572) (a lake), correct item is no label (Q14343855) (a town)

Request to:

  1. Remove all svwiki sitelinks and Swedish labels in https://petscan.wmflabs.org/?psid=130084 (all are errors)
  2. Remove all svwiki sitelinks and Swedish labels in https://petscan.wmflabs.org/?psid=130088 (all are errors)
  3. Remove all svwiki sitelinks and Swedish labels in https://petscan.wmflabs.org/?psid=130091 (all are errors)
  4. Remove all svwiki sitelinks and Swedish labels in sv:Kategori:Robotskapade Kinaartiklar if the distance between the Wikidata P625 and coordinate in svwiki>100km

--GZWDer (talk) 15:43, 11 August 2016 (UTC)

Note: Lsjbot has not completed all articles about places in China. This should be done again when it's complete.--GZWDer (talk) 15:47, 11 August 2016 (UTC)
Looks like Chinese to me. Maybe @Innocent bystander: can explain it in Swedish to Lsj.
--- Jura 16:01, 11 August 2016 (UTC)
@Lsj: most likely speaks/writes better English than me.
I think Lsjbot has finished with the People's Republic of China. (The nations have been edited in alphabetical order by ISO code, with a few exceptions for nations that were requested (ex: Syria) or were part of a benchmark (ex: South Sudan).) This is a known bug, but it has been hard to undo all the mistakes. It happened when places in China were interwiki-linked to articles on zhwiki with the same "label" as the Chinese places in GeoNames. So the Swedish labels are not necessarily wrong in these cases, even if the sitelinks are. -- Innocent bystander (talk) 16:46, 11 August 2016 (UTC)
There is still a red link in sv:Bianhe. As for labels: zhwiki articles always use full names, not short names; in addition, the full name may not be unique, in which case disambiguation pages are needed.--GZWDer (talk) 17:18, 11 August 2016 (UTC)
I ping @Bothnia: who is skilled in both East Asian languages and Swedish. My knowledge of Chinese is extremely limited! -- Innocent bystander (talk) 18:29, 11 August 2016 (UTC)

Requests #2 and #3 ✓ Done. #4 is not done – it is a bit more difficult – and for #1 I'm not sure it's 100% safe (all articles in sv:Kategori:Robotskapade Kinaartiklar that don't have official name (P1448) in their WD items). --XXN, 20:07, 8 September 2016 (UTC)

Done #4. If somebody is interested: removed sitelinks and labels from 396 items; the record holder is Rock Creek (Q2352739), at a distance of 12435.39 km. I also have data for the other articles in that category, if somebody wants to analyze further. Many coords are really close (<5 km), but a few hundred are in the 5–100 km range. --Edgars2007 (talk) 07:16, 13 October 2016 (UTC)

Redundant P1343 for DNB00

A lot of items contain two described by source (P1343) statements for the exact same article. One links the article directly, and one uses Dictionary of National Biography (1885-1900) (Q15987216) as the value with the article as a qualifier. I think the latter is the correct way of linking these, thus the redundant statements should be removed. See this item for an example. Sjoerd de Bruin (talk) 19:17, 17 August 2016 (UTC)
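
A sketch of a query for the duplicated pattern; the stated in (P248) qualifier is taken from the discussion below, so the exact shape is an assumption:

SELECT ?item ?article
WHERE
{
  ?item p:P1343 [ ps:P1343 ?article ] .       # direct link to the article item
  ?item p:P1343 [ ps:P1343 wd:Q15987216 ;     # DNB as the value...
                  pq:P248 ?article ] .        # ...with the article as qualifier
}
LIMIT 100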

Wikidata:WikiProject DNB recommends the other. Using Q15987216 would be redundant as it's present in the linked item.
--- Jura 05:47, 18 August 2016 (UTC)
@Sjoerddebruin: I also prefer to link directly to the article, as recommended in Wikidata:WikiProject DNB#Examples. Is it okay for you if I remove the statement with the qualifier? --Pasleim (talk) 12:47, 24 August 2016 (UTC)
Please do, everything is better than this duplicated stuff. Sjoerd de Bruin (talk) 13:33, 24 August 2016 (UTC)
Definitely not ok. -- Sergey kudryavtsev (talk) 04:28, 26 August 2016 (UTC)
Just realized that there are more than 20,000 such duplicated claims. @Sergey kudryavtsev: Are these claims used in a Lua module? --Pasleim (talk) 21:42, 25 August 2016 (UTC)
@Pasleim: Yes. The claims described by source (P1343) = Dictionary of National Biography (1885-1900) (Q15987216), Dictionary of National Biography, first supplement (Q16014700) or Dictionary of National Biography, second supplement (Q16014697) with qualifier stated in (P248) are widely used on ruwp and ruws. At ruws such pages are collected in categories: s:ru:Категория:Викитека:Ссылка из Викиданных:DNB, s:ru:Категория:Викитека:Ссылка из Викиданных:DNB01 and s:ru:Категория:Викитека:Ссылка из Викиданных:DNB12, altogether about 1000 pages.
Using described by source (P1343) with a qualifier stated in (P248) greatly optimizes access in Lua modules: the described by source (P1343) values act as a flag to decide whether or not to load an article's item. (I said all this to Jura1 several months ago, but he does not heed the voice of reason...) In the opposite case a Lua module would have to load and inspect all described by source (P1343) values until the required article was found! -- Sergey kudryavtsev (talk) 04:24, 26 August 2016 (UTC)
I see. However, in most cases items only have one described by source (P1343) value; then no additional Lua calls should be needed. In cases where there are two or three P1343 values, one resp. two more Lua calls are needed, but this should still be okay performance-wise. Items with more than three P1343 values are rather rare, see [4] for the distribution. --Pasleim (talk) 12:33, 26 August 2016 (UTC)
But the average P1343-per-item value will keep growing, because most encyclopedic articles are still not linked. The performance-critical part is the entity-loading operation, which you call "Lua calls". I'm not afraid of exceeding the limit, but ruws sometimes gets unstable timeout errors on wikidata-linked pages, so every additional loading operation makes a timeout more probable. -- Sergey kudryavtsev (talk) 06:37, 27 August 2016 (UTC)
It's kind of cool being confused with Joe F., but still. Last time I discussed this with Sergey, the Russian module needed fixing and didn't make use of DNB at all. A fix was offered. The argument advanced here seems to be the same as the one we got for adding country qualifiers to every place of birth/death statement. Incidentally, I think both modules using that were designed by the same former contributor.
--- Jura 11:40, 27 August 2016 (UTC)

For what it's worth, I find the construction with the DNB article item as qualifier to be acceptable. It is a serious issue, clearly; but given that Wikisource only gradually proofreads articles (e.g. Britannica 1911 is only slowly being completed), it makes a lot of sense to add the main work, and then qualify with the article when that is possible/available. I realise this is a Wikisource argument rather than a Wikidata argument. But I'm not going to apologise too much for that. Charles Matthews (talk) 11:48, 30 August 2016 (UTC)

VIAF import

Please see Property_talk:P214#Import_.3F.
--- Jura 05:09, 25 August 2016 (UTC)

Revert label additions by Edoderoobot in the beginning of May

In the beginning of May, Edoderoobot (talk • contribs • logs) copied a lot of labels from other languages, like here and here. You can clearly see that these aren't acceptable labels in Dutch. I've asked the bot operator multiple times to clean this up, but they are still there. Can someone help me? Sjoerd de Bruin (talk) 07:17, 25 August 2016 (UTC)

SELECT *
{
  ?item wdt:P31 wd:Q13406463 .
  ?item rdfs:label ?labelnl FILTER(lang(?labelnl) = "nl")
  ?item rdfs:label ?labelen FILTER(lang(?labelen) = "en" && str(?labelnl) = str(?labelen))
}


Above is a list of (all) NL labels that are identical to EN (4,698). You could use QuickStatements to delete the label for some or all (or replace it).
--- Jura 07:36, 25 August 2016 (UTC)

I would not be bothered if they were all cleared. I have now filtered for which items/instance of (P31) it makes sense to take over the English description (right now items like human (Q5)), so if any are deleted in excess I can re-do them with my (repaired) bot script. But I will have a look myself to see whether this SPARQL script can automate a repair action. This might be the opening I needed to get it fixed myself. Edoderoo (talk) 08:21, 25 August 2016 (UTC)
I created a repair script based on the above SPARQL query... will run it tomorrow, as right now another script isn't finished yet. Please be advised that there might be more P31 types, but those can be fixed with the same script. Most likely Sjoerd will keep in contact with me about those, but feel free to contact me in case someone finds another case. Once more, thanks to Jura for this helpful SPARQL script! Edoderoo (talk) 13:24, 25 August 2016 (UTC)

Also a lot of errors in January, see here for an example. Sjoerd de Bruin (talk) 11:37, 31 August 2016 (UTC)

Also a broad selection of subjects, see Special:Diff/330646941. Can't we mass-revert? Sjoerd de Bruin (talk) 14:30, 8 September 2016 (UTC)

Idea: Multi-wiki KML bot

Are there any bot operators willing to work on a multi-wiki bot task? If so, please see meta:Talk:KML files - Evad37 [talk] 04:10, 27 August 2016 (UTC)

NSW Flora IDs

Values for NSW Flora ID (P3130) are held in English Wikipedia's en:Template:NSW Flora Online, but split over multiple parameters, preventing the use of HarvestTemplates. Please can someone import them? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:23, 1 September 2016 (UTC)

My bot will import this from the original source. --Succu (talk) 16:44, 1 September 2016 (UTC)

@Pigsonthewing, YULdigitalpreservation, ChristianKl: At the moment this property works only for species (formatter: href=/cgi-bin/NSWfl.pl?page=nswfl&lvl=sp&name=), but the website has pages for other ranks (e.g. genera and infraspecific taxa) too. This is similar to GRIN URL (P1421) or AlgaeBase URL (P1348). So the datatype of this property should be changed to URL. --Succu (talk) 08:13, 2 September 2016 (UTC)

@Succu: one possible workaround is to have property value "nswfl" and a new qualifier with value "fm". Of course, not perfect... But this probably has to be discussed somewhere else, not on the BOTREQ page.--Edgars2007 (talk) 08:18, 2 September 2016 (UTC)
@Edgars2007: Do you have a working example for this "workaround"? --Succu (talk) 21:16, 2 September 2016 (UTC)
@Succu: No, I don't have. --Edgars2007 (talk) 04:01, 3 September 2016 (UTC)


The simplest fix would be to rename this property and have another for other ranks. Otherwise, change the formatter URL and use IDs like:

  • lvl=sp&name=Avicennia~marina
  • lvl=in&name=Avicennia~marina+subsp.~australasica
  • lvl=gn&name=Avicennia

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:54, 2 September 2016 (UTC)

I doubt this is a reasonable option. --Succu (talk) 18:21, 2 September 2016 (UTC)
Two mutually exclusive options were suggested. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:58, 5 September 2016 (UTC)
I count three:
  1. recreate with datatype URL - straightforward
  2. create two additional taxon properties for the same dataset - complex (we don't use 3 properties for GRIN)
  3. reuse the current property with a URL fragment - a strange mixup between the datatypes external ID and URL
--Succu (talk) 20:36, 8 September 2016 (UTC)
I was referring to my post, to which you replied in the singular. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:02, 10 September 2016 (UTC)

NSZL authority name (HuBpOSK) import from VIAF

Hey, I was wondering if someone could import NSZL name authority ID (P3133) from VIAF. The trick is that this ID is not the primary ID on the NSZL record (that would be NSZL ID (P951)), but the one under HuBpOSK in NSZL records (so for http://viaf.org/processed/NSZL%7C000000015848 it is "114", see Antal Szerb (Q570810)). Thanks! – Máté (talk) 05:33, 4 September 2016 (UTC)

Calendar date

For every instance of calendar date (Q205892), please can someone's bot add calculated values like in these edits. It may also be possible to calculate labels in other languages; and values for other properties. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:51, 5 September 2016 (UTC)

P.S. query. --Edgars2007 (talk) 13:39, 10 September 2016 (UTC)
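
The linked query isn't reproduced here; a minimal sketch that merely enumerates the affected items would be:

SELECT ?date ?dateLabel
WHERE
{
  ?date wdt:P31 wd:Q205892 .   # instance of: calendar date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100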

Delete redirects

Lists like Josve05a's User:Josve05a/dupes tend to be full of redirects with sitelinks. Samples: Q5145525 and Q23812706. Would you delete all sitelinks on these items that are redirects?

Obviously, a solution that would solve this for even more items would be better.
--- Jura 23:41, 12 September 2016 (UTC)

For seeking/finding/implementing a more general solution, I created phab:T145522.
--- Jura 07:38, 14 September 2016 (UTC)
I have cleaned up those lists. I'm almost 100% sure there is a repository with database reports regarding redirects per project. I'll search for it later. Matěj Suchánek (talk) 20:52, 30 January 2017 (UTC)

add locative as (en) alias

For places with a native label (P1705) in a language that has a locative case, it would be helpful if the form would be added as an alias.
Sample: Q1799#P1705 "Wrocław" > alias "Wrocławiu".
--- Jura 09:47, 18 September 2016 (UTC)
Question: Why? How can this be useful, in English particularly?
Concern: I have never heard of a bot which would successfully work with a Fusional language (Q318917). Matěj Suchánek (talk) 07:50, 24 September 2016 (UTC)
The problem is that locations in Wikipedia (and elsewhere) are sometimes given in this form and don't necessarily link to corresponding articles.
Currently, there is no way for people to find these locations on Wikidata.
The bot would just add an alias. Obviously, the input lists would first need to be compiled.
--- Jura 12:56, 24 September 2016 (UTC)
@Matěj Suchánek: Maybe it's easier to do it this way: Wikidata:Property proposal/locative.
--- Jura 12:37, 6 January 2017 (UTC)

Species acronyms

Items for pages in species:Category:Repositories should probably have the full name listed at species:repositories as the label and the acronym as an alias.

If no better P31 value can be found, maybe P31=organization will do. I'm not sure what to suggest for the related categories. These are for taxa whose type specimens are held by that institution.
--- Jura 14:39, 20 September 2016 (UTC)

Dropped a note at Wikispecies. --Succu (talk) 19:48, 20 September 2016 (UTC)
The category is a mess replete with duplicates that I've long given up trying to deal with, but generally these pages should be treated as cross-wiki links for the equivalent institution (cf. species:AMNH, species:DNMNH, the latter of which I just merged into the proper institution).
In and of itself this is straightforward. The problem comes where many, if not most, of these don't necessarily have straightforward matching pages on other wikis (e.g. because a given institution often corresponds to several "collections" that have not been merged in Wikispecies), and that's not counting renamed institutions, collections that were moved/merged years ago, or the occasional outright ambiguous or incorrect name. Circeus (talk) 01:55, 21 September 2016 (UTC)
@Circeus: As long as the acronyms in Wikispecies article titles match the entry in the list, it should be fairly straightforward. Adding the full name to item labels would make it easier to find duplicates/merge them with other items. That some institutions have changed their name since or that collections were absorbed by others shouldn't be much of an issue. Wikidata is a good place to hold historic data as well.
--- Jura 13:01, 24 September 2016 (UTC)
I think we need a property to map an institution to a code. Index Herbariorum (Q11712089) (website) is an example for a register of herbarium (Q181916). --Succu (talk) 19:18, 24 September 2016 (UTC)
This might help, but isn't necessarily needed for this request.
BTW short name (P1813) could also be used.
--- Jura 10:48, 26 September 2016 (UTC)

Automatically creating a human subclass for anatomical features that don't already have subclasses[edit]

Some statements are true for the fingers of every species, but others are human-specific. Currently we often don't have separate items for the concept in humans. I think it would be valuable to have a bot that automatically creates human subclasses. ChristianKl (talk) 09:45, 22 September 2016 (UTC)

Can you please provide a list with all anatomical features? --Pasleim (talk) 11:56, 26 September 2016 (UTC)
We have animal structure (Q25570959). That then gets subclassed in different ways. That should produce a long list of anatomical features, most of which exist in humans. ChristianKl (talk) 14:43, 1 October 2016 (UTC)
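
A sketch of a query for such a list, assuming subclass of (P279) is the relevant relation (it may time out and need a LIMIT):

SELECT DISTINCT ?feature ?featureLabel
WHERE {
  ?feature wdt:P279+ wd:Q25570959 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}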

Official tourist website[edit]

As discussed here and as summarized here, I request the transfer of the information (website and toponym) stored on it:voy into the agreed structure.

The information is stored in the following templates, divided by toponym type:

When the Wikidata item associated with a toponym does not have the property tourist office (P2872), it must be added, with as its value the item that contains the official tourist website. If that item does not exist, it must be created with the following properties and values:

  • official website (P856) with the official tourist website extracted from the Quickbar template
  • country (P17) with the same value as the main item. Obviously, if the main item is a country, the value is that item itself
  • instance of (P31) with official tourism agency (Q26989327) ... this value declares its official status
  • if a "city property" exists, the item associated with the QuickbarCity can be stored
  • if a "territory property" exists, the item associated with the QuickbarRegion can be stored

I hope this is clear enough; if not, feel free to ask. Thanks, --Andyrom75 (talk) 20:11, 24 September 2016 (UTC)
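
To gauge the scope, something along these lines could list items with an itwikivoyage sitelink that are still missing tourist office (P2872); this is only a sketch and would still need to be restricted to the toponym types covered by the Quickbar templates:

SELECT ?item ?article
WHERE {
  ?article schema:about ?item ;
           schema:isPartOf <https://it.wikivoyage.org/> .
  MINUS { ?item wdt:P2872 [] }
}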

I began working on this. Unfortunately the information you provided is not sufficient to create new items. We need labels and descriptions in at least one language. Any ideas? -- T.seppelt (talk) 18:26, 26 October 2016 (UTC)
@Andyrom75: please have a look at no label (Q27573285). -- T.seppelt (talk) 19:12, 26 October 2016 (UTC)
T.seppelt, sorry for sooo late an answer, but you should have pinged me :-)
We haven't stored the label of the website because we don't need it. By the way, you could use the web domain as a (temporary?) label. For example, for the website "www.blabla.domain.org" the label would be "domain.org". Let me know, --Andyrom75 (talk) 08:15, 30 November 2016 (UTC)
@Andyrom75: yes I could do that, but I'm not sure if we should create hundreds/thousands of items with domain names as labels. --T.seppelt (talk) 06:30, 1 December 2016 (UTC)
T.seppelt, consider that many of those websites have different names in different languages, so in any case that information would be incomplete. From a certain point of view it would be better to have the domain, because at least it is unique. --Andyrom75 (talk) 08:33, 2 December 2016 (UTC)

Guardian data about US police killings[edit]

https://www.theguardian.com/us-news/series/counted-us-police-killings provides data about individuals in the US who were killed by the police. Should we import the data? If we import the data it might also be interesting to make a public statement that invites other people to contribute data about those people. ChristianKl (talk) 14:38, 26 September 2016 (UTC)

Import it to what item? Jc3s5h (talk) 15:38, 26 September 2016 (UTC)
Items for the people who are killed. The Guardian lists their names and data about them. It would be possible to automatically create lists in Wikipedia that show all police killings in month X. ChristianKl (talk) 16:26, 26 September 2016 (UTC)
I object to another bot that will create lots of items without making an effort to see if there is already an item for the person. Of course, if there were an existing item it would be necessary to rigorously investigate whether the person who was killed was the same person named in the existing item. I realize that occasionally duplicate items will be created accidentally, but doing it en masse with a bot doesn't seem like a good idea to me. Jc3s5h (talk) 17:07, 26 September 2016 (UTC)
Why? Merging items is easy. Especially with the merging game. ChristianKl (talk) 17:38, 26 September 2016 (UTC)
What data? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:19, 26 September 2016 (UTC)
There seems to be something of an API: https://interactive.guim.co.uk/2015/the-counted/records/20162.json, although it isn't documented. Kaldari (talk) 22:45, 28 September 2016 (UTC)
Names of the people who are killed. manner of death (P1196) "killed by police gunshot" (and other classes for deaths that aren't gunshots). date of death (P570). ethnic group (P172). I think there's interest in having Wikipedia lists of people killed by police by race. ChristianKl (talk) 23:18, 28 September 2016 (UTC)
  • enwiki has fairly detailed lists. If you are interested in the topic, you could import these.
    --- Jura 09:33, 1 October 2016 (UTC)

film budget[edit]

The following query uses these: budget (P2769), instance of (P31), subclass of (P279), film (Q11424).

SELECT ?item
WHERE {
  ?item wdt:P2769 [];
        wdt:P31/wdt:P279* wd:Q11424
}

Try it!

The query above returns over a thousand items, but what entertainment-media jargon calls a 'budget' corresponds to estimated cost (P2130), not budget (P2769): these published numbers are estimated after production and are not the actual planned budget. I would like someone to move these statements to the correct property while keeping the qualifiers. – Máté (talk) 12:04, 30 September 2016 (UTC)
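
A sketch of a query that also lists the qualifiers to be carried over (untested; the qualifier filter relies on the standard pq: namespace of the query service):

SELECT ?item ?amount ?qualifier ?qualifierValue
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q11424 ;
        p:P2769 ?statement .
  ?statement ps:P2769 ?amount .
  OPTIONAL {
    ?statement ?qualifier ?qualifierValue .
    FILTER(STRSTARTS(STR(?qualifier), "http://www.wikidata.org/prop/qualifier/"))
  }
}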

Would someone please do it? :) It's been the wrong way for far too long now. – Máté (talk) 10:27, 6 February 2017 (UTC)

Copy data from the property documentation to property statements[edit]

The syntax in which the formatter URL and other statements are stored in the property documentation should be easy for a bot to parse, so that it can automatically create statements from them. ChristianKl (talk) 11:34, 1 October 2016 (UTC)

This should be "move", not "copy". Some statements are suitable for this; others not. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:46, 1 October 2016 (UTC)
Wikidata:Requests for permissions/Bot/MatSuBot 2. Matěj Suchánek (talk) 18:58, 1 October 2016 (UTC)

{{Section resolved|1=Sjoerd de Bruin (talk) 08:55, 12 October 2016 (UTC)}}

Postponing archiving per Andy's comment and Topic:Td698cyh1l7depz2. My bot's RfP only allows copying, not (re)moving data from talk pages. So this can remain an open task, although I can imagine someone would like to keep it more detailed than what property statements can provide. Matěj Suchánek (talk) 17:33, 14 October 2016 (UTC)

Add a band/musician database to Wikidata.[edit]

Hello, I am wondering whether I need a bot or can use QuickStatements for uploading data (as opposed to inserting "items" one by one).

I have been manually collecting a dataset of around 400 Latvian rock bands, their members' names (more than 1000 musicians) and each member's role in the band (the musical instrument they play). I would like to upload that dataset to Wikidata, so that an embedded network graph could be made through a Wikidata query, and users could add to and contribute to that graph through Wikidata.

The data are in a spreadsheet: the first column holds the band names (the name of each band is repeated as many times as there are different band members), the second column holds the musicians (some of them have instruments and record labels as extra columns)

Is there a way by using my dataset to:

  • upload a list of band names as "items", being "instance of: band" with "country of origin: Latvia"
  • upload a list of musician names as "items", with "instance of: human" and for example "instance of: bassist", "instance of: female"
  • upload a list of band–musician pairs, creating for each of the bands "has part: (the musician name)"

Thank you if someone has the time to answer.
--- LinardsLinardsLinards 04:00, 20 October 2016 (UTC)

Are all the bands notable? Without references I don't think they are --ValterVB (talk) 06:32, 20 October 2016 (UTC)
Hi ValterVB, thanks for the comment. Most of the bands in the dataset are notable even by Wikipedia standards (recorded albums, national radio rotation, awards, press publications, coverage and interviews), but here on Wikidata the purpose of uploading the bands serves the third Wikidata notability criterion: "It fulfils some structural need, for example: it is needed to make statements made in other items more useful." The bands in Latvia are highly interconnected, with a lot of musicians playing in more than one band (see the press coverage of a network graph of the same data, connecting the bands that have at least one band member in common: http://www.parmuziku.lv/interesanti/latvijas-grupu-saites-1092); therefore every one of the bands serves as a structural connection between other bands. Adding the dataset also allows querying such things as: male/female distribution in Latvian rock music, bands belonging to a specific label, distance between two bands by their players, bands coming from a specific city, the bass players in Latvia etc., and also querying network graphs of band genealogy, again based on specific parameters. --LinardsLinardsLinards (talk) 13:59, 21 October 2016 (UTC)
If we made a "loop" of not notable item, we haven't notable items. The third criterion relates to notable item. If item "A" is notable then I can create item "B" for structural need. But If I create item "A" that isn't notable also if I create item "B" not notabale and link it to "A" and viceversa, I haven't 2 notable item. --ValterVB (talk) 17:22, 21 October 2016 (UTC)
No problem ValterVB, I am aware of that. The dataset consists of the most known Latvian bands. The data were collected manually from press interviews and other datasets. It sounds like I should add these sources; I will do that. And what about the actual uploading: is a bot necessary, or do solutions already exist that can do such uploading?--LinardsLinardsLinards (talk) 02:52, 22 October 2016 (UTC)
It's a problem because you know that they are « the most known Latvian bands », but we can't know it. So, if the bands aren't in Wikipedia, which kind of source will you add to prove that they are notable? --ValterVB (talk) 08:26, 22 October 2016 (UTC)
After I got the data manually from interviews with Latvian music journalists and social media, and presented them in network graph form, I reached out to the "Latvian performer and producer union" (http://www.laipa.org/); they collect information about the usage of music in radio, films and advertisements, collect the money and divide it among the musicians. And since they are a public institution, they shared their database of registered bands and their participants. I could use that as a reference, since these are bands that get played on the radio. P.S. Unfortunately the Latvian Wikipedia is very small and currently does not include a lot of the bands that it should. So I want to show the capabilities of open databases through data visualisations, to attract more editors in this specific field.--LinardsLinardsLinards (talk) 15:35, 22 October 2016 (UTC)
Can you add an id or link for every band on the laipa.org site? Maybe we can create a new property, something like "id on laipa.org" --ValterVB (talk) 16:48, 22 October 2016 (UTC)
The data I got from laipa.org only consist of unique integer IDs for bands and musicians (there are several musicians with the same names). Would it maybe be appropriate to add the Laipa.org ID as a reference to the bands that are in the dataset provided by them, and then I could find the interviews and mentions for the rest of the bands in the set?--LinardsLinardsLinards (talk) 17:47, 22 October 2016 (UTC)
Who should I ask about adding the Laipa.org ID property? I also have information about which of the musicians have died, but no specific date or year. Does it make sense to add the property "date of death" as unknown? --LinardsLinardsLinards (talk) 22:45, 25 October 2016 (UTC)
To get a Laipa.org ID property you can start a request on Wikidata:Property proposal/Authority control. Adding "unknown" is okay if you know the person has died but you don't know the date. --Pasleim (talk) 09:08, 24 November 2016 (UTC)

Sitelink removal[edit]

This lists pages at Wikipedia that are not disambiguations, but are linked from items that have P31=Q4167410. Could you remove them? I will add them to appropriate items by QuickStatements afterwards.
--- Jura 04:25, 25 October 2016 (UTC)

I strongly oppose the use of a bot before cleaning out all the items on the list which aren't given names, because the categories concerned are not only "given name" and "disambiguation" but also "surname", for example. I have been working on an equivalent list since early September and I don't see why we should do it badly with a bot when we'll still have to pass individually over each article to clean it correctly. The list was nearly twice as long when I started and it's going down steadily. --Harmonia Amanda (talk) 05:27, 25 October 2016 (UTC)
I agree that some should be given names, others "name" items. Don't worry about that.
--- Jura 05:45, 25 October 2016 (UTC)
Uh yes, I worry! How exactly do you intend to treat it? When I see your query happily mixing names, given names and disambiguation pages (because the sitelinks other than the English one may always have genuinely been disambiguation pages) and you only say "I'll treat it", I worry. --Harmonia Amanda (talk) 06:44, 25 October 2016 (UTC)
Feel free to do it manually. Please make sure to not re-purpose any item.
--- Jura 22:46, 27 October 2016 (UTC)
Refers to Wikimedia disambiguation page (Q4167410). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:06, 25 October 2016 (UTC)

P775[edit]

There are some updates related to items with Swedish urban area code (P775) in the pipeline! A number of items should have this change (the addition of P813 in the reference is probably optional). All of them should have a P31:Q12813115 claim already, but if they are missing it, add it! This relates to all items that have P775 with any of these values. If there is no item with any of these P775 values, please let me know! -- Innocent bystander (talk) 06:02, 26 October 2016 (UTC)
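
A sketch of a query for the items that still lack the P31 claim (assuming a plain MINUS is precise enough here):

SELECT ?item ?code
WHERE {
  ?item wdt:P775 ?code .
  MINUS { ?item wdt:P31 wd:Q12813115 }
}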

Animated films: cast member (P161) to voice actor (P725)[edit]

We need to find a solution for: Wikidata:Bot_requests/Archive/2016/04#.28Voice.29_actors (request archived before it was actioned). Not quite sure what would be the best approach though.
--- Jura 11:12, 28 October 2016 (UTC)

The lists at Films with live action and animation (Q1091779) and the categories from Category:Films with live action and animation (Q8458270) could be used to complete genre; once done, the conversion P161>P725 for the other animated films could be carried out.
--- Jura 13:20, 29 October 2016 (UTC)
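
A sketch of a query for the affected statements, assuming Q202866 ("animated film") is the genre value to match; as noted above, films mixing live action and animation would have to be excluded first:

SELECT ?film ?actor
WHERE {
  ?film wdt:P136 wd:Q202866 ;
        wdt:P161 ?actor .
}
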
Support. Fairly direct replacements are needed. --Infovarius (talk) 13:01, 8 November 2016 (UTC)
Note that there are animated films that utilize motion capture. In those instances actors are more like live-action cast members than voice actors. – Máté (talk) 17:21, 29 December 2016 (UTC)

add "category's main topic" to category items[edit]

Many category items, with instance of (P31) = "Wikimedia category" (Q4167836), are missing category's main topic (P301). However, they share the same Commons category (P373) with some other "article" item; such items should have P301 set to the Q-code of that "article" item. For example, Q15723537 was missing P301, but its P373 = "Ancient Greek theatre (Ohrid)", which was the same as the P373 of Q3180446. So Q15723537's P301 was set to Q3180446. Maybe a bot or someone handy with tools can find and fix such cases. --Jarekt (talk) 15:39, 1 November 2016 (UTC)

There is a unique constraint on category's main topic and I believe on Commons category, which indicates to me that the more likely change needed is a removal of certain properties in some locations. A SPARQL query would help us understand the problem better. --Izno (talk) 18:16, 1 November 2016 (UTC)
I agree that Commons category (P373) should be unique, but at the moment that is not how people use it, since there are 209,644 constraint violations. By all means clean up those violations, but please fix category's main topic (P301) properties while you are doing it. --Jarekt (talk) 01:28, 2 November 2016 (UTC)
Maybe those violations are the result of the past automatic additions. How about finding the unique candidates for what you want and adding them to Mix'n'match as automatically matched? Is that possible (both finding and adding)? --AVRS (talk) 08:49, 2 November 2016 (UTC)
As I understand Mix'n'match is for connecting Wikidata with other databases. I do not know how to use it for matching between article and category items on Wikidata. I think it is just a matter of writing some database query to identify article and category items that share P373 but lack P910 and P301. For such pairs one should verify (based on label) that they relate to the same concept, then add P910 and P301, and remove P373 from category items.--Jarekt (talk) 12:13, 2 November 2016 (UTC)
There are 179,412 category items with a redundant Commons category (P373) claim, i.e. the category's main topic has the same commons category value [5]. This query returns items sharing the same Commons category (P373) but missing category's main topic (P301). To me it seems that in most cases, category's main topic (P301) should not be added but Commons category (P373) should be removed. --Pasleim (talk) 13:12, 2 November 2016 (UTC)
I figured out how to write this query capturing most of what I requested, and I agree that most hits are false positives. I had not realized how messy the Commons category (P373) field was; it looks like many Wikipedias just set it to related concepts or parent concepts, etc. However, I think all "category" items (column 3) in this query should lose the redundant P373. If others agree, is there an easy way to remove a property from a list of Q-codes? --Jarekt (talk) 14:13, 2 November 2016 (UTC)
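
For reference, a sketch of a query for category items whose P373 merely duplicates that of their main topic (the "redundant" case described above):

SELECT ?category ?article ?commonsCat
WHERE {
  ?category wdt:P31 wd:Q4167836 ;
            wdt:P301 ?article ;
            wdt:P373 ?commonsCat .
  ?article wdt:P373 ?commonsCat .
}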

Adding Census information to Wikidata[edit]

Hi, I'm new to the Wikimedia and Wikidata community. I am representing the U.S. Census Bureau. CensusBot will perform two main tasks:

  1. Insert Census information into new/existing Wikidata pages and update as new information is available.
  2. Update links in existing Wikipedia pages (including text and Infoboxes) with links to this newly inserted data to make sure that any pages that reference Census information stay up to date through the use of dynamic links to Wikidata.

The goals of this project are to make accurate and up to date Census information available to the public to use through Wikidata, as well as to ensure that any new and existing Wikipedia pages that reference Census information can take advantage of this data.

Insertion of Data into Wikidata[edit]

My plan for accomplishing this is to create a bot that will connect to the Census Bureau's various APIs. I plan to start small with this task and scale up the number of variables and data sets that the bot will upload to Wikidata.

I will be updating this page as I make progress on this project. I would appreciate it if you could provide me with a bot flag. I would also appreciate any assistance that you can provide, as I am hoping that this will provide a wealth of valuable information to the Wikidata community.  – The preceding unsigned comment was added by CensusBot (talk • contribs) at 01:23, 17 November 2016‎ (UTC).

It's great to have you on Wikidata :). Feel free to ask if you need any help.
To get your Bot flag, you need to request the permission at https://www.wikidata.org/wiki/Wikidata:Requests_for_permissions/Bot ChristianKl (talk) 08:56, 17 November 2016 (UTC)

Import original film titles (P1476) from Wikipedia[edit]

About 25-30% of film items have "title" set, but many don't. WD:WikiProject_Movies/Tools#Wikipedia_infobox_mapping lists infobox fields that include it, but Harvesttemplates doesn't allow importing it. If the language of the film isn't known, the language code can be set to "und" (= undetermined).

Some languages have categories from which the original language of the film (P364) can be added, but others don't (notably frwiki for French-language films, itwiki for Italian-language films). Obviously, we could frequently treat the categories for the country of origin as a proxy for the language.
--- Jura 15:24, 21 November 2016 (UTC)
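
A sketch of a query for films still missing title (P1476), with the original language where available (useful for picking the right language code instead of "und"):

SELECT ?film ?language
WHERE {
  ?film wdt:P31 wd:Q11424 .
  MINUS { ?film wdt:P1476 [] }
  OPTIONAL { ?film wdt:P364 ?language }
}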

Import data from Nobel Prize People Nomination ID[edit]

There are notability criteria for the ability to nominate a person for the Nobel Prize. Those who are nominated are, as a result, also obviously notable. Entries like http://www.nobelprize.org/nomination/archive/show_people.php?id=511 contain the year of birth and death. That should be enough information to match entries to existing Wikidata items. Where no existing items exist, we can add new ones. ChristianKl (talk) 22:10, 21 November 2016 (UTC)

Country → Denmark[edit]

Pages using country (P17) for Denmark should point to the sovereign state Kingdom of Denmark (Q756617) rather than the constituent state Denmark (Q35). The constituent state can be kept as a qualifier by making it country (P17)  Kingdom of Denmark (Q756617) / located in the administrative territorial entity (P131)Denmark (Q35). Thank you! --Arctic.gnome (talk) 19:28, 23 November 2016 (UTC)
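
A sketch of a query to gauge the scope of such a run (the count of items currently using Denmark (Q35) as country):

SELECT (COUNT(DISTINCT ?item) AS ?count)
WHERE {
  ?item wdt:P17 wd:Q35 .
}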

I agree on the move Denmark (Q35)->Kingdom of Denmark (Q756617) but I don't think located in the administrative territorial entity (P131) should be added as qualifier. Geographical entities should have located in the administrative territorial entity (P131)=Denmark (Q35) or a more precise value as a statement, all other entities shouldn't be restricted to Denmark (Q35). --Pasleim (talk) 11:10, 27 November 2016 (UTC)
So should these edits be done on every page using country (P17) = Denmark (Q35), or just on geographic location (Q2221906) entities? Ocram89 (talk) 10:29, 29 November 2016 (UTC)
Valid question! I know too little about this to know whether P17:Kingdom of Denmark (Q756617) is a valid value for anything but places. The differences between EU–Denmark relations and EU–Greenland/Faroe Islands relations make me doubt that it is valid to use P17:Kingdom of Denmark (Q756617) for organisations and people. Also note that the parts of the Kingdom of Denmark are in some respects treated as sovereign states: they are allowed to speak for themselves in many international circumstances. -- Innocent bystander (talk) 20:19, 29 November 2016 (UTC)
@Ocram89, Innocent bystander: The country (P17) property isn't used on people. For governmental organizations and treaties we have applies to territorial jurisdiction (P1001) to distinguish whether it legally applies to the kingdom or the constituent country. But for everything else, the situation is trickier. We've been trying to restrict the country (P17) property to sovereign states (e.g., Scotland and French Guiana aren't allowed as values). But maybe Greenland and Denmark should be an exception to that rule? If so, I wonder if all usages of country (P17)Kingdom of Denmark (Q756617) should be replaced so that we don't have inconsistency about how places in Denmark-proper are categorized. --Arctic.gnome (talk) 17:23, 7 December 2016 (UTC)
When I talked about organisations, my main concern was non-governmental and private organisations. A Faroese fishing company is probably not as restricted by EU rules as a Jysk fishing company. -- Innocent bystander (talk) 17:37, 7 December 2016 (UTC)

Clean up aliases with registered trademark signs[edit]

I've just fixed an item with the alias "Celexa®". Please can someone check how widespread the use of "®" is; and - if significant - have a bot remove the symbol from aliases? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:05, 27 November 2016 (UTC)
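
A sketch of a query for such a check (it scans all aliases, so it may need to be restricted by language or class to avoid timeouts):

SELECT ?item ?alias
WHERE {
  ?item skos:altLabel ?alias .
  FILTER(CONTAINS(?alias, "®"))
}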

@Pigsonthewing: My query found 2000. Matěj Suchánek (talk) 11:40, 11 December 2016 (UTC)

City Data from hewiki[edit]

Hey. On hewiki we have a bot that takes data from the Israel Central Bureau of Statistics, which means that we have a lot of data about Israeli cities, villages, etc. I would really like for someone to take this data and put it in Wikidata.

You can look at Tel Aviv Yaffo to see an example.

Parameters:

|קואורדינטות=Coordinates
|מחוז=District
|ראש העירייה=Mayor
|גובה ממוצע=Average Height
|‎תאריך ייסוד=establishment date
|אוכלוסייה=Population
|צפיפות אוכלוסייה=Density
|תחום שיפוט=Area
|מדד ג'יני=Gini coefficient
|דירוג חברתי כלכלי=socio economic index
|Official website

Also, I would really like for you to take coordinates from other places in Israel, like שומריה (מוסד חינוכי)

Tagging @Edoderoo:

Thank you

--Mikey641 (talk) 19:56, 1 December 2016 (UTC)

@Mikey641: You can also do this yourself using this tool (except coordinates which require a bot). Matěj Suchánek (talk) 19:26, 17 December 2016 (UTC)
@Matěj Suchánek: Thank you!! I didn't know of this tool. I still can't add population (it is kind of complicated), but I will add the other parameters--Mikey641 (talk) 19:33, 17 December 2016 (UTC)

Adding French census data to Wikidata[edit]

Hi, (Info to @Zolo:, @VIGNERON:, @Oliv0:, @Snipre:)
I am looking for a contributor to load on wikidata the French census data by a bot from Excel tables that I would communicate. These tables contain the population data defined by the following properties: (INSEE municipality code (P374)), (population (P1082)) and uncertainty, but also by qualifiers characterizing these data: (point in time (P585)), (determination method (P459)) and (criterion used (P1013)) and two sources (that of the data themselves (INSEE) and that of the census calendar).

In France, the census has indeed (since 2006) been based on an annual collection of information, successively covering all the communal territories over a period of five years. Municipalities with fewer than 10,000 inhabitants carry out a census survey covering the entire population, one in five communes each year. Municipalities with a population of 10,000 or more carry out a sample survey of addresses representing 8% of their dwellings each year. Each year there are three types of population values:

  • real populations (Q39825)
  • populations estimated by interpolation (Q187631) or extrapolation (Q744069)
  • populations estimated by sampling (Q3490295)

It is therefore necessary to load these qualifiers into Wikidata in order to use the data correctly. Loading only the population data would be insufficient.

There will be one set of data per year since 2006 (i.e. 2006, 2007, 2008, etc., up to 2013). The data for 2007 are here: https://www.dropbox.com/s/i6vs5ug64ls4upt/WD%20Communes-Populations%202007.xls?dl=0
Is there a volunteer? Roland45 (talk) 13:04, 2 December 2016 (UTC)
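
For reference, a sketch of a query showing the target statement shape (population with the qualifiers described above) on communes that already have an INSEE code:

SELECT ?commune ?population ?date ?method
WHERE {
  ?commune wdt:P374 ?inseeCode ;
           p:P1082 ?statement .
  ?statement ps:P1082 ?population .
  OPTIONAL { ?statement pq:P585 ?date }
  OPTIONAL { ?statement pq:P459 ?method }
}
LIMIT 100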

Are the data under a CC0 license? --ValterVB (talk) 13:16, 2 December 2016 (UTC)
It is published by INSEE under "Open Licence": see former discussion, also here and here. Oliv0 (talk) 13:22, 2 December 2016 (UTC)
You mean this Open licence? I read on the wiki page that "Information released under the Open License may be re-used with attribution, such as a URL or other identification of the producer". In my opinion it isn't compatible with CC0 and personally I never add data with a license different from CC0. --ValterVB (talk) 13:36, 2 December 2016 (UTC)
@Oliv0, Roland45: Just read this to understand that the "Open Licence" is compatible with CC-BY and not with CC0. The author of the "Open Licence" itself defines this compatibility, so it is not possible to import the full dataset from INSEE in an automatic way. Snipre (talk) 17:04, 2 December 2016 (UTC)
Please read the links I gave: compatibility is clear, no need to discuss and hinder upload again. Oliv0 (talk) 17:08, 2 December 2016 (UTC)
@Oliv0: None of the links you gave is an official answer or comment from the author of the Open Licence, so unless one of the contributors you mentioned in your links takes responsibility for their comments and is ready to go before a tribunal to defend their position, this is just words in the wind. The link I provided states clearly that the author of the Open Licence, Etalab, a French commission, defines its licence as compatible with CC-BY. The text is not confusing: according to the Etalab mission, the Licence ouverte / Open licence "fits into an international context by being compatible with the standards of the Open Data licences developed abroad, notably those of the British government (Open Government Licence) as well as the other international standards (Open Database Commons-BY, Creative Commons-BY 2.0)". No mention of the US government licence or of CC0. So please provide an official comment if you want to continue defending your position. Snipre (talk) 17:18, 2 December 2016 (UTC)
Please read the arguments in the links I gave "si tu veux" (if you want) to see unnecessary problems, you will see that the unanimous analysis made by the contributors was indeed what I said, and the possible compatibility with CC-BY is quite a different topic. Oliv0 (talk) 17:22, 2 December 2016 (UTC)
What authority do your commenters have? People who mostly contribute under pseudonyms do not carry much weight against an official comment (see Etalab's description of the open licence on its own blog here, where once again a very clear link between the open licence and CC-BY is mentioned, and nothing about CC0). One can then quibble over whether the dataset is taken in full, over creative choice or not... but that is clearly playing at the limits of the licences. In the end, it is up to whoever imports the data to take the responsibility. Snipre (talk) 17:53, 2 December 2016 (UTC)
@Snipre: You mention: "it is not possible to import in an automatic way the full dataset from INSEE." But that is not the case. The Excel table that I propose is a reconstituted table that you will not find anywhere on the INSEE website. Each datum has its own URL (such as this one). If I loaded only one datum (one line of the table), would it not be eligible too? By that reasoning, most of the data on Wikidata would not be eligible, especially all the population data which are already online. Roland45 (talk) 17:38, 2 December 2016 (UTC)
To any bot operator who is interested in importing the data from INSEE under the Open Licence: the author of the Open Licence defines in its blog (see, in French, this page) the compatibility of the Open Licence with CC-BY and doesn't mention CC0. So importing data under the Open Licence into WD carries a high potential of not respecting the terms of the Open Licence. Until someone provides an official comment from Etalab, the author of the Open Licence, defining the compatibility of the Open Licence with the CC0 licence, your responsibility is engaged. Snipre (talk) 17:53, 2 December 2016 (UTC)
In my respectful view, all the discussions (in French and in English) that are linked to this topic say the same thing: it's OK to import. So I don't understand why there are still heated debates… Tubezlob (🙋) 18:05, 2 December 2016 (UTC)
The 2013 data have apparently already been loaded (but of course without the qualifiers mentioned above) with the following entries: imported from (P143) French Wikipedia (Q8447) or stated in (P248) INSEE (Q156616), without any additional precision. I do not know whether these imports were done manually or en masse, but is that better? Shouldn't they be deleted? Roland45 (talk) 18:10, 2 December 2016 (UTC)
@Roland45, ValterVB, Oliv0, Tubezlob, Snipre: « your responsibility is engaged » is true, but the same applies to every addition to Wikidata, including but not limited to every import from Wikimedia projects (which happens daily). And as stated in the CC0 legal code, « A Work made available under CC0 may be protected by copyright and related or neighboring rights ». @Roland45: do you know QuickStatements? I can explain to you how it works so that you can do the import yourself ;) Cdlt, VIGNERON (talk) 19:06, 4 December 2016 (UTC)
@Roland45, ValterVB, Oliv0, VIGNERON, Snipre: data.bnf.fr (Q20666306) is under the Open License (Q3238028), and a bot (KrBot) imported a lot of data about persons directly from data.bnf.fr. So it seems to be OK, no? Tubezlob (🙋) 14:17, 22 December 2016 (UTC)
@Tubezlob: it is « OK » to me. Cdlt, VIGNERON (talk) 14:29, 22 December 2016 (UTC)
@Tubezlob: Not for me. I think that the "Open licence", like the "Open Government Licence", falls under the "Creative Commons Attribution (CC-BY) licence", so it is incompatible with the "CC0 license". In a case of uncertainty like this I prefer not to intervene. --ValterVB (talk) 09:18, 23 December 2016 (UTC)
To explain better: I need a lawyer/licence expert who can confirm or deny whether the attribution must also be maintained by those who reuse the Wikidata data. In the first case we can't use the data; in the second we (probably) can. --ValterVB (talk) 09:25, 23 December 2016 (UTC)
@Tubezlob: People do crazy things; that doesn't prove they are doing good things. Snipre (talk) 14:56, 28 December 2016 (UTC)
@VIGNERON: Please explain to me where, in the following line from the body that authored the open licence, you read compatibility between the open licence and CC0: "A licence (the open licence) that fits into an international context by being compatible with the standards of the Open Data licences developed abroad, notably those of the British government (Open Government Licence) as well as the other international standards (ODC-BY, CC-BY 2.0)." That sentence comes from the official website of Etalab, which wrote the open licence for the government. In short, it speaks of CC-BY, but not of CC0. Where does one buy the special glasses needed to read CC0 on that page? Because I'd take a pair.
Database law is more complex than the law on isolated objects, but one thing is certain: a single, isolated datum is not protected; on the other hand, the entirety of a dataset extracted in a systematic way falls under the European database rights directive (see the Foundation's comment on the subject here, and in particular the sentence Extracting and using a insubstantial portion does not infringe, but the Directive also prohibits the "repeated and systematic extraction" of "insubstantial parts of the contents of the database".)
The moral: as long as we limit ourselves to extracting the data in an uncoordinated way and not systematically (roughly, several contributors working independently and with small quantities of data), we stay outside the scope of the directive; but as soon as the bot comes out, we change level and fall under the directive, which grants rights to the owner of the database. That is why the French government asked Etalab to provide a licence for the data of the French state, in order to ease the use of data that are protected by the European directive. This licence makes it possible to do away with the directive and with a formal authorisation, but under the conditions of the open licence, which defines itself as CC-BY-compatible. Those are the facts, from identifiable bodies competent in their field: the European Union, the Foundation's legal team, and the French state via Etalab. From there, everyone does as they please, but one cannot claim that it is correct to transfer data under the open licence to CC0, because 1) that appears nowhere in the official documents, and 2) it deliberately ignores the relationship drawn by the licence's author between the open licence and the CC-BY licence. And it is on this last point that I come back to responsibility, to this wilful disregard of the thinking behind the open licence. Snipre (talk) 14:56, 28 December 2016 (UTC)
@Snipre: you focus on theory and on the texts; I am speaking rather of practice and of the spirit of the texts. Since the creation of Wikidata, and every day, imports have been made from sources that are perhaps not formally compatible, starting with the automatic imports from Wikipedia, without anyone finding anything to object to. Besides, on the legal side: for the copyright aspect (but copyright applies to works; is a datum a work?), attribution is always inalienable in France (and in most countries of the world), so the difference between CC0 and CC-BY is legally almost nonexistent. Moreover, the references fulfil a role that seems to me sufficient from the point of view of attribution. As for the sui generis right specific to databases, here again the problem arises for all the data that are nonetheless imported daily into Wikidata, and in the present case it does not seem to me that we are taking a substantial part of the dataset. In short, we could split hairs for a long time yet, but the import is under way and I see no problem; in any case, if someone responsible should request removal, it will be easy to delete the data concerned. Cdlt, VIGNERON (talk) 18:11, 28 December 2016 (UTC)
Mention of compatibility with one licence does not exclude compatibility with another one; compatibility with CC0 has been discussed and proved many times in the links given above, no need to discuss it again. Oliv0 (talk) 06:29, 30 December 2016 (UTC)

Why is importing CC-BY data a problem? As long as you indicate the source of the statements in Wikidata, the attribution clause of the CC-BY license should be satisfied, right? − Pintoch (talk) 08:51, 20 January 2017 (UTC)

When releasing data into Wikidata, a user specifies that they have the right to release the data into the public domain (CC0). Reusers of Wikidata are supposed to be able to use parts of Wikidata without carrying along all the sources. ChristianKl (talk) 20:23, 16 February 2017 (UTC)

Add legislative term to German politicians[edit]

We would like to add additional information to German politicians. They already have the `P39` (position held) property with the value of the corresponding local parliament.

We want to add an additional qualifier to those, namely `P2937` (parliamentary term), with the value of the corresponding current term. An example of this can be found on Ernst-Ulrich Alda.

The list of politicians we would like to edit has been derived from scraping the parliament websites and comparing them to Wikidata items that already have the property (`Member of Landtag <state here>`).

The code can be found here: https://gist.github.com/k-nut/81fbbfb167ec6003b534379ccdaf33bb

--Knuthuehne (talk) 09:05, 4 December 2016 (UTC)
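
A sketch of a query for P39 statements that still lack the P2937 qualifier; ?position would then have to be filtered to the relevant Landtag membership items:

SELECT ?person ?position
WHERE {
  ?person p:P39 ?statement .
  ?statement ps:P39 ?position .
  FILTER NOT EXISTS { ?statement pq:P2937 [] }
}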

Can you add also some source? --ValterVB (talk) 09:10, 4 December 2016 (UTC)
@ValterVB do you mean adding the source property to the data? We could always add the respective website that lists the members of parliament as a source. --Knuthuehne (talk) 16:56, 6 December 2016 (UTC)
@Knuthuehne: Perfect :) --ValterVB (talk) 18:31, 6 December 2016 (UTC)

member of (P463) > P39 qualifier parliamentary term (P2937)[edit]

Following the discussion at Wikidata:Project_chat/Archive/2016/11#Parliamentary_terms, I think the request at Wikidata:Bot_requests/Archive/2016/08#member of (P463) > P39 qualifier parliamentary term (P2937) can now be done.

Summary:

Some time ago the qualifier P2937 was created. It makes it possible to link the parliamentary term from the statement in P39. Many items still have the information in P463. Sample change: [6].
Wikidata:WikiProject_British_Politicians has a list of corresponding P463 and P39 values.

@Andrew Gray: fyi.
--- Jura 09:23, 5 December 2016 (UTC)

Thanks. I got round to doing some work on a more detailed data model this weekend and have now posted it up at Wikidata:WikiProject British Politicians#Properties. @Oravrattas:, is this going to cause any problems for EveryPolitician if we move all the P463 "member of" terms into qualifiers? Andrew Gray (talk) 20:28, 5 December 2016 (UTC)
@Andrew Gray:, that should be fairly simple to adjust for, I think, so +1 from here. --Oravrattas (talk) 17:48, 7 December 2016 (UTC)
Looks like it's doable with quickstatements: [7].
--- Jura 21:02, 5 December 2016 (UTC)

People: remove disambiguation from Spanish labels[edit]

It might be worth doing a bot run to clean up Spanish labels: [8] [9] [10].
--- Jura 00:28, 9 December 2016 (UTC)
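
A sketch of a query for such labels (untested and likely to time out without further restrictions):

SELECT ?item ?label
WHERE {
  ?item wdt:P31 wd:Q5 ;
        rdfs:label ?label .
  FILTER(LANG(?label) = "es")
  FILTER(CONTAINS(?label, "("))
}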

Full URLs vs external IDs[edit]

We could do with a regularly-operating bot (say, daily or weekly) that spots the presence of full URLs in values for external-id type properties (perhaps by watching constraint reports), and removes the extraneous characters, based on the formatter URL, like in this edit. Can anyone oblige, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:58, 11 December 2016 (UTC)
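
A sketch of a generic query for such values, using the property type exposed by the query service (scanning all external-id statements will likely need a LIMIT):

SELECT ?item ?property ?value
WHERE {
  ?property wikibase:propertyType wikibase:ExternalId ;
            wikibase:directClaim ?claim .
  ?item ?claim ?value .
  FILTER(STRSTARTS(STR(?value), "http"))
}
LIMIT 100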

KrBot is doing this as far as I know. In this case it might have failed because the used URL did not match the formatter URL. @Ivan A. Krestinin: --Pasleim (talk) 18:14, 11 December 2016 (UTC)
✓ Done for VKontakte username (P3185). But my bot has a specific algorithm for each property, so another property or another error type would require additional code. — Ivan A. Krestinin (talk) 13:01, 12 December 2016 (UTC)

Automatic population of inverse properties[edit]

If property P527 (has part) is the inverse of P361 (part of), then why is it not populated automatically? If you add property P361 (part of) Qb to item Qa, shouldn't property P527 (has part) Qa automatically be added to Qb?

A good database design principle is to avoid duplication, as it leads to ambiguity. Would it not be better to have a dynamic, automatically populated property? Or to remove inverse properties and add an inverse view to an item?

I am new to Wikidata, so this may be a rookie question or a lack of understanding on my side. – The preceding unsigned comment was added by WvanZyl (talk • contribs) at 12. 12. 2016, 10:23‎ (UTC).
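
For what it's worth, a sketch of a query for one direction of the mismatch (P361 statements whose target lacks the inverse P527):

SELECT ?part ?whole
WHERE {
  ?part wdt:P361 ?whole .
  FILTER NOT EXISTS { ?whole wdt:P527 ?part }
}
LIMIT 100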

What you say makes sense, and there are certainly bots doing this. But what if the statement is wrong, or even vandalism? A bot that doesn't check this adds the statement to another item, which makes it more difficult to clean up afterwards. I have also observed that many has part (P527) statements are wrong, as users want to express an inverse of subclass of (P279). Matěj Suchánek (talk) 14:05, 12 December 2016 (UTC)

Unincorporated Communities[edit]

Hi there. I was wondering if a bot is able to fill in stuff. I'm looking at en:Category:Unincorporated communities in Missouri and I was hoping someone could quickly go to the Wikidata items and fill in "unincorporated community in Missouri" as the description. MechQuester (talk) 05:09, 19 December 2016 (UTC)

If you add statements to these items, eventually a bot will do that, or the description can be generated by autodescription from the statements.
--- Jura 07:10, 20 December 2016 (UTC)
✓ Done with descriptioner-tool --Pasleim (talk) 10:22, 20 December 2016 (UTC)
LOL wow you are amazing @Pasleim:, may I ask for the syntax of what you wrote? Except this time, it's townships in Missouri and other states? I kind of want to learn how to write some. 04:47, 24 December 2016 (UTC)
This would add the description "township in Missouri" to all items with instance of (P31)=township of Missouri (Q6270791). However, as you will see, there are many red lines, because there are many townships with the same label, e.g. 28 Washington Townships in Missouri. That means you need a more specific description. One possibility is to name the county in the description. You can get this with this request. --Pasleim (talk) 11:16, 24 December 2016 (UTC)

It looks like the labels of those townships could also do with cleanup; for example:

  • Hubble Township, Cape Girardeau County, Missouri -> Hubble Township
  • Linn Township, Audrain County -> Linn Township

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:32, 24 December 2016 (UTC)

@Pigsonthewing:, they do.... that's a lot of edits though.  – The preceding unsigned comment was added by MechQuester (talk • contribs) at 11:24, 26 December 2016‎ (UTC).
That's why we have bots. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:57, 26 December 2016 (UTC)

Google Knowledge Graph identifier (P2671) & Freebase ID (P646)[edit]

@Kolja21: @Srittau: Proposal to dynamically fetch (and validate) Google Knowledge Graph and Freebase IDs using Google's Knowledge Graph Search API, or equivalent. Google's Knowledge Graph Search API nearly always provides the URL of any Wikipedia article associated with each Google Knowledge Graph entity, along with the relevant Freebase ID (MID) or Google Knowledge Graph ID (KGID). It is then just a case of linking this to the relevant Wikidata entity. I understand there might be a database copyright issue here; however, in its December 2014 post announcing the deprecation of Freebase, Google pledged to support the Wikidata project moving forwards (https://plus.google.com/109936836907132434202/posts/bu3z2wVqcQc). In a quick Python test, a prototype of this (matching Wikidata IDs and MIDs/KGIDs) was successful. -- domswaine (talk) 18:41, 19 December 2016 (UTC)

Can you give a sample? Merely querying Google for the label of an item will bring false positives. BTW Maybe P646 could be a subproperty of P2671.
--- Jura 09:59, 27 December 2016 (UTC)

P625 for items with P776[edit]

Coordinates in items with Swedish minor urban area code (P776) have often been imported here very poorly. We now have a good source in no label (Q28048805) for many of these items. P776 can be found in column G in this spreadsheet, and coordinates in SWEREF 99 TM can be found in columns M and N.

My opinion is that the current P625 can be removed from these items and the coordinates from this source used instead. The coordinates can be rounded to the nearest second, and the data will be good enough for our use. These entities can be as small as 100x100 meters, so rounding them even further does not do the job.

Not all items that should have P776 have it yet. I am working on that right now. But I see no point in waiting the 2-3 years it will take until I have finished that job. (Yes, I think it has to be done by hand.) -- Innocent bystander (talk) 10:01, 22 December 2016 (UTC)
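
A sketch of a query for items with P776 that carry more than one P625 claim (a case that, as noted below, also needs cleaning up):

SELECT ?item (COUNT(?statement) AS ?claims)
WHERE {
  ?item wdt:P776 [] ;
        p:P625 ?statement .
}
GROUP BY ?item
HAVING(COUNT(?statement) > 1)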

@Innocent bystander: Is there some formula to convert SWEREF 99 TM coordinates to WGS84 coordinates? --Pasleim (talk) 12:34, 17 January 2017 (UTC)
@Pasleim: Well, there probably is somewhere, but User:RoMex has done the work here, if that is of some use? Also note that some of these items have two P625 claims. One is enough! -- Innocent bystander (talk) 12:49, 17 January 2017 (UTC)
There is such a script available, if that makes life easier than the svwiki page. A Google search for "SWEREF 99TM to wgs" has some other results. --Edgars2007 (talk) 16:10, 17 January 2017 (UTC)
Since I think we can round these numbers to the nearest second, to be more useful for our templates in Wikipedia, I think any flaws in such an algorithm are of very little concern. -- Innocent bystander (talk) 19:41, 17 January 2017 (UTC)

Need to retrieve large amount of data[edit]

1. For my project I need to download ALL entries of "administrative territorial entity" (Q56061) - 2.3 million of them - and later of "geographic region" (Q82794) - 2.9 million.

I have downloaded the full dump of Wikidata, but it's way too huge, and the processing speed and time turn out not to be very acceptable.

I want to retrieve all the necessary data using the API, and I wonder if there are limits on request rates for the API. A page about limits says to "make your requests in series rather than in parallel", but this would still lead to overload.

2. As a contribution back to Wikidata I'm going to update all those entries with links to OpenStreetMap and GeoNames (and maybe some more properties from these databases). The same question about API limits arises.

If you have a bot flag you can read 500 items per API request, but in this case the dump is probably the best choice: in a couple of hours (maybe less) you can read the whole dump. If you need only the Q-numbers, you can probably use the Query Service with "LIMIT" and "OFFSET". For writing, you can update only one item at a time; the best approach is to make several changes in one edit, but not every framework can do that. --ValterVB (talk) 19:46, 29 December 2016 (UTC)
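
A sketch of the LIMIT/OFFSET approach (note that without an ORDER BY the paging is not guaranteed to be stable between requests):

SELECT ?item
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q56061 .
}
LIMIT 100000
OFFSET 0
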
Scanning (just reading) the full dump took about 30 hours. To make online retrieval better, it would have to be about 30 ms per entry. That seems impossible, because a query to www.wikidata.org/wiki/Special:EntityData/Qxxx.json takes about 300 ms. Considering this, my request is quite irrelevant and could be deleted (but I do not see such an option). Qwiglydee (talk) 21:00, 29 December 2016 (UTC)
30 hours? There is some problem; I use the dump every week and the time is usually around 2 hours. I started a test 1 hour ago: I read and deserialize every row of the dump, and I have added an IF just to do something. When it finishes I will post the result here, or if I fall asleep, I'll do it tomorrow :) --ValterVB (talk) 21:11, 29 December 2016 (UTC)
Here it is: 1 hour and 48 minutes --ValterVB (talk) 21:49, 29 December 2016 (UTC)

duos without parts[edit]

All these items should have 2 additional items linked with "has part". The items should link back with "part of".

For many it should be possible to create correctly labelled items from this list.
--- Jura 11:17, 31 December 2016 (UTC)
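
A sketch of a query that avoids hard-coding the class, by matching items whose instance of (P31) value is labelled "duo" and that have no parts yet:

SELECT ?item ?class
WHERE {
  ?item wdt:P31 ?class .
  ?class rdfs:label "duo"@en .
  MINUS { ?item wdt:P527 [] }
}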

But label, instance of (P31): human (Q5), part of (P361) and perhaps sibling (P3373)/spouse (P26) are the only data that you can guess. Isn't it too little? (But it's an interesting one which I can imagine doing myself.) Matěj Suchánek (talk) 14:56, 23 January 2017 (UTC)
It depends where you come from. Starting from Wikidata:Database_reports/without_claims_by_site, this is much better. Doing even one manually is quite time-consuming.
Besides, it provides a way to find a WP article from the name of a person.
Obviously, once created, adding more information would be helpful too. If there is a VIAF identifier for the duo, frequently there is also one for each part.
--- Jura 10:15, 24 January 2017 (UTC)
@Jura1: So I looked into this today and it is fun. Feel free to suggest what could be better or ask for clarification how the bot works. Examples: Saints Cosmas and Damian (Q76486), no label (Q91789), Sergius and Bacchus (Q140013), Cocl & Seff (Q151489). Matěj Suchánek (talk) 20:41, 29 January 2017 (UTC)

River[edit]

Hi, can someone add descriptions to the rivers? For example en:Category:Rivers of Montana. MechQuester (talk) 06:55, 4 January 2017 (UTC)


and also Q19558910 MechQuester (talk) 19:58, 4 January 2017 (UTC)

you could try out Descriptioner --Pasleim (talk) 16:05, 24 January 2017 (UTC)

Bloomberg data about people and private companies[edit]

We now have the properties Bloomberg person ID (P3052) and Bloomberg private company ID (P3377). I consider the underlying data to be very valuable. There are many influential companies that don't have Wikipedia pages, and their board members don't have Wikipedia pages either. It would be great if the data made its way into Wikidata. ChristianKl (talk) 19:47, 4 January 2017 (UTC)

OpenStreetMap objects[edit]

(Pinging participants in the deletion discussion for OpenStreetMap Relation identifier (P402): Yurik, Jura1, MaxSem, Kolossos, Susanna Ånäs, Abbe98, Andy Mabbett, d1gggg, Jklamo, Denny, Nikki, Sabas88, Thierry Caro, Glglgl, Frankieroberto, VIGNERON, and Kozuch.)

This is (for now) a draft proposal, but I'd rather not put it in the deletion debate, where fewer people with bot experience would see it. I also did not realize this page existed on Wikidata and, for some reason, put a bot suggestion in the community portal a few weeks ago (no responses), which is why this is rather late (the deletion discussion began in November). Feel free to modify this, because I don't really know how this would work, but I wanted to make a request anyway because apparently no one has done it yet. Jc86035 (talk) 11:51, 17 January 2017 (UTC)


Could we have a bot which

  1. automatically pulls Wikidata item links from OSM objects' wikidata tags (but, if necessary for data quality, only if the item matches the Wikipedia article in the object's wikipedia tag), from the whole database initially and from new changesets thereafter;
  2. updates Wikidata items' OpenStreetMap Relation identifier (P402) (as well as properties for way and node tags, if created) from the initial dump and afterwards (human-approved if there's more than one object linked to the same Wikipedia article/Wikidata item ID);
  3. deletes the property value(s) from Wikidata items whenever a Wikidata ID is removed from an OSM object and not readded to another object (but, if necessary for data quality, only if the wikipedia tag is also removed without replacement; and only if manually approved for OSM users removing more than 10 of them within 24 hours); and
  4. makes a list of one-way links in the Wikidata → OSM direction and a list of OSM objects/Wikidata items where the links between items don't match each other?

In addition, would it be possible to use the same bot to automate this on OSM in the other direction as well? (I haven't notified the wiki or the OSM mailing lists or anything.)

Many thanks, Jc86035 (talk) 11:51, 17 January 2017 (UTC)

  • Symbol oppose vote.svg Oppose 1-3, for the reasons given in the deletion discussion; and ad nauseam. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:23, 17 January 2017 (UTC)
  • Symbol oppose vote.svg Oppose because OSM IDs are not stable and can change unexpectedly and without notice, they should not really be stored. This is also why it was suggested that the OSM relation property should be deleted. What would the use case for having a connection from Wikidata to OSM be in your case? I tried to address the discoverability issue by creating a userscript that displays a link to OSM on Wikidata... --Abbe98 (talk) 14:25, 17 January 2017 (UTC)
    • I don't have any sort of use case for this, but some people in the deletion discussion think this is a good idea and I thought I might as well make this. Jc86035 (talk) 14:50, 19 January 2017 (UTC)
  • Symbol support vote.svg Support the idea is good, but it needs some discussion and improvements (the import should respect the constraints of OpenStreetMap Relation identifier (P402), for example). @Abbe98: OSM IDs are pretty stable, pretty much as stable as Wikipedia article names, and when they change it is not really « unexpected and without notice » (I vaguely remember a tool, something like an API, for querying the changesets). As for the use case, I can see hundreds of thousands of cases that benefit from having all information in one single place; the first one being no need to use two different tools (using just Wikidata Query is - by definition - better than using Wikidata Query and Overpass Turbo). Plus, as there is some discussion against adding Wikidata IDs on OSM (more exactly, IIRC, reverting mass additions until a consensus is reached), and as the rules may be different on the two projects, it's more secure to store the data on our side too (with our rules, as we do for other databases). PS: Your tool seems interesting but I can't make it work (or more probably I did something wrong or I'm not looking for what should be expected); what is it supposed to do? (I'd like to test it on cases like [11] linking to Wilhelm Trute (Q15987301), or [12] and [13] both linking to Assen (Q798), examples taken from the OpenStreetMap Relation identifier (P402) constraint violations). PPS: @Jc86035: for the first point, you don't really need a bot; you could do it yourself with an Overpass request and QuickStatements (and some wit to check for inconsistencies). Cdlt, VIGNERON (talk) 15:12, 17 January 2017 (UTC)
    @VIGNERON:, I do disagree with you on the stability of OSM IDs: some are stable (per the examples given during the deletion discussion) but most are not, and just because you can query the changes, it's not a streamlined process. Most of the "hundreds of thousands" of use cases must have been forgotten during the deletion discussion. I can see many use cases too, but none where they are better than the alternatives. On the subject of my tool: it should add a link to the OSM element in the sidebar to the left (under tools). Please drop me a note on GitHub or on my talk page if the issue persists. --Abbe98 (talk) 15:29, 17 January 2017 (UTC)
    @Abbe98: you can of course disagree, but do you have any figures to support your view? Out of 96 French départements (Overpass request), 88 have kept the same ID since their creation, so ~91 % are stable. That is pretty stable to me (especially since départements are highly edited objects, so relations on more common objects are less likely to be unstable), of the same order as the stability of wiki sitelinks (and more stable than Wikidata items, though those are not 100 % stable either).
    Oh, my mistake, indeed, I wasn't looking at the right place. I will look into it (right now, I can spot one thing: when several OSM objects link to the same Wikidata item, your tool only shows one; see the Assen (Q798) example I gave earlier). It's a great tool for readers and visualisation, but it doesn't help editors, re-users, external manipulation or querying. Why not turn this tool into one that compares WD and OSM (via OpenStreetMap Relation identifier (P402)), warns if there are inconsistencies or problems, and suggests adding/correcting the relation?
    Cdlt, VIGNERON (talk) 16:04, 17 January 2017 (UTC)
    @VIGNERON: your Overpass query did not return any data for me (should it display anything when output as CSV?). I'm not sure départements are a good example, as they are not merged and deleted as often as most OSM elements (AFAIK). No, I have not put any effort into obtaining figures, as it's too easy to just analyze a non-diverse set of elements. Please see the deletion discussion here at Wikidata and T145284.
    Yes, it's a known limitation that it only links to one element (I'm not sure how to solve it UI-wise). I wonder how Kartographer deals with multiple Wikidata tags. The reason for not creating such a tool is that I believe OpenStreetMap Relation identifier (P402) should be deleted, and even if OSM identifiers were stable, OpenStreetMap Relation identifier (P402) would be very limited (it's only for relations). --Abbe98 (talk) 19:03, 17 January 2017 (UTC)
    @Abbe98: strange, the Overpass query should give a CSV with the French département relation IDs. It was just an example; if you have a better one, or even better something more general, feel free to share it. I've seen the Phabricator tickets and the two deletion requests (and even participated), and I'm still waiting to be convinced that there is a valid reason for deletion, and to see numbers about the instability (either OSM instability or Wikidata instability). Cdlt, VIGNERON (talk) 19:55, 17 January 2017 (UTC)
    @Abbe98: You might need to press the magnifying glass button in the map sidebar. If there isn't any data, you should see a grey bar at the top of the map saying something like "blank dataset received". Jc86035 (talk) 14:44, 19 January 2017 (UTC)
    I had my Overpass Turbo set to another Overpass instance which returned broken data. --Abbe98 (talk) 14:12, 21 January 2017 (UTC)
    Ah, okay then. Jc86035 (talk) 15:54, 21 January 2017 (UTC)
  • Oppose: it would need properties for nodes and ways (and, in the future, areas?) to describe the range of geometries a Wikidata object could be represented with. --Sabas88 (talk) 20:48, 17 January 2017 (UTC)
  • Support. I don't believe there is any serious instability problem with OSM that does not exist with almost any other external database. In any case, a bot dedicating time to maintaining the property is not going to hurt anyone, so there is certainly no grounded reason to oppose. Thierry Caro (talk) 10:11, 19 January 2017 (UTC)
    • @Thierry Caro: I guess one downside could be that since OSM data can (I think) be pulled directly into WMF projects through GeoJSON, and Wikimedia editors can use their accounts to add data to OSM, there's not really much point in maintaining two separate and redundant databases with a bot. Jc86035 (talk) 14:44, 19 January 2017 (UTC)

Lighthouses: import P625 from Commons

There are some 300 lighthouses with coordinates at Commons: https://petscan.wmflabs.org/?psid=677138

Somehow the PetScan option doesn't work for them. It would be good if these could be imported.
--- Jura 20:50, 17 January 2017 (UTC)

@Jura1: Both the import and the mentioned problem need attention, but the cause of the latter is not obvious to me. Matěj Suchánek (talk) 15:09, 22 January 2017 (UTC)
@Jura1: I can do it using pywikibot. --Mikey641 (talk) 16:41, 22 January 2017 (UTC)
@Jura1: OK, so I'm probably going to do it tomorrow: since this morning I've been transferring coordinates from hewiki to Wikidata, so after I'm done I'll transfer from Commons. --Mikey641 (talk) 18:29, 22 January 2017 (UTC)
This would be simple if their module added coordinates to the page info for categories as well (it does so only for files). That's why it doesn't work in PetScan. Matěj Suchánek (talk) 13:28, 2 February 2017 (UTC)
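
Since the coordinates are not exposed in the page info for categories, a bot would have to read them from the category wikitext itself. A rough pywikibot sketch (untested; it assumes the coordinates sit in a plain {{Object location|lat|lon}} call, which is only one of several formats used on Commons):

import mwparserfromhell
import pywikibot

commons = pywikibot.Site("commons", "commons")
repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def object_location(page):
    """Return (lat, lon) from a simple {{Object location|lat|lon}} call, else None."""
    for tpl in mwparserfromhell.parse(page.text).filter_templates():
        if tpl.name.matches("Object location") and tpl.has("1") and tpl.has("2"):
            try:
                return float(str(tpl.get("1").value)), float(str(tpl.get("2").value))
            except ValueError:
                return None
    return None

titles = []  # fill with the ~300 category titles from the PetScan query (psid=677138)
for title in titles:
    page = pywikibot.Page(commons, title)
    coords = object_location(page)
    item = pywikibot.ItemPage.fromPage(page)  # the item the category is linked to
    item.get()
    if coords and "P625" not in item.claims:
        claim = pywikibot.Claim(repo, "P625")
        claim.setTarget(pywikibot.Coordinate(lat=coords[0], lon=coords[1],
                                             precision=0.0001, site=repo))
        item.addClaim(claim, summary="import coordinates from Commons category")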

User script to notify participants in a property proposal

Normally, when a property proposal is closed, the participants should be informed with a "ping"; however, this is done manually and takes a lot of effort for long discussions. Would it be possible to create a script that would scrape all the user names in a property proposal page and format the resulting list as a ping (in groups of 5, because I think that is the limit)? --Micru (talk) 20:27, 19 January 2017 (UTC)
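
Something along these lines might work as a starting point (Python, untested; the signature regex is naive, and the grouping limit of 5 is taken from the request above and may actually be higher):

import re
import requests

API = "https://www.wikidata.org/w/api.php"

def participants(page_title):
    """Collect distinct user names linked from signatures on the page."""
    r = requests.get(API, params={"action": "parse", "page": page_title,
                                  "prop": "wikitext", "format": "json",
                                  "formatversion": 2})
    r.raise_for_status()
    text = r.json()["parse"]["wikitext"]
    names, seen = [], set()
    for name in re.findall(r"\[\[[Uu]ser(?: talk)?:([^|\]/#]+)", text):
        name = name.strip()
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names

def ping_wikitext(names, group=5):
    """Format the names as {{Ping}} calls, five users per template."""
    return "\n".join("{{Ping|" + "|".join(names[i:i + group]) + "}}"
                     for i in range(0, len(names), group))

# Usage: print(ping_wikitext(participants("Wikidata:Property proposal/...")))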

This sounds like a good idea. Work that can be done by bots should be done by bots ;) ChristianKl (talk) 10:46, 23 January 2017 (UTC)
See phab:T139898.--GZWDer (talk) 17:23, 24 January 2017 (UTC)

Natural RU names

Per this discussion, we need to run a bot that would convert all Russian labels on items about people into the "natural" form "[First_name] [Patronymic_or_Middle_name] [Last_name]" instead of "Last, First Patronymic". The existing form of the name should be moved to the "Also known as" column. I suspect that the Kyrgyz and Ukrainian languages would also benefit from this. --Yurik (talk) 03:14, 22 January 2017 (UTC)
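
The string transformation itself is simple. A sketch, assuming the inverted form always contains exactly one comma and leaving everything else for manual review:

def normalize(label):
    """'Пушкин, Александр Сергеевич' -> 'Александр Сергеевич Пушкин'."""
    if label.count(",") != 1:
        return None  # not the inverted form, or ambiguous; skip
    last, rest = (part.strip() for part in label.split(","))
    return rest + " " + last if last and rest else None

# Applying it with pywikibot (untested) while keeping the old form as an alias:
# old = item.labels.get("ru", "")
# new = normalize(old)
# if new:
#     item.editEntity({"labels": {"ru": new},
#                      "aliases": {"ru": item.aliases.get("ru", []) + [old]}},
#                     summary="normalize Russian label to natural name order")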

I am working on this. There are some questions which I'll discuss at the forum. --Infovarius (talk) 20:43, 22 January 2017 (UTC)
  • For people with name in native language (P1559) in Russian, it might be worth creating new items for given names and family names and adding them at the same time.
    --- Jura 17:45, 26 January 2017 (UTC)

Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)

  • Source: Philippine Statistics Authority
  • Link: https://archive.org/download/PhilippinesCensusofPopulationLGUs19032007 (As FOI request, Philippines public domain)
  • Description: Census of Population data of Philippines Cities, Municipalities, Provinces and Regions (1903-2007)
  • Structure: Population (P1082)
  • Covers all
    • municipality of the Philippines (Q24764)
    • city of the Philippines (Q104157)
    • province of the Philippines (Q24746)
    • region of the Philippines (Q24698)
  • Example item
    • Municipality : Dasol (Q41917)
    • City : Urdaneta (Q43168)
    • Province : Pangasinan (Q13871)
    • Region: Ilocos Region (Q12933)

A major upload is needed; it is hard to do manually. 2010 and 2015 data are already present. --Exec8 (talk) 05:02, 28 January 2017 (UTC)
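
The upload itself is mechanical once the spreadsheet is parsed. An untested pywikibot sketch, adding one population (P1082) claim per census year with a point in time (P585) qualifier (the row below holds made-up values for illustration; the real rows would come from the archive.org file):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

rows = [("Q41917", 1903, 8000)]  # (item, census year, population); illustrative values only

for qid, year, population in rows:
    item = pywikibot.ItemPage(repo, qid)
    claim = pywikibot.Claim(repo, "P1082")
    claim.setTarget(pywikibot.WbQuantity(amount=population, site=repo))
    item.addClaim(claim, summary="import census of population data")
    qualifier = pywikibot.Claim(repo, "P585")
    qualifier.setTarget(pywikibot.WbTime(year=year, precision="year"))
    claim.addQualifier(qualifier)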

Articles from Norwegian wiki added to wrong items

A bot of the inactive user:Emaus did some wrong edits like this one, adding the sitelink no:Neoclitopa to Neoclitopa nitidipennis (Q14869209) when it should have been added to Neoclitopa (Q18115528). I collected a bunch of them by hand, but there are too many of them. We need to identify links like that and move them to the proper items. I will try to write a query to identify them, but can someone help me with moving them? --Jarekt (talk) 13:20, 15 February 2017 (UTC)

SELECT ?item ?pItem ?taxon ?parentTaxon ?sitelink WHERE {
    ?item  wdt:P171 ?pItem .          # has parent item
    ?item  wdt:P225 ?taxon .          # taxon name
    ?item  wdt:P105 ?rank .           # taxon rank
    ?pItem wdt:P225 ?parentTaxon .    # parent taxon name
    VALUES ?rank {wd:Q7432 }          # restrict rank to species only at this moment
    ?sitelink schema:about ?item .
    FILTER(STRSTARTS(STR(?sitelink), "https://no.wikipedia.org/wiki/"))
    FILTER(STRENDS(STR(?sitelink), ENCODE_FOR_URI(?parentTaxon))) # norwegian article name matches parent taxon
    #MINUS{ ?item wdt:P225 ?parentTaxon . }
} LIMIT 100
Try it!
Here is an example of a query with some of the problem sitelinks. --Jarekt (talk) 13:45, 15 February 2017 (UTC)
Any Norwegian speakers able to verify that those are bad sitelinks? --Jarekt (talk) 13:51, 15 February 2017 (UTC)
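
Once the pairs are verified, moving the sitelinks could be scripted. An untested pywikibot sketch ("pairs" would be filled from the query results above):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()
pairs = []  # (wrong item QID, correct item QID) tuples from the SPARQL results

for wrong_qid, correct_qid in pairs:
    wrong = pywikibot.ItemPage(repo, wrong_qid)
    correct = pywikibot.ItemPage(repo, correct_qid)
    wrong.get()
    correct.get()
    # Only move when the target item does not already have a nowiki sitelink.
    if "nowiki" in wrong.sitelinks and "nowiki" not in correct.sitelinks:
        title = wrong.getSitelink("nowiki")
        wrong.removeSitelink("nowiki",
                             summary="sitelink is about the genus, not the species")
        correct.setSitelink({"site": "nowiki", "title": title},
                            summary="move sitelink from the species item")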

I can try to check this in a day or two. How many errors might there be? Can it be repaired manually? (I guess it must be; a bot cannot sort this out?) Dan Koehl (talk) 21:04, 17 February 2017 (UTC)

Please note Wikidata_talk:WikiProject_Taxonomy#Many_bad_sitelinks_to_Norwegian_Wikipedia. --Succu (talk) 21:07, 17 February 2017 (UTC)

Cycle sport events: move claims from length (P2043) to event distance (P3157) and remove unreferenced bounds from values, if present

I request a bot run to do the following:

  • In items which have ?item wdt:P31/wdt:P279* wd:Q13406554 (instances of subclasses of sport competition (Q13406554), so basically sports competition items), replace all length (P2043) claims with event distance (P3157) claims. Quantity amounts and units, as well as existing qualifiers and references, should be kept.
  • However, plenty of claims still have bounds as a leftover from the time when we were not able to use quantities without bounds. I therefore request the removal of all “±0” bounds where no reference is given in the P2043 statement. It might be worth considering removing all “±[0\.]*1” bounds as well, but I am not fully sure about that (it could otherwise be repaired manually). A rough sketch of the move follows this list.
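
A rough sketch of the core of the move (pywikibot, untested; qualifier and reference transfer is omitted here for brevity, but the real run must keep them as stated above):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def without_zero_bounds(quantity):
    """Return the quantity with a ±0 bound stripped, keeping amount and unit."""
    if (quantity.upperBound == quantity.amount
            and quantity.lowerBound == quantity.amount):
        return pywikibot.WbQuantity(amount=quantity.amount,
                                    unit=quantity.unit, site=repo)
    return quantity

def move(item):
    item.get()
    for old in list(item.claims.get("P2043", [])):
        target = old.getTarget()
        if not old.sources:  # only unreferenced statements lose their bounds
            target = without_zero_bounds(target)
        new = pywikibot.Claim(repo, "P3157")
        new.setTarget(target)
        item.addClaim(new, summary="length (P2043) -> event distance (P3157)")
        item.removeClaims([old], summary="moved to event distance (P3157)")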

This bot run will affect in total around 2634 claims:

SELECT ?item ?itemLabel ?length ?upperBound ?lowerBound ?diff {
  ?item p:P2043 [ psv:P2043 ?value ] . # items that use P2043 (length)
  ?value wikibase:quantityAmount ?length .
  OPTIONAL {
    ?value wikibase:quantityUpperBound ?upperBound; wikibase:quantityLowerBound ?lowerBound .
    BIND(?upperBound - ?lowerBound AS ?diff) .
  }
  ?item wdt:P31/wdt:P279* wd:Q13406554 . # and have P31 with subclass of sport competition (Q13406554)
#  MINUS { # activate this to filter away items that are related to cycle sport
#    VALUES ?cyclingClasses { wd:Q15091377 wd:Q18131152 }
#    ?item wdt:P31/wdt:P279* ?cyclingClasses . # but not P31 with subclass of cycling race (Q15091377) or stage (Q18131152)
#  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en' }
}

Try it! There is a commented part in the SPARQL query which shows the types of sports affected by this bot request (the MINUS section). In fact, in the field of sports events, length (P2043) is used exclusively by cycle sport events (defined by the types cycling race (Q15091377) and stage (Q18131152)). Our cycle sport project members were early adopters of the “quantity with units” properties, including length (P2043). I have therefore already talked to the maintainers of Module:Cycling race, which heavily uses length (P2043), at Module talk:Cycling race#event distance (P3157) instead of length (P2043). They support a move to the event-specific event distance (P3157) and have already modified their module to support both properties. Via {{ExternalUse}} on Property talk:P2043 we also identified a frwikinews module which needs to be migrated as well, but to my knowledge this is not a complicated task.

In general, event distance (P3157) has some advantages over length (P2043) for events. First of all, racing sports events are not physical objects that have “length” as a physical dimension; what one wants to express in these cases is the distance along a path which the participants have or had to cover during the competition. Secondly, sports events often use rather unphysical distance units such as lap (Q26484625), whose use is better reflected by the event distance property. It is therefore useful to gather all event distance information in one property.

Ping involved users @Molarus, Jérémy-Günther-Heinz Jähnick. Feel free to ping more editors, if necessary.

Thanks, —MisterSynergy (talk) 07:39, 20 February 2017 (UTC)

I heard that @Zolo might be interested as well, due to P2043 use in this context in fr:Module:Infobox/Descriptif course cycliste. This is unfortunately not registered by the {{ExternalUse}} template on Property talk:P2043. —MisterSynergy (talk) 11:53, 20 February 2017 (UTC)
I have searched most wikis for templates using "P2043" and found it:Modulo:Ciclismo. I have not found a cycling template that reads P2043 data via Module:Wikidata (but I did find a railway template on enWP that uses P2043). We have to edit those modules as soon as possible after moving the data to the new property. I hope the Lua modules are coded well and don't break.
Wikinews n:fr:Module:Cycling race and the wikis that use our Module:Cycling race will need a new version of this module. I can do this except for esWiki, because I can't edit there. --Molarus 12:51, 20 February 2017 (UTC)