Wikidata:Bot requests

Bot requests
If you have a bot request, create a new section here and state exactly what you want. You should discuss your request first and wait for the community's decision. Please link to any previous discussion. If you want to request sitelink moves, see the list of delinkers. For bot-flag requests, see Wikidata:Requests for permissions.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2016/06.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 2 days.

Population of French communes[edit]

Can any bot migrate data about the population of French communes from fr.wiki to Wikidata? The data for a particular commune is in the template Modèle:Données/X/évolution population, where X is the name of the commune. Mati7 (talk) 18:30, 20 July 2015 (UTC)

I actually came here today to request a bot to import information from fr-WP, including population. It is better to get the information from a reliable source. The French government agency responsible for censuses (INSEE) has published census information since 1962 in XLS format. The only restriction on reuse (in French) is to acknowledge the source, which Wikidata would do anyway in the form of a reference. Spreadsheets with population data from the 2012 census can be found on this page. In addition to population, the spreadsheets contain additional data which I think can easily be added to Wikidata, even if it is not very useful.

For reference, the hierarchy of administrative districts in France is:

  1. région
  2. département
  3. arrondissement
  4. canton - this is primarily a district for elections (i.e. an electoral district)
  5. commune

Communes can be split between multiple cantons, and cantons can span multiple arrondissements, but the other levels cannot be split (e.g. a commune cannot belong to multiple arrondissements). There are also "associated communes" (see the English Wikipedia article Associated communes of France), which are recognized districts within communes.

INSEE codes[edit]

Every administrative district in France has an INSEE code. The INSEE code is also used for many other purposes where a district code is needed. The INSEE code for départements is widely used, such as on vehicle license plates and in the names of websites, even when not necessary; for example, the website of en:w:Haut-Rhin (Q12722) contains numerous subpages with titles that incorporate its INSEE code (68). Since the population spreadsheets contain the codes for all administrative districts, the codes should be added while adding the populations.

I believe all communes already have the INSEE municipality code (Property:P374). However, the few that I viewed have the Dutch Wikipedia as the source, so the commune INSEE codes should be checked against the INSEE codes in the population spreadsheets, and the reference then changed to INSEE (the population spreadsheet). A property should be created for the "INSEE department code" for departments (French: départements). I don't know if it's necessary to create a property for every administrative level, but the other levels should have "INSEE code" (Q156705) added (the linked INSEE municipality code is only for communes).

The INSEE codes for arrondissements, cantons, & communes all begin with the two-digit department code. The master file in the next section contains the codes for arrondissements, cantons, & communes without the department prefix. For example, the INSEE code for Colmar in the Haut-Rhin department is 68066 (68 is the INSEE code for the Haut-Rhin department), but in the master file there are separate columns for the department (which has 68) and for the commune (066). The first column in the spreadsheet in the "Older populations" section contains the complete INSEE code.
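To make the concatenation concrete, here is a minimal sketch (the function name and zero-padding are mine; overseas departments, which use a three-digit prefix and two-digit suffix, are deliberately left out):

def full_insee_code(dept_code: str, commune_suffix: str) -> str:
    """Build the 5-character INSEE commune code for metropolitan France.

    Overseas departments use a 3-digit department prefix and a 2-digit
    commune suffix instead, so they would need separate handling.
    """
    return dept_code.zfill(2) + commune_suffix.zfill(3)

# Colmar in the Haut-Rhin department: 68 + 066 -> 68066
assert full_insee_code("68", "66") == "68066"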

2012 Population[edit]

There are two population values:

  • Population municipale is the number of people who have their usual residence in the district, including people in penitentiaries, homeless people present in the commune at the time of the census, and people in mobile homes.
  • Population totale includes the population municipale plus people residing in the district who usually have a home elsewhere (e.g. students living away from their usual home, people without a fixed residence).

The master file for the whole of France is here, using data from the 2012 census (reference date: 1 January 2012). It is produced and published by INSEE (Q156616). It contains 9 sheets:

  1. Regions
  2. Départements
  3. Arrondissements
  4. Cantons - ignore (Canton boundaries were adjusted in 2015 so this is no longer relevant)
  5. Communes
  6. Fractions cantonales - ignore (for communes that are divided between multiple cantons or for multiple cantons in one commune, this lists the population that lies in each canton. However, canton boundaries were adjusted in 2015 so this information is no longer relevant)
  7. Communes associées ou déléguées - "associated communes" (explained above); some may not have a Wikidata item
  8. Collectivités d'outre-mer - populations of communes in overseas territories (collectivities). Unlike the rest of France, the entire area of an overseas territory is not divided into communes.
  9. Documentation

New boundaries were created for cantons effective in 2015. A spreadsheet with the 2012 population of the cantons based on the 2015 boundaries can be found here.
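For illustration, reading just the useful sheets could look like the following sketch (it assumes the xlrd library and that the sheet names match the list above; the real header layout must be checked by hand before writing anything to Wikidata):

# Sketch: read the relevant sheets of the INSEE master file (ensemble.xls).
import xlrd

SHEETS_TO_IMPORT = [
    "Regions", "Départements", "Arrondissements", "Communes",
    "Communes associées ou déléguées", "Collectivités d'outre-mer",
]  # the canton and documentation sheets are skipped, per the notes above

book = xlrd.open_workbook("ensemble.xls")
for name in SHEETS_TO_IMPORT:
    if name not in book.sheet_names():
        continue  # the exact sheet names are an assumption
    sheet = book.sheet_by_name(name)
    for row_idx in range(1, sheet.nrows):  # assume a single header row
        row = sheet.row_values(row_idx)
        print(name, row[:3])  # map columns to codes/populations per sheet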

Older populations[edit]

I think it is most important to add the most recent population (2012 census). The populations from 1962, 1968, 1975, 1982, 1990, 1999, 2007, & 2012 for each commune are contained in this spreadsheet. It is produced by INSEE. It has three sheets:

  1. Métropole - European France
  2. DOM - Overseas departments, which have the same status as departments in the Métropole (e.g. like Hawaii is a US state with the same status as a state in the continental US). Note that the first three censuses there were in 1961, 1967, & 1974.
  3. Arm (populations in the arrondissements of Paris)

Discussion[edit]

Please add comments below, not in the above text. If I do not respond to a comment for a few days, please leave a message on my English Wikipedia talk page en:w:User talk:AHeneen. AHeneen (talk) 07:33, 24 July 2015 (UTC)

Are we allowed to publish the census data under CC0 (Q6938433)? --Pasleim (talk) 15:33, 24 July 2015 (UTC)
The only restriction on reuse is to mention the source (similar to CC-BY). AHeneen (talk) 05:06, 25 July 2015 (UTC)

Hello. Before anyone starts the job, let me add some clarifications (I am the one who updates the census data on fr.wikipedia). You should know that, unlike most countries where censuses cover the entire territory periodically, in France since 2004 a legal population is produced for each municipality every year, but the census type varies by municipality:

  • Municipalities with fewer than 10,000 inhabitants are counted every 5 years by a complete census
  • For those with more than 10,000 inhabitants, a sample of the population is counted every year. The annual collection covers a sample of addresses drawn randomly, representing about 8% of the population.

Then every year, there are three types of data:

  • Complete census: municipalities under 10,000 inhabitants that are the subject of a real census that year
  • Estimated: municipalities under 10,000 inhabitants that aren't the subject of a real census that year
  • Sampled: municipalities with more than 10,000 inhabitants

On the French Wikipedia, the choice was made to display in graphs and tables only the data corresponding to actual censuses, plus the data for towns of over 10,000 inhabitants. In Wikidata, it is therefore essential to have a qualifier characterizing the census type.

Unless I am mistaken, Wikidata currently has "census" (Q39825); alongside it, the following qualifier values should be added:

  • complete census
  • estimate census
  • sample census

Without this information, graphs and tables on the French Wikipedia will never use Wikidata data. I can give this qualifier for each municipality and for each year (before 2006, all censuses are actual ones), but I want to see these new qualifiers first, to be sure that we speak the same language. Roland45 (talk) 05:53, 21 October 2015 (UTC)

@Roland45: Do you mean that, as currently done for instance in Urt (Q842706) and Arles (Q48292), population (P1082) would be used with the qualifier determination method (P459) and an appropriate value? This value could be:
  • before 2004, or after 2004 under pop. 10,000 every 5 years, the real (full) census: census (Q39825)
  • after 2004 under pop. 10,000 the rest of the time, the estimate (from previous years?) without a census taken that year: estimation (Q791801)
  • after 2004 over pop. 10,000, an estimate (for the full population if I understand correctly?) from a restricted sample: maybe sample (Q49906)?
Oliv0 (talk) 12:35, 21 October 2015 (UTC)
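If this mapping is adopted, the qualifier can be derived mechanically from the commune size and its position in the five-year collection cycle. A sketch (QIDs as proposed above; the cycle arithmetic is an assumption to be verified against INSEE's calendar files):

CENSUS = "Q39825"       # real (full) census
ESTIMATION = "Q791801"  # estimate in a non-collection year
SAMPLE = "Q49906"       # rolling sample survey

def determination_method(year, population, collection_year):
    """Return the QID to use as the P459 qualifier of a P1082 statement."""
    if year < 2004:
        return CENSUS               # old-style censuses covered everyone
    if population >= 10000:
        return SAMPLE               # ~8% of addresses sampled every year
    # Under 10,000: a full census every 5 years, estimates in between.
    # Python's % is non-negative here, so years before the collection
    # year are handled correctly.
    return CENSUS if (year - collection_year) % 5 == 0 else ESTIMATION

# Urt (collection years 2012, 2017, ...): 2012 is a census, 2011 an estimate
assert determination_method(2012, 2220, 2012) == CENSUS
assert determination_method(2011, 2208, 2012) == ESTIMATION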
That's exactly it. To know the type of census for each municipality, simply look at the calendar.
You can see:
Urt is a municipality under 10,000 inhabitants - collection years: 2017 (and 2012, 2022, 2027, etc.)
Arles is a municipality above 10,000 inhabitants - collection: every year, by sampling
And then we have the following data:
Year   Urt (Q842706)      Arles (Q48292)
       Value   Type       Value   Type
1999    1702   census     50467   census
2006    1988   estim.     51970   sample
2007    2028   census     52197   sample
2008    2092   estim.     52729   sample
2009    2183   estim.     52979   sample
2010    2195   estim.     52661   sample
2011    2208   estim.     52510   sample
2012    2220   census     52439   sample

2006 is the first year of publication under the new method. Note that even if the type of census can differ from year to year, all values are legal populations. For each commune and each year, I can give data from 1999 to 2012 and the corresponding type of census (by crossing this spreadsheet, this one and this other one). Roland45 (talk) 16:53, 21 October 2015 (UTC)

Good to know the sources, here with their URLs; does each value use all three of them? Oliv0 (talk) 19:01, 21 October 2015 (UTC)
Sources are:

  • 1962-1999 - Population sans doubles comptes années 1962, 1968, 1975, 1982, 1990, 1999. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2014/zip/HIST_POP_COM_RP12.zip
  • 2006 - Populations légales des communes en vigueur au 1er janvier 2009 - Date de référence statistique : 1er janvier 2006 - limites territoriales en vigueur au 1er janvier 2008. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2008/xls/ensemble.xls
  • 2007 - Populations légales des communes en vigueur au 1er janvier 2010 - Date de référence statistique : 1er janvier 2007 - limites territoriales en vigueur au 1er janvier 2009. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2009/xls/ensemble.xls
  • 2008 - Populations légales des communes en vigueur au 1er janvier 2011 - Date de référence statistique : 1er janvier 2008 - limites territoriales en vigueur au 1er janvier 2010. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2010/xls/ensemble.xls
  • 2009 - Populations légales des communes en vigueur au 1er janvier 2012 - Date de référence statistique : 1er janvier 2009 - limites territoriales en vigueur au 1er janvier 2011. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2011/xls/ensemble.xls
  • 2010 - Populations légales des communes en vigueur au 1er janvier 2013 - Date de référence statistique : 1er janvier 2010 - limites territoriales en vigueur au 1er janvier 2012. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2012/xls/ensemble.xls
  • 2011 - Populations légales des communes en vigueur au 1er janvier 2014 - Date de référence statistique : 1er janvier 2011 - limites territoriales en vigueur au 1er janvier 2013. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2013/xls/ensemble.xls - Calendar: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/resultats/doc/annee-collecte-2015-commune.xls
  • 2012 - Populations légales des communes en vigueur au 1er janvier 2015 - Date de référence statistique : 1er janvier 2012 - limites territoriales en vigueur au 1er janvier 2014. Data: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/populations-legales/pages2014/xls/ensemble.xls - Calendar: http://www.insee.fr/fr/ppp/bases-de-donnees/recensement/resultats/doc/annee-collecte-2016-commune.xls

From 1962 to 1999 there is no specific calendar, because the whole census was done in a single year (1962, 1968, etc.). For 2006 to 2010, I no longer have the source of the calendar. In fact, the qualifier is easy to deduce from the other sources; the only difference can come from municipalities whose population crosses the 10,000-inhabitant threshold. Roland45 (talk) 05:10, 22 October 2015 (UTC)

So the "calendar" means the table showing (like in the smaller Urt/Arles table above) which year in the 5-year cycle a given municipality has a "census" and not an "estimation". Oliv0 (talk) 06:29, 22 October 2015 (UTC)
That's right. In fact, with these sources you can upload the data and qualifiers, proceeding year by year. The problem you may find is with the municipalities that disappeared between 2006 and 2012: you won't have the qualifier for these municipalities. Roland45 (talk) 11:31, 22 October 2015 (UTC)

License[edit]

I would like to add, in case this was felt to be a problem here, that as Roland45 said last autumn on his French talk page, INSEE population data such as the 2010 data is published as "Open Data" on data.gouv.fr under the "Open Licence", which only requires attributing the data to the "name of the Producer" (here INSEE), meaning on Wikidata a reference that mentions INSEE (Q156616). Oliv0 (talk) 07:40, 23 January 2016 (UTC)

Thus, it seems to be incompatible with the CC-0 license. - Bzh-99 (talk) 10:27, 26 February 2016 (UTC)
Why? WD:CC-0 does not forbid us to do in the reference field the only thing asked for by "Open Licence": "acknowledging the source (at least by the name of the Producer)" = INSEE. Oliv0 (talk) 11:37, 26 February 2016 (UTC)
I see how it can seem incompatible to some people, but I'm pretty sure it's compatible (and if it were incompatible, we would have to delete all information about France on Wikidata, since all the data - even trivial data like the names of the cities - is in the INSEE database). Regardless of the Open Licence and its requirements, the source should always be indicated per Wikidata rules. Cdlt, VIGNERON (talk) 11:56, 26 February 2016 (UTC)
Since we are among francophones, I'll switch to French.
The Open Licence requires citing the author's name. If I'm not mistaken, that is not required by CC-0. - Bzh-99 (talk) 17:33, 26 February 2016 (UTC)
The Licence Ouverte (here in French) does not require using the same licence on derivatives, so we only have to indicate INSEE and the date of the update as requested, and Wikidata's users then do what they want, as I understand it; but it would be good to have confirmation from people who know this field. Is there no equivalent of c:COM:VPC here? Oliv0 (talk) 18:51, 26 February 2016 (UTC)
No, reuse always implies mentioning the name, whether on WD or elsewhere. Whoever takes the data from WD must, under the Licence Ouverte, also respect this condition. The reuse condition applies to all reuses. Keeping the same licence is not required, but you cannot use that as a pretext to drop one of the few conditions set by the original licence. Snipre (talk) 22:06, 2 March 2016 (UTC)
No: whoever takes the data from WD does not see it under the Licence Ouverte but under CC-0. Only the WD bot has to put INSEE in the reference to satisfy the Licence Ouverte's condition, and it has the right to change the licence to CC-0. But is there no specialised forum here for these legal questions? Oliv0 (talk) 20:26, 3 March 2016 (UTC)
@Oliv0: Please read the licence: the Licence Ouverte says that it is compatible with a CC-BY licence. A CC-BY licence is not a CC0 licence and implies that every user and reuser must credit the author of the data. You cannot reduce the author's rights; on the other hand, you can modify the data or put it under a more restrictive licence, for example CC-BY-NC. Snipre (talk) 15:48, 11 March 2016 (UTC)
@Snipre: Read it yourself: "compatible" does not mean that this licence requires the conditions of CC-BY (the required conditions are different), but that the two can be used together without incompatibility (no contradiction between the required conditions), just as GFDL is compatible with CC-BY-SA. Under the licence, only the defined "reuser" (here Wikidata) must mention the producer's name and the date of the update (or a link to the source); it is not required to keep this licence (unlike CC-SA) nor to make it more restrictive. It is a licence designed to make use as easy as possible, not to create difficulties. Oliv0 (talk) 16:11, 11 March 2016 (UTC)
Not on the question of crediting the author, where CC-BY and the Licence Ouverte make the same demand:
* Licence Ouverte: "Mention the paternity of the 'Information': its source (at minimum the name of the Producer) and the date of its last update."
* CC-BY: "This license lets others distribute, remix, tweak, and build upon your work, even commercially, as long as they credit you for the original creation by citing your name."
I do not see where the two licences differ on this point. All the other points required by CC-BY are not demanded by the Licence Ouverte, but this point is similar. And it is precisely this point that the CC0 licence lacks. Does it not seem odd to you that, of the two licences cited by the author of the data, we could skip one of the only points the two licences have in common?
And a reuser of Wikidata is a reuser in the sense of the Licence Ouverte. A licence makes no distinction between the first user of the data (here WD) and a reuser who uses data from Wikidata. There is no limitation of the licence according to the origin of the data: whether you use the data directly from the INSEE site or via another site such as WD, the Licence Ouverte applies, or at minimum a compatible licence such as CC-BY. You would have to prove that there is a distinction between user, reuser and re-reuser (and why not re-re-reuser?) in order to put this data under a licence that reduces the rights granted by the original licence. Any user of INSEE data who takes that data from WD falls under the Licence Ouverte as a reuser. Snipre (talk) 16:51, 11 March 2016 (UTC)
A reuser in the sense of the Licence Ouverte is "any natural or legal person who reuses the 'Information' in accordance with the freedoms and conditions of this licence". There is no mention that this licence applies only to primary users of the INSEE site. A user of WD's data falls under this definition because, once again, whether the data is taken directly from INSEE or via a third-party source is not mentioned in the text of the licence, quite simply because that distinction makes no sense. Snipre (talk) 17:08, 11 March 2016 (UTC)
The question is not CC-BY but the Licence Ouverte, which is a permission granted by a "Producer" (INSEE) to a "Reuser" (Wikidata) under certain conditions, among which using the same licence does not appear. Once these conditions are satisfied, Wikidata can therefore grant its users whatever licence it wants; those users are then not a "Reuser" in the sense of the Licence Ouverte, since they no longer reuse "in accordance with the freedoms and conditions of this licence" but of another one. Oliv0 (talk) 18:26, 11 March 2016 (UTC)
@Bzh-99, Snipre, Oliv0: the paternity question seems to me a non-issue insofar as, on the one hand, whatever the licence, the law generally requires crediting authorship anyway (among others, see in particular article 121-1 of the CPI for France; similar provisions exist in most countries of the world, which is precisely why licence texts resemble each other on this point) and, on the other hand, indicating the source is part of the customs of the Wikimedia projects. In any case, the right of paternity is a moral right, and the CC0 licence clearly states: "1. Copyright and Related Rights. A Work made available under CC0 may be protected by copyright and related rights ('Copyright and Related Rights')." (and further on it cites moral rights).
In short, the situation is not crystal clear (is it ever, where licences and their compatibility are concerned?), but there does not really seem to be an obstacle to importing the INSEE data (all the less so since INSEE is moving towards opening up its data). And as I said above, given the amount of INSEE data we already reuse on Wikidata, if we considered the import impossible, there would be several properties (at least INSEE municipality code (P374), INSEE canton code (P2506), INSEE department code (P2586), INSEE region code (P2585)) and thousands of items to delete (roughly, all the communes and cantons of France).
Cdlt, VIGNERON (talk) 15:00, 17 March 2016 (UTC)

fr:WP:Legifer/mars 2016#Changement de licence and c:Commons:Village pump/Copyright/Archive/2016/03#"Open License" and CC-0 also conclude that everything is OK and the imports can be done by a bot owner. Oliv0 (talk) 19:09, 24 March 2016 (UTC)

Some stats and info about INSEE codes[edit]

Hi,

I've quickly looked into the use of INSEE municipality code (P374). Currently, it's used on 37,196 items (Query: claim[374]) (for information, there were 36,658 on 1 January 2015, but there were more in the past), all with country (P17) = France (Q142), with instance of (P31) = commune of France (Q484170) (or = municipal arrondissement (Q702842)), with coordinate location (P625), with located in the administrative territorial entity (P131), etc. (see the constraint violations of INSEE municipality code (P374) for more information).

To go further:

For information, there is INSEE canton code (P2506). It could be useful to check consistency: the INSEE municipality code (P374) of a commune and the INSEE canton code (P2506) of the canton containing the commune should begin with the same 2 or 3 digits (these digits are the INSEE department code, with no known exception, and they also form the end of the ISO 3166-2 code (P300) for Metropolitan France (Q212429)).

tl;dr: I did this quickly and further inspection should be done, but everything seems pretty fine.

Cdlt, VIGNERON (talk) 12:51, 26 February 2016 (UTC)

Import date of birth (P569)/date of death (P570) from Wikipedia[edit]

Lang  2007 count
→ ja 334
[en] 326 (~5%)
⇒ ru 124
⇒ uk 121
→ zh 116
pt 115
es 103
→ ar 100
fr 91
hu 83
tr 79
→ ko 56
id 52
et 49
fi 44
→ el 40
→ th 34

Wikidata:Database reports/Deaths at Wikipedia lists items with dates of death at Wikipedia (10-15% of all). Some dates in articles of most languages are likely to be suitable for import by bot. For other dates, the only formatted part may be the year of death category. --- Jura 08:06, 2 August 2015 (UTC)

@Multichill: didn't you have a script for this? It only works when there is a strict format, yes. Sjoerd de Bruin (talk) 18:24, 3 August 2015 (UTC)
Strong oppose to importing the same data from the same Wikipedia a second time. Any kind of automatic, repeatable Wikipedia->Wikidata copying makes all the other Wikipedias vulnerable to mistakes (and vandalism) in a single one. -- Vlsergey (talk) 19:55, 3 August 2015 (UTC)
None of these pages currently have P570 defined, thus it's not a matter of re-import. Many articles may only exist in 1 language. --- Jura 21:11, 3 August 2015 (UTC)
1. "Reimport" is not about statements, but about project+property. Having p570 imported from any wiki, it shall not be reimported. Especially not on scheduled/automated basis. Arguments above. 2. I'm okay with single time import of P569/P570 from those projects. -- Vlsergey (talk) 15:03, 4 August 2015 (UTC)
I agree that it shouldn't be done for the current year on an automated basis. If you look at "date added" column on the lists, you will notice that most entries are fairly old. --- Jura 08:30, 5 August 2015 (UTC)
Looking at en:Patri J. Pugliese, it seems that the formatted version is fairly recent (2014); en:Victoria Arellano has persondata since 2010, pt:Joaquim Raimundo Ferreira Chaves since 2011, and en:Mark Abramson since February 2013, but only the DOB got imported. tr:Yasemin Esmergül has the dates in the article lead. In any case, we can validate the year for P570. Maybe someone can assess ja, zh, uk, etc. The table above shows the most frequent languages on the list for 2007. --- Jura 21:11, 3 August 2015 (UTC)
Persondata in en.WP is deprecated and should not be relied on. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:06, 4 August 2015 (UTC)
Can you provide references for your claims? Thanks. --- Jura 10:26, 4 August 2015 (UTC)
Discussion of persondata: RfC: Should Persondata template be deprecated and methodically removed from articles? Jc3s5h (talk) 11:33, 4 August 2015 (UTC)
The conclusion mentioned in the link only supports Pigsonthewing's first claim. How about the second? --- Jura 11:37, 4 August 2015 (UTC)
Q8078 refers. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:41, 4 August 2015 (UTC)
Funny. Wasn't it deprecated because Wikidata could hold the data, rather than for data-quality reasons? --- Jura 08:30, 5 August 2015 (UTC)
No. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:43, 6 August 2015 (UTC)
Any progress on the missing reference? --- Jura 04:34, 8 August 2015 (UTC)
In reply to @Sjoerddebruin: yes, I imported dates of birth and death in the past. I was certainly not the only one. I'm quite confident the persondata template on the English Wikipedia was scraped to Wikidata quite some time ago; I don't think there is much data left to scrape from that corner. My focus was on items about humans with a link to the Dutch Wikipedia but without a date of birth. I used regular expressions to extract the date of birth from the introduction of the article. You could do that for other languages too: you just need to start conservatively and expand a bit in each iteration. I was able to import thousands of birth dates this way. Multichill (talk) 17:17, 4 August 2015 (UTC)
Thanks for your helpful feedback. enwiki might indeed be mostly done. For the sample year 2007 in the table above, it's just 5%. BTW nl is not on the reports as there are no nl categories for persons by year of death. --- Jura 08:30, 5 August 2015 (UTC)
Actually of the 326 for enwiki, 300 do have persondata. --- Jura 08:36, 5 August 2015 (UTC)

Today I imported the birth and death dates of people deceased in 2000 by parsing the introduction sentence of the English article. If the edits [1] are okay, I could continue with other years and other languages. I take care not to import dates before 1924, and I will not run the script twice on the same article. --Pasleim (talk) 18:42, 12 August 2015 (UTC)
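For reference, the kind of pattern involved might look like the sketch below. It only covers one strict English intro format and illustrates the approach; it is not Pasleim's actual script:

import re

# Strict pattern: "(4 March 1921 – 17 June 2000)" in the intro sentence.
LIFESPAN = re.compile(
    r"\((\d{1,2} [A-Z][a-z]+ \d{4})\s*[–-]\s*(\d{1,2} [A-Z][a-z]+ \d{4})\)"
)

def extract_dates(intro):
    m = LIFESPAN.search(intro)
    return (m.group(1), m.group(2)) if m else None

print(extract_dates("John Doe (4 March 1921 – 17 June 2000) was a painter."))
# -> ('4 March 1921', '17 June 2000')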

Thanks! I checked 10 and they were all fine. All but 2 or 3 had the same dates in the infobox and/or persondata too.
I noticed that many trwiki articles have a person infobox, maybe this could be imported as well. --- Jura 11:04, 15 August 2015 (UTC)
That was quick. Good work! It did reduce the numbers a bit. It might be worth applying the same method to some of the templates mentioned for enwiki.
The infobox on trwiki doesn't seem that frequent, but for ptwiki I found that many articles use pt:Template:dni/pt:Template:Nascimento and pt:Template:Morte or pt:Template:morte e idade/pt:Template:Falecimento e idade. These are used in infoboxes or in the article text. --- Jura 07:11, 16 August 2015 (UTC)
I did some from pt:Template:Morte. --- Jura 07:21, 17 August 2015 (UTC)
pt:Template:morte e idade/pt:Template:Falecimento e idade done as well. --- Jura 09:21, 17 August 2015 (UTC)
  • For jawiki (4370 missing), I had a look at ja:Template:死亡年月日と没年齢, but that would give only about 160, most with just the month of death. eswiki has a couple of templates that could be parsed, but there is no single one. --- Jura 04:13, 19 August 2015 (UTC)
  • I had a look at 2009: Most frequent languages are: ar 125, uk 116, en 114, es 109, ru 99, hu 86
For ukwiki, of 10 articles, 6 had an infobox (5 different ones: the uk ones from Template:Infobox ice hockey player (Q5650114), Template:Infobox scientist (Q5624818), Template:Infobox architect (Q10973090), Template:Infobox person (Q6249834), Template:Infobox artist (Q5914426) normally in the format dd.mm.yyyy), the other 4 had the dates in the beginning of the text in Cyrillic. --- Jura 10:37, 31 August 2015 (UTC)
For ukwiki, I just imported the dates from uk:Template:Особа. --- Jura 13:33, 7 September 2015 (UTC)

Given that we might have exhausted the bot approach, I made a request at Topic:Spgr35wayo8zy15y. --- Jura 06:20, 24 September 2015 (UTC)

Is there any way to import date of birth (P569) and date of death (P570) from the Slovenian Wikipedia? We are at the halfway point in updating our infoboxes with Wikidata. We have 2 tracking categories that include articles with birth and death dates that are not yet written into Wikidata (birth: sl:Category:Lokalnega datuma rojstva še ni v Wikipodatkih and death: sl:Category:Lokalnega datuma smrti še ni v Wikipodatkih). Our biographic articles have a special introduction phrase (examples: * (?), ....., † 29. marec 1770, ... or * 7. junij 1707, ...., † 2. januar 1770, or just a year, † 1770, or unknown, † ?). We first want to transfer the dates to Wikidata and then continue with cleaning our infoboxes. Afterwards we will update the next half of our infoboxes, and a subsequent data import will accordingly be needed. --Pinky sl (talk) 11:11, 17 March 2016 (UTC)

You could try to do part of it with Harvesttemplates, e.g. [2]
--- Jura 11:29, 17 March 2016 (UTC)
You mention dates during which Europe was transitioning from the Julian to the Gregorian calendars, but you don't mention the data having any calendar indication. Thus I would suggest you not import any dates before 1924. Jc3s5h (talk) 12:08, 17 March 2016 (UTC)
Ok, thanks, will see what we can do. --Pinky sl (talk) 16:20, 18 March 2016 (UTC)

ALL CAPS[edit]

A series of items for people have labels in all capital letters. It would be good if these could be converted into a more standard format. --- Jura 08:00, 28 August 2015 (UTC)

To start off, I created a quarry list: http://quarry.wmflabs.org/query/3966 Most of these labels can be converted, but there are also some exceptions, e.g. RUBEN XYZ (Q7277963) --Pasleim (talk) 16:19, 16 September 2015 (UTC)
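A deliberately conservative sketch of the conversion (the particle list is an assumption; anything with digits, non-ASCII letters, or hyphens is left for manual review, and exceptions like RUBEN XYZ (Q7277963) would still need a whitelist):

NAME_PARTICLES = {"de", "van", "von", "da", "del", "la", "le"}  # extend as needed

def normalize_all_caps(label):
    """Return a capitalized form of an ALL-CAPS person label, or None if unsure."""
    if not (label.isupper() and label.isascii()) or any(c.isdigit() for c in label):
        return None  # skip non-Latin scripts, digits, etc. for manual review
    if "-" in label:
        return None  # hyphenated names need extra care (JEAN-PAUL -> Jean-Paul)
    words = [w.lower() if w.lower() in NAME_PARTICLES else w.capitalize()
             for w in label.split()]
    return " ".join(words)

assert normalize_all_caps("VINCENT VAN GOGH") == "Vincent van Gogh"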
I think the Japanese ones in that list could be left out. Latin isn't the usual script for Japanese, so a Latin-script name is likely to be deliberately written the way it is. The ones I checked all had a jawiki sitelink which is all caps like the label. - Nikki (talk) 18:06, 16 September 2015 (UTC)
I just did the en ones for items with P31:Q5. I don't think there was any CamelCasing in it. Thanks for the list! --- Jura 09:49, 30 September 2015 (UTC)
It seems that some didn't get included, sample: Q20734549 (has two spaces). --- Jura 09:27, 1 October 2015 (UTC)
@Jura1: There are a few items using the property Dictionary of Welsh Biography ID (P1648) for people with three- or four-part names which haven't been converted, e.g. Gomer Morgan Roberts (Q20733078) and Griffith Richard Maethlu Lloyd (Q20821426). (See the list of all DWB entries here.) Would you be able to fix these? Ham II (talk) 20:43, 17 March 2016 (UTC)
I could, but as there are a few other things on my todo list, I'd rather leave this to others. This is probably a recurring task, so maybe someone wants to build a set of lists to handle it. Once one has the item and the label to correct, the corrected ones can be added with QuickStatements (Q20084080)
--- Jura 13:56, 18 March 2016 (UTC)
Things like this should be converted as well.
--- Jura 14:07, 15 April 2016 (UTC)

Commons Category <> Sitelinks?[edit]

Hello, is it possible/feasible for a bot to sync the Commons category property with the sitelinks? There were some cases where I added a sitelink but forgot to add the property, e.g. Q6226954. A bot could easily read/write these back and forth. I'm not sure if this has been proposed before or not. I'd do it myself but, admittedly, I don't understand enough about how Wikidata works. Avicennasis (talk) 05:03, 18 September 2015 (UTC)
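One direction of the sync is simple to sketch with pywikibot (copying an existing Commons sitelink into Commons category (P373) when the property is missing); whether the reverse direction is desirable is exactly what the discussion below turns on:

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()
item = pywikibot.ItemPage(repo, "Q6226954")
item.get()

if "commonswiki" in item.sitelinks and "P373" not in item.claims:
    title = item.getSitelink("commonswiki")  # e.g. "Category:Something"
    if title.startswith("Category:"):
        claim = pywikibot.Claim(repo, "P373")
        claim.setTarget(title[len("Category:"):])
        item.addClaim(claim, summary="sync Commons category from sitelink")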

I think this would be an interesting task. I added hundreds of Commons category properties using QuickStatements, but as far as I know there is no equivalent tool to add sitelinks. Using the Commons category property to add sitelinks would be great. --Pere prlpz (talk) 22:28, 9 October 2015 (UTC)
Oppose. Items which do not represent categories (without instance of (P31)  Wikimedia category (Q4167836)) should not have sitelinks to Commons categories. We have Commons category (P373), which is apparently also used by the OtherProjectsExtension for these cases. I would propose the following for all items which do not represent categories (without instance of (P31)  Wikimedia category (Q4167836)):
This would clean up the mess we have when it comes to commons sitelinks. -- T.seppelt (talk) 05:27, 6 April 2016 (UTC)
I don't think it's fair to remove sitelinks while they're still needed for Commons itself. There are issues that need resolving before something like that could be a workable solution.
  • Interwiki links from Commons don't work if the Commons category is on another item. Moving the sitelink is nice for consistency in Wikidata, but horrible for Commons who suddenly lose all their interwiki links (... unless they ignore Wikidata and stick to old-style interwiki links).
  • The "Add link" function in the sidebar for adding interwiki links puts the Commons category on the main item.
  • Worst of all: Our own notability policy is unclear. Some people believe it says Commons categories shouldn't be added to non-category items (i.e. create a separate item for them), other people, including some admins, believe it says Commons categories are not notable items (i.e. delete the separate items).
Commons wants interwiki links to articles. People on Commons will keep both accidentally and deliberately linking categories to non-category items. We can either continue as we are (with lots and lots of those) until the issues are fixed and we can model things in a way that works for Commons too, or we can try to force Commons categories to be separate items more aggressively and create even more friction between us and Commons (potentially to the point of Commons giving up on Wikidata because we would be actively and deliberately obstructing them). - Nikki (talk) 10:07, 6 April 2016 (UTC)

Cyrillic merges[edit]

This included pairs of items with articles on ruwiki and ukwiki (sample: Q15061198 / Q12171178). Maybe it's possible to find similar items based merely on labels in these languages and merge them. --- Jura 03:33, 19 September 2015 (UTC)

I cannot find any ru-uk pairs. Are they all done? --Infovarius (talk) 16:27, 3 November 2015 (UTC)
The ones on that list are identified based on dates of birth/death, and we regularly go through them. The occasional findings there (also with ru/be) suggest that there are more (without dates). A query would need to be written to find them. --- Jura 16:33, 3 November 2015 (UTC)
Today the list includes quite a few, thanks to new dates of birth/death being added. --- Jura 16:43, 2 December 2015 (UTC)
A step could involve reviewing suggestions for missing labels in one language based on labels in another language with Add Names as labels (Q21640602): sample be/ru. --- Jura 11:44, 6 December 2015 (UTC)
I came across a few items that had interwikis from ukwiki to ruwiki, but as they were on separate items, these weren't used to link the articles to existing items (sample, merged since). --- Jura 10:17, 15 December 2015 (UTC)
# Finds a candidate merge pair: identical person labels in Russian and
# Ukrainian (despite the variable names, the filters use ru and uk labels;
# commas are stripped from the Russian label before comparing).
SELECT DISTINCT ?item ?Spanishlabel ?item2 ?Italianlabel
WHERE
{
  VALUES ?item { wd:Q19909894 }
  ?item wdt:P31 wd:Q5 .

  VALUES ?item2 { wd:Q16704775 }
  ?item2 wdt:P31 wd:Q5 .

  ?item rdfs:label ?Spanishlabel . FILTER(lang(?Spanishlabel) = "ru")
  BIND(REPLACE(?Spanishlabel, ",", "") AS ?Spanishlabel2)

  ?item2 rdfs:label ?Italianlabel . FILTER(lang(?Italianlabel) = "uk")

  FILTER(str(?Spanishlabel2) = str(?Italianlabel))
  FILTER(str(?Spanishlabel) != str(?Italianlabel))
}
LIMIT 1

#added by Jura1


Try it!

The above currently finds one pair. It times out when not limited to specific items ;) Maybe there is a better way to find these.
--- Jura 14:19, 3 April 2016 (UTC)

In the meantime the two items were merged, so it doesn't work anymore.
--- Jura 16:54, 4 April 2016 (UTC)

VIAF Identifiers for Dutch streets[edit]

There are about 900 items like Q19302580 for individual streets in the Netherlands. They are all instance of (P31)  street (Q79007), country (P17)  Netherlands (Q55), have the Dutch label "straat in ...", no sitelinks at all, and additionally the properties P276, P969, P625 and P281.

VIAF has taken notice of these items and wrongly assigned them to (mostly geographic) entities. Recently these assignments were imported here, resulting in P214 with the reference stated in (P248)  Virtual International Authority File (Q54919) and retrieved (P813) some time this year. To remedy the situation on both sides, P214 should be deleted on these items and then set to "novalue" (VIAF will take this as a hint to disassociate the WD item from its clusters). -- Gymel (talk) 19:31, 16 October 2015 (UTC)
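Setting novalue is possible with a short pywikibot script; a sketch for a single item (the list of affected items would come from a query of the kind described above):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()

def viaf_to_novalue(qid):
    item = pywikibot.ItemPage(repo, qid)
    item.get()
    for claim in item.claims.get("P214", []):
        item.removeClaims(claim, summary="remove wrongly assigned VIAF cluster")
    novalue = pywikibot.Claim(repo, "P214")
    novalue.setSnakType("novalue")  # hint for VIAF to dissociate the item
    item.addClaim(novalue, summary="set P214 to novalue")

viaf_to_novalue("Q19302580")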

This could be done with Autolist (Q21640555). --- Jura 12:06, 4 December 2015 (UTC)
Neither deleting values with specific qualifiers nor setting to novalue is possible with Autolist, at least when I last checked some weeks ago. -- Gymel (talk) 22:22, 29 December 2015 (UTC)
Removal works, but not setting novalue. --- Jura 19:16, 30 December 2015 (UTC)

Remove "was a", "is a" etc at the beginning of English descriptions[edit]

A few descriptions start with "is a"; samples:

etc. That part of most descriptions can be removed. --- Jura 10:15, 7 November 2015 (UTC)

If you have a text editor with an advanced enough find-and-replace function, I think this would be pretty easy to do by selecting the IDs and descriptions using Quarry, removing the "is a" (etc.) with a find/replace that only matches the start of the description, and then using QuickStatements to do the edits. There might be a way to do the find/replace step in SQL instead of in a text editor, but I'm not very familiar with MySQL, and a quick search suggests MySQL doesn't have an equivalent of PostgreSQL's regexp_replace. - Nikki (talk) 10:01, 9 November 2015 (UTC)
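For illustration, the find/replace step could equally be a few lines of Python over a Quarry CSV export, emitting QuickStatements input (the file name and column names are assumptions):

import csv
import re

PREFIX = re.compile(r"^(?:is|was) (?:a|an|the) ", re.IGNORECASE)

with open("quarry_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):  # assumed columns: qid, description
        fixed = PREFIX.sub("", row["description"])
        if fixed != row["description"]:
            # QuickStatements line: set the English description
            print('%s\tDen\t"%s"' % (row["qid"], fixed))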
Something like http://quarry.wmflabs.org/query/3351 could do. I had once fixed a few of these.
As it's probably a recurring task, maybe someone else wants to do some as well. --- Jura 10:34, 9 November 2015 (UTC)

It would probably be good to clean up "he/she is/was" etc. as well. See this, for example. Of course, the previous description was crap, but still :) --Edgars2007 (talk) 18:17, 2 March 2016 (UTC)

Pasleim may be able to tell us more about Special:Diff/193414448. --Leyo 18:39, 2 March 2016 (UTC)
This is the short description in the Persondata template of en:Philippe Rizzo --Pasleim (talk) 19:11, 2 March 2016 (UTC)

Import names in Latin script from kowiki[edit]

There are a few items for persons that link only to kowiki and don't have labels in English. Samples:

The articles for these in kowiki have names in Latin script (or other scripts) defined in the introduction.

This could be imported into Wikidata as a label or alias.

The two samples were already merged as they appeared on the report for identical birth and death dates. --- Jura 10:54, 14 November 2015 (UTC)

All samples have been merged now. Maybe these items were all redundant? --Pyfisch (talk) 19:41, 27 December 2015 (UTC)
Does it matter? I'd assume there are other items without labels and articles that include names in Latin script at kowiki. No need for a bot for 3 items ;) --- Jura 10:37, 28 December 2015 (UTC)

Maybe the following could work for this:

  • Generate a list of items for people that don't have labels in a series of languages (including en), but do have (e.g.) a kowiki sitelink
  • Maybe exclude items that already meet some other criteria
  • Scan these articles for names in Latin script at predefined places
  • Present the results in a browser, like the ones for dates of birth/death, for a user to confirm.
    --- Jura 09:06, 16 March 2016 (UTC)
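The scanning step could start from something as simple as the sketch below (the parenthesized-name pattern is an assumption about how kowiki intros are usually written, which is why the human-confirmation step above remains essential):

import re

# A Latin-script name in parentheses right after the bold Korean name,
# e.g. "'''홍길동'''(Hong Gil-dong, 1900년 ~ 1970년)은 ..."
LATIN_NAME = re.compile(r"\(([A-Z][A-Za-z .'\-]{2,40})[,)]")

def latin_name_from_intro(wikitext):
    m = LATIN_NAME.search(wikitext)
    return m.group(1).strip() if m else None

print(latin_name_from_intro("'''홍길동'''(Hong Gil-dong, 1900년 ~ 1970년)은 ..."))
# -> 'Hong Gil-dong'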

Paulys Realenzyklopädie der classischen Altertumswissenschaft[edit]

Hi folks, after a discussion a few months ago and a personal discussion with User:Jonathan Groß on the very same issue, we now want to request a bot connecting the Pauly items to their respective Wikidata items and thereby to their respective Wikipedia articles. Example: Hyampolis Q1534902 and RE:Hyampolis Q19996427. As an anchor for the connection, you can use the Wikipedia parameter of s:de:Vorlage:RE (see, for example, s:de:RE:Domitius 25, connected via the template to w:de:Gnaeus Domitius Ahenobarbus (Konsul 32), that is Q351503). If this is at all possible, the bot should also produce a list of the linked items in order for us to maintain the Wikisource/Wikipedia links. Is this understandable, or do you need more information? --Tolanor (talk) 01:02, 4 December 2015 (UTC)

Hi, I understand that for s:de:RE:Hannibalianus 2 for example the following claims should be added:
Is this correct? I have a script that can export this data from Wikisource and add it to Wikidata. A list with the RE title, the Wikidata item, and the related subject item is also generated. --Pyfisch (talk) 13:43, 4 January 2016 (UTC)
Yep, that's it. Regarding the list: would it be possible to add links to the respective Wikisource page, the Wikipedia article and the Wikidata item? We want to use this list for maintenance in the future, and that would make it much easier to do so. Would it also be possible to show when a Wikisource page links to a Wikipedia article that doesn't exist? Thanks a lot for your help, --Tolanor (talk) 14:59, 7 January 2016 (UTC)
Yes this is possible, I will add this information to the list. --Pyfisch (talk) 15:57, 7 January 2016 (UTC)
Here is an excerpt: https://www.wikidata.org/wiki/User:Pyfisch/RE --Pyfisch (talk) 17:06, 7 January 2016 (UTC)
Great! Are these all? Could you maybe refresh your list once more (I did some changes already), so that we can go through it before you activate your bot? There are still quite a lot of wrong links. --Tolanor (talk) 18:23, 7 January 2016 (UTC)
It is incomplete. The complete list will have 20000 rows. I am just creating it. --Pyfisch (talk) 14:32, 9 January 2016 (UTC)
The run was successful (I had to rewrite the script because Pywikibot was too slow). I have derived a list with all items referencing a Wikipedia article. --Pyfisch (talk) 21:08, 9 January 2016 (UTC)
(Note, in German in the original): Most entries in Pauly's Realencyclopädie do not point to any article. Since the list with all those entries would have become too long (MediaWiki could no longer save it), I only added the entries with a link to Wikipedia. You can see that many articles point to disambiguation pages; I am unsure whether those entries should be connected on Wikidata. Often several RE articles point to the same Wikipedia article. To me this still looks like a lot of manual work. --Pyfisch (talk) 21:08, 9 January 2016 (UTC)
I fixed a bug and updated the MediaWiki table. Here is also the raw list: https://raw.githubusercontent.com/pyfisch/fischbot/master/paulyre/list.csv --Pyfisch (talk) 12:29, 10 January 2016 (UTC)
Great, thanks. We will clean the list and then you can activate your bot. We'll let you know when we're done. --Tolanor (talk) 19:02, 17 January 2016 (UTC)

Correct a massive redundancy problem in references related to a document[edit]

The url http://www.scb.se/Statistik/MI/MI0810/2010A01Z/05_Tatorter2010_befolkning_1960_2010.xls is massively used to reference some statements, but its title is written inline in the references and uses the old string datatype. There is an item for this document now, Population in urban areas 1960-2010 (Q21855710), and substituting it for the set of claims used in those references would save a lot of work when using https://tools.wmflabs.org/pltools/addlingue/ (the title keeps popping up for every claim), while solving a massive redundancy problem. Thanks :) author  TomT0m / talk page 17:29, 28 December 2015 (UTC)
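What the substitution could look like for one statement, as a pywikibot sketch (it assumes the old references carry the URL in reference URL (P854); exactly which snaks to strip would need agreement first):

import pywikibot

repo = pywikibot.Site("wikidata", "wikidata").data_repository()
DOC_ITEM = "Q21855710"  # Population in urban areas 1960-2010
OLD_URL = ("http://www.scb.se/Statistik/MI/MI0810/2010A01Z/"
           "05_Tatorter2010_befolkning_1960_2010.xls")

def rewrite_reference(claim):
    """Replace an inline url/title reference by a single stated in (P248) snak."""
    for source in claim.getSources():
        urls = source.get("P854", [])
        if any(u.getTarget() == OLD_URL for u in urls):
            claim.removeSources([s for snaks in source.values() for s in snaks])
            stated_in = pywikibot.Claim(repo, "P248", is_reference=True)
            stated_in.setTarget(pywikibot.ItemPage(repo, DOC_ITEM))
            claim.addSources([stated_in])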

  • Same for "List of FIPS region codes" and its parts, e.g. "List of FIPS region codes (D-F)": about 230 results on Wikidata seen by Google: https://www.google.fr/?gws_rd=ssl#q=site:wikidata.org+%22List+of+FIPS+region+codes+%28D%E2%80%93F%29%22&start=10 . A little more difficult. author  TomT0m / talk page 13:09, 29 December 2015 (UTC)
    • I'm not convinced. It seems that some people prefer this approach (non-contributing users of Wikidata and talk-show hosts). --- Jura 13:53, 29 December 2015 (UTC)
      • I'm not asking for a policy, just for a bot. Here it clearly makes maintenance hard to hardcode the string in every reference, since to make a correction we would have to correct every single instance of this string, and the same for the next correction that might occur. Although we should clearly highlight, in our good-practices list, the need to create an item in those cases. author  TomT0m / talk page 16:04, 29 December 2015 (UTC)
        To be more explicit: this string is monolingual and needs to be "multilingualized". This is tedious and boring (kind of fast to do manually, but still a lot of work). Having a bot do the job means all subsequent corrections only have to be made on the item. author  TomT0m / talk page 16:13, 29 December 2015 (UTC)
        One last thing: once this work is done, the reference is just a click away for consultation (or no click away, with a popup) or for correction. So I do not see what the drawbacks would be, even if some contributors prefer to write the same information again and again (an idiotic thing to do, in my mind, if you use the same reference in many statements...). author  TomT0m / talk page 16:24, 29 December 2015 (UTC)

Too many people born on January 1[edit]

It seems that some entries have the precision of date of birth (P569) set to 11 (day) despite being based on the year only.

The result is that Wikidata:Database reports/birthday today is a bit crowded compared to other days (5200 entries compared to 2900 for January 2).

I'm a bit too lazy to investigate this myself in detail today and figure out how to fix it. Maybe only a single bot's edits need to be corrected. Help would be appreciated.
--- Jura 15:35, 1 January 2016 (UTC)
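One way to scope the problem is a query for January-1 dates of birth stored with day precision (11); here is a sketch wrapped in a small Python call to the query service (it may need a LIMIT or a narrower filter to avoid timeouts):

import requests

QUERY = """
SELECT (COUNT(?person) AS ?n) WHERE {
  ?person p:P569/psv:P569 ?node .
  ?node wikibase:timePrecision 11 .
  ?node wikibase:timeValue ?dob .
  FILTER(MONTH(?dob) = 1 && DAY(?dob) = 1)
}
"""

r = requests.get("https://query.wikidata.org/sparql",
                 params={"query": QUERY, "format": "json"},
                 headers={"User-Agent": "jan1-precision-check/0.1"})
print(r.json()["results"]["bindings"][0]["n"]["value"])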

I checked a few (random) entities:

I also found a duplicate, Peter de Rijcke (Q18535320). So there is no single common mistake behind these entries. One explanation could be that for many people whose exact birthday is not known, January 1 is entered into identity cards and passports. This information is then propagated to the Web. There are many Turkish and Arabic persons in the list, which would support this hypothesis. --Pyfisch (talk) 14:27, 2 January 2016 (UTC)

One of the first ones I came across was Q2053290 (a manual edit), which reminded me of some date-related bot edit issues. It might just be a random incorrect edit, as that happens to me as well. --- Jura 15:13, 2 January 2016 (UTC)
Here are some numbers for January 1/2/3 (numbers by day/year). It doesn't seem limited to recent years. Nationalities with more frequent such births are Pakistan (Q843), Turkey (Q43), Iran (Q794), India (Q668), Greece (Q41), South Africa (Q258), Romania (Q218), People's Republic of China (Q148), Soviet Union (Q15180), Ukraine (Q212). But even United States of America (Q30) has twice as many. If there is an easy way to check who added them, maybe we can start from there.
--- Jura 14:50, 4 January 2016 (UTC)

Replacing "(OBSOLETE) title" (P357) with "title" (P1476)[edit]

title (P1476) was created more than a year ago to replace (OBSOLETE) title (use P1476, "title") (P357), but we still have 26782 items with P357, which seems like something that should be addressed by a bot. Pinging participants of the property proposal discussion: @Zolo, Vlsergey, Fralambert, ValterVB, Filceolaire: @Yair rand, Jakec:. --Daniel Mietchen (talk) 15:15, 4 January 2016 (UTC)

The new title property requires a language tag. The old property is a string, so for every deprecated title the language must be checked. A bot cannot (reliably) do this. --Pyfisch (talk) 16:03, 4 January 2016 (UTC)
I have done roughly 80 items with the Italian language, using (OBSOLETE) title (use P1476, "title") (P357) and original language of work (P364), but it is all manual work. --ValterVB (talk) 17:54, 4 January 2016 (UTC)
Two weeks ago I published a tool which helps with replacing P357 with P1476. Since then, around 20000 statements were replaced. There are btw more than 26782 items with P357, as P357 is also used in qualifiers and references. --Pasleim (talk) 18:17, 4 January 2016 (UTC)
I used your tool and noticed that some strings end with "(Russian)". It should be possible to filter those out and edit them with a bot. There are also some items with HTML entities. --Pyfisch (talk) 19:18, 4 January 2016 (UTC)
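Those trailing "(Russian)" suffixes can indeed be filtered mechanically; a sketch (the language-name table is a small illustrative sample):

import re

LANG_CODES = {"Russian": "ru", "German": "de", "French": "fr"}  # sample only

def split_title_lang(old_title):
    """Split 'Some title (Russian)' into ('Some title', 'ru'); else (title, None)."""
    m = re.match(r"^(.*\S)\s*\((%s)\)$" % "|".join(LANG_CODES), old_title)
    if m:
        return m.group(1), LANG_CODES[m.group(2)]
    return old_title, None

assert split_title_lang("Война и мир (Russian)") == ("Война и мир", "ru")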
Mbch331 made a list that proved useful to sort out P513: Naam in moedertaal.
--- Jura 20:13, 4 January 2016 (UTC)

HTML entities in monolingual strings[edit]

Working with addlingue, it seems that there are a few strings that have HTML entities instead of their Unicode equivalent characters. The substitution should be safe for a bot to make, I think.

If possible (and if such a report already exists, please point me to one), maybe a constraint report on the relevant properties should be set up. author  TomT0m / talk page 15:24, 10 January 2016 (UTC)
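The substitution itself is essentially a one-liner with the Python standard library, and it also covers the decimal and hexadecimal forms mentioned below; a sketch:

import html

def fix_entities(value):
    """Replace HTML entities (named, decimal, hex) with the real characters."""
    return html.unescape(value)

assert fix_entities("Dombey &amp; Son") == "Dombey & Son"
assert fix_entities("caf&#233;") == "café"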

@TomT0m: SPARQL found only 4 titles with HTML entities: tinyurl.com/jmpemym. Matěj Suchánek (talk) 13:53, 16 April 2016 (UTC)
This applies to other properties with the string datatype, and to labels, as well. --Edgars2007 (talk) 09:05, 17 April 2016 (UTC)
Don't forget that HTML entities can also be written as decimal or hexadecimal numbers. this query includes those as well as HTML tags. Still not very many left for this property though. - Nikki (talk) 12:11, 17 April 2016 (UTC)
Another query, this time including some wiki markup too. There's also Template:Complex constraint which could be used to add constraints to the relevant properties. - Nikki (talk) 12:03, 24 April 2016 (UTC)

Metadata on listed buildings in Wales from en:Template:Cadw listed building row[edit]

en:Template:Cadw listed building row is transcluded in 41 list articles about historic buildings in Wales on the English Wikipedia. The pages contain a lot of information it would be helpful to have on Wikidata, notably images, Commons categories and locations. Wikidata items exist for all the buildings in question but in most cases they don't have this information – here's a random example: Church Of Saint David (Q17743261). The hb = parameter in the template matches up with Cadw Building ID (P1459), so that's how the items would be identified. Could a bot add the following metadata to the items?

Cheers, Ham II (talk) 16:50, 29 January 2016 (UTC)

Ham II, I'm not sure whether you've discovered Wikidata:WikiProject Cultural heritage yet. I did what you're describing several years ago for the Rijksmonument (Q916333). The strategy is to find all items already here and clean them up, and then grab data from the monuments database. Multichill (talk) 20:07, 5 February 2016 (UTC)
@Multichill: Thanks for the pointer towards WikiProject Cultural heritage. The reason I think importing data from the Wikipedia lists is important is the images and Commons categories that have been added to those pages over the years. The rest of the data, I'm sure, could come from the monuments database. Ham II (talk) 09:45, 6 February 2016 (UTC)

Canadian lakes[edit]

@Laurianna2, VIGNERON: As mentioned on WD:Bistro#Liens interlangues, user:Lsj has created more than 50k articles about Canadian lakes on svwiki, and they need to be linked to Wikidata. Relevant category: sv:Kategori:Insjöar i Kanada. Sample article: sv:Étang de Hart.

We should

  • link articles:
  1. create a new item when the article does not correspond to an existing label (parentheses excluded)
  2. when the title matches an existing item, either list them for manual checking or, given that there will be hundreds of them, devise an algorithm to determine whether they refer to the same lake based on coordinates.
  • add data
  1. P31: lake (Q23397) and P17:Q16
  2. P131 from the "region" param and GeoNames ID (P1566) from the geonames param of sv:Template:Geobox.
  3. add labels, at least in English and French, equal to the Wikipedia article title. When the title starts with "Lac", "Baie", "Bassin" or "Étang", the first letter should be lower-cased, at least in French (see the sketch below).
  4. elevation above sea level (P2044) and coordinate location (P625) from the Infobox, Geonoames or wherever.

Most of that can be done with creator.html, autolist, and harvesttemplates, but I think adding labels requires a real bot. If someone can do the whole thing in one go, that would probably be best. -Zolo (talk) 09:34, 30 January 2016 (UTC)
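The label rule in point 3 is easy to mechanize; a sketch for the French labels (the list of generic first words is an assumption to be extended):

GENERIC_FIRST_WORDS = {"Lac", "Baie", "Bassin", "Étang"}  # extend as needed

def french_label(title):
    """Lower-case the leading generic term of a lake article title for French."""
    first, _, rest = title.partition(" ")
    if first in GENERIC_FIRST_WORDS and rest:
        return first.lower() + " " + rest
    return title

assert french_label("Étang de Hart") == "étang de Hart"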

Add Names as labels (Q21640602) can work for labels. It seems that svwiki prefers that we wait a month or so after they create such stubs. Apparently it can happen that they delete entire bot-created sets.
--- Jura 09:39, 30 January 2016 (UTC)
Even if the articles end up being deleted on svwiki, I think it makes sense to have the items.
I didn't know Add Names as labels (Q21640602); that could do the job. So I guess I can do it with the standard tools, but that would require more than 10 edits per item, a flood of more than 500k edits in all; maybe it is better to wait for a bot that does it in fewer edits? --Zolo (talk) 09:57, 30 January 2016 (UTC)
Well, if they delete them, I don't think we want to have them either. It's something Innocent bystander mentioned on some of the other series. As for the number of edits, I'm not sure it matters.
--- Jura 10:31, 30 January 2016 (UTC)
Yes, I thought about using robots for this project. Where did you see that svwiki wanted to delete this robot's stubs? By the way, I've noticed a lot of mistakes in GeoNames, and it seems that the site has not been updated since December. --Laurianna2 (talk) 19:41, 4 February 2016 (UTC)
If pages like these are deleted on svwiki, it is most likely done because they have found mistakes in the database or because the quality of the data is poor. Pages are also deleted when Lsj finds mistakes in the bot code; it is then sometimes easier to delete the pages and restart the bot. That is why I recommend you wait a month. One problem we have detected in Canada is that there are often duplicate items in GeoNames: one item with an English name and one item with the French name. -- Innocent bystander (talk) 07:15, 2 March 2016 (UTC)

Storing ICD9 and 10 codes from EN medical templates[edit]

Proposal

I propose that the ICD9 and ICD10 codes located on medical templates in the English Wikipedia be extracted and stored here.

Context

Most templates associated with WikiProject Medicine on the English Wikipedia have associated ICD9 and ICD10 codes stored in their titles: e.g. [3]

This is an attempt to store related data that is better stored here, on Wikidata. This benefits readers by allowing data to be stored in a more appropriate location, and benefits data handlers by giving them more data to play with and analyse at some future date :)

Precedent

A previous bot took similar data (Gray's Anatomy and Terminologia Anatomica data) from anatomical templates and stored it here. The bot request for that is here: [4]

Comments

Ping to @ValterVB (talkcontribslogs) who was so helpful last time :). --LT910001 (talk) 22:13, 10 February 2016 (UTC)

Just for record: Wikidata:Bot requests/Archive/2015/02#Move all template ICD9 and ICD10 references to wikidata. --Edgars2007 (talk) 22:31, 10 February 2016 (UTC)

(Voice) actors[edit]

I propose to move all cast members for subclasses of animated film (Q202866) from cast member (P161) to voice actor (P725). --Infovarius (talk) 16:53, 14 February 2016 (UTC)

Not all animated films are completely animated. This couldn't be accurately done without some level of manual supervision. --Yair rand (talk) 17:02, 14 February 2016 (UTC)
How many cases do we have of this? --Izno (talk) 12:37, 15 February 2016 (UTC)
On en.wikipedia 270 (more or less) --ValterVB (talk) 19:12, 15 February 2016 (UTC)
Excellent! So we have a small excluded set for manual work, and a huge remaining set for automated work. --Infovarius (talk) 16:41, 20 February 2016 (UTC)
So, @ValterVB:, can you help? Meanwhile, some statements with the wrong property are simply being deleted (without being moved to the right property). --Infovarius (talk) 10:59, 18 March 2016 (UTC)
Can anyone make the move for The Little Prince (Q16386722)? It also needs some refinement (English and original French voice actors are mixed there). --Infovarius (talk) 22:09, 25 April 2016 (UTC)
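
A minimal pywikibot sketch of such a per-item move, as a starting point only; it is untested, it ignores qualifiers and references (which would need copying too), and the manual screening of partially animated films discussed above still applies:

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def move_cast_to_voice(qid):
     item = pywikibot.ItemPage(repo, qid)
     item.get()
     for old in item.claims.get("P161", []):   # cast member
         new = pywikibot.Claim(repo, "P725")   # voice actor
         new.setTarget(old.getTarget())
         item.addClaim(new, summary="move cast member to voice actor")
         item.removeClaims(old, summary="moved to voice actor (P725)")

 move_cast_to_voice("Q16386722")  # The Little Prince, per the request above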

Sorting flags by level of government[edit]

Hello. I'm trying to control constraint violations for applies to jurisdiction (P1001). Could someone please:

  1. For items in w:en:Category:National flags, change instance of flag (Q14660) to national flag (Q186516).
  2. For items in w:en:Category:United States state flags, change instance of flag (Q14660) or national flag (Q186516) to flag of a country subdivision (Q22807280).
  3. For items in subcategories of w:en:Category:Flags of cities by country, change instance of flag (Q14660) to flag of a municipality (Q22807298).

Thank you! --Arctic.gnome (talk) 20:59, 15 February 2016 (UTC)
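
A hedged sketch of one such retargeting; the item lists would come from the categories via PetScan or similar, and the helper below is hypothetical and untested:

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def retarget_p31(qid, old_qid, new_qid):
     # change instance of (P31) from old_qid to new_qid, leaving other values alone
     item = pywikibot.ItemPage(repo, qid)
     item.get()
     for claim in item.claims.get("P31", []):
         target = claim.getTarget()
         if target and target.id == old_qid:
             claim.changeTarget(pywikibot.ItemPage(repo, new_qid))

 # e.g. for every item in w:en:Category:National flags:
 # retarget_p31(qid, "Q14660", "Q186516")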

@Arctic.gnome: Sorry for the delay, I'm ready to do this. Could you please, just in case, provide a rationale for why it is okay to do this task? Matěj Suchánek (talk) 13:32, 16 April 2016 (UTC)

Import number of state representatives in Congress[edit]

Please import the total number of seats each US state has in the US House of Representatives, as listed in the wikipedia:List_of_U.S._states_and_territories_by_population#States_and_territories table, into each state's item. I think Property:P1410 is a perfect candidate for that, as it requires a qualifier indicating that the figure relates to the US House of Representatives. It would also be amazing to do the same for other similar legislatures, like the European Parliament. And lastly, historical data is always amazing, if one can find it (this data is connected to the US census). Having this data would allow interesting political visualizations like these demos. --Yurik (talk) 05:05, 17 February 2016 (UTC)

Property:P1410 does not seem suitable to me in this context because the most straightforward interpretation is the number of seats in the state legislature, not the US House of Representatives. It could also be interpreted as the number of seats the state has in the US Senate and US House of Representatives combined. Further confusion results because each state has its own name for its legislature and the houses that make up the legislature. Jc3s5h (talk) 13:07, 6 March 2016 (UTC)
Jc3s5h, I think that's why that property has a mandatory Property:P194 qualifier. So for my request, you can make 3 values in each state: US Congress, US Senate (2 each), and US House of Representatives. --Yurik (talk) 09:59, 13 March 2016 (UTC)
I'm not familiar with mandatory qualifiers. Will the UI or the API prevent the storage of entries that lack the mandatory qualifier? Jc3s5h (talk) 13:05, 13 March 2016 (UTC)
Jc3s5h, it seems that even though it is "required", the only enforcement comes from bots at this point. I added a sample entry on Q99, which seems to look good. It would be great to automate the import, and it would be amazing if the historical numbers were also added (they kept changing throughout history as populations shifted). --Yurik (talk) 21:39, 13 March 2016 (UTC)
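
A sketch of writing one such statement with its legislative body (P194) qualifier; note that the exact WbQuantity signature varies between pywikibot versions, and the California call at the end is an illustrative, assumed example:

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def add_seats(state_qid, seats, body_qid):
     item = pywikibot.ItemPage(repo, state_qid)
     claim = pywikibot.Claim(repo, "P1410")              # number of seats
     claim.setTarget(pywikibot.WbQuantity(seats, site=repo))
     qual = pywikibot.Claim(repo, "P194")                # legislative body
     qual.setTarget(pywikibot.ItemPage(repo, body_qid))
     claim.addQualifier(qual)                            # the mandatory qualifier
     item.addClaim(claim, summary="seats in the US House of Representatives")

 add_seats("Q99", 53, "Q11701")  # California: 53 seats in the US House (Q11701)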

Possible paintings report[edit]

Articles about paintings are created all the time, but unfortunately not all of them have an up-to-date item. Every once in a while I use autolist to generate a list of items that are in the category tree under Category:Paintings (Q6009893) and don't have instance of (P31) or location (P276) (for example possible paintings on the English Wikipedia). This only works on a per-wiki basis and takes quite a while to load. Does someone feel like building an on-wiki report for this?

My approach would probably be:

Anyone up for this? Multichill (talk) 12:07, 21 February 2016 (UTC)
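
One possible building block, as a hedged sketch only: fetch the category tree from PetScan and check each connected item's claims through the Wikidata API. The PetScan parameter names and JSON layout follow its web interface as of writing and may need adjusting:

 import requests

 # pages in the enwiki category tree below "Paintings", with their Wikidata items
 ps = requests.get("https://petscan.wmflabs.org/", params={
     "language": "en", "project": "wikipedia",
     "categories": "Paintings", "depth": 5,
     "format": "json", "doit": "1",
 }).json()

 api = "https://www.wikidata.org/w/api.php"
 for page in ps["*"][0]["a"]["*"]:
     qid = page.get("q")
     if not qid:
         continue  # unconnected page
     entity = requests.get(api, params={
         "action": "wbgetentities", "ids": qid,
         "props": "claims", "format": "json",
     }).json()["entities"][qid]
     claims = entity.get("claims", {})
     if "P31" not in claims or "P276" not in claims:
         print(qid, page["title"])  # candidate for the report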

Sounds like something for WikiProject Sum of all paintings.
--- Jura 08:35, 8 April 2016 (UTC)

Taxon labels[edit]

For items where instance of (P31)=taxon (Q16521), and where there is already a label in one or more languages which is the same as the value of taxon name (P225), the label should be copied to all other empty western-alphabet labels. For example, this edit. Please can someone attend to this? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:11, 10 March 2016 (UTC)

Do you mean label or alias? I would support the latter where there is already a label and that label is not already the taxon name. --Izno (talk) 17:03, 10 March 2016 (UTC)
No, I mean label; as per the example edit I gave. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
See your last request: Wikidata:Bot_requests/Archive/2015/08#Taxon_names. --Succu (talk) 18:57, 10 March 2016 (UTC)
Which was archived unresolved. We still have many thousands of missing labels. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 10 March 2016 (UTC)
Nope. There is no consensus doing this. Reach one. --Succu (talk) 20:22, 10 March 2016 (UTC)
You saying "there is no consensus" does not mean that there is none. Do you have a reasoned objection to the proposal? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:56, 10 March 2016 (UTC)
Go back and read the linked discussions. In the nursery of wikidata some communities had strong objections. If they changed their mind my bot can easily execute this job. --Succu (talk) 21:19, 10 March 2016 (UTC)
So that's a "no" to my question, then. I read the linked discussions, and mostly I see people not discussing the proposal, and you claiming "there is no consensus", to which another poster responded "What I found, is a discussion of exactly one year old, and just one person that is not supporting because of 'the gadgets then need to load more data'. Is that the same 'no consensus' as you meant?". There are no reasoned objections there, either. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:24, 10 March 2016 (UTC)
For the lazy ones:
--Succu (talk) 21:53, 10 March 2016 (UTC)
I already did this for the Italian label in the past. Here are two other proposals: May 2014 and March 2015 --ValterVB (talk) 09:54, 11 March 2016 (UTC)
@ValterVB: Thank you. Can you help across any other, or all, western-alphabet languages, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:18, 16 March 2016 (UTC)
Yes, I can do it, but before modifying 2,098,749 items I think it is necessary to have strong consensus. --ValterVB (talk) 18:14, 16 March 2016 (UTC)
@ValterVB: Thank you. Could you do a small batch, say 100, as an example, so we can then ask on, say, Project Chat? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:03, 18 March 2016 (UTC)
Simply ask with the example given by you. --Succu (talk) 15:16, 18 March 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── @Pigsonthewing:

  • Test edit: Q14945671, Q21444273, Q2508347, Q25247.
  • Languages: "en","de","fr","it","es","af","an","ast","bar","br","ca","co","cs","cy","da","de-at","de-ch","en-ca","en-gb","eo","et","eu","fi","frp","fur","ga","gd","gl","gsw","hr","ia","id","ie","is","io","kg","lb","li","lij","mg","min","ms","nap","nb","nds","nds-nl","nl","nn","nrm","oc","pcd","pl","pms","pt","pt-br","rm","ro","sc","scn","sco","sk","sl","sr-el","sv","sw","vec","vi","vls","vo","wa","wo","zu"
  • Rule:

Very important: it is necessary to verify that the list of languages is complete. It is the same list that I use for disambiguation items. --ValterVB (talk) 09:42, 19 March 2016 (UTC)
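
A sketch of the rule as stated (copy the taxon name into the listed languages only where no label exists, never overwriting); untested, and LANGS is abbreviated here:

 import pywikibot

 LANGS = ["en", "de", "fr", "it", "es", "af", "an", "ast"]  # full list as above

 def fill_taxon_labels(item):
     item.get()
     p225 = item.claims.get("P225", [])
     if len(p225) != 1:
         return  # skip items without exactly one taxon name
     name = p225[0].getTarget()
     new_labels = {lang: name for lang in LANGS
                   if lang not in item.labels}  # never overwrite an existing label
     if new_labels:
         item.editLabels(new_labels, summary="add taxon name (P225) as label")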

    • I really don't like the idea of this. The label, according to Help:Label, should be the most common name. I doubt that most people are familiar with the Latin names. Inserting the Latin name everywhere prevents language fallback from working and stops people from being shown the common name in another language they speak. As a very simple example, Special:Diff/313676163 added Latin names for the de-at and de-ch labels, which now stops the common name from the de label from being shown. - Nikki (talk) 10:29, 19 March 2016 (UTC)
      • @Nikki: The vast majority of taxons have no common name; and certainly no common name in every language. And of course edits can subsequently be overwritten if a common name does exist. As for fallback, we could limit this to "top level" languages. Would that satisfy? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
        • As far as I'm aware most tools rely on the absence of certain information. Adding something like a 10,000-row CSV file of Latin / Welsh (cy) species of birds would then have to be done by hand. --Succu (talk) 23:11, 19 March 2016 (UTC)
          • Perhaps this issue could be resolved by excluding certain groups? Or the script used in your example could overwrite the label if it matches the taxon name? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
        • It may be the case that most taxon items won't have a common name in any language, but I don't see anything here which is only trying to target the taxon items which have no common names. Adding the same string to lots of labels isn't adding any new information and as Succu pointed out, doing that can get in the way (e.g. it makes it more difficult to find items with missing labels, it can get in the way when merging (moving common names to the aliases because the target already has the latin name as a label) and IIRC the bot which adds labels for items where a sitelink has been recently added will only do so if there is no existing label). To me, these requests seem like people are trying to fill in gaps in other languages for the sake of filling in the gaps with something (despite that being the aim of the language fallback support), not because the speakers of those languages think it would be useful for them and want it to happen (if I understand this correctly, @Innocent bystander: is objecting to it for their language). - Nikki (talk) 22:40, 22 March 2016 (UTC)
          • Yes, the tolerance against bot-mistakes is limited on svwiki. Mistakes initiated by errors in the source is no big issue, but mistakes initiated by "guesses" done by a bot is not tolerated at all. The modules we have on svwiki have no problem handling items without Swedish labels. We have a fallback-system which can use any label in any language. -- Innocent bystander (talk) 06:39, 23 March 2016 (UTC)
            • @Innocent bystander: This would not involve any "guesses". Your Wikipedia's modules may handle items without labels, but what about third-party reusers? Have you identified any issues with the test edits provided above? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
              • No, I have not found any issue in the examples. But this is not my subject; I would not see an issue even if it was directly under my nose. Adding correct statements for scientific names and common names looks more important for third-party users than labels, which cannot be sourced. NB, the work of Lsjbot means that Swedish and Cebuano probably have more labels than any other language in the taxon set. You will not miss much by excluding 'sv' in this bot run. -- Innocent bystander (talk) 07:00, 24 March 2016 (UTC)
                • If a taxon name can be sourced, then by definition so can the label. If you have identified no errors, then your reference to "guesses" is not substantiated. True, adding scientific names and common names as statements is important, but the two tasks are not mutually exclusive, and their relative importance is subjective. To pick one example at random, from the many possible, Dayus (Q18107066) currently has no label in Swedish, and so would benefit from the suggested bot run. Indeed, it currently has only 7 labels, all the same, and all using the scientific name. And what are the various European languages' common names for this mainly Chinese genus? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:34, 25 March 2016 (UTC)
          • No, this is not "trying to fill in gaps in other languages for the sake of filling in the gaps". Nor are most of the languages affected served by fallback. If this task is completed, then "find items with missing labels" will not be an issue for the items concerned, because they will have valid labels. Meanwhile, what is the likelihood of these labels being provided manually? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:14, 23 March 2016 (UTC)
              • If this is not trying to fill in the gaps for the sake of filling in the gaps, what problem is it solving, and why does language fallback not help? (I'm sure the development team would like to know that language fallback is not working properly.) The taxonomic names are not the preferred labels, and valid is not the same as useful (adding "human" as the description for humans with no description was valid, yet users found it annoying and useless and they were all removed again); the labels for a specific language in that language are still missing even if we make it seem like they're not by filling in all the gaps with taxonomic names, it's just masking the problem. I can't predict the future, so I don't see any point in speculating how likely it is that someone will come along and add common names. They might, they might not. - Nikki (talk) 23:02, 24 March 2016 (UTC)
                • It solves the problem of an external user making a query (say, for "all species in genus X") being returned Q items with no labels in their language. It could also break third-party applications. In some cases, there is currently no label in any language: how does language fallback work then? How does it work if the external user's language is Indonesian, and there is only an English label saying, say, "Lesser Spotted Woodpecker"? And, again, taxonomic names are the preferred labels for the many thousands of species, the vast majority, with no common name, or with no common name in a given language. The "human" example compares apples with pears: this is a proposal to add specific labels, not vague descriptions (the equivalent would be adding "taxon" as a description). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:26, 25 March 2016 (UTC)
                  • Why should an external user query a Wikidata-internal field called "label" and not rely on a query of taxon name (P225)? --Succu (talk) 22:04, 25 March 2016 (UTC)
                  • For any of a number of reasons; not least that they may be querying things which are not all taxons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:32, 26 March 2016 (UTC)
                    • Grand answer. Maybe they are searching the labels for aliens, gods, fairy tales or something else? A better solution would be if Wikibase could be configured to take certain properties, like taxon name (P225) or title (P1476), as a default, language-independent label. --Succu (talk) 21:09, 27 March 2016 (UTC)
                      • Maybe it could. But it is not. That was suggested a year or two ago, in the discussions you cited above, and I see no move to make it so, nor any significant support for doing so. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:19, 27 March 2016 (UTC)
                        • So what? Did you reach an agreement with svwiki, cebwiki, warwiki, viwiki or nlwiki that we should go along your proposed way? --Succu (talk) 21:43, 27 March 2016 (UTC)
    • @ValterVB: Thank you. I think your rules are correct. I converted the Ps & Qs in your comment to templates, for clarity. Hope that's OK. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 19 March 2016 (UTC)
  • Oppose. That the majority of taxa do not have a common name does not mean that all western languages should automatically use the scientific name as the label. Matěj Suchánek (talk) 13:23, 16 April 2016 (UTC)
    • Nobody is saying "all western languages should automatically use the scientific name as label"; if an item already has a label, it won't be changed. If a scientific name is added as a label where none existed previously, and that label is later changed to some other valid string, the latter will not be overwritten. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:31, 20 April 2016 (UTC)

We seem to have reached a stalemate, with the most recent objections being straw men, or based on historic and inconclusive discussions. How may we move forward? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:28, 16 May 2016 (UTC)

That's simple: drop your request. --Succu (talk) 18:33, 16 May 2016 (UTC)
Were there a cogent reason to, I would. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:57, 17 May 2016 (UTC)

ceb.wikipedia language link import[edit]

So apparently a bot has created ~half a million pages on https://ceb.wikipedia.org since November. At least some of them have language links. Could someone please run the "usual bot" to add them to the Wikidata items? --Magnus Manske (talk) 18:19, 15 March 2016 (UTC)

@Ladsgroup: Maybe something for your bot? --Pasleim (talk) 17:01, 25 March 2016 (UTC)
Definitely, I'm running my bot now. It's cleaning this wiki ATM. Amir (talk) 17:57, 26 March 2016 (UTC)

Integrate languages[edit]

There are probably tens of thousands of minority-language Wikipedia articles that are not integrated into the wider interwiki linkage of more prominent languages. This is because several non-tech-savvy editors find it too technical to figure out how to integrate the languages. I have even had such difficulties myself a while ago. This is a problem especially prominent among IP page creators. Therefore I propose some measures be taken to help novice editors to integrate the languages. My suggestion is to create a bot that automatically converts the old form (i.e. "en.articletitle") into the newer format. Example problem page.  – The preceding unsigned comment was added by 92.6.184.213 (talk • contribs) at 16:21, 16 March 2016‎ (UTC).

@Ladsgroup: Maybe something for your bot? --Pasleim (talk) 17:01, 25 March 2016 (UTC)
Yes, but can you give me a list of language codes for those wikis? Amir (talk) 18:02, 26 March 2016 (UTC)

Talk pages of constraint reports[edit]

It would be useful to have the talk page of each constraint report redirected to the talk page of the respective property.

For example, I have just redirected Wikidata talk:Database reports/Constraint violations/P2611 to Property talk:P2611. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:17, 25 March 2016 (UTC)

  • No, such redirects are just confusing. There is already a link to the talk page on each report.
    --- Jura 08:14, 8 April 2016 (UTC)

@ValterVB: Is this something your bot could do? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:30, 15 April 2016 (UTC)

@Pigsonthewing: You mean my bot must create the discussion page with a redirect for all these pages? It is probably possible, I must check, but first it is necessary to fix these talk pages. --ValterVB (talk) 12:12, 15 April 2016 (UTC)
@Jura1, Pigsonthewing: Yes, I can do it. Can I start? --ValterVB (talk) 12:33, 15 April 2016 (UTC)
Is it necessary? There are also other ways to prevent new discussions being created there. Matěj Suchánek (talk) 13:25, 15 April 2016 (UTC)
Yes, it is necessary. It is not just about preventing discussion on under-watched pages, but also directing people to the best place for those discussions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:27, 15 April 2016 (UTC)
[ec] Thank you. I have cleared all the existing pages (only ten, which shows how little call there is for them). So far as I am concerned, this should proceed ASAP, but it may be best to get third parties' opinions. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:26, 15 April 2016 (UTC)
  • I'd rather not, ValterVB. Clearly it's not much of an issue, as Pigsonthewing only found 10.
    --- Jura 13:33, 15 April 2016 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── Ten pages, on most of which old comments or questions went unanswered. We can do better. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:38, 15 April 2016 (UTC)

  • I really doubt you can. Or maybe you could illustrate your point with some diffs showing the contrary.
    --- Jura 13:44, 15 April 2016 (UTC)
  • Very bad idea. It's our main working list for homonyms! --Succu (talk) 14:05, 15 April 2016 (UTC)
    • Moved to Property talk:P225/homonyms. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:14, 15 April 2016 (UTC)
      • Would be helpful if you would talk to the people before you take actions. --Succu (talk) 16:57, 15 April 2016 (UTC)
        • There is no requirement to obtain prior permission. I did open a talk page discussion, explaining and linking to what I'd done. Brya deleted my comment. This has nothing to do with the above Bot request. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:33, 15 April 2016 (UTC)
          • Wow, civilized manners are not necessary? Reverting is fun? You did not announce this step! So confusion is the result. --Succu (talk) 17:40, 15 April 2016 (UTC)
            • Please stop breaking the list markup that indents our comments. Since you attempt to put words into my mouth, I won't engage further, other than to reiterate: I did open a talk page discussion, explaining and linking to what I'd done. Brya deleted my comment. This has nothing to do with the above Bot request. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:50, 15 April 2016 (UTC)
If there is one thing which I really abhor about Wikipedia projects, it is users who have nothing useful to do, and then go around enforcing their views of what they find to be pretty, completely disregarding how much this disrupts the project. - Brya (talk) 17:52, 15 April 2016 (UTC)
You acted first and moved the subpages. Afterwards you added a "hint" to justify your action. No, you didn't talk to the people using this page. --Succu (talk) 18:06, 15 April 2016 (UTC)
  • Not done: clearly no consensus to do this. This page is not the place for big discussions about how to organize things. First get consensus somewhere else that this is actually a good idea. Once you have that, you can come back here to have a bot operator do the task. Multichill (talk) 07:43, 16 April 2016 (UTC)
    • Poppycock. There is only one editor opposing this request, and they have given no substantive argument against. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:00, 16 April 2016 (UTC)
      • Then we lose the natural place to talk about the reports, and the gain is limited. -- Innocent bystander (talk) 19:37, 17 April 2016 (UTC)
        • The natural place to discuss constraint reports is on the pages where the constraints are set; which is where virtually all such discussion already takes place; the pages which those interested in the property are most likely to be watching: the property talk pages. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:28, 17 April 2016 (UTC)
The "work" with homonyms is quite questionable: "violations of Wikipedia policy (at least 50% fictitious taxa)"... --Averater (talk) 05:32, 18 April 2016 (UTC)
The natural place to discuss constraint reports is on the Talk pages of the constraint reports, while the natural place to discuss properties is on the Talk pages of the properties. These are quite different topics. - Brya (talk) 05:50, 18 April 2016 (UTC)
Allow me to remind you where the constraints, which determine what is in the constraint reports, are set: on the property talk pages. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:22, 19 April 2016 (UTC)
Which probably is a bad place for these constraints in the first place. Here, I expect to see discussions, but most of these edits are changes to templates. Even Nikki's +8000-byte edit is a change to a template, not the reply in a discussion I expected it to be. -- Innocent bystander (talk) 10:49, 19 April 2016 (UTC)
Indeed - a better solution would be to have pages like Property:P496/Constraints (or Property:P496/Documentation), with Property talk:P496/Constraints redirected to Property talk:P496. But we are where we are. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:24, 19 April 2016 (UTC)
And then, I suppose I have to point out that the constraint reports are at the constraint report pages ... - Brya (talk) 11:19, 19 April 2016 (UTC)
Brya, please stop feeding it. This is more of a todo list and, as Multichill noted, not a place for big debates.
--- Jura 11:58, 19 April 2016 (UTC)
@Jura: That was an unnecessary insult. Please stop that. Innocent and Andy are giving constructive comments of how to improve the situation which should be taken seriously. --Averater (talk) 06:28, 22 April 2016 (UTC)
There is no such thing as a necessary insult. Besides, there is nothing insulting about saying that a debate that isn't at its place shouldn't be fed any further.
--- Jura 07:27, 22 April 2016 (UTC)
I was reacting since you and Andy have been kind of harsh towards each other and I thought you were referring to him with your comment. But my apologies, since you meant the discussion. --Averater (talk) 05:32, 23 April 2016 (UTC)
By now, it is clear enough that Averater is looking for opportunities to disrupt the project. Just ignore him. - Brya (talk) 10:42, 22 April 2016 (UTC)

Add labels from sitelinks[edit]

There used to be a bot that added labels based on sitelinks (enwiki sitelink => en label). I think it stopped running at some point. Maybe some alternative should be found.
--- Jura 08:32, 8 April 2016 (UTC)

I have seen that Pasleim's bot is doing some work in this area, at least for German and French. --Edgars2007 (talk) 16:20, 9 April 2016 (UTC)
I do it for all languages, but only for items that have one of these values in instance of (P31):

There is a problem with uppercase/lowercase --ValterVB (talk) 16:30, 9 April 2016 (UTC)

Another rule that I use: add the label only if the first letter of the sitelink is in this list:
  • (
  • !
  • ?
  • "
  • $
  • '
  • ,
  • .
  • /
  • 0
  • 1
  • 2
  • 3
  • 4
  • 5
  • 6
  • 7
  • 8
  • 9

If you have other suggestions I can add them --ValterVB (talk) 16:41, 9 April 2016 (UTC)
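
A sketch combining the rules described in this thread (excluded wikis, the safe first characters listed above, and stripping the bracketed disambiguator); this is my reading of the rules, not ValterVB's actual code:

 import re

 SKIP = {"commonswiki", "wikidatawiki", "specieswiki", "metawiki", "mediawikiwiki"}
 # wiki software forces an uppercase first letter, so only titles starting with
 # these characters avoid the uppercase/lowercase problem mentioned above
 SAFE_FIRST = set('(!?"$\',./0123456789')

 def label_from_sitelink(dbname, title):
     # return a label derived from the sitelink, or None if the rules forbid it
     if not dbname.endswith("wiki") or dbname in SKIP:
         return None
     if title[:1] not in SAFE_FIRST:
         return None
     return re.sub(r"\s*\([^)]*\)$", "", title)  # drop a "(disambiguator)" suffix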

  • Comment: Just to make sure this is clear: this is mainly for items that exist and where someone manually added a sitelink to, e.g., enwiki, but the item doesn't have a label in the corresponding language yet. It does not concern items that lack both an English label and an English sitelink. I don't think search finds such items if they have no label defined at all. It's key that at least a basic label is defined for such items.
    If you are looking for rules to implement, then try the ones used by PetScan (Q23665536). It mainly removes disambiguators in round brackets. I think this works fine for Wikipedia; a large number of pages are created that way. It might not work well for Wikisource.
    --- Jura 10:50, 10 April 2016 (UTC)
Jura, these rules are applied only to items that have a sitelink but no label in the language of the sitelink. I check all sitelinks that end with "wiki", except "commonswiki", "wikidatawiki", "specieswiki", "metawiki" and "mediawikiwiki", and I delete disambiguators in parentheses. --ValterVB (talk) 12:13, 10 April 2016 (UTC)

Import GNIS ID (P590) from en:Template:Infobox settlement[edit]

We have started to use that property on frwiki, through Template:Infobox settlement (Q23779748), and as you can see in fr:Catégorie:Page utilisant P590. Thank you to any bot who would do this! Thierry Caro (talk) 04:33, 11 April 2016 (UTC)
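
Pywikibot's harvest_template.py script may be able to do this import directly. A hedged sketch of the invocation; the infobox parameter name (gnis here) is an assumption that must be checked against en:Template:Infobox settlement, and the flags against the script's documentation:

 python pwb.py harvest_template -lang:en -family:wikipedia \
     -transcludes:"Infobox settlement" -namespace:0 gnis P590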

Do you have some examples where it hasn't been imported? I already added thousands of those a couple of months ago. - Nikki (talk) 18:45, 12 April 2016 (UTC)
I was about to mention some place like Cheraw (Q1070214), but apparently you've found this and have already added the data. Thank you! Thierry Caro (talk) 17:16, 13 April 2016 (UTC)
The same thing with FIPS 55-3 (locations in the US) (P774) would be awesome, by the way. Thierry Caro (talk) 17:17, 13 April 2016 (UTC)

Oh! I found an example of missing GNIS ID (P590). Chincoteague (Q1073686) for instance. Thierry Caro (talk) 17:19, 13 April 2016 (UTC)

@Thierry Caro: Thanks! :) It turns out I hadn't actually finished importing the ones I'd started on before, and I'd missed some. Most of them should be there now; we now have twice as many as before. :D There are still a few hundred left that I'm going to look at once the GNIS site is back up (they're quite inconsistently represented in Wikipedia, so there could be more I missed; feel free to continue linking me to any that are missing if you want to).
Regarding FIPS codes, how urgent is it? I'm currently trying to write a bot to add information from GNIS and the US census data (and hopefully also look for information that's wrong or add missing references if I get that far). It looks like the data includes FIPS codes, so I should be able to add them from there once I'm far enough with the bot that I can add data. That would be easier than trying to extract data from templates (and I could add references too).
- Nikki (talk) 13:07, 16 April 2016 (UTC)
OK, perfect! I'll wait for the bot, don't worry! Thierry Caro (talk) 13:50, 16 April 2016 (UTC)

Add P1082 (population) and P585 (point in time) from PLwiki to Wikidata[edit]

Looks like PLwiki has lots of population information that other wikis do not have. It would be useful to have it for all of us. בורה בורה (talk) 18:23, 12 April 2016 (UTC)

It might be helpful to give some supporting links here, to be sure we get the right information from the right place into the right fields. Can you list one pl article and one corresponding Wikidata item that is manually filled with the desired information? Then I can see if I can get the information filled by a script in the same way. Edoderoo (talk) 18:26, 16 April 2016 (UTC)
Edoderoo, sorry for the late reply; I was on vacation. Take for example the article "Żołynia" on PLwiki. It has a population of 5188 as of 2013. However, this information does not exist on the Wikidata item (Q2363612). There are thousands of examples like this, but you get the idea... PLwiki is really great on population. Share it with us all. בורה בורה (talk) 10:19, 4 May 2016 (UTC)
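
A sketch of the write side, once the population and census year have been parsed out of the plwiki infobox (the parsing itself is the hard part and is omitted here; the WbQuantity/WbTime signatures vary slightly between pywikibot versions):

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def add_population(qid, population, year):
     item = pywikibot.ItemPage(repo, qid)
     claim = pywikibot.Claim(repo, "P1082")                # population
     claim.setTarget(pywikibot.WbQuantity(population, site=repo))
     when = pywikibot.Claim(repo, "P585")                  # point in time
     when.setTarget(pywikibot.WbTime(year=year))
     claim.addQualifier(when)
     item.addClaim(claim, summary="population from plwiki")

 add_population("Q2363612", 5188, 2013)  # Żołynia, per the example above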

Moving coordinates to headquarters (again)[edit]

In the case of company items (instance of (P31)=business enterprise (Q4830453)), the coordinates (coordinate location (P625)) relate to the headquarters (headquarters location (P159)), so where both are listed separately (WDQ (CLAIM[31:783794] OR CLAIM[31:4830453]) AND CLAIM[625] AND CLAIM[159]), P625 needs to be moved to a qualifier of headquarters location (P159). Can any bot help me with this again (approx. 200+ items)? --Jklamo (talk) 00:31, 8 February 2016 (UTC)
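
A minimal per-item sketch (untested; it assumes a single P625 value and a single P159 value, and skips anything else):

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def coords_to_hq_qualifier(qid):
     item = pywikibot.ItemPage(repo, qid)
     item.get()
     coords = item.claims.get("P625", [])
     hq = item.claims.get("P159", [])
     if len(coords) == 1 and len(hq) == 1 and "P625" not in hq[0].qualifiers:
         qual = pywikibot.Claim(repo, "P625")
         qual.setTarget(coords[0].getTarget())
         hq[0].addQualifier(qual, summary="move coordinates to headquarters")
         item.removeClaims(coords[0], summary="moved to P159 qualifier")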

Take care of disambiguation items[edit]

Points to cover

Somehow it should be possible to create a bot that handles disambiguation items entirely. I am not sure what all the needed functions are, but I started a list on the right side. Please add more. Eventually a Wikibase function might even do this.
--- Jura 13:36, 18 April 2016 (UTC)

Empty disambiguations: probably @Pasleim: can create User:Pasleim/Items for deletion/Disambiguation. Rules: items without sitelinks, whose P31 has only one value: Wikimedia disambiguation page (Q4167410). For the other points my bot already does something (for my bot a disambiguation is an item whose P31 has only the single value Wikimedia disambiguation page (Q4167410)). For descriptions I use the descriptions used in autoEdit. For labels: I add the same label for all the Latin-script languages, but only if all the sitelinks, with disambiguators removed, are the same. With these 2 operations I detect a lot of duplicates (same label + description). For now the list is very long (maybe >10K items), but it isn't possible to merge automatically, too many errors. Another thing to do is normalize the descriptions; there are a lot of items with non-standard descriptions. --ValterVB (talk) 18:02, 18 April 2016 (UTC)
  • Personally, I'm not that much worried about duplicate disambiguation items. Mixes between content and disambiguations are much more problematic. It seems they keep appearing through problems with page moves. BTW, I added static numbers to the points.
    --- Jura 10:06, 19 April 2016 (UTC)
    You will always have duplicate disambiguation items, since svwiki has duplicate disambiguation pages. Some of these duplicates exist because they cover different topics, and some exist because the pages otherwise become too long. A third category are the bot-generated duplicates. They should be treated as temporary, until a carbon-based user has merged them.
    And how are un-normalized descriptions a problem? -- Innocent bystander (talk) 10:58, 19 April 2016 (UTC)
About "un-normalized descriptions": for example, I have a disambiguation item with label "XXXX" and description "Wikipedia disambiguation". If I create a new item with label "XXXX" and description "Wikimedia disambiguation", I don't see that a disambiguation item "XXXX" already exists; if the description is normalized, I see immediately that the disambiguation already exists, so I can merge it. --ValterVB (talk) 11:10, 19 April 2016 (UTC)
For some fields, this proved quite efficient. If there are several items that can't be merged, at some point there will be something like "Wikimedia disambiguation page (2)", etc.
--- Jura 12:10, 19 April 2016 (UTC)

Lazy start for point (4): 46 links to add instance of (P31)=Wikimedia disambiguation page (Q4167410) to items without statements in categories of sitelinks on Category:Disambiguation pages (Q1982926): en, es, fr, it, nl, ja, pl, de, sv, ru, zh, az, ba, be, be_x_old, bs, bg, ca, cs, da, el, eo, et, eu, fi, hr, hu, hy, id, ka, kk, la, lt, lv, mk, nn, no, ro, sh, simple, sk, sl, sq, sr, tr, uk,
--- Jura 12:07, 23 April 2016 (UTC)

The biggest problem is to define what pages are disambiguation pages, given names and surnames. For example Backman (Q183341) and Backman (Q23773321). I don't see what is the difference between enwiki and fiwiki links. Enwiki page is in category "surnames" and fiwiki page in categories "disambiguation pages" and "list of people by surname", but the page in fiwiki only contains surnames, so basically it could be in the same item as the enwiki link. --Stryn (talk) 13:10, 23 April 2016 (UTC)

I think people at Wikidata could be tempted to make editorial decisions for Wikipedia, but I don't think it's up to Wikidata to determine what Wikipedia has to consider a disambiguation page. If a language version considers a page to be a disambiguation page, then it should go on a disambiguation item. If it's an article about a city that also lists similarly named cities, it should be on an item about that city. Even if some users at Wikidata attempted to set "capital" to a disambiguation page because Wikipedia did the same, such a solution can't be sustained. The situation for given names and family names isn't much different. In the meantime, at least it's clear which items at Wikidata have what purpose.
--- Jura 14:20, 23 April 2016 (UTC)
You then have to love Category:Surname-disambigs (Q19121541)! -- Innocent bystander (talk) 14:35, 23 April 2016 (UTC)
IMHO: in Wikipedia, disambiguation pages are pages listing pages (or possible pages) that have the same spelling; no assumption should be made about the meaning. If we limit the content to partial sets with some specific criterion, we don't have a disambiguation page but a list (e.g. a list of people with the same surname, List of people with surname Williams (Q6633281)). These pages must use the __DISAMBIG__ tag to let bots and humans distinguish, without doubt, a disambiguation page from a different kind of page. In Wikidata, disambiguation items are items that connect disambiguation pages with the same spelling. --ValterVB (talk) 20:02, 23 April 2016 (UTC)

Disambiguation items without sitelinks --ValterVB (talk) 21:30, 23 April 2016 (UTC)

I'd delete all of them.
--- Jura 06:13, 24 April 2016 (UTC)

Some queries for point (7):

A better way needs to be found for (7a).
--- Jura 08:07, 25 April 2016 (UTC)

I brought up the question of the empty items at Wikidata:Project_chat#Wikidata.2C_a_stable_source_for_disambiguation_items.3F.
--- Jura 09:39, 27 April 2016 (UTC)

As this is related: Wikidata:Project chat/Archive/2016/04#Deleting descriptions. Note, that other languages could be checked. --Edgars2007 (talk) 10:30, 27 April 2016 (UTC)

I don't mind debating if we should keep or redirect empty disambiguation items (if admins want to check them first ..), but I think we should avoid recycling them for anything else. --- Jura 10:34, 27 April 2016 (UTC)
As it can't be avoided entirely, I added a point 10.
--- Jura 08:32, 30 April 2016 (UTC)

duplicate disambiguations from svwiki (statements and descriptions)[edit]

All pages with sitelinks to svwiki with the suffix "(olika betydelser 2)" should have the statements P31:Q4167410 (they probably already have that) and P31:Q17362920.

The Swedish (sv) description should be "grensidedubblett" (duplicate disambiguation page).

All of these pages were created by Lsjbot and should be regarded as pages to be merged with man-made disambigs. Some pages with "(olika betydelser)" are also such duplicates, but they are more difficult to identify. -- Innocent bystander (talk) 11:05, 22 April 2016 (UTC)

For items connected to pages in sv:Category:Robotskapade förgreningssidor, you could try PetScan to add instance of (P31)=Wikimedia disambiguation page (Q4167410) (login with Widar first). Currently that gives just four pages.
--- Jura 11:20, 22 April 2016 (UTC)
NB Not all "Robotskapade förgreningssidor" are duplicates. -- Innocent bystander (talk) 11:32, 22 April 2016 (UTC)
The main reason I ask for this is that adding the same description to every disambig causes conflicts with the duplicate disambigs. I guess that is why items like Q21046230 only have labels and descriptions in a few languages, while most others have them in almost every Latin-script language. -- Innocent bystander (talk) 07:00, 23 April 2016 (UTC)
The main reason for having standard titles is to find duplicates. I think it is more correct to create a redirect from sv:Vara (olika betydelser 2) to sv:Vara. --ValterVB (talk) 08:10, 23 April 2016 (UTC)
It is, but we have thousands of such pages; it will take time before that is done. I am not suggesting this for permanent duplicates, which are few. -- Innocent bystander (talk) 10:41, 23 April 2016 (UTC)


Import P569/P570 dates from slwiki (text)[edit]

Wiki: slwiki
Items without P569, as of Oct 12: 11116 (31 % of all)
(overview)


slwiki has dates in the formats "* YYYY" (born) and "† YYYY." (died), or more precise ones. These could be imported by bot. --- Jura 07:50, 11 October 2015 (UTC)
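
A sketch of the text-parsing side, attempting year precision only; the "* YYYY"/"† YYYY" convention is as described above, but "*" also occurs in list markup, so real articles would need a stricter context check:

 import re

 BORN = re.compile(r"\*\s*(\d{3,4})")   # "* YYYY" = born
 DIED = re.compile(r"†\s*(\d{3,4})")    # "† YYYY" = died

 def extract_years(text):
     born = BORN.search(text)
     died = DIED.search(text)
     return (int(born.group(1)) if born else None,
             int(died.group(1)) if died else None)

 print(extract_years("France Prešeren (* 1800, † 1849) je bil ..."))  # (1800, 1849)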

There seem to be a lot of dates with year-precision only.
--- Jura 06:15, 24 April 2016 (UTC)

Exploitation visa number[edit]

Hello,

Can we add the exploitation visa number (P2755) (the number of the exploitation visa of a movie in France) to all movies available on the website of the CNC? Maybe the bot can match on the French label against the title of the movie, or on the year, the duration, the country, etc.

Then, can it add the CNC film rating (P2758) with:

It's written in the legal notice:

Sauf mention particulière, toute reproduction partielle ou totale des informations diffusées sur le site internet du CNC est autorisée sous réserve d’indication de la source.
Unless otherwise stipulated, any total or partial reproduction of the information published on the CNC website is authorized subject to indication of the source.

--Tubezlob (🙋) 16:58, 26 April 2016 (UTC)

Integrate data about the relationships from the Foundational Model of Anatomy into Wikidata[edit]

Most anatomical concepts on Wikidata already have a Foundational Model of Anatomy ID. On the other hand, they lack the information about hypernyms, holonyms and meronyms that is found in the Foundational Model of Anatomy ontology.

On their website they describe the availability of the database:

The Foundational Model of Anatomy ontology is available under a Creative Commons Attribution 3.0 Unported License. It can be accessed through several mechanisms:

1. The latest OWL2 files are available at http://sig.biostr.washington.edu/share/downloads/fma/release/latest/fma.zip. These can be viewed in the latest version of Protege.

Furthermore I think that valuable infoboxes could be created based on the data from the FMA-ontology within Wikipedia.

ChristianKl (talk) 20:47, 26 April 2016 (UTC)

  • Our database is CC0, not CC3.0 Unported. This would be a copyright problem. --Izno (talk) 14:03, 27 April 2016 (UTC)
    • CC 3.0 Unported doesn't require derivative works to use CC 3.0 Unported; it's not a share-alike license. What's required is attribution. If every entry in Wikidata cited the FMA as the source, Wikidata would fulfill the attribution requirement and thus the terms of the license. ChristianKl (talk) 09:47, 18 May 2016 (UTC)
      • You have to consider our reusers as well: attribution is not required under CC0, and we cannot guarantee our reusers would observe the original license. --Izno (talk) 12:13, 18 May 2016 (UTC)
        • CC 3.0 Unported does not require users to guarantee that reusers of their content provide attribution. ChristianKl (talk) 09:06, 19 May 2016 (UTC)

Import of data from UNESCO Institute for Statistics[edit]

Hi all

I'd like to request the import of some data from the UNESCO Institute for Statistics, specifically data on out-of-school children. I have been working with Jens Ohlig to prepare a spreadsheet to import the data on a Google Doc, which is available here, and have created the property Number of Out of School Children.

If you have any questions, or if I can help with the upload, please let me know.

Thanks

John Cummings (talk) 12:54, 27 April 2016 (UTC)

Does Q222#P2573 look good? --Pasleim (talk) 13:24, 27 April 2016 (UTC)
Hi @Pasleim:, I'm sorry, I need to sort out the properties and qualifiers in the sheet; I missed a step before sending this by mistake. I'll separate them all out and then ping you again. Thanks very much for your help :) John Cummings (talk) 14:19, 28 April 2016 (UTC)

Mother/father/brother etc.[edit]

Can some bot (regularly) update these statements, putting information in all relevant items? --Edgars2007 (talk) 14:57, 12 May 2016 (UTC)

There is a list at User:Yamaha5/List_of_missingpairs_querys.
--- Jura 07:59, 20 May 2016 (UTC)
The bot LandesfilmsammlungBot is currently being tested for this request --Landesfilmsammlung (talk) 13:01, 2 June 2016 (UTC)
User:Landesfilmsammlung: Please check the P31 values as suggested on Yamaha5's list. This is to avoid edits like the one on Q629347.
--- Jura 05:26, 3 June 2016 (UTC)
Oh, thanks... I will fix it... Can someone correct Samuel (Q629347)? The children property there seems very unusual. --Landesfilmsammlung (talk) 11:55, 3 June 2016 (UTC)
Landesfilmsammlung: the general idea is that that someone should be you. You might want to keep an eye on the constraint violation reports for properties you added a day or two earlier.
--- Jura 09:42, 4 June 2016 (UTC)

Import Identifiers from Commons Authority Control templates[edit]

Commons finally joined the Wikidata community with the enabling of arbitrary access on Commons, and I just rewrote c:Module:Authority control to use identifiers from Wikidata. When comparing Wikidata and Commons identifiers there are occasional mismatches, as well as Wikidata items missing identifiers specified on Commons. Help is needed to write bots that can copy identifiers from Commons to Wikidata. See c:Category:Authority control maintenance subcategories. --Jarekt (talk) 16:19, 12 May 2016 (UTC)

Something for User:T.seppelt? --Pasleim (talk) 17:03, 12 May 2016 (UTC)
Definitely, I have a script ready. Unfortunately I'm off for the weekend. Next week I'm going to file the request for approval on Commons. – T.seppelt (talk) 18:32, 12 May 2016 (UTC)
T.seppelt, If you need any help with request or navigating Commons, please contact me directly on my talk page on Commons . --Jarekt (talk) 13:17, 18 May 2016 (UTC)
We also have hundreds of pages with identifiers on Commons that need to be copied to Wikidata. See for example c:Category:Wikidata with missing NLA identifier: all Creator and Institution templates there have an NLA identifier, while the corresponding Wikidata pages do not have NLA (Australia) ID (P409). Those "just" need to be imported to Wikidata without any need to touch pages on Commons (or get approval there). We also have categories like c:Category:Pages with mismatching GND identifier, where GND ID (P227) does not match the GND value stored on Commons: either one of them is wrong or both are correct. --Jarekt (talk) 13:14, 18 May 2016 (UTC)
My bot will take care of this. It's part of the process which I already used on several Wikimedia projects. Thank you Jarekt, -- T.seppelt (talk) 13:27, 18 May 2016 (UTC)
@Jarekt: It's not that simple - as the data is imported into Wikidata, it should be removed from Commons, so that it is stored once, not twice. Commons' templates will then fetch the data values from Wikidata as required. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:11, 18 May 2016 (UTC)
Andy, once the data is moved to Wikidata, the pages will automatically move to c:Category:Pages using authority control with parameters matching Wikidata. Then I routinely remove identifiers from {{Authority Control}} templates found there, so they only use Wikidata. It seems like T.seppelt has a working process which he used on "several Wikimedia projects", and I might be reinventing the wheel here, but it seems to me that in the majority of cases all we need to do is add the Wikidata Q-code, verify that the identifiers match Wikidata, and remove the identifiers from Commons. Only a small minority of files need more complicated processing. --Jarekt (talk) 16:21, 18 May 2016 (UTC)
I was probably a bit unclear about what my bot actually does: it loads all pages in a tracking category for pages with local (redundant or not) parameters. These values are then compared to the identifiers stored in Wikidata. If values are missing, the bot adds them to Wikidata. At the end it removes all values from Commons which can be found on Wikidata. Since on Commons everything is a bit more complicated (Q parameter etc.), I'm very glad that you created a pretty smart module and a more advanced tracking system. The bot will manage it. I'd recommend letting it start and then seeing what we can do with the left-overs. Warm regards, –T.seppelt (talk) 05:25, 19 May 2016 (UTC)

labels from name properties[edit]

For people for whom we know family name (P734) [SQID] and given name (P735) [SQID], it could be possible to auto-generate a label in several relevant languages following certain rules. Could someone set up a robot to do this? author  TomT0m / talk page 10:13, 15 May 2016 (UTC) Ash Crow
Dereckson
Harmonia Amanda
Hsarrazin
Jura
Чаховіч Уладзіслаў
Sascha
Joxemai
Place Clichy
Branthecan

Notified participants of WikiProject Names

date of birth (P569) with century precision (7)[edit]

For date of birth (P569) with century precision (7), values could be changed as follows:

Change | Sample (displays as "20. century") | WQS
From: +(century)00-00-00T00:00:00Z/7 | +2000-00-00T00:00:00Z/7 | 2000-01-01
To: +(century-1)01-00-00T00:00:00Z/7 | +1901-00-00T00:00:00Z/7 | 1901-01-01

For dates of birth, it seems that for century precision "dates", it would be better to specify the first year in the century rather than the last one.

When queried at WQS these appear as January 1.
--- Jura 07:38, 16 May 2016 (UTC)
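
A sketch of the proposed rewrite, string handling only; it assumes exactly the +YYYY-00-00T00:00:00Z/7 form shown above (no negative years):

 def first_year_of_century(value):
     # "+2000-00-00T00:00:00Z/7" (20th century) -> "+1901-00-00T00:00:00Z/7"
     time, precision = value.split("/")
     assert precision == "7", "only century precision is handled"
     year = int(time[1:5])
     return "+%04d%s/7" % (year - 99, time[5:])

 print(first_year_of_century("+2000-00-00T00:00:00Z/7"))  # +1901-00-00T00:00:00Z/7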

Oppose. With the current implementation of the time datatype, lower-order elements can be omitted for reduced precision without the need for any further calculations. --Pasleim (talk) 09:14, 16 May 2016 (UTC)
That actually leads you to mix up 20th-century people (born maybe 1910) with people from the 21st century (born 2005).
--- Jura 09:59, 16 May 2016 (UTC)
I don't understand your example. A person born in 1910 has the value +1910-00-00T00:00:00Z/9, born in 2005 the value +2005-00-00T00:00:00Z/9, born in the 20th century the value +2000-00-00T00:00:00Z/7, and born in the 21st century the value +2100-00-00T00:00:00Z/7. If precision 9 is given, you omit everything except the first 4 digits; with precision 7 you omit everything except the first 2 digits. --Pasleim (talk) 10:34, 16 May 2016 (UTC)
The sample I have in mind is a person born (maybe) in 1910 using +2000-00-00T00:00:00Z/7, compared to a person born in 2005 using +2005-00-00T00:00:00Z/9. If you use just wdt:P569 rounded to the century digits, you would get "20" for both.
--- Jura 15:30, 16 May 2016 (UTC)

Undo edits by Edoderoobot[edit]

User:Edoderoobot has added thousands of incorrect instance of (P31) statements (see User talk:Edoderoobot#Hundreds_.28if_not_thousands.29_of_incorrect_P31_statements). It's now been nearly 10 weeks since User:Edoderoo was first told about it and despite numerous messages, I have seen no progress at all. Although Wikidata:Bots makes it quite clear that it's Edoderoo's responsibility to clean up the mess, the incorrect statements have been there far too long already and I don't want them to continue being there indefinitely so I'm asking here to see if someone else is willing to help.

The problematic statements are instance of (P31) statements with imported from (P143) Swedish Wikipedia (Q169514) as a reference, so I'd like a bot to go through all the edits by Edoderoobot and remove any statement it added which matches that as being questionable. If the bot is able to determine whether the statement is actually correct and only remove the incorrect ones, that would be a bonus.

- Nikki (talk) 17:32, 21 May 2016 (UTC)
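
A sketch of the removal criterion (untested; it removes only P31 claims whose reference says imported from (P143) = Swedish Wikipedia (Q169514), and checking that a claim really came from Edoderoobot would still require the edit history):

 import pywikibot

 repo = pywikibot.Site("wikidata", "wikidata").data_repository()

 def imported_from_svwiki(claim):
     # true if any reference block contains imported from (P143) = Q169514
     for ref in claim.sources:
         for source in ref.get("P143", []):
             target = source.getTarget()
             if target and target.id == "Q169514":
                 return True
     return False

 def strip_suspect_p31(qid):
     item = pywikibot.ItemPage(repo, qid)
     item.get()
     suspect = [c for c in item.claims.get("P31", []) if imported_from_svwiki(c)]
     if suspect:
         item.removeClaims(suspect, summary="remove P31 statements imported from svwiki")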

@Alphos: Could you help here? Matěj Suchánek (talk) 12:30, 22 May 2016 (UTC)
Most definitely can, but I just noticed this ping, and it's getting awfully late for me - and unlike some, I'd rather be available to shut RollBot down immediately, should anything undesirable happen.
The last "offending" edit seems to be on May 5th (and Edoderoobot seems to be inactive since then), but could you point me to the first one, or at least give me the date? If not, don't worry, I'll find it tomorrow.
Another note: some of these edits appear to be legit, and RollBot cannot discriminate: should it revert all of them nonetheless?
Alphos (talk) 22:37, 22 May 2016 (UTC)
@Nikki: It's been a few days, and I haven't started RollBot on the task yet - good thing, as it turns out I was thinking of another bad set of edits Edoderoobot made, for which I'll probably contact Tacsipacsi, EncycloPetey, Tobias1984 and Multichill to offer RollBot's services, when the task at hand is done.
I really need more details (first and last edit in that set of "bad" edits, mainly), and possibly a decision as to "nuking" (reverting pages to their former state regardless of edits made by other users since, thus also reverting subsequent edits by other people); RollBot usually doesn't nuke, leaving pages with edits by other users as they are and listing them instead, but it can nuke.
It may seem counterintuitive, but my bot doesn't technically revert edits, it brings pages/entities to their prior state, and there is a difference.
Alphos (talk) 13:46, 26 May 2016 (UTC)
I'm really not sure when it started or ended. :( The biggest problem for me is finding the edits. Edoderoobot was doing multiple bot tasks simultaneously, so the edits adding incorrect P31 statements are mixed in with thousands of unrelated edits to descriptions, and there are far more edits than I can possibly go through by hand, so I can't actually find all the bad edits. The examples I have to hand are from the 2nd and 11th of March. I'm not aware of any bad edits since I first reported it on the 15th of March (but I could just have missed them amongst all the other edits), and I think I've also seen bad edits from February.
Since there were multiple things happening at the same time, reverting everything between two dates would also revert lots of unrelated edits. I'm not sure how many of those also had issues. It would work a lot better if it could filter based on the edit summary; the description edits have a different summary to the P31 ones.
I'm not sure about nuking. Of the handful of examples I have to hand, most of them have been edited since. Some are good, some are bad (based on the bad bot edits), others just undid the bad edits. If there were a list of items (and it's not too many), I could maybe check them by hand, but like I said, I can't even find the bad edits. :/ - Nikki (talk) 14:52, 26 May 2016 (UTC)
Bots that do several tasks at once are a nightmare, and it's even worse when the tasks aren't thoroughly tested first >_<
But now that I have the dates (further in the past than I initially thought), I can try and see if there's a way to do it (whether for instance there are time slots where Edoderoobot worked on one task rather than another, and work in segments on the slots for that P31 task), or if, on your suggestion, I can (or should) alter RollBot to add a summary matching condition, whether by blacklisting or whitelisting or both - this alteration will however take me some more time to get into, my health being what it is.
I'll keep you posted either way.
I'll also ask the contributors to the second complaint I saw on that bot's talk page if they want me to do anything about it.
Alphos (talk) 15:30, 26 May 2016 (UTC)
@Nikki: It seems that, for Edoderoobot's March spree, Stryn has already removed a significant chunk using Autolist (example) - more or less all the ones I looked at before starting RollBot.
It doesn't excuse the fact Edoderoo didn't do it in over a month, with Stryn having done it in late April instead. It does however mean that RollBot is probably unneeded here.
On another note, thanks for the summary matching suggestion, I'll definitely think of implementing it!
Alphos (talk) 16:31, 26 May 2016 (UTC)
PS: I'll contact Tacsipacsi, EncycloPetey, Tobias1984 and Multichill, to see if they need help with their issue with Edoderoobot.
I don't need help, thanks. I work on a very limited set of items here regularly. My issue was the addition of English phrases to data items as Dutch labels, and the bot continued to make the same mistake after the user was alerted and responded. For the literary entries where I saw this happen, I've cleaned up the problems already. For the taxonomic entries, I haven't looked them over because the issue will be far more complicated: many plants, animals, and other organisms will have no Dutch name and will be known only by their Latin binomial. --EncycloPetey (talk) 17:47, 26 May 2016 (UTC)
Thanks for your message!
I noticed it too after reading the section of Edoderoo's talk page you replied in. The same occurred for categories, where the English labels were pretty much copy-pasted into the Dutch labels, which Multichill made a note of.
Thanks also for your work rolling back those changes. Next time, you may consider asking RollBot to do it for you. It's a bit crude ("Hulk SMASH!", if you will), but it does the deed!
I'll wait for the other users to chime in on what they noticed.
Alphos (talk) 18:31, 26 May 2016 (UTC)

Bloomberg Private Company Search[edit]

Crawl all ~300,000 companies and add them to Wikidata.  – The preceding unsigned comment was added by 192.35.17.12 (talk • contribs) at 16:09, 23 May 2016 (UTC).

(Related) I've previously researched and spoken to Bloomberg employees about importing their symbols (BBGID). I've tried quickly proposing clear-cut properties, with some taking nearly a year to be approved (what you'd need). Disappointingly, we've imported notability from Wikipedia, with people worrying about too many items. There are also significant structural problems with Wikidata because it's a crappy mirror of Wikipedia (and the smaller ones at that). Movie soundtracks can't be linked to the article's Soundtrack section (many items => 1 article). Multi-platform video games are currently a mess (1 article => many items).

To start, you'll need to propose a new property. Dispenser (talk) 20:09, 23 May 2016 (UTC)

MCN number import[edit]

There are 10,031 identifiers for MCN code (P1987) that can be extracted from [5] or this English version. Many (but not all) of the items cited are animal taxa, which can easily be machine-read. For the rest, it would be useful if the bot generated a list presenting possible meanings (by comparing the English and Portuguese versions of the xls file with Wikidata language entries). Pikolas (talk) 12:38, 14 August 2015 (UTC)
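
A rough sketch of the comparison step, assuming pandas and pywikibot; the file names and column names are guesses, since the exact layout of the xls files isn't specified here:

 import pandas as pd
 import pywikibot

 # The 'code' and 'description' column names are assumptions about the xls layout.
 en = pd.read_excel('mcn_en.xls')
 pt = pd.read_excel('mcn_pt.xls')
 merged = en.merge(pt, on='code', suffixes=('_en', '_pt'))

 repo = pywikibot.Site('wikidata', 'wikidata').data_repository()
 for _, row in merged.iterrows():
     # Search Wikidata for the English description; a human reviews the output.
     hits = repo.search_entities(row['description_en'], 'en', total=3)
     print(row['code'], row['description_en'], [h['id'] for h in hits])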

What's the copyright status of those documents? Sjoerd de Bruin (talk) 13:04, 14 August 2015 (UTC)
It's unclear. I've opened a FOIA request to know under what license those are published. For reference, the protocol number is 52750.000363/2015-51 and can be accessed at http://www.acessoainformacao.gov.br/sistema/Principal.aspx. Pikolas (talk) 13:40, 14 August 2015 (UTC)
I heard back from them. They have assured me it's in the public domain. How can I prove this to Wikidata? Pikolas (talk) 01:48, 2 October 2015 (UTC)
@Sjoerddebruin: Reopening this thread since I forgot to ping you. NMaia (talk) 15:45, 1 June 2016 (UTC)
Updated links: Portuguese version, English version. NMaia (talk) 19:35, 2 June 2016 (UTC)

Migration of P761 to P2856[edit]

See WD:PFD. -- Innocent bystander (talk) 16:02, 2 June 2016 (UTC)

The deletion proposal has been retracted, since it seems that they cannot be merged. --Srittau (talk) 22:40, 6 June 2016 (UTC)

Connecting pt.wikiquote articles to Wikidata entries[edit]

It's fairly common for pages on pt.wikiquote to use Template:Autor, which has links to sister projects. A bot could be on the lookout for these sister links and make the corresponding connections on Wikidata automatically. There are currently way too many unconnected pages to do it manually. NMaia (talk) 11:42, 5 June 2016 (UTC)

Also be on the lookout for pt:q:Template:Wikipedia. NMaia (talk) 11:46, 5 June 2016 (UTC)
@NMaia: My bot has already imported the author template, adding many sitelinks to Wikidata. Now I'm running it over the Wikipedia template. I wrote a simple Python script which can easily be modified and run over another wiki, but I don't have a good overview of where else it could be helpful... Matěj Suchánek (talk) 15:19, 8 June 2016 (UTC)
@Matěj Suchánek: thanks for the reply. I didn't quite understand it, though. Did you mean to say that your bot is fishing for these templates to link Wikiquote pages to their respective Wikidata entries? NMaia (talk) 12:47, 10 June 2016 (UTC)
Yes, when it finds an unconnected page with a template linking to a page on a sister project, it loads that page's item and adds the unconnected page to it as a sitelink. Matěj Suchánek (talk) 13:06, 10 June 2016 (UTC)
Fantastic! Thanks a lot. NMaia (talk) 13:54, 10 June 2016 (UTC)
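
For anyone picking up a similar task, the core of such a script could look roughly like this with pywikibot; the template parameter parsing is left as comments, since the template's exact layout isn't given here:

 import pywikibot

 site = pywikibot.Site('pt', 'wikiquote')
 template = pywikibot.Page(site, 'Predefinição:Autor')

 for page in template.getReferences(only_template_inclusion=True):
     try:
         pywikibot.ItemPage.fromPage(page)   # raises NoPage if unconnected
         continue                            # already linked, nothing to do
     except pywikibot.NoPage:
         pass
     # Parse the template parameters here to find the sister-project page
     # (e.g. the linked Wikipedia article), then attach this page to its item:
     #   item = pywikibot.ItemPage.fromPage(wikipedia_page)
     #   item.setSitelink(page, summary='Adding sitelink from Template:Autor')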

Mark items being used in lists as "used" on Wikidata:Requests for deletion[edit]

Currently Benebot marks some items as "in use" when there are links to a given item.

As sites such as Wikipedia start using arbitrary access to retrieve information from Wikidata, the above approach doesn't capture what may be key uses for some items.
--- Jura 16:01, 5 June 2016 (UTC)
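
For reference, arbitrary-access usage can in principle be checked per client wiki through the wblistentityusage API; a hedged sketch with pywikibot (response handling is simplified, and Q42 is just an example):

 import pywikibot
 from pywikibot.data import api

 # Check whether Q42 is used anywhere on English Wikipedia.
 enwiki = pywikibot.Site('en', 'wikipedia')
 request = api.Request(site=enwiki, parameters={
     'action': 'query',
     'list': 'wblistentityusage',
     'wbleuentities': 'Q42',
 })
 data = request.submit()
 pages = data['query'].get('wblistentityusage', [])
 print('Q42 used on enwiki:', bool(pages))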

Pokémon species[edit]

Hello, I'm currently preparing a series of bot requests for Pokémon species items. The first two requests (Pokémon by generation and Pokémon by Pokédex number) are ready to go; can some bot take charge of them?

For future reference, is it easier to have a list of item numbers, or is a list of English names (which is easier for a human to prepare) enough?

Ju gatsu mikka (talk) 05:07, 10 June 2016 (UTC)

Correcting a set of reference URLs[edit]

For the Flemish art collections on Wikidata project, we have unfortunately added wrong URLs to quite a few statements while importing data. :-(

This Google Drive folder contains a set of Excel sheets that show which URLs need to be changed for which Wikidata items, and in which specific places.

The Excel sheets are hopefully as clear as possible :-) but feel free to contact me if more information or explanation is needed. I would be extremely grateful to any user who could fix these with their trustworthy bot! Many thanks in advance, Spinster 💬 07:04, 10 June 2016 (UTC)
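
In case it helps whoever picks this up, a very rough sketch of the replacement step with pywikibot, assuming the sheets are exported to a CSV with columns item, property, old_url and new_url (that layout is my assumption, not the actual sheets'):

 import csv
 import pywikibot

 repo = pywikibot.Site('wikidata', 'wikidata').data_repository()

 with open('url_fixes.csv') as f:
     for row in csv.DictReader(f):
         item = pywikibot.ItemPage(repo, row['item'])
         item.get()
         for claim in item.claims.get(row['property'], []):
             for source in list(claim.sources):
                 refs = source.get('P854', [])   # reference URL
                 if any(r.getTarget() == row['old_url'] for r in refs):
                     new_ref = pywikibot.Claim(repo, 'P854', isReference=True)
                     new_ref.setTarget(row['new_url'])
                     # Caution: this drops other snaks in the same reference;
                     # a real run would re-add them alongside the new URL.
                     claim.removeSources([c for lst in source.values() for c in lst])
                     claim.addSources([new_ref])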

Currency conversion rates[edit]

Hello everyone,

with the new quickbar module on enwikivoyage we're now able to transclude currency conversion rates automatically from Wikidata currency items. A good example is Ukraine. My bot updates the price (P2284) of hryvnia (Q81893) on a daily basis using data provided by the National Bank of Ukraine (Q1070388). Does anybody have experience with automatic conversion-rate updating on Wikipedia projects? It would be great to provide a similar system for all currencies. This would be an ideal way to show the value of Wikidata as a central data repository.

Thank you, -- T.seppelt (talk) 16:15, 10 June 2016 (UTC)

I hate to come with bad news like this, but I think we (at least today) have to limit how many claims like that are added to an item. There is probably enough data out there to have price (P2284) for every pair of currencies for almost every day of the year, going back decades. But that would very soon make it impossible to use the information in our items on Wikipedia/Wikivoyage today. There are memory and time limits in Scribunto. (And it would take hours to download a single page to the browser.) -- Innocent bystander (talk) 16:27, 10 June 2016 (UTC)
Agree. Please find a way to store them elsewhere than on the item for the currency itself.
--- Jura 16:32, 10 June 2016 (UTC)
@Jura1, Innocent bystander: I would broadly agree that a 'running history' of conversions would be a bad idea due to the size limitation. Why not just a once-daily update, changing the claim rather than adding a new one? --Izno (talk) 16:45, 10 June 2016 (UTC)
And in fact, that's how it's updated above at #The cost of hryvnia. --Izno (talk) 16:47, 10 June 2016 (UTC)
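
For reference, the "replace instead of append" variant is only a small change with pywikibot; this sketch assumes a single existing P2284 statement, a placeholder rate, and USD as the unit:

 import pywikibot

 repo = pywikibot.Site('wikidata', 'wikidata').data_repository()
 item = pywikibot.ItemPage(repo, 'Q81893')   # hryvnia
 item.get()

 rate = 0.04   # placeholder for today's rate from the central-bank feed
 usd = 'http://www.wikidata.org/entity/Q4917'   # United States dollar

 claims = item.claims.get('P2284', [])
 if claims:
     # Overwrite the existing statement's value instead of appending a new
     # one, so the item doesn't accumulate one statement per day.
     claims[0].changeTarget(pywikibot.WbQuantity(rate, unit=usd))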
That just bloats the edit history of the item. I think it would be fine on a separate item. We are already challenged by (generally annual) population numbers.
--- Jura 16:53, 10 June 2016 (UTC)
Okay, that's true. I didn't take the item's edit history into consideration when I wrote this script. What could a working system look like? Having items like "the cost of hryvnia (Q81893)" which have claims like "value" → x USD, EUR...? We would then have to agree on a couple of key currencies. -- T.seppelt (talk) 17:29, 10 June 2016 (UTC)
Maybe a property for a "regularly updated rate" item, including a qualifier for the currency. The linked item would give the rate, the date, and possibly some parameters (and the currency once more). This would make it fairly easy to get the rate from the currency.
--- Jura 17:35, 10 June 2016 (UTC)
Oppose Against any regular update of currency rates: we don't need updates about the price of a service provided in the past, we need updates about the present service itself. For example, if the cost of a night in a hotel was $100 in 2015, I don't need to know the converted 2015 price of that night at the 2016 conversion rate; I need the price of that hotel in 2016. Snipre (talk) 11:37, 11 June 2016 (UTC)
What does this have to do with the price of services in 2015? There are plenty of things that cost money in 2016 whose price can be calculated given a conversion rate.--Anders Feder (talk) 00:24, 17 June 2016 (UTC)
  • It seems that the use of price (P2284) for exchange rates confuses users. Maybe we'd need another property for this as well.
    --- Jura 12:04, 11 June 2016 (UTC)


Lowercase adjectives[edit]

It might be worth doing another conversion of lowercase adjectives in descriptions of people, "italian" → "Italian", "british" → "British", etc.
--- Jura 11:51, 22 June 2016 (UTC)
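
If it helps, the per-item fix is straightforward; this sketch (assuming pywikibot) uses an illustrative demonym list and a stand-in item list rather than a real query:

 import re
 import pywikibot

 repo = pywikibot.Site('wikidata', 'wikidata').data_repository()
 DEMONYMS = ['italian', 'british', 'french', 'german']   # extend as needed
 PATTERN = re.compile(r'\b(' + '|'.join(DEMONYMS) + r')\b')

 for qid in ['Q42']:   # stand-in for a real list or query of candidate items
     item = pywikibot.ItemPage(repo, qid)
     item.get()
     desc = item.descriptions.get('en', '')
     fixed = PATTERN.sub(lambda m: m.group(1).capitalize(), desc)
     if fixed != desc:
         item.editDescriptions({'en': fixed}, summary='Capitalize demonyms')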

Correct movie titles[edit]

I don't know how things are done here, but I would really appreciate it if somebody could remove suffixes like "(film)" in the various languages from movie titles. They seem very common, and it is annoying to remove them manually. It's especially frequent in Spanish, where titles are often written "Title (película)" instead of just "Title". Look at my changes to see other languages. --88.77.6.68 15:44, 29 June 2016 (UTC)
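
A naive per-item sketch of what such a cleanup could do, assuming pywikibot; the suffix list is illustrative, and a real run would need human review, since a parenthetical is occasionally a legitimate part of a title:

 import re
 import pywikibot

 repo = pywikibot.Site('wikidata', 'wikidata').data_repository()
 SUFFIXES = {'es': r' \(película\)$', 'en': r' \(film\)$', 'de': r' \(Film\)$'}

 def strip_label_suffixes(item):
     item.get()
     new_labels = {}
     for lang, pattern in SUFFIXES.items():
         label = item.labels.get(lang, '')
         stripped = re.sub(pattern, '', label)
         if stripped != label:
             new_labels[lang] = stripped
     if new_labels:
         item.editLabels(new_labels, summary='Remove disambiguator from labels')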