Shortcut: WD:PC

Wikidata:Project chat

Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}} the first time you mention an item or property, respectively.
Also see status updates to keep up-to-date on important things around Wikidata.
Requests for deletions can be made here.
Merging instructions can be found here.

IRC channel: #wikidata
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2016/10.
SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 1 day.

Allow non-existing article in a language to list articles in other languages

Suggestion: Allow indicating that an article which does not exist in the current language does exist in other languages, so that those other language(s) appear in the Languages list when creation of the article is suggested.

Solution (idea): Wikidata must allow specifying, for an article that does not exist in a certain language, which other languages it does exist in.

Advantage: Clicking on a red link for a non-existing article in a certain language could then give the option to read the article in other languages, in addition to creating the article. The reader can then get the information in another language of their choice, while the link still shows that the article does not exist in the current language.

Note: Suggestion was also previously posted at [1].  – The preceding unsigned comment was added by MortenZdk (talk • contribs) at 11:35, 11 September 2016 (UTC).

Bot generated data

sv:User:Lsj runs a bot that generates geographical articles (e.g. villages, rivers) in the Swedish and Cebuano wikis using freely available data from NASA and other sources. The bot extracts data about a location, then formats it into text and generates stub articles. Example for en:Abuko, with data items bolded:

Abuko has a savanna climate. The average temperature is {{convert|24|C}}. The hottest month is April, with {{convert|27|C}} and the coldest month is July, with {{convert|22|C}}.<ref name = "nasa">{{Cite web |url=|title= NASA Earth Observations Data Set Index|access-date = 30 January 2016 |publisher= NASA}}</ref> Average annual rainfall is {{convert|1148|mm}}. The wettest month is August, with {{convert|449|mm}} of rain, and the driest month is February, with {{convert|1|mm}} of rain.<ref name = "nasarain">{{Cite web |url=|title= NASA Earth Observations: Rainfall (1 month - TRMM)|access-date = 30 January 2016 |publisher= NASA/Tropical Rainfall Monitoring Mission}}</ref>
  • Would there be any problem with the bot storing the data in Wikidata?
  • Would there be any problem with articles embedding the Wikidata items for a location into standardized text at display time?

Other types of data that could be stored by bots for settlements include census data and election results. Standard templates could then pull the data into chunks of Wikipedia text for articles in all languages, picking up the latest values at display time. Crazy? Aymatth2 (talk) 23:41, 24 September 2016 (UTC)
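
A minimal sketch of the data-side half of that idea, assuming the values are stored as ordinary statements: a consumer (template back end, bot, or script) can fetch them through the Wikibase wbgetclaims API. P2044 (elevation above sea level) is a real property used here only as a stand-in, since the climate properties discussed below did not exist yet, and the QID for Abuko is assumed.

    import requests

    API = 'https://www.wikidata.org/w/api.php'

    def get_quantity_claims(item_id, property_id):
        # wbgetclaims returns the statements for one item/property pair.
        resp = requests.get(API, params={
            'action': 'wbgetclaims',
            'entity': item_id,
            'property': property_id,
            'format': 'json',
        })
        return resp.json().get('claims', {}).get(property_id, [])

    # 'Q571852' is a placeholder QID for Abuko; P2044 = elevation above sea level.
    for claim in get_quantity_claims('Q571852', 'P2044'):
        value = claim['mainsnak']['datavalue']['value']
        print(value['amount'], value.get('unit'))
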

I think such data *should* be stored in Wikidata, together with its provenance (i.e. whether it is from NASA or other sources), and not just as generated text in the Wikipedias. Whether the Wikipedias accept this kind of text, and whether they accept queries in running text, is up to the individual Wikipedias (I am rather skeptical regarding that approach), but that's really up to the local Wikipedia communities.
Even without generating the text via queries from Wikidata, there are many ways that the projects could benefit from having the data stored in Wikidata, e.g. for checking the Wikipedia text whether it corresponds with Wikidata, etc. --Denny (talk) 01:32, 25 September 2016 (UTC)
+1. We don't need GeoNames stubs, in any language.--Sauri-Arabier (talk) 10:15, 25 September 2016 (UTC)
The data in question comes from GeoNames, which is CC-BY licensed. A Wikipedia can import data with fewer copyright (sui generis database right) concerns than Wikidata can. There's also a question about data quality. There are people who are concerned about importing a lot of wrong data into Wikidata. ChristianKl (talk) 08:15, 25 September 2016 (UTC)
An alternative to importing only very high quality data is to subject Wikidata's data to quality heuristics or metrics, as discussed in the recent RfC: Data quality framework for Wikidata. The positive side of importing data into Wikidata is that it is a starting point for improving the data through collaborative work, as opposed to someone trying to clean a dataset alone. It also allows us to spot inconsistencies between datasets, because Wikidata can store several inconsistent datasets, and hence provides a heuristic for knowing where the data should be improved. author  TomT0m / talk page 08:37, 25 September 2016 (UTC)
Another piece of the puzzle is mw:Extension:ArticlePlaceholder, which can generate text from Wikidata data and produce stub articles on the fly. This could basically make the bot unnecessary. author  TomT0m / talk page 08:31, 25 September 2016 (UTC)

Can a bot reliably tell if there is already an item about the village, river, etc.? This can be difficult due to spelling variations, alternate names, and different levels of government. For example, near me, there are Rutland County (Q513878), Rutland (Q25893), and Rutland (Q1008836). Jc3s5h (talk) 12:37, 25 September 2016 (UTC)

  • The Swedish bot is steadily creating articles in the sv and ceb Wikipedias for all locations in the world, mostly using GeoNames / NASA data. I assume these all get Wikidata entries. There may be errors, e.g. not realizing that Paraty and Parati are the same place, but that can be sorted out. The Swedish bot data is in the public domain: nobody can copyright mere facts on average rainfall or temperatures. Can we backtrack from existing Wikidata entries to the corresponding NASA data, then update attributes like "average July rainfall" from the NASA data, giving the source? That would give the Wikipedias a higher level of confidence about importing the data into their articles, and possibly generating articles to match the Wikidata entries. Aymatth2 (talk) 15:09, 25 September 2016 (UTC)
Having duplicate items isn't necessarily an error. It's not ideal, but if someone notices they can merge. On the other hand, GeoNames often contains wrong coordinates for an item. If the temperature data are then pulled based on the incorrect coordinates, the whole item would have real errors in its data. ChristianKl (talk) 19:08, 25 September 2016 (UTC)
  • @ChristianKl: Do we know how often the GeoNames coordinates are wrong, and how far they are wrong? The Swedish bot seems to be causing entries to be made in Wikidata for a great many places. I assume this includes coordinates. If they are within a kilometer or two, the temperature and rainfall data will be close enough - they are rough values anyway. If only 0.001% of the coordinates are completely wrong, we can live with that. Perfection is the enemy of excellence. But if 10% of the coordinates are completely wrong we have a very serious problem. Aymatth2 (talk) 21:59, 25 September 2016 (UTC)
  • @Aymatth2, ChristianKl: A lot of the data in GeoNames is just garbage, especially for Central America. I have no idea where GeoNames gets their data from, but it definitely isn't reliable. From spot checks of areas I know well, I would estimate that about 5% of their data for Central America is totally bogus. Kaldari (talk) 23:06, 26 September 2016 (UTC)
  • It looks like GeoNames gets their data from 74 other databases, which explains why some of the data is high quality and some of it is garbage. Kaldari (talk) 23:27, 26 September 2016 (UTC)
  • As far as the temperature data goes, there is currently a proposal to add a property for it; at the moment no such property exists. ChristianKl (talk) 18:47, 25 September 2016 (UTC)
  • Oppose - A large amount of the data from GeoNames is poor quality (especially outside of Europe and North America). GeoNames is the largest geography database on the internet, not the most accurate. They aggregate data from 74 other databases, some of which are high quality and some of which have no quality control whatsoever. Our species data is already polluted by Lsjbot. I would hate to see the same thing happen with our geographical data. Kaldari (talk) 23:46, 26 September 2016 (UTC)
  • @Kaldari: I get the impression that as Lsjbot churns out geo-articles in the sv and ceb wikipedias, the coordinates from GeoNames get loaded into Wikidata. It would help to have some hard numbers on what percentages of these coordinates in Wikidata are a) accurate b) within 1km c) within 10km d) off by more than 10km. Is there a way to check a random sample of the coordinates against what we would consider reliable sources? Perhaps it could be done on a country-by-country basis. The bot data on climate etc. derived from coords+NASA could then be accepted for countries where coordinates are fairly accurate, rejected for others.
If there are countries where other sources give more accurate coordinates than GeoNames, is there a way to override the GeoNames Wikidata coordinates with data from those sources? Which are those countries? Aymatth2 (talk) 03:04, 27 September 2016 (UTC)
The problem is that the data quality from GeoNames is essentially random, as it depends mostly on which original database the data came from. Evaluating the quality of such an aggregated meta-database is practically impossible. It's like asking "What is the quality of data in Wikidata?". What Swedish Wikipedia should be doing is evaluating the quality of each of the 74 sources that GeoNames uses, figuring out which ones have high-quality data and importing only that data directly from the original sources. Kaldari (talk) 08:24, 27 September 2016 (UTC)

What is the percentage of errors in Wikidata, Wikipedia and Geonames? The data IS in Wikipedia and consequently it should be in Wikidata. The best thing we can do is work on this data and improve where necessary. Dodging the bullet by making it appear to be outside of what we do is plain silly. It is what we do among other things. Thanks, GerardM (talk) 10:25, 27 September 2016 (UTC)

I disagree with GerardM (talkcontribslogs)'s statement "The data IS in Wikipedia and consequently it should be in Wikidata". The whole idea of importing data from Wikipedia is dicey, since the quality of Wikipedia data is not as good as some other sources. Certainly if I came across some demonstrably wrong data in Wikipedia, and couldn't find a correct replacement, I should delete the data from both Wikipedia and Wikidata. Jc3s5h (talk) 12:25, 27 September 2016 (UTC)
  • Have we talked to the GeoNames people? I assume they have tried to use the most accurate data sources they can access, but in some cases have had to make do with imperfect sources. Spot-checks can give a good measure of the quality of data in GeoNames or, for that matter, in Wikipedia. If we find that GeoNames coordinates for British locations are 99.99% accurate in GeoNames, and 98.4% accurate in Wikipedia, we should replace all the British coordinates in Wikipedia and Wikidata with the GeoNames coordinates. It is possible that one of the 0.01% of inaccurate GeoNames coordinates will replace an accurate Wikipedia coordinate, but the trade-off seems reasonable. We can then use a modified version of the Swedish bot to match the coordinates to the NASA data to get the altitude, temperature and rainfall data for those British locations and store it in Wikidata for use by Wikipedia. Why not? Aymatth2 (talk) 12:48, 27 September 2016 (UTC)
I don't know about all the Wikipedias, but at the English Wikipedia, if a bot repeatedly replaces information that has been individually researched by a human editor, and for which reliable sources have been provided, with incorrect values, that bot will find itself indefinitely blocked. The current compromise on using Wikidata information at the English Wikipedia (other than linking to equivalent articles in other languages) may be found at w:Wikipedia:Requests for comment/Wikidata Phase 2. Jc3s5h (talk) 13:41, 27 September 2016 (UTC)
  • @Jc3s5h: An approach that may work is to have a bot take the coordinates given in a Wikipedia infobox (which may come from Wikidata), and use those coordinates to fetch the temperature and rainfall data from NASA and format them as text in the appropriate language. The chunk of text would be held in a separate Wikipedia file, transcluded into the article like a template, and the text would make it clear that it is NASA data for those coordinates as of the retrieval date. The bot could be rerun occasionally, or on demand, to refresh the data. It would be nice to store the data in Wikidata so all the Wikipedias could use it, but I get the impression that getting the Wikipedias and Wikidata to agree is tough. Aymatth2 (talk) 16:26, 27 September 2016 (UTC)
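
The workflow Aymatth2 sketches above (coordinates in, transcluded text chunk out) could be as small as the following; fetch_climate_text stands in for the NASA lookup and formatting step, and the "/climate" subpage naming is only an assumption:

    import pywikibot

    def fetch_climate_text(lat, lon):
        # Placeholder for the NASA data lookup and wikitext formatting;
        # a real bot would build the {{convert}}/<ref> text shown earlier.
        return 'Average annual rainfall is {{convert|1148|mm}}.<ref name="nasarain">NASA Earth Observations</ref>'

    def refresh_climate_subpage(article_title, lat, lon):
        site = pywikibot.Site('en', 'wikipedia')
        # The article transcludes this subpage like a template.
        page = pywikibot.Page(site, article_title + '/climate')
        page.text = fetch_climate_text(lat, lon)
        page.save(summary='Bot: refreshing NASA climate data for these coordinates')
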

One large problem with GeoNames is that they have matched data from different databases, but matched them so poorly that a small village with two families near my home got a population of several hundred. This error was introduced because the village shares its name with a small town 1000 kilometers from here. The population data was correct, but GeoNames did not match it to the correct place. Another large problem is that GeoNames has many duplicate items. Both French and English databases have been used for Canada, therefore many Canadian items in GeoNames can be found twice: once with a French name and once with an English name. A lake at the border between the Northern Territory and Western Australia can be found at least twice. Places in Sweden whose names end with the letter Ö are categorised as islands, even if they are not islands. Large parts of the Faroe Islands can be found at the bottom of the Atlantic Ocean. Almost every coordinate is rounded to the nearest minute, locating mountain peaks floating in the air and lakes on dry land. Many items about buildings do not tell very much about the building at all. They only tell that this kind of building has existed here at some point between the Stone Age and today. -- Innocent bystander (talk) 13:37, 27 September 2016 (UTC)

  • The immediate concern that triggered this discussion is with villages, where we need accurate enough coordinates to derive rainfall and temperature data from NASA. Are the GeoNames coordinates usually "good enough" for this purpose? Duplicate names are probably not a huge issue with villages. In Canada a lake, river or mountain might have variants (e.g. Lake Champlain/Lac Champlain), but a village would have the same name in both languages. Aymatth2 (talk) 16:26, 27 September 2016 (UTC)
Duplicate names are an issue with villages. Villages names often aren't unique. ChristianKl (talk) 17:13, 27 September 2016 (UTC)
  • If GeoNames has two entries for one village, St. Jean and Saint John, whatever, and they both have roughly accurate coordinates, good enough for climate data, there is no problem for the purpose being discussed as long as one of them can be matched to the Wikidata entry. The problem is when GeoNames places St. Jean, Quebec somewhere in Alabama. I suspect that wildly inaccurate coordinates are rare. Aymatth2 (talk) 17:28, 27 September 2016 (UTC)
  • I'm not convinced that getting the wrong village in the same county (or similar geographic unit) is good enough. I've hiked in an area where one side of a mountain ridge line is a temperate rain forest, and the other side is an ordinary northern forest. Jc3s5h (talk) 18:07, 27 September 2016 (UTC)
  • Climate data is always an approximation. My garden has different microclimates and vegetation on the dry, sunny slope in front of the house and the moister, shaded hollow behind. The climate data for a village in the Congo may be based on reports from meteorological stations more than 100 kilometers away. If we insist on perfect data we will get no data at all. Aymatth2 (talk) 23:10, 27 September 2016 (UTC)
When data is used to create articles in Wikipedias, we are not talking about the English Wikipedia; we are talking about the process whereby new content is created in multiple Wikipedias. When we refuse to acknowledge processes like this and do not include the data, we have no way of improving the data before it is actually used to create articles. What use is it for us to be the data repository for Wikipedia when we refuse to be of service? It is wonderful to disagree, but what does it bring us? NOTHING. We can do better and we should do better. Thanks, GerardM (talk) 20:03, 27 September 2016 (UTC)
  • We should provide the best data we can, then constantly work on improving quality. Spot checks on accuracy must show the data are good and steadily getting better. Surely Wikidata can do a better job of assembling and maintaining accurate bulk data like coordinates, temperatures and rainfall than editors of individual Wikipedia articles. Aymatth2 (talk) 23:10, 27 September 2016 (UTC)
  • @GerardM: The problem here isn't just the accuracy of coordinate data. We're talking about potentially importing data for 10 million place names, many of which don't even exist, are misclassified, are duplicates of other places in GeoNames, are conflations of multiple places, or are duplicates of places with different names in Wikidata. Can we seriously hope to check and fix even a tiny fraction of that? Adding new items is easy. Deleting and merging bogus ones is much more difficult. If we aren't willing to import the data directly from GeoNames, why should we be willing to import it indirectly from Swedish Wikipedia? The real danger here, in my mind, is that in the rush to fill Wikidata (and Swedish Wikipedia) with as much data as possible, we are eroding the trust that the larger Wikipedias have in Wikidata's data quality and thus alienating Wikidata from a huge editor pool, dooming it to die a slow death by data-rot. Kaldari (talk) 04:54, 28 September 2016 (UTC)
  • @Kaldari: We will not do this for all the Chinese places; we already have them. We will import them anyway if they are imported into Wikipedias first. Now the question is: what is Wikidata good for? Why are we considering best practices for data quality when we do not make them operational, when we do not use them for the needs that are there? Yes, there will be problems, but we will have them anyway, and it is much better to be in the driver's seat and think about how to improve the data before they become Wikipedia articles. Just consider: all these places likely have red links in one of our Wikipedias. Kaldari, use what we have for our mutual benefit and forget about the big Wikipedias. We are there for the smaller ones as much, and data and data quality is what we are there for. Thanks, GerardM (talk) 06:28, 28 September 2016 (UTC)


  • A bot developer can combine web searches with AI techniques to check whether a GeoNames place name is a) the name of a populated place, b) not a duplicate of some more common name, and c) paired with accurate coordinates. The process is iterative: the bot generates a confidence score for a sample of items; the developer checks the high-scoring items; where there is a problem, the developer trains the bot to detect and downgrade items like this. Eventually the bot reaches the level where 99.99% of items above a given score are clearly correct. That is, among 10,000 items there is just one error. All other items are discarded or placed in a list for manual attention.
@Kaldari: Would you accept having the bot populate Wikidata on a one-shot basis with the high-scoring names and coordinates if it reached this level of accuracy? If not, what level of accuracy would you accept? Aymatth2 (talk) 13:29, 28 September 2016 (UTC)
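
The iterative filter described above might, in skeleton form, look like this; the individual feature checks are hypothetical stand-ins for whatever signals (web-search hits, cross-database matches) the bot would be trained on:

    # Hypothetical confidence scorer for GeoNames candidate records.
    def confidence(record):
        score = 0.0
        if record.get('feature_class') == 'P':       # GeoNames class for populated places
            score += 0.4
        if record.get('population', 0) > 0:          # population independently reported
            score += 0.3
        if record.get('matches_other_source'):       # e.g. a national gazetteer hit
            score += 0.3
        return score

    # Threshold tuned over iterations until spot checks show
    # ~1 error per 10,000 accepted items.
    THRESHOLD = 0.9

    def split(records):
        accepted = [r for r in records if confidence(r) >= THRESHOLD]
        needs_review = [r for r in records if confidence(r) < THRESHOLD]
        return accepted, needs_review
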
Do you think there's currently a person who wants to write such a bot? ChristianKl (talk) 15:20, 28 September 2016 (UTC)
  • I might write one myself, but would not want to start unless there were clearly defined and agreed acceptance criteria. I would want assurance that I would not run into a stone wall of resistance to implementation after it had been proved to meet these agreed criteria. Let's see what user:Kaldari has to say. Aymatth2 (talk) 15:39, 28 September 2016 (UTC)
  • You are dodging the issue. The developer of this data is able to do a lot of all this, if not all of it. The point is, when we do not cooperate he can just opt to add all this data to Wikipedias, and then what? They are obviously articles, and there will be multiples in Wikipedias, so there will be items. When he STARTS with cooperating in Wikidata, we can start with disambiguation. We can add all the rest, compare with other sources and do what is necessary (whatever that is). The articles may be placeholders in any language, and in any language we can seek cooperation. Now ask yourself: is this not a perfect example of how we can leverage Wikidata in a positive way, proactively working on the data quality of Wikipedia, or do you really want to insist on working after the fact? In all cases we have to deal with this shit. It is in our best interest to cooperate and not be so afraid of what a subset of some Wikipedia communities have to say. Thanks, GerardM (talk) 18:30, 28 September 2016 (UTC)
  • @GerardM: I think you are overestimating the ability of the Wikidata community to clean up this data. No one has cleaned up any of the bogus species data that we imported from Swedish Wikipedia 3 years ago, nor is it even practical to do so. Let's say that I wanted to remove a totally bogus species from Wikidata, like Zygoballus mundus (Q5345040) (which has actually been deleted from the original database it was imported from). With a lot of effort (and Google Translate) I could probably get it deleted from Swedish Wikipedia, but it would still exist on the ceb and war Wikipedias, neither of which I have any clue how to interact with, so it will still persist on Wikidata indefinitely. Multiply that by the thousands of bogus species that need to be deleted and it quickly becomes an impossible task. I'm sure it won't be any easier getting all the abandoned logging camps and real estate developments (see below) removed from Wikidata after they are imported. Kaldari (talk) 19:21, 28 September 2016 (UTC)
  • @Aymatth2: If there was such a way to automatically determine accuracy, I would probably be willing to endorse it, but this sounds like a very challenging goal to accomplish. There is also the issue of notability to consider. GeoNames has no threshold for notability. It classifies neighborhoods, ghost towns, and real estate developments as "populated places" with no way to distinguish them from actual towns and cities. To give you one clear example of the problem, let's look at Chiquibul National Park in Belize. This has been a national park since 1995 and no one is allowed to live within the park except for park rangers. Within the boundaries of Chiquibul, GeoNames includes over a dozen logging camps that haven't existed for at least 20 years. These were never permanent settlements, just camps for loggers, yet GeoNames classifies them as "populated places". If you want to double check that these are in fact abandoned logging camps and not villages or towns, here's a list of some of them: Aguacate Camp, San Pastor Camp, Los Lirios Camp, Cebada Camp, Valentin Camp, Cowboy Camp, Retiro, Puchituk Camp, Mountain Cow, Blue Hole Camp, Cubetas. The reason these are included in GeoNames is because back in the 1970s (when Belize was British Honduras), the British government did a survey of the logging camps, and this survey data eventually ended up in GeoNames. How would you propose training an AI to detect cases where "populated places" were actually just abandoned logging camps or real estate developments? I imagine you would have to give it input from more reliable databases, and if you're already doing that, why not just use those databases to start with rather than GeoNames? Kaldari (talk) 19:04, 28 September 2016 (UTC)
@Kaldari: Well, to me it looks like these places are in fact "Populated places with an end date". There is nothing strange about that. We have many such items already. I started a thread about such items here some time ago.
But we'll never be able to actually supply an end date, since there are no reliable sources about these camps. All we know is that they definitely don't exist anymore. And regardless of this specific example, my point is that we shouldn't be creating items for places that aren't covered in reliable sources. As it stands now, I could create 100 totally bogus cities in GeoNames (via their editing interface) and in a few months they would automatically become articles on Swedish Wikipedia, complete with official-looking NASA references; then they would be copied to other wikis and imported into Wikidata, where they would live forever without anyone ever questioning their existence. Even if someone did discover that one of them was fake, there would be no way to link it to the other fake cities. Doesn't that seem like a problem? Shouldn't we demand some minimum level of quality control and verifiability for the data we import? Kaldari (talk) 22:33, 28 September 2016 (UTC)
I strongly advise against importing any data other than GeoNames ID (P1566) from these svwiki or cebwiki articles. If you want to import any other data from GeoNames, then do it directly from the poor database. We have detected many strange errors in these articles on svwiki. Many of the problems were detected when the bot reached Finland. Finland is a country with a fair share of active users on svwiki, since Finland is partly Swedish-speaking. Articles were found describing savanna (Q42320) in parts of the arctic country; February was the hottest month in some cases. And the data about the lakes was often hilariously wrong. The bot was halted for some time to discuss the quality problems, but it has started again, at full speed I'm afraid. -- Innocent bystander (talk) 20:20, 28 September 2016 (UTC)
  • I feel like someone who has poked a stick into a hornet's nest. It would be useful, if we know the name and coordinates of a populated place, to store that information in Wikidata and then to also store data derived from the name or coordinates, such as census or NASA climate data, so it could be shared by all the Wikipedias. I had no idea there was so much controversy about GeoNames. Let's forget about that as a source and look at the Datasources used by GeoNames in the GeoNames Gazetteer. Some of these look good to me. For example, the Instituto Brasileiro de Geografia e Estatística is a very reputable Brazilian government agency that provides a wealth of data about municipalities such as Cambuci that could be used to enhance the decidedly minimalist en:Cambuci article. I see no reason to treat a source like this with suspicion. This is what the Brazilian government says about their country. Is there a problem, in principle, with importing data from it so the Wikipedias can share it, and share updates? Aymatth2 (talk) 23:42, 28 September 2016 (UTC)
You are still dodging the bullet, and it may miss. When you import all this data you will have a certain percentage of errors. It will probably be within the 3% range, and that is better than all the work that I have done. I do make mistakes, particularly when I am carefully adding content by hand. So when we want a mechanism to both update Wikidata and the Wikipedias, there is a precedent; there are two precedents. Listeria is able to update all the lists we have and, with a little bit of effort, it can show the content for an item in a Reasonator kind of way. There is always the Placeholder; it is the official version of all this. The point is that we are thinking in one way: quality must be maintained and each project is an island. Yes and no. Quality must be maintained, and refusing this data while having it come in through the backdoor is absolutely the way NOT to improve data. Improving quality can be done in many ways and YES, we have communities. Why not ask our friends in India to verify and complete the data for India; why not ask the same of our Welsh friends? It is then in part up to them to help us out, but they CAN have the same data available to them if they so choose, available in a Listeria / Reasonator / Placeholder kind of way.
For all the naysayers, tell me: what better prospect do you have to improve this data? It is not a good idea to say: "You may not import data from any Wikipedia", because I was told that there was no option but to accept erroneous data, so we have a precedent whereby dodgy data is to be accepted. If we do not accept the data I will again open a can of worms. Thanks, GerardM (talk) 04:46, 29 September 2016 (UTC)
  • If the IBGE data is imported mechanically to Wikidata it will be 100% accurate - as a reflection of the IBGE data. It will be safe for any Wikipedia article to say "According to the 2010 census by the Brazilian Institute of Geography and Statistics, the population was 12,456, of which 53% were female and 49% were male." The numbers do not add up, but that is indeed what the census says. The IBGE site is the official publication. IBGE may correct the numbers, and there will be another census in 2020, so we will want to periodically rerun the import to freshen up the data. As for the Wikipedias, there are two options, and I am not sure which is best:
  1. Dynamically pull the data from Wikidata at display time
  2. Periodically pull the data from Wikidata, format it and store it in each Wikipedia as a "template" to be embedded in the article.
The second approach is less immediate, but perhaps gives more control, and may be more efficient. Either way, Wikipedia and Wikidata editors would not update the data, which are identified as the IBGE numbers, not the "true numbers". If an editor finds a better source of population data than the census, they can include that in their article and suggest that it too is held in Wikidata. The Wikipedias may format the data as text or in tables according to editor preference. I see applying this approach to other reliable sources as a huge benefit to all Wikidata consumers, including all the Wikipedias. Aymatth2 (talk) 12:51, 29 September 2016 (UTC)
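
For the import itself, a statement with its IBGE reference could be written with pywikibot along these lines; the QIDs and the population figure are placeholders, while P1082 (population) and P248 (stated in) are real properties:

    import pywikibot

    site = pywikibot.Site('wikidata', 'wikidata')
    repo = site.data_repository()

    item = pywikibot.ItemPage(repo, 'Q1753105')   # placeholder QID for Cambuci
    claim = pywikibot.Claim(repo, 'P1082')        # P1082 = population
    claim.setTarget(pywikibot.WbQuantity(amount=14827, site=repo))  # illustrative figure
    item.addClaim(claim, summary='Importing population from the 2010 IBGE census')

    # Attach the census as the source so consumers can see the provenance.
    stated_in = pywikibot.Claim(repo, 'P248')     # P248 = stated in
    stated_in.setTarget(pywikibot.ItemPage(repo, 'Q39160718'))  # placeholder QID for the census
    claim.addSources([stated_in], summary='Adding IBGE census reference')
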
When the IBGE data is imported, it still needs a lot of preparatory work. We already have many places of Brazil in Wikidata. It will only bring the current places and not the abandoned places. So yes, it is valuable data, but it is not all the data. Thanks, GerardM (talk) 05:04, 30 September 2016 (UTC)
@Aymatth2: Pulling data directly from the primary databases sounds like a much better idea to me. At least then we have a real source for verifiability and can assume a certain level of reliability (rather than it being a crap-shoot). Kaldari (talk) 18:44, 29 September 2016 (UTC)
Even if we assume that all data at the source is correct, there is still a lot of work to match each item in IBGE with each item here at Wikidata. You will then still get a percentage of errors. By hard work, we can improve that. GeoNames and the Lsjbot-project has here unfortunately made it worse. -- Innocent bystander (talk) 18:53, 29 September 2016 (UTC)
Sticks and stones. You do not address the issue. Lsjbot and GeoNames are realities we have to deal with. There are also other sources that have been imported that are way more problematic. Wikidata is not operating in a vacuum. It is dangerous to think we should ignore an opportunity that allows us to have an influence on the eventual content of multiple Wikipedias. It is discrimination pure and simple. Thanks, GerardM (talk) 05:04, 30 September 2016 (UTC)
@GerardM: I am not here to solve everything. My main opinion here is that we should not use svwiki or cebwiki as a direct source for such things as heights of mountains and surface areas of lakes and some other data, since the methods the bot has used to find such data are very problematic. We found very large mistakes in Finland, and that is the only country we have been able to review. It becomes even worse since the bot a little too often has not been able to match the correct GeoNames item with the correct Wikidata item. That is not a big deal if they are not matched at all. But a little too often "John Doe (city)" has been matched with "John Doe (mountain)" or "John Doe (parish)". The links to Wikipedia inside GeoNames have made it even worse, since those are very often wrong. And that bad data has already been imported here. I used to correct such mistakes daily, but since I cannot see that I will finish before the heat death of the universe (Q139931), I have quit doing so. -- Innocent bystander (talk) 07:19, 30 September 2016 (UTC)
It is not about you. It is about what we face. You propose discrimination based on the fact that Cebuano and Swedish do not matter to you. What this issue brings to the front is that, according to you, a lot of the GeoNames data we already hold needs work, and we are already doing that work. You do not quantify the error rate in GeoNames; you do not compare it to other sources. It is opinion only. Compare that to the fact that 12% of the most prescribed medicines are not proven to be effective, and we are to have all recognised substances approved for medical use in Wikidata. REALLY? I often fix links to people where, according to English Wikipedia, there is a link, only to find that Wikipedia has no link and the linked item is a person with the same name from a different century. Wikidata is as bad as GeoNames, if not worse. But we have more resources than GeoNames to improve our data, and we can help them fix their data. We, not you, but you as well. We cannot say that Swedish does not matter. You can say it, but that is just you. We cannot, because improving the data in the Wikipedias is one of the most important functions of Wikidata, and when we do this well, there is no real argument left not to use Wikidata for its data. Thanks, GerardM (talk) 04:41, 1 October 2016 (UTC)
  • All sources have errors. People are born and die while a census is being taken. Clerks make transcription errors. We cannot expect to record the truth, only what plausible sources like IBGE have said. I see no difficulty matching the IBGE entries for municipalities in each state of Brazil with the Wikipedia / Wikidata entries. There are only a few thousand of them. What is involved in getting accepted definitions in Wikidata of the official census data attributes, and approval to run a bot to load them for the Brazilian municipalities? Aymatth2 (talk) 22:28, 29 September 2016 (UTC)
Does the Brazilian census have IDs that they use to identify Brazilian municipalities? If so, it would make sense to propose a new property for that ID.
In general it makes sense to announce the bot project a few days beforehand in this project chat, offer a few examples and see whether somebody objects. If nobody objects you can go ahead. In the case of the Brazilian census I doubt that anybody will object, but that's a project that has little to do with the GeoNames data. ChristianKl (talk) 17:46, 30 September 2016 (UTC)
  • Yes, Brazil assigns municipal codes. For Cambuci, Rio de Janeiro, it is 3300902. They also participate in the Open Geospatial Consortium, as do many other sources of high-quality geographical data. Perhaps Wikidata should too, as a consumer and distributor of the data. Aymatth2 (talk) 16:16, 1 October 2016 (UTC)
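
Once such a property exists, matching and completeness checks reduce to a query. A sketch against the Wikidata Query Service, with 'P9999' as a placeholder for the proposed municipal-code property:

    import requests

    QUERY = """
    SELECT ?item ?itemLabel ?code WHERE {
      ?item wdt:P9999 ?code .   # P9999: placeholder for the proposed property
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    """

    resp = requests.get('https://query.wikidata.org/sparql',
                        params={'query': QUERY, 'format': 'json'})
    for row in resp.json()['results']['bindings']:
        print(row['itemLabel']['value'], row['code']['value'])
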

Bot generated data (break)

Like GerardM, I find the approach of ignoring reality strange. On svwp I have followed Lsjbot closely with a focus on quality, and I was responsible for issuing a pause in August, in order for the whole community to discuss different issues that had turned up while the first million articles were being generated. There were some minor adjustments we agreed upon in order to be able to support Lsjbot continuing, like not including the item type "cliff/stone block", which GeoNames had existing both on land and in the sea.

For the 1.3 M articles on species I have done an extensive analysis, where the input from Wikidata was of great help. I found errors in a few hundred of the generated articles, representing around 1-3 per 10,000 articles. In the same analysis I found that for the manually created ones the error frequency was 1-3 per 100 articles (it is easy to get the letters wrong in a Latin name of 30-40 characters). I also found that of the errors reported in Wikidata, about 1/3 were in fact not errors. I consider this bot creation a 100% success and also see it being repeated in a number (6-8 in total) of other language versions. There are challenges, though, where I believe you with your Wikidata knowledge could help out. Taxa change frequently, meaning a number of taxa have changed since the CoL database of 2012. Could this be handled at the Wikidata level, and how then to transfer these updates to the different language versions?

For GeoNames the quality issue is much more complex, and I would very much appreciate it if you put your energy and competence into discussing it. For example, we know that duplicates are generated; as IB mentions above, in Canada often one item with an English name and one with a French name are created. On svwp we have said this is no (real) problem, as it gives the reader value anyway and a merge template will be enough. But how should Wikidata take care of these? We have also looked at how to handle the case where a city and a commune (municipality) are more or less the same. On svwp we always treat this as two items, but we know that on other versions these are represented in only one article. What is the Wikidata view on this? We have had a long discussion on the quality of coordinates, where it seems GeoNames often uses a grid, introducing an error in the precision. But here we see this as better than nothing; when more exact coordinates exist in existing articles these are used, and if more exact coordinates exist on other versions/Wikidata those ought to be used (where you know better how these more precise values can replace the rough ones). And there are issues worth discussing for several other item types, like the weather data which is the start of this thread. Hope to see more of you helping us make the data from the bot generation even more valuable, and not only on a few language versions. Yger (talk) 08:19, 2 October 2016 (UTC)

Wikidata has by default one item for every svwp article. Duplicate Wikidata items aren't a huge deal. For Wikidata it's more important that the data in the items is correct. That said, if svwp merges two articles and Wikidata items exist that map to the two concepts of the svwp articles, it might make sense to merge the Wikidata items as well. As far as taxa go, could you give an example of a taxon that recently changed its name, and the data source you have for it changing its name? ChristianKl (talk) 22:33, 2 October 2016 (UTC)

Having far too much time on my hands, I checked the species mentioned in the main source for en:Rio Cautário Federal Extractive Reserve#Environment, the Brazilian Ministry of the Environment (MMA). I was interested in what the Wikipedias had, and what Wikispecies had. This is just one example of a collection of species from a location in western Brazil by someone who clearly favors reptiles over birds, so not "typical", but sort of interesting. Findings are shown below:

MMA name                      | Alternative                           | .en | .es | .sv | .species | Comments
------------------------------|---------------------------------------|-----|-----|-----|----------|---------
Amburana acreana              |                                       |  Y  |  Y  |  Y  |    Y     |
Apuleia leiocarpa             |                                       |  -  |  Y  |  Y  |    Y     |
Bertholletia excelsa          |                                       |  Y  |  Y  |  Y  |    Y     |
Cedrela odorata               |                                       |  Y  |  Y  |  Y  |    Y     |
Dinizia excelsa               |                                       |  -  |  Y  |  Y  |    Y     | The .es and .sv entries are not linked
Dipteryx odorata              |                                       |  Y  |  Y  |  Y  |    Y     |
Erisma bicolor                |                                       |  -  |  -  |  Y  |    -     |
Erisma uncinatum              |                                       |  -  |  -  |  Y  |    Y     |
Hymenolobium petraeum         |                                       |  -  |  -  |  Y  |    Y     |
Mezilaurus itauba             | Mezilaurus ita-uba                    |  Y  |  -  |  Y  |    Y     | In Wikispecies as Mezilaurus ita-uba
Swietenia macrophylla         |                                       |  Y  |  Y  |  Y  |    Y     |
Atractus insipidus            |                                       |  -  |  -  |  Y  |    -     |
Bothrocophias hyoprora        | Bothrops hyoprorus                    |  Y  |  -  |  Y  |    -     |
Bothrocophias microphthalmus  | Bothrops microphthalmus               |  Y  |  -  |  Y  |    Y     | .sv entry as Bothrocophias
Bothrops mattogrossensis      | Bothrops matogrossensis               |  -  |  Y  |  -  |    -     |
Callithrix emiliae            | Mico emiliae                          |  Y  |  Y  |  Y  |    Y     | .sv entry as Callithrix
Callithrix melanura           | Mico melanurus                        |  Y  |  Y  |  -  |    Y     |
Chironius flavolineatus       |                                       |  -  |  -  |  Y  |    Y     |
Coluber mentovarius           | Masticophis mentovarius               |  -  |  Y  |  Y  |    Y     | .sv and Wikispecies as Masticophis
Crotalus durissus             |                                       |  Y  |  Y  |  -  |    Y     | .sv redirects to Crotalus adamanteus
Drymobius rhombifer           |                                       |  -  |  -  |  Y  |    Y     |
Drymoluber brazili            |                                       |  -  |  -  |  Y  |    Y     |
Enyalioides laticeps          |                                       |  Y  |  -  |  Y  |    Y     |
Enyalius leechii              |                                       |  -  |  -  |  Y  |    -     |
Epicrates crassus             |                                       |  -  |  Y  |  -  |    -     |
Epictia diaplocia             | Leptotyphlops diaplocius              |  Y  |  -  |  Y  |    -     | .en and .sv have Leptotyphlops
Erythrolamprus mimus          |                                       |  -  |  -  |  Y  |    -     |
Hoplocercus spinosus          |                                       |  Y  |  -  |  Y  |    -     |
Leposoma osvaldoi             |                                       |  -  |  -  |  Y  |    -     |
Micrablepharus maximiliani    |                                       |  -  |  -  |  Y  |    -     |
Micrurus mipartitus           |                                       |  -  |  -  |  Y  |    -     |
Ninia hudsoni                 |                                       |  -  |  -  |  Y  |    -     |
Oxyrhopus formosus            |                                       |  Y  |  -  |  Y  |    -     |
Oxyrhopus rhombifer           |                                       |  -  |  Y  |  Y  |    -     |
Oxyrhopus vanidicus           |                                       |  -  |  -  |  -  |    -     | .fr has a stub
Pseudoboa nigra               |                                       |  -  |  -  |  -  |    Y     |
Saguinus fuscicollis          |                                       |  Y  |  Y  |  Y  |    Y     |
Siagonodon septemstriatus     | Leptotyphlops septemstriatus          |  Y  |  -  |  Y  |    Y     | Leptotyphlops in .en and .sv; Siagonodon in .species
Siphlophis worontzowi         |                                       |  -  |  -  |  Y  |    -     |
Tupinambis longilineus        |                                       |  -  |  Y  |  Y  |    -     |
Xenodon merremii              | Xenodon merremi / Waglerophis merremi |  Y  |  -  |  Y  |    -     | .en Xenodon and .sv Waglerophis not linked

The taxonomy sometimes changes, but it takes a while before consensus is reached on the new structure. Mico vs Callithrix seems to still be under debate. Every species mentioned by the source has an article in one of the wikis, although apart from .sv and .ceb most individual wikis have less than half the species. A central clearing house for new entries and updates, giving data on taxonomy, IUCN status and range, would be a major step forward. Surely that is what Wikidata is for? Aymatth2 (talk) 11:57, 4 October 2016 (UTC)

Bot generated data (break2)

When there are quality sources such as IUCN, it's certainly the role of Wikidata to host that data. We do have IUCN-ID (P627), IUCN protected areas category (P814) and IUCN conservation status (P141). I doubt anybody would oppose a bot that imports that data directly from IUCN. ChristianKl (talk) 10:22, 5 October 2016 (UTC)
  • @ChristianKl: You are an optimist. When the IUCN Redlist does not find a species (e.g. Erisma bicolor), it directs the reader to the Species 2000 & ITIS Catalog of Life. The Catalog of Life seems reputable to me, but it is the primary source for Lsjbot, and to quote User:Kaldari (above) "Our species data is already polluted by Lsjbot." I imagine that introducing Catalog of Life common names / taxa via the IUCN back door would be just as contentious.
It would help if we had some well-defined criteria and process for determining which sources will be considered good enough for a bot to import their data to Wikidata. Then a bot developer who has met the criteria and followed the process can safely invest in the effort of developing the bot to load Wikidata. After that, of course, they have to get the Wikipedias to accept information from Wikidata. Aymatth2 (talk) 18:19, 5 October 2016 (UTC)
I understand the phrase "import data from IUCN" to mean importing information on species that do have an IUCN-ID (P627). If you cited IUCN as a source for CatalogueOfLife data for species that don't have an IUCN-ID (P627), then I would think that people would rightfully object. References are very important for Wikidata. Many Wikipedias don't like to import claims without references, and currently most of the GeoNames and CatalogueOfLife imported data doesn't have references on Wikidata about its provenance. It would be good to focus on quality of data and not on quantity.
In general, importing data from its original source, with a link to the original source, is optimal. The CatalogueOfLife is a merged data set from 143 taxonomic databases.
As far as I understand, the status quo is that people who import massive amounts of data into Wikidata with a bot are expected to ask beforehand and seek consensus. I don't think there is a history of this community being angry with people who announced what they wanted to do and then did what they announced with a bot. ChristianKl (talk) 19:22, 5 October 2016 (UTC)
To be clear, I strongly support importing data from reliable sources, but neither GeoNames nor Catalog of Life qualify as a reliable source. Both include self-published data that is not vetted or reviewed. For example, at the time of Lsjbot's species project, Catalog of Life used a self-published non-peer-reviewed website as the authoritative source for all data on the animal family Salticidae, which includes over 5000 species. They have since corrected this problem and now use a totally different database for this family, but the damage is done and now 3 different Wikipedias and Wikidata have bogus, idiosyncratic data for this family. With GeoNames the problem is even worse. Anyone can add, edit, or delete data from GeoNames with no oversight whatsoever (similar to Wikipedia or OpenStreetMap but without a community to patrol the changes). They also have an extremely low standard for including data in the database and poor accuracy for place classification in some areas. In both the Catalog of Life and GeoNames cases most people aren't noticing these problems because these problems don't occur with popular items. For example, the Catalog of Life data for birds is pretty impeccable, but for obscure arthropods it's hit or miss. The GeoNames data for Sweden is awesome, but for Belize it's a mess. In both of these cases, high quality data does exist; it just takes more work to find, vet, and import. Using these mega-aggregate-databases is lazy and short-sighted. As admirable as Lsj's goals are, compiling data for all the world's places or species just isn't a task that should be undertaken by a single person or bot. It should be done with careful deliberation and only using vetted reliable sources (or at least sources that have some sort of community that is keeping the data updated and clean). Regardless, I'm not a member of the Swedish Wikipedia community and I have no influence there, so GerardM is probably right. We just have to learn to live with this mess. In the meantime, I don't support making it worse by importing any data directly from GeoNames or importing anything from Swedish Wikipedia besides article titles and GeoNames IDs. Kaldari (talk) 22:27, 5 October 2016 (UTC)
@Kaldari, Aymatth2, ChristianKl: Question: is the information in CoL or GeoNames traceable? That is, do they cite their sources? If so, by reimporting data from them we could source claims to CoL and then add the primary source they use to second the claims. Actually, we already face alignment problems with databases that have their own bugs, e.g. with VIAF data, and the solution seems to be cooperation with them: pushing Wikidata's corrections upstream and periodically updating Wikidata from their input. If we could achieve that, and source the claims that were directly or indirectly imported from them with up-to-date data, then data they have since deleted but which was imported here would be left without sources, and we could deal with it - delete? deprecate? - once we are confident that a large part of the taxonomy data is sourced from other databases, for example.
Hallelujah! The problem we face is that a lot of the assumptions we have are wrong. When you want to be inclusive about plant names, consider IPNI as a source. What we consider incorrect names are often scientifically valid names. Once we decide to seriously consider collaboration, it does not follow that what a CoL or GeoNames holds is incorrect. What follows is that we continue to source data to multiple sources and compare statements. We will seek understanding about differences, and in this way we contribute to our quality. The point is very much about the point of view we take. We are no longer new; we do provide service to other projects. What we do is not about importing data, it is about how we deal with the data we import. For Wikipedias we MUST accept their data, but that does not mean that what they hold is good. We have been curating their data in Wikidata, and this has gone largely unnoticed.
There is a distinction between valid and valuable. Our data is no better than that of any of the other user-curated projects. Only when we consider how to narrow down where we spend our time improving the data will our work become more valuable. In such a process our data becomes more valid. Thanks, GerardM (talk) 09:51, 6 October 2016 (UTC)
  • The Catalog of Life gives its sources. See Erisma uncinatum for an example. The database is run by subject experts and is worth more than the sum of its sources since the merging and review process turns up problems to be fixed. The Catalog of Life gives a more complete and accurate overview of species than the Wikipedias, although lacking the depth a Wikipedia article may give on a given species. Yes, it has errors; all data sources have errors.
There are databases on everything from extragalactic objects to shipwrecks that provide more complete data than the Wikipedias, and keep adding entries and making corrections. They still have errors, of course; all data sources have errors. But providing data from these sources, saying where the data came from, is better than providing no data. If two sources give different values for the same data element, we can record both versions.
An update mechanism would be needed, so Wikidata would pick up additions and corrections from the data sources. For example, the accepted scientific name for a plant may change, with the former name now listed as a synonym. If the Catalog of Life scientific name value changes, Wikidata should change the value of scientific name that it shows as sourced from the Catalog of Life.
Perhaps the key is to view Wikidata as a repository of fairly current data from more-or-less reliable sources, with the sources identified, not as a repository of 100% true and accurate data. If we demand perfection we will achieve nothing. Aymatth2 (talk) 14:56, 6 October 2016 (UTC)
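
The update mechanism suggested above is, in outline, a periodic diff between each source-attributed statement and the source's current value; fetch_source_value and fetch_wikidata_value are hypothetical helpers standing in for API lookups:

    # Sketch: queue corrections where the source has moved on since the import.
    def refresh(items, fetch_source_value, fetch_wikidata_value):
        updates = []
        for item_id, source_record_id in items:
            current = fetch_source_value(source_record_id)   # e.g. CoL's current accepted name
            stored = fetch_wikidata_value(item_id)           # the value sourced to CoL on Wikidata
            if current is not None and current != stored:
                updates.append((item_id, stored, current))
        # Each tuple is then reviewed (or bot-applied), keeping the source citation.
        return updates
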
The value of taxon name (P225) of an item should never be changed. Create a new one, move sitelinks. --Succu (talk) 15:08, 6 October 2016 (UTC)
  • If the source corrects a spelling error (e.g. Bothrops mattogrossensis should be Bothrops matogrossensis), presumably Wikidata should reflect the correction. Aymatth2 (talk) 15:28, 6 October 2016 (UTC)
There are very rare cases where this could be carefully done. It was not necessary for Bothrops matogrossensis (Q2911754). The misspelling of Mato Grosso (Q42824) could be found in the original description of Bothrops neuwiedi matogrossensis (Q27118116) and was reintroduced in 2008 ([2]) when the subspecies was raised to a species. --Succu (talk) 16:19, 6 October 2016 (UTC)
  • I think we are violently agreeing. The sources are mostly accurate, but when an error is found it should be fixed. I expect the correction will often come from the source, since they are constantly working on their data. If we find an error in a source like the Catalog of Life we should report it back to them, cooperating in improving data quality. Aymatth2 (talk) 17:30, 6 October 2016 (UTC)
I don't think so. Data aggregators like CoL, GBIF or EOL are bad starting points for enriching Wikidata with reliable data. As far as I'm aware, Lsj failed to correct his early (2012?) CoL import. So we (as Wikidata) had to deal with this. I doubt your analysis „I found errors in a few hundred of the generated articles, representing around 1-3 per 10,000 articles. In the same analysis I found that for the manually created ones the error frequency was 1-3 per 100 articles (it is easy to get the letters wrong in a Latin name of 30-40 characters).“ is well grounded. Expanding the scope of Wikidata (= taxa not treated by any Wikipedia) should be done carefully. The mapping of Flora of North America taxon ID (P1727) is nearly complete; Flora of China ID (P1747) lacks this completeness due to a high number of spelling errors. --Succu (talk) 21:35, 6 October 2016 (UTC)
@GerardM: „consider IPNI as a source“ - International Plant Names Index (Q922063) is far away from being a reliable source. --Succu (talk) 21:35, 6 October 2016 (UTC)
@Succu: „consider IPNI as a source“ - International Plant Names Index (Q922063) is inclusive of all the literature on plant species. Its origins are impeccable and it is superior at registering all the permutations over time. Trust me, I analysed their data for all the succulents. I ended up with a 60 MB database where I normalised their data to bring all the errors ever produced in the literature down to a more manageable series of imho correct entries. When you talk about IPNI and its errors, you obviously do not know what IPNI is about. Thanks, GerardM (talk) 05:53, 7 October 2016 (UTC)
For „all the succulents“?! - No, I don't trust this statement of yours. I know pretty well what the goals of IPNI are and what they have reached until now. --Succu (talk) 08:16, 7 October 2016 (UTC)
You doubt my word. Why? I am to trust your judgment on succulents based on what? What we do does not conform with nomenclature. Too much is missing. The author, the publication and the publication date are essential parts of a valid name. We do not hold that information, so all the data is deficient in principle. Thanks, GerardM (talk) 17:53, 7 October 2016 (UTC)
The trait „succulence“ is not well defined. So you have to say what kind of definition your dataset is based on. That's all. Yes, too much is missing. But that has nothing to do with nomenclature. It's a titanic workload we have to do, and every help adding taxon authors and publications is welcome. --Succu (talk) 18:23, 7 October 2016 (UTC)
Sorry, why bother. My approach fitted my needs. The current approach to nomenclature is wrong at best, and you think what others have done is of no consequence because you do not understand it. For me most of the arguments used are problematic; what galls me most is the notion that you know best and try to enforce what is not correct in the first place. Thanks, GerardM (talk) 19:22, 7 October 2016 (UTC)
Whatever your „need“ was... I see nothing on your side that helps Wikidata close those titanic gaps or makes IPNI a better resource. Maybe you blogged about it? --Succu (talk) 19:32, 7 October 2016 (UTC)
I blogged about it in May 2007. You are not really interested beyond your own scope, and that is fine. It is why I did not bother with taxonomy. There is enough to do anyway. My point, the one that you acknowledge, is that our taxonomy data is flawed. IPNI has quality data; it is a reliable source. My problem with Wikidata is that people like you do not appreciate what Wikidata is about, what it can do and why it is so relevant. Thanks, GerardM (talk) 05:31, 8 October 2016 (UTC)
Is IPNI fit enough to be a source for the scientific names described in Descriptions of three hundred new species of South American plants, with an index to previously published South American species by the same author (Q21775025)? --Succu (talk) 05:51, 8 October 2016 (UTC)
@User:Yger: Sorry, I missed you. --Succu (talk) 21:55, 6 October 2016 (UTC)
Wikidata can give excellent value with little effort if it distributes information from databases maintained by specialists. We should attribute the data to the sources and refresh the data from the sources to ensure we reflect their current view. We do not have the resources to do it ourselves. User:Yger analyzed the articles on species and found errors in 1-3 per 10,000 bot-generated articles, versus 1-3 per 100 manually created articles. If we try to compete with the specialist databases we will fail. Aymatth2 (talk) 23:18, 6 October 2016 (UTC)
Flora of North America (Q1429295) is a series of books accompanied by a website. And what you propose is already done. --Succu (talk) 08:16, 7 October 2016 (UTC)
Aymatth2: ITIS is a lame duck and not an up-to-date resource. ITIS is not a „specialist database“. FishBase (Q837101) or Avibase (Q20749148) are far better. --Succu (talk) 20:46, 7 October 2016 (UTC)
The analysis as run by Yger seems meaningless (to put it kindly). Any analysis depends on input data, and in this case there are no reliable input data. It is also meaningless to use the amount of error recorded in Wikidata as a starting point: finding errors is a difficult and thankless business, and therefore it is mostly not done. At a rough guess only 1% of the errors in svwiki has been marked as such in Wikidata.
        As Kaldari says, CoL is very variable in the quality of its data at any one time, and this will vary with time (what was hopeless last year is better this year, etc.).
        There are areas in svwiki (based on CoL) where the error rate is something like 50% (CoL does give its sources, and almost invariably these sources make it clear that what ended up in CoL is wrong). Hopefully this 50% is a maximum, found only in limited areas, but there is nobody who can really know how much error there is in svwiki. All I can tell is that the error rate is off the scale.
        And by errors I do not mean taxa that have changed their name (in such cases both names are found in the literature, and are good data), but I mean 'taxa' that do not exist, never have existed, and never will exist. - Brya (talk) 10:56, 7 October 2016 (UTC)
@Yger: You say "I consider this bot creation a 100% success". An example I ran into today: in Four new species of Hypolytrum Rich. (Cyperaceae) from Costa Rica and Brazil (Q27137125) four new species are described: Hypolytrum espiritosantense (Q15587199), Hypolytrum glomerulatum (Q15588104), Hypolytrum lucennoi (Q15588880) and Hypolytrum amplissimum (Q27136913). Lsjbot (Q17430942) failed to create the last one, Hypolytrum amplissimum (Q27136913). Do you include such omissions in your analysis? --Succu (talk) 20:25, 7 October 2016 (UTC)
@Brya: These are serious allegations against organizations that receive significant public funding. Can you point us to examples of errors in items on the Catalogue of Life database? Aymatth2 (talk) 22:47, 7 October 2016 (UTC)
These are not allegations but observations. They are not new either. For heavy rates of error check the Ebenaceae or Apiaceae in svwiki against the current CoL (CoL has realized its error). For an example that illustrates that Yger is personally putting back complete nonsense see here. - Brya (talk) 04:15, 8 October 2016 (UTC)

Bot generated data (break3)[edit]

  • @Brya: I am not particularly interested in what has happened in the past, except in what we can learn from it. There were teething problems with the Catalogue of Life, and we have no automated process to refresh our data as they make corrections and additions.
Your example is useful. Maba quiloënsis was described by Hiern in 1873, named Ebenus quiloënsis by Kuntze in his 1891 Revisio generum plantarum vascularium... and named Diospyros quiloënsis by White in 1956. The last is now the accepted name. The Museum of Natural History (Q688704) "Virtual Herbaria" had entry 260951 for Ebenus quiloënsis, and entry 69345 for Diospyros quiloënsis aka Maba quiloënsis. They have since merged the entries so they are identical, giving all three names, but the Catalogue of Life has not yet picked up the merger.
It would be correct for us to record that the Catalogue of Life shows Ebenus quiloënsis and Diospyros quiloënsis as separate species, while the UofV "Virtual Herbaria" shows them as synonyms. This is not "complete nonsense". Then, when the Catalogue of Life makes the correction, we should refresh our data to show their current view. Have you notified the Catalogue of Life of this problem, which may be an oddity or may be systemic? Do you know of other problems with items on the current Catalogue of Life database? Aymatth2 (talk) 13:06, 8 October 2016 (UTC)
The CoL has been going for quite a while, some fifteen years, with a new version every year. It had and has more than teething problems. Quite a few are structural.
        Maba quiloensis, Ebenus quiloensis, Diospyros quiloensis are three different names (three different formal nomenclatural entities), so there can be / should be three items in Wikidata. These names are homotypic, so they cannot refer to different species, by definition. They refer to the same species (not necessarily the same circumscription), and in any particular taxonomic viewpoint only one of these names can be used at a time. If one believes in a genus Maba (which nobody has for quite a while) then Maba quiloensis is (likely) the correct name for a species. If one believes in an all-encompassing genus Diospyros (which has been the consensus for quite a while) then Diospyros quiloensis is (likely) the correct name for a species. By definition, Ebenus quiloensis is never the correct name of a species (never has been). If svwiki is aiming to be an encyclopedia, then it should have at most one entry for the species. In fact, dewiki would not allow an entry such as held by svwiki, as it has no meaningful content. But svwiki does hold two entries both claiming that the name is the correct name of a species: a miraculous duplication of species. Or to put it differently, a bold-faced lie.
        I see you failed to run even a basic comparison between svwiki and CoL (you are just defending svwiki's wrongdoings?). The CoL has had Diospyros quiloensis as the accepted name for something like a year and a half now. Recording that CoL has held different contents earlier would only be useful in a database that collected metadata on errors in databases. Brya (talk) 14:26, 8 October 2016 (UTC)
  • @Brya: I missed the fact that the Swedish wiki is citing the historical 2014 version of the Catalogue of Life, since corrected. This is a useful example because it shows the danger of importing from a source but not updating. If Wikidata had imported Ebenus quiloënsis from the Museum of Natural History (Q688704) "Virtual Herbaria" in 2014, we would have got the same information as the 2014 Catalogue of Life entry. If we had not refreshed that data, we would still be reflecting the error in the 2014 "Virtual Herbaria", as the Swedish Wikipedia does. If we refreshed from the latest "Virtual Herbaria" or the latest Catalog of Life we would automatically get the correction. Again, can you point us to problems with items in the current Catalogue of Life database? Aymatth2 (talk) 16:06, 8 October 2016 (UTC)
You seem to be increasingly separated from reality? The Swedish Wikipedia / svwiki has not been refreshed but is still showing all the errors it has imported. As I remember the Vienna database, it did not have these errors, but these were generated by CoL. - Brya (talk) 16:52, 8 October 2016 (UTC)
  • @Brya: I am not trying to defend the Swedish Wikipedia, which has not been refreshed but is still showing all the errors it has imported. Can you point us to problems with items in the current Catalogue of Life database? Aymatth2 (talk) 23:47, 8 October 2016 (UTC)
OK, the Swedish Wikipedia has not been refreshed and is still showing all the errors it has imported. A great deal of these errors are also in Wikidata, as eliminating them is very difficult.
        I don't closely follow CoL, and would be quite happy if it had never been published, but the errors are inescapable. The only error that is easily pointed at is the BIG ERROR, whereby the names of cattle, sheep, the goat, etc. are wrong (disallowed by the ICZN: CoL has them wrong because ITIS has them wrong, ITIS has them wrong because MSW has them wrong, and MSW has them wrong because they were rushed by an oncoming deadline and they panicked). But an indicator of the degree of error can be found in the number of homonyms: these can be likened to names that jump up and down shouting "something wrong here, please take action". Any time I look (which is not often) I seem to see such homonyms. Of course homonyms are not the only errors, but they are easily visible: the tip of the iceberg. - Brya (talk) 06:45, 9 October 2016 (UTC)
  • @Brya: Can you give a specific example, as in "the current CoL entry for Hypolytrum aymatthii is wrong because ..." ? Aymatth2 (talk) 12:04, 9 October 2016 (UTC)
Like I said cattle (the CoL-entries "Bos taurus indicus Linnaeus, 1758", "Bos taurus primigenius Bojanus, 1827", "Bos taurus taurus Linnaeus, 1758" are wrong), sheep (the CoL-entries "Ovis aries Linnaeus, 1758", "Ovis aries aries Linnaeus, 1758", "Ovis aries orientalis Gmelin, 1774" are wrong), the goat (the CoL-entries "Capra hircus Linnaeus, 1758", "Capra hircus aegagrus Erxleben, 1777" are wrong), etc. In this case because the ICZN has ruled against them (see amongst others here). - Brya (talk) 12:35, 9 October 2016 (UTC)
It would be of more interest if they changed their mind in no label (Q21682705). --Succu (talk) 14:09, 9 October 2016 (UTC)
You are twisting facts. Certainly "the traditional approach of treating the wild goat as a sub-species [of the domesticated goat]" is a wild reversal of fact. MSW in its earlier editions deviated from a very well-established tradition among zoologists treating the domesticated animals as part of their wild predecessors, so several zoologists put in a formal case at the ICZN to put a stop to it. After allowing and evaluating input from zoologists across the world the ICZN decided to follow tradition and made this tradition mandatory for the animals enumerated in the case.
        It did not rule "that wild relatives of domestic animals should be named as if they were separate species," and it never would since whether or not a group of animals represents a taxon, and if so, if this taxon should be given the rank of species or subspecies is a matter of taxonomy, not of nomenclature. It is perfectly all right to recognise the wild and domesticated goat as subspecies, but the ruling is that these then must be named Capra aegagrus aegagrus and Capra aegagrus hircus. No way that Capra hircus aegagrus can be the correct scientific name of an animal. Not in this universe.
        Some of the authors of the book allowed themselves to be panicked by the oncoming deadline into perpetuating their defeated rebellion. It may be possible to feel sympathy for them, but that does not make them less wrong. The fact that there is a book that has these names wrong means very little. If somebody publishes a book saying that the earth is flat, or that 2 + 3 = 17, this does not make the earth flat, or make 2 + 3 = 17. - Brya (talk) 14:52, 9 October 2016 (UTC)
  • @Brya: These controversies are very exciting. Do you have any other specific examples? Aymatth2 (talk) 15:13, 9 October 2016 (UTC)
There is a presumably large but indefinite number of cases. You have made it pretty clear that it is pointless to list any of them. - Brya (talk) 15:20, 9 October 2016 (UTC)
  • @Succu: @Kaldari: perhaps you could contribute examples. It is important to understand the issues we face with authorizing the import of data. It would not have occurred to me that the Smithsonian's Mammal Species of the World was a controversial or unreliable source. Are there specific examples of other types of problem with other Catalogue of Life sources? Aymatth2 (talk) 16:13, 9 October 2016 (UTC)
I do not use CoL. The point is to be careful when creating new items about taxa. And if possible double check them with a second reliable source. At least this is what I try to do. --Succu (talk) 16:19, 9 October 2016 (UTC)
@Aymatth2: All of those examples are relatively pointless as they are just showing that the CoL has outdated information and information from reliable sources that don't agree with other sources. The problems with the CoL are more substantial than that. Here is a better example. The CoL previously included the species name Modunda narmadaensis, which was imported to Swedish Wikipedia, and subsequently to Wikidata. The name Modunda narmadaensis originates with a self-published website that cites itself as the source of the name (with no other explanation). The name has never been accepted by any peer-reviewed source or the authoritative catalog for the family. It is purely the speculative opinion of one person on the internet who couldn't be bothered to write a paper about it (or lacked the evidence to do so). Same with Modunda pashanensis and numerous other examples. The Catalog of Life is only as good as the feeder databases that it pulls from, and in some cases it has pulled from very low-quality databases that are not reliable. Kaldari (talk) 20:53, 9 October 2016 (UTC)
To the CoL's credit, they have since deleted Modunda narmadaensis entirely, but it still exists on three different Wikipedias. Kaldari (talk) 20:59, 9 October 2016 (UTC)
It's Bianor narmadaensis (Q3150207). --Succu (talk) 21:37, 9 October 2016 (UTC)
  • @Kaldari: I was hoping for examples of problems with the current Catalogue of Life. Past errors are relevant, but errors in the present stable version would be more relevant. So far all that has been identified is the Smithsonian / ICZN difference on domestic animal names, which may be just a problem with publication dates – although that is a type of problem that must be recognized. Given the level of EU / US government funding, the contributors, curators and consumers of the data, one would expect very high quality – certainly higher than most specialist databases on other types of information from which we might want to import data. There is a trade-off between taking data from an aggregator like the Catalogue of Life, perhaps being selective about originators, and going direct to the originators. The aggregator provides a convenient single interface to a bot, with a single agreement for content reuse, and may add value by vetting the originators. On the other hand, they may somehow introduce errors. Do we have specific, current examples of errors in the Catalog of Life that might illustrate other types of problem? Aymatth2 (talk) 01:12, 10 October 2016 (UTC)
@Aymatth2: I don't have any examples of errors in the current Catalog of Life. In fact, I might be OK with importing data from CoL, if two conditions were met:
  1. The bot that imports the data also updates it once per year (or the maintainer provides source code for doing so)
  2. The updates support not only adding data, but also flagging items that may need to be merged or deleted (which should be done with human review)
FWIW, I personally support some of the more conservative taxonomy in the CoL (via Mammal Species of the World) but I know there are widely differing opinions on that. Kaldari (talk) 02:04, 10 October 2016 (UTC)
The problem is not with the taxonomy of Mammal Species of the World; I have no opinion whatsoever on their taxonomy (a matter of science), but with the fact that in some cases they use names that have explicitly been disallowed by the ICZN (the issue is not taxonomical, but nomenclatural, a matter of 'law'). - Brya (talk) 04:23, 10 October 2016 (UTC)
To illustrate, take a comparable case. There is a Google®; suppose there is a small company selling phones that decides to call themselves Google also, arguing that there can be no confusion since they are selling phones, not web-services. Google® takes them to court, and the judge, after hearing the case, rules that the small company may not use this name. CoL is like a phone directory which continues to use the proscribed name. (The difference is, of course, that Google® is a megabuck company that can enforce such rulings, while the ICZN has no direct means to enforce anything) - Brya (talk) 05:49, 10 October 2016 (UTC)
  1. An annual (or more frequent) update is essential for many bots that import data to Wikidata, whether the data is on taxa, galaxies, municipalities or shipwrecks. Even data that should never change will change as the sources make corrections.
  2. Part of the update process would be to flag items for merge or possibly deletion, although I would be inclined to keep dud entries flagged as obsolete rather than delete them altogether.
  3. After updating Wikidata, perhaps from several sources (e.g. Catalog of Life and IUCN redlist), there should be an extract of the data and then updates to the Wikipedias. en:Wikipedia:Village pump (idea lab)#Bot content with updates explores how we could let a Wikipedia article transclude text generated from Wikidata, picking up updates automatically, while also containing content written by editors.
  4. Our role should not be to decide on the "correct" data, but to record what reputable sources have said. Where there is dispute, we should record both versions. Thus we should be able to say that according to Lloyd's the Santa Isabella sank on March 5, while according to the Admiralty she stayed afloat until March 7. A Wikipedia article can report what reliable independent sources say about the difference.
I think something along these lines, and other concepts, need to be formalized as a bot-generated data policy, so we can ensure that bots follow good practice and give bot developers assurance that if they follow the policy their bot will be accepted. Aymatth2 (talk) 12:08, 10 October 2016 (UTC)
Before that we should define rules for users harvesting data from Wikipedias. It remains unclear what "update process" means: all Create, read, update and delete (Q60500) operations? Why should we rely on CoL? We match scientific names against GBIF and EOL, so why bother with CoL? Are you aware of the gender problem? I have been updating IUCN conservation status (P141) for a while, but I never would create a taxon name (P225) based on data provided by IUCN. If I have not made substantial errors, we have a complete mapping of MSW ID (P959) (=Mammal Species of the World (Q1538807)), or more recently English common names (=taxon common name (P1843)) preferred by the IOC World Bird List, Version 6.3 (Q27042747). From time to time I try to close major gaps in our species data. E.g. we had lots of genera of Foraminifera (Q107027) from eswiki, but not a lot of species. With the help of Fossilworks (Q796451) and World Register of Marine Species (Q604063) I changed this, but should we inform the Wikipedias about what they missed (adding redlinks to their genus articles)? We should build a knowledge base of our own, not copy data from aggregators. Options for taxa are Data paper (Q17009938) (examples) or exploiting papers in the TaxPub format. By the way: this would give us more of the references we lack. As would the use of ZooBank's nomenclatural acts (=ZooBank nomenclatural act (P1746)). --Succu (talk) 19:49, 10 October 2016 (UTC)
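As an aside, the kind of cross-checking against a second source that Succu describes can also be done in bulk via the query service. A minimal SPARQL sketch, assuming taxon name (P225), the GBIF taxon ID property (P846) and the Encyclopedia of Life ID property (P830), listing taxa matched to GBIF that still lack an EOL identifier:

    SELECT ?taxon ?name WHERE {
      ?taxon wdt:P225 ?name ;                         # taxon name
             wdt:P846 ?gbifId .                       # has a GBIF taxon ID
      FILTER NOT EXISTS { ?taxon wdt:P830 ?eolId . }  # but no Encyclopedia of Life ID yet
    }
    LIMIT 100

Such a worklist is only a starting point; as noted above, each match still needs checking against a reliable source.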

──────────────────────────────────────────────────────────────────────────────────────────────────── @Succu: I will try to respond to some of your points:

  1. I assume there are already guidelines for importing data to Wikidata from Wikipedias. In the reverse direction, the Wikipedias will develop their own guidelines for importing data from Wikidata. They may decide to generate and later update stub articles that show what Wikidata says, which their editors can then supplement with descriptive text drawing on other sources.
  2. We should not rely on the Catalog of Life, or any other source. But it is a large, reputable database, and it will be useful to record the data that it gives – along with data from other sources. We should clearly identify that this is the Catalog of Life view, by using properties like "CoL synonym", "CoL distribution" or "CoL source". These values may or may not be the same as the equivalent values from other sources. If they are different, that is interesting.
  3. By "update process" I meant a process to synchronize Wikidata with a data source, so that Wikidata accurately reflects what the source now says. That could involve refreshing the values of properties like "CoL distribution", or nulling those values. A Wikidata item where all the properties have been nulled may perhaps be flagged for manual attention as "no longer used".
  4. If the IUCN gave a taxon name that was not found in other sources we could create an entry for it, showing that the name is used only by the IUCN. If possible the entry would point to a "correct" form. That would be useful for users who find the taxon name in IUCN and look it up in Wikidata. Assuming the IUCN later changed the taxon name, the update process would null the IUCN properties, but might leave the entry with its pointer to the "correct" form as a convenience to our readers.
  5. Almost all reference databases or books are aggregations of entries created by many individuals over a period of time. A Catalog of Life entry refers to an entry in the World Porifera database that refers to a 1932 Report on the shallow-water marine sponges in the collections of the Indian Museum, which draws on a description published in an 1885 scientific journal. At great effort we could go back to the 1885 publication, but would we be confident that it was up to date? The aggregator adds value by selecting and vetting sources. They will make mistakes, but should correct them when they are found. The 1885 journal will not correct its mistakes.

Does the above make any sense? The basic concept is that our role should not be to decide on the "correct" data, or to build a knowledge base of our own, but to record what reputable sources have said. We must accept that the scientific community will place more importance on correcting errors in the Catalogue of Life's feeder databases than on correcting errors in Wikidata, and establish a mechanism so we automatically pick up those corrections. Aymatth2 (talk) 13:31, 11 October 2016 (UTC)

Catalogue of Life focuses on being inclusive and might include some data we don't want. The views of an 1885 journal are notable in a way that views from a UGC website aren't. You also claim that Catalogue of Life is reputable without linking to any expert in the field making such a statement and speaking about its data quality. ChristianKl (talk) 13:26, 14 October 2016 (UTC)
We disagree. I wrote „we should build a knowledge base of our own“ and showed some ways this could be achieved. There is no need for another CoL clone. -Succu (talk) 16:03, 11 October 2016 (UTC)
  • Importing and maintaining views of data from the Catalogue of Life, IUCN, BirdLife International, etc. does not prevent us from independently building and maintaining a knowledge base where we have the resources. Importing data adds layers of information and differing viewpoints to the knowledge base. It is not a competition. We can do both. Aymatth2 (talk) 23:10, 13 October 2016 (UTC)
I don't think anybody here spoke against importing IUCN data. I don't see why you still treat it as being in the same category as the Catalogue of Life. It makes me feel like you aren't trying to understand the views of other people but are just trying to convince people to accept Catalogue of Life or GeoNames data.
Apart from data quality issues, the legality of importing the Catalogue of Life is also questionable. It's an EU database project (which means it has sui generis database rights in Europe) and its own website says that it requires not only attribution but also noncommercial usage. ChristianKl (talk) 13:26, 14 October 2016 (UTC)
  • Data is not subject to copyright protection. The statement that "cockspur is a common name for Castela erecta" cannot be copyright protected. The API to embed a Catalogue of Life entry in a webpage may only be used for noncommercial purposes, or by permission. Regardless of our legal rights, we would certainly obtain permission before extracting the data. Aymatth2 (talk) 14:34, 19 October 2016 (UTC)
About what „layers of information” are you talking, Aymatth2? What kind of knowledge provided by CoL would help us to be more trustworthy? --Succu (talk) 21:50, 14 October 2016 (UTC)
  • The Catalogue of Life holds basic information on more than 1.6 million species, sourced from reputable data providers like the Smithsonian, Kew, etc. and naming the sources. It gives the accepted scientific name, synonyms, infraspecific taxon, common names, classification (genus, family etc.) and distribution. Other sources may disagree with the view given by the Catalogue of Life data provider, as with the Smithsonian vs. the ICZN on domestic animals. Showing the differing views of reputable sources would give us more credibility than suppressing any views that we disagree with. Aymatth2 (talk) 14:34, 19 October 2016 (UTC)
It's a competition about who provides the best set of Linked Open Data (Q18692990), Aymatth2: e.g. providing structured data links to taxon authorities like Carl Linnaeus (Q1043), or recording, for example, that Canna (Q161182) was classified in the first edition of his Species Plantarum (Q849308). CoL gives this „information“ about the genus. --Succu (talk) 21:13, 23 October 2016 (UTC)
  • @Succu: I think I am missing the point here. What Carl Linnaeus said is interesting, of course, but new information may have emerged in the last 250 years. With Canna indica the Catalog of Life is copying from the Kew entry. Is that inaccurate? Do you feel that all "information" from the World Checklist of Selected Plant Families (Q8035497) should be purged, suppressed and disallowed from Wikidata? Aymatth2 (talk) 21:48, 23 October 2016 (UTC)
CoL is a tertiary source, Aymatth2. World Checklist of Selected Plant Families (Q8035497), as a curated secondary source, is much better. A link to a primary source sometimes needs clarification (e.g. gender, spelling). The point is Linked Open Data (Q18692990). --Succu (talk) 22:14, 27 October 2016 (UTC)
Canna indica, aka C. achiras, altensteinii, amabilis, ascendens, atronigricans etc.
  • @Succu: Perhaps you are missing the point. The Catalog of Life simply mirrors selected sites, providing a convenient standard interface. The Catalog of Life is not a tertiary source: it is not a source at all. It is a standardized interface to a number of sources. Why would we not take advantage of that standardized interface? Aymatth2 (talk) 00:27, 28 October 2016 (UTC)
The point is that we have a very restricted view on what is "right". Many scientifically correct names are excluded even though there is a publication to back them up. We include only what some consider current, and many old, well-known names are lost in this way. The problem with CoL is that it is not a source that fits in with this narrow vision. Given that what we include as a taxon is wrong by definition (a taxon always includes an author and a publication), the ongoing argument is not about science but about assumptions that are not part of what a taxon is. Given that there are many autonyms that are not the same, this proves that the current approach is wrong more than CoL is. Thanks, GerardM (talk) 09:15, 24 October 2016 (UTC)
We don't. BTW: an Autonym (Q1837887) is not a homonym (Q902085), and we have only minor problems with them. An exact page reference is essential in modern biological nomenclature (Q522190). taxonomy (Q8269924) is a matter of subjective judgement. One of our ongoing tasks is to model the relationship between different taxonomic opinions. CoL provides its own taxonomic opinion. Why should we restrict ourselves to this opinion, GerardM? --Succu (talk) 22:14, 27 October 2016 (UTC)
  • @Succu: We should certainly not restrict ourselves to the opinions provided by the Catalog of Life sources. If the entry for Canna amabilis (Q5032557) says Canna amabilis is a species that may be a legitimate opinion, although citations would be helpful. Others would say it is a synonym of Canna indica (Q163559). Wikidata should be able to handle these divergent opinions. Aymatth2 (talk) 00:27, 28 October 2016 (UTC)

Bot generated data (break4) @Lsj[edit]

Would you mind commenting, User:Lsj? --Succu (talk) 21:37, 9 October 2016 (UTC)

Bot generated data (break5)[edit]

The deletion log at svwiki today shows a set of deleted bot-generated articles (university articles describing only the geography around a building). The opposition against the GeoNames-based articles on svwiki is increasing. -- Innocent bystander (talk) 17:26, 15 October 2016 (UTC)

Sounds like good news to me. --Succu (talk) 22:36, 15 October 2016 (UTC)

value sorting according to qualifier[edit]

Is there a way to sort the values of a property according to their qualifiers? I am interested in the browser view of item data.

To be specific - ascending/descending sort of Elo ratings according to their date qualifier, not the date added (see Vereslav Eingorn (Q2062580) for a chess player whose Elo ratings are unsorted). Thanks. --Wesalius (talk) 07:28, 18 October 2016 (UTC)

No, currently it's not possible. There should be a phab ticket about it, which I can't find at the moment. --Edgars2007 (talk) 16:25, 18 October 2016 (UTC)
It might become possible - see User_talk:Seb35#sortValues_modification. --Wesalius (talk) 04:41, 25 October 2016 (UTC)
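A side note for readers of this thread: although the item view cannot reorder statements, the Wikidata Query Service can sort statement values by their qualifiers. A minimal sketch for the example above, using the Elo rating property (P1087) and its point in time (P585) qualifier:

    SELECT ?rating ?date WHERE {
      wd:Q2062580 p:P1087 ?statement .   # Elo rating statements on the item
      ?statement ps:P1087 ?rating ;
                 pq:P585 ?date .         # the date qualifier
    }
    ORDER BY DESC(?date)

This does not change the browser view, but it gives a sorted listing on demand.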

Upload a dataset to wikidata[edit]

Flemish art collections, Wikidata and Linked Open Data - Manual 20160331.pdf

Hello, I am a beginner at Wikidata editing, and maybe my question about uploading data (as opposed to inserting "items" one by one) is answered somewhere, but I could not find it.

I currently have a dataset of around 400 Latvian rock bands, with their members' names and each member's role in the band (the musical instrument they play), which I have collected manually. I would like to upload that dataset to Wikidata so that an embedded network graph could be made through a Wikidata query, and users could add to and contribute to that graph through Wikidata.

Is there a way by using my dataset to:

  1. upload a list of band names as "items", being "instance of: band" with "country of origin: Latvia"
  2. upload a list of musician names as "items", with "instance of: human" and, for example, "instance of: bassist", "instance of: female"
  3. upload a list of band-musician pairs, creating for each of the bands "has part: (the musician name)"

Thank you if someone has the time to answer.  – The preceding unsigned comment was added by LinardsLinardsLinards (talk • contribs).

  • QuickStatements might help you. The pdf on the right side has an introduction.
    --- Jura 22:46, 19 October 2016 (UTC)
  • Big thanks, Jura, for the pdf. As I see it, to do this I should request a volunteer to create a bot. Am I correct?
    --- LinardsLinardsLinards 01:58, 20 October 2016 (UTC)
Apart from what Jura already said, please take care to add your source when you add the data set. ChristianKl (talk) 20:03, 20 October 2016 (UTC)
Hi @LinardsLinardsLinards: — you might be interested in checking out the Wikidata:WikiProject_Music documentation of useful properties. For instance, "has part" is not always the best answer for bands; it's usually preferable to put "member of: (band name)" on the person, instead of the other way around. Sweet kate (talk) 23:26, 26 October 2016 (UTC)
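Once such a dataset is in place, the network graph mentioned above could be drawn directly from a query. A sketch along the lines of Sweet kate's modelling suggestion; it assumes musical group (Q215380), Latvia (Q211), country of origin (P495) and member of (P463), so adjust the items and properties to whatever the upload actually uses:

    #defaultView:Graph
    SELECT ?band ?bandLabel ?member ?memberLabel WHERE {
      ?band wdt:P31 wd:Q215380 ;    # instance of: musical group
            wdt:P495 wd:Q211 .      # country of origin: Latvia
      ?member wdt:P463 ?band .      # member of: the band
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en,lv" . }
    }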

New wikiproject Chess[edit]

In case you would like to participate - WikiProject Chess has been created. --Wesalius (talk) 09:44, 21 October 2016 (UTC)


Hello, I've seen a lot of items about paintings without articles. Now I'm doing complete work about one Russian painter on Wikimedia (articles on Wikipedia, uploading texts to Wikisource and paintings and drawings to Commons). For each painting I create its own item and link it from the Wikipedia list article about the artist's paintings and from the artwork template on Commons. And now I've just read rules saying that such items can't be on Wikidata, so all these items will be deleted and I should stop creating them? Look for example here: Writing Desk (Q27493192), c:File:Writing_Desk_(Rozanova,_1914).jpg, w:ru:Участник:Stolbovsky/Список работ Ольги Розановой. --Stolbovsky (talk) 13:03, 21 October 2016 (UTC)

@Stolbovsky: I would think any artwork that is for example listed in a catalog or otherwise known to the world should qualify as notable according to WD:N criterion 2 - "It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references." What "rules" were you reading that suggested otherwise? ArthurPSmith (talk) 13:09, 21 October 2016 (UTC)
(EC) Which rules, @Stolbovsky:? Wikidata:Notability says in its point 2 that an item is notable if It refers to an instance of a clearly identifiable conceptual or material entity. The entity must be notable, in the sense that it can be described using serious and publicly available references.. By this, I'd consider the artworks of a notable artist to be notable & hence would welcome such items. --Tagishsimon (talk) 13:12, 21 October 2016 (UTC)
Sorry guys, someone on ruwiki told me the wrong thing, and then I misread the phrase if it meets at least one of the criteria below as if it meets ALL the criteria below. For a minute I was quite disappointed. --Stolbovsky (talk) 13:18, 21 October 2016 (UTC)
You may also want to check Wikidata:WikiProject sum of all paintings--Ymblanter (talk) 13:42, 21 October 2016 (UTC)

Petscan Widar advice, please[edit]

I have a couple of Petscan reports. The first, #562208, looks for wikidata items with no corresponding en.wikipedia article, and offers me the opportunity to use (as I understand it) Widar to populate wikidata - e.g. the first three edits on "Phi Un-hui"

The second #565303 finds 13 philatelists with wikidata items. Petscan does not seem to be offering a Widar option.

1. Can I / how can I use Petscan to add philatelist (Q1475726) to the Wikidata records for these 13 people as an occupation (P106) value (beyond the syntax, if Widar were available, of P106:Q1475726)?

2. Supposing a subset of the 13 already have philatelist (Q1475726) as an occupation (P106) value... if I have not included a check for this in Petscan, and there is a Widar addition method, would the method cause a second instance of philatelist (Q1475726) to be appended as an occupation (P106) value in the Wikidata record?

thanks --Tagishsimon (talk) 14:46, 21 October 2016 (UTC)

1) Select "Wikidata" as "Use wiki" in "Other sources" tab.
2) It shouldn't. Anyway, it's not a big issue; there is a bot that cleans up duplicate statements. --Edgars2007 (talk) 15:35, 21 October 2016 (UTC)
Thanks Edgars2007, that worked, appreciated. --Tagishsimon (talk) 15:42, 21 October 2016 (UTC)
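As a side note, a query can show beforehand which items already carry the claim, which answers question 2 independently of Widar's behaviour. A minimal sketch using occupation (P106) and philatelist (Q1475726):

    SELECT ?person ?personLabel WHERE {
      ?person wdt:P106 wd:Q1475726 .   # occupation: philatelist
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }

Intersecting this list with the 13 Petscan results by hand shows which records would receive a duplicate value.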

universe (Q1)[edit]

universe (Q1) seems to have a lot of strange properties related to the number 1. It seems wrong to me; does that make sense to anybody? --Jarekt (talk) 02:49, 22 October 2016 (UTC)

It was a clear case of vandalism and it has been reverted by Mahir256 (talkcontribslogs). Mbch331 (talk) 05:57, 22 October 2016 (UTC)
Thanks --Jarekt (talk) 13:39, 25 October 2016 (UTC)

statistics on items and their statements[edit]

Hoi, I blogged about the number of items with no statements and items with more than 10 statements. The news is good :). Thank you all, GerardM (talk) 06:12, 22 October 2016 (UTC)

@GerardM: Can you add headers to the data? It isn't clear to me what the numbers mean. --ValterVB (talk) 08:08, 22 October 2016 (UTC)
yep, very good news, but please add headers... nothing says which column is which :) --Hsarrazin (talk) 10:25, 22 October 2016 (UTC)
It is a screen shot. Thanks, GerardM (talk) 13:13, 22 October 2016 (UTC)
So the image is useless, can you explain it here or in the post? --ValterVB (talk) 14:56, 22 October 2016 (UTC)
No it is not, it shows all the pertinent data. The article includes a link to the statistics and you could find the data there. So do your best and do not whine. Thanks, GerardM (talk) 21:46, 22 October 2016 (UTC)
Too kind. --ValterVB (talk) 06:37, 23 October 2016 (UTC)
  • The same data can be seen in the daily statistics. I don't agree with the interpretation given. While the number of items with zero statements is getting lower, I don't see why it can't be closer to zero. The picture looks even better on a per-site or per-project level: Wikidata:Database_reports/without_claims_by_site.
    Numbers look bad for some Wikisource sites, and they have an impact on the overall numbers. However, many items for Wikisource pages are unlikely to get a lot of statements, maybe 2 or 3. Many of these could easily be filled by bot, so it might not matter that much if they don't have any statements.
    Even for items with sitelinks to Wikipedia, many won't get more than 1 statement (categories, disambiguations, templates). So I don't quite see the point of comparing items with 0 statements to the ones with many statements.
    --- Jura 17:21, 23 October 2016 (UTC)
    --- Jura 17:21, 23 October 2016 (UTC)
  • <grin> I am sure this is not the same data </grin>. You provide a graph that shows only some of the data, and not from the beginning of Wikidata. Your interpretation is wrong. The point is that the trend is such that, for the first time, there are more items with loads of statements than items with no statements at all. That is it. Thanks, GerardM (talk) 09:37, 25 October 2016 (UTC)

QuickStatements not supporting area (P2046)?[edit]

I cannot get QuickStatements to add area (P2046) with decimal numbers as well as the unit. Is there a format to use? This format produces an error:

Q131870 P2046 +289.20

The area for item Q131870, for example, does not save as 289.20, whether +289.20 or "289.20" is used. Adding the unit "square kilometre" beside it also produces an error.

Sanglahi86 (talk) 10:45, 22 October 2016 (UTC)

@Sanglahi86: currently it's not possible to add this information via QuickStatements. --Edgars2007 (talk) 13:20, 22 October 2016 (UTC)
Thank you for the info. Is there an alternative tool that could be used to add this information in several items in batch? Sanglahi86 (talk) 13:46, 22 October 2016 (UTC)
The easiest is to ask it at WD:BOTREQ. --Edgars2007 (talk) 13:55, 22 October 2016 (UTC)

Force-directed graph template announcement[edit]

Based on Wikidata Graph Builder ideas and the Graph extension, I've constructed a new template {{Force-directed graph}} for building graphs using SPARQL queries. Queries should be compatible with #defaultView:Graph queries in WDQS and the Wikidata Graph Builder. I've also constructed two helper templates for the most common scenarios (building taxon trees, family trees, subclass/superclass trees, administrative unit trees, etc.).

{{Forward graph|Q515|height=600|width=600}} {{Reverse graph|Q515|height=600|depth=2|width=600}}

--Lockal (talk) 17:15, 22 October 2016 (UTC)
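For readers unfamiliar with the underlying queries: the helper templates wrap #defaultView:Graph queries of roughly the following shape. This is a hand-written sketch of a reverse (subclass) graph for city (Q515), not the template's exact internals:

    #defaultView:Graph
    SELECT ?item ?itemLabel ?linkTo ?linkToLabel WHERE {
      ?linkTo wdt:P279* wd:Q515 .   # anything in the subclass tree under city
      ?item wdt:P279 ?linkTo .      # one edge per direct subclass-of link
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }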

Ideally two things need to be changed in this template:
  1. A click handler for the text (opening the relevant items is possible by double-clicking the nodes, which is not intuitive)
  2. Handling the static image renderer, either by setting "interactive": false just for Graphoid, or at least by overlaying the graph with a white rectangle saying "Enable interactivity to see the graph". Probably not possible with the current codebase.
--Lockal (talk) 17:15, 22 October 2016 (UTC)
Nice. One issue: the language can't be changed. --AVRS (talk) 08:44, 23 October 2016 (UTC)
Added user language to query, now graphs are localized. --Lockal (talk) 09:54, 23 October 2016 (UTC)
Thanks. Could it use longer lines when given more space? The width and height seem to only affect the field size and where the center is. --AVRS (talk) 10:23, 23 October 2016 (UTC)
Hi, pretty cool. One thing, though: it's fun to watch a graph stabilize, and I could do this a lot, but in a real-life application we don't really want to see this. Would it be possible to compute a not-too-bad layout "off the record" and only show it then (except if explicitly asked for, of course)? Interactivity is of course still very useful later on, to try something else. author  TomT0m / talk page 18:45, 23 October 2016 (UTC)
Good point. This could also lead to similar-looking graphs for all users at all times. At the moment it equilibrates to a completely different arrangement each time I load the data, which makes it a little complicated to get used to the graph structure and difficult to discuss with other users. (Apart from that: it’s really cool to have this functionality available onwiki!) —MisterSynergy (talk) 20:12, 23 October 2016 (UTC)
I've added a mode parameter for rendering static images (nodes are not clickable), semi-interactive (the default mode: nodes are double-clickable and draggable without animation) and interactive (nodes are double-clickable and draggable with animation). However, node positions converge to different values on each recalculation. The feature request in Vega was closed, because d3 does not support this, but there are ideas on Stack Overflow. I'll experiment with them. --Lockal (talk) 21:41, 23 October 2016 (UTC)

Merge or not[edit]

Should Electra (Q1325803) and Elektra (Q217340) be merged or not? I can't think of any solid reasons to support either point of view. --EncycloPetey (talk) 21:59, 22 October 2016 (UTC)

Of course not. It's not even possible to merge those pages. --Stryn (talk) 22:03, 22 October 2016 (UTC)
But WHY not? I'm looking for a reason to merge or not, not simply the fact that the two pages are a mess to be cleaned up. Consider: Why are these items separate? What is the difference between them? I can come up with no good answer to that question. --EncycloPetey (talk) 23:29, 22 October 2016 (UTC)
And why yes? Disambiguation items on WD are word-based, not meaning-based. So there is no mess in having two items in this case; the only slightly tricky question is the item for non-Latin-based scripts. --Jklamo (talk) 00:06, 23 October 2016 (UTC)
It's not just a question of non-Latin. Where would French Électre go? Czech Élektra? Are disambiguation pages to share a common data item only if they have the same written form? And what if the disambiguation pages themselves include forms in addition to the one used for the title? The English Wikipedia disambiguation page includes forms spelled both Electra and Elektra.--EncycloPetey (talk) 00:34, 23 October 2016 (UTC)
If you want interwiki links between Electra, Elektra, Électre and Élektra, you can do that with the help of templates on Wikipedia like Interwiki extra (Q21286810). The main purpose of Wikidata is not to provide non-stringent connections. -- Innocent bystander (talk) 08:36, 23 October 2016 (UTC)
But it isn't just Wikipedia. There is also the English Wikisource and the Polish Wikisource, and (potentially) every Wikisource project too. Are you suggesting that, for these two data items, we have to "solve" the problem by replicating that template to every Wikipedia and Wikisource in every language and populating them with values on each and every MW project? Instead of simply working out the problem here with two data items?
And this still doesn't answer my original question: Why is this set up with these particular two data items? Why not just one? Why not more? I want to know what Wikidata is trying to do with these disambiguation data items, not what it's not doing. Under what conditions are two disambiguation pages added to the same data item? Because I don't see any rationale, much less a criterion, being given by anyone yet. The implication is that it's by precise spelling of the page title, but that isn't what's going on here, since we have more than two page titles. And further, the content of the pages does not match the page titles: some of the pages with one form of page title include information that would be expected for the other page title. But more often, they have a mix, because the content was sorted in the local languages by pronunciation and near-spelling, not by spelling or written form. So to impose an external assumption that the pages are strictly about word form would be to miss the point entirely. --EncycloPetey (talk) 11:30, 23 October 2016 (UTC)
For disambiguation items, Electra is different from Elektra; in fact de.wiki has two items for this. You can read more about disambiguation in our WikiProject. --ValterVB (talk) 12:03, 23 October 2016 (UTC)
(edit conflict) Yes, de has two disambiguation pages, one for Electra and one for Elektra, but the English Wikipedia has a single page that covers both spellings. Your project doesn't address this issue or the other issues I raised. For example, the project allows for transcription, but doesn't solve the question of whether אלקטרה should go with Electra or with Elektra. Either transcription is possible depending upon the target language. So the only reason Polish Elektra is not placed with English Electra is that in Polish the letter "c" doesn't make the right sound. And in any event, doesn't this promote western European bias, since Electra and Elektra are both transcriptions of the same Greek word Ηλέκτρα? --EncycloPetey (talk) 13:07, 23 October 2016 (UTC)
Bonnie and Clyde strike back. A problem for WD:XLINK. author  TomT0m / talk page 18:46, 23 October 2016 (UTC)
You also have to be aware that Wikisource has (at least) two kinds of disambiguation pages. If there is a disambiguation page devoted only to "Electra, a play by Euripides" (different versions of that text), it should be linked to the item about that work, not to any other disambiguation page. The exception is if more than one Wikisource project has such a disambiguation page. The Bible, for example, can be found in more than one version in many WS projects. -- Innocent bystander (talk) 13:03, 23 October 2016 (UTC)
I am aware of this difference, but the English Wikisource disambiguation page I am talking about is for different works with the title "Electra", not for the play by Euripides. There is a play by that title by Euripides, one by Sophocles, as well as many articles with that title, and so we have a disambiguation page. You assumed incorrectly what I meant without looking. --EncycloPetey (talk) 13:07, 23 October 2016 (UTC)
@EncycloPetey: It would seem that the rule is simply "that they exist in that form". WD lists them as they are notable at the respective wikis, and that is enough; it does not need to discriminate against them in that regard. Similarly, the Wikisources have their varieties of disambiguation pages for standard disambiguation, version disambiguation, and translation disambiguation; each of those would appear here (so three disambiguation pages) even though at WP they may be just the one encyclopaedic page, or one disambiguation page. For enWS we would link to the corresponding spelling here. Note that for a versions page, I would normally link that to the WD item about the work, as they do align. It is still messy with the WSes with their editions, and the WPs with the works.  — billinghurst sDrewth 22:45, 23 October 2016 (UTC)
@billinghurst: We're going off on a tangent here. This does not address the original question: What is the rationale behind the way the links at Electra (Q1325803) and Elektra (Q217340) are divided? No one has yet offered an answer to that question. --EncycloPetey (talk) 23:04, 23 October 2016 (UTC)
That has been addressed IMO. Different spelling. If enWP, or anyone creates two disambig pages with alternate spellings, each needs a home, and that can only be at the alternate spellings.  — billinghurst sDrewth 23:11, 23 October 2016 (UTC)
@billinghurst: If the criterion is precise spelling of the page name (and not the contents), then why are there links with non-Latin page titles included? --EncycloPetey (talk) 23:18, 23 October 2016 (UTC)
<shrug> I can explain why there can be the need for two disambiguation pages for similar terms, explaining linking and how it occurs is an art rather than a science. Best guess is all I can say for some of the choices. It is still better than how some of the WPs choose the home person and then disambiguate <duck, run, dink>  — billinghurst sDrewth 05:52, 24 October 2016 (UTC)


I am trying to link a page from the English Wikiversity in Greek and I can't. Please fix the problem. --Πανεπιστήμιο (talk) 08:08, 23 October 2016 (UTC)

@Πανεπιστήμιο: Please be more descriptive: which pages were you trying to connect, and what error message did you get? --Edgars2007 (talk) 08:21, 23 October 2016 (UTC)

Hi @Edgars2007:! I tried to link the page Greek language to Τμήμα:Ελληνικά. I get this message: «Error: $1 You have attempted to add the name of a language as the label/description/alias of an item. Please see Help:Label, Help:Description or Help:Aliases for information on proper item descriptions. Press save again to save your edit.» --Πανεπιστήμιο (talk) 08:31, 23 October 2016 (UTC)

What's so hard about "Press save again to save your edit"? Sjoerd de Bruin (talk) 10:39, 23 October 2016 (UTC)
Sounds like you got caught by an abuse filter or something as you tried to add "Ελληνικά". OK, I was correct; see the log. Anyway, it has now been created by some user at (Q27514826). --Stryn (talk) 12:55, 23 October 2016 (UTC)
Hm, why a new item? I've merged them. --Infovarius (talk) 21:03, 27 October 2016 (UTC)


We are using, for example, < Apollon Limassol (Q2858459) > instance of (P31) < association football club (Q476028) >. Do you think it is better to have an item or a qualifier to show that a club is a women's football team? Xaris333 (talk) 18:27, 23 October 2016 (UTC)

There is this old “a club is not a team” problem. Do we have a solution for that? Within a club you can have men’s and women’s teams, so association football club (Q476028) is okay. But if the item is about a team (and the connected articles also describe the team), then it is a good idea to use separate items for P31. Unfortunately, the articles typically describe the club (by the title), but the majority of the content deals with the (men’s) association football team. —MisterSynergy (talk) 20:05, 23 October 2016 (UTC)
Hmm, I wanted to get a solution for mentioning gender in items about sports teams, disciplines, etc. I have no problem with adding "P31: women's football team", but on the item about "women's football team" I should mention female (Q6581072) somehow, and sex or gender (P21) is not appropriate.
< women's football team > subclass of (P279) < association football club (Q476028) >, with qualifier of (P642) < female (Q6581072) >?
P642 as a qualifier, of course... And what to do here - Athletics at the 2016 Summer Olympics – Women's heptathlon (Q26234145)? --Edgars2007 (talk) 08:15, 24 October 2016 (UTC)
We have competition class (P2094), which I find difficult to use correctly. The best approach would be to define a structural item for the competition class “open women’s association football” with instance of (P31) competition class (Q22936940), and to use it with competition class (P2094) on the item which qualifies for this class.
The problem would then shift to a proper definition of these competition class items. If we consider female (Q6581072) an abstract competition class, one could also use competition class (P2094) within the competition class item itself, maybe with a qualifier criterion used (P1013) gender (Q48277). Additionally, it could have competition class (P2094) open (Q2735683) with qualifier criterion used (P1013) age (Q185836), and competition class (P2094) association football (Q2736) with qualifier criterion used (P1013) sport (Q349). Properly set up, we’d have a couple of competition classes for each type of sport, and a structure which could be queried really nicely. —MisterSynergy (talk) 09:06, 24 October 2016 (UTC)
Sounds very nice. Thanks! --Edgars2007 (talk) 09:30, 24 October 2016 (UTC)
It seems to me that we would do better with something whose subject is the competition class itself, as the property association football (Q2736) (from the example) seems to link a participant class to a competition class - which would force us to create something like a "junior male football player" class to link to a "junior association football male competition". It may be smarter to create a property "admissible participant type" with statements such as
< junior association football male competition > admissible participant type < men >
< junior association football male competition > admissible participant type < licenced association football player >
< junior association football male competition > admissible participant type < junior aged >
for example - but this is only a minor suggestion. Maybe a little smarter would be to use the same statement for criteria that go together, and separate statements if several classes of players are allowed - for example, for a competition in which licenced dart-player children and their (not dart-player) grandparents are allowed, we could use
  • < subject > admissible participant type < values in qualifier >
    of (P642) < child >
    of (P642) < licenced dart player >
  • < subject > admissible participant type < values in qualifier >
    of (P642) < grandfather >
where it is understood that anyone who meets the criteria of at least one of the statements can participate. (Unsigned contribution by TomT0m 19:17, 24 October 2016‎)
Whew, what a comment! Frankly, I don’t understand it. Three questions:
  • Why would we be forced to have occupations such as "junior male football player"? competition class (P2094) as outlined could be used on event items and on team items, not on player items.
  • Which items should get this "admissible participant type" property?
  • Do you have any events in mind which have complicated admission requirements as in your child/grandparents example? We typically deal with open age or junior age class events on a very high level here…
I have already tried to set up a structure as proposed above a while ago in the field of rowing, and I think it works nicely. Unfortunately I did not come to the point of intense usage of the structure (basic work on person items is more important at this point), but details are outlined here: Wikidata:WikiProject Rowing#Competition classes. —MisterSynergy (talk) 18:38, 24 October 2016 (UTC)
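To illustrate the point about queryability: assuming class items are set up as outlined above, with competition class (P2094) female (Q6581072) qualified by criterion used (P1013) gender (Q48277), a sketch of a query for everything assigned to a women's class could read:

    SELECT ?what ?whatLabel WHERE {
      ?what wdt:P2094 ?class .      # the item's competition class
      ?class p:P2094 ?st .
      ?st ps:P2094 wd:Q6581072 ;    # the class is defined over: female
          pq:P1013 wd:Q48277 .      # criterion used: gender
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }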

Negative reference[edit]

Here, I have stated that this entity is no longer a vacation area (Q10499251) as of 2005-12-30, since it is missing in the report by Statistics Sweden describing that year. To be clear, that reference does not explicitly say that Slottshagen has lost its status; it is the fact that it isn't mentioned in that report of Statistics Sweden that makes me come to that conclusion. Could that be described in a better way? This interactive map confirms my conclusion, but that page is not very helpful if you do not know how to use it and know exactly what you are looking for. -- Innocent bystander (talk) 08:48, 24 October 2016 (UTC)

Nominated for X[edit]

Is there any way to model that a person has been nominated for a prize? Concrete use case: for the Nobel prize one can get historical nomination data until 1963 (there's an enforced gap of 50 years). I think it could be interesting to have this in Wikidata as well, to answer questions like "how many times has X been nominated before winning?" or "find person who has been nominated many times without actually winning". – Jberkel (talk) 09:00, 24 October 2016 (UTC)

Use nominated for (P1411). See example at Meryl Streep (Q873). --Edgars2007 (talk) 09:03, 24 October 2016 (UTC)
Thanks. I just ran a query and got only 17 results for "X nominated for Nobel Prize of Literature"; I'll see if I can complete this with the official data from the Nobel archive. – Jberkel (talk) 17:19, 24 October 2016 (UTC)
I've added the nominations for literature from 1901 to 1965. To answer my own question, query: many times unsuccessfully nominated for literature Nobel prize. The winner is Ramón Menéndez Pidal (Q381953) with 23 nominations. – Jberkel (talk) 09:44, 26 October 2016 (UTC)
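The saved query linked above is roughly of the following shape (a sketch; it assumes Q37922 is the item for the Nobel Prize in Literature, and the actual query may differ in details):

    SELECT ?person ?personLabel (COUNT(?st) AS ?nominations) WHERE {
      ?person p:P1411 ?st .
      ?st ps:P1411 wd:Q37922 .                            # nominated for: Nobel Prize in Literature
      FILTER NOT EXISTS { ?person wdt:P166 wd:Q37922 . }  # but never received it
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    GROUP BY ?person ?personLabel
    ORDER BY DESC(?nominations)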

What to do with plurals?[edit]

I've found many items with (Portuguese) aliases which are just the plural form of the label, probably created by a bot based on existing redirects. Shouldn't these be removed? Helder 10:12, 24 October 2016 (UTC)

This should be specified in Help:Label and Help:Aliases. If it is not, we should define that labels and aliases have to be in singular form. Snipre (talk) 14:36, 24 October 2016 (UTC)
The plural form is usually OK for "list of something..." items. --Infovarius (talk) 21:06, 27 October 2016 (UTC)
  • I don't see why they would be problematic. It would be interesting to see if there are cases where plural and singular get different items.
    --- Jura 09:01, 28 October 2016 (UTC)

Wikidata weekly summary #232[edit]

+- workaround[edit]

See this. If I add "156±30", the result is "160±30", but in the editing form I see what I actually entered. I know why it's happening, but is there some workaround to get the right values in the GUI? Other than changing the uncertainty part, of course. --Edgars2007 (talk) 10:37, 24 October 2016 (UTC)

I don't know of a way to make it stop doing that. It probably won't be possible until phab:T95425 is fixed. - Nikki (talk) 11:42, 24 October 2016 (UTC)
Nope, despite multiple threads here and on the dev's noticeboard and several related phab tickets going back months at least, there is currently no fix or workaround. @Lydia Pintscher (WMDE): what is the current status of work on this? Thryduulf (talk) 12:11, 24 October 2016 (UTC)
Ok. --Edgars2007 (talk) 15:49, 24 October 2016 (UTC)
A patch is prepared. I need to do the announcement writeup with some explanations and we need to deploy it. Hope to get to it after my vacation. I'll be back on the 3rd. --Lydia Pintscher (WMDE) (talk) 14:26, 25 October 2016 (UTC)

Notability of chess players[edit]

In my opinion, untitled players are not notable for Wikidata; only Grandmasters, International Masters and FIDE Masters (and the analogous women's titles) are. Players without titles are mainly hobby players, and with ratings going down to 1200 Elo, every young adult could have such a rating and hence a FIDE rating card; there is nothing special about it. We would flood Wikidata with thousands of completely irrelevant people. Other opinions? Steak (talk) 14:25, 24 October 2016 (UTC)

No one (including me) is aiming to flood Wikidata with thousands of completely irrelevant people. I created items for players with an Elo above 2300. That is all; I am not creating items for players below that Elo (as you are implying). --Wesalius (talk) 14:29, 24 October 2016 (UTC)
Ok, maybe this was a misunderstanding. However, we should clarify this to avoid flooding by someone else :) 2300 seems arbitrary; I would still suggest limiting ourselves to titled players (especially because ratings for women are in general lower than for men). Steak (talk) 14:30, 24 October 2016 (UTC)
How many titles are there for women alone? -- Innocent bystander (talk) 14:35, 24 October 2016 (UTC)
Woman Grandmaster, Woman International Master, Woman FIDE Master (and Woman Candidate Master, but I would neglect this one because it differs in some respects from the others). Steak (talk) 14:37, 24 October 2016 (UTC)
Holding a title, not just having an Elo above some threshold, is probably a good criterion of notability for chess players. If others agree to this, then I have to admit I created some items with quickstatements for players that are not notable by this criterion, since I just took an Elo of 2300 as the threshold for notable players (a wrong interpretation of w:en:FIDE_titles#FIDE_Master_(FM)). I did it in good faith; no vandalism/flooding intended. --Wesalius (talk) 18:40, 24 October 2016 (UTC)

What's the data source for making these statements? What kind of information does the source give about the relevant people? Without looking at the data sources that exist, it's hard to say which people can be described by "serious and public sources" (our standard). ChristianKl (talk) 20:16, 24 October 2016 (UTC)

The source is here. --Wesalius (talk) 04:42, 25 October 2016 (UTC)
I propose this: extending the possible values of title of chess player (P2962) according to consensus, then importing title of chess player (P2962) claims from the databases of the federations that award those titles, and then running a query to find chess player items without any title of chess player (P2962) value, so that non-notable items can be marked for deletion. What do you think? --Wesalius (talk) 07:43, 25 October 2016 (UTC)
This sounds very good! But of course, not every non-titled player is "not notable", especially before 1950, and there may also be later notable players who did not get a title. Steak (talk) 08:48, 25 October 2016 (UTC)
How do you want to check whether a person who's listed in that list already exists in Wikidata? I don't think there's an existing database that links the chess ID number to VIAF numbers or other authority control, and the name and the country (federation) might not be enough data to uniquely identify many people. If a professor John Smith who's notable for being a math professor also plays chess, how do you tell whether the John Smith in your list is that professor?
If the math professor John Smith is a chess grandmaster, that might be public information that can be used to link the items, and therefore I would consider being a grandmaster a sufficient criterion. I would expect that an Elo score of 2300 is not enough to create the matches in most cases, and therefore I would not use it as a criterion. Do you think it provides the necessary information to know whether people with the same name are the same person? ChristianKl (talk) 17:19, 25 October 2016 (UTC)
Check these GM, FM, IM, WGM, WIM, WFM mixnmatch catalogs. There are not too many people that the tool automatched for confirmation by the end user. So I am not really sure what your question is, since just above your post I proposed matching according to titles and not Elo 2300+ itself. --Wesalius (talk) 18:04, 25 October 2016 (UTC)
Do you think all the items you created can be matched, based on the available information, with existing Wikidata items? ChristianKl (talk) 18:32, 25 October 2016 (UTC)
Can you be more specific? I find your questions hard to understand. What matching are you talking about? I created items about chess players, some of them notable, some probably not. Then I proposed a way of defining the notability of chess players and a way to filter out those that may be in the category of "not notable, therefore candidates for deletion". What existing items to be matched are you talking about? --Wesalius (talk) 18:41, 25 October 2016 (UTC)
@ChristianKl: This matching can never be 100% certain. Also in Wikipedias it can happen that an article about e.g. a historian exists, an article about a chess player with the same name is created, and several years later it turns out that both persons are one individual. We cannot exclude such cases, but we can do our best to avoid them. Steak (talk) 19:47, 25 October 2016 (UTC)
I'm not calling for 100% certainty. Could you estimate the certainty that you believe you have? ChristianKl (talk) 19:52, 25 October 2016 (UTC)
I cannot quantify an (un)certainty because I don't know how other users match. I match only if the basic data are identical (country, date of birth) and an article exists in some Wikipedia where the person is described and the FIDE profile is already linked. Steak (talk) 19:55, 25 October 2016 (UTC)
How is this related to the notability of chess players? I can't tell what the basis of potential matching decisions by others will be, so any certainty you are asking for in possible matching is just guessing... --Wesalius (talk) 20:36, 25 October 2016 (UTC)
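As a concrete illustration of the query Wesalius proposes above, a minimal SPARQL sketch, assuming the players carry occupation (P106) chess player (Q10873124) and that titles are recorded with title of chess player (P2962):

    # Chess players (by occupation) without any title of chess player (P2962) claim.
    SELECT ?player ?playerLabel WHERE {
      ?player wdt:P106 wd:Q10873124 .
      FILTER NOT EXISTS { ?player wdt:P2962 ?title . }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }

The result would still need manual review, per Steak's point that some untitled players (especially pre-1950) are notable.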

How do you track the items (data) that you have subscribed to in your watchlist?[edit]

Hi folks!

I would like to ask a quick question :-)

How do you keep track of the huge number of items (data) that you have subscribed to in your watchlist? My assumption is that, because you subscribe to a lot of items, this will generate a long list of item notifications in your watchlist.

Thanks for your answer! --Glorian Yapinus (WMDE) (talk) 15:59, 24 October 2016 (UTC)

It's hard sometimes, as I want to see certain bot edits while other bots are okay for me. It would be great if I could filter those out more easily (username CSS classes are maybe a start, perhaps in combination with a namespace one). Another suggestion is a tool where you can clean up your watchlist based on SPARQL queries. So I could, for example, remove all taxonomy items from my watchlist. Sjoerd de Bruin (talk) 14:27, 25 October 2016 (UTC)
I have no items in my watchlist, only user and Wikidata pages. Sometimes I use SPARQL queries to find false statements. --Molarus 15:20, 25 October 2016 (UTC)
~10 000 items on my watchlist, no problem keeping track of all changes (with rare exceptions). I'm not sure whether this is already a lot. —MisterSynergy (talk) 15:35, 25 October 2016 (UTC)
If you are talking about the watchlist in Wikidata, I have very few items in my watchlist. Normally I use SPARQL or lists automatically generated by me or by other users. --ValterVB (talk) 16:41, 25 October 2016 (UTC)
I do not use anything technical except for a gadget which highlights unseen items (to keep track of them when I switch computers), and it costs me between one and two hours per day just to go through my watchlists on the four Wikimedia projects where I have administrator privileges. --Ymblanter (talk) 20:51, 25 October 2016 (UTC)
I have approx. 14 000 items on my watchlist. Built-in filtering is useful (mainly for bot edits and ORES-flagged probably-good edits), but it would be useful to add the ability to filter out semi-automated edits (at least those with reCh or Widar tags) as well.  – The preceding unsigned comment was added by Jklamo (talk • contribs).
My watchlist contains pages created by me, some other interesting pages, and those pages that are popular targets for vandalism. I'm not so interested in seeing bot edits, as they are usually correct. Sometimes my watchlist is full of Widar edits, and I can't find an easy way to hide them. --Stryn (talk) 14:01, 26 October 2016 (UTC)
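To make the SPARQL-based cleanup Sjoerd de Bruin describes above concrete, a minimal sketch that lists taxonomy items, assuming they are identified as instance of (P31) taxon (Q16521); the returned Q-IDs could then be matched against a raw watchlist export:

    # Items that are instances of taxon (Q16521); LIMIT keeps the result manageable.
    SELECT ?item WHERE {
      ?item wdt:P31 wd:Q16521 .
    }
    LIMIT 1000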

Wikidata:Bureaucrats' noticeboard#Request for flooder flag[edit]

May I please ask for some input there? It is like a bot request, but for a flooder flag. Thanks.--Ymblanter (talk) 18:11, 24 October 2016 (UTC)

Rejecting claim for universe (Q1) fails[edit]

When I try to reject a claim in universe (Q1) (the two point in time properties) I get a pink message box that says something like "set state to wrong failed" and some grey and black boxes next to the claim keep flashing until I navigate away from the page. Jc3s5h (talk) 13:32, 25 October 2016 (UTC)

Hmm, are you trying to remove the claim, or set it to the deprecated level? -- Ajraddatz (talk) 22:35, 26 October 2016 (UTC)
Trying to remove it. Jc3s5h (talk) 00:56, 27 October 2016 (UTC)

Sidebar section for ITEMS?[edit]

Hello. This morning I had to merge two items (Q4226027) and found it rather hard to remember how to do it. (Eventually I remembered it's somewhere in "Special pages".) In the sidebar, why not add a section above TOOLS, dedicated to ITEMS? Maybe like this:


  • Create a new item
  • Merge two items
  • Item by title
  • Recent changes
  • Random item

They're all elements from the current sidebar, plus "Merge two items". 15:34, 25 October 2016 (UTC)

There is a dropdown menu to the left of the search box which offers convenient access to a merge form on all item pages, where you just need to provide the other item's Q-ID. I think this needs to be activated in the gadgets section of the preferences. I have never missed merge links anywhere else. —MisterSynergy (talk) 15:44, 25 October 2016 (UTC)
If someone finds unlinked articles on two Wikipedias, do you think they're going to install some gadget? I think that without "Merge two items" in the sidebar, most will give up. 19:57, 26 October 2016 (UTC)
You should be messaging the appropriate higher authorities to request that the merging gadget be enabled by default on new accounts, rather than threatening on behalf of other IPs not to contribute to Wikidata otherwise. Mahir256 (talk) 21:43, 26 October 2016 (UTC)
Editing logged-in does give the regular user many better options. It is also somewhat reasonable not to put powerful tools, like merge, waving teasingly in the purview of some IP editors. I think it is a reasonable compromise to make the merge tool available but not to overly advertise it. There is ready information available at Help:Merge that steps interested people through the process.  — billinghurst sDrewth 05:41, 27 October 2016 (UTC)

Invitation for review: Technical Collaboration Guideline[edit]

Wikimedians, please review something we are working on for the Wikimedia Foundation, the mw:Technical Collaboration Guideline.

The Technical Collaboration Guideline (TCG) is a set of best practice recommendations, for planning and communicating product and project information to Wikimedia communities, in order to work better, together. The TCG allows Wikimedia Foundation (WMF) Product teams and Wikimedia communities to work together in a systematic way in the product development and deployment cycle. It is hoped that the TCG is useful enough to be utilized in planning and communications regarding any project, from anyone. The TCG is intended to be flexible as plans and products change in development; it is a guide whose contents will help build collaborative relationships.

The initial draft of the TCG was written after discussions in small groups with members of the Community Liaisons and Product Management teams, to identify successes and failures in communication, and what we can do to encourage collaboration with the communities. Over the next month, we are seeking review and feedback from Wikimedia community members. All feedback that is left will be read; if there is a case for immediate action, it will be taken. All feedback will be taken into consideration when editing the next draft of the TCG. Please keep in mind that the TCG is intended to be lightweight information and instruction and will not be completely comprehensive. The TCG and the conversations about it are in English, but comments in all languages are welcome. We look forward to reading your comments at mw:Talk:Technical Collaboration Guideline. Thanks. Quiddity (WMF) (talk) 19:27, 25 October 2016 (UTC)

Wrong links in side bar[edit]

What is causing this[3] situation where you get the wrong item (Pseudomyrmecinae rather than Tetraponera) linked in the side bar? JMK (talk) 22:29, 25 October 2016 (UTC)

Edit the category @ Commons and you'll see old skool wikilinks. They can obviously be deleted. Jared Preston (talk) 00:18, 26 October 2016 (UTC)


Linking fish (Q152) and fish (Q600396)[edit]

How should we link fish (Q152) (the animal) with fish (Q600396) (the food)? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:17, 26 October 2016 (UTC)

What about natural product of taxon (P1582)? --Micru (talk) 12:52, 26 October 2016 (UTC)
Thank you. I've done that for now, but it feels a bit of a kludge. What do others think? I see we also have the inverse, this taxon is source of (P1672). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:22, 26 October 2016 (UTC)
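In case it helps the discussion, a minimal SPARQL sketch to survey how the property and its inverse are used elsewhere, so the fish case can follow existing practice; both directions are queried because only one of them may be filled in on a given pair:

    # Product items linked to their source taxa, in either direction.
    SELECT ?product ?taxon WHERE {
      { ?product wdt:P1582 ?taxon . }   # natural product of taxon
      UNION
      { ?taxon wdt:P1672 ?product . }   # this taxon is source of
    }
    LIMIT 100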

Location of HongKong[edit]

The coordinate location (P625) of Hong Kong is precise to within 11 meters, which is rather shocking since Hong Kong is an island between 4 and 16 km wide, so the island's location should be at least ±1 km. This discrepancy came to my attention because c:Category:Pages with local coordinates and mismatching wikidata coordinates lists pages with a discrepancy between Commons and Wikidata locations, and the Commons location of Hong Kong is 3 km away. How do I alter or even see the geo-precision stored with each coordinate? Right now I can see it at c:Category:Hong Kong (yellow box under location), which is pulled from Wikidata by Lua. But I do not know how to fix it. --Jarekt (talk) 14:26, 26 October 2016 (UTC)

Its precision is 1/1000 of an arcsecond now. You can change that by going to Hong Kong (Q8646), editing it, and putting your cursor in the field with the coordinates. A field will then drop down and show you the precision. You can change that to almost whatever you like. -- Innocent bystander (talk) 15:44, 26 October 2016 (UTC)
Innocent bystander, thank you, that is what I was looking for; however, I cannot reproduce it. You suggested to "put your cursor in the field with the coordinates" and the "field [will] drop down and show [] the precision". I can put my cursor anywhere over that field and nothing happens, except that it shows me the link to GeoHack if I park it over the coordinates. Clicking on "1 reference" shows me that it was "imported from" "English Wikipedia". I cannot make it show me the precision or the globe, or change them. Do you have some gadget or extension that helps? Some of the things affecting the interface can be very subtle. I just figured out that the easiest way to let me import labels and descriptions to Wikidata for languages I do not speak is to add XX-0 flags to my {{#babel}} tag, one for each language I do not speak but want to copy. --Jarekt (talk) 16:11, 26 October 2016 (UTC)
I am not aware of any gadget installed for this. The only strange thing here is that I use the Monobook skin. But what I did is that I pushed the edit button, then put the cursor in the field with the "6*7'8"N 9*10'11"E", and then the field with the precision dropped down. -- Innocent bystander (talk) 16:49, 26 October 2016 (UTC)
Thank you. That worked in the Vector skin as well. I do not know how I missed that. Maybe because sometimes it takes a while for the full page to load, and various features do not work properly until it does. I had the same issue arguing that links are not added to identifiers like VIAF or to links to pages on Commons. They do... eventually, and if they do not, then reloading the page works most of the time. --Jarekt (talk) 17:44, 26 October 2016 (UTC)
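For reference, the stored precision (in degrees) can also be inspected with SPARQL via the statement's full value node, without touching the UI; a minimal sketch for Hong Kong (Q8646):

    # Coordinate and stored precision for Hong Kong (Q8646).
    SELECT ?coordinate ?precision WHERE {
      wd:Q8646 p:P625 ?statement .
      ?statement ps:P625 ?coordinate ;
                 psv:P625 ?valueNode .
      ?valueNode wikibase:geoPrecision ?precision .
    }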

create an account[edit]

It isn't letting me create an account.  – The preceding unsigned comment was added by (talk • contribs) at 18:26, 26 October 2016‎ (UTC).

In order to help you, we need more info - what error message do you see? One work-around is to create the account using a different internet connection. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:02, 27 October 2016 (UTC)

Open commenting on WMF Seeking Additional Resources for Structured Data on Commons[edit]

The Wikimedia Foundation, in cooperation with Wikimedia Deutschland, has a unique opportunity to potentially secure additional resources to expedite development work on Structured Commons. We would like feedback on a 3-year plan that describes accelerated software development if these resources become available. We would like to invite you to participate in a conversation at this page, which provides an overview of the proposed timeline. We look forward to your comments and thoughts.

Joseph Seddon and Alex Stinson 22:30, 26 October 2016 (UTC)

How do I confirm ownership of an online store?[edit]

Валерия Браун (talk) 07:03, 28 October 2016 (UTC) Braun Valeria. Good day! I have run into a problem and do not quite understand how to solve it. I created a site at the address , connected it to Google Webmaster Tools and Yandex Webmaster, but the promotion instructions said it is necessary to confirm ownership of the online store via . Perhaps I am not quite in the right place, or am not asking correctly. Please help me with this question, or redirect me to the right place. I would be grateful for any help. Thanks in advance, Valeria Braun.

@Валерия Браун: This is completely the wrong place. How could an encyclopedia help with ownership rights? P.S. Besides, this is not Wikipedia, and on this forum it is customary to speak English. --Infovarius (talk) 10:04, 28 October 2016 (UTC)

Birthday is coming![edit]

Hello folks,

Wikidata's fourth birthday is tomorrow, and we have lots of things for you!

Some Wikidata editors are organizing events for the birthday on several continents \o/ Three meetups have already taken place in Turin (Q495), San Francisco (Q62) and Tokyo (Q1490); the next ones are tomorrow in Paris (Q90) and Utrecht (Q803). You can find all the other places and dates here. You can also follow what happens on Twitter with #Wikidatabirthday.

During the week to come, I'll share with you the gifts that the Wikidata team and some community members have prepared for the birthday. These are very exciting new features, visuals, games... You can check this page, the mailing list and the birthday page every day to see what's new :)

I will also share with you some user stories, blog posts and other stories that people wrote for the birthday. If you write something, please send me the link so I can highlight it during the week!

This user is celebrating Wikidata's 4th birthday.

Let's start with this birthday user template that you can add to your user page, made by @Pigsonthewing:, thank you :) Thanks also to @Incabell: for the banner on the birthday page!

I can't wait to start this birthday week and share all these nice gifts with you :)

Cheers, Lea Lacroix (WMDE) (talk) 10:30, 28 October 2016 (UTC)