Shortcuts: WD:PC, WD:CHAT, WD:?

Wikidata:Project chat: Difference between revisions

TomT0m (talk | contribs)
Tag: 2017 source edit
Ghuron (talk | contribs)
Line 1,061: Line 1,061:
--[[User:Gryllida|Gryllida]] ([[User talk:Gryllida|talk]]) 23:51, 8 March 2018 (UTC)
:[[Special:Diff/646435431]] --[[User:Liuxinyu970226|Liuxinyu970226]] ([[User talk:Liuxinyu970226|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 10:25, 9 March 2018 (UTC)

== How vandalism in Wikidata affects local wikis ==

Yesterday at 01:42 someone {{diff|diff=645343810|label=renamed}} {{Q|159}} to "mainkra". It was {{diff|diff=645381529|label=reverted}} within an hour, but unfortunately the vandalized version somehow spread into ru-wiki infoboxes (probably because of mysterious cache algorithms). Right now Google shows that thousands of articles are still affected (see [https://prnt.sc/iovdez]). I understand that this might not be a top-priority issue for the Wikidata community, but is there anything we can do to decrease the probability of similar incidents in the future? For instance, why do we allow anonymous contributions to highly used items like {{Q|159}}? --[[User:Ghuron|Ghuron]] ([[User talk:Ghuron|<span class="signature-talk">{{int:Talkpagelinktext}}</span>]]) 12:47, 9 March 2018 (UTC)

Revision as of 12:47, 9 March 2018

Wikidata project chat
Place used to discuss any and all aspects of Wikidata: the project itself, policy and proposals, individual data items, technical issues, etc.
Please take a look at the frequently asked questions to see if your question has already been answered.
Please use {{Q}} or {{P}}, the first time you mention an item, or property, respectively.
Requests for deletions can be made here. Merging instructions can be found here.
IRC channel: #wikidata (connect)
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/04.

Xantus's Murrelet

See also User:Pigsonthewing @ Xantus's Murrelet (Q46338167) --Succu (talk) 21:38, 25 February 2018 (UTC)[reply]

We seem to be having a problem at Xantus’s murrelet (Q46338167), which User:Succu persists in repeatedly (five times, so far) trying to merge into one or another item about patently different concepts; or from which he removes cited statements. Given previous difficulties I and other editors have experienced when attempting to discuss similar matters with that user, I'm raising it here, and not on the item's talk page which presumably has no other watchers. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:16, 26 December 2017 (UTC)[reply]

And a sixth. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:34, 26 December 2017 (UTC)[reply]
Mind counting your own reverts too? The item was originally created for the eBird entry xanmur. This is about two species called en:Xantus's murrelet (= Scripps's Murrelet (Q3120531) and Xantus’s murrelet (Q1276043)). Then Mr. Mabbett added ABA bird ID (P4526) = xanmur, which refers only to the common name „Xantus's murrelet“, and a duplicate of the value ARKive ID (archived) (P2833)=xantuss-murrelet/synthliboramphus-hypoleucus. Finally (after some reverts) he claimed taxon name (P225) = Synthliboramphus hypoleucus (=Xantus’s murrelet (Q1276043)) for this item. Maybe he could explain here why and on what basis he thinks this is a „patently different concept“. --Succu (talk) 21:00, 26 December 2017 (UTC)[reply]
I'm glad that Succu has confirmed that the item in question is about a different concept to the items to which he has variously redirected it (albeit he is confused as to why this is so; and about the edits I have made to the item). Perhaps he will now cease doing so? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:14, 26 December 2017 (UTC)[reply]
I'm confirming nothing. I asked for an explanation. --Succu (talk) 21:17, 26 December 2017 (UTC)[reply]
"The item ... about two species called en:Xantus's murrelet (= Scripps's Murrelet (Q3120531) and Xantus’s murrelet (Q1276043))". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:51, 26 December 2017 (UTC)[reply]
Hence the first merge. Is this item about two species? Would be nice if you could explain your viewpoint to other readers of this topic. --Succu (talk) 21:59, 26 December 2017 (UTC)[reply]
Your first merge was to an instance of Wikimedia disambiguation page (Q4167410). My viewpoint is that Q46338167 represents a different concept to any of those with which you have tried to merge it. I'm also sure "other readers" can read both the item's description, and the sources used. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:24, 26 December 2017 (UTC)[reply]
So no explanation at all of why your new item is a „patently different concept“. Different from other species? Repeat: Is this item about two species? Would be nice if you could explain your viewpoint to other readers of this topic. Looks like you are unwilling to do so, Mr. Mabbett. --Succu (talk) 22:33, 26 December 2017 (UTC)[reply]
To make it easier for you, is your new item Xantus’s murrelet (Q46338167) about:
  1. the two species Scripps's Murrelet (Q3120531) and Xantus’s murrelet (Q1276043) supported by xanmur
  2. the common name „Xantus's murrelet“ supported by ABA bird ID (P4526) = xanmur
  3. the species name Xantus’s murrelet (Q1276043) supported by ARKive ID (archived) (P2833)=xantuss-murrelet/synthliboramphus-hypoleucus
If your answer is "all of them" (=current status) then please explain it to us. Thanks in advance. --Succu (talk) 22:58, 26 December 2017 (UTC)[reply]
No Succu, there's explanation aplenty. My reason for raising the matter here is to solicit third-party input. I won't be answering questions such as yours, based on false premises. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:24, 27 December 2017 (UTC)[reply]
Then please list my „false premises“ and explain to me why they are wrong. But I don't think you have any real arguments. Otherwise it would be easy for you to give them. By the way: Do you think giving only an ISBN like 0198540329 is a sufficient source? --Succu (talk) 21:01, 29 December 2017 (UTC)[reply]
OK, I did another merge. --Succu (talk) 22:44, 30 December 2017 (UTC)[reply]
Mr. Mabbett? ISBN 0198540329 stands for what book? On which page does this ISBN support your view? --Succu (talk) 21:14, 23 January 2018 (UTC)[reply]
It's a simple question, Mr. Mabbett. Mind responding? ---Succu (talk) 21:54, 26 January 2018 (UTC)[reply]
I don't see a good reason to reply here and generally think it makes more sense to have such a discussion on the talk page by pinging relevant Wikiprojects. ChristianKl 12:59, 27 December 2017 (UTC)[reply]
I agree with ChristianKl. I must admit I am completely mystified with what concept Andy Mabbett has in mind. Certainly the item as it now is, seems inconsistent with any way of expressing any concept ever included in Wikidata so far. - Brya (talk) 05:36, 28 December 2017 (UTC)[reply]
You're "completely mystified" and - according to your comment on the item's talk page, are "guessing" what it represents; yet you see fit to make changes to the item, which are unsupported by the sources used (and you offer no new sources). That's not a healthy way to proceed. I have again fixed your broken indenting. Wilfully mis-indenting your comments, having been told that doing so is harmful, and having been given advice on how to do so correctly, is disruptive. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:34, 29 December 2017 (UTC)[reply]
The problem is that there is very little offered in the way of sources. I see just the ARKive ID link, and given how much junk we already suffered from that source, it is a frail reed to lean anything on.
        And please don't "fix" my comments: you should restrict your religious [?] beliefs to your own comments. - Brya (talk) 11:38, 29 December 2017 (UTC)[reply]
There are at least three sources used on the item; none of which are from ARKive. Please stop posting falsehoods. And like I said; disruptive. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:48, 29 December 2017 (UTC)[reply]
ARKive ID (archived) (P2833)=xantuss-murrelet/synthliboramphus-hypoleucus states this is Xantus’s murrelet (Q1276043). ---Succu (talk) 21:01, 29 December 2017 (UTC)[reply]
BTW, same is true for your weblink to the entry at US ECOS. --Succu (talk) 21:10, 29 December 2017 (UTC)[reply]

I've just undone a seventh attempt by Succu to delete this item through a merger to an inappropriate target. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:28, 30 December 2017 (UTC)[reply]

Please argue here and do not revert blindly. --Succu (talk) 07:05, 31 December 2017 (UTC)[reply]
And an eighth... Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 31 December 2017 (UTC)[reply]
 Comment Perhaps it's time to find a source that they are actually different... Matěj Suchánek (talk) 21:10, 31 December 2017 (UTC)[reply]
You can find one on Xantus’s murrelet (Q46338167). HTH. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:18, 1 January 2018 (UTC)[reply]
Then the easiest way to settle the matter is to cite your "reference" here. --Succu (talk) 13:56, 1 January 2018 (UTC)[reply]
The American Ornithological Union checklists "are the official source on the taxonomy and nomenclature of birds found in this region, including adjacent islands." see here: http://www.americanornithology.org/content/checklist-north-and-middle-american-birds . Looking at the current checklist here: http://checklist.aou.org/taxa/ we have Synthliboramphus scrippsi (Scripps's Murrelet) and Synthliboramphus hypoleucus (Guadalupe Murrelet). Therefore I believe, officially Xantus's Murrelet has been split, I don't think what other authorities say is relevant. A species with name Synthliboramphus hypoleucus (Xantus's Murrelet) was deleted from the AOU list as per the 53rd supplement in 2012 http://americanornithologypubs.org/doi/pdf/10.1525/auk.2012.129.3.573 I'm not really a wikidata expert, but I would suggest the best course of action is to retain Xantus’s murrelet (Q46338167) but change the instance of (P31) from taxon (Q16521) to something which indicates that this is a formerly recognised taxon, but which has been deleted. I had a quick look but couldn't find an item that would describe that, but this must have happened before. Species are split all the time. I don't really think that Xantus’s murrelet (Q46338167) should be merged into Xantus’s murrelet (Q1276043) they are different. Just my twopenneth. JerryL2017 (talk) 15:23, 1 January 2018 (UTC)[reply]
Wikidata follows a NPoV policy, not a Single Point of View policy; for that try Wikispecies. So, the American Ornithological Union checklists are only one source, not THE source. Of course, it may be possible to start creating items based only on American Ornithological Union concepts, but this would be a fairly big departure from existing practice. - Brya (talk) 05:37, 2 January 2018 (UTC)[reply]
We do not model different taxon concepts this way. That's why I merged the items several times and asked for a good reference before proceeding. None was given. --Succu (talk) 16:04, 2 January 2018 (UTC)[reply]
Here is the defining reference that concludes Xantus’s murrelet (Q46338167) is 2 species: http://www.bioone.org/doi/full/10.1525/auk.2011.11011 based on that paper the AOU adopted that taxonomy as detailed in the 53rd supplement, http://americanornithologypubs.org/doi/pdf/10.1525/auk.2012.129.3.573 which I had already given above. However, given that not all sources have yet adopted this taxonomy, and based on what others have said here and what is stated in the Wikidata taxonomy project guidance, it would seem sensible to retain Xantus’s murrelet (Q46338167) for the time being, with the correct links to sources that are still using the former taxonomy. That said, there are issues with Xantus’s murrelet (Q1276043). This item refers to the "split" Guadalupe Murrelet but has links to sources that do not recognise the split. It also includes the alternative name of Xantus's Murrelet, which is confusing. JerryL2017 (talk) 17:44, 2 January 2018 (UTC)[reply]
Rangewide population genetic structure of Xantus's Murrelet (Synthliboramphus hypoleucus) (Q29541111) is proposing a taxonomic opinion about elevating the two subspecies Synthliboramphus hypoleucus hypoleucus (Q47012916) and Synthliboramphus hypoleucus scrippsi (Q47012925) of Xantus’s murrelet (Q1276043) to species level. The American Ornithological Society (Q465985) followed the recommendation. I do not see that Xantus’s murrelet (Q46338167) is expressing this. --Succu (talk) 18:37, 2 January 2018 (UTC)[reply]
Yes, whatever the intent is, execution seems sloppy. - Brya (talk) 04:01, 3 January 2018 (UTC)[reply]
Since Mr. Mabbett refuses to argue here I will merge both items once again. --Succu (talk) 19:27, 5 January 2018 (UTC)[reply]
And if you do, absent a consensus here, you will be reverted again; for the reasons already given. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:01, 5 January 2018 (UTC)[reply]
I expected this. Other people try to argue. You do not. What a pity for you. Hopefully you do not miscount your reverts. --Succu (talk) 22:10, 5 January 2018 (UTC)[reply]
And this revert of him. --Succu (talk) 07:21, 6 January 2018 (UTC)[reply]
And this revert of him. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:35, 6 January 2018 (UTC)[reply]
Obviously you are unwilling to give a reasonable answer here. Probably you can't and are defending the item only because you've created it. --Succu (talk) 21:02, 6 January 2018 (UTC)[reply]
No progress here from Mr. Mabbetts side. --Succu (talk) 21:01, 8 January 2018 (UTC)[reply]
Now at AN. --Succu (talk) 20:59, 9 January 2018 (UTC)[reply]
ok from the point of view of a taxonomist. Reading the paper from 2011 linked above by @JerryL2017: the two taxa in question were considered subspecies, however overlap and do not interbreed in sympatry. By the definition of a subspecies this is not possible, hence they should be species and have been recommended as such by the paper also. As such from this viewpoint you have two species and should have two items one for each. Any other refs, unless you find one that refutes this primary ref with data not opinion, are irrelevant. I see no reason for any further argument. The nomenclatural act has been made, follow it. Where the common names go whatever, they are vernacular names and not relevant to the concept of the species. That is my view on this so I would suggest fixing the pages to reflect this and as for the IOC, ummm they are not a primary taxonomic reference so why would you be adamant about it. Cheers Scott Thomson (Faendalimas) talk 21:46, 13 January 2018 (UTC)[reply]
All major bird checklist (including IOC) followed this viewpoint. The "official" english common name of Synthliboramphus hypoleucus was changed from „Xantus’s Murrelet“ to „Guadalupe Murrelet“. --Succu (talk) 22:57, 13 January 2018 (UTC)[reply]
Xantus’s murrelet (Q46338167) represents a concept, described by the three reliable sources used on that item, which we can refer to, for the sake of brevity, as "A". You are saying that a different source refers to the concept "B". The Wikidata model, as I understand it, is that two concepts should be represented by different items (with, if applicable, mutual "said to be the same as" properties). However, if your contention is that "A" and "B" are the same concept, but with different attributes, then the Wikidata model is to include properties with values stating both attributes, cited to their respective sources. What the Wikidata model does not do is pretend that the (reliably-cited) concept "A" does not exist. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:17, 13 January 2018 (UTC)[reply]
Name it (the concept)! --Succu (talk) 23:23, 13 January 2018 (UTC)[reply]
Umm..... I am seriously not getting this. Ok, three reliable sources: I looked on the page and did not see anything I would consider reliable. Please correct me if I am wrong, do you have separate pages for the scientific name and the vernacular names?? I do not understand why you would do that, but no matter. The vernacular name needs to follow the scientific name according to the most recent primary sources. No, I do not consider the collective political opinions of eBird or the IOC and similar checklists as primary sources. When the species was split I imagine this did require modification of the accepted common names. As such you would use the primary refs that state this coupled with the research paper that did the nomenclatural act to move the common names accordingly. By the way, a taxon is not a concept, it is a hypothesis; the concept is the theory of how species are differentiated, i.e. the grounds for calling something a species. That is, for example, the Biological Species Concept. However, a species is a hypothesis circumscribed under a concept of the primary author's choosing. So please, what are we getting at here, because your paragraph, Andy, makes no sense. You are reporting science, it needs to reflect science. Cheers Scott Thomson (Faendalimas) talk 23:48, 13 January 2018 (UTC)[reply]
OK, I will try and summarise and make some recommendations. Up until about 2012/2013 the ornithological world recognised 5 species in the genus Synthliboramphus (Q287293). One of these species had the Latin name Synthliboramphus hypoleucus, common name Xantus's Murrelet. This is Xantus’s murrelet (Q46338167). In 2012, in the paper cited above, it was proposed that the 2 recognised sub-species of Xantus’s murrelet (Q46338167) (Synthliboramphus hypoleucus scrippsi (Q47012925) and Synthliboramphus hypoleucus hypoleucus (Q47012916)) be considered separate species and that Xantus’s murrelet (Q46338167) be deprecated. This hypothesis was accepted by the American Ornithological Union (they have a scientific committee which decides on such matters) and their taxonomy was updated so that the genus Synthliboramphus has 6 recognised species including: Synthliboramphus hypoleucus, common name Guadalupe Murrelet (Xantus’s murrelet (Q1276043)) AND Synthliboramphus scrippsi, common name Scripps's Murrelet (Scripps's Murrelet (Q3120531)). Subsequently a number of other sources have also adopted this new taxonomy (and I make no comment about how reliable these might or might not be), e.g. Avibase ID (P2026), IUCN taxon ID (P627), eBird taxon ID (P3444) and others. However, a number of others that are variously cited in Wikidata's items have not adopted the new taxonomy. How do I know they have not adopted the new taxonomy? Because when you look at each in detail it does not include Scripps's Murrelet (Q3120531). Examples of these include: NCBI taxonomy ID (P685), ITIS TSN (P815) and WoRMS-ID for taxa (P850), although there are probably others; I haven't yet reviewed all sources. So, why have these sources not adopted the new taxonomy recommended by experts in the field?
I can see 3 reasons: 1) They don't agree with the hypothesis that Xantus's Murrelet is actually 2 separate species (I believe this is unlikely in this case). 2) They haven't updated their taxonomy (highly likely) and over time will probably update their data. 3) They have a scientific requirement to retain Xantus's Murrelet in their taxonomy because of pre-existing data that was linked to Xantus's Murrelet and not either of the sub-species. A good example of this is eBird taxon ID (P3444) where their taxonomy has Guadalupe AND Scripps AND (by their nomenclature) an entity called Scripps/Guadalupe Murrelet (Xantus's Murrelet). So, my conclusion is that Wikidata should retain Xantus’s murrelet (Q46338167): some sources are still using the former taxonomy and some bona fide sources WILL ALWAYS have an entity that is best described as Xantus's Murrelet. However, having said that, there are some outstanding questions we should resolve. 1) For Xantus’s murrelet (Q46338167), what is the best instance of (P31)? Is it taxon or some other concept that better captures its status? 2) The existing Wikidata items should be updated to ensure each item is linked to sources that best reflect that item, i.e. Xantus’s murrelet (Q1276043) should NOT link to sources that have a taxonomy that does not include Scripps's Murrelet. 3) Other data such as images and synonyms should be updated to make it less confusing. (NOTE: if we have agreement on this I am happy to go and make these changes IF I have assurance that they won't just all be reverted). 4) Consideration should be given within the Wikidata taxonomy project to additional avian taxonomy properties (some key sources are not available as properties, which isn't helping here). 5) I've seen various comments on here that some sources are "unreliable"; if there is a consensus that they are unreliable then why do we retain them?
6) IF we can resolve this case, consideration should be given as to how this is more widely applied within wikidata - for instance it is very common practise across many data recording schemes to "combine" species into groups when they are difficult to identify. I personally think wikidata should be able to reflect that, we are a broad data resource not a wildlife taxonomy. Thanks, I hope this is useful and moves the discussion on a little. JerryL2017 (talk) 09:58, 14 January 2018 (UTC)[reply]
Maybe just a slip of a pen, JerryL2017: „the genus Synthliboramphus has 6 recognised species“ but I count only five... --Succu (talk) 23:16, 14 January 2018 (UTC)[reply]
Sorry Succu, you are correct.

────────────────────────────────────────────────────────────────────────────────────────────────────
ok what I have been trying to say is the taxonomic view of the situation, then it is for Wikidata to determine how best to reflect the science. In answer to your questions from my point of view. This is a wonderful example of why common names are a pain, unprofessional and honestly useless. I agree with you @JerryL2017: that the issue of different nomenclatures between sources is most likely lack of updating.
1. There are three common names available for 2 species: the names Xantus's Murrelet and Guadalupe Murrelet both apply to the scientific name Synthliboramphus hypoleucus, however the former is now considered deprecated, and Scripps's Murrelet, which applies to Synthliboramphus scrippsi. When the species was split the current common name goes with the species that retained the original combination. There is no justification in retaining the common name Xantus's Murrelet for Synthliboramphus scrippsi, or honestly at all. The name Xantus's Murrelet does not technically apply to a taxon anymore, it is considered deprecated, it is at best an outdated name of historic value only.
2. Agreed, removal of sources that are outdated in their nomenclature will avoid confusion, or if they must be stated have it as "stated in and as" so that the page is generally set up with the current nomenclature but makes note of any departures from it without supporting them.
3. Agreed, all information possible should be updated, including where necessary file names and metadata for images. I would not revert anything. Cannot speak for others. But I think if we hammer out an agreed position I believe people are professional enough to follow it.
4. Avian taxonomy should honestly be following the ICZN code, which they do not. Further they have made recent efforts to dictate to other fields of taxonomy that their viewpoint should be followed, to a massive backlash. However, this is not our problem, we present the science we do not revise it. If you feel the need for further avian properties please elucidate these.
5. Your guess is as good as mine. I think there is a generalised tendency in projects like this to grab every online reference possible, unfortunately with little consideration of the quality of what is presented and no fact checking. Basically the equivalent of "Google says this, it must be true". Again I think this is also unprofessional; I think sources that are questionable should be examined by the Wikidata taxonomy project for validity and, if rejected, removed.
6. The act of combining species into groups is I think beyond the realm of a database. This is done through analysis of the given issue. Wikidata should be presenting the data, with reliable and good sources. As best as possible the primary taxonomic literature, in the scope of this issue, the thing that can come out of this is a better discussion on what is a good resource, the acceptance that complex cases need to be analyzed using only primary references, and that in taxonomic issues the relevant codes are the primary determinant on availability and validity. That is, if a name is published in accordance with the Code it is to be accepted as valid or refuted, a point the avian taxonomists breach the code on repeatedly.
Cheers Scott Thomson (Faendalimas) talk 15:14, 14 January 2018 (UTC)[reply]

Yes, it was established early on that in the real world there are/have been two circumscriptions for Synthliboramphus hypoleucus. That is not a problem. There appear to be several problems:
  • Is this wider/older circumscription notable enough to rate a separate item of its own?
  • Is this wider/older circumscription indeed what Andy Mabbett intends with Q46338167, given that he has already denied this. If not, what does he intend?
  • Is it worthwhile discussing if Wikidata should have items for concepts denoted by a standardised common name set by some bird organization? These clearly exist, but are they notable enough?
  • Given that bird organizations use deviant 'scientific names' with rules of their own, should we have a property for that? Something like "Avian scientific name" or more general "deviant taxon name, used by special interest groups" (to include butterflies)? Clearly, it is not a good idea to put non-Code-compliant names in P225.
Brya (talk) 18:10, 14 January 2018 (UTC)[reply]
I agree with your analysis @Brya: specifically:
Is this wider/older circumscription notable enough to rate a separate item of its own? I do not think so.
Is this wider/older circumscription indeed what Andy Mabbett intends with Q46338167, given that he has already denied this. If not, what does he intend? My impression was that this is what is being suggested here, I also acknowledge Andy has denied this, but I have no idea what the purpose of this is in that case.
Is it worthwhile discussing if Wikidata should have items for concepts denoted by a standardised common name set by some bird organization? These clearly exist, but are they notable enough? I do not think they are notable in any great degree, unless they are a highly notable list. I would encourage the avoidance of confusion as a priority.
Given that bird organizations use deviant 'scientific names' with rules of their own, should we have a property for that? Something like "Avian scientific name" or more general "deviant taxon name, used by special interest groups" (to include butterflies)? Clearly, it is not a good idea to put non-Code-compliant names in P225. I prefer something along the lines of your second option, since it can be applied outside Aves (Birds) this can apply to Amphibians also. But I definitely agree anything that is non code compliant should be avoided in almost any circumstance. Cheers Scott Thomson (Faendalimas) talk 20:43, 14 January 2018 (UTC)[reply]
All concepts are still in use at Wikimedia projects: de:Lummenalk is about the old concept, en:Guadalupe murrelet is about the new concept. What e.g. species:Synthliboramphus hypoleucus is about remains unclear to me. So how to deal with #2)? Where should we place a) outdated Wikimedia articles, b) outdated external identifiers (in case we can judge they are)? And yes, this thread needs some insights by Mr. Mabbett. --Succu (talk) 23:01, 14 January 2018 (UTC)[reply]
The Wikispecies account is about the species in question as part of this, we do not worry so much about common names there as its not really what we are about. Vernacular names get added by people occasionally as they see fit, I ignore them as best as I can. If someone wants to add the english common name they can. Cheers Scott Thomson (Faendalimas) talk 00:54, 15 January 2018 (UTC)[reply]
I do not care much about common names, but I care about references. The Wikispecies article has only a reference to the original combination Brachyramphus hypoleucus (not mentioning it at all). So it's hard to know which taxon concept this entry is about. :( --Succu (talk) 19:30, 16 January 2018 (UTC)[reply]
Fair enough, sorry I do not do the birds. I would not know the relevant refs. When I do turtles I already have pretty much all the literature, so is easier for me. However, since species:Synthliboramphus scrippsi also exists, then the other species can only be considering the new combination. Cheers Scott Thomson (Faendalimas) talk 00:17, 17 January 2018 (UTC)[reply]
But this is only a guess. Even the genus species:Synthliboramphus has no reference to a current taxonomic treatment... --Succu (talk) 20:38, 19 January 2018 (UTC)[reply]

Any suggestions how to resolve this „problem“? Mr. Mabbett is not responding. --Succu (talk) 22:04, 30 January 2018 (UTC)[reply]

I will update Wikispecies with the appropriate refs to show the two currently accepted species, and with a common name for each, with the relevant references (if anyone has them please send them to me). I need the original descriptions of both species and the treatment that recognises them as currently valid species. If you wish, you can then use this to model your database entries. That is up to you. My suggestion is to follow what has largely been discussed here, in the absence of any other explanation. So I suggest you have data entries for each of the two species with the currently accepted common name for each taxon, with the original refs. I further suggest that you could delete the entry for the now deprecated common name and list it only as an alternative, older, no longer used name for the species Synthliboramphus hypoleucus in older treatments. Cite the paper that split them as species to justify this. I would suggest calling the two data entries by the scientific name with the common names as description. This way the taxa are clearly defined and it can be noted the common names are less clear. Just my suggestions, your call. Cheers Scott Thomson (Faendalimas) talk 22:17, 30 January 2018 (UTC)[reply]
Please refer to my reply to you (so much for not responding!), above, time-stamped "23:17, 13 January 2018 (UTC)". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:13, 30 January 2018 (UTC)[reply]
Your usual gibberish, Mr. Mabbett. You didn't respond to some questions directed to you (what concept, give a page). --Succu (talk) 22:57, 31 January 2018 (UTC)[reply]
OK, I will merge both items again. --Succu (talk) 19:07, 5 February 2018 (UTC)[reply]
If you do, I will revert you, because nothing here refutes my reason for doing so previously. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:16, 5 February 2018 (UTC)[reply]
Why would you revert? I thought consensus was what was followed here. Cheers Scott Thomson (Faendalimas) talk 12:56, 6 February 2018 (UTC)[reply]
So, why would you revert, Mr. Mabbett? Is nothing here refutes my reason for doing so previously an argument? --Succu (talk) 22:51, 7 February 2018 (UTC)[reply]
As you can see, my arguments are laid out above. As a courtesy to our fellow editors, I see no need to repeat them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:16, 7 February 2018 (UTC)[reply]
As a courtesy to our fellow editors [...], Mr. Mabbett? An „interesting argument“. I'm one of your fellow editors. I don't think you have an answer to my questions, because you have been avoiding giving an unequivocal answer for weeks now. --Succu (talk) 22:10, 8 February 2018 (UTC)[reply]
Done. --Succu (talk) 22:55, 8 February 2018 (UTC)[reply]
This was again reverted by Mr. Mabbett with the comment „per project chat“. --Succu (talk) 20:21, 14 February 2018 (UTC)[reply]
Looks like we have to wait till eternity to get an explanation from Mr. Mabbett. --Succu (talk) 18:40, 20 February 2018 (UTC)[reply]

Despite the above, Succu's latest reason for reverting me was "no argument given". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:10, 24 February 2018 (UTC)[reply]

Where above? So what's this item about? You didn't tell us. --Succu (talk) 13:32, 24 February 2018 (UTC)[reply]
Still no explanation by Mr. Mabbett. --Succu (talk) 21:53, 24 February 2018 (UTC)[reply]
Still no clarification by Mr. Mabbett. --Succu (talk) 06:11, 2 March 2018 (UTC)[reply]

I moved the value of ARKive ID (archived) (P2833). Could you please explain your revert, Mr. Mabbett? --Succu (talk) 19:14, 6 March 2018 (UTC)[reply]

Plants of the World Online database

Plants of the World online (at [1]) looks to be an important database in future for plants. So could it be set up please? An example of where it is needed is Malva acerifolia (Q47519412) – see discussion page for the exact reference. Peter coxhead (talk) 11:06, 30 January 2018 (UTC)[reply]

That should not be a problem, although your example is testimony of a wrong attitude. - Brya (talk) 11:45, 30 January 2018 (UTC)[reply]
Started the ball rolling. - Brya (talk) 12:06, 30 January 2018 (UTC)[reply]
These values are already stored in IPNI plant ID (P961), with the PotW link as a third-party formatter URL (P3303). For the above example, the P961 value is 561509-1 which gives a PotW URL of http://www.plantsoftheworldonline.org/taxon/urn:lsid:ipni.org:names:561509-1 Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:20, 30 January 2018 (UTC)[reply]
@Pigsonthewing: Andy, please see my last comment at Wikidata:Property proposal/Plants of the World online. My only interest is to get Plants of the World online included in appropriate Wikidata items so that can be made to show up in articles when {{:en:Taxonbar}} is added. Please help to get this done in whatever way is appropriate. Peter coxhead (talk) 22:14, 2 February 2018 (UTC)[reply]
Before seeing your comment here, I had just written over on en.Wikipedia: "Taxonbar can be made to display a link to the PotW site, using values from Wikidata property P961". You don't need any change to Wikidata for that. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:51, 3 February 2018 (UTC)[reply]
That is an approach that could be adopted, provided one does not mind that it works for only part of the cases. - Brya (talk) 03:40, 10 February 2018 (UTC)[reply]
Please provide an example of a case where it does not work. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:36, 10 February 2018 (UTC)[reply]
This was already done at Wikidata:Property proposal/Plants of the World online. --Succu (talk) 19:17, 10 February 2018 (UTC)[reply]
No such example is provided on that page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:59, 10 February 2018 (UTC)[reply]
It is. And tons of plants not covered by IPNI. --Succu (talk) 22:02, 10 February 2018 (UTC)[reply]
Please give an example - here - of some of the "tons" of plants not covered by IPNI, which use IDs matching the definitions in the property proposal. And, if I'm wrong, prove it: give the former example here, too. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:59, 11 February 2018 (UTC)[reply]
Succu provided urn:lsid:ipni.org:names:128853-1 as an example missing from POWO; Peter coxhead provided urn:lsid:ipni.org:names:503872-1. ArthurPSmith (talk) 16:00, 12 February 2018 (UTC)[reply]
Those are examples in the wrong direction; the model proposed (giving a valid POTW URL as a reference, to indicate that a page exists on POTW) would clearly not be used in such cases. They are not "plants not covered by IPNI". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:01, 12 February 2018 (UTC)[reply]
No such examples, then. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:16, 17 February 2018 (UTC)[reply]
Wrong direction: Your intention was to spare us a property because this property could be "remodeled" via third-party formatter URL (P3303). This is untrue. --Succu (talk) 21:58, 17 February 2018 (UTC)[reply]
False. And I asked you to "Please provide an example of a case where it does not work". Still no such examples. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:18, 18 February 2018 (UTC)[reply]
Probably only a faux pas (Q1398885) on your side, Mr. Mabbett? --Succu (talk) 22:21, 20 February 2018 (UTC)[reply]
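
Editorial illustration of the mechanism discussed in this thread: a minimal Python sketch of how a link is derived from a stored identifier value and a formatter URL. The formatter string is inferred from the PotW URL quoted above for the P961 value 561509-1, with `$1` as the placeholder for the identifier; the function name `format_identifier_url` is hypothetical and not part of any Wikidata tooling.

```python
def format_identifier_url(formatter: str, value: str) -> str:
    """Substitute an identifier value for the `$1` placeholder of a formatter URL."""
    if "$1" not in formatter:
        raise ValueError("formatter URL has no $1 placeholder")
    return formatter.replace("$1", value)


# Assumption: pattern reconstructed from the PotW URL quoted in the thread.
POTW_FORMATTER = (
    "http://www.plantsoftheworldonline.org/taxon/urn:lsid:ipni.org:names:$1"
)

# The IPNI plant ID (P961) value given for the Malva acerifolia example.
potw_url = format_identifier_url(POTW_FORMATTER, "561509-1")
print(potw_url)
```

The substitution is purely textual: it produces a URL for any value, whether or not the target site actually has a page for it, which is the point the participants disagree about.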

Wider issue

I have held off from creating this property because of the heated debate (which is apparently still on). I think we should just have one general discussion about these cases. When an identifier is shared by multiple databases, but these databases do not have the same coverage of that identifier, what do we do?

  • Either we create two separate properties, holding the same values but with different formatter URLs (and different coverage obviously)
  • Or we find another way to indicate that an identifier is available in one of the databases (Andy suggested to use references like this).

This is a fairly general problem that was raised in other proposals (such as Wikidata:Property proposal/Google Arts & Culture entity ID) so it would be worth settling it once and for all… Should we have a RFC or something like that? Or is it overkill because the consensus for one solution or another is already clear somehow? − Pintoch (talk) 19:04, 24 February 2018 (UTC)[reply]

A similar issue arises at Wikidata:Properties for deletion#eFlora properties, where we currently have two properties, and potentially twenty or more, for a single set of identifier values. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:07, 24 February 2018 (UTC)[reply]
@Pintoch: We have properties for small datasets (some hundreds of usages). We have a lot of (nearly) unused properties. Creating another property shouldn't be that problematic. The usage recommendations of third-party formatter URL (P3303) are a bit fuzzy. If another database uses the same set of identifiers, all is fine by me. But what about sub-/supersets? Assuming that third-party formatter URL (P3303) is intended to be used as an alternative reference, this won't work. Supersets will return 404 errors (POWO). The same is true if we want a direct link to a describing site (eFloras) to make it available as a reference. Mr. Mabbett, as the property proposer of third-party formatter URL (P3303), could probably help to sort this out. --Succu (talk) 23:41, 24 February 2018 (UTC)[reply]
Gladly Succu. Which aspect of P3303, a property created with zero objections and which multiple editors have used without issue, albeit never as a reference, are you having difficulty understanding? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:10, 25 February 2018 (UTC)[reply]
To frame it with Pintoch's words: „When an identifier is shared by multiple databases, but these databases do not have the same coverage of that identifier, what do we do?“ You didn't answer that question. Why do you think using third-party formatter URL (P3303) is the best solution at hand? --Succu (talk) 21:43, 27 February 2018 (UTC)[reply]
You say I didn't answer Pintoch's question, but he already includes my answer, in his post, immediately beneath it. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:41, 27 February 2018 (UTC)[reply]
Avoiding a direct answer is not really helpful for solving problems. Why do you think using third-party formatter URL (P3303) is the best solution at hand? The idea referred to in your answer to me makes no use of third-party formatter URL (P3303). Do you think a third-party formatter URL (P3303) that constructs URLs pointing to nothing is OK (404 error)? If yes, then why? --Succu (talk) 22:03, 28 February 2018 (UTC)[reply]
The http specification, RFC 7231, tells us that "The 404 (Not Found) status code indicates that the origin server did not find a current representation for the target resource". That is - in the appropriate circumstance - quite useful information. HTH. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:33, 28 February 2018 (UTC)[reply]
Again you are avoiding a direct answer: „A 404 status code does not indicate whether this lack of representation is temporary or permanent“. Is the usage of third-party formatter URL (P3303) intended to provide a permanent URL? --Succu (talk) 22:49, 28 February 2018 (UTC)[reply]
You had your chance to comment on P3303 when it was proposed. If you have concerns now which you failed to express then, you can use Wikidata:Properties for deletion. Speaking of "avoiding a direct answer", I note that you have yet to give examples - here - of some of the "tons" of plants which you apparently believe to exist, which are not covered by IPNI, but which use IDs matching the definitions in the property proposal; and which I asked you for above. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:25, 1 March 2018 (UTC)[reply]
This "Wider issue" is not about deleting a useful property. I never suggested this. „The domain of POWO is much broader than that of IPNI (fungi and algae are not the subject of IPNI)“ --Succu (talk) 22:47, 1 March 2018 (UTC)[reply]
Again you are avoiding a direct answer; and again you have not given the examples requested; you have not even given evidence of the example you claim to have given elsewhere. The claim that the model I proposed - and which Pintoch quotes as a possible solution for "the wider issue" discussed here - does not work is false; you cannot substantiate it. As I said to you above: "if I'm wrong, prove it". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:31, 1 March 2018 (UTC)[reply]
Is this an answer of a parrot? In the case of POWO applying elementary logic should be enough. All this has little to do with the concerns raised by Pintoch. --Succu (talk) 21:14, 2 March 2018 (UTC)[reply]
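
Editorial illustration of the coverage question raised by Pintoch: a minimal Python sketch contrasting a formatter applied to every stored identifier with links generated only where coverage is recorded explicitly (whether through a separate property or a marker on the statement). The two ID sets are hypothetical snapshots assembled from the identifiers mentioned in the thread, not data fetched from IPNI or POWO, and all function names are invented for the example.

```python
from typing import Dict, List, Set

# Assumption: pattern reconstructed from the PotW URL quoted in the thread.
POTW_FORMATTER = (
    "http://www.plantsoftheworldonline.org/taxon/urn:lsid:ipni.org:names:$1"
)

# Hypothetical snapshots: the thread cites 128853-1 and 503872-1 as IPNI
# names without a POWO page, and 561509-1 as present in both.
ipni_ids: Set[str] = {"561509-1", "128853-1", "503872-1"}
powo_ids: Set[str] = {"561509-1"}


def shared_formatter_links(ids: Set[str], formatter: str) -> Dict[str, str]:
    """Build a link for every stored ID, regardless of the target's coverage."""
    return {i: formatter.replace("$1", i) for i in sorted(ids)}


def dangling_ids(ids: Set[str], coverage: Set[str]) -> List[str]:
    """IDs whose generated link would lead to a missing page (e.g. HTTP 404)."""
    return sorted(ids - coverage)


def explicit_links(ids: Set[str], coverage: Set[str], formatter: str) -> Dict[str, str]:
    """Build links only for IDs explicitly recorded as covered by the target."""
    return shared_formatter_links(ids & coverage, formatter)


all_links = shared_formatter_links(ipni_ids, POTW_FORMATTER)
missing = dangling_ids(ipni_ids, powo_ids)
safe_links = explicit_links(ipni_ids, powo_ids, POTW_FORMATTER)

print(len(all_links), "links generated;", len(missing), "would dangle:", missing)
print("links with recorded coverage:", list(safe_links))
```

The sketch only covers the case where the second database holds a subset of the identifiers. Entries that exist in the second database but have no value in the first cannot be represented by a formatter on the first property at all, and the sketch does not settle which of the two proposed options is preferable.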

Cordillera Azul Antbird, Myrmoderus eowilsoni

Q46624807 has the English common name "Cordillera Azul Antbird" and the scientific name "Myrmoderus eowilsoni". The former commemorates Cordillera Azul National Park, Q264948; the latter E.O. Wilson, Q211029.

When I created the item, I added statements indicating the etymology of each of these names, citing sources and giving quotes ("We select the English name to draw attention to the little known but biogeographically important and biodiverse mountain range that contains the type locality of the species." and "We name Myrmoderus eowilsoni in honor of Dr. Edward Osborne Wilson to recognize his tremendous devotion to conservation and his patronage of the Rainforest Trust, which strives to protect the most imperiled species and habitats in the Neotropics and across the globe. (English)", respectively).

For some reason, User:Succu has twice ([2], [3]) removed the cited etymology from the latter of these names. (I say "for some reason", as the only explanation given was the edit summary "per chat disk".)

I have, naturally, restored it. Repeatedly removing cited data with no cogent explanation is clearly unhelpful to the project, and to our users. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:57, 6 February 2018 (UTC)[reply]

And while I was writing the above, Succu did so a third time ([4]), with the edit summary "please do not remove a valid source, thx" - despite removing data cited to a valid source in the same edit. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:59, 6 February 2018 (UTC)[reply]
And now a fourth time ([5]), with edit summary "??!!". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:06, 6 February 2018 (UTC)[reply]

 Info Wikidata:Project_chat/Archive/2017/07#Editwar_at_Desmopachria_barackobamai_(Q30434384). --Succu (talk) 20:10, 6 February 2018 (UTC)[reply]

Thanks for the reminder. That's another example of you edit warring to remove cited data on the origin of a specific (both senses) name. I've duly restored it there, too. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:13, 6 February 2018 (UTC)[reply]
Sigh: „duly restored“. OMG. --Succu (talk) 20:17, 6 February 2018 (UTC)[reply]
...and I have been reverted there also ([6]), again with the loss of cited metadata. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:19, 6 February 2018 (UTC)[reply]
Beyond „I have“ and „I was“ do you have some additional arguments to the discussion I reminded you above? --Succu (talk) 21:23, 6 February 2018 (UTC)[reply]

Redux

I've restored the above, unresolved, topic from this month's archive, because we have a similar issue to the one originally raised (i.e. not the sp. nov. matter which side-tracked it; hence now collapsed) at Draba kananaskis (Q47507633), where User:Succu persists in removing a cited qualifier of taxon name (P225) which describes the etymology of the specific name. I raised the same issue last year, but that too petered out without resolution. It is simply not tenable to store the etymology of such names at item level, because that fails when an item can have different names/ labels in different languages, or where the scientific and vernacular names have different roots (see the 'Kentish Plover' example in last year's discussion). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:27, 16 February 2018 (UTC)[reply]

What is a „cited qualifier“? BTW: The edit history of Draba kananaskis (Q47507633) is revealing. --Succu (talk) 22:03, 17 February 2018 (UTC)[reply]
Wikidata:Project_chat/Archive/2017/07#Summary?, was the résumé. --Succu (talk) 22:12, 17 February 2018 (UTC)[reply]
Do you have a credible data model, that caters for the use-cases given above, other than the one which you keep undoing in your reverts? If so, what is it? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:14, 17 February 2018 (UTC)[reply]
Could you please refine (or rephrase) your use case (Q613417)? What information do you want to extract say with a SPARQL query? BTW: What is a „cited qualifier“? --Succu (talk) 20:45, 20 February 2018 (UTC)[reply]
As you can see, my arguments are laid out above. As a courtesy to our fellow editors, I see no need to repeat them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:19, 22 February 2018 (UTC)[reply]
You offered your opinion but not a use case. Just another discussion where you are either unwilling or unable to argue in a comprehensible way. What a pity for our project. --Succu (talk) 21:07, 22 February 2018 (UTC)[reply]
So, you offer no credible data model, then; just ad hominem. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 00:29, 23 February 2018 (UTC)[reply]
A data model about what? Without use cases it's not possible to develop suggestions. --Succu (talk) 19:22, 23 February 2018 (UTC)[reply]

This still requires a resolution. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 23:33, 1 March 2018 (UTC)[reply]

It still needs some input by you, Mr. Mabbett! --Succu (talk) 22:47, 3 March 2018 (UTC)[reply]
Hopefully this renewed your attention. --Succu (talk) 22:54, 6 March 2018 (UTC)[reply]

Adding the Lexeme namespace to the licensing footer text

Proposal

Hi everyone,

As you might know we’ve been working on adding support for lexicographical data over the past year. We are now getting close to a first version and I am tidying up the last pieces before we can get started collecting lexicographical data here on Wikidata and remix, query and reuse that data to learn more about the languages of this world. You can check out the demo system with the current state.

One of the remaining tasks is around licensing. Since the beginning of Wikidata all our structured data is released under CC-0. This has helped significantly with spreading our data widely and quickly and thereby helping us give more people more access to more knowledge. Our current licensing footer text however explicitly mentions the main and property namespaces as the places holding data under CC-0. Since lexicographical data is in a new namespace we need to adjust this text.

I am convinced it is in the best interest of Wikidata to extend CC-0 to all structured data namespaces. The reasons (in addition to my reasons for CC-0 in general):

  • We have fared very well with CC-0 so far and many partners use it as one of the main reasons they are attracted to Wikidata - both for re-use and contribution of data.
  • Having a mix of licenses is a potential legal minefield that can be exploited by some actors, threatening not only re-users but also our own contributors. It is a huge hassle for re-users, in particular small re-users like individual contributors, hobby developers, and small organizations, and will lead to less usage by these, and thus to less spreading of our knowledge.
  • It is the sound thing for data - much better explained by Luis in his blog posts (1, 2, 3, 4).
  • It will mean that we can not import some data from Wiktionary and other sources that is incompatible with CC-0 but that is already the case now. We have always leaned towards making re-use easier at the expense of easy importing. (See input from the legal team for more details on what kind of lexicographical data can be protected.)

So I would like to adjust the license text to say “All structured data (e.g. main, Property and Lexeme namespace) is available under the Creative Commons CC0 License; text in the other namespaces is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy.”.

Cheers --Lydia Pintscher (WMDE) (talk) 09:34, 22 February 2018 (UTC)[reply]

  •  Support. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:23, 22 February 2018 (UTC)[reply]
  • Of course, the adjustment has my full and  Strong support, since I never doubted one moment this is the correct approach. Sannita - not just another it.wiki sysop 10:35, 22 February 2018 (UTC)[reply]
  •  Strong support Linking data with a mix of licenses is just a gordian knot. --Andrawaag (talk) 11:50, 22 February 2018 (UTC)[reply]
  •  Question What's the plan for textual definitions?
    https://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L15#Senses doesn't show any but the substance of https://de.wiktionary.org/wiki/Leiter seems to be these.
    Similarly https://wikidata-lexeme.wmflabs.org/index.php/Lexeme:L13 compared to https://en.wiktionary.org/wiki/hard#English
    Would they have to remain at Wiktionary, be reduced to statements as on the prototype, added to another namespace or need to be re-written?
    --- Jura 11:53, 22 February 2018 (UTC)[reply]
    • You can see some of the definitions on the demo system (where it says "Führungsperson" for example). They would be CC-0 too. Just like with Wikipedia now we will build the infrastructure and collect data here that the Wiktionary editors are free to use as they deem useful for their work. Hope that helps. --Lydia Pintscher (WMDE) (talk) 07:19, 23 February 2018 (UTC)[reply]
      • Somehow I doubt there is much room for Wiktionary.org to make use of Wikidata's Wiktionary namespace to annotate its content further. Are there any plans for a structured way to enable this? Maybe a separate Wikibase(Wikidata) instance as for Commons?
        --- Jura 14:26, 24 February 2018 (UTC)[reply]
  • The choice of a permissive license is unfortunate but not entirely surprising, given the big corporate players that funded the creation of Wikidata. It's a departure from the early ideals of Wikimedia projects, of creating content that will be always free down the line. Now there's no point debating this, since it would make no sense to make the namespaces have incompatible licenses. The real discussion with the community should've been carried out much, much earlier. NMaia (talk) 12:13, 22 February 2018 (UTC)[reply]
    This is fundamentally inappropriate, and most of all repetitive: we've been through this time and time again. Of course there's no way to convince you that no "big corporate players" were involved in the discussion and that it was a community decision, if you're convinced otherwise. Source: I was there when we discussed it. --Sannita - not just another it.wiki sysop 23:53, 22 February 2018 (UTC)[reply]
    Interesting, can you provide details when and where this "community decision" was made "offline"?
    --- Jura 07:00, 23 February 2018 (UTC)[reply]
    It wasn't made "offline" - and this is a final notice: please, do NOT put in my mouth words I've never spoken - it was made in the mailing list of Wikidata, while the project was still in beta. The first discussion was made in April 2012, then another in August 2012, and these are the first two discussions I can find just by casually browsing the ML archives. Check them out yourself if you don't believe me, I've got work to do, and frankly I'm tired of repeating the same things all over again. --Sannita - not just another it.wiki sysop 09:05, 23 February 2018 (UTC)[reply]
    I thought this was somehow related to Wiktionary, but it's about Wikidata in general. I took your "I was there" literally.
    --- Jura 20:53, 23 February 2018 (UTC)[reply]
  •  Strong support CC-0 has been a key of Wikidata's success. Mixing it with less-free licenses will create significant hurdles for on- and off-project users. --Magnus Manske (talk) 13:25, 22 February 2018 (UTC)[reply]
  •  Support ArthurPSmith (talk) 15:59, 22 February 2018 (UTC)[reply]
  •  Neutral I agree that using CC-0 for all data makes sense. On the other hand, I believe that using such a licence will not allow importing a lot of interesting stuff from the Wiktionaries. Pamputt (talk) 18:13, 22 February 2018 (UTC)[reply]
  •  Support obviously. VIGNERON (talk) 19:59, 22 February 2018 (UTC)[reply]
  • I also have the same question as Jura, since senses seem like they'd be derived from existing Wiktionary definitions. Mahir256 (talk) 21:30, 22 February 2018 (UTC)[reply]
  •  Support This is indeed a major development. John Samuel 23:19, 22 February 2018 (UTC)[reply]
  •  Support It would be crazy to start mixing licenses now, good to clear this up right away. I9606 (talk) 03:14, 23 February 2018 (UTC)[reply]
  •  Support It might make it impossible to import data from Wiktionary, but in the long term it is better for reuse. I too am very attached to keeping data open, but Wikimedia has reached a stage where the embrace, extend and extinguish (Q1335089) strategy would not work against us anymore, so better make the data as open as possible, which means CC0. Syced (talk) 06:21, 23 February 2018 (UTC)[reply]
  •  Oppose I don't think of any lexicographer or linguist who may accept to publish under CC0 a work they spent five to twenty years on. CC0 does not respect the time spent on collecting words and meanings, structuring the language for a dictionary, and editing. CC0 favors companies that will just use the data without caring about spreading the knowledge; it will not reinforce free reuse but only the stealing of data. Finally, I think this decision concerns Wiktionarians and deserves a better explanation of the problem, one that includes the pros and the cons. At this point, I still consider that you are doing a fork of Wiktionary in Wikidata with your own agenda. -- Noé (talk) 07:54, 23 February 2018 (UTC)[reply]
    • Who is being asked to "publish under CC0 a work they spent five to twenty years on"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:31, 23 February 2018 (UTC)[reply]
    • How does CC0 not respect the time spent? I think we are confusing the legal license with the provenance. If the data provider also provides the proper references and qualifiers, it is respecting the work spent, don't you think? Yes, people can use Wikidata content without pointing to this provenance, but what is its value if they can't support those claims with original sources? Also, a lot of work is made possible with public funds, so not sharing those results with a public license is quite unfair. I agree with Andy, nobody is pushing people to share their knowledge under CC0; if you don't like it, you don't need to. But for those who would like to share knowledge publicly, CC0 provides the means. Different scientific resources did make the change to share the knowledge with the general public eg: example --Andrawaag (talk) 12:04, 23 February 2018 (UTC)[reply]
      Andy: Well, you're right, lexicographical data in Wiktionary could be written only by individuals and never by big imports from published sources. Good luck to start again from scratch.
      CC0 does not respect the time spent because it does not force reusers to mention the source of information. If references are provided, it makes no difference whether the data is distributed under CC BY or CC0. Public funds = sharing with a public licence, I agree, and CC BY-SA is also a public licence, lucky us. I pointed out that I am quite sure a CC BY-SA licence may create a better environment for including recent works in full, directly given by their authors. You may not agree, but no study was provided for or against this, and I think a proper analysis and forecast have to be made before such a vote. Noé (talk) 12:32, 23 February 2018 (UTC)[reply]
      Instead of rhetoric, please answer my question; "Who is being asked to 'publish under CC0 a work they spent five to twenty years on'"? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 13:50, 23 February 2018 (UTC)[reply]
      It was not rhetoric, maybe vague broken English by a non-native speaker, but not a manipulation through figures of style. I am not that mean. So, you cut half of my sentence. I was assuming a lexicographical project would like to have lexicographers participate (people who have already made dictionaries, or linguists who did some lexicographical work and have already thought about dictionary issues), and I wrote that CC0 will not convince this kind of profile to share data. But I was probably wrong to assume such a goal for this project. The more I read about this project, the more I realize it is not grounded on lexicographers' needs and knowledge, nor on Wiktionarians' needs and knowledge, but on Wikidatians' needs and a vague idea of linguistic and lexicographic practices and difficulties. Noé (talk) 14:19, 23 February 2018 (UTC)[reply]
      @Noé: So in your opinion, we should re-license Wikidata as WTFPL (Q152481), which is a little better than CC0 for Public Domain software usage, but that option is not recommended by the Free Software Foundation (Q48413) (cf. https://www.gnu.org/licenses/license-list.en.html#WTFPL). --Liuxinyu970226 (talk) 15:18, 23 February 2018 (UTC)[reply]
      I was not postulating anything for Wikidata in general, my messages were about the namespace for lexicographical data. As I understand it, WTFPL is made for software, not for data, so I don't get your point here. Noé (talk) 15:35, 23 February 2018 (UTC)[reply]
      Your English is clear. You said that you: "don't think of any lexicographer or linguist who may accept to publish under CC0 a work they spent five to twenty years on"; and I was asking you for evidence that anyone is being asked to do that. It now seems that you concede that no-one is. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:45, 23 February 2018 (UTC)[reply]
      @Pigsonthewing: Yes, I concede I had a wrong vision of this project's goal. I thought Wikidata is a project that hosts databases, and that Lexeme would be a storage place for several lexicographical databases. So, I thought it would be a place to include data collected by lexicographers such as published dictionaries, published wordlists or existing online databases (for example RefLex). It appears I was misguided and Lexeme will only host one lexicographical database. So, published documents will not be added into it. Lydia's first point was about partners and I imagined it was an implicit mention of owners of lexicographical databases. If that was the plan, CC0 would be the wrong option in my opinion, as I know a lot of people who have already published this kind of database. As I wrote, my opinion is not a good indicator. A prior consultation directed to such database owners could help the choice of licence to be based on stronger arguments. Finally, this consultation appears to be unnecessary. Fine. Then, I'll keep my vote to oppose by Yair rand arguments. -- Noé (talk) 13:43, 26 February 2018 (UTC)[reply]
      You'll note that Yair rand wrote "I think that CC-0 is the right license for this". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:30, 27 February 2018 (UTC)[reply]
      He also wrote "The discussion is happening here, as opposed to with any of the Wiktionary communities, which is inappropriate." and I keep my oppose vote, which is directed at the procedure itself, not at the choice offered here. Noé (talk) 18:27, 1 March 2018 (UTC)[reply]
      Yes, and I have explained below why he was wrong to write that. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:19, 1 March 2018 (UTC)[reply]
  •  Question If Wikidata data are in CC0 and a Wiktionary wishes to include some, is this copyfraud? Noé (talk) 08:04, 23 February 2018 (UTC)[reply]
    • No that is fine. --Lydia Pintscher (WMDE) (talk) 08:09, 23 February 2018 (UTC)[reply]
      • No, it's not fine. What Wikidata community is already doing is massive copyfraud. I found a jurist specialized in free licenses and so far she confirmed that this doesn't seem legal at all. --Psychoslave (talk) 09:24, 23 February 2018 (UTC)[reply]
        • Psychoslave, you misunderstood the question (or maybe Noé didn't ask what he meant to ask). The question here is the reuse of Wikidata data outside Wikidata, so in this case the re-user is responsible; there is no way the Wikidata community can do copyfraud in this scenario. I'm guessing you are thinking of the import of data from an external source into Wikidata (here copyfraud by the Wikidata community is technically possible), but this is a different subject and one that has been raised multiple times already and even answered with some professional legal advice. Cdlt, VIGNERON (talk) 18:56, 23 February 2018 (UTC)[reply]
          • You are totally right that I was talking about data from an external source inside Wikidata, and this also has consequences for re-users. If Wikidata is creating a data bank by illegal means, then any re-user of its data set is also concerned. I agree that using another license would not completely solve this problem, but using CC-by-sa would at least solve it for importation from Wiktionary. Would you be kind enough to provide links to the legal advice you have in mind? --Psychoslave (talk) 10:09, 28 February 2018 (UTC)[reply]
    • Lydia, can you provide evidence for your assumption? Noé (talk) 12:32, 23 February 2018 (UTC)[reply]
  •  Strong oppose To no surprise for those who followed my research on the topic, I strongly oppose this. I will give more feedback below as soon as I find time for this. --Psychoslave (talk) 09:24, 23 February 2018 (UTC)[reply]
  •  Neutral A database of lexical information lacking definitions will be quite bland. So what will happen? Data will be imported from CC0-compatible sources or reworded, probably starting with definitions from WordNet and out-of-copyright dictionaries. Sounds like a reboot of Wiktionary in a way, this time with a more permissive license. – Jberkel (talk) 09:42, 23 February 2018 (UTC)[reply]
  •  Support - CC0 FTW! Wittylama (talk)
  •  Support --Jarekt (talk) 17:00, 23 February 2018 (UTC)[reply]
  •  Comment Why is this discussion happening here ? Surely the lexemes aren't going to be managed by the Wikidata community -- they are for the Wiktionary community to administer, and will be subject to that community's policies on content and every other aspect -- just as the upcoming CommonsData wikibase will be administered by Commons, not by us. That's one of the points for them being federated wikibases. This is not a decision for us to make. It is up to that community to choose how they wish to licence their work. My view therefore is we have no standing here; this is not our choice to make. This discussion is therefore not appropriate and should be closed, and/or re-started in a more appropriate forum. Jheald (talk) 18:07, 23 February 2018 (UTC)[reply]
    • The data is going to be here on Wikidata in a new namespace. It is the license of the content on Wikidata. --Lydia Pintscher (WMDE) (talk) 18:12, 23 February 2018 (UTC)[reply]
      • But do we think we, the Wikidata community, are going to be the ones administering it, making day-to-day rules and guidelines for its content and organisation? Or the Wiktionary community? Far better, it seems to me, if the Wiktionary community felt that they were the owners of these items. Jheald (talk) 18:45, 23 February 2018 (UTC)[reply]
        • @Jheald: Just like Wikidata was not created with the only purpose to support Wikipedia, lexicographical data on Wikidata doesn't have the only purpose of supporting Wiktionary. Of course, Wiktionary communities will be free to experiment with the data, improve it, include it in their projects, mixing it with other content they have. Other parties can find interest in data about words: students, researchers, applications developers... we want the data to be structured, accessible and reusable for everyone without distinction. Anyone helping improving the data will take part in the ownership. Lea Lacroix (WMDE) (talk) 18:41, 25 February 2018 (UTC)[reply]
          • @Lea Lacroix (WMDE): Project divisions matter, and giving current Wikidata admins and policies authority over lexical structured data has the potential to be really divisive and problematic and cause a lot of friction between the various groups. These are very different types of communities we're dealing with, and Wiktionaries are more likely to actually know what they're doing with regards to lexical data. I've been a contributor and admin on both Wikidata and the English Wiktionary, and I'm quite confident that these communities will not get along at all if an adversarial context is built by attempting to subsume Wiktionary into Wikidata entirely from the outside (as this will likely be viewed). Wiktionarians have been building up free lexicographic content within Wikimedia for more than 15 years, know very well how to do it, and would likely make up most or all of what could be a thriving community for structured lexical data if we don't strangle it from the start by trying to do it here, a project really not built for that kind of thing. (Lexical content is different.)
            I think that CC-0 is the right license for this, but I'm still going to  Oppose this change, because this community shouldn't have the authority to make this decision in the first place. --Yair rand (talk) 20:11, 25 February 2018 (UTC)[reply]
      @Lydia Pintscher (WMDE): Is there any reason for the data to be here in a new namespace? What advantages come from having it here, as opposed to a different project? This is a question that comes up in many areas, and I think there are some good questions to ask when trying to figure this out.
      • Does the new content fit well within the current project's scope and structure?
      • Will any existing specialized Wikidata gadgets be useful for this new type of content?
      • In Wikidata's primary forums, is there likely to be any overlap between topics discussed relevant to existing Wikidata content and topics relating to the new lexical content?
      • Are the project's current policies well-built for the addition?
      • Will there be any significant overlap between those watching the recent changes feed for Wikidata items and for Lexemes?
      • Are there any content-level benefits to having them on the same project?
      In my opinion, the answer to all of these questions is "no". Others may disagree, but there needs to at least be some discussion about this before rushing into assuming that we're just creating a new namespace here and leaving the decisions to the existing Wikidata project. --Yair rand (talk) 21:03, 25 February 2018 (UTC)[reply]
      • Hey Yair, thank you for your questions. There are many reasons for not creating a new project. The biggest one being that the lexicographical data and the data we have in items now should be closely connected. In addition Wikidata is _the_ central knowledge base for Wikimedia and this data is part of that - there is no central place like that for Wiktionary. This data is supposed to not only be used by Wiktionary but also a lot of other re-users, just like our current data isn't just used by Wikipedia. Additionally we have a community here who has spent the past 5 years taking care of structured data in Wikimedia and has a lot of experience in that. Starting that from scratch in a separate project wouldn't be helpful. And on top of that none of the other Wikimedia projects have their own knowledge base because we want to share the data across all of them. (Commons will be a bit different but is also not comparable to the case of lexicographical data.)
        You asked if any of the gadgets will be useful for the new content. Yes at the very least the merge and constraint gadgets as well as the primary sources tool are going to be useful there. (I have not checked the rest.)
        About overlapping discussion content: Yes I believe so. For example in the property definitions as well as everything we've learned about data quality processes over the past 5 years as well as usage of the data inside and outside Wikimedia.
        Are our current policies well-built for the new content? Maybe or maybe not. But that was no different when we added other new Wikimedia projects like Wikinews. We adapt where needed.
        Will there be any significant overlap between those watching the recent changes feed for Wikidata items and for Lexemes? I would hope so because the statements on Lexemes will link to a lot of the items.
        Are there any content-level benefits to having them on the same project? Yes a lot because the interconnections of the items and Lexemes through statements are a huge part of the value we are going to deliver with this. And people will for example want to run queries that include data from items and lexemes together.
        I hope this clarifies things a bit. --Lydia Pintscher (WMDE) (talk) 18:06, 26 February 2018 (UTC)[reply]
        @Lydia Pintscher (WMDE): The links between items and lexemes do not seem to be a valid justification for merging, assuming federated wikibases are going to function as expected. Wikidata, like most Wikimedia projects, currently details concepts and things. Wiktionary explains the means by which people communicate them. The substantial difference between explaining lexemes and ordinary things is clearly agreed upon, otherwise we wouldn't be talking about a separate namespace and format in the first place. Wikidata's policies and practices are built around things like labels, descriptions, aliases, sitelinks, badges, the ontologies we build around entities. We need to determine how to structure the links between the various concepts in the world, whether or not an item is notable. Our items are about people, places, concepts and ideas and objects, eras and events, art and books and organizations and families and ideologies. Lexical data has none of that.
        The Wikidata community is very good at what it does. Lexical content, structured or not, is not what it does. --Yair rand (talk) 03:48, 28 February 2018 (UTC)[reply]
      @Yair rand: You summed it up nicely. One of the technical/political shortcomings I see with Wikidata is indeed the centralisation. There is often talk about “the Wikidata community”, but in reality there'll probably be a multitude of (sub-)communities, but as you point out the power structures lie in the current userbase/adminship. It's difficult to create a sense of shared ownership in this context, especially when there is an impression that certain decisions are "forced" onto other communities. I'm still optimistic about the long-term success of the project, and wonder what could be done to remove some of the friction you mentioned. It's a rocky start, to say the least. – Jberkel (talk) 10:11, 27 February 2018 (UTC)[reply]
  •  Oppose Hardening my position to "Oppose", per Yair rand above. This project started out as "Structured Data for Wiktionary". Somewhere along the line the aim changed, and it became "Wikidata for Lexicographic Data". I think that change was a mistake. I don't think there can healthily be two rival lexicographic projects at Wikimedia. It's just a recipe for poison, constant friction and bad blood. I think the two projects would end up tearing each other apart, with collateral damage all round, and both probably rapidly bleeding out support. So my definite view is that if this is going to be done at all, it needs to be done with the active support of Wiktionary. If this project does not have the support of Wiktionary, then it should be shut down. I hear the points that Tpt in particular makes well below. But, if the price for proper Wiktionary integration and Wiktionary community support is, for all free-text values to be CC-By-SA, as suggested by Deryck Chan below, at least until the Wiktionary community decides otherwise, then that is a price I think we have to be prepared to pay. Or we should pull the plug and shut the whole project down. Jheald (talk) 13:03, 27 February 2018 (UTC)[reply]
  •  Support --Pymouss (talk) 20:27, 23 February 2018 (UTC)[reply]
  •  Comment It would be good to know about why the alternative approach (with the same model) was rejected. Please see my question/comment at: Wikidata_talk:Lexicographical_data#Separate_installation_for_Wiktionary_?.
    --- Jura 20:53, 23 February 2018 (UTC)[reply]
  •  Oppose. This is exactly what we feared all along over at Wiktionary: that Wikidata would start handling lexicographical data without even bothering to consult the people who already create and manage lexicographical data on Wikimedia every day. Given the licensing situation and the glaring lack of communication, two parallel projects are going to work on the same problems, but separately. It should concern everyone here that out of the only usernames I recognise as active in any of the Wiktionaries, none of them have voted Support. Metaknowledge (talk) 22:07, 23 February 2018 (UTC)[reply]
  •  Weak oppose. I would prefer that structured data be CC-0 and free-text data be CC-By-SA in general. The Lexeme namespace is likely to contain a lot of free text (definitions and example sentences) which will fit better with CC-By-SA than CC-0, though I agree that the linkages between different Lexemes and Items should stay in CC-0 to avoid database rights disputes. Deryck Chan (talk) 11:51, 24 February 2018 (UTC)[reply]
    • @Deryck Chan: Do you have a source for « Lexeme namespace is likely to contain a lot of free text »? On the contrary, from what I've seen, there is almost no free text (and meanwhile, there is already a lot of free text on the Q items). Cdlt, VIGNERON (talk) 09:09, 26 February 2018 (UTC)
      • @VIGNERON: m:Wikilegal/Lexicographical Data, which you have contributed to, implies that the Lexeme namespace will likely carry a lot of copyrightable "Definitions with room for creativity", "Pragmatic information", and "Encyclopedic information and example sentences". The existing example Lexemes in the WMFLabs site don't have any free text, but it is hard to imagine that definitions, pragmatic information, and example sentences won't make their way to the Lexeme namespace quickly. I agree that there is already a lot of free text on the Q items, but you may remember the days when Wikidata items only had labels and sitelinks. Lexemes will likely contain more free text than Q items after
  •  Oppose Because it's not interesting for people. The thing you create isn't reality of languages, of linguistic studies and of community needs... Lyokoï (talk) 18:38, 24 February 2018 (UTC)[reply]
  •  Support using CC-0 for lexicographical Wikidata data (and only importing CC-0-eligible data). Jc86035 (talk) 10:16, 25 February 2018 (UTC)[reply]
  •  Support --Pasleim (talk) 19:52, 25 February 2018 (UTC)[reply]
  •  Strong oppose for the same reasons as Noé's. I couldn't say it better. Delarouvraie (talk) 08:31, 26 February 2018 (UTC)
  •  Question Lydia's first point is about "many partners" liking CC0. Who are or will be the partners for lexicographical data? How were they consulted to find out whether they hold the same position as previous partners on other kinds of structured data? Noé (talk) 13:49, 26 February 2018 (UTC)
    • I obviously don't know everyone who will be contributing and using the lexicographical data we will have. (I don't even know everyone using our current data.) I talked to a number of researchers and people building applications (for translation and language learning for example) from big and small projects and companies. They all said CC-0 is the right choice. For the contribution side I have talked to considerably fewer. There it seems to be ok outside Wikimedia as well. As I said above we have always leaned towards making re-use easier if we have to make the choice. --Lydia Pintscher (WMDE) (talk) 18:36, 26 February 2018 (UTC)[reply]
  •  Support as per my arguments five years ago. --Denny (talk) 16:00, 26 February 2018 (UTC)[reply]
  • Note: Some of the reasons why an attribution-required condition is harmful for open data are enumerated in point 4 of 'The 5 Things Open Data Publishers are Doing to Keep their Data Closed'. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:50, 26 February 2018 (UTC)[reply]
  •  Strong support I believe that having CC0 for lexemes data is very important for the success of the project. A few reasons why:
    • Having a license like CC BY-SA would introduce unclear legal implications (e.g. is copyright applicable in this case? If not, people could claim that using CC BY-SA is "copyfraud") and inequalities between re-users related, among other things, to their access to lawyers and the existence or not of database rights in their jurisdictions. Reference: Luis' blog posts (1, 2, 3, 4).
      • To my mind, Wikidata is right now in an extremely unclear legal state. Merely asserting that the database is under CC-0 doesn't make it so. Without a proper data traceability facility, it's impossible to prove this claim true. On the other hand, massive imports of data from other data banks, like Wikipedia, are well known, and they are illegal in Europe and the United States without the data repository owners' permission, as far as I understand. Note that this point is not really about CC-0, as switching to any other license would not make this problem magically disappear. --Psychoslave (talk) 09:56, 28 February 2018 (UTC)
    • Let's assume that we have a set of facts that would require attribution. People reusing the data would then have to attribute it properly, and that is hard. For example, what is currently done for OpenStreetMap is just displaying "copyright OSM contributors". But that is definitely not nice (not much better than having a "source: Wikimedia Commons" near a Commons image), and I am not sure that having a "copyright Wikidata contributors" is going to please the "lexicographer or linguist who may accept to publish under CC0 a work they spent five to twenty years on" that Noé described. Moreover, if we consider that proper attribution should be kept for CC BY-SA compatible data imported into the Lexeme namespace, it means that the Query Service and other tools that allow retrieving Wikidata content should be able to return the proper attribution along with query results. I believe that for this we would want the query service to return the minimal set of "sources" to credit for the result set. Indeed, the same query result could be derived in multiple ways from different facts in Wikidata, each possibly having different sources. It looks like an optimization problem, so it seems likely that building an efficient system to provide such sources is going to be very hard (it's just my feeling, I have not studied this problem in detail. If you are interested in this topic, search "sparql provenance semirings" in your favorite search engine).
      • To my mind, it's not about having your name written everywhere; that is just territory-pissing vanity in the noosphere. This is about data traceability. Also, this presentation of the problem seems to mix two different topics: 1. keeping a record of provenance and respecting the licenses of sources, 2. the ability to generate reports about data provenance even for mashed-up data sets. I agree that the latter is a hard problem, but the former is technically extremely easy; the only difficulty there pertains to import policy management. --Psychoslave (talk) 09:56, 28 February 2018 (UTC)
    • As Lydia said, having a more restrictive license is going to hit mostly small re-users: big companies have lawyers to take care of all the legal implications, and if they choose not to reuse Wikidata content they can do so without hurting their revenues. For example, Google has its Knowledge Graph and Microsoft its Satori database, both created before Wikidata, and I do not see any reason to think they (or similar companies) could not do the same with lexicographic data if they need it. Tpt (talk) 20:53, 26 February 2018 (UTC)
      • Sure, big companies have the money for lawyers, but how does CC-0 help in any way here? Knowledge Graph uses, inter alia, Wikidata input, and Google was even a major funder of Wikidata's launch; they also hired Denny, who was the project leader (not sure what his official role is today), and before that they bought Freebase, which was under CC-by-sa. There is no doubt that big companies are happy with the current Wikidata policy. I haven't been pointed to any evidence of small companies or individuals who would be angry to see a demand for more equity regarding freedom of derivative works. --Psychoslave (talk) 09:56, 28 February 2018 (UTC)
  •  Support Toni 001 (talk) 11:19, 27 February 2018 (UTC)[reply]
  •  Support From the point of view of Wikidata, CC-0 is a good choice. But doing this without prior discussion with users from Wiktionaries is a bit inappropriate.--Jklamo (talk) 11:52, 27 February 2018 (UTC)
  •  Strong support --Egon Willighagen (talk) 16:43, 27 February 2018 (UTC)[reply]
  •  Strong support given the ridiculously complicated mixture of copyright and database rights for data in different jurisdictions having the simplest to reuse license is very helpful. John Cummings (talk) 17:53, 27 February 2018 (UTC)[reply]
    • @John Cummings: Some lexicographic data here under CC0 and some lexicographic data in Wiktionaries under CC BY-SA. This is a mixture of licenses, don't you think? Noé (talk) 09:01, 2 March 2018 (UTC)
      • @Noé: The issue is that an individual fact cannot be copyrighted and so cannot have a license, copyright only applies to databases of facts in some jurisdictions and has database rights in others. Also database rights or copyright of a database is only broken when 'a significant portion' of the data is reused, the response I've had from lawyers I've asked about this is that there is no case law to define 'a significant proportion' either in terms of percentage or overall size or anything else. --John Cummings (talk) 22:50, 6 March 2018 (UTC)[reply]
  •  Strong support In the above discussion I see a lot of off-topic objection and no convincing reasons for restricting the structured data under discussion. MartinPoulter (talk) 12:35, 28 February 2018 (UTC)[reply]
  •  Strong support As a researcher, I find CC-0 much more welcoming than other licenses. We are trying to invest as much resources as possible into contributing to Wikidata with our research, and complicated or overly restrictive licensing terms are an obstacle for us, since we are not legal experts. We don't want to consult lawyers each time we want to try out some new idea and put the results on the Web! (Neither do we want to disrespect the legal terms chosen by a community, like many researchers do!) Also as a data producer and contributor, I am strongly in favour of CC-0. My contributions are donations, which I hope to be useful to all of us (not just for those with a legal department at their command). In research, attribution is usually not done because of legal force, but because of academic standards that each research community has to hold up. I don't believe legal terms are an effective way to enforce respect and honesty in research. --Markus Krötzsch (talk) 17:01, 28 February 2018 (UTC)[reply]
  •  Strong support For the reasons stated above. I think our work (and many others') in the biomedical domain shows that there are significant licensing challenges in any data integration effort. Wikidata has greatly simplified access to CC0-licensed resources, and also spurred several biomedical groups to change their license to CC0 based on (at least in part) what it enables within Wikidata. Cheers, Andrew Su (talk) 17:08, 28 February 2018 (UTC)[reply]
  •  Support What else? --Succu (talk) 21:16, 28 February 2018 (UTC)[reply]
  •  Oppose I was a bit hesitant on this and not really convinced by the arguments advanced.
    Wikidata's Wiktionary namespace is likely to supplant much of Wiktionary.org. Contributing to Wiktionary.org hadn't really been favored by the use of MediaWiki software and I don't think much had been done to develop its infrastructure over the years. Now that a Wikibase-structured installation is to be created, I don't think there will be that much done to integrate its content in Wiktionary.org in an efficient way. At least, I haven't seen any prototype for that. There may not be much use in doing that either, as most if not all information can be included in a Wikibase site in a structured way. This is fundamentally different from Wikidata's main namespace, which replaced interwikis in Wikipedia and maybe some infobox content. Wikipedia as such continues to operate as an encyclopedia.
    Similar to Wikipedia, Commons and Wiktionary.org use Wikimedia's default licensing model and this hasn't hindered their growth. Commons will eventually have its separate installation of Wikidata(Wikibase) and continue with its licensing model. So the use of structured data doesn't seem to constrain people to use cc0, nor constrain them to store all data within Wikidata itself.
    Already now, federated queries and queries to Wikipedia content are possible leading users to retrieve content with different licenses in one query.
    From a query perspective, it wouldn't really matter if Wiktionary content is on a separate Wikidata(Wikibase) installation as that of Commons and content could still be made available to Wikidata users. We could hold Wiktionary content on a separate installation and the namespace question wouldn't come up. Additionally, it's not clear why a distinction within Wikidata couldn't be made especially as for now no textual content is available.
    From a Wikidata perspective, the suggested approach initially made sense, but then I noticed it prevents us from expanding some of the dictionary-related elements we already have with content from Wiktionary and Wikipedia.
    Further, the solution doesn't seem ideal for the Wiktionary community: most if not all of its content would be held outside the sites themselves.
    In conclusion, I think it would be good to develop an alternative installation for Wiktionary content as it's being done for Commons. It's regrettable that this wasn't evaluated and presented as an alternative from the beginning.
    --- Jura 12:11, 1 March 2018 (UTC)[reply]
  •  Support XIII (talk) 09:21, 2 March 2018 (UTC)[reply]
  •  Support. Rehman 10:22, 2 March 2018 (UTC)[reply]
  •  Support Language resources are vital for natural language processing technologies. In my experience as a researcher, it is very difficult to easily find and freely access them, since they are often protected by license barriers. CC-0 is essential here! --Hjfocs (talk) 16:52, 2 March 2018 (UTC)[reply]
  •  Support. I understand the fears of some Wiktionary editors that Wikidata will take in Wiktionary data that are (for them) under CC-BY-SA, but I think that CC0 is the best licence for reuse, as said above by Hjfocs. Tubezlob (🙋) 17:54, 2 March 2018 (UTC)
  •  Support Ainali (talk) 22:33, 2 March 2018 (UTC)[reply]

Discussion elsewhere

Please be aware of wikt:Wiktionary:Beer parlour#Wikidata and CC0 licence for lexicographical data. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 22:28, 23 February 2018 (UTC)[reply]

Also in the French Wiktionary: wikt:fr:Wiktionnaire:Wikidémie/février 2018#Licence pour l'espace de nom Lexeme sur Wikidata. --Thibaut120094 (talk) 17:30, 25 February 2018 (UTC)[reply]
Mentioned in Actualités, the French Wiktionary monthly magazine (similar to Signpost). Also exists in English. Noé (talk) 10:30, 1 March 2018 (UTC)
Mentioned in Regards sur l'actualité de la Wikimedia et d'ailleurs, French Wikipedia monthly magazine. Noé (talk) 18:17, 1 March 2018 (UTC)[reply]

A reply to the proposal

One of the remaining tasks is around licensing. Since the beginning of Wikidata all our structured data is released under CC-0. This has helped significantly with spreading our data widely and quickly and thereby helping us give more people more access to more knowledge.
There is no way to check whether a different license, like the ODbL used by OSM for example, would have done a worse job at this. There was no A/B test with different licenses. On the other hand, the impossibility of enforcing the same fair conditions of use on derivative works makes the project far less likely to be sustainable, and also works against good data traceability. Both of these go against our strategic direction of knowledge equity and the ability to use different forms of free, trusted knowledge.
I am convinced it is in the best interest of Wikidata to extend CC-0 to all structured data namespaces. The reasons (in addition to my reasons for CC-0 in general)
This has already been replied to in November.
We have fared very well with CC-0 so far and many partners use it as one of the main reasons they are attracted to Wikidata - both for re-use and contribution of data.
It would be interesting to have a poll on which actors agreed to contribute to Wikidata for this specific reason, and, even better, a comparison with how many actors refused to participate because of it. Without that kind of metric, no success can honestly be attributed to this license choice.
Having a mix of licenses is a potential legal minefield that can be exploited by some actors, threatening not only re-users but also our own contributors.
This is clearly FUD, and it's sad to see such a practice used here. What's more, claiming that Wikidata is under CC0 is not enough to make it so. Without appropriate license tracking of data sources, the legal uncertainty of Wikidata grows with the base itself. There is no such scrupulous control; on the contrary, the Wikidata community refuses to admit that its massive imports from other sources, like Wikipedia, are illegal.
Indeed: "A person infringes a database right if they extract or re-utilise all or a substantial part of the contents of a protected database without the consent of the owner. It should be noted, however, that extracting or re-utilising a substantial part of the contents" (Database rights: the basics)
In these circumstances, until the situation is cleared up, using Wikidata as input is actually legal Russian roulette: there's no harm in playing with it until it blows up in your face.
It is a huge hassle for re-users, in particular small re-users like individual contributors, hobby developers, and small organizations, and will lead to less usage by these, and thus to less spreading of our knowledge.
On the contrary, that's typically the kind of public that would, on the whole, be positively impacted by a copyleft license. It's another, far more annoying problem for really big transnational businesses, obviously. They are the only kind of structure that benefits from this lack of equity in reuse.
It is the sound thing for data - much better explained by Luis in his blog posts (1, 2, 3, 4).
That is one person who is explicitly stating "Wikidata made the right choice"; it's obviously a strongly biased source. Not that everything there should be dismissed, but there is clearly no balanced view of pros and cons in these blog posts.
It will mean that we can not import some data from Wiktionary and other sources that is incompatible with CC-0 but that is already the case now.
No, it means you won't be able to legally import any substantial part of Wiktionary into Wikidata. But judging by how large extracts of Wikipedia were injected into Wikidata, it seems that legality is not a very important matter for Wikidata. Sure, it might not be illegal in some unknown country, but at least in Europe and the United States, it seems Wikidata blithely crossed the line of legality. The Wikidata team promised it would not import data from Wikipedia, but they broke this promise. Based on this experience, they cannot be trusted when they state that they won't let massive imports and license laundering of Wiktionary works happen.
We have always leaned towards making re-use easier at the expense of easy importing.
And more importantly, at the expense of legal confidence.

--Psychoslave (talk) 09:20, 28 February 2018 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── Here's Lydia's response to your tl;dr screed from the mailing list:

I understand you care a lot about this topic and are posting about it in many places but I have a personal rule that a lot of the people in Wikidata know. I am willing to discuss and explain basically anything on a calm and rational basis. (And I did this on-wiki I believe.) The rule is simple: The more loud, aggressive and pushy someone gets about a topic the less likely I am to engage. This rule has a simple reason: I don't want Wikidata to get into a spiral of shouting. If we do this people get into the mode where only if they shout they get heard so they shout all the time. This is toxic for a community. So I fear I can't contribute to this thread beyond this message.

-- Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:29, 28 February 2018 (UTC)[reply]

What is your point here? Do you mean that feedback should stop being given as soon as it goes further than "I agree with your proposal"? At the time I agreed with Lydia's reply, as I could see how my answer back then could be described as "loud, aggressive and pushy". I apologized for this, both on the list and off-list to people who might have felt personally attacked. To my mind her decision not to participate in the conversation at all was never appropriate conduct: her criticism of the form was fair, but her avoidance of any reply on the substance was far less valid. I didn't want to add fuel to the fire, so I just kept quiet about it.

Now, in the current thread, I'm convinced that so far my contributions cannot be faulted in the same way. So please point me to where you believe I've been impolite, or reply to the criticisms with valid factual arguments, or bring new, well-documented arguments in favour of the proposal if you have some. But please don't throw my past errors in my face like an anathema and a perpetual prohibition on giving cordial critical feedback. Maybe you are perfect, but mere mortals like me only improve by recognising and overcoming their errors; perpetually reproaching them for one of those errors is no way to help them progress. Not being able to provide critical feedback is also toxic for a community. --Psychoslave (talk) 09:02, 1 March 2018 (UTC)

Past errors? You were talking about "the Wikidata team self-hagiographic rethoric" literally in the hour before your response here. I would say that if you don't even recognize any issues with what you write without having to be explicitly pointed to them, I don't see enough benefit in spending time to engage with you. --Denny (talk) 17:42, 1 March 2018 (UTC)[reply]
Maybe it would be better to put this quote back in its context, where it's part of a direct reply to @Noé:, who complained several times in different channels about the lack of exposure of the cons in this section's proposal. So this was more of an allusive joke, with the over-pedantic wording intended to reinforce it. This wasn't "loudly pushed in an aggressive manner" in this thread; it was pulled out of an unlinked context where it can at worst be qualified as sarcastic. Now instead of trying to discredit each other on ad hominem grounds, what about talking about the real problems that create these tensions and trying to solve them?
Namely, how will this choice of CC-0 be the best option for Wiktionaries when all that their communities have built until now is covered by an incompatible license, preventing any massive import that wouldn't cast heavy legal doubt in many countries where possibly most contributors are currently living? Sure, Wikidata doesn't target only Wiktionaries, and it's good to be open to other uses. But our Wikimedia community should be given the tools to build on what has already been achieved without this legal doubt. In this regard, using CC-BY-SA for the Lexeme namespace would make far more sense. And it's not only Wiktionary; look at the license of databases like Google ngrams or Les Vocabulaires du Ministère de la Culture et de la Communication: CC-BY-SA 3.0 unported. These are just two very interesting sources that won't be includable if CC-0 is retained as the exclusive license for the Lexeme namespace, and there are many others out there. --Psychoslave (talk) 21:39, 1 March 2018 (UTC)[reply]

Wiktionary edits

 Comment Hey folks.

  • support. ~5910 edits in wiktionaries (~5000 by VIGNERON alone);

VIGNERON (~5055 edits), I9606 (0 edits), Andy Mabbett (97 edits), Sannita (12 edits), Andrawaag (0 edits), Magnus Manske (0 edits), ArthurPSmith (0 edits), Jsamwrites (0 edits), Syced (278 edits), Wittylama (4 edits), Jarekt (24 edits), Pymouss (233 edits), Jc86035 (38 edits), Pasleim (0 edits), Denny (edits), Tpt (0 edits), Toni 001 (0 edits), Jklamo (150 edits), Egon Willighagen (0 edits), John Cummings (1 edits), MartinPoulter (3 edits), Markus Krötzsch (0 edits), Andrew Su (0 edits), Succu (0 edits), XIIIfromTOKYO (0 edits), Rehman (0 edits).

  • oppose ~158 027 edits in wiktionaries:

Jura1 (0 edits), Noé (13181 edits), Psychoslave (2885 edits), Yair rand (32394 edits), Jheald (1 edits), Metaknowledge (83933 edits), Deryck Chan (8 edits), Lyokoï (23023 edits), Delarouvraie (2602 edits)

  • neutral/so-so ~127000 edits in wiktionaries:

Pamputt (~102000 edits), Jberkel (23187 edits) Nmaia (1740 edits).

In whose interest is this particular arrangement? Not Wiktionary's, it seems. Shouldn't Wiktionary communities have a say? After all, who's gonna look after lexicographical data in Wikidata? The "0-wiktionary-edits" guys from above? strakhov (talk) 16:09, 2 March 2018 (UTC)[reply]
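For anyone who wants to re-check the arithmetic behind the tallies above, here is a minimal sketch. The numbers are copied from the lists as posted; Denny's count is blank in the list, so it is stored as `None` and left out of the sum rather than guessed. The helper names `total` and `top_share` are made up for this illustration.

```python
# Edit counts copied from the tallies above (approximate where the post says "~").
# Denny's count is missing in the list, so it is None and excluded from sums.
support = {
    "VIGNERON": 5055, "I9606": 0, "Andy Mabbett": 97, "Sannita": 12,
    "Andrawaag": 0, "Magnus Manske": 0, "ArthurPSmith": 0, "Jsamwrites": 0,
    "Syced": 278, "Wittylama": 4, "Jarekt": 24, "Pymouss": 233,
    "Jc86035": 38, "Pasleim": 0, "Denny": None, "Tpt": 0, "Toni 001": 0,
    "Jklamo": 150, "Egon Willighagen": 0, "John Cummings": 1,
    "MartinPoulter": 3, "Markus Krötzsch": 0, "Andrew Su": 0, "Succu": 0,
    "XIIIfromTOKYO": 0, "Rehman": 0,
}
oppose = {
    "Jura1": 0, "Noé": 13181, "Psychoslave": 2885, "Yair rand": 32394,
    "Jheald": 1, "Metaknowledge": 83933, "Deryck Chan": 8,
    "Lyokoï": 23023, "Delarouvraie": 2602,
}
neutral = {"Pamputt": 102000, "Jberkel": 23187, "Nmaia": 1740}


def total(counts):
    """Sum the known edit counts, skipping entries with no number."""
    return sum(n for n in counts.values() if n is not None)


def top_share(counts):
    """Return (user, share) for the single largest contributor in a group."""
    known = {user: n for user, n in counts.items() if n is not None}
    user = max(known, key=known.get)
    return user, known[user] / total(counts)


if __name__ == "__main__":
    for name, group in [("support", support), ("oppose", oppose), ("neutral", neutral)]:
        user, share = top_share(group)
        print(f"{name}: {total(group)} edits, {share:.0%} from {user}")
```

With the figures as posted, the known support counts add up to 5895 (a little under the stated ~5910, which may reflect rounding or the missing entry), the oppose counts to 158027, and the neutral counts to 126927. The `top_share` helper only shows how much of each total comes from one account, which is the point raised in the reply below.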

Are you arguing that if a single contributor - Metaknowledge - had a change of mind, it would be all fine?
Also, I know I haven't contributed to the Wiktionaries a whole ton (although quite a bit more than what you say, I feel rather omitted :) ). One reason why I did not contribute more was that, to be honest, the idea of having a dictionary of every language, separately maintained by every language, is a rather unachievable one. I have been convinced for more than a decade that an approach where we centralize the data and maintain it only once is far more productive. In fact, I am arguing that by having lexicographic data in Wikidata we will not only see current Wiktionary contributors contribute to Wikidata, but we will have an influx of new contributors that are currently not contributing to Wiktionary at all. We saw the same in Wikidata with respect to Wikipedia. I would even go so far as to make a bet about how long it will take to have more active contributors working on the lexicographic data in Wikidata than we currently see contributors in any of the Wiktionary projects, if someone is willing to take that bet. This is, in my mind, a great opportunity for increasing the number of contributors, and for increasing the chance for the Wiktionary projects to achieve their mission. --Denny (talk) 16:58, 2 March 2018 (UTC)[reply]
Not exactly. I just meant that the only significant Wiktionary contributor supporting this proposal was apparently VIGNERON (sorry if I missed something in my quick recount). I do not see how antagonizing an entire project is good for us, even if it attracts some additional guys from the outside. I do not oppose 'structuring' Wiktionary, as I too find the work done there pretty inefficient (anyway, my Wiktionary experience is pretty scarce), but I'd try to involve those communities instead of outvoting them here by brute force. To ask them and to give them what they need. If a significant (is it?) part of the Wiktionary communities feel their work is plagiarised or mislicensed by this Wikidata-CC0-lexeme approach, maybe it's time to rethink the proposal (whether it's "legal" or not. That's on the legal team, I'm not a lawyer). strakhov (talk) 17:32, 2 March 2018 (UTC)[reply]
Now do the same counts for Wikidata edits - this is, after all, a discussion of what licence to use on Wikidata - something which more than one of your high-scoring Wiktionary examples have not addressed at all. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:28, 2 March 2018 (UTC)[reply]
Well. You imply that this thing being stored in Wikidata is something already set in stone. Maybe it's not, or at least it shouldn't, as Jura pointed out. strakhov (talk) 18:59, 2 March 2018 (UTC)[reply]
Imply? I'm able to assert it with confidence. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:08, 3 March 2018 (UTC)[reply]
Yeah, whatever. Let's see if I get your point:
  • Should we create the lexeme space? Yeah, it's awesome.
  • But should not wiktionarians have a say? Nope, because this is all about wikidata and licensing, and we the wikidatians [sic] know so much about that stuff.
  • But couldn't this thing be installed in Wiktionary instead of here? Nah, I assert with full confidence it has to be here.
Well, I don't know, this all seems pretty shallow, doesn't it? After all Wikimedia Commons is going to host their metadata in their own project, I do not know why wiktionarians should not host their lexemedata there. Sincerely, hosting it in Wikidata seems an excuse for licensing it with CC-0 because mixing licenses is so bad. So, it is here. Then it should be CC0. And here. And CC0... And so on. strakhov (talk) 20:06, 3 March 2018 (UTC)[reply]
"Let's see if I get your point" You don't. And don't try to put words into my mouth. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:22, 3 March 2018 (UTC)[reply]
Then try some reasoning instead of your "it has to be because it has to be in Wikidata because I assert it". Word the ideas coming out of your mouth as you prefer, I don't care. strakhov (talk) 20:28, 3 March 2018 (UTC)[reply]
[ec] Didn't I just tell you not to try to put words into my mouth? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 20:31, 3 March 2018 (UTC)[reply]

Just a comment, not a vote, by Lmaltier (250,000 contributions on fr.wiktionary, and about 1,342,000 page creations including creations by bot). I'm not interested in this vote, because I think that it would be best for Wikidata not to import anything from Wiktionaries, and best for Wiktionaries not to import anything automatically from Wikidata. I would oppose any automated change on fr.wiktionary when something is changed on Wikidata (manual exploitation might be possible nonetheless, e.g. if there can be some alert when linguistic Wikidata info is corrected, because this might be something to be checked on Wiktionaries, but this interest is minor). Wikidata might be helpful to some external projects, but I think that it will be negative for Wiktionaries, as it might attract some good contributors to Wiktionary's detriment (which is what happened with OmegaWiki). It might even be fatal to some Wiktionaries, and remember that Wiktionaries are what's useful to readers (no reader will ever read Wikidata). From the strict Wiktionary point of view, I would be happy if the Lexeme namespace were discontinued altogether, and I think that its negative consequences would probably outweigh its usefulness. Lmaltier (talk) 20:13, 2 March 2018 (UTC)[reply]

  • It is a bit odd that the proposal hasn't much support by people who actually edit Wiktionary.
    Personally, I was hoping the new features would allow me to expand Wiktionary-related content at Wikidata I already edited (some was substantially expanded or started by myself) and I notice now that this wouldn't be possible if we moved ahead with the proposal. Looks like we should have made a better use of other Wikidata features earlier.
    --- Jura 04:39, 3 March 2018 (UTC)[reply]
"[…] to be honest, the idea of having a dictionary of every language, separately maintained by every language, [is] a rather unachievable one. I was convinced for more than a decade that an approach where we centralize the data and maintain it only once is far more productive. In fact, I am arguing that by having lexicographic data in Wikidata we will not only see current Wiktionary contributors contribute to Wikidata, but we will have an influx of new contributors that are currently not contributing to Wiktionary at all."
The usefulness of a way to factor out some information so that it can be more easily shared between the Wiktionaries is clear, and a good solution to this problem would indeed be very welcome, at least by me. Now this doesn't imply that this solution should be centralized, nor that productivity is the sole criterion that outweighs all the others. Producing a lot of crappy data would still meet the objective of being productive. Moreover, even taking the centralizing approach, it also has to propose a data model which will actually fulfill the evolving needs of the Wiktionarian community. And sadly the proposed model doesn't match such an expectation. It's as if someone proposed to a hypothetical Wikiphys community an Aristotelian or even Newtonian data model, hoping that everything that will ever be stated about physics will fit in it, while ignoring that much useful material this community has already created won't be legally usable within it due to the selection of an incompatible license. Not only was almost none of the feedback that the Wiktionary community gave taken into consideration, but it seems that the idea of hiring some people skilled in both linguistics and computer science wasn't considered either, when there are actually dedicated degrees mixing both topics out there. Admittedly, these technical considerations are not the central point here, but since this was advanced as a pro argument above, it is surely fair to provide some perspective on what this present proposal is proposing to introduce. --Psychoslave (talk) 06:18, 4 March 2018 (UTC)[reply]

Call for civility

I would appreciate if we could keep the conversation civil, on both sides. Fortunately most of us do so, but there is neither a need to dissect every argument presented on each side, nor is it polite to call out anyone's opinion as wrong. We should let everyone express their opinion, whether dissenting or assenting with Lydia, and continue to be friendly to each other. It is obvious that, no matter what happens, not everyone will agree with the course of action, and that's OK. There is no need to further burn bridges. As usual I am reminded of the Wikimania talk last year - was it last year? - it is among people with the same goal that the fiercest fights are fought. We shouldn't. We should all work together towards our common goal, towards our mission: a world in which everyone can share in the sum of all human knowledge - and we should do so in the understanding that even if we disagree in some points, we still have the same goal and are on the same side. So, please, let's be friendly, and in case you still want to write some hot-headed answer, sleep over it at least once.

We're in this together. --Denny (talk) 04:29, 4 March 2018 (UTC)[reply]

That being said, Denny, we must take care to state and restate clearly the specific pretenses on which we are approaching this proposed change and state as clearly as possible the bases for the arguments we wish to make about the change, even if we end up just repeating ourselves or stating the obvious within the span of just minutes. It is unfortunate that claims about the effects of this change are being made here with insufficient evidence on either side and that some in the conversation are not thoroughly explicitly declaring the sources for their assertions (instead of pointing to Léa's talk page or some "research on the topic" not summarized anywhere, point to specific sections of the page or give some quotes from there to better defend your point—who knows, maybe they do agree with you and they just don't know it).
(Full disclaimer: I concur precisely with the rationale behind Jheald's, and by extension Yair rand's, oppose vote. I do not think we should antagonize those people who frequently work with lexicographies on Wikimedia projects by letting a choice about lexicographies on this Wikimedia project be nearly wholly determined by those who lack consistent work with lexicographies on Wikimedia projects.) Mahir256 (talk) 08:05, 4 March 2018 (UTC)[reply]

A call for civility is fine, and surely we should all embrace any reasonable call for this. Asking people to stop analyzing and replying to each argument is a far more questionable demand. Not that such an approach cannot come with its own kind of error in its conclusions, but calling it an uncivil approach per se is probably exaggerated, isn't it?

If the Wikimania talk mentioned was recorded and is available somewhere, a link to watch it would be interesting.

Not making the slightest mention of the well-known concerns of some of our community members when presenting a proposal on this topic, and not making a call for comments on the main Wiktionary talk pages, is probably not the best way to lead to a cordial discussion including all interested stakeholders. Also, calling for a personal bet on who is right on any topic is probably not the most efficient way to avoid generating antagonistic conversations. Do we agree on this? --Psychoslave (talk) 08:36, 5 March 2018 (UTC)[reply]

@Denny: I agree completely that it's important to have this discussion civilly.
I feel that part of the reason this discussion has become difficult is that people are talking past each other and discussing different points, partly because Lydia's original proposal was about several different things, with some ambiguity left about the status of the various parts. The proposal was to change the Wikidata license text to include a Lexeme namespace in the list of namespaces licensed under CC0, thus presenting two questions: Should there be a Lexeme namespace on Wikidata, and if so, should it be licensed under CC0? The former is being taken by some to be a given, and it's not even completely clear whether the dev team is willing to let this be subject to community consensus, and if so, which community's consensus matters here. Wikidata? Wiktionary? The Wikimedia community at large? What if the Wiktionary communities get consensus for establishing a separate wikibase installation? Would there then be two projects working on structured lexemes, or would that be reason enough not to add the namespace to Wikidata, or would the developers refuse the request? The ambiguity and the resulting repeated miscommunications are contributing to the rising temperature of the discussion. What we need now is official clarification of what the parameters are here, we need to figure out where the data will go and who decides, and then whichever community or communities are the relevant one(s) should have a discussion about what the license should be. --Yair rand (talk) 21:28, 5 March 2018 (UTC)[reply]
The decision about having a Lexeme namespace on Wikidata has been taken already. This has been discussed on Wikidata for several years, and has been discussed with Wiktionary communities since 2016. We've been asking for feedback about the data model during the same period. The development team has been working on it since 2016, and the first version is about to be released.
Here is a short list of the arguments that made us decide on storing lexicographical data on Wikidata, instead of having a separate Wikibase instance on Wiktionary:
  • Just like Wikidata concepts are not only for Wikipedia, lexicographical data is not only for Wiktionary but for everyone: other Wikimedia projects, third parties. Wikidata is already recognized, both inside and outside the Wikimedia movement, as a central repository of reusable data.
  • Being able to interlink lexicographical data and data about concepts, reusing the same properties, items, etc, is way easier if all the data is stored in the same place. On another site, editors would have to recreate everything that already exists on Wikidata. Likewise, once everything is on Wikidata, we will be able to build queries mixing concepts and words.
  • The Wikidata community worked on many tools to make their life easier, to add, edit, reuse data, and these tools may be adapted for lexicographical data.
  • The Wikidata community has strong knowledge about how to handle structured data, and will be curious to learn about lexicographical data, just like they learned about ontologies, and all kinds of topics, when editing Wikidata. Together, we can combine our fields of expertise.
  • Enthusiasm about structured data and multilingual collaboration is way higher on Wikidata than on Wiktionaries. Wikidata community is experienced in multilingual collaboration and can support Wiktionary editors.
  • We expect to attract more new editors if everything is centralized on Wikidata, than on a new platform that would be stored on Wiktionary.
I hope that helps understand why this decision has been taken. Lea Lacroix (WMDE) (talk) 16:27, 7 March 2018 (UTC)[reply]
  • Can we see links about "The decision about having a Lexeme namespace on Wikidata has been taken already." and "This has been discussed on Wikidata for several years"? So the actual alternative we should discuss is do we want to be able to include data from Wiktionary or not?
    --- Jura 16:52, 7 March 2018 (UTC)[reply]
Jura, you regularly contribute to 'Wikidata weekly summary'. Are we really expected to believe that you do so without actually reading it? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:58, 7 March 2018 (UTC)[reply]
And maybe, since you want them and know where to find them, you could. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:48, 7 March 2018 (UTC)[reply]
It seems you can't do that either. Let's wait for Lea Lacroix (WMDE) then.
--- Jura 19:02, 7 March 2018 (UTC)[reply]
@Lea Lacroix (WMDE): To respond to a few of your points:
  • I am reasonably confident that there would not be any overlapping properties between items and lexemes. Links to items in external databases are possible in any case.
  • Combining expertise would be possible on a new site, where Wikidatians and Wiktionarians would be on equal footing forming a new project. If you don't expect Wikidata users to go to a new site with a blank slate, why would you expect Wiktionary users to venture into a site dominated by a different culture and an existing foreign structure of policy and administration? We would get the most out of both communities with a neutral place to work focused on lexical data.
  • The Wikidata community does not do direct multilingual collaboration, despite our best efforts. Several mechanisms were imported from Commons, but I don't think we've ever had any important discussion here that had as much as 10% non-English-speaking participation. Wiktionary, on the other hand, is filled with language enthusiasts and some actual professional translators.
These issues deserve to be discussed, but there has been no discussion on this topic, and no consensus in favor of the decision to launch here. There are opportunities here that we may be throwing away. --Yair rand (talk) 21:51, 7 March 2018 (UTC)[reply]
Noé (talk) 14:57, 8 March 2018 (UTC)[reply]
@Yair rand: All important discussions (those that involve all the contributors) are in English because it's the international language and the one most spoken by Wikidata editors. Sometimes, on the French project chat, we choose to continue the discussion here on the English chat. It's totally normal and necessary. But it doesn't stop the French Bistro from being very active. With our efforts, 81% of the translatable messages are translated into French. It is possible to contribute to Wikidata only in French, without knowing a single word of English, while discussing the project only in French too. Tubezlob (🙋) 18:17, 8 March 2018 (UTC)[reply]

town clerk

Are Public Notary (Q1047879) and municipal clerk (Q883211) the same thing for a merge? It seems that it is the same local government official in different languages. Where I live the town clerk is licensed to be a notary public. --RAN (talk) 16:23, 22 February 2018 (UTC)[reply]

What about notary (Q189010)? I'd merge this one with Public Notary (Q1047879), at least on first sight. Grüße vom Sänger ♫ (talk) 17:47, 22 February 2018 (UTC)[reply]
In the Netherlands, a notary is an actual academic degree and a salaried position. In the USA any secretarial position can become a notary through passing a test (like for a driver's license). I don't think you can compare these across country borders and the items should probably be per jurisdiction. Jane023 (talk) 17:54, 22 February 2018 (UTC)[reply]
For notary (Q189010), the English Wikipedia article it's linked to is an umbrella term that covers both the US-style notaries public with limited training and authority, as well as notaries in other countries with training comparable to attorneys. The item Public Notary (Q1047879) has no English Wikipedia article linked to it, and the English description, "clerk of the competent local government", and the also-known-as "town clerk" should not be applied to "notary". Jc3s5h (talk) 18:04, 22 February 2018 (UTC)[reply]
Certainly some merging seems to be in order; there are many items found when searching for "notary" or "notary public", and some of these seem to be suitable for merging. But municipal clerk (Q883211) should not be merged with any of the notary or notary public items. In some jurisdictions, notaries have vastly different education and qualifications than a municipal clerk, and it is unlikely the same person would fill both roles. In some places, like where I live, all town or city clerks are notaries public, but the vast majority of notaries public are not town or city clerks. Jc3s5h (talk) 17:56, 22 February 2018 (UTC)[reply]
Well I see the Dutch wiki article for "Notaris" now links to notary (Q189010) and I can see at a glance that the interwiki to the English article is incorrect. This probably is true for a multitude of professions that have been carried over in different ways by different countries over time. Sorry I have no time to look into this and help clean up though. Jane023 (talk) 18:05, 22 February 2018 (UTC)[reply]
  • It seems that notary public (Q15479268) (a licensed position) and notary (Q189010) (an historical position) are very similar and perhaps the Wikipedia articles should be merged. It seems that Public Notary (Q1047879) and municipal clerk (Q883211) are the same. The links to Wikipedia articles need to be sorted out; the problem is the wording "public". We use it to mean "civil position that serves the public" as in "notary public" and we use it as "public office" meaning an "appointed or elected political position". I changed "clerk" and "notary" to "municipal clerk" and "municipal notary" to distinguish the political offices. I think they can be merged; there is little overlap in the language links to the Wikipedias, and the ones that overlap are meant for "notary public", the civil position. --RAN (talk) 19:22, 22 February 2018 (UTC)[reply]
  • This is the wrong place to discuss merging Wikipedia articles, no matter which of the several Wikipedias for the various languages is being referred to. The Wikipedias write whatever articles they want to, and Wikidata links to them as best it can.
In English, there are three notary-related articles that cover large parts of the world: notary (Q189010) is any kind of notary who deals with legal papers. notary (Q189010) is an umbrella term for two kinds of notaries, notary public (Q15479268), the type of notary prevalent in most of the US and much of Canada, and civil law notary (Q23838068) who are prevalent in continental Europe and countries that derive their legal traditions from continental Europe. None of these terms are historical terms; they all apply to notaries active today.
All three of these articles discuss notaries who are installed and recognized by the government. The term "public" refers to the fact that all these notaries are awarded their positions by the government and the government accords extra recognition to their acts, beyond the acts of an ordinary private person. Sometimes the presence or absence of the word "public" is used as a shorthand to distinguish American-style notaries from continental-style notaries, but they are all recognized by their governments. A good contrast in the US would be a notary of the Roman Catholic Church (en:Notary (cannon law) in English Wikipedia or Notary (Catholic canon law) (Q25345637) in Wikidata). Such a notary's acts would only be recognized by the Church and the government would not give any special recognition to the acts of a Church notary.
"Municipal clerk" is a pretty good term, but "municipal notary" is not. In most of the US notaries are appointed by the state (e.g. California), not by a city or town.
By the way, I am a notary public in US State of Vermont, and was appointed by the assistant judges of my county. Jc3s5h (talk) 21:18, 22 February 2018 (UTC), corrected 20:42, 25 February 2018 (UTC).[reply]

I have reverted a terrible English translation from Public Notary (Q1047879) which led to a whole reinterpretation of the item. Please always refer to the original sitelinks when making a judgment about possible merges. Maybe now the statements make sense. I am not sure if the Hungarian sitelink belongs to Public Notary (Q1047879), and I have no idea about vi, but the rest of the sitelinks seems alright. Anyway, it is a good idea to check if some sitelinks need to be moved to another item rather than merging two items. Andreasm háblame / just talk to me 03:18, 26 February 2018 (UTC)[reply]

@Andreasmperu: Maybe "notary services" as per Yandex translate of Slovene: https://translate.yandex.com/?lang=sl-en&text=Notariat? --Liuxinyu970226 (talk) 11:39, 2 March 2018 (UTC)[reply]
In the US there are for-profit services, mostly web-based, who maintain databases of notaries public around the country, and will help a person or financial institution who needs a notary public find one. Most of the notaries public in these databases are willing, for a fee, to travel to the location of the person requesting the notary public. These services could be labeled "notary services", and I guess they are different from what the item mentioned by User:Liuxinyu970226 is referring to (but it's hard for me to tell since I only read English). Jc3s5h (talk) 11:55, 2 March 2018 (UTC)[reply]

Wikinews categories

Our current practice is to connect article sitelinks to one item (for example Senegal (Q1041)) and category sitelinks to one item (for example Category:Senegal (Q6975863)). The Wikinews people keep moving the sitelinks to Wikinews categories from the category item to the article item. This messes up our data structure here on Wikidata and I don't think we have consensus on this project that we want this. We only tolerated it in the past because without arbitrary access they had no other way to show links to Wikipedia articles on Wikinews categories. With a template the Wikinews people can show whatever links they want on their categories without needing to move links around here. It's just a matter of copying over Commons:Template:Interwiki from Wikidata and Commons:Module:Interwiki to Wikinews and updating them to suit their needs. Let's get this sorted out. Multichill (talk) 15:55, 24 February 2018 (UTC)[reply]
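To make the convention described above concrete, here is a rough sketch of how one might spot the disputed case: a Wikinews category sitelink sitting on an item that is not itself a category item. It assumes entity data shaped like the JSON returned by the Wikidata `wbgetentities` API (`sitelinks` keyed by site id, `claims` keyed by property). The function names, the English-only "Category:" prefix check, and the two sample entities are illustrative assumptions, not real data pulled from Wikidata.

```python
WIKIMEDIA_CATEGORY = "Q4167836"  # item "Wikimedia category"


def instance_of(entity):
    """Collect the item ids used as instance of (P31) values on an entity."""
    ids = set()
    for statement in entity.get("claims", {}).get("P31", []):
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and "id" in value:
            ids.add(value["id"])
    return ids


def wikinews_category_links_on_article_item(entity, prefix="Category:"):
    """Return Wikinews site ids whose sitelink is a category page,
    when the entity itself is not a Wikimedia category item.

    Assumption: only the English namespace prefix is checked; other
    language editions use localized prefixes that are ignored here.
    """
    if WIKIMEDIA_CATEGORY in instance_of(entity):
        return []
    return sorted(
        site
        for site, link in entity.get("sitelinks", {}).items()
        if site.endswith("wikinews") and link.get("title", "").startswith(prefix)
    )


def p31(item_id):
    """Build a minimal P31 statement in the wbgetentities JSON shape."""
    return {"mainsnak": {"datavalue": {"value": {"id": item_id}}}}


# Hypothetical sample data for illustration only.
article_item = {
    "id": "Q1041",
    "claims": {"P31": [p31("Q6256")]},
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Senegal"},
        "enwikinews": {"site": "enwikinews", "title": "Category:Senegal"},
    },
}
category_item = {
    "id": "Q6975863",
    "claims": {"P31": [p31(WIKIMEDIA_CATEGORY)]},
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Category:Senegal"},
        "enwikinews": {"site": "enwikinews", "title": "Category:Senegal"},
    },
}

if __name__ == "__main__":
    print(wikinews_category_links_on_article_item(article_item))
    print(wikinews_category_links_on_article_item(category_item))
```

Whether such a sitelink counts as a problem or as the intended mapping is exactly what this thread is about; the sketch only detects the pattern and takes no side.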

@Multichill: What's the position if there otherwise is no category-item here? Are Wikinews people forced to create one to match their category, or (like Commons) are they fine to link to the article-item in such a circumstance?
Also, what is the harm in systematically linking from article-item here to a category there? What is the benefit in preventing such links? Wikinews articles are all designated instance of (P31) Wikinews article (Q17633526), so a regular item here is not going to be linked to both a category there and to a news article. If there is a story, the news article will have an item of its own. There is no chance of a collision. Why is there therefore any advantage in their not linking a subject to a regular item here?
We have to use Commons:Template:Interwiki from Wikidata and Commons:Module:Interwiki on Commons because there are sometimes gallery pages there. But there are (I think?) no equivalents of gallery pages on Wikinews. So why add this clunky indirection, when a regular sitelink would do the job just as well?
There is also a difference with Commons, in that if there is a Commons category (P373) statement on an item, then most connected Wikipedias will directly show a sitelink from their article to the Commons category. But, as far as I am aware, no equivalent mechanism is in place for Wikinews, so if there is no sitelink from the article-item, then there will be no sitelink to Wikinews at all shown on the Wikipedia item.
That to me makes it entirely understandable that Wikinews editors would seek to link from article-items to their subject categories. I don't see any particular good reason to stop them. Jheald (talk) 16:41, 24 February 2018 (UTC)[reply]
Create a category, just like for Wikipedia. What I'm saying is not something new; I'm just getting rid of an exception that has grown. Exceptions are an indication the data modeling is wrong. Wikinews makes our data inconsistent. Part of the categories are like Category:Royal Air Force (Q7404780) and links keep getting moved around. If Wikipedias want to link to Wikinews they can still do it. Multichill (talk) 18:38, 24 February 2018 (UTC)[reply]
Or we could just say: if article-items are systematically a better sitelink for these pages, then go for it. For all of them. Site-wide. What is the downside?
And you didn't answer my first question: What is the position if there otherwise is no category-item here? Are Wikinews people forced to create one, or (like Commons) are they fine to link to the article-item in such a circumstance? What does that serve, other than create a redundant item that links to nothing and has no meaningful statements on it? Jheald (talk) 19:46, 24 February 2018 (UTC)[reply]
I'd thought this was settled years ago. Wikinews topic categories correspond to Wikipedia articles, just as Wikisource author pages do.

As a practical matter, is there a way to propose deletion, or merging, of spurious Wikidata items such as Q47478970? --Pi zero (talk) 14:27, 25 February 2018 (UTC)[reply]

There are three types of categories on Wikinews:
A little bit schizophrenic, isn't it? JAn Dudík (talk) 08:36, 27 February 2018 (UTC)[reply]
A clarification, for the benefit of third parties who might be reading this (so misinformation doesn't sit here unremarked): A Wikinews topic cat is the primary page on that project associated with its topic, just as a Wikipedia article is the primary page on its topic. To state what should be obvious, the purpose of sister links is to help readers, when looking at a page on one sister project, to find corresponding pages on other sisters, and patently that means leading readers in either direction between the Wikipedia article on Zimbabwe, the Commons category for Zimbabwe, and the Wikinews category for Zimbabwe. --Pi zero (talk) 13:13, 27 February 2018 (UTC)[reply]
The general guideline for Wikinews seems to be that its categories go with Wikipedia articles: see Wikidata:Wikinews/Development#Interproject links for people new to the question.
--- Jura 17:09, 27 February 2018 (UTC)[reply]
  • We have consensus that usual Wikinews category (news on the topic) is the same as the encyclopedic article on the topic in Wikipedia or a list of quotes on the topic in Wikiquote, for example. There is the word "category" only because of the Wikimedia engine. Identical entities must be linked directly with each other. The reverse spoils our data structure.
    In addition, many Wikinews categories, at least in the Russian edition, deeply use information from Wikidata items for description, categorization and design. Changing these links will automatically destroy almost half the project. --sasha (krassotkin) 13:25, 3 March 2018 (UTC)[reply]
This consensus was established because arbitrary access was missing at that time. What is the difference between Wikinews categories and, e.g., Wikiversity categories? Or Wikisource categories? JAn Dudík (talk) 19:14, 7 March 2018 (UTC)[reply]
The reference to Wikiversity and Wikisource is specious.

A Wikinews topic category is the focal page on the project for that topic, just as a Wikipedia article is. If Wikidata means to be helpful to sister projects, and to readers of those projects, there is no question that a Wikinews topic category is associated with the Wikipedia article; any other choice of mapping between the two projects would be actively deceptive. --Pi zero (talk) 01:45, 8 March 2018 (UTC)[reply]

Gap between Wiktionary.org and Wiktionary namespace at Wikidata?

In terms of technical functionalities, what are the gaps between the two? In other terms, is there any scope left for Wiktionary.org beyond differently licensed content and/or different visual presentations?
--- Jura 05:34, 27 February 2018 (UTC)[reply]

@Jura1: Much like how Wikidata can't include encyclopedia articles, it also can't include non-structurable elements of Wiktionary. For example, the elaborate wikitext-based definitions, extensive usage notes, and all but the simplest etymologies can't be included on Wikidata. And all of Wiktionary's appendices detailing areas of language and grammar, the specialized glossaries, details on reconstructed terms, textual details of use of alternative forms, useful details in rhyme guides for working around things, etc, etc... Take a look at fr.wikt's accommodation or ripopée or en.wikt's háček, or even the pronunciation section of pecan. There's a lot there that can't reasonably go on Wikidata.
That's not to say there isn't also a lot that can go on Wikidata. Wiktionary will probably have more use for structured data than Wikipedia, but that doesn't mean that its independent scope would be minimized that much. --Yair rand (talk) 03:49, 28 February 2018 (UTC)[reply]
When you are writing "it also can't include" is this an affirmation you are making or a technical limitation?
--- Jura 07:24, 28 February 2018 (UTC)[reply]
@Jura1: The technical limitation is that neither Wikidata items nor Lexemes can contain wikitext, formatting, paragraph breaks, etc. While in theory, a central database could also have wikitext pages in a different namespace (or just use giant strings, or something), that wouldn't really fit the idea of a structured database. Free text isn't structured, it means nothing to a machine and can't be cleanly divided into meaningful component parts. --Yair rand (talk) 21:26, 5 March 2018 (UTC)[reply]
While https://fr.wiktionary.org/wiki/accommodation includes text, I don't think it is unstructured. The various elements can be added in a structured way to the suggested lexeme-type at the appropriate parts.
wikitext is already being requested as a new datatype and might eventually be available, but I don't think this minor technical limitation is much of an issue for the sample.
On a more systematic level, I think including (e.g.) usage samples within Wikibase in a structured way provides information necessary for understanding lexemes. If this information would be disconnected from lexemes and stored in an unstructured way at another site, people couldn't query it. It could be compared to storing references for any statements outside Wikidata.
--- Jura 07:00, 7 March 2018 (UTC)[reply]

Architect versus notable work

If I am giving the architect (P84) for an item, could it then be possible to have notable work (P800) filled in for that architect?  – The preceding unsigned comment was added by Pmt (talk • contribs) at 16:10, 27 February 2018 (UTC).[reply]

If it's a notable one, sure.
--- Jura 17:01, 27 February 2018 (UTC)[reply]
If the notable work is not in our database, you have to create it. --RAN (talk) 18:02, 27 February 2018 (UTC)[reply]
@Richard Arthur Norton (1958- ): It was indeed meant that both the work and the architect are in the database, and that you are working in the item for the specific notable work or architect. Breg Pmt (talk) 19:58, 27 February 2018 (UTC)[reply]
@Jura1: and @Richard Arthur Norton (1958- ): Sorry for being unclear; what I was thinking about was to have this happen automatically. As an example: suppose I am creating a new item about a building and adding the architect (P84) for that building, and the architect is notable and already has a Wikidata item/"Q". Instead of my then opening the architect's item and adding the newly created building there, why can't the program do it automatically?  – The preceding unsigned comment was added by [[User:|?]] ([[User talk:|talk]] • contribs).
How would it know that the building is notable for that architect?
--- Jura 06:59, 1 March 2018 (UTC)[reply]
@Jura1: I was thinking of works that already exist as items in Wikidata, and that as such are notable works for the architect who created them. Breg Pmt (talk) 07:20, 1 March 2018 (UTC)[reply]
Some architects built hundreds of buildings, some happen to be notable for them, others not. Some have items, others still need to be created. I don't think we can assume that even if there is just one, it's necessarily notable. Besides, the full list can always be queried.
--- Jura 07:23, 1 March 2018 (UTC)[reply]

@Jura1: OK. As mentioned above, I was thinking about items that already exist in Wikidata and have an architect, or about a building item that has no architect given. But since you are bringing it up: do you mean, now in general, that works by an architect, author or designer who has their own item on Wikidata are not necessarily notable for that creator? Who then decides which works are notable for that creator? For instance, for William Shakespeare, how many of his works here at Wikidata are not notable? Can you provide me with a list? For Sigurd Hoel (Q138650), is Meeting at the Milestone (Q6807901) notable? Breg Pmt (talk) 08:39, 1 March 2018 (UTC)[reply]

@Pmt: IMO:
  • If the architect is a "clearly identifiable conceptual or material entity", it's notable for Wikidata (not. 2).
  • If the building can be attributed to the architect using reliable sources, it's notable for Wikidata, as "it's (somehow, to some extent) a clearly identifiable conceptual or material entity" (not. 2.) and "it fulfills some structural need": databasing this architect's production (not. 3).
  • If the building is a "clearly identifiable conceptual or material entity", it's notable for Wikidata too (not. 2) regardless the architect is known or not.
  • Is the building notable enough for being displayed through notable work (P800) in architect's item? Don't know, don't care, as already said "the full list can always be queried".
I created a few weeks ago a pretty simple template in es.wikipedia ({{wikidata arquitecto}} "example", external links), and it works nice without using P800. strakhov (talk) 16:31, 3 March 2018 (UTC)[reply]
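The point made above, that "the full list can always be queried", can be illustrated with a small query builder. This is only a sketch: it assumes the standard Wikidata Query Service prefixes (`wdt:`, `wd:`), the function name is made up for illustration, and the QID in the example call is a placeholder rather than a real architect.

```python
import re


def works_by_architect_query(architect_qid: str, limit: int = 500) -> str:
    """Build a SPARQL query listing every item whose architect (P84) is the given item."""
    # Reject anything that does not look like an item ID, to avoid malformed queries.
    if not re.fullmatch(r"Q[1-9]\d*", architect_qid):
        raise ValueError(f"not a QID: {architect_qid!r}")
    return (
        "SELECT ?work ?workLabel WHERE {\n"
        f"  ?work wdt:P84 wd:{architect_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder QID; substitute the architect's real item ID before running the query.
print(works_by_architect_query("Q123456"))
```

The resulting text can be pasted into the query service, so the complete list of works is available whether or not any of them is also recorded under notable work (P800).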

Correct way to link commons category

Hello.

So, which is the correct way to link a Commons category on Wikidata? Since now Wikidata has two links pointing to Commons. I couldn't find anything about it in the Help:Contents. Thank you, Rehman 03:49, 28 February 2018 (UTC)[reply]

Hello. The recent discussion over it can be observed here. - Kareyac (talk) 05:15, 28 February 2018 (UTC)[reply]
Thank you for that, Kareyac. So from what I understand, the current practice mostly is linking twice from the same wikidata item... Until consensus is reached (or a technical "fix" is made), maybe someone familiar with the local policies should mention that in one of the help pages? Just so that more questions like this can be avoided... Rehman 06:04, 28 February 2018 (UTC)[reply]
On the Commons category page, I see a link to Reasonator (top right corner of the page), which does the job anyway.--Ymblanter (talk) 06:41, 28 February 2018 (UTC)[reply]
@Ymblanter: because you have added this script to your common.js at Commons... --Edgars2007 (talk) 07:19, 28 February 2018 (UTC)[reply]
@Ymblanter: +1, 99% of people reading this page won't see this Reasonator link. And as a Wikimedian, I activated this gadget, but I would prefer a direct link to Wikidata rather than having to click twice to get there. Cdlt, VIGNERON (talk) 07:51, 28 February 2018 (UTC)[reply]
Yes, sure, but to be honest I do not see why anybody who is not Wikimedian would be interested in a connection of a Commons category to a Wikidata item. 99.999999% of non-Wikimedians have never heard of categories, Wikidata, and most of them of Commons.--Ymblanter (talk) 08:03, 28 February 2018 (UTC)[reply]
True, but this is a bit of a vicious circle: people don't see the link, so the link isn't displayed, so people won't see the link... Plus, I don't like to guesstimate what people want or not; I've too often been surprised to learn what people are interested in. And Wikimedians and/or Wikidatians may be only 0.000001% of the readers, but I guess most of them would prefer a direct link to a Reasonator link (Reasonator is intended more for non-Wikimedians, but they don't see this link :/ the navigation flow is a bit off). Cdlt, VIGNERON (talk) 08:34, 28 February 2018 (UTC)[reply]
Categories are rather visited actually, even by unregistered users. For some groups they're a lifesaver. They're also linked from rather popular non-Wikimedia websites (e.g. sbn.it in Italy). For this reason, it's important that Wikidata items on a subject (those linked to a main namespace article in Wikipedia, Wikiquote etc.) include the corresponding Commons categories in their sitelinks, so that people can easily reach Commons categories from related articles (and vice versa). --Nemo 08:45, 28 February 2018 (UTC)[reply]
for me, on Commons categories, I see a big blue (+) near the name of the category, and when I click on it, I get a small popup which allows to "Edit data". When I click on it, I directly access the wikidata item. I do not remember what I did to get this, but it's been there for as long as Commons categories on wikidata. Is it a gadget, a script, or a normal behaviour ? --Hsarrazin (talk) 09:16, 28 February 2018 (UTC)[reply]
No, not "standard" behavior. Probably User:Yair rand/WikidataInfo.js. --Edgars2007 (talk) 14:29, 28 February 2018 (UTC)[reply]
Be aware also that the new commons:Template:Wikidata Infobox uses our site links to Commons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:38, 28 February 2018 (UTC)[reply]
I've added the infobox to that category as a demo. Note that if you add a commons sitelink, then a bot will come along and add it to P373. Unfortunately the reverse isn't (yet) true. Thanks. Mike Peel (talk) 12:07, 28 February 2018 (UTC)[reply]
I've opened an RfC at Wikidata_talk:Notability#RfC:_Notability_and_Commons, based on text previously discussed here in November, to try to update and clarify our guidance on notability and sitelinks for Commons categories. Jheald (talk) 23:37, 5 March 2018 (UTC)[reply]
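As a rough illustration of the bot behaviour Mike Peel describes above (a Commons sitelink being copied into P373), here is a minimal sketch of the value derivation only. It assumes that P373 stores the category name without the "Category:" prefix; the function name is hypothetical and this is not the actual bot's code.

```python
from typing import Optional


def p373_value_from_sitelink(sitelink_title: str) -> Optional[str]:
    """Return the Commons category (P373) value implied by a Commons sitelink.

    Only sitelinks that point at a category page yield a value; gallery
    pages and other namespaces return None.
    """
    prefix = "Category:"
    if not sitelink_title.startswith(prefix):
        return None
    # Page titles may use underscores; the statement value uses spaces.
    name = sitelink_title[len(prefix):].replace("_", " ").strip()
    return name or None


print(p373_value_from_sitelink("Category:Royal_Air_Force"))
print(p373_value_from_sitelink("Zimbabwe"))
```

The reverse direction (creating a sitelink from an existing P373 value) would need extra checks, for example that no other item already holds that sitelink, which is presumably part of why it is not yet automated.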

Is it possible to provide contextual statement ranking when adding statements?

Hi

Sorry for the weird title, basically I'm adding information by hand for a group of women and when I add sex or gender (P21) the first option in the list when I type 'Female' is female organism (Q43445) not female (Q6581072), which I know is wrong but it gives me this as the first option every time. Is it possible/realistic to provide more contextual suggestions?

Thanks

--John Cummings (talk) 14:19, 2 March 2018 (UTC)[reply]

Hello John Cummings,
to avoid this specific problem with P21, I use and recommend User:Magnus Manske/wikidata useful.js, which allows to add the statement with a single click ;)
more generally, I would agree that it would be very useful (for adding names, countries, languages, occupations), because there are a lot of them that have the same label, which are not at all the same ;) --Hsarrazin (talk) 14:28, 2 March 2018 (UTC)[reply]
Yeah we're looking into that - specifically User:Smalyshev (WMF). Input is being collected at Wikidata:Suggester ranking input. --Lydia Pintscher (WMDE) (talk) 14:33, 2 March 2018 (UTC)[reply]
@Lydia Pintscher (WMDE):, 👍 , --John Cummings (talk) 10:41, 3 March 2018 (UTC)[reply]
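One way to picture "contextual" ranking is to order the candidates by how often each is already used as a value of the property being edited. The sketch below is purely illustrative: the usage counts are invented toy numbers, the function is hypothetical, and nothing here describes how the actual suggester works or will work.

```python
def rank_suggestions(candidates, usage_counts, property_id):
    """Order candidate QIDs by how often each is used as a value of property_id.

    usage_counts maps (property_id, qid) to a count. Candidates with no
    recorded usage count as 0, and ties keep their original order because
    sorted() is stable.
    """
    return sorted(
        candidates,
        key=lambda qid: usage_counts.get((property_id, qid), 0),
        reverse=True,
    )


# Toy counts, invented for illustration only.
toy_counts = {
    ("P21", "Q6581072"): 1000,  # female
    ("P21", "Q43445"): 3,       # female organism
}

print(rank_suggestions(["Q43445", "Q6581072"], toy_counts, "P21"))
```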

Looking up properties using Search

I know there were improvements in the pipeline for the search boxes, both the one at the top-right of the page with the incremental suggester, and the main text search function. (Which, curiously, still give different suggestions -- the incremental suggester is usually better). Can anyone give an update on the progress of these? Are they now implemented, or are further adjustments still coming?

In particular, I note that currently when I key "Property:named as" into the search, not only does subject named as (P1810) not appear at the top of the list, it does not in fact appear in the list returned at all!

(Instead I think "named" gets stemmed to "name", and then various hits come back containing the word "name" -- indeed those seem to come back higher than hits containing the word "named" itself).

Similarly, searching for "Property:stated as", the property object named as (P1932) only appears at #3 on the list returned, despite being an exact match for the words keyed in.

Pinging User:Smalyshev (WMF) -- where are we currently at on this? Are there modifications you're still looking at? Jheald (talk) 16:47, 2 March 2018 (UTC)[reply]

(Which, curiously, still give different suggestions -- the incremental suggester is usually better). Thank you, this is because completion suggester is driven by the new code, while the fulltext still uses the old one. Watch https://gerrit.wikimedia.org/r/c/380895/, that improvement is coming to fulltext too. Not sure why your property search didn't work, I'll check into it. Smalyshev (WMF) (talk) 22:13, 2 March 2018 (UTC)[reply]

Create "cmd.exe command" item or add "instance of: command", "part of:cmd.exe" to every cmd command?

I have already created "cmd.exe command" and "command.com command" items and I started to add to them all their commands. However, some commands are available in some Windows versions and not others, so I should create "Windows 7 cmd.exe command", "Windows 8 cmd.exe command", "Windows 7 PowerShell command", "Windows 8 PowerShell command"... The same result can be reached adding the properties "instance of: command" and "part of: cmd.exe | command.com | powershell" to every command items. Which of the two approaches is better?--Malore (talk) 16:24, 3 March 2018 (UTC)[reply]

WikiProject Informatics has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.
@Malore: Out of curiosity, do cmd.exe and command.com evolve? I doubt there is any problem for them.
The same result can be reached adding the properties "instance of: command" and "part of: cmd.exe": OK if you have a « Windows 8 cmd.exe » item, but if you just have « cmd.exe » it’s not as expressive.
Apart from this, it does not seem a good idea to tie PowerShell command versions to Windows versions; tie them to the PowerShell version directly (see https://en.wikipedia.org/wiki/PowerShell#Versions ) or to « Windows Management Framework », as there are several Windows variants (server and so on) and each of these may or may not include a given PowerShell version.
For a more direct answer to your question, I wrote the template {{All instances}}, which allows listing, from items like COMMAND.COM command (Q50320434), all commands (or instances of the class), whether or not they are explicitly instances of the class. For example: all instances of « command.com commands ». See the documentation of the template for more information on how it works, but in summary both of your proposed approaches would work with this template. author  TomT0m / talk page 13:26, 5 March 2018 (UTC)[reply]
  • @Malore: Are you trying to model the abstract specification of commands interpreted by "cmd.exe" (parameters, effect, etc)? Or are you trying to model different versions of "cmd.exe" software? If modelling commands of "cmd.exe", then create a new item each time a new release of "cmd.exe" changes the parameters, effect or other behaviour of a command. If modelling software, create a new item for each version/build of "cmd.exe". Dhx1 (talk) 11:11, 5 March 2018 (UTC)[reply]
@TomT0m: My main fear was to create too many items (like "Windows 8 cmd.exe command") that turned out to be useless because the same result could be easily achieved by a more versatile query (like your template). Thank you very much for the template.--Malore (talk) 00:41, 8 March 2018 (UTC)[reply]
@Malore: My template needs the items anyway to generate the queries, whether or not they are used in « instance of » statements. The solution that uses those items comes with fewer statements, however. (I’m developing the converse function in Module:class: search by « main snak » instead of searching by class. The idea is that we find items if they have a statement with a main snak <prop:val>, or if they are instances of a class that declares that its instances have a <prop:val> statement.) author  TomT0m / talk page 07:41, 8 March 2018 (UTC)[reply]
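The two modelling approaches discussed in this thread can both be covered by a single query, roughly as sketched below. This is not the code behind {{All instances}}; it is only an assumption-laden illustration. Q50320434 is the COMMAND.COM command item mentioned above, while the other two QIDs are placeholders standing in for the generic "command" item and the shell item.

```python
def shell_commands_query(class_qid: str, generic_command_qid: str, shell_qid: str) -> str:
    """Build a SPARQL query that finds commands modelled either way.

    Branch 1: the command is an instance of the dedicated class
              (or of one of its subclasses).
    Branch 2: the command is an instance of a generic "command" item
              and is stated to be part of (P361) the shell.
    """
    return f"""SELECT DISTINCT ?command WHERE {{
  {{ ?command wdt:P31/wdt:P279* wd:{class_qid} . }}
  UNION
  {{ ?command wdt:P31 wd:{generic_command_qid} ;
             wdt:P361 wd:{shell_qid} . }}
}}"""


# Placeholders, NOT real items: replace with the actual "command" and shell items.
GENERIC_COMMAND = "Q999999991"
SHELL = "Q999999992"

print(shell_commands_query("Q50320434", GENERIC_COMMAND, SHELL))
```

Because the UNION accepts either pattern, consumers of the data do not depend on which of the two approaches was used for a given command.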

Inherently ambiguous birth dates

I want to flag instances of humans that have an "inherently ambiguous birth date". Some people were born without birth certificates at home in rural poverty areas in the 1800s. For some people up to the 1600s records do not exist and we can only give the year they were born. Some people have given different years in different documents. I have found multiple people, especially celebrities, who keep making themselves younger as they age in their official documents. Unless these people that give multiple birthdates have a birth certificate online, they are one category of people that have an "inherently ambiguous birth date". I have been adding in hundreds of full birthdays where only the year was known, based on the WWI and WWII draft or passport applications for people in the USA. Sometimes the value is already in Wikipedia and Wikidata has not been updated. I run this: tinyurl.com/o26zc83 However, after searching and finding nothing, I want to create a field to let myself know a search has been done and nothing has been found, so I do not keep looking when I run the program again a month later. One day in the future when more records are online someone can search all the humans that have an "inherently ambiguous birth date" and look again. This would be a "Wikidata-specific criterion". Can someone suggest a scheme? Does anyone else see this as useful? --RAN (talk) 21:24, 3 March 2018 (UTC)[reply]

@Richard Arthur Norton (1958- ): Maybe add a qualifier of sourcing circumstances (P1480) = presumably (Q18122778) (or a new QID for 'ambiguous')? Thanks. Mike Peel (talk) 00:10, 4 March 2018 (UTC)[reply]
@Richard Arthur Norton (1958- ):
if I understand correctly what you want, it is a mean to not check again each date, because you want to know which have already been checked ?
for this purpose, on VIAF ID (P214) and Bibliothèque nationale de France ID (P268) which did not exist when I checked, I use retrieved (P813) as qualifier. Could this help you ? --Hsarrazin (talk) 08:46, 5 March 2018 (UTC)[reply]
Yes, both schemes are good ideas. I will see which one works best and then modify my SPARQL search to look for this qualifier, so I do not search the same people over and over.
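A sketch of how the modified search might skip people who have already been checked, assuming the "already checked" marker is stored as a qualifier on the date of birth (P569) statement, either retrieved (P813) or sourcing circumstances (P1480) as suggested above. The function name is made up, the query behind the tinyurl link is not reproduced here, and a query over all humans may well time out without further restrictions.

```python
import re


def unchecked_year_only_births_query(marker_property: str = "P813", limit: int = 200) -> str:
    """Build a SPARQL query for humans whose birth date has year precision only
    and whose birth-date statement does not yet carry the marker qualifier."""
    if not re.fullmatch(r"P[1-9]\d*", marker_property):
        raise ValueError(f"not a property ID: {marker_property!r}")
    # wikibase:timePrecision 9 means the date is only known to the year.
    return f"""SELECT ?person ?birth WHERE {{
  ?person wdt:P31 wd:Q5 ;
          p:P569 ?statement .
  ?statement psv:P569 ?value .
  ?value wikibase:timePrecision 9 ;
         wikibase:timeValue ?birth .
  FILTER NOT EXISTS {{ ?statement pq:{marker_property} ?marker . }}
}}
LIMIT {limit}"""


print(unchecked_year_only_births_query("P813"))
```

Passing "P1480" instead selects the other proposed scheme, so both can be tried against the same data before settling on one.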

Bracteates

bracteate (Q848960) currently covers migration period/ancient pendants and medieval coins that were based on these pendants. These probably need to be two separate items. There are two AAT identifiers and the dates (when we add them) will be different. Some of the linked Wikipedia articles cover both, some only one or the other. Does anyone with expertise want to take a stab at disentangling these items? - PKM (talk) 04:29, 4 March 2018 (UTC)[reply]

@PKM: I can have a look. Breg Pmt (talk) 18:23, 5 March 2018 (UTC)[reply]

New « season » property and « part of »

Just stumbled on the « season » property proposal (currently under discussion) through this impressive diff: https://www.wikidata.org/w/index.php?title=Property_talk:P361&diff=643090919&oldid=634204131&diffmode=source and saw that the argument for creating it is that using « part of » created mass constraint violations. I think the issue actually is that we require « part of » to be an inverse of « has part », and that we require the inverse claim to actually be present. But « part of » is also transitive, so if we require explicit claims we would also need to include all « has part » statements transitively on the bigger wholes, which seems rather impracticable. Why do we actually require such statements to have explicit inverses, by the way? « part of » is transitive, but we actually require, as in several other cases, linking to the smallest whole an item is part of, and not adding a « part of » statement if there is already a « part of* » path from the smaller item to the bigger one. This is a tension between constraint semantics and inference semantics that is uncomfortable to resolve in the current state.

Markus Krötzsch Svavar Kjarrval TomT0m Emw Bovlb Peter F. Patel-Schneider Daniel Mietchen Akorenchkin (Maximilian Marx) YULdigitalpreservation Jsamwrites (John Samuel) Waldyrious Malore David L Martin (David Martin) Arlo Barnes (Arlo James Barnes) alonsopaz23 (Leopoldo Alonso Paz Hernández) AWesterinen (Andrea Westerinen)

Notified participants of WikiProject Reasoning WikiProject Properties has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. WikiProject Ontology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead.

There is also the option, and I think it has been used, of overloading « series » to handle season items. A season is, after all, also a sequence of episodes, and we know the object is a season of something, or another kind of subsequence. There is also the possibility of qualifying « preceded by » with « series » because, say for Star Wars, there are several ways to order the episodes: by narrative order or by broadcast order, and there may be « in between » episodes that appear later. We don’t solve this with « series » or « season » alone. Different orders may lead to different « series » items, which could be used in different « preceded by » statements qualified by « series: original star wars order », or « preceded by: ep 1 prequel -> series: extended star wars narrative order ». « Season » seems to be a popular solution, but honestly I don’t think it’s really useful or especially expressive: we already know that « seasons » are sequences of episodes and that they are seasons of some longer show like a TV series, and it does not solve the more complex issues, so more creative solutions still have to be invented :/ To me it’s a false good idea. WikiProject Movies has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. @Jura1: author  TomT0m / talk page 14:20, 4 March 2018 (UTC)[reply]

You wrote "we require that « part of » to be actually an inverse of « part of »". Is it a typo? Thanks! Syced (talk) 02:08, 5 March 2018 (UTC)[reply]
corrected. author  TomT0m / talk page 09:15, 5 March 2018 (UTC)[reply]
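The transitivity point raised in this thread can be made concrete with a small sketch: treat « part of » statements as edges of a graph, infer the larger wholes by following paths, and flag any direct statement that is already implied by a longer path. The item names are toy labels, not real items, and the functions are hypothetical illustrations rather than anything Wikidata's constraint system does.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return every whole reachable from start by following part-of edges."""
    graph = defaultdict(set)
    for part, whole in edges:
        graph[part].add(whole)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for whole in graph[node]:
            if whole not in seen:
                seen.add(whole)
                stack.append(whole)
    return seen


def redundant_statements(edges):
    """Return the (part, whole) statements that are implied by the other statements."""
    edges = set(edges)
    redundant = set()
    for part, whole in edges:
        others = edges - {(part, whole)}
        if whole in reachable(others, part):
            redundant.add((part, whole))
    return redundant


statements = {
    ("episode", "season"),
    ("season", "series"),
    ("episode", "series"),  # implied by the two statements above
}

print(sorted(reachable(statements, "episode")))
print(redundant_statements(statements))
```

Under inference semantics the third statement adds nothing, whereas under a strict inverse constraint every one of these statements would also need a matching « has part » claim on the whole.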

L10n help and multilingual advice needed for bot

I'm working on a task for my bot which would find items about villages and then a) remove disambiguation from labels' names, b) set all Latin-script languages' labels to the same thing (if there was no prior disagreement among the labels), and c) describe the village along the pattern of "village in <parent entity>, <grandparent entity>, <great-grandparent entity>" (see RFBOT for full documentation). With tasks (b) and (c), I run into an l10n (localization) issue. I'll start with the simpler one, task (c). Right now, my bot only sets a description in English. I'd like to support more languages, though. So I invite people to tell me how to say "village in ..." in their language, as well as what format the language uses for nested parent entities, if it isn't "a, b, c".
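The description pattern for task (c) could be produced along these lines. This is a sketch only, not the bot's actual code: the function and parameter names are invented, and the only wording taken from the post is the English "village in ..." with "a, b, c" nesting. Languages that inflect place names would need more than a prefix and a separator.

```python
def village_description(ancestors, prefix="village in ", separator=", "):
    """Build a description from the parent, grandparent, ... entity names.

    ancestors is ordered from the closest parent outwards; empty or
    whitespace-only names are skipped.
    """
    names = [name.strip() for name in ancestors if name and name.strip()]
    if not names:
        raise ValueError("at least one parent entity is required")
    return prefix + separator.join(names)


# Placeholder names standing in for the labels of the parent entities.
print(village_description(["Parent", "Grandparent", "Great-grandparent"]))
```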

For (b), I was initially using all 196 exclusively-Latin-script languages supported by Wikidata, but Ymblanter observes that some languages, e.g. Crimean Tatar ([crh]), won't use identical labels to most other Latin-script languages. So that's the first thing I'm asking for help on from the Wikidata community: Can people help me pick out languages in which the native name for a village would reliably be a valid label, or where the native name could easily be turned into a valid label with some RegEx magic? (90.191.81.65 raises a valid concern, namely that some languages will use exonyms for certain villages. However, since this only sets labels when no previous one existed, I don't see that as a problem. The worst-case scenario is that we'll have an imperfect-but-not-incorrect label instead of no label at all.)

These are the 196 exclusively Latin-script languages I originally identified:

Here are the ones I've identified so far as being almost certainly safe to use (i.e. major Germanic and Romance languages):

español, English, Simple English, British English, Canadian English, português, Deutsch, français, italiano, Nederlands, svenska, dansk, norsk bokmål, norsk nynorsk

Here are the ones I've identified as being most likely safe to use:

Interlingue, interlingua, Patois, Ligurian, Norfuk / Pitkern, Ænglisc, Deitsch, føroyskt, Plattdüütsch, Frysk, Nordfriisk, asturianu, Boarisch, estremeñu, Ripoarisch, vèneto, Esperanto, furlan, lumbaart, Napulitano, Papiamentu, sicilianu, sardu, Limburgs, corsu, Alemannisch, Picard, Nedersaksies, emiliàn e rumagnòl, Österreichisches Deutsch, Mainfränkisch, Zeêuws, Gegë, Lingua Franca Nova, tarandíne, jysk, Plautdietsch, íslenska, Latina, català, kréyòl gwiyanè, română, Lëtzebuergesch, aragonés, galego, rumantsch, occitan, Afrikaans, Scots, arpetan, Piemontèis, Nouormand, Mirandés

I reckon the next-most likely candidates would be any other Indo-European languages, followed by languages from other families.

Thanks for any assistance that anyone is able to provide. — PinkAmpers&(Je vous invite à me parler) 21:27, 4 March 2018 (UTC)[reply]
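For task (b), the rule "set all labels to the same thing if there was no prior disagreement, and only where no label existed" might look like the sketch below. The function name is hypothetical, and the language codes are my own mapping of the "almost certainly safe" names listed above, so they should be checked before use.

```python
def labels_to_add(existing_labels, safe_languages):
    """Return the {language: label} pairs that could be added to an item.

    existing_labels holds the item's current Latin-script labels. Nothing is
    proposed if those labels disagree (or if there are none), and languages
    that already have a label are never overwritten.
    """
    distinct = set(existing_labels.values())
    if len(distinct) != 1:
        return {}
    (label,) = distinct
    return {lang: label for lang in safe_languages if lang not in existing_labels}


# Assumed codes for a subset of the "almost certainly safe" list.
SAFE = ["es", "en", "pt", "de", "fr", "it", "nl", "sv", "da", "nb", "nn"]

print(labels_to_add({"en": "Examplevillage", "de": "Examplevillage"}, SAFE))
print(labels_to_add({"en": "Examplevillage", "de": "Beispieldorf"}, SAFE))
```

Keeping the safe list as an explicit allow-list matches the stated plan of ruling languages in rather than ruling them out.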

  • Fortunately, I don't see Polish here, but I would suggest caution with many of the languages mentioned above, as many of them are inflected languages (so using a bot to add "village in..." would not be possible without knowing the declension of each word). What's more, no data is better than an imperfect/incorrect label or description: by adding such data you can cause a mistake to spread even outside Wikimedia projects. So the worst-case scenario is really the addition of imperfect data, not the lack of it. Wostr (talk) 00:07, 5 March 2018 (UTC)[reply]
  • @Amire80:, Amir, maybe you have some ideas, or know who might?--Ymblanter (talk) 13:07, 7 March 2018 (UTC)[reply]
    Some thoughts:
    • I suggest asking on Translators-L or maybe even Wikimedia-L.
    • In general, calling it "L10n" is quite misleading, because it's a rather different task. It's more like multilingual consultation, and each language may need a special approach.
    • Other languages that are likely to be unsafe: az, lv, lt. But please verify. There may be more.
    • Is it actually good to replicate a lot of labels? This might be perceived as a confirmation that they are actually written identically in these languages. The fallback doesn't work perfectly at the moment (see this bug), but it's better to fix the fallback than to replicate a lot of labels. --Amir E. Aharoni (talk) 13:29, 7 March 2018 (UTC)[reply]
    @Amire80: Fair point about "l10n". I meant it more in reference to the first request, so I've updated the section title to reflect that. Anyways, I'll email Translators-L when I get the chance. And to be clear, at this point I'm more planning on ruling in languages that are safe rather than ruling out ones that are unsafe. Also, could you please clarify your "confirmation" point? Are you talking about the endonym/exonym question, or a transcription question? Because, if it's the latter, I posit that languages can be divided into two categories: those where the label would always be the same as the native name (except in the rare cases that there's an exonym), and those where they would not always be. My whole goal here is to figure out which languages are in which category. — PinkAmpers&(Je vous invite à me parler) 22:06, 7 March 2018 (UTC)[reply]
    My "confirmation" point is pretty simple: If your bot adds a label in, say, Turkish, to Pedro II (Q1934329), somebody somewhere may think that "Pedro II, Piauí" is the correct way to write this name in Turkish. Maybe it is, and maybe it isn't; are you sure?
    If you're less than 100% sure, then what is the benefit of filling this label?
    And even if you are 100% sure, the question still stands: what is the benefit of filling this identical label?
    I'm not a very big Wikidata expert, so it's conceivable that I'm missing something. --Amir E. Aharoni (talk) 22:23, 7 March 2018 (UTC)[reply]
  • Maybe it would be good to have a plan on maintenance of these descriptions after an initial run, e.g. what to do if the layers change or are found to be incorrect. If it uses essentially cebwiki, maybe a first step should be to cross-check that data.
    --- Jura 13:36, 7 March 2018 (UTC)[reply]
    @Jura1: Cross-check it with what? — PinkAmpers&(Je vous invite à me parler) 22:06, 7 March 2018 (UTC)[reply]

P1461

Patientplus ID (P1461): shouldn't this property be an external-identifier? I've just noticed this alongside the 'normal' properties in ubidecarenone (Q321285). Wostr (talk) 00:09, 5 March 2018 (UTC)[reply]

Yes. The only comment about them at Identifier migration/1 is "only 96.77% unique out of 773 uses". A quick check of the constraint report showed that roughly half of the bad matches were for disambiguation pages; I have removed them. The rest seem to be Bonnie-and-Clyde cases, a few of which require somebody with medical knowledge to resolve. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:26, 5 March 2018 (UTC)[reply]
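For reference, a figure like "96.77% unique out of 773 uses" can be computed as sketched below, assuming "unique" means the number of distinct values divided by the number of uses. That is one plausible reading, not a confirmed description of how the migration page derived its number, and the sample values are toy data.

```python
from collections import Counter


def uniqueness_report(values):
    """Return (percent_unique, duplicated_values) for a list of identifier values."""
    if not values:
        raise ValueError("no values to analyse")
    counts = Counter(values)
    # Values used on more than one item are the candidates for manual review.
    duplicated = sorted(value for value, n in counts.items() if n > 1)
    percent_unique = round(100 * len(counts) / len(values), 2)
    return percent_unique, duplicated


# Toy identifiers, invented for illustration.
print(uniqueness_report(["a", "b", "b", "c"]))
```

The list of duplicated values is the useful part in practice: it corresponds to the cases that had to be checked by hand above, such as disambiguation pages and Bonnie-and-Clyde items.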
For consistency, we should convert other similar ones as well if we convert this one.
--- Jura 07:17, 7 March 2018 (UTC)[reply]

Given names as child (P40)?

Noticed, that Antonio Cavalieri Ducati (Q15059980) has male given name (Q12308941) type values for child (P40). Of course, it's wrong, but maybe we can allow them (maybe a specific property)? Having names of children would be better than having simply number of children (P1971), imho. And sometimes the only information about children is their name (and not always the surname, which may be different from parents). Of course, they would be notable per WD:N, but... --Edgars2007 (talk) 07:53, 5 March 2018 (UTC)[reply]

I've removed them. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:16, 5 March 2018 (UTC)[reply]

Q2965940 - Christine Laurent

Christine Laurent (Q2965940) was created as representing fr:Christine Laurent. It has since - and several times - been re-purposed as "P31 conflation (Q14946528) of Christine Laurent (Q45180949) + Christine Laurent (Q45180738)". Nonetheless, it currently includes numerous external IDs. This re-purposing does not seem helpful. How should it be resolved? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 11:02, 5 March 2018 (UTC)[reply]

descriptions for tv series episodes

I'm just adding descriptions for tv series episodes in en, fr and de. You might add some other languages descriptions with that query. Queryzo (talk) 13:48, 5 March 2018 (UTC)[reply]

this query is for tv seasons, which I add now for de. Queryzo (talk) 16:39, 5 March 2018 (UTC)[reply]
@Queryzo: Don’t know if this is related, but I see edits like this https://www.wikidata.org/w/index.php?title=Q2817294&curid=2698014&diff=644038140&oldid=629359946 on series seasons. It seems that the description in those cases contains less information than the label … That does not seem like a good idea, since it can shadow automatic descriptions that may sometimes be more informative. author  TomT0m / talk page 11:56, 6 March 2018 (UTC)[reply]

A description should (in combination with the label) be suitable for identifying the subject, see Help:Description/de. In fact, a proper description would have been "Staffel einer Fernsehserie", but this is not very common right now. Queryzo (talk) 12:30, 6 March 2018 (UTC)[reply]
@Queryzo: Not really. You’re not identifying the subject with this kind of description, or worse, with your last suggestion, you’re just giving its type, which is the same as for many other objects, so it does not help identify the subject (identifying would mean expressing how it differs from other similar objects). Quote:

« A useful template for creating definitions […] is provided by what are called Aristotelian definitions, which is to say definitions of the form «S = def. a G which Ds»where ‘G’ (for: genus) is the parent term of ‘S’ (for: species) in some ontology. Here ‘D’ stands for ‘differentia’, which is to say that ‘D’ tells us what it is about certain Gs in virtue of which they are Ss. An example Aristotelian definition (from the Foundational Model of Anatomy Ontology): « cell =def. an anatomical structure which consists of cytoplasm surrounded by a plasma membrane » «plasma membrane =def. a cell part that surrounds the cytoplasm » »

from the recommendations for descriptions, which I found quite good, in another project (quoted from https://pdfs.semanticscholar.org/6ff2/f127a6c75cd3461eff16ad62a4d0b0b5a090.pdf). author  TomT0m / talk page 14:01, 6 March 2018 (UTC)[reply]
There is no need to identify a subject by the description alone; e.g. there have been a lot of former Prime Ministers of the United Kingdom, but Margaret Thatcher (Q7416) is only described as "Former Prime Minister of the United Kingdom". The only reason to make a description more specific is a possible ambiguity in combination with the label! In the example above this would be the case if there were two series named "Operación Triunfo" with an eighth season. This would mean that "Operación Triunfo/Staffel 8" exists twice, so I would have to specify descriptions with "Staffel von Operación Triunfo (year or sth.)" and "Staffel von Operación Triunfo (the other year)". The number of the season is sufficient in the label. Queryzo (talk) 14:55, 6 March 2018 (UTC)[reply]
That’s not really what you did in the edit I quoted; you just repeated information from the label. author  TomT0m / talk page 15:10, 6 March 2018 (UTC)[reply]

Wikidata weekly summary #302

P2306

Why does property (P2306), when added to property constraint (P2302): required qualifier constraint (Q21510856), have to be split if there is more than one required qualifier? It does not make sense; I merged a few property (P2306) values into one on some properties, but after I got a notification that it should be 'single value', I reverted my edits. But I can't see any reason why it should be like this (every required qualifier in a different required qualifier constraint (Q21510856)). Wostr (talk) 16:53, 5 March 2018 (UTC)[reply]

@Wostr, Jura1: Merging all required qualifiers into a single constraint statement doesn’t allow you to mark only some of them as constraint status (P2316): mandatory constraint (Q21502408). See Help talk:Property constraints portal/Mandatory qualifiers#Modeling of multiple qualifiers in constraint statements. --Lucas Werkmeister (WMDE) (talk) 10:29, 6 March 2018 (UTC)[reply]
It is still not clear to me (and I can't find it on any pages related to these properties) why I should add constraint status (P2316): mandatory constraint (Q21502408) to property constraint (P2302): required qualifier constraint (Q21510856). Both say that the listed qualifiers are mandatory, or am I missing something? Wostr (talk) 13:34, 6 March 2018 (UTC)[reply]
One is to indicate that this would be the dream situation, the qualifier just marks that dream as currently fulfilled. Sjoerd de Bruin (talk) 17:27, 7 March 2018 (UTC)[reply]
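The modelling point made in this thread can be sketched in a few lines of Python. This is only an illustration, not how Wikidata stores constraints: the class and function names are hypothetical, and P580/P582 are arbitrary examples of required qualifiers. It shows that when each required qualifier sits in its own constraint statement, a constraint status (P2316) can be attached to some of them and not to others.

```python
from dataclasses import dataclass
from typing import Optional

MANDATORY = "Q21502408"  # mandatory constraint


@dataclass
class RequiredQualifierConstraint:
    """Hypothetical model of one 'required qualifier constraint' (Q21510856) statement."""
    qualifier: str                # value of 'property' (P2306)
    status: Optional[str] = None  # value of 'constraint status' (P2316), if set


def mandatory_qualifiers(constraints):
    """Return the qualifiers whose constraint statement is marked mandatory."""
    return [c.qualifier for c in constraints if c.status == MANDATORY]


# One statement per required qualifier: only the first is marked mandatory.
constraints = [
    RequiredQualifierConstraint("P580", status=MANDATORY),
    RequiredQualifierConstraint("P582"),
]
print(mandatory_qualifiers(constraints))  # ['P580']
```

With a single merged statement there would be only one `status` field for all qualifiers, so this per-qualifier distinction could not be expressed.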

Lift the ban of fields synthesized from other fields

Currently we ban the creation of a data field containing a url if it can be synthesized from information from another field. It would be much better if Wikidata had a field called "Worldcat url" and "CIA World Fact Book url". For instance "Worldcat url" should be synthesized automatically from the LCCN_ID into a clickable url stored directly in Wikidata. We only get a link to Worldcat if that person has an entry in Wikipedia where it is synthesized on the fly from Wikidata. Some people use Wikidata directly as a source of information. These proposals were previously dismissed.

-- RAN (talk) 17:13, 6 March 2018 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── We cater for links like http://www.worldcat.org/identities/lccn-n50051493/, using third-party formatter URL (P3303), as I have just added here. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 17:39, 6 March 2018 (UTC)[reply]

Maybe what we need is a way to create special "derivative" properties that just rely on third-party formatter URL (P3303) attached to an existing property? third-party formatter URL (P3303) on its own seems to be insufficient - at least the ability to attach a label etc. Maybe something can be done with qualifiers on third-party formatter URL (P3303)? Wikipedia templates that use these things need some mechanism to select the right formatter url... ArthurPSmith (talk) 18:44, 6 March 2018 (UTC)[reply]
I had a proposal for that at meta:2017_Community_Wishlist_Survey/Wikidata#Create_a_new_class_of_statements_which_are_automatically_generated_based_on_a_query but it did not have much support. --Jarekt (talk) 20:46, 6 March 2018 (UTC)[reply]
@Jarekt: I would have supported it if I had known it existed; unfortunately I did not pay much attention to the wishlist this time. Did you make enough noise in here about this ;) ? author  TomT0m / talk page 13:52, 7 March 2018 (UTC)[reply]
  • Wikidata items are not built for reading directly. Forking data would not be helpful. Use Lua or a userscript to change display where necessary. --Yair rand (talk) 19:20, 6 March 2018 (UTC)[reply]
    • Yes - but, this seems to be a common complaint recently. How can we make that work better/easier? Formatter URL's are handled specially by wikidata to provide links for ID's; can we make the 3rd party ones more functional somehow? Just doing it in Lua makes our P3303 entries worthless, you just code the URL directly. ArthurPSmith (talk) 20:15, 6 March 2018 (UTC)[reply]
      • Not sure I agree on "worthlessness" of P3303. Significantly easier to build a URL in something like SPARQL or Listeria or Reasonator or an external app if there's a formatter template, as per e.g. Property_talk:P1630#Using_from_within_WDQS.
      Also perhaps worth noting that formatter URI for RDF resource (P1921) is now used to create a fully-fledged linked-data url for external IDs, accessible from SPARQL via eg p:P1014/psn:P1014.
      And I don't think I agree with User:Yair rand either, that direct readability of Wikidata items is irrelevant. I suspect the take-up of relative position within image (P2677) would be a lot stronger if there was a formatter linking the value directly to an immediately visible image detail. Jheald (talk) 23:53, 6 March 2018 (UTC)[reply]
  • You are assuming that a typical user would know how to construct a query to get information that they do not know even exists. When I first asked about the Worldbook ID half of the responders did not know where the value was located. Are we running out of server space? I do not see any down side to this at all. It seems the no votes are against it for ideological reasons, not practical reasons. And, yes, people do use Wikidata directly because many entries do not appear in Wikipedia. Just as I use VIAF directly to identify people in the Library of Congress image collection. The practical usefulness of having the link in the entry on that person should override objections based on the fact that the information can be synthesized if the end user knows that the data exists, and where it is stored, and can construct a query using one of our tools, assuming that they know the query tool exists. --RAN (talk) 00:04, 7 March 2018 (UTC)[reply]

A middle-ground solution might be to extend the external identifier datatype to allow several formatter URLs? @Lea Lacroix (WMDE): ? author  TomT0m / talk page 13:55, 7 March 2018 (UTC)[reply]

How do you imagine that to work? How would the software know when to use which formatter URL? Lea Lacroix (WMDE) (talk) 16:18, 7 March 2018 (UTC)[reply]
There could be a « main » formatter, which would be used exactly as the current one is, and a list of « secondary » ones, used to generate as many « secondary » URIs. In RDF the values would be expanded only on the full value of the statement and not as a truthy one. I have no idea about the display on Wikidata pages, however. I guess for usability it’s best to display all the URIs without the user having to click. Don’t know if that would be OK for the UI :) author  TomT0m / talk page 17:39, 7 March 2018 (UTC)[reply]
We can already add multiple formatter URLs; and we have third-party formatter URL (P3303). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:15, 8 March 2018 (UTC)[reply]
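For readers unfamiliar with formatter URLs, the substitution discussed in this thread can be sketched as follows. This is a minimal sketch: the function name is hypothetical, and the WorldCat pattern is inferred from the example link quoted above rather than copied from the property page. Formatter URLs contain a `$1` placeholder that is replaced by the identifier value.

```python
def expand_formatter(formatter_url, identifier):
    """Replace the $1 placeholder of a formatter URL with an identifier value."""
    if "$1" not in formatter_url:
        raise ValueError("formatter URL has no $1 placeholder")
    return formatter_url.replace("$1", identifier)


# Assumed third-party formatter, inferred from the WorldCat link in this thread.
WORLDCAT_FORMATTER = "http://www.worldcat.org/identities/lccn-$1/"

print(expand_formatter(WORLDCAT_FORMATTER, "n50051493"))
# http://www.worldcat.org/identities/lccn-n50051493/
```

The same function would work for a main formatter and for any number of secondary or third-party ones, which is why having the pattern stored as data is convenient for external tools.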

Deriving text statistics (such as readability scores)

I am currently developing a few bots which have access to written text, but currently ignore/discard it. For items such as Request for Comments (Q212971), news article (Q5707594), treaty (Q131569), statute (Q820655) where the content of written text (minus headers, footers, page numbers, image captions, etc) can be separated easily, it would be easy to generate some text statistics including various readability test (Q2114712) and counts of word (Q8171), sentence (Q41796), quotation (Q206287), syllable (Q8188) and perhaps even phoneme (Q8183) (if the text were spoken). A library such as textstat (Python) could be used to generate readability test (Q2114712) scores, and extraction of other metadata about text is possible using other readily available software. The difficulty I have is how this information could be included within Wikidata in a way that has accurate and reliable sourcing. The source I had in mind was the value of full work available at URL (P953) (the text itself) with determination method (P459) linking to a readability test (Q2114712) or other method, as well as a specific implementation of text statistic extraction software (including version/build numbers that would be needed to fully replicate the result). Otherwise, a web service on wmflabs (or elsewhere) could accept a URL from a supported domain, extract the text content, and return a page or data with the statistics generated. The source would then be reference URL (P854) linking to the web service results page. Does anyone have feedback or ideas on whether this idea is viable and suitable for Wikidata? Dhx1 (talk) 02:08, 7 March 2018 (UTC)[reply]
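As a rough illustration of the kind of statistics described above, here is a standard-library-only sketch. It does not use textstat; the function names are hypothetical and the syllable counter is a crude vowel-group heuristic, so its scores will differ from those of a dedicated library. The readability score shown is the Flesch Reading Ease formula.

```python
import re


def count_syllables(word):
    """Crude heuristic: count runs of consecutive vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_stats(text):
    """Return sentence, word and syllable counts plus a Flesch Reading Ease score."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text has no sentences or words")
    syllables = sum(count_syllables(w) for w in words)
    score = (206.835
             - 1.015 * (len(words) / len(sentences))
             - 84.6 * (syllables / len(words)))
    return {
        "sentences": len(sentences),
        "words": len(words),
        "syllables": syllables,
        "flesch_reading_ease": round(score, 2),
    }


print(text_stats("The cat sat. The dog ran."))
```

Because the result depends on the tokenisation and syllable rules, recording the exact implementation and version alongside the value, as suggested above, would be needed for anyone to reproduce it.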

11 days left to submit proposals for Wikimania 2018

Hello all,

As every year, the call for submissions for Wikimania 2018 is running, and the deadline for talks, workshops and posters is March 18th. We hope that the Wikidata community will be well represented in the program and in the attendees.

This year, the system of submission goes through OpenChair, therefore the submissions will not be public. You won't be able to check other Wikidata-related submissions. So if you plan to submit something, I encourage you to describe your idea on Wikidata:Wikimania 2018. On this page, you can also see the ideas of other people, and suggest your help for one of them.

Here are a few ideas that have been suggested by the development team, where we would love to support volunteers: SPARQL workshop for beginners, SPARQL workshop for confirmed users, how to model lexicographical data for your language, explaining the community processes on Wikidata, showing useful tools around Wikidata...

When you do your submission through the official process, feel free to also edit Wikidata:Wikimania 2018 and add your idea into the "proposals submitted" section. That will help the other editors to avoid duplicates.

Last but not least, I also added a section "I'm attending", to have an overview of who from the Wikidata community will attend Wikimania this year. Please register here if you already know (even if you're waiting for a scholarship result, for example). This list can also help you find more volunteers to run a workshop or participate in a discussion.

Thank you very much, Lea Lacroix (WMDE) (talk) 09:18, 7 March 2018 (UTC)[reply]

Things covered by Commons categories, but not by Wikidata

When trying to improve coverage of sleds, I noticed that Commons had many types of sleds Wikidata was lacking. As many can have items at Commons, I created the relevant ones at Wikidata. Obviously, we don't have an equivalent of (non existing) "Category:Green sled pulled by girl", but cutter sleigh (Q50181142) should be here.

The other day, another user improved coverage of ships with much content from Commons.

As Commons is likely to rely more on Wikidata items to describe things, I think we should attempt to reach coverage similar to that of Commons categories. The question is in which fields we need to improve. What are your suggestions?
--- Jura 11:57, 7 March 2018 (UTC)[reply]

Aircraft are an obvious case to look at. But in general, we can do a lot of good by improving the sitelinks to Commons, and identifying cases (there are a lot of them) where commons has images of things that don't have P18 values here. Thanks. Mike Peel (talk) 13:14, 7 March 2018 (UTC)[reply]
Commons has a lot of categories which are intersections of multiple concepts (c:Category:Portrait paintings of women of Spain in national costumes is an intersection of portrait (Q134307), woman (Q467), Spain (Q29) and traditional costume (Q3172759)). Those should probably not have items unless we already have items based on other projects (see list of French artists (Q3246016)). But single-topic Commons categories could be a great source of candidates for new items. I agree with the need to improve image (P18) coverage, but another very important massive task is to clean up some constraint violations for Commons-related properties:
Especially Wikidata:Database reports/Constraint violations/P373 are troubling since that property is used so much. User:Ivan A. Krestinin, what should be done to get P18 and P935 Constraint violations reports working again? --Jarekt (talk) 14:50, 7 March 2018 (UTC)[reply]
  • I'm not sure if cleaning up P373 would help us identify categories that aren't covered yet. Maybe a cleaner version would make filtering easier, but to find an actual gap?
    Going through possible images for sleds I did find a few items that we needed, but these didn't necessarily have categories. Sometimes there are simple things Wikidata lacks, e.g. tennis shoe (Q48978644). In the meantime, that one even got an identifier.
    --- Jura 19:18, 7 March 2018 (UTC)[reply]
Cleanup of P373 and P18 is not going to help with sled items. However we need to get the number of issues under control since poor data quality affects people experience. Links to non-existing categories or images can break tools or just be annoying. --Jarekt (talk) 19:35, 7 March 2018 (UTC)[reply]
I'm not sure if it's worth fixing P373. Eventually, we might just drop it. As for P18, maybe some development is needed to ensure that this keeps working. Some checks could also be done by various WikiProjects, e.g. Wikidata:WikiProject Q5/reports/identical P18.
--- Jura 19:53, 7 March 2018 (UTC)[reply]
Jura, if you look at Property_talk:P373 at the section that lists all the templates that use P373, you will see a few hundred templates. Since sitelinks to Commons are unpredictable as to what namespace they will link you to, P373 becomes like a sitelink, which is unfortunately stored as a string. We cannot "just drop it". As it is a main way of connecting other Wikimedia projects to Commons, it would be great if we could fix the backlog of constraint violations, as many of them indicate real issues with the data. --Jarekt (talk) 17:54, 8 March 2018 (UTC)[reply]
  • I do think Jura is absolutely right that there's a huge amount to gain from systematic comparison with the Commons category structure. Because Commons deals with pictures of physical objects, the category structure there is often more detailed, more systematic, and more complete than Wikidata categories -- but (perhaps because of the previously uneasy question of sitelinks) it hasn't had nearly the same attention paid to extraction.
In many ways the opportunities are similar to some recent exploration I've been doing with thesauruses, another hierarchical resource that (in most areas) we haven't measured ourselves up against nearly as comprehensively as we could have done. Using the topics of Wikidata:WikiProject Fashion as a test area (which has done quite a lot of benchmarking), I think it is quite useful to be able to generate hierarchical listings like Wikidata:WikiProject_Fashion/Taxonomy/aat and Wikidata:WikiProject_Fashion/Taxonomy/efv to reveal how much of the hierarchy has been matched; and also, by populating broader concept (P4900) to create a local representation of external hierarchy, quite useful to see where there are parent relations in the external hierarchy that as yet are not matched by parent relationships in our own hierarchy; and, vice-versa, parent relationships in our own hierarchy that do not correspond to any parent relationship in the external hierarchy. Sometimes this just reveals different modelling decisions, or apparent missing relationships in the external hierarchy; but quite often it can reveal incorrect matchings, or questionable relationships, or missing relationships in our own hierarchy. It would seem an obvious available opportunity, to benchmark in a similar way against the Commons hierarchy. And of course, the more we can link Commons categories to objects here, then the more help that also gives us towards understanding the contents of those categories with a view to Structured Data, as well as the possibilities now to add a multilingual Wikidata-driven infobox or other templates to the Commons category.
But the question is, what are good techniques or approaches for finding Commons categories missing Wikipedia items? (And, ideally, for matching them?)
In some work I did last year, working on settlements and civil parishes in the UK, I found SQL queries like https://quarry.wmflabs.org/query/17609 and https://quarry.wmflabs.org/query/17610 quite useful, that look down the category tree several levels, and then look back up one level to find out the categories that those categories several levels down are in. This helps to identify the categories in the tree that are still settlements, compared to those that are churches or some other thing of interest -- something that is quite useful if there is a particular hierarchy one is focussing on.
I was comparing (offline) the categories returned for parishes and settlements with the values of P373s for parishes and settlements on Wikidata, to see if there were ones in the Commons tree that didn't have incoming P373s, and whether there were Wikidata items that might match them, or whether new Wikidata items ought to be created.
But one can also go the other way, by including in results of the SQL query a column for whether the categories have a Wikidata sitelink. In some ways this can be more reliable, because of the potential data issues with P373s. But to make it work, it does mean that it's really helpful if as many categories as possible have sitelinks, if the appropriate target can be identified. (eg by harvesting P373s to categories with no sitelinks). I do think it would be a useful step if we could try to build up this number, perhaps as a bot job. The last time I looked, six months ago, there were about 1,400,000 Commons categories that could be identified with article-items; but only about 740,000 of those could be identified with a sitelink, either directly (540,000) or via a category item and then a category's main topic (P301). I think it would be quite useful to increase those numbers -- and could probably be done as a bot job, or a QuickStatement process over a few days.
(to be continued -- but do jump in, if you have good approaches for identifying Commons categories missing a Wikidata item). Jheald (talk) 21:33, 8 March 2018 (UTC)[reply]
Discussion (and RfC) advertised on Commons, at c:Commons:Village_pump#Commons_categories_and_Wikidata_notability Jheald (talk) 23:48, 8 March 2018 (UTC) [reply]
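The offline comparison described above (category names returned from the Commons tree versus P373 values on Wikidata) could be sketched along these lines. This is a minimal sketch under assumptions: the function names are hypothetical, the sample inputs are made up for illustration, and the normalisation only covers the namespace prefix, underscores and first-letter capitalisation.

```python
def normalize(name):
    """Normalise a category title: spaces for underscores, no namespace prefix."""
    name = name.strip().replace("_", " ")
    if name.startswith("Category:"):
        name = name[len("Category:"):]
    return name[:1].upper() + name[1:]


def compare_categories(commons_categories, p373_values):
    """Report categories without an incoming P373, and P373 values not in the tree."""
    commons = {normalize(c) for c in commons_categories}
    linked = {normalize(v) for v in p373_values}
    return {
        "missing_on_wikidata": sorted(commons - linked),
        "dangling_p373": sorted(linked - commons),
    }


# Made-up sample input, for illustration only.
result = compare_categories(
    ["Category:Abbots_Langley", "Category:Aldbury"],
    ["Abbots Langley", "Tring"],
)
print(result)
```

Entries under `missing_on_wikidata` are candidates for matching to existing items or for new items; entries under `dangling_p373` may point to renamed or deleted categories, or simply to categories outside the part of the tree that was queried.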

First version of Lexicographical Data will be released in April

Hello all,

After several years of discussing it, and one year of development and discussion with the communities, the development team will deploy the first version of lexicographical data on Wikidata in April 2018.

A new namespace and several new datatypes will be created in order to model words and phrases in many languages. Editors will be able to describe words in Wikidata, and in the future, to query this information, and to reuse it inside and outside the Wikimedia movement.

If you’re curious to discover what these new data structures will look like, you can have a look at the data model. It suggests a technical structure, but editors will remain free to model and organize data as they prefer, with the usual open discussions and community processes that we apply on Wikidata. The documentation will be improved step by step, with the different releases and the help of the community.

Please note that the version that will be deployed in April is a first version, that will be improved in the future, thanks to your tests, comments and suggestions. Some features may be missing, some bugs may occur. We can already tell you that the following features will be included in the first version:

  • Add, edit and delete Lexemes, Forms, statements, qualifiers, references
  • Link from an Item or a Lexeme to an Item or a Lexeme
  • Basic search feature

And the following features will not be included in the first version, but are planned for the future:

  • RDF support (which means: the ability to query it with query.wikidata.org)
  • Senses will not be included in the first version, to give you all some time to get properties, processes, etc in place for Lexemes and Forms
  • Entity suggestion and better search features
  • Merge Lexemes

You can have a look at a more detailed features list. After the first deployment, we will start a discussion with all of you about what are the most important features for you, so we know which ones you would like us to work on next.

Thanks to the people who already showed support and curiosity about lexicographical data on Wikidata. We hope that when it will be deployed, you will test it, experiment with the languages you know, and give us some feedback to improve the tools in the future.

While waiting for the release, here’s what you can do:

  • Improve the list of tools with ideas of tools that could be built on the top of lexicographical data
  • Add your ideas of cool queries you’d like to do with words and phrases in the future
  • Have a look at the project page and especially the talk page, where people are already asking questions, and discussing about how to model data and other topics
  • If you’re involved in a Wiktionary community, discuss with them and answer any questions they might have about Wikidata. You can also register as ambassador for your community.

Last but not least, we kindly ask you not to plan any mass import from any source for the moment. There are several reasons for that. First of all, as mentioned above, the release will be a first version and we need to observe how our system reacts to manual edits before we start considering automatic ones. The system may not be ready for massive imports at the beginning. The second reason is legal. Lexicographical data in Wikidata will be released under CC0, and it is the responsibility of each editor to make sure that the data they add is compatible with CC0. For more information, you can have a look at the advice of the WMF Legal team. Finally, we strongly encourage you to talk with the communities before considering any import from the Wiktionaries. Wiktionary editors have put a lot of effort over the years into building definitions, and we should be respectful of this work, and discuss with them to find common solutions to work on lexicographical data and enjoy the use of it together.

If you have any question or idea, feel free to write on Wikidata talk:Lexicographical data or contact me.

Thanks for your support and I will keep you posted about further details. Lea Lacroix (WMDE) (talk) 16:34, 7 March 2018 (UTC)[reply]

The "Structured Data for Wiktionary" project that legally can't accept any data from Wiktionary. Oh how very well done. Jheald (talk) 16:39, 7 March 2018 (UTC)[reply]
What makes you imagine that to be the case? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:55, 7 March 2018 (UTC)[reply]
(ec) @Jheald: please refrain from such snarky and wrong comments. Especially when Lea's message contains a link to the legal opinion from a Harvard lawyer saying it is possible to import data from Wiktionary (or any dictionary for that matter; there are just certain conditions to respect, but nothing new under the sun here), and as Wikidata has already imported a lot of data into Q items from Wiktionaries in the past 5 years (without any legal issue AFAIK). Cdlt, VIGNERON (talk) 17:06, 7 March 2018 (UTC)[reply]
+1 on @VIGNERON:, whom I thank for writing exactly the same thing I was about to write. --Sannita - not just another it.wiki sysop 17:17, 7 March 2018 (UTC)[reply]
  • It would really be helpful if the Wiktionary community would be provided with an option that interests them. Which server runs what software shouldn't have such an impact.
    --- Jura 17:12, 7 March 2018 (UTC)[reply]
Good news! Great to see you guys have finally announced it will be in CC0 despite the very few criticisms expressed by people who, really, just don't like Wikidata. I mean, it was already decided months ago, so why postpone it for more discussion on this aspect? It will be much better to start experimenting and see the horde of lexicoenthusiasts jump out of the bushes to create thousands of new entry pages. Can't wait. Noé (talk) 20:41, 7 March 2018 (UTC)[reply]
I understand the temptation to be snide, but I don't think sarcasm is really what's needed here. --Yair rand (talk) 21:57, 7 March 2018 (UTC) [reply]
Great, I learned the word snide today! Thanks! I don't think it's constructive behavior either, but I am very tired of this depressing conversation and I suffer because my arguments are not heard and not taken into account. Humor is a safety valve. Noé (talk) 10:17, 8 March 2018 (UTC) [reply]
  • @Lea Lacroix (WMDE): It appears that the decisions surrounding this are being made by the dev team unilaterally, as there is no consensus in favor of launching here as opposed to on a Wiktionary site. If there were consensus from all the relevant communities, that would be one thing. If a decision came from consensus from just the Wiktionaries, that would also be fine, although not ideal if Wikidata was not at least informed. If there were only consensus from Wikidata, that would be very far from legitimate, but as it stands there isn't even support coming from the community that is theoretically supposed to be launching the unilateral takeover: In the previous thread, seven (eight?) Wikidata users expressed opposition to Wikidata running this here, and I didn't see one word of disagreement to that coming from anyone outside WMDE (excluding Andy's possible implied support?). You can't just say, "This is a done deal, how about this license?" and take a license decision as support for the premise.
Please allow some discussion to take place, and subject the decisions to communities' consensus. There has never been a real discussion as to whether lexeme data should be on Wikidata or on a new site. There has never been a discussion with the involved participants outside of Wikidata about the license, or the format, or the community structure. There is room for cooperation. --Yair rand (talk) 21:57, 7 March 2018 (UTC)[reply]
If you need one person who thinks that it's great to have lexemes on Wikidata, you can count me. I see three possible places to store this data: 1. on each Wiktionary, but that means the database would be duplicated n times, so it's probably not a good idea because it means a huge duplication of effort and won't help small Wiktionaries. 2. on a new wiki like data.wiktionary.org, but that means having yet another wiki with other rules, makes linking to Wikidata items complicated, makes reusing Wikidata properties impossible, at least for community reasons (the data.wiktionary community would probably not want this very important part of the structure design to be in the hands of another community), requires building a new name for possible partners (Wikidata is already well known, e.g. in the research community)... 3. on Wikidata, which is already a structured data repository used successfully by other wikis, such as many Wikipedias and Wikisources. Wikidata already has a well-known name and an efficient community and processes. From the point of view of a Wikisource contributor, if there were a project to build a Wikimedia bibliographic database I would say "do it on Wikidata, it is already doing this", and with this choice it would be evident that the content could be shared by Wikipedias and Wiktionaries as much as by Wikisources. Tpt (talk) 10:55, 8 March 2018 (UTC)[reply]

Super exciting news. Congratulations to the team on getting this far, and this is just the beginning! Don’t let the naysayers get you down: they said it wouldn’t-shouldn’t-couldn’t be done about Wikipedia back in the day too [oh, and no-doubt “further discussion needed” too]. Wittylama (talk) 22:38, 7 March 2018 (UTC)[reply]

@Wittylama: Wikipedia wasn't about palling up to its own sister projects and then stabbing them in the back, though, was it? Sending in some cases over 10 years work and up to half a million edits to the trash bin. *Not* the kind of behaviour a WMF project should be applauded for. Jheald (talk) 22:49, 7 March 2018 (UTC)[reply]
I find it a real shame that you - a person who most certainly knows and understands the value wikidata can (and does) bring to the Wikiverse - should choose to read this as a zero-sum game where wikidata and wiktionary are competitors. There will be “disruption” as with any change, but the aggressive turns-of-phrase like “stabbing them in the back” is unfair and unkind to people you know personally, and who respect you and your opinions, in the wikidata community. I beg you to reconsider your frustrated-opposition to this project, or at the very least to stop repeating the argument that the Wiktionary content and community are being ignored or overridden. Wittylama (talk) 23:27, 7 March 2018 (UTC)[reply]
(ec) @Wittylama: I am sorry Liam, but I think that is exactly what has happened here. I think what the project direction has done here is shameful, and as WD editors we should be *ashamed* of how our leadership is treating a sister community. It seems to me Wiktionary has been led right up the garden path with promises of "Structured Data for Wiktionary", seen the detailed structure of their site cloned in minute detail, then had the door slammed in their face with an incompatible licence. "Thanks for 10 years of hard work, now f*ck off". I'm sorry, but I just can't express intensely enough how strongly I feel that dealing with a sister community in this way is unacceptable, a real stain on our site, of which we should all be utterly utterly embarrassed. Would it have been such a hardship to require reusers to include "Powered by Wiktionary" in their credits file, and to share-alike any creative additions they made to the database? But instead we apparently prefer to either rip-off or end-of-line that community's work. Jheald (talk) 00:18, 8 March 2018 (UTC)[reply]
I don’t really think that the relationship between the foundation and the community is a « leader / led » one. Mostly, the foundation is involved in technical aspects, the community in content. « then had the door slammed in their face with an incompatible licence » It’s unclear that the structure associated with language is even copyrightable … Personally I have refrained from commenting on this topic, but the legal and practical implications of database copyright tend to make me uncomfortable, to the point that I tend to think choosing a license in this case is rather symbolic. I don’t think we will do as some organisations do and put wrong data in our dataset to be able to spot people who import our data, for example. In the case of structured lexical data, yes, « table » is an English noun. Language is a communication tool, so I don’t really think it makes sense to « protect » this fact … language is made to be shared, not to be claimed by any licence. author  TomT0m / talk page 12:07, 8 March 2018 (UTC)
@Wittylama: Of course the communities are being overridden. Wiktionary and Wikidata don't need to be competitors, and they don't want to be competitors, but the decision by the dev team that they must be forced into being competitors by having the structured lexical data system set up without Wiktionary is creating the situation without good reason. The value that structured data can bring is important, and we should use it for lexical data, with the communities, not against them. --Yair rand (talk) 23:46, 7 March 2018 (UTC)[reply]
It’s a shame you feel that way. This project could benefit from your obvious energy and interest in the topic to help ensure it is as successful as possible, rather than using it to repeat your critique of it. This concern has been raised, read, debated, and responded-to many times before. So now, even though you feel the project is sub-optimal, I encourage you to be a force for positive change, to help try to make it succeed - since you agree that structured lexical data is important. Wittylama (talk) 00:15, 8 March 2018 (UTC)[reply]
@Wittylama: It has not been debated. I follow these discussions quite closely.
I don't think it's "sub-optimal", I think it's actively harmful to Wikimedia. After fifteen years of Wiktionary being entirely ignored by Wikimedia institutions, we finally hear of a plan for structured lexicographic data, lots of work goes into it, and then there's the plan for deployment... on Wikidata. The one thing that was literally built for a dictionary won't be on a Wiktionary site. The building of structured data will take place exclusively on Wikidata, set up as a competitor to Wiktionary. Wiktionary communities have no say as to how it's built or structured. It will be administered exclusively by Wikidata admins, under Wikidata policies. Wiktionary will not be permitted to use an open-source extension developed by a Wikimedia organization that would benefit it immensely.
The worst-case scenario would be if this becomes a general precedent of non-cooperation between Wikimedia communities. If the success of one project's mission is anti-correlated with the success of another, people will certainly not be inclined to support a sister project. I hope this does not happen. --Yair rand (talk) 02:11, 8 March 2018 (UTC)[reply]
  • WMF already paid a user to export content from Wiktionary. Obviously, we (at least I) hoped that it would be possible to import this into a Wikibase, but apparently this won't be possible. The time, money and effort will be lost.
    --- Jura 00:37, 8 March 2018 (UTC)
    As Sannita said, it seems that some lawyers agree that we actually could import content from Wiktionaries. I am still waiting for a lawyer or a reuser to state that "Wikidata is dangerous because it has extracted and imported facts from non-public-domain/CC0 sources like Wikipedia or a lot of other databases." Please correct me if I am wrong. Tpt (talk) 10:55, 8 March 2018 (UTC)
    Well Tpt it certainly worries me, because I think our community probably has cumulatively extracted a lot of data from non-CC0 sources, that is identifiably from those sources or even referenced to them, and probably did have an element of original creative selection, arrangement, or judgement. I think that could well be seen as license-washing, and could well be an accident waiting to happen. There are some quite assertive data publishers out there. I think we would not do well to underestimate our risk in this area. Jheald (talk) 11:56, 8 March 2018 (UTC)[reply]

Trying to get a summary of the concerns that the vocal critics here have expressed. Is this an accurate summary of your position?

  • Jura you say it's the wrong project.
  • Yair rand, you say it's the right project, but on the wrong wiki.
  • Jheald, you say that it's the right project, but with the wrong license.
  • Noé, you say it's moving too fast.

Wittylama (talk) 09:09, 8 March 2018 (UTC)

No. I think it's the wrong project (assuming by "project" you mean, eg, Wiktionary or Wikidata), the wrong wiki, and the license should be chosen by the right project. (Based off of their statements on the earlier thread, your summaries of Noé's and Jheald's views also seem to be incorrect, but I'll wait for them to clarify.) --Yair rand (talk) 10:03, 8 March 2018 (UTC)[reply]
(ec) As per Yair rand above. The underlying problem is the failure to get any sense of buy-in or ownership of this by Wiktionary, or even to apparently consider that important. The licensing is the sharpest manifestation of this, because it literally slams the door on their work, and says it will have no part in the new project, building an unbreachable wall between the two. But the failure to give the Wiktionary community any sense of governance over the new project is all of a piece -- not even the most cosmetic measures to present the new namespace as an extension of Wiktionary as much as an extension of Wikidata. Lea's announcement that the new licence will be CC0, before the RfC on that exact question has even closed, is just another example of the recent tone-deafness of the project direction in this area. As Yair says above: After fifteen years of Wiktionary being entirely ignored by Wikimedia institutions.. [t]he one thing that was literally built for a dictionary won't be on a Wiktionary site. The building of structured data will take place exclusively on Wikidata, set up as a competitor to Wiktionary. Wiktionary communities have no say as to how it's built or structured. It will be administered exclusively by Wikidata admins, under Wikidata policies. Wiktionary will not be permitted to use an open-source extension developed by a Wikimedia organization that would benefit it immensely. This is simply not how we treat our own. It is not acceptable for a WMF project to actively marginalise an existing Wikimedia community in this way, sabotage their licensing, and undermine their work. Jheald (talk) 11:28, 8 March 2018 (UTC)[reply]
Well, it is quite fast, mainly because m:Wikilegal/Lexicographical Data is a preliminary note with plenty of questions remaining open, but I am much more concerned because Lexicographical data in Wikidata is not a community-led project. Decisions are taken by a group of 5 to 10 persons with their own agenda. I was happy to see some honesty when the name changed, because there is nothing for Wiktionary in this project. It may be built on Wiktionary data, but for third parties. It was clear in Denny's prose from the beginning. I mainly disliked the plebiscite organized about the license, with scarce information on the whole picture, only arguments in favour and no room for discussion before the vote started. It was a false debate, and the decision was set before the beginning of the vote. Note that I do not think the people behind the project are evil. I am convinced they want to do something good, but that's not enough for me. I think important projects in our communities have to be grounded in community discussions in which a consensus emerges after each position has been expressed and discussed. Here, there is no consensus, only a non-democratic leadership on an opaque project. But well, it appears they do not want Wiktionarians but rather a new community to collaborate on lexicographical data here. I am curious to see who will do that (how many of the voters, for example) and how emergent problems will be discussed. Noé (talk) 10:45, 8 March 2018 (UTC)
@Noé: One of the biggest problems when starting a project like this is that, before you actually have something « in production », it’s really hard to get community input. Few people are actually involved in the discussions of the data model and so on. So to advance you need to move on with the little input you have and take decisions … The « preliminary note » is what the team has after several years of on-and-off discussions. I think getting what you would call a « non-preliminary note » is something that would actually not happen. At some point the devteam has to propose something, and that something has to be a product because that’s the only way to get a lot of input. Test wikis do not involve a lot of people. author  TomT0m / talk page 11:26, 8 March 2018 (UTC)
@TomT0m: The preliminary note I mentioned is a legal analysis, not a technical one. It was written by a legal counsel of the Wikimedia Foundation. It is not a work by the Wikidata devteam, and it was not claimed to be funded or requested by them. It is about the licensing of lexicographical information, and there are several kinds of content that are not taken into account yet. I asked polite questions on the talk page and I think a second version of the document could be made out of them, being more specific on delicate but important matters. Noé (talk) 12:39, 8 March 2018 (UTC)
@Noé: The legal issues around databases, worldwide … a pretty complex topic, unlikely to be settled until tested in court in a country with case law … especially in a « Big Data » world. I just heard about the problems of the Gutenberg project in this article. I’m afraid these issues are really complex, and that this project, right now, has to live with legal risks just by compiling facts :/ the same risk exists for (unstructured) Wiktionary I guess. For example, Wikidata descriptions should theoretically not be extracted from Wikipedia, and Wikidata has lived with this since the beginning. I’m not really aware of any issue, major or minor, with this. author  TomT0m / talk page 18:12, 8 March 2018 (UTC)

license, or other issues?

A meetup at Wikimania 2016 with wiktionarians and wikidatians.

I've only been on wikidata for about 2 1/2 years now, but all along I recall the lexicographical extension being discussed here, and then under active development for the last year or more. March 2018 is the first time I've seen any opposition to this effort. Where have you people been during the last 2 1/2 years? Is the CC-0 license the real issue, or is something else going on here? I find this pile-on against the new development very discouraging - let's at least see how it works in practice as Lea suggested. Maybe there's a better way to do it, or maybe there's no point in doing this at all, but given the effort invested so far to implement it here, I strongly feel it should be given a chance to prove itself. As to the license, anything other than CC-0 greatly limits the usefulness of a structured database - it's very hard to do attribution for example as required by CC-BY if you are building a box based on a hundred pieces of data from as many contributors. But if we really need CC-BY or something else for some parts of the data, let's deal with that when needed. Wikidata is partly CC-0 (the main namespace) and partly CC-BY-SA (this and other text name spaces) as it stands, so the license issue certainly shouldn't be a show-stopper. ArthurPSmith (talk) 13:53, 8 March 2018 (UTC)[reply]

We have been around for a while! Maybe not in Project Chat but in Wikidata:Wiktionary (when the page was named like this). You can have a look at Wikidata talk:Lexicographical data/archive to see our past discussions, and see that similar points of view were already expressed about multiple issues. Also, we met on at least two occasions, at Wikimania 2016 in Italy and at Wikiconvention francophone 2016 in Paris. Noé (talk) 14:32, 8 March 2018 (UTC)
Thanks for the link. You (Noé) certainly commented a lot there, but it doesn't seem to have as negative a general tone as your recent remarks. What's changed? Yair rand also commented quite a bit but mostly positive. Jura had a few comments including a suggestion of a separate installation, but it wasn't followed up on. And the bulk of the discussion was long ago (late 2016 mostly) - if there was significant opposition to the whole concept why weren't you pushing the development team to redirect their efforts somewhere else this last year? ArthurPSmith (talk) 18:15, 8 March 2018 (UTC)[reply]
At first, I wasn't very enthusiastic about Wikidata because I think a multilingual project like this one gives an important bonus to people who can express their opinion in English. But, after having discussed with Lydia and colleagues at Wikimania, I agreed to collaborate and I spent dozens of hours initiating conversations in French Wiktionary and on the page already mentioned. At some point in the discussion, I raised a question about the training of contributors. For me, it is important to document the issues we encounter while learning collaborative lexicography. Some Wiktionarians have already spent years documenting lexicographic problems and I was troubled to see that the plan was to start from scratch and not take advantage of that in Wikidata. During the discussion, I realized the devteam was not looking so much at how it could improve Wiktionaries but rather at how lexicographical data can be reused by third parties. I had the feeling it was more oriented to computational operations than to human consultation and contribution. Then, I was not very pleased by the first model. I had the feeling it was still oversimplified. Finally, it was the way the plebiscite was organized that made me so negative. The situation was not clearly stated, with only positive arguments and no room to express another option. Last disappointment: Léa posted the announcement for April saying it will be in CC0, jumping to a conclusion I felt was already decided before the beginning of the vote. I wasn't fully opposed to any development, I just feel the project is now very different from how it was designed at first and may not be good for Wiktionary communities. Noé (talk) 22:06, 8 March 2018 (UTC)
@Noé: That’s an interesting point of view for sure. I’ll try to summarize the different opinions at this point, as I see it:
  • It’s of interest to have lexicographical data for human beings. That’s what Wiktionary does in a semi-structured data way. You’re saying that the Wiktionary community has accumulated and documented over time a lot of knowledge about how to structure that knowledge, and how to present it to humans.
  • The downside of Wiktionary is that it’s not easy to share information between language versions, and each language version maintains its own data on every language.
  • It’s of interest to have machine-readable lexical data. There are numerous applications, like translation assistants, of which our communities are big consumers. A central data-centric repository for this structured data is proposed to help achieve this goal.
  • One of Wikidata’s goals is to offer a repository of data usable by « humans and machines alike ». It’s achieved by giving humans interfaces to enter data in a way precise enough for machines to enter and consume data in a documented way, through technical documents and a programming API. It’s a « middle ground » solution, as the objects defined are generic (items, properties, statements …) and not very specific and rigid (a human with a date of birth and death and that’s it). It’s designed so that the community can easily create the new objects (items, properties, statements …) necessary to express new things. This comes at the cost of some structure (humans may have a construction date, which is typically not what a data consumer would expect). On the other hand, some things that are easy to express in natural language are harder or more tedious to express in the Wikidata model. This is a tradeoff.
  • There is a proposal to extend Wikidata to include structured data about lexical entities, which follows the approach that was initially proposed for Wikidata: generic concepts to describe lexical entities, extendable by creating as many instances of these concepts as the community needs to model them.
The unknown at this point: it’s unclear how Wiktionaries will interact with this central repo. Some answers may be similar to those for Wikidata and other wikis: Extension:ArticlePlaceholder could be used to automatically generate term pages in a language that does not have one (yet), using the structure defined by a local community and data from the central repo. This allows sharing lexical data between linguistic projects while letting them keep their specificities. There is an effort, pushed by individuals in client wiki communities, to create client-side editing, but this has not really been finalised yet. It may be unreasonable to expect the plan for Wiktionary / Wikidata interaction on editing to be settled before this is done (and Wiktionary could be included in those efforts). Is it worth waiting for a « definitive » answer to this question before starting the purely structured part of lexical data? Experience has shown that Wikidata developed and achieved things well before this, so in my opinion the answer is « no ».
You express doubts on:
  • The extensibility of the model to cover the full range of lexicographical data. Experience will tell whether the proposed model is community-extendable enough to express the subtleties, but to me it seems reasonable to take the same approach to lexicographical data that Wikidata took to data: generic concepts, community-extendable, maybe at the cost of some expressiveness.
  • The relationships between communities and projects: this is true but hardly avoidable. For example, if a structured data repo dedicated to lexicography were set up on its own, this would not avoid a putative tension between the « text-centric » lexicographers and the « data-centric » ones, as exists in Wikipedias (EDIT (to push the reasoning to its end): plus with the Wikidata community, so we would experience a triple-community pattern, not really an ideal solution imho). Plus it’s unrealistic in both cases to replace a semi-structured wikt with a fully structured one in a heartbeat, so the two aspects will coexist for decades at least.
  • The licensing issues: it’s unclear that structured data can easily express things that are for sure subject to copyright(-like) laws in a dictionary, that is, elaborate definitions written by lexicographers, or paragraphs about etymology and so on. This could be expressed in a structured way, but in an entirely different way, through items and properties with the expressiveness tradeoff explained previously. Concepts like « Gloss » are not intended to replace them.
Did I forget something? Is there anything you do not agree with in this description of the state of the art? author  TomT0m / talk page 09:40, 9 March 2018 (UTC)

Thanks for your well-written answer. I'll follow your path, and try to answer each point.

  • Well, semi-structured data is not the proper name, because data are very structured in Wiktionaries. For French Wiktionary, a researcher made an XML version of the project and only had to correct about 60 errors (GLAWI). It's textual and queries are not optimal, but it can already be used by translators or language learners.
  • "share information between language versions": well, that's not easy because culturally, we do not see others' languages the same way we consider our own. German speakers will not describe French the same way French is described in French (for example wikt:de:voilà says it's an interjection, like in German, but wikt:fr:voilà says it's a verb). Some grammatical categories are common in one culture and can be used in a dictionary to describe foreign languages, but rare in another, where a different name may be used to help readers understand them (for example, the optative is a grammatical category often rendered in dictionaries as subjunctive; it's very different from a linguistic point of view but more interesting for readers; consider also that the definitions of basic linguistic concepts such as adjective or verb may differ for each source language). Finally, there are traditions in the way information is conventionalized (for example, cases for Latin are not displayed in the same order in France and in Germany for no reason but tradition). Those problems may be solvable, but we had so much more work to do in describing 500k+ words in English (for English Wiktionary) and 350k+ words in French (for French Wiktionary) and adding synonyms, thesauri, examples (more than 360k for French Wiktionary!), pictures, translations in plenty of languages and more. Managing this kind of very tricky problem might have interested a dozen experts, but it was not the aim of a collaborative project to work only with experts. In order to have a community of interested people, we deliberately skipped those unsolvable issues. It's not a flaw; it was a wise choice that let the projects grow into what they are now.
  • "It’s of interest to have machine-readable lexical data": well, yes. I can also imagine multiple uses for linguistic investigation.
  • The way you describe Wikidata goals is interesting.
  • I am not sure I get your point here.

"it’s unclear how Wiktionaries will interact with this central repo." and that's a matter of concern for me, because it can lead to a fork situation, with similar but not identical data in Wikidata and in Wiktionaries. Plus, as Denny mentioned several times, he imagines a new community emerging in Wikidata rather than people migrating from Wiktionaries. I am sure a bunch of people will contribute to both projects (like Vigneron, Pamputt or JBerkel) but I think it will mainly be to check for mistakes, not to enhance existing data. If someone wants to add entries in Berrichon from an old PD dictionary, will they include them in French Wiktionary or in Wikidata? In a place where attestations of use can be added, pictures included and words in the definition linked, or in a place where one has to remember sequences of numbers that mean "Noun" or "Intransitive" in order to add the exact statement? Well, we'll see.

About doubts:

  • Is the model strong enough? Well, we'll see. I am already concerned by "Pronunciation" being singular, because that sounds very normative. There is no 1:1 correspondence between a word and a pronunciation. That's a myth and a normative vision. But if the project here focuses on giving the several possible pronunciations for each word, there is no need for semantics and it can be cool.
  • Tension between Wikidatians in general, Wikidatians interested in lexicography, Wiktionarians, but also the Wikidata devteam and other people who may develop tools for one community and not the others. Without a middle ground and a safe space, it will be very hard to communicate. It may be on Meta, but for now it's mainly on Wikidata, where Wiktionarians don't feel welcome. And it's almost all in English.
  • Licensing issues: I agree with your phrasing. To give an example for etymology: saying that a French word came from Latin can be very wrong. Well, they may share a more or less common history, but with plenty of steps in between. It may have come from regional uses, with changes in meaning due to contact with other languages, or the object it designates may have changed over time. For a nice example of a complex story, you can have a look at bataclan in French Wiktionary. I think this kind of story will not be represented in Wikidata. It is not a unique case; it is very common for etymologies to have long stories. Where there is none, it's because it wasn't studied enough.
  • Finally, an important doubt you skipped is how this new project will document itself. New policies and help pages have to be written, and in several languages. I fear newcomers will spend hours reinventing pages already made in Wiktionary and that English speakers' points of view will be prominent in those. I haven't seen enough tools, spaces or methodology to favor communication between people who don't speak English well enough to participate.

Well, I spent more than an hour writing that but I may have made some mistakes because it's not my native tongue, so I apologize for that. Thanks again to TomT0m for this nice willingness to communicate here. Noé (talk) 11:26, 9 March 2018 (UTC)

@Noé: (in no particular order, no time) I guess the modelling of language has to be something different, not a one-to-one correspondence with what grammarians do in one country. In Wikidata we rarely use a model « as is », and what we get is either something new, a mixture of existing solutions, or the union of different viewpoints. For example, on the « voilà » example, Structured Lexicography could present both, while frwikt alone would be comparatively poorer. The community decides in the end how to model the data; we are NPOV, so no viewpoint is excluded a priori.
« I am already concerned by "Pronunciation" being singular » well, if it stays that way, which is always something we can discuss with the devteam, chances are that we add a property « alternate pronunciation ». That gives you an example of how the community can extend the model.
On grammatical categories, this is something I don’t really know much about so I won’t say much, but I imagine there are relationships between them, like some being particular cases of others. It’s something we can deal with in Wikidata with properties like subclass of (P279), and with queries and/or programmatic arbitrary access. These seem like problems we discuss every day in Wikidata, so we’ll welcome a discussion on them :)
« Finally, an important doubt you skipped is how this new project will document itself. New policies and help pages have to be written, and in several languages. I fear newcomers will spend hours reinventing pages already made in Wiktionary and that English speakers' points of view will be prominent in those. I haven't seen enough tools, spaces or methodology to favor communication between people who don't speak English well enough to participate. » Mmm, the only way you could fully avoid that risk is to align a data model with the specific usages of one language version of Wiktionary, which means one data model per language version, and no documentation of the French data model in Spanish, for example. This is indeed a fundamentally different approach from the current central approach, but it’s a lot of effort for a dubious gain if, for example, there are already structured datasets for French. The root idea of a central repo is to share, in the hope that we can make something better than the sum of the parts. I’d like to emphasize again that in practice Wikidata is not the kingdom of the « English viewpoint »; I think « we should not refuse data if one Wikipedia uses them » is one of our principles (is it written somewhere?) as a central repo.
« If someone wants to add entries in Berrichon from an old PD dictionary, will they include them in French Wiktionary or in Wikidata? » You did not take into account the idea that they might include them in enwikt, which is interesting because one viewpoint is that there are 200+ Wiktionary forks :) I guess it’s of interest to understand how to work with each other, which means strengthening the links between communities and avoiding thinking in terms of a « safe place », which implies there is a war in the first place … why should there be a war? Importing them into Wikidata is not incompatible with importing them into frwikt. It’s in the interest of other Wiktionaries to import them into Wikidata to make them available to other Wiktionaries anyway. Information that cannot be represented (easily) in a structured way may need to be expressed in the wiki pages anyway. author  TomT0m / talk page 12:29, 9 March 2018 (UTC)
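
A minimal sketch of how « particular case of » relationships between grammatical categories could be followed, in the spirit of subclass of (P279) mentioned above. The category names, the SUBCLASS_OF table and both helper functions are hypothetical illustrations: they are not real Wikidata items or API calls, and an actual setup would more likely rely on queries against the live data.

```python
from typing import Dict, Set

# Hypothetical "subclass of" links between grammatical categories.
# These labels are illustrative only; they are not real Wikidata item IDs.
SUBCLASS_OF: Dict[str, Set[str]] = {
    "optative": {"grammatical mood"},
    "subjunctive": {"grammatical mood"},
    "grammatical mood": {"grammatical category"},
    "interjection": {"part of speech"},
    "verb": {"part of speech"},
    "part of speech": {"grammatical category"},
}


def ancestors(category: str) -> Set[str]:
    """Return every category reachable by following 'subclass of' links."""
    seen: Set[str] = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_kind_of(category: str, broader: str) -> bool:
    """True if `category` equals `broader` or is a (transitive) subclass of it."""
    return category == broader or broader in ancestors(category)


if __name__ == "__main__":
    # A dictionary that only knows "grammatical mood" can still place "optative".
    print(is_kind_of("optative", "grammatical mood"))
    print(sorted(ancestors("verb")))
```

With a table like this, two Wiktionaries that label the same word differently (say, interjection versus verb for « voilà ») could both be recorded, and a reader-facing tool could fall back to whichever broader category its audience knows.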

Termination of wiktionary.org ?

  • Funny summary, WittyLama. I wonder if you actually read the discussions.
    The problem here is that the current proposal attempts to replace Wiktionary.org with an incompatible solution. A compatible solution could be on this Wikibase or another dedicated server. The technical difference between these two should be minor.
    This is different from what Wikidata did before: I don't think there were any complaints that interwikis were moved out of Wikipedia. Why would there be: the same is provided differently in a better way. The same goes for the external identifier stuff. Both were peripheral to Wikipedia and the integration of the Wikidata approach in the existing Wikipedia is fairly well developed.
    For Wiktionary, the way the technical development is being instrumented, practically terminates Wiktionary.org without any agreement of the relevant community or an explicit decision from WMF board. This despite that it could function here or elsewhere, in a compatible way or in an incompatible way.
    If there was a discussion where one or the other point was decided before, I'd be happy to read it. I asked about it further up on this page. Apparently others ask themselves the same question, while some guy thinks I had already written about it. Funny story.
    --- Jura 15:16, 8 March 2018 (UTC)
    Could you explain where this idea comes from? How can a content project be terminated by data? I think it's obvious to everyone that content can't be generated from data alone (or only at a very low quality), and as far as I know, no one has proposed a replacement. PS: the WMF board can't close projects; the LangCom is in charge of enforcing the closing policy (a policy which is not meant for active and common projects). Cdlt, VIGNERON (talk) 16:38, 8 March 2018 (UTC)

──────────────────────────────────────────────────────────────────────────────────────────────────── "The problem here is that the current proposal attempts to replace Wiktionary.org" No evidence at all is offered for this frankly bizarre assertion. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 19:25, 8 March 2018 (UTC)

As has been repeated on the discussion page and in the FAQ, the project does not aim to replace or fork the Wiktionaries. A lot of content that is currently on the Wiktionaries will not be in Wikidata.
"There is plenty of content and community organisation work that is being done on the Wiktionaries that cannot and should not be transferred to Wikidata. Wikidata is just a tool to support the Wiktionary and other Wikimedia projects—it is not capable of replacing them and it should not be used as such." (FAQ, collaborative work between Lydia, Denny and me, September 2016)
"Wiktionary is much more than just the structured data. And the Wiktionaries would still continue to fulfill these additional functions, and could indeed focus on them and thus become more effective." (Denny, November 2016)
"Wikidata doesn't have the intention to replace the Wiktionaries with something new, but to provide a backend database that can support the Wiktionaries, if they choose so." (Denny, November 2016)
"We are not working against each other, and we don't want to. We don't want to steal the content and the communities from a project to another. We want to work together with the expertise we have on both sides." (Léa, November 2016)
"The Wikidata development team will not force any project to use Wikidata’s data, and their editors to edit on Wikidata. It’s up to the individual projects and editors to decide which parts and what data from Wikidata will be useful for them." (FAQ)
Lea Lacroix (WMDE) (talk) 11:21, 9 March 2018 (UTC)

Hybosorus illigeri / H. roei

Regular readers will know that the current three sections at the top of this page all relate to User:Succu's editing of taxonomy-related items or to taxonomy-related property proposals. There is now a fourth issue.

Noting changes to species:Hybosorus illigeri and species:Hybosorus roei, which included the addition of the text:

New Case 3768 has been submitted to ICZN on 5 March 2018 in order to save the name illigeri Reiche, 1853. Under art. 82 of the Code when a case is under consideration by the Commission, prevailing usage of names is to be maintained until the ruling of the Commission is published. Therefore the name Hybosorus illigeri must be used, being in prevailing usage.

(that case, being "under consideration", is of course - significantly - currently unresolved); and that the matter had already been the subject of a previous ICZN case (Case 3400. Hybosorus illigeri Reiche, 1853 (Insecta, Coleoptera): proposed conservation by giving it precedence over Hybosorus roei Westwood, 1845) with a finding to the opposite (OPINION 2230 (Case 3400) Hybosorus illigeri Reiche, 1853 (Insecta, Coleoptera): precedence not given over Hybosorus roei Westwood, 1845); I created Hybosorus roei (Q50355361) and marked it with said to be the same as (P460)-Hybosorus illigeri (Q1945578) and vice versa.

Note also that P460's English description is:

this item is said to be the same as that item, but the statement is disputed

Succu refuses to accept this, despite my opening a talk page discussion and pointing out his own edits ([7], [8], [9]) which show that these items are conflated. His only response was to falsely accuse me of doing "only reverts".

He has offered no justification for his edits, and has now resorted to falsely accusing me of "trolling". Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:38, 7 March 2018 (UTC)

To start with a first question: Could you please link to the publication of Case 3768 in The Bulletin of Zoological Nomenclature, Mr. Mabbett? I'm not aware of such an article. Thanks --Succu (talk) 18:52, 7 March 2018 (UTC)
OK. A second question: Why do you trust the replacement of the current ICZN ruling done by an IP, Mr. Mabbett? --Succu (talk) 21:35, 7 March 2018 (UTC)

Q48975747 and Q2402627 redundant

I don't know how to merge redundant entries Yehoshua Blau (Q48975747) and Yehoshua Blau (Q2402627) together. I thought that Wikidata would automatically follow the interwikis on the en.wikipedia article, but that doesn't seem to be the case... AnonMoos (talk) 13:19, 8 March 2018 (UTC)

✓ Done
You may use the "Merge" tool (first of the gadgets) for this kind of job: just make sure that it is really the same concept (easy for people, less easy for other items) :) --Hsarrazin (talk) 15:01, 8 March 2018 (UTC)
Thanks... AnonMoos (talk)

Bot flag request

Hello! I have left a request for a bot flag at Wikidata:Requests for permissions/Bot, but it has gone unnoticed. Please take a look at it. --Tohaomg (talk) 14:01, 8 March 2018 (UTC)

Please test pings in edit summary

1. Read this:

"You can notify users in edit summaries. They will get a ping just as if they had been mentioned on a wiki page. phab:T32750"-- meta:Tech/News/2018/10

2. Sign up at https://wikidata.beta.wmflabs.org/ using a different user name and password (not the one you use here). You may create multiple accounts if you like, just put a note on their user pages.

3. Edit a page and put a username link in the edit summary. Confirm that you are receiving the notification correctly.

4. Test on different pages and in different ways.

5. Report bugs to Phabricator.

6. Share this comment with other people on other wikis, in different languages.

--Gryllida (talk) 23:51, 8 March 2018 (UTC)

Special:Diff/646435431 --Liuxinyu970226 (talk) 10:25, 9 March 2018 (UTC)

How vandalism in Wikidata affects local wikis

Yesterday at 01:42 someone renamed Russia (Q159) to "mainkra". It was reverted within an hour, but unfortunately the vandalized version somehow spread into ru-wiki infoboxes (probably because of mysterious cache algorithms). Right now Google shows that thousands of articles are still affected (see [10]). I understand that this might not be a top-priority issue for the Wikidata community, but is there anything we can do to decrease the probability of similar incidents in the future? For instance, what is the reason why we allow anonymous contributions to highly used items like Russia (Q159)? --Ghuron (talk) 12:47, 9 March 2018 (UTC)