Wikidata talk:Lexicographical data/Archive/2017/09

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

CC0 and CC BY-SA

Wikidata is under a CC0 licence and Wiktionary is under a CC BY-SA licence. If I understand properly, the ShareAlike clause forbids copying the content to Wikidata. How do you plan to deal with this legal aspect of authors' rights? Noé (talk) 16:03, 14 September 2016 (UTC)

If I understand correctly, simple facts are not eligible for copyright, so most data wouldn't be an issue. The only area of concern would probably be definitions, but the proposal doesn't seem to include anything about moving those to Wikidata (which seems reasonable to me, as that's not really something that would be useful for Wiktionary). --Yair rand (talk) 20:02, 14 September 2016 (UTC)
Perhaps not quite correct on several points, but the sense/definition of the term is proposed to be part of the Wikidata entry. Whether or not in a specific case there would be a technical violation of copyright is perhaps less an issue than the perception that it violates the spirit of the Creative Commons licenses. - Amgine (talk) 04:16, 15 September 2016 (UTC)
@Amgine: My understanding is that a gloss is just an identifier for distinguishing between senses (like those used in translation table headings, {{gloss}}, synonym listings, etc.), and not a definition. The proposal doesn't spell this out specifically, though, so I might be wrong about that.
Re violating the spirit of the license... Yeah, that seems like a reasonable concern. Properly attributing Wiktionary content is difficult enough as it is. --Yair rand (talk) 04:59, 15 September 2016 (UTC)
"(sense) S1989 (won't be displayed)" <- I believe this is intended to be the 'definition'; Wikidata's glossary does not appear to be using the en.WT jargon of gloss being a simplistic description/translation and sense being a more nuanced/precise conventional use for a word. (Incidentally, neither 'gloss' nor 'sense' appears in en:WT:GLOSS, which seems an odd omission.) - Amgine (talk) 16:02, 15 September 2016 (UTC)
What Amgine says. Every sense has a gloss, which is a text, not an identifier. The mock up shows that --Denny (talk) 20:05, 15 September 2016 (UTC)
@Denny: Sorry, "Identifier" may have been the wrong term here. To be clear: I was thinking that it would be like glosses as explained at wikt:Help:Glosses: a "set of words that uniquely identify a definition", as opposed to the full definition, which is typically longer. Not like that? --Yair rand (talk) 22:12, 15 September 2016 (UTC)
@Yair rand: ah, thank you for explaining and the link, I indeed misunderstood. Yes, in my mind it is more like a gloss than a definition, but ultimately it is for the community to come to an agreement on what exactly should be there. But as said, in my mind, yes, these would be rather short. --Denny (talk) 17:45, 16 September 2016 (UTC)
@Denny: Just out of curiosity, how would it be useful _for_Wiktionary_ to have a non-authoritative index to its senses which almost but not quite defines terms, and which cannot actually link to individual senses in Wiktionary since that functionality has not been implemented? (for that matter, how would that be useful to the other projects?) - Amgine (talk) 17:56, 16 September 2016 (UTC)
I am afraid I seem to fail to understand part of the question. If you don't have the senses, how else would you access the data associated with them? So if a Wiktionary decides to reuse some information about a sense, it needs somehow to identify that particular sense. And without the gloss it would not be possible to identify a sense. As said, I am sure I am missing something here. --Denny (talk) 18:08, 16 September 2016 (UTC)
For the purpose of this (brief) digression:
  • sense: a conventional use of a word/phrase.
  • definition: a single expression of that precise use.
  • gloss: a set of words to uniquely identify a sense, usually an over-simplification of the definition.
A gloss does not define a sense; it is generally too simplistic. For example, a gloss for blue is "depressed", the definition being "depressed, melancholic, sad." Depressed is clearly not exactly equal to sad, but is simply used as a human-readable identifier, an aid to navigation for readers. The glosses serve the same purpose as an index id to a given sense.
There is currently no method to link directly to a specific sense in Wiktionary.
Therefore, how is it useful _to_Wiktionary_ to compile an index of glosses (an index of index ids), when you cannot then connect them to the definitions? Since there will be no importing of data from Wiktionary to Wikidata, how would you normalize the glosses with those of Wiktionary if (at some unknown point in the future) it becomes possible to link to Wiktionary definitions? - Amgine (talk) 19:10, 16 September 2016 (UTC)
Thank you for the explanation, that was indeed helpful. I still think that in your example the gloss could be sufficient, but maybe it is not. One way or the other, it is not me making that decision, and maybe that field would rather be used for the definition, or for something in between. It is really up to the community how to fill it. I don't want to make presuppositions here. If it ends up being proper definitions, I am happy too. Maybe even happier than with glosses. One way or the other, up to the community.
Regarding the second question: it is exactly only useful to the Wiktionaries because you cannot connect to them. In Wiktionary, I can call data from Wikidata - because there the senses have identifiers and can be called from Wiktionary. So if, on a Wiktionary sense, for example I want to list a few example sentences from Wikidata attached to a specific sense, I can do that. I can go and specify the sense and reuse them. If I want to reuse some data about the sense, I can do that - each Sense in Wikidata will have an ID. So from Wiktionary I can go ahead and call the data from Wikidata if I want. If I want to call all senses, I can do that too. But if I disagree with the senses from Wikidata, I can also ignore that, and still I can get the data about the senses if I so want. In Wiktionary it is entirely controlled by the local community how much, if at all, they want to use from Wikidata. There is no need - and indeed, no technical way - to connect a Sense from Wikidata to the specification of a Sense in Wiktionary. But that link is not needed for Wikidata to be useful for Wiktionary - for that, what we need, are persistent identifiers in Wikidata that can be called from Wiktionary.
I hope that makes a bit of sense. --Denny (talk) 19:32, 16 September 2016 (UTC)
Yes, I think it made excellent sense. From my understanding of it you wish to host the content of Wiktionary on Wikidata, because there it is possible to structure the data as it should be structured. - Amgine (talk) 19:43, 16 September 2016 (UTC)
But only the part that is actually structured and fits into the data structure that will be implemented and fits with the legal requirements. There is plenty of content on Wiktionary that would either make no sense to transfer, or that would not be allowed due to the licensing restrictions to be moved. And that's OK. Wikidata does not have to have a total representation of all content in Wiktionary in order to be useful to Wiktionary. --Denny (talk) 20:07, 16 September 2016 (UTC)
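To illustrate the exchange above about persistent sense identifiers, here is a minimal sketch in Python. It is not the Wikibase data model or any real API: the `Sense` class, the `SENSES` store, the `examples_for` helper and the sample sentences are all hypothetical, and the ID `S1989` is borrowed from the mock-up quoted earlier purely as a placeholder. It only shows the idea that data attached to a sense can be fetched by ID, with the gloss acting as a human-readable label, without any link back to a Wiktionary definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    """Hypothetical stand-in for a sense record; not the real data model."""
    sense_id: str                 # persistent identifier (format assumed)
    gloss: str                    # short label, not a full definition
    examples: List[str] = field(default_factory=list)


# Hypothetical in-memory store keyed by sense ID (sample data is invented).
SENSES: Dict[str, Sense] = {
    "S1989": Sense("S1989", "depressed", ["She felt blue after the news."]),
    "S1990": Sense("S1990", "colour", ["The sky is blue."]),
}


def examples_for(sense_id: str) -> List[str]:
    """Return the example sentences attached to one sense, or [] if unknown."""
    sense = SENSES.get(sense_id)
    return list(sense.examples) if sense else []


print(examples_for("S1989"))
```

A local community that disagrees with a given sense can simply not call it; nothing in this sketch requires the gloss to match a Wiktionary definition.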
What is covered by copyright is broader than just definitions. I think translations, synonyms, etymologies, thesauri and so on (there exist translation dictionaries, synonym dictionaries, ...) are protected by copyright. So I think that only a little of the information would be unprotected. The proper way to deal with that is to ask all contributors whether they agree to change the licence of their past contributions. Yet it is really a huge amount of work, so it can only be done by the Wikimedia Foundation. For comparison, OpenStreetMap changed its licence some years ago and asked everyone who had contributed whether they agreed to the change. Contributions from people who did not agree were not kept. Pamputt (talk) 06:01, 15 September 2016 (UTC)
  • This could be solved by moving it to a separate installation. Wikidata items would have remote ids and could be read from there. If needed, we could do the same for them. This has the advantage that a running remote installation is showcased by WMF/WMDE.
    --- Jura 07:10, 15 September 2016 (UTC)
    How would this solve the issue? --Yair rand (talk) 07:19, 15 September 2016 (UTC)
    Maybe Jura meant that a separate installation could be under cc-by-sa rather than cc-0, but a separate license could be attached to a namespace within this domain as well. There might be ways to ensure that a link to the history and original history is always reachable next to any displayed gloss or other, but it would probably be easier to not store any original text at all perhaps? Nemo 09:04, 16 September 2016 (UTC)
Hello all,
Thanks for your concern and questions. Our answer to that is far from perfect, we are aware of that.
As mentioned before, some of the information can be uploaded in a database, some is protected by CC-by-SA.
From the development team's side, we provide the structure to host all this data, but we don't plan to make an automatic import of all the data from Wiktionaries. Every decision about the content should belong to the communities. Knowing this, several solutions are possible:
  • Add no data from Wiktionary in the database, and start the work from scratch
  • Import only data whose author on Wiktionary explicitly agrees to release it under CC0
  • Give a different license to a part of Wikidata, separate installation or namespace (which would greatly complicate the database and make the project harder to reuse and understand)
  • Start a collective decision in the communities to allow the reuse of all their data under CC0
Besides the legal questions, I think we should remember that our goal is to provide free knowledge, easily accessible to a large number of people, and ask ourselves how to achieve that.
We're still trying to figure out this problem, asking for advice and reading documentation, and open to discussion. Lea Lacroix (WMDE) (talk) 09:55, 16 September 2016 (UTC)
I'm no lawyer. As far as I understand, Creative Commons licenses don't forbid anything. To the extent they do anything, they just allow certain actions. Under copyright law, facts in themselves are not copyrightable.
On the other hand, databases do have copyright. The Wikimedia Foundation has a copyright on the Wiktionary database and can forbid or allow other entities to use or republish the database. In this case the Wikimedia Foundation could forbid the Wikimedia Foundation from copying data, but it likely won't do that. If we don't look at the legal person, then the Wiktionary community is the owner of the Wiktionary database and might be asked to allow the data to be copied to Wikidata.
I would prefer not having CC-SA for parts of Wikidata even if that means there's less data in Wikidata. ChristianKl (talk) 20:18, 16 September 2016 (UTC)
No, databases do not have copyrights. In some jurisdictions, databases have a sui generis database right. See also Luis' blog entry on that. Also, a text by me on databases and rights. --Denny (talk) 20:30, 16 September 2016 (UTC)


One should be aware that copyright is not the only legal constraint regarding database use, see sui generis for example. --Psychoslave (talk) 08:50, 27 August 2017 (UTC)

I didn't read the whole thread before my previous message, so I didn't see the message from @Denny: about sui generis laws. That said, I didn't see any clear decision about this in the thread, and according to this phabricator ticket it looks like the WMDE team favors the CC-0 path. I didn't see any community consultation on the topic, but it's perfectly possible I just missed it if it did happen. Personally, to make my opinion/bias very clear, I dislike the CC-0 terms. To put it briefly, I'm fine working for freedom, but I'm against anyone being exploited for free through works whose derivatives have no watchdog or balance against arbitrary exclusion. That's my view, and it's a personal hindrance to contributing to Wikidata within its main namespace. For more developed debate on the subject see [1] and [2], and the links they include. Of course I also encourage you to read more on the topic through your own research and, above all, to think for yourself. :) So, as far as I'm concerned, using CC-0 for this lexeme space would mean I would most likely contribute only reluctantly and on sparse occasions, whereas a copylefted database namespace would inspire me with the greatest interest and motivation. I would prefer a license which enables importing existing Wiktionary material, but if that really proves impractical, then a copyleft license suited to databases, like the ODbL used by OpenStreetMap, would be fine for me.
I would be interested to see a widely communicated (banner, mailing list, and so on) community consultation conducted on the subject or, failing that, feedback from the Tremendous Wiktionary User Group, that is Noé, Benoît Prieur, Delarouvraie, Lyokoï, Jberkel, psychoslave, Lydia Pintscher, Thiemo Mättig, Daniel Kinzler, Epantaleo, Ariel1024, Otourly, VIGNERON, Shavtay, TaronjaSatsuma, Rodelar, Marcmiquel, Xenophôn, Jitrixis, Xabier Cañas, Nattes à chat, LaMèreVeille, GastelEtzwane, Rich Farmbrough, Ernest-Mtl, tpt, M0tty, Nemo_bis, Pamputt, Thibaut120094, JackPotte, Trizek, Sebleouf, Kimdime, S The Singer, Amqui, LA2, Satdeep Gill, Micru, Vive la Rosière, Malaysiaboy and Stalinjeet. --Psychoslave (talk) 13:43, 27 August 2017 (UTC)

Personally I'm a copyleft person and I agree CC-0 is absolutely inappropriate for dictionary definitions. Writing a good definition for a word is an intensely creative work. Those advocating for CC-0 are probably only considering the non-definition parts of a dictionary entry (like links to related words) and/or are assuming that we will never ever integrate the existing Wiktionary entries/definitions into this future project.
It's entirely possible to make a CC-0 thesaurus without any creative content; if that's the objective, however, don't call it a dictionary and make it clear it will never replace Wiktionary entries in any meaningful sense. Nemo 13:59, 27 August 2017 (UTC)
@Psychoslave, Nemo bis: if you look at the prototype mock-up website, you can see that there are no real definitions, just glosses (in part because of the licence). My personal point of view is that CC0 could be complicated but, in the long run, it is the least bad solution and the best trade-off, bringing more advantages than disadvantages. Cdlt, VIGNERON (talk) 15:52, 27 August 2017 (UTC)
Without more details on the complications and trade-offs you have in mind @VIGNERON:, it's hard to judge anything taking into account your perspective. The prototype is, well, a prototype. I don't expect it to include everything planned. If you look at the model, meaning information is planned to be integrated. --Psychoslave (talk) 21:10, 27 August 2017 (UTC)
@Nemo_bis:, even a "stupid list of related words" does require time and reflection, and to my mind that is what a temporary monopoly on intellectual works is intended to protect, not their creativity. Take synonyms, for example. As far as the French Wiktionary is concerned, we have much to improve (that might happen in the coming months thanks to @Lyokoï:). For now, good databases on this subject already exist, but none under a FLOSS license as far as I know. So we can't simply download them (or crawl their web interface) and populate le Wiktionnaire with them. I do understand that with CC-0 you have more freedom and fewer legal problems interfering with automated operations. But in my ethics, freedom is valuable when it's used in actions which concurrently aim at equity. CC-0 is all about freedom, and nothing about equity, and I just can't adhere to that. --Psychoslave (talk) 21:10, 27 August 2017 (UTC)
@Psychoslave: Sorry to be a hero for you. XD
To everyone: I don't want a CC0 licence for Wiktionary. Our work changes the way lexicography is done; we need to force linguists, lexicographers, and other language scientists to open their databases, dictionaries and other lexicons and data... A CC0 licence can't do that. Lyokoï (talk) 22:10, 27 August 2017 (UTC)
I am happy to see this topic alive again. This month, I made some lexicographic thesauri (this one, not that one) and it is definitely creative work. Over the last few years I worked on a dictionary for a PhD in linguistics and I had to write definitions and glosses. I thought both were creative work. It is not that easy to pin down the exact meaning of a concept and to write a short gloss that is meaningful in several languages. Well, I also agree with Lyokoï's argument about linguists and researchers. People who spent years building a specific database do not want to see it lost in a huge database without any mention of their names (but with the name of the uploader instead). It is still an important issue to discuss. Noé (talk) 10:26, 28 August 2017 (UTC)
This is an interesting discussion. While I personally am a person that supports CC0, I see some valid points against CC0 as well. --Satdeep Gill (talk) 02:11, 29 August 2017 (UTC)

Looking at the examples provided in the slides for Wikimedia, I see that among the potential users, most (if not all) are businesses which don't even have a friendly policy toward free licenses. Personally, that strengthens my fears about the use of CC-0 for the lexical space of Wikidata. Could @User:Lea Lacroix (WMDE): and @Lydia Pintscher (WMDE): give us new feedback? The last message on this topic from Lea was "We're still trying to figure out this problem, asking for advice and reading documentation, and open to discussion." in September 2016. --Psychoslave (talk) 14:03, 1 September 2017 (UTC)

Talk during Wikimania

Hello all,

If you're attending Wikimania, there will be a talk introducing the project and showing a first demo of the lexicographical data! This will happen on Sunday 13th, at 12:00, room Joyce/Jarry.

See you there, Lea Lacroix (WMDE) (talk) 14:01, 1 August 2017 (UTC)

@Lea Lacroix (WMDE): thanks for sharing this information. Will there be videos of the presentations so that we can watch them even if we cannot attend Wikimania 2017? Pamputt (talk) 15:09, 1 August 2017 (UTC)
Unfortunately I don't know. It depends on the Wikimania organizers. We'll do our best, in any case the slides will be shared. Lea Lacroix (WMDE) (talk) 08:22, 2 August 2017 (UTC)

Some Wikimania 2017 videos are already available, but Wiktionary and Wikidata are not there yet. --Psychoslave (talk) 14:10, 1 September 2017 (UTC)

Phrases

Will they be modelled as lexemes? Or some other type of items? --Infovarius (talk) 09:52, 23 August 2017 (UTC)

I would expect them to be lexemes. You might also find my answer at Wikidata talk:Wiktionary/Development/Proposals/2015-05#How_are_collocations.2C_idioms.2C_phrasal_verbs.2C_etc._going_to_be_handled.3F useful. - Nikki (talk) 11:18, 23 August 2017 (UTC)
Yes, they can be handled just like lexemes :) Lea Lacroix (WMDE) (talk) 11:59, 23 August 2017 (UTC)
More precisely, I'd say that phrases can be handled as lexemes, but only relevant phrases should be stored (for instance all the locutions that are already in the Wiktionaries and sometimes even already in Wikidata, like Alea iacta est (Q271723)). Cdlt, VIGNERON (talk) 16:06, 27 August 2017 (UTC)

Hi @Infovarius:, I proposed another draft model, because I had similar concerns. Your feedback would be welcome. --Psychoslave (talk) 14:21, 1 September 2017 (UTC)

Someone please help the muggles

My experience with Wiktionary today: "Warning: Wikidata's notability policy does not allow links to Wiktionary entries unless the interlanguage links cannot be automatically provided. By clicking on "save", you confirm that this is the case. In general, connecting Wiktionary words to Wikidata concepts is not correct." Which links me to Wikidata:Notability, which basically says nothing and links me to Extension:Cognate which is inexplicable and then I ended on this page, and I'm still none the wiser... TheDJ (talk) 09:45, 7 September 2017 (UTC)

What are you trying to connect? Pages about words in Wiktionary should already be connected automatically between the different language editions of Wiktionary (that's what Cognate does). Pages about words in Wiktionary should, in general, not be connected to Wikidata items. A better support for the words in Wiktionary is being developed, and that should resolve the issues, until then we are in this slightly unfortunate situation which is likely a bit confusing. But it should get better soon! :) But let's talk about concrete examples: what exactly is it that you're trying to do? --Denny (talk) 16:25, 7 September 2017 (UTC)
I'm trying to create some sort of linking between nl:Grabbelton, wikt:nl:grabbelton and wikt:en:lucky dip and then failed in a technical maze that I didn't want to be exposed to :) TheDJ (talk) 14:57, 8 September 2017 (UTC)
Nice example: as said, Wikidata is currently not well suited for this kind of connection between words and meanings. Right now, the connection between wikt:en:lucky dip and wikt:nl:grabbelton should be through the Vertalingen table (see here for example), the connection between wikt:en:lucky dip and wikt:nl:grabbelton through the corresponding translation table (see here for example), and from English Wikipedia using one of these templates - I am sure nl Wikipedia has similar ones. In general, as said, Wikidata is not doing great with this task yet, so it currently relies on the old pre-Wikidata way to do so, but this should change within the next few months. --Denny (talk) 15:15, 8 September 2017 (UTC)
English Wiktionary has begun tagging individual senses with wikt:Template:senseid tags, which (among other things) can specify the Wikidata item that matches the thing the sense refers to. For example, on wikt:Paris, the English section includes various senses for things called Paris, and where possible a Wikidata id has been included. If there is a Wikidata item for a lucky dip, the same thing can be done at the wikt:lucky dip and wikt:grabbelton entries. However, are these actually the same thing? They're related concepts, but Dutch grabbelton refers specifically to some kind of barrel or container, while the Wiktionary sense for lucky dip implies that it's a game rather than a physical object. This kind of nuance is important when working with Wikidata. CodeCat (talk) 18:02, 8 September 2017 (UTC)
At the very least, a sense id for d:Q2045567 can be added to wikt:grabbelton. CodeCat (talk) 18:03, 8 September 2017 (UTC)
Sounds great! Thanks for pointing the sense link out. --Denny (talk) 20:48, 8 September 2017 (UTC)
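For readers who want to see what such sense tags make possible, here is a small Python sketch that collects the Wikidata item ids found in senseid tags within a page's wikitext. It rests on an assumption: that the template is written with two positional parameters, a language code followed by the id, as in `{{senseid|nl|Q2045567}}`. The sample wikitext and the `wikidata_ids` helper are hypothetical, and ids that are not Wikidata items are simply skipped.

```python
import re

# Assumed layout: {{senseid|<language code>|<id>}}; only Q-ids are captured.
SENSEID_RE = re.compile(
    r"\{\{\s*senseid\s*\|\s*([^|{}]+?)\s*\|\s*(Q\d+)\s*\}\}"
)


def wikidata_ids(wikitext):
    """Return (language code, Wikidata id) pairs for senseid tags with Q-ids."""
    return SENSEID_RE.findall(wikitext)


# Invented sample entry text, for illustration only.
sample = (
    "# {{senseid|nl|Q2045567}} (definition text here)\n"
    "# {{senseid|nl|other}} (definition text here)\n"
)

print(wikidata_ids(sample))
```

Whether two entries should carry the same item id is still an editorial judgment, as the grabbelton and lucky dip comparison above shows; the sketch only reads tags that editors have already placed.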

Access to Wikidata data enabled on English Wiktionary

Hello,

For your information, a few months ago, the English Wiktionary requested to have access to Wikidata data from their wiki. This is now done. You can have a look at their working page and the different announcements, example, documentation page. This was also the starting point of an interesting discussion on Wikiproject:Books.

We hope that this new feature will allow the community to start playing with Wikidata data, getting familiar with the structure, while waiting for further development of lexicographical data.

After this first experimentation, we're of course open to deploying this on other Wiktionaries, if the community asks for it.

Cheers, Lea Lacroix (WMDE) (talk) 15:27, 7 September 2017 (UTC)