Wikidata talk:Lexicographical data/Archive/2017/01

This page is an archive. Please do not modify it. Use the current page, even to continue an old discussion.

Formatted text in definitions?

It is quite common in Wiktionaries for the definition to be either a link to another word, or to contain links to other words. Is that foreseen in the data model? Will it be possible to add links to other lexemes?--Micru (talk) 12:38, 28 November 2016 (UTC)[reply]

No, the data model does not foresee this. --Denny (talk) 17:25, 28 November 2016 (UTC)[reply]
@Denny: Why not? Most of the definitions in all wiktionaries (to be verified) contain links to other words/lexemes; wouldn't it make sense to keep that? I thought that somehow T141764 was related to this.--Micru (talk) 07:44, 29 November 2016 (UTC)[reply]
So do Wikipedia descriptions. And yet Wikidata descriptions have no links. It makes reusing the data much harder for everyone. --Denny (talk) 18:18, 29 November 2016 (UTC)[reply]
Yet the difference is that descriptions stored in Wikidata are not used in the body of the Wikipedia article, whereas in Wiktionary definitions are part of the content. Unless the definitions are only for internal disambiguation in Wikidata, in which case it doesn't matter much. --Micru (talk) 08:09, 30 November 2016 (UTC)[reply]
Hello Micru, thanks for your question.
In the first steps of the project, we won't provide formatted text in definitions. We have both technical and reusability concerns about this, so we will first try the feature without formatted text. Then, after a phase of experimentation, and of course after discussing with you, we will see what the use cases are, whether your needs can be met in another way, or, if not, whether we should allow formatted text in definitions. Lea Lacroix (WMDE) (talk) 13:13, 30 November 2016 (UTC)[reply]
@Lea Lacroix (WMDE): This causes redundancy, as it creates a de-facto split version of Wiktionary whose definitions won't be synchronized with any of the Wiktionaries unless they choose to give up formatting of their definitions too, which I consider improbable. For the prototype non-formatted definitions are fine, sure.--Micru (talk) 14:31, 30 November 2016 (UTC)[reply]
It might in the end even be helpful if Wikidata itself does not allow links in descriptions. These links have different roles: sometimes they help users to find helpful, but not essential additional information. In these cases it seems OK to me that the decision to link or not to link can be made by each Wiktionary as an editorial choice of the community. In other cases it may be valuable when synonyms, hypernyms etc. are defined as explicit properties and not just show up as links within a description. --MarcoSwart (talk) 22:58, 30 November 2016 (UTC)[reply]
There has already been some demand for formatted text in Wikidata itself (e.g. here). I don't know what the technical concerns are. For reusability, could it not be restricted to certain types of formatting and provide a plain text version automatically? Where would be the best place to collect use cases (for both Wikidata now and future Wiktionary support)? - Nikki (talk) 23:17, 30 November 2016 (UTC)[reply]
@Nikki: I would be interested in these use cases; you can start a new section here or create a sub-page, for example :) Lea Lacroix (WMDE) (talk) 10:06, 1 December 2016 (UTC)[reply]
A gloss is not a definition. A gloss is a set of words that uniquely identify a definition. Definitions with formatted text have no place on Wikidata, for the same reason we don't have a property for "Wikipedia article intro paragraph".
Wikidata is not duplicating or relocating Wiktionary, just as it's not duplicating or relocating Wikipedia. Wikidata is a repository for structured data, and is serving those purposes to the various Wikimedia projects that have use of it. Certain parts of Wiktionary can be structured data, such as pronunciation, inflection, -nym relations, rhymes, lexical categories, translations, and several others. Certain parts, like usage notes and definitions, are definitely not structured and would not be on Wikidata. Some parts may blur the line between structured and unstructured somewhat.
If we were going with the assumption that everything's better on Wikidata by default, whether prose or wikitext or images, well, I think most would agree that would not go well. --Yair rand (talk) 17:33, 1 December 2016 (UTC)[reply]
@Yair rand: "A gloss is not a definition" is your own interpretation. If you go to Wikidata:Wiktionary you will see that so far it has been stated that a lexeme on Wikidata would have "1 Gloss per language (=definition)". If gloss should not be equal to definition, then it should be clarified on that page. However, even if you make that distinction, in practice it will mean duplication of effort, and more maintenance overhead. I am not saying that everything should go on Wikidata, however in this case the definition *is* going on Wikidata. It is not a matter of discussing if it will be in a longer or a shorter form, it is a matter of seeing if it can be compatible with the current situation, or if it will be independent.
It would be appropriate to think ahead and devise some sensible workflow to work with lexical data from Wiktionary. As a wiktionarian, how do I connect a lexeme on Wiktionary with a lexeme on Wikidata? Do I have to write the definition on Wiktionary, then come to Wikidata and write again the information and then link both somehow manually? Or perhaps can it be automated with Visual editor templates so I don't need to visit Wikidata and the information is compatible both ways?--Micru (talk) 10:07, 2 December 2016 (UTC)[reply]
Probably @Micru: is correct to suggest clarification of the text he cited. As far as I understand from the more detailed proposal a Lexeme could have one or more Senses each having a single Gloss per language: a "multilingual text, like a Label or Description for Items". I would view "description" as the more general term. One type of description is a gloss: a short characterization of the meaning. Another type is a definition: careful wording aimed at specifying the meaning as unambiguously as possible. Other forms of description are possible. For instance, a synonym: another word expressing the same sense, or a link to a Wikidata-item. If this approach is acceptable, a Gloss is not (necessarily) a definition and it does not contain wikitext. I have explained above why this could be useful. And as a Gloss has to be a multilingual text, I can imagine making them wikitext too would make things really complex. But to address the problem Micru mentioned I wonder if it would be possible to always have a Gloss, but if needed a Statement with a Language qualifier pointing to a wikitext description too.
I could imagine us starting with four workflows. The first two: simply editing a Wiktionary and simply editing Wikidata. The second two: collecting edits over a specified period on Wikidata and using them as a work list to improve a Wiktionary and the other way around. It then would depend on community decisions how often and how "automatic" these improvements would be made. The choice can be "very often and mostly automatic in both directions" but there is nothing inherently wrong with a less frequent, more manual and unidirectional workflow if that is the choice of the community. --MarcoSwart (talk) 00:57, 3 December 2016 (UTC)[reply]
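The Lexeme / Sense / Gloss structure described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual data model: the class names follow the terms used in the discussion, while the field names (`lemma`, `language`, `glosses`) and the `set_gloss` check that rejects wiki-link markup are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sense:
    # One plain-text gloss per language code, e.g. {"en": "..."} (assumed layout).
    glosses: Dict[str, str] = field(default_factory=dict)

    def set_gloss(self, language: str, text: str) -> None:
        # Hypothetical rule: glosses are plain text, so wiki-link markup is rejected.
        if "[[" in text or "]]" in text:
            raise ValueError("gloss must be plain text without wiki links")
        # Setting a gloss for a language replaces any earlier one for that language.
        self.glosses[language] = text


@dataclass
class Lexeme:
    lemma: str
    language: str
    # A lexeme can have one or more senses.
    senses: List[Sense] = field(default_factory=list)


contract = Lexeme(lemma="contract", language="en")
noun_sense = Sense()
noun_sense.set_gloss("en", "agreement that is legally binding")
contract.senses.append(noun_sense)
```

Keeping the gloss as a plain string per language is what makes it as easy to reuse as a Label or Description; a formatted definition would have to live elsewhere, for instance on the Wiktionary page itself.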
  • @Micru: Workflow: A page is created on Wiktionary, with the definition written in wikitext on the definition line of the Wiktionary page. Some other data, such as synonyms and translations, have a "gloss" (which is a short summary/identifier of the definition) attached to it, to associate it with a particular definition. The gloss is written up manually.
This is the current workflow, where editors must add it in a somewhat structured format using templates. The same workflow also describes how I think it should work using Wikidata. The data structure matches currently stored data, which would make it easier to transition both existing content and software. Whether the tools will be an extension to VisualEditor, a modification of WT:EDIT tools and similar, a new dedicated semantic editor, or some kind of synchronizing wikitext-simulating data interface, I strongly suspect that using the existing sense identifiers instead of duplicating the frequently-modified wikitext-based full definitions would substantially reduce overhead, not increase it. (I also really hope that none of this involves having to routinely go between projects to get anything done.) --Yair rand (talk) 19:46, 15 December 2016 (UTC)[reply]

A previous portion of these discussions may be usefully repeated here:

  • sense: a conventional use of a word/phrase.
  • definition: a single expression of that precise use.
  • gloss: a set of words to uniquely identify a sense, usually an over-simplification of the definition.

The gloss does not adequately define the term; it is usually too brief. It is, instead, used as a human-readable identifier: for example, in a list of translations it directs the reader to the translations of a specific sense of the term. A couple of examples:

contract
definition: noun: An agreement between two or more parties, to perform a specific job or work order, often temporary or of fixed duration and usually governed by a written agreement.
definition: verb: To draw together or nearer; to shorten, narrow, or lessen.
 
gloss: noun: agreement that is legally binding
  • French: contrat
  • German: Vertrag
gloss: verb: intransitive: draw together; shorten; lessen
  • French: se rétracter, se recroqueviller
  • German: zusammenziehen, kontrahieren

- Amgine (talk) 23:45, 15 January 2017 (UTC)[reply]
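The use of a gloss as a human-readable identifier in the "contract" example above can be illustrated with a small lookup table. This is a minimal sketch, assuming a hypothetical layout in which translations are keyed by part of speech and gloss; the function name `translations_for` is invented for illustration, and the "intransitive" label on the verb gloss is dropped for brevity.

```python
from typing import Dict, List, Tuple

# (part of speech, gloss) -> language -> translations (hypothetical layout)
TranslationTable = Dict[Tuple[str, str], Dict[str, List[str]]]

translations: TranslationTable = {
    ("noun", "agreement that is legally binding"): {
        "French": ["contrat"],
        "German": ["Vertrag"],
    },
    ("verb", "draw together; shorten; lessen"): {
        "French": ["se rétracter", "se recroqueviller"],
        "German": ["zusammenziehen", "kontrahieren"],
    },
}


def translations_for(
    table: TranslationTable, pos: str, gloss: str, language: str
) -> List[str]:
    # The gloss only selects the sense; it does not define the term.
    # Unknown senses or languages yield an empty list.
    return table.get((pos, gloss), {}).get(language, [])


german_noun = translations_for(
    translations, "noun", "agreement that is legally binding", "German"
)
```

The full definitions never appear in the table: the short gloss is enough to tell the two senses apart, which is the role described above.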