Wikidata:Property proposal/word stem type


Zaliznyak word stem class

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Describes the type of word stem
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: string
Allowed units: numbers
Example 1: вода (L189) → 1
Example 2: магия (L57919) → 7
Example 3: Земля (L34843) → 2
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Russian Wiktionary page "Wiktionary: Using Zaliznyak's dictionary")
Planned use: Filling in Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), inflection class (P5911)

Zaliznyak stress pattern

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Describes the word's stress pattern
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: string
Allowed units: letters
Example 1: вода (L189) → d
Example 2: магия (L57919) → a
Example 3: Земля (L34843) → d
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Russian Wiktionary page "Wiktionary: Using Zaliznyak's dictionary")
Planned use: Filling in Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), inflection class (P5911)

Zaliznyak vowel alternation

Originally proposed at Wikidata:Property proposal/Lexemes

   Not done
Description: Describes whether the stem vowel alternates
Represents: Zaliznyak's classification (Q66148413)
Data type: String
Domain: Lexemes
Allowed values: yes or no
Example 1: вода (L189) → no
Example 2: магия (L57919) → no
Example 3: Земля (L34843) → yes
Source: wikt:ru:Викисловарь:Использование_словаря_Зализняка (Russian Wiktionary page "Wiktionary: Using Zaliznyak's dictionary")
Planned use: Filling in Russian lexemes
Robot and gadget jobs: A bot is planned that will update these properties for Russian lexemes.
See also: word stem (P5187), inflection class (P5911)
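The three proposals above split what Zaliznyak's notation normally writes as one compact index: a stem-type digit, an asterisk when the stem has a fleeting (alternating) vowel, and a stress-pattern letter. A bot filling these properties from Wiktionary would have to split that index into the three values. The Python below is a minimal sketch of such a split under simplifying assumptions: the regular expression, the names `ZaliznyakIndex` and `parse_index`, and the sample strings are hypothetical, and only the basic digit, optional asterisk, letter shape (with up to two primes on the letter) is handled. Real indices carry further marks, which this sketch rejects.

```python
import re
from typing import NamedTuple

# Assumed, simplified shape of a Zaliznyak index: a stem-type digit 1-8,
# an optional "*" marking vowel alternation, then a stress-pattern letter
# a-f with up to two primes. Other marks (circled digits, "ё",
# parenthesised variants, etc.) are not handled and raise ValueError.
_INDEX_RE = re.compile(r"^([1-8])(\*?)([a-f]'{0,2})$")


class ZaliznyakIndex(NamedTuple):
    stem_type: int            # value for the proposed "word stem class"
    stress_pattern: str       # value for the proposed "stress pattern"
    vowel_alternation: bool   # value for the proposed "vowel alternation"


def parse_index(index: str) -> ZaliznyakIndex:
    """Split a simplified Zaliznyak index such as "2*d" into three values."""
    match = _INDEX_RE.match(index.strip())
    if match is None:
        raise ValueError(f"unsupported index: {index!r}")
    digit, star, pattern = match.groups()
    return ZaliznyakIndex(int(digit), pattern, star == "*")


if __name__ == "__main__":
    # Hypothetical sample indices in the simplified shape.
    for idx in ("1d", "7a", "2*d"):
        print(idx, parse_index(idx))
```

With the example values above, stem class 2, pattern d and alternation "yes" correspond to "2*d" in this simplified shape; the sketch does not claim that this is the complete index Wiktionary gives for Земля.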

Motivation

The plan is to fill in lexemes from Wiktionary. Since the Russian Wiktionary classifies lexemes according to Zaliznyak's classification, several properties are needed to support it. If this property is approved, the following properties will also be needed: the stress pattern and the vowel alternation. This property must be specified as a qualifier of Zaliznyak's classification (Q66148413), which is itself the value of an inflection class (P5911) statement. Iniquity (talk) 20:56, 5 August 2019 (UTC)

Useful links:

Discussion

Comment: The data type should probably be "item", with items created for each word stem type. ArthurPSmith (talk) 17:53, 6 August 2019 (UTC)
In Russian these are just numbers, and there are usually about eight for each part of speech. Do you think it is worth creating separate items? Or do other languages have a similar property? Iniquity (talk) 17:59, 6 August 2019 (UTC)
Comment: If it's just for Zaliznyak's classification, the label should probably reflect that. --- Jura 18:02, 6 August 2019 (UTC)
Yes, I thought about that, but if several languages may have a similar classification, wouldn't it be better to use a common property? Or is it better to create a narrower one first and change it later? Iniquity (talk) 18:04, 6 August 2019 (UTC)
Changing it around once created isn't really a good idea. It's generally easier to build and maintain a property with a clear scope. If one language has different ways of classifying the same thing, this might not work out well in a single property, as multiple values in a single property are harder to maintain (at least, that's my view). --- Jura 14:06, 7 August 2019 (UTC)
I think you're right, especially since it is not known whether a similar classification exists in other languages. What do you think: would "word stem type according to Zaliznyak's classification" be okay? Iniquity (talk) 17:30, 7 August 2019 (UTC)
Maybe it can be shorter, but I'm not really the best person to ask about the terminology and translation to use. --- Jura 11:35, 8 August 2019 (UTC)
I think something like that would work. Iniquity (talk) 15:24, 8 August 2019 (UTC)
Using multiple value types in the same property makes processing the data a bit harder. If the property is specifically Zaliznyak's, it shouldn't be used for anything else (e.g. we don't have a property "book ID", we have a property "ISBN-10") because there is a clear expectation of the data it will hold. Of course, this is more relevant to a free-form string type. Items can have their own instance-of/subclass-of, but querying such data is still always messier and slower. --Yurik (talk) 15:29, 8 August 2019 (UTC)
Per the chat discussion, I think it would be far better for the values to be items rather than strings. There are very few, well-defined types (1-8 for the first, a-d for the second), and each type's item could describe what it is in multiple languages, plus carry statements. The remaining questions are how many of these properties we need, and whether the values should be spread over several properties (e.g. 1-8 on one property, a-d on another) or held in a single property with multiple values. Also, if there is only one, should it be Zaliznyak-specific, or should it apply to all languages that have a word classification system (which most probably do)? So, to sum up: how about an all-language word classifier property with multiple values, e.g. "word classification" = "Zaliznyak class 1", "Zaliznyak type a" (two values for the same property; the property is not Zaliznyak-specific)? --Yurik (talk) 21:06, 8 August 2019 (UTC)
  • Not sure how good Annexe:Classification de Zaliznyak is, but based on that, it might easily fit into inflection class (P5911) and (for the last part) conjugation class (P5186). --- Jura 18:15, 10 August 2019 (UTC)
    • @Jura1: Thanks for your answer. I think we can decline this proposal for now. Can you join the discussion? Iniquity (talk) 19:18, 10 August 2019 (UTC)
  • Comment: My opinion is to use one property with the item value type. --Infovarius (talk) 12:51, 12 August 2019 (UTC)
  • The description currently doesn't give any indication of what the Zaliznyak classification is about. Can you add something that makes the concept understandable to people who aren't familiar with it? ChristianKl❫ 13:37, 13 August 2019 (UTC)
    • Thanks, I updated the motivation section with some links. --15:12, 13 August 2019 (UTC)
      • I feel the information should be in the description of the property, as that's where people who see the property will look to understand its meaning. ChristianKl❫ 10:02, 14 August 2019 (UTC)
  • Does anyone know whether it is possible to restrict inflection class (P5911) values to only those whose language matches the language of the current lexeme? For example, on an English lexeme, people should not be adding P5911 = "Zaliznyak's type 7a classification", because that item applies to Russian words only. If it is not possible, we would have to use a dedicated property instead of P5911. --Yurik (talk) 00:46, 14 August 2019 (UTC)
  •  Not done. We decided that we would not use this scheme. Iniquity (talk) 15:32, 14 August 2019 (UTC)
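Yurik's question about restricting inflection class (P5911) values by language can be illustrated with an offline check. The Python below is a hypothetical sketch, not a Wikidata property constraint: it assumes each classification item records the language it applies to (for example through language of work or name (P407), which is an assumption the discussion did not settle), works on data already loaded into plain dictionaries rather than calling any Wikidata API, and uses a made-up item ID. It only shows the comparison logic; whether Wikidata can enforce it is the open question in the comment.

```python
from typing import Dict, List, Optional

# Wikidata items for the two languages used in the example below.
RUSSIAN = "Q7737"
ENGLISH = "Q1860"


def mismatched_classes(
    lexeme_language: str,
    class_items: List[str],
    item_language: Dict[str, Optional[str]],
) -> List[str]:
    """Return the inflection-class items whose recorded language differs
    from the lexeme's language. Items with no recorded language (missing
    from the mapping or mapped to None) are not flagged."""
    return [
        item
        for item in class_items
        if item_language.get(item) is not None
        and item_language[item] != lexeme_language
    ]


if __name__ == "__main__":
    # Hypothetical item ID standing in for a "Zaliznyak type 7a" item;
    # it is not a real QID.
    item_language = {"Q_ZALIZNYAK_7A": RUSSIAN}
    print(mismatched_classes(ENGLISH, ["Q_ZALIZNYAK_7A"], item_language))
    print(mismatched_classes(RUSSIAN, ["Q_ZALIZNYAK_7A"], item_language))
```

Items without a recorded language are deliberately left unflagged, so the check reports only clear mismatches rather than incomplete data.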