Wikidata:Lexicographical data/Best practices
This page serves as a repository of best practices developed over time by various lexeme contributors, often after being described in other forums. They may be discussed on this page's talk page if desired.
Should there be a lexeme for this?
- There should be some evidence that the lexeme exists in the language at the time that lexeme is created on Wikidata.
- The better documented a language is overall, the more the above should be treated as a requirement rather than merely a best practice.
- As a result, for languages like English, French, Spanish, Mandarin, Russian, and Arabic, this should be taken as obligatory regardless of one's fluency in that language. These languages are supported by nation-states and, by virtue of being used to communicate all sorts of information among very large groups of people, are expected to have diverse vocabularies.
- For less well-documented languages like Breton, Sindhi, Acehnese, and Guarani, this remains merely a strong recommendation: once a resource is found for that language, attempts should be made to use it as evidence for as many existing lexemes in that language as possible.
- For even less well-documented languages like Skolt Sami, Igbo, Angika, and Cia-Cia, this is much less binding even as a recommendation—especially when you are a native speaker of that language and can thus vouch for the use of a particular lexeme in your language community.
- The evidence for the existence of a lexeme may be indicated in a number of ways:
- through adding an external identifier to a lexical resource which describes the lexeme in question; or
- through adding a described by source (P1343) statement pointing to a resource where the lexeme is described (qualified with page(s) (P304) or other specifiers of where in that resource that description occurs); or
- through adding a reference URL (P854) statement pointing to an online resource where the lexeme is described (the most specific URL for this if possible, or with the same sorts of qualifiers that might be used for P1343); or
- through adding a usage example (P5831) statement demonstrating use of that lexeme in some external source (where this source is provided as a reference); or
- through adding a gloss quote (P8394) statement on one of the lexeme's senses providing how the corresponding meaning of that lexeme is expressed in some resource (this resource provided as a reference).
- In general, while individual words that aren't merely inflections of other words might warrant lexemes, non-idiomatic phrases typically do not warrant them, since they may be treated as the sum of their parts.
- This does not necessarily rule out adding senses with non-idiomatic meanings to lexemes which also have idiomatic meanings, provided those idiomatic meanings are already present as senses.
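The kinds of evidence listed earlier in this section can be checked mechanically. The following is a minimal sketch, assuming the lexeme is available as a Python dict shaped like the Wikibase JSON export (`claims`, `senses`, `mainsnak`, `references`). The function name `evidence_kinds` and the returned labels are hypothetical and are not part of any Wikidata tool.

```python
from typing import Any, Dict, List, Set

# Property IDs named in the list of evidence options.
DESCRIBED_BY_SOURCE = "P1343"
REFERENCE_URL = "P854"
USAGE_EXAMPLE = "P5831"
GLOSS_QUOTE = "P8394"


def _referenced(statements: List[Dict[str, Any]]) -> bool:
    """True if at least one statement carries a non-empty reference list."""
    return any(st.get("references") for st in statements)


def evidence_kinds(lexeme: Dict[str, Any]) -> Set[str]:
    """Return which kinds of evidence a lexeme dict carries (hypothetical helper)."""
    kinds: Set[str] = set()
    claims = lexeme.get("claims", {})

    # Any statement whose main snak has the external-id datatype counts.
    for statements in claims.values():
        if any(st.get("mainsnak", {}).get("datatype") == "external-id"
               for st in statements):
            kinds.add("external identifier")

    if claims.get(DESCRIBED_BY_SOURCE):
        kinds.add("described by source")
    if claims.get(REFERENCE_URL):
        kinds.add("reference URL")

    # Usage examples and gloss quotes only count when a source is referenced.
    if _referenced(claims.get(USAGE_EXAMPLE, [])):
        kinds.add("usage example")
    for sense in lexeme.get("senses", []):
        if _referenced(sense.get("claims", {}).get(GLOSS_QUOTE, [])):
            kinds.add("gloss quote")

    return kinds
```

An empty result for a lexeme in a well-documented language would suggest that the lexeme still needs a source. The sketch does not judge the quality of the source itself.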
Lemmata
- The lemma of a lexeme should ideally be the representation of that lexeme that is provided in a dictionary. What representation this is will generally depend on the lexeme's language and lexical category.
- Take Indo-European languages: for nouns and adjectives, this may reflect some combination of nominative case, singular number, and masculine gender; for verbs, this may be the infinitival or verbal noun form.
- Other languages may present lemmata differently, for which a non-exhaustive list is given below:
- An Arabic verb generally uses the masculine third-person singular perfect active indicative as a lemma ('كَتَبَ' for 'to write').
- A Korean verb generally uses the verb stem followed by the dedicated citation suffix '-다' ('가다' for 'to go').
- An isiZulu verb generally uses the verb stem on its own, including the final vowel 'a' ('shaywa' for 'to be struck').
- If there are multiple scripts in which a language is generally written, it is desirable for the lemma to contain a representation for each script.
- Where a correspondence in representation exists between multiple related scripts, repeating that correspondence may not be necessary.
- For those Mandarin lexemes which have not been affected by character simplification, a single lemma with code 'zh' suffices.
- For those Esperanto lexemes which do not change under 'hsistemo' or 'xsistemo', a single lemma with code 'eo' suffices.
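The rule about multiple scripts can be illustrated with a small sketch. It assumes the spellings of a lexeme are given per variant code and collapses them to a single lemma under the base code when they are all identical. The helper name `collapse_lemmas` is hypothetical, and the variant codes used in the example (`zh-hans`, `zh-hant`, `eo-hsistemo`, `eo-xsistemo`) are assumptions about which codes an editor would choose.

```python
from typing import Dict


def collapse_lemmas(base_code: str, variants: Dict[str, str]) -> Dict[str, str]:
    """Return the lemma map to record for a lexeme (hypothetical helper).

    If every variant spelling is identical, one lemma under `base_code`
    suffices; otherwise each variant keeps its own representation.
    """
    distinct = set(variants.values())
    if len(distinct) == 1:
        return {base_code: distinct.pop()}
    return dict(variants)


# Unaffected by character simplification: one 'zh' lemma is enough.
same = collapse_lemmas("zh", {"zh-hans": "人", "zh-hant": "人"})

# Affected by simplification: both representations are kept.
different = collapse_lemmas("zh", {"zh-hans": "书", "zh-hant": "書"})

# Esperanto word without diacritics: identical under every system.
esperanto = collapse_lemmas(
    "eo", {"eo": "domo", "eo-hsistemo": "domo", "eo-xsistemo": "domo"}
)
```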
Lexical categories
- In general, an instance of (P31) value on a lexeme should be more specific than the lexeme's lexical category.
- Thus if abbreviation (Q102786) is a lexical category, there is no need to re-add it as a P31.
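A minimal sketch of a check for the redundant case described above, assuming a lexeme dict shaped like the Wikibase JSON export (`lexicalCategory` plus `claims`). The function name `redundant_p31_values` is hypothetical. It only detects a P31 value that is identical to the lexical category; deciding whether a value is "more specific" would need subclass data that this sketch does not consult.

```python
from typing import Any, Dict, List


def redundant_p31_values(lexeme: Dict[str, Any]) -> List[str]:
    """List P31 values that merely repeat the lexical category (hypothetical helper)."""
    category = lexeme.get("lexicalCategory")
    redundant: List[str] = []
    for statement in lexeme.get("claims", {}).get("P31", []):
        # Item values are stored under mainsnak -> datavalue -> value -> id.
        value = statement.get("mainsnak", {}).get("datavalue", {}).get("value", {})
        if category is not None and value.get("id") == category:
            redundant.append(value["id"])
    return redundant


# Example: abbreviation (Q102786) used both as category and as a P31 value.
example = {
    "lexicalCategory": "Q102786",
    "claims": {
        "P31": [
            {"mainsnak": {"datavalue": {"value": {"id": "Q102786"}}}},
            {"mainsnak": {"datavalue": {"value": {"id": "Q101244"}}}},
        ]
    },
}
flagged = redundant_p31_values(example)
```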
Lexeme statements
Derivations
Abbreviations should use derived from lexeme (P5191) qualified with mode of derivation (P5886) acronym (Q101244) (see ffs (L406751)).
Forms
To help establish the existence and use of a lexeme, at least one form should be referenced. This may be done through a usage example (P5831) statement qualified with subject form (P5830) [the form in question], or through another statement (described by source (P1343), attested as (P7855), or attested in (P5323); other properties are possible). The goal is for every form to be attested or referenced with at least one date, preferably with those dates falling in different years.
Senses
Translations
- It is generally good practice to avoid adding translation (P5972) between every pair of possible translations.
- If there is a path of P5972 statements between two senses, that is enough to establish a link between them.
- If every such word is connected to the same item via item for this sense (P5137), then the P5137 links are also enough to establish links between them.
- This is analogous to not stating directly that Manhattan (Q11299) located in the administrative territorial entity (P131) New York (Q1384) (we can infer that through it being P131 New York City (Q60) first), and similarly not stating directly that art museum (Q207694) subclass of (P279) museum (Q33506) (we can infer that through it being P279 museum of culture (Q28737012) first).
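The two inference rules above can be sketched as follows. The example assumes that translation (P5972) statements are given as pairs of sense IDs and may be followed in either direction, and that item for this sense (P5137) values are given as a mapping from sense ID to a set of item IDs. All names and IDs here are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def translation_path_exists(a: str, b: str,
                            translations: Iterable[Tuple[str, str]]) -> bool:
    """Breadth-first search for a chain of P5972 links between two senses."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for source, target in translations:
        # Assumption: a translation link can be followed in both directions.
        graph[source].add(target)
        graph[target].add(source)

    seen = {a}
    queue = deque([a])
    while queue:
        current = queue.popleft()
        if current == b:
            return True
        for neighbour in graph.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def share_item(a: str, b: str, sense_items: Dict[str, Set[str]]) -> bool:
    """True if both senses point to at least one common item via P5137."""
    return bool(sense_items.get(a, set()) & sense_items.get(b, set()))


def already_linked(a: str, b: str,
                   translations: Iterable[Tuple[str, str]],
                   sense_items: Dict[str, Set[str]]) -> bool:
    """A direct P5972 statement between a and b adds nothing if this is True."""
    return translation_path_exists(a, b, translations) or share_item(a, b, sense_items)


# Hypothetical sense IDs: S1 and S3 are linked only through S2.
links = [("L1-S1", "L2-S1"), ("L2-S1", "L3-S1")]
items = {"L4-S1": {"Q144"}, "L5-S1": {"Q144"}}
```

With these inputs, `already_linked("L1-S1", "L3-S1", links, items)` is true through the chain of translations, and `already_linked("L4-S1", "L5-S1", links, items)` is true through the shared item, so neither pair needs its own translation statement.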