Wikidata:Lexicographical data/Documentation

This is the main documentation page for lexicographical data on Wikidata. Since the new data system is not deployed yet, this documentation is incomplete and mostly based on the test system.

See also the technical documentation on extension WikibaseLexeme.

Introduction

Data Model

Visualization of the Lexeme data model

The data model of WikibaseLexeme describes the structure of the data that is handled as "Lexemes" in Wikibase. The text below is a summary; for more detailed information, see Extension:WikibaseLexeme/Data Model.

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

  • An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
  • A Lemma for use as a human readable representation of the lexeme, e.g. "run".
  • The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. English (Q1860).
  • The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. adjective (Q34698).
  • A list of Lexeme Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from, grammatical gender, or syntactic function).
  • A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
    • An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation: e.g. L3746552-F7
    • A representation, spelling out the Form as a string.
    • A list of grammatical features that define the syntactic role to which the given Form applies. These are given as references to concrete Items, e.g. participle (Q814722) for a participle.
    • A list of Form Statements further describing the Form or its relations to other Forms or Items (e.g. IPA transcription (P898), pronunciation audio, rhymes with, used until, used in region)
  • A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A sense is described using the following information (not available in the user interface on wikidata.org as of 24th May 2018):
    • An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
    • A Gloss, defining the meaning of the Sense using natural language.
    • A list of Sense Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).
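
The ID scheme described in the list above can be sketched in a few lines of Python. This is only an illustration, not part of WikibaseLexeme: the names `parse_entity_id` and `entity_uri` are hypothetical, the pattern assumes positive numbers without leading zeros, and the concept base URI `http://www.wikidata.org/entity/` is assumed to be the one used by wikidata.org (other repositories would use their own).

```python
import re
from typing import NamedTuple, Optional

# Assumption: the numeric parts are positive integers without leading zeros.
_ID_PATTERN = re.compile(r"L([1-9][0-9]*)(?:-([FS])([1-9][0-9]*))?")

# Assumption: concept base URI of wikidata.org; other repositories differ.
WIKIDATA_CONCEPT_BASE_URI = "http://www.wikidata.org/entity/"


class EntityId(NamedTuple):
    kind: str                   # "lexeme", "form" or "sense"
    lexeme_id: str              # e.g. "L3746552"
    sub_number: Optional[int]   # e.g. 7 for "L3746552-F7", None for a Lexeme


def parse_entity_id(entity_id: str) -> EntityId:
    """Split a Lexeme, Form or Sense ID into its parts (hypothetical helper)."""
    match = _ID_PATTERN.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a Lexeme, Form or Sense ID: {entity_id!r}")
    lexeme_number, marker, sub_number = match.groups()
    kind = {None: "lexeme", "F": "form", "S": "sense"}[marker]
    return EntityId(kind, f"L{lexeme_number}",
                    int(sub_number) if sub_number else None)


def entity_uri(entity_id: str,
               base_uri: str = WIKIDATA_CONCEPT_BASE_URI) -> str:
    """Combine an ID with a concept base URI to form the entity's URI."""
    parse_entity_id(entity_id)  # raises ValueError on malformed IDs
    return base_uri + entity_id
```

For example, `parse_entity_id("L3746552-F7")` identifies a Form that belongs to Lexeme `L3746552`, and `entity_uri("L3746552-S4")` appends the Sense ID to the assumed base URI.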

This data model is further extended by the set of properties typically used for Lexeme statements, Form statements, and Sense statements. See Wikidata:Lexicographical data/Properties for an overview of these properties and Wikidata:Property proposal/Lexemes for current proposals of additional properties.
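
As a rough summary of the structure described in this section, the data model could be written down as a set of Python dataclasses. This is a simplified sketch, not the WikibaseLexeme implementation: all class and field names are hypothetical, qualifiers and references are left out, and the lemma and representation are plain strings without their language codes. The example reuses the illustrative ID `L3746552` from above, which is not necessarily the real Lexeme for "bank", and assumes noun (Q1084) as the lexical category Item.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """A single claim; qualifiers and references are omitted for brevity."""
    property_id: str  # e.g. "P898" (IPA transcription)
    value: str


@dataclass
class Form:
    id: str  # e.g. "L3746552-F7"
    representation: str
    grammatical_features: List[str] = field(default_factory=list)  # Item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    id: str  # e.g. "L3746552-S4"
    gloss: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    id: str
    lemma: str
    language: str          # Item ID of the language
    lexical_category: str  # Item ID of the lexical category
    statements: List[Statement] = field(default_factory=list)
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def glosses(self) -> List[str]:
        """Return the gloss of every Sense, in order."""
        return [sense.gloss for sense in self.senses]


# Illustrative data only; the IDs are examples, not real wikidata.org entities.
bank = Lexeme(
    id="L3746552",
    lemma="bank",
    language="Q1860",          # English
    lexical_category="Q1084",  # noun (assumed Item ID)
    forms=[Form("L3746552-F1", "bank"), Form("L3746552-F2", "banks")],
    senses=[
        Sense("L3746552-S1", "financial institution"),
        Sense("L3746552-S2", "edge of a body of water"),
    ],
)
```

Keeping Forms and Senses as lists nested inside the Lexeme mirrors the way their IDs are derived from the Lexeme ID.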

Sample Lexemes by Language and Lexical Category
| Language | verb | noun | pronoun | adjective | adverb | preposition | postposition | conjunction | interjection | numeral | determiner |
| Arabic | ذهب | كتاب | انا | جميل | | في | | لكن | | أحد | هذا |
| English | go | book | I | beautiful | usually | in | | but | Ouch | one | this |
| Pashto | تلل | کتاب | زه | ښکلی | | په | | خو | | یو | |
| Persian | رفتن | کتاب | من | زیبا | | در | را | اما | آخ | یک | این |

Interface

Lexeme

Screenshot of the Lexeme creation page
Create a new Lexeme
  1. Go to Special:NewLexeme
  2. Enter a lemma (dictionary form of a word)
  3. Enter the language of the lexeme by typing the name of the language or its Q-ID
  4. In the field that appears above, enter the language code of the lemma
  5. Enter the lexical category by typing its name or the Q-ID (example: verb, noun, adjective...)
  6. Click on "Create"
  7. The Lexeme is now created with this basic information; you can continue editing it
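
The information collected by the steps above can be sketched as a small Python record with a basic completeness check. This is a hypothetical illustration, not how Special:NewLexeme is implemented: the class `NewLexemeInput` and its field names are invented for this example, and it assumes that the language and lexical category have already been resolved to Q-IDs.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: Item IDs are "Q" followed by a positive integer.
_ITEM_ID = re.compile(r"Q[1-9][0-9]*")


@dataclass(frozen=True)
class NewLexemeInput:
    """The four values entered when creating a Lexeme (hypothetical)."""
    lemma: str
    lemma_language_code: str
    language_item: str
    lexical_category_item: str

    def problems(self) -> List[str]:
        """Return a list of human-readable problems; empty if none."""
        found = []
        if not self.lemma.strip():
            found.append("lemma is empty")
        if not self.lemma_language_code.strip():
            found.append("language code of the lemma is empty")
        if not _ITEM_ID.fullmatch(self.language_item):
            found.append("language is not an Item ID")
        if not _ITEM_ID.fullmatch(self.lexical_category_item):
            found.append("lexical category is not an Item ID")
        return found


# "beautiful", an English (Q1860) adjective (Q34698)
draft = NewLexemeInput("beautiful", "en", "Q1860", "Q34698")
```

Calling `draft.problems()` returns an empty list when all four values are present and well formed.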
Screenshot of the top of a Lexeme page
Edit a Lexeme
  1. Click on the edit button, next to the lemma
  2. Edit the content of the different fields
    • Lemma
    • Language code of the lemma
    • Language of the Lexeme
    • Lexical category
  3. Click on "save"
Screenshot of the interface to edit a statement
Add, edit or delete statements of a Lexeme
  1. To add a statement of a Lexeme, click on "add statement"
  2. Enter a property: start typing its name in the property field (example: derived-from) and select it in the suggester
  3. Enter a value
  4. Just like on Items, you can add qualifiers and references
  5. Save by clicking "save"
  6. To edit a statement, click on "edit"
  7. To delete a statement, click on "edit", then "remove"
Delete a Lexeme

To be done.


Search for a Lexeme

Currently, Special:Search and the search field in the top-right corner of pages do not work for Lexemes. Support is planned. In the meantime, here is a workaround to look for a Lexeme:

  1. Go to the sandbox
  2. Add or edit a statement with the property Sandbox-Lexeme (P5188)
  3. Type the lemma that you are looking for in the value field. Search works here: the Lexeme is displayed with its language and lexical category


Form

add a Form
Create a new Form
  1. In the Forms section, click on "add Form"
  2. Fill in the representation (mandatory)
  3. Fill in the language code of the representation (mandatory)
  4. Enter one or several grammatical features, by typing their name and selecting them in the list of items
Edit a Form
  1. Click on the edit button next to the representation
  2. Modify the content in the fields
  3. Click on "save"
Delete a Form
  1. Click on the edit button next to the representation
  2. Click on "remove"

Features

See also: Wikidata:Lexicographical data/Development

What is included in the first version

  • New datatypes: Lexeme, Form
  • Add, edit, delete Lexemes
  • Add, edit, delete Forms
  • Add, edit, delete statements
  • Add, edit, delete qualifiers
  • Add, edit, delete references
  • Linking to an Item from a Lexeme or a Form
  • Linking to another Lexeme from a Lexeme, a Form or an Item
  • Search and suggestions when entering a value
  • Basic internal APIs (used by the UI; you should not use them)

What will be added in the future

Ordered from near-term to long-term plans

  • Search for content with Special:Search
  • Display the lemma in the history pages, recent changes and watchlist
  • Add, edit, delete Senses (Senses will not be included in the first version)
  • RDF support and ability to query the data on query.wikidata.org
  • Better API support
  • Automatic generation of Forms
  • Data access on clients (other Wikimedia projects)
  • Editing data directly from Wiktionary

See also