User:Lea Lacroix (WMDE)/DraftDoc

Draft for Lexicographical data/Documentation page.

Template work in progress?

This is the main documentation page for lexicographical data on Wikidata. Because the new data system is not yet deployed, this documentation is incomplete and is mostly based on the test system.

See also the technical documentation of the WikibaseLexeme extension.

Introduction

Data Model

visualization of the Lexeme data model

The data model of WikibaseLexeme describes the structure of the data that is handled as "Lexemes" in Wikibase. What follows is a summary; for more detailed information, see mw:Extension:WikibaseLexeme/Data Model.

A Lexeme is a lexical element of a language, such as a word, a phrase, or a prefix (see Lexeme on Wikipedia). Lexemes are Entities in the sense of the Wikibase data model. A Lexeme is described using the following information:

  • An ID. Lexemes have IDs starting with an "L" followed by a natural number in decimal notation, e.g. L3746552. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Lexeme.
  • A Lemma for use as a human readable representation of the lexeme, e.g. "run".
  • The Language to which the lexeme belongs. This is a reference to a concrete Item, e.g. Q1860 for English.
  • The Lexical category to which the lexeme belongs. This is given as a reference to a concrete Item, e.g. Q34698 for adjective.
  • A list of Statements to describe properties of the lexeme that are not specific to a Form or Sense (e.g. derived from, grammatical gender, or syntactic function).
  • A list of Forms, typically one for each relevant combination of grammatical features, such as 2nd person / singular / past tense. A Form is described using the following information:
    • An ID. Forms have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "F", followed by a natural number in decimal notation, e.g. L3746552-F7.
    • A representation, spelling out the Form as a string.
    • A list of grammatical features that define the syntactic role to which the given Form applies. These are given as references to concrete Items, e.g. Q814722 for participle.
    • A list of Statements further describing the Form or its relations to other Forms or Items (e.g. pronunciation audio, rhymes with, used until, used in region).
  • A list of Senses, describing the different meanings of the lexeme (e.g. "financial institution" and "edge of a body of water" for the English noun bank). A Sense is described using the following information:
    • An ID. Senses have IDs starting with the ID of the Lexeme they belong to, followed by a hyphen ("-") and an "S", followed by a natural number in decimal notation: e.g. L3746552-S4. These IDs are unique within the repository that manages the Lexeme. The ID can be combined with a repository's concept base URI to form a unique URI for the Sense.
    • A Gloss, defining the meaning of the Sense using natural language.
    • A list of Statements further describing the Sense and its relations to Senses and Items (e.g. translation, synonym, antonym, connotation, register, denotes, evokes).
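
The structure summarized above can be sketched in a few lines of Python. This is only an illustrative model, not the WikibaseLexeme implementation: the class names, the helper functions (classify_id, parent_lexeme_id, entity_uri) and the Statement placeholder are hypothetical, and storing a single string per Lemma, representation and Gloss is a simplification of the full data model. The sketch also assumes that the numbers in IDs carry no leading zeros, that Q1084 is the Item for noun, and that http://www.wikidata.org/entity/ is the concept base URI of the repository.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict, List

# ID shapes from the description: "L<n>", "L<n>-F<n>", "L<n>-S<n>".
# Assumption: the natural numbers carry no leading zeros.
_NUM = r"[1-9][0-9]*"
ID_PATTERNS = {
    "lexeme": re.compile(rf"L{_NUM}"),
    "form": re.compile(rf"L{_NUM}-F{_NUM}"),
    "sense": re.compile(rf"L{_NUM}-S{_NUM}"),
}

# Placeholder type: real Statements follow the Wikibase data model.
Statement = Dict[str, Any]


def classify_id(entity_id: str) -> str:
    """Return "lexeme", "form" or "sense"; raise ValueError otherwise."""
    for kind, pattern in ID_PATTERNS.items():
        if pattern.fullmatch(entity_id):
            return kind
    raise ValueError(f"not a Lexeme, Form or Sense ID: {entity_id!r}")


def parent_lexeme_id(entity_id: str) -> str:
    """Return the ID of the Lexeme that a Lexeme, Form or Sense ID belongs to."""
    classify_id(entity_id)  # validates the shape first
    return entity_id.split("-", 1)[0]


def entity_uri(concept_base_uri: str, entity_id: str) -> str:
    """Combine a repository's concept base URI with an ID."""
    classify_id(entity_id)
    return concept_base_uri + entity_id


@dataclass
class Form:
    id: str                # e.g. "L3746552-F7"
    representation: str    # the Form spelled out as a string
    grammatical_features: List[str] = field(default_factory=list)  # Item IDs
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Sense:
    id: str                # e.g. "L3746552-S4"
    gloss: str             # natural-language definition of the meaning
    statements: List[Statement] = field(default_factory=list)


@dataclass
class Lexeme:
    id: str                # e.g. "L3746552"
    lemma: str             # human-readable representation
    language: str          # Item ID, e.g. "Q1860" for English
    lexical_category: str  # Item ID of the lexical category
    statements: List[Statement] = field(default_factory=list)
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)


# Usage with the example values from the text ("bank", two Senses).
# "Q1084" is assumed to be the Item for noun; the Sense numbers are made up.
bank = Lexeme(id="L3746552", lemma="bank", language="Q1860",
              lexical_category="Q1084")
bank.forms.append(Form(id="L3746552-F7", representation="bank"))
bank.senses.append(Sense(id="L3746552-S4", gloss="financial institution"))
bank.senses.append(Sense(id="L3746552-S5", gloss="edge of a body of water"))

BASE = "http://www.wikidata.org/entity/"  # assumed concept base URI
print(entity_uri(BASE, bank.id))
print([parent_lexeme_id(sense.id) for sense in bank.senses])
```

Keeping Forms and Senses as lists nested inside the Lexeme mirrors the ID scheme: a Form or Sense ID always begins with the ID of its Lexeme, so the parent can be recovered from the ID alone.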

Interface

TBD: include a screenshot of the Lexeme interface.

Lexeme

Create a new Lexeme
Edit a Lexeme
Add information to a Lexeme
Delete information from a Lexeme
Delete a Lexeme

Form

Create a new Form
Edit a Form
Add information to a Form
Delete information from a Form
Delete a Form

Features

What is included in the first version

  • Add, edit, delete Lexemes
  • Add, edit, delete Forms
  • Add, edit, delete Statements
  • Add, edit, delete qualifiers
  • Add, edit, delete references
  • Search for content in the search field and value field
  • Link to a Lexeme or a Form from an Item
  • Basic internal APIs

What will be added in the future

Ordered from near-term to long-term plans:

  • Add, edit, delete Senses (Senses will not be included in the first version)
  • RDF support and ability to query the data on query.wikidata.org
  • Better API support
  • Automatic generation of Forms
  • Data access on clients (other Wikimedia projects)
  • Editing data directly from Wiktionary