Wikidata:WikiProject Languages/Data model

This is a draft description of how items for languages are modelled in Wikidata.

Labels

Labels should not include words like "language" when they are used only for disambiguation. Labels in Wikidata do not have to be unique and are not expected to be identical to the linked Wikipedia article names.

More information: Help:Label

Examples:

German (Q188)
In English, the label is "German" because language names do not normally include "language". People say things like "They're learning German", and not "They're learning German language". The English Wikipedia page is "German language" but this is necessary because page names have to be unique and "German" is a disambiguation page.
In German, the label is "Deutsch" (German) for the same reason as English. The German Wikipedia page is "Deutsche Sprache" (German language) and "Deutsch" is a disambiguation page.
In Japanese, the label is "ドイツ語" (literally: Germany language), because language names are typically formed by adding "語" (language) to a location. This is not disambiguation because without "語", it would mean something different.
American Sign Language (Q14759)
In English, the label is "American Sign Language" because "Sign Language" is normally part of the name of a sign language. People typically say things like "They're learning American Sign Language", and not "They're learning American Sign" or "They're learning American". Since it's part of the name, the words are capitalised.
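
The convention above can be approximated when deriving a candidate English label from a Wikipedia article title. The following is a minimal sketch, not an official tool: `candidate_label` is a hypothetical helper, and the rule it applies (strip a trailing lowercase " language", keep a capitalised "Language") is only a heuristic based on the two examples above. Its suggestions would still need human review.

```python
def candidate_label(article_title: str) -> str:
    """Suggest an English label from an article title (hypothetical heuristic)."""
    suffix = " language"  # lowercase: assumed to be a disambiguator
    if article_title.endswith(suffix):
        return article_title[: -len(suffix)]
    # A capitalised "Language" (as in sign language names) is assumed to be
    # part of the name, so the title is returned unchanged.
    return article_title


german = candidate_label("German language")
asl = candidate_label("American Sign Language")
```

The check is deliberately case-sensitive, so "German language" is shortened to "German" while "American Sign Language" is kept whole.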

Descriptions

Descriptions should help people identify the item and distinguish it from other similarly named ones. For languages, descriptions typically include the language family and where it is spoken.

Although languages are often closely connected to particular ethnic groups, it is generally not useful to include ethnic groups in the description. The language and the ethnic group often share the same name, and people are not likely to be familiar with the language but not the ethnic group, or vice versa.

Examples:

  • Koro (Sino-Tibetan language spoken in India)
  • Koro (Oceanic language spoken in Papua New Guinea)
  • Koro (Oceanic language spoken in Vanuatu)
  • Koro (Mande language spoken in Ivory Coast)
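
The pattern in these examples (language family plus where the language is spoken) can be sketched as a small template. `language_description` is a hypothetical helper for illustration only; real descriptions may need different wording for individual items.

```python
def language_description(family: str, place: str) -> str:
    """Build an English description from a language family and a place."""
    return f"{family} language spoken in {place}"


# The four items labelled "Koro" from the examples above.
koro_items = [
    ("Sino-Tibetan", "India"),
    ("Oceanic", "Papua New Guinea"),
    ("Oceanic", "Vanuatu"),
    ("Mande", "Ivory Coast"),
]
descriptions = [language_description(family, place) for family, place in koro_items]
```

Although all four items share a label, the family and place together give each one a distinct description.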

Basic statements

instance of (P31)

subclass of (P279)

subclass of (P279) is used to indicate the next level up in a language family tree.
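
Because each statement points one level up, following subclass of (P279) repeatedly yields the chain of ancestors in the family tree. The sketch below assumes a simplified in-memory mapping with a single parent per entry. The `SUBCLASS_OF` data is illustrative and was not copied from Wikidata, where an item may have several P279 values and more intermediate levels.

```python
from typing import Dict, List, Optional

# Illustrative, simplified parent links (one P279 value per entry).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "German": "West Germanic",
    "West Germanic": "Germanic",
    "Germanic": "Indo-European",
    "Indo-European": None,
}


def family_chain(language: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the ancestors of a language, nearest first."""
    chain: List[str] = []
    seen = {language}  # guards against accidental cycles in the data
    current = parents.get(language)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


chain = family_chain("German", SUBCLASS_OF)
```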

country (P17)

country (P17) is used to indicate the countries where a language is spoken. It does not imply any official status in that country.

indigenous to (P2341)

indigenous to (P2341) is used to indicate the ethnic groups and the locations that a language is indigenous to.

ethnic group (P172), location (P276) and located in the administrative territorial entity (P131) are not used for this information.
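
A simple consistency check can flag language items that use one of these properties instead of indigenous to (P2341). This is a minimal sketch: `discouraged_properties` is a hypothetical helper, and it assumes the caller already has the set of property IDs used on an item.

```python
from typing import Iterable, List

# Properties this data model says are not used for indigeneity information.
DISCOURAGED = {
    "P172": "ethnic group",
    "P276": "location",
    "P131": "located in the administrative territorial entity",
}


def discouraged_properties(used_properties: Iterable[str]) -> List[str]:
    """Return the discouraged property IDs found on a language item, sorted."""
    return sorted(p for p in used_properties if p in DISCOURAGED)


flagged = discouraged_properties({"P31", "P172", "P2341"})
```

A flagged property is only a hint for review: whether its values belong under indigenous to (P2341) still needs a human decision.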

native label (P1705)

native label (P1705) is used to store the native label (autonym) of the language. It is also sometimes used to add the name in other languages used by the speakers of the language (such as an official language of the country where it is spoken).

If the language isn't available in the list of languages, select the code mis (uncoded languages) and add language of work or name (P407) as a qualifier to identify the language.
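
The fallback rule can be sketched as follows. The dictionary layout is a simplified illustration and not the exact Wikibase JSON format. `native_label_statement`, `supported_codes` and the `Q_EXAMPLE` placeholder are assumptions made for this example.

```python
from typing import Dict, Optional, Set


def native_label_statement(
    text: str,
    language_code: str,
    supported_codes: Set[str],
    language_item: Optional[str] = None,
) -> Dict[str, object]:
    """Build a simplified native label (P1705) statement."""
    if language_code in supported_codes:
        return {"property": "P1705", "text": text,
                "language": language_code, "qualifiers": {}}
    if language_item is None:
        raise ValueError("a language item is needed for the P407 qualifier")
    # Fall back to "mis" and record the actual language as a qualifier.
    return {"property": "P1705", "text": text,
            "language": "mis", "qualifiers": {"P407": language_item}}


supported = {"de", "en", "ja"}  # hypothetical subset of available codes
direct = native_label_statement("Deutsch", "de", supported)
fallback = native_label_statement("example autonym", "xx", supported, "Q_EXAMPLE")
```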

writing system (P282)

number of speakers, writers, or signers (P1098)

topic's main category (P910)

topic's main category (P910) is used to link to the category related to the language. Such categories often come from Wiktionary projects, especially for minor languages.

Grammatical and phonetic behaviour

linguistic typology (P4132)

has grammatical case (P2989)

has tense (P3103)

has grammatical mood (P3161)

has grammatical gender (P5109)

has grammatical person (P5110)

has conjugation class (P5206)

has paradigm class (P5913)

has phoneme (P2587)

uses capitalization for (P6106)

External identifiers

Regional

described at URL (P973)

Other databases which don't yet have an external identifier property can be linked using described at URL (P973).

CLLD databases

Constructed languages