Shortcut: WD:GLOSS

Wikidata:Glossary

This page is a translated version of the page Wikidata:Glossary and the translation is 8% complete.

Wikidata is a knowledge base that anyone can edit. This page is a reference for users and will try to establish consistency in terminology, which will hopefully help to improve discussion and communication among editors.

The Glossary is ordered conceptually rather than alphabetically, with the more general concepts presented first as much as possible. This is because it is translated into several languages, and the concepts have different names in different languages. In some cases, it is not obvious how to organize the entries. In these cases, "see also" has been added to the appropriate section.

Names and projects

  • Wikimedia is the name of a movement that provides free knowledge to the public through the Wikimedia projects.
  • Wikimedia projects are free wikis with a specific purpose, usually divided into multiple individual wikis for each language, as with Wikipedia. Wikidata is a multilingual Wikimedia project. There are about 800 different wikis in total for Wikimedia projects. For now, only Wikimedia projects can be linked with Wikidata.

MediaWiki is the software that runs projects like Wikipedia and Wikimedia Commons; see MediaWiki.

Wikibase is the software behind Wikidata. It consists of three MediaWiki extensions: Wikibase, Wikibase client, and WikibaseLib. The Wikibase extension (for the Wikidata server, often called just the repo) allows a MediaWiki installation to collect and maintain structured data and will be used on the Wikidata website. The Wikibase client extension (often called just the client) enables MediaWiki installations to query and display data from a Wikidata server on their own pages; it will be deployed on Wikipedias in different languages, and probably on other sites. The WikibaseLib extension contains common libraries for both of the major extensions.

Wikidata is a Wikimedia project that runs an instance of MediaWiki with the Wikibase extensions. It allows Wikidata editors to enter data and browse pages.

Basic terms

Data, in Wikidata, is the collection of all structured data, meaning database content: in general, everything entered by Wikidata editors and bots into the entity pages, each of which corresponds to a dataset in the Wikidata database. These pages belong to the three data namespaces (entity namespaces): the main namespace (for items), the property namespace and the query namespace. Other Wikidata pages consist of unstructured content, for example running text, and are considered meta pages. Specifically, property data are the property values in the statements, which can only have certain datatypes.

Data is raw information, like the words you are reading right now. Wikidata is essentially a collection of structured data, or database content. This data is generally everything entered by Wikidata editors and bots through the entity pages and the public programming interface. The wiki pages in which a user can see and enter data are organized in three data namespaces:

  1. the main namespace (for items), grouping the pages in which we can see and enter information about a specific entity,
  2. the property namespace, in which we can see information about properties, which are used to structure the information we enter into statements, and
  3. the query namespace, in which we can define ways to extract and display information beyond what the main namespace offers.

The data in those namespaces are said to be structured because they are all organized in a way that the Wikibase software uses to ensure a certain data model and because the community defines and enforces the correct ways to enter information.

Metadata in Wikidata is structured data that cannot be created or changed by users and bots but is generated by the MediaWiki software. The revision history of pages is an example of metadata: the software generates its entries with time stamps and user names.

Other Wikidata pages are classical wiki pages that consist of unstructured or semi-structured data (Q2336004) (for example, running text or wikitext); these are meta pages, such as community discussion pages.

Specifically, an important kind of data is property data. Property data are values associated with a property to build a claim; they are an organizational unit of the structured data. Each property is assigned a datatype, which defines the values that can be used in claims built with that property.

  • Dataset

A dataset is generally any collection of (structured) Data.

In Wikidata, what is called a dataset is often associated with an entity: the dataset associated with an entity is all the information shown on the entity's wiki page (the set of statements in the database that have this entity as their subject, the links to articles describing this entity on Wikimedia projects, and so on).

We can build other datasets by combining the datasets of several entities.

Datasets can be represented in different ways: as their entity wiki page, or in the form of an XML or JSON file for robots and other programs. Specifically, in Wikidata user interface messages, dataset refers to the data associated with an entity (an item, a property or a query).
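To make the JSON representation concrete, here is a minimal sketch that builds one entity dataset as a Python dictionary and serializes it with the standard json module. The field layout (labels, descriptions, aliases, statements keyed by property label, sitelinks) is a simplified assumption for illustration, not the exact schema Wikidata exports, and the item ID Q123456 is a placeholder.

```python
import json

# Simplified, hypothetical dataset for one item. Real Wikidata exports use
# property IDs and a more deeply nested structure.
dataset = {
    "id": "Q123456",  # placeholder item ID
    "labels": {"en": "Wolfgang Amadeus Mozart"},
    "descriptions": {"en": "composer"},
    "aliases": {"en": ["Mozart"]},
    "statements": {
        "occupation": ["composer"],
        "date of birth": ["1756-01-27"],
        "date of death": ["1791-12-05"],
    },
    "sitelinks": {"enwiki": "Wolfgang Amadeus Mozart"},
}

# Serialize for bots and other programs; ensure_ascii=False keeps
# non-ASCII labels readable in the output.
text = json.dumps(dataset, ensure_ascii=False, indent=2)
print(text)
```

Because JSON maps directly onto nested dictionaries and lists, the same dataset can be read back by any program without knowing how the entity page renders it.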

  • Dereferenceable URIs are used during content negotiation to supply a description of a resource even when it is the entity itself that is addressed. This makes it possible to supply either a human-readable description or a machine-readable one (RDF data), whichever is more suitable. The content that the dereferenced URIs point to will be available through the page Special:EntityData.
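A small helper can show how the Special:EntityData page is addressed. This sketch assumes the URL pattern https://www.wikidata.org/wiki/Special:EntityData/<ID> with an optional format suffix such as .json, .rdf or .ttl; without a suffix, the server is expected to choose a representation through content negotiation. The function name and the restricted set of formats are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import quote

BASE = "https://www.wikidata.org/wiki/Special:EntityData/"

# Suffixes assumed for this sketch: JSON for programs, RDF/XML and Turtle
# for linked-data consumers.
FORMATS = {"json", "rdf", "ttl"}


def entity_data_url(entity_id: str, fmt: Optional[str] = None) -> str:
    """Return the Special:EntityData URL for an entity.

    Without a format, the plain URL is returned and the server is expected
    to pick a representation from the request's Accept header.
    """
    url = BASE + quote(entity_id)
    if fmt is None:
        return url
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    return f"{url}.{fmt}"


print(entity_data_url("Q42", "json"))
print(entity_data_url("Q42"))
```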

Export refers to the way data and metadata from Wikidata are made available for further consumption. The intention is to make machine-readable exports of the data available in widely used formats such as JSON or RDF/XML.

Linked data is a method for publishing structured data so that it can be interlinked and become more useful. It is closely related to how Wikidata works: entities are connected, and data is attached to linked data pages, as Wikidata does for items.

  • Triplet (triple) is the way linked data stores a single data entry. It consists of a subject, a predicate and an object. In Wikidata this corresponds roughly to the item, the property and the value.
  • Ontology (Q324254) is an explicit and formal specification of a conceptualization. It is important that an ontology convey a shared understanding of a domain. In Wikidata this is achieved by using properties, with their intended meaning, in statements that describe real-world entities and concepts through their Wikidata counterparts, linked to literal data and other entities.
  • Provenance is the history of the contributor who added a piece of data and of the source from which the data was extracted. Provenance is important when Open Data datasets or external databases are reused.
  • Vocabulary is the set of terms used to describe the ontology. Terms in one vocabulary can be the same as (owl:sameAs) terms from another vocabulary. Sameness is stricter than equality.
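The correspondence between triplets and Wikidata can be sketched by flattening an item's property-value pairs into (subject, predicate, object) tuples. This is a minimal illustration assuming plain strings for properties and values; the item ID is a placeholder, and real linked-data exports use URIs rather than bare labels.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def to_triples(item_id: str, statements: Dict[str, List[str]]) -> List[Triple]:
    """Flatten property -> values pairs into triples: roughly item, property, value."""
    return [
        (item_id, prop, value)
        for prop, values in statements.items()
        for value in values
    ]


# Placeholder item ID with simplified property/value pairs.
triples = to_triples("Q123456", {"occupation": ["composer"], "date of birth": ["1756-01-27"]})
for triple in triples:
    print(triple)
```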

Sitelinks

Sitelink is an identification of a linked page or article on another Wikimedia site, such as a Wikipedia language version. It consists of a site identifier and a sitelink title (the article title), and belongs to an individual item in Wikidata. Sitelinks are used both to identify an item from an external site and as a central store of interlanguage (interwiki) links. Sitelinks can have badges attached, which usually show that a page is a featured article or has a similar status. See Help:Sitelinks.

Site is a reference to an external website in general, but in sitelinks it refers to specific registered websites that can be used for internal lookup. These sites are referenced by global site identifiers, or siteids for short. For example, the English Wikipedia's siteid is enwiki: usually the language subdomain of the registered Wikimedia site is followed by a project suffix (here, wiki). Linking to such sites can be subject to constraints. In the current setup, each external page can have only one link registered in Wikidata, and an item can have only one link to each external site.
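The two linking constraints can be illustrated with a small in-memory registry. This is a hypothetical sketch (the class and method names are invented, and the item IDs are placeholders) that rejects a second link from one item to the same site and a second item linking the same external page; it also supports the reverse lookup from an external page to its item.

```python
from typing import Dict, Optional, Tuple


class SitelinkRegistry:
    """Hypothetical store: one link per (item, site), one item per external page."""

    def __init__(self) -> None:
        self._by_item: Dict[str, Dict[str, str]] = {}   # item ID -> {siteid: title}
        self._by_page: Dict[Tuple[str, str], str] = {}  # (siteid, title) -> item ID

    def add(self, item_id: str, siteid: str, title: str) -> None:
        links = self._by_item.get(item_id, {})
        if siteid in links:
            raise ValueError(f"{item_id} already has a link to {siteid}")
        owner = self._by_page.get((siteid, title))
        if owner is not None:
            raise ValueError(f"{siteid}:{title} is already linked from {owner}")
        self._by_item.setdefault(item_id, {})[siteid] = title
        self._by_page[(siteid, title)] = item_id

    def item_for(self, siteid: str, title: str) -> Optional[str]:
        """Identify the item from an external page, if it is linked."""
        return self._by_page.get((siteid, title))


registry = SitelinkRegistry()
registry.add("Q1", "enwiki", "Berlin")
registry.add("Q1", "dewiki", "Berlin")
print(registry.item_for("dewiki", "Berlin"))
```

Keeping both indexes makes each constraint a single dictionary lookup, and the reverse index doubles as the lookup from an external site to its item.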

A badge is a kind of marker attached to a sitelink, which could identify, for example, that the article is a "featured article" on a specific site. This will be a feature in the future.

Namespaces

Page means a page in some wiki with a unique title, for example an article (a page in the Wikipedia main namespace). In Wikidata, the term page may refer to an entity page (in the data namespaces), a meta page (in other namespaces) or a linked page (an external page on any Wikipedia or other Wikimedia wiki) that is referenced using a sitelink. Pages in the main namespace of Wikidata are about items, and one page can only hold one item.

Meta pages are all pages that are not entities, i.e. that do not belong to the data namespaces. Wikidata meta pages contain unstructured content represented by conventional MediaWiki code, and perhaps also future Wikidata client-side inclusion code. Examples are talk pages, category pages, project pages (in the Wikidata namespace) and help pages (in the Help namespace). Meta pages also comprise content and data generated automatically by the MediaWiki software, for example the edit history of a page, or special pages.

Namespace is a physical division of pages in MediaWiki to group them according to overall use or some additional behavior. Examples are namespaces for categories, files, users, and in the case of Wikidata namespaces for items, properties and queries.

The mainspace is the namespace where all items are located. It is distinguished by its lack of a prefix.

Entities, items, properties and queries

Entity (also known as data set) is a Wikidata page that may be either an item (in the main namespace), a property (in the property namespace) or a query (in the query namespace). Every entity is uniquely identified by an entity ID, which is a prefixed number: Q for an item, P for a property and U for a query. An entity is also identified by a unique combination of label and description in each language, and can be assigned a set of multilingual aliases. (In ontologies and library catalogues that are used as references for Wikidata, an entity is typically a real-life topic or subject, or its database representation, and corresponds in that context to what Wikidata calls an item.)
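Because every entity ID is a prefixed number, the entity type can be read off the prefix. The sketch below assumes the prefixes listed in this glossary (Q, P and U) and a positive number without leading zeros; the function name is hypothetical.

```python
import re
from typing import Tuple

# Prefixes as described in this glossary.
ENTITY_KINDS = {"Q": "item", "P": "property", "U": "query"}
ENTITY_ID = re.compile(r"([QPU])([1-9][0-9]*)")


def parse_entity_id(entity_id: str) -> Tuple[str, int]:
    """Split an entity ID such as 'Q1' into its kind and numeric part."""
    match = ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not an entity ID: {entity_id!r}")
    prefix, number = match.groups()
    return ENTITY_KINDS[prefix], int(number)


print(parse_entity_id("Q1"))   # ('item', 1)
print(parse_entity_id("P31"))  # ('property', 31)
```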

Item is a page in the Wikidata main namespace that represents a real-life topic, concept, or subject. Like other Wikidata entities, items are identified by an entity ID (an item ID is a number prefixed with Q, e.g. Q1) and by a unique combination of multilingual label and description, and may also be assigned aliases. Items contain sitelinks to linked pages. They may also contain statements, including property-value pairs, and sources. According to the Wikidata notability policy, only items that have at least one sitelink to a Wikipedia article are currently allowed.

Property is the descriptor for a data value or set of values, but not the data value or values themselves. Each claim on an item page links to a property and assigns the property one or several values. The property is stored as an entity page in the Property namespace, and includes a declaration of the datatype of the property values. New properties are suggested and motivated at Wikidata:Property proposal. All properties should be listed at Wikidata:List of properties (WD:P). Usage of a property is also discussed on its talk page. Most properties can be mapped to Wikipedia infobox parameters and categories; see Wikidata:Infoboxes task force. The inclusion of property values in Wikipedia infoboxes (using inclusion syntax) is done for each infobox and Wikipedia version individually.

Query is a predefined search across items. A query is the descriptor for the predefined search, not the hits generated by the search. Each query is described and defined on its own page and has its own prefixed identifier.

Identifiers and languages

Many Wikimedia projects exist in different localized versions, but Wikidata does not. Wikidata is multilingual: all parts of the user interface and all pages of data content can be translated into and used in many different languages. Users can choose their preferred languages. Wikidata is meant to treat all languages equally and to interconnect the knowledge of many languages, so that data content contributed in one language can be used in all the other languages as well. Users can translate all the pages into different local languages and thereby improve usability step by step.

Title is the name of an external linked page (known as the sitelink title), the name of a meta page, or the entity ID of an entity page. If the page does not belong to the main namespace, the title includes the namespace, as in namespace:id.

For items, properties and queries, the title is an identifier consisting of the prefix and the numeric ID. The localized label is attached to the identifier to make the overall string more readable. The namespace is normally not included in the string, but is prefixed in the URL. An example title is Property:P1.

For sitelinks, the title is a canonical string that identifies a page on an external site. Together, the site and the title form the complete sitelink. During validation, the title string goes through a normalization procedure, and in the end the title will be the external site's canonical page name. Only after normalization is completed and site-specific constraints are satisfied can a new sitelink be stored.
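Title normalization depends on the external site, but a simplified version shows the idea. This hypothetical function assumes a wiki that treats underscores as spaces and capitalizes the first letter of page names; the real canonical name comes from the external site itself, and this sketch is not the procedure Wikibase actually runs.

```python
import re


def normalize_title(raw: str) -> str:
    """Hypothetical, simplified normalization of a sitelink title."""
    # Treat underscores as spaces, collapse runs of whitespace and trim the ends.
    title = re.sub(r"\s+", " ", raw.replace("_", " ")).strip()
    if not title:
        raise ValueError("empty title")
    # Assume a site that capitalizes the first character of page names.
    return title[0].upper() + title[1:]


print(normalize_title("  berlin_Wall "))  # Berlin Wall
```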

For a meta page in a non-entity namespace, the title is spelled out as is and identifies the meta page. The namespace is normally prefixed to the string, and also to the URL. An example title is Wikidata:Glossary.

Language attributes are the language-specific labels, aliases and descriptions attached to items, properties and queries. They are human-readable text that improves understanding of the scope of the item, for example the specific type of real-world entity. If some of them are missing, they can be replaced by strings from alternate languages, following the language fallback chains.

Language fallbacks (language chains) are methods to systematically replace missing language attributes with strings from alternate languages. The exact replacement rules can depend on the type of page, on whether the user is logged in, and, if so, on whether the user has specified preferred languages.
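A fallback chain can be modelled as an ordered list of language codes that is walked until a string is found. The sketch below is a minimal illustration; the chain shown (de-at, then de, then en) is a hypothetical example, not a rule Wikidata prescribes.

```python
from typing import Dict, Iterable, Optional, Tuple


def resolve_term(terms: Dict[str, str], chain: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (language, text) for the first language in the chain that has a term."""
    for lang in chain:
        if lang in terms:
            return lang, terms[lang]
    return None  # no language in the chain has this attribute


labels = {"de": "Wien", "en": "Vienna"}
print(resolve_term(labels, ["de-at", "de", "en"]))  # ('de', 'Wien')
```

Returning the language together with the text lets a user interface mark strings that came from a fallback rather than from the requested language.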

Label (also known as name) is a language-specific name used for items, properties and queries. This is usually the most important name the entry is known by, or the most general or easily understood phrase by which it will be known within the project. Within Wikidata it takes the role of the title in Wikipedia and is the primary means of distinguishing entries. For items it does not need to be unique, either within the language or across the project, but it must be unique together with the description. For properties and queries (not defined yet) it must be unique within the given language. Uniqueness of the combination of label and description is a hard constraint that must be satisfied before a change can be saved, although it may be removed in the future.
Labels should follow the language-specific conventions for capitalizing proper names and phrases, as fits the specific entry. In listings, the label is followed by the description so that they form a single list entry. Both labels and descriptions can be extracted and used independently. See Help:Label.

Description is a language-specific descriptive phrase for an item, property or query. It provides context for the label (for example, there are many items about places with the label "Cambridge"). The description therefore does not need to be unique, either within a language or across the project, but it must be unique together with the label. Uniqueness of the combination of label and description is a hard constraint that must be satisfied before a change can be saved. See Help:Description for more information, including proper styling of descriptions.
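The hard constraint on labels and descriptions amounts to a uniqueness check on the combination (language, label, description). This hypothetical index sketches such a check using the "Cambridge" example; the item IDs are placeholders and the class is not part of Wikibase.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str]  # (language, label, description)


class TermIndex:
    """Hypothetical index enforcing a unique label + description per language."""

    def __init__(self) -> None:
        self._owner: Dict[Key, str] = {}                # key -> item ID
        self._current: Dict[Tuple[str, str], Key] = {}  # (item ID, language) -> key

    def set_terms(self, item_id: str, lang: str, label: str, description: str) -> None:
        key = (lang, label, description)
        owner = self._owner.get(key)
        if owner is not None and owner != item_id:
            raise ValueError(f"{owner} already uses this label and description in {lang}")
        # Release the item's previous combination in this language before storing the new one.
        old = self._current.get((item_id, lang))
        if old is not None:
            del self._owner[old]
        self._owner[key] = item_id
        self._current[(item_id, lang)] = key


index = TermIndex()
index.set_terms("Q10", "en", "Cambridge", "city in England")
index.set_terms("Q20", "en", "Cambridge", "city in Massachusetts")  # same label, different description
```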

Aliases (shown as "also known as" in the user interface) are language-specific alternative names for items, properties and queries that can be used for lookup in the same way as labels (titles). Unlike labels, an entry can have as many aliases as necessary. See Help:Aliases.
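Since aliases can be used for lookup in the same way as labels, a search can simply check both. The sketch below assumes exact matching after case-folding, which is a simplification; the function and the placeholder item IDs are hypothetical.

```python
from typing import Dict, List

Labels = Dict[str, Dict[str, str]]         # item ID -> {language: label}
Aliases = Dict[str, Dict[str, List[str]]]  # item ID -> {language: [aliases]}


def lookup(labels: Labels, aliases: Aliases, lang: str, text: str) -> List[str]:
    """Return IDs of items whose label or any alias in `lang` matches `text`."""
    needle = text.casefold()
    hits = []
    for item_id in sorted(set(labels) | set(aliases)):
        names = list(aliases.get(item_id, {}).get(lang, []))
        label = labels.get(item_id, {}).get(lang)
        if label is not None:
            names.insert(0, label)
        if any(name.casefold() == needle for name in names):
            hits.append(item_id)
    return hits


labels = {"Q10": {"en": "Wolfgang Amadeus Mozart"}}
aliases = {"Q10": {"en": ["Mozart", "W. A. Mozart"]}}
print(lookup(labels, aliases, "en", "mozart"))  # ['Q10']
```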

Claims and statements

Elements of a statement

In order to use Wikidata, the knowledge contained in different sources must be decomposed. A source might read: Wolfgang Amadeus Mozart was a composer who was born on 27 January 1756 and died on 5 December 1791. We need to decompose the information in this sentence and transform it into claims and statements: name: Wolfgang Amadeus Mozart; date of birth: 27 January 1756; date of death: 5 December 1791; occupation: composer. Both claims and (Wikidata) statements are expressed as so-called statements so that they can be used as linked data by external websites or organizations, but they are slightly refined to fit their purpose in Wikidata. Usually a statement in linked data is described by a single triplet, but when the statement itself is reified, it is possible to say something more about it. We may say that it has a value, which is our original triple (or, more generally, tuple), and we may say something about that value, such as when and how it was recorded or measured. In Wikidata, such statements about a statement are called qualifiers, to separate them more clearly from our statements. Without this distinction it could be difficult to tell the different types of statements apart.

Statements describing references for a particular reified statement can also be made. These are also statements about statements, but they have a different role and are given a special name: they are added as references. References are also reified statements, so we can make statements about them, that is, we can give them qualifiers. Note that references are thus reified statements about reified statements; being able to qualify references makes this somewhat clearer. (Another way to say things about a reference is to give it its own item and to add statements to that item.)
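The decomposition and reification described above can be sketched with a small data class: the core (subject, property, value) assertion is kept together with qualifiers and references that say something about it. Everything here is hypothetical: the class, the placeholder item ID, and the qualifier and reference values are invented for illustration and do not reflect the exact Wikidata data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Statement:
    """Hypothetical reified statement: a core assertion plus statements about it."""
    subject: str
    prop: str
    value: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: List[Dict[str, str]] = field(default_factory=list)

    def triple(self) -> Tuple[str, str, str]:
        """The original assertion, without anything said about it."""
        return (self.subject, self.prop, self.value)


MOZART = "Q123456"  # placeholder item ID
source = {"stated in": "hypothetical biography"}  # invented reference
statements = [
    Statement(MOZART, "occupation", "composer", references=[source]),
    Statement(MOZART, "date of birth", "1756-01-27",
              qualifiers={"recorded in": "hypothetical baptismal register"},  # invented qualifier
              references=[source]),
    Statement(MOZART, "date of death", "1791-12-05", references=[source]),
]
for s in statements:
    print(s.triple(), s.qualifiers)
```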

To implement the basic assertion (the core triplet, or rather a pair, since the subject is given by the item itself), a small structure called a snak is used. Snaks come in several versions, each specialized for a single purpose. Statements hold such snaks, and snaks are also the inner parts of statements about statements, that is, of qualifiers, references and ranks. Part of this specialization is that some snaks can hold a value of a particular type, a datatype. A snak will refuse to hold any type other than the one it is configured to store.
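The rule that a snak refuses values of the wrong type can be sketched as a check at construction time. In this hypothetical illustration, Python types stand in for Wikidata datatypes and the class name is invented; the "given name" example follows the string example in this glossary.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ValueSnak:
    """Hypothetical snak holding one value of a configured datatype."""
    prop: str
    datatype: type
    value: Any

    def __post_init__(self) -> None:
        # Refuse any value whose type differs from the configured datatype.
        if not isinstance(self.value, self.datatype):
            raise TypeError(
                f"{self.prop} expects {self.datatype.__name__}, "
                f"got {type(self.value).__name__}"
            )


snak = ValueSnak("given name", str, "Abraham")
print(snak)
```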

During the lifetime of a statement, its rank might be set to normal until it is deemed preferred, and later on, when it is replaced by a more up-to-date value, it might be marked deprecated. These rank values are nothing more than statements about the reified statement, but they are given their own name and appearance in the user interface.

Claim is the part of a statement without the references.

A statement is one piece of data about an item, recorded on one item page. In the simplest case, a statement is just a "property: value" pair (for example, "Location: Germany"), but often statements can have further qualifiers (such as temporal qualifiers). See Data model. Wikidata makes no assumptions about the correctness of statements, but merely collects and reports them with a reference to a source. See Help:Statements.

Values (or datavalues) are the information pieces embedded in each claim. Depending on their datatype, they can be a single value (like a number) or a value consisting of several parts (like a geographical position with longitude and latitude). Internally they are connected to the claims through snaks.

Each snak has a snaktype, which can be a custom value, some value or no value:

No value is a marker used when the property is known to have no value.

Some value is a marker used when the property has some value, but the exact value is not known.

Custom value is a marker used when there is a known value for the property.
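The three markers can be modelled as a snak type attached to each snak, where only a custom-value snak carries a data value. The strings "value", "somevalue" and "novalue" match the snak type names used in Wikidata's JSON, but the classes below are a hypothetical sketch rather than the Wikibase implementation, and the example properties are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class SnakType(Enum):
    VALUE = "value"           # custom value: the value is known
    SOME_VALUE = "somevalue"  # there is a value, but it is not known
    NO_VALUE = "novalue"      # the property is known to have no value


@dataclass(frozen=True)
class Snak:
    """Hypothetical snak: only the VALUE type carries a data value."""
    prop: str
    snaktype: SnakType
    value: Optional[Any] = None

    def __post_init__(self) -> None:
        has_value = self.value is not None
        if has_value != (self.snaktype is SnakType.VALUE):
            raise ValueError("only a custom-value snak carries a data value")


unknown_death = Snak("date of death", SnakType.SOME_VALUE)
no_children = Snak("child", SnakType.NO_VALUE)
born = Snak("date of birth", SnakType.VALUE, "1756-01-27")
```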

Snak (or connector) is an internal abstraction layer between values (including no-values and some-values) and items on one side, and statements and claims on the other. Normally this abstraction layer is not visible in the user interface, but it is visible when importing data and in the REST API.

Datatypes (data value types or value types) define what kind of data values may be assigned to a property, and how the data values are stored in each claim. See Special:ListDatatypes for the currently available datatypes. Each property is assigned a predefined datatype.

String (short for character string) is a general term for a sequence of freely chosen characters interpreted as text (e.g. "Hello"), as opposed to data interpreted as a numerical value (3.14), a link to an item (e.g. [[Q1234]]) or a more complex datatype (the set {1,3,5,7}). Wikidata will support the datatypes "monolingual-text" and "multilingual-text", both considered string datatypes, as property values. For example, the hypothetical property "given name" for Abraham Lincoln would equal "Abraham".

Qualifier is a part of the claim that says something about that specific claim, often in a descriptive way. A qualifier might be a term from a specific vocabulary, but it can also be a variant descriptive phrase (whether such terms or phrases are free text or part of a vocabulary is probably up to the Wikidata community).

Rank is a quality factor used for simple selection/filtering in cases where there are many statements for a given property. There are three possible ranks:

  1. Deprecated rank is used for a statement that contains information that may not be considered reliable or that is known to include errors. (For example, a statement that documents a wrong population figure that was published in some historic document. In this case the statement is not wrong – the historic document that is given as a reference really made the erroneous claim – but the statement should not be used in most cases.)
  2. Normal rank is used for a statement that contains relevant information that is believed to be correct, but may be too extensive to be shown by default. (For example, historic population figures for Berlin over the course of many years.)
  3. Preferred rank is used for a statement with the most important and most up-to-date information. Such a statement will be shown to all users and will be displayed in Wikipedia infoboxes by default. (For example, the most recent population figures for Berlin.)
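One plausible way a client could use ranks for selection is to show the preferred statements if there are any, fall back to normal ones otherwise, and skip deprecated statements. The sketch below implements that rule as an assumption; the function name and the placeholder values are hypothetical.

```python
from typing import List, Tuple


def select_by_rank(statements: List[Tuple[str, str]]) -> List[str]:
    """Given (rank, value) pairs for one property, return the values of the best rank.

    Preferred beats normal; deprecated statements are never selected.
    """
    for rank in ("preferred", "normal"):
        chosen = [value for r, value in statements if r == rank]
        if chosen:
            return chosen
    return []


population = [
    ("normal", "historic figure A"),
    ("preferred", "most recent figure"),
    ("deprecated", "erroneous figure"),
]
print(select_by_rank(population))  # ['most recent figure']
```

Keeping all statements and choosing among them by rank, rather than deleting outdated ones, preserves the historic and even the erroneous figures for readers who need them.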

Reference (or source) describes the origin of a statement in Wikidata. A source is often an item in its own right; for example, a book. Wikidata does not aim to answer the question of whether a statement is correct, but merely whether the statement appears in a reference. What constitutes valid references is expected to be a question of debate among the Wikidata editors.

  • External identifier: some properties have values that are strings used in other organisations' databases to uniquely identify an item, for example an ISBN for a book, or the unique part of the URL of a movie or an actor in the Internet Movie Database.

Related terms

  • RDF/XML is a serialization format of RDF in XML; see RDF/XML.

See also