This is the Wikidata development plan for 2014 and beyond. It is ordered by when work on each item will likely be started. Some of these things are already being worked on.
- 1 Access for remaining sister projects
- 2 Quantities
- 3 Badges
- 4 Simple queries
- 5 Merges and redirects
- 6 UI redesign
- 7 Data usage tracking
- 8 Access to data from arbitrary items
- 9 Wikimedia Commons
- 10 Statements on properties
- 11 Data dumps
- 12 Complex queries
- 13 Article history integration
- 14 Mono-lingual text datatype
- 15 Geo-shape datatype (optional)
- 16 Multi-lingual text datatype (optional)
- 17 Beyond the planning horizon of this plan
Access for remaining sister projects
The remaining sister projects have access to sitelinks and data via Wikidata. The roll-out is staged to allow the communities to adapt. The planned order is:
- Wikisource: partly done
  - Sitelinks: done
  - Data: 25.2.2014, done
- Oldwikisource: not done yet
  - Sitelinks: 08.04.2014
- Commons (not including file metadata!)
  - Sitelinks: done
- Meta, MediaWiki, Wikispecies, Incubator: not to be done (yet)
Quantities
Users are able to enter quantitative data in Wikidata and re-use it on the clients as well as outside of Wikimedia. It is possible to express statements like “Berlin has an estimated population of 3,397,469 (+/-100) as of 31 July 2013” or “Berlin has an area of 891.85 km2”. At first, only unitless quantities are supported; a small number of units will be added in a later deployment and expanded in subsequent ones. When viewing an item with such quantitative data, the user sees the values according to local conventions (decimal separator, unit conversion). On the client this data is accessed via the parser function and Lua and is shown according to the content language. The API provides a way to access the data in the format preferred by the request sender.
- Unitless Done
- With units ?
Not to be done (yet)
- Letting users add arbitrary units
- Currency conversion (Conversion rates and the value of a single currency change over time. That is very complex to model.)
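The quantity description above can be sketched in code. The field names below ("amount", "lowerBound", "upperBound", "unit") follow the JSON shape Wikidata uses for quantity values, but the formatting helper itself is only an illustration of rendering per local conventions, not Wikibase code:

```python
# Sketch: formatting a Wikidata-style quantity value for display.
# The dict layout mirrors Wikidata's quantity datavalue; the helper
# is illustrative, not part of the actual software.
from decimal import Decimal

def format_quantity(value, thousands_sep=","):
    """Render a quantity dict as e.g. '3,397,469 ±100'."""
    amount = Decimal(value["amount"])
    text = f"{amount:,}".replace(",", thousands_sep)
    lower = value.get("lowerBound")
    upper = value.get("upperBound")
    if lower is not None and upper is not None:
        margin = (Decimal(upper) - Decimal(lower)) / 2
        if margin != 0:
            text += f" \u00b1{margin:,}"
    return text

population = {
    "amount": "+3397469",
    "lowerBound": "+3397369",
    "upperBound": "+3397569",
    "unit": "1",  # "1" marks a unitless quantity
}
print(format_quantity(population))  # 3,397,469 ±100
```

A client in a locale that uses "." as the thousands separator would pass `thousands_sep="."` to get "3.397.469" instead.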
Badges
Some clients want to give badges to their articles. This includes things like “good article” or “featured article”. This data is stored on the corresponding Wikidata item. Initially, each sitelink can have zero or one badge attached to it; support for multiple badges per sitelink will follow later. The user can set and change a badge by selecting one from a list of pre-defined badges on the item page or via the API. On the client, some of the badges are shown in the list of language links in the sidebar. Badges can be accessed via the API and Lua.
These badges correspond to Wikidata items.
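Since badges are attached to sitelinks and are themselves items, reading them might look like the following sketch. It assumes the entity JSON shape returned by the repository's API, where each sitelink may carry a "badges" list of item IDs; the badge item IDs and their labels here are made up for illustration:

```python
# Sketch: reading badges from an entity's sitelinks. The badge item
# IDs (Q1001, Q1002) and their labels are hypothetical placeholders.
BADGE_LABELS = {
    "Q1001": "featured article",   # hypothetical badge item
    "Q1002": "good article",       # hypothetical badge item
}

def badges_for(entity, site):
    """Return human-readable badge labels for one sitelink."""
    sitelink = entity.get("sitelinks", {}).get(site, {})
    return [BADGE_LABELS.get(b, b) for b in sitelink.get("badges", [])]

entity = {
    "id": "Q64",
    "sitelinks": {
        "enwiki": {"site": "enwiki", "title": "Berlin", "badges": ["Q1001"]},
        "dewiki": {"site": "dewiki", "title": "Berlin", "badges": []},
    },
}
print(badges_for(entity, "enwiki"))  # ['featured article']
```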
Simple queries
Users are able to pose simple queries to Wikidata via a SpecialPage as well as the API. Wikidata can answer queries like “What has the ISBN 2-01-202705-9?” or “What has the capital Paris?”. These queries are restricted to one property/value pair and return a list of items. The returned result only includes items where the statement is marked as preferred. These queries are most useful with one of the many identifiers in Wikidata that connect the knowledge base to other databases.
Not to be done
- Querying for sources or qualifiers
Things to keep in mind
Some data types are easier to query than others. Time, Geo and Quantity values require range queries. For the Item and String data types, simple equality is sufficient.
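The restriction to one property/value pair and preferred-rank statements can be sketched as a filter over items. The property and item IDs and the simplified claim layout below are illustrative; the real service would answer from an index, not a linear scan:

```python
# Sketch: a one-pair "simple query" over an in-memory collection of
# items. Only statements with preferred rank count, per the text
# above. IDs and claim layout are simplified placeholders.
def simple_query(items, prop, value):
    matches = []
    for item in items:
        for stmt in item.get("claims", {}).get(prop, []):
            if stmt.get("rank") == "preferred" and stmt.get("value") == value:
                matches.append(item["id"])
                break
    return matches

items = [
    {"id": "Q100", "claims": {"P957": [   # P957 stands in for an ISBN property
        {"value": "2-01-202705-9", "rank": "preferred"}]}},
    {"id": "Q200", "claims": {"P957": [   # non-preferred match is excluded
        {"value": "2-01-202705-9", "rank": "deprecated"}]}},
]
print(simple_query(items, "P957", "2-01-202705-9"))  # ['Q100']
```

String equality like this is cheap to index; as noted above, Time, Geo and Quantity values would instead need range queries.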
Merges and redirects
When two different items about the same topic are created they can be merged. Labels, descriptions, aliases, sitelinks and statements are merged if they do not conflict. The item that is left empty can then be turned into a redirect to the other. This way, Wikidata IDs can be regarded as stable identifiers by 3rd-parties.
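The conflict rule for merging can be illustrated on one field. Single-keyed collections such as labels, descriptions and sitelinks merge only when they do not disagree; a clash aborts the merge. This is a sketch of the rule, not Wikibase's actual merge code:

```python
# Sketch: merging one keyed field (e.g. labels per language) of two
# items under the rule "merge unless the values conflict".
def merge_field(a, b):
    """Merge two {key: value} dicts; identical keys must agree."""
    merged = dict(a)
    for key, value in b.items():
        if key in merged and merged[key] != value:
            raise ValueError(f"conflict on {key!r}: {merged[key]!r} vs {value!r}")
        merged[key] = value
    return merged

labels_a = {"en": "Berlin", "de": "Berlin"}
labels_b = {"en": "Berlin", "fr": "Berlin"}
print(merge_field(labels_a, labels_b))
# {'en': 'Berlin', 'de': 'Berlin', 'fr': 'Berlin'}

try:
    merge_field({"en": "Berlin"}, {"en": "Bern"})
except ValueError as e:
    print(e)  # conflict on 'en': 'Berlin' vs 'Bern'
```

After a successful merge, the emptied item becomes a redirect, which is what keeps the old ID resolvable for 3rd parties.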
UI redesign
Reading and editing Wikidata is joyful and intuitive on desktops, tablets and mobile phones. The interface is visually pleasing, integrates nicely with other Wikimedia projects and contains no jargon. The interface provides the user with the information they were looking for quickly and does not overwhelm them (i.e. deprecated data is hidden initially and information is ordered in an intuitive way). It invites the user to add additional information (including qualifiers and sources) and offers little nudges towards making correct and useful contributions by offering suggestions. Erroneous contributions and vandalism are discouraged. Navigating and editing the website is fast. Both the data and the interface are localized according to the user’s locale and language preferences. Where no data is available in a particular language a fallback is used.
Not to be done
- enforcing user-defined constraints on data input
Data usage tracking
To ease maintenance of the data in Wikidata, it is possible to get a list of all articles in which a given piece of data is used. Users are thereby able to see which articles are affected by changes they are making. This also allows a better overview of where and how Wikidata’s data is used in Wikimedia’s projects.
Not to be done
- Usage tracking outside Wikimedia’s projects
Access to data from arbitrary items
Users on the client are able to include data from any Wikidata item they choose by specifying its ID. This expands on their ability to access data of the item currently associated with the page via a sitelink. This access is possible via both the parser function and Lua.
Wikimedia Commons
Wikimedia Commons holds a huge number of multimedia files available for the other Wikimedia projects and the world to use. Structured data support for Wikimedia Commons is important to make it easier to maintain the files and to make reuse, especially 3rd-party reuse, easier.

The structured data support comes in two ways. The first is providing access to the data stored in Wikidata, for example the date of birth of an artist. The second is enabling Wikimedia Commons itself to store structured data related to the files stored there, for example the license and subject of a photo.

When a new file is contributed, the uploader is asked to provide some information like tags, creator name and license in the upload wizard. Users are then able to access and edit this structured data via a form as well as an API (similar to how it is done on Wikidata). It is easy to specify and retrieve the licensing and provenance information of a multimedia file. Additionally, it is easy to tag and categorize images based on concepts from Wikidata. Tags and other file information are shown in the user’s language to accommodate the multi-lingual audience of Wikimedia Commons. All this information can be used to easily search for files that fit certain criteria like “picture of a cat and a child from 2010, licensed under CC-BY-SA”.
The data is stored on a “data” page attached to the file’s page that is similar to Wikidata’s item pages. Commons is thereby a repository and at the same time its own client as well as a client of wikidata.org.
Statements on properties
To improve maintainability of the data in Wikidata, it is possible to add statements to property pages. These are used to store constraints for properties. An example of such a constraint is that the winner of a certain award must be human. Another is that the number of inhabitants must be a positive integer.
Not to be done
- enforcing the stored constraints in the Wikibase software
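A constraint check of the kind described above could look like this sketch. The constraint vocabulary (the "type" key and "positive-integer" name) is invented for illustration; as noted, enforcement is explicitly not part of the Wikibase software, so such checks would run in external maintenance tools:

```python
# Sketch: checking a hypothetical "positive integer" constraint,
# such as one stored on a population property, against a quantity
# value. The constraint schema here is illustrative only.
from decimal import Decimal

def violates(constraint, value):
    """Return True if the value breaks the given constraint."""
    if constraint["type"] == "positive-integer":
        amount = Decimal(value["amount"])
        return amount <= 0 or amount != amount.to_integral_value()
    return False  # unknown constraint types are not enforced

population_constraint = {"type": "positive-integer"}
print(violates(population_constraint, {"amount": "+3397469"}))  # False
print(violates(population_constraint, {"amount": "-5"}))        # True
print(violates(population_constraint, {"amount": "+891.85"}))   # True
```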
Data dumps
For 3rd-party re-use and analysis of the data in Wikidata, we provide two kinds of dumps in addition to the regular XML dumps:
- A JSON dump, containing the canonical JSON representation of all entities (as opposed to the brittle internal JSON representation found in the XML dumps).
- An RDF dump, containing the RDF representation of all entities, for use in semantic web applications. This RDF will not assert facts, but rather represent claims. The RDF and JSON representations of individual entities are also available via Wikidata’s linked data interface (Special:EntityData).
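A consumer of the JSON dump can stream entities one at a time rather than loading the whole file. This sketch assumes a one-entity-per-line layout inside an outer JSON array (opening "[", closing "]", trailing commas on entity lines); verify against the actual dump format before relying on it:

```python
# Sketch: streaming entities from a JSON dump, assuming one entity
# per line wrapped in an outer array. Layout assumption, not spec.
import json

def iter_entities(lines):
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # skip array brackets and blank lines
        yield json.loads(line)

dump = [
    "[",
    '{"id": "Q64", "type": "item"},',
    '{"id": "Q42", "type": "item"}',
    "]",
]
for entity in iter_entities(dump):
    print(entity["id"])
# Q64
# Q42
```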
Complex queries
Users are able to write queries that are more complex than the simple queries. This includes queries like “all poets who lived in 1982” or “all cities with more than 1 million inhabitants”. They are entered (using the semantics embodied by Semantic MediaWiki’s Ask extension) on a page in the Query namespace and internally saved as JSON. They are then executed when resources are available, usually not immediately. The result is cached. A query can be set to rerun at regular intervals or on demand by an administrator. The result of the query is shown on the same page. It can also be accessed via the API. The clients can include the result of a query in their pages, for example to create list articles. The result will be a list of items, which can then be manipulated as needed by Lua. More result formatters and visualisations will be made available in future deployments, based on Semantic MediaWiki’s result formatters. These queries make Wikidata even more useful to the Wikimedia projects and the world and are needed by the community to maintain the large database.
Not to be done
- transitive queries
Article history integration
Editors on a client can look at the change history of an article and see all Wikidata changes relating to this article. This way they can see all changes affecting their article without having to go to another project.
Mono-lingual text datatype
Users can add a string and specify a language for it. They can, for example, enter the motto of a country in that country’s language. It is shown in this language to all users regardless of their language settings.
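In code, such a value carries its own language tag and is rendered as-is for every user. The {"text", "language"} shape matches Wikidata's monolingual-text datavalue; the rendering helper is illustrative:

```python
# Sketch: a monolingual text value is shown in its own language to
# everyone; the interface language is deliberately ignored.
def render_monolingual(value, ui_language):
    # ui_language plays no role here: a French motto stays French
    # even for users browsing in English.
    return f'{value["text"]} [{value["language"]}]'

motto = {"text": "Liberté, égalité, fraternité", "language": "fr"}
print(render_monolingual(motto, "en"))
# Liberté, égalité, fraternité [fr]
```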
Geo-shape datatype (optional)
Users can enter geo-shapes in Wikidata. They can for example use it to store the outline of a country.
This will likely be realised using leaflet.js.
Multi-lingual text datatype (optional)
Users can add strings and specify a language for each. This is similar to the mono-lingual text datatype; however, translations in more than one language can be provided. The one in the user’s language is shown and the others can be shown on demand.
Beyond the planning horizon of this plan
- Wiktionary support