Wikidata:Development plan


This is the Wikidata development plan for 2014+. It is ordered by when items will likely be started. Some of these things are already being worked on.

Access for remaining sister projects

The remaining sister projects have access to sitelinks and data via Wikidata. The roll-out is staged to allow the communities to adapt. The planned order is:

  • Wikisource (partly done)
    • Sitelinks: ✓ Done
    • Data: 25.2.2014 ✓ Done
    • Oldwikisource not done yet (see bugzilla:62717)
    • Edition interwiki links not done yet
  • Wikiquote
    • Sitelinks: 08.04.2014 ✓ Done
    • Data: 10.06.2014 ✓ Done
  • Wikinews
  • Commons (not including file metadata!)
    • Sitelinks: ✓ Done
  • Wikibooks
  • Wikiversity
  • Meta, MediaWiki, Wikispecies, Incubator

Not to be done (yet)

  • Wiktionary

Quantities

bugzilla:54318

Users are able to enter quantitative data in Wikidata and re-use it on the clients as well as outside of Wikimedia. It is possible to express statements like “Berlin has an estimated population of 3,397,469 (±100) as of 31 July 2013” or “Berlin has an area of 891.85 km²”. At first only unitless quantities are supported. A later deployment adds a small number of units, and future deployments expand that set. When viewing an item with such quantitative data, the user sees the values according to local conventions (decimal separator, unit conversion). On the client this data is accessed via the parser function and Lua and is also shown according to the content language. The API provides a way to access the data in the preferred format of the request sender.
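
As a rough illustration of a quantity with an uncertainty margin and locale-dependent display, here is a minimal Python sketch. The class `Quantity`, the helper `format_number` and the `SEPARATORS` table are assumptions made for this example and are not part of Wikibase.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical table: language code -> (thousands separator, decimal separator).
SEPARATORS = {"en": (",", "."), "de": (".", ",")}


@dataclass(frozen=True)
class Quantity:
    """A unitless amount with explicit lower and upper bounds."""
    amount: Decimal
    lower_bound: Decimal
    upper_bound: Decimal

    @classmethod
    def with_margin(cls, amount: str, margin: str = "0") -> "Quantity":
        # Build the bounds from a symmetric margin such as "±100".
        value, delta = Decimal(amount), Decimal(margin)
        return cls(value, value - delta, value + delta)


def format_number(value: Decimal, language: str) -> str:
    """Render a number with the separators of the given language."""
    thousands, decimal_point = SEPARATORS[language]
    text = f"{value:,}"  # English-style grouping first
    # Swap both separators in a single pass so they do not clobber each other.
    return text.translate(str.maketrans({",": thousands, ".": decimal_point}))
```

For example, `format_number(Decimal("891.85"), "de")` renders the area with a decimal comma, while the stored value stays the same for every reader.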

  • Unitless ✓ Done
  • With units ?

Not to be done (yet)

  • Letting users add arbitrary units
  • Currency conversion (Conversion rates and the value of a single currency change over time. That is very complex to model.)

Badges

bugzilla:40810

Some clients want to give badges to their articles. This includes things like “good article”, “featured article” and the importance and quality of an article. This data is stored on the corresponding Wikidata item. Initially each sitelink can have zero or one badge attached to it; support for more than one comes later. The user can set and change a badge by selecting one from a list of pre-defined badges on the item page or via the API. On the client some of the badges are shown in the list of language links in the sidebar. Badges can be accessed via the API and Lua.

Technical details

These badges correspond to Wikidata items.
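
The badge model described above (at most one badge per sitelink, chosen from a pre-defined list of items) could be sketched as follows. The item IDs in `ALLOWED_BADGES` are placeholders invented for this example, and `SiteLink` and `set_badge` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder item IDs standing in for the pre-defined badge items.
ALLOWED_BADGES = {"Q1001": "good article", "Q1002": "featured article"}


@dataclass
class SiteLink:
    site: str
    title: str
    badge: Optional[str] = None  # zero or one badge, stored as an item ID


def set_badge(link: SiteLink, badge: Optional[str]) -> None:
    """Set, change or clear (badge=None) the badge of a sitelink."""
    if badge is not None and badge not in ALLOWED_BADGES:
        raise ValueError(f"{badge} is not a pre-defined badge")
    link.badge = badge
```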

Simple queries

bugzilla:52385

Users are able to pose simple queries to Wikidata via a special page as well as the API. Wikidata can answer queries like “What has the ISBN 2-01-202705-9?” or “What has the capital Paris?”. These queries are restricted to one property/value pair and return a list of items. The returned result only includes items where the statement is marked as preferred. These queries are most useful for use with one of the many identifiers in Wikidata that connect the knowledge base to other databases.
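
The behaviour of such a query can be sketched over an in-memory list of statements. This is only an illustration: `Statement` and `simple_query` are hypothetical names, and the property and item IDs used with them are placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Statement:
    item_id: str
    property_id: str
    value: str
    rank: str = "normal"  # "preferred", "normal" or "deprecated"


def simple_query(statements: Iterable[Statement],
                 property_id: str, value: str) -> List[str]:
    """Return the items having exactly this property/value pair.

    Only statements marked as preferred are considered.
    """
    matches = {
        s.item_id
        for s in statements
        if s.property_id == property_id
        and s.value == value
        and s.rank == "preferred"
    }
    return sorted(matches)
```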

Not to be done

  • Querying for sources or qualifiers

Things to keep in mind

Some data types are easier to query than others. Time, Geo and Quantity values require range queries. For the Item and String data types, simple equality is sufficient.
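
The difference between equality and range matching can be shown with a small dispatch function. This is a sketch under simplifying assumptions: the datatype names are plain strings, Time and Quantity values are comparable scalars, and a Geo value is a (latitude, longitude) pair matched against a bounding box. The function name `matches` is hypothetical.

```python
def matches(datatype, stored, wanted):
    """Decide whether a stored value satisfies a query value."""
    if datatype in ("item", "string"):
        # Simple equality is sufficient.
        return stored == wanted
    if datatype in ("time", "quantity"):
        # Range query: wanted is an inclusive (low, high) pair.
        low, high = wanted
        return low <= stored <= high
    if datatype == "geo":
        # Range query in two dimensions: wanted is a bounding box.
        (lat_low, lat_high), (lon_low, lon_high) = wanted
        lat, lon = stored
        return lat_low <= lat <= lat_high and lon_low <= lon <= lon_high
    raise ValueError(f"unsupported datatype: {datatype}")
```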


Merges and redirects

bugzilla:57744 and bugzilla:38664

When two different items about the same topic are created, they can be merged. Labels, descriptions, aliases, sitelinks and statements are merged if they do not conflict. The item that is left empty can then be turned into a redirect to the other. This way, Wikidata IDs can be regarded as stable identifiers by third parties.

Merges: ✓ Done
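
A minimal sketch of the merge-then-redirect idea follows. It covers only labels, descriptions and sitelinks (aliases and statements are left out for brevity) and assumes that a conflict aborts the whole merge, which is one possible reading of "if they do not conflict". The names `merge_items` and `KEYED_FIELDS` and the dictionary layout are assumptions for this example.

```python
KEYED_FIELDS = ("labels", "descriptions", "sitelinks")


def merge_items(source: dict, target: dict) -> tuple:
    """Merge source into target and return (merged item, redirect).

    Raises ValueError when both items hold different values for the
    same language or site.
    """
    merged = {"id": target["id"]}
    for field in KEYED_FIELDS:
        combined = dict(target.get(field, {}))
        for key, value in source.get(field, {}).items():
            if combined.get(key, value) != value:
                raise ValueError(f"conflicting {field} for {key!r}")
            combined[key] = value
        merged[field] = combined
    # The emptied source item becomes a redirect, so its ID stays resolvable.
    redirect = {"id": source["id"], "redirect": target["id"]}
    return merged, redirect
```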

UI redesign

bugzilla:52136

Reading and editing Wikidata is joyful and intuitive on desktops, tablets and mobile phones. The interface is visually pleasing, integrates nicely with other Wikimedia projects and contains no jargon. The interface provides the user with the information they were looking for quickly and does not overwhelm them (i.e. deprecated data is hidden initially and information is ordered in an intuitive way). It invites the user to add additional information (including qualifiers and sources) and offers little nudges towards making correct and useful contributions by offering suggestions. Erroneous contributions and vandalism are discouraged. Navigating and editing the website is fast. Both the data and the interface are localized according to the user’s locale and language preferences. Where no data is available in a particular language, a fallback is used.

Not to be done

  • enforcing user-defined constraints on data input


Data usage tracking

bugzilla:47288

To ease maintenance of the data in Wikidata it is possible to get a list of all articles certain data is used in. Users are thereby able to see which articles are affected by changes they are making. This also allows a better overview of where and how Wikidata’s data is used in Wikimedia’s projects.
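
Conceptually, usage tracking is an inverted index from items to the pages that use them. The following sketch keeps that index in memory; the class `UsageTracker` and its methods are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Iterable, List


class UsageTracker:
    """Remember which client pages use data from which items."""

    def __init__(self) -> None:
        self._pages_by_item = defaultdict(set)

    def record(self, page: str, item_ids: Iterable[str]) -> None:
        """Replace the recorded usages of a page, e.g. after it is re-rendered."""
        for pages in self._pages_by_item.values():
            pages.discard(page)
        for item_id in item_ids:
            self._pages_by_item[item_id].add(page)

    def pages_using(self, item_id: str) -> List[str]:
        """List the pages affected by a change to the given item."""
        return sorted(self._pages_by_item.get(item_id, ()))
```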

Not to be done

  • Usage tracking outside Wikimedia’s projects


Access to data from arbitrary items

bugzilla:47930

Users on the client are able to include data from any Wikidata item they choose by specifying its ID. This expands on their ability to access data of the item currently associated with the page via a sitelink. This access is possible via both the parser function and Lua.


Wikimedia Commons

bugzilla:64288

Wikimedia Commons holds a huge number of multimedia files available for the other Wikimedia projects and the world to use. Structured data support for Wikimedia Commons is important to make it easier to maintain the files and to make reuse, especially third-party reuse, easier. The structured data support comes in two ways. The first is by providing access to the data stored in Wikidata. This includes things like the date of birth of an artist. The second way is by enabling Wikimedia Commons itself to store structured data related to the files stored there. This includes things like the license and subject of a photo.

When a new file is contributed, the uploader is asked to provide some information like tags, creator name and license in the upload wizard. Users are then able to access and edit this structured data via a form as well as an API (similar to how it is done on Wikidata). It is easy to specify and retrieve the licensing and provenance information of a multimedia file. Additionally it is easy to tag and categorize images based on concepts from Wikidata. Tags and other file information are shown in the user’s language to accommodate the multi-lingual audience of Wikimedia Commons. All this information can be used to easily search for files that fit certain criteria like “picture of a cat and a child from 2010, licensed under CC-BY-SA”.

Technical details

The data is stored on a “data” page attached to the file’s page that is similar to Wikidata’s item pages. Commons is thereby a repository and at the same time its own client as well as a client of wikidata.org.

Statements on properties

bugzilla:49554

To improve maintainability of the data in Wikidata it is possible to add statements to property pages. This is used to store constraints for properties. An example of such a constraint is that the winner of a certain award must be human. Another example would be that the number of inhabitants needs to be a positive integer.
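
Since the software itself does not enforce stored constraints, they would be checked by separate tooling. The sketch below shows what such a check could look like for the "positive integer" example. The `CONSTRAINTS` table, the property name used as its key and the function `find_violations` are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def is_positive_integer(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Hypothetical mapping from a property to the constraint stored for it.
CONSTRAINTS: Dict[str, Callable[[object], bool]] = {
    "number of inhabitants": is_positive_integer,
}

Triple = Tuple[str, str, object]  # (item ID, property, value)


def find_violations(statements: List[Triple]) -> List[Triple]:
    """Report statements that break a constraint; nothing is rejected."""
    return [
        (item, prop, value)
        for item, prop, value in statements
        if prop in CONSTRAINTS and not CONSTRAINTS[prop](value)
    ]
```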

Not to be done

  • enforcing the stored constraints in the Wikibase software


Data dumps

bugzilla:44581

For third-party re-use and analysis of the data in Wikidata, we provide two kinds of dumps in addition to the regular XML dumps:

  • A JSON dump, containing the canonical JSON representation of all entities (as opposed to the brittle internal JSON representation found in the XML dumps).
  • An RDF dump, containing the RDF representation of all entities, for use in semantic web applications. This RDF will not assert facts, but rather represent claims. The RDF and JSON representations of individual entities are also available via Wikidata’s linked data interface (Special:EntityData).
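
For single entities, the linked data interface can be addressed by URL. The helper below builds such a URL, assuming the pattern `/wiki/Special:EntityData/<ID>.<format>` on www.wikidata.org. The function name `entity_data_url` and the restriction to the two formats named above are choices made for this sketch.

```python
import re

ENTITY_ID = re.compile(r"^[QP][1-9][0-9]*$")  # item (Q...) or property (P...) IDs
FORMATS = ("json", "rdf")


def entity_data_url(entity_id: str, fmt: str = "json",
                    base: str = "https://www.wikidata.org") -> str:
    """Build the Special:EntityData URL for one entity."""
    if not ENTITY_ID.match(entity_id):
        raise ValueError(f"not an entity ID: {entity_id!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{base}/wiki/Special:EntityData/{entity_id}.{fmt}"
```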


Complex queries

bugzilla:65626

Users are able to write queries that are more complex than the simple queries. This includes queries like “all poets who lived in 1982” or “all cities with more than 1 million inhabitants”. They are entered (using the semantics as embodied by Semantic MediaWiki’s Ask extension) in a page in the Query namespace and internally saved as JSON. They are then executed when resources are available, usually not immediately. The result is cached. A query can be set to rerun at regular intervals or on demand by an administrator.

The result of the query is shown on the same page. It can also be accessed via the API. The clients can include the result of a query in their pages, for example to create list articles. The result will be a list of items. It can then be manipulated as needed by Lua. More result formatters and visualisations will be made available in future deployments based on Semantic MediaWiki’s result formatters. These queries make Wikidata even more useful to the Wikimedia projects and the world and are needed by the community to maintain the large database.
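
The caching behaviour (serve the stored result, rerun only when the interval has elapsed) can be sketched as follows. This is a simplification: it reruns the query on access once it is due, whereas the plan describes execution whenever resources are available. `StoredQuery` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StoredQuery:
    run: Callable[[], List[str]]  # executes the query, returns item IDs
    interval_seconds: float       # how often the result may be refreshed
    cached_result: Optional[List[str]] = None
    last_run: Optional[float] = None

    def result(self, now: float) -> List[str]:
        """Return the cached result, rerunning the query only when due."""
        due = self.last_run is None or now - self.last_run >= self.interval_seconds
        if due:
            self.cached_result = list(self.run())
            self.last_run = now
        return self.cached_result
```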

Optional

  • transitive queries
  • disjunction

Article history integration

bugzilla:40358

Editors on a client can look at the change history of an article and see all Wikidata changes relating to this article. This way they can see all changes affecting their article without having to go to another project.
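
One way to picture the integrated history is as a merge of two change lists that are each already sorted newest first. The sketch below uses Python's `heapq.merge` for this. The tuple layout (timestamp, source, summary) and the function name `combined_history` are assumptions for the example.

```python
import heapq
from typing import List, Tuple

Change = Tuple[str, str, str]  # (ISO timestamp, source, summary)


def combined_history(article_changes: List[Change],
                     wikidata_changes: List[Change]) -> List[Change]:
    """Merge two newest-first change lists into one newest-first list."""
    return list(heapq.merge(article_changes, wikidata_changes,
                            key=lambda change: change[0], reverse=True))
```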


Mono-lingual text datatype

bugzilla:63721

Users can add a string and specify its language. They can for example enter the motto of a country in the country’s language. It is shown in this language to all users regardless of their language setting.

Geo-shape datatype (optional)

bugzilla:55549

Users can enter geo-shapes in Wikidata. They can for example use it to store the outline of a country.

Technical details

This will likely be realised using leaflet.js.


Multi-lingual text datatype (optional)

Users can add a string and specify its language. This is similar to the mono-lingual text datatype. However, translations in more than one language can be provided. The one in the user’s language is shown and the others can be shown on demand.
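
Selecting which translation to display can be sketched as a lookup by language code. The optional fallback chain is an assumption borrowed from the language fallback described for the UI redesign; it is not stated for this datatype. The function name `pick_translation` is hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def pick_translation(translations: Dict[str, str], user_language: str,
                     fallbacks: Sequence[str] = ()) -> Optional[Tuple[str, str]]:
    """Return (language, text) for the first available language, else None."""
    for language in (user_language, *fallbacks):
        if language in translations:
            return language, translations[language]
    return None
```

The remaining entries of `translations` are the ones that could be shown on demand.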


Beyond the planning horizon of this plan