Wikidata:Development plan

This is the Wikidata development plan. It is ordered by when work on each item is likely to start, but the order is not fixed and can change as priorities shift.

Done

Badges

Some clients want to add extra metadata (badges) to their articles. This includes things like "good article" and "featured article" as well as the importance and quality of an article.

If your wiki uses this feature, please ensure it's listed.

  • The badges will be defined on Wikidata items' sitelinks as Wikidata items, like "featured article (Q123456)".
  • The items allowed as badges will be defined in the Wikidata configuration settings. Initially "good article" and "featured article" badges will be available.
  • The user can set and change a badge by selecting one from this predefined list of badges.
  • On the clients, sitelinks with badges will have an icon in the list of language links in the sidebar. Default icons will be shipped by a Wikimedia-specific Wikibase extension, but wikis can choose to use their own icons.
  • If further customization on a per-wiki level is needed, it can be done using the CSS classes that are set on sitelinks with badges. There will be CSS classes that map to badge IDs (for example, Q120 could map to the class GA). In addition, every sitelink with badges will carry canonical classes that do not depend on settings (like wb-badge-Q123).
  • The badges can be queried via a special page called Special:PagesWithBadges on every client wiki.
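
The CSS class scheme described in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Wikibase implementation: the function name badge_css_classes and the BADGE_CLASS_NAMES setting are hypothetical, while the wb-badge- prefix and the Q120 to GA mapping come from the examples in the text.

```python
# Hypothetical per-wiki setting: maps badge item IDs to custom class names.
# Q120 -> "GA" mirrors the example in the text; it is not a real configuration.
BADGE_CLASS_NAMES = {"Q120": "GA"}


def badge_css_classes(badge_ids, class_names=BADGE_CLASS_NAMES):
    """Return the CSS classes for a sitelink carrying the given badges."""
    classes = []
    for badge_id in badge_ids:
        # Canonical class: the same on every wiki, independent of settings.
        classes.append(f"wb-badge-{badge_id}")
        # Setting-dependent class, only if the wiki configured one.
        if badge_id in class_names:
            classes.append(class_names[badge_id])
    return classes


print(badge_css_classes(["Q120", "Q123"]))
```

A wiki that only needs to restyle one badge could target the canonical class in its CSS and leave the mapping empty.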

Technical details

These badges correspond to Wikidata items.

Remaining problems

  • phab:T73887: Other projects sidebar should show badges if applicable

Merges and redirects

When two different items about the same topic are created, they can be merged. Labels, descriptions, aliases, sitelinks and statements are merged if they do not conflict. The item that is left empty can then be turned into a redirect to the other. This way, Wikidata IDs can be regarded as stable identifiers by third parties.
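
The merge rule can be illustrated with a small Python sketch. It assumes a simplified item layout (plain dicts keyed by language or site) and covers only labels, descriptions and sitelinks for brevity; the function names and the redirect record format are hypothetical and do not reflect how Wikibase stores items.

```python
FIELDS = ("labels", "descriptions", "sitelinks")


def merge_items(source, target):
    """Move non-conflicting entries from source into target.

    Returns the (field, key) pairs left behind in source because the
    target already holds a different value for them.
    """
    conflicts = []
    for field in FIELDS:
        for key, value in list(source.get(field, {}).items()):
            existing = target.setdefault(field, {}).get(key)
            if existing is None or existing == value:
                target[field][key] = value
                del source[field][key]
            else:
                conflicts.append((field, key))
    return conflicts


def redirect_if_empty(source, target_id):
    """Turn an emptied item into a redirect record pointing at target_id."""
    if not any(source.get(field) for field in FIELDS):
        return {"redirect": target_id}
    return source


duplicate = {"labels": {"de": "Berlin"}, "sitelinks": {"dewiki": "Berlin"}}
kept = {"labels": {"en": "Berlin"}, "sitelinks": {"enwiki": "Berlin"}}
print(merge_items(duplicate, kept))
print(redirect_if_empty(duplicate, "Q64"))
```

If any conflict remains, the source item is not empty and is returned unchanged instead of becoming a redirect.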

Mono-lingual text datatype

Users can add strings and specify a language for them. For example, they can enter the motto of a country in that country’s language. The text is shown in this language to all users regardless of their language setting.

JSON dumps

For 3rd party re-use and analysis of the data in Wikidata, we provide JSON dumps in addition to the regular XML dumps. The JSON dump contains the canonical JSON representation of all entities (as opposed to the brittle internal JSON representation found in the XML dumps). The JSON representation of individual entities is also available via Wikidata’s linked data interface (Special:EntityData).
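
As a rough illustration of consuming this data, the following Python sketch builds a Special:EntityData URL and iterates over dump lines. It assumes the dump is laid out as a JSON array with one entity per line, and the sample lines are made up for the example rather than copied from a real dump; the function names are hypothetical.

```python
import json


def entity_data_url(entity_id):
    """Build the linked data URL for one entity's JSON representation."""
    return f"https://www.wikidata.org/wiki/Special:EntityData/{entity_id}.json"


def iter_dump_entities(lines):
    """Yield entities from dump lines, assuming one entity per line
    inside a JSON array ("[" first, "]" last, "," at the line ends)."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue
        yield json.loads(line)


# Made-up sample standing in for the lines of a dump file.
sample = [
    "[",
    '{"id": "Q1", "type": "item", "labels": {"en": {"language": "en", "value": "universe"}}},',
    '{"id": "Q2", "type": "item", "labels": {}}',
    "]",
]
for entity in iter_dump_entities(sample):
    print(entity["id"], entity_data_url(entity["id"]))
```

Reading line by line keeps memory use flat, which matters because a full dump is far too large to parse as a single JSON document.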

Quantities without units

Users are able to enter quantitative data in Wikidata and re-use it on the clients as well as outside of Wikimedia. It is possible to express statements like “Berlin has an estimated population of 3,397,469 (±100) as of 31 July 2013” or “Berlin has an area of 891.85 km²”. At first, only unitless quantities are supported. A later deployment adds a small number of units, and the set is expanded in subsequent deployments. When viewing an item with such quantitative data, the user sees the values formatted according to local conventions (decimal separator, unit conversion). On the client, this data is accessed via the parser function and Lua and is shown according to the content language. The API provides a way to access the data in the format preferred by the request sender.
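
The idea of storing one value and rendering it per locale can be sketched as follows. This is a minimal Python illustration, not Wikibase's data model: the Quantity class, the symmetric uncertainty field and the SEPARATORS table are assumptions made for the example.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical per-language conventions: (thousands separator, decimal separator).
SEPARATORS = {"en": (",", "."), "de": (".", ",")}


@dataclass(frozen=True)
class Quantity:
    amount: Decimal
    uncertainty: Decimal = Decimal(0)  # symmetric ± margin, a simplification


def format_number(value, language):
    thousands, decimal = SEPARATORS[language]
    text = f"{value:,}"  # English-style grouping first, e.g. 3,397,469
    # Swap separators via a placeholder so they do not overwrite each other.
    return text.replace(",", "\0").replace(".", decimal).replace("\0", thousands)


def format_quantity(quantity, language):
    text = format_number(quantity.amount, language)
    if quantity.uncertainty:
        text += f" (±{format_number(quantity.uncertainty, language)})"
    return text


population = Quantity(Decimal("3397469"), Decimal("100"))
print(format_quantity(population, "en"))
print(format_quantity(population, "de"))
```

Decimal is used instead of float so that a value such as 891.85 is stored exactly as entered.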

Remaining problems

  • phab:T68580: Better support for exact values in Quantity DataType
  • phab:T59589: make it possible to show the + sign in quantities on item pages

In other projects sidebar

mw:Beta Features/Other projects sidebar

On an article a user can see links to the same topic on other projects in the sidebar. This is similar to how different languages of the same project are linked.

Entity suggester

When entering a new statement, the user is shown a number of properties that they are likely to use. These suggestions are calculated based on which properties are used in similar items. In future versions, suggestions should also be made for values.
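
One simple way to derive such suggestions is to count which properties co-occur with the ones already present. The Python sketch below illustrates that idea only; it is not the algorithm the entity suggester actually uses, and the function name and scoring rule are assumptions.

```python
from collections import Counter


def suggest_properties(current, other_items, limit=3):
    """Rank properties missing from `current` by how often they occur
    on items that share at least one property with it."""
    current = set(current)
    scores = Counter()
    for properties in other_items:
        candidates = set(properties)
        shared = len(current & candidates)
        if shared == 0:
            continue  # not a similar item
        for prop in candidates - current:
            scores[prop] += shared  # more overlap gives a stronger vote
    # Sort by score, then by name so that ties are resolved predictably.
    ranked = sorted(scores, key=lambda prop: (-scores[prop], prop))
    return ranked[:limit]


others = [
    ["instance of", "date of birth", "place of birth"],
    ["instance of", "date of birth", "place of birth", "date of death"],
    ["instance of", "capital"],
]
print(suggest_properties(["instance of", "date of birth"], others))
```

The same counting approach could in principle be applied to values, which the text names as a future extension.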

Statements on properties

To improve the maintainability of the data in Wikidata, it is possible to add statements to property pages. This is used to store constraints for properties. An example of such a constraint is that the winner of a certain award must be human. Another example is that the number of inhabitants needs to be a positive integer.
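
The two example constraints can be expressed as small checks, as in the Python sketch below. The constraint table, the property names used as keys and the item layout are all hypothetical; the sketch only shows how constraints attached to properties could be evaluated against statements.

```python
def is_positive_integer(value, items):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


def is_human(value, items):
    # `value` is an item ID; look up what that item is an instance of.
    return "human" in items.get(value, {}).get("instance of", [])


# Hypothetical constraint table keyed by property name.
CONSTRAINTS = {"winner": is_human, "number of inhabitants": is_positive_integer}


def find_violations(statements, items):
    """Return the (property, value) pairs that fail their property's constraint."""
    violations = []
    for prop, value in statements:
        check = CONSTRAINTS.get(prop)
        if check is not None and not check(value, items):
            violations.append((prop, value))
    return violations


items = {"person-1": {"instance of": ["human"]}, "city-1": {"instance of": ["city"]}}
statements = [("winner", "person-1"), ("winner", "city-1"), ("number of inhabitants", -5)]
print(find_violations(statements, items))
```

Such checks report violations after the fact; they do not block input.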

Language fallback

When viewing an item that links to other items, the labels for these items are shown in the user’s language. If labels in this language are not available, labels are shown in languages that the user is likely to speak.
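
The lookup can be pictured as walking a chain of languages until a label is found. The following Python sketch assumes a hypothetical FALLBACKS table; the real chains would come from configuration or user preferences and are not specified in this plan.

```python
# Hypothetical fallback chains, for illustration only.
FALLBACKS = {"de-at": ["de", "en"], "de": ["en"], "pt-br": ["pt", "en"]}


def label_with_fallback(labels, language, fallbacks=FALLBACKS):
    """Return (label, language actually used), or (None, None) if nothing fits."""
    for candidate in [language, *fallbacks.get(language, [])]:
        if candidate in labels:
            return labels[candidate], candidate
    return None, None


print(label_with_fallback({"de": "Wien", "en": "Vienna"}, "de-at"))
```

Returning the language that was actually used lets the interface indicate that a fallback label is being shown.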

Data usage tracking

To ease maintenance of the data in Wikidata, it is possible to get a list of all articles in which certain data is used. Users are thereby able to see which articles are affected by the changes they are making. This also gives a better overview of where and how Wikidata’s data is used in Wikimedia’s projects.
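
Conceptually, usage tracking is a reverse index from items to the pages that use them. The Python sketch below uses an in-memory dictionary as a stand-in for the database table; the class and method names are hypothetical.

```python
from collections import defaultdict


class UsageIndex:
    """In-memory stand-in for a usage table: item ID -> pages using it."""

    def __init__(self):
        self._pages_by_item = defaultdict(set)

    def record(self, page, item_ids):
        """Note that `page` uses data from each of the given items."""
        for item_id in item_ids:
            self._pages_by_item[item_id].add(page)

    def pages_using(self, item_id):
        """Return the pages affected by a change to `item_id`."""
        return sorted(self._pages_by_item.get(item_id, ()))


index = UsageIndex()
index.record("enwiki:Berlin", ["Q64", "Q183"])
index.record("dewiki:Deutschland", ["Q183"])
print(index.pages_using("Q183"))
```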

Not to be done

  • Usage tracking outside Wikimedia’s projects

Remaining problems

  • phab:T49727: show properties used in an article
  • phab:T66591: write API module that gives a list of pages that use a given item
  • phab:T73498: benchmark database performance of usage tracking
  • phab:T75220: populate entity usage table during database schema update
  • phab:T89002: Track multi-lingual label usage
  • phab:T93191: correctly track redirect usage

Primary sources tool

Wikidata:Primary sources tool

Data from another database can be enhanced with references before being added to Wikidata.

Access to data from arbitrary items

Users on the client are able to include data from any Wikidata item they choose by specifying its ID. This expands on their ability to access data of the item currently associated with the page via a sitelink. This access is possible via both the parser function and Lua.

In progress

Access for remaining sister projects

The remaining sister projects have access to sitelinks and data via Wikidata. The roll-out is staged to allow the communities to adapt. The planned order is:

  • Wikisource
    • Sitelinks: 14.01.2014 ✓ Done
    • Data: 25.02.2014 ✓ Done
    • Oldwikisource not done yet (see phab:T64717)
    • Edition interwiki links not done yet
  • Wikiquote
    • Sitelinks: 08.04.2014 ✓ Done
    • Data: 10.06.2014 ✓ Done
  • Wikinews
    • Sitelinks: 19.08.2014 ✓ Done
    • Data: ?
  • Wikidata itself
    • Sitelinks: 19.08.2014 ✓ Done
    • Data: 19.08.2014 ✓ Done
  • Commons (not including file metadata!)
    • Sitelinks: 23.09.2013 ✓ Done
    • Data: 02.12.2014 ✓ Done
  • Wikibooks
    • Sitelinks: 24.02.2015 ✓ Done
    • Data: ?
  • Wikiversity
    • Sitelinks: ?
    • Data: ?
  • Meta, MediaWiki, Wikispecies, Incubator
    • Sitelinks: ?
    • Data: ?

Not to be done (yet)

  • Wiktionary

Quantities with units

Users are able to enter quantitative data in Wikidata and re-use it on the clients as well as outside of Wikimedia. It is possible to express statements like “Berlin has an estimated population of 3,397,469 (±100) as of 31 July 2013” or “Berlin has an area of 891.85 km²”. At first, only unitless quantities are supported. A later deployment adds a small number of units, and the set is expanded in subsequent deployments. When viewing an item with such quantitative data, the user sees the values formatted according to local conventions (decimal separator, unit conversion). On the client, this data is accessed via the parser function and Lua and is shown according to the content language. The API provides a way to access the data in the format preferred by the request sender.

Not to be done (yet)

  • Letting users add arbitrary units
  • Currency conversion (Conversion rates and the value of a single currency change over time. That is very complex to model.)

UI redesign

Wikidata:UI redesign input

Reading and editing Wikidata is joyful and intuitive on desktops, tablets and mobile phones. The interface is visually pleasing, integrates nicely with other Wikimedia projects and contains no jargon. The interface quickly provides users with the information they are looking for and does not overwhelm them (for example, deprecated data is hidden initially and information is ordered in an intuitive way). It invites the user to add additional information (including qualifiers and sources) and nudges them towards making correct and useful contributions by offering suggestions. Erroneous contributions and vandalism are discouraged. Navigating and editing the website is fast. Both the data and the interface are localized according to the user’s locale and language preferences. Where no data is available in a particular language, a fallback is used.

Not to be done

  • enforcing user-defined constraints on data input

Improved internal consistency checks

https://phabricator.wikimedia.org/project/profile/1202/ and Wikidata:Constraint violation report input

It is easy for a user to find and understand constraint violation reports. Fixing an item that violates a constraint is easy. When viewing an item with a statement that violates a constraint the user can easily spot the wrong statement.

Consistency checks against 3rd parties

https://phabricator.wikimedia.org/project/profile/1203/

We check Wikidata's data against other databases. Inconsistencies are visible to the user when viewing an item.

RDF export

For 3rd party re-use and analysis of the data in Wikidata, we provide access to the data in RDF. This RDF will not assert facts, but rather represent claims. The RDF representation of individual entities is available via Wikidata’s linked data interface (Special:EntityData).
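
The difference between asserting a fact and representing a claim can be shown with a small Python sketch. Instead of emitting a direct subject/property/value triple, the claim gets its own statement node to which the value and a source are attached. The example.org namespace and the path segments are placeholders invented for this illustration; they are not Wikidata's RDF vocabulary.

```python
BASE = "http://example.org/"  # hypothetical namespace, not Wikidata's vocabulary


def claim_triples(statement_id, subject, prop, value, source=None):
    """Describe a claim as a statement node instead of asserting
    `subject prop value` directly."""
    node = f"<{BASE}statement/{statement_id}>"
    triples = [
        (f"<{BASE}entity/{subject}>", f"<{BASE}claim/{prop}>", node),
        (node, f"<{BASE}value/{prop}>", f'"{value}"'),
    ]
    if source is not None:
        # The source hangs off the statement node, not off the subject.
        triples.append((node, f"<{BASE}statedIn>", f"<{BASE}entity/{source}>"))
    return triples


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one triple per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(claim_triples("s1", "Q64", "population", 3397469, source="census-item")))
```

Because the value is attached to the statement node, two conflicting claims about the same subject can coexist, each with its own source.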

Improved watchlist integration

Wikidata:Watchlist integration improvement input

Users on Wikipedia and the other client projects need to see changes on Wikidata that affect their articles easily and comprehensively in their watchlist. This includes support for showing Wikidata changes when using the enhanced recent changes option.

Hover cards

When a user hovers over a link to an item a small card is shown that holds the most important information about that item.

Wikidata Query Service

Phabricator project

The data in Wikidata can only be used to its full potential with a way to query this data. This is done using a SPARQL endpoint.
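
As an illustration of what querying a SPARQL endpoint looks like, the Python sketch below builds a GET request URL for a one-pattern query. The endpoint URL is an assumption, since the plan does not name one, and the query uses P36 and Q90 as the IDs for "capital" and "Paris" purely as an example. No request is sent; the sketch only shows how the query is passed in the query parameter.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint, not named in the text

# Example query: items whose capital is Paris.
QUERY = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item WHERE {
  ?item wdt:P36 wd:Q90 .
}
LIMIT 10
"""


def build_query_url(query, endpoint=ENDPOINT):
    """Return a GET URL carrying the query in the `query` parameter,
    following the SPARQL protocol's convention for queries via GET."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


print(build_query_url(QUERY))
```

A client would fetch this URL with an Accept header for the result format it wants, for example SPARQL JSON results.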

Improved user experience for referencing: One-step-adding

Adding a statement including its reference is done in one step. It is not necessary to first save the statement and then add the reference to it.

Mobile view

Wikidata should offer a view that is optimized for mobile devices.

Not to be done yet

  • editing optimized for mobile devices

Article Placeholder

Wikidata:Article placeholder input

As a Wikipedia reader I want to get information about a topic even if no article is available in my language. When searching for a topic that doesn't have an article in my language I am presented with basic information from Wikidata and get the option to write one.

On hold

Wikimedia Commons

c:Commons:Structured data

Wikimedia Commons holds a huge amount of multimedia files available for the other Wikimedia projects and the world to use. Structured data support for Wikimedia Commons is important to make it easier to maintain the files and make reuse, especially 3rd-party reuse, easier. The structured data support comes in two ways. The first is by providing access to the data stored in Wikidata. This includes things like the date of birth of an artist. The second way is by enabling Wikimedia Commons itself to store structured data related to the files stored there. This includes things like the license and subject of a photo for example.

When a new file is contributed, the uploader is asked to provide some information like tags, creator name and license in the upload wizard. Users are then able to access and edit this structured data via a form as well as an API (similar to how it is done on Wikidata). It is easy to specify and retrieve the licensing and provenance information of a multimedia file. Additionally, it is easy to tag and categorize images based on concepts from Wikidata. Tags and other file information are shown in the user’s language to accommodate the multi-lingual audience of Wikimedia Commons. All this information can be used to easily search for files that fit certain criteria like “picture of a cat and a child from 2010, licensed under CC-BY-SA”.

This is on hold per [1].

Technical details

The data is stored on a “data” page attached to the file’s page that is similar to Wikidata’s item pages. Commons is thereby a repository and at the same time its own client as well as a client of wikidata.org.

Simple queries

Users are able to pose simple queries to Wikidata via a special page as well as the API. Wikidata can answer queries like “What has the ISBN 2-01-202705-9?” or “What has the capital Paris?”. These queries are restricted to one property/value pair and return a list of items. The returned result only includes items where the statement is marked as preferred. These queries are most useful with one of the many identifiers in Wikidata that connect the knowledge base to other databases.
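
The semantics of such a query (one property/value pair, preferred statements only, a list of items as the result) can be sketched in Python as follows. The in-memory item layout, the placeholder item IDs and the function name are assumptions made for this illustration.

```python
def simple_query(items, prop, value):
    """Return the IDs of items with a preferred statement `prop = value`."""
    return sorted(
        item_id
        for item_id, statements in items.items()
        if any(
            s["property"] == prop and s["value"] == value and s["rank"] == "preferred"
            for s in statements
        )
    )


# Placeholder data; the item IDs are not real Wikidata IDs.
ITEMS = {
    "item-a": [{"property": "ISBN", "value": "2-01-202705-9", "rank": "preferred"}],
    "item-b": [{"property": "ISBN", "value": "2-01-202705-9", "rank": "normal"}],
    "item-c": [{"property": "capital", "value": "Paris", "rank": "preferred"}],
}
print(simple_query(ITEMS, "ISBN", "2-01-202705-9"))
```

Here item-b is excluded even though its value matches, because its statement is not marked as preferred.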

This is on hold and may be canceled in favor of the (complex) query service.

Not to be done

  • Querying for sources or qualifiers

Things to keep in mind

Some data types are easier to query than others. Time, Geo and Quantity values require range queries. For the Item and String data types, simple equality is sufficient.
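
The difference between the two kinds of lookup can be illustrated with a sorted index: equality needs only an exact match, while a range query needs the values to be ordered. The Python sketch below uses binary search over a sorted list; the class name and the placeholder item IDs are hypothetical, and a real implementation would rely on database indexes instead.

```python
from bisect import bisect_left, bisect_right


class SortedValueIndex:
    """Sorted (value, item ID) pairs supporting equality and range lookups."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)
        self._values = [value for value, _ in self._pairs]

    def between(self, low, high):
        """Item IDs whose value v satisfies low <= v <= high."""
        start = bisect_left(self._values, low)
        end = bisect_right(self._values, high)
        return [item_id for _, item_id in self._pairs[start:end]]

    def equal_to(self, value):
        """Equality is the special case of a range with identical bounds."""
        return self.between(value, value)


populations = SortedValueIndex([(3397469, "city-c"), (500, "city-a"), (1500000, "city-b")])
print(populations.between(1000000, 4000000))
```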

Complex queries

Users are able to write queries that are more complex than the simple queries. This includes queries like “all poets who lived in 1982” or “all cities with more than 1 million inhabitants”. They are entered (using the semantics as embodied by Semantic MediaWiki’s Ask extension) in a page in the Query namespace and internally saved as JSON. They are then executed when resources are available, which is usually not immediately. The result is cached. A query can be set to rerun at regular intervals or on demand by an administrator. The result of the query is shown on the same page. It can also be accessed via the API. The clients can include the result of a query in their pages, for example to create list articles. The result will be a list of items. It can then be manipulated as needed by Lua. More result formatters and visualisations will be made available in future deployments based on Semantic MediaWiki’s result formatters. These queries make Wikidata even more useful to the Wikimedia projects and the world and are needed by the community to maintain the large database.
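
The caching and rerun behaviour can be sketched as a small Python class. This is a simplification under stated assumptions: the class name is hypothetical, and the query runs when its result is first requested, whereas the plan describes deferred execution when resources are available.

```python
import time


class CachedQuery:
    """Stores a query's last result and reruns it only when it is stale."""

    def __init__(self, run, interval_seconds, clock=time.time):
        self._run = run  # callable that executes the query
        self._interval = interval_seconds
        self._clock = clock  # injectable for testing
        self._result = None
        self._ran_at = None

    def result(self, force=False):
        """Return the cached result, rerunning if stale or if forced
        (for example on demand by an administrator)."""
        now = self._clock()
        stale = self._ran_at is None or now - self._ran_at >= self._interval
        if force or stale:
            self._result = self._run()
            self._ran_at = now
        return self._result


query = CachedQuery(lambda: ["item-a", "item-b"], interval_seconds=3600)
print(query.result())
```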

Optional

  • transitive queries
  • disjunction

Todo

Better handling of identifiers

Identifiers are better handled when visually separated from other statements. They need to be moved to their own section in the sidebar and get their own datatype. The linking of identifiers to their database should be done in Wikibase itself instead of in a gadget.

Infobox demos + documentation

We provide a few demo infoboxes and good documentation for people to use as a starting point for moving infoboxes on Wikipedia towards using more data from Wikidata.

Improved user experience for referencing

Wikidata:Referencing improvements input

We make adding references easier.

Duplication of an existing reference

It is possible to add a new reference by simply re-using another one from the same item.

Nudging users about adding or changing references

When adding a statement without a reference, the editor is nudged to provide one. When a statement is changed but its reference is not, the editor is made aware of this and nudged to update the reference.

Wizard

A user adds one piece of information about a reference, like its ISBN. The tool then automatically adds the other necessary information.

Article history integration

Editors on a client can look at the change history of an article and see all Wikidata changes relating to this article. This way they can see all changes affecting their article without having to go to another project.

Multi-lingual text datatype

Users can add strings and specify a language for them. This is similar to the mono-lingual text datatype. However, translations in more than one language can be provided. The one in the user’s language is shown and the others can be shown on demand.

Formula datatype

Users can enter formulas in Wikidata using MathML. They are rendered nicely.

Geo-shape datatype

Users can enter geo-shapes in Wikidata. They can for example use it to store the outline of a country.

Technical details

This will likely be realised using leaflet.js.

Wiktionary support

Wikidata:Wiktionary

A number of proposals have been put forth for using structured data on Wiktionary.

Beyond the planning horizon of this plan

Structured Wikiquote

m:Wikidata/Development/Wikiquote

Installing Wikibase on Wikiquote and allowing structured data on this sister project has lots of advantages. However, there is also lots of work to do in the software to support all features needed for this proposal.

Access from 3rd party wikis

MediaWiki installs outside the Wikimedia cluster are able to make use of the data on Wikidata similar to how they make use of images from Wikimedia Commons via InstantCommons.