User:Lectrician1/Sitelinks 2.0

From Wikidata

Problem

Wikimedia articles often do not describe exactly the same concept as the Wikidata item they are sitelinked to, or as the other-language articles sitelinked to that same item.

Consequences

  • Items become conflated as editors add statements describing the multiple concepts covered by the Wikipedia articles sitelinked to an item.
  • Wikipedia editors become confused as to which item to add data to when the concept described in a Wikipedia article does not match the Wikidata item. For example, Death and state funeral of Elizabeth II is sitelinked to death of Elizabeth II (Q113846055) but not to state funeral of Elizabeth II (Q113849016). Editors may not know that state funeral of Elizabeth II (Q113849016) exists, and may in the future add instance of (P31): state funeral (Q1052001) to the item the article is sitelinked to.
  • Editors become unsure of which articles to sitelink to which items when articles on different wikis may describe both similar and different things.
  • Items for categories can become absolute messes as categories on different wikis, although they might have the same name, may not describe the same thing or be subdivided in the same way. Example consequences.

Solution

Instead of Wikidata items containing a list of sitelinks to Wikimedia articles:

JPEG (Q2195)

  • en: JPEG
  • es: JPEG
  • fr: JPEG
  • ja: JPEG

Wikipedia articles should each have a Wikibase entity tied to their page, as Commons media files do, that lets editors state which Wikidata items the article describes using main subject (P921):

English Wikipedia JPEG article

main subject (P921), all normal rank:

  • JPEG-XT
  • JPEG XL
  • JPEG compression (Q116272068)

French Wikipedia JPEG article

main subject (P921), normal rank:

  • JPEG compression (Q116272068)

With this system we can determine that the English Wikipedia article describes JPEG-XT and JPEG XL but the French Wikipedia article does not.
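The comparison can be sketched in a few lines of Python. This is only an illustration of the idea, not a proposed implementation: the `main_subjects` mapping, the article keys and both helper functions are hypothetical, and items shown without a QID in the example above are represented by their labels.

```python
# Hypothetical in-memory model: each article entity maps to the set of
# values of its "main subject (P921)" statements.
# JPEG-XT and JPEG XL are given by label because the example lists no QID.
main_subjects = {
    "enwiki:JPEG": {"JPEG-XT", "JPEG XL", "Q116272068"},
    "frwiki:JPEG": {"Q116272068"},
}


def shared_subjects(article_a, article_b):
    """Subjects that both articles describe."""
    return main_subjects[article_a] & main_subjects[article_b]


def only_in(article_a, article_b):
    """Subjects that article_a describes but article_b does not."""
    return main_subjects[article_a] - main_subjects[article_b]


common = shared_subjects("enwiki:JPEG", "frwiki:JPEG")
english_only = only_in("enwiki:JPEG", "frwiki:JPEG")
```

Here `common` holds JPEG compression (Q116272068), and `english_only` holds the two subjects that only the English article covers.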

Implementation

Data storage

MediaWiki

This can be implemented the same way as Structured Data on Commons, where a content slot of a media file page contains a Wikibase entity.

Users could add main subject (P921) statements to state what Wikidata entities the article describes.

Users could also potentially add other article metadata statements, such as the article class or the WikiProjects the article is relevant to. This can help cut down on the metadata stored as raw text on the article's Talk page and also make it easily queryable.

Graph database

To support querying and the language selector, the data from the Wikibases on all of the wikis should be loaded into a single graph database.

This is done instead of keeping a separate database for each wiki so that we don't need to perform a federated SPARQL query across 280+ databases, which could take a very long time.
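As a rough sketch of what "a single database" buys us, the Python below merges per-wiki article entities into one index keyed by Wikidata item, so that a lookup is a single access rather than one request per wiki. Everything here is an assumption made for illustration: the nested-dictionary layout, the `build_combined_index` function and the placeholder article titles are hypothetical, and a real deployment would use a graph database rather than a dictionary.

```python
from collections import defaultdict

P921 = "P921"  # main subject


def build_combined_index(per_wiki_entities):
    """Merge per-wiki article entities into one item -> articles index.

    per_wiki_entities has the (hypothetical) shape
    {wiki: {article_title: {property_id: [item_id, ...]}}}.
    """
    index = defaultdict(set)
    for wiki, articles in per_wiki_entities.items():
        for title, statements in articles.items():
            for item_id in statements.get(P921, []):
                index[item_id].add((wiki, title))
    return dict(index)


# Placeholder titles; only the item IDs are taken from this page.
per_wiki_entities = {
    "jawiki": {"ja-article": {P921: ["Q22329636", "Q2623189"]}},
    "enwiki": {"en-article": {P921: ["Q22329636"]}},
}

combined = build_combined_index(per_wiki_entities)
```

With the index built once, finding every article about Shimizu Tunnel (Q22329636) is `combined["Q22329636"]`, regardless of how many wikis contributed data.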

Language selector

To allow users to find similar articles on different language wikis, the language selector will be changed so that it performs a SPARQL query against the all-wiki graph database, looking for articles that share main subject (P921) statements with the article the user is currently viewing. The language selector dialog will then group the linked articles by which of the viewed article's concepts they describe.

In the example below, the Japanese article being viewed describes the Shimizu Tunnel and the Daishimizu Tunnel. So, it has:

main subject (P921): Shimizu Tunnel (Q22329636)

main subject (P921): Daishimizu Tunnel (Q2623189)

The Chinese and Korean articles have the same two statements.

However, the English Wikipedia article only has:

main subject (P921): Shimizu Tunnel (Q22329636)

So, the language selector dialog shows that the Chinese and Korean articles describe the same two concepts as the Japanese article and places them in one column, with a label showing which topics they describe. It then places the English article in a separate column and notes that it describes only the Shimizu Tunnel.

Figure: Proposed Wikipedia Languages interface.png

This shows the user how the articles differ while still letting them find each one.
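The grouping step for the tunnel example could look roughly like the Python sketch below. It assumes the query results have already been fetched into a dictionary; the `group_for_language_selector` function, the short wiki keys and the ordering rule (most shared concepts first) are all hypothetical choices made for illustration.

```python
def group_for_language_selector(current, articles):
    """Group other-language articles by the concepts they share with `current`.

    articles maps an article key to the set of its main subject (P921) values.
    Returns {frozenset of shared items: [article keys]}, with the groups that
    share the most concepts first. Articles sharing nothing are left out.
    """
    viewed = articles[current]
    groups = {}
    for article, subjects in articles.items():
        if article == current:
            continue
        shared = frozenset(subjects & viewed)
        if shared:
            groups.setdefault(shared, []).append(article)
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[0]), sorted(kv[0])))
    return dict(ordered)


SHIMIZU = "Q22329636"      # Shimizu Tunnel
DAISHIMIZU = "Q2623189"    # Daishimizu Tunnel

articles = {
    "ja": {SHIMIZU, DAISHIMIZU},
    "zh": {SHIMIZU, DAISHIMIZU},
    "ko": {SHIMIZU, DAISHIMIZU},
    "en": {SHIMIZU},
}

groups = group_for_language_selector("ja", articles)
```

Each key of `groups` corresponds to one column in the dialog: the Chinese and Korean articles land under the two-tunnel group, and the English article lands under the group containing only the Shimizu Tunnel.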

Entity access from wikis

The Wikibase Client provides parser functions and Lua functions that allow articles to display data from the Wikidata item linked to them. If a single article were now to link to multiple Wikidata items, calls to these functions that do not name an entity would no longer work, because there would be no single linked item to fall back on.

The solution is that, while the sitelinks are moved from Wikidata to their respective wikis, any parser or Lua function call that does not specify the entity it is retrieving from will be automatically edited to specify the item it was originally meant to retrieve from.
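For the parser functions, such an automatic edit might be sketched as below. This is a simplified illustration under several assumptions: it covers only `{{#property:…}}` and `{{#statements:…}}` calls, relies on their existing `from=` parameter, skips calls whose arguments contain nested templates, and does not touch Lua modules. The `pin_entity` function and the regular expression are hypothetical, not part of any existing tool.

```python
import re

# Matches {{#property:...}} and {{#statements:...}} calls whose arguments
# contain no nested braces (nested templates are out of scope for this sketch).
PARSER_CALL = re.compile(r"\{\{\s*#(property|statements)\s*:([^{}]*)\}\}")


def pin_entity(wikitext, item_id):
    """Add |from=<item_id> to calls that do not already name an entity."""

    def add_from(match):
        name, args = match.group(1), match.group(2)
        # Everything after the first "|" is a parameter list.
        for param in args.split("|")[1:]:
            if "=" in param and param.split("=", 1)[0].strip() == "from":
                return match.group(0)  # already pinned to an entity
        return "{{#" + name + ":" + args + "|from=" + item_id + "}}"

    return PARSER_CALL.sub(add_from, wikitext)


before = "Type: {{#property:P31}} and {{#statements:P31|from=Q5}}"
after = pin_entity(before, "Q113846055")
```

In this example the first call gains `|from=Q113846055`, the item the article was sitelinked to before the migration, while the second call is left alone because it already names its entity.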