Wikidata:WikiProject Identification Keys

Goal

This project’s goal is to create identification keys for all kinds of topics: trees, birds, gems, cars, anything that can be identified. There are many reasons why keys are important. For example:

  • If you cannot identify, say, an insect, you currently have to browse through all (many!) insect pages on Wikipedia, with a low probability of actually finding the right one. With an identification key, you find the matching insect by answering some questions.
  • To combat biodiversity loss. If species cannot be identified, it is hard to tell whether any of them are on the brink of extinction, and what could be done about that.

Many identification keys already exist on paper (books, booklets and so on). You may ask: why is this not sufficient? I ask back: why was Wikipedia created?

Even nowadays, printed keys are very important, and many of them are of very high quality. But printed keys are by far not sufficient:

  • Most are written in one language only. This may be fine e.g. for gems, which are shipped to every part of the world, so a key most likely exists in your language too. Yet if you consider the local flora and fauna of Switzerland and you don’t speak German (replace this with any country/language combination), you are basically lost.
  • Paper is static. More complex identification keys are basically binary decision trees: each node is a question leading to one of two child nodes. Example: “[1] Does the flower have 4 or 5 petals? 4 → Go to question [2], 5 → Go to question [63]” As soon as you cannot answer a question, which is likely, you have to follow both paths, and so on, potentially making identification arbitrarily complex.

An electronic key can be dynamic and fix both problems.
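To illustrate what "dynamic" can mean here, the following is a minimal sketch of a free-access key in Python. It is not the project's actual implementation: the sample taxa, the question names and the function `matching_taxa` are all hypothetical. The point it shows is that an unanswered question simply does not narrow the result, so the user never has to follow two branches by hand.

```python
# Hypothetical sample data: each taxon maps a question to the set of
# answers it can show. None of these names come from the real key.
TAXA = {
    "Taxon A": {"petals": {"4"}, "leaf_margin": {"smooth"}},
    "Taxon B": {"petals": {"5"}, "leaf_margin": {"serrated"}},
    "Taxon C": {"petals": {"4", "5"}, "leaf_margin": {"serrated"}},
}


def matching_taxa(taxa, answers):
    """Return the names of all taxa consistent with the given answers.

    Questions the user skipped are simply absent from `answers` and
    therefore never exclude a taxon. A taxon without recorded data for
    a question is kept as well, since it cannot be ruled out.
    """
    result = []
    for name, characters in taxa.items():
        consistent = all(
            question not in characters or answer in characters[question]
            for question, answer in answers.items()
        )
        if consistent:
            result.append(name)
    return result


if __name__ == "__main__":
    # The petal question is skipped; only the leaf margin is known.
    print(matching_taxa(TAXA, {"leaf_margin": "serrated"}))
    # Adding a second answer narrows the candidates further.
    print(matching_taxa(TAXA, {"leaf_margin": "serrated", "petals": "4"}))
```

Questions can be answered in any order, and each additional answer only shrinks the candidate list.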

Note how well Wikidata fits identification keys! Identification keys are based on accurate descriptions of objects. This is data. Wikidata is about describing objects accurately with data. With Wikidata, we can create global identification keys for anything and in any language!

Current status

The identification key exists as a standalone version, resulting from the Master’s thesis[1] of User:LivingShadow (a.k.a. Simon A. Eugster):

Example: Tree Identification Key

There is also a second identification key for clouds and an editor.[2]

It can use any database as a back-end for retrieving data. MySQL and Wikibase are supported, though Wikibase support needs to be upgraded due to API changes. The project sources are hosted on MediaWiki.[3]

Concept

Please read pages 13 and 14 of the thesis.[1]

In a nutshell, information is structured as follows:

   Question contains  { Character1, Character2, … }
            and has { description, topics, parent character, component, image, … }
   Component 
            has { description, topics, image, … }
   Character
            has { description, image, … }
   Taxon shows characters { CharacterX, CharacterY, … }
            and has { name, Wikilink, … }
   Topic is, for example, Trees, Birds, Gems, Cars
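The structure above could be modelled roughly as follows. This is only a sketch using Python dataclasses: the field names follow the outline, but the types, the optional fields, the helper `taxa_showing` and the sample values are assumptions and are not taken from the thesis or the LifeWeb code.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    description: str
    topics: list[str] = field(default_factory=list)
    image: Optional[str] = None


@dataclass
class Character:
    description: str
    image: Optional[str] = None


@dataclass
class Question:
    description: str
    characters: list[Character]
    topics: list[str] = field(default_factory=list)
    parent_character: Optional[Character] = None
    component: Optional[Component] = None
    image: Optional[str] = None


@dataclass
class Taxon:
    name: str
    wikilink: str
    characters: list[Character] = field(default_factory=list)


def taxa_showing(taxa: list[Taxon], character: Character) -> list[str]:
    """Return the names of all taxa that show the given character."""
    return [t.name for t in taxa if character in t.characters]


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    flower = Component("Flower", topics=["Trees"])
    four = Character("4 petals")
    five = Character("5 petals")
    petals = Question("How many petals does the flower have?",
                      characters=[four, five],
                      topics=["Trees"], component=flower)
    taxa = [
        Taxon("Taxon A", "Taxon_A", characters=[four]),
        Taxon("Taxon B", "Taxon_B", characters=[five]),
    ]
    for character in petals.characters:
        print(character.description, "->", taxa_showing(taxa, character))
```

A question groups the characters that are its possible answers, while a taxon lists the characters it shows, so answering a question amounts to selecting one or more characters and keeping the taxa that show them.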

Completed so far

  • HTML5 identification key and editor work with MySQL and as a MediaWiki/Wikidata extension
  • Identification key can be used on mobile devices

What needs to be done?

  • The amount of data in Wikidata is currently a problem. The identification key loads, for example, all objects of type tree and all corresponding data describing each tree’s properties. The MySQL database is lightning fast for a few thousand entries, but the Wikibase extension would currently take hours. A fast way to feed all up-to-date information (currently as JSON objects) to the identification key needs to be developed.
  • The community needs to agree on the integration of the required data, i.e. the additional characters, questions, components etc. as described in the thesis or above under “concept”.
  • The editor needs to be integrated, or the Wikibase editor needs to be upgraded in order to support the user in the process of describing objects.
  • More developers (DB specialist, Wikibase specialist) and otherwise interested persons (for feedback, valuable input, project promotion, maybe even funding of developers) are needed.
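Regarding the first point in the list above, one conceivable direction is to pre-compute a single compact JSON bundle per topic, so that the key loads one document instead of querying item by item. The sketch below only illustrates that idea. The bundle layout and the functions `build_bundle` and `load_bundle` are hypothetical, and nothing here reflects the JSON format the key currently uses.

```python
import json


def build_bundle(topic, taxa):
    """Pack all taxa of one topic into a single compact JSON string.

    `taxa` maps a taxon name to the set of character IDs it shows.
    Character IDs are stored once in a lookup table and referenced by
    index, which keeps the payload small when many taxa share them.
    """
    character_ids = sorted({c for chars in taxa.values() for c in chars})
    index = {cid: i for i, cid in enumerate(character_ids)}
    payload = {
        "topic": topic,
        "characters": character_ids,
        "taxa": [
            {"name": name, "shows": sorted(index[c] for c in chars)}
            for name, chars in sorted(taxa.items())
        ],
    }
    return json.dumps(payload, separators=(",", ":"))


def load_bundle(text):
    """Inverse of build_bundle: return (topic, {name: set of character IDs})."""
    payload = json.loads(text)
    ids = payload["characters"]
    taxa = {t["name"]: {ids[i] for i in t["shows"]} for t in payload["taxa"]}
    return payload["topic"], taxa


if __name__ == "__main__":
    # Hypothetical character IDs and taxon names.
    sample = {
        "Taxon A": {"petals_4", "margin_smooth"},
        "Taxon B": {"petals_5", "margin_serrated"},
    }
    bundle = build_bundle("Trees", sample)
    print(bundle)
    print(load_bundle(bundle))
```

Such a bundle would have to be regenerated whenever the underlying data changes; how to keep it up to date is left open here.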

Interested persons

Please add yourself to the list, ideally with a short description why you are interested and if (and where) you can help.

  • User:LivingShadow – I would love to see the project come to life. I am the author of LifeWeb, can give advice on keys, and can continue working on the corresponding code. I’m not a DB or Wikibase specialist!
  • User:Daniel Mietchen - biophysicist working with digital aspects of natural history collections. Interested in incorporating Wikidata into research workflows.
  • User:Karima_Rafes - working on the CDS project of Paris Saclay for the laboratory of Analytical Chemistry. This laboratory has to build a new information system... with Wikidata keys?
  • User:G.Hagedorn - I was convener of the group developing the TDWG XML standard Structured Descriptive Data (SDD), which also includes data structures for identification keys (both matrix = multi-/free-access and decision tree = single-access). I finished a PhD in 2007 on data structures for biological description and identification. Co-created http://offene-naturfuehrer.de/ - a German-language site with identification keys and related information on plants and animals. We harvest the wiki templates for the keys and convert them to a JSON data structure. I am very interested in collaboration on Wikidata!
  • you – description

Resources

  1. Thesis (PDF). If you are interested in the topic, please take a look at it! It is easy to read.
  2. Cloud key and editor: [1]
  3. LifeWeb extensions: LifeWebCore and LifeWeb