Welcome to the project page for lexicographical data!
What is lexicographical data?
Since the start of Wikidata in 2012, the multilingual knowledge base has focused mainly on concepts: Q-items relate to a thing or an idea, not to the word describing it. Since 2018, Wikidata has also stored a new type of data: words, phrases and sentences, in many languages, described in many languages. This information is stored in new types of entities, called Lexemes (L), Forms (F) and Senses (S). You can learn more about the data model on the documentation page.
The structured description of words is directly connected to the concepts. It allows editors to describe precisely all words in all languages, and it is reusable, just like the rest of Wikidata's content, by tools and queries: everything that the community creates to play with words. Lexicographical data can be reused on the Wikimedia projects, and can provide support for Wiktionary.
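To make the relationship between Lexemes, Forms, Senses and Q-items more concrete, here is a minimal Python sketch. It is an illustration only, not the Wikibase data model or its JSON format: the class names, fields and the `entity_kind` helper are hypothetical, the IDs used are placeholders, and the sketch assumes that Form and Sense identifiers extend the ID of their Lexeme (for example `L1-F1` and `L1-S1`). The documentation page remains the authoritative description.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed ID shapes: L<n> for Lexemes, L<n>-F<m> for Forms,
# L<n>-S<m> for Senses, and Q<n> for concept items.
_PATTERNS = {
    "lexeme": re.compile(r"L[1-9]\d*"),
    "form": re.compile(r"L[1-9]\d*-F[1-9]\d*"),
    "sense": re.compile(r"L[1-9]\d*-S[1-9]\d*"),
    "item": re.compile(r"Q[1-9]\d*"),
}


def entity_kind(entity_id: str) -> str:
    """Return 'lexeme', 'form', 'sense' or 'item' for a given ID string."""
    for kind, pattern in _PATTERNS.items():
        if pattern.fullmatch(entity_id):
            return kind
    raise ValueError(f"unrecognised entity ID: {entity_id!r}")


@dataclass
class Form:
    id: str
    representation: str  # the written form, e.g. a plural


@dataclass
class Sense:
    id: str
    gloss: str  # short description of this meaning
    item_id: Optional[str] = None  # hypothetical field: the linked Q-item


@dataclass
class Lexeme:
    id: str
    lemma: str
    language: str
    forms: List[Form] = field(default_factory=list)
    senses: List[Sense] = field(default_factory=list)

    def __post_init__(self) -> None:
        if entity_kind(self.id) != "lexeme":
            raise ValueError(f"not a Lexeme ID: {self.id!r}")

    def add_form(self, representation: str) -> Form:
        # Number Forms sequentially within this Lexeme.
        form = Form(f"{self.id}-F{len(self.forms) + 1}", representation)
        self.forms.append(form)
        return form

    def add_sense(self, gloss: str, item_id: Optional[str] = None) -> Sense:
        # A Sense may point at the concept (Q-item) that the word describes.
        if item_id is not None and entity_kind(item_id) != "item":
            raise ValueError(f"not an item ID: {item_id!r}")
        sense = Sense(f"{self.id}-S{len(self.senses) + 1}", gloss, item_id)
        self.senses.append(sense)
        return sense


if __name__ == "__main__":
    # Placeholder IDs, chosen only for illustration.
    word = Lexeme("L1", "example", "en")
    word.add_form("example")
    word.add_form("examples")
    word.add_sense("a representative instance", item_id="Q1")
    print(word)
```

The point of the sketch is the split of responsibilities: the Lexeme holds the word, its Forms hold the written variants, and each Sense can carry the link to the concept, which is how the description of a word connects to the Q-items.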
- 2012: first discussions about including lexicographical data in Wikidata
- 2013–2016: many discussions with editors and developers, leading to several versions of the development plan
- 2016: start of the development
- 2017: continuing the development of the structure (Wikibase/Lexeme), development of several tools for Wiktionary (Sitelinks)
- May 23rd, 2018: deployment of the first version of lexicographical data (done)
- October 16th, 2018: enabling lexicographical data in the Query Service (done)
- October 18th, 2018: enabling Senses (done)
- 2018–2019: iteration of the project, maintenance
- Best practices
- Data Model
- How to help?
- Create a new Lexeme
- Lexeme Statistics
- Lexicographical properties
- Useful queries
- Existing tools
- Ideas on tools based on lexicographical data
- Support for Wiktionary
- Development plan
- First created lexeme on Wikidata