Wikidata:Lexicographical data/Ideas of tools

From Wikidata

This page is a list of ideas for tools and features that Wikidata and Wiktionary editors may need on top of the lexicographical data on Wikidata.

If you want to formulate a specific need, if you have an idea, if you think that something is essential for editing or reusing lexicographical data, please add a section and follow the template!

This list is not a commitment from the Wikidata development team. The needs expressed below will be analyzed and prioritized during the next steps of the project, and may or may not be developed, by the Wikidata team, other developers or volunteers.

German articles game

Need
I'm learning German and I'm struggling to remember the articles of nouns (der, die, das). I wish I had a game to practise and learn them. The game would show me a noun, and I'd have to guess the article. If I'm wrong, the correct answer would appear. There could be several levels, from easy everyday words to more unusual ones, as well as thematic levels, like one only about food.
Who would benefit
Anyone who struggles with German, so a lot of people :D
Proposition of solution
This would be an external tool, for example a website or an app, based on Wikidata's lexicographical data. It would select random Lexemes with the condition of being German nouns (example of data structure), display the lemma and check my answer against the grammatical gender. It could also display one gloss or other information about the noun.
Further comments
Accessing the lemma and the grammatical gender should be easy with Wikidata's API. Selecting words by theme would be feasible using the connection between the lexemes (words) and the items (concepts). I'm wondering how the software could sort the words by level and decide which are "easy", "daily life-related" or not.
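A minimal sketch of the game's core logic, assuming the Wikidata Query Service is used and that German is Q188, noun is Q1084, grammatical gender is P5185, and masculine/feminine/neuter are Q499327/Q1775415/Q1775461. These IDs are assumptions to double-check on Wikidata. The query is only built as a string and the response is parsed from the standard SPARQL JSON results format; sending the HTTP request is left out.

```python
import random

# Assumed Wikidata IDs; verify them on Wikidata before relying on them.
GERMAN = "Q188"
NOUN = "Q1084"
ARTICLE_BY_GENDER = {
    "Q499327": "der",   # masculine
    "Q1775415": "die",  # feminine
    "Q1775461": "das",  # neuter
}


def build_query(limit: int = 100) -> str:
    """SPARQL for German noun Lexemes with their lemma and grammatical gender (P5185)."""
    return f"""
SELECT ?lexeme ?lemma ?gender WHERE {{
  ?lexeme dct:language wd:{GERMAN} ;
          wikibase:lexicalCategory wd:{NOUN} ;
          wikibase:lemma ?lemma ;
          wdt:P5185 ?gender .
}}
LIMIT {limit}
"""


def parse_results(sparql_json: dict) -> list[tuple[str, str]]:
    """Turn SPARQL JSON results into (lemma, article) pairs, skipping unknown genders."""
    pairs = []
    for row in sparql_json["results"]["bindings"]:
        gender_qid = row["gender"]["value"].rsplit("/", 1)[-1]  # entity URI -> Q-ID
        article = ARTICLE_BY_GENDER.get(gender_qid)
        if article is not None:
            pairs.append((row["lemma"]["value"], article))
    return pairs


def check_answer(correct_article: str, guess: str) -> bool:
    """Compare the player's guess with the article, ignoring case and surrounding spaces."""
    return guess.strip().lower() == correct_article


def pick_question(pairs, rng=random):
    """Choose a random (lemma, article) pair to show to the player."""
    return rng.choice(pairs)
```

Levels and themes could then be layered on top by filtering the pairs before calling `pick_question`.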
Proposer
Lea Lacroix (WMDE) (talk) 15:21, 28 November 2017 (UTC)
Discussions

Lacroix's game is now running at http://auregann.fr/derdiedas/ Cloned versions are now available for French and Danish. The Danish version is here: https://tools.wmflabs.org/enet/

Spell-checker

Need
A tool to help find errors on Wikisource by spotting words that do not have an L item (Lexeme)
Who would benefit
Wikisource projects, and the creation of L items
Proposition of solution
Ideally included in the editing interface of Wikisource, the tool would highlight words that do not exist as Lexemes (and possibly suggest a correction?)
Further comments
  • It works both ways: at the beginning, it will be more useful for creating new L items, but later, once all "common" words have an L item, it will be more useful for Wikisource (still, new books with new words appear every day on the Wikisources).
  • It could probably be used in other Wikimedia projects, but it would be more helpful for the Wikisources, where we make faithful transcriptions of old books with many unusual spellings and variants that ordinary spell-checkers wrongly flag as errors.
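A minimal sketch of the highlighting step, assuming `known_forms` is a set of lowercased form representations already downloaded from the Lexemes of the text's language (how that set is obtained is left open). The tokenizer is a simple regex and will not suit every orthography.

```python
import re

# Runs of letters (any script), allowing internal apostrophes or hyphens.
# This is a simplification of real tokenization.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def unknown_words(text: str, known_forms: set[str]) -> list[tuple[int, str]]:
    """Return (offset, word) for each token whose lowercase form is not in known_forms."""
    hits = []
    for match in WORD_RE.finditer(text):
        word = match.group()
        if word.lower() not in known_forms:
            hits.append((match.start(), word))
    return hits
```

Old spellings such as the long s in "eſt" are flagged unless they are recorded as forms, which is exactly the both-way relation described above: each hit is either a transcription error or a candidate Lexeme form.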
Proposer
VIGNERON (talk)
Discussions

Check pop-up

Need
A popup in the style of the navigation popups, shown in the edit window.
Who would benefit
Every project
Proposition of solution
Check information such as syllabification, translation, correct spelling, etc.
Further comments
Proposer
2003:C3:EF36:DD92:C8A5:5936:6E9D:AD2A 11:36, 27 December 2017 (UTC)
Discussions


SVG translation

Need
Use L item labels to translate SVG files. This would take the <switch> element approach used in multilingual SVG files (like File:Wikidata nodes in white.svg) to the next level and make it easier (right now, it is done by hand in a text editor).
Who would benefit
  • On the readers' side: anyone who does not speak English (so most people on Earth)
  • On the contributors' side: easier and faster translation of files, translations in more languages, etc.
Proposition of solution
  • I'm not sure about the technical part, but SVG is extensible and foreign code can easily be embedded in it (directly in the <switch> element or maybe with the <script> element; someone more tech-savvy should look at the specifications, in particular this section). I suspect the main problem will be how MediaWiki interprets this code.
  • A downgraded solution would be not to do it dynamically, but to export the L item labels into the current structure of the multilingual SVG files.
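A minimal sketch of the downgraded (static export) solution: building a <switch> element with one <text systemLanguage="..."> child per language, plus a last child without systemLanguage that renderers use as the fallback. The `labels` dict is a hypothetical stand-in for strings fetched from Lexemes; the function name is illustrative.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an "ns0:" prefix


def build_switch(labels: dict[str, str], fallback: str, x: float, y: float) -> ET.Element:
    """Build an SVG <switch> with one <text systemLanguage="..."> per translated label.

    labels maps language codes (e.g. "fr") to translated strings. The final child has
    no systemLanguage attribute, so it is shown when no language matches.
    """
    switch = ET.Element(f"{{{SVG_NS}}}switch")
    for lang, text in sorted(labels.items()):
        el = ET.SubElement(switch, f"{{{SVG_NS}}}text",
                           {"systemLanguage": lang, "x": str(x), "y": str(y)})
        el.text = text
    default = ET.SubElement(switch, f"{{{SVG_NS}}}text", {"x": str(x), "y": str(y)})
    default.text = fallback
    return switch
```

Printing `ET.tostring(build_switch({"fr": "nœuds"}, "nodes", 10, 20), encoding="unicode")` gives markup that can be pasted in place of a hand-written <switch>.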
Further comments
Proposer
VIGNERON (talk) 13:54, 18 February 2018 (UTC) (idea suggested on Facebook by Pigsonthewing)
Discussions

Is it poetry?

Need
A tool to tell whether a text is a poem or not (this can help categorize pages on Wikisource and add statements to Wikidata items)
Who would benefit
Proposition of solution
Take the last word of each line and check whether the pronunciations rhyme. An upgraded version could probably do more analysis (what type of rhyme, what type of poetry, etc.).
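A rough sketch of the rhyme check, assuming a hypothetical `pronunciations` dict mapping words to IPA strings (for example taken from IPA transcription (P898) statements on Lexeme forms). The rhyme rule used here (everything from the last vowel cluster to the end) is a crude approximation, and the vowel set covers only common IPA symbols.

```python
# Simplified set of IPA vowel symbols; real IPA has more, plus diacritics.
IPA_VOWELS = set("aeiouyɑɒæɐɛəɜɪɨɔœøʊʉʌɯɤ")
STRIP = str.maketrans("", "", "ˈˌː.")  # stress, length and syllable marks


def rhyme_key(ipa: str) -> str:
    """Return the part of a pronunciation from its last vowel cluster to the end."""
    s = ipa.translate(STRIP)
    i = len(s) - 1
    while i >= 0 and s[i] not in IPA_VOWELS:  # skip final consonants
        i -= 1
    while i > 0 and s[i - 1] in IPA_VOWELS:   # include the whole vowel cluster
        i -= 1
    return s[max(i, 0):]


def rhyme_ratio(lines: list[str], pronunciations: dict[str, str], window: int = 4) -> float:
    """Share of lines whose last word rhymes with another line at most `window` lines away."""
    keys = []
    for line in lines:
        words = [w.strip(".,;:!?«»\"'").lower() for w in line.split()]
        words = [w for w in words if w]
        ipa = pronunciations.get(words[-1]) if words else None
        keys.append(rhyme_key(ipa) if ipa else None)
    rhyming = 0
    for i, key in enumerate(keys):
        if key is None:
            continue
        neighbours = keys[max(0, i - window):i] + keys[i + 1:i + 1 + window]
        if key in neighbours:
            rhyming += 1
    return rhyming / len(lines) if lines else 0.0
```

A tool could then treat a text as "probably a poem" above some threshold; choosing that threshold is left open.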
Further comments
Proposer
VIGNERON (talk) 12:03, 15 April 2018 (UTC)
Discussions


Text modernisation

Need
modernisation of texts
Who would benefit
old books on Wikisources
Proposition of solution

On the French Wikisource, there is a tool for modernising old texts: a JS gadget, s:fr:MediaWiki:Gadget-modernisation.js, and a dictionary, s:fr:Wikisource:Dictionnaire (this is the general one; a template, s:fr:Modèle:Modernisation, is used for words to be modernised only in a specific text, such as rare words or words that would generate false positives in other texts).

Further comments

I'm not sure how lexicographical data can help. The gadget works well but (AFAIK) exists only on a few projects, and we have trouble from time to time with the dictionary (false positives that need to be moved from the general dictionary to the specific one), as it is only a plain-text list of "old word: new word" strings. Lexicographical data could be more precise and at least give a point in time (which would avoid a lot of false positives).
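A minimal sketch of how a point in time could reduce false positives: each replacement rule carries a period of use, and it is applied only to texts dated within that period. The `Rule` class, its year bounds and the example words are hypothetical; how such dates would be stored on Lexemes is left open.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical dictionary entry: an old spelling, its modern form and its period of use."""
    old: str
    new: str
    start_year: int
    end_year: int


def modernise(text: str, year: int, rules: list[Rule]) -> str:
    """Replace whole-word old spellings, but only with rules valid for the text's year."""
    for rule in rules:
        if rule.start_year <= year <= rule.end_year:
            pattern = re.compile(r"\b" + re.escape(rule.old) + r"\b")
            # A function replacement avoids backslash interpretation in rule.new.
            text = pattern.sub(lambda _m, new=rule.new: new, text)
    return text
```

With a rule "roy → roi" limited to 1500–1750, a text dated 1780 keeps "roy", and "royaume" is never touched because only whole words match.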

Proposer
VIGNERON (talk) 09:34, 23 April 2018 (UTC)
Discussions

Wikidata Cognate

Need

For Wiktionary there is mw:Extension:Cognate, which links pages with the same title. On Wikidata, however, there will be lexemes with the same lemma in different languages (and thus on different pages), so it would be good to link them somehow.

Who would benefit

Anyone who wants to navigate between the 12 languages that have a lexeme with the lemma fest.

Proposition of solution

A drop-down menu on the lexeme page listing the other languages that have a lexeme with the same lemma.
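A minimal sketch of the data behind such a menu: a SPARQL query for all Lexemes whose lemma string equals a given one, and a helper that groups result rows by lemma. Filtering on the lemma string across all languages scans many Lexemes and may be slow on the public endpoint; a real tool would probably need its own index, as Cognate has. The row format and Lexeme IDs in the usage below are hypothetical.

```python
from collections import defaultdict


def same_lemma_query(lemma: str) -> str:
    """SPARQL for all Lexemes whose lemma string equals `lemma`, with their language."""
    escaped = lemma.replace("\\", "\\\\").replace('"', '\\"')
    return f"""
SELECT ?lexeme ?language ?languageLabel WHERE {{
  ?lexeme wikibase:lemma ?lemma ;
          dct:language ?language .
  FILTER(STR(?lemma) = "{escaped}")
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
"""


def group_by_lemma(rows):
    """Group (lexeme_id, lemma, language) rows into {lemma: [(language, lexeme_id), ...]}."""
    groups = defaultdict(list)
    for lexeme_id, lemma, language in rows:
        groups[lemma].append((language, lexeme_id))
    return {lemma: sorted(entries) for lemma, entries in groups.items()}
```

Each group, sorted by language name, maps directly onto the entries of the drop-down menu.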

Further comments
Proposer

--Micru (talk) 08:54, 2 May 2018 (UTC)

Discussions


Practice a new language

Need

A game to learn the basics of a new language could be created.

  1. Learn new words: five words and their translations are shown, and then you have to pick the correct translation of some of those words among four choices, or type the correct translation. Some letters may be given as hints. This could be repeated many times.
  2. Learn some basic grammar: since another proposal (the German articles game) is about practising a language, I think language-specific games should be added for every language. Italian also has many articles (il, lo/l', la in the singular; i, gli, le in the plural), so Italian should have a similar game too. Other games could teach how to decline words and conjugate verbs.

Players could choose a mode: three attempts to guess correctly, a time limit, or simply earning points for each correct answer.
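A minimal sketch of the multiple-choice exercise and of the "three attempts" mode. The `vocab` dict is hypothetical sample data; in a real tool it could be built from Lexeme senses linked by translation statements.

```python
import random
from typing import Optional


def multiple_choice(vocab: dict[str, str], word: str, n_choices: int = 4, rng=None):
    """Return (prompt, shuffled choices, answer) for one question about `word`."""
    rng = rng or random.Random()
    answer = vocab[word]
    # Distractors: translations of other words, never equal to the right answer.
    distractors = [t for w, t in vocab.items() if w != word and t != answer]
    choices = rng.sample(distractors, min(n_choices - 1, len(distractors))) + [answer]
    rng.shuffle(choices)
    return word, choices, answer


def score(results: list[bool], max_mistakes: Optional[int] = 3) -> int:
    """Count correct answers, stopping after `max_mistakes` wrong ones (None: never stop)."""
    points = mistakes = 0
    for correct in results:
        if correct:
            points += 1
        else:
            mistakes += 1
            if max_mistakes is not None and mistakes >= max_mistakes:
                break
    return points
```

Passing `max_mistakes=None` gives the plain "earn points" mode; a timed mode would only need to cut `results` at the deadline.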

Who would benefit
Everyone who wants to learn a new language and doesn't want to use commercial apps (Duolingo, Memrise, Babbel, ...)
Proposition of solution
A new website could be created. A very simple version would just start by asking which language you want to learn and what you want to learn (new words, articles, declensions, ...). A more complex version could also include a login, where users can see their personal progress, their mistakes, ...
Further comments
This proposal includes the "German articles game".
Proposer
★ → Airon 90 12:41, 17 July 2018 (UTC)
Discussions

Wikify with lexemes

Need

Take a text, e.g. from Wikisource, and wikify it with lexemes from Wikidata. Maybe start by identifying the verbs in each sentence, then other parts of speech. Offer to create missing lexemes.

Who would benefit
Proposition of solution
  • Try to identify verbs and locutions first
  • The GUI allows selecting among the various possible lexemes (based on forms/lemmas)
  • Basic mode: just highlight strings missing from forms (or just from lemmas). This could be a first milestone for use.
  • Offer to add samples to Wikidata.
  • Offer to create missing Lexemes.
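A minimal sketch of the lookup behind the GUI: an index from form strings to Lexeme IDs, and a function listing the candidate Lexemes for each token. Several candidates mean the user must choose; an empty list marks a missing Lexeme. The input pairs and the Lexeme IDs in the usage below are hypothetical.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def build_form_index(forms):
    """Map lowercase form representations to the set of Lexeme IDs that have them.

    forms: iterable of (lexeme_id, representation) pairs, e.g. exported from Wikidata.
    """
    index = defaultdict(set)
    for lexeme_id, representation in forms:
        index[representation.lower()].add(lexeme_id)
    return index


def candidates(text, index):
    """For each token, list candidate Lexeme IDs (empty list: no Lexeme has this form)."""
    return [(m.group(), sorted(index.get(m.group().lower(), ())))
            for m in TOKEN_RE.finditer(text)]
```

For "The dog walks fast", an index where "walks" belongs to both a verb and a noun Lexeme returns two candidates for "walks" and none for "fast".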
Further comments
Proposer

--- Jura 12:17, 5 August 2018 (UTC)
Discussions

Print version

Need

Generate a printable set of dictionaries solely from Wikidata entities. This could include:

  1. a monolingual dictionary
  2. a bilingual dictionary
  3. a specialized dictionary or word list.
Who would benefit
  1. Wikidata contributors (helps visualize possible output, needed data, comparison with old dictionaries)
  2. users learning languages or terminology
Proposition of solution

The set should draw on structured data in Wikidata to output all the parts generally found in dictionaries: introduction, methodology, primary entries, indexes.

Elements that can be included beyond primary entries are:

  • []

To avoid wasting paper, the proof of concept version could be limited to a defined number of entries, e.g. 100 words. It should also work with a higher number.
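A minimal sketch of the primary-entry section for a proof of concept limited to a fixed number of entries. The (lemma, category, glosses) tuples are hypothetical input, and the case-insensitive sort is a simplification: real dictionaries need language-specific collation.

```python
def format_entry(lemma: str, category: str, glosses: list[str]) -> str:
    """One dictionary line: 'lemma (category) 1. gloss; 2. gloss'."""
    numbered = "; ".join(f"{i}. {g}" for i, g in enumerate(glosses, start=1))
    return f"{lemma} ({category}) {numbered}"


def format_dictionary(entries, limit: int = 100) -> str:
    """Sort entries by lemma (case-insensitively) and keep at most `limit` of them."""
    ordered = sorted(entries, key=lambda e: e[0].casefold())[:limit]
    return "\n".join(format_entry(*e) for e in ordered)
```

The same entries could later be rendered to LaTeX or HTML for printing; plain text keeps the sketch small.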

Further comments
Proposer
--- Jura 17:05, 6 October 2018 (UTC)
Discussions

Wordmap

Need
display a map, and overlay it with words from languages spoken in the given region, where all of them have the same meaning (so, the result of this query but displayed on a map)
Who would benefit
It would look nice, but it would also benefit everyone interested in languages
Proposition of solution
ideally that would be just a SPARQL query result, but I am not sure that is possible out of the box
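A minimal sketch of such a query, built as a string for the Wikidata Query Service, whose `#defaultView:Map` comment displays results with coordinates on a map. It assumes senses are linked to the concept with item for this sense (P5137) and that the language item carries coordinate location (P625); as noted in the discussion, few language items have coordinates, so results may be sparse.

```python
import re


def wordmap_query(item_qid: str) -> str:
    """SPARQL placing each word for the concept `item_qid` at its language's coordinates."""
    if not re.fullmatch(r"Q\d+", item_qid):
        raise ValueError(f"not an item ID: {item_qid!r}")
    return f"""#defaultView:Map
SELECT ?lexeme ?lemma ?coords WHERE {{
  ?lexeme dct:language ?language ;
          wikibase:lemma ?lemma ;
          ontolex:sense ?sense .
  ?sense wdt:P5137 wd:{item_qid} .
  ?language wdt:P625 ?coords .
}}
"""
```

For example, `wordmap_query("Q146")` (house cat) produces a query that can be pasted into the query service.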
Further comments
Proposer
Denny (talk) 18:09, 19 October 2018 (UTC)
Discussions

@Denny: I just pushed a first version of a wordmap based on Wikidata. Any feedback would be really welcome! It is not exactly what you proposed (as I am using labels). Here is the link: Wikidata Wor(l)dmap

Epantaleo (talk) 11:35, 19 October 2019 (UTC)

@Epantaleo Interesting. There seem to be only two languages with coordinates. Maybe you would get more entries by using the official languages of countries and the countries' coordinates (see query). Your visualization will have to account for several languages sharing one set of coordinates, and for several coordinates for some countries, e.g. Denmark. Just an idea. Robertsilen (talk) 09:19, 27 October 2022 (UTC)

Etymology graph

Need
create a graph of the etymology of a word, and all other words with the same meaning
Who would benefit
would look nice
Proposition of solution
hopefully just a SPARQL query?
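A minimal sketch of the graph-building step, assuming the links have already been fetched into a dict from each Lexeme ID to the Lexeme IDs it is derived from (for instance via derived from lexeme (P5191) statements; the property choice is an assumption). It walks breadth-first up to a depth limit and remembers visited nodes, so cycles created by wrong links do not loop forever.

```python
from collections import deque


def etymology_graph(start: str, derived_from: dict[str, list[str]], max_depth: int = 5):
    """Collect (child, parent) edges reachable from `start`, breadth-first, up to max_depth."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for parent in derived_from.get(node, []):
            edges.append((node, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return edges
```

The edge list can be handed to any graph drawing library; keeping the back-edge of a cycle in the output makes suspicious links visible rather than hiding them.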
Further comments
Proposer
Denny (talk) 18:12, 19 October 2018 (UTC)
Discussions

@ Denny: I had trouble using graphs/trees to visualize the large and complex directed graph of etymologically related words (partly because of incorrect etymology links). I tried an alternative visualization (see below). In the future I plan to add word definitions to it, which should be pretty straightforward.

See the visualization of words etymologically related to the English word door and to the word pistachio.

Btw, do you think it would be a good idea to export the RDF database generated by etytree into Wikidata? (with supervision of course).

Epantaleo (talk) 11:38, 19 October 2019 (UTC)

@ Epantaleo: Impressive graph! Some comments: I would not worry about the wrong etymology links while building the tool: the graph helps identify etymologies to be corrected.

I had a similar idea before finding that page, but I was thinking of a simpler graph, in the form of a tree, only including strict translations, inspired by Jakub Marian's graphs like the one on the pronoun "I" or the one on "hundred". Possibly that could be a separate idea (etymology tree vs. cognates graph).

If both ideas are developed and your graphs get too complex because of the number of words, restrictions could be considered:

  • Etymology tree: only strict translations => several languages, only one word or so per language
  • Cognates graph: maybe only one language?

For both types of graph, good features could be:

  • Links to the Wiktionary and/or Wikidata page (ideally with a preview pop-up when hovering over it with the mouse)
  • Possibility to shortlist/highlight some languages => shortlisting only one language would make your graphs clearer
  • A function to show/hide etymology links with lower likelihood.

For the last point, a "weight" (e.g. in the form of a percentage and/or of adverbs like "possibly"/"probably") describing the likelihood of an etymology would be needed in Wikidata and Wiktionary. I don't know if the topic has already been considered. Gfombell (talk) 04:38, 22 August 2021 (UTC)

@ Gfombell: Thanks for your interest and your comments! I hope to have time soon to contribute to the project. Epantaleo (talk) 12:15, 24 August 2021 (UTC)

Note that it is (now?) possible to do this with SPARQL, e.g. for water or tea. Quiddity (talk) 19:56, 14 September 2023 (UTC)

List new words found in Wikipedia

Need
Who would benefit
  • Editors and users of Wikidata
  • Wikipedia indirectly
Proposition of solution
  • Scan Wikipedia articles for words not yet in Wikidata and propose them, with a sample sentence, once a given number of occurrences has been found.
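A minimal sketch of the counting step, assuming the article texts are already available as plain text and that `known_forms` is a hypothetical set of lowercased form representations from Wikidata. The sentence splitting is deliberately naive (it breaks on every ".", "!" or "?").

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")
SENTENCE_RE = re.compile(r"[^.!?]+[.!?]?")  # naive sentence splitter


def propose_new_words(texts, known_forms: set[str], min_count: int = 3) -> dict[str, str]:
    """Return {word: first sample sentence} for unknown words seen at least min_count times."""
    counts = Counter()
    samples = {}
    for text in texts:
        for sentence in SENTENCE_RE.findall(text):
            for word in WORD_RE.findall(sentence):
                w = word.lower()
                if w in known_forms:
                    continue
                counts[w] += 1
                samples.setdefault(w, sentence.strip())  # keep the first occurrence
    return {w: samples[w] for w, n in counts.items() if n >= min_count}
```

Raising `min_count` trades recall for fewer typos and one-off names among the proposals.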
Further comments
Proposer
--- Jura 18:20, 19 October 2018 (UTC)
Discussions

Auto-specifying the language when linking senses

Need
Currently, when linking senses through translation, you have to specify the language of the target sense manually. This is redundant, since the language is already specified in the lexeme.
Who would benefit
Proposition of solution
When linking a sense, the language is filled in automatically.
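A minimal sketch of why the language can be derived automatically: a sense ID such as "L7335-S1" contains the ID of its Lexeme, and each Lexeme has exactly one language. The `lexeme_languages` dict is a hypothetical stand-in; a real tool would read the language from the Lexeme's own data.

```python
import re

SENSE_ID_RE = re.compile(r"^(L\d+)-S\d+$")


def language_for_sense(sense_id: str, lexeme_languages: dict[str, str]) -> str:
    """Return the language item ID of the Lexeme that a sense belongs to."""
    match = SENSE_ID_RE.match(sense_id)
    if match is None:
        raise ValueError(f"not a sense ID: {sense_id!r}")
    return lexeme_languages[match.group(1)]
```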
Further comments
Proposer
Iniquity (talk) 12:23, 30 July 2019 (UTC)
Discussions

Bilingual dictionary app for Wikipedias/other Wikimedia sites

Need
A script that takes a highlighted word on a page of some wiki and searches for a Wikidata lexeme for that word, then shows the meaning of the word in a popup, in the reader's native language. If the sense and/or lexeme is missing, it shows a form to add that information.
Who would benefit
Any users who read wikis in languages they are not yet fluent in. Wikidata would benefit from an easier way to add new words.
Proposition of solution

A user script/gadget that, for example on de.wikipedia.org, lets me configure the language(s) I understand. Then, when I press Alt+Shift+t (or whatever shortcut is free to use), it searches Wikidata for a German lexeme or form with that spelling, and shows the senses in the languages I understand along with the available grammatical information. When a sense in my native language is missing, it shows a field to add it. When I submit the form, it publishes that change on Wikidata.
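A minimal sketch of the lookup logic, written against the Lexeme JSON shape returned by Special:EntityData or the wbgetentities API (forms with "representations", senses with "glosses" keyed by language code). Fetching the JSON and the popup itself are left out, and the sample data in the usage is invented.

```python
from typing import Optional


def matches_word(lexeme: dict, word: str) -> bool:
    """True if any form of the Lexeme has a representation equal to `word`."""
    return any(rep["value"] == word
               for form in lexeme.get("forms", [])
               for rep in form.get("representations", {}).values())


def glosses_for_reader(lexeme: dict, languages: list[str]) -> list[tuple[str, Optional[str]]]:
    """For each sense, return (sense_id, gloss) in the first language the reader understands.

    A gloss of None means no gloss exists in those languages: that is where the
    gadget would show a field to add one.
    """
    result = []
    for sense in lexeme.get("senses", []):
        glosses = sense.get("glosses", {})
        gloss = next((glosses[lang]["value"] for lang in languages if lang in glosses), None)
        result.append((sense["id"], gloss))
    return result
```

The `languages` list is the reader's configured preference order, so a Ukrainian reader who also knows English would pass `["uk", "en"]`.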

Further comments
When I read a Wikipedia page in a foreign language, I often look up unknown words, and would love a faster way to look them up and a more convenient place to store them. I like the https://www.dict.cc/ project and use it a lot while reading German texts, but I could not contribute because my native language is not supported there.
Proposer
Benderovec (talk) 20:54, 16 November 2020 (UTC)
Discussions

Translations

Need
This is something that has caused me some discomfort since I started editing lexemes. Currently, adding (and removing) translation (P5972) statements on senses is a hard and tiresome process. All the work is manual: to make the data complete, one has to search for senses and edit many lexemes several times. See, for example, how many translations of അമ്മ (L480) could also be added to äiti (L7335), // (L222599) and mamma (L32675) (all having the same meaning, "mother").
Who would benefit
All editors, translators and readers of all languages.
Proposition of solution
This reminds me a little of the time when Wikipedia pages had their interlanguage links managed by robots (before Wikidata existed). I believe that, as was the case in the past, a robot could solve the current problem, adding and removing translations in the lexemes to unify and standardise their contents.
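A minimal sketch of what such a robot could compute: group senses connected by existing translation links (union-find) and list the pairs that are connected only indirectly. Translation is not reliably transitive, since senses rarely match exactly across languages, so these pairs are suggestions for review rather than edits to make blindly. The sense labels in the usage are illustrative, not real sense IDs.

```python
from itertools import combinations


def missing_translation_pairs(links):
    """Return sorted pairs of senses in the same translation group that are not linked directly."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    existing = set()
    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups
        existing.add(frozenset((a, b)))

    groups = {}
    for sense in list(parent):
        groups.setdefault(find(sense), []).append(sense)

    missing = []
    for members in groups.values():
        for a, b in combinations(sorted(members), 2):
            if frozenset((a, b)) not in existing:
                missing.append((a, b))
    return sorted(missing)
```

With links Malayalam "amma" to Finnish "äiti" and "äiti" to Italian "mamma", the only suggestion is the missing Italian/Malayalam pair.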
Further comments
Going further, I would say that the same could be done with image (P18), item for this sense (P5137) and glosses (the latter would not necessarily need to have the existing content exchanged among pages, but only added when it is missing). In the case of glosses, see the number of senses in ama/𒂼 (L1) (more than 60) that could also be in أُمّ (L226769) and ina (L416267) (with only 1 and 3, respectively). Imagine an editor who only edits in his language. After creating a new lexeme and adding a meaning to it, the editor, in a hurry to create more lexemes in his language, leaves the page as it is: small, short and kind of empty. Currently, a page like this stays that way until someone goes there and adds more content to it. With what I propose above, the editor would only need to create his lexeme and add/search for a translation in another language for the robot (based on that translation) to come and add the rest to it (images, senses, translations in other languages, etc.), saving time and effort that can be spent on other tasks.
Proposer
Enaldodiscussão 16:24, 21 March 2021 (UTC)
Discussions

Show interlanguage sitelinks to Wiktionaries on sidebars of WD:Lexeme namespace

Need
Show which Wiktionaries have an entry with the same title as the Lexeme, and allow quick navigation to those entries
Who would benefit
All lexeme editors. It shows which Wiktionaries have entries with the same title and makes it quick to navigate to them
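A minimal sketch of building the sidebar links from a lemma. It only constructs URLs and does not check that the pages exist; a real tool would need that information, which the Cognate extension already tracks for Wiktionary titles. Wiktionaries do not capitalize the first letter of titles, so the lemma is used as-is apart from spaces becoming underscores.

```python
from urllib.parse import quote


def wiktionary_links(lemma: str, wiki_codes: list[str]) -> dict[str, str]:
    """Candidate Wiktionary URLs for `lemma` on each language edition in wiki_codes."""
    title = quote(lemma.replace(" ", "_"), safe="")
    return {code: f"https://{code}.wiktionary.org/wiki/{title}" for code in wiki_codes}
```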
Proposition of solution
Further comments
Proposer
Vis M (talk) 15:29, 22 June 2021 (UTC)
Discussions

 Support Nice idea. This is very helpful --Sriveenkat (talk) 10:02, 26 September 2023 (UTC)

@Vis M I found a tool for this: User:Nikki/LexemeInterwikiLinks.js @Nikki Thank you for creating this tool :)) . Sriveenkat|talk/{PING ME} 09:07, 31 December 2023 (UTC)

Add sense from item

Need
Adding a sense with an item for this sense (P5137) statement requires several clicks and writing a sentence (the gloss) that likely already exists elsewhere (as the description of the item). This extension would reduce the number of steps needed to do the same.
Who would benefit
lazy people who would like to contribute senses for things that have a Wikidata item.
Proposition of solution
  1. The add sense link is complemented by an additional add sense from item link.
  2. Clicking this link opens a search that autocompletes items (that have a description in the user's language).
  3. Selecting an item results in a sense being created:
    1. The gloss is automatically copied from the selected item's description.
    2. An item for this sense (P5137) statement is automatically created, linking to the selected item.
Mockup:
Add sense from item
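A minimal sketch of step 3: building the new sense in the JSON shape that Wikibase uses for Lexeme senses and statements, with the gloss copied from the item description and a P5137 statement pointing at the item. How the tool would submit it is not shown and should be checked against the WikibaseLexeme API documentation; the function name is hypothetical.

```python
import re


def sense_from_item(item_id: str, description: str, language: str) -> dict:
    """Sense JSON with the item's description as gloss and an item for this sense (P5137) statement."""
    if not re.fullmatch(r"Q\d+", item_id):
        raise ValueError(f"not an item ID: {item_id!r}")
    return {
        "glosses": {language: {"language": language, "value": description}},
        "claims": {
            "P5137": [{
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P5137",
                    "datavalue": {
                        "type": "wikibase-entityid",
                        "value": {"entity-type": "item", "id": item_id,
                                  "numeric-id": int(item_id[1:])},
                    },
                },
                "type": "statement",
                "rank": "normal",
            }]
        },
    }
```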
Further comments
Proposer
Shisma (talk) 10:32, 9 October 2021 (UTC)
Discussions

 Support After my first couple of days entering lexemes, I can see this as being a very handy tool. Wikidata has in many cases a suitable short gloss that can be used. Robertsilen (talk) 09:34, 27 October 2022 (UTC)

 Support Please ping if done! -wd-Ryan (Talk/Edits) 20:13, 2 November 2022 (UTC)

This is really good  Support Sriveenkat (talk) 15:33, 2 October 2023 (UTC)

Adding synonyms, antonyms and translations

Need

An easy way to add synonym, antonym and translation statements between senses.

Who would benefit

Editors of all languages.

Proposition of solution

A gadget (or script, I don't know) similar to Merge.js, for sense-level edits. On a lexeme page, the user would click a button at the top of the screen: "More ∨" → "select sense to relate". Then three buttons would appear next to the senses of that lexeme: S, A and T (or maybe three icons, to make them easier to understand), standing for synonym, antonym and translation. After the user clicks one of these letters on one of the senses, other lexemes open in the user's browser will show the same three buttons next to their senses. Then, when the user clicks the same letter again (on another lexeme), the gadget will detect the clicks and add the desired statement to both lexemes, linking the senses together. A tool like this could also support other properties for senses: pertainym of, hyperonym, false friend, specified by sense, etc.
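A minimal sketch of what the gadget would write after the two clicks: the same relation in both directions, since synonym, antonym and translation are symmetric. The property IDs (synonym P5973, antonym P5974, translation P5972) are assumptions to verify on Wikidata; asymmetric relations such as hyperonym would need an inverse property instead of the same one.

```python
# Assumed property IDs for symmetric sense relations; verify before use.
RELATIONS = {"S": "P5973", "A": "P5974", "T": "P5972"}  # synonym, antonym, translation


def paired_statements(button: str, first_sense: str, second_sense: str):
    """Return (subject_sense, property, value_sense) statements linking both senses."""
    if first_sense == second_sense:
        raise ValueError("a sense cannot be related to itself")
    prop = RELATIONS[button]
    return [(first_sense, prop, second_sense), (second_sense, prop, first_sense)]
```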

Further comments
Proposer

Enaldodiscussão 18:42, 9 February 2022 (UTC)

Discussions

 Support Makes sense, sounds good. Robertsilen (talk) 09:36, 27 October 2022 (UTC)

Pronunciation audios

Need

An easy way to add pronunciation audio (P443) statements to the forms of lexemes.

Who would benefit

Editors of all languages.

Proposition of solution

A new tool. In it, the user would select a Commons category with audio files and specify regular patterns (e.g. in Lingua Libre pronunciation-cat, use the text [A-Za-z]+ between the last "-" and ".wav") so that the tool could search for lexemes with forms matching the titles of these files. In the process, the tool would ignore capitalization and punctuation (such as "·"). Finally, the tool would show the user the lexeme, its form and the audio, which can be played. If it is a correct match, the user could click "Add" to insert a new pronunciation audio (P443) statement, and maybe also specify the pronunciation variety (P5237).
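A minimal sketch of the matching step. It takes the text between the last "-" and ".wav", as in the pattern above, but with a looser character class so accented letters survive. Both sides are normalized by case-folding and removing punctuation (the interpunct "·" is Unicode punctuation). The file names and form IDs in the usage are made-up examples, and every match is meant to be confirmed by the user before anything is added.

```python
import re
import unicodedata

WORD_IN_FILE = re.compile(r"-([^-]+)\.wav$", re.IGNORECASE)


def normalise(word: str) -> str:
    """NFC-normalize, case-fold, and drop punctuation characters such as '·'."""
    word = unicodedata.normalize("NFC", word).casefold()
    return "".join(ch for ch in word if not unicodedata.category(ch).startswith("P"))


def match_files(filenames, forms: dict[str, str]):
    """Pair audio files with form IDs whose representation matches the word in the file name."""
    by_text = {}
    for form_id, representation in forms.items():
        by_text.setdefault(normalise(representation), []).append(form_id)
    pairs = []
    for name in filenames:
        match = WORD_IN_FILE.search(name)
        if match:
            for form_id in by_text.get(normalise(match.group(1)), []):
                pairs.append((name, form_id))
    return pairs
```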

Further comments
Proposer
Enaldodiscussão 02:29, 20 November 2022 (UTC)
Discussions
 Support This looks like something I would love to experiment with. You may ping me for collaboration on this. Eugene233 (talk) 12:51, 17 January 2023 (UTC)
 Support Sriveenkat (talk) 15:31, 2 October 2023 (UTC)

Example template

Need
Who would benefit
Proposition of solution
Further comments
Proposer
Discussions