Wikidata:Requests for permissions/Bot/MewBot 2

The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.

Approved --Lymantria (talk) 05:31, 21 November 2018 (UTC)

MewBot 2

MewBot
Operator: Rua

Task/s: Importing attested lexemes from en.wiktionary

Code:

Function details: Since Wikidata:Requests for permissions/Bot/MewBot doesn't seem to be going anywhere, I'd like to start by importing attested lexemes instead. I would like to import Northern Sami lemmas (Wiktionary's term for lexemes) into Wikidata. Only the lexeme itself will be imported: no senses, no forms and no etymology. The bot will, however, add the Álgu lexeme ID (P5903) property where possible. This is again a feasibility study, to see how well the approach works.

One problem I foresee is homographs that share the same part of speech. When one or more lexemes with the same spelling and part of speech already exist on Wikidata, the bot has no way of telling which of them belongs to which Wiktionary lemma. Moreover, if such homographs were imported, there would be no way to distinguish them on Wikidata either, and they would look like duplicate lexemes until senses and/or etymologies are added. The bot will skip these cases, which means that some lexemes cannot easily be imported. A possible solution would be to add the Wikidata lexeme ID to the wikicode on Wiktionary's side, but I doubt Wiktionary editors would welcome that, as they seem to be a bit allergic to Wikidata.
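The skip rule for homographs can be illustrated with a short Python sketch. This is one possible reading of the rule, not MewBot's actual code: the `WiktionaryLemma` type, the `plan_imports` function, the set of already-existing `(spelling, part of speech)` keys, and the placeholder spellings are all hypothetical. A key is skipped if it already exists on Wikidata, or if Wiktionary has more than one lemma with that spelling and part of speech.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, str]  # (spelling, part of speech)


@dataclass(frozen=True)
class WiktionaryLemma:
    spelling: str  # headword as written on Wiktionary (hypothetical shape)
    pos: str       # part of speech, e.g. "noun" or "verb"


def plan_imports(
    wikt_lemmas: Iterable[WiktionaryLemma], existing_keys: Set[Key]
) -> Tuple[List[WiktionaryLemma], List[Key]]:
    """Split Wiktionary lemmas into ones safe to create and keys to skip.

    existing_keys holds (spelling, pos) pairs that already exist as lexemes
    on Wikidata for the same language (how they are fetched is assumed).
    """
    groups = defaultdict(list)
    for entry in wikt_lemmas:
        groups[(entry.spelling, entry.pos)].append(entry)

    to_create: List[WiktionaryLemma] = []
    skipped: List[Key] = []
    for key, entries in groups.items():
        if key in existing_keys:
            # Already on Wikidata: no way to tell which existing lexeme
            # corresponds to which Wiktionary lemma, so leave it alone.
            skipped.append(key)
        elif len(entries) > 1:
            # Same-POS homographs on Wiktionary would become lexemes that
            # look like duplicates on Wikidata, so skip them too.
            skipped.append(key)
        else:
            to_create.append(entries[0])
    return to_create, skipped


# Usage with made-up placeholder spellings:
lemmas = [
    WiktionaryLemma("aaa", "noun"),
    WiktionaryLemma("bbb", "verb"),
    WiktionaryLemma("bbb", "verb"),
    WiktionaryLemma("ccc", "noun"),
]
create, skip = plan_imports(lemmas, existing_keys={("ccc", "noun")})
print(create)  # [WiktionaryLemma(spelling='aaa', pos='noun')]
print(skip)    # [('bbb', 'verb'), ('ccc', 'noun')]
```

Grouping by the `(spelling, part of speech)` pair before deciding means both homographs in a same-POS pair are skipped together, rather than the first one being created and the second one then colliding with it.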

The code will be adapted from the first proposal, so it has already been demonstrated to work. I only need to change the language and remove the code that imports the etymology. I would like to import other languages using this method later, so that Wikidata will have a good set of lexemes to start with and other users can then add the data to them as they see fit. Having the lexemes already present also makes adding etymologies easier.

Copyright shouldn't be an issue, as lemmas and parts of speech don't seem like copyrightable things to begin with.

--—Rua (mew) 12:00, 13 November 2018 (UTC)

Support Can't you distinguish homographs via the ID property? ArthurPSmith (talk) 15:20, 19 November 2018 (UTC)
- On Wikidata, yes, but how do you know which of them a particular Wiktionary lemma belongs to? —Rua (mew) 15:40, 19 November 2018 (UTC)
  - I meant Álgu lexeme ID (P5903) - doesn't that uniquely identify each homograph? But I guess you indicated it was not available for all of them. ArthurPSmith (talk) 14:57, 20 November 2018 (UTC)
    - I suppose that can work, but the current entry format makes that a bit tricky because the "Further reading" heading is per-language and not per-lemma, so the Álgu templates of multiple lexemes are grouped together in one section. I started a proposal to change it: wikt:Wiktionary:Beer parlour/2018/November#Nest "Further reading" under lemma.
Support KaMan (talk) 17:04, 20 November 2018 (UTC)
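The Álgu-based disambiguation discussed in the thread above, and the obstacle posed by the per-language "Further reading" heading, can be sketched in Python. This is an illustration under stated assumptions, not MewBot's code: `SectionLemma`, `attribute_algu_ids`, `match_lexeme`, the dict of P5903 values per lexeme, and all IDs in the usage are hypothetical. Only the property Álgu lexeme ID (P5903) comes from the discussion.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass(frozen=True)
class SectionLemma:
    spelling: str  # hypothetical shape of one lemma in a language section
    pos: str


def attribute_algu_ids(
    lemmas: List[SectionLemma], further_reading_ids: List[str]
) -> Dict[SectionLemma, Optional[str]]:
    """Pair lemmas in one language section with Álgu IDs from its
    "Further reading" heading.

    The heading is per-language rather than per-lemma, so when a section
    holds several lemmas (or several IDs) there is no way to tell which ID
    belongs to which lemma; those lemmas get None.
    """
    if len(lemmas) == 1 and len(further_reading_ids) == 1:
        return {lemmas[0]: further_reading_ids[0]}
    return {lemma: None for lemma in lemmas}


def match_lexeme(
    algu_id: Optional[str], p5903_by_lexeme: Dict[str, Set[str]]
) -> Optional[str]:
    """Return the Wikidata lexeme whose Álgu lexeme ID (P5903) equals algu_id.

    p5903_by_lexeme maps a lexeme ID (e.g. "L1") to its P5903 values.
    Returns None if the ID is missing, unknown, or shared by several lexemes.
    """
    if algu_id is None:
        return None
    hits = [lid for lid, ids in p5903_by_lexeme.items() if algu_id in ids]
    return hits[0] if len(hits) == 1 else None


# Usage with placeholder data:
only = SectionLemma("aaa", "noun")
ids = attribute_algu_ids([only], ["100"])
print(match_lexeme(ids[only], {"L1": {"100"}, "L2": {"200"}}))  # L1
```

If "Further reading" were nested under each lemma, as proposed on the Beer parlour, each lemma would carry its own Álgu ID and the all-or-nothing attribution step would no longer be needed.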

The above discussion is preserved as an archive. Please do not modify it. Subsequent comments should be made in a new section.
