Wikidata:Requests for permissions/Bot/AradglBot

AradglBot

AradglBot (talk · contribs · new items · new lexemes · SUL · Block log · User rights log · User rights · xtools)
Operator: Aradgl (talk · contribs · logs)

Task/s:

Create between 100,000 and 200,000 new lexemes in the Aragonese language, Aragonese (Q8765)

Code:

Function details: --Aradgl (talk) 19:43, 14 March 2022 (UTC)

Using a small program and the API, the bot will create new lexemes in Aragonese, specifying the lexical category, the language, and some of the forms of each lexeme.

I have about 30,000 lexemes prepared and I have started uploading them.

In the coming weeks and months I hope to reach 100,000 or 200,000 new lexemes.
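
For readers unfamiliar with how a lexeme is created through the API, the following is a minimal sketch and not the operator's actual program, which was not published in this request. It assumes the standard `wbeditentity` action with `new=lexeme`. The language code `an`, the category and feature item IDs, the example word, and the helper names are illustrative assumptions; only Aragonese (Q8765) comes from the request itself. The sketch only builds the request parameters and sends nothing.

```python
import json

ARAGONESE_ITEM = "Q8765"   # language item named in this request
ARAGONESE_CODE = "an"      # assumed language code for lemmas and forms
NOUN = "Q1084"             # assumed item for the lexical category "noun"
SINGULAR = "Q110786"       # assumed item for the feature "singular"
PLURAL = "Q146786"         # assumed item for the feature "plural"


def build_lexeme_payload(lemma, lexical_category, forms):
    """Build the JSON structure for one new lexeme.

    forms is a list of (representation, [grammatical feature item IDs]).
    """
    return {
        "type": "lexeme",
        "language": ARAGONESE_ITEM,
        "lexicalCategory": lexical_category,
        "lemmas": {
            ARAGONESE_CODE: {"language": ARAGONESE_CODE, "value": lemma}
        },
        "forms": [
            {
                "add": "",  # marks the form as new
                "representations": {
                    ARAGONESE_CODE: {"language": ARAGONESE_CODE, "value": rep}
                },
                "grammaticalFeatures": list(features),
            }
            for rep, features in forms
        ],
    }


def build_api_params(payload, csrf_token):
    """Build the POST parameters for a wbeditentity call (not sent here)."""
    return {
        "action": "wbeditentity",
        "new": "lexeme",
        "data": json.dumps(payload, ensure_ascii=False),
        "token": csrf_token,
        "format": "json",
    }


# Hypothetical example entry; the token is a placeholder, not a real one.
payload = build_lexeme_payload(
    "casa", NOUN, [("casa", [SINGULAR]), ("casas", [PLURAL])]
)
params = build_api_params(payload, csrf_token="PLACEHOLDER")
```

A payload built this way carries only the lemma, category, language, and forms, which is the kind of lexeme the discussion below is about: it has no senses and no statements linking it to a source.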

  •  Oppose on principle, since senses (meanings) of these words, or links to references for each lexeme (such as to dictionary entries for these words, or other lexical identifiers for these words) are not also being provided. We already have massive backlogs of senseless lexemes for a bunch of languages (see the bottom of the first table); I will not support making this backlog inordinately larger. Mahir256 (talk) 20:58, 23 March 2022 (UTC)
We understand your observations. You are right that no meanings or links are provided at this stage. However, this is only natural since this is the beginning of a broader task that we are starting now.
Due to the lack of resources for a minority language such as Aragonese (spoken by fewer than 30,000 people), we believe this is the most sensible way to proceed: step by step. Moreover, Aragonese is on the brink of extinction according to UNESCO.
Undermining any effort to dignify its status will definitely speed up the death of the Aragonese language. On the contrary, we ask for support to promote our beloved language.
Thank you very much. Aradgl (talk) 18:46, 24 March 2022 (UTC)
  • @Aradgl: I'm not sure where you're getting that I'm interested in undermining Aragonese's dignity or speeding up the death of Aragonese. On the contrary, I'd love to see Aragonese thrive as an independent and flourishing tongue, but there should be just enough in that language's lexemes to begin with such that improvements to them, both from inside and outside the language community, are actually conceivable. Consider Breton lexemes: the language itself is also endangered, and most Breton lexemes currently do not have senses, but they do have links to Lexique étymologique du breton moderne (Q19216625), so that someone else (not necessarily a Breton speaker) can come by and at least add information based on that lexicon (@VIGNERON, Envlh:, who imported them). On the other hand, consider Estonian lexemes; an Estonian non-native speaker created a bunch of them over the course of a few days, all of them without senses, and most still sit as empty shells, with no clear way for non-Estonians to improve them and no indication that actual Estonian speakers even know they exist. I am happy to look around for references you could add to potential Aragonese lexemes, such that you can add some potential resource links based on them, but that is not a reason to begin importing them now without any such resources (especially since you have not indicated how/when you plan to add senses/resource links later). Mahir256 (talk) 20:01, 24 March 2022 (UTC)
    @Mahir256 Right now we are discussing our timetable for implementing the next steps within Wikidata, with the prospect of relating lexemes to concepts and meanings. We count on finishing the first phase by the end of 2022.
    By no means have we wanted to create lexemes as “empty shells”. We are working on a long-term project to provide valuable information for the sake of the Aragonese language. We are working together with our Occitan counterparts (Lo Congrès), and in fact we want to follow their example in promoting further contributions from the community. Our reference is AitalDisem, a project initiated by Lo Congrès following its collaboration with Wikidata. This project is the direct continuation of the project AitalvivemBot. Aradgl (talk) 15:09, 25 March 2022 (UTC)
  • @Aradgl: I'll believe that you don't want to create empty shell lexemes, but I find it difficult to believe, given the prior examples of Russian, Estonian, Latin, and Hebrew lexemes, that they won't stay empty shells forever. If you are basing your work on the example of Aitalvivem, then (at least judging from that bot's contributions, which stopped in July 2019) you are not likely to be applying the right amount of attention to senses/resource linkages that would be desired, and (at least judging from the outcome of this bot request, from a user who disappeared after January 2020) you might disappear if prompted later about them.
You speak of wanting to add "valuable information for the sake of the language", but I fear that if there are no paths to this valuable information (with respect to the meanings of words) early on, then it is unlikely there ever will be such paths. If you are absolutely certain that existing printed/online references about Aragonese are not suitable/worthy of at least being linked to, and thus plan to essentially only crowdsource word meanings the same way the Occitan folks appear to have attempted, then what you could instead do (and what would change my opposition to a support) is have your system create lexemes only when an appropriate meaning has been added to that lexeme in that system by a community member, rather than creating lexemes with just the forms all at once waiting to be filled in on Wikidata. Mahir256 (talk) 15:37, 25 March 2022 (UTC)
  • @Mahir256: I'm the one who was supposed to continue the work on AitalvivemBot. Unfortunately, I have suffered from long COVID since March 2020 and all my work has been postponed. But we still intend to add Occitan lexemes to Wikidata, if you think that can be useful. I thought that the purpose of Wikidata lexemes was to inventory words from languages. I had never heard that adding senses to them was a mandatory requirement. Is that the case now? If it is, of course we wouldn't disturb the work done in Wikidata by uploading a lot of words without senses. Minority languages, indeed, don't have a lot of human and financial means, and we can't move forward at the speed the main languages do (you see it with Occitan: one person is sick and much work is postponed for years). Of course, we can't guarantee that all the words we upload will be related to a meaning. But we intend to try with the poor means we have. On the other hand, all our words are from recognized dictionaries. Is that still interesting for Wikidata, or would it be better if we kept them for ourselves? Unuaiga (talk) 14:00, 28 March 2022 (UTC)
  • @Unuaiga: I'm sorry to hear that you have had long COVID this whole time; I sincerely hope you can recover! Please re-read my reply from 20:01, 24 March 2022 (UTC) above, and VIGNERON's comments below (in other words, you don't need senses if you can provide a way to add them later). Wikidata lexicographical data can do so much more than "inventory(ing) words from languages"; it's only appropriate that if more isn't done immediately after creating a lexeme, then opportunities for doing so (through the linkages of references) ought to be provided. My offer to find references re: Aragonese to Aradgl from 20:01, 24 March 2022 (UTC) above is extended to you re: Occitan. As for minority languages not moving as fast as main languages, I point you to the examples, in addition to Breton, of Hausa, Igbo, and Dagbani as under-resourced languages making lots of progress on lexemes. Mahir256 (talk) 14:23, 28 March 2022 (UTC)
    Thanks for your explanations. I will look at the languages you talk about with great curiosity. Unuaiga (talk) 16:04, 28 March 2022 (UTC)
@Aradgl: this is a wonderful project but I have to agree with Mahir256, this doesn't seem ready yet (for Breton, after an import of ~4000 lexemes, and even with some info for the meaning, I estimated at least a year of manual work every week to have good lexemes :/ this is already painful, 100,000 to 200,000 lexemes would be overwhelming).
I have some additional questions:
  • What is the source, and is it public or not? (In either case, it would be better to indicate the source in the lexemes themselves.)
  • Is your bot ready yet? If so, could you do some test edits (like creating 10 lexemes) so we can better see exactly what we are talking about and maybe provide some help?
Cheers, VIGNERON (talk) 13:23, 27 March 2022 (UTC)
@VIGNERON: It seems like the edits the requestor has been making in the Lexeme namespace of late resemble those described in this request. Mahir256 (talk) 16:09, 27 March 2022 (UTC)
@Mahir256: ah thanks, I looked at the bot's edits but not at the account behind the bot ;) Indeed, these lexemes are way too empty to have any use. At the very least, you need to add a source (and ideally, multiple). Maybe you can cross it with other datasets. I'm also wondering why "between 100,000 and 200,000": don't you have the exact number?
Also, I'm pinging @Fjrc282a, Herrinsa, Jfblanc, Universal Life: who speak Aragonese and might want to know about this bot and maybe even want to help.
Cheers, VIGNERON (talk) 16:24, 27 March 2022 (UTC)
@Aradgl: Thoughts on VIGNERON's reply from 16:24, 27 March 2022 (UTC)? Mahir256 (talk) 20:14, 8 June 2022 (UTC)
@Unuaiga, Miguel&IvanV: If either of you know or can get a hold of @Aradgl:, could you tell that user to reply to User:VIGNERON's messages above? Mahir256 (talk) 16:59, 19 July 2022 (UTC)
OK, I will write them an email to tell them. 217.119.181.174 12:09, 25 July 2022 (UTC)
Sorry, I wasn't logged in. I will write to them. Unuaiga (talk) 12:10, 25 July 2022 (UTC)
@Unuaiga: Thank you for doing that; it is a bit disappointing that Aradgl has not replied, since only their ability to edit the lexeme namespace has been blocked and not their ability to do other things on Wikidata. Do you or @Miguel&IvanV: know @Uesca:, and could inform them of this discussion and the messages I placed on their talk page? Mahir256 (talk) 18:05, 30 August 2022 (UTC)
Good morning to the Wikidata community. I want to apologize for my delay in replying. For various reasons I have been absent.
The source used is from the regional government of Aragon in Spain. It can be consulted with the free and public tool: Aragonario. https://aragonario.aragon.es/
The bot is created and working. Almost all the lexemes created by the user @Aradgl have been created using the bot.
Please, @Mahir256, unlock my user account (@Aradgl) and allow me to continue working for the protection and dissemination of the Aragonese language.
Aradgl (talk) 06:54, 31 August 2022 (UTC)
@Aradgl: Thank you for finally providing at least an external source for the lexemes you have created. Since it appears each lexeme has its own ID (the number "67731" in https://aragonario.aragon.es/words/67731/, for example), I would like you to do the following first: 1) propose a Wikidata property to store these IDs (maybe call it "Aragonario ID"), 2) once that property is created and I unblock you from the lexeme namespace, add values for this property to all of the Aragonese lexemes already created, and then 3) commit to only creating lexemes alongside their Aragonario IDs, rather than without these IDs. Mahir256 (talk) 07:13, 31 August 2022 (UTC)
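
As an illustration of the ID scheme described in the comment above, here is a minimal sketch of pulling the numeric ID out of an Aragonario word URL and rebuilding the URL from it. It assumes that every entry URL follows the same pattern as the single example given (`https://aragonario.aragon.es/words/67731/`), which this discussion does not confirm. The function names are hypothetical.

```python
import re
from typing import Optional

# Assumed pattern, generalized from the one example URL in the discussion.
ARAGONARIO_WORD_URL = re.compile(
    r"^https://aragonario\.aragon\.es/words/(\d+)/?$"
)


def aragonario_id_from_url(url: str) -> Optional[str]:
    """Return the numeric ID in an Aragonario word URL, or None if absent."""
    match = ARAGONARIO_WORD_URL.match(url.strip())
    return match.group(1) if match else None


def aragonario_url_from_id(word_id: str) -> str:
    """Rebuild the word URL from an ID, following the same assumed pattern."""
    return f"https://aragonario.aragon.es/words/{word_id}/"
```

The ID is kept as a string because external identifiers on Wikidata are stored as strings rather than numbers.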
@Aradgl: As a gesture of goodwill, I have gone ahead and done the first thing, proposing Wikidata:Property proposal/Aragonario ID, which I will insist @Aradgl, Uesca: add to the lexemes they have already created before creating any further new ones. Mahir256 (talk) 23:04, 31 August 2022 (UTC)
It is not possible to add the Aragonario ID: although both the Aragonario and my data come from the same database on a server, the Aragonario ID exists only on the Aragonario website (the ID is generated by the website and is not in the database on the server that belongs to the Government of Aragon).
As we have already indicated, we are proposing the introduction of the Aragonese language in Wikidata in several phases, which include providing content and even, in the final phases, using Wikidata to create chats in Aragonese, translators, etc.
The first phase consists of uploading the lexemes so that other colleagues can later add the meanings manually using dictionaries (on paper) and other resources. We would have liked to have all the lexemes (without meaning) created beforehand because it would have been easier, but given the circumstances, some colleagues have already begun to add meanings to the lexemes already created. The more lexemes (without meaning) I have created, the easier it will be for my colleagues to add meanings; in fact, the ideal would be to have all the lexemes (without meaning) created before starting phase two.
I wish we had the means and resources to tackle all the work in a single phase and in a very short period of time, but this is not the case: there are very few of us who work for the defense and safeguarding of Aragonese and many who put obstacles in our way.
Can you please let me continue with my work? Don't give us bot permissions, but don't block us from creating lexemes in Aragonese. From now on we will be adding meanings manually to the lexemes and at the same time creating new lexemes (without meaning). Aradgl (talk) 08:15, 23 September 2022 (UTC)
Good morning,
As a result of this conversation being opened, I found out about the initiative of the user Aradgl in Wikidata and I have seen the problem you mention.
I have been adding verbs in the Aragonese language and, as I work along the same lines, I have contacted Aradgl and Iizquierdogo (another user who adds Aragonese language content to Wikidata), and we are going to support Aradgl's initiative by manually adding senses to the lexemes.
Best regards, Miguel&IvanV (talk) 10:22, 23 September 2022 (UTC)