Shortcuts: WD:RFBOT, WD:BRFA, WD:RFP/BOT

Wikidata:Requests for permissions/Bot

From Wikidata
Jump to navigation Jump to search


Wikidata:Requests for permissions/Bot
To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: {{Wikidata:Requests for permissions/Bot/RobotName}}.

Old requests go to the archive.

Once consensus is obtained in favor of granting the botflag, please post requests at the bureaucrats' noticeboard.

Translate this header box!


Bot Name Request created Last editor Last edited
Bitbotje 3 2023-05-26, 19:10:52 Bdijkstra 2023-05-26, 19:10:52
ForgesB 2023-05-21, 11:15:44 Mdaniels5757 2023-05-23, 16:42:15
CEUR-WS 2023-05-19, 11:34:56 WolfgangFahl 2023-05-19, 11:37:21
AcmiBot 2023-05-16, 00:36:49 Ymblanter 2023-05-17, 19:06:27
WikiRankBot 2023-05-12, 03:36:56 Danielyepezgarces 2023-05-17, 16:15:11
ForgesBot 2023-04-26, 09:30:12 BrokenSegue 2023-04-26, 17:13:55
IngeniousBot 3 2023-03-22, 16:29:58 Hjfocs 2023-03-25, 22:58:05
HubaishanBot 2023-03-07, 08:59:18 حبيشان 2023-03-20, 21:26:57
LucaDrBiondi@Biondibot 2023-02-28, 18:25:03 LucaDrBiondi 2023-03-31, 16:10:37
CJMbot 2020-11-19, 13:31:05 Beireke1 2023-02-23, 19:38:23
BiodiversityBot 3 2023-01-15, 20:29:31 Christian Ferrer 2023-03-05, 09:10:13
Jorgesanmi 2022-12-28, 11:16:00 Lymantria 2023-03-09, 17:52:31
Kalliope 7.3 2022-12-07, 09:16:20 BrokenSegue 2022-12-08, 06:45:50
DL2204bot 2 2022-11-30, 11:19:21 Ymblanter 2022-12-11, 20:04:01
Botcrux 11 2022-11-28, 09:05:27 BrokenSegue 2022-11-28, 21:39:40
Cewbot 5 2022-11-15, 02:20:05 MisterSynergy 2022-11-21, 18:55:37
Mr Robot 2022-11-04, 14:09:41 Liridon 2023-03-02, 13:03:34
RobertgarrigosBOT 2022-10-16, 19:43:23 Robertgarrigos 2022-10-16, 19:43:23
بوٹ دا عثمان 2022-09-17, 19:32:56 Ymblanter 2022-09-27, 19:10:45
YSObot 2021-12-16, 11:33:29 Saarik 2022-10-27, 11:32:01
AradglBot 2022-03-14, 19:43:27 Miguel&IvanV 2022-09-23, 10:22:40
PodcastBot 2022-02-25, 04:38:31 Trade 2023-03-06, 22:57:24

Bitbotje 3[edit]

Bitbotje (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Bdijkstra (talkcontribslogs)

Task/s: Add Dutch (nl) labels and descriptions for chemical elements and isotopes.

Code: Upon request.

Function details: Run a SparQL query to fetch all chemical element (Q11344) items. Check that element symbol (P246) and atomic number (P1086) are present. If a Dutch label for a heavy element is missing, add a systematic name; if a Dutch description is missing, add one; report about non-standard Dutch descriptions. Run a SparQL query to fetch all isotope (Q25276) items with a of (P642) qualifier that refers to an element (isotope of hydrogen (Q466603) etc.). If a Dutch label is missing, add one; report about non-standard Dutch labels. For each element, run a SparQL query to fetch its isotopes. Check that P1086 and neutron number (P1148) are present; check for a subclass of (P279) statement that refers to the element. If a Dutch label or description is missing, add one (for nuclear isomer (Q846110) items the suffix is copied from the English label); report about non-standard Dutch labels and descriptions. I've already ran a "first run" script which put all the edit suggestions into a table. All property checks passed and there are 3800 labels and 4600 descriptions to add, and ~100 descriptions to correct. After committing the table I'd like run the (modified) script a monthly or few times per year. --bdijkstra (overleg) 19:09, 26 May 2023 (UTC)Reply[reply]


ForgesB[edit]

ForgesB (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)

Operator: QalasQalas (talkcontribslogs)

Task/s:

Code:

Function details:

 – The preceding unsigned comment was added by QalasQalas (talk • contribs) at 11:15, May 21, 2023‎ (UTC).

@QalasQalas: Two things. First, the account you indicated ("ForgesB") does not exist. Is this the name you chose? If so, you will need to create the account. If it is a typo, please let me know (or fix it yourself). Second, this request is mostly blank, please fill it out. Best, —‍Mdaniels5757 (talk • contribs) 16:42, 23 May 2023 (UTC)Reply[reply]

CEUR-WS[edit]

CEUR-WS (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: WolfgangFahl (talkcontribslogs)

Task/s: CEUR-WS Semantification bot to be used on publishing Workshop proceedings via http://ceur-ws.org

Code:

Function details: Creates Items for Proceedings, Events and potentially papers linking authors data if possible. Creating author pages is a future option.

>3000 proceedings and event entries have already been created using the same code but with personal accounts. We'd love to go the "official" bot route now. --WolfgangFahl (talk) 11:34, 19 May 2023 (UTC)Reply[reply]

ACMIsyncbot[edit]

ACMIsyncbot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Pxxlhxslxn (talkcontribslogs)

Task/s: Sync links with ACMI API.

Code: https://github.com/ACMILabs/acmi-wikidata-bot/blob/main/acmi_bot.py

Function details: As part of an upcoming residency with the Australian Centre for the Moving Image (Q4823962) I have written a small bot to pull Wikidata links from their public API and write back to Wikidata to ensure sync between the two resources.The plan was to integrate this as part of the build workflow for the ACMI API (https://github.com/ACMILabs/acmi-api). This is currently set to append only, not removing any links Wikidata-side. While the initial link count is only around 1500 there will likely be significant expansion in the current weeks as we identify further overlaps. --Pxxlhxslxn (talk) 00:36, 16 May 2023 (UTC)Reply[reply]

can you add a reference? can you set an edit summary (just add a "summary" arg to the write call)? Otherwise looks good. BrokenSegue (talk) 01:23, 16 May 2023 (UTC)Reply[reply]
Oh dear, I have tried to change the bot name and now I see I have screwed things up a bit in relation to this form (ie the discussion is still under the old name). Should I just open a new request? I have also added the edit summary to the write function. Pxxlhxslxn (talk) 10:48, 16 May 2023 (UTC)Reply[reply]
No need to open a new request as far as I am concerned. Ymblanter (talk) 19:06, 17 May 2023 (UTC)Reply[reply]

WikiRankBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)

Operator: Danielyepezgarces (talkcontribslogs)

Task/s: Use Alexa rank (P1661)

Code: Coming soon i publish the code

Function details: I am making a bot that can track the monthly ranking of websites based on Similarweb Ranking. The bot will receive a list of websites with their corresponding Wikidata IDs and domains to keep the data accurate.

The bot will have to use the Similarweb Top Sites API to get the traffic ranking of each website and store it in a MySQL database along with the date of the ranking. If the website already exists in the database, the bot should update its ranking and date every time there is a new ranking update.

Soon the bot will include some new features that will be communicated in the future.

The Similarweb ranking is not this property. It is Similarweb ranking (P10768).--GZWDer (talk) 05:16, 12 May 2023 (UTC)Reply[reply]
If correct the bot uses the property P10768 and rewrites the old property P1661 since the public data of Alexa Rank ceased to exist,
when I put Similarweb Ranking I don't mean the property P10768 but that the bot took the data from similarweb.com website Danielyepezgarces (talk) 16:15, 17 May 2023 (UTC)Reply[reply]

ForgesBot[edit]

ForgesBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Dachary (talkcontribslogs)

Task/s: Add licensing information to software forges entries in accordance to what is found in the corresponding Wikipedia page. It is used as a helper in the context of the Forges project

Code: https://lab.forgefriends.org/friendlyforgeformat/f3-wikidata-bot/

Function details: ForgesBot is a CLI tool designed to be used by participants in the Forges project in two steps. First it is run to do some sanity check, such as verifying forges are associated with a license. If some information is missing, the participant can manually add it or it can use ForgesBot to do so.

The implementation includes one plugin for each task. There is currently only one plugin to verify and edit the license information. The license is deduced by querying the wikipedia pages of each software: if they consistently mention the same license the edit can be done. If there are discrepancies they are reported and no action is done.

--Dachary (talk) 09:29, 26 April 2023 (UTC)Reply[reply]

I don't think I understand the task. Can you do some (~30) test edits? Or try to explain again? BrokenSegue (talk) 17:13, 26 April 2023 (UTC)Reply[reply]

IngeniousBot 3[edit]

IngeniousBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Premeditated (talkcontribslogs)

Task/s: Adding identifiers to album items, based on existing identifiers.

Code:

Function details: Adding Spotify album ID (P2205), Apple Music album ID (U.S. version) (P2281), YouTube playlist ID (P4300), SoundCloud ID (P3040), Pandora album ID (P10138), Amazon Standard Identification Number (P5749), Tidal album ID (P4577), Deezer album ID (P2723), Yandex Music release ID (P2819), Anghami album ID (P10972), Boomplay album ID (SOON), and Napster album ID (SOON). Based on previously mentioned properties. --Premeditated (talk) 16:29, 22 March 2023 (UTC)Reply[reply]

can you go into more detail about how this lookup will be done? link to some test edits? BrokenSegue (talk) 16:36, 22 March 2023 (UTC)Reply[reply]
@BrokenSegue: Test edits. Lookups are based on a given album identifier like for example, Spotify album ID (P2205). UPC, Spotify artist ID (P1902), artist name, number of tracks, name of tracks, ISRC (P1243), and more are compared and looked up on other streaming services API/scrapping to match "identical" relases. I have made a scoring system where only relases that score 80% or better are added by the bot. The matches that does not get published will be saved to a file for later to be added to Mix'n'match, maybe. - Premeditated (talk) 23:50, 22 March 2023 (UTC)Reply[reply]
I believe you are misusing the inferred from (P3452) property. Look at the description of that property in English. Please go and fix all the test edits you made. Maybe you want stated in (P248) or similar.
I think you should add a based on heuristic (P887) statement in the reference? Maybe to record linkage (Q1266546) or similar. This whole workstream seems really similar to what is/was being done by User:Soweego bot. Can you explain how you are different/the same. Maybe we should get input from @Hjfocs:.
Can you go into more detail about what is creating these scores? How did you verify the scores are meaningful? What kind of model are you using? Is your source code available? What " looked up on other streaming services API/scrapping to match "identical" relases " are you using. Etc. BrokenSegue (talk) 16:59, 23 March 2023 (UTC)Reply[reply]
Hey folks, happy to give my 2 cents. I second BrokenSegue's comments: (based on heuristic (P887), record linkage (Q1266546)) reference nodes sound good. @Premeditated: interesting project: it would be great if you could share the code and tell us something more about it. Cheers, Hjfocs (talk) 22:57, 25 March 2023 (UTC)Reply[reply]

HubaishanBot[edit]

HubaishanBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: حبيشان (talkcontribslogs)

Task/s: The Bot will import some data from GeoNames (Q830106), Geographic Names Server (Q1194038), OpenStreetMap (Q936) and Yemen General Census of Population, Housing and Establishments 2004 (Q12202700) for only Yemen places these data includes: location, refs, names, population, sub administrative territorial, shared borders. all data imported with me in local database and checked.


Code:

Function details: I use javascript wikibsae-sdk and wikibase-edit from my computer.

--حبيشان (talk) 08:59, 7 March 2023 (UTC)Reply[reply]

LucaDrBiondi@Biondibot[edit]

LucaDrBiondi@Biondibot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: LucaDrBiondi (talkcontribslogs)

Task/s: Import us patent from a csv file

For example:

US11387028; Unitary magnet having recessed shapes for forming part of contact areas between adjacent magnets ;Patent number: 11387028;Type: Grant ;Filed: Jan 18, 2019;Date of Patent: Jul 12, 2022;Patent Publication Number: 20210218300;Assignee Whylot SAS (Cambes) Inventors: Romain Ravaud (Labastide-Murat), Loic Mayeur (Saint Santin), Vasile Mihaila (Figeac) ;Primary Examiner: Mohamad A Musleh;Application Number: 16/769,182

US11387027; Radial magnetic circuit assembly device and radial magnetic circuit assembly method ;Patent number: 11387027;Type: Grant ;Filed: Dec 5, 2017;Date of Patent: Jul 12, 2022;Patent Publication Number: 20200075208;Assignee SHENZHEN GRANDSUN ELECTRONIC CO., LTD. (Shenzhen) Inventors: Mickael Bernard Andre Lefebvre (Shenzhen), Gang Xie (Shenzhen), Haiquan Wu (Shenzhen), Weiyong Gong (Shenzhen), Ruiwen Shi (Shenzhen) ;Primary Examiner: Angelica M McKinney;Application Number: 16/491,313

US11387026; Assembly comprising a cylindrical structure supported by a support structure ;Patent number: 11387026;Type: Grant ;Filed: Nov 21, 2018;Date of Patent: Jul 12, 2022;Patent Publication Number: 20210183551;Assignee Siemens Healthcare Limited (Chamberley) Inventors: William James Bickell (Witney), Ashley Fulham (Hinkley), Martin Gambling (Rugby), Martin Howard Hempstead (Ducklington), Graeme Hyson (Milton Keynes), Paul Lewis (Witney), Nicholas Mann (Compton), Michael Simpkins (High Wycombe) ;Primary Examiner: Alexander Talpalatski;Application Number: 16/771,560


Code:

I would learn to write my bot to perform this operation. I am using Curl in c language, i have a bot account (that now i want to "request for permission") buy i get the following error message:

{"login":{"result":"Failed","reason":"Unable to continue login. Your session most likely timed out."}} {"error":{"code":"missingparam","info":"The \"token\" parameter must be set.","*":"See https://www.wikidata.org/w/api.php for API usage.

probably i think my bot account is not already approved...

Function details:

Import item on wikidata starting from title and description and these properties for now:

P31 (instance of) "United States patent" P17 (country) "united states" P1246 (patent number) "link to google patents or similar" --LucaDrBiondi (talk) 18:25, 28 February 2023 (UTC)Reply[reply]

@LucaDrBiondi How many patents are you planning to add this way? ChristianKl❫ 12:33, 17 March 2023 (UTC)Reply[reply]
The bot account to which you link doesn't exist. ChristianKl❫ 12:34, 17 March 2023 (UTC)Reply[reply]


Hi i am still writing and trying it and moreover it is not yet a bot ...because it is not automatic.

I have imported patents data into a sql server database then i read a patent and with pywikibot i try for example to search the assignee (owned by property). If i not find a match i will search manually. only if i am sure then i insert the data into wikidata. this is because i do not want to add data with errors. For example look at Q117193724 item. LucaDrBiondi (talk) 18:27, 17 March 2023 (UTC)Reply[reply]




@ChristianKl
At the end i have developed a bot using pywikibot.
It is not fully automatic because i have the property Owned_id that it is mandatory for me.
So i verify if wikidata has already an item to use for this property.
If I not find it then i not import the item (the patent)
I have already loaded some houndred items like for example this Q117349404
Do a limit of number of item that can i import each day exists?
I have received at a point a warning message from the API
Must i so somethink with my user bot?
thank you for your help! LucaDrBiondi (talk) 16:08, 31 March 2023 (UTC)Reply[reply]

CJMbot[edit]

CJMbot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Meemoo_BE (talkcontribslogs)

CJMbot lets users upload a CSV file in a certain format. The data inside this file is then validated and processed. New items wil be created based on the data in the CSV file and existing items wil updated by adding statements and references.

Coming soon.

User uploads a CSV containing data of producers (mostly artist). This data is matched to Qids using the Openrife reconciliation api. The user get an e-mail with the updated CSV file containing the Qids, for manual correction. After the manual check the user can upload the updated CSV file. This will add new statements to the matched Qids (if they didn't exist) and add a reference to it. Or create a new item containing the data of the CSV. --CJMbot (talk) 13:31, 19 November 2020 (UTC)Reply[reply]

What is "Openrife"? Where does the website for this bot live? Who has access to it? BrokenSegue (talk) 18:22, 27 January 2023 (UTC)Reply[reply]
@BrokenSegue I work for @Meemoo BE, the organisation that created CJMbot. Openrife was a typo and should be Openrefine. Collection staff from museum and archives use our public domain tool (in Dutch, but more info in English can be found here) of which the bot on Wikidata is an integral part. Beireke1 (talk) 08:36, 30 January 2023 (UTC)Reply[reply]
@Beireke1: this bot is creating a bunch of constraint violations. see for example the references on Veerle Persyn (Q116317378). can we not provide a URL or identifier for these references? I'm also unclear what it means for an organization to have works in a museum as in Leonard (Q116451171). also the items being created don't have descriptions and it seems you could generate valid descriptions. there's a reason we generally require bots to be approved before operation. BrokenSegue (talk) 14:50, 30 January 2023 (UTC)Reply[reply]
@BrokenSegue: The developers we asked to write this bot activated it more than two years ago and it has been working since without any problems. I didn't know that a formal approvement was needed or that it was not obtained. The data always come from official archives or museums, but are not always publicly available online because of limitations of their systems and/or gradually evolving internal open data policies. If a url is available, this is provided in the reference. If not, the source institution is named to provide provenance of the data. The Leonard (Q116451171) example is a case were the creator / rights holder of a collection item is not a person but an organisation, for example a fashion house in this case. You certainly have a point about the descriptions. I will contact the developers and see what we can do to improve that in future data uploads. Beireke1 (talk) 15:09, 30 January 2023 (UTC)Reply[reply]
@BrokenSegue The bot seems to be blocked at this point. Is the context above sufficient to unblock it, because it is quite an important part in the workflow of an ongoing project with museum and archives in Flanders (Belgium)? Beireke1 (talk) 15:17, 30 January 2023 (UTC)Reply[reply]
yes it was blocked by me for not being approved. Maybe we can unblock if we get a promise that someone will go and fix the errors later? Our bot policy is pretty clear Wikidata:Bots and this bot account doesn't even have a bot flag. this page was created more than two years ago but the request was never properly made. BrokenSegue (talk) 15:25, 30 January 2023 (UTC)Reply[reply]
I always follow up on the bot's activity, hence my deletion request for empty items that were created by the bot. As said above, I will also make sure that the bot can be changed so that descriptions are also provided. I would appreciate it if you could unblock the bot in the meantime. Isn't this page the request page? What else should have been done more than two years ago (19 November 2020, see first message on this page) to request permission? Beireke1 (talk) 08:50, 31 January 2023 (UTC)Reply[reply]
@Beireke1: the request page was made but it was not embedded onto the Wikidata:Requests for permissions/Bot page where people actually look. The instructions for requesting bot permissions says "To request a bot flag, or approval for a new task, in accordance with the bot approval process, please input your bot's name into the box below, followed by the task number if your bot is already approved for other tasks. Then transclude that page onto this page, like this: Wikidata:Requests for permissions/Bot/RobotName." The last step was not done. As I mentioned you are not following up on errors introduced by the bot because it is introducing constraint violations and you have not fixed them. Will you fix those errors? BrokenSegue (talk) 15:28, 2 February 2023 (UTC)Reply[reply]
@BrokenSegue: Are there other constraint violations that should be fixed apart from the fact that a museum is sometimes used as a value with stated in (P248)? Strangely enough however, an archives (Q166118) is allowed as a value and a museum is not. I don't understand the logic there. Why not just add instances of museum as allowed values? The alternative would be that we need to create a separate Wikidata item for each museum's database (that is not always available online in its entirety) just to bypass the constraint violation. Beireke1 (talk) 16:14, 2 February 2023 (UTC)Reply[reply]
it is unhelpful to say that a statement is asserted in a museum (Q33506). Is it stated in the museum exhibits? On their website? In their archive? How am I to lookup this reference? It's not helpful. but this isn't the only thing I would want to see improved. Can we not do better than Quinet (Q111968317)? Like even add a description? Or an external link? Or an identifier? This item is nearly useless. How is anyone to make use of something so vague. There's also a constraint violation on Leo Dierickx (Q111968260). BrokenSegue (talk) 16:22, 2 February 2023 (UTC)Reply[reply]
What is the constraint violation on Leo Dierickx (Q111968260)? I can't see any warning on that item. As mentioned before, having descriptions added to new items is something that we can do for future uploads. It would mean that the bot won't create any new Wikidata items anymore if a description can't be provided (to avoid items like Quinet (Q111968317)). I don't really see the difference between a museum and an archival institution as a reference though. Both are cultural institutions with collections that have metadata on their collections as a central aspect of their work. Sometimes an item doesn't have a lot of statements when it is created, but just the fact that a person has work in a certain collection is sometimes enough for others to add additional data at a later stage (see for example the history of Gustave Corthouts (Q106495397). I agree with you that we need to define the bare minimum that is needed to create an item. I think that having an occupation and description derived from that occupation is a good one. Beireke1 (talk) 08:53, 3 February 2023 (UTC)Reply[reply]
The constraint violation was on the qualifier for the copyright status as a creator. Someone "fixed" it in this edit. I'm not clear if that is a correct change. Also the quotations in the references are literally just a year which isn't helpful. And is there really nothing more specific to put in the reference other than the name of the museum? Also why is this person's last name all capitals Veerle Persyn (Q116317378)? BrokenSegue (talk) 20:05, 7 February 2023 (UTC)Reply[reply]
As copyright rules are not the same in all jurisdictions, I think that the property applies to jurisdiction (P1001) as a qualifier makes very much sense. If the quotations aren't helpful, we could skip adding them of course. The last name in capitals is a mistake in the source data. I will revise those manually to solve that problem. I see two more items with the same problem, but not more. Beireke1 (talk) 16:40, 8 February 2023 (UTC)Reply[reply]
I can't really say much about the items being created, since it's not an area I edit, but looking at items like Special:History/Q116451094, I think it should try to add the data in fewer edits.
It should be possible to add all the data in the edit which creates the item, but, at the very least, it could add labels in the edit which creates the item (i.e. combine edits like Special:Diff/1820331590 and Special:Diff/1820331595), and add references in the same edit as the statement (i.e. combine edits like Special:Diff/1820331627 and Special:Diff/1820331647).
The way it adds labels definitely needs to be fixed. It is adding them using edits like Special:Diff/1820331595 which claims to be clearing the item but is actually adding labels.
- Nikki (talk) 13:43, 7 February 2023 (UTC)Reply[reply]
Thanks for your valuable and very specific feedback @Nikki. I'll check with the developer of the bot what they can do. Beireke1 (talk) 15:43, 7 February 2023 (UTC)Reply[reply]
Following from BrokenSegue's comments: most of the items about artists have no ID's whatsoever, and most of the time no data. This makes extremely borderline the acceptance of such items on Wikidata - they're in a museum catalogue, that's alright, but the museum should have a ID page for them and the item should link to it. Also, in the examples I saw, basically no description was added - another thing that must be fixed in next iterations.
Given all of that, I would say that before granting bot status to CJMbot, the items should be thoroughly revisited - since you have primary access to data, you should do it, not a volunteer - and then, once the fixes have been made, the status can be granted. --Sannita - not just another it.wiki sysop 12:20, 8 February 2023 (UTC)Reply[reply]
I agree that what you say is the ideal situation @Sannita, but that's not the reality in most GLAMs. I would like to make a plea for not sitting down and wait for every museum and archive to have persistent IDs and URIs for every object and/or creator before bringing their basic data about creators to Wikidata. Sometimes even items with few data are very useful for other to enrich at a later stage. See for example the history of Gustave Corthouts (Q106495397) as mentioned before. The kind of description that I can generate will have to be derived from the occupation of a creator. As a revision, I already added English descriptions to all newly created items by the bot (see this batch). Based on the community feedback above, I have the following adaptations to the bot on my list:
  • Don't make exact duplicates of already existing references
  • Only create new items if minimally these data can be added: label, description, instance of and occupation (for humans).
  • Add descriptions (in Dutch and English) to newly created items. These can probably be a combination of occupation and (if known) birth and death date.
  • Check if data can be added in less edits
Agree @BrokenSegue@Nikki@Sannita? Beireke1 (talk) 16:35, 8 February 2023 (UTC)Reply[reply]
Do you at least have identifiers for the references you are adding? Is there a document ID or something that you can refer us to? Or a link? Or is "somewhere in this museum" the best you can do? Personally I don't care about doing it in fewer edits. Mainly I'm concerned that the data you are adding is so undifferentiated that there's no way for anyone to use it. "Some artist named X whose name appears somewhere in this museum". How is anyone to use this? BrokenSegue (talk) 16:41, 8 February 2023 (UTC)Reply[reply]
@BrokenSegue@Marsupium@Nikki@Sannita I think I worked out a solution to solve this, based on what I see as a good practice on other items. When a reference URL (P854) is available, we'll keep using that as a reference. If not, we would create a distinct item for the database maintained by the archive/museum where the dataset originates from (see for example Royal Library of Belgium (KBR) Online Catalogue (Q104828762)) and use it as a value on catalog (P972). We would combine this with the museum/archive identifier as a value on catalog code (P528) to make the statements verifiable. I believe that this answers important questions raised above regarding constraint violations and data verifiability. I would like to propose this in combination with these changes to the bot that I mentioned before:
  • Don't make exact duplicates of already existing references
  • Only create new items if minimally these data can be added: label (Len, Lfr, Lnl), description (Den, Dfr, Dnl), instance of (P31), occupation (P106) (for humans) and has works in the collection (P6379).
  • Descriptions can be a combination of occupation and (if known) birth and death date.
  • Check if data can be added in less edits
Beireke1 (talk) 13:39, 17 February 2023 (UTC)Reply[reply]
ok I'll unblock the bot account so you can do some test edits to show us what this new version will look like. please do not start running the bot full time until this approval process is over though BrokenSegue (talk) 07:06, 18 February 2023 (UTC)Reply[reply]
To jump in as well: I'd disagree that items for humans with only "label, description, instance of and occupation" should be created. I work a lot with items for artists and that information is not enough to identify a person. There are often enough people with the same name and same occupation and we end up with items that don't make clear which concept they are about which is a long-term burden for maintenance also of other items. Also, items of not clearly identifiable entities don't meet Wikidata:Notability criterion 2. Since the potential items in question probably wouldn't meed criteria 1 or 3 either, they shouldn't get created. However, if an item has date of birth (and death for dead people) it's fine I think and can be created.
Only clicking on some of the last 20 items the bot created I've found Q116234766, Q116234763, nothing apart from label and P31=Q5 and C. J. Bisschop (Q116215346) with clearly wrong date of birth and death. I'd prefer a bot adding data with that quality rather not operating than operating. If quality improves and there is willingness not fix mess created in the future one could reconsider. To show such willingness it would be good to start to fix the messy data that is currently live. I'd be glad about such improvements! Best, --Marsupium (talk) 23:48, 14 February 2023 (UTC)Reply[reply]
@MarsupiumThe solutions proposed above should avoid data added like the items that you point to. I made a request for deletion for Q116234766 and Q116234763 and manually improved the birth and death date statements on C. J. Bisschop (Q116215346) based on the source data. Beireke1 (talk) 13:44, 17 February 2023 (UTC)Reply[reply]
Thanks that's good. But that wasn't really my point. I found the items in a random sample of 20 items. Out of the 554 item the bot created 239 still aren't deleted. I guess data quality of the others might be more or less the same. I hope the "solutions above" won't repeat this. But since there is probably still a lot of messy data in the database that hasn't been cleaned up systematically, there is no reason to assume that in the future other mess that might appear will be taken care of. And that's a bad precondition for a bot in my eyes. So I'd propose to go through those 239 items and clean them up systematically and then continue running the bot. I've there are no resources for that, then similar problems might come up in the future and a bot that no one can clean up after shouldn't work in the first place. Sorry for this other long text. Best, --Marsupium (talk) 16:46, 17 February 2023 (UTC)Reply[reply]
No problem, I'll take the time to revise those in the next weeks and at the same we'll work on the bot improvements. Beireke1 (talk) 20:58, 21 February 2023 (UTC)Reply[reply]
Do I understand right that the approval is for users working at Meemoo, so that Meemoo is responsible for all the edits? ChristianKl❫ 18:29, 18 February 2023 (UTC)Reply[reply]
The bot writes data on behalf of archives and museums who's staff use the public domain tool developed by meemoo Beireke1 (talk) 20:52, 21 February 2023 (UTC)Reply[reply]


BiodiversityBot 3[edit]

BiodiversityBot 3 (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Andrawaag (talkcontribslogs)

Task/s: TreatmentBank (Q54857867) is a CC0 resource that contains data on taxonomic treatments, treatment citations, figures, tables, material citations and bibliographic reference. It contains valuables links mapping scholarly articles, taxa and taxonomic treatments. This bot taks involves linking research papers on taxa to taxa and related taxonomic treatments.

Code: on GitHub

Function details: The bot fetches a record from treatmentbank and aligns from these records the scholarly article, the taxon record and the related taxonomic treatments. The item on the article is linked to the related taxa by main subject (P921). The taxon is linked to the taxonomic treatments, through taxonomic treatment (P10594). I intend to operate the bot first manually, which means that I will select the records to be aligned with Wikidata. First, in small sets. In a second stage, I will add daily updates from treatmentbank. --Andrawaag (talk) 20:29, 15 January 2023 (UTC)Reply[reply]

89 test edits

what's the rationale for having both an exact match statement and the Plazi ID statement if they go to the same place? BrokenSegue (talk) 21:25, 16 January 2023 (UTC)Reply[reply]
The Plazi ID statement and the exact match statement are stored slightly differently.
Plazi ID's are only stored using the string version of the ID (with out the PREFIX), while the exact match uses the the full format.
wd:Q116211926 wdt:P1992 "55DC13A5-1D62-5AA0-8ABB-C1094124B7F1" ;
and
wd:Q116211926 wdt:P2888 <http://treatment.plazi.org/id/55DC13A51D625AA08ABBC1094124B7F1> ;
Having the full URI format at SKOS:exactMatch can help in federated querying. We could argue that with BIND we can compose the full URI, but this is an expensive query operator. Having the full URI to linkout in federated queries leads to more performant querying.
Having said that, I have also added the "formatter URI for RDF resource" property on Plazi ID which should resolve to the same URI and would indeed make the exact match property redundant. However, to work the ".rdf" extension has to be added since plazi does not seem to have content negotiation in place. External RDF resource can use the URI without the .rdf suffix, which could make some federated queries not work.
With this in mind, I decided to add the skos:exactMatch property as P2888 to the items, so all (federated) query use cases can be catered. Andrawaag (talk) 22:29, 16 January 2023 (UTC)Reply[reply]
Here your bot deleted existing references and qualifieres of P225. I didn't checked other edits. --Succu (talk) 17:54, 17 January 2023 (UTC)Reply[reply]
@Andrawaag: Same here. --Succu (talk) 22:08, 18 January 2023 (UTC)Reply[reply]
 Oppose: And here! Please check your 89 „contributions”. Thx. --Succu (talk) 22:13, 18 January 2023 (UTC)Reply[reply]
How do you plan to connect treatments as Hydnellum nemorosum A. M. Ainsw. & E. Larss. 2021, sp. nov. (Q111983493) to the original publication Four new species of Hydnellum (Thelephorales, Basidiomycota) with a note on Sarcodon illudens (Q111983492)? --Succu (talk) 22:17, 19 January 2023 (UTC)Reply[reply]
I don't understand your question. I am not planning to invent any new link between treatments and the original publication. The only thing this bot does is link taxon names and treatments if there is an explicit link in plazi. Wikidata should not be a primary source. Andrawaag (talk) 21:20, 7 February 2023 (UTC)Reply[reply]
The bot links three items (one of the test runs):
1. Oxalis lourteigiana (Q116211006) <- taxon name
2. Oxalis lourteigiana Nuernberg-Silva & Fiaschi 2021 (Q116211004) <- treatment
3. Taxonomic revision and morphological delimitation of Oxalis sect. Ripariae (Oxalidaceae) (Q110696619) <- publication.
taxon name and treatment are linked using taxonomic treatment (P10594) and
taxon name and publication are linked using stated in (P248) Andrawaag (talk) 21:45, 7 February 2023 (UTC)Reply[reply]
Thats the way of how is User:Christian Ferrer using taxonomic treatment (P10594) created by UWashPrincipalCataloger after a controversional discussion. --Succu (talk) 21:40, 20 January 2023 (UTC)Reply[reply]
Succu I don't think I ever used once time taxonomic treatment (P10594). Christian Ferrer (talk) 07:21, 21 January 2023 (UTC)Reply[reply]
Another example Q116211011. --Succu (talk) 22:40, 20 January 2023 (UTC)Reply[reply]
@Andrawaag: About Oxalis (sect. Ripariae) Lourteig 2000 Plazi says Q93379839. --Succu (talk) 22:52, 20 January 2023 (UTC)Reply[reply]
See Taxonomic revision and morphological delimitation of Oxalis sect. Ripariae (Oxalidaceae) (Q110696619). --Succu (talk) 22:36, 8 February 2023 (UTC)Reply[reply]
@Andrawaag: Psiloderces (Q5515527) has three values for Plazi ID (P1992) (reffering to different articles). One treatmentment is linked to Fourteen new species of the spider genus Psiloderces Simon, 1892 from Southeast Asia (Araneae, Psilodercidae) (Q87009972) via taxon name (P225) How will yor bot handle the relationship according to taxonomic treatment (P10594)? --20:58, 23 January 2023 (UTC)
In the case of Psiloderces (Q5515527) and the three Plazi IDs (P1992). Linking three plazi id's to the same taxonname, is semantically inaccurate. The bot links the taxon name, to the distint wikidata items of instance of (P31) taxonomic treatment (Q32945461) using taxonomic treatment (P10594) Andrawaag (talk) 21:35, 7 February 2023 (UTC)Reply[reply]
The property was created to exactly to do this: linking taxon name (P225) to Plazi-IDs. --Succu (talk) 21:50, 8 February 2023 (UTC)Reply[reply]
Please note this discussion too. --Succu (talk) 22:04, 8 February 2023 (UTC)Reply[reply]

 Support Idea looks good. As we do not have clear standards for whether or not having both exact match statement and the ID, I am okay with either. However, I also agree that the bot should not delete the previous qualifiers, they are quite relevant.TiagoLubiana (talk) 19:50, 6 February 2023 (UTC)Reply[reply]

The bot has overwritten indeed a handful of references, but those edits have been reverted. Nice to have test edits for this. I have updated the bot and now the bot will no longer overwrite. Later this week I will run additional test run and report those here, so we can move forward with this bot request. Andrawaag (talk) 21:17, 7 February 2023 (UTC)Reply[reply]
Any answers to my questions? --Succu (talk) 21:22, 7 February 2023 (UTC)Reply[reply]
Yes see inline Andrawaag (talk) 21:46, 7 February 2023 (UTC)Reply[reply]
  •  Comment In the exemple with Oxalis lourteigiana (Q116211006), as far I see the taxonomic treatment and the publication are not linked one each other, though to know form where comes a taxonomic treatment is one of the most important thing in taxonomy. E.g. see Capillaster sentosus (Q1985550) or Phanogenia distincta (Q2187529), where in case of several treatment you can easily find from where comes each treatment. In that way you can find every treatments from a publication, e.g. see that query https://w.wiki/6FRP showing all the treatments available in H.L Clarck (1938). Another exemples with a publication (Mah (2021)) with Plazi treatments available [1]. The Plazi web site also groups treatments by publications, otherwise that would not make sense IMO. Christian Ferrer (talk) 05:57, 8 February 2023 (UTC)Reply[reply]
  • Note also that not all (the big majority of) taxonomic treatments don't exist in Plazi, so if in Wikidata we decide to separate the treatments in a separate item as Oxalis lourteigiana Nuernberg-Silva & Fiaschi 2021 (Q116211004), why not, but then it should be done in a way that we can modelize all other potential treatments even if they don't exist yet in Plazi. E.g. the current label of "Oxalis lourteigiana Nuernberg-Silva & Fiaschi 2021" is in fact the exact citation of the "name of the species + author citation", what will be the labels for the next potential treatments about this species, what will be the way ro modelize those treatments in separate items if they don't exist in Plazi? I'm really interested because I add a lot of treatments in Wikidata, and I like to have a possibility to retrieve externaly easily this kind of data, e.g. in Wikispecies. Christian Ferrer (talk) 06:37, 8 February 2023 (UTC)Reply[reply]

 Question: In Novakiella trituberculosa (Q1310467) we have taxon redescription (Q42696902) claiming to be a subclass of taxonomic treatment (Q32945461): Guess this should be merged with emendation (Q1335348). --Succu (talk) 21:00, 8 February 2023 (UTC)Reply[reply]

There is even an article named "Emendation" in the ICZN code [3], further more I quote "Any demonstrably intentional change in the original spelling of a name other than a mandatory change is an "emendation" ". However the German Wikipedia article seems to talk about a different concept (the change of definition/scope of a taxon), it's true that sometimes we can see in scientific articles things such as "Genus xxxx (emended)", but it is obvioulsy not the same thing. Christian Ferrer (talk) 12:32, 9 February 2023 (UTC)Reply[reply]
OK, thx, but the description of taxon redescription (Q42696902) says "modification of an existing description". --Succu (talk) 17:03, 9 February 2023 (UTC)Reply[reply]
Nothing shocks me. To make a redescription you must have "an existing description" overwise it is not a redescription but a "first description", and if the redescription is identical to the "existing description" then it is not really a redescription thus the use of the word "modification". As well as its placement as a subclass of taxonomic treatment tend to be obvious for me. Christian Ferrer (talk) 17:45, 9 February 2023 (UTC)Reply[reply]
In Description, redescription and revision of sixteen putatively closely related species of Echinoderes (Kinorhyncha: Cyclorhagida), with the proposition of a new species group – the Echinoderes dujardinii group (Q106699033) the term Emended diagnosis is used (referenced e.g. by Echinoderes gerardi (Q2620277). emendation (Q1335348) as well as Q3025852 are a mixture of different topics, but both indicate different taxonomic treatment (Q32945461). An "Emendation" in sense of the ICZN code is a treatment of it's own. Not sure what to do. --Succu (talk) 19:13, 9 February 2023 (UTC)Reply[reply]
See this interesting article: [4]. Not sure we are still in the scope of this Requests for permissions for the BiodiversityBot 3. Christian Ferrer (talk) 19:56, 9 February 2023 (UTC)Reply[reply]
  •  Info I just remember the following querry, there is currently near 5000 Plazi treatments (probably mainly added by me) stored in the reference sections:

List of items with taxonomic treatment that includes Plazi ID within the reference section of taxon name (query) Christian Ferrer (talk) 09:09, 5 March 2023 (UTC)Reply[reply]

Kalliope 7.3[edit]

Kalliope 7.3 (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Push-f (talkcontribslogs)

Task/s: Update User:Kalliope 7.3/List of bots every hour.

Code: https://git.push-f.com/wikidata-bots/tree/bots.py

Function details:

I am planning on adding more features e.g. adding a parameter to {{Bot}} to allow bots to define which properties they edit and then generating a table like:

Property Bot
software version identifier (P348) Github-wiki-bot

but I still have to implement that.

--Push-f (talk) 09:16, 7 December 2022 (UTC)Reply[reply]

@Push-f: You do not need bot right if the bot only edit subpages of your or your bot's user pages.--GZWDer (talk) 09:26, 7 December 2022 (UTC)Reply[reply]
@GZWDer: I think I do need something because any attempt to edit a subpage via the API is failing with a Captcha (and I did confirm the email address for the account). --Push-f (talk) 14:12, 7 December 2022 (UTC)Reply[reply]
You need a confirmed flag for this. GZWDer (talk) 14:13, 7 December 2022 (UTC)Reply[reply]
Ah ok thanks, then I hereby request the "confirmed" right for my bot. --Push-f (talk) 14:29, 7 December 2022 (UTC)Reply[reply]
confirmed flags are requested at Wikidata:Requests for permissions/Other rights. BrokenSegue (talk) 06:45, 8 December 2022 (UTC)Reply[reply]
oh I see you figured that out. never mind. BrokenSegue (talk) 06:45, 8 December 2022 (UTC)Reply[reply]

DL2204bot 2[edit]

DL2204bot 2 (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: DL2204 (talkcontribslogs)

Task/s: Correct messy entries for scholarly articles of Uztaro. Journal of Humanities and Social Sciences. (Q12268801) and Aldiri. Arkitektura eta abar (Q12253132) journals, add 2020-2022 articles.

Code: We are using WBI 12.0 for interaction with the source Wikibase, and with Wikidata.

Function details: In 2020-21, articles of the two journals (example Q108042527) have been uploaded using OpenRefine (see Q108042527 history). That dataset has several problems, such as repeated author statements (with and without "series ordinal" qualifier), incorrect issue number, DOI not present (although existing), download URL not present (although existing), etc. This proposal consists en re-writing all entries (see all using this query), using data from the newly created Inguma Wikibase (see items for these two journals using this query). Before the operation, we will check completeness and integrity of the data, and include some missing items (original source is the SQL database in the back of https://inguma.eus). --DL2204 (talk) 11:18, 30 November 2022 (UTC)Reply[reply]

If I'm understanding your query correctly you are planning on editing just 1000 items? Personally I would be comfortable letting you do that without bot approval. Seems like a manual audit would be possible to ensure the quality is acceptable. Either way  Support. BrokenSegue (talk) 16:43, 7 December 2022 (UTC)Reply[reply]
Please make some test edits.--Ymblanter (talk) 20:04, 11 December 2022 (UTC)Reply[reply]

Botcrux 11[edit]

Botcrux (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Horcrux (talkcontribslogs)

Task/s: Replace stated in (P248) to publisher (P123) where the value is Q1158.

Problem description: In 2020 User:Reinheitsgebot made a massive addition of references in which stated in (P248)World Athletics (Q1158) have been added as claim in the reference (edit example). The operation was ok, except that World Athletics (Q1158) is an organization, not a document or a database, therefore thousands of warnings are currently raised up (example). A more suitable property is publisher (P123).

Function details: For technical reasons, I'm not able to fix the source with a single edit, so the bot will:

  1. copy all the claims in the reference to be removed (except for stated in (P248)World Athletics (Q1158));
  2. remove the problematic reference;
  3. add a new reference with all the claims copied plus publisher (P123)World Athletics (Q1158).

The script is ready, here a couple of edits: [5][6]. --Horcrux (talk) 09:04, 28 November 2022 (UTC)Reply[reply]

  • Sounds fine though why not just remove the "Stated In" "World Atheletics" claim from the reference altogether? Surely that's implied by the athlete ID.
BrokenSegue (talk) 16:41, 28 November 2022 (UTC)Reply[reply]
@BrokenSegue: Just because I'm used to be as much complete as I can when I add a reference. But yes, it would also be ok just to execute point #2. --Horcrux (talk) 19:41, 28 November 2022 (UTC)Reply[reply]
Personally I'd prefer just doing point 2 but I don't care enough to argue either way. I might even argue that this bot doesn't need approval since the scope is so limited and there's warnings. BrokenSegue (talk) 21:39, 28 November 2022 (UTC)Reply[reply]

Cewbot 5[edit]

Cewbot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Kanashimi (talkcontribslogs)

Task/s: Add sitelink to redirect (Q70893996) for sitelinks to redirects without intentional sitelink to redirect (Q70894304).

Code: github

Function details: Find redirects in wiki projects, and check if there is sitelink to redirect (Q70893996) / intentional sitelink to redirect (Q70894304) or not. Add sitelink to redirect (Q70893996) for sitelinks without sitelink to redirect (Q70893996) or intentional sitelink to redirect (Q70894304). Also see Wikidata:Sitelinks to redirects. --Kanashimi (talk) 02:19, 15 November 2022 (UTC)Reply[reply]

sounds good. link to the source? BrokenSegue (talk) 05:28, 15 November 2022 (UTC)Reply[reply]
I haven't started writing code yet. I found that there is already another task Wikidata:Requests for permissions/Bot/MsynBot 10 running. What if I treat this task as a backup task? Or is this not actually necessary? Kanashimi (talk) 03:34, 21 November 2022 (UTC)Reply[reply]
The complete source code of my bot is here: https://github.com/MisterSynergy/redirect_sitelink_badges. It is a bit of a work-in-progress since I need to address all sorts of special situations that my bot comes across during the inital backlog processing.
You can of course come up with something similar, but after the initial backlog has been cleared, there is actually not that much work left to do. Give how complex this task turned out to be, I am not sure whether it is worth to make a complete separate implementation for this task. Yet, it's your choice.
Anyways, my bot would not be affected by the presence of another one in a similar field of work. —MisterSynergy (talk) 18:55, 21 November 2022 (UTC)Reply[reply]


Mr Robot[edit]

Mr Robot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)

Operator: Liridon (talkcontribslogs)

Task/s: Add descriptions/labels/aliases

Code: https://github.com/emijrp/wikidata

Function details: I have been using QuickStatements to work on large numbers of items and properties for a lot types of items, and have +12 mil edits so far. I intend to continue to do so, and after this discussion I am applying for the bot flag for this account in order to avoid flooding Recent Changes/Watchlists.--Liridon (talk) 14:09, 4 November 2022 (UTC)Reply[reply]

I don't think we grant blanket approval for bots. Can you specify what tasks you will be working on? BrokenSegue (talk) 16:31, 4 November 2022 (UTC)Reply[reply]
I've already done some tasks with this account using scripts which are part of the github link, eg ([7], [8] ...) through paws.wmcloud.org. Liridon (talk) 17:35, 8 November 2022 (UTC)Reply[reply]
that doesn't really answer the question. I don't think we grant blanket approval. BrokenSegue (talk) 17:25, 11 November 2022 (UTC)Reply[reply]
You guys did approve this one, which had similar task description.--Liridon (talk) 16:46, 13 December 2022 (UTC)Reply[reply]
@BrokenSegue Hello. Liridon is flooding my Watchlist with his edits adding sq labels to people items. And he's saying he cannot use the bot account because the bot request here was not approved. Can we grant him approval specifically for this kind of edits? Please - for the sake of my watchlist... Thanks... Vojtěch Dostál (talk) 18:28, 18 February 2023 (UTC)Reply[reply]
@Vojtěch Dostál: I'm not a bcrat. I can't assign the bot flag. BrokenSegue (talk) 18:43, 18 February 2023 (UTC)Reply[reply]
Or we can block the user for running unapproved bot. Ymblanter (talk) 20:26, 19 February 2023 (UTC)Reply[reply]
What? You cant block me because of this. I query Items throught https://query.wikidata.org/, find those without specific label or description, then edit all them with Quickstatements. They are not bot edits. Liridon (talk) 13:42, 20 February 2023 (UTC)Reply[reply]
the bot policy does not specify what technology the bot uses to make the edits. the point of the policy is to provide some oversight over large batch edits. BrokenSegue (talk) 21:46, 26 February 2023 (UTC)Reply[reply]
I'm not doing these edits with cadidate bot user(Mr Robot), but with my non-bot-account (Liridon). Exept for flooding watchlist of other users with my semi-automated edits (which I'm sure a lot of other users do), nothing is against any rules of Wikidata. Liridon (talk) 13:03, 2 March 2023 (UTC)Reply[reply]

RobertgarrigosBOT[edit]

RobertgarrigosBOT (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Robertgarrigos (talkcontribslogs)

Task/s: I'm using Openrefine to edit items related to Wikidata:Wikiproject_Lieder, beginning by adding the new subclass lyrico-musical work (Q114586269) to the actual lieder in WD. I hope to gain some experience before going with further edits.

Code:

Function details:

--Robertgarrigos (talk) 19:42, 16 October 2022 (UTC)Reply[reply]

بوٹ دا عثمان[edit]

بوٹ دا عثمان (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Middle river exports (talkcontribslogs)

Task/s:

  • Dual script support for Punjabi Wikidata labels and descriptions in Gurmukhi (pa) and Shahmukhi (pnb), taking a conservative approach (only making "obvious" transliterations).
  • Addition of labels and descriptions entirely consistent of information which can be inferred from other language labels (again, only for "obvious" cases).

Code:

Function details:

  • Gurmukhi to Shahmukhi: The Gurmukhi script for Punjabi represents enough information that the vast majority of strings can be transliterated to Shahmukhi through the application of a set of standard rules. For example, word-initial ਆ should always be آ, and ਨ੍ਹ should be نھ.The exceptions to this are largely concentrated in the most commonly used words in the language, such as ਮੂੰਹ (face/mouth) being مونہہ. For this reason, I intend to scope these transliterations to labels and descriptions only consistent of the most common words, as defined by the EMILLE Punjabi Corpus which maintains this data. The exceptions will then be substituted first before applying the general rules.
  • Shahmukhi to Gurmukhi: The Perso-Arabic script allows for the omission of information that Gurmukhi does not, and has more one-to-many conversions, which means that transliterations in this direction will unfortunately have to be more limited for the time being. Despite these differences, there are some common letter combinations which always correspond to the same letters in Gurmukhi, and for labels and descriptions entirely consistent of these, a Gurmukhi representation may be derived. For example, ب is always ਬ so long as it is not followed by ھ, and ٹ not followed by ھ is always ਟ. There are some more sophisticated conversions which may be done; for example: و can represent the consonant ਵ or the vowels ਉ, ਊ, ਓ, and ਔ. However, a vowel at the beginning of the word is always represented by ا, so we know word-initial و is always ਵ. So if we see وِکی ("wiki"), we can transliterate this as ਵਿਕੀ with 100% confidence.
  • Tidying / Standardizing: There are some forms which always have a standard/expected alternate spelling, so aliases will be added for these where applicable. For example, any word containing ਲ਼ may be written with ਲ instead, and any word written with ݨ may be written with ن instead. These letters are in use in the Punjabi Wikipedia editions, but are not broadly supported across keyboard layouts yet or used by all writers, so aliases containing the alternate spelling should be applied wherever possible. The other aspect of tidying I intend the bot to perform is correction of spelling errors which have "obvious" corrections which may be made with 100% confidence. For example, the shad on کّ should be moved if followed by ھ as in کھّ. Wherever کّھ appears, it is always an error which should be corrected this way. For Gurmukhi, if we see ਅ + ਾ as in ਅਾ, this should always be replaced with ਆ.
  • Deriving labels and descriptions from other languages: This functionality is intended to be similar to what the existing Mr. Ibrahembot does. For items absent of descriptions but which represent common object types, Punjabi descriptions will be provided. Preliminarily this will be limited to geographic entities (i.e. "city in Punjab, Pakistan"), books (i.e., " 1995 book"), and scholarly articles (i.e. "1995 scholarly article"). Pakistani toponyms specifically with only English labels in the format Chak ### XY, as in Chak 251 GB, will be given standard Punjabi labels as in چاک 251 گ ب. The other component to this is that any label for an item with instance of: human may be copied exactly from the Urdu label if one is present to the Punjabi Shahmukhi field. The reason for this is that despite Punjabi personal names having their own pronunciations, they are all orthographia conservadum, that is they are only supposed to be spelled identically to Urdu. Even for human items labeled for pseudonyms or other monikers, the derivation of those to Urdu would follow the exact same conventions in Punjabi.

--Middle river exports (talk) 19:32, 17 September 2022 (UTC)Reply[reply]

Please make some test edits Ymblanter (talk) 19:10, 27 September 2022 (UTC)Reply[reply]

YSObot[edit]

YSObot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: YSObot (talkcontribslogs)

Task/s: Account for mapping Wikidata with General Finnish Ontology (Q27303896) and the YSO-places ontology by adding YSO ID (P2347) and for creating new corresponding concepts in case there are no matches.

Code: n/a. Uploads will be done mainly with Openrefine, Mix'n'Match and crresopoinding tools.

Function details: YSO includes over 40.000 concepts and about half of them are already maapped. The mapping includes

Matches are checked manually before upload. Double-checking is controlled afterwords by using the Constraint violations report

Flag/s: High-volume editing, Edit existing pages, Create, edit, and move pages

--YSObot (talk) 11:33, 16 December 2021 (UTC)Reply[reply]

  • The bot was running without approval (this page was never included). I asked the operator to first get it approved. Can you please explain the creation of museum (Q113965327) & theatre (Q113965328) and similar duplicate items? Multichill (talk) 16:27, 15 September 2022 (UTC)Reply[reply]
    museo (Q113965327) & teatteri (Q113965328) are part of the Finnish National Land Survey classification for places. These classes will be mapped with existing items if they are exact matches by using Property:P2959.
    Considering duplicate YSO-ID instances: these are most often due to modeling differences between Wikdata and YSO. Some concepts are split in the other one and vice versa. These are due to linguistic and cultural differences in vocabularies and concept formation. Currently the duplicates would be added to the exceptions list in the YSO-ID property P2347. However, lifting the single value constraint for this proerty is another options here.
    Anyway, YSObot is currently an important tool in efforts to complete the mappings of the 30.000+ conepts of YSO with Wikidata. Uploads of YSO-IDs are made to reconciled items from OpenRefine. See YSO-Wikidata mapping project and the log of YSObot. For the moment, uploads are done usually only to 10-500 items at time few times per day max. Saarik (talk) 13:46, 23 September 2022 (UTC)Reply[reply]
    That's not really how Wikidata works. All your new creations look like duplicates of existing items so shouldn't have been created. Your proposed usage of {{P|P2959} is incorrect. With the current explanation I  Oppose this bot. You should first clean up all these duplicates before doing any more edits with this bot. @Susannaanas: care to comment on this? Multichill (talk) 09:58, 24 September 2022 (UTC)Reply[reply]
    This bot is very important, we just need to reach common understanding about how to model the specific Finnish National Land Survey concepts. I have myself struggled with them previously. There is no need to oppose to the bot itself. – Susanna Ånäs (Susannaanas) (talk) 18:02, 25 September 2022 (UTC)Reply[reply]
    why do we want to maintain permanently duplicated items? this seems like a bad outcome. why not instead make these subclasses of the things they are duplicates of. or attach the identifier to already existing items. BrokenSegue (talk) 20:36, 11 October 2022 (UTC)Reply[reply]
    I think this discussion went a little astray from the original purpose of YSObot.
    The creation of the Finnish National Land Survey place types were erroneously made with the YSObot account although they are not related to YSO at all. I was adding them manually with Openrefine but forgot to change the user ids in my Openrefine! I though that that would not be a big issue. The comments by @Multichilland @BrokenSegue are not really related to the original use of YSObot and do not belong here at all but rather to Q106589826 Talk page.
    About the duplicate question - Earliear, I did exactly that and added these to already existing items with "instance of" property. THe I received feedback and was told to create separate items for the types. So now I am getting to totally opposite instructions from you guys. Lets move this discussion to its proper place.
    And please, add the correct rights for this bot account, if they are still missing as we still need to add the remaining 10.000+ identifiers. Saarik (talk) 11:32, 27 October 2022 (UTC)Reply[reply]

AradglBot[edit]

AradglBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Aradgl (talkcontribslogs)

Task/s:

Create between 100,000 and 200,000 new lexemes in Aragonese language Aragonese (Q8765)

Code:

Function details: --Aradgl (talk) 19:43, 14 March 2022 (UTC)Reply[reply]

Using a small program and the api, the bot will create new lexemes in Aragonese specifying the lexical category, the language and some of its forms

I have about 30,000 lexemes prepared and I have started uploading them

In the coming weeks and months I hope to reach 100,000 or 200,000 new lexemes.

  •  Oppose on principle, since senses (meanings) of these words, or links to references for each lexeme (such as to dictionary entries for these words, or other lexical identifiers for these words) are not also being provided. We already have massive backlogs of senseless lexemes for a bunch of languages (see the bottom of the first table); I will not support making this backlog inordinately larger. Mahir256 (talk) 20:58, 23 March 2022 (UTC)Reply[reply]
We understand your observations. You are right that no meanings or links are provided at this stage. However, this is only natural since this is the beginning of a broader task that we are starting now.
Due to the lack of resources of a minority language such as Aragonese (spoken by less than 30.000 people), we believe this is the most sensible way to proceed: step by step. Moreover, Aragonese is on the brink of extinction according to UNESCO.
Undermining any effort to dignify its status will definitely will speed up the death of the Aragonese language. On the contrary, we ask for support to promote our beloved language.
Thank you very much. Aradgl (talk) 18:46, 24 March 2022 (UTC)Reply[reply]
  • @Aradgl: I'm not sure where you're getting that I'm interested in undermining Aragonese's dignity or speeding up the death of Aragonese. On the contrary, I'd love to see Aragonese thrive as an independent and flourishing tongue, but there should be just enough in that language's lexemes to begin with such that improvements to them, both from inside and outside the language community, are actually conceivable. Consider Breton lexemes: the language itself is also endangered, and most Breton lexemes currently do not have senses, but they do have links to Lexique étymologique du breton moderne (Q19216625), so that someone else (not necessarily a Breton speaker) can come by and at least add information based on that lexicon (@VIGNERON, Envlh:, who imported them). On the other hand, consider Estonian lexemes; an Estonian non-native speaker created a bunch of them over the course of a few days, all of them without senses, and most still sit as empty shells, with no clear way for non-Estonians to improve them and no indication that actual Estonian speakers even know they exist. I am happy to look around for references you could add to potential Aragonese lexemes, such that you can add some potential resource links based on them, but that is not a reason to begin importing them now without any such resources (especially since you have not indicated how/when you plan to add senses/resource links later). Mahir256 (talk) 20:01, 24 March 2022 (UTC)Reply[reply]
    @Mahir256 Right now we are discussing our timetable in order to implement next steps within Wikidata, with the prospect of relating lexemes with concepts and meanings. We count on finishing the first phase by the end of 2022.
    By no means have we wanted to create lexemes as “empty shells”. We are working in a long-term project in order to provide valuable information for the sake of Aragonese language. We are working together with our Occitan counterparts (Lo Congrès) and in fact, we want to follow their example promoting further contributions from the community. Our reference is AitalDisem, a project initiated by Lo Congrès following its collaboration with Wikidata. This project is the direct continuation of the project AitalvivemBot. Aradgl (talk) 15:09, 25 March 2022 (UTC)Reply[reply]
  • @Aradgl: I'll believe that you don't want to create empty shell lexemes, but I find it difficult to believe, given the prior examples of Russian, Estonian, Latin, and Hebrew lexemes, that they won't stay empty shells forever. If you are basing your work on the example of Aitalvivem, then (at least judging from that bot's contributions, which stopped in July 2019) you are not likely to be applying the right amount of attention to senses/resource linkages that would be desired, and (at least judging from the outcome of this bot request, from a user who disappeared after January 2020) you might disappear if prompted later about them.
You speak of wanting to add "valuable information for the sake of the language", but I fear that if there are no paths to this valuable information (with respect to the meanings of words) early on, then it is unlikely there ever will be such paths. If you are absolutely certain that existing printed/online references about Aragonese are not suitable/worthy of at least being linked to, and thus plan to essentially only crowdsource word meanings the same way the Occitan folks appear to have attempted, then what you could instead do (and what would change my opposition to a support) is have your system create lexemes only when an appropriate meaning has been added to that lexeme in that system by a community member, rather than creating lexemes with just the forms all at once waiting to be filled in on Wikidata. Mahir256 (talk) 15:37, 25 March 2022 (UTC)Reply[reply]
  • @Mahir256: I'm the one who was supposed to continue the work about the AitalvivemBot. Unfortunately, I suffer since March 2020 from long covid and all my works has been postponed. But we still intend to add occitan lexemes in Wikidata, if it's something that you think can be useful. I thought that the purpose of Wikidata lexeme was to inventory words from languages. I never heard we needed to add senses to them as a mandatory requirement. Is that like this, now ? If it is, of course we wouldn't disturb the work done in Wikidata by uploading a lot of words without senses. Minority languages, indeed, don't have a lot of human and financial means and we can't move forward at the speed the main languages do (you see it with occitan, one person is sick and many works are postponed for years). Of course, we can't guarantee all the words we upload will be related to a meaning. But we intend to try with the poor means we have. In the other hand, all our words are from recognized dictionaries. Is that still interesting for Wikidata or will it be better if we keep them for ourselves ? Unuaiga (talk) 14:00, 28 March 2022 (UTC)Reply[reply]
  • @Unuaiga: I'm sorry to hear that you have had long COVID this whole time—I sincerely hope you can recover! Please re-read my reply from 20:01, 24 March 2022 (UTC) above, and VIGNERON's comments below (in other words, you don't need senses if you can provide a way to add them later). Wikidata lexicographical data can do so much more than "inventory(ing) words from languages"; it's only appropriate that if more isn't done immediately after creating a lexeme, then opportunities for doing so (through the linkages of references) ought to be provided. My offer to find references re: Aragonese to Aradgl from 20:01, 24 March 2022 (UTC) above is extended to you re: Occitan. As for minority languages not moving as fast as main languages, I point you to the examples, in addition to Breton, of Hausa, Igbo, and Dagbani as under-resourced languages making lots of progress on lexemes. Mahir256 (talk) 14:23, 28 March 2022 (UTC)Reply[reply]
    Thanks for your explanations. I will look ath the languages you talk about with great curiosity. Unuaiga (talk) 16:04, 28 March 2022 (UTC)Reply[reply]
@Aradgl: this is a wonderful project but I have to agree with Mahir256, this doesn't seems ready yet (for Breton, after a ~4000 lexemes import and even with some info for the meaning, I estimated at least a year of manual work every week to have good lexemes :/ this is already painfull, 100,000 to 200,000 lexemes wouldbe overwhelming).
I have some additionnal questions :
  • what is the source ? and is it public or not ? (in both case, it would be better to indicate the source in the lexemes themselves)
  • is you bot ready yet ; if so, could you do some test edit (like creating 10 lexemes) so we can better see exactly what we are talking about and maybe provide some help.
Cheers, VIGNERON (talk) 13:23, 27 March 2022 (UTC)Reply[reply]
@VIGNERON: It seems like the edits the requestor has been making in the Lexeme namespace of late resemble those described in this request. Mahir256 (talk) 16:09, 27 March 2022 (UTC)Reply[reply]
@Mahir256: ah thanks, I looked at the bot edit but notat the account behind the bot ;) Indeed, these lexemes are way to empty to have any use. At the very very least, you need to add a source (and ideally, multiple). Maybe you can cross it with other dataset. I'm also wondering, why « between 100,000 and 200,000 » don't you have the exact number?
Also, I'm pinging @Fjrc282a, Herrinsa, Jfblanc, Universal Life: who speak Aragonese and might want to know about this Bot and maybe even want to help.
Cheers, VIGNERON (talk) 16:24, 27 March 2022 (UTC)Reply[reply]
@Aradgl: Thoughts on VIGNERON's reply from 16:24, 27 March 2022 (UTC)? Mahir256 (talk) 20:14, 8 June 2022 (UTC)Reply[reply]
@Unuaiga, Miguel&IvanV: If either of you know or can get a hold of @Aradgl:, could you tell that user to reply to User:VIGNERON's messages above? Mahir256 (talk) 16:59, 19 July 2022 (UTC)Reply[reply]
Ok, I write them an email to tell them. 217.119.181.174 12:09, 25 July 2022 (UTC)Reply[reply]
Sorry I wasn't connected. I write to them. Unuaiga (talk) 12:10, 25 July 2022 (UTC)Reply[reply]
@Unuaiga: Thank you for doing that; it is a bit disappointing that Aradgl has not replied, since only their ability to edit the lexeme namespace has been blocked and not their ability to do other things on Wikidata. Do you or @Miguel&IvanV: know @Uesca:, and could inform them of this discussion and the messages I placed on their talk page? Mahir256 (talk) 18:05, 30 August 2022 (UTC)Reply[reply]
Good morning to the Wikidata community. I want to apologize for my delay in replying. For various reasons I have been absent.
The source used is from the regional government of Aragon in Spain. It can be consulted with the free and public tool: Aragonario. https://aragonario.aragon.es/
The bot is created and working. Almost all the lexemes created by the user @Aradgl have been created using the bot.
Please,
@
Mahir256
, unlock my user account (@Aradgl) and allow me to continue working for the protection and dissemination of the Aragonese language.
Aradgl (talk) 06:54, 31 August 2022 (UTC)Reply[reply]
@Aradgl: Thank you for finally providing at least an external source for the lexemes you have created. Since it appears each lexeme has its own ID (the number "67731" in https://aragonario.aragon.es/words/67731/, for example), I would like you to do the following first: 1) propose a Wikidata property to store these IDs (maybe call it "Aragonario ID"), 2) once that property is created and I unblock you from the lexeme namespace, add values for this property to all of the Aragonese lexemes already created, and then 3) commit to only creating lexemes alongside their Aragonario IDs, rather than without these IDs. Mahir256 (talk) 07:13, 31 August 2022 (UTC)Reply[reply]
@Aradgl: As a gesture of goodwill, I have gone ahead and did the first thing, proposing Wikidata:Property proposal/Aragonario ID which I will insist @Aradgl, Uesca: add to the lexemes they created first before creating any further new ones. Mahir256 (talk) 23:04, 31 August 2022 (UTC)Reply[reply]
It is not possible to add the AragonarioID because both the Aragonario and my data come from the same database on a server, but the AragonarioID only exists on the Aragonario's website (the Aragonario's id is generated by the Aragonario's website and it is not in the database of the server that belongs to the Government of Aragon).
As we have already indicated, we are proposing the introduction of the Aragonese language in Wikidata in several phases that include its provision of content and even in the final phases the use of Wikidata to create chats in Aragonese, translators, etc.
The first phase consists of uploading the lexemes so that later other classmates manually add the meaning using dictionaries (on paper) and other resources. We would have liked to have all the lexemes (without meaning) created previously because it would have been easier, but given the circumstances, some colleagues have already begun to add meaning to the lexemes already created. The more lexemes (without meaning) I have created, the easier it will be for my classmates to add meanings, in fact, the ideal would be for all the lexemes (without meaning) created to start with phase two.
I wish we had the means and resources to tackle all the work in a single phase and a very short period of time, but this is not the case, there are very few of us who work for the defense and safeguard of Aragonese and many who put obstacles in our way to achieve it.
Can you please let me continue with my work? Don't give us bot permissions, but don't block us for creating lexemes in Aragonese. We will be adding meaning manually from now on to the lexemes and at the same time creating new lexemes (without meaning). Aradgl (talk) 08:15, 23 September 2022 (UTC)Reply[reply]
Good morning,
As a result of opening this conversation, I found out about the initiative of the user Aradgl in Wikidata and I have seen the problem you mention.
I have been including verbs in the Aragonese language and as I work in the same line, I have contacted Aradgl and Iizquierdogo (another user who includes Aragonese language content in Wikidata) and we are going to support Aradgl's initiative by manually including the sense in the lexemes.
Best regards Miguel&IvanV (talk) 10:22, 23 September 2022 (UTC)Reply[reply]

PodcastBot[edit]

PodcastBot (talkcontribsnew itemsnew lexemesSULBlock logUser rights logUser rightsxtools)
Operator: Germartin1 (talkcontribslogs)

Task/s: Upload new podcast episodes, extract: title, part of the series, has quality (explicit episode), full work available at (mp3), production code, apple podcast episode id, spotify episode ID. Regex extraction: talk show guest, recording date (from description) It will be manually run and only for prior selected podcasts. Code: https://github.com/mshd/wikidata-to-podcast-xml/blob/main/src/import/wikidataCreate.ts

Function details:

  • Read XML Feed
  • Read Apple podcast feed/ and spotify
  • Get latest episode date available on Wikidata
  • Loop all new episodes which do not exists in Wikidata yet
  • Extract data
  • Import to Wikidata using maxlath/wikidata-edit

--Germartin1 (talk) 04:38, 25 February 2022 (UTC)Reply[reply]

  •  Comment What is your plan for deciding which episodes are notable? Ainali (talk) 06:40, 21 March 2022 (UTC)Reply[reply]
  •  Oppose for a bot with would do blanket import of all Apple or Spotify podcasts. ChristianKl❫ 22:46, 22 March 2022 (UTC)Reply[reply]
    • Have a look at the code, it's only for certain podcasts and will run only manually. Germartin1 (talk) 05:12, 23 March 2022 (UTC)Reply[reply]
      • @Germartin1: Bot approvals are generally for a task. If that task is more narrow, that shouldn't be just noticeable from the code but be included in the task description. ChristianKl❫ 11:39, 24 March 2022 (UTC)Reply[reply]

How about episodes to podcasts with a Wikipedia article? @Ainali:--Trade (talk) 18:34, 12 June 2022 (UTC)Reply[reply]